Commit Graph

8075 Commits

Author SHA1 Message Date
Fangrui Song 65bbd235a6 Align buffer for cmsg
`char control[CMSG_SPACE(sizeof(*out_cmsg_value)) + 1];` is not guaranteed to
have the alignment of struct cmsghdr, so accessing cmsghdr members through it
may lead to a -fsanitize=alignment failure.
Align `control` to struct cmsghdr to fix the issue.

PiperOrigin-RevId: 555951052
2023-08-11 07:54:10 -07:00
gVisor bot 31b8262ce0 Merge pull request #9146 from sitano:ivan_nocompressio
PiperOrigin-RevId: 555750100
2023-08-10 19:19:42 -07:00
Nicolas Lacasse 2f93ddbe62 Create kernel.SendExternalSignalProcessGroup and use it in boot/loader.go
This will send a signal to all processes (ThreadGroups) in a ProcessGroup.

PiperOrigin-RevId: 555679773
2023-08-10 15:37:22 -07:00
Nayana Bidari a6c3a61c29 Make changes for CgroupsReadControlFile method to get memory usage per cgroup.
- Add Enter, Leave and Migrate methods for tasks in the memory controller.
- Update the memory cgroup ID when a task enters or leaves a cgroup.

PiperOrigin-RevId: 555568875
2023-08-10 11:09:28 -07:00
Ivan Prisyazhnyy 669726877e state: compressio: don't use flate for some workloads
checkpoint image compression (compressio) adds overhead to checkpoint and
restore operations. when gVisor restores the kernel state, the inflate()
algorithm requires:

- CPU to un/compress the data
- Memory blocks to store and un/compress data

memory blocks originate from the bytes.Buffers and the sync.Pool that
tries to reuse them. they are released only when the system decides it
is a good moment:

    pool.go: runtime_registerPoolCleanup(poolCleanup)

on my system (and in production) it takes around 240s for the related
memory region to be freed (unmapped).

during that period, from the moment the image state is read and the kernel
is loaded until `poolCleanup` is called and the GC releases the buffers, the
gVisor kernel (sandbox) process keeps tens to hundreds of megabytes of
anonymous memory pages (RAM) allocated and reserved.
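
the retention described above can be reproduced outside gVisor. what follows is a minimal sketch, not code from this patch: `bufPool` and `capAfterGC` are hypothetical names, and the observation that a pooled item is dropped after two collections reflects current Go releases rather than a documented guarantee.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"sync"
)

// bufPool is a hypothetical pool of reusable buffers, for illustration only.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// capAfterGC grows a pooled buffer to at least size bytes, returns it to the
// pool, forces gcCycles garbage collections, and reports the capacity of the
// buffer that the pool hands out next.
func capAfterGC(size, gcCycles int) int {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Grow(size)
	bufPool.Put(buf)
	for i := 0; i < gcCycles; i++ {
		runtime.GC()
	}
	return bufPool.Get().(*bytes.Buffer).Cap()
}

func main() {
	// Without a GC cycle the large buffer usually stays cached in the pool,
	// although sync.Pool does not guarantee this.
	fmt.Println("cap with no GC:", capAfterGC(1<<20, 0))
	// After two cycles the pool has dropped the buffer and allocates a new one.
	fmt.Println("cap after two GCs:", capAfterGC(1<<20, 2))
}
```

until those collections run, the grown buffer stays reachable through the pool, which is the memory a restore path without compression avoids allocating in the first place.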

quite often, using compression can result in a 2x memory overhead in a
production system that restores checkpoints, plus 100+ ms (hundreds of
milliseconds) of startup latency just to decompress the image.

our use case does not suffer from having uncompressed images on disk, but it
does suffer from the wasted memory and CPU overhead during startup.

this patch adds a flag to disable compression for container checkpoints.

Signed-off-by: Ivan Prisyazhnyy <john.koepi@gmail.com>
2023-08-10 17:55:40 +02:00
Zach Koopmans 324735cfc0 Update docker packages to new moby repo.
PiperOrigin-RevId: 555358714
2023-08-09 20:48:47 -07:00
Etienne Perot 582bf0d72d Add disk usage monitoring to BuildKite script.
If disk usage of any filesystem approaches or reaches 100%, the agent
will log its disk usage to the console, and then reboot.

PiperOrigin-RevId: 555334222
2023-08-09 18:21:17 -07:00
Konstantin Bogomolov 821459c942 systrap: Enable using xsaveopt.
PiperOrigin-RevId: 554906814
2023-08-08 12:33:59 -07:00
Jeff Martin 3458b18514 Add test coverage for RTM_GETRULE, RTM_ADDRULE, and RTM_DELRULE
PiperOrigin-RevId: 554806910
2023-08-08 06:43:27 -07:00
Jeff Martin 32537556ff Add test coverage for RTM_NEWROUTE & RTM_DELROUTE
PiperOrigin-RevId: 554612706
2023-08-07 15:37:13 -07:00
Jeff Martin 4c5d0ffc37 Internal change.
PiperOrigin-RevId: 554591336
2023-08-07 14:22:55 -07:00
Ayush Ranjan f2b4f07d78 Add generic tooling to scrape output from fastlane ruby test suite.
PiperOrigin-RevId: 554562254
2023-08-07 12:49:30 -07:00
Shambhavi Srivastava 21d66119b7 Implementing clone3
Updates #8585

PiperOrigin-RevId: 554554034
2023-08-07 12:19:32 -07:00
Shambhavi Srivastava 8ff6816f07 Implementing CopyInN
PiperOrigin-RevId: 554542787
2023-08-07 11:42:20 -07:00
Jing Chen e89e40fded Implement setns CLONE_NEWUTS namespace type.
PiperOrigin-RevId: 554306089
2023-08-06 15:33:25 -07:00
gVisor bot 8f6af3062d Merge pull request #9235 from avagin:bazel-update
PiperOrigin-RevId: 553961337
2023-08-04 18:02:08 -07:00
Andrei Vagin 118a17d92d kernfs: set DenySpliceIn for DynamicBytesFD
DynamicBytesFD is used for cgroup and proc files. In Linux, these file systems
don't set splice_write callbacks.

Reported-by: syzbot+79b8543454bedce9858b@syzkaller.appspotmail.com
PiperOrigin-RevId: 553818208
2023-08-04 08:45:42 -07:00
Andrei Vagin 506d87b963 Update bazel, rules_go and gazelle
2023-08-03 18:22:26 -07:00
Andrei Vagin 6f978d7185 kernel: GetMountNamespace has to check that mntns isn't nil
Reported-by: syzbot+e21bed832e505430bb27@syzkaller.appspotmail.com
PiperOrigin-RevId: 553568085
2023-08-03 13:14:04 -07:00
Kevin Krakauer b4aeb6cd0c remove unnecessary build tags
Fixes #9237

PiperOrigin-RevId: 553523044
2023-08-03 10:48:45 -07:00
Ayush Ranjan 297e7e4e00 Use initialized config.Overlay2 for sub-containers.
Earlier we were not initializing Container.OverlayConf for sub-containers.
That field itself is a foot gun, so I removed it. Instead, the config is now
passed as a function parameter to force callers to fetch a fresh config.

PiperOrigin-RevId: 553519973
2023-08-03 10:34:34 -07:00
Ayush Ranjan 17e10cc47d Add NV0000_CTRL_CMD_SYSTEM_GET_P2P_CAPS to nvproxy.
PiperOrigin-RevId: 553015606
2023-08-01 21:23:05 -07:00
Andrei Vagin abe7cee096 kernel: don't use atomic pointers for task.netns
task.netns is always changed from a task goroutine under task.mu.

This means that a task goroutine can access it without taking any locks, and
it does not need to increment a reference counter in such cases.

In all other cases, we need to take task.mu.
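
The locking rule above can be sketched as follows. This is an illustrative model, not the gVisor implementation: the `task` and `netNamespace` types and the `setNetNS`, `netNSFromOwner` and `netNSFromOther` methods are hypothetical, and reference counting is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// netNamespace stands in for a network namespace object (hypothetical).
type netNamespace struct{ name string }

// task models an object whose netns field is written only by its owning
// goroutine, and only while mu is held.
type task struct {
	mu    sync.Mutex
	netns *netNamespace
}

// setNetNS must be called from the owning goroutine.
func (t *task) setNetNS(ns *netNamespace) {
	t.mu.Lock()
	t.netns = ns
	t.mu.Unlock()
}

// netNSFromOwner may be called only from the owning goroutine. No lock is
// needed because no other goroutine ever writes the field.
func (t *task) netNSFromOwner() *netNamespace { return t.netns }

// netNSFromOther may be called from any goroutine and takes mu.
func (t *task) netNSFromOther() *netNamespace {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.netns
}

func main() {
	t := &task{}
	t.setNetNS(&netNamespace{name: "initial"})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = t.netNSFromOther().name // foreign readers take the lock
		}()
	}

	// The main goroutine plays the role of the owning task goroutine.
	t.setNetNS(&netNamespace{name: "updated"})
	fmt.Println(t.netNSFromOwner().name)
	wg.Wait()
}
```

The unlocked read is safe only because every write happens on the same goroutine that performs it; a reader on any other goroutine still needs the mutex.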

PiperOrigin-RevId: 552913323
2023-08-01 14:04:53 -07:00
Lucas Manning fa9163e7a0 Bind mount PCI paths for TPUs as read only.
PiperOrigin-RevId: 552912189
2023-08-01 13:56:45 -07:00
Fabricio Voznika 8baec93b4c Add test case to TestCheckpointRestore
TestCheckpointRestore was testing that a container can be restored into
a new one (e.g. a clone), twice. Change one of the restores to use the
same identity, to test the case where the same container is being saved
and then resumed at a later time.

PiperOrigin-RevId: 552866396
2023-08-01 11:34:17 -07:00