gvisor

mirror of https://github.com/netbirdio/gvisor.git synced 2026-05-22 17:12:49 -07:00

Author	SHA1	Message	Date
Nicolas Lacasse	f9b1ce2f7d	Clean up tty.CheckChange and call it in SetForegroundProcessGroup. Previously, CheckChange (corresponding to Linux's tty/tty_check_change()) was only used by the host TTY implementation, not the devpts implementation. Furthermore, ThreadGroup.SetForegroundProcessGroup() duplicated some of the logic in CheckChange, notably sending SIGTTOU to background tasks. This means that, for host TTYs, we could send SIGTTOU multiple times. In some circumstances, this leads to the ioctl returning ERESTARTSYS in an infinite loop. PiperOrigin-RevId: 735934036	2025-03-11 16:46:55 -07:00
Fabricio Voznika	c041d9bd58	Add missing binary_sha256 field Fixes #11466 PiperOrigin-RevId: 734209881	2025-03-06 11:01:58 -08:00
Ayush Ranjan	156f457e28	Add kernel.ThreadGroup.ForEachTask(). PiperOrigin-RevId: 733991896	2025-03-05 22:23:11 -08:00
gVisor bot	86abc85f37	Merge pull request #11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support PiperOrigin-RevId: 730560110	2025-02-25 14:52:09 -08:00
Jimmy Tran	17563a8af9	Return EACCES when calling setpgid() after execve() From setpgid manpage, EACCES - An attempt was made to change the process group ID of one of the children of the calling process and the child had already performed an execve(2) (setpgid(), setpgrp()). This CL makes gVisor implement this rule and updates the exec test suite accordingly. TESTED: http://sponge2/7f364e8a-4f82-463e-ba62-79234c4d054d PiperOrigin-RevId: 727095560	2025-02-14 16:14:14 -08:00
Nicolas Lacasse	d949e7177c	taskCopyContext should not require holding task.mu. The primary existing user (ptrace) does not do this, and it leads to lock inversion with MemoryManager.mappingMu. PiperOrigin-RevId: 725353311	2025-02-10 14:49:56 -08:00
Jimmy Tran	de6637c27c	Recompute `max` variable after setting FD in the bitmap. `fdBitmap.FirstZero()` could return `max` value; if it does, then recompute the max value to avoid reusing the old max value twice. The default bitmap size for file descriptors in gVisor is 65535. Add a pipe test that attempts to create more than 65535 FDs to hit the edge case where fdBitmap.FirstZero() returns the default bitmap max value of 65535. TESTED: http://sponge2/4c12ce75-3763-4773-ad62-87c6b8fe0446 http://sponge2/9c9d6ea0-b69c-432c-a16b-9446214109ba PiperOrigin-RevId: 724410846	2025-02-07 11:22:54 -08:00
gVisor bot	e0435b9a53	Merge pull request #11415 from avagin:codespell PiperOrigin-RevId: 721421397	2025-01-30 09:44:28 -08:00
Andrei Vagin	f010ae01ac	Fix a few typos	2025-01-29 21:16:51 -08:00
Etienne Perot	04f9204697	Yield thread group leader `*Task` in `TaskSet.ForEachThreadGroup`. This makes this function usable from outside of the `kernel` package without needing to call `tg.Leader()` (which requires a lock that `TaskSet.ForEachThreadGroup` already acquires). PiperOrigin-RevId: 721168957	2025-01-29 17:30:05 -08:00
Andrei Vagin	1864d9d091	Untag user addresses before handling them in the Sentry Top-Byte-Ignore (TBI) is a feature on all ARMv8.0 CPUs that causes the top byte of virtual addresses to be ignored on loads and stores. Instead, bit 55 is extended over bits 56-63 before address translation. This feature allows use of the (ignored) top byte as a tag or for other in-band metadata. In Linux, brk()/mmap()/mremap() syscalls don't untag addresses. More details are in dcde237319e6 ("mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()") PiperOrigin-RevId: 715885990	2025-01-15 11:52:40 -08:00
gVisor bot	7aa4c49b0d	Merge pull request #11291 from xianzhe-databricks:fix-uds-auth PiperOrigin-RevId: 712981221	2025-01-07 11:25:40 -08:00
xianzhe-databricks	c4f686f4e1	Add a new RPC ConnectWithCreds to allow gofer to connect to a unix domain socket with application's credentials	2025-01-03 17:50:06 +01:00
Fabricio Voznika	fb730ff784	Remove checkpoint_count from `runsc wait --checkpoint` This is done because external callers are not able to know the snapshot generation number from the outside. PiperOrigin-RevId: 707979556	2024-12-19 11:48:10 -08:00
Nayana Bidari	a3e5887415	Changes to support netstack save restore. - Added a new Stats() method in inet.Stack to get the saved stats during restore. - Mark stack.nic, tcpip.Route and stack.addressState structs as "nosave". These fields should not be saved because the IP addresses and routes can change during restore and new configuration of routes and IP addresses will be extracted from the restore spec and initialized in the saved stack. - Changes in Restore() method in icmp, udp, tcp, packet and raw endpoint files to support save restore of these endpoints. These changes are flag guarded by the TESTONLY-save-restore-netstack flag. PiperOrigin-RevId: 707639274	2024-12-18 12:52:22 -08:00
Andrei Vagin	c27c9a02ae	kernel: use the kernel context to run task destroy actions A task context can be used only if actions are executed in a task goroutine. In addition, these actions are executed asynchronously, so the task can be destroyed. Reported-by: syzbot+a9f3e03ea801374b8089@syzkaller.appspotmail.com PiperOrigin-RevId: 706078457	2024-12-13 19:08:53 -08:00
Andrei Vagin	9fcf0b5b53	proc: invalidate task inodes when tasks are destroyed PiperOrigin-RevId: 705785809	2024-12-13 00:58:08 -08:00
Etienne Perot	2b55090a58	Do not crash when creating thread group with already-exceeded soft CPU limit. Reported-by: syzbot+da9595a72d0762aaa48d@syzkaller.appspotmail.com PiperOrigin-RevId: 699425946	2024-11-23 01:28:50 -08:00
Nayana Bidari	df9ba5fb67	Restore listening connections when netstack s/r is enabled. This CL restores the listening connections when netstack s/r is enabled. The changes include: - New method as a workaround to replace the new routes and nics to the loaded stack after restore. - New Restore() for transport layer protocols to restore the protocol level background workers. - Adds afterLoad() method for fdbased processors. - Adds a test to verify listening connection is restored after checkpointing with netstack s/r enabled. - Few other changes to save restore fields to enable netstack s/r. PiperOrigin-RevId: 698453124	2024-11-20 11:13:57 -08:00
Jamie Liu	94aa652d10	kernel: start RLIMIT_CPU timers in NewThreadGroup Before cl/695198313, this bug only affected RLIMIT_CPU soft limits, which were represented by tg.rlimitCPUSoftSetting and was similarly uninitialized by Kernel.NewThreadGroup(); the CPU clock ticker fetched RLIMIT_CPU hard limits in each tick. After cl/695198313, this bug affects both RLIMIT_CPU soft and hard limits. Itimers don't have the same issue since they're not preserved across fork(). PiperOrigin-RevId: 695936410	2024-11-12 18:15:32 -08:00
Jamie Liu	7920b5b40a	kernel: improve tcpip.Timer implementation - Move ktime.VariableTimer to kernel.timekeeperTcpipTimer, its only use case. This allows timekeeperTcpipTimer to use concrete types kernel.timekeeperClock and ktime.SampledTimer instead of ktime.Clock and ktime.Timer, saving a tiny amount of memory (interface values consist of two pointers) and CPU (for interface method calls). - Fix a bug where timekeeperTcpipTimer expiration can cancel a racing call to timekeeperTcpipTimer.Reset() (see use of new field timekeeperTcpipTimer.resets). - Define Listener.NotifyTimer directly on timekeeperTcpipTimer (dropping ktime.functionNotifier), and move goroutine spawning from the anonymous function in ktime.AfterFunc() into timekeeperTcpipTimer.NotifyTimer(). This slightly simplifies the control flow and saves an allocation for the anonymous function object. - Use monotonicClock rather than realtimeClock. It doesn't make sense for time-of-day clock adjustments to affect netstack timeouts, and this is consistent with tcpip.stdClock => time.AfterFunc => runtime.timer. PiperOrigin-RevId: 695504159	2024-11-11 15:38:55 -08:00
Jamie Liu	2d90353f9f	kernel: drive all CPU timers in CPU clock ticker

gVisor currently implements CPU clocks as follows:

- A per-sentry "CPU clock ticker goroutine" (task_sched.go:Kernel.runCPUClockTicker()) periodically advances Kernel.cpuClock, causing it to serve as a very coarse but inexpensive monotonic wall clock (that happens to be suspended when no tasks are running).
- Task goroutines observe the most recent value of Kernel.cpuClock when changing state (Task.gosched.Timestamp), and use it to compute the number of CPU clock ticks that have elapsed in a given state. Thus, task CPU clocks are approximately based on the wall time during which they were marked as running.
- ITIMER_VIRTUAL, ITIMER_PROF, and RLIMIT_CPU are checked by the CPU clock ticker goroutine after advancing Kernel.cpuClock. POSIX interval timers and timerfds check CPU clocks (taskClock/tgClock) in ktime.SampledTimer goroutines.

This has three major problems:

- ktime.SampledTimer goroutines for CPU clock timers run concurrently with the CPU clock ticker, and are not informed as to when corresponding tasks start or stop running (due to overhead on the task execution critical path), so they can't determine when CPU clocks have/will advance; instead, they simply poll CPU clocks on a period equal to that of the represented timer, resulting in significant overhead for CPU-clock-based POSIX interval timers and timerfds.
- For the same reason, CPU clock interval timers and timerfds may expire much later than when the CPU clock is actually incremented; in the interval timer case, this can result in notification signals being sent long after tasks have stopped running. (This is the same problem as in b/116538398, which motivated the special-casing of ITIMER_VIRTUAL and ITIMER_PROF described above, but applied to POSIX interval timers.)
- The sentry does not impose a limit on the number of tasks that may be concurrently marked running, so if more tasks are marked running than the number of CPUs advertised to applications, application CPU utilization can appear to exceed 100%.

This CL fixes these problems by introducing explicit per-Task and ThreadGroup CPU clocks, directly advancing (up to Kernel.applicationCores of) them in the CPU clock ticker, and directly expiring CPU timers when doing so. Itimer and RLIMIT_CPU timers lose their special-casing and instead behave like other CPU timers (see task_acct.go). Kernel.cpuClock is still required, but only for the sentry watchdog.

Minor cleanup changes:

- Gather all stateify hooks in kernel_state.go.
- Replace kernel.randInt31n() with math/rand/v2, which fixes the same problem (https://go.dev/blog/randv2#problem.rand).

Test workload:

```
#include <err.h>
#include <signal.h>
#include <time.h>

#include <chrono>
#include <thread>

constexpr int kNumTimers = 1000;
constexpr long kTimerPeriodNS = 10000000;

int main(int argc, char** argv) {
  for (int i = 0; i < kNumTimers; i++) {
    struct sigevent sev = {.sigev_notify = SIGEV_NONE};
    timer_t timerid;
    if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timerid) < 0) {
      err(1, "timer_create failed");
    }
    struct itimerspec it = {
        .it_interval = {0, kTimerPeriodNS},
        .it_value = {0, kTimerPeriodNS},
    };
    if (timer_settime(timerid, 0, &it, nullptr) < 0) {
      err(1, "timer_settime failed");
    }
  }
  std::this_thread::sleep_for(std::chrono::seconds(5));
  return 0;
}
```

Before this CL:

```
# /usr/bin/time ./runsc --ignore-cgroups --platform kvm --network none do $(pwd)/workloads/threadcputimers
1.50user 0.17system 0:05.25elapsed 31%CPU (0avgtext+0avgdata 35792maxresident)k
0inputs+184outputs (10major+20889minor)pagefaults 0swaps
```

After this CL:

```
# /usr/bin/time ./runsc --ignore-cgroups --platform kvm --network none do $(pwd)/workloads/threadcputimers
0.10user 0.12system 0:05.22elapsed 4%CPU (0avgtext+0avgdata 34040maxresident)k
0inputs+192outputs (6major+20929minor)pagefaults 0swaps
```

PiperOrigin-RevId: 695198313	2024-11-10 22:19:30 -08:00
Jamie Liu	2e6cfa72f2	ktime: support varying Timer implementations - Rename Timer to SampledTimer. - Move all Clock methods except Now to new interface SampledClock. - Move SampledTimer's exported methods (except SetClock) to new interface Timer. Combine Swap and SwapAnd into Set to reduce the number of redundant methods that must be implemented. - Add interface method Clock.NewTimer. This is in preparation for cl/693856539, which adds a second Timer implementation. PiperOrigin-RevId: 694299679	2024-11-07 17:19:25 -08:00
Jamie Liu	379108ca91	ktime: simplify Listener.NotifyTimer() No implementations of Listener use the Setting argument or the ability to override the new Setting, so remove these. PiperOrigin-RevId: 694195729	2024-11-07 11:47:58 -08:00
Jamie Liu	e23347e5b5	Move //pkg/sentry/kernel/time to //pkg/sentry/ktime. This avoids needing to rename it everywhere it's imported. PiperOrigin-RevId: 693930089	2024-11-06 18:13:51 -08:00

779 Commits