gvisor

mirror of https://github.com/netbirdio/gvisor.git synced 2026-05-22 17:12:49 -07:00

Author	SHA1	Message	Date
Nicolas Lacasse	f9b1ce2f7d	Clean up tty.CheckChange and call it in SetForegroundProcessGroup. Previously, CheckChange (corresponding to Linux's tty/tty_check_change()) was only used by the host TTY implementation, not the devpts implementation. Furthermore, ThreadGroup.SetForegroundProcessGroup() duplicated some of the logic in CheckChange, notably sending SIGTTOU to background tasks. This means that, for host TTYs, we could send SIGTTOU multiple times. In some circumstances, this leads to the ioctl returning ERESTARTSYS in an infinite loop. PiperOrigin-RevId: 735934036	2025-03-11 16:46:55 -07:00
Jimmy Tran	17563a8af9	Return EACCES when calling setpgid() after execve(). From the setpgid manpage: EACCES - An attempt was made to change the process group ID of one of the children of the calling process and the child had already performed an execve(2) (setpgid(), setpgrp()). This CL makes gVisor implement this rule and updates the exec test suite accordingly. TESTED: http://sponge2/7f364e8a-4f82-463e-ba62-79234c4d054d PiperOrigin-RevId: 727095560	2025-02-14 16:14:14 -08:00
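The EACCES rule quoted from the manpage can be sketched in a few lines of Go. This is only an illustration of the check, not gVisor's implementation: `proc`, `errEACCES`, and `setpgid` here are hypothetical names, and all other setpgid() validation is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// errEACCES is a hypothetical stand-in for the EACCES errno.
var errEACCES = errors.New("EACCES")

// proc is a hypothetical, heavily simplified process record.
type proc struct {
	parent *proc
	execed bool // set once the process has performed execve(2)
	pgid   int
}

// setpgid applies only the rule quoted above: a caller may not change the
// process group of one of its children after that child has called execve(2).
func setpgid(caller, target *proc, pgid int) error {
	if target.parent == caller && target.execed {
		return errEACCES
	}
	target.pgid = pgid
	return nil
}

func main() {
	parent := &proc{pgid: 1}
	child := &proc{parent: parent, pgid: 1}

	fmt.Println(setpgid(parent, child, 7)) // allowed: child has not exec'd yet
	child.execed = true
	fmt.Println(setpgid(parent, child, 8)) // rejected after execve
}
```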
Etienne Perot	2b55090a58	Do not crash when creating thread group with already-exceeded soft CPU limit. Reported-by: syzbot+da9595a72d0762aaa48d@syzkaller.appspotmail.com PiperOrigin-RevId: 699425946	2024-11-23 01:28:50 -08:00
Jamie Liu	94aa652d10	kernel: start RLIMIT_CPU timers in NewThreadGroup. Before cl/695198313, this bug only affected RLIMIT_CPU soft limits, which were represented by tg.rlimitCPUSoftSetting, which was similarly left uninitialized by Kernel.NewThreadGroup(); the CPU clock ticker fetched RLIMIT_CPU hard limits in each tick. After cl/695198313, this bug affects both RLIMIT_CPU soft and hard limits. Itimers don't have the same issue since they're not preserved across fork(). PiperOrigin-RevId: 695936410	2024-11-12 18:15:32 -08:00
Jamie Liu	2d90353f9f	kernel: drive all CPU timers in CPU clock ticker

gVisor currently implements CPU clocks as follows:

- A per-sentry "CPU clock ticker goroutine" (task_sched.go:Kernel.runCPUClockTicker()) periodically advances Kernel.cpuClock, causing it to serve as a very coarse but inexpensive monotonic wall clock (that happens to be suspended when no tasks are running).
- Task goroutines observe the most recent value of Kernel.cpuClock when changing state (Task.gosched.Timestamp), and use it to compute the number of CPU clock ticks that have elapsed in a given state. Thus, task CPU clocks are approximately based on the wall time during which they were marked as running.
- ITIMER_VIRTUAL, ITIMER_PROF, and RLIMIT_CPU are checked by the CPU clock ticker goroutine after advancing Kernel.cpuClock. POSIX interval timers and timerfds check CPU clocks (taskClock/tgClock) in ktime.SampledTimer goroutines.

This has three major problems:

- ktime.SampledTimer goroutines for CPU clock timers run concurrently with the CPU clock ticker, and are not informed as to when corresponding tasks start or stop running (due to overhead on the task execution critical path), so they can't determine when CPU clocks have/will advance; instead, they simply poll CPU clocks on a period equal to that of the represented timer, resulting in significant overhead for CPU-clock-based POSIX interval timers and timerfds.
- For the same reason, CPU clock interval timers and timerfds may expire much later than when the CPU clock is actually incremented; in the interval timer case, this can result in notification signals being sent long after tasks have stopped running. (This is the same problem as in b/116538398, which motivated the special-casing of ITIMER_VIRTUAL and ITIMER_PROF described above, but applied to POSIX interval timers.)
- The sentry does not impose a limit on the number of tasks that may be concurrently marked running, so if more tasks are marked running than the number of CPUs advertised to applications, application CPU utilization can appear to exceed 100%.

This CL fixes these problems by introducing explicit per-Task and ThreadGroup CPU clocks, directly advancing (up to Kernel.applicationCores of) them in the CPU clock ticker, and directly expiring CPU timers when doing so. Itimer and RLIMIT_CPU timers lose their special-casing and instead behave like other CPU timers (see task_acct.go). Kernel.cpuClock is still required, but only for the sentry watchdog.

Minor cleanup changes:

- Gather all stateify hooks in kernel_state.go.
- Replace kernel.randInt31n() with math/rand/v2, which fixes the same problem (https://go.dev/blog/randv2#problem.rand).

Test workload:

```
#include <err.h>
#include <signal.h>
#include <time.h>

#include <chrono>
#include <thread>

constexpr int kNumTimers = 1000;
constexpr long kTimerPeriodNS = 10000000;

int main(int argc, char** argv) {
  for (int i = 0; i < kNumTimers; i++) {
    struct sigevent sev = {.sigev_notify = SIGEV_NONE};
    timer_t timerid;
    if (timer_create(CLOCK_THREAD_CPUTIME_ID, &sev, &timerid) < 0) {
      err(1, "timer_create failed");
    }
    struct itimerspec it = {
        .it_interval = {0, kTimerPeriodNS},
        .it_value = {0, kTimerPeriodNS},
    };
    if (timer_settime(timerid, 0, &it, nullptr) < 0) {
      err(1, "timer_settime failed");
    }
  }
  std::this_thread::sleep_for(std::chrono::seconds(5));
  return 0;
}
```

Before this CL:

```
# /usr/bin/time ./runsc --ignore-cgroups --platform kvm --network none do $(pwd)/workloads/threadcputimers
1.50user 0.17system 0:05.25elapsed 31%CPU (0avgtext+0avgdata 35792maxresident)k
0inputs+184outputs (10major+20889minor)pagefaults 0swaps
```

After this CL:

```
# /usr/bin/time ./runsc --ignore-cgroups --platform kvm --network none do $(pwd)/workloads/threadcputimers
0.10user 0.12system 0:05.22elapsed 4%CPU (0avgtext+0avgdata 34040maxresident)k
0inputs+192outputs (6major+20929minor)pagefaults 0swaps
```

PiperOrigin-RevId: 695198313	2024-11-10 22:19:30 -08:00
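The ticker-driven design described in this commit can be pictured with a rough Go sketch. Everything here is an assumption made for illustration: the `task` type, its fields, and `tick` are hypothetical, the choice of which running tasks get advanced (slice order) is arbitrary, and the timer is one-shot. The real logic lives in gVisor's task_sched.go and task_acct.go and is not reproduced here.

```go
package main

import "fmt"

// task is a hypothetical, simplified stand-in for a kernel task.
type task struct {
	running       bool
	cpuTicks      uint64 // per-task CPU clock, in ticks
	timerDeadline uint64 // CPU-clock deadline of a one-shot timer; 0 means none
	expirations   int    // how many times the timer has fired
}

// tick models one iteration of the CPU clock ticker: it advances the CPU
// clocks of at most applicationCores running tasks and expires any CPU timer
// whose deadline has been reached, all in the same pass.
func tick(tasks []*task, applicationCores int) {
	advanced := 0
	for _, t := range tasks {
		if !t.running {
			continue
		}
		if advanced == applicationCores {
			break // never account for more CPUs than are advertised
		}
		advanced++
		t.cpuTicks++
		if t.timerDeadline != 0 && t.cpuTicks >= t.timerDeadline {
			t.expirations++
			t.timerDeadline = 0 // one-shot in this sketch
		}
	}
}

func main() {
	tasks := []*task{
		{running: true, timerDeadline: 2},
		{running: true},
		{running: true},
	}
	for i := 0; i < 2; i++ {
		tick(tasks, 2)
	}
	for i, t := range tasks {
		fmt.Printf("task %d: ticks=%d expirations=%d\n", i, t.cpuTicks, t.expirations)
	}
}
```

Because timers are expired in the same pass that advances the clocks, no separate goroutine has to poll the clock, which is the overhead the commit message describes removing.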
Jamie Liu	2e6cfa72f2	ktime: support varying Timer implementations - Rename Timer to SampledTimer. - Move all Clock methods except Now to new interface SampledClock. - Move SampledTimer's exported methods (except SetClock) to new interface Timer. Combine Swap and SwapAnd into Set to reduce the number of redundant methods that must be implemented. - Add interface method Clock.NewTimer. This is in preparation for cl/693856539, which adds a second Timer implementation. PiperOrigin-RevId: 694299679	2024-11-07 17:19:25 -08:00
Jamie Liu	379108ca91	ktime: simplify Listener.NotifyTimer() No implementations of Listener use the Setting argument or the ability to override the new Setting, so remove these. PiperOrigin-RevId: 694195729	2024-11-07 11:47:58 -08:00
Jamie Liu	e23347e5b5	Move //pkg/sentry/kernel/time to //pkg/sentry/ktime. This avoids needing to rename it everywhere it's imported. PiperOrigin-RevId: 693930089	2024-11-06 18:13:51 -08:00
Nicolas Lacasse	cceb04f05a	Clean up host.TTYFileOperations. We used to track the foreground process group & session on the TTYFileOperation, but these are already tracked in kernel.TTY.ThreadGroup. So remove TTYFileOperations.fgProcessGroup and .session, and replace them with a kernel.TTY. This is analogous to how sentry-internal tty's already work. Updates #10925 PiperOrigin-RevId: 681957240	2024-10-03 11:25:52 -07:00
Jamie Liu	b99fd8711f	kernel: fix lock order inversion in ThreadGroup.Release() PiperOrigin-RevId: 681199251	2024-10-01 16:10:15 -07:00
Jamie Liu	a32d047f68	kernel: don't hold TaskSet.mu during most of Kernel.runCPUClockTicker() The removed `tg.leader == nil` check doesn't actually affect the correctness of the rest of the loop body. PiperOrigin-RevId: 681163998	2024-10-01 14:25:47 -07:00
Jamie Liu	03bebc4402	kernel: add ThreadGroup.signalLock() This allows "remote" locking of ThreadGroup.signalHandlers.mu without needing to lock TaskSet.mu, analogously to Linux's lock_task_sighand(). This reveals a bug: kernel.Task.sendSignal[Timer]Locked() unintentionally requires TaskSet.mu to be locked since it reads Task.exitState. To fix this, use atomic memory operations on Task.exitState when required. PiperOrigin-RevId: 681128063	2024-10-01 12:48:10 -07:00
Ayush Ranjan	7e395bbbd4	Plumb restore context to load*() methods. This allows for external information to be passed to restore code. Similar to `c087777e37` ("Plumb restore context to afterLoad()"). Updates #1956. PiperOrigin-RevId: 614125262	2024-03-08 20:28:02 -08:00
NymanRobin	f481172b53	Convert atomic.Value to atomic.Pointer[T]	2024-03-05 11:09:23 +02:00
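For readers unfamiliar with this conversion, the sketch below contrasts the two standard-library types. The `config` type and the variable and function names are hypothetical and unrelated to the fields actually changed in this commit. `atomic.Pointer[T]` is available from Go 1.19.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// config is a hypothetical value shared between goroutines.
type config struct {
	name string
}

// oldStyle uses atomic.Value, which stores an `any`, so every Load needs a
// type assertion and a wrongly typed Store is only caught at run time.
var oldStyle atomic.Value

// newStyle uses atomic.Pointer[T], so the pointee type is checked by the
// compiler and Load returns *config directly.
var newStyle atomic.Pointer[config]

func loadOld() *config {
	c, _ := oldStyle.Load().(*config)
	return c
}

func loadNew() *config {
	return newStyle.Load()
}

func main() {
	oldStyle.Store(&config{name: "via atomic.Value"})
	newStyle.Store(&config{name: "via atomic.Pointer"})
	fmt.Println(loadOld().name)
	fmt.Println(loadNew().name)
}
```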
Ayush Ranjan	f82d97c9ee	Only reset tty.tg to nil when its controlling process is being released. This means that when tg is being released, IFF tg.tty.tg == tg (which means tg was tg.tty's controlling process), then we can reset tty.tg to nil. Otherwise, as shown in reproducers of #9898, when a non-controlling process exits, it resets the TTY's tg field (which indicates the controlling thread group) and subsequently the alive controlling thread group can no longer receive signals from the TTY. Fixes #9898 PiperOrigin-RevId: 600987817	2024-01-23 20:22:06 -08:00
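The condition in this commit message (reset tty.tg only if tg.tty.tg == tg) can be illustrated with a minimal Go sketch. The `TTY` and `ThreadGroup` types below are simplified stand-ins that borrow only the field names from the message. Locking and the rest of the release path are omitted, so this should not be read as gVisor's actual code.

```go
package main

import "fmt"

// TTY is a simplified stand-in: tg is the controlling thread group, or nil.
type TTY struct {
	tg *ThreadGroup
}

// ThreadGroup is a simplified stand-in: tty is the terminal it is attached to.
type ThreadGroup struct {
	tty *TTY
}

// release clears the TTY's controlling thread group only when the thread
// group being released is that controlling thread group.
func (tg *ThreadGroup) release() {
	if tg.tty != nil && tg.tty.tg == tg {
		tg.tty.tg = nil
	}
}

func main() {
	tty := &TTY{}
	controller := &ThreadGroup{tty: tty}
	other := &ThreadGroup{tty: tty}
	tty.tg = controller

	other.release()
	fmt.Println(tty.tg == controller) // controlling thread group is kept

	controller.release()
	fmt.Println(tty.tg == nil) // cleared once the controller itself exits
}
```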
Etienne Perot	69e0c7643d	Use `clear` on `map` types wherever possible. This is similar to pull request #9749 but for maps rather than slices. PiperOrigin-RevId: 586504320	2023-11-29 18:00:07 -08:00
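As a small illustration of the pattern this commit refers to, the sketch below empties a map with a delete loop and with the `clear` builtin added in Go 1.21. The function names and the map type are hypothetical and not taken from the gVisor change.

```go
package main

import "fmt"

// resetLoop empties a map the way code written before Go 1.21 typically did.
func resetLoop(m map[string]int) {
	for k := range m {
		delete(m, k)
	}
}

// resetClear empties a map with the clear builtin (Go 1.21+). The map itself
// is kept, so other holders of the same map value also see it emptied.
func resetClear(m map[string]int) {
	clear(m)
}

func main() {
	a := map[string]int{"x": 1, "y": 2}
	b := map[string]int{"x": 1, "y": 2}
	resetLoop(a)
	resetClear(b)
	fmt.Println(len(a), len(b))
}
```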
Nicolas Lacasse	47db4119a2	ThreadGroup should disassociate from tty on exit. Added syscall test case and also tested with: ``` $ docker container run -it --name debian-runsc --runtime=runsc debian:12 bash -c "apt update && apt install -y curl" <snip> Setting up libkeyutils1:amd64 (1.6.3-2) ... Setting up libpsl5:amd64 (0.21.2-1) ... Setting up libbrotli1:amd64 (1.0.9-2+b6) ... Setting up libssl3:amd64 (3.0.11-1~deb12u2) ... Setting up libnghttp2-14:amd64 (1.52.0-1) ... Setting up krb5-locales (1.20.1-2+deb12u1) ... Setting up libldap-common (2.5.13+dfsg-5) ... Setting up libkrb5support0:amd64 (1.20.1-2+deb12u1) ... Setting up libsasl2-modules-db:amd64 (2.1.28+dfsg-10) ... Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2+b2) ... Setting up libk5crypto3:amd64 (1.20.1-2+deb12u1) ... Setting up libsasl2-2:amd64 (2.1.28+dfsg-10) ... Setting up libssh2-1:amd64 (1.10.0-3+b1) ... Setting up libkrb5-3:amd64 (1.20.1-2+deb12u1) ... Setting up openssl (3.0.11-1~deb12u2) ... Setting up publicsuffix (20230209.2326-1) ... Setting up libsasl2-modules:amd64 (2.1.28+dfsg-10) ... Setting up libldap-2.5-0:amd64 (2.5.13+dfsg-5) ... Setting up ca-certificates (20230311) ... <snip> ``` With these changes, there is no more error about TIOCSCTTY, and the `Setting up..` log lines are formatted properly. Fixes #9642 PiperOrigin-RevId: 580204710	2023-11-07 09:26:16 -08:00
Andrei Vagin	b357d71828	TIOCSCTTY has to succeed if a specified tty is a controlling one already. This behavior isn't documented in the tty_ioctl man page, but it has been in the kernel for ages. PiperOrigin-RevId: 577097643	2023-10-26 23:57:42 -07:00
Nicolas Lacasse	e7bd1b4c9c	Implement PR_{S,G}ET_CHILD_SUBREAPER. Closes #2323 PiperOrigin-RevId: 548205854	2023-07-14 13:19:25 -07:00
Nicolas Lacasse	8184fa1db0	Clean up devpts code, and deduplicate the foreground process state. We no longer store the foreground process directly in the terminal. Instead, we get it from the terminal TTY's ThreadGroup. Added a new method: tty.SignalForegroundProcessGroup to simplify this. Cleaned up some things along the way: * Terminal had a bunch of methods to get/set foreground process group and controlling TTY, but those methods were only usable by Ioctl, since they read/wrote to syscall arguments. I moved that logic to Ioctl, and deleted the methods from Terminal, which is now a very simple type. * Fixed a bug in ThreadGroup.SetForegroundProcessGroup where we were overwriting the ID of an existing process group, rather than setting a new process group on the session. * Simplified the construction of lineDiscipline type. Reported-by: syzbot+ae5b769cec8ad969c086@syzkaller.appspotmail.com PiperOrigin-RevId: 512330758	2023-02-25 14:08:58 -08:00
Etienne Perot	445fa6f40c	Lockdep: Print more info in the "unbalanced unlock" case. This CL does the following: - Add the ability for nested locks to have names. - Give names to all current uses of nested locks in the codebase. - Truncate `lockdep` debug stack traces to avoid the clutter from the `lockdep` code itself. - Simplify `lockdep` to no longer require `classMap`. PiperOrigin-RevId: 491486620	2022-11-28 17:53:09 -08:00
Ayush Ranjan	1fa3c06f1e	Delete VFS1 completely. - Delete pkg/sentry/fs/*. - Move pkg/sentry/fs/fsutil out of VFS1 directory and remove VFS1 components. - Remove remaining unused references to VFS1 from remaining codebase. - Rename/refactor code to avoid even referencing VFS2, unless necessary. - Rewrite VFS1-only tests to VFS2. Updates #1624 PiperOrigin-RevId: 490064269	2022-11-21 13:57:52 -08:00
Nelson Elhage	a3a5772491	Fixes to TTOU handling in TIOCSPGRP. Fixes two issues in TTOU handling while handling TIOCSPGRP on a tty device. Fixes #7941; see that issue for details of the bugs. Updates the tests to test the fixed behavior; both tests are verified to fail without the `thread_group.go` fixes.	2022-09-02 14:32:54 -07:00
Andrei Vagin	604233c9f6	kernel: use lockdep mutexes PiperOrigin-RevId: 449877248	2022-05-19 18:33:59 -07:00
Ayush Ranjan	f6ed4523dc	Reformat codebase. PiperOrigin-RevId: 449358041	2022-05-17 17:48:35 -07:00

65 Commits