gvisor

mirror of https://github.com/netbirdio/gvisor.git synced 2026-05-22 17:12:49 -07:00

Author	SHA1	Message	Date
Andrei Vagin	03a28d158e	platform/systrap: return memory access type based on a page fault error code. Now we don't need to trigger a second fault to figure out whether it was write or read access. Fixes #11008 Co-developed-by: Jamie Liu <jamieliu@google.com> PiperOrigin-RevId: 697677262	2024-11-18 10:33:59 -08:00
Konstantin Bogomolov	fe66cae2ed	Enumerate known systrap stub failures to exit process cleanly. This helps to rectify a long-standing problem of Systrap panicking when encountering corrupted sysmsg stub memory. These errors specifically are easier to notice and debug since we check for them in the stub code and flag them to the sentry explicitly. They are now very grep-able to make finding their origin in the stub code easier. PiperOrigin-RevId: 604743496	2024-02-06 13:19:26 -08:00
Konstantin Bogomolov	cffce1a94a	systrap: Revise slow-path enablement.

The current systrap fastpath heuristics do a good job getting high performance when there are idle CPUs, but fail when there are not enough and do much worse than even the "pure slowpath". Here is a summary of changes made to remedy that:

1. Disable stub and dispatcher fastpath by default.
2. Decouple fastpath states to be separate between dispatcher and stub fastpath.
3. Implement response latency metrics for both sentry->stub and stub->sentry messages. Use these latency metrics in order to keep track of the baseline latency for both sides. With baseline latency established, compare fastpath latency to determine how beneficial it is to keep fastpath enabled.

Some sampled benchmarks:
- sysbench-X-Y: ``` ```
- gettid_benchmark
- getpid_benchmark

Some benchmark results (5-run average):

- On a 4 core machine:

                   | HEAD     | ThisCL
-------------------|----------|-----------
sysbench-1-8:      | 48218ms  | 50282ms
sysbench-2-4:      | 65900ms  | 72282ms
sysbench-4-2:      | 427880ms | 175714ms
sysbench-1-2:      | 12998ms  | 13688ms

getpid_benchmark: (HEAD)
```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid          3471 ns         3441 ns       212121
BM_GetpidOpt       1039 ns         1029 ns       700000
```
getpid_benchmark: (This CL)
```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid          3718 ns         3600 ns       200000
BM_GetpidOpt       1320 ns         1281 ns       538462
```
gettid_benchmark: Like getpid, this CL is slightly slower on lower thread count test variants.

- On a 1 core machine:

getpid_benchmark: (HEAD)
```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid         74868 ns        75000 ns        10000
BM_GetpidOpt      74463 ns        74286 ns         8750
```
getpid_benchmark: (This CL)
```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid         12425 ns        12443 ns        53846
BM_GetpidOpt       8645 ns         8686 ns        87500
```
gettid_benchmark: Same trend as for getpid_benchmark across the board.

Another interesting case to look at for 1-core machines is copying one large file:
```
./runsc --rootless --network none --ignore-cgroups do --force-overlay=false sh -c "time head -c 1073741824 </dev/zero >full-file"
```
- file copy (HEAD): 36.07user 0.00system 0:36.44elapsed 98%CPU
- file copy (This CL): 2.96user 0.23system 0:07.14elapsed 44%CPU

Fixes #9119. PiperOrigin-RevId: 576600019	2023-10-25 11:56:38 -07:00
Andrei Vagin	74e63e9e29	Update packages PiperOrigin-RevId: 532582853	2023-05-16 15:01:22 -07:00
Andrei Vagin	cd358f833a	systrap: don't wake up each thread separately. Now all threads wait on queue->num_thread_to_wakeup, which is a single point for all threads. This change allows us to avoid cases when num_active_threads is inconsistent with thread states, because they can't be changed atomically. PiperOrigin-RevId: 531020012	2023-05-10 15:36:38 -07:00
Konstantin Bogomolov	d7f590dd00	Clean up context decoupling experiment. This change removes code branches and variables only used in coupled-context mode. PiperOrigin-RevId: 529776383	2023-05-05 11:55:50 -07:00
Andrei Vagin	96aa115516	systrap: simplify interrupt handling in syshandler. syshandler can be interrupted by SIGCHLD. There are two separate cases. The first one is when the interrupt is addressed to a context that has triggered syshandler. In this case, the interrupt can be ignored, because the context is switching to the sentry. Another case is when syshandler is resuming a new context. It means we need to stop resuming it and return the context back to the sentry. The good thing is that the context state is up to date, and so sighandler can do its job ignoring a state from a signal frame. Reported-by: syzbot+2e305803e0d29e8faeb3@syzkaller.appspotmail.com PiperOrigin-RevId: 520986791	2023-03-31 12:34:16 -07:00
Andrei Vagin	d6ed799ade	systrap: save context pointer on sysmsg. We don't need to calculate an address from context_id each time. PiperOrigin-RevId: 518640998	2023-03-22 12:26:03 -07:00
Andrei Vagin	f8a73a7d1a	Remove sysmsg->interrupted_context_id. ctx->interrupt can be used to find out whether the current context has to be interrupted or not. PiperOrigin-RevId: 518597531	2023-03-22 09:58:00 -07:00
Andrei Vagin	0cbe6fc835	systrap: introduce a spinning queue. The spinning queue is a queue of spinning threads. It solves the fragmentation problem. The idea is to minimize the number of threads processing requests. We can't control how system threads are scheduled, so we can't distribute requests efficiently. The spinning queue emulates virtual threads sorted by their spinning time. PiperOrigin-RevId: 518470754	2023-03-21 22:10:15 -07:00
Konstantin Bogomolov	897c03039e	Implement systrap context queue. This is the initial implementation of the systrap context queue via a ringbuffer in shared memory between stub threads and the sentry. In this new model there is no longer a bound sysmsg thread for every context; instead each subprocess starts with one initial sysmsg thread, which starts polling the context queue for new contexts arriving from the sentry. If the sentry detects that contexts are spending too much time in the context queue without being processed, it will create new sysmsg threads or wake sleeping ones. Tangentially, sysmsg threads will go to sleep if they spend too much time busy looping without new context arrivals. This model does not yet take into account the full load of the host system or even multiple subprocesses in the same sandbox. Multiple overloaded subprocesses are liable to make each other run slower by kicking sysmsg threads more often than they need to; this will be remedied in follow-up CLs. PiperOrigin-RevId: 516680504	2023-03-14 17:48:13 -07:00
Konstantin Bogomolov	263dad6258	Handle context interrupts based on syshandler state. Also add interrupt handling for context decoupling. Previously syshandler interrupts would be retriggered no matter if the interrupt arrived before the switch to sentry or after. We only need to handle the case of it arriving after. Additionally this CL introduces interrupt handling for the decoupled context mode, by making interrupts target task contexts rather than sysmsg threads. PiperOrigin-RevId: 515065101	2023-03-08 09:58:12 -08:00
Konstantin Bogomolov	39f2721c9b	Implement saving decoupled context from syshandler. Rewrite the syshandler assembly routine to save the full state of user threads, like the sighandler would. With fpstate, it does so by writing straight to the thread context struct, so there is no need to do an intermediate copy. PiperOrigin-RevId: 514751894	2023-03-07 09:16:13 -08:00
Konstantin Bogomolov	702540baec	Implement saving decoupled context from sighandler. Saves task context state to the separate context memory region which is mapped to all subprocess sysmsg threads, instead of always saving the context to the thread-specific sysmsg. When context decoupling is disabled fpstate is not saved to this region, but GP registers and signal info are. PiperOrigin-RevId: 514432596	2023-03-06 09:24:24 -08:00
Konstantin Bogomolov	9ec69054f8	Map shared region for systrap thread contexts. Introduces the ThreadContext struct for systrap and maps the region where the contexts will be stored into both the sentry and the address space of stub processes. PiperOrigin-RevId: 513913793	2023-03-04 00:42:24 -08:00
Konstantin Bogomolov	35937b7f61	Add context decoupling flag. This is a quick and dirty way to activate context decoupling related changes. Will be removed once context decoupling is finalized. PiperOrigin-RevId: 513854004	2023-03-03 09:58:24 -08:00
Andrei Vagin	192bfb03fb	Open-sourcing the systrap platform. The systrap platform, like the ptrace platform, uses stub processes to manage the user address space. The difference is how they intercept system calls and other events like memory faults, exceptions, etc. In the case of systrap, all events that have to be handled by the Sentry trigger signals that are handled by a custom signal handler installed on stub processes. The signal handler switches control to the Sentry. Here are a few other optimizations: * On x86, system calls can be replaced with a function call to remove the overhead of signals. * For fast interactions between sentry and stub processes, futex wait/wake can be a bottleneck, so we use a polling mode. The platform is launched for the purpose of testing and gathering initial feedback. It is not yet ready for use in production. PiperOrigin-RevId: 511650064	2023-02-22 18:22:49 -08:00
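
Commit 03a28d158e describes deriving the memory access type from the page fault error code instead of triggering a second fault. The sketch below illustrates that idea for x86-64 only, using the architectural error-code bits (bit 1 for writes, bit 4 for instruction fetches). The `AccessType` struct and `accessFromErrorCode` function are hypothetical names chosen for illustration and are not taken from the gVisor source; the actual change may handle more cases and other architectures.

```go
package main

import "fmt"

// Bits of the x86 page-fault error code, as defined by the architecture.
const (
	pfPresent = 1 << 0 // fault hit a present page (protection violation)
	pfWrite   = 1 << 1 // the faulting access was a write
	pfUser    = 1 << 2 // the access originated in user mode
	pfInstr   = 1 << 4 // the faulting access was an instruction fetch
)

// AccessType is a hypothetical stand-in for a sentry access-type value.
type AccessType struct {
	Read, Write, Execute bool
}

// accessFromErrorCode classifies a fault from its error code alone,
// so no second fault is needed to tell reads from writes.
func accessFromErrorCode(code uint64) AccessType {
	switch {
	case code&pfInstr != 0:
		return AccessType{Execute: true}
	case code&pfWrite != 0:
		return AccessType{Write: true}
	default:
		return AccessType{Read: true}
	}
}

func main() {
	// 0x4: user read, 0x6: user write, 0x14: user instruction fetch.
	for _, code := range []uint64{0x4, 0x6, 0x14} {
		fmt.Printf("err=%#x -> %+v\n", code, accessFromErrorCode(code))
	}
}
```

Checking the instruction-fetch bit first is a design choice of this sketch: it keeps the three outcomes mutually exclusive, which matches how a single faulting access is usually reported.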

17 Commits
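
Commit cffce1a94a describes keeping a baseline latency per side and comparing fastpath latency against it to decide whether the fastpath is worth keeping. The following is a rough sketch of that comparison under stated assumptions: the moving-average smoothing, the `alpha` factor, and every identifier (`fastpathState`, `record`, `worthIt`) are hypothetical and are not the heuristics or names used in gVisor. It omits how a disabled fastpath would be periodically retried.

```go
package main

import "fmt"

// alpha is an assumed smoothing factor for the moving averages.
const alpha = 0.25

// fastpathState tracks latency for one side (stub or dispatcher).
// The zero value has the fastpath disabled, mirroring "disabled by default".
type fastpathState struct {
	enabled    bool
	baselineNs float64 // average latency seen with fastpath disabled
	fastpathNs float64 // average latency seen with fastpath enabled
}

// ewma folds one sample into a running average; the first sample seeds it.
func ewma(avg, sample float64) float64 {
	if avg == 0 {
		return sample
	}
	return avg + alpha*(sample-avg)
}

// record attributes a latency sample to whichever mode is currently active.
func (s *fastpathState) record(latencyNs float64) {
	if s.enabled {
		s.fastpathNs = ewma(s.fastpathNs, latencyNs)
	} else {
		s.baselineNs = ewma(s.baselineNs, latencyNs)
	}
}

// worthIt reports whether fastpath latency has beaten the baseline.
// It returns false until both modes have at least one sample.
func (s *fastpathState) worthIt() bool {
	if s.baselineNs == 0 || s.fastpathNs == 0 {
		return false
	}
	return s.fastpathNs < s.baselineNs
}

func main() {
	// Separate state per side, so one decision does not force the other.
	var stub, dispatcher fastpathState

	stub.record(1000) // baseline sample
	stub.enabled = true
	stub.record(5000) // fastpath sample on an overloaded machine

	dispatcher.record(1000)
	dispatcher.enabled = true
	dispatcher.record(400) // fastpath sample with idle CPUs available

	fmt.Println("stub fastpath worth keeping:", stub.worthIt())
	fmt.Println("dispatcher fastpath worth keeping:", dispatcher.worthIt())
}
```

Holding two independent `fastpathState` values reflects the decoupling the commit mentions: the stub and dispatcher sides can reach different conclusions from their own latency samples.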