24 Commits

Author SHA1 Message Date
Andrei Vagin d432952bbe systrap: prevent corruptions of spinning queues
The current synchronization is based on an assumption that a queue buffer can't
be recycled if the current thread itself is in the queue. Unfortunately, this
assumption is incorrect, and thus the queue buffer can be corrupted.

This change reworks the synchronization part so that the start index is updated
only after committing changes in the queue buffer.
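
The ordering described above can be sketched as follows. This is a minimal illustration, not the actual stub code: it assumes a single producer and a single consumer, and the names `spin_queue`, `queue_push`, `queue_pop`, and `QUEUE_SIZE` are hypothetical. The point is only that the slot is rewritten first and the `start` index is published afterwards with release ordering, so a slot is never handed out for reuse while it still holds a live entry.

```c
#include <stdint.h>

#define QUEUE_SIZE 16u // hypothetical capacity, not taken from the commit

struct spin_queue {
  uint32_t start;             // index of the oldest live entry
  uint32_t end;               // index one past the newest entry
  uint64_t slots[QUEUE_SIZE]; // 0 marks an unused slot
};

// Appends a non-zero value. Returns 0 if the queue is full, 1 otherwise.
static int queue_push(struct spin_queue *q, uint64_t value) {
  uint32_t end = __atomic_load_n(&q->end, __ATOMIC_RELAXED);
  // Acquire pairs with the release store in queue_pop: once we observe the
  // new start, the slot behind it has already been cleared.
  uint32_t start = __atomic_load_n(&q->start, __ATOMIC_ACQUIRE);
  if (end - start == QUEUE_SIZE) return 0;
  __atomic_store_n(&q->slots[end % QUEUE_SIZE], value, __ATOMIC_RELAXED);
  __atomic_store_n(&q->end, end + 1, __ATOMIC_RELEASE);
  return 1;
}

// Removes the oldest entry. Returns 0 if the queue is empty.
static uint64_t queue_pop(struct spin_queue *q) {
  uint32_t start = __atomic_load_n(&q->start, __ATOMIC_RELAXED);
  uint32_t end = __atomic_load_n(&q->end, __ATOMIC_ACQUIRE);
  if (start == end) return 0;
  uint32_t idx = start % QUEUE_SIZE;
  uint64_t value = __atomic_load_n(&q->slots[idx], __ATOMIC_RELAXED);
  // Step 1: commit the change to the queue buffer.
  __atomic_store_n(&q->slots[idx], 0, __ATOMIC_RELAXED);
  // Step 2: only now publish the new start index.
  __atomic_store_n(&q->start, start + 1, __ATOMIC_RELEASE);
  return value;
}
```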

Fixes #10000

PiperOrigin-RevId: 615226911
2024-03-12 17:32:04 -07:00
Andrei Vagin bfd27a1e43 systrap: track the spinning queue length in a separate counter
The current implementation has a race condition resulting in the skipping of
one element in the queue array. When retrieving objects from the queue, the
stub code can get stuck in an infinite loop due to unexpected unused elements.

Updates #10000

PiperOrigin-RevId: 608679165
2024-02-20 11:35:21 -08:00
Konstantin Bogomolov fe66cae2ed Enumerate known systrap stub failures to exit process cleanly.
This helps to rectify a long standing problem of Systrap panicking
when encountering corrupted sysmsg stub memory.

These errors specifically are easier to notice and debug since we
check for them in the stub code and flag them to the sentry
explicitly. They are now very grep-able to make finding their origin
in the stub code easier.

PiperOrigin-RevId: 604743496
2024-02-06 13:19:26 -08:00
Konstantin Bogomolov cffce1a94a systrap: Revise slow-path enablement.
The current systrap fastpath heuristics do a good job getting high
performance when there are idle CPUs, but fail when there are not
enough and do much worse than even the "pure slowpath".

Here is a summary of changes made to remedy that:

1. Disable stub and dispatcher fastpath by default.
2. Decouple fastpath states to be separate between dispatcher and stub
   fastpath.
3. Implement response latency metrics for both sentry->stub and
   stub->sentry messages. Use these latency metrics in order to keep track of
   the baseline latency for both sides. With baseline latency established,
   compare fastpath latency to determine how beneficial it is to keep
   fastpath enabled.
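
A rough sketch of the comparison in item 3, under stated assumptions: the commit does not say how latencies are aggregated, so the exponential smoothing below, the 7/8 weight, and the names `latency_tracker`, `record_latency`, and `fastpath_is_beneficial` are all hypothetical. It only illustrates keeping a baseline latency and a fastpath latency side by side and leaving fastpath off until it measurably wins.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical tracker for one direction (sentry->stub or stub->sentry).
struct latency_tracker {
  uint64_t baseline_ns; // smoothed latency observed with fastpath disabled
  uint64_t fastpath_ns; // smoothed latency observed with fastpath enabled
};

// Exponential moving average; the 7/8 weight is an arbitrary assumption.
static uint64_t smooth(uint64_t avg, uint64_t sample) {
  if (avg == 0) return sample; // first sample seeds the average
  return (avg * 7 + sample) / 8;
}

static void record_latency(struct latency_tracker *t, bool fastpath_on,
                           uint64_t sample_ns) {
  if (fastpath_on) {
    t->fastpath_ns = smooth(t->fastpath_ns, sample_ns);
  } else {
    t->baseline_ns = smooth(t->baseline_ns, sample_ns);
  }
}

// Fastpath stays off (the default) until both sides have been measured and
// the fastpath latency is lower than the baseline.
static bool fastpath_is_beneficial(const struct latency_tracker *t) {
  if (t->baseline_ns == 0 || t->fastpath_ns == 0) return false;
  return t->fastpath_ns < t->baseline_ns;
}
```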

Some sampled benchmarks:
- sysbench-X-Y:
- gettid_benchmark
- getpid_benchmark

Some benchmark results (5-run average):
- On a 4 core machine:

Benchmark    | HEAD     | This CL
-------------|----------|----------
sysbench-1-8 | 48218ms  | 50282ms
sysbench-2-4 | 65900ms  | 72282ms
sysbench-4-2 | 427880ms | 175714ms
sysbench-1-2 | 12998ms  | 13688ms

getpid_benchmark: (HEAD)
```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid          3471 ns         3441 ns       212121
BM_GetpidOpt       1039 ns         1029 ns       700000
```

getpid_benchmark: (This CL)
```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid          3718 ns         3600 ns       200000
BM_GetpidOpt       1320 ns         1281 ns       538462
```

gettid_benchmark: Like getpid, this CL is slightly slower on lower thread count
                  test variants.

- On a 1 core machine:
getpid_benchmark: (HEAD)

```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid         74868 ns        75000 ns        10000
BM_GetpidOpt      74463 ns        74286 ns         8750
```

getpid_benchmark: (This CL)
```
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Getpid         12425 ns        12443 ns        53846
BM_GetpidOpt       8645 ns         8686 ns        87500
```

gettid_benchmark: Same trend as for getpid_benchmark across the board.

  Another interesting case to look at for 1-core machines is copying one large file:
```
  ./runsc --rootless --network none --ignore-cgroups do --force-overlay=false sh -c "time head -c 1073741824 </dev/zero >full-file"
```
- file copy (HEAD):   36.07user 0.00system 0:36.44elapsed 98%CPU
- file copy (This CL): 2.96user 0.23system 0:07.14elapsed 44%CPU

Fixes #9119.

PiperOrigin-RevId: 576600019
2023-10-25 11:56:38 -07:00
Andrei Vagin cd358f833a systrap: don't wake up each thread separately
Now all threads wait on queue->num_thread_to_wakeup, which is a single
wait point for all threads.

This change allows us to avoid cases where num_active_threads
is inconsistent with thread states, because they can't be
changed atomically.

PiperOrigin-RevId: 531020012
2023-05-10 15:36:38 -07:00
Andrei Vagin bd0acf9da9 systrap: add wrappers for gcc atomic functions
It makes code a bit more readable.
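
For illustration only, wrappers of this kind might look like the sketch below. The function names and the chosen memory orders are assumptions, not the ones introduced by the commit. The `__atomic_*` builtins themselves are the standard GCC ones.

```c
#include <stdint.h>

// Hypothetical wrapper names; the real ones in the stub code may differ.
static inline uint32_t atomic_load_u32(uint32_t *addr) {
  return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
}

static inline void atomic_store_u32(uint32_t *addr, uint32_t val) {
  __atomic_store_n(addr, val, __ATOMIC_RELEASE);
}

// Adds val to *addr and returns the new value.
static inline uint32_t atomic_add_u32(uint32_t *addr, uint32_t val) {
  return __atomic_add_fetch(addr, val, __ATOMIC_ACQ_REL);
}

// Returns 1 if *addr was equal to expected and has been set to desired,
// 0 otherwise.
static inline int atomic_cas_u32(uint32_t *addr, uint32_t expected,
                                 uint32_t desired) {
  return __atomic_compare_exchange_n(addr, &expected, desired,
                                     0 /* strong */, __ATOMIC_ACQ_REL,
                                     __ATOMIC_ACQUIRE);
}
```

Call sites then read as `atomic_add_u32(&counter, 1)` instead of spelling out the builtin and its memory order each time.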

PiperOrigin-RevId: 530758808
2023-05-09 17:39:33 -07:00
Konstantin Bogomolov d7f590dd00 Clean up context decoupling experiment.
This change removes code branches and variables only used in coupled-context
mode.

PiperOrigin-RevId: 529776383
2023-05-05 11:55:50 -07:00
Andrei Vagin ff424dce7f systrap: queue_get_context has to detect cases when a ring buffer is recycled
queue_get_context reads `start`, then it gets a value of ringbuffer[start] and
increments `start` if the value is a valid context ID. The issue is that
the ring buffer can be recycled between the first two operations.

This change puts an index into a buffer value. It allows us to detect when a
buffer is recycled and we read a value of a wrong index.
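
A minimal sketch of the idea, assuming a 64-bit slot that packs the producer's index into the upper half and the context ID into the lower half. The layout, `RING_SIZE`, `INVALID_CONTEXT_ID`, and the function names are hypothetical and only illustrate how a reader can tell that the slot it loaded belongs to a different lap of the ring.

```c
#include <stdint.h>

#define RING_SIZE 8u                    // hypothetical capacity
#define INVALID_CONTEXT_ID 0xffffffffu  // hypothetical "no context" marker

struct ring {
  uint32_t start;            // next index to consume, grows monotonically
  uint64_t slots[RING_SIZE]; // (index << 32) | context_id
};

static uint64_t pack_slot(uint32_t index, uint32_t context_id) {
  return ((uint64_t)index << 32) | context_id;
}

static void ring_init(struct ring *r) {
  r->start = 0;
  for (uint32_t i = 0; i < RING_SIZE; i++) {
    r->slots[i] = pack_slot(0xffffffffu, INVALID_CONTEXT_ID);
  }
}

// Producer side: stores the context ID together with the index it is for.
static void ring_put(struct ring *r, uint32_t index, uint32_t context_id) {
  __atomic_store_n(&r->slots[index % RING_SIZE],
                   pack_slot(index, context_id), __ATOMIC_RELEASE);
}

// Consumer side: returns a context ID, or INVALID_CONTEXT_ID if the slot is
// empty or was written for a different index (the buffer was recycled).
static uint32_t ring_get(struct ring *r) {
  for (;;) {
    uint32_t start = __atomic_load_n(&r->start, __ATOMIC_ACQUIRE);
    uint64_t v = __atomic_load_n(&r->slots[start % RING_SIZE],
                                 __ATOMIC_ACQUIRE);
    uint32_t slot_index = (uint32_t)(v >> 32);
    uint32_t context_id = (uint32_t)v;
    if (slot_index != start || context_id == INVALID_CONTEXT_ID) {
      return INVALID_CONTEXT_ID;
    }
    uint32_t expected = start;
    // Claim the entry only if start has not moved since we read it.
    if (__atomic_compare_exchange_n(&r->start, &expected, start + 1, 0,
                                    __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE)) {
      return context_id;
    }
  }
}
```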

PiperOrigin-RevId: 529552389
2023-05-04 17:03:19 -07:00
Andrei Vagin ac45274bd7 systrap: disable the fast path in stub threads when it isn't effective
Here are two conditions when the fast path is effective:
* the side that has to change a state is running on CPU when another side
  is polling the state. This is why the Sentry threads have a higher
  priority than stub threads.
* The sentry handles events faster than the overhead of scheduling another
  stub thread.

This patch addresses the second condition. The fast path in stub threads is
disabled when we reach the limit of stub threads. The idea is that more stub
threads can generate more events to the sentry.

PiperOrigin-RevId: 529120681
2023-05-03 10:02:21 -07:00
Andrei Vagin c34261d265 systrap: limit the number of stub threads by the number of awake contexts
A context is awake if its guest thread isn't in the interruptible sleep state.

The idea here is that a task in the sleep state will not return within the fast
path timeout and so we don't need to hold a stub thread for it.

PiperOrigin-RevId: 529006216
2023-05-02 23:58:59 -07:00
Andrei Vagin 9fabe79f94 systrap: disable fast path if it fails too often
Disable the fast path if it fails 5 times in a row and try to enable
it again in 10 ms.
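
The policy can be sketched roughly as below. Only the two constants (5 consecutive failures, 10 ms) come from the commit message. The struct, the function names, and the choice to pass the current time in as a parameter are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CONSECUTIVE_FAILURES 5u
#define REENABLE_DELAY_NS (10ull * 1000 * 1000) // 10 ms

// Hypothetical state; the real implementation may track this differently.
struct fastpath_state {
  uint32_t consecutive_failures;
  bool enabled;
  uint64_t disabled_at_ns;
};

// Records the outcome of one fast path attempt.
static void fastpath_record(struct fastpath_state *s, bool success,
                            uint64_t now_ns) {
  if (success) {
    s->consecutive_failures = 0;
    return;
  }
  s->consecutive_failures++;
  if (s->enabled && s->consecutive_failures >= MAX_CONSECUTIVE_FAILURES) {
    s->enabled = false;
    s->disabled_at_ns = now_ns;
  }
}

// Reports whether the fast path may be used, re-enabling it once the delay
// has elapsed.
static bool fastpath_enabled(struct fastpath_state *s, uint64_t now_ns) {
  if (!s->enabled && now_ns - s->disabled_at_ns >= REENABLE_DELAY_NS) {
    s->enabled = true;
    s->consecutive_failures = 0;
  }
  return s->enabled;
}
```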

PiperOrigin-RevId: 523756114
2023-04-12 11:26:39 -07:00
Andrei Vagin 5645cdf2d4 systrap: handle interrupted contexts in __export_start
Contexts with pending interrupts should not be executed.

PiperOrigin-RevId: 521835904
2023-04-04 12:36:37 -07:00
Andrei Vagin 96aa115516 systrap: simplify interrupt handling in syshandler
syshandler can be interrupted by SIGCHLD. There are two separate cases. The first
one is when the interrupt is addressed to the context that has triggered
syshandler. In this case, the interrupt can be ignored, because the context
is switching to the sentry. The other case is when syshandler is resuming a new
context. It means we need to stop resuming it and return the context back to
the sentry. The good thing is that the context state is up to date, and so
sighandler can do its job ignoring the state from the signal frame.

Reported-by: syzbot+2e305803e0d29e8faeb3@syzkaller.appspotmail.com
PiperOrigin-RevId: 520986791
2023-03-31 12:34:16 -07:00
Andrei Vagin df2bf5201c systrap: kick another stub thread after the handshake timeout
The idea of the handshake timeout is to see if there is any free active thread.

In addition, here are a few fixes that prevent waking up too many threads.

PiperOrigin-RevId: 519871285
2023-03-27 17:21:27 -07:00
Andrei Vagin 585533eae7 systrap: check that at least one stub thread is active after queueing a context
and don't activate more threads than contexts.

PiperOrigin-RevId: 519168272
2023-03-24 09:51:22 -07:00
Andrei Vagin d6ed799ade systrap: save context pointer on sysmsg
We don't need to calculate an address from context_id each time.

PiperOrigin-RevId: 518640998
2023-03-22 12:26:03 -07:00
Andrei Vagin f8a73a7d1a Remove sysmsg->interrupted_context_id
ctx->interrupt can be used to find out whether the current context has to
be interrupted or not.

PiperOrigin-RevId: 518597531
2023-03-22 09:58:00 -07:00
Andrei Vagin 0cbe6fc835 systrap: introduce a spinning queue
The spinning queue is a queue of spinning threads. It solves the
fragmentation problem. The idea is to minimize the number of threads
processing requests. We can't control how system threads are scheduled, so
we can't distribute requests efficiently. The spinning queue emulates virtual
threads sorted by their spinning time.

PiperOrigin-RevId: 518470754
2023-03-21 22:10:15 -07:00
Konstantin Bogomolov 897c03039e Implement systrap context queue.
This is the initial implementation of the systrap context queue via a ringbuffer
in shared memory between stub threads and the sentry.

In this new model there is no longer a bound sysmsg thread for every context;
instead each subprocess starts with one initial sysmsg thread, which starts
polling the context queue for new contexts arriving from the sentry. If the
sentry detects that contexts are spending too much time in the context queue
without being processed, it will create new sysmsg threads or wake sleeping
ones. Tangentially, sysmsg threads will go to sleep if they spend too much time
busy looping without new context arrivals.

This model does not yet take into account the full load of the host system or
even multiple subprocesses in the same sandbox. Multiple overloaded subprocesses
are liable to make each other run slower by kicking sysmsg threads more often
than they need to; this will be remedied in follow up CLs.

PiperOrigin-RevId: 516680504
2023-03-14 17:48:13 -07:00
Konstantin Bogomolov 39f2721c9b Implement saving decoupled context from syshandler.
Rewrite the syshandler assembly routine to save the full state of user threads,
like the sighandler would. With fpstate, it does so by writing straight to the
thread context struct, so there is no need to do an intermediate copy.

PiperOrigin-RevId: 514751894
2023-03-07 09:16:13 -08:00
Konstantin Bogomolov 702540baec Implement saving decoupled context from sighandler.
Saves task context state to the separate context memory region which is mapped
to all subprocess sysmsg threads, instead of always saving the context to the
thread-specific sysmsg.

When context decoupling is disabled fpstate is not saved to this region, but
GP registers and signal info are.

PiperOrigin-RevId: 514432596
2023-03-06 09:24:24 -08:00
Konstantin Bogomolov 9ec69054f8 Map shared region for systrap thread contexts.
Introduces what a ThreadContext struct is in the context of systrap. It
makes the mappings of the region where the contexts will be stored into both the
sentry and the address space of stub processes.

PiperOrigin-RevId: 513913793
2023-03-04 00:42:24 -08:00
Konstantin Bogomolov 35937b7f61 Add context decoupling flag.
This is a quick and dirty way to activate context decoupling related changes.
Will be removed once context decoupling is finalized.

PiperOrigin-RevId: 513854004
2023-03-03 09:58:24 -08:00
Andrei Vagin 192bfb03fb Open-sourcing the systrap platform.
The systrap platform, like the ptrace platform, uses stub processes to manage
the user address space. The difference is how they intercept system calls and
other events like memory faults, exceptions, etc.

In case of systrap, all events that have to be handled by the Sentry trigger
signals that are handled by a custom signal handler installed on stub
processes. The signal handler switches control to the Sentry.

Here are a few other optimizations:
* On x86, system calls can be replaced with a function call to remove overhead
  of signals.
* For fast interactions of sentry and stub processes, futex wait/wake can
  be a bottleneck, so we use a polling mode.

The platform is launched for the purpose of testing and gathering initial
feedback. It is not yet ready for use in production.

PiperOrigin-RevId: 511650064
2023-02-22 18:22:49 -08:00