gvisor

mirror of https://github.com/netbirdio/gvisor.git synced 2026-05-22 17:12:49 -07:00

Author	SHA1	Message	Date
Jing Chen	a093ad0450	Simplify and format gVisor codebase. The changes are just the output of `gofmt -s -w .`.	2024-10-13 00:50:32 -07:00
Andrei Vagin	32bbb18823	systrap: use seccomp notifications to communicate with syscall threads. The new synchronous mode of seccomp-unotify (v6.6-rc1~205^2~6) reduces the overhead of context switches. PiperOrigin-RevId: 622924980	2024-04-08 12:47:44 -07:00
Fabricio Voznika	c087777e37	Plumb restore context to afterLoad(). This allows external information, such as host FDs to be remapped, to be passed to restore code. Updates #1956 PiperOrigin-RevId: 612540749	2024-03-04 12:21:50 -08:00
Etienne Perot	62175dea49	`seccomp.BuildProgram`: Add `ProgramOptions` struct. This is just a refactoring, but the intention of this `struct` is to add other useful options for the program build, such as the list of expected "hottest" syscalls by frequency. The interface is a bit awkward, because of the need for two entry points (one of which didn't have any way to set the default actions), and because the zero value of `linux.BPFAction` is a valid (and common) "default action": killing the program (`linux.SECCOMP_RET_KILL_THREAD`). We also can't use `linux.BPFAction`, as `linux.SECCOMP_RET_` are constants. So the struct fields use functions that "resolve" to an action. This reads fairly well at the call sites (`DefaultAction: Return(action)`), at the cost of slightly convoluted logic in `seccomp.go`. PiperOrigin-RevId: 581477348	2023-11-11 00:44:40 -08:00
Etienne Perot	f098b9b06e	`seccomp`: Make `SyscallRules` map type opaque. This wraps the `map[uintptr]SyscallRule` into an unexported field of a struct so that it cannot be accessed directly. This is helpful for the `runsc` and `fsgofer` seccomp filters, which are quite complex and built across multiple files and multiple functions, where it is not always clear which order they are executed in. By forcing mutations to be explicit about their intent (especially "merge with this new rule" vs. "override what happens for this syscall with this new rule"), we can crash if that intent isn't what's actually happening. PiperOrigin-RevId: 572361619	2023-10-10 14:11:01 -07:00
Etienne Perot	71dc79e653	`secbench`: Benchmark optimization duration and compression ratio. Current values for the Sentry filters: build-sec: SentrySystrap 13.73m ± 0%, SentryKVM 16.36m ± 0%; compression-ratio: SentrySystrap 2.165 ± 0%, SentryKVM 2.132 ± 0%; gen-instr: SentrySystrap 1.288k ± 0%, SentryKVM 1.373k ± 0%; opt-instr: SentrySystrap 595.0 ± 0%, SentryKVM 644.0 ± 0%; opt-sec: SentrySystrap 819.0µ ± 2%, SentryKVM 897.0µ ± 1%. PiperOrigin-RevId: 572089103	2023-10-09 17:53:22 -07:00
Etienne Perot	addac5f248	Refactor seccomp rules with interfaces rather than disjunctive normal form. This replaces the `seccomp.Rule` type with the `seccomp.SyscallRule` interface, which is an abstraction that defines how to match a syscall's arguments and RIP. This has the following benefits: - The code can verify that rules are self-contained, as the `SyscallRule.Render` contract specifies that the rule must jump to either a "matched" or "not matched" label, and may not fall through. It uses `ProgramBuilder`'s support for asserting unreachability to enforce this. - Rules that match everything are more explicit (no more implicit "no rules means everything matches" behavior; instead, you have to explicitly specify `seccomp.MatchAll{}`). - "OR" behavior is explicit (a disjunctive rule is marked as `seccomp.Or` rather than relying on the current implicit meaning of a list of rules). - Allows the creation of more sophisticated matching rules that don't work on a per-argument basis. This change does not do any of that yet; it simply refactors existing rules without changing the way they work. - Decouples rule-specific rendering code from the larger program generation code (BST, architecture check, etc.). Unfortunately, there is no easy way to split this change into multiple sub-changes without introducing additional complexity to support both forms of expressing rules, so apologies for the size of this change. Note, however, that it is actually net-negative in line count. Despite its size, please review it carefully, as it is security-sensitive. PiperOrigin-RevId: 571459670	2023-10-06 16:19:26 -07:00
Etienne Perot	dcfe2d169e	`seccomp`: Rename `seccomp.MatchAny` to `seccomp.AnyValue`. This reduces the diff on an upcoming refactor which modifies all seccomp rules. `AnyValue` better reflects the fact that the matcher is about matching a single syscall argument value, as opposed to e.g. a rule that allows a syscall through regardless of its argument. PiperOrigin-RevId: 571110444	2023-10-05 13:19:06 -07:00
Etienne Perot	5f5692dd20	`bpf`: Replace most uses of `linux.BPFInstruction` with `bpf.Instruction`. `bpf.Instruction` is the same type as `linux.BPFInstruction`, except that it uses the BPF instruction-to-string decoder to give a nice human-readable stringification. PiperOrigin-RevId: 570499020	2023-10-03 14:34:53 -07:00
Konstantin Bogomolov	6763252ef0	Clean up unused systrap code. PiperOrigin-RevId: 529810117	2023-05-05 14:10:37 -07:00
Konstantin Bogomolov	d7f590dd00	Clean up context decoupling experiment. This change removes code branches and variables only used in coupled-context mode. PiperOrigin-RevId: 529776383	2023-05-05 11:55:50 -07:00
Konstantin Bogomolov	f727f06c81	Add debug logging to systrap futex waits. In general, it is probably a good idea to set a timeout on any futex waits that the sentry is doing. For now, just output some helpful logs about what the shared memory looks like; in the future we may want to do something more useful on ETIMEDOUT events. PiperOrigin-RevId: 518919966	2023-03-23 11:44:22 -07:00
Konstantin Bogomolov	897c03039e	Implement systrap context queue. This is the initial implementation of the systrap context queue via a ring buffer in shared memory between stub threads and the sentry. In this new model there is no longer a bound sysmsg thread for every context; instead, each subprocess starts with one initial sysmsg thread, which starts polling the context queue for new contexts arriving from the sentry. If the sentry detects that contexts are spending too much time in the context queue without being processed, it will create new sysmsg threads or wake sleeping ones. Conversely, sysmsg threads will go to sleep if they spend too much time busy-looping without new context arrivals. This model does not yet take into account the full load of the host system, or even multiple subprocesses in the same sandbox. Multiple overloaded subprocesses are liable to make each other run slower by kicking sysmsg threads more often than they need to; this will be remedied in follow-up CLs. PiperOrigin-RevId: 516680504	2023-03-14 17:48:13 -07:00
Konstantin Bogomolov	263dad6258	Handle context interrupts based on syshandler state. Also add interrupt handling for context decoupling. Previously, syshandler interrupts would be retriggered regardless of whether the interrupt arrived before or after the switch to the sentry. We only need to handle the case where it arrives after. Additionally, this CL introduces interrupt handling for the decoupled context mode, by making interrupts target task contexts rather than sysmsg threads. PiperOrigin-RevId: 515065101	2023-03-08 09:58:12 -08:00
Konstantin Bogomolov	702540baec	Implement saving decoupled context from sighandler. Saves task context state to the separate context memory region, which is mapped to all subprocess sysmsg threads, instead of always saving the context to the thread-specific sysmsg. When context decoupling is disabled, fpstate is not saved to this region, but GP registers and signal info are. PiperOrigin-RevId: 514432596	2023-03-06 09:24:24 -08:00
Andrei Vagin	192bfb03fb	Open-sourcing the systrap platform. The systrap platform, like the ptrace platform, uses stub processes to manage the user address space. The difference is how they intercept system calls and other events like memory faults, exceptions, etc. In the case of systrap, all events that have to be handled by the Sentry trigger signals that are handled by a custom signal handler installed on stub processes. The signal handler switches control to the Sentry. Here are a few other optimizations: * On x86, system calls can be replaced with a function call to remove the overhead of signals. * For fast interactions between the Sentry and stub processes, futex wait/wake can be a bottleneck, so we use a polling mode. The platform is being launched for the purpose of testing and gathering initial feedback. It is not yet ready for use in production. PiperOrigin-RevId: 511650064	2023-02-22 18:22:49 -08:00

16 Commits
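Commit addac5f248 describes a rule model with an explicit `MatchAll`, an explicit `Or`, and a per-argument `AnyValue`. The sketch below illustrates that model as a small interpreted matcher. It is not gVisor's code. The real `seccomp.SyscallRule` renders BPF through `ProgramBuilder`, whereas the hypothetical `SyscallData`, `ValueMatcher`, `PerArg`, `EqualTo`, and `Matches` below evaluate arguments directly in Go. They show only the shape of the matching semantics.

```go
package main

import "fmt"

// SyscallData is a hypothetical stand-in for what a seccomp filter inspects:
// the syscall number, up to six arguments, and the instruction pointer.
type SyscallData struct {
	Nr   uintptr
	Args [6]uint64
	RIP  uint64
}

// SyscallRule is an interpreted analogue of a rule interface: instead of
// rendering BPF, it evaluates a syscall invocation directly.
type SyscallRule interface {
	Matches(d SyscallData) bool
}

// MatchAll explicitly matches every invocation of a syscall.
type MatchAll struct{}

func (MatchAll) Matches(SyscallData) bool { return true }

// ValueMatcher matches a single argument value.
type ValueMatcher interface {
	MatchValue(v uint64) bool
}

// AnyValue matches any single argument value.
type AnyValue struct{}

func (AnyValue) MatchValue(uint64) bool { return true }

// EqualTo (hypothetical) matches one exact argument value.
type EqualTo uint64

func (e EqualTo) MatchValue(v uint64) bool { return uint64(e) == v }

// PerArg (hypothetical) matches argument by argument; in this sketch,
// nil entries behave like AnyValue.
type PerArg [6]ValueMatcher

func (p PerArg) Matches(d SyscallData) bool {
	for i, m := range p {
		if m != nil && !m.MatchValue(d.Args[i]) {
			return false
		}
	}
	return true
}

// Or matches if any sub-rule matches; an empty Or matches nothing.
type Or []SyscallRule

func (o Or) Matches(d SyscallData) bool {
	for _, r := range o {
		if r.Matches(d) {
			return true
		}
	}
	return false
}

func main() {
	// Allow fcntl(fd, F_GETFL) or fcntl(fd, F_SETFL, flags) only.
	rule := Or{
		PerArg{AnyValue{}, EqualTo(3)},             // F_GETFL is 3 on Linux.
		PerArg{AnyValue{}, EqualTo(4), AnyValue{}}, // F_SETFL is 4 on Linux.
	}
	fmt.Println(rule.Matches(SyscallData{Args: [6]uint64{0, 3}})) // true
	fmt.Println(rule.Matches(SyscallData{Args: [6]uint64{0, 1}})) // false
}
```

Making `Or` and `MatchAll` explicit types means an empty rule list can no longer silently mean "allow everything".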
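Commit f098b9b06e hides the rules map behind an unexported field so that every mutation states its intent. The sketch below shows that idea under stated assumptions. The method names `Add`, `Merge`, and `Override` and the string-typed `Rule` are hypothetical, not gVisor's actual API. A merge ORs the new rule with the existing ones, while an override panics if there was nothing to override.

```go
package main

import "fmt"

// Rule is a hypothetical placeholder for a per-syscall rule (just a label here).
type Rule string

// SyscallRules keeps its map unexported, so callers must go through
// methods that make their intent explicit.
type SyscallRules struct {
	rules map[uintptr][]Rule // each entry is an OR of alternatives
}

func NewSyscallRules() *SyscallRules {
	return &SyscallRules{rules: make(map[uintptr][]Rule)}
}

// Add installs a rule for a syscall that must not already have one.
func (s *SyscallRules) Add(sysno uintptr, r Rule) {
	if _, ok := s.rules[sysno]; ok {
		panic(fmt.Sprintf("Add: syscall %d already has a rule; use Merge or Override", sysno))
	}
	s.rules[sysno] = []Rule{r}
}

// Merge ORs a rule with whatever is already allowed for the syscall.
func (s *SyscallRules) Merge(sysno uintptr, r Rule) {
	s.rules[sysno] = append(s.rules[sysno], r)
}

// Override replaces the existing rule and crashes if there is none.
func (s *SyscallRules) Override(sysno uintptr, r Rule) {
	if _, ok := s.rules[sysno]; !ok {
		panic(fmt.Sprintf("Override: syscall %d has no rule to override", sysno))
	}
	s.rules[sysno] = []Rule{r}
}

// Get returns a copy of the alternatives for a syscall.
func (s *SyscallRules) Get(sysno uintptr) []Rule {
	return append([]Rule(nil), s.rules[sysno]...)
}

func main() {
	const sysRead, sysWrite = 0, 1 // x86-64 syscall numbers
	rules := NewSyscallRules()
	rules.Add(sysRead, "fd == 0")
	rules.Merge(sysRead, "fd == 3")
	rules.Override(sysRead, "any fd")
	fmt.Println(rules.Get(sysRead)) // [any fd]

	defer func() { fmt.Println("recovered:", recover()) }()
	rules.Override(sysWrite, "any fd") // panics: nothing to override
}
```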
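Commit 62175dea49 explains why `ProgramOptions` fields hold functions that "resolve" to an action. The zero value of a seccomp action is `SECCOMP_RET_KILL_THREAD`, so a plain action field cannot distinguish "unset" from "explicitly kill". The sketch below mirrors that reasoning with locally defined types. `BPFAction`, the `Ret*` constants (whose values match Linux's `SECCOMP_RET_*`), `resolve`, and the fallback action are assumptions for illustration, not gVisor's implementation.

```go
package main

import "fmt"

// BPFAction mirrors a seccomp return action. The values match Linux's
// SECCOMP_RET_* constants; note that KILL_THREAD is the zero value.
type BPFAction uint32

const (
	RetKillThread BPFAction = 0x00000000
	RetTrap       BPFAction = 0x00030000
	RetErrno      BPFAction = 0x00050000
	RetAllow      BPFAction = 0x7fff0000
)

// ProgramOptions is a hypothetical sketch: a nil func means "not set",
// which a plain BPFAction field could not express.
type ProgramOptions struct {
	DefaultAction func() BPFAction
	BadArchAction func() BPFAction
}

// Return builds a resolver that always yields the given action.
func Return(a BPFAction) func() BPFAction {
	return func() BPFAction { return a }
}

// resolve returns the configured action, or fallback if none was set.
func resolve(f func() BPFAction, fallback BPFAction) BPFAction {
	if f == nil {
		return fallback
	}
	return f()
}

func main() {
	opts := ProgramOptions{DefaultAction: Return(RetKillThread)}
	fmt.Printf("%#x\n", resolve(opts.DefaultAction, RetTrap)) // 0x0: explicit KILL_THREAD kept
	fmt.Printf("%#x\n", resolve(opts.BadArchAction, RetTrap)) // 0x30000: unset, fallback used
}
```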