gvisor

mirror of https://github.com/netbirdio/gvisor.git synced 2026-05-22 17:12:49 -07:00

Author	SHA1	Message	Date
Koichi Shiraishi	0cf77c02f8	all: remove use of deprecated io/ioutil package & fix some deprecated things Signed-off-by: Koichi Shiraishi <zchee.io@gmail.com>	2024-10-10 20:36:24 +09:00
gVisor bot	3c4b246cf2	Fix printf violations inside of the gvisor code Recently printf.Analyzer has become stricter (https://github.com/golang/go/issues/60529) which led to new findings. gvisor nogo tests run this analyzer and fail if it produces findings. PiperOrigin-RevId: 671657227	2024-09-06 00:45:23 -07:00
Etienne Perot	b4ca91450f	Standardize timestamps in `runsc` log filenames. Prior to this change, the log files each have their own timestamp computed independently. For example, this means that the coverage log file, the panic log file, the debug log file, the first Gofer's log file, and the profile files for the same Sentry may all have different timestamps in their filenames. Now they are the same. This change introduces a central `runsc/starttime` package for which the sole purpose is to hold the start time of the `runsc` process, for easy plumbing in all places that need it. PiperOrigin-RevId: 646667986	2024-06-25 17:48:13 -07:00
Etienne Perot	c8da73daaf	Add option to dump profiling metrics within a container's stdout logs. This is part of a series of changes to add metric charts in performance benchmarks. This change is useful to be able to extract profiling metric data easily regardless of runtime configuration. This change also changes where profiling metrics are initialized and configured. They are now only part of `runsc boot`, rather than all `runsc` invocations. PiperOrigin-RevId: 629900287	2024-05-01 18:33:26 -07:00
Etienne Perot	9425d102e5	Make `runsc` log header more helpful. Changes: - Header is more compact (in non-debug mode). - Added Go runtime version. - Added number of CPU cores. - In debug mode, log page size. - Removed some non-important pieces of configuration to info-level logs. - Made debug mode enable the entire configuration to be logged. - Move initialization of some non-logging-related stuff to after log initialization. (Some of them use the log, so it makes no sense to do that before logging is actually initialized.) PiperOrigin-RevId: 595309362	2024-01-02 23:52:26 -08:00
Etienne Perot	127262d21a	Add annotation to send `runsc` debug logs to user logs. On Kubernetes, these are logged at the pod level. This makes it convenient to debug gVisor on Kubernetes clusters where SSH access to nodes is difficult or prohibited by policy. With this and other pod annotations, it is possible to do strace debugging with only pod annotations and no SSH. PiperOrigin-RevId: 595197543	2024-01-02 13:37:07 -08:00
Tiwei Bie	da2b10e207	runsc: decouple GoferMountConf to two layers This patch decouples GoferMountConf to two layers to allow us to configure all combinations of a gofer mount in a succinct way: - Upper layer config: none, memory, self, anon. The upper layer is always tmpfs. It describes the backend for tmpfs. - Lower layer config: none, lisafs. It describes the backend for the filesystem which actually holds the image contents. The old SelfTmpfs will be represented as "upper=self,lower=none", MemoryOverlay will be "upper=memory,lower=lisafs", SelfOverlay will be "upper=self,lower=lisafs", and so on. Thanks to @ayushr2 for the suggestion on how to better decouple this. This is a preparation for adding the EROFS rootfs support. There is no functional change intended. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>	2023-11-05 09:02:22 +08:00
Josh Seba	5f52ed57e5	Add subcommand to list nvproxy supported driver versions Since the driver versions are hard-coded, determining the supported version list requires code inspection, which is difficult to automate. Add a sub-command (and category) to print the list of nvidia driver versions supported by nvproxy for a given build of runsc. Signed-off-by: Josh Seba <jseba@cloudflare.com>	2023-10-30 15:19:06 -07:00
Konstantin Bogomolov	440b37a5c1	Add profiling metric flags to output metric data to local TSV file. The idea behind conditionally compiled metrics originally was to use them in hotpaths for profiling purposes. This CL makes that possible by outputting declared metrics in TSV format, which can be used to track custom events at runtime in relatively high resolution. Usage: 1. Optionally enable compilation of runsc conditionally-compiled metrics by passing in condmetric_profiling to the Go tags. 2. Add these flags to runsc: - [Required] --profiling-metrics-log=/tmp/some.csv - [Optional] --profiling-metrics=/task/syscalls,/task/faults - If this flag is not specified it will monitor all conditionally-compiled metrics by default. - [Optional] --profiling-metrics-rate-us=10000 Some future improvements: - Flag to output a metric-difference between timestamps instead of constant accumulation. - Output a gnuplot command along with the data. - Current monitoring resolution is limited by what time.Sleep allows. This can be overcome by spinning/yielding when lower monitoring rates are requested. PiperOrigin-RevId: 560849611	2023-08-28 16:28:09 -07:00
Ayush Ranjan	7981df85f3	Make all custom flag.Value implementations idempotent. Set(String()) should be an idempotent operation. This is a useful property which allows us to generate args while re-execing the same process. Setting `--flag-name=val.String()` should work. PiperOrigin-RevId: 552598313	2023-07-31 14:53:21 -07:00
Etienne Perot	f815fa9079	`runsc`: Only register some base flags if they are not already defined. PiperOrigin-RevId: 538323866	2023-06-06 16:38:18 -07:00
Ayush Ranjan	a1006d486d	Unexport fields of config.Overlay2. Users/callers of config.Overlay2 relied on its internal functioning/layout. The stable contract is at the --overlay2 flag level. So changed callers to use that contract instead via Overlay2.Set(). Unexported fields to encapsulate the type correctly. PiperOrigin-RevId: 530818635	2023-05-09 23:42:02 -07:00
Etienne Perot	8f991198b4	Move rlimit-setting code from `runsc` main to run when starting a sandbox. This allows `runsc` subcommands that don't start sandboxes to run in restricted contexts without printing a warning about not being able to set `RLIMIT_MEMLOCK` when the ability to do so doesn't matter. In particular, this helps with `runsc metric-server`, which can be locked down to run with very few capabilities. A previous version of this change had moved this to the beginning of the `runsc boot` subcommand code. However, this doesn't work, because `runsc boot` runs as an unprivileged user (`nobody`) and does not have `CAP_SYS_RESOURCE`. Prior to that change, all `runsc` invocations tried to call `setrlimit`, so what happened in practice is that `runsc create` (running as `root`) would call `setrlimit`, and then `runsc boot` would inherit the `RLIM_INFINITY` and would therefore never actually call `setrlimit` by itself. When moving the `setrlimit` code to only run within `runsc boot`, suddenly the `runsc boot` invocation found itself in a context where it started trying to call `setrlimit`, which would silently fail. This approach has the downside of having the side-effect of needlessly setting `RLIM_INFINITY` on the calling `runsc` process. This was effectively what was already happening prior to moving this code into `runsc boot` anyway, so this should be OK. The alternative would be to add yet another intermediate subcommand before `runsc boot` which runs with `CAP_SYS_RESOURCE`, then calls `setrlimit`, then drops `CAP_SYS_RESOURCE`, then execs `runsc boot`, but that seems like adding a lot more complexity to the boot process than is warranted for this feature. Thanks to Ayush Ranjan for bisecting the performance regression down to this change. Ran benchmarks and performance is comparable to before moving `setrlimit` code within `runsc boot`. PiperOrigin-RevId: 521916084	2023-04-04 18:08:14 -07:00
Etienne Perot	8820b3bb90	Automated rollback of changelist 508491474 PiperOrigin-RevId: 521829631	2023-04-04 12:10:26 -07:00
Zach Koopmans	f92957314c	Add portforward command to runsc Add portforward command so that we can use runsc to forward connections to container ports. This will eventually be supported in k8s. PiperOrigin-RevId: 520739913	2023-03-30 14:16:19 -07:00
Etienne Perot	d0326a67da	`runsc`: Refactor how the version string is propagated in `runsc`. This is helpful so that it can be imported from other packages without import loops. This will be used in a follow-up change to add the version string as a per-sandbox metric metadata label. PiperOrigin-RevId: 519002695	2023-03-23 17:12:09 -07:00
Ayush Ranjan	f9638850c6	Add support for directfs in runsc. directfs is a feature much like host networking. The gofer security boundary is dropped for additional performance. The sentry is allowed to directly access the host filesystem to access or mutate files in order to service application syscalls. This mode bypasses the gofer completely. With directfs, the sentry is run in the root user namespace. (Similar to how hostinet works.) The sentry also runs with additional capabilities that allow it to bypass file permissions checks, change file owner, chroot, etc. Seccomp filters are loosened around the sentry to allow it to make filesystem syscalls directly to the host. The sandbox still runs with an empty pivot root. The gofer donates FDs for the root mount and all bind mounts so that the sentry can perform filesystem operations using them. There is a dummy gofer process in directfs mode. The gofer sets up all bind mounts and donates host FDs of the mount point to the sandbox. The gofer dies with the sandbox. All gofer mount options are supported with directfs. directfs mounts are set up similarly to how gofer mounts are set up. PiperOrigin-RevId: 513634936	2023-03-02 14:10:58 -08:00
Adin Scannell	1ceb814544	Add `default_applicable_licenses` rules to packages. PiperOrigin-RevId: 513581243	2023-03-02 10:50:04 -08:00
Konstantin Bogomolov	1832c38a95	Add TESTONLY sentry panic trigger through afs_syscall. This flag is added for tests that need to trigger a panic in the sentry kernel. Only done for x86_64 which does have a dedicated syscall number for afs_syscall; ARM does not. PiperOrigin-RevId: 512731631	2023-02-27 14:28:30 -08:00
Etienne Perot	81d6f80caf	Add `runsc metric-metadata` subcommand. Also group metric-related commands together in their own command group. PiperOrigin-RevId: 511240778	2023-02-21 10:34:37 -08:00
Etienne Perot	5419f17710	Move rlimit-setting code from `runsc` main to run only as part of `runsc boot`. This allows other `runsc` subcommands to run in restricted contexts without printing a warning about not being able to set `RLIMIT_MEMLOCK` in contexts where this doesn't matter. In particular, this helps with `runsc metric-server`, which can be locked down to run with very little capabilities. PiperOrigin-RevId: 508491474	2023-02-09 15:29:05 -08:00
Ayush Ranjan	c4fe64c5ef	Add dir= prefix in overlay2 flag's medium. This is to make it clear that the host file will be created inside this directory. This also makes it look cleaner when other medium options are added later. PiperOrigin-RevId: 505033408	2023-01-26 22:56:52 -08:00
Etienne Perot	2ce059fcad	gVisor: Implement `runsc metric-server` which serves Prometheus metrics. The metrics server implements the following interface: - GET `/metrics`: Serves Prometheus metrics. - POST `/runsc-metrics`: Contains administrative endpoints. All of them require the `root` argument to be specified, and match the one that the server expects. This is used to avoid confusing multiple instances of the `runsc` metrics server. - POST `/runsc-metrics/healthcheck`: Used by clients to verify that the server is running and is the expected metric server. This change has no tests, but coverage is provided in a later change that provides an end-to-end container tests that the metric server works and exports data faithfully. This change is part of a series of changes to support Prometheus-style metrics in `runsc`. Doing so requires making several seemingly-odd design decisions, due to the following architectural constraints: - Prometheus requires an HTTP server serving the `/metrics` endpoint. - For performance reasons, the `runsc boot` process cannot run the `netpoller` goroutine. - Since we don't want to write our own HTTP server implementation, this means the HTTP endpoint has to be served by a separate process that remains running during the lifetime of the container. - The `runsc boot` process is untrusted. - This means we cannot trust metrics data that comes out of the Sentry. Therefore, there needs to be an elaborate dance where we pre-register metric metadata before starting any untrusted workload. Then, the server relaying the metric data must verify the validity of metric values against this metric metadata. This avoids leaking metrics, cardinality blow-ups, and other such DoS vectors. - This feature needs to be easy-to-use in a typical Docker setting. 
- This means having the ability to just say `--metrics-server=localhost:1337` in the `runsc` runtime entry in `/etc/docker/daemon.json` and have that Just Work(TM), even when multiple containers are running. - Since only one process may listen on a port at a given time, this means the metric server needs to be able to multiplex requests out to multiple running sandboxes, and remain alive for the entire duration of either of these sandboxes. - For this reason, the metrics server runs outside of the usual per-container cgroups. - This also saves system resources by not running one server per sandbox. - The metrics server must be exposed to the outside world, and cannot assume that its clients are trustworthy. - For this reason, a metrics server is bound to a runtime root directory, and double-checks all that the sandboxes it is asked to follow actually exist in this root directory. PiperOrigin-RevId: 503254345	2023-01-19 13:45:46 -08:00
Rahat Mahmood	ef96e9328e	Disable io_uring syscalls by default. The current io_uring support is very limited and experimental. Disable it by default, and add a flag to enable it for testing. PiperOrigin-RevId: 500760451	2023-01-09 11:11:57 -08:00
Etienne Perot	01061a8f20	gVisor: Add `runsc metrics-export` subcommand. This subcommand prints a sandbox's instrumentation data in Prometheus format to stdout. This change is part of a series of changes to support Prometheus-style metrics in `runsc`. Doing so requires making several seemingly-odd design decisions, due to the following architectural constraints: - Prometheus requires an HTTP server serving the `/metrics` endpoint. - For performance reasons, the `runsc boot` process cannot run the `netpoller` goroutine. - Since we don't want to write our own HTTP server implementation, this means the HTTP endpoint has to be served by a separate process that remains running during the lifetime of the container. - The `runsc boot` process is untrusted. - This means we cannot trust metrics data that comes out of the Sentry. Therefore, there needs to be an elaborate dance where we pre-register metric metadata before starting any untrusted workload. Then, the server relaying the metric data must verify the validity of metric values against this metric metadata. This avoids leaking metrics, cardinality blow-ups, and other such DoS vectors. - This feature needs to be easy-to-use in a typical Docker setting. - This means having the ability to just say `--metrics-server=localhost:1337` in the `runsc` runtime entry in `/etc/docker/daemon.json` and have that Just Work(TM), even when multiple containers are running. - Since only one process may listen on a port at a given time, this means the metric server needs to be able to multiplex requests out to multiple running sandboxes, and remain alive for the entire duration of either of these sandboxes. However, it should also die when there are no sandboxes, so that we don't end up with leftover metric servers lying around. - For this reason, the metrics server runs outside of the usual per-container cgroups. - This also saves system resources by not running one server per sandbox. 
- The metrics server must be exposed to the outside world, and cannot assume that its clients are trustworthy. - For this reason, a metrics server is bound to a runtime root directory, and double-checks all that the sandboxes it is asked to follow actually exist in this root directory. PiperOrigin-RevId: 498067941	2022-12-27 18:11:57 -08:00
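
Note on commit 7981df85f3 (idempotent flag.Value implementations): the property described there, that Set(String()) leaves the value unchanged, can be illustrated with a minimal sketch. The `nameList` type and the `--names` flag below are hypothetical and are not taken from the gVisor source; only the standard library `flag.Value` interface is assumed. The key choices are that String() emits exactly the syntax Set() accepts, and that Set() replaces the value rather than appending to it.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// nameList is a hypothetical flag type (not from gVisor): a
// comma-separated list of names.
type nameList struct {
	names []string
}

// Compile-time check that *nameList satisfies flag.Value.
var _ flag.Value = (*nameList)(nil)

// String renders the value in exactly the syntax that Set accepts.
func (n *nameList) String() string {
	if n == nil {
		return ""
	}
	return strings.Join(n.names, ",")
}

// Set replaces (rather than appends to) the current value, so that
// calling Set(String()) leaves the value unchanged.
func (n *nameList) Set(s string) error {
	n.names = nil
	if s == "" {
		return nil
	}
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return fmt.Errorf("empty name in %q", s)
		}
		n.names = append(n.names, part)
	}
	return nil
}

func main() {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	var v nameList
	fs.Var(&v, "names", "comma-separated names")
	if err := fs.Parse([]string{"--names=a, b,c"}); err != nil {
		panic(err)
	}

	// Round-trip the value through its own string form.
	before := v.String()
	if err := v.Set(v.String()); err != nil {
		panic(err)
	}
	fmt.Println(before == v.String(), "--names="+v.String())
}
```

With this property, a re-executing process can rebuild its argument list as `--flag-name=` followed by the value's String() output, as the commit message describes.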


52 Commits