gvisor

mirror of https://github.com/netbirdio/gvisor.git synced 2026-05-22 17:12:49 -07:00

Author	SHA1	Message	Date
Etienne Perot	4362b11be9	Add `--final-metrics-log` flag to export metric data upon sandbox termination. Fixes issue #11068 PiperOrigin-RevId: 688395401	2024-10-21 22:09:40 -07:00
Etienne Perot	d3b95fae44	Deflake `prometheus_test:TestWriteMultipleSnapshots`. The Prometheus parser doesn't guarantee that it returns metric data in timestamp-sorted order, so the test now sorts it manually. Before: fails 7 out of 2048 times. After: fails 0 out of 2048 times. PiperOrigin-RevId: 688224325	2024-10-21 12:10:27 -07:00
Jing Chen	a093ad0450	Simplify and format gVisor codebase. The changes are just output of `gofmt -s -w .`.	2024-10-13 00:50:32 -07:00
Etienne Perot	2c842da781	Profiling metrics: Add per-line checksum to output. This allows detecting data corruption at a finer granularity. A future change will skip over lines with these errors, allowing data to still be visualized even if partially corrupt. PiperOrigin-RevId: 634038986	2024-05-15 12:30:44 -07:00
Etienne Perot	640a42e63c	`prometheus`: Remove interface indirection, and output strings not bytes. This is part of a series of changes to add metric charts in performance benchmarks. This change is meant to do three things: - Remove the interface indirection from the Prometheus library, which is performance-critical due to its use in writing out profiling metrics (although the runsc metric server also benefits from this too). - Use a `StringWriter`-like writer contract, to avoid needless casting between strings and bytes within the Prometheus library. The library only ever needs to deal with strings, so it is up to callers to do the conversion to bytes if they need to (which the runsc metric-server does). - Avoid buffer allocations in the metric server when each snapshot is larger than the buffer size. Instead, buffers are saved and reused. PiperOrigin-RevId: 630500004	2024-05-03 14:36:10 -07:00
Andrei Vagin	5f4abad306	Fix a few typos. The idea is to run codespell as part of our presubmit checks. Before enabling it for new changes, let's fix what it has found. Signed-off-by: Andrei Vagin <avagin@gmail.com>	2023-10-25 12:13:42 -07:00
Konstantin Bogomolov	440b37a5c1	Add profiling metric flags to output metric data to a local TSV file. The idea behind conditionally compiled metrics was originally to use them in hot paths for profiling purposes. This CL makes that possible by outputting declared metrics in TSV format, which can be used to track custom events at runtime at relatively high resolution. Usage: 1. Optionally enable compilation of runsc conditionally-compiled metrics by passing in condmetric_profiling to the Go tags. 2. Add these flags to runsc: - [Required] --profiling-metrics-log=/tmp/some.csv - [Optional] --profiling-metrics=/task/syscalls,/task/faults - If this flag is not specified, all conditionally-compiled metrics are monitored by default. - [Optional] --profiling-metrics-rate-us=10000 Some future improvements: - Flag to output a metric-difference between timestamps instead of constant accumulation. - Output a gnuplot command along with the data. - Current monitoring resolution is limited by what time.Sleep allows. This can be overcome by spinning/yielding when lower monitoring rates are requested. PiperOrigin-RevId: 560849611	2023-08-28 16:28:09 -07:00
Etienne Perot	7706a5177f	Internal change (diffbased). PiperOrigin-RevId: 539241054	2023-06-09 20:47:41 -07:00
Etienne Perot	4708d1c115	Internal change (diffbased). PiperOrigin-RevId: 539237974	2023-06-09 20:23:32 -07:00
Etienne Perot	291d746532	gVisor metric verification: Verify statistics of distribution metrics. This adds additional checks that the distribution statistics (minimum, maximum, sum-of-squared-deviations) trend in the correct direction across snapshots. PiperOrigin-RevId: 538028640	2023-06-05 17:48:42 -07:00
Etienne Perot	8e6f57da4a	Metric verification lib: Port over data potentially missing from new snapshot. The Prometheus metric verification library uses a `numberPacker` to pack the numbers it must retain from snapshot to snapshot into a tiny amount of space. Numbers that fit in a few bits are stored as-is; larger ones are stored in the `numberPacker` struct itself, and the packed number represents an offset within that struct instead. When the library verifies a new snapshot, it instantiates a new `numberPacker` so that numbers that are no longer referenced are not kept around forever. This works well in most cases, but when a new snapshot does not contain a particular metric for whatever reason, the library still reports it as existing, even though the indirectly-referenced numbers no longer exist. This CL reworks how `numberPacker` is used such that this usage of indirectly-stored numbers is tracked more precisely across snapshots, and ensures that all such numbers are ported over from one snapshot to the next even if the next snapshot only contains a partial result. It also changes the `numberPacker` semantics so that its storage is explicitly allocated, and it will `panic` if asked to store more than that. This means the `numberPacker` never needs to allocate memory, so (as a bonus) it can use `go:nosplit`. Additionally, the Verifier checks that it uses exactly all the storage slots it thinks it will need, which ensures that it has correctly tracked the expected usage. The tests are minimal but will be further reinforced in a future CL which adds additional distribution statistics into distribution metrics. One of these (the sum-of-squared-deviations statistic) is a floating-point number which almost always requires indirect storage, and thus provides more consistent coverage. Previous unit tests almost never actually required indirect storage, which is why this bug was not found until now. PiperOrigin-RevId: 538002697	2023-06-05 15:53:46 -07:00
Etienne Perot	f8d8cc45ab	gVisor metrics library: Add distribution statistics to Prometheus `Histogram`. This plumbs the distribution statistics out to the data consumed from the gVisor sandbox process by the `runsc metric-server` process. PiperOrigin-RevId: 537990236	2023-06-05 15:04:15 -07:00
Etienne Perot	779cd96de5	`runsc metric-server`: Fix printing labels with only external labels specified. Prior to this CL, attempting to write data for a data point with no labels other than `d.ExternalLabels` set would not actually print these labels. This CL adds `d.ExternalLabels` to the condition that determines whether there are any labels to write, and simplifies it to not check for nil-ness (as the `len` of a `nil` map is 0). PiperOrigin-RevId: 526715338	2023-04-24 12:05:02 -07:00
Etienne Perot	0776a6d557	Add a secondary label map to `prometheus.Data`. This is useful for metrics where some labels are specific to the `Data` struct in question, while others are shared. It is wasteful to create unique maps for each `*Data` when they contain mostly the same labels. Until now there was only one such metric that required data labels, but an upcoming change will add more, making this change worthwhile to avoid wasting memory. PiperOrigin-RevId: 521596697	2023-04-03 16:44:47 -07:00
Etienne Perot	0101b166b4	`runsc metric-server`: Check that sandboxes don't export reserved metric names. This change prevents sandboxes from exporting metrics that match the names reserved for process-level Prometheus metrics: https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics PiperOrigin-RevId: 520736834	2023-03-30 14:04:54 -07:00
Etienne Perot	7b5cd4dda5	`runsc`: Prohibit runsc metrics from starting with the prefix "`meta_`". This prefix is used by the metric server to synthesize its own metrics. If the sandbox were to define metrics with the same name, they would conflict. This prefix check prevents a malicious sandbox from defining metrics that conflict with those that the metric server is trying to export. PiperOrigin-RevId: 518922970	2023-03-23 11:59:57 -07:00
Etienne Perot	ee8a6b9584	`runsc metric-server`: Add `name_` suffix to `pod` and `namespace` labels. PiperOrigin-RevId: 516350308	2023-03-13 16:17:10 -07:00
Etienne Perot	064faf80a4	`runsc metric-server`: Optimize memory usage and allocation-heavy functions. This is an effort to make it a well-behaved background process. With 110 sandboxes running, at rest, memory usage goes from `VmRSS: 72376 kB`, `RssAnon: 51944 kB` to `VmRSS: 45864 kB`, `RssAnon: 25788 kB`. This GCs much more aggressively, including after every single request, which means we do spend disproportionately more CPU in order to get that low memory usage. From my testing, serving requests takes about 12% more CPU, and it's all spent in GC. The optimizations that went into this are: - Add a method in `state` to discard the global type maps. - Add a custom "packed" number type in the `prometheus` library that encodes small integers and floating-point numbers in 32 bits whenever possible without loss of precision; otherwise they are encoded in their full 64-bit glory and the 32-bit representation is used as a pointer to the 64-bit representation. These are stored either per-sandbox (for static-after-sandbox-creation numbers like distribution bucket boundaries), or per-metric-retrieval attempt otherwise. - Use string interning for commonly-seen strings across sandboxes, like metric names and label names. Label values are also interned, but only at a per-sandbox granularity. - Rework allocation-heavy functions like `OrderedLabels` and some string rendering functions to be (almost) allocation-free. This doesn't reduce memory usage at rest, and does increase their CPU cost, but in return it significantly cuts down on the percentage of CPU time spent in GC (>50% -> 25%), enough to justify spending the extra CPU in these functions. PiperOrigin-RevId: 515181387	2023-03-08 17:15:52 -08:00
Adin Scannell	1ceb814544	Add `default_applicable_licenses` rules to packages. PiperOrigin-RevId: 513581243	2023-03-02 10:50:04 -08:00
Etienne Perot	4e75dc4650	`runsc metric-server`: Add per-sandbox start timestamp metric. This synthetic metric contains the Unix timestamp at which each sandbox was started. This is useful for counter metrics, so that rates of change over time can be properly computed on a per-sandbox basis. PiperOrigin-RevId: 505228878	2023-01-27 15:56:37 -08:00
Etienne Perot	87b3c88c32	`runsc metric-server`: Ensure iteration ID is stable across server restarts. Prior to this CL, the metric server generated its own ID when it discovered a new sandbox. This meant that existing sandboxes got new iteration IDs whenever the metric server restarted, which broke the continuity of counter metrics across process restarts. This CL reads the container creation time from the sandbox state file and uses it (together with the sandbox ID) to generate a unique ID for each instantiation of a sandbox with a given ID. Added a test to verify this behavior. PiperOrigin-RevId: 505208685	2023-01-27 14:24:09 -08:00
Etienne Perot	b096b98d6f	gVisor Prometheus lib: Group same-name metrics when writing multiple snapshots. This removes `prometheus.Snapshot.WriteTo` and replaces it with a `Write` function that handles writing multiple `Snapshot` objects to the same writer. This is necessary for following the OpenMetrics spec more closely, which e.g. requires same-name metrics to be grouped together in the output. Previously, when rendering data from multiple `Snapshot`s from multiple sandboxes, this grouping was not respected. This change is part of a series of changes to support Prometheus-style metrics in `runsc`. PiperOrigin-RevId: 499551401	2023-01-04 12:31:07 -08:00
Etienne Perot	151797db64	gVisor: Implement Prometheus metric integrity verification library. This library implements metric verifier functionality. Given metric registration information extracted from the sandbox at boot time (before any untrusted container is started), it accepts successive data snapshots and verifies that they meet all checks: metrics exist, metadata matches, cardinality is within bounds, etc. A metric verifier is stateful, as it verifies that counters count only upwards, and snapshots in time are only taken with time advancing ever forward. This change is part of a series of changes to support Prometheus-style metrics in `runsc`. Doing so requires making several seemingly-odd design decisions, due to the following architectural constraints: - Prometheus requires an HTTP server serving the `/metrics` endpoint. - For performance reasons, the `runsc boot` process cannot run the `netpoller` goroutine. - Since we don't want to write our own HTTP server implementation, this means the HTTP endpoint has to be served by a separate process that remains running during the lifetime of the container. - The `runsc boot` process is untrusted. - This means we cannot trust metrics data that comes out of the Sentry. Therefore, there needs to be an elaborate dance where we pre-register metric metadata before starting any untrusted workload. Then, the server relaying the metric data must verify the validity of metric values against this metric metadata. This avoids leaking metrics, cardinality blow-ups, and other such DoS vectors. - This feature needs to be easy to use in a typical Docker setting. - This means having the ability to just say `--metrics-server=localhost:1337` in the `runsc` runtime entry in `/etc/docker/daemon.json` and have that Just Work(TM), even when multiple containers are running. - Since only one process may listen on a port at a given time, this means the metric server needs to be able to multiplex requests out to multiple running sandboxes, and remain alive for as long as any of these sandboxes is running. However, it should also die when there are no sandboxes, so that we don't end up with leftover metric servers lying around. - For this reason, the metrics server runs outside of the usual per-container cgroups. - This also saves system resources by not running one server per sandbox. - The metrics server must be exposed to the outside world, and cannot assume that its clients are trustworthy. - For this reason, a metrics server is bound to a runtime root directory, and double-checks that all the sandboxes it is asked to follow actually exist in this root directory. PiperOrigin-RevId: 499370269	2023-01-03 19:37:23 -08:00
Etienne Perot	452f226933	gVisor Prometheus formatting library: Support sharing map of metrics written. When writing data from multiple Prometheus snapshots at once to the same output stream, writing the same metric preamble (`HELP`/`TYPE` comment) more than once is invalid. This change moves the `metricsWritten` map (used to track which metrics have had their preamble already written) into `ExportOptions`, which allows it to be shared across `Snapshot`s. This is useful in the metrics server, which has to write data from multiple sandboxes in the same HTTP response. This way, preambles already written for the first sandbox's metrics are not written again for the second sandbox. Also print a newline before each new preamble, for ａｅｓｔｈｅｔｉｃｓ. This change is part of a series of changes to support Prometheus-style metrics in `runsc`. PiperOrigin-RevId: 499351638	2023-01-03 17:20:20 -08:00
Etienne Perot	d04a8d3460	gVisor: Add library for exporting instrumentation data in Prometheus format. This adds a new library, `//pkg/prometheus`, which contains just enough data structures to encode instrumentation information in Prometheus format. These data structures are JSON-encodable, so that they can be used over the `runsc` control channel for export (implemented in a future CL). The existing `metric.go` library gains new functionality to export its own data using this new export format. This change is part of a series of changes to support Prometheus-style metrics in `runsc`. Doing so requires making several seemingly-odd design decisions, due to the following architectural constraints: - Prometheus requires an HTTP server serving the `/metrics` endpoint. - For performance reasons, the `runsc boot` process cannot run the `netpoller` goroutine. - Since we don't want to write our own HTTP server implementation, this means the HTTP endpoint has to be served by a separate process that remains running during the lifetime of the container. - The `runsc boot` process is untrusted. - This means we cannot trust metrics data that comes out of the Sentry. Therefore, there needs to be an elaborate dance where we pre-register metric metadata before starting any untrusted workload. Then, the server relaying the metric data must verify the validity of metric values against this metric metadata. This avoids leaking metrics, cardinality blow-ups, and other such DoS vectors. - This feature needs to be easy to use in a typical Docker setting. - This means having the ability to just say `--metrics-server=localhost:1337` in the `runsc` runtime entry in `/etc/docker/daemon.json` and have that Just Work(TM), even when multiple containers are running. - Since only one process may listen on a port at a given time, this means the metric server needs to be able to multiplex requests out to multiple running sandboxes, and remain alive for as long as any of these sandboxes is running. However, it should also die when there are no sandboxes, so that we don't end up with leftover metric servers lying around. - For this reason, the metrics server runs outside of the usual per-container cgroups. - This also saves system resources by not running one server per sandbox. - The metrics server must be exposed to the outside world, and cannot assume that its clients are trustworthy. - For this reason, a metrics server is bound to a runtime root directory, and double-checks that all the sandboxes it is asked to follow actually exist in this root directory. PiperOrigin-RevId: 498039624	2022-12-27 15:05:27 -08:00

25 Commits
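The memory-optimization and `numberPacker` commits describe a packing scheme:
- Values that fit are stored inline in 32 bits.
- Other values go into separate storage, and the 32-bit handle becomes an index into it.
- Storage is allocated up front, and running out of it panics.

The sketch below is a simplified, hypothetical version of that scheme. It handles only `int64` values, whereas the commits say the real packer also encodes floating-point numbers. The names `packer`, `packedNumber`, and `indirectBit` are invented for illustration.

```go
package main

import "fmt"

// packedNumber is a 32-bit handle. If the top bit is clear, the low 31 bits
// hold the value inline; if it is set, they index into packer.storage.
type packedNumber uint32

const indirectBit = 1 << 31

// packer holds overflow storage whose capacity is fixed at construction.
type packer struct {
	storage []int64
}

func newPacker(capacity int) *packer {
	return &packer{storage: make([]int64, 0, capacity)}
}

// pack returns a 32-bit handle for v. It panics if v needs indirect storage
// and no preallocated slot is left.
func (p *packer) pack(v int64) packedNumber {
	if v >= 0 && v < indirectBit {
		return packedNumber(v) // fits inline; no storage used
	}
	if len(p.storage) == cap(p.storage) {
		panic("packer: out of preallocated storage")
	}
	p.storage = append(p.storage, v) // within capacity, so no reallocation
	return packedNumber(uint32(len(p.storage)-1) | indirectBit)
}

// unpack recovers the original value from a handle.
func (p *packer) unpack(n packedNumber) int64 {
	if uint32(n)&indirectBit == 0 {
		return int64(n)
	}
	return p.storage[uint32(n)&^indirectBit]
}

func main() {
	p := newPacker(1) // room for exactly one large number
	small := p.pack(42)
	large := p.pack(-7_000_000_000)
	fmt.Println(p.unpack(small), p.unpack(large), len(p.storage)) // 42 -7000000000 1
}
```

Because `append` never grows the slice past its preallocated capacity here, packing does not allocate after construction. In the same spirit, the commit above notes that this property lets the real `numberPacker` use `go:nosplit`.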