25 Commits

Author SHA1 Message Date
Etienne Perot 4362b11be9 Add --final-metrics-log flag to export metric data upon sandbox termination.
Fixes issue #11068

PiperOrigin-RevId: 688395401
2024-10-21 22:09:40 -07:00
Etienne Perot d3b95fae44 Deflake prometheus_test:TestWriteMultipleSnapshots.
The Prometheus parser doesn't guarantee that it returns metric data in
timestamp-sorted order. So sort it in the test manually.
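
The manual sort described above could look roughly like the sketch below. The
`sample` type and `sortByTimestamp` helper are hypothetical names used for
illustration; they are not the actual test code.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical stand-in for one parsed data point.
type sample struct {
	TimestampMs int64 // milliseconds since the Unix epoch
	Value       float64
}

// sortByTimestamp orders samples chronologically. The stable variant keeps
// the parser's relative order for samples that share a timestamp.
func sortByTimestamp(samples []sample) {
	sort.SliceStable(samples, func(i, j int) bool {
		return samples[i].TimestampMs < samples[j].TimestampMs
	})
}

func main() {
	// Pretend the parser returned these out of order.
	got := []sample{{3000, 30}, {1000, 10}, {2000, 20}}
	sortByTimestamp(got)
	for _, s := range got {
		fmt.Println(s.TimestampMs, s.Value)
	}
}
```

Sorting in the test keeps the assertion independent of whatever order the
parser happens to return.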

Before: Fails 7 out of 2048 times
After: Fails 0 out of 2048 times
PiperOrigin-RevId: 688224325
2024-10-21 12:10:27 -07:00
Jing Chen a093ad0450 Simplify and format gVisor codebase.
The changes are just output of `gofmt -s -w .`.
2024-10-13 00:50:32 -07:00
Etienne Perot 2c842da781 Profiling metrics: Add per-line checksum to output.
This allows detecting data corruption more finely.

A future change will make these errors skipped over, allowing data to
still be visualized even if partially corrupt.
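
As an illustration of the idea only: the sketch below appends a CRC32 of each
line as a trailing tab-separated hex field. The checksum algorithm, the field
layout, and the helper names are assumptions; this message does not specify
what the profiling output actually uses.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"strconv"
	"strings"
)

// appendChecksum returns line followed by a tab and the hex CRC32 of line.
// (Assumed layout; the real format may differ.)
func appendChecksum(line string) string {
	sum := crc32.ChecksumIEEE([]byte(line))
	return line + "\t" + strconv.FormatUint(uint64(sum), 16)
}

// verifyLine splits off the trailing checksum field and reports whether it
// matches the rest of the line.
func verifyLine(line string) (data string, ok bool) {
	i := strings.LastIndexByte(line, '\t')
	if i < 0 {
		return "", false
	}
	data = line[:i]
	want, err := strconv.ParseUint(line[i+1:], 16, 32)
	if err != nil {
		return "", false
	}
	return data, uint32(want) == crc32.ChecksumIEEE([]byte(data))
}

func main() {
	line := appendChecksum("1000\t42\t7")
	_, ok := verifyLine(line)
	fmt.Println("intact line verifies:", ok)

	// Flip the first character to simulate corruption of a single line.
	_, ok = verifyLine("9" + line[1:])
	fmt.Println("corrupt line verifies:", ok)
}
```

With one checksum per line, a reader can reject only the damaged lines rather
than the whole file.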

PiperOrigin-RevId: 634038986
2024-05-15 12:30:44 -07:00
Etienne Perot 640a42e63c prometheus: Remove interface indirection, and output strings not bytes.
This is part of a series of changes to add metric charts in performance
benchmarks.

This change is meant to do three things:

- Remove the interface indirection from the Prometheus library, which is
  performance-critical due to its use in writing out profiling metrics
  (although the runsc metric server benefits from this too).
- Use a `StringWriter`-like writer contract, to avoid needless casting
  between strings and bytes within the Prometheus library. The library only
  ever needs to deal with strings, so it is up to callers to do the
  conversion to bytes if they need to (which the runsc metric-server does).
- Avoid buffer allocations in the metric server when each snapshot is larger
  than the buffer size. Instead, buffers are saved and reused.
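
A minimal sketch of the writer contract and buffer reuse described above,
using the standard `io.StringWriter` interface. `writeSample` and
`byteRenderer` are hypothetical names, not the library's actual API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strconv"
)

// writeSample is a hypothetical rendering function. It only ever emits
// strings, so it never converts between string and []byte itself.
func writeSample(w io.StringWriter, name string, value int64) error {
	for _, s := range [...]string{name, " ", strconv.FormatInt(value, 10), "\n"} {
		if _, err := w.WriteString(s); err != nil {
			return err
		}
	}
	return nil
}

// byteRenderer is a hypothetical caller that needs bytes. It keeps one
// buffer and reuses it across calls instead of allocating a new one.
type byteRenderer struct {
	buf bytes.Buffer
}

// render returns the rendered bytes. The slice is only valid until the next
// call, because the underlying buffer is reused.
func (r *byteRenderer) render(name string, value int64) ([]byte, error) {
	r.buf.Reset() // drops the contents but keeps the allocated capacity
	if err := writeSample(&r.buf, name, value); err != nil {
		return nil, err
	}
	return r.buf.Bytes(), nil
}

func main() {
	var r byteRenderer
	for i, name := range []string{"requests_total", "errors_total"} {
		out, err := r.render(name, int64(i+1))
		if err != nil {
			panic(err)
		}
		fmt.Print(string(out))
	}
}
```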

PiperOrigin-RevId: 630500004
2024-05-03 14:36:10 -07:00
Andrei Vagin 5f4abad306 Fix a few typos
The idea is to run codespell as part of our presubmit checks.
Before enabling it for new changes, let's fix what it has found.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
2023-10-25 12:13:42 -07:00
Konstantin Bogomolov 440b37a5c1 Add profiling metric flags to output metric data to local TSV file.
The idea behind conditionally compiled metrics originally was to use them in
hotpaths for profiling purposes. This CL makes that possible by outputting
declared metrics in TSV format, which can be used to track custom events at
runtime in relatively high resolution.

Usage:
    1. Optionally enable compilation of runsc conditionally-compiled metrics
       by passing in condmetric_profiling to the Go tags.
    2. Add these flags to runsc:
    - [Required] --profiling-metrics-log=/tmp/some.csv
    - [Optional] --profiling-metrics=/task/syscalls,/task/faults
      - If this flag is not specified it will monitor all
        conditionally-compiled metrics by default.
    - [Optional] --profiling-metrics-rate-us=10000

Some future improvements:
    - Flag to output a metric-difference between timestamps instead of
      constant accumulation.
    - Output a gnuplot command along with the data.
    - Current monitoring resolution is limited by what time.Sleep allows.
      This can be overcome by spinning/yielding when lower monitoring
      rates are requested.

PiperOrigin-RevId: 560849611
2023-08-28 16:28:09 -07:00
Etienne Perot 7706a5177f Internal change (diffbased).
PiperOrigin-RevId: 539241054
2023-06-09 20:47:41 -07:00
Etienne Perot 4708d1c115 Internal change (diffbased).
PiperOrigin-RevId: 539237974
2023-06-09 20:23:32 -07:00
Etienne Perot 291d746532 gVisor metric verification: Verify statistics of distribution metrics.
This adds additional checks that the distribution statistics (minimum,
maximum, sum-of-squared-deviations) trend in the correct direction across
snapshots.
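
As a rough sketch of such a check, assuming that "correct direction" means the
minimum never rises, the maximum never falls, and the sum of squared
deviations never shrinks as samples accumulate. The `distStats` type and
`checkTrend` function are hypothetical, and both snapshots are assumed to
contain at least one sample.

```go
package main

import (
	"errors"
	"fmt"
)

// distStats is a hypothetical holder for one distribution's statistics.
type distStats struct {
	Min, Max               int64
	SumOfSquaredDeviations float64
}

// checkTrend compares the statistics of two successive snapshots of the same
// distribution and rejects changes that adding samples could not produce.
func checkTrend(prev, next distStats) error {
	switch {
	case next.Min > prev.Min:
		return errors.New("minimum increased")
	case next.Max < prev.Max:
		return errors.New("maximum decreased")
	case next.SumOfSquaredDeviations < prev.SumOfSquaredDeviations:
		return errors.New("sum of squared deviations decreased")
	}
	return nil
}

func main() {
	prev := distStats{Min: 2, Max: 10, SumOfSquaredDeviations: 32}
	fmt.Println(checkTrend(prev, distStats{Min: 1, Max: 12, SumOfSquaredDeviations: 70.5}))
	fmt.Println(checkTrend(prev, distStats{Min: 3, Max: 12, SumOfSquaredDeviations: 70.5}))
}
```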

PiperOrigin-RevId: 538028640
2023-06-05 17:48:42 -07:00
Etienne Perot 8e6f57da4a Metric verification lib: Port over data potentially missing from new snapshot.
The Prometheus metric verification library uses a `numberPacker` to pack the
numbers it must retain from snapshot to snapshot into a tiny amount of space.
This involves putting those that fit in a few bits as-is, but for the larger
ones, they are stored in the `numberPacker` struct itself, and the packed
number represents an offset within that struct instead.

When the library verifies a new snapshot, it instantiates a new `numberPacker`
so that numbers that are no longer referenced are not kept around forever.
This works well in most cases, but in the case where a new snapshot does *not*
contain a particular metric for whatever reason, the library will still report
it as existing, but the indirectly-referenced numbers will no longer exist.

This CL reworks how `numberPacker` is used such that this usage of
indirectly-stored numbers is tracked more precisely across snapshots, and
ensures that all such numbers are ported over from a snapshot to the next even
if the next snapshot only contains a partial result.

It also changes the `numberPacker` semantics to have its storage be explicitly
allocated, and will `panic` if asked to store more than that. This means the
`numberPacker` never needs to allocate memory, so (as a bonus) it can use
`go:nosplit`. Additionally, the Verifier checks that it uses exactly all the
storage slots it thinks it will need, which ensures that it has correctly
tracked the expected usage.

The tests are minimal but will be further reinforced in a future CL which adds
additional distribution statistics into distribution metrics. One of these
(the sum-of-squared-deviations statistic) is a floating-point number which
almost always requires indirect storage, and thus provides more consistent
coverage. Previous unit tests almost never actually required indirect storage,
which is why this bug was not found until now.

PiperOrigin-RevId: 538002697
2023-06-05 15:53:46 -07:00
Etienne Perot f8d8cc45ab gVisor metrics library: Add distribution statistics to Prometheus Histogram.
This plumbs the distribution statistics out to the data consumed from the
gVisor sandbox process by the `runsc metric-server` process.

PiperOrigin-RevId: 537990236
2023-06-05 15:04:15 -07:00
Etienne Perot 779cd96de5 runsc metric-server: Fix printing labels with only external labels specified
Prior to this CL, attempting to write data for a data point with no labels
other than `d.ExternalLabels` set would not actually print these labels.

This CL adds `d.ExternalLabels` to the check for whether there are any
labels to write, and simplifies it to not check for nil-ness (as the `len` of
a `nil` map is 0).
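
The simplified check can be sketched as follows. `hasLabels` is a hypothetical
helper name; the point is only that `len` is safe to call on a `nil` map.

```go
package main

import "fmt"

// hasLabels reports whether there is at least one label to write, counting
// both the per-data labels and the external labels. No nil check is needed
// because len of a nil map is 0.
func hasLabels(labels, externalLabels map[string]string) bool {
	return len(labels)+len(externalLabels) > 0
}

func main() {
	var none map[string]string // nil map
	fmt.Println(hasLabels(none, none))
	fmt.Println(hasLabels(none, map[string]string{"sandbox": "abc"}))
}
```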

PiperOrigin-RevId: 526715338
2023-04-24 12:05:02 -07:00
Etienne Perot 0776a6d557 Add a secondary label map to prometheus.Data.
This is useful for metrics where some labels are specific to the `Data` struct
in question, while others are shared. It is wasteful to create unique maps
for each `*Data` when they contain mostly the same labels.

There used to only be one such metric that requires data labels, but an
upcoming change will change that, making this change worthwhile to avoid
wasting memory.

PiperOrigin-RevId: 521596697
2023-04-03 16:44:47 -07:00
Etienne Perot 0101b166b4 runsc metric-server: Check that sandboxes don't export reserved metric names
This change prevents sandboxes from exporting metrics that match the names
reserved for process-level Prometheus metrics:
https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics

PiperOrigin-RevId: 520736834
2023-03-30 14:04:54 -07:00
Etienne Perot 7b5cd4dda5 runsc: Prohibit runsc metrics from starting with the prefix "meta_".
This prefix is used by the metric server to synthesize its own metrics.

If the sandbox were to define metrics with the same name, they would conflict.

By having this prefix check, this prevents a malicious sandbox from defining
metrics that conflict with those that the metric server is trying to export.
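
A minimal sketch of such a prefix check. The function name and error text are
hypothetical; only the `meta_` prefix comes from this change.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedPrefix is the prefix the metric server uses for its own metrics.
const reservedPrefix = "meta_"

// validateMetricName rejects sandbox metric names that would collide with
// the metrics synthesized by the metric server.
func validateMetricName(name string) error {
	if strings.HasPrefix(name, reservedPrefix) {
		return fmt.Errorf("metric name %q uses reserved prefix %q", name, reservedPrefix)
	}
	return nil
}

func main() {
	for _, name := range []string{"task_syscalls", "meta_sandbox_count"} {
		fmt.Println(name, "->", validateMetricName(name))
	}
}
```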

PiperOrigin-RevId: 518922970
2023-03-23 11:59:57 -07:00
Etienne Perot ee8a6b9584 runsc metric-server: Add name_ suffix to pod and namespace labels.
PiperOrigin-RevId: 516350308
2023-03-13 16:17:10 -07:00
Etienne Perot 064faf80a4 runsc metric-server: Optimize memory usage and allocation-heavy functions.
This is an effort to make it a well-behaved background process.

With 110 sandboxes running, at rest, this goes from

```
VmRSS:   72376 kB
RssAnon: 51944 kB
```

to:

```
VmRSS:   45864 kB
RssAnon: 25788 kB
```

This GCs much more aggressively, including after every single request, which
means we do spend disproportionately more CPU in order to get that low memory
usage. From my testing, serving requests takes about 12% more CPU, and it's
all spent in GC.

The optimizations that went into this are:

- Add a method in `state` to discard the global type maps.
- Add a custom "packed" number type in `prometheus` library that encodes small
  integers and floating-point numbers in 32 bits whenever possible without
  loss of precision, otherwise they are encoded in their full 64-bit glory and
  the 32-bit representation is used as a pointer to the 64-bit representation.
  These are stored either per-sandbox (for static-after-sandbox-creation
  numbers like distribution bucket boundaries), or per-metric-retrieval
  attempt otherwise.
- Use string interning for commonly-seen strings across sandboxes, like metric
  names and label names. Label values are also interned, but only at a
  per-sandbox granularity.
- Reworked allocation-heavy functions like `OrderedLabels` and some string
  rendering functions to be (almost) allocation-free. This doesn't reduce
  memory usage at rest, and does increase their CPU cost, but in return it
  significantly cuts down on the percentage of CPU time spent in GC
  (>50% -> 25%) enough to justify spending the extra CPU in these functions.
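
To illustrate the "packed" number idea from the list above, here is a
simplified sketch. The bit layout is an assumption made for this example: the
top bit marks an indirect value, and only non-negative integers below 2^31 are
stored inline. The actual library's encoding is not described here and likely
differs (it also packs some floating-point values inline).

```go
package main

import (
	"fmt"
	"math"
)

// packed is a 32-bit handle: either the value itself or an index into the
// packer's 64-bit storage.
type packed uint32

// indirectFlag marks a packed value as an index rather than a literal.
const indirectFlag = 1 << 31

// packer owns the full 64-bit representations of values that do not fit.
type packer struct {
	storage []float64
}

// pack stores v inline when it is a non-negative integer below 2^31, and
// otherwise appends it to storage and returns a flagged index.
func (p *packer) pack(v float64) packed {
	if !math.Signbit(v) && v < indirectFlag && v == math.Trunc(v) {
		return packed(uint32(v))
	}
	p.storage = append(p.storage, v)
	return packed(indirectFlag | uint32(len(p.storage)-1))
}

// unpack recovers the original value without loss of precision.
func (p *packer) unpack(n packed) float64 {
	if n&indirectFlag == 0 {
		return float64(n)
	}
	return p.storage[n&^indirectFlag]
}

func main() {
	var p packer
	for _, v := range []float64{42, 0.5, -3, 1e12} {
		n := p.pack(v)
		fmt.Printf("%v packs to %#x and unpacks to %v\n", v, uint32(n), p.unpack(n))
	}
	fmt.Println("values stored indirectly:", len(p.storage))
}
```

Small counters, which are the common case, then cost 4 bytes each, and only
the uncommon values pay for a 64-bit slot.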

PiperOrigin-RevId: 515181387
2023-03-08 17:15:52 -08:00
Adin Scannell 1ceb814544 Add default_applicable_licenses rules to packages.
PiperOrigin-RevId: 513581243
2023-03-02 10:50:04 -08:00
Etienne Perot 4e75dc4650 runsc metric-server: Add per-sandbox start timestamp metric.
This synthetic metric contains the Unix timestamp that each sandbox was
started at.

This is useful for counter metrics, such that rates of change over time can
be properly computed on a per-sandbox basis.

PiperOrigin-RevId: 505228878
2023-01-27 15:56:37 -08:00
Etienne Perot 87b3c88c32 runsc metric-server: Ensure iteration ID is stable across server restarts.
Prior to this CL, the metric server generated its own ID when it discovered
a new sandbox. After a server restart, existing sandboxes therefore got new
iteration IDs, which broke the continuity of counter metrics.

This CL uses the container creation time in the sandbox state file, and uses
this (together with the sandbox ID) to generate a unique ID for each
instantiation of a sandbox with a given ID.
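
One way to derive such an ID is sketched below. The hash function, the
separator, and the truncation to 16 hex characters are assumptions for
illustration; this message does not say how the actual ID is computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// iterationID derives an ID from inputs that survive a metric server
// restart: the sandbox ID and the container creation time recorded in the
// sandbox state file (passed here as Unix nanoseconds).
func iterationID(sandboxID string, createdUnixNano int64) string {
	input := sandboxID + "\x00" + strconv.FormatInt(createdUnixNano, 10)
	sum := sha256.Sum256([]byte(input))
	return hex.EncodeToString(sum[:8])
}

func main() {
	// Same sandbox ID and creation time: same ID, even across restarts.
	fmt.Println(iterationID("sandbox-a", 1674858249000000000))
	fmt.Println(iterationID("sandbox-a", 1674858249000000000))
	// Same sandbox ID re-created later: a different ID.
	fmt.Println(iterationID("sandbox-a", 1674859000000000000))
}
```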

Added test to verify this behavior.

PiperOrigin-RevId: 505208685
2023-01-27 14:24:09 -08:00
Etienne Perot b096b98d6f gVisor Prometheus lib: Group same-name metrics when writing multiple snapshots
This removes `prometheus.Snapshot.WriteTo` and replaces it with a `Write`
function that handles writing multiple `Snapshot` objects to the same writer.

This is necessary for following the OpenMetrics spec more closely, which e.g.
requires same-name metrics to be grouped together in the output. When rendering
data from multiple `Snapshot`s from multiple sandboxes, this grouping was not
respected.
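
The grouping requirement can be sketched as follows. The `point` and
`snapshot` types and the `writeGrouped` function are simplified, hypothetical
stand-ins for the library's types; every metric is rendered as a counter to
keep the example short.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// point is a hypothetical data point from one sandbox.
type point struct {
	Metric  string
	Sandbox string
	Value   int64
}

// snapshot is a hypothetical set of points collected from one sandbox.
type snapshot []point

// writeGrouped renders several snapshots so that all points of a given
// metric are contiguous, instead of rendering snapshot after snapshot.
func writeGrouped(snapshots ...snapshot) string {
	byName := map[string][]point{}
	for _, s := range snapshots {
		for _, p := range s {
			byName[p.Metric] = append(byName[p.Metric], p)
		}
	}
	names := make([]string, 0, len(byName))
	for name := range byName {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order

	var sb strings.Builder
	for _, name := range names {
		fmt.Fprintf(&sb, "# TYPE %s counter\n", name)
		for _, p := range byName[name] {
			fmt.Fprintf(&sb, "%s{sandbox=%q} %d\n", name, p.Sandbox, p.Value)
		}
	}
	return sb.String()
}

func main() {
	fmt.Print(writeGrouped(
		snapshot{{"a", "s1", 1}, {"b", "s1", 2}},
		snapshot{{"a", "s2", 3}},
	))
}
```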

This change is part of a series of changes to support Prometheus-style metrics
in `runsc`.

PiperOrigin-RevId: 499551401
2023-01-04 12:31:07 -08:00
Etienne Perot 151797db64 gVisor: Implement Prometheus metric integrity verification library.
This library implements metric verifier functionality. Given metric
registration information extracted from the sandbox at boot time (before
any untrusted container is started), it accepts successive data snapshots
and verifies that they meet all checks: metrics exist, metadata matches,
cardinality is within bounds, etc.

A metric verifier is stateful, as it verifies that counters count only
upwards, and snapshots in time are only taken with time advancing ever
forward.
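
A heavily simplified sketch of this stateful behavior, covering only three
checks: the metric was registered, time does not go backwards, and counters do
not decrease. All names are hypothetical and the real library checks much
more.

```go
package main

import (
	"errors"
	"fmt"
)

// verifier remembers the previous snapshot so it can compare the next one.
type verifier struct {
	registered map[string]bool
	lastMs     int64
	last       map[string]uint64
}

// newVerifier records the metric names registered before any untrusted
// workload starts.
func newVerifier(names ...string) *verifier {
	v := &verifier{registered: map[string]bool{}, last: map[string]uint64{}}
	for _, n := range names {
		v.registered[n] = true
	}
	return v
}

// verify checks one snapshot. State is only updated if every check passes.
func (v *verifier) verify(tsMs int64, counters map[string]uint64) error {
	if tsMs < v.lastMs {
		return errors.New("snapshot timestamp went backwards")
	}
	for name, val := range counters {
		if !v.registered[name] {
			return fmt.Errorf("metric %q was not registered", name)
		}
		if val < v.last[name] {
			return fmt.Errorf("counter %q decreased from %d to %d", name, v.last[name], val)
		}
	}
	v.lastMs = tsMs
	for name, val := range counters {
		v.last[name] = val
	}
	return nil
}

func main() {
	v := newVerifier("syscalls")
	fmt.Println(v.verify(1000, map[string]uint64{"syscalls": 5}))
	fmt.Println(v.verify(2000, map[string]uint64{"syscalls": 4}))
	fmt.Println(v.verify(2000, map[string]uint64{"unknown": 1}))
}
```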

This change is part of a series of changes to support Prometheus-style metrics
in `runsc`. Doing so requires making several seemingly-odd design decisions,
due to the following architectural constraints:

- Prometheus requires an HTTP server serving the `/metrics` endpoint.
- For performance reasons, the `runsc boot` process cannot run the `netpoller`
  goroutine.
  - Since we don't want to write our own HTTP server implementation, this
    means the HTTP endpoint has to be served by a separate process that
    remains running during the lifetime of the container.
- The `runsc boot` process is untrusted.
  - This means we cannot trust metrics data that comes out of the Sentry.
    Therefore, there needs to be an elaborate dance where we pre-register
    metric metadata before starting any untrusted workload. Then, the server
    relaying the metric data must verify the validity of metric values against
    this metric metadata. This avoids leaking metrics, cardinality blow-ups,
    and other such DoS vectors.
- This feature needs to be easy-to-use in a typical Docker setting.
  - This means having the ability to just say
    `--metrics-server=localhost:1337` in the `runsc` runtime entry in
    `/etc/docker/daemon.json` and have that Just Work(TM), even when multiple
    containers are running.
  - Since only one process may listen on a port at a given time, this means
    the metric server needs to be able to multiplex requests out to multiple
    running sandboxes, and remain alive for the entire duration of any of
    these sandboxes. However, it should also die when there are no sandboxes,
    so that we don't end up with leftover metric servers lying around.
  - For this reason, the metrics server runs *outside* of the usual
    per-container cgroups.
  - This also saves system resources by not running one server per sandbox.
- The metrics server must be exposed to the outside world, and cannot assume
  that its clients are trustworthy.
  - For this reason, a metrics server is bound to a runtime root directory,
    and double-checks that all the sandboxes it is asked to follow actually
    exist in this root directory.

PiperOrigin-RevId: 499370269
2023-01-03 19:37:23 -08:00
Etienne Perot 452f226933 gVisor Prometheus formatting library: Support sharing map of metrics written.
When writing data from multiple Prometheus snapshots at once to the same
output stream, writing the same metric preamble (`HELP`/`TYPE` comment) is
invalid.

This change moves the `metricsWritten` map (used to track which metrics have
had their preamble already written) into `ExportOptions`, which allows it to
be shared across `Snapshot`s.

This is useful in the metrics server, which has to write data from multiple
sandboxes in the same HTTP response. This way, metrics preamble from the
second sandbox is not written to the output.
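
A simplified sketch of sharing the map across snapshots. The field name
`metricsWritten` and the `ExportOptions` concept come from this change; the
`metric` type, the `renderSnapshot` function, and the exact output layout are
hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// metric is a hypothetical single-valued metric from one sandbox.
type metric struct {
	Name, Help, Sandbox string
	Value               int64
}

// exportOptions carries state shared by every snapshot written to the same
// output stream.
type exportOptions struct {
	// metricsWritten tracks which metrics already had their preamble written.
	metricsWritten map[string]bool
}

// renderSnapshot renders one snapshot, writing each metric's preamble only
// if no earlier snapshot sharing the same options has written it.
func renderSnapshot(metrics []metric, opts *exportOptions) string {
	var sb strings.Builder
	for _, m := range metrics {
		if !opts.metricsWritten[m.Name] {
			fmt.Fprintf(&sb, "# HELP %s %s\n", m.Name, m.Help)
			opts.metricsWritten[m.Name] = true
		}
		fmt.Fprintf(&sb, "%s{sandbox=%q} %d\n", m.Name, m.Sandbox, m.Value)
	}
	return sb.String()
}

func main() {
	opts := &exportOptions{metricsWritten: map[string]bool{}}
	fmt.Print(renderSnapshot([]metric{{"syscalls", "Number of syscalls.", "s1", 7}}, opts))
	fmt.Print(renderSnapshot([]metric{{"syscalls", "Number of syscalls.", "s2", 9}}, opts))
}
```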

Also print a newline before each new preamble, for aesthetics.

This change is part of a series of changes to support Prometheus-style metrics
in `runsc`.

PiperOrigin-RevId: 499351638
2023-01-03 17:20:20 -08:00
Etienne Perot d04a8d3460 gVisor: Add library for exporting instrumentation data in Prometheus format.
This adds a new library, `//pkg/prometheus`, which contains just enough data
structures such that we can encode instrumentation information in Prometheus
format. These data structures are JSON-encodable, such that they can be
used over the `runsc` control channel for export (implemented in a future CL).

The existing `metric.go` library gains new functionality to export its own
data using this new export format.

This change is part of a series of changes to support Prometheus-style metrics
in `runsc`. Doing so requires making several seemingly-odd design decisions,
due to the following architectural constraints:

- Prometheus requires an HTTP server serving the `/metrics` endpoint.
- For performance reasons, the `runsc boot` process cannot run the `netpoller`
  goroutine.
  - Since we don't want to write our own HTTP server implementation, this
    means the HTTP endpoint has to be served by a separate process that
    remains running during the lifetime of the container.
- The `runsc boot` process is untrusted.
  - This means we cannot trust metrics data that comes out of the Sentry.
    Therefore, there needs to be an elaborate dance where we pre-register
    metric metadata before starting any untrusted workload. Then, the server
    relaying the metric data must verify the validity of metric values against
    this metric metadata. This avoids leaking metrics, cardinality blow-ups,
    and other such DoS vectors.
- This feature needs to be easy-to-use in a typical Docker setting.
  - This means having the ability to just say
    `--metrics-server=localhost:1337` in the `runsc` runtime entry in
    `/etc/docker/daemon.json` and have that Just Work(TM), even when multiple
    containers are running.
  - Since only one process may listen on a port at a given time, this means
    the metric server needs to be able to multiplex requests out to multiple
    running sandboxes, and remain alive for the entire duration of any of
    these sandboxes. However, it should also die when there are no sandboxes,
    so that we don't end up with leftover metric servers lying around.
  - For this reason, the metrics server runs *outside* of the usual
    per-container cgroups.
  - This also saves system resources by not running one server per sandbox.
- The metrics server must be exposed to the outside world, and cannot assume
  that its clients are trustworthy.
  - For this reason, a metrics server is bound to a runtime root directory,
    and double-checks that all the sandboxes it is asked to follow actually
    exist in this root directory.

PiperOrigin-RevId: 498039624
2022-12-27 15:05:27 -08:00