This is an effort to reduce its memory usage so that it is a well-behaved
background process.
With 110 sandboxes running, at rest, memory usage goes from
```
VmRSS: 72376 kB
RssAnon: 51944 kB
```
to:
```
VmRSS: 45864 kB
RssAnon: 25788 kB
```
This GCs much more aggressively, including after every single request, which
means we do spend disproportionately more CPU in order to get that low memory
usage. From my testing, serving requests takes about 12% more CPU, and it's
all spent in GC.
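As a rough illustration of the "GC after every single request" trade-off, here is a minimal sketch that wraps an HTTP handler and forces a collection once the response is written. It assumes a plain `net/http` handler; `gcAfterRequest`, `metricsHandler`, `numGC` and `serveOnce` are hypothetical names made up for this example, and the actual change may trigger GC differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
)

// gcAfterRequest wraps next so that a full garbage collection runs once the
// response has been written. Hypothetical helper, not taken from the change.
func gcAfterRequest(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		next.ServeHTTP(w, r)
		runtime.GC() // blocks until the collection completes
	})
}

// metricsHandler is a placeholder for whatever serves the real response.
func metricsHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok\n")
	})
}

// numGC reports how many collections have completed so far.
func numGC() uint32 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.NumGC
}

// serveOnce sends one in-memory GET request to h and returns status and body.
func serveOnce(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	before := numGC()
	code, body := serveOnce(gcAfterRequest(metricsHandler()), "/metrics")
	fmt.Printf("status=%d body=%q collections=%d\n", code, body, numGC()-before)
}
```

Every request pays for a full collection here, which is the kind of extra CPU cost described above; in exchange the heap is trimmed right after each burst of allocation.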
The optimizations that went into this are:
- Add a method in `state` to discard the global type maps.
- Add a custom "packed" number type in the `prometheus` library that encodes small
integers and floating-point numbers in 32 bits whenever possible without
loss of precision, otherwise they are encoded in their full 64-bit glory and
the 32-bit representation is used as a pointer to the 64-bit representation.
These are stored either per-sandbox (for static-after-sandbox-creation
numbers like distribution bucket boundaries), or per-metric-retrieval
attempt otherwise.
- Use string interning for commonly-seen strings across sandboxes, like metric
names and label names. Label values are also interned, but only at a
per-sandbox granularity.
- Rework allocation-heavy functions like `OrderedLabels` and some string
rendering functions to be (almost) allocation-free. This doesn't reduce
memory usage at rest, and does increase their CPU cost, but in return it
significantly cuts down on the percentage of CPU time spent in GC
(>50% -> 25%), enough to justify spending the extra CPU in these functions.
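The "packed" number idea from the list above can be sketched roughly as follows. This is not the layout used in the `prometheus` library; it is one possible encoding chosen for illustration. It assumes two tag bits and a 30-bit payload, and all names (`packed`, `store`, `packInt`, `packFloat`, `unpack`) are hypothetical. Values that do not fit inline are appended to a side table and the 32-bit value holds their index.

```go
package main

import (
	"fmt"
	"math"
)

// packed is a 32-bit handle: 2 tag bits followed by a 30-bit payload.
type packed uint32

const (
	tagInt      = 0 << 30 // payload is a 30-bit signed integer
	tagFloat    = 1 << 30 // payload is a float32 bit pattern shifted right by 2
	tagIntRef   = 2 << 30 // payload indexes a 64-bit integer in the side table
	tagFloatRef = 3 << 30 // payload indexes a 64-bit float in the side table
	tagMask     = 3 << 30
	payloadMask = 1<<30 - 1
)

// store holds the full 64-bit representations that did not fit inline.
// This sketch does not guard against more than 2^30 side-table entries.
type store struct {
	wide []uint64
}

func (s *store) packInt(v int64) packed {
	if v >= -(1<<29) && v < 1<<29 {
		return packed(tagInt | uint32(v)&payloadMask)
	}
	s.wide = append(s.wide, uint64(v))
	return packed(tagIntRef | uint32(len(s.wide)-1))
}

func (s *store) packFloat(f float64) packed {
	f32 := float32(f)
	bits := math.Float32bits(f32)
	// Inline only if float32 is exact and the two low mantissa bits are zero,
	// so dropping them loses nothing.
	if float64(f32) == f && bits&3 == 0 {
		return packed(tagFloat | bits>>2)
	}
	s.wide = append(s.wide, math.Float64bits(f))
	return packed(tagFloatRef | uint32(len(s.wide)-1))
}

// unpack returns the stored number; isFloat says which result is meaningful.
func (s *store) unpack(p packed) (i int64, f float64, isFloat bool) {
	payload := uint32(p) & payloadMask
	switch uint32(p) & tagMask {
	case tagInt:
		return int64(int32(payload<<2) >> 2), 0, false // sign-extend 30 bits
	case tagFloat:
		return 0, float64(math.Float32frombits(payload << 2)), true
	case tagIntRef:
		return int64(s.wide[payload]), 0, false
	default: // tagFloatRef
		return 0, math.Float64frombits(s.wide[payload]), true
	}
}

func main() {
	s := &store{}
	small, big := s.packFloat(0.5), s.packFloat(0.1)
	_, a, _ := s.unpack(small)
	_, b, _ := s.unpack(big)
	fmt.Println(a, b, "side-table entries:", len(s.wide))
}
```

Bucket boundaries such as powers of two or small integers tend to fit the inline forms, which is where the space saving would come from; arbitrary doubles fall back to the side table.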
PiperOrigin-RevId: 515181387
This removes the need for ongoing tags.
This change requires some minor updates to remove dependency cycles, since
the goid package is a base library used by many internals (log, sync, etc.).
PiperOrigin-RevId: 504066914
fs.syncableDentries saves all non-synthetic dentries. This requires a map insert
operation every time a new dentry is created and map removal operation when a
dentry is destroyed. This can be expensive, since there can be a very large
number of non-synthetic dentries.
Using a map does not provide any additional benefits. We do not require lookup.
Instead use a linked list, as insert and remove are really fast and it allows us
to iterate on the list. It also saves the heap allocations to maintain the map.
Also simplify pkg/state to not use a custom ElementMapper. There is no need to.
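For illustration, the map-to-linked-list switch described above could look roughly like this. It is a hand-written intrusive doubly linked list, not the generated list type used in the tree, and `dentry`, `dentryList`, `pushBack`, `remove` and `names` are hypothetical stand-ins. The links live inside the element, so insert and remove are O(1) and need no separate allocation.

```go
package main

import "fmt"

// dentry is a hypothetical stand-in; the list links are embedded in the
// element itself, so tracking it needs no extra heap allocation.
type dentry struct {
	name       string
	prev, next *dentry
}

// dentryList tracks dentries in insertion order.
type dentryList struct {
	head, tail *dentry
}

// pushBack appends d in O(1).
func (l *dentryList) pushBack(d *dentry) {
	d.prev, d.next = l.tail, nil
	if l.tail != nil {
		l.tail.next = d
	} else {
		l.head = d
	}
	l.tail = d
}

// remove unlinks d in O(1). d must currently be on l.
func (l *dentryList) remove(d *dentry) {
	if d.prev != nil {
		d.prev.next = d.next
	} else {
		l.head = d.next
	}
	if d.next != nil {
		d.next.prev = d.prev
	} else {
		l.tail = d.prev
	}
	d.prev, d.next = nil, nil
}

// names walks the list front to back; iteration is the only access needed.
func (l *dentryList) names() []string {
	var out []string
	for d := l.head; d != nil; d = d.next {
		out = append(out, d.name)
	}
	return out
}

func main() {
	var l dentryList
	a, b, c := &dentry{name: "a"}, &dentry{name: "b"}, &dentry{name: "c"}
	l.pushBack(a)
	l.pushBack(b)
	l.pushBack(c)
	l.remove(b)
	fmt.Println(l.names())
}
```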
PiperOrigin-RevId: 476145150
This adds significant costs to startup, since it is done for
every type in the system. Since the state package already reserves
sanity checks for race builds, do the same for type registration.
PiperOrigin-RevId: 350259336
- When encodeState.resolve() determines that the resolved reflect.Value is
contained by a previously-resolved object, set wire.Ref.Type to the
containing object's type (existing.obj.Type()) rather than the contained
value's type (obj.Type()).
- When encodeState.resolve() determines that the resolved reflect.Value
contains a previously-resolved object, handle cases where the new object
contains *multiple* previously-resolved objects. (This may cause
previously-allocated object IDs to become unused; to facilitate this, change
encodeState.pending to a map, and change the wire format to prefix each
object with its object ID.)
- Add encodeState.encodedStructs to avoid redundant encoding of structs, since
deduplication of objects via encodeState.resolve() doesn't work for objects
instantiated by StateSave() and passed to SaveValue() (i.e. fields tagged
`state:".(whatever)"`).
- Make unexported array fields deserializable via slices that refer to them by
casting away their unexportedness in decodeState.decodeObject().
Updates #1663
PiperOrigin-RevId: 338727687
Previously, it was not possible to encode/decode an object graph which
contained a pointer to a field within another type. This was because the
encoder was previously unable to disambiguate a pointer to an object and a
pointer within the object.
This CL remedies this by constructing an address map tracking the full memory
range each object occupies. The encoded Refvalue message has been extended to allow
references to child objects within another object. Because the encoding
process may learn about object structure over time, we cannot encode any
objects until the entire graph has been generated.
This CL also updates the state package to use standard interfaces instead of
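A minimal sketch of the address-map idea, assuming a sorted slice of non-overlapping ranges rather than whatever structure the state package actually uses. The names `addrRange`, `addrMap`, `insert` and `lookup` are hypothetical, and the addresses are plain numbers instead of real pointers. Looking up an address yields the containing object and the offset within it, which is what distinguishes a pointer to an object (offset 0) from a pointer into one.

```go
package main

import (
	"fmt"
	"sort"
)

// addrRange covers [start, end) and belongs to the object with the given id.
type addrRange struct {
	start, end uintptr
	id         int
}

// addrMap keeps ranges sorted by start. This sketch assumes callers never
// insert overlapping ranges; merging of containing objects is not modelled.
type addrMap struct {
	ranges []addrRange
}

// insert records that object id occupies length bytes beginning at start.
func (m *addrMap) insert(start, length uintptr, id int) {
	i := sort.Search(len(m.ranges), func(i int) bool {
		return m.ranges[i].start >= start
	})
	m.ranges = append(m.ranges, addrRange{})
	copy(m.ranges[i+1:], m.ranges[i:])
	m.ranges[i] = addrRange{start: start, end: start + length, id: id}
}

// lookup finds the object containing addr and the offset of addr within it.
func (m *addrMap) lookup(addr uintptr) (id int, offset uintptr, ok bool) {
	i := sort.Search(len(m.ranges), func(i int) bool {
		return m.ranges[i].end > addr
	})
	if i == len(m.ranges) || m.ranges[i].start > addr {
		return 0, 0, false
	}
	return m.ranges[i].id, addr - m.ranges[i].start, true
}

func main() {
	var m addrMap
	m.insert(0x2000, 8, 2)
	m.insert(0x1000, 32, 1) // e.g. a 32-byte struct
	// A pointer to a field 16 bytes into object 1.
	id, off, ok := m.lookup(0x1010)
	fmt.Println(id, off, ok)
}
```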
reflection-based dispatch in order to improve performance overall. This
includes a custom wire protocol to significantly reduce the number of
allocations and take advantage of structure packing.
As part of these changes, there are a small number of minor changes in other
places of the code base:
* The lists used during encoding are changed to use intrusive lists with the
objectEncodeState directly, which required that the ilist Len() method be
updated to work properly with the ElementMapper mechanism.
* A bug is fixed in the list code wherein Remove() called on an element that is
already removed can corrupt the list (removing the element if there's only a
single element). Now the behavior is correct.
* Standard error wrapping is introduced.
* Compressio was updated to implement the new wire.Reader and wire.Writer
interface methods directly. The lack of a ReadByte and WriteByte caused issues
not due to interface dispatch, but because underlying slices for a Read or
Write call through an interface would always escape to the heap!
* Stateify has been updated to support the new APIs.
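To make the ReadByte point from the list above concrete, here is a small sketch contrasting the two ways of reading a single byte. `sliceReader`, `newSliceReader`, `readByteViaRead` and `readByteDirect` are hypothetical names for this example and are not the wire or compressio types. Reading one byte through `Read` needs a temporary slice that is handed to an interface method, while `ReadByte` returns the byte by value.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
)

// sliceReader is a hypothetical in-memory reader implementing both io.Reader
// and io.ByteReader.
type sliceReader struct {
	buf []byte
	off int
}

func newSliceReader(b ...byte) *sliceReader {
	return &sliceReader{buf: b}
}

func (r *sliceReader) Read(p []byte) (int, error) {
	if r.off >= len(r.buf) {
		return 0, io.EOF
	}
	n := copy(p, r.buf[r.off:])
	r.off += n
	return n, nil
}

// ReadByte returns the next byte by value, so the caller needs no buffer.
func (r *sliceReader) ReadByte() (byte, error) {
	if r.off >= len(r.buf) {
		return 0, io.EOF
	}
	b := r.buf[r.off]
	r.off++
	return b, nil
}

// readByteViaRead passes a one-byte slice through the io.Reader interface.
// Per the message above, such a slice ends up on the heap, because the
// compiler cannot see what the dynamic Read implementation does with it.
func readByteViaRead(r io.Reader) (byte, error) {
	var b [1]byte
	if _, err := io.ReadFull(r, b[:]); err != nil {
		return 0, err
	}
	return b[0], nil
}

// readByteDirect still uses interface dispatch, but no slice is involved.
func readByteDirect(r io.ByteReader) (byte, error) {
	return r.ReadByte()
}

func main() {
	// Varint decoding in the standard library is built on io.ByteReader.
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], 300)
	v, err := binary.ReadUvarint(&sliceReader{buf: tmp[:n]})
	fmt.Println(v, err)
}
```

Building with `go build -gcflags=-m` prints the compiler's escape-analysis decisions, which is one way to check where the temporary buffer ends up.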
See README.md for a description of how the new mechanism works.
PiperOrigin-RevId: 318010298
These packages don't actually use go_stateify or go_marshal, but end
up implicitly dependent on the respective packages due to our build
rules.
These unnecessary dependencies make them unusable in certain contexts
due to circular dependencies.
PiperOrigin-RevId: 312595738