132 Commits

Author SHA1 Message Date
Jamie Liu df6a537346 Make it possible to pass internal /dev/null via control.Proc.Exec*
PiperOrigin-RevId: 733614278
2025-03-05 00:22:48 -08:00
Fabricio Voznika 0c17600995 Fix restore with pending exec session
Exec'd processes cannot be stitched back to the original caller
and are killed after restore. So ignore failures
to restore host FDs (generally stdio) that belong
to them.

Fixes #11439

PiperOrigin-RevId: 732972054
2025-03-03 10:30:25 -08:00
gVisor bot 86abc85f37 Merge pull request #11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support
PiperOrigin-RevId: 730560110
2025-02-25 14:52:09 -08:00
Jamie Liu e23347e5b5 Move //pkg/sentry/kernel/time to //pkg/sentry/ktime.
This avoids needing to rename it everywhere it's imported.

PiperOrigin-RevId: 693930089
2024-11-06 18:13:51 -08:00
Nicolas Lacasse cceb04f05a Clean up host.TTYFileOperations.
We used to track the foreground process group & session on the
TTYFileOperations, but these are already tracked in kernel.TTY.ThreadGroup.

So remove TTYFileOperations.fgProcessGroup and .session, and replace them with
a kernel.TTY.

This is analogous to how sentry-internal tty's already work.

Updates #10925

PiperOrigin-RevId: 681957240
2024-10-03 11:25:52 -07:00
Nicolas Lacasse d5a9d523bb Implement /dev/tty for donated host TTYs
Fixes #10925

PiperOrigin-RevId: 681684673
2024-10-02 19:40:43 -07:00
Nicolas Lacasse 72193f12c9 Implement /dev/tty for sentry-internal ttys.
The /dev/tty acts as a replica for the current thread group's controlling
terminal.

In a follow-up, I will make /dev/tty work for donated host ttys.

Updates #10925

PiperOrigin-RevId: 681629892
2024-10-02 16:23:54 -07:00
gVisor bot 3c4b246cf2 Fix printf violations inside of the gvisor code
Recently printf.Analyzer has become stricter
(https://github.com/golang/go/issues/60529)
which led to new findings.
gvisor nogo tests run this analyzer and fail if it produces findings.

PiperOrigin-RevId: 671657227
2024-09-06 00:45:23 -07:00
Nicolas Lacasse 63e04396fa Add SignalProcess control method.
PiperOrigin-RevId: 662259919
2024-08-12 16:07:19 -07:00
gVisor bot b04d0e9293 Internal change.
PiperOrigin-RevId: 658853348
2024-08-02 11:39:57 -07:00
Ayush Ranjan e14770381f Add save/restore/resume hooks in runsc/boot/restore.go.
Move save/resume methods to containerManager.

PiperOrigin-RevId: 642011529
2024-06-10 13:54:20 -07:00
Jamie Liu e882488ed7 pgalloc: add SaveOpts.ExcludeCommittedZeroPages
This option, enabled via `runsc checkpoint --exclude-committed-zero-pages`,
instructs `pgalloc.MemoryFile.SaveTo()` to also exclude definitely-committed
zero pages from checkpointing (in addition to possibly-committed zero pages,
which are always scanned for and excluded). This is useful when the application
being checkpointed is known to have a large number of committed zero pages:
pages that (1) have been touched by application memory accesses, a syscall such
as read(), or page pinning by e.g. a driver, and (2) have not been subsequently
released by the application to the operating system by e.g. munmap() or
madvise(MADV_DONTNEED) (+ page unpinning if necessary), and (3) are filled with
zero bytes.

Minor changes:

- In `MemoryFile.updateUsageLocked()`, pass file offset to `checkCommitted` so
  that `MemoryFile.SaveTo()`'s `checkCommitted` can use `FALLOC_FL_PUNCH_HOLE`
  to decommit pages rather than `MADV_REMOVE` (which translates addresses to
  file offsets and then invokes `FALLOC_FL_PUNCH_HOLE`).

- In `MemoryFile.SaveTo()`, buffer up to a hugepage worth of pages to decommit
  rather than decommitting one page per syscall.

- Increment `MemoryFile.usageExpected` in `MemoryFile.LoadFrom()`, such that
  the first following call to `MemoryFile.UpdateUsage()` might skip the call
  to `MemoryFile.updateUsageLocked()` (if memory usage hasn't changed since
  loading).

PiperOrigin-RevId: 632370455
2024-05-09 21:50:58 -07:00
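
The commit above describes excluding pages that are filled with zero bytes and batching the resulting decommits. A minimal sketch of that idea, assuming a plain byte slice in place of the real MemoryFile: it is not the pgalloc implementation, and `pageSize`, `pageRange`, `isZeroPage`, and `zeroPageRanges` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// pageSize is an assumed page size for this sketch.
const pageSize = 4096

// pageRange is a hypothetical half-open range of page indices [Start, End).
type pageRange struct{ Start, End int }

// isZeroPage reports whether every byte in p is zero.
func isZeroPage(p []byte) bool {
	for _, b := range p {
		if b != 0 {
			return false
		}
	}
	return true
}

// zeroPageRanges scans mem one page at a time and coalesces adjacent
// zero-filled pages into ranges, so that a caller could skip or decommit
// each range in one operation instead of one page at a time.
func zeroPageRanges(mem []byte) []pageRange {
	var out []pageRange
	n := len(mem) / pageSize
	for i := 0; i < n; i++ {
		if !isZeroPage(mem[i*pageSize : (i+1)*pageSize]) {
			continue
		}
		// Extend the previous range if this page directly follows it.
		if k := len(out); k > 0 && out[k-1].End == i {
			out[k-1].End = i + 1
			continue
		}
		out = append(out, pageRange{Start: i, End: i + 1})
	}
	return out
}

func main() {
	mem := make([]byte, 4*pageSize)
	mem[2*pageSize+5] = 1 // page 2 is the only non-zero page
	fmt.Println(zeroPageRanges(mem))
}
```

In this sketch, pages 0-1 and page 3 are reported as two ranges; the real code additionally has to distinguish committed from uncommitted pages, which a byte slice cannot model.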
Ayush Ranjan a3ae7f25a0 Create separate MemoryFile metadata file in compression=none mode.
This helps in parallelizing MemoryFile restore with kernel restore.

PiperOrigin-RevId: 629378902
2024-04-30 05:21:53 -07:00
Fabricio Voznika 3d32050710 Multi-container restore
Checkpoint of any container continues to trigger an entire pod
checkpoint, which includes the state of all containers.

Restore must be done for each of the containers, one at a time.
The actual restore is triggered when the last container is restored.
The set of flags and spec must be the same for the `restore` command
as it was for the `create` commands when the containers were created.
Containers are identified by their names and can be restored in any
order. If containers have no name, they must be restored in the same
order they were created. Container IDs and host FDs are rewired
correctly after restore.

Updates #1956

PiperOrigin-RevId: 629272041
2024-04-29 20:36:21 -07:00
Nayana Bidari 93bbcbf35b Retrieve UID/GID from the user string.
Add a method to retrieve the UID and GID for a user and the tests to verify.

PiperOrigin-RevId: 626187254
2024-04-18 16:46:35 -07:00
Ayush Ranjan 5e9207a966 Create separate pages.img checkpoint file when compression=none.
PiperOrigin-RevId: 624210535
2024-04-12 09:57:06 -07:00
Fabricio Voznika d514dc4424 Track exec'ed processes and kill them after restore
Processes that are exec'ed into a container cannot be properly
restored because the caller is no longer present. This change
tracks processes that are exec'ed and kills them upon restore.

Updates #1956

PiperOrigin-RevId: 623644184
2024-04-10 16:53:49 -07:00
Nayana Bidari 87d8df37c7 Enable save/checkpoint resume with runsc checkpoint command.
Enables save resume with checkpoint command. Previously when --leave-running
was set, the sandbox was destroyed after the checkpoint and restored with the
same id. With this change the sandbox will not be destroyed and resumes running
after the checkpoint.

PiperOrigin-RevId: 623282685
2024-04-09 14:34:50 -07:00
Fabricio Voznika e243aa6b91 Automated rollback of changelist 621996848
PiperOrigin-RevId: 623223878
2024-04-09 11:13:23 -07:00
Fabricio Voznika 1a5bd5cfdf Track container name in the kernel
This is done to allow re-mapping of container IDs after the
container has been restored. Upon restore, container names
remain the same, but container IDs may be (and likely are)
different.

Updates #1956

PiperOrigin-RevId: 621996848
2024-04-04 16:01:28 -07:00
Fabricio Voznika 26dd42a0ea Allow host FD to be restored with a different FD
FD numbers can vary depending on the options used with the
runsc command. For example, there are extra FDs passed to
`runsc boot` if `debug-log` is enabled. So instead of requiring
all FDs to have the exact same numbering during restore, provide
a mechanism to remap the FDs. Each host FD has a unique identifier
that maps to its corresponding FD number. Then during restore, FD
numbers are remapped to the correct ones.

Updates #1956

PiperOrigin-RevId: 615215783
2024-03-12 16:57:07 -07:00
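
The remapping described in the commit above can be illustrated with a small sketch. This is an assumption-laden illustration, not the runsc code: `savedFD`, `remapFDs`, and the string identifiers are hypothetical, and the only idea taken from the commit message is that a stable identifier links the checkpoint-time FD number to the restore-time one.

```go
package main

import "fmt"

// savedFD is a hypothetical record written at checkpoint time: a stable
// identifier plus the FD number the sandbox used back then.
type savedFD struct {
	ID string
	FD int
}

// remapFDs returns a map from each checkpoint-time FD number to the FD
// number that carries the same identifier in the restoring process.
// It fails if an identifier from the checkpoint has no FD at restore.
func remapFDs(saved []savedFD, current map[string]int) (map[int]int, error) {
	remap := make(map[int]int, len(saved))
	for _, s := range saved {
		fd, ok := current[s.ID]
		if !ok {
			return nil, fmt.Errorf("no host FD provided for identifier %q", s.ID)
		}
		remap[s.FD] = fd
	}
	return remap, nil
}

func main() {
	// At checkpoint, stdio was at FDs 3-5. At restore, one extra FD
	// (for example a debug log) shifted everything up by one.
	saved := []savedFD{
		{ID: "stdin", FD: 3},
		{ID: "stdout", FD: 4},
		{ID: "stderr", FD: 5},
	}
	current := map[string]int{"stdin": 4, "stdout": 5, "stderr": 6}

	remap, err := remapFDs(saved, current)
	if err != nil {
		panic(err)
	}
	fmt.Println(remap[3], remap[4], remap[5])
}
```

Keying on an identifier rather than on the number itself is what lets the restore command run with a different set of flags-induced FDs than the original boot.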
Nayana Bidari 29234bc44b Mount cgroups per container in runsc.
Adds support for per container stats in runsc based on cgroups.
1. Removed the 'cgroupfs' config flag.
2. Mounts the cgroups (/sys/fs/cgroup/<controller>) which will be shared
across all containers during root/pause container startup.
3. The container cgroups (eg:/sys/fs/cgroup/controller/<container-id>) are
mounted along with other container mounts before starting the container
process if the cgroups mount is in the spec.

Updates #172

PiperOrigin-RevId: 590752853
2023-12-13 16:47:49 -08:00
Nayana Bidari 0fd906d7e0 Fix the sandbox memory usage via GetContainerMemoryUsage API.
The total (sandbox) memory usage returned by the GetContainerMemoryUsage API is
incorrect when it is requested before the API has been called for each individual
container in the sandbox. This is because the memory usage of the containers'
cgroups is not updated while calculating the total usage. This CL fixes it by
updating the usage for every child cgroup, so that the parent cgroup reports the
correct memory usage.

PiperOrigin-RevId: 574300913
2023-10-17 16:44:07 -07:00
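
The fix described in the commit above amounts to refreshing every child before summing the parent. A minimal sketch of that pattern, assuming a simple tree of cgroups with cached usage values: the `cgroup` type, its fields, and `updateUsage` are hypothetical and do not mirror the sentry's cgroup code.

```go
package main

import "fmt"

// cgroup is a hypothetical node in a cgroup tree. cached holds the usage
// computed by the last updateUsage call; read, if set, returns the live
// usage charged directly to this cgroup.
type cgroup struct {
	children []*cgroup
	cached   uint64
	read     func() uint64
}

// updateUsage refreshes the cached usage of c and of every descendant,
// then returns the total for c. Summing stale child caches instead of
// recursing is the kind of bug the commit message describes.
func (c *cgroup) updateUsage() uint64 {
	var total uint64
	if c.read != nil {
		total = c.read()
	}
	for _, child := range c.children {
		total += child.updateUsage()
	}
	c.cached = total
	return total
}

func main() {
	a := &cgroup{read: func() uint64 { return 10 }}
	b := &cgroup{read: func() uint64 { return 32 }}
	sandbox := &cgroup{children: []*cgroup{a, b}}

	// The children have never been queried, yet the parent total is correct.
	fmt.Println(sandbox.updateUsage())
}
```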
Andrei Vagin f3b0a527c2 inet: allow to create abstract unix sockets in non-root namespaces
PiperOrigin-RevId: 573253619
2023-10-13 10:20:56 -07:00
Andrei Vagin eb6b3ac00b vfs: MountNamespace.Root() has to return a top mount of /
A few mounts can be mounted on top of `/`.

PiperOrigin-RevId: 558264274
2023-08-18 15:35:47 -07:00