86 Commits

Author SHA1 Message Date
Andrei Vagin 3b28deddf4 sentry/syscalls: update docs for the unshare syscall
Mount and network namespace are supported.

PiperOrigin-RevId: 663442921
2024-08-15 14:04:15 -07:00
Nicolas Lacasse b667130795 Clean up and re-enable process_vm_readv/writev
Some fixes:

* First argument of Task.CopyContext should always be the context.Context
  derived from the currently running task, because it is used to get a
  CopyScratchBuffer, which must be from the current task. This solved a bunch
  of data races.

* Fix logic around which process is remote and which is local. These were
  getting mixed up.

* Always read iovec structs (local and remote) from the local process's address
  space, since they are syscall arguments. Only use the remote process address
  space to read the memory pointed to by the remote iovecs.

* Added ptrace permissions check, per Linux.

* Delete unused code from kernel/task_usermem.go

* Rewrote tests so that we read to (write from) a subprocess, rather than the
  other way around. So we don't need CAP_SYS_PTRACE to run the tests.

* Also make tests async-signal-safe after call to fork(). I think this was the
  source of the flakiness on Linux previously.
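
The iovec rule in the list above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the sentry code: `AddressSpace`, `IOVec`, and `processVMReadv` are hypothetical names, memory is a flat byte slice, and the ptrace permission check is left out. Both iovec arrays arrive as ordinary arguments (they belong to the caller), and the remote address space is consulted only for the bytes the remote iovecs point to.

```go
package main

import (
	"errors"
	"fmt"
)

// AddressSpace is a hypothetical stand-in for one process's memory.
type AddressSpace []byte

// IOVec mirrors struct iovec: a base address and a length.
type IOVec struct {
	Base, Len int
}

var errFault = errors.New("bad address")

// slice returns the bytes that v describes, or errFault if v is out of range.
func (as AddressSpace) slice(v IOVec) ([]byte, error) {
	if v.Base < 0 || v.Len < 0 || v.Base+v.Len > len(as) {
		return nil, errFault
	}
	return as[v.Base : v.Base+v.Len], nil
}

// processVMReadv copies the remote ranges into the local ranges and returns
// the number of bytes copied. Both iovec arrays are plain arguments here,
// which models the rule that they come from the caller's (local) memory;
// only the data that remoteIovs points to is taken from the remote space.
func processVMReadv(local, remote AddressSpace, localIovs, remoteIovs []IOVec) (int, error) {
	total, li, off := 0, 0, 0
	for _, rv := range remoteIovs {
		src, err := remote.slice(rv)
		if err != nil {
			return total, err
		}
		for len(src) > 0 && li < len(localIovs) {
			dst, err := local.slice(localIovs[li])
			if err != nil {
				return total, err
			}
			n := copy(dst[off:], src)
			total, src, off = total+n, src[n:], off+n
			if off == len(dst) {
				li, off = li+1, 0 // current local range is full
			}
		}
	}
	return total, nil
}

func main() {
	local := make(AddressSpace, 8)
	remote := AddressSpace("abcdefgh")
	n, err := processVMReadv(local, remote,
		[]IOVec{{0, 2}, {4, 4}}, []IOVec{{1, 3}, {6, 2}})
	fmt.Println(n, err, string(local[:2]), string(local[4:7]))
}
```

Keeping the two address spaces as separate parameters makes it hard to mix up which side is local and which is remote, the class of bug this change describes.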

PiperOrigin-RevId: 570506366
2023-10-03 15:01:49 -07:00
Etienne Perot 02f70b5df0 Implement a subset of keyctl(2) and keyrings(7) for better Docker support.
The intention of this change is to cover enough surface to support running
Docker within gVisor, rather than to provide a full implementation.

This implements the following features:

  - Keys as a first-class concept in the kernel.
  - Tracking keys in user namespaces.
  - Task session keyrings: possession, inheritance.
  - Key permission enforcement.
  - The following `keyctl(2)` operations:
    - `KEYCTL_GET_KEYRING_ID`
    - `KEYCTL_DESCRIBE`
    - `KEYCTL_JOIN_SESSION_KEYRING`
    - `KEYCTL_SETPERM`

Notably, this does not implement:

  - The ability to actually add any keys other than the session keyring
    (which does not hold any cryptographic key data).
  - Other special keyrings (thread keyring, process keyring, user session
    keyring, etc.).
  - Lots of `keyctl(2)` operations.
  - Key expiration.
  - Key garbage collection. Keys live until their user namespace is destroyed.
    However, each user namespace is limited to 200 keys, so memory growth is
    bounded.
  - `add_key(2)`
  - `request_key(2)`

... However, this makes design choices that seem odd given the limited scope
of this change, but make sense when taking into account the desire to
eventually accommodate them. For example, there are many
`switch` statements with only one option for session keyrings, which would get
more options when adding support for other special keyrings. Similarly, the
signature of `PossessedKeys` takes in all 3 special "possessed" keyrings, but
currently only ever gets the session keyring as non-nil.
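
Two of the design points above, the 200-key cap per user namespace and the single-case `switch`, can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: `UserNamespace`, `NewKey`, `Task`, `specialKeyring`, and `errQuota` are hypothetical names, and keys carry no payload or permissions here.

```go
package main

import (
	"errors"
	"fmt"
)

// maxKeysPerUserNS is the per-namespace cap stated in the commit message.
const maxKeysPerUserNS = 200

// errQuota is a hypothetical error; the real error value may differ.
var errQuota = errors.New("key quota exceeded")

// KeyringKind names the special keyrings. Only the session keyring exists
// for now; the type leaves room for thread, process, and others.
type KeyringKind int

const SessionKeyring KeyringKind = iota

// Key is a minimal key record with no payload.
type Key struct {
	ID   int
	Desc string
}

// UserNamespace owns every key created inside it. Keys are never collected
// in this sketch, so the cap is what bounds memory.
type UserNamespace struct {
	keys []*Key
}

// NewKey allocates a key unless the namespace is already at its cap.
func (ns *UserNamespace) NewKey(desc string) (*Key, error) {
	if len(ns.keys) >= maxKeysPerUserNS {
		return nil, errQuota
	}
	k := &Key{ID: len(ns.keys) + 1, Desc: desc}
	ns.keys = append(ns.keys, k)
	return k, nil
}

// Task holds the keyrings a task possesses.
type Task struct {
	session *Key
}

// specialKeyring resolves a special keyring. The switch has a single real
// case today; further cases would be added with other special keyrings.
func (t *Task) specialKeyring(kind KeyringKind) (*Key, error) {
	switch kind {
	case SessionKeyring:
		return t.session, nil
	default:
		return nil, errors.New("unsupported keyring kind")
	}
}

func main() {
	ns := &UserNamespace{}
	k, _ := ns.NewKey("_ses")
	task := &Task{session: k}
	got, err := task.specialKeyring(SessionKeyring)
	fmt.Println(got.ID, err)
}
```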

PiperOrigin-RevId: 567047896
2023-09-20 12:38:39 -07:00
Shambhavi Srivastava 8623c872ce Automated rollback of changelist 557871250
PiperOrigin-RevId: 560158129
2023-08-25 11:58:29 -07:00
Nicolas Lacasse 9be6f98612 Automated rollback of changelist 554554034
PiperOrigin-RevId: 557871250
2023-08-17 10:50:10 -07:00
Shambhavi Srivastava 21d66119b7 Implementing clone3
Updates #8585

PiperOrigin-RevId: 554554034
2023-08-07 12:19:32 -07:00
Andrei Vagin 46115504ec Implement the setns syscall
This change introduces the nsfs file system. Each new namespace allocates
a new nsfs inode.

Here are reasons why we need these inodes:
* each namespace has to have a unique ID.
* proc/pid/ns/ contains one entry for each namespace. Bind mounting one of
  the files in this directory to somewhere else in the filesystem keeps the
  corresponding namespace alive even if all processes currently in
  the namespace terminate.
* setns() allows the calling process to join an existing namespace specified
  by a file descriptor.
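
The three reasons above can be modelled in a few lines. This is a rough sketch, not the nsfs implementation: `Registry`, `Namespace`, `Process`, and `Setns` are hypothetical names, the inode is reduced to a number, and a bind mount is represented as one extra reference.

```go
package main

import "fmt"

// Registry hands out inode numbers; a hypothetical stand-in for nsfs.
type Registry struct {
	nextIno uint64
}

// Namespace is pinned by every holder: member processes, bind mounts of its
// /proc/pid/ns entry, and open file descriptors.
type Namespace struct {
	Ino  uint64 // unique per namespace
	refs int
}

// NewNamespace allocates a fresh inode number; the creator holds one reference.
func (r *Registry) NewNamespace() *Namespace {
	r.nextIno++
	return &Namespace{Ino: r.nextIno, refs: 1}
}

// IncRef adds a holder.
func (n *Namespace) IncRef() { n.refs++ }

// DecRef drops a holder.
func (n *Namespace) DecRef() {
	if n.refs > 0 {
		n.refs--
	}
}

// Alive reports whether anything still pins the namespace.
func (n *Namespace) Alive() bool { return n.refs > 0 }

// Process is a task that is a member of exactly one namespace in this model.
type Process struct {
	ns *Namespace
}

// Setns moves p into target, which the real syscall names by file descriptor.
// The new reference is taken first so that target == p.ns is harmless.
func (p *Process) Setns(target *Namespace) {
	target.IncRef()
	p.ns.DecRef()
	p.ns = target
}

func main() {
	r := &Registry{}
	a := r.NewNamespace() // first reference: the process created below
	p := &Process{ns: a}
	a.IncRef() // a bind mount of the ns file pins a
	b := r.NewNamespace()
	p.Setns(b) // p leaves a, but the bind mount keeps a alive
	fmt.Println(a.Ino != b.Ino, a.Alive())
}
```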

PiperOrigin-RevId: 550694515
2023-07-24 15:45:08 -07:00
Etienne Perot 44e2d0fcfe gVisor: Refactor SyscallFn to take in the syscall number as argument.
This will be used to plumb the syscall number through to a counter metric that
exports the number of times an unimplemented syscall has been called.

Plenty of syscall implementations call `EmitUnimplementedEvent` for flags and
settings that are not implemented. With `sysno` available, they will be able
to plumb that bit of information through.
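
A simplified sketch of the idea: once the handler receives the syscall number, a shared handler can attribute an unimplemented flag to the exact syscall that was invoked. The names here are assumptions for illustration only. This `SyscallFn` is a reduced signature, not gVisor's real one; `emitUnimplemented` and `unimplementedCalls` stand in for `EmitUnimplementedEvent` and the counter metric; the syscall numbers 100 and 101 are arbitrary placeholders.

```go
package main

import "fmt"

// SyscallFn is a simplified handler type: the syscall number comes first.
type SyscallFn func(sysno uintptr, args [6]uintptr) (uintptr, error)

// unimplementedCalls is a hypothetical counter metric keyed by syscall number.
var unimplementedCalls = map[uintptr]uint64{}

// emitUnimplemented records one unimplemented use of the given syscall.
func emitUnimplemented(sysno uintptr) { unimplementedCalls[sysno]++ }

// supportedFlags is the set of flag bits this example handler accepts.
const supportedFlags = 0x1

// flagsHandler rejects unknown flag bits and reports which syscall saw them.
func flagsHandler(sysno uintptr, args [6]uintptr) (uintptr, error) {
	if args[0]&^supportedFlags != 0 {
		emitUnimplemented(sysno)
		return 0, fmt.Errorf("syscall %d: unsupported flags %#x", sysno, args[0])
	}
	return 0, nil
}

// table maps syscall numbers to handlers; one handler may serve several.
var table = map[uintptr]SyscallFn{
	100: flagsHandler,
	101: flagsHandler,
}

// dispatch looks up the handler and passes the syscall number through.
func dispatch(sysno uintptr, args [6]uintptr) (uintptr, error) {
	fn, ok := table[sysno]
	if !ok {
		emitUnimplemented(sysno)
		return 0, fmt.Errorf("syscall %d: not implemented", sysno)
	}
	return fn(sysno, args)
}

func main() {
	_, err := dispatch(101, [6]uintptr{0x4})
	fmt.Println(err, unimplementedCalls[101])
}
```

Without the `sysno` parameter, `flagsHandler` could not tell whether it was reached through entry 100 or 101, so the counter could not be broken down per syscall.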

PiperOrigin-RevId: 518635831
2023-03-22 12:06:26 -07:00
Konstantin Bogomolov 1832c38a95 Add TESTONLY sentry panic trigger through afs_syscall.
This flag is added for tests that need to trigger a panic in the sentry
kernel. This is done only for x86_64, which has a dedicated syscall number for
afs_syscall; ARM does not.

PiperOrigin-RevId: 512731631
2023-02-27 14:28:30 -08:00
Fabricio Voznika 0e5d0cc13a Rename pselect to pselect6
The actual syscall name is pselect6. This name is used in Docker's
default seccomp profile.

PiperOrigin-RevId: 503297316
2023-01-19 16:38:48 -08:00
Zach Koopmans bfbb9fa4cc Disable process_vm_(read|write)v.
Syzkaller has found several issues with the two syscalls and a rework is
required. Disable tests and the syscall until issues can be fixed.

PiperOrigin-RevId: 491716795
2022-11-29 13:14:29 -08:00
Zach Koopmans 106f6ea967 Re-enable process_vm_(read|write)v
PiperOrigin-RevId: 489298284
2022-11-17 13:50:43 -08:00
Fabricio Voznika e6f019594e Add read/write syscalls to trace points
Closes #8092

PiperOrigin-RevId: 488719448
2022-11-15 11:54:23 -08:00
Ayush Ranjan ed35016d99 Delete VFS1 syscall handlers.
Directly use VFS2 syscall handlers. No need to override VFS2 handlers.
Updates #1624

PiperOrigin-RevId: 488448348
2022-11-14 13:11:22 -08:00
Shambhavi Srivastava c8e98d9f5e Add Points to some syscalls
Added raw syscall points to all syscalls. Added schematized syscall
points to the following syscalls:

  - timerfd_create
  - timerfd_settime
  - timerfd_gettime
  - fork, vfork
  - inotify_init, inotify_init1
  - inotify_add_watch
  - inotify_rm_watch
  - socketpair

Updates #4805

PiperOrigin-RevId: 459596784
2022-07-07 14:10:36 -07:00
Shambhavi Srivastava 45b06bbb76 Add Points to some syscalls
Added raw syscall points to all syscalls. Added schematized syscall
points to the following syscalls:

  - chroot
  - dup, dup2, dup3
  - prlimit64
  - eventfd, eventfd2
  - signalfd, signalfd4
  - bind
  - accept, accept4
  - fcntl
  - pipe, pipe2

Updates #4805

PiperOrigin-RevId: 457139504
2022-06-24 19:37:41 -07:00
Shambhavi Srivastava f84e9a85d1 Add Points to some syscalls
Added raw syscall points to all syscalls. Added schematized syscall
points to the following syscalls:

- Chdir
- Fchdir
- Setgid
- Setuid
- Setsid
- Setresuid
- Setresgid

PiperOrigin-RevId: 451001973
2022-05-25 13:34:03 -07:00
Fabricio Voznika f2b6fbb47e Add Points to some syscalls
Added raw syscall points to all syscalls. Added schematized syscall
points to the following syscalls:

  - read
  - close
  - socket
  - connect
  - execve
  - creat
  - openat
  - execveat

Updates #4805

PiperOrigin-RevId: 446008358
2022-05-02 13:03:04 -07:00
Konstantin Bogomolov 5c95e1d39c Implement close_range.
Fixes #5500

PiperOrigin-RevId: 431454836
2022-02-28 09:37:03 -08:00
gVisor bot b495ae599a Merge pull request #6262 from sudo-sturbia:msgqueue/syscalls3
PiperOrigin-RevId: 391416650
2021-08-17 17:44:26 -07:00
Zyad A. Ali 2f1c65e7fa Implement stub for msgctl(2).
Add support for msgctl and enable tests.

Fixes #135
2021-08-17 20:34:51 +02:00
Zach Koopmans 02370bbd31 [syserror] Convert remaining syserror definitions to linuxerr.
Convert remaining public errors (e.g. EINTR) from syserror to linuxerr.

PiperOrigin-RevId: 390471763
2021-08-12 15:19:12 -07:00
Zyad A. Ali 6ef2f177fb Implement MSG_COPY option for msgrcv(2).
Implement Queue.Copy and add more tests for it.

Updates #135
2021-08-03 18:13:24 +02:00
Zyad A. Ali eb638ee583 Implement stubs for msgsnd(2) and msgrcv(2).
Add support for msgsnd and msgrcv and enable syscall tests.

Updates #135
2021-08-03 18:13:24 +02:00
Zyad A. Ali 4a874557f5 Implement stubs for msgget(2) and msgctl(IPC_RMID).
Add support for msgget, and msgctl(IPC_RMID), and enable msgqueue
syscall tests.

Updates #135
2021-07-13 22:12:02 +02:00