Commit Graph

304 Commits

Author SHA1 Message Date
Andrei Vagin f347a578b7 Move platform.File in memmap
The subsequent systrap changes will need to import memmap from
the platform package.

PiperOrigin-RevId: 323409486
2020-07-27 11:59:10 -07:00
Sam Balana 82a5cada59 Add AfterFunc to tcpip.Clock
Changes the API of tcpip.Clock to also provide a method for scheduling and
rescheduling work after a specified duration. This change also implements the
AfterFunc method for existing implementations of tcpip.Clock.

This is the groundwork required to mock time within tests. All references to
CancellableTimer has been replaced with the tcpip.Job interface, allowing for
custom implementations of scheduling work.

This is a BREAKING CHANGE for clients that implement their own tcpip.Clock or
use tcpip.CancellableTimer. Migration plan:
 1. Add AfterFunc(d, f) to tcpip.Clock
 2. Replace references of tcpip.CancellableTimer with tcpip.Job
 3. Replace calls to tcpip.CancellableTimer#StopLocked with tcpip.Job#Cancel
 4. Replace calls to tcpip.CancellableTimer#Reset with tcpip.Job#Schedule
 5. Replace calls to tcpip.NewCancellableTimer with tcpip.NewJob.

PiperOrigin-RevId: 322906897
2020-07-23 18:00:43 -07:00
Nicolas Lacasse 4ec3516332 Implement get/set_robust_list.
PiperOrigin-RevId: 322904430
2020-07-23 17:42:50 -07:00
Dean Deng 8fed97794e Add task work mechanism.
Like task_work in Linux, this allows us to register callbacks to be executed
before returning to userspace. This is needed for kcov support, which requires
coverage information to be up-to-date whenever we are in user mode. We will
provide coverage data through the kcov interface to enable coverage-directed
fuzzing in syzkaller.

One difference from Linux is that task work cannot queue work before the
transition to userspace that it precedes; queued work will be picked up before
the next transition.

PiperOrigin-RevId: 322889984
2020-07-23 16:25:34 -07:00
gVisor bot 8939fae0af Merge pull request #3165 from ridwanmsharif:ridwanmsharif/fuse-off-by-default
PiperOrigin-RevId: 321411758
2020-07-15 12:14:42 -07:00
Fabricio Voznika 59a5479409 Disable debug time adjustment logging
When --debug is enabled, the following log messages are
printed every second filling up the log:

D0430 18:04:42.823775  129561 parameters.go:238] Clock(Monotonic): error: 46 ns, adjusted frequency from 3591713733 Hz to 3591714196 Hz
D0430 18:04:42.823870  129561 parameters.go:238] Clock(Realtime): error: 36 ns, adjusted frequency from 3591714003 Hz to 3591714169 Hz
D0430 18:04:42.823892  129561 timekeeper.go:209] Updating VDSO parameters: {monotonicReady:1 monotonicBaseCycles:15758797714254696 monotonicBaseRef:29000233837 monotonicFrequency:3591714196 realtimeReady:1 realtimeBaseCycles:15758797714610880 realtimeBaseRef:1588269882823867374 realtimeFrequency:3591714169}

Info and warning messages for larger changes are kept the same.

PiperOrigin-RevId: 321048523
2020-07-13 15:42:53 -07:00
Ridwan Sharif abffebde7b Gate FUSE behind a runsc flag
This change gates all FUSE commands (by gating /dev/fuse) behind a runsc
flag. In order to use FUSE commands, use the --fuse flag with the --vfs2
flag. Check if FUSE is enabled by running dmesg in the sandbox.
2020-07-09 02:01:29 -04:00
Zach Koopmans 6a90c88b97 Port fallocate to VFS2.
PiperOrigin-RevId: 319283715
2020-07-01 13:14:44 -07:00
Dean Deng cda2979b63 Complete async signal delivery support in vfs2.
- Support FIOASYNC, FIO{SET,GET}OWN, SIOC{G,S}PGRP (refactor getting/setting
  owner in the process).
- Unset signal recipient when setting owner with pid == 0 and
  valid owner type.

Updates #2923.

PiperOrigin-RevId: 319231420
2020-07-01 08:42:12 -07:00
Dean Deng e8f1a5c1f6 Port GETOWN, SETOWN fcntls to vfs2.
Also make some fixes to vfs1's F_SETOWN. The fcntl test now entirely passes
on vfs2.

Fixes #2920.

PiperOrigin-RevId: 318669529
2020-06-27 21:33:37 -07:00
Kevin Krakauer 9cfc154975 Require CAP_SYS_ADMIN in the root user namespace for TTY theft
PiperOrigin-RevId: 318563543
2020-06-26 16:24:39 -07:00
Tamir Duberstein 4069461877 Avoid an allocation in epoll
PiperOrigin-RevId: 318346153
2020-06-25 14:18:33 -07:00
Tamir Duberstein 10930b0f8c Remove waiter.Entry.Context
This field is redundant since state can be stored in the callback.

PiperOrigin-RevId: 318134855
2020-06-24 13:56:38 -07:00
Adin Scannell 364ac92baf Support for saving pointers to fields in the state package.
Previously, it was not possible to encode/decode an object graph which
contained a pointer to a field within another type. This was because the
encoder was previously unable to disambiguate a pointer to an object and a
pointer within the object.

This CL remedies this by constructing an address map tracking the full memory
range object occupy. The encoded Refvalue message has been extended to allow
references to children objects within another object. Because the encoding
process may learn about object structure over time, we cannot encode any
objects under the entire graph has been generated.

This CL also updates the state package to use standard interfaces intead of
reflection-based dispatch in order to improve performance overall. This
includes a custom wire protocol to significantly reduce the number of
allocations and take advantage of structure packing.

As part of these changes, there are a small number of minor changes in other
places of the code base:

* The lists used during encoding are changed to use intrusive lists with the
  objectEncodeState directly, which required that the ilist Len() method is
  updated to work properly with the ElementMapper mechanism.

* A bug is fixed in the list code wherein Remove() called on an element that is
  already removed can corrupt the list (removing the element if there's only a
  single element). Now the behavior is correct.

* Standard error wrapping is introduced.

* Compressio was updated to implement the new wire.Reader and wire.Writer
  inteface methods directly. The lack of a ReadByte and WriteByte caused issues
  not due to interface dispatch, but because underlying slices for a Read or
  Write call through an interface would always escape to the heap!

* Statify has been updated to support the new APIs.

See README.md for a description of how the new mechanism works.

PiperOrigin-RevId: 318010298
2020-06-23 23:34:06 -07:00
Fabricio Voznika 96519e2c9d Implement POSIX locks
- Change FileDescriptionImpl Lock/UnlockPOSIX signature to
  take {start,length,whence}, so the correct offset can be
  calculated in the implementations.
- Create PosixLocker interface to make it possible to share
  the same locking code from different implementations.

Closes #1480

PiperOrigin-RevId: 316910286
2020-06-17 10:04:26 -07:00
Nicolas Lacasse 810748f5c9 Port aio to VFS2.
In order to make sure all aio goroutines have stopped during S/R, a new
WaitGroup was added to TaskSet, analagous to runningGoroutines. This WaitGroup
is incremented with each aio goroutine, and waited on during kernel.Pause.

The old VFS1 aio code was changed to use this new WaitGroup, rather than
fs.Async. The only uses of fs.Async are now inode and mount Release operations,
which do not call fs.Async recursively. This fixes a lock-ordering violation
that can cause deadlocks.

Updates #1035.

PiperOrigin-RevId: 316689380
2020-06-16 08:49:06 -07:00
Jamie Liu 3b0b1f104d Miscellaneous VFS2 fixes.
PiperOrigin-RevId: 316627764
2020-06-16 00:15:20 -07:00
Andrei Vagin 6ec9d60403 vfs2: implement fcntl(fd, F_SETFL, flags)
PiperOrigin-RevId: 316148074
2020-06-12 11:58:15 -07:00
gVisor bot 508e7c3a79 Merge pull request #2763 from gaurav1086:sentry_kernel_timekeeper_use_buffered_channel
PiperOrigin-RevId: 315803553
2020-06-10 17:43:16 -07:00
Gaurav Singh f1f85f475d sentry: use defer wg.Done() unconditionally
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
2020-06-09 22:56:39 -04:00
Fabricio Voznika 67565078bb Implement flock(2) in VFS2
LockFD is the generic implementation that can be embedded in
FileDescriptionImpl implementations. Unique lock ID is
maintained in vfs.FileDescription and is created on demand.

Updates #1480

PiperOrigin-RevId: 315604825
2020-06-09 18:46:42 -07:00
Rahat Mahmood 21b6bc7280 Implement mount(2) and umount2(2) for VFS2.
This is mostly syscall plumbing, VFS2 already implements the internals of
mounts. In addition to the syscall defintions, the following mount-related
mechanisms are updated:

- Implement MS_NOATIME for VFS2, but only for tmpfs and goferfs. The other VFS2
  filesystems don't implement node-level timestamps yet.

- Implement the 'mode', 'uid' and 'gid' mount options for VFS2's tmpfs.

- Plumb mount namespace ownership, which is necessary for checking appropriate
  capabilities during mount(2).

Updates #1035

PiperOrigin-RevId: 315035352
2020-06-05 19:12:03 -07:00
Andrei Vagin 8c1f5b5cd8 Unshare files on exec
The current task can share its fdtable with a few other tasks,
but after exec, this should be a completely separate process.

PiperOrigin-RevId: 314999565
2020-06-05 14:45:32 -07:00
Dean Deng ccf69bdd7e Implement IN_EXCL_UNLINK inotify option in vfs2.
Limited to tmpfs. Inotify support in other filesystem implementations to
follow.

Updates #1479

PiperOrigin-RevId: 313828648
2020-05-29 12:28:49 -07:00
Dean Deng fe464f44b7 Port inotify to vfs2, with support in tmpfs.
Support in other filesystem impls is still needed. Unlike in Linux and vfs1, we
need to plumb inotify down to each filesystem implementation in order to keep
track of links/inode structures properly.

IN_EXCL_UNLINK still needs to be implemented, as well as a few inotify hooks
that are not present in either vfs1 or vfs2. Those will be addressed in
subsequent changes.

Updates #1479.

PiperOrigin-RevId: 313781995
2020-05-29 08:09:14 -07:00