gvisor

mirror of https://github.com/netbirdio/gvisor.git synced 2026-05-22 17:12:49 -07:00

Author	SHA1	Message	Date
Andrei Vagin	52321c7e00	kernel/pipe: trigger EPOLLERR on a write end if all readers have been closed Fixes #10066 PiperOrigin-RevId: 612838688	2024-03-05 07:48:01 -08:00
Fabricio Voznika	c087777e37	Plumb restore context to afterLoad() This allows for external information to be passed to restore code, like host FDs to be remapped. Updates #1956 PiperOrigin-RevId: 612540749	2024-03-04 12:21:50 -08:00
Jamie Liu	da3eb80271	Fix #10046 See updated comment in sentry/kernel/pipe/vfs.go. PiperOrigin-RevId: 610516821	2024-02-26 13:56:42 -08:00
Jamie Liu	94f3d8a792	Fix splices to FDs that call `usermem.IO.CopyIn/CopyInTo` more than once. Fixes #9932. When Go is able to detect `io.Copy()` from a TCP socket or `AF_UNIX` stream socket to a TCP socket, it attempts to implement the copy as a `splice(2)` from the source to a pipe, followed by a `splice(2)` from the pipe to the destination [1] (since `splice(2)` requires that one of the endpoints be a pipe); the size of the pipe is set to 1 MB [2] (from a default of 64 KB [3]) to reduce the number of splice syscalls required. In gVisor, a bug causes each splice syscall from pipe to TCP socket to repeatedly read the first 64 KB [4] of the pipe's data (when it contains more than 64 KB of data) rather than successive chunks of 64 KB. To fix this, advance pipe state by calling `Pipe.consumeLocked()` immediately after `Pipe.peekLocked()`. Also defensively check that such FDs call `Pipe.(usermem.IO)` methods on sequential addresses, and change `fuse.deviceFD.Write()` to have this property. [1] Go: `net/tcpsock_posix.go:TCPConn.readFrom()` => `net/splice_linux.go:splice()` => `internal/poll/splice_linux.go:Splice()` [2] Go: `internal/poll/splice_linux.go:newPipe()` => `maxSpliceSize` [3] `pkg/kernel/pipe/pipe.go:DefaultPipeSize` [4] `pkg/tcpip/transport/tcp/endpoint.go:endpoint.Write()` => `endpoint.queueSegment()` => `endpoint.readFromPayloader()` => `pkg/buffer/buffer.go:Buffer.WriteFromReader()` => `pkg/buffer/chunk.go:MaxChunkSize` PiperOrigin-RevId: 603151951	2024-01-31 14:02:18 -08:00
Jing Chen	be48200c0e	Re-order loads in BUILD files to make transformations reversible in Copybara. PiperOrigin-RevId: 598898756	2024-01-16 11:21:40 -08:00
Jamie Liu	704543b8f3	Ensure empty VFSPipeFD.SpliceToNonPipe(/dev/null) returns ErrWouldBlock. When grep's stdout is /dev/null (so printed matches are discarded), its outcome is only observable in its exit code, which is binary (0 for matches, 1 for no matches). When grep's stdin is additionally a pipe, GNU grep optimizes for this specific case by switching from reading input to splicing it directly to stdout after the first match: ``` if (exit_on_match | dev_null_output) list_files = LISTFILES_NONE; ... if (list_files == LISTFILES_NONE) finalize_input (desc, &st, ineof); ... static bool drain_input (int fd, struct stat const *st) { ssize_t nbytes; if (S_ISFIFO (st->st_mode) && dev_null_output) { #ifdef SPLICE_F_MOVE /* Should be faster, since it need not copy data to user space. */ nbytes = splice (fd, NULL, STDOUT_FILENO, NULL, INITIAL_BUFSIZE, SPLICE_F_MOVE); ``` This triggers a bug in our splice implementation: since memdev.nullFD.Write() never calls back into pipe.Pipe.peekLocked() to get ErrWouldBlock, this is never propagated up to syscalls/linux.Splice(). Consequently, splice() returns 0 instead of blocking; grep interprets this as EOF from the pipe and exits. We can't fix this by calling src.CopyInTo() in memdev.nullFD.Write() because this would have the wrong behavior for `write(/dev/null, unmapped addr)`, which should succeed because `drivers/char/mem.c:null_write()` also ignores the application-provided pointer. Instead, handle this in VFSPipeFD.SpliceToNonPipe(). (Linux instead avoids this problem by distinguishing file_operations::write and file_operations::splice_write, which we would prefer to avoid if possible.) Fixes #9736 PiperOrigin-RevId: 584091971	2023-11-20 12:04:52 -08:00
Andrei Vagin	5f4abad306	Fix a few typos It is an idea of running codespell as part of our presubmit checks. Before enabling it for new changes, let's fix what it has found. Signed-off-by: Andrei Vagin <avagin@gmail.com>	2023-10-25 12:13:42 -07:00
Jamie Liu	ff81c0c639	Remove //pkg/sentry/device. This package was used for VFS1 device number assignment. PiperOrigin-RevId: 538918926	2023-06-08 16:21:04 -07:00
Etienne Perot	f8b9824813	Update `unimpl.EmitUnimplementedEvent` interface to add the syscall number. This catches up the interface to the `EmitUnimplementedEvent` method signature on `kernel.Kernel`. Also add build-time test to verify that `kernel.Kernel` implements this interface, in order to catch such breakages at build time in the future. PiperOrigin-RevId: 519000411	2023-03-23 17:01:37 -07:00
Adin Scannell	1ceb814544	Add `default_applicable_licenses` rules to packages. PiperOrigin-RevId: 513581243	2023-03-02 10:50:04 -08:00
Etienne Perot	445fa6f40c	Lockdep: Print more info in the "unbalanced unlock" case. This CL does the following: - Add the ability for nested locks to have names. - Give names to all current uses of nested locks in the codebase. - Truncate `lockdep` debug stack traces to avoid the clutter from the `lockdep` code itself. - Simplify `lockdep` to no longer require `classMap`. PiperOrigin-RevId: 491486620	2022-11-28 17:53:09 -08:00
Ayush Ranjan	1fa3c06f1e	Delete VFS1 completely. - Delete pkg/sentry/fs/*. - Move pkg/sentry/fs/fsutil out of VFS1 directory and remove VFS1 components. - Remove remaining unused references to VFS1 from remaining codebase. - Rename/refactor code to avoid even referencing VFS2, unless necessary. - Rewrite VFS1-only tests to VFS2. Updates #1624 PiperOrigin-RevId: 490064269	2022-11-21 13:57:52 -08:00
Andrei Vagin	604233c9f6	kernel: use lockdep mutexes PiperOrigin-RevId: 449877248	2022-05-19 18:33:59 -07:00
Ayush Ranjan	f6ed4523dc	Reformat codebase. PiperOrigin-RevId: 449358041	2022-05-17 17:48:35 -07:00
Kevin Krakauer	9050184c20	switch fsimpl/ from sync/atomic to atomicbitops for 32 bit values PiperOrigin-RevId: 443535714	2022-04-21 18:32:04 -07:00
Kevin Krakauer	370672e989	prohibit direct use of sync/atomic (u)int64 functions All atomic 64 bit ints are changed to atomicbitops.(Ui|I)nt64. A nogo checker enforces that sync/atomic 64 bit functions are not called. For reviewers: the interesting changes are in the atomicbitops and checkaligned packages. Why do this? - It is very easy to accidentally use atomic values without sync/atomic funcs. - We have checkatomics, but this is optional and is forgotten in several places. - Using a type+checker to enforce this seems less error prone and simpler. - We get NoCopy protection. - Use of 64 bit atomics can break 32 bit builds. We have types to handle this without any runtime cost, so we might as well use them. PiperOrigin-RevId: 440473398	2022-04-08 16:06:26 -07:00
Fabricio Voznika	dfcf798425	Fix epoll_ctl(2) regular files and dirs Linux behaves differently for regular files and dirs for poll(2)/select(2) compared to epoll_ctl(2). The latter returns EPERM for files and dirs. I've also changed host FDs to behave like the underlying FD with regard to epoll to keep it compatible with docker. Fixes #7134 PiperOrigin-RevId: 429412692	2022-02-17 15:12:36 -08:00
Jamie Liu	8e22ce5019	Consistently order Pipe.mu before other file mutexes and MM.activeMu. PiperOrigin-RevId: 422894869	2022-01-19 13:47:38 -08:00
Andrei Vagin	271e4f4ae6	kernel/pipe: clean up unused fields from the Pipe structure They have been added by mistake. PiperOrigin-RevId: 417716586	2021-12-21 17:14:04 -08:00
Andrei Vagin	b76119a1e7	pipe: a reader has to wait until all writers have been notified Otherwise, we can have a race when a reader closes a pipe before a writer detects this reader. PiperOrigin-RevId: 417645683	2021-12-21 10:19:23 -08:00
Andrei Vagin	4d29819e13	pipe: have separate notifiers for readers and writers This change fixes a busy loop in the pipe code. VFSPipe.Open calls ctx.BlockOn to wait for the opposite side, but waitQueue.EventRegister always triggers EventInternal, so we never block. Reported-by: syzbot+773e19ca2574516c9e00@syzkaller.appspotmail.com PiperOrigin-RevId: 415428542	2021-12-09 21:51:32 -08:00
Adin Scannell	dedb7e6ca1	Align Context API with kernel internals. This change adapts the existing context to use more suitable non-channel-based methods. This is a requisite for migrating the kernel internals to a sleeper-based notification mechanism. The last uses of amutex outside those migrated as part of this change were dropped in a previous change. Since amutex depends on the channel-based implementation, this package is also deleted as part of this change. PiperOrigin-RevId: 415189675	2021-12-08 23:51:37 -08:00
Fabricio Voznika	9768009a79	Don't eat error from epoll_ctl EPOLL_CTL_ADD Docker maps stdin to `/dev/null` which doesn't support epoll. Host FD was ignoring the error and succeeding the epoll_ctl call from the container, giving the false impression that epoll would be notified. This required plumbing failure to all waiter.Waitable.EventRegister callers and implementers. Closes #6795 PiperOrigin-RevId: 414797621	2021-12-07 12:36:00 -08:00
Adin Scannell	91f58d2cc8	Update Waitable API. Instead of passing the event mask at registration time, pass the mask as part of the waiter. This makes the mask immutable and simplifies the architecture of waiters. This is also necessary for a future fix that will allow the fdnotifier to keep persistent entries, as opposed to requiring constant updates. This change is intended to be a no-op in terms of function. The only exception is signalfd, where this mask was abused. To handle this case, the operation of signalfd changed to allow one layer of indirection. PiperOrigin-RevId: 409702998	2021-11-13 12:54:39 -08:00
Jamie Liu	8682ce689e	Remove state:"nosave"/"zerovalue" annotations from all waiter.Queues. Prior to cl/318010298, //pkg/state couldn't handle pointers to struct fields, which meant that it couldn't handle intrusive linked lists, which meant that it couldn't handle waiter.Queue, which meant that it couldn't handle epoll. As a result, VFS1 unregisters all epoll waiters before saving and re-registers them after loading, and waitable VFS1 file implementations tag their waiter.Queues state:"nosave" (causing them to be skipped by the save/restore machinery) or state:"zerovalue" (causing them to only be checked for zero-value-equality on save). VFS2 required cl/318010298 to support save/restore (due to the Impl inheritance pattern used by vfs.FileDescription, vfs.Dentry, etc.); correspondingly, VFS2 epoll assumes that waiter.Queues will be saved and loaded correctly, and VFS2 file implementations do not tag waiter.Queues. Some waiter.Queues, e.g. pipe.Pipe.Queue and kernel.Task.signalQueue, are used by both VFS1 and VFS2 (the latter via signalfd); as a result of the above, tagging these Queues state:"nosave" or state:"zerovalue" breaks VFS2 epoll. Remove VFS1 epoll unregistration before saving (bringing it in line with VFS2), and remove these tags from all waiter.Queues. Also clean up after the epoll test added by cl/402323053, which implied this issue (by instantiating DisableSave in the new test) without reporting it. PiperOrigin-RevId: 402596216	2021-10-12 10:25:30 -07:00
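
Illustration for commit 94f3d8a792 (splice repeatedly reading the first 64 KB of a pipe): the message describes a loop that peeks at pipe data without advancing pipe state, so every chunk handed to the destination is the same leading chunk. The following is only a toy sketch of that failure mode and of the "consume immediately after peek" fix. `toyPipe`, `peek`, `consume`, and `drain` are hypothetical names made up for this example; they are not gVisor's `Pipe.peekLocked()` or `Pipe.consumeLocked()`, and the chunk size is shrunk to 4 bytes for readability.

```go
package main

import "fmt"

// toyPipe is a hypothetical stand-in for a pipe buffer (not gVisor's pipe.Pipe).
type toyPipe struct {
	data []byte
}

// peek returns up to n bytes from the front of the pipe without removing them.
func (p *toyPipe) peek(n int) []byte {
	if n > len(p.data) {
		n = len(p.data)
	}
	return p.data[:n]
}

// consume drops up to n bytes from the front of the pipe.
func (p *toyPipe) consume(n int) {
	if n > len(p.data) {
		n = len(p.data)
	}
	p.data = p.data[n:]
}

// drain copies up to total bytes out of the pipe in chunks of at most chunk
// bytes. With consumeAfterPeek set to false it models the bug described in the
// commit message: each iteration peeks at the same leading chunk again.
func drain(p *toyPipe, total, chunk int, consumeAfterPeek bool) []byte {
	var out []byte
	for len(out) < total {
		want := chunk
		if rem := total - len(out); rem < want {
			want = rem
		}
		b := p.peek(want)
		if len(b) == 0 {
			break // pipe is empty
		}
		out = append(out, b...)
		if consumeAfterPeek {
			p.consume(len(b)) // advance pipe state right after peeking
		}
	}
	return out
}

func main() {
	buggy := drain(&toyPipe{data: []byte("abcdefgh")}, 8, 4, false)
	fixed := drain(&toyPipe{data: []byte("abcdefgh")}, 8, 4, true)
	fmt.Printf("buggy=%s fixed=%s\n", buggy, fixed) // prints "buggy=abcdabcd fixed=abcdefgh"
}
```

In the buggy mode the destination receives the right number of bytes but the wrong contents, which matches the symptom described in the commit message: the byte count looks correct while the first chunk is duplicated.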


92 Commits