pipe_write() would return 0 if it failed to merge the beginning of the
data to write with the last, partially filled pipe buffer. It should
return an error code instead. Userspace programs could be confused by
write() returning 0 when called with a nonzero 'count'.
The EFAULT error case was a regression from f0d1bec9d5 ("new helper:
copy_page_from_iter()"), while the ops->confirm() error case was a much
older bug.
Test program:
#include <assert.h>
#include <errno.h>
#include <unistd.h>
int main(void)
{
int fd[2];
char data[1] = {0};
assert(0 == pipe(fd));
assert(1 == write(fd[1], data, 1));
/* prior to this patch, write() returned 0 here */
assert(-1 == write(fd[1], NULL, 1));
assert(errno == EFAULT);
}
Cc: stable@vger.kernel.org # at least v3.15+
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If sys_pipe() was unable to allocate a 'struct file', it always failed
with ENFILE, which means "The number of simultaneously open files in the
system would exceed a system-imposed limit." However, alloc_file()
actually returns an ERR_PTR value and might fail with other error codes.
Currently, in addition to ENFILE, it can fail with ENOMEM, potentially
when there are few open files in the system. Update sys_pipe() to
preserve this error code.
In a prior submission of a similar patch (1) some concern was raised
about introducing a new error code for sys_pipe(). However, for most
system calls, programs cannot assume that new error codes will never be
introduced. In addition, ENOMEM was, in fact, already a possible error
code for sys_pipe(), in the case where the file descriptor table could
not be expanded due to insufficient memory.
(1) http://comments.gmane.org/gmane.linux.kernel/1357942
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
For now, just use the same thing we pass to ->direct_IO() - it's all
iovec-based at the moment. Pass it explicitly to iov_iter_init() and
account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
uses.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:
spin_lock(&inode->i_lock);
if (!--pipe->files) {
inode->i_pipe = NULL;
kill = 1;
}
spin_unlock(&inode->i_lock);
__pipe_unlock(pipe);
if (kill)
free_pipe_info(pipe);
where the final freeing is done last.
HOWEVER. The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.
Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.
Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early. And since there were two users of this
pattern, create a helper function for it.
Introduced commit ba5bb14733 ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").
Reported-by: Simon Kirby <sim@hostway.ca>
Reported-by: Ian Applegate <ia@cloudflare.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # v3.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
it's used only as a flag to distinguish normal pipes/FIFOs from the
internal per-task one used by file-to-file splice. And pipe->files
would work just as well for that purpose...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs/pipe.c file_operations methods *know* that pipe is not an internal one;
no need to check pipe->inode for those callers.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new field - pipe->files; number of struct file over that pipe (all
sharing the same inode, of course); protected by inode->i_lock.
* pipe_release() decrements pipe->files, clears inode->i_pipe when
if the counter has reached 0 (all under ->i_lock) and, in that case,
frees pipe after having done pipe_unlock()
* fifo_open() starts with grabbing ->i_lock, and either bumps pipe->files
if ->i_pipe was non-NULL or allocates a new pipe (dropping and regaining
->i_lock) and rechecks ->i_pipe; if it's still NULL, inserts new pipe
there, otherwise bumps ->i_pipe->files and frees the one we'd allocated.
At that point we know that ->i_pipe is non-NULL and won't go away, so
we can do pipe_lock() on it and proceed as we used to. If we end up
failing, decrement pipe->files and if it reaches 0 clear ->i_pipe and
free the sucker after pipe_unlock().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* use the fact that file_inode(file)->i_pipe doesn't change
while the file is opened - no locks needed to access that.
* switch to pipe_lock/pipe_unlock where it's easy to do
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If you open a pipe for neither read nor write, the pipe code will not
add any usage counters to the pipe, causing the 'struct pipe_inode_info"
to be potentially released early.
That doesn't normally matter, since you cannot actually use the pipe,
but the pipe release code - particularly fasync handling - still expects
the actual pipe infrastructure to all be there. And rather than adding
NULL pointer checks, let's just disallow this case, the same way we
already do for the named pipe ("fifo") case.
This is ancient going back to pre-2.4 days, and until trinity, nobody
naver noticed.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>