Currently we rely on the PIPE_BUF_FLAG_LRU flag being set correctly
to know whether we need to fiddle with page LRU state after stealing it,
however for some origins we just don't know if the page is on the LRU
list or not.
So remove PIPE_BUF_FLAG_LRU and do this check/add manually in pipe_to_file()
instead.
Signed-off-by: Jens Axboe <axboe@suse.de>
We need to use the minium of {len, PAGE_SIZE-off}, not {len, PAGE_SIZE}-off.
The latter doesn't make any sense, and could cause us to attempt negative
length transfers...
Signed-off-by: Jens Axboe <axboe@suse.de>
* 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
[PATCH] vmsplice: allow user to pass in gift pages
[PATCH] pipe: enable atomic copying of pipe data to/from user space
[PATCH] splice: call handle_ra_miss() on failure to lookup page
[PATCH] Add ->splice_read/splice_write to def_blk_fops
[PATCH] pipe: introduce ->pin() buffer operation
[PATCH] splice: fix bugs in pipe_to_file()
[PATCH] splice: fix bugs with stealing regular pipe pages
If SPLICE_F_GIFT is set, the user is basically giving this pages away to
the kernel. That means we can steal them for eg page cache uses instead
of copying it.
The data must be properly page aligned and also a multiple of the page size
in length.
Signed-off-by: Jens Axboe <axboe@suse.de>
The pipe ->map() method uses kmap() to virtually map the pages, which
is both slow and has known scalability issues on SMP. This patch enables
atomic copying of pipe pages, by pre-faulting data and using kmap_atomic()
instead.
lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
are results from that on a UP machine with highmem (1.5GiB of RAM), running
first a UP kernel, SMP kernel, and SMP kernel patched.
Vanilla-UP:
Pipe bandwidth: 1622.28 MB/sec
Pipe bandwidth: 1610.59 MB/sec
Pipe bandwidth: 1608.30 MB/sec
Pipe latency: 7.3275 microseconds
Pipe latency: 7.2995 microseconds
Pipe latency: 7.3097 microseconds
Vanilla-SMP:
Pipe bandwidth: 1382.19 MB/sec
Pipe bandwidth: 1317.27 MB/sec
Pipe bandwidth: 1355.61 MB/sec
Pipe latency: 9.6402 microseconds
Pipe latency: 9.6696 microseconds
Pipe latency: 9.6153 microseconds
Patched-SMP:
Pipe bandwidth: 1578.70 MB/sec
Pipe bandwidth: 1579.95 MB/sec
Pipe bandwidth: 1578.63 MB/sec
Pipe latency: 9.1654 microseconds
Pipe latency: 9.2266 microseconds
Pipe latency: 9.1527 microseconds
Signed-off-by: Jens Axboe <axboe@suse.de>
The ->map() function is really expensive on highmem machines right now,
since it has to use the slower kmap() instead of kmap_atomic(). Splice
rarely needs to access the virtual address of a page, so it's a waste
of time doing it.
Introduce ->pin() to take over the responsibility of making sure the
page data is valid. ->map() is then reduced to just kmap(). That way we
can also share a most of the pipe buffer ops between pipe.c and splice.c
Signed-off-by: Jens Axboe <axboe@suse.de>
Found by Oleg Nesterov <oleg@tv-sign.ru>, fixed by me.
- Only allow full pages to go to the page cache.
- Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
- Remember to clear 'stolen' if add_to_page_cache() fails.
And as a cleanup on that:
- Make the bottom fall-through logic a little less convoluted. Also make
the steal path hold an extra reference to the page, so we don't have
to differentiate between stolen and non-stolen at the end.
Signed-off-by: Jens Axboe <axboe@suse.de>
- Check that page has suitable count for stealing in the regular pipes.
- pipe_to_file() assumes that the page is locked on succesful steal, so
do that in the pipe steal hook
- Missing unlock_page() in add_to_page_cache() failure.
Signed-off-by: Jens Axboe <axboe@suse.de>
Wire up *at syscalls.
This patch has been tested on ppc64 (using glibc's testsuite, both 32bit
and 64bit), and compile-tested for ppc32 (I have currently no ppc32 system
available, but I expect no problems).
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use the new find_get_pages_contig() to potentially look up the entire
splice range in one single call. This speeds up generic_file_splice_read()
quite a bit.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch addresses a flaw in LSM, where there is no mediation of readv()
and writev() in for 32-bit compatible apps using a 64-bit kernel.
This bug was discovered and fixed initially in the native readv/writev
code [1], but was not fixed in the compat code. Thanks to Al for spotting
this one.
[1] http://lwn.net/Articles/154282/
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
All modifications of ->i_flags in inodes that might be visible to
somebody else must be under ->i_mutex. That patch fixes ext3 ioctl()
setting S_APPEND and friends.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sbi->s_group_desc is an array of pointers to buffer_head. memcpy() of
buffer size from address of buffer_head is a bad idea - it will generate
junk in any case, may oops if buffer_head is close to the end of slab
page and next page is not mapped and isn't what was intended there.
IOW, ->b_data is missing in that call. Fortunately, result doesn't go
into the primary on-disk data structures, so only backup ones get crap
written to them; that had allowed this bug to remain unnoticed until
now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice()
moves data to a pipe, with the input being a user address range instead.
This uses an approach suggested by Linus, where we can hold partial ranges
inside the pages[] map. Hopefully this will be useful for network
receive support as well.
Signed-off-by: Jens Axboe <axboe@suse.de>
BKL does not protect against races if the task may sleep between
checking and setting a value. So move checking of file->private_data
near to setting it in fuse_fill_super().
Found by Al Viro.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
A deadlock was possible, when the last reference to the superblock was
held due to a background request containing a file reference.
Releasing the file would release the vfsmount which in turn would
release the superblock. Since sbput_sem is held during the fput() and
fuse_put_super() tries to acquire this same semaphore, a deadlock
results.
The solution is to move the fput() outside the region protected by
sbput_sem.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
This reverts 73ce8355c2 commit.
It was wrong, because it didn't take into account the requirement,
that iput() for background requests must be performed synchronously
with ->put_super(), otherwise active inodes may remain after unmount.
The right solution is to keep the sbput_sem and perform iput() within
the locked region, but move fput() outside sbput_sem.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Make the move_from_pipe() actors return number of bytes processed, then
move_from_pipe() can decide more cleverly when to move on to the next
buffer.
This fixes problems with pipe offset and differing file offset.
Signed-off-by: Jens Axboe <axboe@suse.de>