Pull nowait read support from Al Viro:
"Support IOCB_NOWAIT for buffered reads and block devices"
* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
block_dev: support RFW_NOWAIT on block device nodes
fs: support RWF_NOWAIT for buffered reads
fs: support IOCB_NOWAIT in generic_file_buffered_read
fs: pass iocb to do_generic_file_read
Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).
This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like
list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')
sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
-e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
-e 's/\<MS_NODEV\>/SB_NODEV/g' \
-e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
-e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
-e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
-e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
-e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
-e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
-e 's/\<MS_SILENT\>/SB_SILENT/g' \
-e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
-e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
-e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
-e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
$list
and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
Pull more set_fs removal from Al Viro:
"Christoph's 'use kernel_read and friends rather than open-coding
set_fs()' series"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: unexport vfs_readv and vfs_writev
fs: unexport vfs_read and vfs_write
fs: unexport __vfs_read/__vfs_write
lustre: switch to kernel_write
gadget/f_mass_storage: stop messing with the address limit
mconsole: switch to kernel_read
btrfs: switch write_buf to kernel_write
net/9p: switch p9_fd_read to kernel_write
mm/nommu: switch do_mmap_private to kernel_read
serial2002: switch serial2002_tty_write to kernel_{read/write}
fs: make the buf argument to __kernel_write a void pointer
fs: fix kernel_write prototype
fs: fix kernel_read prototype
fs: move kernel_read to fs/read_write.c
fs: move kernel_write to fs/read_write.c
autofs4: switch autofs4_write to __kernel_write
ashmem: switch to ->read_iter
Pull overlayfs updates from Miklos Szeredi:
"This fixes d_ino correctness in readdir, which brings overlayfs on par
with normal filesystems regarding inode number semantics, as long as
all layers are on the same filesystem.
There are also some bug fixes, one in particular (random ioctl's
shouldn't be able to modify lower layers) that touches some vfs code,
but of course no-op for non-overlay fs"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix false positive ESTALE on lookup
ovl: don't allow writing ioctl on lower layer
ovl: fix relatime for directories
vfs: add flags to d_real()
ovl: cleanup d_real for negative
ovl: constant d_ino for non-merge dirs
ovl: constant d_ino across copy up
ovl: fix readdir error value
ovl: check snprintf return
The fstatat(2) and statx() calls can pass the flag AT_NO_AUTOMOUNT which
is meant to clear the LOOKUP_AUTOMOUNT flag and prevent triggering of an
automount by the call. But this flag is unconditionally cleared for all
stat family system calls except statx().
stat family system calls have always triggered mount requests for the
negative dentry case in follow_automount() which is intended but prevents
the fstatat(2) and statx() AT_NO_AUTOMOUNT case from being handled.
In order to handle the AT_NO_AUTOMOUNT for both system calls the negative
dentry case in follow_automount() needs to be changed to return ENOENT
when the LOOKUP_AUTOMOUNT flag is clear (and the other required flags are
clear).
AFAICT this change doesn't have any noticable side effects and may, in
some use cases (although I didn't see it in testing) prevent unnecessary
callbacks to the automount daemon.
It's also possible that a stat family call has been made with a path that
is in the process of being mounted by some other process. But stat family
calls should return the automount state of the path as it is "now" so it
shouldn't wait for mount completion.
This is the same semantic as the positive dentry case already handled.
Link: http://lkml.kernel.org/r/150216641255.11652.4204561328197919771.stgit@pluto.themaw.net
Fixes: deccf497d8 ("Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()")
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull quota scaling updates from Jan Kara:
"This contains changes to make the quota subsystem more scalable.
Reportedly it improves number of files created per second on ext4
filesystem on fast storage by about a factor of 2x"
* 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
quota: Add lock annotations to struct members
quota: Reduce contention on dq_data_lock
fs: Provide __inode_get_bytes()
quota: Inline dquot_[re]claim_reserved_space() into callsite
quota: Inline inode_{incr,decr}_space() into callsites
quota: Inline functions into their callsites
ext4: Disable dirty list tracking of dquots when journalling quotas
quota: Allow disabling tracking of dirty dquots in a list
quota: Remove dq_wait_unused from dquot
quota: Move locking into clear_dquot_dirty()
quota: Do not dirty bad dquots
quota: Fix possible corruption of dqi_flags
quota: Propagate ->quota_read errors from v2_read_file_info()
quota: Fix error codes in v2_read_file_info()
quota: Push dqio_sem down to ->read_file_info()
quota: Push dqio_sem down to ->write_file_info()
quota: Push dqio_sem down to ->get_next_id()
quota: Push dqio_sem down to ->release_dqblk()
quota: Remove locking for writing to the old quota format
quota: Do not acquire dqio_sem for dquot overwrites in v2 format
...
Pull block layer updates from Jens Axboe:
"This is the first pull request for 4.14, containing most of the code
changes. It's a quiet series this round, which I think we needed after
the churn of the last few series. This contains:
- Fix for a registration race in loop, from Anton Volkov.
- Overflow complaint fix from Arnd for DAC960.
- Series of drbd changes from the usual suspects.
- Conversion of the stec/skd driver to blk-mq. From Bart.
- A few BFQ improvements/fixes from Paolo.
- CFQ improvement from Ritesh, allowing idling for group idle.
- A few fixes found by Dan's smatch, courtesy of Dan.
- A warning fixup for a race between changing the IO scheduler and
device remova. From David Jeffery.
- A few nbd fixes from Josef.
- Support for cgroup info in blktrace, from Shaohua.
- Also from Shaohua, new features in the null_blk driver to allow it
to actually hold data, among other things.
- Various corner cases and error handling fixes from Weiping Zhang.
- Improvements to the IO stats tracking for blk-mq from me. Can
drastically improve performance for fast devices and/or big
machines.
- Series from Christoph removing bi_bdev as being needed for IO
submission, in preparation for nvme multipathing code.
- Series from Bart, including various cleanups and fixes for switch
fall through case complaints"
* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
kernfs: checking for IS_ERR() instead of NULL
drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
drbd: Fix allyesconfig build, fix recent commit
drbd: switch from kmalloc() to kmalloc_array()
drbd: abort drbd_start_resync if there is no connection
drbd: move global variables to drbd namespace and make some static
drbd: rename "usermode_helper" to "drbd_usermode_helper"
drbd: fix race between handshake and admin disconnect/down
drbd: fix potential deadlock when trying to detach during handshake
drbd: A single dot should be put into a sequence.
drbd: fix rmmod cleanup, remove _all_ debugfs entries
drbd: Use setup_timer() instead of init_timer() to simplify the code.
drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
drbd: new disk-option disable-write-same
drbd: Fix resource role for newly created resources in events2
drbd: mark symbols static where possible
drbd: Send P_NEG_ACK upon write error in protocol != C
drbd: add explicit plugging when submitting batches
drbd: change list_for_each_safe to while(list_first_entry_or_null)
drbd: introduce drbd_recv_header_maybe_unplug
...
Merge updates from Andrew Morton:
- various misc bits
- DAX updates
- OCFS2
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (119 commits)
mm,fork: introduce MADV_WIPEONFORK
x86,mpx: make mpx depend on x86-64 to free up VMA flag
mm: add /proc/pid/smaps_rollup
mm: hugetlb: clear target sub-page last when clearing huge page
mm: oom: let oom_reap_task and exit_mmap run concurrently
swap: choose swap device according to numa node
mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
mm, oom: do not rely on TIF_MEMDIE for memory reserves access
z3fold: use per-cpu unbuddied lists
mm, swap: don't use VMA based swap readahead if HDD is used as swap
mm, swap: add sysfs interface for VMA based swap readahead
mm, swap: VMA based swap readahead
mm, swap: fix swap readahead marking
mm, swap: add swap readahead hit statistics
mm/vmalloc.c: don't reinvent the wheel but use existing llist API
mm/vmstat.c: fix wrong comment
selftests/memfd: add memfd_create hugetlbfs selftest
mm/shmem: add hugetlbfs support to memfd_create()
mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
...
Pull writeback error handling updates from Jeff Layton:
"This pile continues the work from last cycle on better tracking
writeback errors. In v4.13 we added some basic errseq_t infrastructure
and converted a few filesystems to use it.
This set continues refining that infrastructure, adds documentation,
and converts most of the other filesystems to use it. The main
exception at this point is the NFS client"
* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
ecryptfs: convert to file_write_and_wait in ->fsync
mm: remove optimizations based on i_size in mapping writeback waits
fs: convert a pile of fsync routines to errseq_t based reporting
gfs2: convert to errseq_t based writeback error reporting for fsync
fs: convert sync_file_range to use errseq_t based error-tracking
mm: add file_fdatawait_range and file_write_and_wait
fuse: convert to errseq_t based error tracking for fsync
mm: consolidate dax / non-dax checks for writeback
Documentation: add some docs for errseq_t
errseq: rename __errseq_set to errseq_set
Pull file locking updates from Jeff Layton:
"This pile just has a few file locking fixes from Ben Coddington. There
are a couple of cleanup patches + an attempt to bring sanity to the
l_pid value that is reported back to userland on an F_GETLK request.
After a few gyrations, he came up with a way for filesystems to
communicate to the VFS layer code whether the pid should be translated
according to the namespace or presented as-is to userland"
* tag 'locks-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: restore a warn for leaked locks on close
fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks
fs/locks: Use allocation rather than the stack in fcntl_getlk()
Pull XFS updates from Darrick Wong:
"Here are the changes for xfs for 4.14. Most of these are cleanups and
fixes for bad behavior, as we're mostly focusing on improving
reliablity this cycle (read: there's potentially a lot of stuff on the
horizon for 4.15 so better to spend a few weeks killing other bugs
now).
Summary:
- Write unmount record for a ro mount to avoid unnecessary log replay
- Clean up orphaned inodes when mounting fs readonly
- Resubmit inode log items when buffer writeback fails to avoid
umount hang
- Fix log recovery corruption problems when log headers wrap around
the end
- Avoid infinite loop searching for free inodes when inode counters
are wrong
- Evict inodes involved with log redo so that we don't leak them
later
- Fix a potential race between reclaim and inode cluster freeing
- Refactor the inode joining code w.r.t. transaction rolling &
deferred ops
- Fix a bug where the log doesn't properly deal with dirty buffers
that are about to become ordered buffers
- Fix the extent swap code to deal with making dirty buffers ordered
properly
- Consolidate page fault handlers
- Refactor the incore extent manipulation functions to use the iext
abstractions instead of directly modifying with extent data
- Disable crashy chattr +/-x until we fix it
- Don't allow us to set S_DAX for v2 inodes
- Various cleanups
- Clarify some documentation
- Fix a problem where fsync and a log commit race to send the disk a
flush command, resulting in a small window where power fail data
loss could occur
- Simplify some rmap operations in the fcollapse code
- Fix some use-after-free problems in async writeback"
* tag 'xfs-4.14-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits)
xfs: use kmem_free to free return value of kmem_zalloc
xfs: open code end_buffer_async_write in xfs_finish_page_writeback
xfs: don't set v3 xflags for v2 inodes
xfs: fix compiler warnings
fsmap: fix documentation of FMR_OF_LAST
xfs: simplify the rmap code in xfs_bmse_merge
xfs: remove unused flags arg from xfs_file_iomap_begin_delay
xfs: fix incorrect log_flushed on fsync
xfs: disable per-inode DAX flag
xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves
xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extent
xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_at
xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extents
xfs: move some code around inside xfs_bmap_shift_extents
xfs: use xfs_iext_get_extent in xfs_bmap_first_unused
xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insert
xfs: add a xfs_iext_update_extent helper
xfs: consolidate the various page fault handlers
iomap: return VM_FAULT_* codes from iomap_page_mkwrite
xfs: relog dirty buffers during swapext bmbt owner change
...
Pull char/misc driver updates from Greg KH:
"Here is the big char/misc driver update for 4.14-rc1.
Lots of different stuff in here, it's been an active development cycle
for some reason. Highlights are:
- updated binder driver, this brings binder up to date with what
shipped in the Android O release, plus some more changes that
happened since then that are in the Android development trees.
- coresight updates and fixes
- mux driver file renames to be a bit "nicer"
- intel_th driver updates
- normal set of hyper-v updates and changes
- small fpga subsystem and driver updates
- lots of const code changes all over the driver trees
- extcon driver updates
- fmc driver subsystem upadates
- w1 subsystem minor reworks and new features and drivers added
- spmi driver updates
Plus a smattering of other minor driver updates and fixes.
All of these have been in linux-next with no reported issues for a
while"
* tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits)
ANDROID: binder: don't queue async transactions to thread.
ANDROID: binder: don't enqueue death notifications to thread todo.
ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl
ANDROID: binder: push new transactions to waiting threads.
ANDROID: binder: remove proc waitqueue
android: binder: Add page usage in binder stats
android: binder: fixup crash introduced by moving buffer hdr
drivers: w1: add hwmon temp support for w1_therm
drivers: w1: refactor w1_slave_show to make the temp reading functionality separate
drivers: w1: add hwmon support structures
eeprom: idt_89hpesx: Support both ACPI and OF probing
mcb: Fix an error handling path in 'chameleon_parse_cells()'
MCB: add support for SC31 to mcb-lpc
mux: make device_type const
char: virtio: constify attribute_group structures.
Documentation/ABI: document the nvmem sysfs files
lkdtm: fix spelling mistake: "incremeted" -> "incremented"
perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file
nvmem: include linux/err.h from header
...
We've got no modular users left, and any potential modular user is better
of with iov_iter based variants.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
No modular users left, and any new ones should use kernel_read/write
or iov_iter variants instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This matches kernel_read and kernel_write and avoids any need for casts in
the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make the position an in/out argument like all the other read/write
helpers and and make the buf argument a void pointer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use proper ssize_t and size_t types for the return value and count
argument, move the offset last and make it an in/out argument like
all other read/write helpers, and make the buf argument a void pointer
to get rid of lots of casts in the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This is based on the old idea and code from Milosz Tanski. With the aio
nowait code it becomes mostly trivial now. Buffered writes continue to
return -EOPNOTSUPP if RWF_NOWAIT is passed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery. This also had the
effect of putting linked inodes on the lru instead of evicting them.
Unfortunately, we neglected to find all those unreferenced lru inodes
and evict them after finishing log recovery, which means that we leak
them if anything goes wrong in the rest of xfs_mountfs, because the lru
is only cleaned out on unmount.
Therefore, evict unreferenced inodes in the lru list immediately
after clearing MS_ACTIVE.
Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: viro@ZenIV.linux.org.uk
Reviewed-by: Brian Foster <bfoster@redhat.com>