Pull third hunk of vfs changes from Al Viro:
"This contains the ->direct_IO() changes from Omar + saner
generic_write_checks() + dealing with fcntl()/{read,write}() races
(mirroring O_APPEND/O_DIRECT into iocb->ki_flags and instead of
repeatedly looking at ->f_flags, which can be changed by fcntl(2),
check ->ki_flags - which cannot) + infrastructure bits for dhowells'
d_inode annotations + Christophs switch of /dev/loop to
vfs_iter_write()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (30 commits)
block: loop: switch to VFS ITER_BVEC
configfs: Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode
VFS: Make pathwalk use d_is_reg() rather than S_ISREG()
VFS: Fix up debugfs to use d_is_dir() in place of S_ISDIR()
VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk
NFS: Don't use d_inode as a variable name
VFS: Impose ordering on accesses of d_inode and d_flags
VFS: Add owner-filesystem positive/negative dentry checks
nfs: generic_write_checks() shouldn't be done on swapout...
ocfs2: use __generic_file_write_iter()
mirror O_APPEND and O_DIRECT into iocb->ki_flags
switch generic_write_checks() to iocb and iter
ocfs2: move generic_write_checks() before the alignment checks
ocfs2_file_write_iter: stop messing with ppos
udf_file_write_iter: reorder and simplify
fuse: ->direct_IO() doesn't need generic_write_checks()
ext4_file_write_iter: move generic_write_checks() up
xfs_file_aio_write_checks: switch to iocb/iov_iter
generic_write_checks(): drop isblk argument
blkdev_write_iter: expand generic_file_checks() call in there
...
Merge first patchbomb from Andrew Morton:
- arch/sh updates
- ocfs2 updates
- kernel/watchdog feature
- about half of mm/
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (122 commits)
Documentation: update arch list in the 'memtest' entry
Kconfig: memtest: update number of test patterns up to 17
arm: add support for memtest
arm64: add support for memtest
memtest: use phys_addr_t for physical addresses
mm: move memtest under mm
mm, hugetlb: abort __get_user_pages if current has been oom killed
mm, mempool: do not allow atomic resizing
memcg: print cgroup information when system panics due to panic_on_oom
mm: numa: remove migrate_ratelimited
mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
mm: split ET_DYN ASLR from mmap ASLR
s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
mm: expose arch_mmap_rnd when available
s390: standardize mmap_rnd() usage
powerpc: standardize mmap_rnd() usage
mips: extract logic for mmap_rnd()
arm64: standardize mmap_rnd() usage
x86: standardize mmap_rnd() usage
arm: factor out mmap ASLR into mmap_rnd
...
This patch replaces cancel_dirty_page() with a helper function
account_page_cleaned() which only updates counters. It's called from
truncate_complete_page() and from try_to_free_buffers() (hack for ext3).
Page is locked in both cases, page-lock protects against concurrent
dirtiers: see commit 2d6d7f9828 ("mm: protect set_page_dirty() from
ongoing truncation").
Delete_from_page_cache() shouldn't be called for dirty pages, they must
be handled by caller (either written or truncated). This patch treats
final dirty accounting fixup at the end of __delete_from_page_cache() as
a debug check and adds WARN_ON_ONCE() around it. If something removes
dirty pages without proper handling that might be a bug and unwritten
data might be lost.
Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough
here.
cancel_dirty_page() in nfs_wb_page_cancel() is redundant. This is
helper for nfs_invalidate_page() and it's called only in case complete
invalidation.
The mess was started in v2.6.20 after commits 46d2277c79 ("Clean up
and make try_to_free_buffers() not race with dirty pages") and
3e67c0987d ("truncate: clear page dirtiness before running
try_to_free_buffers()") first was reverted right in v2.6.20 in commit
ecdfc9787f ("Resurrect 'try_to_free_buffers()' VM hackery"), second in
v2.6.25 commit a2b345642f ("Fix dirty page accounting leak with ext3
data=journal").
Custom fixes were introduced between these points. NFS in v2.6.23, commit
1b3b4a1a2d ("NFS: Fix a write request leak in nfs_invalidate_page()").
Kludge in __delete_from_page_cache() in v2.6.24, commit 3a6927906f ("Do
dirty page accounting when removing a page from the page cache"). Since
v2.6.25 all of them are redundant.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success. Note, that normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Conflicts:
fs/ext4/file.c
A side effect worth noting: in O_APPEND case we set ->ki_pos early,
so if it turns out to be an error or a zero-length write, we'll
end up with ->ki_pos modified. Safe, since all callers never
look at the ->ki_pos after the call of __generic_file_write_iter()
returning non-positive, all the way to caller of ->write_iter() and
those discard ->ki_pos when getting that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull backing device changes from Jens Axboe:
"This contains a cleanup of how the backing device is handled, in
preparation for a rework of the life time rules. In this part, the
most important change is to split the unrelated nommu mmap flags from
it, but also removing a backing_dev_info pointer from the
address_space (and inode), and a cleanup of other various minor bits.
Christoph did all the work here, I just fixed an oops with pages that
have a swap backing. Arnd fixed a missing export, and Oleg killed the
lustre backing_dev_info from staging. Last patch was from Al,
unexporting parts that are now no longer needed outside"
* 'for-3.20/bdi' of git://git.kernel.dk/linux-block:
Make super_blocks and sb_lock static
mtd: export new mtd_mmap_capabilities
fs: make inode_to_bdi() handle NULL inode
staging/lustre/llite: get rid of backing_dev_info
fs: remove default_backing_dev_info
fs: don't reassign dirty inodes to default_backing_dev_info
nfs: don't call bdi_unregister
ceph: remove call to bdi_unregister
fs: remove mapping->backing_dev_info
fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info
nilfs2: set up s_bdi like the generic mount_bdev code
block_dev: get bdev inode bdi directly from the block device
block_dev: only write bdev inode on close
fs: introduce f_op->mmap_capabilities for nommu mmap support
fs: kill BDI_CAP_SWAP_BACKED
fs: deduplicate noop_backing_dev_info
Now that we got rid of the bdi abuse on character devices we can always use
sb->s_bdi to get at the backing_dev_info for a file, except for the block
device special case. Export inode_to_bdi and replace uses of
mapping->backing_dev_info with it to prepare for the removal of
mapping->backing_dev_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit 2457aec637 ("mm: non-atomically mark page accessed during page
cache allocation where possible") has added a separate parameter for
specifying gfp mask for radix tree allocations.
Not only this is less than optimal from the API point of view because it
is error prone, it is also buggy currently because
grab_cache_page_write_begin is using GFP_KERNEL for radix tree and if
fgp_flags doesn't contain FGP_NOFS (mostly controlled by fs by
AOP_FLAG_NOFS flag) but the mapping_gfp_mask has __GFP_FS cleared then
the radix tree allocation wouldn't obey the restriction and might
recurse into filesystem and cause deadlocks. This is the case for most
filesystems unfortunately because only ext4 and gfs2 are using
AOP_FLAG_NOFS.
Let's simply remove radix_gfp_mask parameter because the allocation
context is same for both page cache and for the radix tree. Just make
sure that the radix tree gets only the sane subset of the mask (e.g. do
not pass __GFP_WRITE).
Long term it is more preferable to convert remaining users of
AOP_FLAG_NOFS to use mapping_gfp_mask instead and simplify this
interface even further.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory. To
this end, this lock can also be a rwsem. In addition, there are some
important opportunities to share the lock when there are no tree
modifications.
This conversion is straightforward. For now, all users take the write
lock.
[sfr@canb.auug.org.au: update fremap.c]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- fix an NFSv4.1 state renewal regression
- fix open/lock state recovery error handling
- fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
- fix statd when reconnection fails
- don't wake tasks during connection abort
- don't start reboot recovery if lease check fails
- fix duplicate proc entries
Features:
- pNFS block driver fixes and clean ups from Christoph
- More code cleanups from Anna
- Improve mmap() writeback performance
- Replace use of PF_TRANS with a more generic mechanism for avoiding
deadlocks in nfs_release_page"
* tag 'nfs-for-3.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (66 commits)
NFSv4.1: Fix an NFSv4.1 state renewal regression
NFSv4: fix open/lock state recovery error handling
NFSv4: Fix lock recovery when CREATE_SESSION/SETCLIENTID_CONFIRM fails
NFS: Fabricate fscache server index key correctly
SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT
NFSv3: Fix missing includes of nfs3_fs.h
NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()
NFS: avoid waiting at all in nfs_release_page when congested.
NFS: avoid deadlocks with loop-back mounted NFS filesystems.
MM: export page_wakeup functions
SCHED: add some "wait..on_bit...timeout()" interfaces.
NFS: don't use STABLE writes during writeback.
NFSv4: use exponential retry on NFS4ERR_DELAY for async requests.
rpc: Add -EPERM processing for xs_udp_send_request()
rpc: return sent and err from xs_sendpages()
lockd: Try to reconnect if statd has moved
SUNRPC: Don't wake tasks during connection abort
Fixing lease renewal
nfs: fix duplicate proc entries
pnfs/blocklayout: Fix a 64-bit division/remainder issue in bl_map_stripe
...
In commit c1221321b7
sched: Allow wait_on_bit_action() functions to support a timeout
I suggested that a "wait_on_bit_timeout()" interface would not meet my
need. This isn't true - I was just over-engineering.
Including a 'private' field in wait_bit_key instead of a focused
"timeout" field was just premature generalization. If some other
use is ever found, it can be generalized or added later.
So this patch renames "private" to "timeout" with a meaning "stop
waiting when "jiffies" reaches or passes "timeout",
and adds two of the many possible wait..bit..timeout() interfaces:
wait_on_page_bit_killable_timeout(), which is the one I want to use,
and out_of_line_wait_on_bit_timeout() which is a reasonably general
example. Others can be added as needed.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This patch fix spelling typo found in DocBook/kernel-api.xml.
It is because the file is generated from the source comments,
I have to fix the comments in source codes.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull vfs updates from Al Viro:
"Stuff in here:
- acct.c fixes and general rework of mnt_pin mechanism. That allows
to go for delayed-mntput stuff, which will permit mntput() on deep
stack without worrying about stack overflows - fs shutdown will
happen on shallow stack. IOW, we can do Eric's umount-on-rmdir
series without introducing tons of stack overflows on new mntput()
call chains it introduces.
- Bruce's d_splice_alias() patches
- more Miklos' rename() stuff.
- a couple of regression fixes (stable fodder, in the end of branch)
and a fix for API idiocy in iov_iter.c.
There definitely will be another pile, maybe even two. I'd like to
get Eric's series in this time, but even if we miss it, it'll go right
in the beginning of for-next in the next cycle - the tricky part of
prereqs is in this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
fix copy_tree() regression
__generic_file_write_iter(): fix handling of sync error after DIO
switch iov_iter_get_pages() to passing maximal number of pages
fs: mark __d_obtain_alias static
dcache: d_splice_alias should detect loops
exportfs: update Exporting documentation
dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
dcache: remove unused d_find_alias parameter
dcache: d_obtain_alias callers don't all want DISCONNECTED
dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
dcache: d_splice_alias mustn't create directory aliases
dcache: close d_move race in d_splice_alias
dcache: move d_splice_alias
namei: trivial fix to vfs_rename_dir comment
VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
cifs: support RENAME_NOREPLACE
hostfs: support rename flags
shmem: support RENAME_EXCHANGE
shmem: support RENAME_NOREPLACE
btrfs: add RENAME_NOREPLACE
...