Pull backing device changes from Jens Axboe:
"This contains a cleanup of how the backing device is handled, in
preparation for a rework of the life time rules. In this part, the
most important change is to split the unrelated nommu mmap flags from
it, but also removing a backing_dev_info pointer from the
address_space (and inode), and a cleanup of other various minor bits.
Christoph did all the work here, I just fixed an oops with pages that
have a swap backing. Arnd fixed a missing export, and Oleg killed the
lustre backing_dev_info from staging. Last patch was from Al,
unexporting parts that are now no longer needed outside"
* 'for-3.20/bdi' of git://git.kernel.dk/linux-block:
Make super_blocks and sb_lock static
mtd: export new mtd_mmap_capabilities
fs: make inode_to_bdi() handle NULL inode
staging/lustre/llite: get rid of backing_dev_info
fs: remove default_backing_dev_info
fs: don't reassign dirty inodes to default_backing_dev_info
nfs: don't call bdi_unregister
ceph: remove call to bdi_unregister
fs: remove mapping->backing_dev_info
fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info
nilfs2: set up s_bdi like the generic mount_bdev code
block_dev: get bdev inode bdi directly from the block device
block_dev: only write bdev inode on close
fs: introduce f_op->mmap_capabilities for nommu mmap support
fs: kill BDI_CAP_SWAP_BACKED
fs: deduplicate noop_backing_dev_info
Use iov_iter_kvec() there, get rid of set_fs() games - now that
rxrpc_send_data() uses iov_iter primitives, it'll handle ITER_KVEC just
fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to it's original purpose.
Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead. Splitting this from the backing_dev_info
structure allows to remove lots of backing_dev_info instance that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device. It also removes the need for
the mtd_inodefs filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
Hansen)
- Various sched/idle refinements for better idle handling (Nicolas
Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)
- sched/numa updates and optimizations (Rik van Riel)
- sysbench speedup (Vincent Guittot)
- capacity calculation cleanups/refactoring (Vincent Guittot)
- Various cleanups to thread group iteration (Oleg Nesterov)
- Double-rq-lock removal optimization and various refactorings
(Kirill Tkhai)
- various sched/deadline fixes
... and lots of other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
sched/fair: Delete resched_cpu() from idle_balance()
sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
sched: Improve sysbench performance by fixing spurious active migration
sched/x86: Fix up typo in topology detection
x86, sched: Add new topology for multi-NUMA-node CPUs
sched/rt: Use resched_curr() in task_tick_rt()
sched: Use rq->rd in sched_setaffinity() under RCU read lock
sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
sched: Use dl_bw_of() under RCU read lock
sched/fair: Remove duplicate code from can_migrate_task()
sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
sched: print_rq(): Don't use tasklist_lock
sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
sched: Fix the task-group check in tg_has_rt_tasks()
sched/fair: Leverage the idle state info when choosing the "idlest" cpu
sched: Let the scheduler see CPU idle states
sched/deadline: Fix inter- exclusive cpusets migrations
sched/deadline: Clear dl_entity params when setscheduling to different class
sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
...
Now that check_submounts_and_drop can not fail and is called from
d_invalidate there is no longer a need to call check_submounts_and_drom
from filesystem d_revalidate methods so remove it.
Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Correctly assemble the client UUID by OR'ing in the flags rather than
assigning them over the other components.
Reported-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
"This the bunch that sat in -next + lock_parent() fix. This is the
minimal set; there's more pending stuff.
In particular, I really hope to get acct.c fixes merged this cycle -
we need that to deal sanely with delayed-mntput stuff. In the next
pile, hopefully - that series is fairly short and localized
(kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
iov_iter work. Most of prereqs for ->splice_write with sane locking
order are there and Kent's dio rewrite would also fit nicely on top of
this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
lock_parent: don't step on stale ->d_parent of all-but-freed one
kill generic_file_splice_write()
ceph: switch to iter_file_splice_write()
shmem: switch to iter_file_splice_write()
nfs: switch to iter_splice_write_file()
fs/splice.c: remove unneeded exports
ocfs2: switch to iter_file_splice_write()
->splice_write() via ->write_iter()
bio_vec-backed iov_iter
optimize copy_page_{to,from}_iter()
bury generic_file_aio_{read,write}
lustre: get rid of messing with iovecs
ceph: switch to ->write_iter()
ceph_sync_direct_write: stop poking into iov_iter guts
ceph_sync_read: stop poking into iov_iter guts
new helper: copy_page_from_iter()
fuse: switch to ->write_iter()
btrfs: switch to ->write_iter()
ocfs2: switch to ->write_iter()
xfs: switch to ->write_iter()
...
Pull file locking changes from Jeff Layton:
"Pretty quiet on the file-locking related front this cycle. Just some
small cleanups and the addition of some tracepoints in the lease
handling code"
* tag 'locks-v3.16' of git://git.samba.org/jlayton/linux:
locks: add some tracepoints in the lease handling code
fs/locks.c: replace seq_printf by seq_puts
locks: ensure that fl_owner is always initialized properly in flock and lease codepaths
Currently, the fl_owner isn't set for flock locks. Some filesystems use
byte-range locks to simulate flock locks and there is a common idiom in
those that does:
fl->fl_owner = (fl_owner_t)filp;
fl->fl_start = 0;
fl->fl_end = OFFSET_MAX;
Since flock locks are generally "owned" by the open file description,
move this into the common flock lock setup code. The fl_start and fl_end
fields are already set appropriately, so remove the unneeded setting of
that in flock ops in those filesystems as well.
Finally, the lease code also sets the fl_owner as if they were owned by
the process and not the open file description. This is incorrect as
leases have the same ownership semantics as flock locks. Set them the
same way. The lease code doesn't actually use the fl_owner value for
anything, so this is more for consistency's sake than a bugfix.
Reported-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (Staging portion)
Acked-by: J. Bruce Fields <bfields@fieldses.org>
call->async_workfn() can take an afs_call* arg rather than a work_struct* as
the functions assigned there are now called from afs_async_workfn() which has
to call container_of() anyway.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
At present, it is not possible to successfully unload the kafs module if there
are outstanding async outgoing calls (those made with afs_make_call()). This
appears to be due to the changes introduced by:
commit 059499453a
Author: Tejun Heo <tj@kernel.org>
Date: Fri Mar 7 10:24:50 2014 -0500
Subject: afs: don't use PREPARE_WORK
which didn't go far enough. The problem is due to:
(1) The aforementioned commit introduced a separate handler function pointer
in the call, call->async_workfn, in addition to the original workqueue
item, call->async_work, for asynchronous operations because workqueues
subsystem cannot handle the workqueue item pointer being changed whilst
the item is queued or being processed.
(2) afs_async_workfn() was introduced in that commit to be the callback for
call->async_work. Its sole purpose is to run whatever call->async_workfn
points to.
(3) call->async_workfn is only used from afs_async_workfn(), which is only
set on async_work by afs_collect_incoming_call() - ie. for incoming
calls.
(4) call->async_workfn is *not* set by afs_make_call() when outgoing calls are
made, and call->async_work is set afs_process_async_call() - and not
afs_async_workfn().
(5) afs_process_async_call() now changes call->async_workfn rather than
call->async_work to point to afs_delete_async_call() to clean up, but this
is only effective for incoming calls because call->async_work does not
point to afs_async_workfn() for outgoing calls.
(6) Because, for incoming calls, call->async_work remains pointing to
afs_process_async_call() this results in an infinite loop.
Instead, make the workqueue uniformly vector through call->async_workfn, via
afs_async_workfn() and simply initialise call->async_workfn to point to
afs_process_async_call() in afs_make_call().
Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Split afs_end_call() into two pieces, one of which is identical to code in
afs_process_async_call(). Replace the latter with a call to the first part of
afs_end_call().
Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the cache manager RPC service handlers. The afs_send_empty_reply() and
afs_send_simple_reply() functions:
(a) Kill the call and free up the buffers associated with it if they fail.
(b) Return with call intact if it they succeed.
However, none of the callers actually check the result or clean up if
successful - and may use the now non-existent data if it fails.
This was detected by Dan Carpenter using a static checker:
The patch 08e0e7c82e: "[AF_RXRPC]: Make the in-kernel AFS
filesystem use AF_RXRPC." from Apr 26, 2007, leads to the following
static checker warning:
"fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
warn: 'call' was already freed."
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.
afs_call->async_work is multiplexed with multiple work functions.
Introduce afs_async_workfn() which invokes afs_call->async_workfn and
always use it as the work function and update the users to set the
->async_workfn field instead of overriding the work function using
PREPARE_WORK().
It would probably be best to route this with other related updates
through the workqueue tree.
Compile tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-afs@lists.infradead.org
Both proc files are writeable and used for configuring cells. But
there is missing correct mode flag for writeable files. Without
this patch both proc files are read only.
[ It turns out they aren't really read-only, since root can write to
them even if the write bit isn't set due to CAP_DAC_OVERRIDE ]
Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>