addr_is_blank() should return true if family is neither AF_INET nor
AF_INET6. This is what its counterpart entity_addr_t::is_blank_ip() is
doing and it is the right thing to do: in process_banner() we check if
our address is blank and if it is "learn" it from our peer. As it is,
we never learn our address and always send out a blank one. This goes
way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and
some ipv6 support groundwork") from 2009.
While at at, do not open-code ipv6_addr_any() and use INADDR_ANY
constant instead of 0.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net. Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.
This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
Commit 835a6a2f86 ("Bluetooth: Stop sabotaging list poisoning")
thought that the code was sabotaging the list poisoning when NULL'ing
out the list pointers and removed it.
But what was going on was that the bluetooth code was using NULL
pointers for the list as a way to mark it empty, and that commit just
broke it (and replaced the test with NULL with a "list_empty()" test on
a uninitialized list instead, breaking things even further).
So fix it all up to use the regular and real list_empty() handling
(which does not use NULL, but a pointer to itself), also making sure to
initialize the list properly (the previous NULL case was initialized
implicitly by the session being allocated with kzalloc())
This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
An.
[ I would normally expect to get this through the bt tree, but I'm going
to release -rc1, so I'm just committing this directly - Linus ]
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Original-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Original-by: Marcel Holtmann <marcel@holtmann.org>:
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
if response is impossible to parse and we discard the request, get the
out of the loop right there.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If we'd already sent a request and decide to abort it, we *must*
issue TFLUSH properly and not just blindly reuse the tag, or
we'll get seriously screwed when response eventually arrives
and we confuse it for response to later request that had reused
the same tag.
Cc: stable@vger.kernel.org # v3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull Ceph updates from Sage Weil:
"We have a pile of bug fixes from Ilya, including a few patches that
sync up the CRUSH code with the latest from userspace.
There is also a long series from Zheng that fixes various issues with
snapshots, inline data, and directory fsync, some simplification and
improvement in the cap release code, and a rework of the caching of
directory contents.
To top it off there are a few small fixes and cleanups from Benoit and
Hong"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
rbd: use GFP_NOIO in rbd_obj_request_create()
crush: fix a bug in tree bucket decode
libceph: Fix ceph_tcp_sendpage()'s more boolean usage
libceph: Remove spurious kunmap() of the zero page
rbd: queue_depth map option
rbd: store rbd_options in rbd_device
rbd: terminate rbd_opts_tokens with Opt_err
ceph: fix ceph_writepages_start()
rbd: bump queue_max_segments
ceph: rework dcache readdir
crush: sync up with userspace
crush: fix crash from invalid 'take' argument
ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
ceph: pre-allocate data structure that tracks caps flushing
ceph: re-send flushing caps (which are revoked) in reconnect stage
ceph: send TID of the oldest pending caps flush to MDS
ceph: track pending caps flushing globally
ceph: track pending caps flushing accurately
libceph: fix wrong name "Ceph filesystem for Linux"
ceph: fix directory fsync
...
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable patches:
- Fix a crash in the NFSv4 file locking code.
- Fix an fsync() regression, where we were failing to retry I/O in
some circumstances.
- Fix an infinite loop in NFSv4.0 OPEN stateid recovery
- Fix a memory leak when an attempted pnfs fails.
- Fix a memory leak in the backchannel code
- Large hostnames were not supported correctly in NFSv4.1
- Fix a pNFS/flexfiles bug that was impeding error reporting on I/O.
- Fix a couple of credential issues in pNFS/flexfiles
Bugfixes + cleanups:
- Open flag sanity checks in the NFSv4 atomic open codepath
- More NFSv4 delegation related bugfixes
- Various NFSv4.1 backchannel bugfixes and cleanups
- Fix the NFS swap socket code
- Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code
- Fix a UDP transport deadlock issue
Features:
- More RDMA client transport improvements
- NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles"
* tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits)
nfs: Remove invalid tk_pid from debug message
nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh
nfs: Drop bad comment in nfs41_walk_client_list()
nfs: Remove unneeded micro checking of CONFIG_PROC_FS
nfs: Don't setting FILE_CREATED flags always
nfs: Use remove_proc_subtree() instead remove_proc_entry()
nfs: Remove unused argument in nfs_server_set_fsinfo()
nfs: Fix a memory leak when meeting an unsupported state protect
nfs: take extra reference to fl->fl_file when running a LOCKU operation
NFSv4: When returning a delegation, don't reclaim an incompatible open mode.
NFSv4.2: LAYOUTSTATS is optional to implement
NFSv4.2: Fix up a decoding error in layoutstats
pNFS/flexfiles: Fix the reset of struct pgio_header when resending
pNFS/flexfiles: Turn off layoutcommit for servers that don't need it
pnfs/flexfiles: protect ktime manipulation with mirror lock
nfs: provide pnfs_report_layoutstat when NFS42 is disabled
nfs: verify open flags before allowing open
nfs: always update creds in mirror, even when we have an already connected ds
nfs: fix potential credential leak in ff_layout_update_mirror_cred
pnfs/flexfiles: report layoutstat regularly
...
Pull module_init replacement part two from Paul Gortmaker:
"Replace module_init with appropriate alternate initcall in non
modules.
This series converts non-modular code that is using the module_init()
call to hook itself into the system to instead use one of our
alternate priority initcalls.
Unlike the previous series that used device_initcall and hence was a
runtime no-op, these commits change to one of the alternate initcalls,
because (a) we have them and (b) it seems like the right thing to do.
For example, it would seem logical to use arch_initcall for arch
specific setup code and fs_initcall for filesystem setup code.
This does mean however, that changes in the init ordering will be
taking place, and so there is a small risk that some kind of implicit
init ordering issue may lie uncovered. But I think it is still better
to give these ones sensible priorities than to just assign them all to
device_initcall in order to exactly preserve the old ordering.
Thad said, we have already made similar changes in core kernel code in
commit c96d6660dc ("kernel: audit/fix non-modular users of
module_init in core code") without any regressions reported, so this
type of change isn't without precedent. It has also got the same
local testing and linux-next coverage as all the other pull requests
that I'm sending for this merge window have got.
Once again, there is an unused module_exit function removal that shows
up as an outlier upon casual inspection of the diffstat"
* tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
mm/page_owner.c: use late_initcall to hook in enabling
lib/list_sort: use late_initcall to hook in self tests
arm: use subsys_initcall in non-modular pl320 IPC code
powerpc: don't use module_init for non-modular core hugetlb code
powerpc: use subsys_initcall for Freescale Local Bus
x86: don't use module_init for non-modular core bootflag code
netfilter: don't use module_init/exit in core IPV4 code
fs/notify: don't use module_init for non-modular inotify_user code
mm: replace module_init usages with subsys_initcall in nommu.c
Pull networking fixes from David Miller:
1) mlx4 driver bug fixes (TX queue wakeups, csum complete indications)
from Ido Shamay, Eran Ben Elisha, and Or Gerlitz.
2) Missing unlock in error path of PTP support in renesas driver, from
Dan Carpenter.
3) Add Vitesse 8641 phy IDs to vitesse PHY driver, from Shaohui Xie.
4) Bnx2x driver bug fixes (linearization of encap packets, scratchpad
parity error notifications, flow-control and speed settings) from
Yuval Mintz, Manish Chopra, Shahed Shaikh, and Ariel Elior.
5) ipv6 extension header parsing in the igb chip has a HW errata,
disable it. Frm Todd Fujinaka.
6) Fix PCI link state locking issue in e1000e driver, from Yanir
Lubetkin.
7) Cure panics during MTU change in i40e, from Mitch Williams.
8) Don't leak promisc refs in DSA slave driver, from Gilad Ben-Yossef.
9) Add missing HAS_DMA dep to VIA Rhine driver, from Geery
Uytterhoeven.
10) Make sure DMA map/unmap calls are symmetric in bnx2x driver, from
Michal Schmidt.
11) Workaround for MDIO access problems in bcm7xxx devices, from FLorian
Fainelli.
12) Fix races in SCTP protocol between OTTB responses and route
removals, from Alexander Sverdlin.
13) Fix jumbo frame checksum issue with some mvneta devices, from Simon
Guinot.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (58 commits)
sock_diag: don't broadcast kernel sockets
net: mvneta: disable IP checksum with jumbo frames for Armada 370
ARM: mvebu: update Ethernet compatible string for Armada XP
net: mvneta: introduce compatible string "marvell, armada-xp-neta"
api: fix compatibility of linux/in.h with netinet/in.h
net: icplus: fix typo in constant name
sis900: Trivial: Fix typos in enums
stmmac: Trivial: fix typo in constant name
sctp: Fix race between OOTB responce and route removal
net-Liquidio: Delete unnecessary checks before the function call "vfree"
vmxnet3: Bump up driver version number
amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation
net: phy: mdio-bcm-unimac: workaround initial read failures for integrated PHYs
net: bcmgenet: workaround initial read failures for integrated PHYs
net: phy: bcm7xxx: workaround MDIO management controller initial read
bnx2x: fix DMA API usage
net: via: VIA_RHINE and VIA_VELOCITY should depend on HAS_DMA
net/phy: tune get_phy_c45_ids to support more c45 phy
bnx2x: fix lockdep splat
net: fec: don't access RACC register when not available
...
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.
A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
should be used. -Wconversion catches this, but I guess it went
unnoticed in all the noise it spews. The actual problem (at least for
common crushmaps) isn't the u32 -> u8 truncation though - it's the
advancement by 4 bytes instead of 1 in the crushmap buffer.
Fixes: http://tracker.ceph.com/issues/2759
Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Kernel sockets do not hold a reference for the network namespace to
which they point. Socket destruction broadcasting relies on the
network namespace and will cause the splat below when a kernel socket
is destroyed.
This fix simply ignores kernel sockets when they are destroyed.
Reported as:
general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 9130 Comm: kworker/1:1 Not tainted 4.1.0-gelk-debug+ #1
Workqueue: sock_diag_events sock_diag_broadcast_destroy_work
Stack:
ffff8800b9c586c0 ffff8800b9c586c0 ffff8800ac4692c0 ffff8800936d4a90
ffff8800352efd38 ffffffff8469a93e ffff8800352efd98 ffffffffc09b9b90
ffff8800352efd78 ffff8800ac4692c0 ffff8800b9c586c0 ffff8800831b6ab8
Call Trace:
[<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10
[<ffffffffc09b9b90>] ? inet_diag_handler_get_info+0x110/0x1fb [inet_diag]
[<ffffffff845c868d>] netlink_broadcast+0x1d/0x20
[<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10
[<ffffffff845b2bf5>] sock_diag_broadcast_destroy_work+0xd5/0x160
[<ffffffff8408ea97>] process_one_work+0x147/0x420
[<ffffffff8408f0f9>] worker_thread+0x69/0x470
[<ffffffff8409fda3>] ? preempt_count_sub+0xa3/0xf0
[<ffffffff8408f090>] ? rescuer_thread+0x320/0x320
[<ffffffff84093cd7>] kthread+0x107/0x120
[<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff8469d31f>] ret_from_fork+0x3f/0x70
[<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0
Tested:
Using a debug kernel while 'ss -E' is running:
ip netns add test-ns
ip netns delete test-ns
Fixes: eb4cb00852 sock_diag: define destruction multicast groups
Fixes: 26abe14379 net: Modify sk_alloc to not reference count the
netns of kernel sockets.
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:
bool last_piece; /* current is last piece */
In ceph_msg_data_next():
*last_piece = cursor->last_piece;
A call to ceph_msg_data_next() is followed by:
ret = ceph_tcp_sendpage(con->sock, page, page_offset,
length, last_piece);
while ceph_tcp_sendpage() is:
static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
int offset, size_t size, bool more)
The logic is inverted: correct it.
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
net/core/flow_dissector.c: In function ‘__skb_flow_dissect’:
net/core/flow_dissector.c:132: warning: ‘ip_proto’ may be used uninitialized in this function
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 1f66d161ab
("tipc: introduce starvation free send algorithm")
we introduced a counter per priority level for buffers
in the link backlog queue. We also introduced a new
function tipc_link_purge_backlog(), to reset these
counters to zero when the link is reset.
Unfortunately, we missed to call this function when
the broadcast link is reset, with the result that the
values of these counters might be permanently skewed
when new nodes are attached. This may in the worst case
lead to permananent, but spurious, broadcast link
congestion, where no broadcast packets can be sent at
all.
We fix this bug with this commit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd updates from Bruce Fields:
"A relatively quiet cycle, with a mix of cleanup and smaller bugfixes"
* 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits)
sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
nfsd: wrap too long lines in nfsd4_encode_read
nfsd: fput rd_file from XDR encode context
nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
nfsd: refactor nfs4_preprocess_stateid_op
nfsd: clean up raparams handling
nfsd: use swap() in sort_pacl_range()
rpcrdma: Merge svcrdma and xprtrdma modules into one
svcrdma: Add a separate "max data segs macro for svcrdma
svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
svcrdma: Keep rpcrdma_msg fields in network byte-order
svcrdma: Fix byte-swapping in svc_rdma_sendto.c
nfsd: Update callback sequnce id only CB_SEQUENCE success
nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
SUNRPC: Move EXPORT_SYMBOL for svc_process
uapi/nfs: Add NFSv4.1 ACL definitions
nfsd: Remove dead declarations
nfsd: work around a gcc-5.1 warning
nfsd: Checking for acl support does not require fetching any acls
...
Fix typo in the validation rules for flower's attributes
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>