Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for a NFSv4.1 delegation and state recovery deadlock
- Stable fix for a loop on irrecoverable errors when returning
delegations
- Fix a 3-way deadlock between layoutreturn, open, and state recovery
- Update the MAINTAINERS file with contact information for Trond
Myklebust
- Close needs to handle NFS4ERR_ADMIN_REVOKED
- Enabling v4.2 should not recompile nfsd and lockd
- Fix a couple of compile warnings
* tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix do_div() warning by instead using sector_div()
MAINTAINERS: Update contact information for Trond Myklebust
NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
SUNRPC: do not fail gss proc NULL calls with EACCES
NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
NFSv4: Update list of irrecoverable errors on DELEGRETURN
NFSv4 wait on recovery for async session errors
NFS: Fix a warning in nfs_setsecurity
NFS: Enabling v4.2 should not recompile nfsd and lockd
When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
shown below. Fix this warning by instead using sector_div() which is provided
by the kernel.h header file.
fs/nfs/blocklayout/extents.c: In function ‘normalize’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson reports:
The state manager is recovering expired state and recovery OPENs are being
processed. If kswapd is pruning inodes at the same time, a deadlock can occur
when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
resultant layoutreturn gets an error that the state mangager is to handle,
causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
At the same time an open is waiting for the inode deletion to complete in
__wait_on_freeing_inode.
If the open is either the open called by the state manager, or an open from
the same open owner that is holding the NFSv4 sequence id which causes the
OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
then the state is deadlocked with kswapd.
The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
We already know that layouts are dropped on all server reboots, and that
it has to be coded to deal with the "forgetful client model" that doesn't
send layoutreturns.
Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
Pull squashfs bugfix from Phillip Lougher:
"Just a single bug fix to the new "directly decompress into the page
cache" code"
* tag 'squashfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
Squashfs: fix failure to unlock pages on decompress error
The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:
spin_lock(&inode->i_lock);
if (!--pipe->files) {
inode->i_pipe = NULL;
kill = 1;
}
spin_unlock(&inode->i_lock);
__pipe_unlock(pipe);
if (kill)
free_pipe_info(pipe);
where the final freeing is done last.
HOWEVER. The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.
Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.
Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early. And since there were two users of this
pattern, create a helper function for it.
Introduced commit ba5bb14733 ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").
Reported-by: Simon Kirby <sim@hostway.ca>
Reported-by: Ian Applegate <ia@cloudflare.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # v3.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs dentry reference count fix from Al Viro.
This fixes a possible inode_permission NULL pointer dereference (and
other problems) that were due to the root dentry count being decremented
too much. In commit 48a066e72d ("RCU'd vfsmounts") the placement of
clearing the LOOKUP_RCU bit changed, and we then returned failure of
incrementing the lockref on the parent dentry with LOOKUP_RCU cleared.
But that meant we needed to go through the same cleanup routines that
the later failures did wrt LOOKUP_ROOT and nd->root.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix bogus path_put() of nd->root after some unlazy_walk() failures
Failure to grab reference to parent dentry should go through the
same cleanup as nd->seq mismatch. As it is, we might end up with
caller thinking it needs to path_put() nd->root, with obvious
nasty results once we'd hit that bug enough times to drive the
refcount of root dentry all the way to zero...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull cifs fixes from Steve French:
"SMB3 "validate negotiate" is needed to prevent certain types of
downgrade attacks.
Also changes SMB2/SMB3 copy offload from using the BTRFS copy ioctl
(BTRFS_IOC_CLONE) to a cifs specific ioctl (CIFS_IOC_COPYCHUNK_FILE)
to address Christoph's comment that there are semantic differences
between requesting copy offload in which copy-on-write is mandatory
(as in the BTRFS ioctl) and optional in the SMB2/SMB3 case. Also
fixes SMB2/SMB3 copychunk for large files"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload
Check SMB3 dialects against downgrade attacks
Removed duplicated (and unneeded) goto
CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files
Pull driver core fixes from Greg KH:
"Here are 3 patches for sysfs issues that have been reported. Well, 1
patch really, the first one is reverted as it's not really needed (the
correct fix is coming in through the different driver subsystems
instead)
But that 1 sysfs fix is needed, so this is still a good thing to pull
in now"
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tag 'driver-core-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()"
sysfs: use a separate locking class for open files depending on mmap
sysfs: handle duplicate removal attempts in sysfs_remove_group()
This tool hasn't been maintained in over a decade, and is pretty much
useless these days. Let's pretend it never happened.
Also remove a long-dead email address.
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 54d71145a4.
The root cause of these "inverted" sysfs removals have now been found,
so there is no need for this patch. Keep this functionality around so
that this type of error doesn't show up in driver code again.
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull ceph bug-fixes from Sage Weil:
"These include a couple fixes to the new fscache code that went in
during the last cycle (which will need to go stable@ shortly as well),
a couple client-side directory fragmentation fixes, a fix for a race
in the cap release queuing path, and a couple race fixes in the
request abort and resend code.
Obviously some of this could have gone into 3.12 final, but I
preferred to overtest rather than send things in for a late -rc, and
then my travel schedule intervened"
* 'for-linus-bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: allocate non-zero page to fscache in readpage()
ceph: wake up 'safe' waiters when unregistering request
ceph: cleanup aborted requests when re-sending requests.
ceph: handle race between cap reconnect and cap release
ceph: set caps count after composing cap reconnect message
ceph: queue cap release in __ceph_remove_cap()
ceph: handle frag mismatch between readdir request and reply
ceph: remove outdated frag information
ceph: hung on ceph fscache invalidate in some cases
Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead
of BTRFS_IOC_CLONE to avoid confusion about whether
copy-on-write is required or optional for this operation.
SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
they both speed up copy by offloading the copy rather than
passing many read and write requests back and forth and both have
identical syntax (passing file handles), but for SMB2/SMB3
CopyChunk the server is not required to use copy-on-write
to make a copy of the file (although some do), and Christoph
has commented that since CopyChunk does not require
copy-on-write we should not reuse BTRFS_IOC_CLONE.
This patch renames the ioctl to use a cifs specific IOCTL
CIFS_IOCTL_COPYCHUNK. This ioctl is particularly important
for SMB2/SMB3 since large file copy over the network otherwise
can be very slow, and with this is often more than 100 times
faster putting less load on server and client.
Note that if a copy syscall is ever introduced, depending on
its requirements/format it could end up using one of the other
three methods that CIFS/SMB2/SMB3 can do for copy offload,
but this method is particularly useful for file copy
and broadly supported (not just by Samba server).
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Direct decompression into the page cache. If we fall back
to using an intermediate buffer (because we cannot grab all the
page cache pages) and we get a decompress fail, we forgot to
release the pages.
Reported-by: Roman Peniaev <r.peniaev@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
ceph_osdc_readpages() returns number of bytes read, currently,
the code only allocate full-zero page into fscache, this patch
fixes this.
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
We also need to wake up 'safe' waiters if error occurs or request
aborted. Otherwise sync(2)/fsync(2) may hang forever.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Aborted requests usually get cleared when the reply is received.
If MDS crashes, no reply will be received. So we need to cleanup
aborted requests when re-sending requests.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
When a cap get released while composing the cap reconnect message.
We should skip queuing the release message if the cap hasn't been
added to the cap reconnect message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
It's possible that some caps get released while composing the cap
reconnect message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The following two commits implemented mmap support in the regular file
path and merged bin file support into the regular path.
73d9714627 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
3124eb1679 ("sysfs: merge regular and bin file handling")
After the merge, the following commands trigger a spurious lockdep
warning. "test-mmap-read" simply mmaps the file and dumps the
content.
$ cat /sys/block/sda/trace/act_mask
$ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0-work+ #378 Not tainted
-------------------------------------------------------
test-mmap-read/567 is trying to acquire lock:
(&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&mm->mmap_sem){++++++}:
...
-> #2 (sr_mutex){+.+.+.}:
...
-> #1 (&bdev->bd_mutex){+.+.+.}:
...
-> #0 (&of->mutex){+.+.+.}:
...
other info that might help us debug this:
Chain exists of:
&of->mutex --> sr_mutex --> &mm->mmap_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(sr_mutex);
lock(&mm->mmap_sem);
lock(&of->mutex);
*** DEADLOCK ***
1 lock held by test-mmap-read/567:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
stack backtrace:
CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
Call Trace:
[<ffffffff81611ad2>] dump_stack+0x4e/0x7a
[<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
[<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
[<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
[<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
[<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
[<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
[<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
[<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
[<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
[<ffffffff81008282>] SyS_mmap+0x22/0x30
[<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
This happens because one file nests sr_mutex, which nests mm->mmap_sem
under it, under of->mutex while mmap implementation naturally nests
of->mutex under mm->mmap_sem. The warning is false positive as
of->mutex is per open-file and the two paths belong to two different
files. This warning didn't trigger before regular and bin file
supports were merged because only bin file supported mmap and the
other side of locking happened only on regular files which used
equivalent but separate locking.
It'd be best if we give separate locking classes per file but we can't
easily do that. Let's differentiate on ->mmap() for now. Later we'll
add explicit file operations struct and can add per-ops lockdep key
there.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit bcdde7e221 (sysfs: make __sysfs_remove_dir() recursive) changed
the behavior so that directory removals will be done recursively. This
means that the sysfs group might already be removed if its parent directory
has been removed.
The current code outputs warnings similar to following log snippet when it
detects that there is no group for the given kobject:
WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
Modules linked in:
CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
Call Trace:
[<ffffffff817daab1>] dump_stack+0x45/0x56
[<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
[<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
[<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
[<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
[<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
[<ffffffff8142a0d0>] device_del+0x40/0x1b0
[<ffffffff8142a24d>] device_unregister+0xd/0x20
[<ffffffff8144131a>] scsi_remove_host+0xba/0x110
[<ffffffff8145f526>] ata_host_detach+0xc6/0x100
[<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
[<ffffffff812e8f48>] pci_device_remove+0x28/0x60
[<ffffffff8142d854>] __device_release_driver+0x64/0xd0
[<ffffffff8142d8de>] device_release_driver+0x1e/0x30
[<ffffffff8142d257>] bus_remove_device+0xf7/0x140
[<ffffffff8142a1b1>] device_del+0x121/0x1b0
[<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
[<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
[<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
[<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
[<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
[<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
[<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
[<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
[<ffffffff812fd90d>] hotplug_event+0xcd/0x160
[<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
[<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
[<ffffffff8105cf3a>] process_one_work+0x17a/0x430
[<ffffffff8105db29>] worker_thread+0x119/0x390
[<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
[<ffffffff81063a5d>] kthread+0xcd/0xf0
[<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
[<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
[<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
On this particular machine I see ~16 of these message during Thunderbolt
hot-unplug.
Fix this in similar way that was done for sysfs_remove_one() by checking
if the parent directory has already been removed and bailing out early.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull minor eCryptfs fix from Tyler Hicks:
"Quiet static checkers by removing unneeded conditionals"
* tag 'ecryptfs-3.13-rc1-quiet-checkers' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: file->private_data is always valid
Pull aio fixes from Benjamin LaHaise.
* git://git.kvack.org/~bcrl/aio-next:
aio: nullify aio->ring_pages after freeing it
aio: prevent double free in ioctx_alloc
aio: Fix a trinity splat
Pull nfsd bugfixes from Bruce Fields:
"A couple nfsd bugfixes"
* 'for-3.13' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix xdr decoding of large non-write compounds
nfsd: make sure to balance get/put_write_access
nfsd: split up nfsd_setattr