Pull CIFS fixes from Steve French:
"Fixes for a couple of DFS problems, a problem with extended security
negotiation and two other small cifs fixes"
* 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix composing of mount options for DFS referrals
cifs: stop printing the unc= option in /proc/mounts
cifs: fix error handling when calling cifs_parse_devname
cifs: allow sec=none mounts to work against servers that don't support extended security
cifs: fix potential buffer overrun when composing a new options string
cifs: only set ops for inodes in I_NEW state
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available
* tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix SETCLIENTID fallback if GSS is not available
SUNRPC: Prevent an rpc_task wakeup race
NFSv4.1 Fix a pNFS session draining deadlock
SUNRPC: Convert auth_gss pipe detection to work in namespaces
SUNRPC: Faster detection if gssd is actually running
SUNRPC: Fix a bug in gss_create_upcall
Pull xfs fixes from Ben Myers:
"Here are fixes for corruption on 512 byte filesystems, a rounding
error, a use-after-free, some flags to fix lockdep reports, and
several fixes related to CRCs. We have a somewhat larger post -rc1
queue than usual due to fixes related to the CRC feature we merged for
3.10:
- Fix for corruption with FSX on 512 byte blocksize filesystems
- Fix rounding error in xfs_free_file_space
- Fix use-after-free with extent free intents
- Add several missing KM_NOFS flags to fix lockdep reports
- Several fixes for CRC related code"
* tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs:
xfs: remote attribute lookups require the value length
xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
xfs: fix missing KM_NOFS tags to keep lockdep happy
xfs: Don't reference the EFI after it is freed
xfs: fix rounding in xfs_free_file_space
xfs: fix sub-page blocksize data integrity writes
The recent changes overhauling fs/aio.c introduced a bug that results in
the kioctx not being freed when outstanding kiocbs are cancelled at
exit_aio() time. Specifically, a kiocb that is cancelled has its
completion events discarded by batch_complete_aio(), which then fails to
wake up the process stuck in free_ioctx(). Fix this by modifying the
wait_event() condition in free_ioctx() appropriately.
This patch was tested with the cancel operation in the thread based code
posted yesterday.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Zach Brown <zab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Last time we found there is lock/unlock bug in ocfs2_file_aio_write, and
then we did a thorough search for all lock resources in
ocfs2_inode_info, including rw, inode and open lockres and found this
bug. My kernel version is 3.0.13, and it is also in the lastest version
3.9. In ocfs2_fiemap, once ocfs2_get_clusters_nocache failed, it should
goto out_unlock instead of out, because we need release buffer head, up
read alloc sem and unlock inode.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary
DESCRIPTION:
There are use-cases when NILFS2 file system (formatted with block size
lesser than 4 KB) can be remounted in RO mode because of encountering of
"broken bmap" issue.
The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
"The machine I've been trialling nilfs on is running Debian Testing,
Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
also reproduced it (identically) with Debian Unstable amd64 and Debian
Experimental (using the 3.8-trunk kernel). The problematic partitions
were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
SYMPTOMS:
(1) System log contains error messages likewise:
[63102.496756] nilfs_direct_assign: invalid pointer: 0
[63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
[63102.496798]
[63102.524403] Remounting filesystem read-only
(2) The NILFS2 file system is remounted in RO mode.
REPRODUSING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
----------------[BEGIN SCRIPT]--------------------
VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------
REPRODUCIBILITY: 100%
INVESTIGATION:
As it was discovered, the issue takes place during segment
construction after executing such sequence of user-space operations:
open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
ftruncate(7, 60)
The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes. As
it is possible to see from above output, the file has 60 bytes of the
whole size. So, it is enough one block (1 KB in size) allocation for
the whole file. Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.
The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.
When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().
The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called. However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.
As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.
FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In reviewing man pages, I noticed that io_getevents is documented to
update the timeout that gets passed into the library call. This doesn't
happen in kernel space or in the library (even though it's documented to
do so in both places). Unless there is objection, I'd like to fix the
comments/docs to match the code (I will also update the man page upon
consensus).
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ocfs2_file_aio_write(), it does ocfs2_rw_lock() first and then
ocfs2_inode_lock().
But if ocfs2_inode_lock() failed, it goes to out_sems without unlocking
rw lock. This will cause a bug in ocfs2_lock_res_free() when testing
res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and
decreased in __ocfs2_cluster_unlock().
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: "Duyongfeng (B)" <du.duyongfeng@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC enable filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit e461fcb194)
xfstests generic/117 fails with:
XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)
indicating a function that does not handle the attr3 format
correctly. Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit b38958d715)
There are several places where we use KM_SLEEP allocation contexts
and use the fact that they are called from transaction context to
add KM_NOFS where appropriate. Unfortunately, there are several
places where the code makes this assumption but can be called from
outside transaction context but with filesystem locks held. These
places need explicit KM_NOFS annotations to avoid lockdep
complaining about reclaim contexts.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit ac14876cf9)
Checking the EFI for whether it is being released from recovery
after we've already released the known active reference is a mistake
worthy of a brown paper bag. Fix the (now) obvious use after free
that it can cause.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 52c24ad39f)
The offset passed into xfs_free_file_space() needs to be rounded
down to a certain size, but the rounding mask is built by a 32 bit
variable. Hence the mask will always mask off the upper 32 bits of
the offset and lead to incorrect writeback and invalidation ranges.
This is not actually exposed as a bug because we writeback and
invalidate from the rounded offset to the end of the file, and hence
the offset we are actually punching a hole out of will always be
covered by the code. This needs fixing, however, if we ever want to
use exact ranges for writeback/invalidation here...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 28ca489c63)
FSX on 512 byte block size filesystems has been failing for some
time with corrupted data. The fault dates back to the change in
the writeback data integrity algorithm that uses a mark-and-sweep
approach to avoid data writeback livelocks.
Unfortunately, a side effect of this mark-and-sweep approach is that
each page will only be written once for a data integrity sync, and
there is a condition in writeback in XFS where a page may require
two writeback attempts to be fully written. As a result of the high
level change, we now only get a partial page writeback during the
integrity sync because the first pass through writeback clears the
mark left on the page index to tell writeback that the page needs
writeback....
The cause is writing a partial page in the clustering code. This can
happen when a mapping boundary falls in the middle of a page - we
end up writing back the first part of the page that the mapping
covers, but then never revisit the page to have the remainder mapped
and written.
The fix is simple - if the mapping boundary falls inside a page,
then simple abort clustering without touching the page. This means
that the next ->writepage entry that write_cache_pages() will make
is the page we aborted on, and xfs_vm_writepage() will map all
sections of the page correctly. This behaviour is also optimal for
non-data integrity writes, as it results in contiguous sequential
writeback of the file rather than missing small holes and having to
write them a "random" writes in a future pass.
With this fix, all the fsx tests in xfstests now pass on a 512 byte
block size filesystem on a 4k page machine.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 49b137cbbc)
With the change to ignore the unc= and prefixpath= mount options, there
is no longer any need to add them to the options string when mounting.
By the same token, we now need to build a device name that includes the
prefixpath when mounting.
To make things neater, the delimiters on the devicename are changed
to '/' since that's preferred when mounting anyway.
v2: fix some comments and don't bother looking at whether there is
a prepath in the ref->node_name when deciding whether to pass
a prepath to cifs_build_devname.
v3: rebase on top of potential buffer overrun fix for stable
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Since we no longer recognize that option, stop printing it out. The
devicename is now the canonical source for this info.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
When we allowed separate unc= and prefixpath= mount options, we could
ignore EINVAL errors from cifs_parse_devname. Now that they are
deprecated, we need to check for that as well and fail the mount if it's
malformed.
Also fix a later error message that refers to the unc= option.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
In the case of sec=none, we're not sending a username or password, so
there's little benefit to mandating NTLMSSP auth. Allow it to use
unencapsulated auth in that case.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Consider the case where we have a very short ip= string in the original
mount options, and when we chase a referral we end up with a very long
IPv6 address. Be sure to allow for that possibility when estimating the
size of the string to allocate.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
It's generally not safe to reset the inode ops once they've been set. In
the case where the inode was originally thought to be a directory and
then later found to be a DFS referral, this can lead to an oops when we
try to trigger an inode op on it after changing the ops to the blank
referral operations.
Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Pull CIFS fix from Steve French:
"One cifs fix to merge now - fixes possible DFS oops (I expect to
request a merge of 4 additional cifs fixes next week)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: only set ops for inodes in I_NEW state