[ Upstream commit ae62ca7b03 ]
commit 35f9c09fe9 (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.
The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.
But some programs pass an arbitrary high value, and the test fails.
The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.
We should both test sd->total_len and fact that another fragment
is in the pipe (pipe->nrbufs > 1)
Many thanks to Willy for providing very clear bug report, bisection
and test programs.
Reported-by: Willy Tarreau <w@1wt.eu>
Bisected-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a9bae18954 upstream.
There exists a situation when GC can work in background alone without
any other filesystem activity during significant time.
The nilfs_clean_segments() method calls nilfs_segctor_construct() that
updates superblocks in the case of NILFS_SC_SUPER_ROOT and
THE_NILFS_DISCONTINUED flags are set. But when GC is working alone the
nilfs_clean_segments() is called with unset THE_NILFS_DISCONTINUED flag.
As a result, the update of superblocks doesn't occurred all this time
and in the case of SPOR superblocks keep very old values of last super
root placement.
SYMPTOMS:
Trying to mount a NILFS2 volume after SPOR in such environment ends with
very long mounting time (it can achieve about several hours in some
cases).
REPRODUCING PATH:
1. It needs to use external USB HDD, disable automount and doesn't
make any additional filesystem activity on the NILFS2 volume.
2. Generate temporary file with size about 100 - 500 GB (for example,
dd if=/dev/zero of=<file_name> bs=1073741824 count=200). The size of
file defines duration of GC working.
3. Then it needs to delete file.
4. Start GC manually by means of command "nilfs-clean -p 0". When you
start GC by means of such way then, at the end, superblocks is updated
by once. So, for simulation of SPOR, it needs to wait sometime (15 -
40 minutes) and simply switch off USB HDD manually.
5. Switch on USB HDD again and try to mount NILFS2 volume. As a
result, NILFS2 volume will mount during very long time.
REPRODUCIBILITY: 100%
FIX:
This patch adds checking that superblocks need to update and set
THE_NILFS_DISCONTINUED flag before nilfs_clean_segments() call.
Reported-by: Sergey Alexandrov <splavgm@gmail.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 10b8c7dff5 upstream.
When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memroy of *devname in function cifs_compose_mount_options() when it goes to
error.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 66bea92c69 upstream.
ext4_da_block_invalidatepages is missing a pagevec_init(),
which means that pvec->cold contains random garbage.
This affects whether the page goes to the front or
back of the LRU when ->cold makes it to
free_hot_cold_page()
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 96e5d1d3ad upstream.
In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
commit fb719c59bd upstream.
Incrementing lenExtents even while writing to a hole is bad
for performance as calls to udf_discard_prealloc and
udf_truncate_tail_extent would not return from start if
isize != lenExtents
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shuah Khan <shuah.khan@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 721e3eba21 upstream.
Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
called without i_mutex being taken. It had previously not been taken
during orphan cleanup since races weren't possible at that point in
the mount process, but as a result of this c278531d39, we will now see
a kernel WARN_ON in this case. Take the i_mutex in
ext4_orphan_cleanup() to suppress this warning.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d096ad0f79 upstream.
When a journal-less ext4 filesystem is mounted on a read-only block
device (blockdev --setro will do), each remount (for other, unrelated,
flags, like suid=>nosuid etc) results in a series of scary messages
from kernel telling about I/O errors on the device.
This is becauese of the following code ext4_remount():
if (sbi->s_journal == NULL)
ext4_commit_super(sb, 1);
at the end of remount procedure, which forces writing (flushing) of
a superblock regardless whenever it is dirty or not, if the filesystem
is readonly or not, and whenever the device itself is readonly or not.
We only need call ext4_commit_super when the file system had been
previously mounted read/write.
Thanks to Eric Sandeen for help in diagnosing this issue.
Signed-off-By: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d7961c7fa4 upstream.
The following race is possible between start_this_handle() and someone
calling jbd2_journal_flush().
Process A Process B
start_this_handle().
if (journal->j_barrier_count) # false
if (!journal->j_running_transaction) { #true
read_unlock(&journal->j_state_lock);
jbd2_journal_lock_updates()
jbd2_journal_flush()
write_lock(&journal->j_state_lock);
if (journal->j_running_transaction) {
# false
... wait for committing trans ...
write_unlock(&journal->j_state_lock);
...
write_lock(&journal->j_state_lock);
if (!journal->j_running_transaction) { # true
jbd2_get_transaction(journal, new_transaction);
write_unlock(&journal->j_state_lock);
goto repeat; # eventually blocks on j_barrier_count > 0
...
J_ASSERT(!journal->j_running_transaction);
# fails
We fix the race by rechecking j_barrier_count after reacquiring j_state_lock
in exclusive mode.
Reported-by: yjwsignal@empal.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2cbba75a56 upstream.
Users of jffs2_do_reserve_space() expect they still held
erase_completion_lock after call to it. But there is a path
where jffs2_do_reserve_space() leaves erase_completion_lock unlocked.
The patch fixes it.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24ec19b0ae upstream.
In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.
This patch fixes that.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e25fbe380c upstream.
The following null pointer check is broken.
*option = match_strdup(args);
return !option;
The pointer `option' must be non-null, and thus `!option' is always false.
Use `!*option' instead.
The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some
cut-and-paste code.").
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5f50b0c29 upstream.
If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overlow the rq_pages array.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1f018458b3 upstream.
It is almost always wrong for NFS to call drop_nlink() after removing a
file. What we really want is to mark the inode's attributes for
revalidation, and we want to ensure that the VFS drops it if we're
reasonably sure that this is the final unlink().
Do the former using the usual cache validity flags, and the latter
by testing if inode->i_nlink == 1, and clearing it in that case.
This also fixes the following warning reported by Neil Brown and
Jeff Layton (among others).
[634155.004438] WARNING:
at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442]
Hardware name: Latitude E6510 [634155.004577] crc_itu_t crc32c_intel
snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm:
bash Tainted: G W 3.5.0-36-desktop # [634155.004611] Call Trace:
[634155.004630] [<ffffffff8100444a>] dump_trace+0xaa/0x2b0
[634155.004641] [<ffffffff815a23dc>] dump_stack+0x69/0x6f
[634155.004653] [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0
[634155.004662] [<ffffffff811832e4>] drop_nlink+0x34/0x40
[634155.004687] [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs]
[634155.004714] [<ffffffff8118049e>] dput+0x12e/0x230
[634155.004726] [<ffffffff8116b230>] __fput+0x170/0x230
[634155.004735] [<ffffffff81167c0f>] filp_close+0x5f/0x90
[634155.004743] [<ffffffff81167cd7>] sys_close+0x97/0x100
[634155.004754] [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b
[634155.004767] [<00007f2a73a0d110>] 0x7f2a73a0d10f
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f259613a1e upstream.
In rare circumstances, nfs_clone_server() of a v2 or v3 server can get
an error between setting server->destory (to nfs_destroy_server), and
calling nfs_start_lockd (which will set server->nlm_host).
If this happens, nfs_clone_server will call nfs_free_server which
will call nfs_destroy_server and thence nlmclnt_done(NULL). This
causes the NULL to be dereferenced.
So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL.
The other guards there are irrelevant as nlm_host can only be non-NULL
if one of these flags are set - so remove those tests. (Thanks to Trond
for this suggestion).
This is suitable for any stable kernel since 2.6.25.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b66c598401 upstream.
If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.
Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.
After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.
This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
For a proof of concept, see DoTest.sh from:
http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: halfdog <me@halfdog.net>
Cc: P J P <ppandit@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>