just repeat the frozen check after regaining it, and check that sb
is still alive. If several threads hit acct_auto_close() at the
same time, acct_auto_close() will survive that just fine. And we
really don't want to play with writes and closing the file with
->s_umount held exclusive - it's a deadlock country.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make acct->count atomic and acct freeing - rcu-delayed.
* instead of grabbing acct_lock around the places where we take a reference,
do that under rcu_read_lock() with atomic_long_inc_not_zero().
* have the new acct locked before making ns->bacct point to it
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Put these suckers on per-vfsmount and per-superblock lists instead.
Note: right now it's still acct_lock for everything, but that's
going to change.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) file can't be NULL
b) file can't be changed under us
c) all writes are serialized by acct->lock; no need to mess with
spinlock there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Do not reuse bsd_acct_struct after closing the damn thing.
Structure lifetime is controlled by refcount now. We also
have a mutex in there, held over closing and writing (the
file is O_APPEND, so we are not losing any concurrency).
As the result, we do not need to bother with get_file()/fput()
on log write anymore. Moreover, do_acct_process() only needs
acct itself; file and pidns are picked from it.
Killed instances are distinguished by having NULL ->ns.
Refcount is protected by acct_lock; anybody taking the
mutex needs to grab a reference first.
The things will get a lot simpler in the next commits - this
is just the minimal chunk switching to the new lifetime rules.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There was an amusing bogosity in ac_rw calculation - it tried to
do encode_comp_t(encode_comp_t(0) / 1024). Seeing that comp_t is
a 3-bit exponent + 13-bit mantissa... it's a good thing that 0 is
represented by all-bits-clear.
The history of that one is interesting - it was introduced in
2.1.68pre1, when acct.c had been reworked and moved to separate
file. Two months later (2.1.86) somebody has noticed that the
sucker won't compile - there was no task_struct::io_usage.
At which point the ac_io calculation had changed from
encode_comp_t(current->io_usage) to encode_comp_t(0) and the
bug in the next line (absolutely real back then, had it ever
managed to compile) become a harmless bogosity. Looks like
nobody has ever noticed until now.
Anyway, let's bury that idiocy now that it got noticed. 17 years
is long enough...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull timer fixes from Thomas Gleixner:
"Two fixes in the timer area:
- a long-standing lock inversion due to a printk
- suspend-related hrtimer corruption in sched_clock"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
sched_clock: Avoid corrupting hrtimer tree during suspend
Pull ARM fixes from Russell King:
"A few fixes for ARM. Some of these are correctness issues:
- TLBs must be flushed after the old mappings are removed by the DMA
mapping code, but before the new mappings are established.
- An off-by-one entry error in the Keystone LPAE setup code.
Fixes include:
- ensuring that the identity mapping for LPAE does not remove the
kernel image from the identity map.
- preventing userspace from trapping into kgdb.
- fixing a preemption issue in the Intel iwmmxt code.
- fixing a build error with nommu.
Other changes include:
- Adding a note about which areas of memory are expected to be
accessible while the identity mapping tables are in place"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction
ARM: idmap: add identity mapping usage note
ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
ARM: fix alignment of keystone page table fixup
ARM: 8112/1: only select ARM_PATCH_PHYS_VIRT if MMU is enabled
ARM: 8100/1: Fix preemption disable in iwmmxt_task_enable()
ARM: DMA: ensure that old section mappings are flushed from the TLB
The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn)
should only be entered when a kgdb break instruction is executed
from the kernel. Otherwise, if kgdb is enabled, a userspace program
can cause the kernel to drop into the debugger by executing either
KGDB_BREAKINST or KGDB_COMPILED_BREAK.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add a note about the usage of the identity mapping; we do not support
accesses outside of the identity map region and kernel image while a
CPU is using the identity map. This is because the identity mapping
may overwrite vmalloc space, IO mappings, the vectors pages, etc.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull vfs fixes from Al Viro:
"This contains a couple of fixes - one is the aio fix from Christoph,
the other a fallocate() one from Eric"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: fix check for fallocate on active swapfile
direct-io: fix AIO regression
Pull x86 fix from Peter Anvin:
"A single fix to not invoke the espfix code on Xen PV, as it turns out
to oops the guest when invoked after all. This patch leaves some
amount of dead code, in particular unnecessary initialization of the
espfix stacks when they won't be used, but in the interest of keeping
the patch minimal that cleanup can wait for the next cycle"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64/entry/xen: Do not invoke espfix64 on Xen
Pull staging driver bugfixes from Greg KH:
"Here are some tiny staging driver bugfixes that I've had in my tree
for the past week that resolve some reported issues. Nothing major at
all, but it would be good to get them merged for 3.16-rc8 or -final"
* tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vt6655: Fix disassociated messages every 10 seconds
staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
staging: rtl8723au: rtw_resume(): release semaphore before exit on error
iio:bma180: Missing check for frequency fractional part
iio:bma180: Fix scale factors to report correct acceleration units
iio: buffer: Fix demux table creation
Pull device mapper fixes from Mike Snitzer:
"Fix dm bufio shrinker to properly zero-fill all fields.
Fix race in dm cache that caused improper reporting of the number of
dirty blocks in the cache"
* tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix race affecting dirty block count
dm bufio: fully initialize shrinker
Pull ARM straggler SoC fix from Olof Johansson:
"A DT bugfix for Nomadik that had an ambigouos double-inversion of a
gpio line, and one MAINTAINER URL update that might as well go in now.
We could hold off until the merge window, but then we'll just have to
mark the DT fix for stable and it just seems like in total causing
more work"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: Update Tegra Git URL
ARM: nomadik: fix up double inversion in DT
nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks. This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.
People were seeing under runs due to a race on increment/decrement of
nr_dirty, see: https://lkml.org/lkml/2014/6/3/648
Fix this by using an atomic_t for nr_dirty.
Reported-by: roma1390@gmail.com
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
1d3d4437ea ("vmscan: per-node deferred work") added a flags field to
struct shrinker assuming that all shrinkers were zero filled. The dm
bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE.
But there are proposed patches which add other bits to shrinker.flags
(e.g. memcg awareness).
Rather than simply initializing the shrinker, this patch uses kzalloc()
when allocating the dm_bufio_client to ensure that the embedded shrinker
and any other similar structures are zeroed.
This fixes theoretical over aggressive shrinking of dm bufio objects.
If the uninitialized dm_bufio_client.shrinker.flags contains
SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
each numa node rather than just once. This has been broken since 3.12.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.12+