Merge emailed fixes from Andrew Morton:
"Bunch of fixes:
- delayed IPC updates. I held back on this because of some possible
outstanding bug reports, but they appear to have been addressed in
later versions
- A bunch of MAINTAINERS updates
- Yet Another RTC driver. I'd held this back while a couple of
little issues were being worked out.
I'm expecting an intrusive-but-simple patchset from Joe Perches which
splits up printk.c into kernel/printk/*. That will be a pig to
maintain for two months so if it passes testing I'd like to get it
upstream after a week or so."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
printk: fix incorrect length from print_time() when seconds > 99999
drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time
drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield
rtc: add RTC driver for TPS6586x
MAINTAINERS: fix drivers/staging/sm7xx/
MAINTAINERS: remove include/linux/of_pwm.h
MAINTAINERS: remove arch/*/lib/perf_event*.c
MAINTAINERS: remove drivers/mmc/host/imxmmc.*
MAINTAINERS: fix Documentation/mei/
MAINTAINERS: remove arch/x86/platform/mrst/pmu.*
MAINTAINERS: remove firmware/isci/
MAINTAINERS: fix drivers/ieee802154/
MAINTAINERS: fix .../plat-mxc/include/mach/imxfb.h
MAINTAINERS: remove drivers/video/epson1355fb.c
MAINTAINERS: fix drivers/media/usb/dvb-usb/cxusb*
MAINTAINERS: adjust for UAPI
MAINTAINERS: fix drivers/media/platform/atmel-isi.c
MAINTAINERS: fix arch/arm/mach-at91/include/mach/at_hdmac.h
MAINTAINERS: fix drivers/rtc/rtc-vt8500.c
MAINTAINERS: remove arch/arm/plat-s5p/
...
Cleanup. And I think we need more cleanups, in particular
__set_current_blocked() and sigprocmask() should die. Nobody should
ever block SIGKILL or SIGSTOP.
- Change set_current_blocked() to use __set_current_blocked()
- Change sys_sigprocmask() to use set_current_blocked(), this way it
should not worry about SIGKILL/SIGSTOP.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value. However, this is obviously incorrect when the
whole seconds part of the time gets beyond 5 digits (100000 seconds is a
bit more than a day of uptime).
The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual
length of the time prefix. This could be micro-optimized but it seems
better to have simpler, more readable code here.
The bug leads to the syslog system call miscomputing which messages fit
into the userspace buffer. If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:
# dmesg
klogctl: Bad address
When this happens, strace shows that the failure is indeed EFAULT due to
the kernel mistakenly accessing past the end of dmesg's buffer, since
dmesg asks the kernel how big a buffer it needs, allocates a bit more,
and then gets an error when it asks the kernel to fill it:
syslog(0xa, 0, 0) = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address)
As far as I can see, the bug has been there as long as print_time(),
which comes from commit 084681d14e ("printk: flush continuation lines
immediately to console") in 3.5-rc5.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg pointed out that in a pid namespace the sequence.
- pid 1 becomes a zombie
- setns(thepidns), fork,...
- reaping pid 1.
- The injected processes exiting.
Can lead to processes attempting access their child reaper and
instead following a stale pointer.
That waitpid for init can return before all of the processes in
the pid namespace have exited is also unfortunate.
Avoid these problems by disabling the allocation of new pids in a pid
namespace when init dies, instead of when the last process in a pid
namespace is reaped.
Pointed-out-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The sequence:
unshare(CLONE_NEWPID)
clone(CLONE_THREAD|CLONE_SIGHAND|CLONE_VM)
Creates a new process in the new pid namespace without setting
pid_ns->child_reaper. After forking this results in a NULL
pointer dereference.
Avoid this and other nonsense scenarios that can show up after
creating a new pid namespace with unshare by adding a new
check in copy_prodcess.
Pointed-out-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull filesystem notification updates from Eric Paris:
"This pull mostly is about locking changes in the fsnotify system. By
switching the group lock from a spin_lock() to a mutex() we can now
hold the lock across things like iput(). This fixes a problem
involving unmounting a fs and having inodes be busy, first pointed out
by FAT, but reproducible with tmpfs.
This also restores signal driven I/O for inotify, which has been
broken since about 2.6.32."
Ugh. I *hate* the timing of this. It was rebased after the merge
window opened, and then left to sit with the pull request coming the day
before the merge window closes. That's just crap. But apparently the
patches themselves have been around for over a year, just gathering
dust, so now it's suddenly critical.
Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen
Rothwell's fixes from -next.
* 'for-next' of git://git.infradead.org/users/eparis/notify:
inotify: automatically restart syscalls
inotify: dont skip removal of watch descriptor if creation of ignored event failed
fanotify: dont merge permission events
fsnotify: make fasync generic for both inotify and fanotify
fsnotify: change locking order
fsnotify: dont put marks on temporary list when clearing marks by group
fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()
fsnotify: pass group to fsnotify_destroy_mark()
fsnotify: use a mutex instead of a spinlock to protect a groups mark list
fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed
fsnotify: take groups mark_lock before mark lock
fsnotify: use reference counting for groups
fsnotify: introduce fsnotify_get_group()
inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.
That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include <uapi/...
revert "rtc: recycle id when unloading a rtc driver"
mm: clean up transparent hugepage sysfs error messages
hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method
hfsplus: rework processing of hfs_btree_write() returned error
hfsplus: rework processing errors in hfsplus_free_extents()
hfsplus: avoid crash on failed block map free
kcmp: include linux/ptrace.h
drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...
Pull signal handling cleanups from Al Viro:
"sigaltstack infrastructure + conversion for x86, alpha and um,
COMPAT_SYSCALL_DEFINE infrastructure.
Note that there are several conflicts between "unify
SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
resolution is trivial - just remove definitions of SS_ONSTACK and
SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
include/uapi/linux/signal.h contains the unified variant."
Fixed up conflicts as per Al.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
alpha: switch to generic sigaltstack
new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
generic compat_sys_sigaltstack()
introduce generic sys_sigaltstack(), switch x86 and um to it
new helper: compat_user_stack_pointer()
new helper: restore_altstack()
unify SS_ONSTACK/SS_DISABLE definitions
new helper: current_user_stack_pointer()
missing user_stack_pointer() instances
Bury the conditionals from kernel_thread/kernel_execve series
COMPAT_SYSCALL_DEFINE: infrastructure
task_numa_placement() oopsed on NULL p->mm when task_numa_fault() got
called in the handling of break_ksm() for ksmd. That might be a
peculiar case, which perhaps KSM could takes steps to avoid? but it's
more robust if task_numa_placement() allows for such a possibility.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull random updates from Ted Ts'o:
"A few /dev/random improvements for the v3.8 merge window."
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: Mix cputime from each thread that exits to the pool
random: prime last_data value per fips requirements
random: fix debug format strings
random: make it possible to enable debugging without rebuild
Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not
select it are completely unaffected
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 8d4516904b ("watchdog: Fix CPU hotplug regression") causes an
oops or hard lockup when doing
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog
and the kernel is booted with nmi_watchdog=1 (default)
Running laptop-mode-tools and disconnecting/connecting AC power will
cause this to trigger, making it a common failure scenario on laptops.
Instead of bailing out of watchdog_disable() when !watchdog_enabled we
can initialize the hrtimer regardless of watchdog_enabled status. This
makes it safe to call watchdog_disable() in the nmi_watchdog=0 case,
without the negative effect on the enabled => disabled => enabled case.
All these tests pass with this patch:
- nmi_watchdog=1
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog
- nmi_watchdog=0
echo 0 > /sys/devices/system/cpu/cpu1/online
- nmi_watchdog=0
echo mem > /sys/power/state
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51661
Cc: <stable@vger.kernel.org> # v3.7
Cc: Norbert Warmuth <nwarmuth@t-online.de>
Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull module update from Rusty Russell:
"Nothing all that exciting; a new module-from-fd syscall for those who
want to verify the source of the module (ChromeOS) and/or use standard
IMA on it or other security hooks."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Fix kbuild output when using default extra_certificates
MODSIGN: Avoid using .incbin in C source
modules: don't hand 0 to vmalloc.
module: Remove a extra null character at the top of module->strtab.
ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants
ASN.1: Define indefinite length marker constant
moduleparam: use __UNIQUE_ID()
__UNIQUE_ID()
MODSIGN: Add modules_sign make target
powerpc: add finit_module syscall.
ima: support new kernel module syscall
add finit_module syscall to asm-generic
ARM: add finit_module syscall to ARM
security: introduce kernel_module_from_file hook
module: add flags arg to sys_finit_module()
module: add syscall to load module from fd
Merge patches from Andrew Morton:
"Most of the rest of MM, plus a few dribs and drabs.
I still have quite a few irritating patches left around: ones with
dubious testing results, lack of review, ones which should have gone
via maintainer trees but the maintainers are slack, etc.
I need to be more activist in getting these things wrapped up outside
the merge window, but they're such a PITA."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits)
mm/vmscan.c: avoid possible deadlock caused by too_many_isolated()
vmscan: comment too_many_isolated()
mm/kmemleak.c: remove obsolete simple_strtoul
mm/memory_hotplug.c: improve comments
mm/hugetlb: create hugetlb cgroup file in hugetlb_init
mm/mprotect.c: coding-style cleanups
Documentation: ABI: /sys/devices/system/node/
slub: drop mutex before deleting sysfs entry
memcg: add comments clarifying aspects of cache attribute propagation
kmem: add slab-specific documentation about the kmem controller
slub: slub-specific propagation changes
slab: propagate tunable values
memcg: aggregate memcg cache values in slabinfo
memcg/sl[au]b: shrink dead caches
memcg/sl[au]b: track all the memcg children of a kmem_cache
memcg: destroy memcg caches
sl[au]b: allocate objects from memcg cache
sl[au]b: always get the cache from its page in kmem_cache_free()
memcg: skip memcg kmem allocations in specified code regions
memcg: infrastructure to match an allocation to the right cache
...
Because those architectures will draw their stacks directly from the page
allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG
flag, and issue the corresponding free_pages.
This code path is taken when the architecture doesn't define
CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has
THREAD_SIZE >= PAGE_SIZE. Luckily, most - if not all - of the remaining
architectures fall in this category.
This will guarantee that every stack page is accounted to the memcg the
process currently lives on, and will have the allocations to fail if they
go over limit.
For the time being, I am defining a new variant of THREADINFO_GFP, not to
mess with the other path. Once the slab is also tracked by memcg, we can
get rid of that flag.
Tested to successfully protect against :(){ :|:& };:
Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is useful to know how many charges are still left after a call to
res_counter_uncharge. While it is possible to issue a res_counter_read
after uncharge, this can be racy.
If we need, for instance, to take some action when the counters drop down
to 0, only one of the callers should see it. This is the same semantics
as the atomic variables in the kernel.
Since the current return value is void, we don't need to worry about
anything breaking due to this change: nobody relied on that, and only
users appearing from now on will be checking this value.
Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: JoonSoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The array check is useless so remove it.
[akpm@linux-foundation.org: remove comment, per David]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull minor tracing updates and fixes from Steven Rostedt:
"It seems that one of my old pull requests have slipped through.
The changes are contained to just the files that I maintain, and are
changes from others that I told I would get into this merge window.
They have already been in linux-next for several weeks, and should be
well tested."
* 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read
tracing: Remove unneeded checks from the stack tracer
tracing: Add a resize function to make one buffer equivalent to another buffer