If kimage_normal_alloc() fails to initialize an allocated kimage, it will
free the image but would still set 'rimage', as a result kexec_load will
try to free it again.
This would explode as part of the freeing process is accessing internal
members which point to uninitialized memory.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch exports a PG_hwpoison into vmcoreinfo when
CONFIG_MEMORY_FAILURE is defined. "makedumpfile" needs to read
information of memory, such as 'mem_section', 'zone', 'pageflags' from
vmcore.
We introduce a function into "makedumpfile" to exclude hwpoison page from
vmcore dump. In order to introduce this function, PG_hwpoison flag have
to export into vmcoreinfo.
Signed-off-by: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tAdd adds the values related to buddy system to vmcoreinfo data so that
makedumpfile (dump filtering command) can filter out all free pages with
the new logic.
It's faster than the current logic because it can distinguish free page
by analyzing page structure at the same time as filtering for other
unnecessary pages (e.g. anonymous page).
OTOH, the current logic has to trace free_list to distinguish free pages
while analyzing page structure to filter out other unnecessary pages.
The new logic uses the fact that buddy page is marked by _mapcount ==
PAGE_BUDDY_MAPCOUNT_VALUE. But, _mapcount shares its memory with other
fields for SLAB/SLUB when PG_slab is set, so we need to check if PG_slab
is set or not before looking up _mapcount value. And we can get the
order of buddy system from private field. To sum it up, the values
below are required for this logic.
Required values:
- OFFSET(page._mapcount)
- OFFSET(page.private)
- NUMBER(PG_slab)
- NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)
Changelog from v1 to v2:
1. remove SIZE(pageflags)
The new logic was changed after I sent v1 patch.
Accordingly, SIZE(pageflags) has been unnecessary for makedumpfile.
What's makedumpfile:
makedumpfile creates a small dumpfile by excluding unnecessary pages
for the analysis. To distinguish unnecessary pages, makedumpfile gets
the vmcoreinfo data which has the minimum debugging information only
for dump filtering.
Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If new_nsproxy is set we will always call switch_task_namespaces and
then set new_nsproxy back to NULL so the reassignment and fall through
check are redundant
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prevents hung_task detector from panicing the machine. This is also
needed to prevent this wait from blocking suspend.
(It doesnt' currently block suspend but it would once the next
patch in this series is applied.)
[yongjun_wei@trendmicro.com.cn: kernel/exit.c: remove duplicated include]
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ben Chan <benchan@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We shouldn't try_to_freeze if locks are held. Holding a lock can cause a
deadlock if the lock is later acquired in the suspend or hibernate path
(e.g. by dpm). Holding a lock can also cause a deadlock in the case of
cgroup_freezer if a lock is held inside a frozen cgroup that is later
acquired by a process outside that group.
[akpm@linux-foundation.org: export debug_check_no_locks_held]
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ben Chan <benchan@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several printk's were missing KERN_INFO and KERN_CONT flags. In
addition, a printk that was outside a #if/#endif should have been
inside, which would result in stray blank line on non-x86 boxes.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The idea is simple. We need to get the siginfo for each signal on
checkpointing dump, and then return it back on restore.
The first problem is that the kernel doesn't report complete siginfos to
userspace. In a signal handler the kernel strips SI_CODE from siginfo.
When a siginfo is received from signalfd, it has a different format with
fixed sizes of fields. The interface of signalfd was extended. If a
signalfd is created with the flag SFD_RAW, it returns siginfo in a raw
format.
rt_sigqueueinfo looks suitable for restoring signals, but it can't send
siginfo with a positive si_code, because these codes are reserved for
the kernel. In the real world each person has right to do anything with
himself, so I think a process should able to send any siginfo to itself.
This patch:
The kernel prevents sending of siginfo with positive si_code, because
these codes are reserved for kernel. I think we can allow a task to
send such a siginfo to itself. This operation should not be dangerous.
This functionality is required for restoring signals in
checkpoint/restart.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__orderly_poweroff() does argv_free() if call_usermodehelper_fns()
returns -ENOMEM. As Lucas pointed out, this can be wrong if -ENOMEM was
not triggered by the failing call_usermodehelper_setup(), in this case
both __orderly_poweroff() and argv_cleanup() can do kfree().
Kill argv_cleanup() and change __orderly_poweroff() to call argv_free()
unconditionally like do_coredump() does. This info->cleanup() is not
needed (and wrong) since 6c0c0d4d "fix bug in orderly_poweroff() which
did the UMH_NO_WAIT => UMH_WAIT_EXEC change, we can rely on the fact
that CLONE_VFORK can't return until do_execve() succeeds/fails.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: hongfeng <hongfeng@marvell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
Pull scheduler fixes from Ingo Molnar.
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cputime: Use local_clock() for full dynticks cputime accounting
cputime: Constify timeval_to_cputime(timeval) argument
sched: Move RR_TIMESLICE from sysctl.h to rt.h
sched: Fix /proc/sched_debug failure on very very large systems
sched: Fix /proc/sched_stat failure on very very large systems
sched/core: Remove the obsolete and unused nr_uninterruptible() function