Recent tools do not want to use /proc to retrieve module information. A few
values are currently missing from sysfs to replace the information available
in /proc/modules.
This adds /sys/module/*/{coresize,initsize,taint} attributes.
The TAINT_PROPRIETARY_MODULE (P) and TAINT_OOT_MODULE (O) flags are now both
always shown and no longer exclude each other; this also applies to /proc/modules.
Replace the open-coded sysfs attribute initializers with the __ATTR() macro.
Add the new attributes to Documentation/ABI.
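A minimal userspace sketch of consuming the new taint attribute. It assumes the file holds the flag letters (for example "PO") followed by a newline, and is empty for an untainted module. The MOD_TAINT_* constants and both function names are hypothetical; only the P and O letters come from this commit. Because the two flags no longer exclude each other, the decoder ORs in every letter it sees.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bit values for illustration; not the kernel's TAINT_* numbering. */
enum {
	MOD_TAINT_PROPRIETARY = 1u << 0, /* 'P' */
	MOD_TAINT_OOT         = 1u << 1, /* 'O' */
};

/* Decode a taint string such as "PO\n"; unknown letters are ignored. */
unsigned int decode_module_taint(const char *s)
{
	unsigned int flags = 0;

	assert(s != NULL);
	for (; *s != '\0' && *s != '\n'; s++) {
		switch (*s) {
		case 'P': flags |= MOD_TAINT_PROPRIETARY; break;
		case 'O': flags |= MOD_TAINT_OOT; break;
		default: break;
		}
	}
	return flags;
}

/* Read /sys/module/<module>/taint; returns 0 on success, -1 if it can't be opened. */
int read_module_taint(const char *module, unsigned int *flags)
{
	char path[256], buf[64];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/module/%s/taint", module);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL)
		buf[0] = '\0'; /* empty file: module is not tainted */
	fclose(f);
	*flags = decode_module_taint(buf);
	return 0;
}
```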
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use more flexible pr_debug. This allows:
echo "module params +p" > /dbg/dynamic_debug/control
to turn on debug messages when needed.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use more flexible pr_debug. This allows:
echo "module module +p" > /dbg/dynamic_debug/control
to turn on debug messages when needed.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module_ref contains two "unsigned int" fields.
That's now too small, since some machines can open more than 2^32 files.
Check commit 518de9b39e (fs: allow for more than 2^31 files) for
reference.
We can add an aligned(2 * sizeof(unsigned long)) attribute to force
alloc_percpu() to place each module_ref area within a single cache line.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Tejun Heo <tj@kernel.org>
CC: Robin Holt <holt@sgi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Looking at /proc/kallsyms, one starts to ponder whether all of the extra
strtab-related complexity in module.c is worth the memory savings.
Instead of making the add_kallsyms() loop even more complex, I tried the
other route of deleting the strmap logic and naively copying each string
into core_strtab with no consideration for consolidating duplicates.
Performance on an "already exists" insmod of nvidia.ko (runs
add_kallsyms() but does not actually initialize the module):
Original scheme: 1.230s
With naive copying: 0.058s
Extra space used: 35k (of a 408k module).
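The naive scheme amounts to an append-only string table: each name is copied with its terminating NUL and its offset recorded, with no search for an existing copy. The struct strtab and strtab_append() below are a hypothetical userspace illustration, not the module.c code. Going by the figures above, this trades about 8.6% extra space (35k of 408k) for a roughly 21x speedup (1.230s vs 0.058s).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append-only string table: every name is copied, duplicates are not merged. */
struct strtab {
	char *buf;   /* caller-provided storage */
	size_t cap;  /* size of buf in bytes */
	size_t used; /* bytes consumed so far */
};

/* Copy name (including its NUL) to the end of the table and return its
 * offset, or (size_t)-1 if it does not fit. */
size_t strtab_append(struct strtab *t, const char *name)
{
	size_t len = strlen(name) + 1;

	assert(t->used <= t->cap);
	if (len > t->cap - t->used)
		return (size_t)-1;
	memcpy(t->buf + t->used, name, len);
	t->used += len;
	return t->used - len;
}
```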
Signed-off-by: Kevin Cernekee <cernekee@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <73defb5e4bca04a6431392cc341112b1@localhost>
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, reboot: Fix typo in nmi reboot path
x86, NMI: Add to_cpumask() to silence compile warning
x86, NMI: NMI selftest depends on the local apic
x86: Add stack top margin for stack overflow checking
x86, NMI: NMI-selftest should handle the UP case properly
x86: Fix the 32-bit stackoverflow-debug build
x86, NMI: Add knob to disable using NMI IPIs to stop cpus
x86, NMI: Add NMI IPI selftest
x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus
x86: Clean up the range of stack overflow checking
x86: Panic on detection of stack overflow
x86: Check stack overflow in detail
Eric and David reported dead machines and traced them to commit
a195f004 ("sched: Fix load-balance lock-breaking"). It turns out
there is still a scenario where we can end up retrying forever.
Since there is no strict forward-progress guarantee in the
load-balance iteration, we can get stuck retrying the same
task set over and over.
Creating a forward-progress guarantee with the existing
structure is somewhat non-trivial; for now, simply terminate the
retry loop after a few tries.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: David Ahern <dsahern@gmail.com>
[ logic cleanup as suggested by Eric ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1326297936.2442.157.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andrew elucidates:
- First installment of MM. We have a HUGE number of MM patches this
time. It's crazy.
- MAINTAINERS updates
- backlight updates
- leds
- checkpatch updates
- misc ELF stuff
- rtc updates
- reiserfs
- procfs
- some misc other bits
* akpm: (124 commits)
user namespace: make signal.c respect user namespaces
workqueue: make alloc_workqueue() take printf fmt and args for name
procfs: add hidepid= and gid= mount options
procfs: parse mount options
procfs: introduce the /proc/<pid>/map_files/ directory
procfs: make proc_get_link to use dentry instead of inode
signal: add block_sigmask() for adding sigmask to current->blocked
sparc: make SA_NOMASK a synonym of SA_NODEFER
reiserfs: don't lock root inode searching
reiserfs: don't lock journal_init()
reiserfs: delay reiserfs lock until journal initialization
reiserfs: delete comments referring to the BKL
drivers/rtc/interface.c: fix alarm rollover when day or month is out-of-range
drivers/rtc/rtc-twl.c: add DT support for RTC inside twl4030/twl6030
drivers/rtc/: remove redundant spi driver bus initialization
drivers/rtc/rtc-jz4740.c: make jz4740_rtc_driver static
drivers/rtc/rtc-mc13xxx.c: make mc13xxx_rtc_idtable static
rtc: convert drivers/rtc/* to use module_platform_driver()
drivers/rtc/rtc-wm831x.c: convert to devm_kzalloc()
drivers/rtc/rtc-wm831x.c: remove unused period IRQ handler
...
ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to the recipient's
user namespace. (new, thanks Oleg)
__send_signal: convert current's uid to the recipient's user namespace
for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
again :)
do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
user namespace
ptrace_signal maps parent's uid into current's user namespace before
including it in the signal to current. IIUC Oleg has argued that this shouldn't
matter as the debugger will play with it, but it seems like not converting
the value currently being set is misleading.
Changelog:
Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
simplify callers and help make clear what we are translating
(which uid into which namespace). Passing the target task would
make callers even easier to read, but we pass in user_ns because
current_user_ns() != task_cred_xxx(current, user_ns).
Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
in ptrace_signal().
Sep 23: In send_signal(), detect when (user) signal is coming from an
ancestor or unrelated user namespace. Pass that on to __send_signal,
which sets si_uid to 0 or overflowuid if needed.
Oct 12: Base on Oleg's fixup_uid() patch. On top of that, handle all
SI_FROMKERNEL cases at callers, because we can't assume sender is
current in those cases.
Nov 10: (mhelsley) rename fixup_uid to the more meaningful userns_fixup_signal_uid
Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Serge Hallyn <serge.hallyn@canonical.com>
Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
Eric Biederman pointed out that passing info is a bug and could lead to a
NULL pointer deref to boot.
A collection of signal, securebits, filecaps, cap_bounds, and a few other
ltp tests passed with this kernel.
Changelog:
Nov 18: previous patch missed a leading '&'
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
From: Dan Carpenter <dan.carpenter@oracle.com>
Subject: ipc/mqueue: lock() => unlock() typo
There was a double lock typo introduced in b085f4bd6b21 "user namespace:
make signal.c respect user namespaces"
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alloc_workqueue() currently expects the passed-in @name pointer to remain
accessible. This is inconvenient and a bit silly given that the whole wq
is being dynamically allocated. This patch updates alloc_workqueue() and
friends to take a printf format string and matching varargs at the end
instead of an opaque string. The name is allocated together with the wq
and formatted into it.
alloc_ordered_workqueue() is converted to a macro to unify varargs
handling with alloc_workqueue(), and, while at it, a comment is added to
alloc_workqueue().
None of the current in-kernel users pass a constant name containing '%',
so this change shouldn't cause any problems.
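A userspace sketch of the allocation pattern described above: the name is formatted into storage allocated together with the structure. struct fake_wq and fake_alloc_wq() are hypothetical stand-ins, and the format attribute is the GCC/Clang spelling that a __printf-style annotation maps to. The string is measured with one vsnprintf() pass and written with a second, using va_copy() because a va_list cannot be reused once consumed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a workqueue structure. */
struct fake_wq {
	unsigned int flags;
	char name[]; /* formatted name, allocated together with the struct */
};

__attribute__((format(printf, 2, 3)))
struct fake_wq *fake_alloc_wq(unsigned int flags, const char *fmt, ...)
{
	va_list args, args2;
	struct fake_wq *wq;
	int len;

	assert(fmt != NULL);
	va_start(args, fmt);
	va_copy(args2, args);
	len = vsnprintf(NULL, 0, fmt, args); /* first pass: measure */
	va_end(args);
	if (len < 0) {
		va_end(args2);
		return NULL;
	}

	wq = malloc(sizeof(*wq) + (size_t)len + 1);
	if (wq != NULL) {
		wq->flags = flags;
		vsnprintf(wq->name, (size_t)len + 1, fmt, args2); /* second pass: format */
	}
	va_end(args2);
	return wq;
}
```

Since the name is copied, the caller's buffer may be reused or freed right after the call. The flip side is that a constant name containing '%' would now be interpreted as a format, which is why existing callers were checked.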
[akpm@linux-foundation.org: use __printf]
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Abstract the code sequence for adding a signal handler's sa_mask to
current->blocked because the sequence is identical for all architectures.
Furthermore, in the past some architectures actually got this code wrong,
so introduce a wrapper that all architectures can use.
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
oom_score_adj is used to guard processes from the OOM killer. One
problem is that it is inherited at fork(). When a daemon sets oom_score_adj
and forks children, it is hard to know where the value was set.
This patch adds three tracepoints useful for debugging:
- creating new task
- renaming a task (exec)
- set oom_score_adj
To debug, users need to enable these tracepoints. Filtering may be useful, e.g.:
# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable
The output will look like this:
# grep oom /sys/kernel/debug/tracing/trace
bash-7699 [007] d..3 5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699 [007] ...1 5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729 [003] ...2 5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699 [002] ...1 5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730 [007] ...2 5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
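When a saved trace is post-processed in userspace rather than filtered in the kernel, the oom_score_adj= field can be pulled out of each line and the same "oom_score_adj != 0" test applied. This sketch assumes the field format shown in the sample output above; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Extract the value of "oom_score_adj=" from one trace line.
 * Returns true and stores the value if the field is present. */
bool trace_oom_score_adj(const char *line, long *out)
{
	static const char key[] = "oom_score_adj=";
	const char *p;
	char *end;
	long v;

	assert(line != NULL && out != NULL);
	p = strstr(line, key); /* the event name "oom_score_adj_update:" does not match */
	if (p == NULL)
		return false;
	p += sizeof(key) - 1;
	v = strtol(p, &end, 10);
	if (end == p)
		return false; /* no digits after '=' */
	*out = v;
	return true;
}

/* Userspace equivalent of the filter "oom_score_adj != 0". */
bool oom_adj_filter_matches(const char *line)
{
	long v;
	return trace_oom_score_adj(line, &v) && v != 0;
}
```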
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder >
0, we have a lot of free pages that are not marked as such. The snapshot code
accounts them as saveable, which causes hibernate memory preallocation to
fail.
It is pretty hard to make hibernate allocation succeed with
debug_guardpage_minorder=1. This change at least makes it possible when the
system has a relatively large amount of RAM.
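A simplified model of the accounting fix, assuming a hypothetical page descriptor with explicit free and guard flags: a guard page is free memory that is not marked free, so unless it is skipped explicitly it gets counted as saveable. The real snapshot code walks zones and struct page, and the exact check it uses is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor for illustration only. */
struct fake_page {
	bool is_free;  /* on the allocator's free lists */
	bool is_guard; /* debug guard page: free, but not marked free */
};

/* Count pages the snapshot would have to save, skipping guard pages. */
size_t count_saveable(const struct fake_page *pages, size_t n)
{
	size_t count = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i].is_free || pages[i].is_guard)
			continue;
		count++;
	}
	return count;
}
```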
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits)
KVM: PPC: Whitespace fix for kvm.h
KVM: Fix whitespace in kvm_para.h
KVM: PPC: annotate kvm_rma_init as __init
KVM: x86 emulator: implement RDPMC (0F 33)
KVM: x86 emulator: fix RDPMC privilege check
KVM: Expose the architectural performance monitoring CPUID leaf
KVM: VMX: Intercept RDPMC
KVM: SVM: Intercept RDPMC
KVM: Add generic RDPMC support
KVM: Expose a version 2 architectural PMU to guests
KVM: Expose kvm_lapic_local_deliver()
KVM: x86 emulator: Use opcode::execute for Group 9 instruction
KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions
KVM: x86 emulator: Use opcode::execute for Group 1A instruction
KVM: ensure that debugfs entries have been created
KVM: drop bsp_vcpu pointer from kvm struct
KVM: x86: Consolidate PIT legacy test
KVM: x86: Do not rely on implicit inclusions
KVM: Make KVM_INTEL depend on CPU_SUP_INTEL
KVM: Use memdup_user instead of kmalloc/copy_from_user
...
* 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cgroup: fix to allow mounting a hierarchy by name
cgroup: move assignment out of condition in cgroup_attach_proc()
cgroup: Remove task_lock() from cgroup_post_fork()
cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
cgroup: only need to check oldcgrp==newgrp once
cgroup: remove redundant get/put of task struct
cgroup: remove redundant get/put of old css_set from migrate
cgroup: Remove unnecessary task_lock before fetching css_set on migration
cgroup: Drop task_lock(parent) on cgroup_fork()
cgroups: remove redundant get/put of css_set from css_set_check_fetched()
resource cgroups: remove bogus cast
cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
cgroup, cpuset: don't use ss->pre_attach()
cgroup: don't use subsys->can_attach_task() or ->attach_task()
cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
cgroup: improve old cgroup handling in cgroup_attach_proc()
cgroup: always lock threadgroup during migration
threadgroup: extend threadgroup_lock() to cover exit and exec
threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
...
Fix up conflict in kernel/cgroup.c due to commit e0197aae59 ("cgroups:
fix a css_set not found bug in cgroup_attach_proc"), which already
mentioned that the bug is fixed (differently) in Tejun's cgroup
patchset, i.e. this one.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
Kconfig: acpi: Fix typo in comment.
misc latin1 to utf8 conversions
devres: Fix a typo in devm_kfree comment
btrfs: free-space-cache.c: remove extra semicolon.
fat: Spelling s/obsolate/obsolete/g
SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
tools/power turbostat: update fields in manpage
mac80211: drop spelling fix
types.h: fix comment spelling for 'architectures'
typo fixes: aera -> area, exntension -> extension
devices.txt: Fix typo of 'VMware'.
sis900: Fix enum typo 'sis900_rx_bufer_status'
decompress_bunzip2: remove invalid vi modeline
treewide: Fix comment and string typo 'bufer'
hyper-v: Update MAINTAINERS
treewide: Fix typos in various parts of the kernel, and fix some comments.
clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
leds: Kconfig: Fix typo 'D2NET_V2'
sound: Kconfig: drop unknown symbol ARCH_CLPS7500
...
Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
PM / Hibernate: Implement compat_ioctl for /dev/snapshot
PM / Freezer: fix return value of freezable_schedule_timeout_killable()
PM / shmobile: Allow the A4R domain to be turned off at run time
PM / input / touchscreen: Make st1232 use device PM QoS constraints
PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
PM / shmobile: Remove the stay_on flag from SH7372's PM domains
PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
PM: Drop generic_subsys_pm_ops
PM / Sleep: Remove forward-only callbacks from AMBA bus type
PM / Sleep: Remove forward-only callbacks from platform bus type
PM: Run the driver callback directly if the subsystem one is not there
PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
PM / Sleep: Merge internal functions in generic_ops.c
PM / Sleep: Simplify generic system suspend callbacks
PM / Hibernate: Remove deprecated hibernation snapshot ioctls
PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
ARM: S3C64XX: Implement basic power domain support
PM / shmobile: Use common always on power domain governor
...
Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit