Initialize jump_labels much, much earlier, so they're available for use
during system setup.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Commit d5767c5353 ("bootup: move 'usermodehelper_enable()' to the end
of do_basic_setup()") moved 'usermodehelper_enable()' to end of
do_basic_setup() to after the initcalls. But then I get failed to let
uvesafb work on my computer, and lose the splash boot.
So maybe we could start usermodehelper_enable a little early to make
some task work that need eary init with the help of user mode.
[ I would *really* prefer that initcalls not call into user space - even
the real 'init' hasn't been execve'd yet, after all! But for uvesafb
it really does look like we don't have much choice.
I considered doing this when we mount the root filesystem, but
depending on config options that is in multiple places. We could do
the usermode helper enable as a rootfs_initcall()..
So I'm just using wang yanqing's trivial patch. It's not wonderful,
but it's simple and should work. We should revisit this some day,
though. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit eliminates the possibility of running TREE_PREEMPT_RCU
when SMP=n and of running TINY_RCU when PREEMPT=y. People who really
want these combinations can hand-edit init/Kconfig, but eliminating
them as choices for production systems reduces the amount of testing
required. It will also allow cutting out a few #ifdefs.
Note that running TREE_RCU and TINY_RCU on single-CPU systems using
SMP-built kernels is still supported.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Doing it just before starting to call into cpu_idle() made a sick kind
of sense only because the original bug we fixed (see commit
288d5abec8: "Boot up with usermodehelper disabled") was about problems
with some scheduler data structures not being initialized, and they had
better be initialized at that point.
But it really didn't make any other conceptual sense, and doing it after
the initial "schedule()" call for the idle thread actually opened up a
race: what if the main initialization thread did everything without
needing to sleep, and got all the way into user land too? Without
actually having scheduled back to the idle thread?
Now, in normal circumstances that doesn't ever happen, but it looks like
Richard Cochran triggered exactly that on his ARM IXP4xx machines:
"I have some ARM IXP4xx based machines that use the two on chip MAC
ports (aka NPEs). The NPE needs a firmware in order to function.
Ever since the following commit [that 288d5abec8 one], it is no
longer possible to bring up the interfaces during the init scripts."
with a call trace showing an ioctl coming from user space. Richard says:
"The init is busybox, and the startup script does mount, syslogd, and
then ifup, so that all can go by quickly."
The fix is to move the usermodehelper_enable() into the main 'init'
thread, and just put it after we've done all our initcalls. By then,
everything really should be up, but we've obviously not actually started
the user-mode portion of init yet.
Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a malformed loglevel value (for example "${abc}") is passed on the
kernel cmdline, the loglevel itself is being set to 0.
That then suppresses all following messages, including all the errors
and crashes caused by other malformed cmdline options. This could make
debugging process quite tricky.
This patch leaves the previous value of loglevel if the new value is
incorrect and reports an error code in this case.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@sysgo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In this patch we introduce the notion of CFS bandwidth, partitioned into
globally unassigned bandwidth, and locally claimed bandwidth.
- The global bandwidth is per task_group, it represents a pool of unclaimed
bandwidth that cfs_rqs can allocate from.
- The local bandwidth is tracked per-cfs_rq, this represents allotments from
the global pool bandwidth assigned to a specific cpu.
Bandwidth is managed via cgroupfs, adding two new interfaces to the cpu subsystem:
- cpu.cfs_period_us : the bandwidth period in usecs
- cpu.cfs_quota_us : the cpu bandwidth (in usecs) that this tg will be allowed
to consume over period above.
Signed-off-by: Paul Turner <pjt@google.com>
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110721184756.972636699@google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The core device layer sends tons of uevent notifications for each device
it finds, and if the kernel has been built with a non-empty
CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
helper binary for all these events very early in the boot.
Not only won't the root filesystem even be mounted at that point, we
literally won't have necessarily even initialized all the process
handling data structures at that point, which causes no end of silly
problems even when the usermode helper doesn't actually succeed in
executing.
So just use our existing infrastructure to disable the usermodehelpers
to make the kernel start out with them disabled. We enable them when
we've at least initialized stuff a bit.
Problems related to an uninitialized
init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex
reported by various people.
Reported-by: Manuel Lauss <manuel.lauss@googlemail.com>
Reported-by: Richard Weinberger <richard@nod.at>
Reported-by: Marc Zyngier <maz@misterjones.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While it's at its least, make a number of boring nitpicky cleanups to
shmem.c, mostly for consistency of variable naming. Things like "swap"
instead of "entry", "pgoff_t index" instead of "unsigned long idx".
And since everything else here is prefixed "shmem_", better change
init_tmpfs() to shmem_init().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For each CPU, do the calibration delay only once. For subsequent calls,
use the cached per-CPU value of loops_per_jiffy.
This saves about 200ms of resume time on dual core Intel Atom N5xx based
systems. This helps bring down the kernel resume time on such systems
from about 500ms to about 300ms.
[akpm@linux-foundation.org: make cpu_loops_per_jiffy static]
[akpm@linux-foundation.org: clean up message text]
[akpm@linux-foundation.org: fix things up after upstream rmk changes]
Signed-off-by: Sameer Nanda <snanda@chromium.org>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Andrew Worsley <amworsley@gmail.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
um: Make rwsem.S depend on CONFIG_RWSEM_XCHGADD_ALGORITHM
* 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
debug: Make CONFIG_EXPERT select CONFIG_DEBUG_KERNEL to unhide debug options
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Remove unused CHECK_IRQ_PER_CPU()
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools, x86: Fix 32-bit compile on 64-bit system
* 'timers-cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mips: Fix i8253 clockevent fallout
i8253: Cleanup outb/inb magic
arm: Footbridge: Use common i8253 clockevent
mips: Use common i8253 clockevent
x86: Use common i8253 clockevent
i8253: Create common clockevent implementation
i8253: Export i8253_lock unconditionally
pcpskr: MIPS: Make config dependencies finer grained
pcspkr: Cleanup Kconfig dependencies
i8253: Move remaining content and delete asm/i8253.h
i8253: Consolidate definitions of PIT_LATCH
x86: i8253: Consolidate definitions of global_clock_event
i8253: Alpha, PowerPC: Remove unused asm/8253pit.h
alpha: i8253: Cleanup remaining users of i8253pit.h
i8253: Remove I8253_LOCK config
i8253: Make pcsp sound driver use the shared i8253_lock
i8253: Make pcspkr input driver use the shared i8253_lock
i8253: Consolidate all kernel definitions of i8253_lock
i8253: Unify all kernel declarations of i8253_lock
i8253: Create linux/i8253.h and use it in all 8253 related files
Secondary CPU bringup typically calls calibrate_delay() during its
initialization. However, calibrate_delay() modifies a global variable
(loops_per_jiffy) used for udelay() and __delay().
A side effect of 71c696b1 ("calibrate: extract fall-back calculation
into own helper") introduced in the 2.6.39 merge window means that we
end up with a substantial period where loops_per_jiffy is zero. This
causes the spinlock debugging code to malfunction:
u64 loops = loops_per_jiffy * HZ;
for (;;) {
for (i = 0; i < loops; i++) {
if (arch_spin_trylock(&lock->raw_lock))
return;
__delay(1);
}
...
}
by never calling arch_spin_trylock() - resulting in the CPU locking
up in an infinite loop inside __spin_lock_debug().
Work around this by only writing to loops_per_jiffy only once we have
completed all the calibration decisions.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: <stable@kernel.org> (2.6.39-stable)
--
Better solutions (such as omitting the calibration for secondary CPUs,
or arranging for calibrate_delay() to return the LPJ value and leave
it to the caller to decide where to store it) are a possibility, but
would be much more invasive into each architecture.
I think this is the best solution for -rc and stable, but it should be
revisited for the next merge window.
init/calibrate.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a problem that kdump(2nd kernel) sometimes hangs up due
to a pending IPI from 1st kernel. Kernel panic occurs because IPI
comes before call_single_queue is initialized.
To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.
The details of the crash are:
(1) 2nd kernel boots up
(2) A pending IPI from 1st kernel comes when irqs are first enabled
in start_kernel().
(3) Kernel tries to handle the interrupt, but call_single_queue
is not initialized yet at this point. As a result, in the
generic_smp_call_function_single_interrupt(), NULL pointer
dereference occurs when list_replace_init() tries to access
&q->list.next.
Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().
Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Milton Miller <miltonm@bga.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_CONSTRUCTORS controls support for running constructor functions at
kernel init time. According to commit b99b87f70c ("kernel:
constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
CONFIG_GCOV_KERNEL select it, so that the normal case of
CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.
Observed in the short list of =y values in a minimal kernel configuration.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
calculation as it appears when booting every core on setups with
'ignore_loglevel' which dmesg people scan for possible issues. As the
message doesn't show very useful information to the widest audience of
kernel boot message gazers, it should be removed.
Introduced by commit d2b463135f ("init/calibrate.c: fix for critical
bogoMIPS intermittent calculation failure").
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andrew Worsley <amworsley@gmail.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist. Distribution init scripts have the same
fallback. However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs. Furthermore, "(none)" doesn't typically resolve to anything
useful.
Make the default hostname configurable. This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration. Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Miller <davem@davemloft.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Kel Modderman <kel@otaku42.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several debugging options currently default to y, such as
CONFIG_DEBUG_BUGVERBOSE and CONFIG_DEBUG_RODATA. Embedded users
might want to turn those options off to save space; however,
turning them off requires turning on CONFIG_DEBUG_KERNEL to
unhide them. Since CONFIG_DEBUG_KERNEL exists specifically to
unhide debugging options, and CONFIG_EXPERT exists specifically
to unhide options potentially needed by experts and/or embedded
users, make CONFIG_EXPERT automatically imply
CONFIG_DEBUG_KERNEL.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110606012358.GA1909@leaf
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Thomas Gleixner reports that we now have a boot crash triggered by
CONFIG_CPUMASK_OFFSTACK=y:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c11ae035>] find_next_bit+0x55/0xb0
Call Trace:
[<c11addda>] cpumask_any_but+0x2a/0x70
[<c102396b>] flush_tlb_mm+0x2b/0x80
[<c1022705>] pud_populate+0x35/0x50
[<c10227ba>] pgd_alloc+0x9a/0xf0
[<c103a3fc>] mm_init+0xec/0x120
[<c103a7a3>] mm_alloc+0x53/0xd0
which was introduced by commit de03c72cfc ("mm: convert
mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
mm_init() vs mm_init_cpumask
Thomas wrote a patch to just fix the ordering of initialization, but I
hate the new double allocation in the fork path, so I ended up instead
doing some more radical surgery to clean it all up.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:
* cgroup creation is out-of-control
* cgroup name can conflict when pids are looping
* it is not possible to have a single process handling a lot of
namespaces without falling in a exponential creation time
* we may want to create a namespace without creating a cgroup
The ns_cgroup was replaced by a compatibility flag 'clone_children',
where a newly created cgroup will copy the parent cgroup values.
The userspace has to manually create a cgroup and add a task to
the 'tasks' file.
This patch removes the ns_cgroup as suggested in the following thread:
https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html
The 'cgroup_clone' function is removed because it is no longer used.
This is a userspace-visible change. Commit 45531757b4 ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal. Since that
time we have heard from XXX users who were affected by this.
Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On larger systems, because of the numerous ACPI, Bootmem and EFI messages,
the static log buffer overflows before the larger one specified by the
log_buf_len param is allocated. Minimize the overflow by allocating the
new log buffer as soon as possible.
On kernels without memblock, a later call to setup_log_buf from
kernel/init.c is the fallback.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_PRINTK=n build]
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>