Commit Graph

853 Commits

Author SHA1 Message Date
Russell King
1b19ca9f0b Fix CPU spinlock lockups on secondary CPU bringup
Secondary CPU bringup typically calls calibrate_delay() during its
initialization.  However, calibrate_delay() modifies a global variable
(loops_per_jiffy) used for udelay() and __delay().

A side effect of 71c696b1 ("calibrate: extract fall-back calculation
into own helper") introduced in the 2.6.39 merge window means that we
end up with a substantial period where loops_per_jiffy is zero.  This
causes the spinlock debugging code to malfunction:

	u64 loops = loops_per_jiffy * HZ;
	for (;;) {
		for (i = 0; i < loops; i++) {
			if (arch_spin_trylock(&lock->raw_lock))
				return;
			__delay(1);
		}
		...
	}

by never calling arch_spin_trylock() - resulting in the CPU locking
up in an infinite loop inside __spin_lock_debug().

Work around this by only writing to loops_per_jiffy only once we have
completed all the calibration decisions.

Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: <stable@kernel.org> (2.6.39-stable)
--
Better solutions (such as omitting the calibration for secondary CPUs,
or arranging for calibrate_delay() to return the LPJ value and leave
it to the caller to decide where to store it) are a possibility, but
would be much more invasive into each architecture.

I think this is the best solution for -rc and stable, but it should be
revisited for the next merge window.

 init/calibrate.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-23 08:59:38 -07:00
Takao Indoh
d8ad7d1123 generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts
There is a problem that kdump(2nd kernel) sometimes hangs up due
to a pending IPI from 1st kernel. Kernel panic occurs because IPI
comes before call_single_queue is initialized.

To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.

The details of the crash are:

 (1) 2nd kernel boots up

 (2) A pending IPI from 1st kernel comes when irqs are first enabled
     in start_kernel().

 (3) Kernel tries to handle the interrupt, but call_single_queue
     is not initialized yet at this point. As a result, in the
     generic_smp_call_function_single_interrupt(), NULL pointer
     dereference occurs when list_replace_init() tries to access
     &q->list.next.

Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Milton Miller <miltonm@bga.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-17 10:17:12 +02:00
Josh Triplett
d2c3225879 gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL
CONFIG_CONSTRUCTORS controls support for running constructor functions at
kernel init time.  According to commit b99b87f70c ("kernel:
constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this.  However,
CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
and CONFIG_GCOV_KERNEL depends on it.  Instead, default it to n and have
CONFIG_GCOV_KERNEL select it, so that the normal case of
CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

Observed in the short list of =y values in a minimal kernel configuration.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
Borislav Petkov
de695e159e init/calibrate.c: remove annoying printk
Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
calculation as it appears when booting every core on setups with
'ignore_loglevel' which dmesg people scan for possible issues.  As the
message doesn't show very useful information to the widest audience of
kernel boot message gazers, it should be removed.

Introduced by commit d2b463135f ("init/calibrate.c: fix for critical
bogoMIPS intermittent calculation failure").

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andrew Worsley <amworsley@gmail.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:01 -07:00
Josh Triplett
bd5dc17be8 uts: make default hostname configurable, rather than always using "(none)"
The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist.  Distribution init scripts have the same
fallback.  However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs.  Furthermore, "(none)" doesn't typically resolve to anything
useful.

Make the default hostname configurable.  This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration.  Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Miller <davem@davemloft.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Kel Modderman <kel@otaku42.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Linus Torvalds
6345d24daf mm: Fix boot crash in mm_alloc()
Thomas Gleixner reports that we now have a boot crash triggered by
CONFIG_CPUMASK_OFFSTACK=y:

    BUG: unable to handle kernel NULL pointer dereference at   (null)
    IP: [<c11ae035>] find_next_bit+0x55/0xb0
    Call Trace:
     [<c11addda>] cpumask_any_but+0x2a/0x70
     [<c102396b>] flush_tlb_mm+0x2b/0x80
     [<c1022705>] pud_populate+0x35/0x50
     [<c10227ba>] pgd_alloc+0x9a/0xf0
     [<c103a3fc>] mm_init+0xec/0x120
     [<c103a7a3>] mm_alloc+0x53/0xd0

which was introduced by commit de03c72cfc ("mm: convert
mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
mm_init() vs mm_init_cpumask

Thomas wrote a patch to just fix the ordering of initialization, but I
hate the new double allocation in the fork path, so I ended up instead
doing some more radical surgery to clean it all up.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-29 11:32:28 -07:00
Daniel Lezcano
a77aea9201 cgroup: remove the ns_cgroup
The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

  * cgroup creation is out-of-control
  * cgroup name can conflict when pids are looping
  * it is not possible to have a single process handling a lot of
    namespaces without falling in a exponential creation time
  * we may want to create a namespace without creating a cgroup

  The ns_cgroup was replaced by a compatibility flag 'clone_children',
  where a newly created cgroup will copy the parent cgroup values.
  The userspace has to manually create a cgroup and add a task to
  the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change.  Commit 45531757b4 ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal.  Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26 17:12:34 -07:00
Mike Travis
162a7e7500 printk: allocate kernel log buffer earlier
On larger systems, because of the numerous ACPI, Bootmem and EFI messages,
the static log buffer overflows before the larger one specified by the
log_buf_len param is allocated.  Minimize the overflow by allocating the
new log buffer as soon as possible.

On kernels without memblock, a later call to setup_log_buf from
kernel/init.c is the fallback.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_PRINTK=n build]
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:48 -07:00
Andrew Worsley
d2b463135f init/calibrate.c: fix for critical bogoMIPS intermittent calculation failure
A fix to the TSC (Time Stamp Counter) based bogoMIPS calculation used on
secondary CPUs which has two faults:

1: Not handling wrapping of the lower 32 bits of the TSC counter on
   32bit kernel - perhaps TSC is not reset by a warm reset?

2: TSC and Jiffies are no incrementing together properly.  Either
   jiffies increment too quickly or Time Stamp Counter isn't incremented
   in during an SMI but the real time clock is and jiffies are
   incremented.

Case 1 can result in a factor of 16 too large a value which makes udelay()
values too small and can cause mysterious driver errors.  Case 2 appears
to give smaller 10-15% errors after averaging but enough to cause
occasional failures on my own board

I have tested this code on my own branch and attach patch suitable for
current kernel code.  See below for examples of the failures and how the
fix handles these situations now.

I reported this issue earlier here:
     Intermittent problem with BogoMIPs calculation on Intel AP CPUs -
http://marc.info/?l=linux-kernel&m=129947246316875&w=4

I suspect this issue has been seen by others but as it is intermittent and
bogoMIPS for secondary CPUs are no longer printed out it might have been
difficult to identify this as the cause.  Perhaps these unresolved issues,
although quite old, might be relevant as possibly this fault has been
around for a while.  In particular Case 1 may only be relevant to 32bit
kernels on newer HW (most people run 64bit kernels?).  Case 2 is less
dramatic since the earlier fix in this area and also intermittent.

   Re: bogomips discrepancy on Intel Core2 Quad CPU -
http://marc.info/?l=linux-kernel&m=118929277524298&w=4
   slow system and bogus bogomips  -
http://marc.info/?l=linux-kernel&m=116791286716107&w=4
   Re: Re: [RFC-PATCH] clocksource: update lpj if clocksource has -
http://marc.info/?l=linux-kernel&m=128952775819467&w=4

This issue is masked a little by commit feae3203d7 ("timers, init:
Limit the number of per cpu calibration bootup messages") which only
prints out the first bogoMIPS value making it much harder to notice other
values differing.  Perhaps it should be changed to only suppress them when
they are similar values?

Here are some outputs showing faults occurring and the new code handling
them properly.  See my earlier message for examples of the original
failure.

    Case 1:   A Time Stamp Counter wrap:
...
Calibrating delay loop (skipped), value calculated using timer
frequency.. 6332.70 BogoMIPS (lpj=31663540)
....
calibrate_delay_direct() timer_rate_max=31666493
timer_rate_min=31666151 pre_start=4170369255 pre_end=4202035539
calibrate_delay_direct() timer_rate_max=2425955274
timer_rate_min=2425954941 pre_start=4265368533 pre_end=2396356387
calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap
around start=4265368581 >=post_end=2396356511
calibrate_delay_direct() timer_rate_max=31666274
timer_rate_min=31665942 pre_start=2440373374 pre_end=2472039515
calibrate_delay_direct() timer_rate_max=31666492
timer_rate_min=31666160 pre_start=2535372139 pre_end=2567038422
calibrate_delay_direct() timer_rate_max=31666455
timer_rate_min=31666207 pre_start=2630371084 pre_end=2662037415
Calibrating delay using timer specific routine.. 6333.28 BogoMIPS (lpj=31666428)
Total of 2 processors activated (12665.99 BogoMIPS).
....

    Case 2:  Some thing (presumably the SMM interrupt?) causing the
very low increase in TSC counter for the DELAY_CALIBRATION_TICKS
increase in jiffies
...
Calibrating delay loop (skipped), value calculated using timer
frequency.. 6333.25 BogoMIPS (lpj=31666270)
...
calibrate_delay_direct() timer_rate_max=31666483
timer_rate_min=31666074 pre_start=4199536526 pre_end=4231202809
calibrate_delay_direct() timer_rate_max=864348 timer_rate_min=864016
pre_start=2405343672 pre_end=2406207897
calibrate_delay_direct() timer_rate_max=31666483
timer_rate_min=31666179 pre_start=2469540464 pre_end=2501206823
calibrate_delay_direct() timer_rate_max=31666511
timer_rate_min=31666122 pre_start=2564539400 pre_end=2596205712
calibrate_delay_direct() timer_rate_max=31666084
timer_rate_min=31665685 pre_start=2659538782 pre_end=2691204657
calibrate_delay_direct() dropping min bogoMips estimate 1 = 864348
Calibrating delay using timer specific routine.. 6333.27 BogoMIPS (lpj=31666390)
Total of 2 processors activated (12666.53 BogoMIPS).
...

After 70 boots I saw 2 variations <1% slip through

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix straggly printk mess]
Signed-off-by: Andrew Worsley <amworsley@gmail.com>
Reviewed-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:46 -07:00
KOSAKI Motohiro
de03c72cfc mm: convert mm->cpu_vm_cpumask into cpumask_var_t
cpumask_t is very big struct and cpu_vm_mask is placed wrong position.
It might lead to reduce cache hit ratio.

This patch has two change.
1) Move the place of cpumask into last of mm_struct. Because usually cpumask
   is accessed only front bits when the system has cpu-hotplug capability
2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory
   footprint if cpumask_size() will use nr_cpumask_bits properly in future.

In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var.
It may help to detect out of tree cpu_vm_mask users.

This patch has no functional change.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:21 -07:00
Linus Torvalds
2bb732cdb4 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  kbuild: make KBUILD_NOCMDDEP=1 handle empty built-in.o
  scripts/kallsyms.c: fix potential segfault
  scripts/gen_initramfs_list.sh: Convert to a /bin/sh script
  kbuild: Fix GNU make v3.80 compatibility
  kbuild: Fix passing -Wno-* options to gcc 4.4+
  kbuild: move scripts/basic/docproc.c to scripts/docproc.c
  kbuild: Fix Makefile.asm-generic for um
  kbuild: Allow to combine multiple W= levels
  kbuild: Disable -Wunused-but-set-variable for gcc 4.6.0
  Fix handling of backlash character in LINUX_COMPILE_BY name
  kbuild: asm-generic support
  kbuild: implement several W= levels
  kbuild: Fix build with binutils <= 2.19
  initramfs: Use KBUILD_BUILD_TIMESTAMP for generated entries
  kbuild: Allow to override LINUX_COMPILE_BY and LINUX_COMPILE_HOST macros
  kbuild: Drop unused LINUX_COMPILE_TIME and LINUX_COMPILE_DOMAIN macros
  kbuild: Use the deterministic mode of ar
  kbuild: Call gzip with -n
  kbuild: move KALLSYMS_EXTRA_PASS from Kconfig to Makefile
  Kconfig: improve KALLSYMS_ALL documentation

Fix up trivial conflict in Makefile
2011-05-24 13:31:37 -07:00
Linus Torvalds
e98bae7592 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (28 commits)
  sparc32: fix build, fix missing cpu_relax declaration
  SCHED_TTWU_QUEUE is not longer needed since sparc32 now implements IPI
  sparc32,leon: Remove unnecessary page_address calls in LEON DMA API.
  sparc: convert old cpumask API into new one
  sparc32, sun4d: Implemented SMP IPIs support for SUN4D machines
  sparc32, sun4m: Implemented SMP IPIs support for SUN4M machines
  sparc32,leon: Implemented SMP IPIs for LEON CPU
  sparc32: implement SMP IPIs using the generic functions
  sparc32,leon: SMP power down implementation
  sparc32,leon: added some SMP comments
  sparc: add {read,write}*_be routines
  sparc32,leon: don't rely on bootloader to mask IRQs
  sparc32,leon: operate on boot-cpu IRQ controller registers
  sparc32: always define boot_cpu_id
  sparc32: removed unused code, implemented by generic code
  sparc32: avoid build warning at mm/percpu.c:1647
  sparc32: always register a PROM based early console
  sparc32: probe for cpu info only during startup
  sparc: consolidate show_cpuinfo in cpu.c
  sparc32,leon: implement genirq CPU affinity
  ...
2011-05-22 22:06:24 -07:00
Linus Torvalds
281dc5c5ec Give up on pushing CC_OPTIMIZE_FOR_SIZE
I still happen to believe that I$ miss costs are a major thing, but
sadly, -Os doesn't seem to be the solution.  With or without it, gcc
will miss some obvious code size improvements, and with it enabled gcc
will sometimes make choices that aren't good even with high I$ miss
ratios.

For example, with -Os, gcc on x86 will turn a 20-byte constant memcpy
into a "rep movsl".  While I sincerely hope that x86 CPU's will some day
do a good job at that, they certainly don't do it yet, and the cost is
higher than a L1 I$ miss would be.

Some day I hope we can re-enable this.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-22 14:30:36 -07:00
Daniel Hellstrom
17d9f311ec SCHED_TTWU_QUEUE is not longer needed since sparc32 now implements IPI
Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-20 13:10:55 -07:00
Linus Torvalds
eb04f2f04e Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits)
  Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"
  net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree
  batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu
  batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree()
  batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu
  net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu()
  net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu()
  net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu()
  net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu()
  net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu()
  perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu()
  perf,rcu: convert call_rcu(free_ctx) to kfree_rcu()
  net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu()
  net,rcu: convert call_rcu(net_generic_release) to kfree_rcu()
  net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu()
  net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu()
  security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu()
  net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()
  net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()
  net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()
  ...
2011-05-19 18:14:34 -07:00
Linus Torvalds
80fe02b5da Merge branches 'sched-core-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
  sched: Fix and optimise calculation of the weight-inverse
  sched: Avoid going ahead if ->cpus_allowed is not changed
  sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
  sched: Remove unused parameters from sched_fork() and wake_up_new_task()
  sched: Shorten the construction of the span cpu mask of sched domain
  sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUG
  sched: Remove unused 'this_best_prio arg' from balance_tasks()
  sched: Remove noop in alloc_rt_sched_group()
  sched: Get rid of lock_depth
  sched: Remove obsolete comment from scheduler_tick()
  sched: Fix sched_domain iterations vs. RCU
  sched: Next buddy hint on sleep and preempt path
  sched: Make set_*_buddy() work on non-task entities
  sched: Remove need_migrate_task()
  sched: Move the second half of ttwu() to the remote cpu
  sched: Restructure ttwu() some more
  sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
  sched: Remove rq argument from ttwu_stat()
  sched: Remove rq->lock from the first half of ttwu()
  sched: Drop rq->lock from sched_exec()
  ...

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix rt_rq runtime leakage bug
2011-05-19 17:41:22 -07:00
Catalin Marinas
9b090f2da8 kmemleak: Initialise kmemleak after debug_objects_mem_init()
Kmemleak frees objects via RCU and when CONFIG_DEBUG_OBJECTS_RCU_HEAD
is enabled, the RCU callback triggers a call to free_object() in
lib/debugobjects.c. Since kmemleak is initialised before debug objects
initialisation, it may result in a kernel panic during booting. This
patch moves the kmemleak_init() call after debug_objects_mem_init().

Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Tested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@kernel.org>
2011-05-19 17:36:37 +01:00
Ingo Molnar
9cb5baba5e Merge commit 'v2.6.39-rc7' into sched/core 2011-05-12 09:36:18 +02:00
David Rientjes
21a43e397e slub: Revert "[PARISC] slub: fix panic with DISCONTIGMEM"
This reverts commit 4a5fa3590f, which did not allow SLUB to be used
on architectures that use DISCONTIGMEM without compiling NUMA support
without CONFIG_BROKEN also set.

The slub panic that it was intended to prevent is addressed by
d9b41e0b54 ("[PARISC] set memory ranges in N_NORMAL_MEMORY when
onlined") on parisc so there is no further slub issues with such a
configuration.

The reverts allows SLUB now to be used on such architectures since
there haven't been any reports of additional errors.

Cc: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-10 17:37:34 -07:00
Paul E. McKenney
27f4d28057 rcu: priority boosting for TREE_PREEMPT_RCU
Add priority boosting for TREE_PREEMPT_RCU, similar to that for
TINY_PREEMPT_RCU.  This is enabled by the default-off RCU_BOOST
kernel parameter.  The priority to which to boost preempted
RCU readers is controlled by the RCU_BOOST_PRIO kernel parameter
(defaulting to real-time priority 1) and the time to wait before
boosting the readers who are blocking a given grace period is
controlled by the RCU_BOOST_DELAY kernel parameter (defaulting to
500 milliseconds).

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-05 23:16:55 -07:00
Linus Torvalds
e8dad69408 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
  [PARISC] slub: fix panic with DISCONTIGMEM
  [PARISC] set memory ranges in N_NORMAL_MEMORY when onlined
2011-04-27 15:20:33 -07:00
Randy Dunlap
6befe5f69b init/Kconfig: fix EXPERT menu list
The EXPERT menu list was recently broken by the insertion of a
kconfig symbol (EMBEDDED) at the beginning of the EXPERT list of
kconfig items.  Broken by:

  commit 6a108a14fa
  Author: David Rientjes <rientjes@google.com>
  Date:   Thu Jan 20 14:44:16 2011 -0800
    kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT

Restore the EXPERT menu list -- don't inject a symbol (EMBEDDED)
that does not depend on EXPERT into the list.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Peter Foley <pefoley2@verizon.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-26 20:48:37 -07:00
James Bottomley
4a5fa3590f [PARISC] slub: fix panic with DISCONTIGMEM
Slub makes assumptions about page_to_nid() which are violated by
DISCONTIGMEM and !NUMA.  This violation results in a panic because
page_to_nid() can be non-zero for pages in the discontiguous ranges and
this leads to a null return by get_node().  The assertion by the
maintainer is that DISCONTIGMEM should only be allowed when NUMA is also
defined.  However, at least six architectures: alpha, ia64, m32r, m68k,
mips, parisc violate this.  The panic is a regression against slab, so
just mark slub broken in the problem configuration to prevent users
reporting these panics.

Cc: stable@kernel.org
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-22 15:42:46 -05:00
Artem Bityutskiy
1e2795a119 kbuild: move KALLSYMS_EXTRA_PASS from Kconfig to Makefile
At the moment we have the CONFIG_KALLSYMS_EXTRA_PASS Kconfig switch,
which users can enable or disable while configuring the kernel. This
option is then used by 'make' to determine whether an extra kallsyms
pass is needed or not.

However, this approach is not nice and confusing, and this patch moves
CONFIG_KALLSYMS_EXTRA_PASS from Kconfig to Makefile instead. The
rationale is below.

1. CONFIG_KALLSYMS_EXTRA_PASS is really about the build time, not
   run-time. There is no real need for it to be in Kconfig. It is
   just an additional work-around which should be used only in rare
   cases, when someone breaks kallsyms, so Kbuild/Makefile is much
   better place for this option.
2. Grepping CONFIG_KALLSYMS_EXTRA_PASS shows that many defconfigs have
   it enabled, probably not because they try to work-around a kallsyms
   bug, but just because the Kconfig help text is confusing and does
   not really make it clear that this option should not be used unless
   except when kallsyms is broken.
3. And since many people have CONFIG_KALLSYMS_EXTRA_PASS enabled in
   their Kconfig, we do might fail to notice kallsyms bugs in time. E.g.,
   many testers use "make allyesconfig" to test builds, which will enable
   CONFIG_KALLSYMS_EXTRA_PASS and kallsyms breakage will not be noticed.

To address that, this patch:

1. Kills CONFIG_KALLSYMS_EXTRA_PASS
2. Changes Makefile so that people can use "make KALLSYMS_EXTRA_PASS=1"
   to enable the extra pass if needed. Additionally, they may define
   KALLSYMS_EXTRA_PASS as an environment variable.
3. By default KALLSYMS_EXTRA_PASS is disabled and if kallsyms has issues,
   "make" should print a warning and suggest using KALLSYMS_EXTRA_PASS

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
[mmarek: Removed make help text, is not necessary]
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-04-15 15:56:15 +02:00
Artem Bityutskiy
71a83ec7da Kconfig: improve KALLSYMS_ALL documentation
Dumb users like myself are not able to grasp from the existing KALLSYMS_ALL
documentation that this option is not what they need. Improve the help
message and make it clearer that KALLSYMS is enough in the majority of
use cases, and KALLSYMS_ALL should really be used very rarely.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-04-15 15:48:01 +02:00