Commit Graph

13922 Commits

Author SHA1 Message Date
Rusty Russell 9bb9c3be56 module: wait when loading a module which is currently initializing.
The original module-init-tools module loader used a fnctl lock on the
.ko file to avoid attempts to simultaneously load a module.
Unfortunately, you can't get an exclusive fcntl lock on a read-only
fd, making this not work for read-only mounted filesystems.
module-init-tools has a hacky sleep-and-loop for this now.

It's not that hard to wait in the kernel, and only return -EEXIST once
the first module has finished loading (or continue loading the module
if the first one failed to initialize for some reason).  It's also
consistent with what we do for dependent modules which are still loading.

Suggested-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 14:31:03 +09:30
Rusty Russell 6f13909f4f module: fix symbol waiting when module fails before init
We use resolve_symbol_wait(), which blocks if the module containing
the symbol is still loading.  However:

1) The module_wq we use is only woken after calling the modules' init
   function, but there are other failure paths after the module is
   placed in the linked list where we need to do the same thing.

2) wake_up() only wakes one waiter, and our waitqueue is shared by all
   modules, so we need to wake them all.

3) wake_up_all() doesn't imply a memory barrier: I feel happier calling
   it after we've grabbed and dropped the module_mutex, not just after
   the state assignment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 14:31:03 +09:30
David Howells 786d35d45c Make most arch asm/module.h files use asm-generic/module.h
Use the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela,
ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version
into asm-generic/module.h for all arches bar MIPS.

Also, use the generic definition mod_arch_specific where possible.

To this end, I've defined three new config bools:

 (*) HAVE_MOD_ARCH_SPECIFIC

     Arches define this if they don't want to use the empty generic
     mod_arch_specific struct.

 (*) MODULES_USE_ELF_RELA

     Arches define this if their modules can contain RELA records.  This causes
     the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be
     defined by the arch rather than have the core emit an error message.

 (*) MODULES_USE_ELF_REL

     Arches define this if their modules can contain REL records.  This causes
     the Elf_Rel mapping to be emitted and allows apply_relocate() to be
     defined by the arch rather than have the core emit an error message.

Note that it is possible to allow both REL and RELA records: m68k and mips are
two arches that do this.

With this, some arch asm/module.h files can be deleted entirely and replaced
with a generic-y marker in the arch Kbuild file.

Additionally, I have removed the bits from m32r and score that handle the
unsupported type of relocation record as that's now handled centrally.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-09-28 14:31:03 +09:30
Matthew Garrett c99af3752b module: taint kernel when lve module is loaded
Cloudlinux have a product called lve that includes a kernel module. This
was previously GPLed but is now under a proprietary license, but the
module continues to declare MODULE_LICENSE("GPL") and makes use of some
EXPORT_SYMBOL_GPL symbols. Forcibly taint it in order to avoid this.

Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Alex Lyashkov <umka@cloudlinux.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
2012-09-28 14:31:02 +09:30
Linus Torvalds 4651afbbae Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull another workqueue fix from Tejun Heo:
 "Unfortunately, yet another late fix.  This too is discovered and fixed
  by Lai.  This bug was introduced during this merge window by commit
  25511a4776 ("workqueue: reimplement CPU online rebinding to handle
  idle workers") which started using WORKER_REBIND flag for idle rebind
  too.

  The bug is relatively easy to trigger if the CPU rapidly goes through
  off, on and then off (and stay off).  The fix is on the safer side.
  This hasn't been on linux-next yet but I'm pushing early so that it
  can get more exposure before v3.6 release."

* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
2012-09-17 16:05:23 -07:00
Lai Jiangshan 960bd11bf2 workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
(CPU is down again).  This used to be okay because the flag wasn't
used for anything else.

However, after 25511a477 "workqueue: reimplement CPU online rebinding
to handle idle workers", WORKER_REBIND is also used to command idle
workers to rebind.  If not cleared, the worker may confuse the next
CPU_UP cycle by having REBIND spuriously set or oops / get stuck by
prematurely calling idle_worker_rebind().

  WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5
 00()
  Hardware name: Bochs
  Modules linked in: test_wq(O-)
  Pid: 33, comm: kworker/1:1 Tainted: G           O 3.6.0-rc1-work+ #3
  Call Trace:
   [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
   [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
   [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
   [<ffffffff810bc16e>] kthread+0xbe/0xd0
   [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
  ---[ end trace e977cf20f4661968 ]---
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: test_wq(O-)
  CPU 0
  Pid: 33, comm: kworker/1:1 Tainted: G        W  O 3.6.0-rc1-work+ #3 Bochs Bochs
  RIP: 0010:[<ffffffff810b3db0>]  [<ffffffff810b3db0>] worker_thread+0x360/0x500
  RSP: 0018:ffff88001e1c9de0  EFLAGS: 00010086
  RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
  RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
  R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
  FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
  Stack:
   ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
   ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
   ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
  Call Trace:
   [<ffffffff810bc16e>] kthread+0xbe/0xd0
   [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
  Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
  RIP  [<ffffffff810b3db0>] worker_thread+0x360/0x500
   RSP <ffff88001e1c9de0>
  CR2: 0000000000000000

There was no reason to keep WORKER_REBIND on failure in the first
place - WORKER_UNBOUND is guaranteed to be set in such cases
preventing incorrectly activating concurrency management.  Always
clear WORKER_REBIND.

tj: Updated comment and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-17 15:42:31 -07:00
Andrew Vagin 579035dc5d pid-namespace: limit value of ns_last_pid to (0, max_pid)
The kernel doesn't check the pid for negative values, so if you try to
write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic.

The crash happens because the next pid is -1, and alloc_pidmap() will
try to access to a nonexistent pidmap.

  map = &pid_ns->pidmap[pid/BITS_PER_PAGE];

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17 15:00:38 -07:00
Linus Torvalds 37407ea7f9 Revert "sched: Improve scalability via 'CPU buddies', which withstand random perturbations"
This reverts commit 970e178985.

Nikolay Ulyanitsky reported thatthe 3.6-rc5 kernel has a 15-20%
performance drop on PostgreSQL 9.2 on his machine (running "pgbench").

Borislav Petkov was able to reproduce this, and bisected it to this
commit 970e178985 ("sched: Improve scalability via 'CPU buddies' ...")
apparently because the new single-idle-buddy model simply doesn't find
idle CPU's to reschedule on aggressively enough.

Mike Galbraith suspects that it is likely due to the user-mode spinlocks
in PostgreSQL not reacting well to preemption, but we don't really know
the details - I'll just revert the commit for now.

There are hopefully other approaches to improve scheduler scalability
without it causing these kinds of downsides.

Reported-by: Nikolay Ulyanitsky <lystor@gmail.com>
Bisected-by: Borislav Petkov <bp@alien8.de>
Acked-by: Mike Galbraith <efault@gmx.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-16 12:29:43 -07:00
Linus Torvalds 889cb3b9a4 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Smaller fixlets"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix kernel-doc warnings in kernel/sched/fair.c
  sched: Unthrottle rt runqueues in __disable_runtime()
  sched: Add missing call to calc_load_exit_idle()
  sched: Fix load avg vs cpu-hotplug
2012-09-14 17:44:52 -07:00
Linus Torvalds 7ef6e97380 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree includes various fixes"

Ingo really needs to improve on the whole "explain git pull" part.
"Various fixes" indeed.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled
  perf/x86: Enable Intel Cedarview Atom suppport
  perf_event: Switch to internal refcount, fix race with close()
  oprofile, s390: Fix uninitialized memory access when writing to oprofilefs
  perf/x86: Fix microcode revision check for SNB-PEBS
2012-09-14 17:43:45 -07:00
Linus Torvalds 0bd1189e23 Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "It's later than I'd like but well the timing just didn't work out this
  time.

  There are three bug fixes.  One from before 3.6-rc1 and two from the
  new CPU hotplug code.  Kudos to Lai for discovering all of them and
  providing fixes.

   * Atomicity bug when clearing a flag and setting another.  The two
     operation should have been atomic but wasn't.  This bug has existed
     for a long time but is unlikely to have actually happened.  Fix is
     safe.  Marked for -stable.

   * If CPU hotplug cycles happen back-to-back before workers finish the
     previous cycle, the states could get out of sync and it could get
     stuck.  Fixed by waiting for workers to complete before finishing
     hotplug cycle.

   * While CPU hotplug is in progress, idle workers could be depleted
     which can then lead to deadlock.  I think both happening together
     is highly unlikely but still better to fix it and the fix isn't too
     scary.

  There's another workqueue related regression which reported a few days
  ago:

    https://bugzilla.kernel.org/show_bug.cgi?id=47301

  It's a bit of head scratcher but there is a semi-reliable reproduce
  case, so I'm hoping to resolve it soonish."

* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix possible idle worker depletion across CPU hotplug
  workqueue: restore POOL_MANAGING_WORKERS
  workqueue: fix possible deadlock in idle worker rebinding
  workqueue: move WORKER_REBIND clearing in rebind_workers() to the end of the function
  workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
2012-09-12 07:16:54 +08:00
Lai Jiangshan ee378aa49b workqueue: fix possible idle worker depletion across CPU hotplug
To simplify both normal and CPU hotplug paths, worker management is
prevented while CPU hoplug is in progress.  This is achieved by CPU
hotplug holding the same exclusion mechanism used by workers to ensure
there's only one manager per pool.

If someone else seems to be performing the manager role, workers
proceed to execute work items.  CPU hotplug using the same mechanism
can lead to idle worker depletion because all workers could proceed to
execute work items while CPU hotplug is in progress and CPU hotplug
itself wouldn't actually perform the worker management duty - it
doesn't guarantee that there's an idle worker left when it releases
management.

This idle worker depletion, under extreme circumstances, can break
forward-progress guarantee and thus lead to deadlock.

This patch fixes the bug by using separate mechanisms for manager
exclusion among workers and hotplug exclusion.  For manager exclusion,
POOL_MANAGING_WORKERS which was restored by the previous patch is
used.  pool->manager_mutex is now only used for exclusion between the
elected manager and CPU hotplug.  The elected manager won't proceed
without holding pool->manager_mutex.

This ensures that the worker which won the manager position can't skip
managing while CPU hotplug is in progress.  It will block on
manager_mutex and perform management after CPU hotplug is complete.

Note that hotplug may happen while waiting for manager_mutex.  A
manager isn't either on idle or busy list and thus the hoplug code
can't unbind/rebind it.  Make the manager handle its own un/rebinding.

tj: Updated comment and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-10 10:05:54 -07:00
Lai Jiangshan 552a37e936 workqueue: restore POOL_MANAGING_WORKERS
This patch restores POOL_MANAGING_WORKERS which was replaced by
pool->manager_mutex by 6037315269 "workqueue: use mutex for global_cwq
manager exclusion".

There's a subtle idle worker depletion bug across CPU hotplug events
and we need to distinguish an actual manager and CPU hotplug
preventing management.  POOL_MANAGING_WORKERS will be used for the
former and manager_mutex the later.

This patch just lays POOL_MANAGING_WORKERS on top of the existing
manager_mutex and doesn't introduce any synchronization changes.  The
next patch will update it.

Note that this patch fixes a non-critical anomaly where
too_many_workers() may return %true spuriously while CPU hotplug is in
progress.  While the issue could schedule idle timer spuriously, it
didn't trigger any actual misbehavior.

tj: Rewrote patch description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-10 10:04:54 -07:00
Tejun Heo ec58815ab0 workqueue: fix possible deadlock in idle worker rebinding
Currently, rebind_workers() and idle_worker_rebind() are two-way
interlocked.  rebind_workers() waits for idle workers to finish
rebinding and rebound idle workers wait for rebind_workers() to finish
rebinding busy workers before proceeding.

Unfortunately, this isn't enough.  The second wait from idle workers
is implemented as follows.

	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));

rebind_workers() clears WORKER_REBIND, wakes up the idle workers and
then returns.  If CPU hotplug cycle happens again before one of the
idle workers finishes the above wait_event(), rebind_workers() will
repeat the first part of the handshake - set WORKER_REBIND again and
wait for the idle worker to finish rebinding - and this leads to
deadlock because the idle worker would be waiting for WORKER_REBIND to
clear.

This is fixed by adding another interlocking step at the end -
rebind_workers() now waits for all the idle workers to finish the
above WORKER_REBIND wait before returning.  This ensures that all
rebinding steps are complete on all idle workers before the next
hotplug cycle can happen.

This problem was diagnosed by Lai Jiangshan who also posted a patch to
fix the issue, upon which this patch is based.

This is the minimal fix and further patches are scheduled for the next
merge window to simplify the CPU hotplug path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Original-patch-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <1346516916-1991-3-git-send-email-laijs@cn.fujitsu.com>
2012-09-05 16:10:15 -07:00
Tejun Heo 90beca5de5 workqueue: move WORKER_REBIND clearing in rebind_workers() to the end of the function
This doesn't make any functional difference and is purely to help the
next patch to be simpler.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2012-09-05 16:10:14 -07:00
Lai Jiangshan 96e65306b8 workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
The compiler may compile the following code into TWO write/modify
instructions.

	worker->flags &= ~WORKER_UNBOUND;
	worker->flags |= WORKER_REBIND;

so the other CPU may temporarily see worker->flags which doesn't have
either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
prematurely.

Fix it by using single explicit assignment via ACCESS_ONCE().

Because idle workers have another WORKER_NOT_RUNNING flag, this bug
doesn't exist for them; however, update it to use the same pattern for
consistency.

tj: Applied the change to idle workers too and updated comments and
    patch description a bit.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2012-09-04 17:04:45 -07:00
K.Prasad 500ad2d8b0 perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled
While debugging a warning message on PowerPC while using hardware
breakpoints, it was discovered that when perf_event_disable is invoked
through hw_breakpoint_handler function with interrupts disabled, a
subsequent IPI in the code path would trigger a WARN_ON_ONCE message in
smp_call_function_single function.

This patch calls __perf_event_disable() when interrupts are already
disabled, instead of perf_event_disable().

Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
Signed-off-by: K.Prasad <Prasad.Krishnan@gmail.com>
[naveen.n.rao@linux.vnet.ibm.com: v3: Check to make sure we target current task]
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120802081635.5811.17737.stgit@localhost.localdomain
[ Fixed build error on MIPS. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 17:29:53 +02:00
Al Viro a6fa941d94 perf_event: Switch to internal refcount, fix race with close()
Don't mess with file refcounts (or keep a reference to file, for
that matter) in perf_event.  Use explicit refcount of its own
instead.  Deal with the race between the final reference to event
going away and new children getting created for it by use of
atomic_long_inc_not_zero() in inherit_event(); just have the
latter free what it had allocated and return NULL, that works
out just fine (children of siblings of something doomed are
created as singletons, same as if the child of leader had been
created and immediately killed).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 17:29:22 +02:00
Randy Dunlap 9450d57eab sched: Fix kernel-doc warnings in kernel/sched/fair.c
Fix two kernel-doc warnings in kernel/sched/fair.c:

  Warning(kernel/sched/fair.c:3660): Excess function parameter 'cpus' description in 'update_sg_lb_stats'
  Warning(kernel/sched/fair.c:3806): Excess function parameter 'cpus' description in 'update_sd_lb_stats'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/50303714.3090204@xenotime.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 14:30:49 +02:00
Peter Boonstoppel a4c96ae319 sched: Unthrottle rt runqueues in __disable_runtime()
migrate_tasks() uses _pick_next_task_rt() to get tasks from the
real-time runqueues to be migrated. When rt_rq is throttled
_pick_next_task_rt() won't return anything, in which case
migrate_tasks() can't move all threads over and gets stuck in an
infinite loop.

Instead unthrottle rt runqueues before migrating tasks.

Additionally: move unthrottle_offline_cfs_rqs() to rq_offline_fair()

Signed-off-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/5FBF8E85CA34454794F0F7ECBA79798F379D3648B7@HQMAIL04.nvidia.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 14:30:30 +02:00
Charles Wang 749c8814f0 sched: Add missing call to calc_load_exit_idle()
Azat Khuzhin reported high loadavg in Linux v3.6

After checking the upstream scheduler code, I found Peter's commit:

  5167e8d541 sched/nohz: Rewrite and fix load-avg computation -- again

not fully applied, missing the call to calc_load_exit_idle().

After that idle exit in sampling window will always be calculated
to non-idle, and the load will be higher than normal.

This patch adds the missing call to calc_load_exit_idle().

Signed-off-by: Charles Wang <muming.wq@taobao.com>
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1345449754-27130-1-git-send-email-muming.wq@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 14:30:29 +02:00
Peter Zijlstra f319da0c68 sched: Fix load avg vs cpu-hotplug
Rabik and Paul reported two different issues related to the same few
lines of code.

Rabik's issue is that the nr_uninterruptible migration code is wrong in
that he sees artifacts due to this (Rabik please do expand in more
detail).

Paul's issue is that this code as it stands relies on us using
stop_machine() for unplug, we all would like to remove this assumption
so that eventually we can remove this stop_machine() usage altogether.

The only reason we'd have to migrate nr_uninterruptible is so that we
could use for_each_online_cpu() loops in favour of
for_each_possible_cpu() loops, however since nr_uninterruptible() is the
only such loop and its using possible lets not bother at all.

The problem Rabik sees is (probably) caused by the fact that by
migrating nr_uninterruptible we screw rq->calc_load_active for both rqs
involved.

So don't bother with fancy migration schemes (meaning we now have to
keep using for_each_possible_cpu()) and instead fold any nr_active delta
after we migrate all tasks away to make sure we don't have any skewed
nr_active accounting.

Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1345454817.23018.27.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04 14:30:18 +02:00
John Stultz cee58483cf time: Move ktime_t overflow checking into timespec_valid_strict
Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526c ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.

Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-01 10:24:48 -07:00
Linus Torvalds 7ca63ee1b0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree contains misc fixlets: a perf script python binding fix, a
  uprobes fix and a syscall tracing fix."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Add missing files to build the python binding
  uprobes: Fix mmap_region()'s mm->mm_rb corruption if uprobe_mmap() fails
  tracing/syscalls: Fix perf syscall tracing when syscall_nr == -1
2012-08-23 21:48:41 -07:00
Linus Torvalds b5bc0c7054 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "Mostly small fixes for the fallout of the timekeeping overhaul in 3.6
  along with stable fixes to address an accumulation problem and missing
  sanity checks for RTC readouts and user space provided values."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Avoid making adjustments if we haven't accumulated anything
  time: Avoid potential shift overflow with large shift values
  time: Fix casting issue in timekeeping_forward_now
  time: Ensure we normalize the timekeeper in tk_xtime_add
  time: Improve sanity checking of timekeeping inputs
2012-08-23 21:46:57 -07:00