Commit Graph

17356 Commits

Author SHA1 Message Date
Tejun Heo
51ffe41178 cpuset: convert away from cftype->read()
In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.

All users of cftype->read() can be easily served, usually better, by
seq_file and other methods.  Rename cpuset_common_file_read() to
cpuset_common_read_seq_string() and convert it to use
read_seq_string() interface instead.  This not only simplifies the
code but also makes it more versatile.  Before, the file couldn't
output if the result is longer than PAGE_SIZE.  After the conversion,
seq_file automatically grows the buffer until the output can fit.

This patch doesn't make any visible behavior changes except for being
able to handle output larger than PAGE_SIZE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-12-05 12:28:02 -05:00
Tejun Heo
44ffc75ba9 cgroup, sched: convert away from cftype->read_map()
In preparation of conversion to kernfs, cgroup file handling is being
consolidated so that it can be easily mapped to the seq_file based
interface of kernfs.

cftype->read_map() doesn't add any value and being replaced with
->read_seq_string().  Update cpu_stats_show() and cpuacct_stats_show()
accordingly.

This patch doesn't make any visible behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2013-12-05 12:28:01 -05:00
Mathias Krause
c0e656b7a6 padata: Fix wrong usage of rcu_dereference()
A kernel with enabled lockdep complains about the wrong usage of
rcu_dereference() under a rcu_read_lock_bh() protected region.

  ===============================
  [ INFO: suspicious RCU usage. ]
  3.13.0-rc1+ #126 Not tainted
  -------------------------------
  linux/kernel/padata.c:115 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by cryptomgr_test/153:
   #0:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8115c235>] padata_do_parallel+0x5/0x270

Fix that by using rcu_dereference_bh() instead.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-12-05 21:28:42 +08:00
Wanpeng Li
40ea2b42d7 sched/numa: Drop idx field of task_numa_env struct
Drop unused idx field of task_numa_env struct.

Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1386241817-5051-2-git-send-email-liwanp@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-05 13:38:36 +01:00
Linus Torvalds
1ab231b274 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:

 - timekeeping: Cure a subtle drift issue on GENERIC_TIME_VSYSCALL_OLD

 - nohz: Make CONFIG_NO_HZ=n and nohz=off command line option behave the
   same way.  Fixes a long standing load accounting wreckage.

 - clocksource/ARM: Kconfig update to avoid ARM=n wreckage

 - clocksource/ARM: Fixlets for the AT91 and SH clocksource/clockevents

 - Trivial documentation update and kzalloc conversion from akpms pile

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off
  time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD
  clocksource: arm_arch_timer: Hide eventstream Kconfig on non-ARM
  clocksource: sh_tmu: Add clk_prepare/unprepare support
  clocksource: sh_tmu: Release clock when sh_tmu_register() fails
  clocksource: sh_mtu2: Add clk_prepare/unprepare support
  clocksource: sh_mtu2: Release clock when sh_mtu2_register() fails
  ARM: at91: rm9200: switch back to clockevents_config_and_register
  tick: Document tick_do_timer_cpu
  timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
  NOHZ: Check for nohz active instead of nohz enabled
2013-12-04 08:52:09 -08:00
Ingo Molnar
a934a56e12 Merge branch 'timers/core-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core
Pull dynticks updates from Frederic Weisbecker:

  * Fix a bug where posix cpu timers requeued due to interval got ignored on full
    dynticks CPUs (not a regression though as it only impacts full dynticks and the
    bug is there since we merged full dynticks).

  * Optimizations and cleanups on the use of per CPU APIs to improve code readability,
    performance and debuggability in the nohz subsystem;

  * Optimize posix cpu timer by sparing stub workqueue queue with full dynticks off case

  * Rename some functions to extend with *_this_cpu() suffix for clarity

  * Refine the naming of some context tracking subsystem state accessors

  * Trivial spelling fix by Paul Gortmaker

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-04 10:09:53 +01:00
Felipe Contreras
88a88b320a params: improve standard definitions
We are repeating the functionality of kstrtol in param_set_long, and the
same for kstrtoint. We can get rid of the extra code by using the right
functions.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-12-04 14:09:46 +10:30
Paul E. McKenney
3947909814 rcu: Let the world know when RCU adjusts its geometry
Some RCU bugs have been specific to the layout of the rcu_node tree,
but RCU will silently adjust the tree at boot time if appropriate.
This obscures valuable debugging information, so print a message when
this happens.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 10:10:19 -08:00
Paul E. McKenney
4461212aa0 rcu: Fix srcu_barrier() docbook header
The srcu_barrier() docbook header left out the "sp" argument, so this
commit adds that argument's docbook text.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 10:10:19 -08:00
Paul E. McKenney
3a5924052a rcu: Allow task-level idle entry/exit nesting
The current task-level idle entry/exit code forces an entry/exit on
each call, regardless of the nesting level.  This commit therefore
properly accounts for nesting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-12-03 10:10:19 -08:00
Paul E. McKenney
96d3fd0d31 rcu: Break call_rcu() deadlock involving scheduler and perf
Dave Jones got the following lockdep splat:

>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.12.0-rc3+ #92 Not tainted
>  -------------------------------------------------------
>  trinity-child2/15191 is trying to acquire lock:
>   (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
>
> but task is already holding lock:
>   (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&ctx->lock){-.-...}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>         [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
>         [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
>         [<ffffffff81732052>] __schedule+0x1d2/0xa20
>         [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
>         [<ffffffff817352b6>] retint_kernel+0x26/0x30
>         [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
>         [<ffffffff813f0504>] pty_write+0x54/0x60
>         [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
>         [<ffffffff813e5838>] tty_write+0x158/0x2d0
>         [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
>         [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
>         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> -> #2 (&rq->lock){-.-.-.}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>         [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
>         [<ffffffff81054336>] do_fork+0x126/0x460
>         [<ffffffff81054696>] kernel_thread+0x26/0x30
>         [<ffffffff8171ff93>] rest_init+0x23/0x140
>         [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
>         [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
>         [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
>
> -> #1 (&p->pi_lock){-.-.-.}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>         [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
>         [<ffffffff81097d62>] default_wake_function+0x12/0x20
>         [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
>         [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
>         [<ffffffff8108ff59>] __wake_up+0x39/0x50
>         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>         [<ffffffff81111450>] __call_rcu+0x140/0x820
>         [<ffffffff81111b8d>] call_rcu+0x1d/0x20
>         [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
>         [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
>         [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
>         [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
>         [<ffffffff817200be>] kernel_init+0xe/0x190
>         [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
>
> -> #0 (&rdp->nocb_wq){......}:
>         [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>         [<ffffffff8108ff43>] __wake_up+0x23/0x50
>         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>         [<ffffffff81111450>] __call_rcu+0x140/0x820
>         [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>         [<ffffffff81149abf>] put_ctx+0x4f/0x70
>         [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>         [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>         [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>         [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> other info that might help us debug this:
>
> Chain exists of:
>   &rdp->nocb_wq --> &rq->lock --> &ctx->lock
>
>   Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(&ctx->lock);
>                                 lock(&rq->lock);
>                                 lock(&ctx->lock);
>    lock(&rdp->nocb_wq);
>
>  *** DEADLOCK ***
>
> 1 lock held by trinity-child2/15191:
>  #0:  (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> stack backtrace:
> CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
>  ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
>  ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
>  ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
> Call Trace:
>  [<ffffffff8172a363>] dump_stack+0x4e/0x82
>  [<ffffffff81726741>] print_circular_bug+0x200/0x20f
>  [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>  [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
>  [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
>  [<ffffffff810cc243>] lock_acquire+0x93/0x200
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8108ff43>] __wake_up+0x23/0x50
>  [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>  [<ffffffff81111450>] __call_rcu+0x140/0x820
>  [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
>  [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>  [<ffffffff81149abf>] put_ctx+0x4f/0x70
>  [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>  [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>  [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>  [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>  [<ffffffff8173d4e4>] tracesys+0xdd/0xe2

The underlying problem is that perf is invoking call_rcu() with the
scheduler locks held, but in NOCB mode, call_rcu() will with high
probability invoke the scheduler -- which just might want to use its
locks.  The reason that call_rcu() needs to invoke the scheduler is
to wake up the corresponding rcuo callback-offload kthread, which
does the job of starting up a grace period and invoking the callbacks
afterwards.

One solution (championed on a related problem by Lai Jiangshan) is to
simply defer the wakeup to some point where scheduler locks are no longer
held.  Since we don't want to unnecessarily incur the cost of such
deferral, the task before us is threefold:

1.	Determine when it is likely that a relevant scheduler lock is held.

2.	Defer the wakeup in such cases.

3.	Ensure that all deferred wakeups eventually happen, preferably
	sooner rather than later.

We use irqs_disabled_flags() as a proxy for relevant scheduler locks
being held.  This works because the relevant locks are always acquired
with interrupts disabled.  We may defer more often than needed, but that
is at least safe.

The wakeup deferral is tracked via a new field in the per-CPU and
per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.

This flag is checked by the RCU core processing.  The __rcu_pending()
function now checks this flag, which causes rcu_check_callbacks()
to initiate RCU core processing at each scheduling-clock interrupt
where this flag is set.  Of course this is not sufficient because
scheduling-clock interrupts are often turned off (the things we used to
be able to count on!).  So the flags are also checked on entry to any
state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
state and NO_HZ_FULL user-mode-execution state.

This approach should allow call_rcu() to be invoked regardless of what
locks you might be holding, the key word being "should".

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
2013-12-03 10:10:18 -08:00
Paul E. McKenney
78e4bc34e5 rcu: Fix and comment ordering around wait_event()
It is all too easy to forget that wait_event() does not necessarily
imply a full memory barrier.  The case where it does not is where the
condition transitions to true just as wait_event() starts execution.
This is actually a feature: The standard use of wait_event() involves
locking, in which case the locks provide the needed ordering (you hold a
lock across the wake_up() and acquire that same lock after wait_event()
returns).

Given that I did forget that wait_event() does not necessarily imply a
full memory barrier in one case, this commit fixes that case.  This commit
also adds comments calling out the placement of existing memory barriers
relied on by wait_event() calls.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 10:10:18 -08:00
Paul E. McKenney
6193c76aba rcu: Kick CPU halfway to RCU CPU stall warning
When an RCU CPU stall warning occurs, the CPU invokes resched_cpu() on
itself.  This can help move the grace period forward in some situations,
but it would be even better to do this -before- the RCU CPU stall warning.
This commit therefore causes resched_cpu() to be called every five jiffies
once the system is halfway to an RCU CPU stall warning.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-12-03 10:10:18 -08:00
Frederic Weisbecker
c925077c33 posix-timers: Fix full dynticks CPUs kick on timer rescheduling
A posix CPU timer can be rearmed while it is firing or after it is
notified with a signal. This can happen for example with timers that
were set with a non zero interval in timer_settime().

This rearming can happen in two places:

1) On timer firing time, which happens on the target's tick. If the timer
can't trigger a signal because it is ignored, it reschedules itself
to honour the timer interval.

2) On signal handling from the timer's notification target. This one
can be a different task than the timer's target itself. Once the
signal is notified, the notification target rearms the timer, again
to honour the timer interval.

When a timer is rearmed, we need to notify the full dynticks CPUs
such that they restart their tick in case they are running tasks that
may have a share in elapsing this timer.

Now the 1st case above handles full dynticks CPUs with a call to
posix_cpu_timer_kick_nohz() from the posix cpu timer firing code. But
the second case ignores the fact that some CPUs may run non-idle tasks
with their tick off. As a result, when a timer is resheduled after its signal
notification, the full dynticks CPUs may completely ignore it and not
tick on the timer as expected

This patch fixes this bug by handling both cases in one. All we need
is to move the kick to the rearming common code in posix_cpu_timer_schedule().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Olivier Langlois <olivier@olivierlanglois.net>
2013-12-02 20:46:27 +01:00
Frederic Weisbecker
d4283c6541 posix-timers: Spare workqueue if there is no full dynticks CPU to kick
After a posix cpu timer is set, a workqueue is scheduled in order to
kick the full dynticks CPUs and let them restart their tick if
necessary in case the task they are running is concerned by the
new timer.

This kick is implemented by way of IPIs, which require interrupts
to be enabled, hence the need for a workqueue to raise them because
the posix cpu timer set path has interrupts disabled.

Now if there is no full dynticks CPU on the system, the workqueue is
still scheduled but it simply won't send any IPI and return immediately.

So lets spare that worqueue when it is not needed.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 20:43:16 +01:00
Frederic Weisbecker
58135f574f context_tracking: Wrap static key check into more intuitive function name
Use a function with a meaningful name to check the global context
tracking state. static_key_false() is a bit confusing for reviewers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 20:43:14 +01:00
Frederic Weisbecker
e8fcaa5c54 nohz: Convert a few places to use local per cpu accesses
A few functions use remote per CPU access APIs when they
deal with local values.

Just do the right conversion to improve performance, code
readability and debug checks.

While at it, lets extend some of these function names with *_this_cpu()
suffix in order to display their purpose more clearly.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-12-02 20:39:30 +01:00
Linus Torvalds
a45299e727 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 - Correction of fuzzy and fragile IRQ_RETVAL macro
 - IRQ related resume fix affecting only XEN
 - ARM/GIC fix for chained GIC controllers

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: Gic: fix boot for chained gics
  irq: Enable all irqs unconditionally in irq_resume
  genirq: Correct fuzzy and fragile IRQ_RETVAL() definition
2013-12-02 10:15:39 -08:00
Linus Torvalds
a0b57ca33e Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Various smaller fixlets, all over the place"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/doc: Fix generation of device-drivers
  sched: Expose preempt_schedule_irq()
  sched: Fix a trivial typo in comments
  sched: Remove unused variable in 'struct sched_domain'
  sched: Avoid NULL dereference on sd_busy
  sched: Check sched_domain before computing group power
  MAINTAINERS: Update file patterns in the lockdep and scheduler entries
2013-12-02 10:13:44 -08:00
Linus Torvalds
e321ae4c20 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc kernel and tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools lib traceevent: Fix conversion of pointer to integer of different size
  perf/trace: Properly use u64 to hold event_id
  perf: Remove fragile swevent hlist optimization
  ftrace, perf: Avoid infinite event generation loop
  tools lib traceevent: Fix use of multiple options in processing field
  perf header: Fix possible memory leaks in process_group_desc()
  perf header: Fix bogus group name
  perf tools: Tag thread comm as overriden
2013-12-02 10:13:09 -08:00
Leonardo Potenza
e0c7855e36 PM / hibernate: export hibernation_set_ops
To support the ability to implement PM hibernation code as modules
the hibernation_set_ops function requires to be exported.

Similar solution already available for suspend_set_ops
(please refer to commit a5e4fd8783).

Signed-off-by: Leonardo Potenza <leonardo.potenza@intel.com>
Signed-off-by: Edwin Verplanke <edwin.verplanke@intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-30 15:12:16 +01:00
Linus Torvalds
7224b31bd5 Merge branch 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "This contains one important fix.  The NUMA support added a while back
  broke ordering guarantees on ordered workqueues.  It was enforced by
  having single frontend interface with @max_active == 1 but the NUMA
  support puts multiple interfaces on unbound workqueues on NUMA
  machines thus breaking the ordered guarantee.  This is fixed by
  disabling NUMA support on ordered workqueues.

  The above and a couple other patches were sitting in for-3.12-fixes
  but I forgot to push that out, so they ended up waiting a bit too
  long.  My aplogies.

  Other fixes are minor"

* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues
  workqueue: fix comment typo for __queue_work()
  workqueue: fix ordered workqueues in NUMA setups
  workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
2013-11-29 09:49:08 -08:00
Linus Torvalds
2855987d13 Merge branch 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Fixes for three issues.

   - cgroup destruction path could swamp system_wq possibly leading to
     deadlock.  This actually seems to happen in the wild with memcg
     because memcg destruction path adds nested dependency on system_wq.

     Resolved by isolating cgroup destruction work items on its
     dedicated workqueue.

   - Possible locking context deadlock through seqcount reported by
     lockdep

   - Memory leak under certain conditions"

* 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix cgroup_subsys_state leak for seq_files
  cpuset: Fix memory allocator deadlock
  cgroup: use a dedicated workqueue for cgroup destruction
2013-11-29 09:47:06 -08:00
Tejun Heo
afb2bc14e1 cgroup: don't guarantee cgroup.procs is sorted if sane_behavior
For some reason, tasks and cgroup.procs guarantee that the result is
sorted.  This is the only reason this whole pidlist logic is necessary
instead of just iterating through sorted member tasks.  We can't do
anything about the existing interface but at least ensure that such
expectation doesn't exist for the new interface so that pidlist logic
may be removed in the distant future.

This patch scrambles the sort order if sane_behavior so that the
output is usually not sorted in the new interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 10:42:59 -05:00
Tejun Heo
045023658c cgroup: remove cgroup_pidlist->use_count
After the recent changes, pidlist ref is held only between
cgroup_pidlist_start() and cgroup_pidlist_stop() during which
cgroup->pidlist_mutex is also held.  IOW, the reference count is
redundant now.  While in use, it's always one and pidlist_mutex is
held - holding the mutex has exactly the same protection.

This patch collapses destroy_dwork queueing into cgroup_pidlist_stop()
so that pidlist_mutex is not released inbetween and drops
pidlist->use_count.

This patch shouldn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-11-29 10:42:59 -05:00