Commit Graph

633 Commits

Author SHA1 Message Date
Linus Torvalds
aafd9d6a46 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer/dynticks updates from Ingo Molnar:
 "This tree contains misc dynticks updates: a fix and three cleanups"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
  nohz_full: fix code style issue of tick_nohz_full_stop_tick
  nohz: Get timekeeping max deferment outside jiffies_lock
  tick: Rename tick_check_idle() to tick_irq_enter()
2014-01-31 09:02:51 -08:00
Linus Torvalds
595bf999e3 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A crash fix and documentation updates"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Make sched_class::get_rr_interval() optional
  sched/deadline: Add sched_dl documentation
  sched: Fix docbook parameter annotation error in wait.h
2014-01-31 09:00:44 -08:00
Peter Zijlstra
a57beec5d4 sched: Make sched_class::get_rr_interval() optional
Not all classes implement (or can implement) a useful get_rr_interval()
function, default to a 0 time-slice for them.

This fixes a crash reported by Tommi Rantala.

Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140127105413.GC11314@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:08:41 +01:00
Dario Faggioli
712e5e34ae sched/deadline: Add sched_dl documentation
Add in Documentation/scheduler/ some hints about the design
choices, the usage and the future possible developments of the
sched_dl scheduling class and of the SCHED_DEADLINE policy.

Reviewed-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
[ Re-wrote sections 2 and 3. ]
Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390821615-23247-1-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-28 13:08:40 +01:00
Linus Torvalds
f6d13daadd Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A couple of regression fixes mostly hitting virtualized setups, but
  also some bare metal systems"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/x86/tsc: Initialize multiplier to 0
  sched/clock: Fixup early initialization
  sched/preempt/x86: Fix voluntary preempt for x86
  Revert "sched: Fix sleep time double accounting in enqueue entity"
2014-01-25 11:11:31 -08:00
Ingo Molnar
a2b4c607c9 Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/urgent
Pull dynticks cleanups from Frederic Weisbecker.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25 08:27:26 +01:00
Andi Kleen
54a43d5498 numa: add a sysctl for numa_balancing
Add a working sysctl to enable/disable automatic numa memory balancing
at runtime.

This allows us to track down performance problems with this feature and
is generally a good idea.

This was possible earlier through debugfs, but only with special
debugging options set.  Also fix the boot message.

[akpm@linux-foundation.org: s/sched_numa_balancing/sysctl_numa_balancing/]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-23 16:36:51 -08:00
Peter Zijlstra
d375b4e0fa sched/clock: Fixup early initialization
The code would assume sched_clock_stable() and switch to !stable
later, this switch brings a discontinuity in time.

The discontinuity on switching from stable to unstable was always
present, but previously we would set stable/unstable before
initializing TSC and usually stick to the one we start out with.

So the static_key bits brought an extra switch where there previously
wasn't one.

Things are further complicated by the fact that we cannot use
static_key as early as we usually call set_sched_clock_stable().

Fix things by tracking the stable state in a regular variable and only
set the static_key to the right state on sched_clock_init(), which is
ran right after late_time_init->tsc_init().

Before this we would not be using the TSC anyway.

Reported-and-Tested-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: dyoung@redhat.com
Fixes: 35af99e646 ("sched/clock, x86: Use a static_key for sched_clock_stable")
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: paulmck@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: rui.zhang@intel.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140122115918.GG3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:36 +01:00
Vincent Guittot
9390675af0 Revert "sched: Fix sleep time double accounting in enqueue entity"
This reverts commit 282cf499f0.

With the current implementation, the load average statistics of a sched entity
change according to other activity on the CPU even if this activity is done
between the running window of the sched entity and have no influence on the
running duration of the task.

When a task wakes up on the same CPU, we currently update last_runnable_update
with the return  of __synchronize_entity_decay without updating the
runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync
the load_contrib of the se with the rq's blocked_load_contrib before removing
it from the latter (with __synchronize_entity_decay) but we must keep
last_runnable_update unchanged for updating runnable_avg_sum/period during the
next update_entity_load_avg.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: pjt@google.com
Cc: alex.shi@linaro.org
Link: http://lkml.kernel.org/r/1390376734-6800-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23 14:48:34 +01:00
Linus Torvalds
df32e43a54 Merge branch 'akpm' (incoming from Andrew)
Merge first patch-bomb from Andrew Morton:

 - a couple of misc things

 - inotify/fsnotify work from Jan

 - ocfs2 updates (partial)

 - about half of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  mm/migrate: remove unused function, fail_migrate_page()
  mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages
  mm/migrate: correct failure handling if !hugepage_migration_support()
  mm/migrate: add comment about permanent failure path
  mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure
  mm: compaction: reset scanner positions immediately when they meet
  mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
  mm: compaction: detect when scanners meet in isolate_freepages
  mm: compaction: reset cached scanner pfn's before reading them
  mm: compaction: encapsulate defer reset logic
  mm: compaction: trace compaction begin and end
  memcg, oom: lock mem_cgroup_print_oom_info
  sched: add tracepoints related to NUMA task migration
  mm: numa: do not automatically migrate KSM pages
  mm: numa: trace tasks that fail migration due to rate limiting
  mm: numa: limit scope of lock for NUMA migrate rate limiting
  mm: numa: make NUMA-migrate related functions static
  lib/show_mem.c: show num_poisoned_pages when oom
  mm/hwpoison: add '#' to hwpoison_inject
  mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter
  ...
2014-01-21 19:05:45 -08:00
Linus Torvalds
f075e0f699 Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "The bulk of changes are cleanups and preparations for the upcoming
  kernfs conversion.

   - cgroup_event mechanism which is and will be used only by memcg is
     moved to memcg.

   - pidlist handling is updated so that it can be served by seq_file.

     Also, the list is not sorted if sane_behavior.  cgroup
     documentation explicitly states that the file is not sorted but it
     has been for quite some time.

   - All cgroup file handling now happens on top of seq_file.  This is
     to prepare for kernfs conversion.  In addition, all operations are
     restructured so that they map 1-1 to kernfs operations.

   - Other cleanups and low-pri fixes"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits)
  cgroup: trivial style updates
  cgroup: remove stray references to css_id
  doc: cgroups: Fix typo in doc/cgroups
  cgroup: fix fail path in cgroup_load_subsys()
  cgroup: fix missing unlock on error in cgroup_load_subsys()
  cgroup: remove for_each_root_subsys()
  cgroup: implement for_each_css()
  cgroup: factor out cgroup_subsys_state creation into create_css()
  cgroup: combine css handling loops in cgroup_create()
  cgroup: reorder operations in cgroup_create()
  cgroup: make for_each_subsys() useable under cgroup_root_mutex
  cgroup: css iterations and css_from_dir() are safe under cgroup_mutex
  cgroup: unify pidlist and other file handling
  cgroup: replace cftype->read_seq_string() with cftype->seq_show()
  cgroup: attach cgroup_open_file to all cgroup files
  cgroup: generalize cgroup_pidlist_open_file
  cgroup: unify read path so that seq_file is always used
  cgroup: unify cgroup_write_X64() and cgroup_write_string()
  cgroup: remove cftype->read(), ->read_map() and ->write()
  hugetlb_cgroup: convert away from cftype->read()
  ...
2014-01-21 17:51:34 -08:00
Mel Gorman
286549dcaf sched: add tracepoints related to NUMA task migration
This patch adds three tracepoints
 o trace_sched_move_numa	when a task is moved to a node
 o trace_sched_swap_numa	when a task is swapped with another task
 o trace_sched_stick_numa	when a numa-related migration fails

The tracepoints allow the NUMA scheduler activity to be monitored and the
following high-level metrics can be calculated

 o NUMA migrated stuck	 nr trace_sched_stick_numa
 o NUMA migrated idle	 nr trace_sched_move_numa
 o NUMA migrated swapped nr trace_sched_swap_numa
 o NUMA local swapped	 trace_sched_swap_numa src_nid == dst_nid (should never happen)
 o NUMA remote swapped	 trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped)
 o NUMA group swapped	 trace_sched_swap_numa src_ngid == dst_ngid
			 Maybe a small number of these are acceptable
			 but a high number would be a major surprise.
			 It would be even worse if bounces are frequent.
 o NUMA avg task migs.	 Average number of migrations for tasks
 o NUMA stddev task mig	 Self-explanatory
 o NUMA max task migs.	 Maximum number of migrations for a single task

In general the intent of the tracepoints is to help diagnose problems
where automatic NUMA balancing appears to be doing an excessive amount
of useless work.

[akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:48 -08:00
Peter Zijlstra
eaad45132c sched: Fix __sched_setscheduler() nice test
With the introduction of sched_attr::sched_nice we need to check
if we've got permission to actually change the nice value.

Daniel found that can_nice() would always fail; and upon
inspection it turns out that can_nice() only tests to see if we
can lower the nice value, but it doesn't validate if we're
lowering or not.

Therefore amend the test to only call can_nice() when we lower
the nice value.

Reported-and-Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/20140116165425.GA9481@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 18:07:08 +01:00
Peter Zijlstra
7479f3c9cf sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
I noticed the new sched_{set,get}attr() calls didn't properly deal
with the SCHED_RESET_ON_FORK hack.

Instead of propagating the flags in high bits nonsense use the brand
spanking new attr::sched_flags field.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Link: http://lkml.kernel.org/r/20140115162242.GJ31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:17 +01:00
Peter Zijlstra
0bb040a443 sched: Fix up attr::sched_priority warning
Fengguang Wu reported the following build warning:

  > kernel/sched/core.c:3067 __sched_setscheduler() warn: unsigned 'attr->sched_priority' is never less than zero.

Since it doesn't make sense for attr::sched_priority to be negative,
remove the check, since we already test for an upper limit any actual
negative values passed in through the old param::sched_priority field
will still be detected.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/n/tip-fid9nalzii2r5voxtf4eh5kz@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:16 +01:00
Peter Zijlstra
39fd8fd22b sched: Fix up scheduler syscall LTP fails
Wu reported LTP failures:

  > ltp.sched_setparam02.1.TFAIL
  > ltp.sched_setparam02.2.TFAIL
  > ltp.sched_setparam02.3.TFAIL
  > ltp.sched_setparam03.1.TFAIL

There were 2 things wrong; firstly __setscheduler() failed on
sched_setparam()'s policy = -1, fix that by reading from p->policy in
that case.

Secondly, getparam() (and getattr()) would still report !0
sched_priority for !FIFO/RR tasks after having been such. So
unconditionally set p->rt_priority.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Dario Faggioli <raistlin@linux.it>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/20140115153320.GH31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:15 +01:00
Peter Zijlstra
e3de300d12 sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
Previously sched_setscheduler() and sched_setparam() would not affect
the nice value of a task, restore this behaviour.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: raistlin@linux.it
Cc: juri.lelli@gmail.com
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/20140115113015.GB31570@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:14 +01:00
Juri Lelli
5778fccf36 sched/core: Fix htmldocs warnings
Fengguang Wu's kbuild test robot reported the following new htmldocs warnings:

  >>> Warning(kernel/sched/core.c:3380): No description found for parameter 'uattr'
  >>> Warning(kernel/sched/core.c:3380): Excess function parameter 'attr' description in 'sys_sched_setattr'
  >>> Warning(kernel/sched/core.c:3520): No description found for parameter 'uattr'
  >>> Warning(kernel/sched/core.c:3520): Excess function parameter 'attr' description in 'sys_sched_getattr'

The second argument to sys_sched_{setattr,getattr}() is named uattr (not attr).

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Dario Faggioli <raistlin@linux.it>
Fixes: d50dde5a10 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Link: http://lkml.kernel.org/r/52D5552D.5000102@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:12 +01:00
Juri Lelli
71362650b5 sched/deadline: No need to check p if dl_se is valid
Dan Carpenter reported new 'Smatch' warnings:

  > tree:   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core
  > head:   130816ce4d
  > commit: 1baca4ce16 [17/50] sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic
  >
  > kernel/sched/deadline.c:937 pick_next_task_dl() warn: variable dereferenced before check 'p' (see line 934)

BUG_ON() already fires if pick_next_dl_entity() doesn't return a valid
dl_se. No need to check if p is valid afterward.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Fixes: 1baca4ce16 ("sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic")
Link: http://lkml.kernel.org/r/52D54E25.6060100@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:11 +01:00
Peter Zijlstra
d8bf52311e sched/deadline: Remove unused variables
fix these new sparse warnings:

  >> kernel/sched/core.c:305:14: sparse: symbol 'sysctl_sched_dl_period' was not declared. Should it be static?
  >> kernel/sched/core.c:306:5: sparse: symbol 'sysctl_sched_dl_runtime' was not declared. Should it be static?

Better still, they're completely unused so remove them.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Link: http://lkml.kernel.org/n/tip-ke0shkG7vMnzmcdqhhiymyem@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:10 +01:00
Fengguang Wu
88f1ebbc25 sched/deadline: Fix sparse static warnings
new sparse warnings:

  >> kernel/sched/cpudeadline.c:38:6: sparse: symbol 'cpudl_exchange' was not declared. Should it be static?
  >> kernel/sched/cpudeadline.c:46:6: sparse: symbol 'cpudl_heapify' was not declared. Should it be static?
  >> kernel/sched/cpudeadline.c:71:6: sparse: symbol 'cpudl_change_key' was not declared. Should it be static?
  >> kernel/sched/cpudeadline.c:195:15: sparse: memset with byte count of 163928

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Fixes: 6bfd6d72f5 ("sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap")
Link: http://lkml.kernel.org/r/52d47f8c.EYJsA5+mELPBk4t6\%fengguang.wu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-16 09:27:03 +01:00
Kevin Hilman
8fe8ff09ce sched/nohz: Fix overflow error in scheduler_tick_max_deferment()
While calculating the scheduler tick max deferment, the delta is
converted from microseconds to nanoseconds through a multiplication
against NSEC_PER_USEC.

But this microseconds operand is an unsigned int, thus the result may
likely overflow. The result is cast to u64 but only once the operation
is completed, which is too late to avoid overflown result.

This is currently not a problem because the scheduler tick max deferment
is 1 second. But this may become an issue as we plan to make this
value tunable.

So lets fix this by casting the usecs value to u64 before multiplying by
NSECS_PER_USEC.

Also to prevent from this kind of mistake to happen again, move this
ad-hoc jiffies -> nsecs conversion to a new helper.

Signed-off-by: Kevin Hilman <khilman@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1387315388-31676-2-git-send-email-khilman@linaro.org
[move ad-hoc conversion to jiffies_to_nsecs helper]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-01-16 00:08:12 +01:00
Peter Zijlstra
8cb75e0c4e sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
With various drivers wanting to inject idle time; we get people
calling idle routines outside of the idle loop proper.

Therefore we need to be extra careful about not missing
TIF_NEED_RESCHED -> PREEMPT_NEED_RESCHED propagations.

While looking at this, I also realized there's a small window in the
existing idle loop where we can miss TIF_NEED_RESCHED; when it hits
right after the tif_need_resched() test at the end of the loop but
right before the need_resched() test at the start of the loop.

So move preempt_fold_need_resched() out of the loop where we're
guaranteed to have TIF_NEED_RESCHED set.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-x9jgh45oeayzajz2mjt0y7d6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:38:55 +01:00
Peter Zijlstra
6577e42a3e sched/clock: Fix up clear_sched_clock_stable()
The below tells us the static_key conversion has a problem; since the
exact point of clearing that flag isn't too important, delay the flip
and use a workqueue to process it.

[ ] TSC synchronization [CPU#0 -> CPU#22]:
[ ] Measured 8 cycles TSC warp between CPUs, turning off TSC clock.
[ ]
[ ] ======================================================
[ ] [ INFO: possible circular locking dependency detected ]
[ ] 3.13.0-rc3-01745-g848b0d0322cb-dirty #637 Not tainted
[ ] -------------------------------------------------------
[ ] swapper/0/1 is trying to acquire lock:
[ ]  (jump_label_mutex){+.+...}, at: [<ffffffff8115a637>] jump_label_lock+0x17/0x20
[ ]
[ ] but task is already holding lock:
[ ]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
[ ]
[ ] which lock already depends on the new lock.
[ ]
[ ]
[ ] the existing dependency chain (in reverse order) is:
[ ]
[ ] -> #1 (cpu_hotplug.lock){+.+.+.}:
[ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
[ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
[ ]        [<ffffffff81093fdc>] get_online_cpus+0x3c/0x60
[ ]        [<ffffffff8104cc67>] arch_jump_label_transform+0x37/0x130
[ ]        [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
[ ]        [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
[ ]        [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
[ ]        [<ffffffff810c0f65>] sched_feat_set+0xf5/0x100
[ ]        [<ffffffff810c5bdc>] set_numabalancing_state+0x2c/0x30
[ ]        [<ffffffff81d12f3d>] numa_policy_init+0x1af/0x1b7
[ ]        [<ffffffff81cebdf4>] start_kernel+0x35d/0x41f
[ ]        [<ffffffff81ceb5a5>] x86_64_start_reservations+0x2a/0x2c
[ ]        [<ffffffff81ceb6a2>] x86_64_start_kernel+0xfb/0xfe
[ ]
[ ] -> #0 (jump_label_mutex){+.+...}:
[ ]        [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
[ ]        [<ffffffff810def00>] lock_acquire+0x90/0x130
[ ]        [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
[ ]        [<ffffffff8115a637>] jump_label_lock+0x17/0x20
[ ]        [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
[ ]        [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
[ ]        [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
[ ]        [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
[ ]        [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
[ ]        [<ffffffff810941cb>] _cpu_up+0xdb/0x160
[ ]        [<ffffffff810942c9>] cpu_up+0x79/0x90
[ ]        [<ffffffff81d0af6b>] smp_init+0x60/0x8c
[ ]        [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
[ ]        [<ffffffff8164e32e>] kernel_init+0xe/0x130
[ ]        [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
[ ]
[ ] other info that might help us debug this:
[ ]
[ ]  Possible unsafe locking scenario:
[ ]
[ ]        CPU0                    CPU1
[ ]        ----                    ----
[ ]   lock(cpu_hotplug.lock);
[ ]                                lock(jump_label_mutex);
[ ]                                lock(cpu_hotplug.lock);
[ ]   lock(jump_label_mutex);
[ ]
[ ]  *** DEADLOCK ***
[ ]
[ ] 2 locks held by swapper/0/1:
[ ]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81094037>] cpu_maps_update_begin+0x17/0x20
[ ]  #1:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8109408b>] cpu_hotplug_begin+0x2b/0x60
[ ]
[ ] stack backtrace:
[ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
[ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[ ]  ffffffff82c9c270 ffff880236843bb8 ffffffff8165c5f5 ffffffff82c9c270
[ ]  ffff880236843bf8 ffffffff81658c02 ffff880236843c80 ffff8802368586a0
[ ]  ffff880236858678 0000000000000001 0000000000000002 ffff880236858000
[ ] Call Trace:
[ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
[ ]  [<ffffffff81658c02>] print_circular_bug+0x1f9/0x207
[ ]  [<ffffffff810de141>] __lock_acquire+0x1701/0x1eb0
[ ]  [<ffffffff816680ff>] ? __atomic_notifier_call_chain+0x8f/0xb0
[ ]  [<ffffffff810def00>] lock_acquire+0x90/0x130
[ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
[ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
[ ]  [<ffffffff81661f83>] mutex_lock_nested+0x63/0x3e0
[ ]  [<ffffffff8115a637>] ? jump_label_lock+0x17/0x20
[ ]  [<ffffffff8115a637>] jump_label_lock+0x17/0x20
[ ]  [<ffffffff8115aa3b>] static_key_slow_inc+0x6b/0xb0
[ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
[ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
[ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
[ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
[ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
[ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
[ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
[ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
[ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
[ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
[ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ] ------------[ cut here ]------------
[ ] WARNING: CPU: 0 PID: 1 at /usr/src/linux-2.6/kernel/smp.c:374 smp_call_function_many+0xad/0x300()
[ ] Modules linked in:
[ ] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-01745-g848b0d0322cb-dirty #637
[ ] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[ ]  0000000000000009 ffff880236843be0 ffffffff8165c5f5 0000000000000000
[ ]  ffff880236843c18 ffffffff81093d8c 0000000000000000 0000000000000000
[ ]  ffffffff81ccd1a0 ffffffff810ca951 0000000000000000 ffff880236843c28
[ ] Call Trace:
[ ]  [<ffffffff8165c5f5>] dump_stack+0x4e/0x7a
[ ]  [<ffffffff81093d8c>] warn_slowpath_common+0x8c/0xc0
[ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
[ ]  [<ffffffff81093dda>] warn_slowpath_null+0x1a/0x20
[ ]  [<ffffffff8110b72d>] smp_call_function_many+0xad/0x300
[ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
[ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
[ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
[ ]  [<ffffffff8110ba96>] smp_call_function+0x46/0x80
[ ]  [<ffffffff8104f200>] ? arch_unregister_cpu+0x30/0x30
[ ]  [<ffffffff8110bb3c>] on_each_cpu+0x3c/0xa0
[ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
[ ]  [<ffffffff810ca951>] ? sched_clock_tick+0x1/0xa0
[ ]  [<ffffffff8104f964>] text_poke_bp+0x64/0xd0
[ ]  [<ffffffff810ca950>] ? sched_clock_idle_sleep_event+0x20/0x20
[ ]  [<ffffffff8104ccde>] arch_jump_label_transform+0xae/0x130
[ ]  [<ffffffff8115a3cf>] __jump_label_update+0x5f/0x80
[ ]  [<ffffffff8115a48d>] jump_label_update+0x9d/0xb0
[ ]  [<ffffffff8115aa6d>] static_key_slow_inc+0x9d/0xb0
[ ]  [<ffffffff810ca775>] clear_sched_clock_stable+0x15/0x20
[ ]  [<ffffffff810503b3>] mark_tsc_unstable+0x23/0x70
[ ]  [<ffffffff810772cb>] check_tsc_sync_source+0x14b/0x150
[ ]  [<ffffffff81076612>] native_cpu_up+0x3a2/0x890
[ ]  [<ffffffff810941cb>] _cpu_up+0xdb/0x160
[ ]  [<ffffffff810942c9>] cpu_up+0x79/0x90
[ ]  [<ffffffff81d0af6b>] smp_init+0x60/0x8c
[ ]  [<ffffffff81cebf42>] kernel_init_freeable+0x8c/0x197
[ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ]  [<ffffffff8164e32e>] kernel_init+0xe/0x130
[ ]  [<ffffffff8166beec>] ret_from_fork+0x7c/0xb0
[ ]  [<ffffffff8164e320>] ? rest_init+0xd0/0xd0
[ ] ---[ end trace 6ff1df5620c49d26 ]---
[ ] tsc: Marking TSC unstable due to check_tsc_sync_source failed

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-v55fgqj3nnyqnngmvuu8ep6h@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:15 +01:00
Peter Zijlstra
35af99e646 sched/clock, x86: Use a static_key for sched_clock_stable
In order to avoid the runtime condition and variable load turn
sched_clock_stable into a static_key.

Also provide a shorter implementation of local_clock() and
cpu_clock(int) when sched_clock_stable==1.

                        MAINLINE   PRE       POST

    sched_clock_stable: 1          1         1
    (cold) sched_clock: 329841     221876    215295
    (cold) local_clock: 301773     234692    220773
    (warm) sched_clock: 38375      25602     25659
    (warm) local_clock: 100371     33265     27242
    (warm) rdtsc:       27340      24214     24208
    sched_clock_stable: 0          0         0
    (cold) sched_clock: 382634     235941    237019
    (cold) local_clock: 396890     297017    294819
    (warm) sched_clock: 38194      25233     25609
    (warm) local_clock: 143452     71234     71232
    (warm) rdtsc:       27345      24245     24243

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 15:13:13 +01:00