If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.
Before:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
After:
48.73% hog [.] main
15.36% [kernel] [k] _raw_spin_lock_irqsave
9.77% [kernel] [k] _raw_spin_unlock_irqrestore
6.61% [kernel] [k] lock_timer_base.isra.38
6.42% [kernel] [k] mod_timer
3.90% [kernel] [k] detach_if_pending
3.76% [kernel] [k] del_timer
2.41% [kernel] [k] internal_add_timer
1.39% [kernel] [k] __internal_add_timer
0.76% [kernel] [k] timerfn
We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.
The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.
With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.
Before:
47.76% hog [.] main
14.84% [kernel] [k] _raw_spin_lock_irqsave
9.55% [kernel] [k] _raw_spin_unlock_irqrestore
6.71% [kernel] [k] mod_timer
6.24% [kernel] [k] lock_timer_base.isra.38
3.76% [kernel] [k] detach_if_pending
3.71% [kernel] [k] del_timer
2.50% [kernel] [k] internal_add_timer
1.51% [kernel] [k] get_nohz_timer_target
1.28% [kernel] [k] __internal_add_timer
0.78% [kernel] [k] timerfn
0.48% [kernel] [k] wake_up_nohz_cpu
After:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We want to rename dev->state, so provide proper get and set
functions. Rename clockevents_set_state() to
clockevents_switch_state() to avoid confusion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.
Split out the cleanup function for a dead cpu and invoke it
directly from the cpu down code. Make it conditional on
CPU_HOTPLUG as well.
Temporary change, will be refined in the future.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased, added clockevents_notify() removal ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.
Split out the broadcast oneshot control into a separate function
and provide inline helpers. Switch clockevents_notify() over.
This will go away once all callers are converted.
This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast oneshot control functions do not
require clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.
Split out the broadcast control into a separate function and
provide inline helpers. Switch clockevents_notify() over. This
will go away once all callers are converted.
This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast control functions do not require
clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the broadcasting related section to the GENERIC_CLOCKEVENTS=y
section - this also solves build failures on architectures that
don't use generic clockevents yet.
Also standardize include file style to make it easier to read, and
use nesting depth aware preprocessor directives to make future merges
easier.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use the new tick_suspend/resume_local() and get rid of the
homebrewn implementation of these in the ARM bL switcher. The
check for the cpumask is completely pointless. There is no harm
to suspend a per cpu tick device unconditionally. If that's a
real issue then we fix it proper at the core level and not with
some completely undocumented hacks in some random core code.
Move the tick internals to the core code, now that this nuisance
is gone.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ rjw: Rebase, changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Link: http://lkml.kernel.org/r/1655112.Ws17YsMfN7@vostro.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Xen calls on every cpu into tick_resume() which is just wrong.
tick_resume() is for the syscore global suspend/resume
invocation. What XEN really wants is a per cpu local resume
function.
Provide a tick_resume_local() function and use it in XEN.
Also provide a complementary tick_suspend_local() and modify
tick_unfreeze() and tick_freeze(), respectively, to use the
new local tick resume/suspend functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Combined two patches, rebased, modified subject/changelog. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1698741.eezk9tnXtG@vostro.rjw.lan
[ Merged to latest timers/core. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call.
We are way better off to have explicit calls instead of this
monstrosity. Split out the suspend/resume() calls and invoke
them directly from the call sites.
No locking required at this point because these calls happen
with interrupts disabled and a single cpu online.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Rebased on top of 4.0-rc5. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/713674030.jVm1qaHuPf@vostro.rjw.lan
[ Rebased on top of latest timers/core. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This way we unbloat a bit main.c and more importantly we initialize
nohz full after init_IRQ(). This dependency will be needed in further
patches because nohz full needs irq work to raise its own IRQ.
Information about the support for this ability on ARM64 is obtained on
init_IRQ() which initialize the pointer to __smp_call_function.
Since tick_init() is called right after init_IRQ(), this is a good place
to call tick_nohz_init() and prepare for that dependency.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Right now we have time related prototypes in 3 different header
files. Move it to a single timekeeping header file and move the core
internal stuff into a core private header.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Since the xtime lock was split into the timekeeping lock and
the jiffies lock, we no longer need to call update_wall_time()
while holding the jiffies lock.
Thus, this patch splits update_wall_time() out from do_timer().
This allows us to get away from calling clock_was_set_delayed()
in update_wall_time() and instead use the standard clock_was_set()
call that previously would deadlock, as it causes the jiffies lock
to be acquired.
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>