Commit Graph

582 Commits

Author SHA1 Message Date
Linus Torvalds
a95f9b6e09 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core updates (RCU and locking) from Ingo Molnar:
 "Most of the diffstat comes from the RCU slow boot regression fixes,
  but there's also a debuggability improvements/fixes."

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  memblock: Document memblock_is_region_{memory,reserved}()
  rcu: Precompute RCU_FAST_NO_HZ timer offsets
  rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure
  rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks
  rcu: RCU_FAST_NO_HZ detection of callback adoption
  spinlock: Indicate that a lockup is only suspected
  kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
  panic: Make panic_on_oops configurable
2012-06-15 16:52:35 -07:00
Ingo Molnar
4a1e001d2b Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Merge RCU fixes from Paul E. McKenney:

 " This series has four patches, the major point of which is to eliminate
   some slowdowns (including boot-time slowdowns) resulting from some
   RCU_FAST_NO_HZ changes.  The issue with the changes is that posting timers
   from the idle loop has no effect if the CPU has entered dyntick-idle
   mode because the CPU has already computed its wakeup time, and posting
   a timer does not cause it to be recomputed.  The short-term fix is for
   RCU to precompute the timeout value so that the CPU's calculation is
   correct. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-11 10:30:23 +02:00
Linus Torvalds
b1e25f4109 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull leap second timer fix from Thomas Gleixner.

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
2012-06-08 09:11:33 -07:00
Paul E. McKenney
aa9b16306e rcu: Precompute RCU_FAST_NO_HZ timer offsets
When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
calls rcu_needs_cpu() see if RCU needs that CPU, and, if not, computes the
next wakeup time based on the timer wheels.  Only later, when actually
entering the idle loop, rcu_prepare_for_idle() will be invoked.  In some
cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
But all for naught: The next wakeup time for the CPU has already been
computed, and posting a timer afterwards does not force that wakeup
time to be recomputed.  This means that rcu_prepare_for_idle()'s have
no effect.

This is not a problem on a busy system because something else will wake
up the CPU soon enough.  However, on lightly loaded systems, the CPU
might stay asleep for a considerable length of time.  If that CPU has
a callback that the rest of the system is waiting on, the system might
run very slowly or (in theory) even hang.

This commit avoids this problem by having rcu_needs_cpu() give
tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
to wake back up, which tick_nohz_stop_sched_tick() takes into account
when programming the CPU's wakeup time.  An alternative approach is
for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
but timers are much more efficient than are hrtimers for frequently
and repeatedly posting and cancelling a given timer, which is exactly
what RCU_FAST_NO_HZ does.

Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
2012-06-06 20:43:28 -07:00
Linus Torvalds
0b3e9f3f21 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Remove NULL assignment of dattr_cur
  sched: Remove the last NULL entry from sched_feat_names
  sched: Make sched_feat_names const
  sched/rt: Fix SCHED_RR across cgroups
  sched: Move nr_cpus_allowed out of 'struct sched_rt_entity'
  sched: Make sure to not re-read variables after validation
  sched: Fix SD_OVERLAP
  sched: Don't try allocating memory from offline nodes
  sched/nohz: Fix rq->cpu_load calculations some more
  sched/x86: Use cpu_llc_shared_mask(cpu) for coregroup_mask
2012-06-05 09:47:15 -07:00
John Stultz
fad0c66c4b timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

Adjust wall_to_monotonic when NTP inserted a leapsecond.

Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Richard Cochran <richardcochran@gmail.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-06-04 21:46:29 +02:00
Peter Zijlstra
5aaa0b7a2e sched/nohz: Fix rq->cpu_load calculations some more
Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.

Add a callback from the nohz exit to also age the rq->cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.

So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: pjt@google.com
Cc: Venkatesh Pallipadi <venki@google.com>
Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30 14:02:16 +02:00
Thomas Gleixner
62cf20b32a tick: Move skew_tick option into the HIGH_RES_TIMER section
commit 5307c95 (tick: Add tick skew boot option) broke the
!CONFIG_HIGH_RES_TIMERS build.

Move the boot option parsing into the CONFIG_HIGH_RES_TIMERS section.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <mgalbraith@suse.de>
2012-05-25 14:08:57 +02:00
Magnus Damm
e5400321a6 clockevents: Make clockevents_config() a global symbol
Make clockevents_config() into a global symbol to allow it to be used
by compiled-in clockevent drivers. This is needed by drivers that want
to update the timer frequency after registration time.

Signed-off-by: Magnus Damm <damm@opensource.se>
Tested-by: Simon Horman <horms@verge.net.au>
Cc: arnd@arndb.de
Cc: johnstul@us.ibm.com
Cc: rjw@sisk.pl
Cc: lethal@linux-sh.org
Cc: gregkh@linuxfoundation.org
Cc: olof@lixom.net
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20120509143934.27521.46553.sendpatchset@w520
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-25 01:44:51 +02:00
Mike Galbraith
5307c9556b tick: Add tick skew boot option
Let the user decide whether power consumption or jitter is the
more important consideration for their machines.

Quoting removal commit af5ab277de:

"Historically, Linux has tried to make the regular timer tick on the
 various CPUs not happen at the same time, to avoid contention on
 xtime_lock.
    
 Nowadays, with the tickless kernel, this contention no longer happens
 since time keeping and updating are done differently. In addition,
 this skew is actually hurting power consumption in a measurable way on
 many-core systems."

Problems:

- Contrary to the above, systems do encounter contention on both
  xtime_lock and RCU structure locks when the tick is synchronized.
  
- Moderate sized RT systems suffer intolerable jitter due to the tick
  being synchronized.

- SGI reports the same for their large systems.

- Fully utilized systems reap no power saving benefit from skew removal,
  but do suffer from resulting induced lock contention.

- 0209f649 rcu: limit rcu_node leaf-level fanout
  This patch was born to combat lock contention which testing showed
  to have been _induced by_ skew removal.  Skew the tick, contention
  disappeared virtually completely.

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Link: http://lkml.kernel.org/r/1336472458.21924.78.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-25 01:44:50 +02:00
Linus Torvalds
c7523a7c88 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner.

Various trivial conflict fixups in arch Kconfig due to addition of
unrelated entries nearby.  And one slightly more subtle one for sparc32
(new user of GENERIC_CLOCKEVENTS), fixed up as per Thomas.

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  timekeeping: Fix a few minor newline issues.
  time: remove obsolete declaration
  ntp: Fix a stale comment and a few stray newlines.
  ntp: Correct TAI offset during leap second
  timers: Fixup the Kconfig consolidation fallout
  x86: Use generic time config
  unicore32: Use generic time config
  um: Use generic time config
  tile: Use generic time config
  sparc: Use: generic time config
  sh: Use generic time config
  score: Use generic time config
  s390: Use generic time config
  openrisc: Use generic time config
  powerpc: Use generic time config
  mn10300: Use generic time config
  mips: Use generic time config
  microblaze: Use generic time config
  m68k: Use generic time config
  m32r: Use generic time config
  ...
2012-05-24 13:29:46 -07:00
Thomas Gleixner
b80fe1015b Merge branch 'fortglx/3.5/time' of git://git.linaro.org/people/jstultz/linux into timers/core 2012-05-22 10:30:39 +02:00
Richard Cochran
d239f49d77 timekeeping: Fix a few minor newline issues.
Fix a few minor newline issues.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-05-21 16:20:32 -07:00
Richard Cochran
cd5398bed9 ntp: Fix a stale comment and a few stray newlines.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-05-21 16:16:56 -07:00
Richard Cochran
dd48d708ff ntp: Correct TAI offset during leap second
When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-05-21 16:16:54 -07:00
Thomas Gleixner
764e0da14f timers: Fixup the Kconfig consolidation fallout
Sigh, I missed to check which architecture Kconfig files actually
include the core Kconfig file. There are a few which did not. So we
broke them.

Instead of adding the includes to those, we are better off to move the
include to init/Kconfig like we did already with irqs and others.

This does not change anything for the architectures using the old
style periodic timer mode. It just solves the build wreckage there.

For those architectures which use the clock events infrastructure it
moves the include of the core Kconfig file to "General setup" which is
a way more logical place than having it at random locations specified
by the architecture specific Kconfigs.

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@glx-um.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-21 23:43:46 +02:00
Thomas Gleixner
b5e498ad67 timers: Provide generic Kconfig switches
We really don't want all the arch code defining stuff
over and over.

[ anna-maria: Added missing GENERIC_CMOS_UPDATE switch ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@glx-um.de>
Cc: Paul Mundt <lethal@linux-sh.org>
Link: http://lkml.kernel.org/r/1337529587.3208.2.camel@dionysos
Acked-by: Sam Ravnborg <sam@ravnborg.org>
2012-05-21 11:01:40 +02:00
Greg Kroah-Hartman
d210267741 Merge 3.4-rc5 into staging-next
This resolves the conflict in:
	drivers/staging/vt6656/ioctl.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-02 11:48:07 -07:00
John Stultz
57c498fa5d alarmtimer: Provide accessor to alarmtimer rtc device
The Android alarm interface provides a settime call that sets both
the alarmtimer RTC device and CLOCK_REALTIME to the same value.

Since there may be multiple rtc devices, provide a hook to access the
one the alarmtimer infrastructure is using.

CC: Colin Cross <ccross@android.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Android Kernel Team <kernel-team@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-20 14:56:36 -07:00
Suresh Siddha
a6371f8023 tick: Fix the spurious broadcast timer ticks after resume
During resume, tick_resume_broadcast() programs the broadcast timer in
oneshot mode unconditionally. On the platforms where broadcast timer
is not really required, this will generate spurious broadcast timer
ticks upon resume. For example, on the always running apic timer
platforms with HPET, I see spurious hpet tick once every ~5minutes
(which is the 32-bit hpet counter wraparound time).

Similar to boot time, during resume make the oneshot mode setting of
the broadcast clock event device conditional on the state of active
broadcast users.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: svenjoac@gmx.de
Cc: torvalds@linux-foundation.org
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/1334802459.28674.209.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-04-19 21:27:50 +02:00
Thomas Gleixner
b9a6a23566 tick: Ensure that the broadcast device is initialized
Santosh found another trap when we avoid to initialize the broadcast
device in the switch_to_oneshot code. The broadcast device might be
still in SHUTDOWN state when we actually need to use it. That
obviously breaks, as set_next_event() is called on a shutdown
device. This did not break on x86, but Suresh analyzed it:

From the review, most likely on Sven's system we are force enabling
the hpet using the pci quirk's method very late. And in this case,
hpet_clockevent (which will be global_clock_event) handler can be
null, specifically as this platform might not be using deeper c-states
and using the reliable APIC timer.

Prior to commit 'fa4da365bc7772c', that handler will be set to
'tick_handle_oneshot_broadcast' when we switch the broadcast timer to
oneshot mode, even though we don't use it. Post commit
'fa4da365bc7772c', we stopped switching the broadcast mode to oneshot
as this is not really needed and his platform's global_clock_event's
handler will remain null. While on my SNB laptop, same is set to
'clockevents_handle_noop' because hpet gets enabled very early. (noop
handler on my platform set when the early enabled hpet timer gets
replaced by the lapic timer).

But the commit 'fa4da365bc7772c' tracked the broadcast timer mode in
the SW as oneshot, even though it didn't touch the HW timer. During
resume however, tick_resume_broadcast() saw the SW broadcast mode as
oneshot and actually programmed the broadcast device also into oneshot
mode. So this triggered the null pointer de-reference after the hpet
wraps around and depending on what the hpet counter is set to. On the
normal platforms where hpet gets enabled early we should be seeing a
spurious interrupt (in my SNB laptop I see one spurious interrupt
after around 5 minutes ;) which is 32-bit hpet counter wraparound
time), but that's a separate issue.

Enforce the mode setting when trying to set an event.

Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: torvalds@linux-foundation.org
Cc: svenjoac@gmx.de
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181723350.2542@ionos
2012-04-19 21:27:35 +02:00
Thomas Gleixner
b435092f70 tick: Fix oneshot broadcast setup really
Sven Joachim reported, that suspend/resume on rc3 trips over a NULL
pointer dereference. Linus spotted the clockevent handler being NULL.

commit fa4da365b(clockevents: tTack broadcast device mode change in
tick_broadcast_switch_to_oneshot()) tried to fix a problem with the
broadcast device setup, which was introduced in commit 77b0d60c5(
clockevents: Leave the broadcast device in shutdown mode when not
needed).

The initial commit avoided to set up the broadcast device when no
broadcast request bits were set, but that left the broadcast device
disfunctional. In consequence deep idle states which need the
broadcast device were not woken up.

commit fa4da365b tried to fix that by initializing the state of the
broadcast facility, but that missed the fact, that nothing initializes
the event handler and some other state of the underlying clock event
device.

The fix is to revert both commits and make only the mode setting of
the clock event device conditional on the state of active broadcast
users. 

That initializes everything except the low level device mode, but this
happens when the broadcast functionality is invoked by deep idle.

Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1204181205540.2542@ionos
2012-04-18 14:00:56 +02:00
Suresh Siddha
fa4da365bc clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot()
In the commit 77b0d60c5a,
"clockevents: Leave the broadcast device in shutdown mode when not needed",
we were bailing out too quickly in tick_broadcast_switch_to_oneshot(),
with out tracking the broadcast device mode change to 'TICKDEV_MODE_ONESHOT'.

This breaks the platforms which need broadcast device oneshot services during
deep idle states. tick_broadcast_oneshot_control() thinks that it is
in periodic mode and fails to take proper decisions based on the
CLOCK_EVT_NOTIFY_BROADCAST_[ENTER, EXIT] notifications during deep
idle entry/exit.

Fix this by tracking the broadcast device mode as 'TICKDEV_MODE_ONESHOT',
before leaving the broadcast HW device in shutdown mode if there are no active
requests for the moment.

Reported-and-tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1334011304.12400.81.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-04-10 11:42:07 +02:00
Neal Cardwell
6f103929f8 nohz: Fix stale jiffies update in tick_nohz_restart()
Fix tick_nohz_restart() to not use a stale ktime_t "now" value when
calling tick_do_update_jiffies64(now).

If we reach this point in the loop it means that we crossed a tick
boundary since we grabbed the "now" timestamp, so at this point "now"
refers to a time in the old jiffy, so using the old value for "now" is
incorrect, and is likely to give us a stale jiffies value.

In particular, the first time through the loop the
tick_do_update_jiffies64(now) call is always a no-op, since the
caller, tick_nohz_restart_sched_tick(), will have already called
tick_do_update_jiffies64(now) with that "now" value.

Note that tick_nohz_stop_sched_tick() already uses the correct
approach: when we notice we cross a jiffy boundary, grab a new
timestamp with ktime_get(), and *then* update jiffies.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1332875377-23014-1-git-send-email-ncardwell@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-04-06 13:24:17 +02:00
Thomas Gleixner
3872c48b14 tick: Document TICK_ONESHOT config option
This option has been selected from arch code as it was assumed that
it's necessary to support oneshot mode clockevent devices. But it's
just a core internal helper to compile tick-oneshot.c if NOHZ or
HIG_RES_TIMERS are selected.

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-03-31 12:45:43 +02:00