Commit Graph

170 Commits

Author SHA1 Message Date
Andi Kleen
b3bd3de66f gcc-4.6: kernel/*: Fix unused but set warnings
No real bugs I believe, just some dead code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: andi@firstfloor.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:36:58 +02:00
Linus Torvalds
b62ad9ab18 Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  um: Fix read_persistent_clock fallout
  kgdb: Do not access xtime directly
  powerpc: Clean up obsolete code relating to decrementer and timebase
  powerpc: Rework VDSO gettimeofday to prevent time going backwards
  clocksource: Add __clocksource_updatefreq_hz/khz methods
  x86: Convert common clocksources to use clocksource_register_hz/khz
  timekeeping: Make xtime and wall_to_monotonic static
  hrtimer: Cleanup direct access to wall_to_monotonic
  um: Convert to use read_persistent_clock
  timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
  powerpc: Cleanup xtime usage
  powerpc: Simplify update_vsyscall
  time: Kill off CONFIG_GENERIC_TIME
  time: Implement timespec_add
  x86: Fix vtime/file timestamp inconsistencies

Trivial conflicts in Documentation/feature-removal-schedule.txt

Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
per Thomas' earlier merge commit 47916be4e2 ("Merge branch
'powerpc.cherry-picks' into timers/clocksource")
2010-08-06 13:18:29 -07:00
John Stultz
8ab4351a4c hrtimer: Cleanup direct access to wall_to_monotonic
Provides an accessor function to replace hrtimer.c's
direct access of wall_to_monotonic.

This will allow wall_to_monotonic to be made static as
planned in Documentation/feature-removal-schedule.txt

Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-9-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-27 12:40:55 +02:00
Venkatesh Pallipadi
83cd4fe27a sched: Change nohz idle load balancing logic to push model
In the new push model, all idle CPUs indeed go into nohz mode. There is
still the concept of idle load balancer (performing the load balancing
on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz
balancer when any of the nohz CPUs need idle load balancing.
The kickee CPU does the idle load balancing on behalf of all idle CPUs
instead of the normal idle balance.

This addresses the below two problems with the current nohz ilb logic:
* the idle load balancer continued to have periodic ticks during idle and
  wokeup frequently, even though it did not have any rebalancing to do on
  behalf of any of the idle CPUs.
* On x86 and CPUs that have APIC timer stoppage on idle CPUs, this
  periodic wakeup can result in a periodic additional interrupt on a CPU
  doing the timer broadcast.

Also currently we are migrating the unpinned timers from an idle to the cpu
doing idle load balancing (when all the cpus in the system are idle,
there is no idle load balancing cpu and timers get added to the same idle cpu
where the request was made. So the existing optimization works only on semi idle
system).

And In semi idle system, we no longer have periodic ticks on the idle load
balancer CPU. Using that cpu will add more delays to the timers than intended
(as that cpu's timer base may not be uptodate wrt jiffies etc). This was
causing mysterious slowdowns during boot etc.

For now, in the semi idle case, use the nearest busy cpu for migrating timers
from an idle cpu.  This is good for power-savings anyway.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <1274486981.2840.46.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-06-09 10:34:52 +02:00
Stanislaw Gruszka
174bd1994e hrtimer: Avoid double seqlock
hrtimer_get_softirq_time() has it's own xtime lock protection, so it's
safe to use plain __current_kernel_time() and avoid the double seqlock
loop.

Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
LKML-Reference: <20100525214912.GA1934@r2bh72.net.upc.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-05-26 16:15:37 +02:00
Carsten Emde
351b3f7a21 hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIME
The current version of schedule_hrtimeout() always uses the
monotonic clock. Some system calls such as mq_timedsend()
and mq_timedreceive(), however, require the use of the wall
clock due to the definition of the system call.

This patch provides the infrastructure to use schedule_hrtimeout() 
with a CLOCK_REALTIME timer.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Tested-by: Pradyumna Sampath <pradysam@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Veen <arjan@infradead.org>
LKML-Reference: <20100402204331.167439615@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-04-06 21:50:03 +02:00
Thomas Gleixner
ecb49d1a63 hrtimers: Convert to raw_spinlocks
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:34 +01:00
Heiko Carstens
5f201907df hrtimer: move timer stats helper functions to hrtimer.c
There is no reason to make timer_stats_hrtimer_set_start_info and
friends visible to the rest of the kernel. So move all of them to
hrtimer.c.  Also make timer_stats_hrtimer_set_start_info a static
inline function so it gets inlined and we avoid another function call.
Based on a patch by Thomas Gleixner.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20091210095629.GC4144@osiris.boeblingen.de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-12-10 13:08:11 +01:00
Thomas Gleixner
41d2e49493 hrtimer: Tune hrtimer_interrupt hang logic
The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
execution time of the hrtimer callbacks.

This is error-prone for virtual machines, where a guest vcpu can be
scheduled out during the execution of the callbacks (and the callbacks
themselves can do operations that translate to blocking operations in
the hypervisor), which in can lead to large min_delta_ns rendering the
system unusable.

Replace the current heuristics with something more reliable. Allow the
interrupt code to try 3 times to catch up with the lost time. If that
fails use the total time spent in the interrupt handler to defer the
next timer interrupt so the system can catch up with other things
which got delayed. Limit that deferment to 100ms.

The retry events and the maximum time spent in the interrupt handler
are recorded and exposed via /proc/timer_list

Inspired by a patch from Marcelo.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
2009-12-10 13:08:11 +01:00
Linus Torvalds
60d8ce2cd6 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers, init: Limit the number of per cpu calibration bootup messages
  posix-cpu-timers: optimize and document timer_create callback
  clockevents: Add missing include to pacify sparse
  x86: vmiclock: Fix printk format
  x86: Fix printk format due to variable type change
  sparc: fix printk for change of variable type
  clocksource/events: Fix fallout of generic code changes
  nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
  nohz: Track last do_timer() cpu
  nohz: Prevent clocksource wrapping during idle
  nohz: Type cast printk argument
  mips: Use generic mult/shift factor calculation for clocks
  clocksource: Provide a generic mult/shift factor calculation
  clockevents: Use u32 for mult and shift factors
  nohz: Introduce arch_needs_cpu
  nohz: Reuse ktime in sub-functions of tick_check_idle.
  time: Remove xtime_cache
  time: Implement logarithmic time accumulation
2009-12-08 19:27:08 -08:00
Jon Hunter
97813f2fe7 nohz: Allow 32-bit machines to sleep for more than 2.15 seconds
In the dynamic tick code, "max_delta_ns" (member of the
"clock_event_device" structure) represents the maximum sleep time
that can occur between timer events in nanoseconds.

The variable, "max_delta_ns", is defined as an unsigned long
which is a 32-bit integer for 32-bit machines and a 64-bit
integer for 64-bit machines (if -m64 option is used for gcc).
The value of max_delta_ns is set by calling the function
"clockevent_delta2ns()" which returns a maximum value of LONG_MAX.
For a 32-bit machine LONG_MAX is equal to 0x7fffffff and in
nanoseconds this equates to ~2.15 seconds. Hence, the maximum
sleep time for a 32-bit machine is ~2.15 seconds, where as for
a 64-bit machine it will be many years.

This patch changes the type of max_delta_ns to be "u64" instead of
"unsigned long" so that this variable is a 64-bit type for both 32-bit
and 64-bit machines. It also changes the maximum value returned by
clockevent_delta2ns() to KTIME_MAX.  Hence this allows a 32-bit
machine to sleep for longer than ~2.15 seconds. Please note that this
patch also changes "min_delta_ns" to be "u64" too and although this is
unnecessary, it makes the patch simpler as it avoids to fixup all
callers of clockevent_delta2ns().

[ tglx: changed "unsigned long long" to u64 as we use this data type
  	through out the time code ]

Signed-off-by: Jon Hunter <jon-hunter@ti.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1250617512-23567-3-git-send-email-jon-hunter@ti.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-11-13 20:46:24 +01:00
Linus Torvalds
e69a9ac596 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Remove overly verbose "switch to high res mode" message
2009-10-05 12:04:16 -07:00
Linus Torvalds
6f5071020d Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimer: Eliminate needless reprogramming of clock events device
2009-09-27 10:39:04 -07:00
Roland Dreier
d3f6302e7e hrtimer: Remove overly verbose "switch to high res mode" message
On big systems, printing <number of CPUs> copies of

    Switched to high resolution mode on CPU nnn

clutters up the kernel log for minimal gain.  Just get rid of them.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <ada1vlw126s.fsf_-_@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-26 16:58:09 +02:00
Linus Torvalds
31bbb9b58d Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  itimers: Add tracepoints for itimer
  hrtimer: Add tracepoint for hrtimers
  timers: Add tracepoints for timer_list timers
  cputime: Optimize jiffies_to_cputime(1)
  itimers: Simplify arm_timer() code a bit
  itimers: Fix periodic tics precision
  itimers: Merge ITIMER_VIRT and ITIMER_PROF

Trivial header file include conflicts in kernel/fork.c
2009-09-23 09:46:15 -07:00
Linus Torvalds
a03fdb7612 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
  time: Prevent 32 bit overflow with set_normalized_timespec()
  clocksource: Delay clocksource down rating to late boot
  clocksource: clocksource_select must be called with mutex locked
  clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
  timers: Drop a function prototype
  clocksource: Resolve cpu hotplug dead lock with TSC unstable
  timer.c: Fix S/390 comments
  timekeeping: Fix invalid getboottime() value
  timekeeping: Fix up read_persistent_clock() breakage on sh
  timekeeping: Increase granularity of read_persistent_clock(), build fix
  time: Introduce CLOCK_REALTIME_COARSE
  x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
  clocksource: Avoid clocksource watchdog circular locking dependency
  clocksource: Protect the watchdog rating changes with clocksource_mutex
  clocksource: Call clocksource_change_rating() outside of watchdog_lock
  timekeeping: Introduce read_boot_clock
  timekeeping: Increase granularity of read_persistent_clock()
  timekeeping: Update clocksource with stop_machine
  timekeeping: Add timekeeper read_clock helper functions
  timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
  ...

Fix trivial conflict due to MIPS lemote -> loongson renaming.
2009-09-18 09:15:24 -07:00
Ashwin Chaugule
7403f41f19 hrtimer: Eliminate needless reprogramming of clock events device
On NOHZ systems the following timers,

-  tick_nohz_restart_sched_tick (tick_sched_timer)
-  hrtimer_start (tick_sched_timer)

are reprogramming the clock events device far more often than needed.
No specific test case was required to observe this effect. This
occurres because there was no check to see if the currently removed or
restarted hrtimer was:

1) the one which previously armed the clock events device.
2) going to be replaced by another timer which has the same expiry time.

Avoid the reprogramming in hrtimer_force_reprogram when the new expiry
value which is evaluated from the clock bases is equal to
cpu_base->expires_next. This results in faster application startup
time by ~4%.

[ tglx: simplified initial solution ]

Signed-off-by: Ashwin Chaugule <ashwinc@quicinc.com>
LKML-Reference: <4AA00165.90609@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-09-15 17:09:44 +02:00
Xiao Guangrong
c6a2a17702 hrtimer: Add tracepoint for hrtimers
Add tracepoints which cover the life cycle of a hrtimer. The
tracepoints are integrated with the already existing debug_object
debug points as far as possible.

[ tglx: Fixed comments, made output conistent, easier to read and
  	parse. Fixed output for 32bit archs which do not use the
  	scalar representation of ktime_t. Hand current time to
  	trace_hrtimer_expiry_entry instead of calling get_time()
  	inside of the trace assignment. ]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Anton Blanchard <anton@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Zhaolei <zhaolei@cn.fujitsu.com>
LKML-Reference: <4A7F8B2B.5020908@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-29 14:10:06 +02:00
Stephen Hemminger
2bc481cf43 pktgen: spin using hrtimer
This changes how the pktgen thread spins/waits between
packets if delay is configured. It uses a high res timer to
wait for time to arrive.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-28 23:41:29 -07:00
Thomas Gleixner
4cd1993f00 Merge branch 'linus' into timers/core
Reason: Martin's timekeeping cleanup series depends on both
timers/core and mainline changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-14 15:59:30 +02:00
Peter Zijlstra
fbd90375d7 hrtimer: Remove cb_entry from struct hrtimer
It's unused, remove it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
2009-07-22 17:12:32 +02:00
Thomas Gleixner
6ff7041dbf hrtimer: Fix migration expiry check
The timer migration expiry check should prevent the migration of a
timer to another CPU when the timer expires before the next event is
scheduled on the other CPU. Migrating the timer might delay it because
we can not reprogram the clock event device on the other CPU. But the
code implementing that check has two flaws:

- for !HIGHRES the check compares the expiry value with the clock
  events device expiry value which is wrong for CLOCK_REALTIME based
  timers.

- the check is racy. It holds the hrtimer base lock of the target CPU,
  but the clock event device expiry value can be modified
  nevertheless, e.g. by an timer interrupt firing.

The !HIGHRES case is easy to fix as we can enqueue the timer on the
cpu which was selected by the load balancer. It runs the idle
balancing code once per jiffy anyway. So the maximum delay for the
timer is the same as when we keep the tick on the current cpu going.

In the HIGHRES case we can get the next expiry value from the hrtimer
cpu_base of the target CPU and serialize the update with the cpu_base
lock. This moves the lock section in hrtimer_interrupt() so we can set
next_event to KTIME_MAX while we are handling the expired timers and
set it to the next expiry value after we handled the timers under the
base lock. While the expired timers are processed timer migration is
blocked because the expiry time of the timer is always <= KTIME_MAX.

Also remove the now useless clockevents_get_next_event() function.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-10 17:32:55 +02:00
Thomas Gleixner
7e0c5086c1 hrtimer: migration: do not check expiry time on current CPU
The timer migration code needs to check whether the expiry time of the
timer is before the programmed clock event expiry time when the timer
is enqueued on another CPU because we can not reprogram the timer
device on the other CPU. The current logic checks the expiry time even
if we enqueue on the current CPU when nohz_get_load_balancer() returns
current CPU. This might lead to an endless loop in the expiry check
code when the expiry time of the timer is before the current
programmed next event.

Check whether nohz_get_load_balancer() returns current CPU and skip
the expiry check if this is the case.

The bug was triggered from the networking code. The patch fixes the
regression http://bugzilla.kernel.org/show_bug.cgi?id=13738
(Soft-Lockup/Race in networking in 2.6.31-rc1+195)

Cc: Arun Bharadwaj <arun@linux.vnet.ibm.com
Tested-by: Joao Correia <joaomiguelcorreia@gmail.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-10 17:22:20 +02:00
Thomas Gleixner
a40f262cc2 timekeeping: Move ktime_get() functions to timekeeping.c
The ktime_get() functions for GENERIC_TIME=n are still located in
hrtimer.c. Move them to time/timekeeping.c where they belong.

LKML-Reference: <new-submission>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-07 13:00:31 +02:00
Martin Schwidefsky
951ed4d36b timekeeping: optimized ktime_get[_ts] for GENERIC_TIME=y
The generic ktime_get function defined in kernel/hrtimer.c is suboptimial
for GENERIC_TIME=y:

 0)               |  ktime_get() {
 0)               |    ktime_get_ts() {
 0)               |      getnstimeofday() {
 0)               |        read_tod_clock() {
 0)   0.601 us    |        }
 0)   1.938 us    |      }
 0)               |      set_normalized_timespec() {
 0)   0.602 us    |      }
 0)   4.375 us    |    }
 0)   5.523 us    |  }

Overall there are two read_seqbegin/read_seqretry loops and a lot of
unnecessary struct timespec calculations. ktime_get returns a nano second
value which is the sum of xtime, wall_to_monotonic and the nano second
delta from the clock source.

ktime_get can be optimized for GENERIC_TIME=y. The new version only calls
clocksource_read:

 0)               |  ktime_get() {
 0)               |    read_tod_clock() {
 0)   0.610 us    |    }
 0)   1.977 us    |  }

It uses a single read_seqbegin/readseqretry loop and just adds everthing
to a nano second value.

ktime_get_ts is optimized in a similar fashion.

[ tglx: added WARN_ON(timekeeping_suspended) as in getnstimeofday() ]

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: john stultz <johnstul@us.ibm.com>
LKML-Reference: <20090707112728.3005244d@skybase>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-07-07 12:47:33 +02:00