Remove time_interpolator code (This is generic code, but
only user was ia64. It has been superseded by the
CONFIG_GENERIC_TIME code).
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Peter Keilty <peter.keilty@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
variant in the past. But with __GFP_ZERO it is possible now to do zeroing
while allocating.
Use __GFP_ZERO to remove the explicit clearing of memory via memset whereever
we can.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag in /proc/timer_stats to indicate deferrable timers. This will
let developers/users to differentiate between types of tiemrs in
/proc/timer_stats.
Deferrable timer and normal timer will appear in /proc/timer_stats as below.
10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
Also version of timer_stats changes from v0.1 to v0.2
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_next_timer_interrupt() returns a delta of (LONG_MAX > 1) in case
there is no timer pending. On 64 bit machines this results in a
multiplication overflow in tick_nohz_stop_sched_tick().
Reported by: Dave Miller <davem@davemloft.net>
Make the return value a constant and limit the return value to a 32 bit
value.
When the max timeout value is returned, we can safely stop the tick
timer device. The max jiffies delta results in a 12 days timeout for
HZ=1000.
In the long term the get_next_timer_interrupt() code needs to be
reworked to return ktime instead of jiffies, but we have to wait until
the last users of the original NO_IDLE_HZ code are converted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The time keeping code move to kernel/time/timekeeping.c broke the
clocksource resume logic patch, which got applied to the old file by a
fuzzy application. Fix it up and move the clocksource_resume() call to
the appropriate place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ tssk, tssk, everybody should use --fuzz=0 ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 09-05-2007 21:10, Pallipadi, Venkatesh wrote:
...
> On a 64 bit system, converting pointer to int causes unnecessary
> compiler warning, and intermediate long conversion was to avoid that.
> I will have to rephrase my comment to remove 32 bit value and use int,
> as that is what the function returns.
So, this patch reverts all changes done by my previous patch.
I apologize for my wrong comment about "logical error" here.
Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Satyam Sharma <satyam.sharma@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to make sure that the clocksources are resumed, when timekeeping is
resumed. The current resume logic does not guarantee this.
Add a resume function pointer to the clocksource struct, so clocksource
drivers which need to reinitialize the clocksource can provide a resume
function.
Add a resume function, which calls the maybe available clocksource resume
functions and resets the watchdog function, so a stable TSC can be used
accross suspend/resume.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since nonboot CPUs are now disabled after tasks and devices have been
frozen and the CPU hotplug infrastructure is used for this purpose, we need
special CPU hotplug notifications that will help the CPU-hotplug-aware
subsystems distinguish normal CPU hotplug events from CPU hotplug events
related to a system-wide suspend or resume operation in progress. This
patch introduces such notifications and causes them to be used during
suspend and resume transitions. It also changes all of the
CPU-hotplug-aware subsystems to take these notifications into consideration
(for now they are handled in the same way as the corresponding "normal"
ones).
[oleg@tv-sign.ru: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are many places in the kernel where the construction like
foo = list_entry(head->next, struct foo_struct, list);
are used.
The code might look more descriptive and neat if using the macro
list_first_entry(head, type, member) \
list_entry((head)->next, type, member)
Here is the macro itself and the examples of its usage in the generic code.
If it will turn out to be useful, I can prepare the set of patches to
inject in into arch-specific code, drivers, networking, etc.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the timekeeping code out of kernel/timer.c and into
kernel/time/timekeeping.c. I made no cleanups or other changes in transit.
[akpm@linux-foundation.org: build fix]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new flag for timers - deferrable: Timers that work normally
when system is busy. But, will not cause CPU to come out of idle (just to
service this timer), when CPU is idle. Instead, this timer will be
serviced when CPU eventually wakes up with a subsequent non-deferrable
timer.
The main advantage of this is to avoid unnecessary timer interrupts when
CPU is idle. If the routine currently called by a timer can wait until
next event without any issues, this new timer can be used to setup timer
event for that routine. This, with dynticks, allows CPUs to be lazy,
allowing them to stay in idle for extended period of time by reducing
unnecesary wakeup and thereby reducing the power consumption.
This patch:
Builds this new timer on top of existing timer infrastructure. It uses
last bit in 'base' pointer of timer_list structure to store this deferrable
timer flag. __next_timer_interrupt() function skips over these deferrable
timers when CPU looks for next timer event for which it has to wake up.
This is exported by a new interface init_timer_deferrable() that can be
called in place of regular init_timer().
[akpm@linux-foundation.org: Privatise a #define]
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Export try_to_del_timer_sync() for use by the AF_RXRPC module.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Soeren Sonnenburg reported that upon resume he is getting
this backtrace:
[<c0119637>] smp_apic_timer_interrupt+0x57/0x90
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0104d30>] apic_timer_interrupt+0x28/0x30
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0140068>] __kfifo_put+0x8/0x90
[<c0130fe5>] on_each_cpu+0x35/0x60
[<c0143538>] clock_was_set+0x18/0x20
[<c0135cdc>] timekeeping_resume+0x7c/0xa0
[<c02aabe1>] __sysdev_resume+0x11/0x80
[<c02ab0c7>] sysdev_resume+0x47/0x80
[<c02b0b05>] device_power_up+0x5/0x10
it turns out that on resume we mistakenly re-enable interrupts too
early. Do the timer retrigger only on the current CPU.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Soeren Sonnenburg <kernel@nn7.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The rework of next_timer_interrupt() fixed the timer wheel bugs, but
invented a rounding error versus the next hrtimer event. This is caused
by the conversion of the hrtimer internal representation to relative
jiffies.
This causes bug #8100:
http://bugzilla.kernel.org/show_bug.cgi?id=8100
next_timer_interrupt() returns "now" in such a case and causes the code
in tick_nohz_stop_sched_tick() to trigger the timer softirq, which is
bogus as no timer is due for expiry. This results in an endless context
switching between idle and ksoftirqd until a timer is due for expiry.
Modify the hrtimer evaluation so that, it returns now + 1, when the
conversion results in a delta < 1 jiffie.
It's confirmed to resolve bug #8100
Reported-by: Emil Karlson <jkarlson@cc.hut.fi>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I've only seen this on x86_64.
The vsyscall state only gets updated when a timer interrupts comes in. So
if the time is set long before the next timer, there will be a period when
a gettimeofday() won't reflect the correct time.
I added an explicit update_vsyscall() during the settimeofday(), that way
the vsyscall state doesn't get stale.
Signed-off-by: Daniel Walker <dwalker@mvista.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The programming of periodic tick devices needs to be saved/restored
across suspend/resume - otherwise we might end up with a system coming
up that relies on getting a PIT (or HPET) interrupt, while those devices
default to 'no interrupts' after powerup. (To confuse things it worked
to a certain degree on some systems because the lapic gets initialized
as a side-effect of SMP bootup.)
This suspend / resume thing was dropped unintentionally during the
last-minute -mm code reshuffling.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doing something like this on a two cpu system
# echo 0 > /sys/devices/system/cpu/cpu0/online
# echo 1 > /sys/devices/system/cpu/cpu0/online
# echo 0 > /sys/devices/system/cpu/cpu1/online
will give me this:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.21-rc2-g562aa1d4-dirty #7
-------------------------------------------------------
bash/1282 is trying to acquire lock:
(&cpu_base->lock_key){.+..}, at: [<000000000005f17e>] hrtimer_cpu_notify+0xc6/0x240
but task is already holding lock:
(&cpu_base->lock_key#2){.+..}, at: [<000000000005f174>] hrtimer_cpu_notify+0xbc/0x240
which lock already depends on the new lock.
This happens because we have the following code in kernel/hrtimer.c:
migrate_hrtimers(int cpu)
[...]
old_base = &per_cpu(hrtimer_bases, cpu);
new_base = &get_cpu_var(hrtimer_bases);
[...]
spin_lock(&new_base->lock);
spin_lock(&old_base->lock);
Which means the spinlocks are taken in an order which depends on which cpu
gets shut down from which other cpu. Therefore lockdep complains that there
might be an ABBA deadlock. Since migrate_hrtimers() gets only called on
cpu hotplug it's safe to assume that it isn't executed concurrently on a
The same problem exists in kernel/timer.c: migrate_timers().
As pointed out by Christian Borntraeger one possible solution to avoid
the locking order complaints would be to make sure that the locks are
always taken in the same order. E.g. by taking the lock of the cpu with
the lower number first.
To achieve this we introduce two new spinlock functions double_spin_lock
and double_spin_unlock which lock or unlock two locks in a given order.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Christian Borntraeger <cborntra@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>