Commit Graph

1249 Commits

Author SHA1 Message Date
Thomas Gleixner
6053479cbb tick/broadcast: Prevent NULL pointer dereference
commit c1a9eeb938b5433947e5ea22f89baff3182e7075 upstream.

When a disfunctional timer, e.g. dummy timer, is installed, the tick core
tries to setup the broadcast timer.

If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.

Reported-by: Mason <slash.tmp@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Richard Cochran <rcochran@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Sebastian Frias <sf84@laposte.net>
Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-12 11:22:51 +01:00
Thomas Gleixner
e01b04be3e timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
commit 9c1645727b8fa90d07256fdfcc45bf831242a3ab upstream.

The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:

 - Time jumps backwards

 - __iter_div_u64_rem() which is used in one of the calling code pathes
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case.  I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag.  I think it was something in the vicinity of 10 minutes
  stopped."

 When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:

   s64 nsec;

-  nsec = (d * m + n) >> s:
+  nsec = d * m + n;
+  nsec >>= s;

d being type of cycle_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.

There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.

Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.

Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.

Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:43 +01:00
John Stultz
78c7b55b36 timekeeping: Fix __ktime_get_fast_ns() regression
commit 58bfea9532552d422bde7afa207e1a0f08dffa7d upstream.

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
 swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
 swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg <bgregg@netflix.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-and-reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-16 17:36:14 +02:00
Christopher S. Hall
9b57d91c03 time: Add cycles to nanoseconds translation
commit 6bd58f09e1d8cc6c50a824c00bf0d617919986a1 upstream.

The timekeeping code does not currently provide a way to translate
externally provided clocksource cycles to system time. The cycle count
is always provided by the result clocksource read() method internal to
the timekeeping code. The added function timekeeping_cycles_to_ns()
calculated a nanosecond value from a cycle count that can be added to
tk_read_base.base value yielding the current system time. This allows
clocksource cycle values external to the timekeeping code to provide a
cycle count that can be transformed to system time.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-16 17:36:14 +02:00
John Stultz
4eca11dbd2 timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
commit 27727df240c7cc84f2ba6047c6f18d5addfd25ef upstream.

When I added some extra sanity checking in timekeeping_get_ns() under
CONFIG_DEBUG_TIMEKEEPING, I missed that the NMI safe __ktime_get_fast_ns()
method was using timekeeping_get_ns().

Thus the locking added to the debug checks broke the NMI-safety of
__ktime_get_fast_ns().

This patch open-codes the timekeeping_get_ns() logic for
__ktime_get_fast_ns(), so can avoid any deadlocks in NMI.

Fixes: 4ca22c2648 "timekeeping: Add warnings when overflows or underflows are observed"
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1471993702-29148-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:52 +02:00
John Stultz
42ef9015e0 timekeeping: Cap array access in timekeeping_debug
commit a4f8f6667f099036c88f231dcad4cf233652c824 upstream.

It was reported that hibernation could fail on the 2nd attempt, where the
system hangs at hibernate() -> syscore_resume() -> i8237A_resume() ->
claim_dma_lock(), because the lock has already been taken.

However there is actually no other process would like to grab this lock on
that problematic platform.

Further investigation showed that the problem is triggered by setting
/sys/power/pm_trace to 1 before the 1st hibernation.

Since once pm_trace is enabled, the rtc becomes unmeaningful after suspend,
and meanwhile some BIOSes would like to adjust the 'invalid' RTC (e.g, smaller
than 1970) to the release date of that motherboard during POST stage, thus
after resumed, it may seem that the system had a significant long sleep time
which is a completely meaningless value.

Then in timekeeping_resume -> tk_debug_account_sleep_time, if the bit31 of the
sleep time happened to be set to 1, fls() returns 32 and we add 1 to
sleep_time_bin[32], which causes an out of bounds array access and therefor
memory being overwritten.

As depicted by System.map:
0xffffffff81c9d080 b sleep_time_bin
0xffffffff81c9d100 B dma_spin_lock
the dma_spin_lock.val is set to 1, which caused this problem.

This patch adds a sanity check in tk_debug_account_sleep_time()
to ensure we don't index past the sleep_time_bin array.

[jstultz: Problem diagnosed and original patch by Chen Yu, I've solved the
 issue slightly differently, but borrowed his excelent explanation of the
 issue here.]

Fixes: 5c83545f24 "power: Add option to log time spent in suspend"
Reported-by: Janek Kozicki <cosurgi@gmail.com>
Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xunlei Pang <xpang@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Zhang Rui <rui.zhang@intel.com>
Link: http://lkml.kernel.org/r/1471993702-29148-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:52 +02:00
Vitaly Kuznetsov
a2350f3d82 clocksource: Allow unregistering the watchdog
[ Upstream commit bbf66d897adf2bb0c310db96c97e8db6369f39e1 ]

Hyper-V vmbus module registers TSC page clocksource when loaded. This is
the clocksource with the highest rating and thus it becomes the watchdog
making unloading of the vmbus module impossible.
Separate clocksource_select_watchdog() from clocksource_enqueue_watchdog()
and use it on clocksource register/rating change/unregister.

After all, lobotomized monkeys may need some love too.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1453483913-25672-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:47 +02:00
John Stultz
1db396648c ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
[ Upstream commit dd4e17ab704269bce71402285f5e8b9ac24b1eff ]

Recently, in commit 37cf4dc3370f I forgot to check if the timeval being passed
was actually a timespec (as is signaled with ADJ_NANO).

This resulted in that patch breaking ADJ_SETOFFSET users who set
ADJ_NANO, by rejecting valid timespecs that were compared with
valid timeval ranges.

This patch addresses this by checking for the ADJ_NANO flag and
using the timepsec check instead in that case.

Reported-by: Harald Hoyer <harald@redhat.com>
Reported-by: Kay Sievers <kay@vrfy.org>
Fixes: 37cf4dc3370f "time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow"
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1453417415-19110-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:47 +02:00
John Stultz
e79e7333c3 time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow
[ Upstream commit 37cf4dc3370fbca0344e23bb96446eb2c3548ba7 ]

For adjtimex()'s ADJ_SETOFFSET, make sure the tv_usec value is
sane. We might multiply them later which can cause an overflow
and undefined behavior.

This patch introduces new helper functions to simplify the
checking code and adds comments to clarify

Orginally this patch was by Sasha Levin, but I've basically
rewritten it, so he should get credit for finding the issue
and I should get the blame for any mistakes made since.

Also, credit to Richard Cochran for the phrasing used in the
comment for what is considered valid here.

Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:47 +02:00
Marc Zyngier
4fc2942b6e hrtimer: Catch illegal clockids
[ Upstream commit 9006a01829a50cfd6bbd4980910ed46e895e93d7 ]

It is way too easy to take any random clockid and feed it to
the hrtimer subsystem. At best, it gets mapped to a monotonic
base, but it would be better to just catch illegal values as
early as possible.

This patch does exactly that, mapping illegal clockids to an
illegal base index, and panicing when we detect the illegal
condition.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Link: http://lkml.kernel.org/r/1452879670-16133-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:44 +02:00
Alexey Dobriyan
470f47fcf2 posix_cpu_timer: Exit early when process has been reaped
commit 2c13ce8f6b2f6fd9ba2f9261b1939fc0f62d1307 upstream.

Variable "now" seems to be genuinely used unintialized
if branch

	if (CPUCLOCK_PERTHREAD(timer->it_clock)) {

is not taken and branch

	if (unlikely(sighand == NULL)) {

is taken. In this case the process has been reaped and the timer is marked as
disarmed anyway. So none of the postprocessing of the sample is
required. Return right away.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-10 11:49:29 +02:00
Wanpeng Li
d024d46ec5 tick/nohz: Set the correct expiry when switching to nohz/lowres mode
commit 1ca8ec532fc2d986f1f4a319857bb18e0c9739b4 upstream.

commit 0ff53d0964 sets the next tick interrupt to the last jiffies update,
i.e. in the past, because the forward operation is invoked before the set
operation. There is no resulting damage (yet), but we get an extra pointless
tick interrupt.

Revert the order so we get the next tick interrupt in the future.

Fixes: commit 0ff53d0964 "tick: sched: Force tick interrupt and get rid of softirq magic"
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1453893967-3458-1-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:26 -08:00
David Gibson
972e9e3c7f time: Avoid signed overflow in timekeeping_get_ns()
commit 35a4933a895927990772ae96fdcfd2f806929ee2 upstream.

1e75fa8 "time: Condense timekeeper.xtime into xtime_sec" replaced a call to
clocksource_cyc2ns() from timekeeping_get_ns() with an open-coded version
of the same logic to avoid keeping a semi-redundant struct timespec
in struct timekeeper.

However, the commit also introduced a subtle semantic change - where
clocksource_cyc2ns() uses purely unsigned math, the new version introduces
a signed temporary, meaning that if (delta * tk->mult) has a 63-bit
overflow the following shift will still give a negative result.  The
choice of 'maxsec' in __clocksource_updatefreq_scale() means this will
generally happen if there's a ~10 minute pause in examining the
clocksource.

This can be triggered on a powerpc KVM guest by stopping it from qemu for
a bit over 10 minutes.  After resuming time has jumped backwards several
minutes causing numerous problems (jiffies does not advance, msleep()s can
be extended by minutes..).  It doesn't happen on x86 KVM guests, because
the guest TSC is effectively frozen while the guest is stopped, which is
not the case for the powerpc timebase.

Obviously an unsigned (64 bit) overflow will only take twice as long as a
signed, 63-bit overflow.  I don't know the time code well enough to know
if that will still cause incorrect calculations, or if a 64-bit overflow
is avoided elsewhere.

Still, an incorrect forwards clock adjustment will cause less trouble than
time going backwards.  So, this patch removes the potential for
intermediate signed overflow.

Suggested-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:16 -08:00
Richard Cochran
b8175b171f posix-clock: Fix return code on the poll method's error path
commit 1b9f23727abb92c5e58f139e7d180befcaa06fe0 upstream.

The posix_clock_poll function is supposed to return a bit mask of
POLLxxx values.  However, in case the hardware has disappeared (due to
hot plugging for example) this code returns -ENODEV in a futile
attempt to throw an error at the file descriptor level.  The kernel's
file_operations interface does not accept such error codes from the
poll method.  Instead, this function aught to return POLLERR.

The value -ENODEV does, in fact, contain the POLLERR bit (and almost
all the other POLLxxx bits as well), but only by chance.  This patch
fixes code to return a proper bit mask.

Credit goes to Markus Elfring for pointing out the suspicious
signed/unsigned mismatch.

Reported-by: Markus Elfring <elfring@users.sourceforge.net>
igned-off-by: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Link: http://lkml.kernel.org/r/1450819198-17420-1-git-send-email-richardcochran@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:15 -08:00
Thomas Gleixner
82e730baa9 itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper
commit 51cbb5242a41700a3f250ecfb48dcfb7e4375ea4 upstream.

As Helge reported for timerfd we have the same issue in itimers. We return
remaining time larger than the programmed relative time to user space in case
of CONFIG_TIME_LOW_RES=y. Use the proper function to adjust the extra time
added in hrtimer_start_range_ns().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Helge Deller <deller@gmx.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/20160114164159.528222587@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:25 -08:00
Thomas Gleixner
1c94da3e74 posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper
commit 572c39172684c3711e4a03c9a7380067e2b0661c upstream.

As Helge reported for timerfd we have the same issue in posix timers. We
return remaining time larger than the programmed relative time to user space
in case of CONFIG_TIME_LOW_RES=y. Use the proper function to adjust the extra
time added in hrtimer_start_range_ns().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Helge Deller <deller@gmx.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/20160114164159.450510905@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:25 -08:00
Thomas Gleixner
dd0d511548 hrtimer: Handle remaining time proper for TIME_LOW_RES
commit 203cbf77de59fc8f13502dcfd11350c6d4a5c95f upstream.

If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.

Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.

Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.

Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-17 12:30:57 -08:00
Linus Torvalds
511601bdbc Merge branches 'irq-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq and timer fixes from Thomas Gleixner:

 - An irq regression fix to restore the wakeup behaviour of chained
   interrupts.

 - A timer fix for a long standing race versus timers scheduled on a
   target cpu which got exposed by recent changes in the workqueue
   implementation.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/PM: Restore system wake up from chained interrupts

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Use proper base migration in add_timer_on()
2015-11-15 09:30:48 -08:00
Andrew Morton
79211c8ed1 remove abs64()
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Tejun Heo
22b886dd10 timers: Use proper base migration in add_timer_on()
Regardless of the previous CPU a timer was on, add_timer_on()
currently simply sets timer->flags to the new CPU.  As the caller must
be seeing the timer as idle, this is locally fine, but the timer
leaving the old base while unlocked can lead to race conditions as
follows.

Let's say timer was on cpu 0.

  cpu 0					cpu 1
  -----------------------------------------------------------------------------
  del_timer(timer) succeeds
					del_timer(timer)
					  lock_timer_base(timer) locks cpu_0_base
  add_timer_on(timer, 1)
    spin_lock(&cpu_1_base->lock)
    timer->flags set to cpu_1_base
    operates on @timer			  operates on @timer

This triggered with mod_delayed_work_on() which contains
"if (del_timer()) add_timer_on()" sequence eventually leading to the
following oops.

  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
  ...
  Workqueue: wqthrash wqthrash_workfunc [wqthrash]
  task: ffff8800172ca680 ti: ffff8800172d0000 task.ti: ffff8800172d0000
  RIP: 0010:[<ffffffff810ca6e9>]  [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
  ...
  Call Trace:
   [<ffffffff810cb0b4>] del_timer+0x44/0x60
   [<ffffffff8106e836>] try_to_grab_pending+0xb6/0x160
   [<ffffffff8106e913>] mod_delayed_work_on+0x33/0x80
   [<ffffffffa0000081>] wqthrash_workfunc+0x61/0x90 [wqthrash]
   [<ffffffff8106dba8>] process_one_work+0x1e8/0x650
   [<ffffffff8106e05e>] worker_thread+0x4e/0x450
   [<ffffffff810746af>] kthread+0xef/0x110
   [<ffffffff8185980f>] ret_from_fork+0x3f/0x70

Fix it by updating add_timer_on() to perform proper migration as
__mod_timer() does.

Reported-and-tested-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chris Worley <chris.worley@primarydata.com>
Cc: bfields@fieldses.org
Cc: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: kernel-team@fb.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151029103113.2f893924@tlielax.poochiereds.net
Link: http://lkml.kernel.org/r/20151104171533.GI5749@mtj.duckdns.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-04 20:23:19 +01:00
Linus Torvalds
7b2a4306f9 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer departement provides:

   - More y2038 work in the area of ntp and pps.

   - Optimization of posix cpu timers

   - New time related selftests

   - Some new clocksource drivers

   - The usual pile of fixes, cleanups and improvements"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  timeconst: Update path in comment
  timers/x86/hpet: Type adjustments
  clocksource/drivers/armada-370-xp: Implement ARM delay timer
  clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
  clocksource/drivers/imx: Allow timer irq affinity change
  clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
  clocksource/drivers/h8300_*: Remove unneeded memset()s
  clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
  clocksource/drivers/em_sti: Remove unneeded memset()s
  clocksource/drivers/mediatek: Use GPT as sched clock source
  clockevents/drivers/mtk: Fix spurious interrupt leading to crash
  posix_cpu_timer: Reduce unnecessary sighand lock contention
  posix_cpu_timer: Convert cputimer->running to bool
  posix_cpu_timer: Check thread timers only when there are active thread timers
  posix_cpu_timer: Optimize fastpath_timer_check()
  timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
  timers: Use __fls in apply_slack()
  clocksource: Remove return statement from void functions
  net: sfc: avoid using timespec
  ntp/pps: use y2038 safe types in pps_event_time
  ...
2015-11-03 14:13:41 -08:00
Jason A. Donenfeld
03f136a207 timeconst: Update path in comment
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: hofrat@osadl.org
Link: http://lkml.kernel.org/r/1436894685-5868-1-git-send-email-Jason@zx2c4.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-26 10:06:06 +09:00
Thomas Gleixner
b2c280bdd6 Merge branch 'fortglx/4.4/time' of https://git.linaro.org/people/john.stultz/linux into timers/core
Time updates from John Stultz:

     - More 2038 work from Arnd Bergmann around ntp and pps
2015-10-20 12:36:37 +02:00
Thomas Gleixner
56fd16caba timekeeping: Increment clock_was_set_seq in timekeeping_init()
timekeeping_init() can set the wall time offset, so we need to
increment the clock_was_set_seq counter. That way hrtimers will pick
up the early offset immediately. Otherwise on a machine which does not
set wall time later in the boot process the hrtimer offset is stale at
0 and wall time timers are going to expire with a delay of 45 years.

Fixes: 868a3e915f "hrtimer: Make offset update smarter"
Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stefan Liebler <stli@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
2015-10-16 15:50:22 +02:00
Jason Low
c8d75aa47d posix_cpu_timer: Reduce unnecessary sighand lock contention
It was found while running a database workload on large systems that
significant time was spent trying to acquire the sighand lock.

The issue was that whenever an itimer expired, many threads ended up
simultaneously trying to send the signal. Most of the time, nothing
happened after acquiring the sighand lock because another thread
had just already sent the signal and updated the "next expire" time.
The fastpath_timer_check() didn't help much since the "next expire"
time was updated after the threads exit fastpath_timer_check().

This patch addresses this by having the thread_group_cputimer structure
maintain a boolean to signify when a thread in the group is already
checking for process wide timers, and adds extra logic in the fastpath
to check the boolean.

Signed-off-by: Jason Low <jason.low2@hp.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: George Spelvin <linux@horizon.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: hideaki.kimura@hpe.com
Cc: terry.rudd@hpe.com
Cc: scott.norton@hpe.com
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1444849677-29330-5-git-send-email-jason.low2@hp.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-15 11:23:41 +02:00