Commit Graph

1026 Commits

Author SHA1 Message Date
Preeti U Murthy
a127d2bcf1 timers/tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop
The hrtimer mode of broadcast queues hrtimers in the idle entry
path so as to wake up CPUs in deep idle states. The associated
call graph is:

	cpuidle_idle_call()
	|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....)
	     |_____tick_broadcast_set_event()
		   |____clockevents_program_event()
			|____bc_set_next()

The hrtimer_{start/cancel} functions call into tracing which uses RCU.
But it is not legal to call into RCU in cpuidle because it is one of the
quiescent states. Hence protect this region with RCU_NONIDLE which informs
RCU that the cpu is momentarily non-idle.

As an aside it is helpful to point out that the clock event device that is
programmed here is not a per-cpu clock device; it is a
pseudo clock device, used by the broadcast framework alone.
The per-cpu clock device programming never goes through bc_set_next().

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linuxppc-dev@ozlabs.org
Cc: mpe@ellerman.id.au
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:50:05 +01:00
Linus Torvalds
f3c233d75e Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ntp fix from Ingo Molnar:
 "An adjtimex interface regression fix for 32-bit systems"

[ A check that was added in a previous commit is really only a concern
  for 64bit systems, but was applied to both 32 and 64bit systems, which
  results in breaking 32bit systems.

  Thus the fix here is to make the check only apply to 64bit systems ]

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ntp: Fixup adjtimex freq validation on 32-bit systems
2015-02-21 11:05:22 -08:00
John Stultz
29183a70b0 ntp: Fixup adjtimex freq validation on 32-bit systems
Additional validation of adjtimex freq values to avoid
potential multiplication overflows was added in commit
5e5aeb4367 (time: adjtimex: Validate the ADJ_FREQUENCY values)

Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems but, being
much smaller on 32-bit systems, produced false positives
that made most direct frequency adjustments fail with
EINVAL.

ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more affected by
the bug.

See bugs:

  https://bugzilla.kernel.org/show_bug.cgi?id=92481
  https://bugzilla.redhat.com/show_bug.cgi?id=1188074

This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger than the 32-bit long freq value, so multiplication
overflows aren't possible there.
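
The effect of the two limits can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel code: PPM_SCALE_SKETCH is assumed to be 1000 << 16, the function names are hypothetical, and fixed-width types stand in for a 32-bit or 64-bit long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed scale factor (1000 << 16); the kernel's PPM_SCALE may differ. */
#define PPM_SCALE_SKETCH 65536000LL

/* Original check as seen on a 32-bit system: the limit comes from a 32-bit long. */
static bool sketch_old_check_rejects(int32_t freq)
{
    return freq > INT32_MAX / PPM_SCALE_SKETCH ||
           freq < INT32_MIN / PPM_SCALE_SKETCH;
}

/* Revised check: only a 64-bit long can overflow freq * PPM_SCALE_SKETCH. */
static bool sketch_new_check_rejects(int64_t freq, int bits_per_long)
{
    assert(bits_per_long == 32 || bits_per_long == 64);
    if (bits_per_long != 64)
        return false; /* a 32-bit freq times the scale always fits in 64 bits */
    return freq > INT64_MAX / PPM_SCALE_SKETCH ||
           freq < INT64_MIN / PPM_SCALE_SKETCH;
}
```

Under these assumptions the old 32-bit limit works out to 32, so a 1 ppm adjustment (65536 in the 16-bit fixed-point format adjtimex uses) is rejected, while the revised check accepts it.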

Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: George Joseph <george.joseph@fairview5.com>
Tested-by: George Joseph <george.joseph@fairview5.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.19+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 14:50:10 +01:00
Linus Torvalds
99fa0ad92c Merge tag 'suspend-to-idle-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull suspend-to-idle updates from Rafael Wysocki:
 "Suspend-to-idle timer quiescing support for v3.20-rc1

  Until now suspend-to-idle has not been able to save much more energy
  than runtime PM because of timer interrupts that periodically bring
  CPUs out of idle while they are waiting for a wakeup interrupt.  Of
  course, the timer interrupts are not wakeup ones, so the handling of
  them can be deferred until a real wakeup interrupt happens, but at the
  same time we don't want to mass-expire timers at that point.

  The solution is to suspend the entire timekeeping when the last CPU is
  entering an idle state and resume it when the first CPU goes out of
  idle.  That has to be done with care, though, so as to avoid accessing
  suspended clocksources etc., and we need extra support from idle
  drivers for that.

  This series of commits adds support for quiescing timers during
  suspend-to-idle and adds the requisite callbacks to intel_idle and the
  ACPI cpuidle driver"

* tag 'suspend-to-idle-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / idle: Implement ->enter_freeze callback routine
  intel_idle: Add ->enter_freeze callbacks
  PM / sleep: Make it possible to quiesce timers during suspend-to-idle
  timekeeping: Make it safe to use the fast timekeeper while suspended
  timekeeping: Pass readout base to update_fast_timekeeper()
  PM / sleep: Re-implement suspend-to-idle handling
2015-02-17 14:17:51 -08:00
Rafael J. Wysocki
124cf9117c PM / sleep: Make it possible to quiesce timers during suspend-to-idle
The efficiency of suspend-to-idle depends on being able to keep CPUs
in the deepest available idle states for as much time as possible.
Ideally, they should only be brought out of idle by system wakeup
interrupts.

However, timer interrupts occurring periodically prevent that from
happening and it is not practical to chase all of the "misbehaving"
timers in a whack-a-mole fashion.  A much more effective approach is
to suspend the local ticks for all CPUs and the entire timekeeping
along the lines of what is done during full suspend, which also
helps to keep suspend-to-idle and full suspend reasonably similar.

The idea is to suspend the local tick on each CPU executing
cpuidle_enter_freeze() and to make the last of them suspend the
entire timekeeping.  That should prevent timer interrupts from
triggering until an IO interrupt wakes up one of the CPUs.  It
needs to be done with interrupts disabled on all of the CPUs,
though, because otherwise the suspended clocksource might be
accessed by an interrupt handler which might lead to fatal
consequences.

Unfortunately, the existing ->enter callbacks provided by cpuidle
drivers generally cannot be used for implementing that, because some
of them re-enable interrupts temporarily and some idle entry methods
cause interrupts to be re-enabled automatically on exit.  Also some
of these callbacks manipulate local clock event devices of the CPUs
which really shouldn't be done after suspending their ticks.

To overcome that difficulty, introduce a new cpuidle state callback,
->enter_freeze, that will be guaranteed (1) to keep interrupts
disabled all the time (and return with interrupts disabled) and (2)
not to touch the CPU timer devices.  Modify cpuidle_enter_freeze() to
look for the deepest available idle state with ->enter_freeze present
and to make the CPU execute that callback with suspended tick (and the
last of the online CPUs to execute it with suspended timekeeping).

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-02-15 19:40:09 +01:00
Rafael J. Wysocki
060407aed5 timekeeping: Make it safe to use the fast timekeeper while suspended
Theoretically, ktime_get_mono_fast_ns() may be executed after
timekeeping has been suspended (or before it is resumed) which
in turn may lead to undefined behavior, for example, when the
clocksource read from timekeeping_get_ns() called by it is
not accessible at that time.

Prevent that from happening by setting up a dummy readout base for
the fast timekeeper during timekeeping_suspend() such that it will
always return the same number of cycles.

After the last timekeeping_update() in timekeeping_suspend() the
clocksource is read and the result is stored as cycles_at_suspend.
The readout base from the current timekeeper is copied onto the
dummy and the ->read pointer of the dummy is set to a routine
unconditionally returning cycles_at_suspend.  Next, the dummy is
passed to update_fast_timekeeper().
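
The dummy readout base described above can be sketched in user-space C. All names here (read_base_sketch, sketch_suspend, sketch_fast_ns, hw_counter) are hypothetical stand-ins, and the global fast_base only loosely models what update_fast_timekeeper() installs; locking and sequence counting are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a readout base; field names are assumptions. */
struct read_base_sketch {
    uint64_t (*read)(void);   /* clocksource read routine */
    uint64_t cycle_last;      /* cycles at the last update */
    uint32_t mult;
    uint32_t shift;
    uint64_t base_ns;         /* nanoseconds at the last update */
};

static uint64_t hw_counter;              /* pretend hardware clocksource */
static uint64_t cycles_at_suspend;
static struct read_base_sketch fast_base; /* base used by the fast path */

static uint64_t hw_read(void)    { return hw_counter; }
static uint64_t dummy_read(void) { return cycles_at_suspend; }

/* Fast-path readout: never touches anything but fast_base. */
static uint64_t sketch_fast_ns(void)
{
    uint64_t delta = fast_base.read() - fast_base.cycle_last;
    return fast_base.base_ns + ((delta * fast_base.mult) >> fast_base.shift);
}

/* On suspend: snapshot the counter, then install a copy whose read
 * routine unconditionally returns that snapshot. */
static void sketch_suspend(const struct read_base_sketch *cur)
{
    struct read_base_sketch dummy;

    assert(cur && cur->read);
    dummy = *cur;
    cycles_at_suspend = cur->read();
    dummy.read = dummy_read;
    fast_base = dummy;
}
```

After sketch_suspend() the fast path keeps returning the same value no matter what the (possibly inaccessible) hardware counter does.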

Then, ktime_get_mono_fast_ns() will work until the subsequent
timekeeping_resume() and the proper readout base for the fast
timekeeper will be restored by the timekeeping_update() called
right after clearing timekeeping_suspended.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2015-02-15 19:39:40 +01:00
Tejun Heo
ffda22c1f3 time: use %*pb[l] to print bitmaps including cpumasks and nodemasks
printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:37 -08:00
Rafael J. Wysocki
affe3e85ae timekeeping: Pass readout base to update_fast_timekeeper()
Modify update_fast_timekeeper() to take a struct tk_read_base
pointer as its argument (instead of a struct timekeeper pointer)
and update its kerneldoc comment to reflect that.

That will allow a struct tk_read_base that is not part of a
struct timekeeper to be passed to it in the next patch.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
2015-02-13 23:49:36 +01:00
Andy Lutomirski
f56141e3e2 all arches, signal: move restart_block to struct task_struct
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target.  This is because the
restart_block is held in the same memory allocation as the kernel stack.

Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.

Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.

It's also a decent simplification, since the restart code is more or less
identical on all architectures.

[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:12 -08:00
Linus Torvalds
c5ce28df0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
      wrong, and this pull actually adds an extra commit on top of the
      branch I'm pulling to fix that up, so that the pre-merge state is
      ok.   - Linus ]

 2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation.  From Alexander Duyck.

 3) Remove sock_iocb altogether, from Christoph Hellwig.

 4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

 7) Add xmit_more support to r8169, e1000, and e1000e drivers.  From
    Florian Westphal.

 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

 9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

10) Add support for unique flow IDs to openvswitch, from Joe Stringer.

11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms.  From Neal Cardwell.

13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

14) Support xmit_more in be2net, from Sathya Perla.

15) Group Policy extensions for vxlan, from Thomas Graf.

16) Remove Checksum Offload support for vxlan, from Tom Herbert.

17) Like ipv4, support lockless transmit over ipv6 UDP sockets.  From
    Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
  crypto: fix af_alg_make_sg() conversion to iov_iter
  ipv4: Namespecify TCP PMTU mechanism
  i40e: Fix for stats init function call in Rx setup
  tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
  openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
  ipv6: Make __ipv6_select_ident static
  ipv6: Fix fragment id assignment on LE arches.
  bridge: Fix inability to add non-vlan fdb entry
  net: Mellanox: Delete unnecessary checks before the function call "vunmap"
  cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
  ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
  net: dsa: Remove redundant phy_attach()
  IB/mlx4: Reset flow support for IB kernel ULPs
  IB/mlx4: Always use the correct port for mirrored multicast attachments
  net/bonding: Fix potential bad memory access during bonding events
  tipc: remove tipc_snprintf
  tipc: nl compat add noop and remove legacy nl framework
  tipc: convert legacy nl stats show to nl compat
  tipc: convert legacy nl net id get to nl compat
  tipc: convert legacy nl net id set to nl compat
  ...
2015-02-10 20:01:30 -08:00
Linus Torvalds
0ba97bc4b4 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "The main changes in this cycle were:

   - rework hrtimer expiry calculation in hrtimer_interrupt(): the
     previous code had a subtle bug where expiry caching would miss an
     expiry, resulting in occasional bogus (late) expiry of hrtimers.

   - continuing Y2038 fixes

   - ktime division optimization

   - misc smaller fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Make __hrtimer_get_next_event() static
  rtc: Convert rtc_set_ntp_time() to use timespec64
  rtc: Remove redundant rtc_valid_tm() from rtc_hctosys()
  rtc: Modify rtc_hctosys() to address y2038 issues
  rtc: Update rtc-dev to use y2038-safe time interfaces
  rtc: Update interface.c to use y2038-safe time interfaces
  time: Expose get_monotonic_boottime64 for in-kernel use
  time: Expose getboottime64 for in-kernel uses
  ktime: Optimize ktime_divns for constant divisors
  hrtimer: Prevent stale expiry time in hrtimer_interrupt()
  ktime.h: Introduce ktime_ms_delta
2015-02-09 16:33:07 -08:00
John Stultz
2d926c15d6 hrtimer: Fix incorrect tai offset calculation for non high-res timer systems
I noticed some CLOCK_TAI timer test failures on one of my
less-frequently used configurations. And after digging in I
found in 76f4108892 (Cleanup hrtimer accessors to the
timekepeing state), the hrtimer_get_softirq_time tai offset
calculation was incorrectly rewritten, as the tai offset we
return should be from CLOCK_MONOTONIC, and not CLOCK_REALTIME.

This results in CLOCK_TAI timers expiring early on non-highres
capable machines.

This patch fixes the issue, calculating the tai time properly
from the monotonic base.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable <stable@vger.kernel.org> # 3.17+
Link: http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-05 08:39:37 +01:00
David S. Miller
95f873f2ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/arm/boot/dts/imx6sx-sdb.dts
	net/sched/cls_bpf.c

Two simple sets of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 16:59:56 -08:00
Linus Torvalds
b73f0c8f4b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A set of small fixes:

   - regression fix for exynos_mct clocksource

   - trivial build fix for kona clocksource

   - functional one liner fix for the sh_tmu clocksource

   - two validation fixes to prevent (root only) data corruption in the
     kernel via settimeofday and adjtimex.  Tagged for stable"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: adjtimex: Validate the ADJ_FREQUENCY values
  time: settimeofday: Validate the values of tv from user
  clocksource: sh_tmu: Set cpu_possible_mask to fix SMP broadcast
  clocksource: kona: fix __iomem annotation
  clocksource: exynos_mct: Fix bitmask regression for exynos4_mct_write
2015-01-25 17:47:34 -08:00
kbuild test robot
4ebbda5251 hrtimer: Make __hrtimer_get_next_event() static
kernel/time/hrtimer.c:444:9: sparse: symbol '__hrtimer_get_next_event' was not declared. Should it be static?

Fixes: 9bc7491906 hrtimer: Prevent stale expiry time in hrtimer_interrupt()
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Link: http://lkml.kernel.org/r/20150123121206.GA4766@snb
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-24 10:53:36 +01:00
Thomas Gleixner
fe31fca35d Merge tag 'fortglx-3.20-time' of https://git.linaro.org/people/john.stultz/linux into timers/core
Pull time updates from John Stultz for 3.20:

 * ktime division optimization
 * Expose a few more y2038-safe timekeeping interfaces
 * RTC core changes to address y2038
2015-01-24 10:11:12 +01:00
Xunlei Pang
9a4a445e30 rtc: Convert rtc_set_ntp_time() to use timespec64
rtc_set_ntp_time() uses timespec which is y2038-unsafe,
so modify to use timespec64 which is y2038-safe, then
replace rtc_time_to_tm() with rtc_time64_to_tm().

Also adjust all its call sites (only NTP uses it) accordingly.

Cc: pang.xunlei <pang.xunlei@linaro.org>
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-01-23 17:21:57 -08:00
John Stultz
d08c0cdd26 time: Expose getboottime64 for in-kernel uses
Adds a timespec64 based getboottime64() implementation
that can be used as we convert internal users of
getboottime away from using timespecs.

Cc: pang.xunlei <pang.xunlei@linaro.org>
Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-01-23 17:21:54 -08:00
Nicolas Pitre
8b618628b2 ktime: Optimize ktime_divns for constant divisors
At least on ARM, do_div() is optimized to turn constant divisors into
an inline multiplication by the reciprocal value at compile time.
However this optimization is missed entirely whenever ktime_divns() is
used and the slow out-of-line division code is used all the time.

Let ktime_divns() use do_div() inline whenever the divisor is constant
and small enough.  This will make things like ktime_to_us() and
ktime_to_ms() much faster.
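
The dispatch idea can be sketched in user-space C. This is not the kernel's ktime_divns(): the names are hypothetical, the 32-bit bound on the divisor is an assumption about what "small enough" means, and __builtin_constant_p is a GCC/Clang builtin. With a constant divisor the division is written inline, where the compiler is free to turn it into a multiplication by the reciprocal; otherwise the out-of-line helper is called.

```c
#include <assert.h>
#include <stdint.h>

/* Out-of-line slow path, used when the divisor is not a compile-time constant. */
static int64_t sketch_divns_slow(int64_t ns, int64_t div)
{
    assert(div > 0);
    return ns / div;
}

/* Inline division for constant, small divisors; slow path otherwise. */
#define sketch_divns(ns, div)                                         \
    ((__builtin_constant_p(div) && (div) > 0 && (div) < (1LL << 32))  \
         ? (int64_t)(ns) / (int64_t)(div)                             \
         : sketch_divns_slow((ns), (div)))
```

Both paths return the same quotient; only the generated code differs.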

Cc: Arnd Bergmann <arnd.bergmann@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Nicolas Pitre <nico@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-01-23 17:21:31 -08:00
Thomas Gleixner
9bc7491906 hrtimer: Prevent stale expiry time in hrtimer_interrupt()
hrtimer_interrupt() has the following subtle issue:

hrtimer_interrupt()
  lock(cpu_base);
  expires_next = KTIME_MAX;

  expire_timers(CLOCK_MONOTONIC);
  expires = get_next_timer(CLOCK_MONOTONIC);
  if (expires < expires_next)
    expires_next = expires;

  expire_timers(CLOCK_REALTIME);
    unlock(cpu_base);
    wakeup()
    hrtimer_start(CLOCK_MONOTONIC, newtimer);
    lock(cpu_base);
  expires = get_next_timer(CLOCK_REALTIME);
  if (expires < expires_next)
    expires_next = expires;

So because we already evaluated the next expiring timer of
CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
earlier than the overall next expiry time in hrtimer_interrupt().

To solve this, remove the caching of the next expiry value from
hrtimer_interrupt() and reevaluate all active clock bases for the next
expiry value. To avoid another code duplication, create a shared
evaluation function and use it for hrtimer_get_next_event(),
hrtimer_force_reprogram() and hrtimer_interrupt().
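
A shared evaluation function of this kind can be sketched in user-space C. The struct and function names are hypothetical and each clock base is reduced to a single "earliest expiry" value; the point is only that the minimum is recomputed over all bases after the callbacks have run, rather than cached per base along the way.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KTIME_MAX INT64_MAX

/* One clock base, reduced to the expiry of its earliest queued timer. */
struct clock_base_sketch {
    int64_t next_expiry;  /* SKETCH_KTIME_MAX when no timer is queued */
};

/* Scan every base and return the earliest expiry among them. */
static int64_t sketch_get_next_event(const struct clock_base_sketch *bases, int n)
{
    int64_t expires_next = SKETCH_KTIME_MAX;

    assert(bases && n > 0);
    for (int i = 0; i < n; i++) {
        if (bases[i].next_expiry < expires_next)
            expires_next = bases[i].next_expiry;
    }
    return expires_next;
}
```

If a callback on the second base enqueues an earlier timer on the first base, a value cached before that callback is stale, while a call to sketch_get_next_event() made afterwards sees the new timer.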

There is another subtlety in this mechanism:

While hrtimer_interrupt() is running, we want to avoid touching the
hardware device because we will reprogram it anyway at the end of
hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
via the HRTIMER_RESTART mechanism, because we drop out when the
callback on that CPU is running. But that fails, if a new timer gets
enqueued like in the example above.

This has another implication: While hrtimer_interrupt() is running we
refuse remote enqueueing of timers - see hrtimer_interrupt() and
hrtimer_check_target().

hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
to KTIME_MAX, but that fails if a new timer gets queued.

Prevent both the hardware access and the remote enqueue
explicitly. We can loosen the restriction on the remote enqueue now
due to reevaluation of the next expiry value, but that needs a
separate patch.

Folded in a fix from Vignesh Radhakrishnan.

Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: vigneshr@codeaurora.org
Cc: john.stultz@linaro.org
Cc: viresh.kumar@linaro.org
Cc: fweisbec@gmail.com
Cc: cl@linux.com
Cc: stuart.w.hayes@gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-23 12:13:20 +01:00
Thomas Gleixner
5fbaba8603 Merge branch 'fortglx/3.19-stable/time' of https://git.linaro.org/people/john.stultz/linux into timers/urgent
Pull urgent fixes from John Stultz:

  Two urgent fixes for user triggerable time related overflow issues
2015-01-22 12:28:02 +01:00
Sasha Levin
5e5aeb4367 time: adjtimex: Validate the ADJ_FREQUENCY values
Verify that the frequency value from userspace is valid and makes sense.

Unverified values can cause overflows later on.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: Fix up bug for negative values and drop redundant cap check]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-01-07 09:50:32 -08:00
Sasha Levin
6ada1fc0e1 time: settimeofday: Validate the values of tv from user
An unvalidated user input is multiplied by a constant, which can result in
undefined behaviour for large values. While this is validated later,
we should avoid triggering undefined behaviour.
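
The validate-before-multiply ordering can be sketched in user-space C. The struct, constants, and function names are hypothetical, and the exact range rules (non-negative seconds, microseconds below one million) are assumptions for illustration rather than a copy of the kernel's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_USEC_PER_SEC  1000000LL
#define SKETCH_NSEC_PER_USEC 1000LL

struct sketch_timeval {
    int64_t tv_sec;
    int64_t tv_usec;
};

/* Range check on the raw user-supplied values. */
static bool sketch_timeval_valid(const struct sketch_timeval *tv)
{
    assert(tv);
    if (tv->tv_sec < 0)
        return false;
    if (tv->tv_usec < 0 || tv->tv_usec >= SKETCH_USEC_PER_SEC)
        return false;
    return true;
}

/* Multiply only after validation, so the product cannot overflow.
 * Leaves *nsec untouched and returns false for out-of-range input. */
static bool sketch_usec_to_nsec(const struct sketch_timeval *tv, int64_t *nsec)
{
    if (!sketch_timeval_valid(tv))
        return false;
    *nsec = tv->tv_usec * SKETCH_NSEC_PER_USEC;
    return true;
}
```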

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
[jstultz: include trivial millisecond->microsecond correction noticed
by Andy]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2015-01-07 09:49:14 -08:00
Richard Cochran
2eebdde652 timecounter: keep track of accumulated fractional nanoseconds
The current timecounter implementation will drop a variable amount
of resolution, depending on the magnitude of the time delta. In
other words, reading the clock too often or too close to a time
stamp conversion will introduce errors into the time values. This
patch fixes the issue by introducing a fractional nanosecond field
that accumulates the low order bits.
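
The fractional accumulation can be sketched in user-space C. The struct and function names are hypothetical; the conversion follows the usual mult/shift scheme, with the bits that the shift would discard carried over into the next conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal cycle-to-nanosecond converter state; names are assumptions. */
struct cyc_sketch {
    uint32_t mult;
    uint32_t shift;
    uint64_t frac;   /* low-order bits left over from earlier conversions */
};

/* Convert a cycle delta to nanoseconds, carrying the remainder forward. */
static uint64_t sketch_cyc2ns(struct cyc_sketch *c, uint64_t cycles)
{
    uint64_t mask, ns;

    assert(c && c->shift < 64);
    mask = (1ULL << c->shift) - 1;
    ns = cycles * c->mult + c->frac;
    c->frac = ns & mask;      /* keep what the shift would drop */
    return ns >> c->shift;
}
```

With mult = 1 and shift = 1 each cycle is worth half a nanosecond: four one-cycle conversions without the frac field would each truncate to 0, whereas with it they sum to 2.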

Reported-by: Janusz Użycki <j.uzycki@elproma.com.pl>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:27 -05:00
Richard Cochran
74d23cc704 time: move the timecounter/cyclecounter code into its own file.
The timecounter code has almost nothing to do with the clocksource
code. Let it live in its own file. This will help isolate the
timecounter users from the clocksource users in the source tree.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:25 -05:00