Commit Graph

380 Commits

Author SHA1 Message Date
Yu Liao
a9f2c6af5a timekeeping: Really make sure wall_to_monotonic isn't positive
commit 4e8c11b6b3 upstream.

Even after commit e1d7ba8735 ("time: Always make sure wall_to_monotonic
isn't positive") it is still possible to make wall_to_monotonic positive
by running the following code:

    int main(void)
    {
        struct timespec time;

        clock_gettime(CLOCK_MONOTONIC, &time);
        time.tv_nsec = 0;
        clock_settime(CLOCK_REALTIME, &time);
        return 0;
    }

The reason is that the second parameter of timespec64_compare(), ts_delta,
may be unnormalized because the delta is calculated with an open coded
substraction which causes the comparison of tv_sec to yield the wrong
result:

  wall_to_monotonic = { .tv_sec = -10, .tv_nsec =  900000000 }
  ts_delta 	    = { .tv_sec =  -9, .tv_nsec = -900000000 }

That makes timespec64_compare() claim that wall_to_monotonic < ts_delta,
but actually the result should be wall_to_monotonic > ts_delta.

After normalization, the result of timespec64_compare() is correct because
the tv_sec comparison is not longer misleading:

  wall_to_monotonic = { .tv_sec = -10, .tv_nsec =  900000000 }
  ts_delta 	    = { .tv_sec = -10, .tv_nsec =  100000000 }

Use timespec64_sub() to ensure that ts_delta is normalized, which fixes the
issue.

Fixes: e1d7ba8735 ("time: Always make sure wall_to_monotonic isn't positive")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211213135727.1656662-1-liaoyu15@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22 09:30:58 +01:00
Linus Torvalds
ed016af52e Merge tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "These are the locking updates for v5.10:

   - Add deadlock detection for recursive read-locks.

     The rationale is outlined in commit 224ec489d3 ("lockdep/
     Documention: Recursive read lock detection reasoning")

     The main deadlock pattern we want to detect is:

           TASK A:                 TASK B:

           read_lock(X);
                                   write_lock(X);
           read_lock_2(X);

   - Add "latch sequence counters" (seqcount_latch_t):

     A sequence counter variant where the counter even/odd value is used
     to switch between two copies of protected data. This allows the
     read path, typically NMIs, to safely interrupt the write side
     critical section.

     We utilize this new variant for sched-clock, and to make x86 TSC
     handling safer.

   - Other seqlock cleanups, fixes and enhancements

   - KCSAN updates

   - LKMM updates

   - Misc updates, cleanups and fixes"

* tag 'locking-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu variables"
  lockdep: Fix lockdep recursion
  lockdep: Fix usage_traceoverflow
  locking/atomics: Check atomic-arch-fallback.h too
  locking/seqlock: Tweak DEFINE_SEQLOCK() kernel doc
  lockdep: Optimize the memory usage of circular queue
  seqlock: Unbreak lockdep
  seqlock: PREEMPT_RT: Do not starve seqlock_t writers
  seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support
  seqlock: seqcount_t: Implement all read APIs as statement expressions
  seqlock: Use unique prefix for seqcount_t property accessors
  seqlock: seqcount_LOCKNAME_t: Standardize naming convention
  seqlock: seqcount latch APIs: Only allow seqcount_latch_t
  rbtree_latch: Use seqcount_latch_t
  x86/tsc: Use seqcount_latch_t
  timekeeping: Use seqcount_latch_t
  time/sched_clock: Use seqcount_latch_t
  seqlock: Introduce seqcount_latch_t
  mm/swap: Do not abuse the seqcount_t latching API
  time/sched_clock: Use raw_read_seqcount_latch() during suspend
  ...
2020-10-12 13:06:20 -07:00
Ahmed S. Darwish
249d053835 timekeeping: Use seqcount_latch_t
Latch sequence counters are a multiversion concurrency control mechanism
where the seqcount_t counter even/odd value is used to switch between
two data storage copies. This allows the seqcount_t read path to safely
interrupt its write side critical section (e.g. from NMIs).

Initially, latch sequence counters were implemented as a single write
function, raw_write_seqcount_latch(), above plain seqcount_t. The read
path was expected to use plain seqcount_t raw_read_seqcount().

A specialized read function was later added, raw_read_seqcount_latch(),
and became the standardized way for latch read paths. Having unique read
and write APIs meant that latch sequence counters are basically a data
type of their own -- just inappropriately overloading plain seqcount_t.
The seqcount_latch_t data type was thus introduced at seqlock.h.

Use that new data type instead of seqcount_raw_spinlock_t. This ensures
that only latch-safe APIs are to be used with the sequence counter.

Note that the use of seqcount_raw_spinlock_t was not very useful in the
first place. Only the "raw_" subset of seqcount_t APIs were used at
timekeeping.c. This subset was created for contexts where lockdep cannot
be used. seqcount_LOCKTYPE_t's raison d'ĂȘtre -- verifying that the
seqcount_t writer serialization lock is held -- cannot thus be done.

References: 0c3351d451 ("seqlock: Use raw_ prefix instead of _no_lockdep")
References: 55f3560df9 ("seqlock: Extend seqcount API with associated locks")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200827114044.11173-6-a.darwish@linutronix.de
2020-09-10 11:19:29 +02:00
Thomas Gleixner
e2d977c9f1 timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper
printk wants to store various timestamps (MONOTONIC, REALTIME, BOOTTIME) to
make correlation of dmesg from several systems easier.

Provide an interface to retrieve all three timestamps in one go.

There are some caveats:

1) Boot time and late sleep time injection

  Boot time is a racy access on 32bit systems if the sleep time injection
  happens late during resume and not in timekeeping_resume(). That could be
  avoided by expanding struct tk_read_base with boot offset for 32bit and
  adding more overhead to the update. As this is a hard to observe once per
  resume event which can be filtered with reasonable effort using the
  accurate mono/real timestamps, it's probably not worth the trouble.

  Aside of that it might be possible on 32 and 64 bit to observe the
  following when the sleep time injection happens late:

  CPU 0				         CPU 1
  timekeeping_resume()
  ktime_get_fast_timestamps()
    mono, real = __ktime_get_real_fast()
  					 inject_sleep_time()
  					   update boot offset
  	boot = mono + bootoffset;
  
  That means that boot time already has the sleep time adjustment, but
  real time does not. On the next readout both are in sync again.
  
  Preventing this for 64bit is not really feasible without destroying the
  careful cache layout of the timekeeper because the sequence count and
  struct tk_read_base would then need two cache lines instead of one.

2) Suspend/resume timestamps

   Access to the time keeper clock source is disabled accross the innermost
   steps of suspend/resume. The accessors still work, but the timestamps
   are frozen until time keeping is resumed which happens very early.

   For regular suspend/resume there is no observable difference vs. sched
   clock, but it might affect some of the nasty low level debug printks.

   OTOH, access to sched clock is not guaranteed accross suspend/resume on
   all systems either so it depends on the hardware in use.

   If that turns out to be a real problem then this could be mitigated by
   using sched clock in a similar way as during early boot. But it's not as
   trivial as on early boot because it needs some careful protection
   against the clock monotonic timestamp jumping backwards on resume.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Petr Mladek <pmladek@suse.com>                                                                                                                                                                                                                                      
Link: https://lore.kernel.org/r/20200814115512.159981360@linutronix.de
2020-08-23 10:38:24 +02:00
Thomas Gleixner
71419b30ca timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot
During early boot the NMI safe timekeeper returns 0 until the first
clocksource becomes available.

This prevents it from being used for printk or other facilities which today
use sched clock. sched clock can be available way before timekeeping is
initialized.

The obvious workaround for this is to utilize the early sched clock in the
default dummy clock read function until a clocksource becomes available.

After switching to the clocksource clock MONOTONIC and BOOTTIME will not
jump because the timekeeping_init() bases clock MONOTONIC on sched clock
and the offset between clock MONOTONIC and BOOTTIME is zero during boot.

Clock REALTIME cannot provide useful timestamps during early boot up to
the point where a persistent clock becomes available, which is either in
timekeeping_init() or later when the RTC driver which might depend on I2C
or other subsystems is initialized.

There is a minor difference to sched_clock() vs. suspend/resume. As the
timekeeper clock source might not be accessible during suspend, after
timekeeping_suspend() timestamps freeze up to the point where
timekeeping_resume() is invoked. OTOH this is true for some sched clock
implementations as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Petr Mladek <pmladek@suse.com>                                                                                                                                                                                                                                      
Link: https://lore.kernel.org/r/20200814115512.041422402@linutronix.de
2020-08-23 10:38:24 +02:00
Linus Torvalds
b923f1247b Merge tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping updates from Thomas Gleixner:
 "A set of timekeeping/VDSO updates:

   - Preparatory work to allow S390 to switch over to the generic VDSO
     implementation.

     S390 requires that the VDSO data pointer is handed in to the
     counter read function when time namespace support is enabled.
     Adding the pointer is a NOOP for all other architectures because
     the compiler is supposed to optimize that out when it is unused in
     the architecture specific inline. The change also solved a similar
     problem for MIPS which fortunately has time namespaces not yet
     enabled.

     S390 needs to update clock related VDSO data independent of the
     timekeeping updates. This was solved so far with yet another
     sequence counter in the S390 implementation. A better solution is
     to utilize the already existing VDSO sequence count for this. The
     core code now exposes helper functions which allow to serialize
     against the timekeeper code and against concurrent readers.

     S390 needs extra data for their clock readout function. The initial
     common VDSO data structure did not provide a way to add that. It
     now has an embedded architecture specific struct embedded which
     defaults to an empty struct.

     Doing this now avoids tree dependencies and conflicts post rc1 and
     allows all other architectures which work on generic VDSO support
     to work from a common upstream base.

   - A trivial comment fix"

* tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Delete repeated words in comments
  lib/vdso: Allow to add architecture-specific vdso data
  timekeeping/vsyscall: Provide vdso_update_begin/end()
  vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()
2020-08-14 14:26:08 -07:00
Linus Torvalds
97d052ea3f Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Randy Dunlap
b0294f3025 time: Delete repeated words in comments
Drop repeated words in kernel/time/.  {when, one, into}

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <john.stultz@linaro.org>
Link: https://lore.kernel.org/r/20200807033248.8452-1-rdunlap@infradead.org
2020-08-10 22:14:07 +02:00
Thomas Gleixner
19d0070a27 timekeeping/vsyscall: Provide vdso_update_begin/end()
Architectures can have the requirement to add additional architecture
specific data to the VDSO data page which needs to be updated independent
of the timekeeper updates.

To protect these updates vs. concurrent readers and a conflicting update
through timekeeping, provide helper functions to make such updates safe.

vdso_update_begin() takes the timekeeper_lock to protect against a
potential update from timekeeper code and increments the VDSO sequence
count to signal data inconsistency to concurrent readers. vdso_update_end()
makes the sequence count even again to signal data consistency and drops
the timekeeper lock.

[ Sven: Add interrupt disable handling to the functions ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200804150124.41692-3-svens@linux.ibm.com
2020-08-06 10:57:30 +02:00
Ahmed S. Darwish
025e82bcbc timekeeping: Use sequence counter with associated raw spinlock
A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows to associate
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200720155530.1173732-18-a.darwish@linutronix.de
2020-07-29 16:14:27 +02:00
Paul Gortmaker
46132e3ac5 sched: nohz: stop passing around unused "ticks" parameter.
The "ticks" parameter was added in commit 0f004f5a69 ("sched: Cure more
NO_HZ load average woes") since calc_global_nohz() was called and needed
the "ticks" argument.

But in commit c308b56b53 ("sched: Fix nohz load accounting -- again!")
it became unused as the function calc_global_nohz() dropped using "ticks".

Fixes: c308b56b53 ("sched: Fix nohz load accounting -- again!")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1593628458-32290-1-git-send-email-paul.gortmaker@windriver.com
2020-07-22 10:22:04 +02:00
Thomas Gleixner
865d3a9afe x86/mce: Address objtools noinstr complaints
Mark the relevant functions noinstr, use the plain non-instrumented MSR
accessors. The only odd part is the instrumentation_begin()/end() pair around the
indirect machine_check_vector() call as objtool can't figure that out. The
possible invoked functions are annotated correctly.

Also use notrace variant of nmi_enter/exit(). If MCEs happen then hardware
latency tracing is the least of the worries.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135315.476734898@linutronix.de
2020-06-11 15:15:02 +02:00
Linus Torvalds
dbb381b619 Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping and timer updates from Thomas Gleixner:
 "Core:

   - Consolidation of the vDSO build infrastructure to address the
     difficulties of cross-builds for ARM64 compat vDSO libraries by
     restricting the exposure of header content to the vDSO build.

     This is achieved by splitting out header content into separate
     headers. which contain only the minimaly required information which
     is necessary to build the vDSO. These new headers are included from
     the kernel headers and the vDSO specific files.

   - Enhancements to the generic vDSO library allowing more fine grained
     control over the compiled in code, further reducing architecture
     specific storage and preparing for adopting the generic library by
     PPC.

   - Cleanup and consolidation of the exit related code in posix CPU
     timers.

   - Small cleanups and enhancements here and there

  Drivers:

   - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support

   - Correct the clock rate of PIT64b global clock

   - setup_irq() cleanup

   - Preparation for PWM and suspend support for the TI DM timer

   - Expand the fttmr010 driver to support ast2600 systems

   - The usual small fixes, enhancements and cleanups all over the
     place"

* tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
  Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"
  vdso: Fix clocksource.h macro detection
  um: Fix header inclusion
  arm64: vdso32: Enable Clang Compilation
  lib/vdso: Enable common headers
  arm: vdso: Enable arm to use common headers
  x86/vdso: Enable x86 to use common headers
  mips: vdso: Enable mips to use common headers
  arm64: vdso32: Include common headers in the vdso library
  arm64: vdso: Include common headers in the vdso library
  arm64: Introduce asm/vdso/processor.h
  arm64: vdso32: Code clean up
  linux/elfnote.h: Replace elf.h with UAPI equivalent
  scripts: Fix the inclusion order in modpost
  common: Introduce processor.h
  linux/ktime.h: Extract common header for vDSO
  linux/jiffies.h: Extract common header for vDSO
  linux/time64.h: Extract common header for vDSO
  linux/time32.h: Extract common header for vDSO
  linux/time.h: Extract common header for vDSO
  ...
2020-03-30 18:51:47 -07:00
Thomas Gleixner
e5d4d1756b timekeeping: Split jiffies seqlock
seqlock consists of a sequence counter and a spinlock_t which is used to
serialize the writers. spinlock_t is substituted by a "sleeping" spinlock
on PREEMPT_RT enabled kernels which breaks the usage in the timekeeping
code as the writers are executed in hard interrupt and therefore
non-preemptible context even on PREEMPT_RT.

The spinlock in seqlock cannot be unconditionally replaced by a
raw_spinlock_t as many seqlock users have nesting spinlock sections or
other code which is not suitable to run in truly atomic context on RT.

Instead of providing a raw_seqlock API for a single use case, open code the
seqlock for the jiffies use case and implement it with a raw_spinlock_t and
a sequence counter.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200321113242.120587764@linutronix.de
2020-03-21 16:00:23 +01:00
Wen Yang
4cbbc3a0ee timekeeping: Prevent 32bit truncation in scale64_check_overflow()
While unlikely the divisor in scale64_check_overflow() could be >= 32bit in
scale64_check_overflow(). do_div() truncates the divisor to 32bit at least
on 32bit platforms.

Use div64_u64() instead to avoid the truncation to 32-bit.

[ tglx: Massaged changelog ]

Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200120100523.45656-1-wenyang@linux.alibaba.com
2020-03-04 10:17:51 +01:00
Thomas Gleixner
b99328a60a timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
The VDSO update for CLOCK_BOOTTIME has a overflow issue as it shifts the
nanoseconds based boot time offset left by the clocksource shift. That
overflows once the boot time offset becomes large enough. As a consequence
CLOCK_BOOTTIME in the VDSO becomes a random number causing applications to
misbehave.

Fix it by storing a timespec64 representation of the offset when boot time
is adjusted and add that to the MONOTONIC base time value in the vdso data
page. Using the timespec64 representation avoids a 64bit division in the
update code.

Fixes: 44f57d788e ("timekeeping: Provide a generic update_vsyscall() implementation")
Reported-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1908221257580.1983@nanos.tec.linutronix.de
2019-08-23 02:12:11 +02:00
Jason A. Donenfeld
0354c1a3cd timekeeping: Use proper ktime_add when adding nsecs in coarse offset
While this doesn't actually amount to a real difference, since the macro
evaluates to the same thing, every place else operates on ktime_t using
these functions, so let's not break the pattern.

Fixes: e3ff9c3678 ("timekeeping: Repair ktime_get_coarse*() granularity")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20190621203249.3909-1-Jason@zx2c4.com
2019-06-22 12:11:27 +02:00
Thomas Gleixner
e3ff9c3678 timekeeping: Repair ktime_get_coarse*() granularity
Jason reported that the coarse ktime based time getters advance only once
per second and not once per tick as advertised.

The code reads only the monotonic base time, which advances once per
second. The nanoseconds are accumulated on every tick in xtime_nsec up to
a second and the regular time getters take this nanoseconds offset into
account, but the ktime_get_coarse*() implementation fails to do so.

Add the accumulated xtime_nsec value to the monotonic base time to get the
proper per tick advancing coarse tinme.

Fixes: b9ff604cff ("timekeeping: Add ktime_get_coarse_with_offset")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Waiman Long <longman@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1906132136280.1791@nanos.tec.linutronix.de
2019-06-14 11:51:44 +02:00
Linus Torvalds
02aff8db64 Merge tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
 "We've got a reasonably broad set of audit patches for the v5.2 merge
  window, the highlights are below:

   - The biggest change, and the source of all the arch/* changes, is
     the patchset from Dmitry to help enable some of the work he is
     doing around PTRACE_GET_SYSCALL_INFO.

     To be honest, including this in the audit tree is a bit of a
     stretch, but it does help move audit a little further along towards
     proper syscall auditing for all arches, and everyone else seemed to
     agree that audit was a "good" spot for this to land (or maybe they
     just didn't want to merge it? dunno.).

   - We can now audit time/NTP adjustments.

   - We continue the work to connect associated audit records into a
     single event"

* tag 'audit-pr-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: (21 commits)
  audit: fix a memory leak bug
  ntp: Audit NTP parameters adjustment
  timekeeping: Audit clock adjustments
  audit: purge unnecessary list_empty calls
  audit: link integrity evm_write_xattrs record to syscall event
  syscall_get_arch: add "struct task_struct *" argument
  unicore32: define syscall_get_arch()
  Move EM_UNICORE to uapi/linux/elf-em.h
  nios2: define syscall_get_arch()
  nds32: define syscall_get_arch()
  Move EM_NDS32 to uapi/linux/elf-em.h
  m68k: define syscall_get_arch()
  hexagon: define syscall_get_arch()
  Move EM_HEXAGON to uapi/linux/elf-em.h
  h8300: define syscall_get_arch()
  c6x: define syscall_get_arch()
  arc: define syscall_get_arch()
  Move EM_ARCOMPACT and EM_ARCV2 to uapi/linux/elf-em.h
  audit: Make audit_log_cap and audit_copy_inode static
  audit: connect LOGIN record to its syscall record
  ...
2019-05-07 19:06:04 -07:00
Ondrej Mosnacek
7e8eda734d ntp: Audit NTP parameters adjustment
Emit an audit record every time selected NTP parameters are modified
from userspace (via adjtimex(2) or clock_adjtime(2)). These parameters
may be used to indirectly change system clock, and thus their
modifications should be audited.

Such events will now generate records of type AUDIT_TIME_ADJNTPVAL
containing the following fields:
  - op -- which value was adjusted:
    - offset -- corresponding to the time_offset variable
    - freq   -- corresponding to the time_freq variable
    - status -- corresponding to the time_status variable
    - adjust -- corresponding to the time_adjust variable
    - tick   -- corresponding to the tick_usec variable
    - tai    -- corresponding to the timekeeping's TAI offset
  - old -- the old value
  - new -- the new value

Example records:

type=TIME_ADJNTPVAL msg=audit(1530616044.507:7): op=status old=64 new=8256
type=TIME_ADJNTPVAL msg=audit(1530616044.511:11): op=freq old=0 new=49180377088000

The records of this type will be associated with the corresponding
syscall records.

An overview of parameter changes that can be done via do_adjtimex()
(based on information from Miroslav Lichvar) and whether they are
audited:
  __timekeeping_set_tai_offset() -- sets the offset from the
                                    International Atomic Time
                                    (AUDITED)
  NTP variables:
    time_offset -- can adjust the clock by up to 0.5 seconds per call
                   and also speed it up or slow down by up to about
                   0.05% (43 seconds per day) (AUDITED)
    time_freq -- can speed up or slow down by up to about 0.05%
                 (AUDITED)
    time_status -- can insert/delete leap seconds and it also enables/
                   disables synchronization of the hardware real-time
                   clock (AUDITED)
    time_maxerror, time_esterror -- change error estimates used to
                                    inform userspace applications
                                    (NOT AUDITED)
    time_constant -- controls the speed of the clock adjustments that
                     are made when time_offset is set (NOT AUDITED)
    time_adjust -- can temporarily speed up or slow down the clock by up
                   to 0.05% (AUDITED)
    tick_usec -- a more extreme version of time_freq; can speed up or
                 slow down the clock by up to 10% (AUDITED)

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-04-15 18:14:01 -04:00
Ondrej Mosnacek
2d87a0674b timekeeping: Audit clock adjustments
Emit an audit record whenever the system clock is changed (i.e. shifted
by a non-zero offset) by a syscall from userspace. The syscalls than can
(at the time of writing) trigger such record are:
  - settimeofday(2), stime(2), clock_settime(2) -- via
    do_settimeofday64()
  - adjtimex(2), clock_adjtime(2) -- via do_adjtimex()

The new records have type AUDIT_TIME_INJOFFSET and contain the following
fields:
  - sec -- the 'seconds' part of the offset
  - nsec -- the 'nanoseconds' part of the offset

Example record (time was shifted backwards by ~15.875 seconds):

type=TIME_INJOFFSET msg=audit(1530616049.652:13): sec=-16 nsec=124887145

The records of this type will be associated with the corresponding
syscall records.

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[PM: fixed a line width problem in __audit_tk_injoffset()]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2019-04-15 18:10:17 -04:00
Thomas Gleixner
7a8e61f847 timekeeping: Force upper bound for setting CLOCK_REALTIME
Several people reported testing failures after setting CLOCK_REALTIME close
to the limits of the kernel internal representation in nanoseconds,
i.e. year 2262.

The failures are exposed in subsequent operations, i.e. when arming timers
or when the advancing CLOCK_MONOTONIC makes the calculation of
CLOCK_REALTIME overflow into negative space.

Now people start to paper over the underlying problem by clamping
calculations to the valid range, but that's just wrong because such
workarounds will prevent detection of real issues as well.

It is reasonable to force an upper bound for the various methods of setting
CLOCK_REALTIME. Year 2262 is the absolute upper bound. Assume a maximum
uptime of 30 years which is plenty enough even for esoteric embedded
systems. That results in an upper bound of year 2232 for setting the time.

Once that limit is reached in reality this limit is only a small part of
the problem space. But until then this stops people from trying to paper
over the problem at the wrong places.

Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reported-by: Hongbo Yao <yaohongbo@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903231125480.2157@nanos.tec.linutronix.de
2019-03-28 13:41:06 +01:00
Rasmus Villemoes
e1e41b6ce5 timekeeping: Consistently use unsigned int for seqcount snapshot
The timekeeping code uses a random mix of "unsigned long" and "unsigned
int" for the seqcount snapshots (ratio 14:12). Since the seqlock.h API is
entirely based on unsigned int, use that throughout.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190318195557.20773-1-linux@rasmusvillemoes.dk
2019-03-23 11:43:56 +01:00
Deepa Dinamani
ead25417f8 timex: use __kernel_timex internally
struct timex is not y2038 safe.
Replace all uses of timex with y2038 safe __kernel_timex.

Note that struct __kernel_timex is an ABI interface definition.
We could define a new structure based on __kernel_timex that
is only available internally instead. Right now, there isn't
a strong motivation for this as the structure is isolated to
a few defined struct timex interfaces and such a structure would
be exactly the same as struct timex.

The patch was generated by the following coccinelle script:

virtual patch

@depends on patch forall@
identifier ts;
expression e;
@@
(
- struct timex ts;
+ struct __kernel_timex ts;
|
- struct timex ts = {};
+ struct __kernel_timex ts = {};
|
- struct timex ts = e;
+ struct __kernel_timex ts = e;
|
- struct timex *ts;
+ struct __kernel_timex *ts;
|
(memset \| copy_from_user \| copy_to_user \)(...,
- sizeof(struct timex))
+ sizeof(struct __kernel_timex))
)

@depends on patch forall@
identifier ts;
identifier fn;
@@
fn(...,
- struct timex *ts,
+ struct __kernel_timex *ts,
...) {
...
}

@depends on patch forall@
identifier ts;
identifier fn;
@@
fn(...,
- struct timex *ts) {
+ struct __kernel_timex *ts) {
...
}

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: linux-alpha@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-07 00:13:27 +01:00
Linus Torvalds
b12a9124ee Merge tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull y2038 updates from Arnd Bergmann:
 "More syscalls and cleanups

  This concludes the main part of the system call rework for 64-bit
  time_t, which has spread over most of year 2018, the last six system
  calls being

    - ppoll
    - pselect6
    - io_pgetevents
    - recvmmsg
    - futex
    - rt_sigtimedwait

  As before, nothing changes for 64-bit architectures, while 32-bit
  architectures gain another entry point that differs only in the layout
  of the timespec structure. Hopefully in the next release we can wire
  up all 22 of those system calls on all 32-bit architectures, which
  gives us a baseline version for glibc to start using them.

  This does not include the clock_adjtime, getrusage/waitid, and
  getitimer/setitimer system calls. I still plan to have new versions of
  those as well, but they are not required for correct operation of the
  C library since they can be emulated using the old 32-bit time_t based
  system calls.

  Aside from the system calls, there are also a few cleanups here,
  removing old kernel internal interfaces that have become unused after
  all references got removed. The arch/sh cleanups are part of this,
  there were posted several times over the past year without a reaction
  from the maintainers, while the corresponding changes made it into all
  other architectures"

* tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground:
  timekeeping: remove obsolete time accessors
  vfs: replace current_kernel_time64 with ktime equivalent
  timekeeping: remove timespec_add/timespec_del
  timekeeping: remove unused {read,update}_persistent_clock
  sh: remove board_time_init() callback
  sh: remove unused rtc_sh_get/set_time infrastructure
  sh: sh03: rtc: push down rtc class ops into driver
  sh: dreamcast: rtc: push down rtc class ops into driver
  y2038: signal: Add compat_sys_rt_sigtimedwait_time64
  y2038: signal: Add sys_rt_sigtimedwait_time32
  y2038: socket: Add compat_sys_recvmmsg_time64
  y2038: futex: Add support for __kernel_timespec
  y2038: futex: Move compat implementation into futex.c
  io_pgetevents: use __kernel_timespec
  pselect6: use __kernel_timespec
  ppoll: use __kernel_timespec
  signal: Add restore_user_sigmask()
  signal: Add set_user_sigmask()
2018-12-28 12:45:04 -08:00