Commit Graph

2155 Commits

Author SHA1 Message Date
Thomas Gleixner
bb7262b295 timers: Move clearing of base::timer_running under base:: Lock
syzbot reported KCSAN data races vs. timer_base::timer_running being set to
NULL without holding base::lock in expire_timers().

This looks innocent and most reads are clearly not problematic, but
Frederic identified an issue which is:

 int data = 0;

 void timer_func(struct timer_list *t)
 {
    data = 1;
 }

 CPU 0                                            CPU 1
 ------------------------------                   --------------------------
 base = lock_timer_base(timer, &flags);           raw_spin_unlock(&base->lock);
 if (base->running_timer != timer)                call_timer_fn(timer, fn, baseclk);
   ret = detach_if_pending(timer, base, true);    base->running_timer = NULL;
 raw_spin_unlock_irqrestore(&base->lock, flags);  raw_spin_lock(&base->lock);

 x = data;

If the timer has previously executed on CPU 1 and then CPU 0 can observe
base->running_timer == NULL and returns, assuming the timer has completed,
but it's not guaranteed on all architectures. The comment for
del_timer_sync() makes that guarantee. Moving the assignment under
base->lock prevents this.

For non-RT kernel it's performance wise completely irrelevant whether the
store happens before or after taking the lock. For an RT kernel moving the
store under the lock requires an extra unlock/lock pair in the case that
there is a waiter for the timer, but that's not the end of the world.

Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Fixes: 030dcdd197 ("timers: Prepare support for PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de
Cc: stable@vger.kernel.org
2021-07-27 20:57:44 +02:00
Nicolas Saenz Julienne
aebacb7f6c timers: Fix get_next_timer_interrupt() with no timers pending
31cd0e119d ("timers: Recalculate next timer interrupt only when
necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
function no longer consistently returns KTIME_MAX with no timers
pending.

In order to decide if there are any timers pending we check whether the
next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
Unfortunately, the next expiry time and the timer base clock are no
longer updated in unison. The former changes upon certain timer
operations (enqueue, expire, detach), whereas the latter keeps track of
jiffies as they move forward. Ultimately breaking the logic above.

A simplified example:

- Upon entering get_next_timer_interrupt() with:

	jiffies = 1
	base->clk = 0;
	base->next_expiry = NEXT_TIMER_MAX_DELTA;

  'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
  returns KTIME_MAX.

- 'base->clk' is updated to the jiffies value.

- The next time we enter get_next_timer_interrupt(), taking into account
  no timer operations happened:

	base->clk = 1;
	base->next_expiry = NEXT_TIMER_MAX_DELTA;

  'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
  returns a valid expire time, which is incorrect.

This ultimately might unnecessarily rearm sched's timer on nohz_full
setups, and add latency to the system[1].

So, introduce 'base->timers_pending'[2], update it every time
'base->next_expiry' changes, and use it in get_next_timer_interrupt().

[1] See tick_nohz_stop_tick().
[2] A quick pahole check on x86_64 and arm64 shows it doesn't make
    'struct timer_base' any bigger.

Fixes: 31cd0e119d ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2021-07-15 01:23:54 +02:00
Frederic Weisbecker
1a3402d93c posix-cpu-timers: Fix rearm racing against process tick
Since the process wide cputime counter is started locklessly from
posix_cpu_timer_rearm(), it can be concurrently stopped by operations
on other timers from the same thread group, such as in the following
unlucky scenario:

         CPU 0                                CPU 1
         -----                                -----
                                           timer_settime(TIMER B)
   posix_cpu_timer_rearm(TIMER A)
       cpu_clock_sample_group()
           (pct->timers_active already true)

                                           handle_posix_cpu_timers()
                                               check_process_timers()
                                                   stop_process_timers()
                                                       pct->timers_active = false
       arm_timer(TIMER A)

   tick -> run_posix_cpu_timers()
       // sees !pct->timers_active, ignore
       // our TIMER A

Fix this with simply locking process wide cputime counting start and
timer arm in the same block.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Fixes: 60f2ceaa81 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
Cc: stable@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
2021-07-15 01:20:10 +02:00
Linus Torvalds
28e92f9903 Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:

 - Bitmap parsing support for "all" as an alias for all bits

 - Documentation updates

 - Miscellaneous fixes, including some that overlap into mm and lockdep

 - kvfree_rcu() updates

 - mem_dump_obj() updates, with acks from one of the slab-allocator
   maintainers

 - RCU NOCB CPU updates, including limited deoffloading

 - SRCU updates

 - Tasks-RCU updates

 - Torture-test updates

* 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
  tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
  rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
  rcu: Add missing __releases() annotation
  rcu: Remove obsolete rcu_read_unlock() deadlock commentary
  rcu: Improve comments describing RCU read-side critical sections
  rcu: Create an unrcu_pointer() to remove __rcu from a pointer
  srcu: Early test SRCU polling start
  rcu: Fix various typos in comments
  rcu/nocb: Unify timers
  rcu/nocb: Prepare for fine-grained deferred wakeup
  rcu/nocb: Only cancel nocb timer if not polling
  rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
  rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
  rcu/nocb: Allow de-offloading rdp leader
  rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
  rcu: Don't penalize priority boosting when there is nothing to boost
  rcu: Point to documentation of ordering guarantees
  rcu: Make rcu_gp_cleanup() be noinline for tracing
  rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
  rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
  ...
2021-07-04 12:58:33 -07:00
Linus Torvalds
a941a0349c Merge tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Time and clocksource/clockevent related updates:

  Core changes:

   - Infrastructure to support per CPU "broadcast" devices for per CPU
     clockevent devices which stop in deep idle states. This allows us
     to utilize the more efficient architected timer on certain ARM SoCs
     for normal operation instead of permanentely using the slow to
     access SoC specific clockevent device.

   - Print the name of the broadcast/wakeup device in /proc/timer_list

   - Make the clocksource watchdog more robust against delays between
     reading the current active clocksource and the watchdog
     clocksource. Such delays can be caused by NMIs, SMIs and vCPU
     preemption.

     Handle this by reading the watchdog clocksource twice, i.e. before
     and after reading the current active clocksource. In case that the
     two watchdog reads shows an excessive time delta, the read sequence
     is repeated up to 3 times.

   - Improve the debug output and add a test module for the watchdog
     mechanism.

   - Reimplementation of the venerable time64_to_tm() function with a
     faster and significantly smaller version. Straight from the source,
     i.e. the author of the related research paper contributed this!

  Driver changes:

   - No new drivers, not even new device tree bindings!

   - Fixes, improvements and cleanups and all over the place"

* tag 'timers-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  time/kunit: Add missing MODULE_LICENSE()
  time: Improve performance of time64_to_tm()
  clockevents: Use list_move() instead of list_del()/list_add()
  clocksource: Print deviation in nanoseconds when a clocksource becomes unstable
  clocksource: Provide kernel module to test clocksource watchdog
  clocksource: Reduce clocksource-skew threshold
  clocksource: Limit number of CPUs checked for clock synchronization
  clocksource: Check per-CPU clock synchronization when marked unstable
  clocksource: Retry clock read if long delays detected
  clockevents: Add missing parameter documentation
  clocksource/drivers/timer-ti-dm: Drop unnecessary restore
  clocksource/arm_arch_timer: Improve Allwinner A64 timer workaround
  clocksource/drivers/arm_global_timer: Remove duplicated argument in arm_global_timer
  clocksource/drivers/arm_global_timer: Make symbol 'gt_clk_rate_change_nb' static
  arm: zynq: don't disable CONFIG_ARM_GLOBAL_TIMER due to CONFIG_CPU_FREQ anymore
  clocksource/drivers/arm_global_timer: Implement rate compensation whenever source clock changes
  clocksource/drivers/ingenic: Rename unreasonable array names
  clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFG
  clocksource/drivers/mediatek: Ack and disable interrupts on suspend
  clocksource/drivers/samsung_pwm: Constify source IO memory
  ...
2021-06-29 12:31:16 -07:00
Linus Torvalds
9269d27e51 Merge tag 'timers-nohz-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers/nohz updates from Ingo Molnar:

 - Micro-optimize tick_nohz_full_cpu()

 - Optimize idle exit tick restarts to be less eager

 - Optimize tick_nohz_dep_set_task() to only wake up a single CPU.
   This reduces IPIs and interruptions on nohz_full CPUs.

 - Optimize tick_nohz_dep_set_signal() in a similar fashion.

 - Skip IPIs in tick_nohz_kick_task() when trying to kick a
   non-running task.

 - Micro-optimize tick_nohz_task_switch() IRQ flags handling to
   reduce context switching costs.

 - Misc cleanups and fixes

* tag 'timers-nohz-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Add myself as context tracking maintainer
  tick/nohz: Call tick_nohz_task_switch() with interrupts disabled
  tick/nohz: Kick only _queued_ task whose tick dependency is updated
  tick/nohz: Change signal tick dependency to wake up CPUs of member tasks
  tick/nohz: Only wake up a single target cpu when kicking a task
  tick/nohz: Update nohz_full Kconfig help
  tick/nohz: Update idle_exittime on actual idle exit
  tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
  tick/nohz: Conditionally restart tick on idle exit
  tick/nohz: Evaluate the CPU expression after the static key
2021-06-28 12:22:06 -07:00
Thomas Gleixner
2d0a9eb23c time/kunit: Add missing MODULE_LICENSE()
[ mingo: MODULE_LICENSE() takes a string. ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-28 07:40:23 +02:00
Cassio Neri
2760105516 time: Improve performance of time64_to_tm()
The current implementation of time64_to_tm() contains unnecessary loops,
branches and look-up tables. The new one uses an arithmetic-based algorithm
appeared in [1] and is approximately 3x faster (YMMV).

The drawback is that the new code isn't intuitive and contains many 'magic
numbers' (not unusual for this type of algorithm). However, [1] justifies
all those numbers and, given this function's history, the code is unlikely
to need much maintenance, if any at all.

Add a KUnit test for it which checks every day in a 160,000 years interval
centered at 1970-01-01 against the expected result.

[1] Neri, Schneider, "Euclidean Affine Functions and Applications to
Calendar Algorithms". https://arxiv.org/abs/2102.06959

Signed-off-by: Cassio Neri <cassio.neri@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210622213616.313046-1-cassio.neri@gmail.com
2021-06-24 11:51:59 +02:00
Baokun Li
4e82d2e20f clockevents: Use list_move() instead of list_del()/list_add()
Simplify the code.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210609070242.1322450-1-libaokun1@huawei.com
2021-06-22 17:16:46 +02:00
Feng Tang
22a2238337 clocksource: Print deviation in nanoseconds when a clocksource becomes unstable
Currently when an unstable clocksource is detected, the raw counters of
that clocksource and watchdog will be printed, which can only be understood
after some math calculation.

So print the delta in nanoseconds as well to make it easier for humans to
check the results.

[ paulmck: Fix typo. ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210527190124.440372-6-paulmck@kernel.org
2021-06-22 16:53:17 +02:00
Paul E. McKenney
1253b9b87e clocksource: Provide kernel module to test clocksource watchdog
When the clocksource watchdog marks a clock as unstable, this might
be due to that clock being unstable or it might be due to delays that
happen to occur between the reads of the two clocks.  It would be good
to have a way of testing the clocksource watchdog's ability to
distinguish between these two causes of clock skew and instability.

Therefore, provide a new clocksource-wdtest module selected by a new
TEST_CLOCKSOURCE_WATCHDOG Kconfig option.  This module has a single module
parameter named "holdoff" that provides the number of seconds of delay
before testing should start, which defaults to zero when built as a module
and to 10 seconds when built directly into the kernel.  Very large systems
that boot slowly may need to increase the value of this module parameter.

This module uses hand-crafted clocksource structures to do its testing,
thus avoiding messing up timing for the rest of the kernel and for user
applications.  This module first verifies that the ->uncertainty_margin
field of the clocksource structures are set sanely.  It then tests the
delay-detection capability of the clocksource watchdog, increasing the
number of consecutive delays injected, first provoking console messages
complaining about the delays and finally forcing a clock-skew event.
Unexpected test results cause at least one WARN_ON_ONCE() console splat.
If there are no splats, the test has passed.  Finally, it fuzzes the
value returned from a clocksource to test the clocksource watchdog's
ability to detect time skew.

This module checks the state of its clocksource after each test, and
uses WARN_ON_ONCE() to emit a console splat if there are any failures.
This should enable all types of test frameworks to detect any such
failures.

This facility is intended for diagnostic use only, and should be avoided
on production systems.

Reported-by: Chris Mason <clm@fb.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-5-paulmck@kernel.org
2021-06-22 16:53:17 +02:00
Paul E. McKenney
2e27e793e2 clocksource: Reduce clocksource-skew threshold
Currently, WATCHDOG_THRESHOLD is set to detect a 62.5-millisecond skew in
a 500-millisecond WATCHDOG_INTERVAL.  This requires that clocks be skewed
by more than 12.5% in order to be marked unstable.  Except that a clock
that is skewed by that much is probably destroying unsuspecting software
right and left.  And given that there are now checks for false-positive
skews due to delays between reading the two clocks, it should be possible
to greatly decrease WATCHDOG_THRESHOLD, at least for fine-grained clocks
such as TSC.

Therefore, add a new uncertainty_margin field to the clocksource structure
that contains the maximum uncertainty in nanoseconds for the corresponding
clock.  This field may be initialized manually, as it is for
clocksource_tsc_early and clocksource_jiffies, which is copied to
refined_jiffies.  If the field is not initialized manually, it will be
computed at clock-registry time as the period of the clock in question
based on the scale and freq parameters to __clocksource_update_freq_scale()
function.  If either of those two parameters are zero, the
tens-of-milliseconds WATCHDOG_THRESHOLD is used as a cowardly alternative
to dividing by zero.  No matter how the uncertainty_margin field is
calculated, it is bounded below by twice WATCHDOG_MAX_SKEW, that is, by 100
microseconds.

Note that manually initialized uncertainty_margin fields are not adjusted,
but there is a WARN_ON_ONCE() that triggers if any such field is less than
twice WATCHDOG_MAX_SKEW.  This WARN_ON_ONCE() is intended to discourage
production use of the one-nanosecond uncertainty_margin values that are
used to test the clock-skew code itself.

The actual clock-skew check uses the sum of the uncertainty_margin fields
of the two clocksource structures being compared.  Integer overflow is
avoided because the largest computed value of the uncertainty_margin
fields is one billion (10^9), and double that value fits into an
unsigned int.  However, if someone manually specifies (say) UINT_MAX,
they will get what they deserve.

Note that the refined_jiffies uncertainty_margin field is initialized to
TICK_NSEC, which means that skew checks involving this clocksource will
be sufficently forgiving.  In a similar vein, the clocksource_tsc_early
uncertainty_margin field is initialized to 32*NSEC_PER_MSEC, which
replicates the current behavior and allows custom setting if needed
in order to address the rare skews detected for this clocksource in
current mainline.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-4-paulmck@kernel.org
2021-06-22 16:53:16 +02:00
Paul E. McKenney
fa218f1cce clocksource: Limit number of CPUs checked for clock synchronization
Currently, if skew is detected on a clock marked CLOCK_SOURCE_VERIFY_PERCPU,
that clock is checked on all CPUs.  This is thorough, but might not be
what you want on a system with a few tens of CPUs, let alone a few hundred
of them.

Therefore, by default check only up to eight randomly chosen CPUs.  Also
provide a new clocksource.verify_n_cpus kernel boot parameter.  A value of
-1 says to check all of the CPUs, and a non-negative value says to randomly
select that number of CPUs, without concern about selecting the same CPU
multiple times.  However, make use of a cpumask so that a given CPU will be
checked at most once.

Suggested-by: Thomas Gleixner <tglx@linutronix.de> # For verify_n_cpus=1.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-3-paulmck@kernel.org
2021-06-22 16:53:16 +02:00
Paul E. McKenney
7560c02bdf clocksource: Check per-CPU clock synchronization when marked unstable
Some sorts of per-CPU clock sources have a history of going out of
synchronization with each other.  However, this problem has purportedy been
solved in the past ten years.  Except that it is all too possible that the
problem has instead simply been made less likely, which might mean that
some of the occasional "Marking clocksource 'tsc' as unstable" messages
might be due to desynchronization.  How would anyone know?

Therefore apply CPU-to-CPU synchronization checking to newly unstable
clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag.
Lists of desynchronized CPUs are printed, with the caveat that if it
is the reporting CPU that is itself desynchronized, it will appear that
all the other clocks are wrong.  Just like in real life.

Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-2-paulmck@kernel.org
2021-06-22 16:53:16 +02:00
Paul E. McKenney
db3a34e174 clocksource: Retry clock read if long delays detected
When the clocksource watchdog marks a clock as unstable, this might be due
to that clock being unstable or it might be due to delays that happen to
occur between the reads of the two clocks.  Yes, interrupts are disabled
across those two reads, but there are no shortage of things that can delay
interrupts-disabled regions of code ranging from SMI handlers to vCPU
preemption.  It would be good to have some indication as to why the clock
was marked unstable.

Therefore, re-read the watchdog clock on either side of the read from the
clock under test.  If the watchdog clock shows an excessive time delta
between its pair of reads, the reads are retried.

The maximum number of retries is specified by a new kernel boot parameter
clocksource.max_cswd_read_retries, which defaults to three, that is, up to
four reads, one initial and up to three retries.  If more than one retry
was required, a message is printed on the console (the occasional single
retry is expected behavior, especially in guest OSes).  If the maximum
number of retries is exceeded, the clock under test will be marked
unstable.  However, the probability of this happening due to various sorts
of delays is quite small.  In addition, the reason (clock-read delays) for
the unstable marking will be apparent.

Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Feng Tang <feng.tang@intel.com>
Link: https://lore.kernel.org/r/20210527190124.440372-1-paulmck@kernel.org
2021-06-22 16:53:16 +02:00
Baokun Li
64ab707125 clockevents: Add missing parameter documentation
Add the missing documentation for the @cpu parameter of
tick_cleanup_dead_cpu().

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210608024305.2750999-1-libaokun1@huawei.com
2021-06-22 16:33:16 +02:00
Peter Zijlstra
600642ae90 sched,timer: Use __set_current_state()
There's an existing helper for setting TASK_RUNNING; must've gotten
lost last time we did this cleanup.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.409696194@infradead.org
2021-06-18 11:43:08 +02:00
Will Deacon
245a057fee timer_list: Print name of per-cpu wakeup device
With the introduction of per-cpu wakeup devices that can be used in
preference to the broadcast timer, print the name of such devices when
they are available.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-6-will@kernel.org
2021-05-31 17:04:49 +02:00
Will Deacon
ea5c7f1b9a tick/broadcast: Program wakeup timer when entering idle if required
When configuring the broadcast timer on entry to and exit from deep idle
states, prefer a per-CPU wakeup timer if one exists.

On entry to idle, stop the tick device and transfer the next event into
the oneshot wakeup device, which will serve as the wakeup from idle. To
avoid the overhead of additional hardware accesses on exit from idle,
leave the timer armed and treat the inevitable interrupt as a (possibly
spurious) tick event.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-5-will@kernel.org
2021-05-31 17:04:46 +02:00
Will Deacon
c94a8537df tick/broadcast: Prefer per-cpu oneshot wakeup timers to broadcast
Some SoCs have two per-cpu timer implementations where the timer with the
higher rating stops in deep idle (i.e. suffers from CLOCK_EVT_FEAT_C3STOP)
but is otherwise preferable to the timer with the lower rating. In such a
design, selecting the higher rated devices relies on a global broadcast
timer and IPIs to wake up from deep idle states.

To avoid the reliance on a global broadcast timer and also to reduce the
overhead associated with the IPI wakeups, extend
tick_install_broadcast_device() to manage per-cpu wakeup timers separately
from the broadcast device.

For now, these timers remain unused.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-4-will@kernel.org
2021-05-31 17:04:45 +02:00
Will Deacon
e5007c288e tick/broadcast: Split __tick_broadcast_oneshot_control() into a helper
In preparation for adding support for per-cpu wakeup timers, split
_tick_broadcast_oneshot_control() into a helper function which deals
only with the broadcast timer management across idle transitions.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-3-will@kernel.org
2021-05-31 17:04:45 +02:00
Will Deacon
c2d4fee3f6 tick/broadcast: Drop unneeded CONFIG_GENERIC_CLOCKEVENTS_BROADCAST guard
tick-broadcast.o is only built if CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
so remove the redundant #ifdef guards around the definition of
tick_receive_broadcast().

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210524221818.15850-2-will@kernel.org
2021-05-31 17:04:44 +02:00
YueHaibing
1fa98d96ea clockevents: Use DEVICE_ATTR_[RO|WO] macros
Use the DEVICE_ATTR_[RO|WO] helpers instead of plain DEVICE_ATTR, which
makes the code a bit shorter and easier to read.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210523065825.19684-1-yuehaibing@huawei.com
2021-05-31 17:04:42 +02:00
Frederic Weisbecker
f268c3737e tick/nohz: Only check for RCU deferred wakeup on user/guest entry when needed
Checking for and processing RCU-nocb deferred wakeup upon user/guest
entry is only relevant when nohz_full runs on the local CPU, otherwise
the periodic tick should take care of it.

Make sure we don't needlessly pollute these fast-paths as a -3%
performance regression on a will-it-scale.per_process_ops has been
reported so far.

Fixes: 47b8ff194c (entry: Explicitly flush pending rcuog wakeup before last rescheduling point)
Fixes: 4ae7dc97f7 (entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point)
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210527113441.465489-1-frederic@kernel.org
2021-05-31 10:14:49 +02:00
Peter Zijlstra
0fdcccfafc tick/nohz: Call tick_nohz_task_switch() with interrupts disabled
Call tick_nohz_task_switch() slightly earlier after the context switch
to benefit from disabled IRQs. This way the function doesn't need to
disable them once more.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210512232924.150322-10-frederic@kernel.org
2021-05-13 14:21:23 +02:00