Perf modules abuse irq_set_affinity_hint() to set the affinity of system
PMU interrupts just because irq_set_affinity() was not exported.
The fact that irq_set_affinity_hint() actually sets the affinity is a
non-documented side effect and the name is clearly saying it's a hint.
To clean this up, export the real affinity setter.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210518093117.968251441@linutronix.de
The whole call to note_interrupt() can be avoided or return early when
interrupts would be marked accordingly. For IPI handlers which always
return HANDLED the whole procedure is pretty pointless to begin with.
Add a IRQF_NO_DEBUG flag and mark the interrupt accordingly if supplied
when the interrupt is requested.
When noirqdebug is set on the kernel commandline, then the interrupt is
marked unconditionally so that there is only one condition in the hotpath
to evaluate.
[ clg: Add changelog ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/7a8ad02f-63a8-c1aa-fdd1-39d973593d02@kaod.org
tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in
the tasklet state to be cleared. This works on !RT nicely because the
corresponding execution can only happen on a different CPU.
On RT softirq processing is preemptible, therefore a task preempting the
softirq processing thread can spin forever.
Prevent this by invoking local_bh_disable()/enable() inside the loop. In
case that the softirq processing thread was preempted by the current task,
current will block on the local lock which yields the CPU to the preempted
softirq processing thread. If the tasklet is processed on a different CPU
then the local_bh_disable()/enable() pair is just a waste of processor
cycles.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.988908275@linutronix.de
tasklet_unlock_wait() spin waits for TASKLET_STATE_RUN to be cleared. This
is wasting CPU cycles in a tight loop which is especially painful in a
guest when the CPU running the tasklet is scheduled out.
tasklet_unlock_wait() is invoked from tasklet_kill() which is used in
teardown paths and not performance critical at all. Replace the spin wait
with wait_var_event().
There are no users of tasklet_unlock_wait() which are invoked from atomic
contexts. The usage in tasklet_disable() has been replaced temporarily with
the spin waiting variant until the atomic users are fixed up and will be
converted to the sleep wait variant later.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.783936921@linutronix.de
Replacing the spin wait loops in tasklet_unlock_wait() with
wait_var_event() is not possible as a handful of tasklet_disable()
invocations are happening in atomic context. All other invocations are in
teardown paths which can sleep.
Provide tasklet_disable_in_atomic() and tasklet_unlock_spin_wait() to
convert the few atomic use cases over, which allows to change
tasklet_disable() and tasklet_unlock_wait() in a later step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.563164193@linutronix.de
Many drivers don't want interrupts enabled automatically via request_irq().
So they are handling this issue by either way of the below two:
(1)
irq_set_status_flags(irq, IRQ_NOAUTOEN);
request_irq(dev, irq...);
(2)
request_irq(dev, irq...);
disable_irq(irq);
The code in the second way is silly and unsafe. In the small time gap
between request_irq() and disable_irq(), interrupts can still come.
The code in the first way is safe though it's subobtimal.
Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to
request_irq() and request_nmi(). It prevents the automatic enabling of the
requested interrupt/nmi in the same safe way as #1 above. With that the
various usage sites of #1 and #2 above can be simplified and corrected.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: dmitry.torokhov@gmail.com
Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com
To avoid include recursion hell move the do_softirq_own_stack() related
content into a generic asm header and include it from all places in arch/
which need the prototype.
This allows architectures to provide an inline implementation of
do_softirq_own_stack() without introducing a lot of #ifdeffery all over the
place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.289960691@linutronix.de
To prepare for inlining do_softirq_own_stack() replace
__ARCH_HAS_DO_SOFTIRQ with a Kconfig switch and select it in the affected
architectures.
This allows in the next step to move the function prototype and the inline
stub into a seperate asm-generic header file which is required to avoid
include recursion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.181713427@linutronix.de
This function uses irq_to_desc() and is going to be used by modules to
replace the open coded irq_to_desc() (ab)usage. The final goal is to remove
the export of irq_to_desc() so driver cannot fiddle with it anymore.
Move it into the core code and fixup the usage sites to include the proper
header.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201210194042.548936472@linutronix.de
Add a function to allow the affinity of an interrupt be switched to
managed, such that interrupts allocated for platform devices may be
managed.
This new interface has certain limitations, and attempts to use it in the
following circumstances will fail:
- For when the kernel is configured for generic IRQ reservation mode (in
config GENERIC_IRQ_RESERVATION_MODE). The reason being that it could
conflict with managed vs. non-managed interrupt accounting.
- The interrupt is already started, which should not be the case during
init
- The interrupt is already configured as managed, which means double init
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1606905417-183214-2-git-send-email-john.garry@huawei.com
Nowadays, modern kernel subsystems that use callbacks pass the data
structure associated with a given callback as argument to the callback.
The tasklet subsystem remains one which passes an arbitrary unsigned
long to the callback function. This has several problems:
- This keeps an extra field for storing the argument in each tasklet
data structure, it bloats the tasklet_struct structure with a redundant
.data field
- No type checking can be performed on this argument. Instead of
using container_of() like other callback subsystems, it forces callbacks
to do explicit type cast of the unsigned long argument into the required
object type.
- Buffer overflows can overwrite the .func and the .data field, so
an attacker can easily overwrite the function and its first argument
to whatever it wants.
Add a new tasklet initialization API, via DECLARE_TASKLET() and
tasklet_setup(), which will replace the existing ones.
This work is greatly inspired by the timer_struct conversion series,
see commit e99e88a9d2 ("treewide: setup_timer() -> timer_setup()")
To avoid problems with both -Wcast-function-type (which is enabled in
the kernel via -Wextra is several subsystems), and with mismatched
function prototypes when build with Control Flow Integrity enabled,
this adds the "use_callback" member to let the tasklet caller choose
which union member to call through. Once all old API uses are removed,
this and the .data member will be removed as well. (On 64-bit this does
not grow the struct size as the new member fills the hole after atomic_t,
which is also "int" sized.)
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Co-developed-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
This converts all the existing DECLARE_TASKLET() (and ...DISABLED)
macros with DECLARE_TASKLET_OLD() in preparation for refactoring the
tasklet callback type. All existing DECLARE_TASKLET() users had a "0"
data argument, it has been removed here as well.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
The entry rework moved interrupt entry code from the irqentry to the
noinstr section which made the irqentry section empty.
This breaks boundary checks which rely on the __irqentry_text_start/end
markers to find out whether a function in a stack trace is
interrupt/exception entry code. This affects the function graph tracer and
filter_irq_stacks().
As the IDT entry points are all sequentialy emitted this is rather simple
to unbreak by injecting __irqentry_text_start/end as global labels.
To make this work correctly:
- Remove the IRQENTRY_TEXT section from the x86 linker script
- Define __irqentry so it breaks the build if it's used
- Adjust the entry mirroring in PTI
- Remove the redundant kprobes and unwinder bound checks
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Error injection mechanisms need a half ways safe way to inject interrupts as
invoking generic_handle_irq() or the actual device interrupt handler
directly from e.g. a debugfs write is not guaranteed to be safe.
On x86 generic_handle_irq() is unsafe due to the hardware trainwreck which
is the base of x86 interrupt delivery and affinity management.
Move the irq debugfs injection code into a separate function which can be
used by error injection code as well.
The implementation prevents at least that state is corrupted, but it cannot
close a very tiny race window on x86 which might result in a stale and not
serviced device interrupt under very unlikely circumstances.
This is explicitly for debugging and testing and not for production use or
abuse in random driver code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lkml.kernel.org/r/20200306130623.990928309@linutronix.de
Pull timer updates from Ingo Molnar:
"The main changes in the timer code in this cycle were:
- Clockevent updates:
- timer-of framework cleanups. (Geert Uytterhoeven)
- Use timer-of for the renesas-ostm and the device name to prevent
name collision in case of multiple timers. (Geert Uytterhoeven)
- Check if there is an error after calling of_clk_get in asm9260
(Chuhong Yuan)
- ABI fix: Zero out high order bits of nanoseconds on compat
syscalls. This got broken a year ago, with apparently no side
effects so far.
Since the kernel would use random data otherwise I don't think we'd
have other options but to fix the bug, even if there was a side
effect to applications (Dmitry Safonov)
- Optimize ns_to_timespec64() on 32-bit systems: move away from
div_s64_rem() which can be slow, to div_u64_rem() which is faster
(Arnd Bergmann)
- Annotate KCSAN-reported false positive data races in
hrtimer_is_queued() users by moving timer->state handling over to
the READ_ONCE()/WRITE_ONCE() APIs. This documents these accesses
(Eric Dumazet)
- Misc cleanups and small fixes"
[ I undid the "ABI fix" and updated the comments instead. The reason
there were apparently no side effects is that the fix was a no-op.
The updated comment is to say _why_ it was a no-op. - Linus ]
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Zero the upper 32-bits in __kernel_timespec on 32-bit
time: Rename tsk->real_start_time to ->start_boottime
hrtimer: Remove the comment about not used HRTIMER_SOFTIRQ
time: Fix spelling mistake in comment
time: Optimize ns_to_timespec64()
hrtimer: Annotate lockless access to timer->state
clocksource/drivers/asm9260: Add a check for of_clk_get
clocksource/drivers/renesas-ostm: Use unique device name instead of ostm
clocksource/drivers/renesas-ostm: Convert to timer_of
clocksource/drivers/timer-of: Use unique device name instead of timer
clocksource/drivers/timer-of: Convert last full_name to %pOF
While checking the results of the :c:func: removal, I noticed that there
was no documentation for request_irq(), and request_threaded_irq() was not
mentioned at all. Add a kerneldoc comment for request_irq() and add
request_threaded_irq() to the list of functions.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>