A common pattern seen when wake_qs are used to defer a wakeup
until after a lock is released is something like:
preempt_disable();
raw_spin_unlock(lock);
wake_up_q(wake_q);
preempt_enable();
So create some raw_spin_unlock*_wake() helper functions to clean
this up.
Applies on top of the fix I submitted here:
https://lore.kernel.org/lkml/20241212222138.2400498-1-jstultz@google.com/
NOTE: I recognise the unlock()/unlock_irq()/unlock_irqrestore()
variants creates its own duplication, which we could use a macro
to generate the similar functions, but I often dislike how those
generation macros making finding the actual implementation
harder, so I left the three functions as is. If folks would
prefer otherwise, let me know and I'll switch it.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20241217040803.243420-1-jstultz@google.com
With the proxy-execution series, we traverse the task->mutex->task
blocked_on/owner chain in the scheduler core. We do this while holding
the rq::lock to keep the structures in place while taking and
releasing the alternating lock types.
Since the mutex::wait_lock is one of the locks we will take in this
way under the rq::lock in the scheduler core, we need to make sure
that its usage elsewhere is irq safe.
[rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h]
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Metin Kaya <metin.kaya@arm.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Metin Kaya <metin.kaya@arm.com>
Link: https://lore.kernel.org/r/20241009235352.1614323-3-jstultz@google.com
I have seen several cases of attempts to use mutex_unlock() to release an
object such that the object can then be freed by another task.
This is not safe because mutex_unlock(), in the
MUTEX_FLAG_WAITERS && !MUTEX_FLAG_HANDOFF case, accesses the mutex
structure after having marked it as unlocked; so mutex_unlock() requires
its caller to ensure that the mutex stays alive until mutex_unlock()
returns.
If MUTEX_FLAG_WAITERS is set and there are real waiters, those waiters
have to keep the mutex alive, but we could have a spurious
MUTEX_FLAG_WAITERS left if an interruptible/killable waiter bailed
between the points where __mutex_unlock_slowpath() did the cmpxchg
reading the flags and where it acquired the wait_lock.
( With spinlocks, that kind of code pattern is allowed and, from what I
remember, used in several places in the kernel. )
Document this, such a semantic difference between mutexes and spinlocks
is fairly unintuitive.
[ mingo: Made the changelog a bit more assertive, refined the comments. ]
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20231130204817.2031407-1-jannh@google.com
The bcachefs implementation of six locks is intended to land in
generic locking code in the long term, but has been pulled into the
bcachefs subsystem for internal use for the time being. This code
lift breaks the bcachefs module build as six locks depend a couple
of the generic locking tracepoints. Export these tracepoint symbols
for bcachefs.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Have the trace_contention_*() tracepoints consistently include
adaptive spinning. In order to differentiate between the spinning and
non-spinning states add LCB_F_MUTEX and combine with LCB_F_SPIN.
The consequence is that a mutex contention can now triggler multiple
_begin() tracepoints before triggering an _end().
Additionally, this fixes one path where mutex would trigger _end()
without ever seeing a _begin().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
This adds two new lock contention tracepoints like below:
* lock:contention_begin
* lock:contention_end
The lock:contention_begin takes a flags argument to classify locks. I
found it useful to identify what kind of locks it's tracing like if
it's spinning or sleeping, reader-writer lock, real-time, and per-cpu.
Move tracepoint definitions into mutex.c so that we can use them
without lockdep.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Link: https://lkml.kernel.org/r/20220322185709.141236-2-namhyung@kernel.org
i915 will soon gain an eviction path that trylock a whole lot of locks
for eviction, getting dmesg failures like below:
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by i915_selftest/5776:
#0: ffff888101a79240 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x88/0x160
#1: ffffc900009778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
#2: ffff88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
#3: ffff88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
#4: ffff88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
#5: ffff88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
...
#46: ffff88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
#47: ffff88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
INFO: lockdep is turned off.
Fixing eviction to nest into ww_class_acquire is a high priority, but
it requires a rework of the entire driver, which can only be done one
step at a time.
As an intermediate solution, add an acquire context to
ww_mutex_trylock, which allows us to do proper nesting annotations on
the trylocks, making the above lockdep splat disappear.
This is also useful in regulator_lock_nested, which may avoid dropping
regulator_nesting_mutex in the uncontended path, so use it there.
TTM may be another user for this, where we could lock a buffer in a
fastpath with list locks held, without dropping all locks we hold.
[peterz: rework actual ww_mutex_trylock() implementations]
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YUBGPdDDjKlxAuXJ@hirez.programming.kicks-ass.net
The consolidation of the debug code for mutex waiter intialization sets
waiter::ww_ctx to a poison value unconditionally. For regular mutexes this
is intended to catch the case where waiter_ww_ctx is dereferenced
accidentally.
For ww_mutex the poison value has to be overwritten either with a context
pointer or NULL for ww_mutexes without context.
The rework broke this as it made the store conditional on the context
pointer instead of the argument which signals whether ww_mutex code should
be compiled in or optiized out. As a result waiter::ww_ctx ends up with the
poison pointer for contextless ww_mutexes which causes a later dereference of
the poison pointer because it is != NULL.
Use the build argument instead so for ww_mutex the poison value is always
overwritten.
Fixes: c0afb0ffc0 ("locking/ww_mutex: Gather mutex_waiter initialization")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210819193030.zpwrpvvrmy7xxxiy@linutronix.de
Yanfei reported that it is possible to loose HANDOFF when we race with
mutex_unlock() and end up setting HANDOFF on an unlocked mutex. At
that point anybody can steal it, losing HANDOFF in the process.
If this happens often enough, we can in fact starve the top waiter.
Solve this by folding the 'set HANDOFF' operation into the trylock
operation, such that either we acquire the lock, or it gets HANDOFF
set. This avoids having HANDOFF set on an unlocked mutex.
Reported-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com>
Link: https://lore.kernel.org/r/20210630154114.958507900@infradead.org
Yanfei reported that setting HANDOFF should not depend on recomputing
@first, only on @first state. Which would then give:
if (ww_ctx || !first)
first = __mutex_waiter_is_first(lock, &waiter);
if (first)
__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
But because 'ww_ctx || !first' is basically 'always' and the test for
first is relatively cheap, omit that first branch entirely.
Reported-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Yanfei Xu <yanfei.xu@windriver.com>
Link: https://lore.kernel.org/r/20210630154114.896786297@infradead.org