KCSAN complaints about the sbitmap hint update:
==================================================================
BUG: KCSAN: data-race in sbitmap_queue_clear / sbitmap_queue_clear
write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 1:
sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
blk_mq_put_tag+0x82/0x90
__blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
__blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
blk_complete_reqs block/blk-mq.c:584 [inline]
blk_done_softirq+0x69/0x90 block/blk-mq.c:589
__do_softirq+0x12c/0x26e kernel/softirq.c:558
run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
kthread+0x262/0x280 kernel/kthread.c:319
ret_from_fork+0x1f/0x30
write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 0:
sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
blk_mq_put_tag+0x82/0x90
__blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
__blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
blk_complete_reqs block/blk-mq.c:584 [inline]
blk_done_softirq+0x69/0x90 block/blk-mq.c:589
__do_softirq+0x12c/0x26e kernel/softirq.c:558
run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
kthread+0x262/0x280 kernel/kthread.c:319
ret_from_fork+0x1f/0x30
value changed: 0x00000035 -> 0x00000044
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.15.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
==================================================================
which is a data race, but not an important one. This is just updating the
percpu alloc hint, and the reader of that hint doesn't ever require it to
be valid.
Just annotate it with data_race() to silence this one.
Reported-by: syzbot+4f8bfd804b4a1f95b8f6@syzkaller.appspotmail.com
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sbitmap currently only supports clearing tags one-by-one, add a helper
that allows the caller to pass in an array of tags to clear.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The block layer tag allocation batching still calls into sbitmap to get
each tag, but we can improve on that. Add __sbitmap_queue_get_batch(),
which returns a mask of tags all at once, along with an offset for
those tags.
An example return would be 0xff, where bits 0..7 are set, with
tag_offset == 128. The valid tags in this case would be 128..135.
A batch is specific to an individual sbitmap_map, hence it cannot be
larger than that. The requested number of tags is automatically reduced
to the max that can be satisfied with a single map.
On failure, 0 is returned. Caller should fall back to single tag
allocation at that point/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
__sbitmap_get_word() doesn't warp if it's starting from the beginning
(i.e. initial hint is 0). Instead of stashing the original hint just set
@wrap accordingly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sbitmap_deferred_clear() does CAS loop to propagate cleared bits,
replace it with equivalent atomic bitwise and. That's slightly faster
and makes wait-free instead of lock-free as before.
The atomic can be relaxed (i.e. barrier-less) because following
sbitmap_get*() deal with synchronisation, see comments in
sbitmap_queue_clear().
It's ok to cast to atomic_long_t, that's what bitops/lock.h does.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
map->swap_lock protects map->cleared from concurrent modification,
however sbitmap_deferred_clear() is already atomically drains it, so
it's guaranteed to not loose bits on concurrent
sbitmap_deferred_clear().
A one threaded tag heavy test on top of nullbk showed ~1.5% t-put
increase, and 3% -> 1% cycle reduction of sbitmap_get() according to perf.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Because of spinlocks and atomics sbitmap_deferred_clear() have to reload
&sb->map[index] on each access even though the map address won't change.
Pass in sbitmap_word instead of {sb, index}, so it's cached in a
variable. It also improves code generation of
sbitmap_find_bit_in_index().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sbitmap works by maintaining separate bitmaps of set and cleared bits.
The set bits are cleared in a batch, to save the burden of continuously
locking the "word" map to unset.
sbitmap_bitmap_show() only shows the set bits (in "word"), which is not
too much use, so mask out the cleared bits.
Fixes: ea86ea2cdc ("sbitmap: ammortize cost of clearing bits")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Under heavy loads where the kyber I/O scheduler hits the token limits for
its scheduling domains, kyber can become stuck. When active requests
complete, kyber may not be woken up leaving the I/O requests in kyber
stuck.
This stuck state is due to a race condition with kyber and the sbitmap
functions it uses to run a callback when enough requests have completed.
The running of a sbt_wait callback can race with the attempt to insert the
sbt_wait. Since sbitmap_del_wait_queue removes the sbt_wait from the list
first then sets the sbq field to NULL, kyber can see the item as not on a
list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
results in the sbt_wait being inserted onto the wait list but ws_active
doesn't get incremented. So the sbitmap queue does not know there is a
waiter on a wait list.
Since sbitmap doesn't think there is a waiter, kyber may never be
informed that there are domain tokens available and the I/O never advances.
With the sbt_wait on a wait list, kyber believes it has an active waiter
so cannot insert a new waiter when reaching the domain's full state.
This race can be fixed by only adding the sbt_wait to the queue if the
sbq field is NULL. If sbq is not NULL, there is already an action active
which will trigger the re-running of kyber. Let it run and add the
sbt_wait to the wait list if still needing to wait.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reported-by: John Pittman <jpittman@redhat.com>
Tested-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since the only caller of this function has been deleted, delete this one
also.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
cmpxchg() with an immediate value could be replaced with less expensive
xchg(). The same true if new value don't _depend_ on the old one.
In the second block, atomic_cmpxchg() return value isn't checked, so
after atomic_cmpxchg() -> atomic_xchg() conversion it could be replaced
with atomic_set(). Comparison with atomic_read() in the second chunk was
left as an optimisation (if that was the initial intention).
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license v2 as published
by the free software foundation this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not see https www gnu org licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 2 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000435.923873561@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because we may call blk_mq_get_driver_tag() directly from
blk_mq_dispatch_rq_list() without holding any lock, then HARDIRQ may
come and the above DEADLOCK is triggered.
Commit ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq") tries to
fix this issue by using 'spin_lock_bh', which isn't enough because we
complete request from hardirq context direclty in case of multiqueue.
Cc: Clark Williams <williams@redhat.com>
Fixes: ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The swap_lock used by sbitmap has a chain with locks taken from softirq,
but the swap_lock is not protected from being preempted by softirqs.
A chain exists of:
sbq->ws[i].wait -> dispatch_wait_lock -> swap_lock
Where the sbq->ws[i].wait lock can be taken from softirq context, which
means all locks below it in the chain must also be protected from
softirqs.
Reported-by: Clark Williams <williams@redhat.com>
Fixes: 58ab5e32e6 ("sbitmap: silence bogus lockdep IRQ warning")
Fixes: ea86ea2cdc ("sbitmap: amortize cost of clearing bits")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit 5d2ee7122c, users of sbitmap that need wait queue
handling must use the provided helpers. But we only added
prepare_to_wait()/finish_wait() style helpers, add the equivalent
add_wait_queue/list_del wrappers as we..
This is needed to ensure kyber plays by the sbitmap waitqueue
rules.
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We're missing a deferred clear off the shallow get, which can cause
a hang. Additionally, when we resize the sbitmap, we should also
flush deferred clears for good measure.
Ensure we have full coverage on batch clears, even for paths where
we would not be doing deferred clear. This makes it less error
prone for future additions.
Reported-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>