This results in no change in structure size on 64-bit machines as it
fits in the padding between the gfp_t and the void *. 32-bit machines
will grow the structure from 8 to 12 bytes. Almost all radix trees are
protected with (at least) a spinlock, so as they are converted from
radix trees to xarrays, the data structures will shrink again.
Initialising the spinlock requires a name for the benefit of lockdep, so
RADIX_TREE_INIT() now needs to know the name of the radix tree it's
initialising, and so do IDR_INIT() and IDA_INIT().
Also add the xa_lock() and xa_unlock() family of wrappers to make it
easier to use the lock. If we could rely on -fplan9-extensions in the
compiler, we could avoid all of this syntactic sugar, but that wasn't
added until gcc 4.6.
Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "XArray", v9. (First part thereof).
This patchset is, I believe, appropriate for merging for 4.17. It
contains the XArray implementation, to eventually replace the radix
tree, and converts the page cache to use it.
This conversion keeps the radix tree and XArray data structures in sync
at all times. That allows us to convert the page cache one function at
a time and should allow for easier bisection. Other than renaming some
elements of the structures, the data structures are fundamentally
unchanged; a radix tree walk and an XArray walk will touch the same
number of cachelines. I have changes planned to the XArray data
structure, but those will happen in future patches.
Improvements the XArray has over the radix tree:
- The radix tree provides operations like other trees do; 'insert' and
'delete'. But what most users really want is an automatically
resizing array, and so it makes more sense to give users an API that
is like an array -- 'load' and 'store'. We still have an 'insert'
operation for users that really want that semantic.
- The XArray considers locking as part of its API. This simplifies a
lot of users who formerly had to manage their own locking just for
the radix tree. It also improves code generation as we can now tell
RCU that we're holding a lock and it doesn't need to generate as much
fencing code. The other advantage is that tree nodes can be moved
(not yet implemented).
- GFP flags are now parameters to calls which may need to allocate
memory. The radix tree forced users to decide what the allocation
flags would be at creation time. It's much clearer to specify them at
allocation time.
- Memory is not preloaded; we don't tie up dozens of pages on the off
chance that the slab allocator fails. Instead, we drop the lock,
allocate a new node and retry the operation. We have to convert all
the radix tree, IDA and IDR preload users before we can realise this
benefit, but I have not yet found a user which cannot be converted.
- The XArray provides a cmpxchg operation. The radix tree forces users
to roll their own (and at least four have).
- Iterators take a 'max' parameter. That simplifies many users and will
reduce the amount of iteration done.
- Iteration can proceed backwards. We only have one user for this, but
since it's called as part of the pagefault readahead algorithm, that
seemed worth mentioning.
- RCU-protected pointers are not exposed as part of the API. There are
some fun bugs where the page cache forgets to use rcu_dereference()
in the current codebase.
- Value entries gain an extra bit compared to radix tree exceptional
entries. That gives us the extra bit we need to put huge page swap
entries in the page cache.
- Some iterators now take a 'filter' argument instead of having
separate iterators for tagged/untagged iterations.
The page cache is improved by this:
- Shorter, easier to read code
- More efficient iterations
- Reduction in size of struct address_space
- Fewer walks from the top of the data structure; the XArray API
encourages staying at the leaf node and conducting operations there.
This patch (of 8):
None of these bits may be used for slab allocations, so we can use them
as radix tree flags as long as we mask them off before passing them to
the slab allocator. Move the IDR flag from the high bits to the
GFP_ZONEMASK bits.
Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the idr kernel-doc to its own idr.rst file and add a few
paragraphs about how to use it. Also add some more kernel-doc.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
About 20% of the IDR users in the kernel want the allocated IDs to start
at 1. The implementation currently searches all the way down the left
hand side of the tree, finds no free ID other than ID 0, walks all the
way back up, and then all the way down again. This patch 'rebases' the
ID so we fill the entire radix tree, rather than leave a gap at 0.
Chris Wilson says: "I did the quick hack of allocating index 0 of the
idr and that eradicated idr_get_free() from being at the top of the
profiles for the many-object stress tests. This improvement will be
much appreciated."
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Most places in the kernel that we need to distinguish functions by the
type of their arguments, we use '_ul' as a suffix for the unsigned long
variant, not '_ext'. Also add kernel-doc.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
It has no more users, so remove it. Move idr_alloc() back into idr.c,
move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
wrappers around idr_get_free_cmn() and rename it to idr_get_free().
While there is now no interface to allocate IDs larger than a u32,
the IDR internals remain ready to handle a larger ID should a need arise.
These changes make it possible to provide the guarantee that, if the
nextid pointer points into the object, the object's ID will be initialised
before a concurrent lookup can find the object.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
All current users of idr_alloc_ext() actually want to allocate a u32
and idr_alloc_u32() fits their needs better.
Like idr_get_next(), it uses a 'nextid' argument which serves as both
a pointer to the start ID and the assigned ID (instead of a separate
minimum and pointer-to-assigned-ID argument). It uses a 'max' argument
rather than 'end' because the semantics that idr_alloc has for 'end'
don't work well for unsigned types.
Since idr_alloc_u32() returns an errno instead of the allocated ID, mark
it as __must_check to help callers use it correctly. Include copious
kernel-doc. Chris Mi <chrism@mellanox.com> has promised to contribute
test-cases for idr_alloc_u32.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Changing idr_replace's 'id' argument to 'unsigned long' works for all
callers. Callers which passed a negative ID now get -ENOENT instead of
-EINVAL. No callers relied on this error value.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
The following new APIs are added:
int idr_alloc_ext(struct idr *idr, void *ptr, unsigned long *index,
unsigned long start, unsigned long end, gfp_t gfp);
void *idr_remove_ext(struct idr *idr, unsigned long id);
void *idr_find_ext(const struct idr *idr, unsigned long id);
void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id);
void *idr_get_next_ext(struct idr *idr, unsigned long *nextid);
Signed-off-by: Chris Mi <chrism@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a relatively common idiom (8 instances) to first look up an IDR
entry, and then remove it from the tree if it is found, possibly doing
further operations upon the entry afterwards. If we change idr_remove()
to return the removed object, all of these users can save themselves a
walk of the IDR tree.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
When we preload the IDA, we allocate an IDA bitmap. Instead of storing
that preallocated bitmap in the IDA, we store it in a percpu variable.
Generally there are more IDAs in the system than CPUs, so this cuts down
on the number of preallocated bitmaps that are unused, and about half
of the IDA users did not call ida_destroy() so they were leaking IDA
bitmaps.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
The IDR is very similar to the radix tree. It has some functionality that
the radix tree did not have (alloc next free, cyclic allocation, a
callback-based for_each, destroy tree), which is readily implementable on
top of the radix tree. A few small changes were needed in order to use a
tag to represent nodes with free space below them. More extensive
changes were needed to support storing NULL as a valid entry in an IDR.
Plain radix trees still interpret NULL as a not-present entry.
The IDA is reimplemented as a client of the newly enhanced radix tree. As
in the current implementation, it uses a bitmap at the last level of the
tree.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The status command originates the drbd9 code base. While for now we
keep the status information in /proc/drbd available, this commit
allows the user base to gracefully migrate their monitoring
infrastructure to the new status reporting interface.
In drbd9 no status information is exposed through /proc/drbd.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
idr_layer->layer is always accessed in read path, move it in the front.
idr_layer->bitmap is moved on the bottom. And rcu_head shares with
bitmap due to they do not be accessed at the same time.
idr->id_free/id_free_cnt/lock are free list fields, and moved to the
bottom. They will be removed in near future.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block driver updates from Jens Axboe:
"It might look big in volume, but when categorized, not a lot of
drivers are touched. The pull request contains:
- mtip32xx fixes from Micron.
- A slew of drbd updates, this time in a nicer series.
- bcache, a flash/ssd caching framework from Kent.
- Fixes for cciss"
* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
bcache: Use bd_link_disk_holder()
bcache: Allocator cleanup/fixes
cciss: bug fix to prevent cciss from loading in kdump crash kernel
cciss: add cciss_allow_hpsa module parameter
drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
mtip32xx: Workaround for unaligned writes
bcache: Make sure blocksize isn't smaller than device blocksize
bcache: Fix merge_bvec_fn usage for when it modifies the bvm
bcache: Correctly check against BIO_MAX_PAGES
bcache: Hack around stuff that clones up to bi_max_vecs
bcache: Set ra_pages based on backing device's ra_pages
bcache: Take data offset from the bdev superblock.
mtip32xx: mtip32xx: Disable TRIM support
mtip32xx: fix a smatch warning
bcache: Disable broken btree fuzz tester
bcache: Fix a format string overflow
bcache: Fix a minor memory leak on device teardown
bcache: Documentation updates
bcache: Use WARN_ONCE() instead of __WARN()
bcache: Add missing #include <linux/prefetch.h>
...