mirror of
https://github.com/Dasharo/linux.git
synced 2026-03-06 15:25:10 -08:00
Merge tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs updates from Kent Overstreet:

- More safety fixes, primarily found by syzbot

- Run the upgrade/downgrade paths in nochanges mode. Nochanges mode is primarily for testing fsck/recovery in dry run mode, so it shouldn't change anything besides disabling writes and holding dirty metadata in memory.

  The idea here was to reduce the amount of activity if we can't write anything out, so that bringing up a filesystem in "super ro" mode would be more likely to work for data recovery - but norecovery is the correct option for this.

- btree_trans->locked; we now track whether a btree_trans has any btree nodes locked, and this is used for improved assertions related to trans_unlock() and trans_relock(). We'll also be using it for improving how we work with lockdep in the future: we don't want lockdep to be tracking individual btree node locks because we take too many for lockdep to track, and it's not necessary since we have a cycle detector.

- Trigger improvements that are prep work for online fsck

- BTREE_TRIGGER_check_repair; this regularizes how we do some repair work for extents that goes with running triggers in fsck, and fixes some subtle issues with transaction restarts there.

- bch2_snapshot_equiv() has now been ripped out of fsck.c; snapshot equivalence classes are for when snapshot deletion leaves behind redundant snapshot nodes, but snapshot deletion now cleans this up right away, so the abstraction doesn't need to leak.

- Improvements to how we resume writing to the journal in recovery. The code for picking the new place to write when reading the journal is greatly simplified, and we also store the position in the superblock for when we don't read the journal; this means that we preserve more of the journal for list_journal debugging.

- Improvements to sysfs btree_cache and btree_node_cache, for debugging memory reclaim.

- We now detect when we've blocked for 10 seconds on the allocator in the write path and dump some useful info.

- Safety fixes for device references: this is a big series that changes almost all device lookups to properly check if the device exists and take a reference to it.

  Previously we assumed that if a bkey exists that references a device then the device must exist, and this was enforced in .invalid methods, but this was incorrect because it meant device removal relied on accounting being correct to not leave keys pointing to invalid devices, and that's not something we can assume.

  Getting the "pointer to invalid device" checks out of our .invalid() methods fixes some long standing device removal bugs; the only outstanding bug with device removal now is a race between the discard path and deleting alloc info, which should be easily fixed.

- The allocator now prefers not to expand the new member_info.btree_allocated bitmap, meaning if repair ever requires scanning for btree nodes (because of a corrupt interior node) we won't have to scan the whole device(s).
- New coding style document, which among other things talks about the correct usage of assertions

* tag 'bcachefs-2024-05-19' of https://evilpiepirate.org/git/bcachefs: (155 commits)
  bcachefs: add no_invalid_checks flag
  bcachefs: add counters for failed shrinker reclaim
  bcachefs: Fix sb_field_downgrade validation
  bcachefs: Plumb bch_validate_flags to sb_field_ops.validate()
  bcachefs: s/bkey_invalid_flags/bch_validate_flags
  bcachefs: fsync() should not return -EROFS
  bcachefs: Invalid devices are now checked for by fsck, not .invalid methods
  bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs()
  bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio()
  bcachefs: bch2_dev_get_ioref() checks for device not present
  bcachefs: bch2_dev_get_ioref2(); io_read.c
  bcachefs: bch2_dev_get_ioref2(); debug.c
  bcachefs: bch2_dev_get_ioref2(); journal_io.c
  bcachefs: bch2_dev_get_ioref2(); io_write.c
  bcachefs: bch2_dev_get_ioref2(); btree_io.c
  bcachefs: bch2_dev_get_ioref2(); backpointers.c
  bcachefs: bch2_dev_get_ioref2(); alloc_background.c
  bcachefs: for_each_bset() declares loop iter
  bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h
  bcachefs: Improve sysfs internal/btree_cache
  ...
This commit is contained in:
186
Documentation/filesystems/bcachefs/CodingStyle.rst
Normal file
@@ -0,0 +1,186 @@
.. SPDX-License-Identifier: GPL-2.0

bcachefs coding style
=====================

Good development is like gardening, and codebases are our gardens. Tend to them
every day; look for little things that are out of place or in need of tidying.
A little weeding here and there goes a long way; don't wait until things have
spiraled out of control.

Things don't always have to be perfect - nitpicking often does more harm than
good. But appreciate beauty when you see it - and let people know.

The code that you are afraid to touch is the code most in need of refactoring.

A little organizing here and there goes a long way.

Put real thought into how you organize things.

Good code is readable code, where the structure is simple and leaves nowhere
for bugs to hide.

Assertions are one of our most important tools for writing reliable code. If in
the course of writing a patchset you encounter a condition that shouldn't
happen (and will have unpredictable or undefined behaviour if it does), or
you're not sure if it can happen and not sure how to handle it yet - make it a
BUG_ON(). Don't leave undefined or unspecified behavior lurking in the codebase.

By the time you finish the patchset, you should understand better which
assertions need to be handled and turned into checks with error paths, and
which should be logically impossible. Leave the BUG_ON()s in for the ones which
are logically impossible. (Or, make them debug mode assertions if they're
expensive - but don't turn everything into a debug mode assertion, so that
we're not stuck debugging undefined behaviour should it turn out that you were
wrong).

Assertions are documentation that can't go out of date. Good assertions are
wonderful.

Good assertions drastically and dramatically reduce the amount of testing
required to shake out bugs.

Good assertions are based on state, not logic. To write good assertions, you
have to think about what the invariants on your state are.

Good invariants and assertions will hold everywhere in your codebase. This
means that you can run them in only a few places in the checked in version, but
should you need to debug something that caused the assertion to fail, you can
quickly shotgun them everywhere to find the codepath that broke the invariant.
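
For example, a minimal sketch of a state-based assertion - the structure and
field names here are hypothetical, chosen purely for illustration::

  struct bucket_counts {
          unsigned        dirty_sectors;
          unsigned        cached_sectors;
          unsigned        bucket_size;
  };

  static inline void bucket_counts_assert_valid(const struct bucket_counts *b)
  {
          /* The invariant is on the state itself, so it can be checked anywhere: */
          BUG_ON(b->dirty_sectors + b->cached_sectors > b->bucket_size);
  }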

A good assertion checks something that the compiler could check for us, and
elide - if we were working in a language with embedded correctness proofs that
the compiler could check. This is something that exists today, but it'll likely
still be a few decades before it comes to systems programming languages. But we
can still incorporate that kind of thinking into our code and document the
invariants with runtime checks - much like the way people working in
dynamically typed languages may add type annotations, gradually making their
code statically typed.

Looking for ways to make your assertions simpler - and higher level - will
often nudge you towards making the entire system simpler and more robust.

Good code is code where you can poke around and see what it's doing -
introspection. We can't debug anything if we can't see what's going on.

Whenever we're debugging, and the solution isn't immediately obvious, if the
issue is that we don't know where the issue is because we can't see what's
going on - fix that first.

We have the tools to make anything visible at runtime, efficiently - RCU and
percpu data structures among them. Don't let things stay hidden.

The most important tool for introspection is the humble pretty printer - in
bcachefs, this means `*_to_text()` functions, which output to printbufs.

Pretty printers are wonderful, because they compose and you can use them
everywhere. Having functions to print whatever object you're working with will
make your error messages much easier to write (therefore they will actually
exist) and much more informative. And they can be used from sysfs/debugfs, as
well as tracepoints.
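
The general shape is a small function that appends a human-readable rendering
of an object to a printbuf. A rough sketch - the object type and its fields are
made up for illustration, only the prt_printf() helper is real::

  struct foo {
          u64     seq;
          u32     nr_entries;
  };

  static void foo_to_text(struct printbuf *out, const struct foo *f)
  {
          prt_printf(out, "seq %llu nr_entries %u", f->seq, f->nr_entries);
  }

The same foo_to_text() can then back an error message, a sysfs file, or a
tracepoint without duplicating the formatting.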

Runtime info and debugging tools should come with clear descriptions and
labels, and good structure - we don't want files with a list of bare integers,
like in procfs. Part of the job of the debugging tools is to educate users and
new developers as to how the system works.

Error messages should, whenever possible, tell you everything you need to debug
the issue. It's worth putting effort into them.

Tracepoints shouldn't be the first thing you reach for. They're an important
tool, but always look for more immediate ways to make things visible. When we
have to rely on tracing, we have to know which tracepoints we're looking for,
and then we have to run the troublesome workload, and then we have to sift
through logs. This is a lot of steps to go through when a user is hitting
something, and if it's intermittent it may not even be possible.

The humble counter is an incredibly useful tool. They're cheap and simple to
use, and many complicated internal operations with lots of things that can
behave weirdly (anything involving memory reclaim, for example) become
shockingly easy to debug once you have counters on every distinct codepath.
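
As a sketch of the idea - the counter names and the foo_is_busy()/foo_free()
helpers below are hypothetical - each distinct outcome of an operation gets its
own counter, so a single glance shows which path ran and how often::

  atomic64_t shrink_attempted;
  atomic64_t shrink_skipped_busy;
  atomic64_t shrink_freed;

  static void shrink_one(struct foo *f)
  {
          atomic64_inc(&shrink_attempted);

          if (foo_is_busy(f)) {
                  atomic64_inc(&shrink_skipped_busy);
                  return;
          }

          foo_free(f);
          atomic64_inc(&shrink_freed);
  }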

Persistent counters are even better.

When debugging, try to get the most out of every bug you come across; don't
rush to fix the initial issue. Look for things that will make related bugs
easier the next time around - introspection, new assertions, better error
messages, new debug tools, and do those first. Look for ways to make the system
better behaved; often one bug will uncover several other bugs through
downstream effects.

Fix all that first, and then the original bug last - even if that means keeping
a user waiting. They'll thank you in the long run, and when they understand
what you're doing you'll be amazed at how patient they're happy to be. Users
like to help - otherwise they wouldn't be reporting the bug in the first place.

Talk to your users. Don't isolate yourself.

Users notice all sorts of interesting things, and by just talking to them and
interacting with them you can benefit from their experience.

Spend time doing support and helpdesk stuff. Don't just write code - code isn't
finished until it's being used trouble free.

This will also motivate you to make your debugging tools as good as possible,
and perhaps even your documentation, too. Like anything else in life, the more
time you spend at it the better you'll get, and you the developer are the
person most able to improve the tools to make debugging quick and easy.

Be wary of how you take on and commit to big projects. Don't let development
become product-manager focused. Oftentimes an idea is a good one but needs to
wait for its proper time - but you won't know if it's the proper time for an
idea until you start writing code.

Expect to throw a lot of things away, or leave them half finished for later.
Nobody writes all perfect code that all gets shipped, and you'll be much more
productive in the long run if you notice this early and shift to something
else. The experience gained and lessons learned will be valuable for all the
other work you do.

But don't be afraid to tackle projects that require significant rework of
existing code. Sometimes these can be the best projects, because they can lead
us to make existing code more general, more flexible, more multipurpose and
perhaps more robust. Just don't hesitate to abandon the idea if it looks like
it's going to make a mess of things.

Complicated features can often be done as a series of refactorings, with the
final change that actually implements the feature as a quite small patch at the
end. It's wonderful when this happens, especially when those refactorings are
things that improve the codebase in their own right. When that happens there's
much less risk of wasted effort if the feature you were going for doesn't work
out.

Always strive to work incrementally. Always strive to turn the big projects
into little bite sized projects that can prove their own merits.

Instead of always tackling those big projects, look for little things that
will be useful, and make the big projects easier.

The question of what's likely to be useful is where junior developers most
often go astray - doing something because it seems like it'll be useful often
leads to overengineering. Knowing what's useful comes from many years of
experience, or talking with people who have that experience - or from simply
reading lots of code and looking for common patterns and issues. Don't be
afraid to throw things away and do something simpler.

Talk about your ideas with your fellow developers; often times the best things
come from relaxed conversations where people aren't afraid to say "what if?".

Don't neglect your tools.

The most important tools (besides the compiler and our text editor) are the
tools we use for testing. The shortest possible edit/test/debug cycle is
essential for working productively. We learn, gain experience, and discover the
errors in our thinking by running our code and seeing what happens. If your
time is being wasted because your tools are bad or too slow - don't accept it,
fix it.

Put effort into your documentation, commit messages, and code comments - but
don't go overboard. A good commit message is wonderful - but if the information
was important enough to go in a commit message, ask yourself if it would be
even better as a code comment.

A good code comment is wonderful, but even better is the comment that didn't
need to exist because the code was so straightforward as to be obvious;
organized into small clean and tidy modules, with clear and descriptive names
for functions and variables, where every line of code has a clear purpose.
@@ -8,4 +8,5 @@ bcachefs Documentation
:maxdepth: 2
:numbered:

CodingStyle
errorcodes
@@ -282,18 +282,12 @@ struct posix_acl *bch2_get_acl(struct mnt_idmap *idmap,
|
||||
struct btree_trans *trans = bch2_trans_get(c);
|
||||
struct btree_iter iter = { NULL };
|
||||
struct posix_acl *acl = NULL;
|
||||
struct bkey_s_c k;
|
||||
int ret;
|
||||
retry:
|
||||
bch2_trans_begin(trans);
|
||||
|
||||
ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
|
||||
&hash, inode_inum(inode), &search, 0);
|
||||
if (ret)
|
||||
goto err;
|
||||
|
||||
k = bch2_btree_iter_peek_slot(&iter);
|
||||
ret = bkey_err(k);
|
||||
struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
|
||||
&hash, inode_inum(inode), &search, 0);
|
||||
int ret = bkey_err(k);
|
||||
if (ret)
|
||||
goto err;
|
||||
|
||||
@@ -366,7 +360,7 @@ retry:
|
||||
|
||||
ret = bch2_subvol_is_ro_trans(trans, inode->ei_subvol) ?:
|
||||
bch2_inode_peek(trans, &inode_iter, &inode_u, inode_inum(inode),
|
||||
BTREE_ITER_INTENT);
|
||||
BTREE_ITER_intent);
|
||||
if (ret)
|
||||
goto btree_err;
|
||||
|
||||
@@ -414,39 +408,30 @@ int bch2_acl_chmod(struct btree_trans *trans, subvol_inum inum,
|
||||
struct bch_hash_info hash_info = bch2_hash_info_init(trans->c, inode);
|
||||
struct xattr_search_key search = X_SEARCH(KEY_TYPE_XATTR_INDEX_POSIX_ACL_ACCESS, "", 0);
|
||||
struct btree_iter iter;
|
||||
struct bkey_s_c_xattr xattr;
|
||||
struct bkey_i_xattr *new;
|
||||
struct posix_acl *acl = NULL;
|
||||
struct bkey_s_c k;
|
||||
int ret;
|
||||
|
||||
ret = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
|
||||
&hash_info, inum, &search, BTREE_ITER_INTENT);
|
||||
struct bkey_s_c k = bch2_hash_lookup(trans, &iter, bch2_xattr_hash_desc,
|
||||
&hash_info, inum, &search, BTREE_ITER_intent);
|
||||
int ret = bkey_err(k);
|
||||
if (ret)
|
||||
return bch2_err_matches(ret, ENOENT) ? 0 : ret;
|
||||
|
||||
k = bch2_btree_iter_peek_slot(&iter);
|
||||
ret = bkey_err(k);
|
||||
if (ret)
|
||||
goto err;
|
||||
xattr = bkey_s_c_to_xattr(k);
|
||||
struct bkey_s_c_xattr xattr = bkey_s_c_to_xattr(k);
|
||||
|
||||
acl = bch2_acl_from_disk(trans, xattr_val(xattr.v),
|
||||
le16_to_cpu(xattr.v->x_val_len));
|
||||
ret = PTR_ERR_OR_ZERO(acl);
|
||||
if (IS_ERR_OR_NULL(acl))
|
||||
goto err;
|
||||
|
||||
ret = allocate_dropping_locks_errcode(trans,
|
||||
__posix_acl_chmod(&acl, _gfp, mode));
|
||||
if (ret)
|
||||
goto err;
|
||||
|
||||
new = bch2_acl_to_xattr(trans, acl, ACL_TYPE_ACCESS);
|
||||
if (IS_ERR(new)) {
|
||||
ret = PTR_ERR(new);
|
||||
ret = allocate_dropping_locks_errcode(trans, __posix_acl_chmod(&acl, _gfp, mode));
|
||||
if (ret)
|
||||
goto err;
|
||||
|
||||
struct bkey_i_xattr *new = bch2_acl_to_xattr(trans, acl, ACL_TYPE_ACCESS);
|
||||
ret = PTR_ERR_OR_ZERO(new);
|
||||
if (ret)
|
||||
goto err;
|
||||
}
|
||||
|
||||
new->k.p = iter.pos;
|
||||
ret = bch2_trans_update(trans, &iter, &new->k_i, 0);
|
||||
|
||||
File diff suppressed because it is too large
@@ -8,21 +8,18 @@
|
||||
#include "debug.h"
|
||||
#include "super.h"
|
||||
|
||||
enum bkey_invalid_flags;
|
||||
enum bch_validate_flags;
|
||||
|
||||
/* How out of date a pointer gen is allowed to be: */
|
||||
#define BUCKET_GC_GEN_MAX 96U
|
||||
|
||||
static inline bool bch2_dev_bucket_exists(struct bch_fs *c, struct bpos pos)
|
||||
{
|
||||
struct bch_dev *ca;
|
||||
|
||||
if (!bch2_dev_exists2(c, pos.inode))
|
||||
return false;
|
||||
|
||||
ca = bch_dev_bkey_exists(c, pos.inode);
|
||||
return pos.offset >= ca->mi.first_bucket &&
|
||||
pos.offset < ca->mi.nbuckets;
|
||||
rcu_read_lock();
|
||||
struct bch_dev *ca = bch2_dev_rcu(c, pos.inode);
|
||||
bool ret = ca && bucket_valid(ca, pos.offset);
|
||||
rcu_read_unlock();
|
||||
return ret;
|
||||
}
|
||||
|
||||
static inline u64 bucket_to_u64(struct bpos bucket)
|
||||
@@ -40,38 +37,50 @@ static inline u8 alloc_gc_gen(struct bch_alloc_v4 a)
|
||||
return a.gen - a.oldest_gen;
|
||||
}
|
||||
|
||||
static inline enum bch_data_type __alloc_data_type(u32 dirty_sectors,
|
||||
u32 cached_sectors,
|
||||
u32 stripe,
|
||||
struct bch_alloc_v4 a,
|
||||
enum bch_data_type data_type)
|
||||
static inline void alloc_to_bucket(struct bucket *dst, struct bch_alloc_v4 src)
|
||||
{
|
||||
if (stripe)
|
||||
return data_type == BCH_DATA_parity ? data_type : BCH_DATA_stripe;
|
||||
if (dirty_sectors)
|
||||
return data_type;
|
||||
if (cached_sectors)
|
||||
return BCH_DATA_cached;
|
||||
if (BCH_ALLOC_V4_NEED_DISCARD(&a))
|
||||
return BCH_DATA_need_discard;
|
||||
if (alloc_gc_gen(a) >= BUCKET_GC_GEN_MAX)
|
||||
return BCH_DATA_need_gc_gens;
|
||||
return BCH_DATA_free;
|
||||
dst->gen = src.gen;
|
||||
dst->data_type = src.data_type;
|
||||
dst->dirty_sectors = src.dirty_sectors;
|
||||
dst->cached_sectors = src.cached_sectors;
|
||||
dst->stripe = src.stripe;
|
||||
}
|
||||
|
||||
static inline enum bch_data_type alloc_data_type(struct bch_alloc_v4 a,
|
||||
enum bch_data_type data_type)
|
||||
static inline void __bucket_m_to_alloc(struct bch_alloc_v4 *dst, struct bucket src)
|
||||
{
|
||||
return __alloc_data_type(a.dirty_sectors, a.cached_sectors,
|
||||
a.stripe, a, data_type);
|
||||
dst->gen = src.gen;
|
||||
dst->data_type = src.data_type;
|
||||
dst->dirty_sectors = src.dirty_sectors;
|
||||
dst->cached_sectors = src.cached_sectors;
|
||||
dst->stripe = src.stripe;
|
||||
}
|
||||
|
||||
static inline struct bch_alloc_v4 bucket_m_to_alloc(struct bucket b)
|
||||
{
|
||||
struct bch_alloc_v4 ret = {};
|
||||
__bucket_m_to_alloc(&ret, b);
|
||||
return ret;
|
||||
}
|
||||
|
||||
static inline enum bch_data_type bucket_data_type(enum bch_data_type data_type)
|
||||
{
|
||||
return data_type == BCH_DATA_stripe ? BCH_DATA_user : data_type;
|
||||
switch (data_type) {
|
||||
case BCH_DATA_cached:
|
||||
case BCH_DATA_stripe:
|
||||
return BCH_DATA_user;
|
||||
default:
|
||||
return data_type;
|
||||
}
|
||||
}
|
||||
|
||||
static inline unsigned bch2_bucket_sectors(struct bch_alloc_v4 a)
|
||||
static inline bool bucket_data_type_mismatch(enum bch_data_type bucket,
|
||||
enum bch_data_type ptr)
|
||||
{
|
||||
return !data_type_is_empty(bucket) &&
|
||||
bucket_data_type(bucket) != bucket_data_type(ptr);
|
||||
}
|
||||
|
||||
static inline unsigned bch2_bucket_sectors_total(struct bch_alloc_v4 a)
|
||||
{
|
||||
return a.dirty_sectors + a.cached_sectors;
|
||||
}
|
||||
@@ -89,6 +98,27 @@ static inline unsigned bch2_bucket_sectors_fragmented(struct bch_dev *ca,
|
||||
return d ? max(0, ca->mi.bucket_size - d) : 0;
|
||||
}
|
||||
|
||||
static inline enum bch_data_type alloc_data_type(struct bch_alloc_v4 a,
|
||||
enum bch_data_type data_type)
|
||||
{
|
||||
if (a.stripe)
|
||||
return data_type == BCH_DATA_parity ? data_type : BCH_DATA_stripe;
|
||||
if (a.dirty_sectors)
|
||||
return data_type;
|
||||
if (a.cached_sectors)
|
||||
return BCH_DATA_cached;
|
||||
if (BCH_ALLOC_V4_NEED_DISCARD(&a))
|
||||
return BCH_DATA_need_discard;
|
||||
if (alloc_gc_gen(a) >= BUCKET_GC_GEN_MAX)
|
||||
return BCH_DATA_need_gc_gens;
|
||||
return BCH_DATA_free;
|
||||
}
|
||||
|
||||
static inline void alloc_data_type_set(struct bch_alloc_v4 *a, enum bch_data_type data_type)
|
||||
{
|
||||
a->data_type = alloc_data_type(*a, data_type);
|
||||
}
|
||||
|
||||
static inline u64 alloc_lru_idx_read(struct bch_alloc_v4 a)
|
||||
{
|
||||
return a.data_type == BCH_DATA_cached ? a.io_time[READ] : 0;
|
||||
@@ -147,7 +177,9 @@ static inline void set_alloc_v4_u64s(struct bkey_i_alloc_v4 *a)
|
||||
}
|
||||
|
||||
struct bkey_i_alloc_v4 *
|
||||
bch2_trans_start_alloc_update(struct btree_trans *, struct btree_iter *, struct bpos);
|
||||
bch2_trans_start_alloc_update_noupdate(struct btree_trans *, struct btree_iter *, struct bpos);
|
||||
struct bkey_i_alloc_v4 *
|
||||
bch2_trans_start_alloc_update(struct btree_trans *, struct bpos);
|
||||
|
||||
void __bch2_alloc_to_v4(struct bkey_s_c, struct bch_alloc_v4 *);
|
||||
|
||||
@@ -173,13 +205,13 @@ struct bkey_i_alloc_v4 *bch2_alloc_to_v4_mut(struct btree_trans *, struct bkey_s
|
||||
int bch2_bucket_io_time_reset(struct btree_trans *, unsigned, size_t, int);
|
||||
|
||||
int bch2_alloc_v1_invalid(struct bch_fs *, struct bkey_s_c,
|
||||
enum bkey_invalid_flags, struct printbuf *);
|
||||
enum bch_validate_flags, struct printbuf *);
|
||||
int bch2_alloc_v2_invalid(struct bch_fs *, struct bkey_s_c,
|
||||
enum bkey_invalid_flags, struct printbuf *);
|
||||
enum bch_validate_flags, struct printbuf *);
|
||||
int bch2_alloc_v3_invalid(struct bch_fs *, struct bkey_s_c,
|
||||
enum bkey_invalid_flags, struct printbuf *);
|
||||
enum bch_validate_flags, struct printbuf *);
|
||||
int bch2_alloc_v4_invalid(struct bch_fs *, struct bkey_s_c,
|
||||
enum bkey_invalid_flags, struct printbuf *);
|
||||
enum bch_validate_flags, struct printbuf *);
|
||||
void bch2_alloc_v4_swab(struct bkey_s);
|
||||
void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
|
||||
|
||||
@@ -213,7 +245,7 @@ void bch2_alloc_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
|
||||
})
|
||||
|
||||
int bch2_bucket_gens_invalid(struct bch_fs *, struct bkey_s_c,
|
||||
enum bkey_invalid_flags, struct printbuf *);
|
||||
enum bch_validate_flags, struct printbuf *);
|
||||
void bch2_bucket_gens_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
|
||||
|
||||
#define bch2_bkey_ops_bucket_gens ((struct bkey_ops) { \
|
||||
@@ -233,7 +265,8 @@ static inline bool bkey_is_alloc(const struct bkey *k)
|
||||
int bch2_alloc_read(struct bch_fs *);
|
||||
|
||||
int bch2_trigger_alloc(struct btree_trans *, enum btree_id, unsigned,
|
||||
struct bkey_s_c, struct bkey_s, unsigned);
|
||||
struct bkey_s_c, struct bkey_s,
|
||||
enum btree_iter_update_trigger_flags);
|
||||
int bch2_check_alloc_info(struct bch_fs *);
|
||||
int bch2_check_alloc_to_lru_refs(struct bch_fs *);
|
||||
void bch2_do_discards(struct bch_fs *);
|
||||
|
||||
File diff suppressed because it is too large
@@ -30,8 +30,14 @@ void bch2_dev_stripe_increment(struct bch_dev *, struct dev_stripe_state *);
|
||||
|
||||
long bch2_bucket_alloc_new_fs(struct bch_dev *);
|
||||
|
||||
static inline struct bch_dev *ob_dev(struct bch_fs *c, struct open_bucket *ob)
|
||||
{
|
||||
return bch2_dev_have_ref(c, ob->dev);
|
||||
}
|
||||
|
||||
struct open_bucket *bch2_bucket_alloc(struct bch_fs *, struct bch_dev *,
|
||||
enum bch_watermark, struct closure *);
|
||||
enum bch_watermark, enum bch_data_type,
|
||||
struct closure *);
|
||||
|
||||
static inline void ob_push(struct bch_fs *c, struct open_buckets *obs,
|
||||
struct open_bucket *ob)
|
||||
@@ -184,7 +190,7 @@ bch2_alloc_sectors_append_ptrs_inlined(struct bch_fs *c, struct write_point *wp,
|
||||
wp->sectors_allocated += sectors;
|
||||
|
||||
open_bucket_for_each(c, &wp->ptrs, ob, i) {
|
||||
struct bch_dev *ca = bch_dev_bkey_exists(c, ob->dev);
|
||||
struct bch_dev *ca = ob_dev(c, ob);
|
||||
struct bch_extent_ptr ptr = bch2_ob_ptr(c, ob);
|
||||
|
||||
ptr.cached = cached ||
|
||||
@@ -221,4 +227,9 @@ void bch2_open_buckets_partial_to_text(struct printbuf *, struct bch_fs *);
|
||||
|
||||
void bch2_write_points_to_text(struct printbuf *, struct bch_fs *);
|
||||
|
||||
void bch2_fs_alloc_debug_to_text(struct printbuf *, struct bch_fs *);
|
||||
void bch2_dev_alloc_debug_to_text(struct printbuf *, struct bch_dev *);
|
||||
|
||||
void bch2_print_allocator_stuck(struct bch_fs *);
|
||||
|
||||
#endif /* _BCACHEFS_ALLOC_FOREGROUND_H */
|
||||
|
||||
@@ -9,11 +9,18 @@
|
||||
#include "fifo.h"
|
||||
|
||||
struct bucket_alloc_state {
|
||||
enum {
|
||||
BTREE_BITMAP_NO,
|
||||
BTREE_BITMAP_YES,
|
||||
BTREE_BITMAP_ANY,
|
||||
} btree_bitmap;
|
||||
|
||||
u64 buckets_seen;
|
||||
u64 skipped_open;
|
||||
u64 skipped_need_journal_commit;
|
||||
u64 skipped_nocow;
|
||||
u64 skipped_nouse;
|
||||
u64 skipped_mi_btree_bitmap;
|
||||
};
|
||||
|
||||
#define BCH_WATERMARKS() \
|
||||
|
||||
@@ -23,6 +23,7 @@ static bool extent_matches_bp(struct bch_fs *c,
|
||||
const union bch_extent_entry *entry;
|
||||
struct extent_ptr_decoded p;
|
||||
|
||||
rcu_read_lock();
|
||||
bkey_for_each_ptr_decode(k.k, ptrs, p, entry) {
|
||||
struct bpos bucket2;
|
||||
struct bch_backpointer bp2;
|
||||
@@ -30,31 +31,43 @@ static bool extent_matches_bp(struct bch_fs *c,
|
||||
if (p.ptr.cached)
|
||||
continue;
|
||||
|
||||
bch2_extent_ptr_to_bp(c, btree_id, level, k, p, entry, &bucket2, &bp2);
|
||||
struct bch_dev *ca = bch2_dev_rcu(c, p.ptr.dev);
|
||||
if (!ca)
|
||||
continue;
|
||||
|
||||
bch2_extent_ptr_to_bp(c, ca, btree_id, level, k, p, entry, &bucket2, &bp2);
|
||||
if (bpos_eq(bucket, bucket2) &&
|
||||
!memcmp(&bp, &bp2, sizeof(bp)))
|
||||
!memcmp(&bp, &bp2, sizeof(bp))) {
|
||||
rcu_read_unlock();
|
||||
return true;
|
||||
}
|
||||
}
|
||||
rcu_read_unlock();
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
int bch2_backpointer_invalid(struct bch_fs *c, struct bkey_s_c k,
|
||||
enum bkey_invalid_flags flags,
|
||||
enum bch_validate_flags flags,
|
||||
struct printbuf *err)
|
||||
{
|
||||
struct bkey_s_c_backpointer bp = bkey_s_c_to_backpointer(k);
|
||||
|
||||
/* these will be caught by fsck */
|
||||
if (!bch2_dev_exists2(c, bp.k->p.inode))
|
||||
rcu_read_lock();
|
||||
struct bch_dev *ca = bch2_dev_rcu(c, bp.k->p.inode);
|
||||
if (!ca) {
|
||||
/* these will be caught by fsck */
|
||||
rcu_read_unlock();
|
||||
return 0;
|
||||
}
|
||||
|
||||
struct bch_dev *ca = bch_dev_bkey_exists(c, bp.k->p.inode);
|
||||
struct bpos bucket = bp_pos_to_bucket(c, bp.k->p);
|
||||
struct bpos bucket = bp_pos_to_bucket(ca, bp.k->p);
|
||||
struct bpos bp_pos = bucket_pos_to_bp_noerror(ca, bucket, bp.v->bucket_offset);
|
||||
rcu_read_unlock();
|
||||
int ret = 0;
|
||||
|
||||
bkey_fsck_err_on((bp.v->bucket_offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT) >= ca->mi.bucket_size ||
|
||||
!bpos_eq(bp.k->p, bucket_pos_to_bp_noerror(ca, bucket, bp.v->bucket_offset)),
|
||||
!bpos_eq(bp.k->p, bp_pos),
|
||||
c, err,
|
||||
backpointer_bucket_offset_wrong,
|
||||
"backpointer bucket_offset wrong");
|
||||
@@ -75,10 +88,16 @@ void bch2_backpointer_to_text(struct printbuf *out, const struct bch_backpointer
|
||||
|
||||
void bch2_backpointer_k_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k)
|
||||
{
|
||||
if (bch2_dev_exists2(c, k.k->p.inode)) {
|
||||
rcu_read_lock();
|
||||
struct bch_dev *ca = bch2_dev_rcu(c, k.k->p.inode);
|
||||
if (ca) {
|
||||
struct bpos bucket = bp_pos_to_bucket(ca, k.k->p);
|
||||
rcu_read_unlock();
|
||||
prt_str(out, "bucket=");
|
||||
bch2_bpos_to_text(out, bp_pos_to_bucket(c, k.k->p));
|
||||
bch2_bpos_to_text(out, bucket);
|
||||
prt_str(out, " ");
|
||||
} else {
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
bch2_backpointer_to_text(out, bkey_s_c_to_backpointer(k).v);
|
||||
@@ -117,8 +136,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
|
||||
|
||||
bch_err(c, "%s", buf.buf);
|
||||
} else if (c->curr_recovery_pass > BCH_RECOVERY_PASS_check_extents_to_backpointers) {
|
||||
prt_printf(&buf, "backpointer not found when deleting");
|
||||
prt_newline(&buf);
|
||||
prt_printf(&buf, "backpointer not found when deleting\n");
|
||||
printbuf_indent_add(&buf, 2);
|
||||
|
||||
prt_printf(&buf, "searching for ");
|
||||
@@ -145,6 +163,7 @@ static noinline int backpointer_mod_err(struct btree_trans *trans,
|
||||
}
|
||||
|
||||
int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *trans,
|
||||
struct bch_dev *ca,
|
||||
struct bpos bucket,
|
||||
struct bch_backpointer bp,
|
||||
struct bkey_s_c orig_k,
|
||||
@@ -161,7 +180,7 @@ int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *trans,
|
||||
return ret;
|
||||
|
||||
bkey_backpointer_init(&bp_k->k_i);
|
||||
bp_k->k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset);
|
||||
bp_k->k.p = bucket_pos_to_bp(ca, bucket, bp.bucket_offset);
|
||||
bp_k->v = bp;
|
||||
|
||||
if (!insert) {
|
||||
@@ -171,9 +190,9 @@ int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *trans,
|
||||
|
||||
k = bch2_bkey_get_iter(trans, &bp_iter, BTREE_ID_backpointers,
|
||||
bp_k->k.p,
|
||||
BTREE_ITER_INTENT|
|
||||
BTREE_ITER_SLOTS|
|
||||
BTREE_ITER_WITH_UPDATES);
|
||||
BTREE_ITER_intent|
|
||||
BTREE_ITER_slots|
|
||||
BTREE_ITER_with_updates);
|
||||
ret = bkey_err(k);
|
||||
if (ret)
|
||||
goto err;
|
||||
@@ -197,13 +216,13 @@ err:
|
||||
* Find the next backpointer >= *bp_offset:
|
||||
*/
|
||||
int bch2_get_next_backpointer(struct btree_trans *trans,
|
||||
struct bch_dev *ca,
|
||||
struct bpos bucket, int gen,
|
||||
struct bpos *bp_pos,
|
||||
struct bch_backpointer *bp,
|
||||
unsigned iter_flags)
|
||||
{
|
||||
struct bch_fs *c = trans->c;
|
||||
struct bpos bp_end_pos = bucket_pos_to_bp(c, bpos_nosnap_successor(bucket), 0);
|
||||
struct bpos bp_end_pos = bucket_pos_to_bp(ca, bpos_nosnap_successor(bucket), 0);
|
||||
struct btree_iter alloc_iter = { NULL }, bp_iter = { NULL };
|
||||
struct bkey_s_c k;
|
||||
int ret = 0;
|
||||
@@ -213,7 +232,7 @@ int bch2_get_next_backpointer(struct btree_trans *trans,
|
||||
|
||||
if (gen >= 0) {
|
||||
k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc,
|
||||
bucket, BTREE_ITER_CACHED|iter_flags);
|
||||
bucket, BTREE_ITER_cached|iter_flags);
|
||||
ret = bkey_err(k);
|
||||
if (ret)
|
||||
goto out;
|
||||
@@ -223,7 +242,7 @@ int bch2_get_next_backpointer(struct btree_trans *trans,
|
||||
goto done;
|
||||
}
|
||||
|
||||
*bp_pos = bpos_max(*bp_pos, bucket_pos_to_bp(c, bucket, 0));
|
||||
*bp_pos = bpos_max(*bp_pos, bucket_pos_to_bp(ca, bucket, 0));
|
||||
|
||||
for_each_btree_key_norestart(trans, bp_iter, BTREE_ID_backpointers,
|
||||
*bp_pos, iter_flags, k, ret) {
|
||||
@@ -249,7 +268,6 @@ static void backpointer_not_found(struct btree_trans *trans,
|
||||
{
|
||||
struct bch_fs *c = trans->c;
|
||||
struct printbuf buf = PRINTBUF;
|
||||
struct bpos bucket = bp_pos_to_bucket(c, bp_pos);
|
||||
|
||||
/*
|
||||
* If we're using the btree write buffer, the backpointer we were
|
||||
@@ -259,6 +277,10 @@ static void backpointer_not_found(struct btree_trans *trans,
|
||||
if (likely(!bch2_backpointers_no_use_write_buffer))
|
||||
return;
|
||||
|
||||
struct bpos bucket;
|
||||
if (!bp_pos_to_bucket_nodev(c, bp_pos, &bucket))
|
||||
return;
|
||||
|
||||
prt_printf(&buf, "backpointer doesn't match %s it points to:\n ",
|
||||
bp.level ? "btree node" : "extent");
|
||||
prt_printf(&buf, "bucket: ");
|
||||
@@ -288,15 +310,17 @@ struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *trans,
|
||||
{
|
||||
if (likely(!bp.level)) {
|
||||
struct bch_fs *c = trans->c;
|
||||
struct bpos bucket = bp_pos_to_bucket(c, bp_pos);
|
||||
struct bkey_s_c k;
|
||||
|
||||
struct bpos bucket;
|
||||
if (!bp_pos_to_bucket_nodev(c, bp_pos, &bucket))
|
||||
return bkey_s_c_err(-EIO);
|
||||
|
||||
bch2_trans_node_iter_init(trans, iter,
|
||||
bp.btree_id,
|
||||
bp.pos,
|
||||
0, 0,
|
||||
iter_flags);
|
||||
k = bch2_btree_iter_peek_slot(iter);
|
||||
struct bkey_s_c k = bch2_btree_iter_peek_slot(iter);
|
||||
if (bkey_err(k)) {
|
||||
bch2_trans_iter_exit(trans, iter);
|
||||
return k;
|
||||
@@ -325,18 +349,20 @@ struct btree *bch2_backpointer_get_node(struct btree_trans *trans,
|
||||
struct bch_backpointer bp)
|
||||
{
|
||||
struct bch_fs *c = trans->c;
|
||||
struct bpos bucket = bp_pos_to_bucket(c, bp_pos);
|
||||
struct btree *b;
|
||||
|
||||
BUG_ON(!bp.level);
|
||||
|
||||
struct bpos bucket;
|
||||
if (!bp_pos_to_bucket_nodev(c, bp_pos, &bucket))
|
||||
return ERR_PTR(-EIO);
|
||||
|
||||
bch2_trans_node_iter_init(trans, iter,
|
||||
bp.btree_id,
|
||||
bp.pos,
|
||||
0,
|
||||
bp.level - 1,
|
||||
0);
|
||||
b = bch2_btree_iter_peek_node(iter);
|
||||
struct btree *b = bch2_btree_iter_peek_node(iter);
|
||||
if (IS_ERR_OR_NULL(b))
|
||||
goto err;
|
||||
|
||||
@@ -367,16 +393,16 @@ static int bch2_check_btree_backpointer(struct btree_trans *trans, struct btree_
|
||||
struct printbuf buf = PRINTBUF;
|
||||
int ret = 0;
|
||||
|
||||
if (fsck_err_on(!bch2_dev_exists2(c, k.k->p.inode), c,
|
||||
backpointer_to_missing_device,
|
||||
"backpointer for missing device:\n%s",
|
||||
(bch2_bkey_val_to_text(&buf, c, k), buf.buf))) {
|
||||
ret = bch2_btree_delete_at(trans, bp_iter, 0);
|
||||
struct bpos bucket;
|
||||
if (!bp_pos_to_bucket_nodev_noerror(c, k.k->p, &bucket)) {
|
||||
if (fsck_err(c, backpointer_to_missing_device,
|
||||
"backpointer for missing device:\n%s",
|
||||
(bch2_bkey_val_to_text(&buf, c, k), buf.buf)))
|
||||
ret = bch2_btree_delete_at(trans, bp_iter, 0);
|
||||
goto out;
|
||||
}
|
||||
|
||||
alloc_k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc,
|
||||
bp_pos_to_bucket(c, k.k->p), 0);
|
||||
alloc_k = bch2_bkey_get_iter(trans, &alloc_iter, BTREE_ID_alloc, bucket, 0);
|
||||
ret = bkey_err(alloc_k);
|
||||
if (ret)
|
||||
goto out;
|
||||
@@ -460,8 +486,8 @@ found:
|
||||
|
||||
bytes = p.crc.compressed_size << 9;
|
||||
|
||||
struct bch_dev *ca = bch_dev_bkey_exists(c, dev);
|
||||
if (!bch2_dev_get_ioref(ca, READ))
|
||||
struct bch_dev *ca = bch2_dev_get_ioref(c, dev, READ);
|
||||
if (!ca)
|
||||
return false;
|
||||
|
||||
data_buf = kvmalloc(bytes, GFP_KERNEL);
|
||||
@@ -511,25 +537,27 @@ static int check_bp_exists(struct btree_trans *trans,
|
||||
struct printbuf buf = PRINTBUF;
|
||||
struct bkey_s_c bp_k;
|
||||
struct bkey_buf tmp;
|
||||
int ret;
|
||||
int ret = 0;
|
||||
|
||||
bch2_bkey_buf_init(&tmp);
|
||||
|
||||
if (!bch2_dev_bucket_exists(c, bucket)) {
|
||||
struct bch_dev *ca = bch2_dev_bucket_tryget(c, bucket);
|
||||
if (!ca) {
|
||||
prt_str(&buf, "extent for nonexistent device:bucket ");
|
||||
bch2_bpos_to_text(&buf, bucket);
|
||||
prt_str(&buf, "\n ");
|
||||
bch2_bkey_val_to_text(&buf, c, orig_k);
|
||||
bch_err(c, "%s", buf.buf);
|
||||
return -BCH_ERR_fsck_repair_unimplemented;
|
||||
ret = -BCH_ERR_fsck_repair_unimplemented;
|
||||
goto err;
|
||||
}
|
||||
|
||||
if (bpos_lt(bucket, s->bucket_start) ||
|
||||
bpos_gt(bucket, s->bucket_end))
|
||||
return 0;
|
||||
goto out;
|
||||
|
||||
bp_k = bch2_bkey_get_iter(trans, &bp_iter, BTREE_ID_backpointers,
|
||||
bucket_pos_to_bp(c, bucket, bp.bucket_offset),
|
||||
bucket_pos_to_bp(ca, bucket, bp.bucket_offset),
|
||||
0);
|
||||
ret = bkey_err(bp_k);
|
||||
if (ret)
|
||||
@@ -562,6 +590,7 @@ fsck_err:
|
||||
bch2_trans_iter_exit(trans, &other_extent_iter);
|
||||
bch2_trans_iter_exit(trans, &bp_iter);
|
||||
bch2_bkey_buf_exit(&tmp, c);
|
||||
bch2_dev_put(ca);
|
||||
printbuf_exit(&buf);
|
||||
return ret;
|
||||
check_existing_bp:
|
||||
@@ -637,13 +666,13 @@ missing:
|
||||
|
||||
struct bkey_i_backpointer n_bp_k;
|
||||
bkey_backpointer_init(&n_bp_k.k_i);
|
||||
n_bp_k.k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset);
|
||||
n_bp_k.k.p = bucket_pos_to_bp(ca, bucket, bp.bucket_offset);
|
||||
n_bp_k.v = bp;
|
||||
prt_printf(&buf, "\n want: ");
|
||||
bch2_bkey_val_to_text(&buf, c, bkey_i_to_s_c(&n_bp_k.k_i));
|
||||
|
||||
if (fsck_err(c, ptr_to_missing_backpointer, "%s", buf.buf))
|
||||
ret = bch2_bucket_backpointer_mod(trans, bucket, bp, orig_k, true);
|
||||
ret = bch2_bucket_backpointer_mod(trans, ca, bucket, bp, orig_k, true);
|
||||
|
||||
goto out;
|
||||
}
|
||||
@@ -667,7 +696,14 @@ static int check_extent_to_backpointers(struct btree_trans *trans,
|
||||
if (p.ptr.cached)
|
||||
continue;
|
||||
|
||||
bch2_extent_ptr_to_bp(c, btree, level, k, p, entry, &bucket_pos, &bp);
|
||||
rcu_read_lock();
|
||||
struct bch_dev *ca = bch2_dev_rcu(c, p.ptr.dev);
|
||||
if (ca)
|
||||
bch2_extent_ptr_to_bp(c, ca, btree, level, k, p, entry, &bucket_pos, &bp);
|
||||
rcu_read_unlock();
|
||||
|
||||
if (!ca)
|
||||
continue;
|
||||
|
||||
ret = check_bp_exists(trans, s, bucket_pos, bp, k);
|
||||
if (ret)
|
||||
@@ -760,7 +796,7 @@ static int bch2_get_btree_in_memory_pos(struct btree_trans *trans,
|
||||
|
||||
__for_each_btree_node(trans, iter, btree,
|
||||
btree == start.btree ? start.pos : POS_MIN,
|
||||
0, depth, BTREE_ITER_PREFETCH, b, ret) {
|
||||
0, depth, BTREE_ITER_prefetch, b, ret) {
|
||||
mem_may_pin -= btree_buf_bytes(b);
|
||||
if (mem_may_pin <= 0) {
|
||||
c->btree_cache.pinned_nodes_end = *end =
|
||||
@@ -794,31 +830,13 @@ static int bch2_check_extents_to_backpointers_pass(struct btree_trans *trans,
|
||||
|
||||
while (level >= depth) {
|
||||
struct btree_iter iter;
|
||||
bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0,
|
||||
level,
|
||||
BTREE_ITER_PREFETCH);
|
||||
while (1) {
|
||||
bch2_trans_begin(trans);
|
||||
|
||||
struct bkey_s_c k = bch2_btree_iter_peek(&iter);
|
||||
if (!k.k)
|
||||
break;
|
||||
ret = bkey_err(k) ?:
|
||||
check_extent_to_backpointers(trans, s, btree_id, level, k) ?:
|
||||
bch2_trans_commit(trans, NULL, NULL,
|
||||
BCH_TRANS_COMMIT_no_enospc);
|
||||
if (bch2_err_matches(ret, BCH_ERR_transaction_restart)) {
|
||||
ret = 0;
|
||||
continue;
|
||||
}
|
||||
if (ret)
|
||||
break;
|
||||
if (bpos_eq(iter.pos, SPOS_MAX))
|
||||
break;
|
||||
bch2_btree_iter_advance(&iter);
|
||||
}
|
||||
bch2_trans_iter_exit(trans, &iter);
|
||||
bch2_trans_node_iter_init(trans, &iter, btree_id, POS_MIN, 0, level,
|
||||
BTREE_ITER_prefetch);
|
||||
|
||||
ret = for_each_btree_key_continue(trans, iter, 0, k, ({
|
||||
check_extent_to_backpointers(trans, s, btree_id, level, k) ?:
|
||||
bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc);
|
||||
}));
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
@@ -936,7 +954,7 @@ static int bch2_check_backpointers_to_extents_pass(struct btree_trans *trans,
|
||||
struct bpos last_flushed_pos = SPOS_MAX;
|
||||
|
||||
return for_each_btree_key_commit(trans, iter, BTREE_ID_backpointers,
|
||||
POS_MIN, BTREE_ITER_PREFETCH, k,
|
||||
POS_MIN, BTREE_ITER_prefetch, k,
|
||||
NULL, NULL, BCH_TRANS_COMMIT_no_enospc,
|
||||
check_one_backpointer(trans, start, end,
|
||||
bkey_s_c_to_backpointer(k),
|
||||
|
||||
@@ -6,6 +6,7 @@
|
||||
#include "btree_iter.h"
|
||||
#include "btree_update.h"
|
||||
#include "buckets.h"
|
||||
#include "error.h"
|
||||
#include "super.h"
|
||||
|
||||
static inline u64 swab40(u64 x)
|
||||
@@ -18,7 +19,7 @@ static inline u64 swab40(u64 x)
|
||||
}
|
||||
|
||||
int bch2_backpointer_invalid(struct bch_fs *, struct bkey_s_c k,
|
||||
enum bkey_invalid_flags, struct printbuf *);
|
||||
enum bch_validate_flags, struct printbuf *);
|
||||
void bch2_backpointer_to_text(struct printbuf *, const struct bch_backpointer *);
|
||||
void bch2_backpointer_k_to_text(struct printbuf *, struct bch_fs *, struct bkey_s_c);
|
||||
void bch2_backpointer_swab(struct bkey_s);
|
||||
@@ -36,15 +37,29 @@ void bch2_backpointer_swab(struct bkey_s);
|
||||
* Convert from pos in backpointer btree to pos of corresponding bucket in alloc
|
||||
* btree:
|
||||
*/
|
||||
static inline struct bpos bp_pos_to_bucket(const struct bch_fs *c,
|
||||
struct bpos bp_pos)
|
||||
static inline struct bpos bp_pos_to_bucket(const struct bch_dev *ca, struct bpos bp_pos)
|
||||
{
|
||||
struct bch_dev *ca = bch_dev_bkey_exists(c, bp_pos.inode);
|
||||
u64 bucket_sector = bp_pos.offset >> MAX_EXTENT_COMPRESS_RATIO_SHIFT;
|
||||
|
||||
return POS(bp_pos.inode, sector_to_bucket(ca, bucket_sector));
|
||||
}
|
||||
|
||||
static inline bool bp_pos_to_bucket_nodev_noerror(struct bch_fs *c, struct bpos bp_pos, struct bpos *bucket)
|
||||
{
|
||||
rcu_read_lock();
|
||||
struct bch_dev *ca = bch2_dev_rcu(c, bp_pos.inode);
|
||||
if (ca)
|
||||
*bucket = bp_pos_to_bucket(ca, bp_pos);
|
||||
rcu_read_unlock();
|
||||
return ca != NULL;
|
||||
}
|
||||
|
||||
static inline bool bp_pos_to_bucket_nodev(struct bch_fs *c, struct bpos bp_pos, struct bpos *bucket)
|
||||
{
|
||||
return !bch2_fs_inconsistent_on(!bp_pos_to_bucket_nodev_noerror(c, bp_pos, bucket),
|
||||
c, "backpointer for missing device %llu", bp_pos.inode);
|
||||
}
|
||||
|
||||
static inline struct bpos bucket_pos_to_bp_noerror(const struct bch_dev *ca,
|
||||
struct bpos bucket,
|
||||
u64 bucket_offset)
|
||||
@@ -57,32 +72,32 @@ static inline struct bpos bucket_pos_to_bp_noerror(const struct bch_dev *ca,
|
||||
/*
|
||||
* Convert from pos in alloc btree + bucket offset to pos in backpointer btree:
|
||||
*/
|
||||
static inline struct bpos bucket_pos_to_bp(const struct bch_fs *c,
|
||||
static inline struct bpos bucket_pos_to_bp(const struct bch_dev *ca,
|
||||
struct bpos bucket,
|
||||
u64 bucket_offset)
|
||||
{
|
||||
struct bch_dev *ca = bch_dev_bkey_exists(c, bucket.inode);
|
||||
struct bpos ret = bucket_pos_to_bp_noerror(ca, bucket, bucket_offset);
|
||||
EBUG_ON(!bkey_eq(bucket, bp_pos_to_bucket(c, ret)));
|
||||
EBUG_ON(!bkey_eq(bucket, bp_pos_to_bucket(ca, ret)));
|
||||
return ret;
|
||||
}
|
||||
|
||||
int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bpos bucket,
|
||||
struct bch_backpointer, struct bkey_s_c, bool);
|
||||
int bch2_bucket_backpointer_mod_nowritebuffer(struct btree_trans *, struct bch_dev *,
|
||||
struct bpos bucket, struct bch_backpointer, struct bkey_s_c, bool);
|
||||
|
||||
static inline int bch2_bucket_backpointer_mod(struct btree_trans *trans,
|
||||
struct bch_dev *ca,
|
||||
struct bpos bucket,
|
||||
struct bch_backpointer bp,
|
||||
struct bkey_s_c orig_k,
|
||||
bool insert)
|
||||
{
|
||||
if (unlikely(bch2_backpointers_no_use_write_buffer))
|
||||
return bch2_bucket_backpointer_mod_nowritebuffer(trans, bucket, bp, orig_k, insert);
|
||||
return bch2_bucket_backpointer_mod_nowritebuffer(trans, ca, bucket, bp, orig_k, insert);
|
||||
|
||||
struct bkey_i_backpointer bp_k;
|
||||
|
||||
bkey_backpointer_init(&bp_k.k_i);
|
||||
bp_k.k.p = bucket_pos_to_bp(trans->c, bucket, bp.bucket_offset);
|
||||
bp_k.k.p = bucket_pos_to_bp(ca, bucket, bp.bucket_offset);
|
||||
bp_k.v = bp;
|
||||
|
||||
if (!insert) {
|
||||
@@ -120,7 +135,7 @@ static inline enum bch_data_type bch2_bkey_ptr_data_type(struct bkey_s_c k,
|
||||
}
|
||||
}
|
||||
|
||||
static inline void bch2_extent_ptr_to_bp(struct bch_fs *c,
|
||||
static inline void bch2_extent_ptr_to_bp(struct bch_fs *c, struct bch_dev *ca,
|
||||
enum btree_id btree_id, unsigned level,
|
||||
struct bkey_s_c k, struct extent_ptr_decoded p,
|
||||
const union bch_extent_entry *entry,
|
||||
@@ -130,7 +145,7 @@ static inline void bch2_extent_ptr_to_bp(struct bch_fs *c,
|
||||
s64 sectors = level ? btree_sectors(c) : k.k->size;
|
||||
u32 bucket_offset;
|
||||
|
||||
*bucket_pos = PTR_BUCKET_POS_OFFSET(c, &p.ptr, &bucket_offset);
|
||||
*bucket_pos = PTR_BUCKET_POS_OFFSET(ca, &p.ptr, &bucket_offset);
|
||||
*bp = (struct bch_backpointer) {
|
||||
.btree_id = btree_id,
|
||||
.level = level,
|
||||
@@ -142,7 +157,7 @@ static inline void bch2_extent_ptr_to_bp(struct bch_fs *c,
|
||||
};
|
||||
}
|
||||
|
||||
int bch2_get_next_backpointer(struct btree_trans *, struct bpos, int,
|
||||
int bch2_get_next_backpointer(struct btree_trans *, struct bch_dev *ca, struct bpos, int,
|
||||
struct bpos *, struct bch_backpointer *, unsigned);
|
||||
struct bkey_s_c bch2_backpointer_get_key(struct btree_trans *, struct btree_iter *,
|
||||
struct bpos, struct bch_backpointer,
|
||||
|
||||
@@ -359,6 +359,8 @@ do { \
|
||||
#define BCH_DEBUG_PARAMS_ALWAYS() \
|
||||
BCH_DEBUG_PARAM(key_merging_disabled, \
|
||||
"Disables merging of extents") \
|
||||
BCH_DEBUG_PARAM(btree_node_merging_disabled, \
|
||||
"Disables merging of btree nodes") \
|
||||
BCH_DEBUG_PARAM(btree_gc_always_rewrite, \
|
||||
"Causes mark and sweep to compact and rewrite every " \
|
||||
"btree node it traverses") \
|
||||
@@ -468,6 +470,7 @@ enum bch_time_stats {
|
||||
#include "quota_types.h"
|
||||
#include "rebalance_types.h"
|
||||
#include "replicas_types.h"
|
||||
#include "sb-members_types.h"
|
||||
#include "subvolume_types.h"
|
||||
#include "super_types.h"
|
||||
#include "thread_with_file_types.h"
|
||||
@@ -516,8 +519,8 @@ enum gc_phase {
|
||||
|
||||
struct gc_pos {
|
||||
enum gc_phase phase;
|
||||
u16 level;
|
||||
struct bpos pos;
|
||||
unsigned level;
|
||||
};
|
||||
|
||||
struct reflink_gc {
|
||||
@@ -534,7 +537,13 @@ struct io_count {
|
||||
|
||||
struct bch_dev {
|
||||
struct kobject kobj;
|
||||
#ifdef CONFIG_BCACHEFS_DEBUG
|
||||
atomic_long_t ref;
|
||||
bool dying;
|
||||
unsigned long last_put;
|
||||
#else
|
||||
struct percpu_ref ref;
|
||||
#endif
|
||||
struct completion ref_completion;
|
||||
struct percpu_ref io_ref;
|
||||
struct completion io_ref_completion;
|
||||
@@ -560,14 +569,11 @@ struct bch_dev {
|
||||
|
||||
struct bch_devs_mask self;
|
||||
|
||||
/* biosets used in cloned bios for writing multiple replicas */
|
||||
struct bio_set replica_set;
|
||||
|
||||
/*
|
||||
* Buckets:
|
||||
* Per-bucket arrays are protected by c->mark_lock, bucket_lock and
|
||||
* gc_lock, for device resize - holding any is sufficient for access:
|
||||
* Or rcu_read_lock(), but only for ptr_stale():
|
||||
* Or rcu_read_lock(), but only for dev_ptr_stale():
|
||||
*/
|
||||
struct bucket_array __rcu *buckets_gc;
|
||||
struct bucket_gens __rcu *bucket_gens;
|
||||
@@ -581,7 +587,7 @@ struct bch_dev {
|
||||
|
||||
/* Allocator: */
|
||||
u64 new_fs_bucket_idx;
|
||||
u64 alloc_cursor;
|
||||
u64 alloc_cursor[3];
|
||||
|
||||
unsigned nr_open_buckets;
|
||||
unsigned nr_btree_reserve;
|
||||
@@ -627,12 +633,12 @@ struct bch_dev {
|
||||
x(clean_shutdown) \
|
||||
x(fsck_running) \
|
||||
x(initial_gc_unfixed) \
|
||||
x(need_another_gc) \
|
||||
x(need_delete_dead_snapshots) \
|
||||
x(error) \
|
||||
x(topology_error) \
|
||||
x(errors_fixed) \
|
||||
x(errors_not_fixed)
|
||||
x(errors_not_fixed) \
|
||||
x(no_invalid_checks)
|
||||
|
||||
enum bch_fs_flags {
|
||||
#define x(n) BCH_FS_##n,
|
||||
@@ -715,6 +721,7 @@ struct btree_trans_buf {
|
||||
x(discard_fast) \
|
||||
x(invalidate) \
|
||||
x(delete_dead_snapshots) \
|
||||
x(gc_gens) \
|
||||
x(snapshot_delete_pagecache) \
|
||||
x(sysfs) \
|
||||
x(btree_write_buffer)
|
||||
@@ -926,7 +933,6 @@ struct bch_fs {
|
||||
/* JOURNAL SEQ BLACKLIST */
|
||||
struct journal_seq_blacklist_table *
|
||||
journal_seq_blacklist_table;
|
||||
struct work_struct journal_seq_blacklist_gc_work;
|
||||
|
||||
/* ALLOCATOR */
|
||||
spinlock_t freelist_lock;
|
||||
@@ -957,8 +963,7 @@ struct bch_fs {
|
||||
struct work_struct discard_fast_work;
|
||||
|
||||
/* GARBAGE COLLECTION */
|
||||
struct task_struct *gc_thread;
|
||||
atomic_t kick_gc;
|
||||
struct work_struct gc_gens_work;
|
||||
unsigned long gc_count;
|
||||
|
||||
enum btree_id gc_gens_btree;
|
||||
@@ -988,6 +993,7 @@ struct bch_fs {
|
||||
struct bio_set bio_read;
|
||||
struct bio_set bio_read_split;
|
||||
struct bio_set bio_write;
|
||||
struct bio_set replica_set;
|
||||
struct mutex bio_bounce_pages_lock;
|
||||
mempool_t bio_bounce_pages;
|
||||
struct bucket_nocow_lock_table
|
||||
@@ -1115,7 +1121,6 @@ struct bch_fs {
|
||||
u64 counters_on_mount[BCH_COUNTER_NR];
|
||||
u64 __percpu *counters;
|
||||
|
||||
unsigned btree_gc_periodic:1;
|
||||
unsigned copy_gc_enabled:1;
|
||||
bool promote_whole_extents;
|
||||
|
||||
@@ -1250,11 +1255,6 @@ static inline s64 bch2_current_time(const struct bch_fs *c)
|
||||
return timespec_to_bch2_time(c, now);
|
||||
}
|
||||
|
||||
static inline bool bch2_dev_exists2(const struct bch_fs *c, unsigned dev)
|
||||
{
|
||||
return dev < c->sb.nr_devices && c->devs[dev];
|
||||
}
|
||||
|
||||
static inline struct stdio_redirect *bch2_fs_stdio_redirect(struct bch_fs *c)
|
||||
{
|
||||
struct stdio_redirect *stdio = c->stdio;
|
||||
|
||||
@@ -76,6 +76,7 @@
|
||||
#include <asm/byteorder.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/uuid.h>
|
||||
#include <uapi/linux/magic.h>
|
||||
#include "vstructs.h"
|
||||
|
||||
#ifdef __KERNEL__
|
||||
@@ -589,6 +590,13 @@ struct bch_member {
|
||||
__le64 errors_reset_time;
|
||||
__le64 seq;
|
||||
__le64 btree_allocated_bitmap;
|
||||
/*
|
||||
* On recovery from a clean shutdown we don't normally read the journal,
|
||||
* but we still want to resume writing from where we left off so we
|
||||
* don't overwrite more than is necessary, for list journal debugging:
|
||||
*/
|
||||
__le32 last_journal_bucket;
|
||||
__le32 last_journal_bucket_offset;
|
||||
};
|
||||
|
||||
/*
|
||||
@@ -1283,7 +1291,7 @@ enum bch_compression_opts {
|
||||
UUID_INIT(0xc68573f6, 0x66ce, 0x90a9, \
|
||||
0xd9, 0x6a, 0x60, 0xcf, 0x80, 0x3d, 0xf7, 0xef)
|
||||
|
||||
#define BCACHEFS_STATFS_MAGIC 0xca451a4e
|
||||
#define BCACHEFS_STATFS_MAGIC BCACHEFS_SUPER_MAGIC
|
||||
|
||||
#define JSET_MAGIC __cpu_to_le64(0x245235c1a3625032ULL)
|
||||
#define BSET_MAGIC __cpu_to_le64(0x90135c78b99e07f5ULL)
|
||||
|
||||
@@ -640,7 +640,7 @@ struct bkey_format bch2_bkey_format_done(struct bkey_format_state *s)
|
||||
|
||||
int bch2_bkey_format_invalid(struct bch_fs *c,
|
||||
struct bkey_format *f,
|
||||
enum bkey_invalid_flags flags,
|
||||
enum bch_validate_flags flags,
|
||||
struct printbuf *err)
|
||||
{
|
||||
unsigned i, bits = KEY_PACKED_BITS_START;
|
||||
@@ -656,20 +656,17 @@ int bch2_bkey_format_invalid(struct bch_fs *c,
|
||||
* unpacked format:
|
||||
*/
|
||||
for (i = 0; i < f->nr_fields; i++) {
|
||||
if (!c || c->sb.version_min >= bcachefs_metadata_version_snapshot) {
|
||||
if ((!c || c->sb.version_min >= bcachefs_metadata_version_snapshot) &&
|
||||
bch2_bkey_format_field_overflows(f, i)) {
|
||||
unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i];
|
||||
u64 unpacked_max = ~((~0ULL << 1) << (unpacked_bits - 1));
|
||||
u64 packed_max = f->bits_per_field[i]
|
||||
? ~((~0ULL << 1) << (f->bits_per_field[i] - 1))
|
||||
: 0;
|
||||
u64 field_offset = le64_to_cpu(f->field_offset[i]);
|
||||
|
||||
if (packed_max + field_offset < packed_max ||
|
||||
packed_max + field_offset > unpacked_max) {
|
||||
prt_printf(err, "field %u too large: %llu + %llu > %llu",
|
||||
i, packed_max, field_offset, unpacked_max);
|
||||
return -BCH_ERR_invalid;
|
||||
}
|
||||
prt_printf(err, "field %u too large: %llu + %llu > %llu",
|
||||
i, packed_max, le64_to_cpu(f->field_offset[i]), unpacked_max);
|
||||
return -BCH_ERR_invalid;
|
||||
}
|
||||
|
||||
bits += f->bits_per_field[i];
|
||||
|
||||
@@ -9,10 +9,10 @@
|
||||
#include "util.h"
|
||||
#include "vstructs.h"
|
||||
|
||||
enum bkey_invalid_flags {
|
||||
BKEY_INVALID_WRITE = (1U << 0),
|
||||
BKEY_INVALID_COMMIT = (1U << 1),
|
||||
BKEY_INVALID_JOURNAL = (1U << 2),
|
||||
enum bch_validate_flags {
|
||||
BCH_VALIDATE_write = (1U << 0),
|
||||
BCH_VALIDATE_commit = (1U << 1),
|
||||
BCH_VALIDATE_journal = (1U << 2),
|
||||
};
|
||||
|
||||
#if 0
|
||||
@@ -574,8 +574,31 @@ static inline void bch2_bkey_format_add_key(struct bkey_format_state *s, const s
|
||||
|
||||
void bch2_bkey_format_add_pos(struct bkey_format_state *, struct bpos);
|
||||
struct bkey_format bch2_bkey_format_done(struct bkey_format_state *);
|
||||
|
||||
static inline bool bch2_bkey_format_field_overflows(struct bkey_format *f, unsigned i)
|
||||
{
|
||||
unsigned f_bits = f->bits_per_field[i];
|
||||
unsigned unpacked_bits = bch2_bkey_format_current.bits_per_field[i];
|
||||
u64 unpacked_mask = ~((~0ULL << 1) << (unpacked_bits - 1));
|
||||
u64 field_offset = le64_to_cpu(f->field_offset[i]);
|
||||
|
||||
if (f_bits > unpacked_bits)
|
||||
return true;
|
||||
|
||||
if ((f_bits == unpacked_bits) && field_offset)
|
||||
return true;
|
||||
|
||||
u64 f_mask = f_bits
|
||||
? ~((~0ULL << (f_bits - 1)) << 1)
|
||||
: 0;
|
||||
|
||||
if (((field_offset + f_mask) & unpacked_mask) < field_offset)
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
int bch2_bkey_format_invalid(struct bch_fs *, struct bkey_format *,
|
||||
enum bkey_invalid_flags, struct printbuf *);
|
||||
enum bch_validate_flags, struct printbuf *);
|
||||
void bch2_bkey_format_to_text(struct printbuf *, const struct bkey_format *);
|
||||
|
||||
#endif /* _BCACHEFS_BKEY_H */
|
||||
|
||||
@@ -27,7 +27,7 @@ const char * const bch2_bkey_types[] = {
};

static int deleted_key_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
return 0;
}
@@ -41,7 +41,7 @@ static int deleted_key_invalid(struct bch_fs *c, struct bkey_s_c k,
})

static int empty_val_key_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
int ret = 0;

@@ -58,7 +58,7 @@ fsck_err:
})

static int key_type_cookie_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
return 0;
}
@@ -82,7 +82,7 @@ static void key_type_cookie_to_text(struct printbuf *out, struct bch_fs *c,
})

static int key_type_inline_data_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err)
enum bch_validate_flags flags, struct printbuf *err)
{
return 0;
}
@@ -123,9 +123,12 @@ const struct bkey_ops bch2_bkey_null_ops = {
};

int bch2_bkey_val_invalid(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags,
enum bch_validate_flags flags,
struct printbuf *err)
{
if (test_bit(BCH_FS_no_invalid_checks, &c->flags))
return 0;

const struct bkey_ops *ops = bch2_bkey_type_ops(k.k->type);
int ret = 0;

@@ -159,9 +162,12 @@ const char *bch2_btree_node_type_str(enum btree_node_type type)

int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
enum btree_node_type type,
enum bkey_invalid_flags flags,
enum bch_validate_flags flags,
struct printbuf *err)
{
if (test_bit(BCH_FS_no_invalid_checks, &c->flags))
return 0;

int ret = 0;

bkey_fsck_err_on(k.k->u64s < BKEY_U64s, c, err,
@@ -172,7 +178,7 @@ int __bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
return 0;

bkey_fsck_err_on(k.k->type < KEY_TYPE_MAX &&
(type == BKEY_TYPE_btree || (flags & BKEY_INVALID_COMMIT)) &&
(type == BKEY_TYPE_btree || (flags & BCH_VALIDATE_commit)) &&
!(bch2_key_types_allowed[type] & BIT_ULL(k.k->type)), c, err,
bkey_invalid_type_for_btree,
"invalid key type for btree %s (%s)",
@@ -224,7 +230,7 @@ fsck_err:

int bch2_bkey_invalid(struct bch_fs *c, struct bkey_s_c k,
enum btree_node_type type,
enum bkey_invalid_flags flags,
enum bch_validate_flags flags,
struct printbuf *err)
{
return __bch2_bkey_invalid(c, k, type, flags, err) ?:

@@ -22,14 +22,15 @@ extern const struct bkey_ops bch2_bkey_null_ops;
*/
struct bkey_ops {
int (*key_invalid)(struct bch_fs *c, struct bkey_s_c k,
enum bkey_invalid_flags flags, struct printbuf *err);
enum bch_validate_flags flags, struct printbuf *err);
void (*val_to_text)(struct printbuf *, struct bch_fs *,
struct bkey_s_c);
void (*swab)(struct bkey_s);
bool (*key_normalize)(struct bch_fs *, struct bkey_s);
bool (*key_merge)(struct bch_fs *, struct bkey_s, struct bkey_s_c);
int (*trigger)(struct btree_trans *, enum btree_id, unsigned,
struct bkey_s_c, struct bkey_s, unsigned);
struct bkey_s_c, struct bkey_s,
enum btree_iter_update_trigger_flags);
void (*compat)(enum btree_id id, unsigned version,
unsigned big_endian, int write,
struct bkey_s);
@@ -48,11 +49,11 @@ static inline const struct bkey_ops *bch2_bkey_type_ops(enum bch_bkey_type type)
}

int bch2_bkey_val_invalid(struct bch_fs *, struct bkey_s_c,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int __bch2_bkey_invalid(struct bch_fs *, struct bkey_s_c, enum btree_node_type,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int bch2_bkey_invalid(struct bch_fs *, struct bkey_s_c, enum btree_node_type,
enum bkey_invalid_flags, struct printbuf *);
enum bch_validate_flags, struct printbuf *);
int bch2_bkey_in_btree_node(struct bch_fs *, struct btree *,
struct bkey_s_c, struct printbuf *);

@@ -76,56 +77,10 @@ static inline bool bch2_bkey_maybe_mergable(const struct bkey *l, const struct b

bool bch2_bkey_merge(struct bch_fs *, struct bkey_s, struct bkey_s_c);

enum btree_update_flags {
__BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE = __BTREE_ITER_FLAGS_END,
__BTREE_UPDATE_NOJOURNAL,
__BTREE_UPDATE_KEY_CACHE_RECLAIM,

__BTREE_TRIGGER_NORUN,
__BTREE_TRIGGER_TRANSACTIONAL,
__BTREE_TRIGGER_ATOMIC,
__BTREE_TRIGGER_GC,
__BTREE_TRIGGER_INSERT,
__BTREE_TRIGGER_OVERWRITE,
__BTREE_TRIGGER_BUCKET_INVALIDATE,
};

#define BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE (1U << __BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE)
#define BTREE_UPDATE_NOJOURNAL (1U << __BTREE_UPDATE_NOJOURNAL)
#define BTREE_UPDATE_KEY_CACHE_RECLAIM (1U << __BTREE_UPDATE_KEY_CACHE_RECLAIM)

/* Don't run triggers at all */
#define BTREE_TRIGGER_NORUN (1U << __BTREE_TRIGGER_NORUN)

/*
* If set, we're running transactional triggers as part of a transaction commit:
* triggers may generate new updates
*
* If cleared, and either BTREE_TRIGGER_INSERT|BTREE_TRIGGER_OVERWRITE are set,
* we're running atomic triggers during a transaction commit: we have our
* journal reservation, we're holding btree node write locks, and we know the
* transaction is going to commit (returning an error here is a fatal error,
* causing us to go emergency read-only)
*/
#define BTREE_TRIGGER_TRANSACTIONAL (1U << __BTREE_TRIGGER_TRANSACTIONAL)
#define BTREE_TRIGGER_ATOMIC (1U << __BTREE_TRIGGER_ATOMIC)

/* We're in gc/fsck: running triggers to recalculate e.g. disk usage */
#define BTREE_TRIGGER_GC (1U << __BTREE_TRIGGER_GC)

/* @new is entering the btree */
#define BTREE_TRIGGER_INSERT (1U << __BTREE_TRIGGER_INSERT)

/* @old is leaving the btree */
#define BTREE_TRIGGER_OVERWRITE (1U << __BTREE_TRIGGER_OVERWRITE)

/* signal from bucket invalidate path to alloc trigger */
#define BTREE_TRIGGER_BUCKET_INVALIDATE (1U << __BTREE_TRIGGER_BUCKET_INVALIDATE)
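
For orientation, a rough sketch of how a trigger might branch on the commit-time modes described in the comment above, using the new enum btree_iter_update_trigger_flags. The lowercase BTREE_TRIGGER_transactional spelling is an assumption inferred from the BTREE_TRIGGER_insert/BTREE_TRIGGER_overwrite renames visible below, and the function is purely illustrative:

static int example_trigger(struct btree_trans *trans, enum btree_id btree,
			   unsigned level, struct bkey_s_c old, struct bkey_s new,
			   enum btree_iter_update_trigger_flags flags)
{
	if (flags & BTREE_TRIGGER_transactional) {
		/* Commit not yet guaranteed: may queue further btree updates,
		 * transaction restarts are still possible here. */
	} else {
		/* Atomic phase: journal reservation and node write locks held,
		 * an error here forces emergency read-only. */
	}
	return 0;
}
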
static inline int bch2_key_trigger(struct btree_trans *trans,
enum btree_id btree, unsigned level,
struct bkey_s_c old, struct bkey_s new,
unsigned flags)
enum btree_iter_update_trigger_flags flags)
{
const struct bkey_ops *ops = bch2_bkey_type_ops(old.k->type ?: new.k->type);

@@ -135,8 +90,9 @@ static inline int bch2_key_trigger(struct btree_trans *trans,
}

static inline int bch2_key_trigger_old(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s_c old, unsigned flags)
enum btree_id btree_id, unsigned level,
struct bkey_s_c old,
enum btree_iter_update_trigger_flags flags)
{
struct bkey_i deleted;

@@ -144,12 +100,13 @@ static inline int bch2_key_trigger_old(struct btree_trans *trans,
deleted.k.p = old.k->p;

return bch2_key_trigger(trans, btree_id, level, old, bkey_i_to_s(&deleted),
BTREE_TRIGGER_OVERWRITE|flags);
BTREE_TRIGGER_overwrite|flags);
}

static inline int bch2_key_trigger_new(struct btree_trans *trans,
enum btree_id btree_id, unsigned level,
struct bkey_s new, unsigned flags)
enum btree_id btree_id, unsigned level,
struct bkey_s new,
enum btree_iter_update_trigger_flags flags)
{
struct bkey_i deleted;

@@ -157,7 +114,7 @@ static inline int bch2_key_trigger_new(struct btree_trans *trans,
deleted.k.p = new.k->p;

return bch2_key_trigger(trans, btree_id, level, bkey_i_to_s_c(&deleted), new,
BTREE_TRIGGER_INSERT|flags);
BTREE_TRIGGER_insert|flags);
}

void bch2_bkey_renumber(enum btree_node_type, struct bkey_packed *, int);

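On the caller side the wrappers keep their shape; only the flag type and spelling change. A hypothetical call site for the insert-side wrapper, with new_key, btree_id, level and flags assumed locals, not taken from this diff:

/* The wrapper synthesizes a deleted "old" key at new_key's position and
 * ORs in BTREE_TRIGGER_insert itself: */
int ret = bch2_key_trigger_new(trans, btree_id, level,
			       bkey_i_to_s(new_key), flags);
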
@@ -6,9 +6,9 @@
#include "bset.h"
#include "extents.h"

typedef int (*sort_cmp_fn)(struct btree *,
struct bkey_packed *,
struct bkey_packed *);
typedef int (*sort_cmp_fn)(const struct btree *,
const struct bkey_packed *,
const struct bkey_packed *);

static inline bool sort_iter_end(struct sort_iter *iter)
{
@@ -70,9 +70,9 @@ static inline struct bkey_packed *sort_iter_next(struct sort_iter *iter,
/*
* If keys compare equal, compare by pointer order:
*/
static inline int key_sort_fix_overlapping_cmp(struct btree *b,
struct bkey_packed *l,
struct bkey_packed *r)
static inline int key_sort_fix_overlapping_cmp(const struct btree *b,
const struct bkey_packed *l,
const struct bkey_packed *r)
{
return bch2_bkey_cmp_packed(b, l, r) ?:
cmp_int((unsigned long) l, (unsigned long) r);
@@ -154,46 +154,59 @@ bch2_sort_repack(struct bset *dst, struct btree *src,
return nr;
}

static inline int sort_keys_cmp(struct btree *b,
struct bkey_packed *l,
struct bkey_packed *r)
static inline int keep_unwritten_whiteouts_cmp(const struct btree *b,
const struct bkey_packed *l,
const struct bkey_packed *r)
{
return bch2_bkey_cmp_packed_inlined(b, l, r) ?:
(int) bkey_deleted(r) - (int) bkey_deleted(l) ?:
(int) l->needs_whiteout - (int) r->needs_whiteout;
(long) l - (long) r;
}

unsigned bch2_sort_keys(struct bkey_packed *dst,
struct sort_iter *iter,
bool filter_whiteouts)
#include "btree_update_interior.h"

/*
* For sorting in the btree node write path: whiteouts not in the unwritten
* whiteouts area are dropped, whiteouts in the unwritten whiteouts area are
* dropped if overwritten by real keys:
*/
unsigned bch2_sort_keys_keep_unwritten_whiteouts(struct bkey_packed *dst, struct sort_iter *iter)
{
const struct bkey_format *f = &iter->b->format;
struct bkey_packed *in, *next, *out = dst;

sort_iter_sort(iter, sort_keys_cmp);
sort_iter_sort(iter, keep_unwritten_whiteouts_cmp);

while ((in = sort_iter_next(iter, sort_keys_cmp))) {
bool needs_whiteout = false;

if (bkey_deleted(in) &&
(filter_whiteouts || !in->needs_whiteout))
while ((in = sort_iter_next(iter, keep_unwritten_whiteouts_cmp))) {
if (bkey_deleted(in) && in < unwritten_whiteouts_start(iter->b))
continue;

while ((next = sort_iter_peek(iter)) &&
!bch2_bkey_cmp_packed_inlined(iter->b, in, next)) {
BUG_ON(in->needs_whiteout &&
next->needs_whiteout);
needs_whiteout |= in->needs_whiteout;
in = sort_iter_next(iter, sort_keys_cmp);
}
if ((next = sort_iter_peek(iter)) &&
!bch2_bkey_cmp_packed_inlined(iter->b, in, next))
continue;

if (bkey_deleted(in)) {
memcpy_u64s_small(out, in, bkeyp_key_u64s(f, in));
set_bkeyp_val_u64s(f, out, 0);
} else {
bkey_p_copy(out, in);
}
out->needs_whiteout |= needs_whiteout;
bkey_p_copy(out, in);
out = bkey_p_next(out);
}

return (u64 *) out - (u64 *) dst;
}

/*
* Main sort routine for compacting a btree node in memory: we always drop
* whiteouts because any whiteouts that need to be written are in the unwritten
* whiteouts area:
*/
unsigned bch2_sort_keys(struct bkey_packed *dst, struct sort_iter *iter)
{
struct bkey_packed *in, *out = dst;

sort_iter_sort(iter, bch2_bkey_cmp_packed_inlined);

while ((in = sort_iter_next(iter, bch2_bkey_cmp_packed_inlined))) {
if (bkey_deleted(in))
continue;

bkey_p_copy(out, in);
out = bkey_p_next(out);
}

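Splitting the old bch2_sort_keys(..., filter_whiteouts) into two entry points makes each call site self-describing. Roughly how the pair divides up; dst, u64s and the sort_iter are assumed locals and the surrounding call sites are not shown in this diff:

/* Btree node write path: whiteouts still sitting in the unwritten
 * whiteouts area must survive the sort so they reach disk: */
u64s = bch2_sort_keys_keep_unwritten_whiteouts(dst, &sort_iter);

/* In-memory compaction: anything deleted can simply be dropped: */
u64s = bch2_sort_keys(dst, &sort_iter);
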
@@ -48,7 +48,7 @@ bch2_sort_repack(struct bset *, struct btree *,
struct btree_node_iter *,
struct bkey_format *, bool);

unsigned bch2_sort_keys(struct bkey_packed *,
struct sort_iter *, bool);
unsigned bch2_sort_keys_keep_unwritten_whiteouts(struct bkey_packed *, struct sort_iter *);
unsigned bch2_sort_keys(struct bkey_packed *, struct sort_iter *);

#endif /* _BCACHEFS_BKEY_SORT_H */

@@ -103,8 +103,6 @@ void bch2_dump_bset(struct bch_fs *c, struct btree *b,

void bch2_dump_btree_node(struct bch_fs *c, struct btree *b)
{
struct bset_tree *t;

console_lock();
for_each_bset(b, t)
bch2_dump_bset(c, b, bset(b, t), t - b->set);
@@ -136,7 +134,6 @@ void bch2_dump_btree_node_iter(struct btree *b,

struct btree_nr_keys bch2_btree_node_count_keys(struct btree *b)
{
struct bset_tree *t;
struct bkey_packed *k;
struct btree_nr_keys nr = {};

@@ -198,7 +195,6 @@ void bch2_btree_node_iter_verify(struct btree_node_iter *iter,
{
struct btree_node_iter_set *set, *s2;
struct bkey_packed *k, *p;
struct bset_tree *t;

if (bch2_btree_node_iter_end(iter))
return;
@@ -213,12 +209,14 @@ void bch2_btree_node_iter_verify(struct btree_node_iter *iter,
/* Verify that set->end is correct: */
btree_node_iter_for_each(iter, set) {
for_each_bset(b, t)
if (set->end == t->end_offset)
if (set->end == t->end_offset) {
BUG_ON(set->k < btree_bkey_first_offset(t) ||
set->k >= t->end_offset);
goto found;
}
BUG();
found:
BUG_ON(set->k < btree_bkey_first_offset(t) ||
set->k >= t->end_offset);
do {} while (0);
}

/* Verify iterator is sorted: */
@@ -377,11 +375,9 @@ static struct bkey_float *bkey_float(const struct btree *b,
return ro_aux_tree_base(b, t)->f + idx;
}

static void bset_aux_tree_verify(const struct btree *b)
static void bset_aux_tree_verify(struct btree *b)
{
#ifdef CONFIG_BCACHEFS_DEBUG
const struct bset_tree *t;

for_each_bset(b, t) {
if (t->aux_data_offset == U16_MAX)
continue;
@@ -685,20 +681,20 @@ static __always_inline void make_bfloat(struct btree *b, struct bset_tree *t,
}

/* bytes remaining - only valid for last bset: */
static unsigned __bset_tree_capacity(const struct btree *b, const struct bset_tree *t)
static unsigned __bset_tree_capacity(struct btree *b, const struct bset_tree *t)
{
bset_aux_tree_verify(b);

return btree_aux_data_bytes(b) - t->aux_data_offset * sizeof(u64);
}

static unsigned bset_ro_tree_capacity(const struct btree *b, const struct bset_tree *t)
static unsigned bset_ro_tree_capacity(struct btree *b, const struct bset_tree *t)
{
return __bset_tree_capacity(b, t) /
(sizeof(struct bkey_float) + sizeof(u8));
}

static unsigned bset_rw_tree_capacity(const struct btree *b, const struct bset_tree *t)
static unsigned bset_rw_tree_capacity(struct btree *b, const struct bset_tree *t)
{
return __bset_tree_capacity(b, t) / sizeof(struct rw_aux_tree);
}
@@ -1374,8 +1370,6 @@ void bch2_btree_node_iter_init(struct btree_node_iter *iter,
void bch2_btree_node_iter_init_from_start(struct btree_node_iter *iter,
struct btree *b)
{
struct bset_tree *t;

memset(iter, 0, sizeof(*iter));

for_each_bset(b, t)
@@ -1481,7 +1475,6 @@ struct bkey_packed *bch2_btree_node_iter_prev_all(struct btree_node_iter *iter,
{
struct bkey_packed *k, *prev = NULL;
struct btree_node_iter_set *set;
struct bset_tree *t;
unsigned end = 0;

if (bch2_expensive_debug_checks)
@@ -1550,9 +1543,7 @@ struct bkey_s_c bch2_btree_node_iter_peek_unpack(struct btree_node_iter *iter,

void bch2_btree_keys_stats(const struct btree *b, struct bset_stats *stats)
{
const struct bset_tree *t;

for_each_bset(b, t) {
for_each_bset_c(b, t) {
enum bset_aux_tree_type type = bset_aux_tree_type(t);
size_t j;

@@ -206,7 +206,10 @@ static inline size_t btree_aux_data_u64s(const struct btree *b)
}

#define for_each_bset(_b, _t) \
for (_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++)
for (struct bset_tree *_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++)

#define for_each_bset_c(_b, _t) \
for (const struct bset_tree *_t = (_b)->set; _t < (_b)->set + (_b)->nsets; _t++)

#define bset_tree_for_each_key(_b, _t, _k) \
for (_k = btree_bkey_first(_b, _t); \
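
Because for_each_bset() now declares the iterator itself, and for_each_bset_c() provides a const one, callers drop their local struct bset_tree *t, which is exactly what the bset.c hunks above are doing. A before/after sketch based on the bch2_dump_btree_node() loop shown earlier:

/* Before: the caller had to declare the loop variable: */
struct bset_tree *t;
for_each_bset(b, t)
	bch2_dump_bset(c, b, bset(b, t), t - b->set);

/* After: the macro declares t with loop scope, so the local goes away: */
for_each_bset(b, t)
	bch2_dump_bset(c, b, bset(b, t), t - b->set);
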
@@ -294,7 +297,6 @@ static inline struct bset_tree *
bch2_bkey_to_bset_inlined(struct btree *b, struct bkey_packed *k)
{
unsigned offset = __btree_node_key_to_offset(b, k);
struct bset_tree *t;

for_each_bset(b, t)
if (offset <= t->end_offset) {

Some files were not shown because too many files have changed in this diff.