As all the in-tree modules have been ported to the new kernel 7
fs mount API, the old API has been removed.
This patch adds support for the new set of APIs, while
keeping support for previous kernels intact.
The code is also under review in Canonical Launchpad bug
https://launchpad.net/bugs/2142837
Signed-off-by: Alessio Faina <alessio.faina@canonical.com>
The build is broken for the 6.19 release candidate:
https://github.com/linux-apfs/linux-apfs-rw/issues/109
The problem is that the inode i_state field can no longer be accessed
directly. Switch to the new helpers, using a wrapper to keep the code
consistent for all kernel versions.
Note that I always use the *_raw version of the helpers to avoid a lockdep
check that will not pass. I can't tell for sure if this is a bug.
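A wrapper along these lines could keep every supported kernel building. Note this is only a sketch: the helper names (inode_state_read_raw() and friends) are my guess from the commit text and should be checked against the actual 6.19-rc headers.

```
/* Sketch only: the new helper names are assumptions, not verified */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 19, 0)
#define apfs_inode_state_read(inode)        inode_state_read_raw(inode)
#define apfs_inode_state_set(inode, flag)   inode_state_set_raw(inode, flag)
#define apfs_inode_state_clear(inode, flag) inode_state_clear_raw(inode, flag)
#else
#define apfs_inode_state_read(inode)        ((inode)->i_state)
#define apfs_inode_state_set(inode, flag)   ((inode)->i_state |= (flag))
#define apfs_inode_state_clear(inode, flag) ((inode)->i_state &= ~(flag))
#endif
```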
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
The build is broken for the 6.18 release candidate:
https://github.com/linux-apfs/linux-apfs-rw/issues/107
The problem is that ->d_name has become const, and that qualifier gets
dropped when we pass it around in the driver. Just treat it as const for
all kernel versions.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
The build is still broken for the 6.17 release candidate. The problem is
that ->write_begin() and ->write_end() now take a kiocb struct as the
first argument instead of a file. Add the proper version checks. The
driver does nothing with this argument but pass it along, so continue to
call it "file" to keep the patch simple.
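The version check could look roughly like this; the exact 6.17 prototype is from memory and worth double-checking, and the page-based variants for kernels before 6.12 are elided here.

```
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 17, 0)
static int apfs_write_begin(const struct kiocb *file, struct address_space *mapping,
			    loff_t pos, unsigned int len,
			    struct folio **foliop, void **fsdata)
#else
static int apfs_write_begin(struct file *file, struct address_space *mapping,
			    loff_t pos, unsigned int len,
			    struct folio **foliop, void **fsdata)
#endif
{
	/* "file" is only passed along, so the body stays the same */
}
```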
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
It seems that wait_for_stable_page(), page_has_buffers() and
grab_cache_page_write_begin() are no longer available, so put
implementations inside the driver.
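One possible shape for the in-driver replacements, built on the folio APIs that superseded them. This is a sketch, not the exact code from the patch; the error convention of __filemap_get_folio() (ERR_PTR on recent kernels) should be confirmed for the target versions.

```
static inline void apfs_wait_for_stable_page(struct page *page)
{
	folio_wait_stable(page_folio(page));
}

static inline bool apfs_page_has_buffers(struct page *page)
{
	return folio_buffers(page_folio(page)) != NULL;
}

static inline struct page *apfs_grab_cache_page_write_begin(struct address_space *mapping,
							    pgoff_t index)
{
	struct folio *folio;

	folio = __filemap_get_folio(mapping, index,
				    FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE,
				    mapping_gfp_mask(mapping));
	if (IS_ERR(folio))
		return NULL;
	return folio_file_page(folio, index);
}
```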
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Running generic/269 of xfstests quickly gets orphan cleanups to hit
ENOSPC, which prints an error message. The problem is that this doesn't
happen just once, but over and over again, completely flooding dmesg for
no good reason.
Retrying cleanups after ENOSPC does have some value, but wait until we
have actually made some room. For other errors retrying is just a bad
idea, since something has gone seriously wrong; we'll usually just get
EROFS anyway, so don't retry at all.
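The retry policy boils down to a simple predicate; a userspace sketch of the idea (apfs_should_retry_cleanup() is a hypothetical name, not the function in the patch):

```c
#include <errno.h>
#include <stdbool.h>

/* Retry orphan cleanup only for ENOSPC, and only once some space has
 * actually been freed; any other error means something has gone
 * seriously wrong and retrying would just flood dmesg. */
static bool apfs_should_retry_cleanup(int err, bool space_was_freed)
{
	return err == -ENOSPC && space_was_freed;
}
```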
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
If a function throws ENOSPC halfway through an operation, the
transaction will get aborted so that the filesystem isn't left in an
inconsistent state. This means that the mount will go read-only, and
that the user will lose any uncommitted changes.
So, it's better if we detect ENOSPC before starting an operation. I
tried this before with commit a52b73ed97 ("Check free space before
starting a transaction"), but the code was messy and it would have
required constant updates, which I neglected of course. The problem was
that I wanted to be as accurate as possible with my calculations of
required space, but I'm not sure there is any point.
Instead, require a large fixed amount of space (512 KiB) for most
operations, and a much smaller but still almost certainly sufficient
amount of space (80 KiB) for deletions. Hopefully this will be enough
for users to maneuver around ENOSPC situations. It's not perfect, but no
CoW filesystem can be truly perfect in this regard.
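The fixed reservations reduce to a simple preflight check; a minimal sketch using the numbers from the text (the function name is hypothetical):

```c
#include <errno.h>
#include <stdbool.h>

#define APFS_RESERVE_REGULAR	(512 * 1024)	/* most operations */
#define APFS_RESERVE_DELETE	(80 * 1024)	/* deletions need much less */

/* Return -ENOSPC before starting a transaction that would likely fail,
 * so the mount doesn't get forced read-only halfway through. */
static int apfs_preflight_space(long long free_bytes, bool is_deletion)
{
	long long needed = is_deletion ? APFS_RESERVE_DELETE : APFS_RESERVE_REGULAR;

	return free_bytes >= needed ? 0 : -ENOSPC;
}
```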
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
The build is broken for redhat kernels:
https://github.com/linux-apfs/linux-apfs-rw/issues/89
This seems to have been the case for more than a year, but nobody had
reported it until now, so I guess the driver isn't getting much use over
there.
We had encountered a similar problem before in commit 1a0b9fb8af ("Fix
build for RHEL 9 kernels"), but back then I thought it was just a
one-off thing. I was wrong: redhat kernels often backport patches that
break internal apis. I don't know how other out-of-tree drivers handle
this, but for now I will have to rely on user reports.
So, go through all RHEL 9 kernels one by one, find out which api changes
were applied to each, and add the corresponding macro version checks to
the driver.
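RHEL kernels export their own version macros in linux/version.h, so the checks can key on those instead of LINUX_VERSION_CODE alone. The 9.3 cutoff below is only an illustration, not one of the actual checks added.

```
#include <linux/version.h>

#if defined(RHEL_RELEASE_CODE) && \
	RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(9, 3)
/* RHEL backported the api change even though LINUX_VERSION_CODE
 * still reports 5.14, so take the new-api path here */
#define APFS_NEW_API 1
#endif
```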
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
This has been around for a while so I guess checkpatch isn't picking up
on it for some reason.
[eafer: took this from a bigger patch, added commit message]
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
The build is broken for the 6.12 release candidate. The problem is that
->write_begin() and ->write_end() now take folios as parameters instead
of pages. The same is true, of course, of the associated
__block_write_begin() and generic_write_end(). Update the driver.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Orphan cleanup explicitly reports the error code from apfs_iget() when
it gets swallowed, but it's always ENODATA so it looks silly. Leave the
message in place but ignore the error code.
Instead, report all error codes returned by apfs_clean_orphans() before
they get discarded by apfs_orphan_cleanup_work(). That information might
actually be useful at some point.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Gilgamesh reports via email that the following iozone benchmark is
failing:
./iozone -a -i 0 -r 1M -n 1g -g 1g -f /mnt/apfs/test
I've tried this myself on a small (3 GiB) filesystem, and it is the
commit after the unlink that fails, complaining about the main free
queue being full:
APFS (800010g): free queue has too many nodes (348 > 347) (apfs_free_queue_insert_nocache:849)
The problem is that random writes to a big file create a lot of extents,
and when the file is deleted they all get dumped to the main free queue
as separate records. The queue gets full and the commit fails.
Stop forcing file deletion to happen in a single transaction. Instead,
set up a workqueue whenever an orphan inode gets evicted, and leave the
cleanup for another time. Also allow the cleanup to execute only
partially (deleting the first extents like the official driver) and
reschedule itself (again via inode eviction).
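The deferred cleanup could be wired up roughly like this; the names are illustrative, not the actual patch, and the work struct would be set up with INIT_WORK() at mount time.

```
/* Sketch: on eviction of an orphan, queue the cleanup instead of
 * deleting every extent in a single transaction */
static void apfs_orphan_cleanup_work(struct work_struct *work)
{
	/* delete only the first extents, like the official driver,
	 * then evict the inode again so the work gets rescheduled
	 * until nothing is left */
}

static void apfs_evict_inode(struct inode *inode)
{
	if (inode->i_nlink == 0)
		schedule_work(&APFS_SB(inode->i_sb)->s_orphan_work);
	/* ... */
}
```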
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
I've been working on making some very needed changes to how node splits
work. The plan is to always rerun queries after a split, to keep the
query chain valid. One problem I've encountered while implementing this
is that the search key is not really preserved by apfs_update_inode().
We have a pointer to it, but it was allocated on the stack of some other
function, so the data is just noise.
I've always thought it was silly to allocate both a query and a key
before a lookup. Instead, turn the search key into a field within the
query structure, and copy it to the children as the query gets run. This
will ensure that the key remains available for future reruns.
One thing to note is that, while this simplifies most callers a little
bit, some of them still allocate a key on the stack and later copy it to
the query. I didn't want to make any big changes and risk introducing
bugs at this point, since I'm busy with other stuff.
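The idea reduces to embedding the key in the query and copying it down as the query descends; a minimal userspace sketch with hypothetical field names:

```c
#include <string.h>

/* Hypothetical shapes: the key lives inside the query, so it stays
 * valid for reruns even after the caller's stack frame is gone. */
struct apfs_key {
	unsigned long long id;
	unsigned long long number;
};

struct apfs_query {
	struct apfs_key key;	/* search key, owned by the query */
	int depth;
};

/* When the query moves to a child node, the key gets copied along */
static void apfs_query_descend(struct apfs_query *child,
			       const struct apfs_query *parent)
{
	child->key = parent->key;
	child->depth = parent->depth + 1;
}
```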
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Like I mentioned in the previous patch, df is refusing to acknowledge
multiple mounted volumes or snapshots for the same device, and it's
reporting only one. Luckily the source code for df is straightforward,
so I figured out that it calls stat() on the mount point to find the
device number instead of just accepting the one reported by mountinfo.
This way each mounted volume/snapshot can choose to report a different
number, instead of just going with the one set in the superblock.
After confirming that this is indeed what btrfs does, implement it here.
This fixes the df issue, but sadly it doesn't fix all problems with
thunar. It can tell the different mounts apart, but for some reason I
need to close it and reopen it for that to happen (in fact I now realize
that it could already do that in the previous commit). Since btrfs does
have plugins for udisks and such, maybe this is not the driver's fault,
but I'm not sure.
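The btrfs-style approach amounts to allocating an anonymous device number per mounted volume and reporting it from ->getattr(); a sketch under the recent getattr signature (field names on the sbi side are illustrative):

```
/* At volume mount time, grab an anonymous dev_t for this volume */
err = get_anon_bdev(&sbi->s_anon_dev);

/* In ->getattr(), report it so stat() can tell the mounts apart */
generic_fillattr(idmap, request_mask, inode, stat);
stat->dev = APFS_SB(inode->i_sb)->s_anon_dev;
```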
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Attempts to run xfstests for the 6.6 kernel fail brutally. Transactions
are getting aborted inside apfs_update_time(), which has never happened
before. The problem comes from upstream commit 541d4c798a59 ("fs: drop
the timespec64 arg from generic_update_time"): generic_update_time()
can't fail and used to always return a symbolic 0, but now instead
returns some flags - which my code mistakes for an error. We don't care
about the flags, so just ignore the return value for all kernel
versions.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Some code that wasn't originally written by me has a style that differs
slightly from the rest of the module. Since I've been refactoring
compress.c quite a lot lately, I have made the inconsistencies more
visible, so I think it's a good time to switch everything to the typical
kernel style.
For this purpose I'm running checkpatch.pl for the first time in years,
and it has detected a handful of other issues that I introduced myself.
Fix most of the issues now, but I prefer to leave the zero-length array
stuff for a separate patch.
For the record, at the moment I'm choosing to ignore style inconsistency
in the compression libraries, even in libzbitmap which I wrote myself. I
prefer to keep them as close to upstream as possible.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
In commit 31df49c6d8 ("Ban writes to compressed files explicitly"), I
blocked writes to compressed files when they actually happen, not when
the file gets opened. My fear was that banning rw open entirely could
break other people's code, since I'm not sure who is using this driver
and how.
Now I've implemented compressed mmap, and blocking writes when they
happen is becoming a little too hacky. Do this properly, and wait to see
if anyone complains.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Attempts to mmap compressed files currently fail with ENODEV. I guess I
had forgotten about compressed mmap. Implement it now, in read-only mode
like all compressed operations so far.
Since the existing read code works with userland buffers, it can't be
reused to copy stuff into the kernel. For now I've been forced to
duplicate a lot of code, but I think it might be a good idea to use the
generic read implementations instead. I'll look into that soon.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
The official implementation doesn't seem to truly support writes to
compressed files: when it's attempted, the file gets decompressed
transparently instead. I don't intend to do that in this driver, unless
someone actually requests it. Users can just make a decompressed copy of
the file and replace the original.
By mistake, truncating compressed files is currently allowed, but since
compressed reads don't depend on the vfs-reported size, nothing seems to
happen. Other writes fail, but the error code is confusing and there is
no output on dmesg. Ban all writes explicitly, and explain the problem
in the console.
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
Adamantinum reports trouble while attempting to use apfs as root with
kernel version 6.4.8:
https://github.com/linux-apfs/linux-apfs-rw/issues/50
His callstack shows a null pointer dereference in mmap, and the bug can
be easily reproduced
by generic/029, but not in kernel version 5.3, where I usually run my
tests. Via bisection I tracked the problem back to kernel commit
0af573780b0b ("mm: require ->set_page_dirty to be explicitly wired up").
Starting with kernel 5.14, the ->set_page_dirty() address space method
gets called unconditionally, without checking for null first, so set it
explicitly.
Kernel 5.18 replaced ->set_page_dirty() with ->dirty_folio(), so take
care of that as well.
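The version checks end up just selecting the stock buffer-head helpers for each era; before 5.14 the field can stay unset.

```
static const struct address_space_operations apfs_aops = {
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 18, 0)
	.dirty_folio	= block_dirty_folio,
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5, 14, 0)
	/* since 5.14 this is called unconditionally, so wire it up */
	.set_page_dirty	= __set_page_dirty_buffers,
#endif
	/* ... */
};
```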
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
The driver is much closer to being usable, so I might start getting
subtler bug reports soon. To make them easier to handle, put error
messages all over the place. I should have done this from the beginning,
but I guess I didn't fully understand the need back then.
From now my general policy will be to use apfs_warn() for user errors or
unsupported features; apfs_err() for things that are probably corruption
or io errors; and apfs_alert() for things that are most likely bugs.
These last two should be rare, so the same error/alert will be thrown by
several layers in the callstack to provide as much information as
possible. Be careful not to flood the console in normal situations.
Also, make messages with a log level lower than warning output their
function name and line number, which I think will help debugging more
than the actual messages.
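The policy maps naturally onto a few printf-style wrappers. A userspace sketch of the idea, formatting into a buffer with snprintf so the output is easy to inspect; the real macros presumably wrap printk with the matching log levels.

```c
#include <stdio.h>

/* Warnings are for user errors and unsupported features */
#define apfs_warn(buf, len, fmt, ...) \
	snprintf(buf, len, "apfs warning: " fmt, ##__VA_ARGS__)
/* Levels below warning carry function and line to aid debugging */
#define apfs_err(buf, len, fmt, ...) \
	snprintf(buf, len, "apfs error (%s:%d): " fmt, \
		 __func__, __LINE__, ##__VA_ARGS__)
#define apfs_alert(buf, len, fmt, ...) \
	snprintf(buf, len, "apfs alert (%s:%d): " fmt, \
		 __func__, __LINE__, ##__VA_ARGS__)
```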
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>