Commit Graph

33 Commits

Author SHA1 Message Date
Ernesto A. Fernández a3bca211bd Don't access inode state directly in 6.19
The build is broken for the 6.19 release candidate:

  https://github.com/linux-apfs/linux-apfs-rw/issues/109

The problem is that the inode i_state field can no longer be accessed
directly. Switch to the new helpers, using a wrapper to keep the code
consistent for all kernel versions.

Note that I always use the *_raw version of the helpers to avoid a lockdep
check that will not pass. I can't tell for sure if this is a bug.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2025-12-30 20:51:14 -03:00
Ernesto A. Fernández 0daf28cef7 Always treat ->d_name as const
The build is broken for the 6.18 release candidate:

  https://github.com/linux-apfs/linux-apfs-rw/issues/107

The problem is that ->d_name has become const, and that qualifier gets
dropped when we pass it around in the driver. Just treat is as const for
all kernel versions.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2025-10-27 22:30:27 -03:00
Ernesto A. Fernández c87d7aa77f Fix build for RHEL 9.6
The build has broken again for RHEL:

  https://github.com/linux-apfs/linux-apfs-rw/issues/95

Fix it.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2025-05-14 20:43:28 -03:00
Ernesto A. Fernández 51a5b742ec Fix the returned type of ->mkdir() for 6.15
The ->mkdir() method now needs to return a dentry struct instead of an
int, but this is meant for clustered filesystems or something so just
return NULL on success, like most filesystems do.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2025-04-21 19:24:48 -03:00
Ernesto A. Fernández 3174e363ad Simplify ENOSPC checks before transactions
If a function throws ENOSPC halfway through an operation, the
transaction will get aborted so that the filesystem isn't left in an
inconsistent state. This means that the mount will go read-only, and
that the user will lose any uncommitted changes.

So, it's better if we detect ENOSPC before starting an operation. I
tried this before with commit a52b73ed97 ("Check free space before
starting a transaction"), but the code was messy and it would have
required constant updates, which I neglected of course. The problem was
that I wanted to be as accurate as possible with my calculations of
required space, but I'm not sure there is any point.

Instead, require a large fixed amount of space (512 KiB) for most
operations, and a much smaller but still almost certainly sufficient
amount of space (80 KiB) for deletions. Hopefully this will be enough
for users to maneuver around ENOSPC situations. It's not perfect, but no
CoW filesystem can be truly perfect in this regard.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2025-03-11 16:15:41 -03:00
Ernesto A. Fernández 514a2164d8 Split orphan cleanup among multiple transactions
Gilgamesh reports via email that the following iozone benchmark is
failing:

  ./iozone -a -i 0 -r 1M -n 1g -g 1g -f /mnt/apfs/test

I've tried this myself on a small (3 GiB) filesystem, and it is the
commit after the unlink that fails, complaining about the main free
queue being full:

  APFS (800010g): free queue has too many nodes (348 > 347) (apfs_free_queue_insert_nocache:849)

The problem is that random writes to a big file create a lot of extents,
and when the file is deleted they all get dumped to the main free queue
as separate records. The queue gets full and the commit fails.

Stop forcing file deletion to happen in a single transaction. Instead,
set up a workqueue whenever an orphan inode gets evicted, and leave the
cleanup for another time. Also allow the cleanup to execute only
partially (deleting the first extents like the official driver) and
reschedule itself (again via inode eviction).

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2024-03-22 21:20:27 -03:00
Woody Suwalski 5dc836ad96 Support for kernel 6.7
[eafer: fixed some issues with folios and timestamps]
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2024-01-07 16:27:45 -03:00
Ernesto A. Fernández af1491622d Avoid filename normalizations issues on refresh
The previous patch introduced a regression in generic/013, seen when the
test is allowed to run for 20 minutes or so. The result is a corrupted
filesystem where one inode appears to have too many dentries, and
another inode has too few to balance.

The problem is that query refreshes just rerun the query once and expect
the correct result, but dentry queries for an existing record actually
need to be called repeatedly to deal with hash collisions. For
normalization-sensitive filesystems (which I use for most of my tests)
"collisions" are constant because there is no hash. So if a split
happens during a dentry deletion, chances are very high that the refresh
will end up pointing to the wrong record.

Don't add awkward code to deal with this to apfs_query_refresh().
Instead transform the multiple query from apfs_dentry_lookup() into a
regular exact query before returning, setting the key to the
unnormalized name. The refresh code will have no trouble working with
that.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2024-01-04 23:22:34 -03:00
Ernesto A. Fernández f1775ed0ef Turn the search key into a query field
I've been working on making some very needed changes to how node splits
work. The plan is to always rerun queries after a split, to keep the
query chain valid. One problem I've encountered while implementing this
is that the search key is not really preserved by apfs_update_inode().
We have a pointer to it, but it was allocated in the stack of some other
function, so the data is just noise.

I've always thought it was silly to allocate both a query and a key
before a lookup. Instead, turn the search key into a field within the
query structure, and copy it to the children as the query gets run. This
will ensure that the key remains available for future reruns.

One thing to note is that, while this simplifies most callers a little
bit, some of them still allocate a key in the stack and later copy it to
the query. I didn't want to make any big changes and risk introducing
bugs at this point, since I'm busy with other stuff.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2023-12-20 20:33:03 -03:00
Woody Suwalski 1bca725a63 Fix for the kernel 6.6 build errors
[eafer: removed unnecessary setting of some timestamps]
Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2023-10-25 22:43:27 -03:00
Ernesto A. Fernández 5c4773c036 Convert all strcpy() to strscpy()
The build appears to be broken for some configurations on the Linux 6.5
release candidate:

  https://github.com/linux-apfs/linux-apfs-rw/issues/49

Some sort of static overflow check is going on for strcpy(), and several
of my calls are triggering false positives. Just be safe and switch them
all to strscpy(). The qstr stuff in particular looked a bit too weird
anyway.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2023-08-08 19:14:10 -03:00
Ernesto A. Fernández d4797823ea Improve error reporting
The driver is much closer to being usable, so I might start getting
subtler bug reports soon. To make them easier to handle, put error
messages all over the place. I should have done this from the beginning,
but I guess I didn't fully understand the need back then.

From now my general policy will be to use apfs_warn() for user errors or
unsupported features; apfs_err() for things that are probably corruption
or io errors; and apfs_alert() for things that are most likely bugs.
These last two should be rare, so the same error/alert will be thrown by
several layers in the callstack to provide as much information as
possible. Be careful and don't flood the console on normal situations.

Also, make messages with a log level lower than warning output their
function name and line number, which I think will help debugging more
than the actual messages.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2023-04-05 21:59:24 -03:00
Woody Suwalski 24daddc890 Changes needed to build on kernel 6.3 2023-03-24 15:12:47 -03:00
Ernesto A. Fernández 03bc7bbc65 Adjust behaviour on orphan link creation
Make a few changes on orphan link behaviour to better match the official
implementation. It's not clear to me that this is mandatory, but I want
my fsck to work properly with official images: I've already modified it
to that end, so these changes need to be applied here as well to avoid
potential reports of corruption.

Anyway, the changes to orphan creation are four:

  - Report the inode's link count as 0, that is, don't count the orphan
    link itself.
  - Preserve the original primary name for the inode.
  - Preserve the inode's parent id.
  - Update the volume's file count immediately, instead of waiting until
    the orphan gets purged.

In essence, stop treating orphaned inodes as real files.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2023-01-06 19:36:57 -03:00
Ernesto A. Fernández 8d0c434441 Change the names of orphan files
The official apfs implementation uses the "0x%llx-dead" template to name
its orphan files. It's not really mandatory that all implementations
follow the same scheme, but try to do it anyway.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2023-01-06 19:36:57 -03:00
Ernesto A. Fernández 8f7e39583e Don't pass the parent directory to __apfs_link()
Get rid of the unused 'dir' parameter of __apfs_link().

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2022-03-01 21:37:23 -03:00
Ernesto A. Fernández a4720986c0 Don't pass the superblock to apfs_free_query()
I'm trying to build with "-W" for the first time, and it reports that
apfs_free_query() doesn't use the 'sb' parameter. Get rid of it.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2022-03-01 20:57:15 -03:00
Ernesto A. Fernández 8b1e37e232 Allow node data to be in memory instead of disk
Add a 'data' field to in-memory objects and point it to the buffer
head's 'b_data'; from now on, always access the data through this field
instead of the bh. To make sure I remembered to clean up all instances,
rename the buffer head in the object from 'bh' to 'o_bh'.

The point is to allow nodes to work with a data block that exists only
in memory, and the next patch will take advantage of this to improve
apfs_node_split(). Since objects that are not nodes will not need this
feature, only adjust the cleanup done in apfs_node_release() for now,
though maybe object cleanups should all be done in one place.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2022-02-25 17:34:11 -03:00
Ernesto A. Fernández 441540d919 Remove unused argument to apfs_init_drec_key()
The 'hashed' argument is never used by apfs_init_drec_key(), since we
can figure that out from the superblock. Get rid of it.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2022-02-03 16:50:24 -03:00
Ernesto A. Fernández dc1ed2cfe7 Rename SPDX license identifier
It seems that the license identifier currently in use (GPL-2.0) has been
deprecated:

  https://spdx.org/licenses/GPL-2.0.html

I don't know how important this is in practice, but I've received some
complaints about it:

  https://github.com/linux-apfs/linux-apfs-rw/issues/18

So, just run

  sed -i 's/2\.0/2.0-only/' *.{c,h}
  sed -i 's/2\.0/2.0-only/' Makefile

and change it to GPL-2.0-only.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2021-11-16 18:41:54 -03:00
Ernesto A. Fernández 07585e3d68 Get the dentry cache to work
Profiling has shown that the module spends too much time inside
apfs_lookup(). I had never really checked if the dentry cache was
working, so naturally that turned out to be the problem: I had always
assumed that the strings inside qstr structures were null-terminated,
but they may actually be a single component in a pathname, terminated by
a forward slash. The result was that we always searched the cache for
full pathnames, which were naturally never found.

To fix this, always pass the filename length from qstr to the unicode
handlers, and make them work without assuming a null-termination.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2021-11-09 18:01:16 -03:00
Ernesto A. Fernández 1bf1c35880 Fix build for kernels without d_instantiate_new()
It seems that d_instantiate_new() was introduced in 4.17 to fix a
problem with lockdep. I won't use lockdep myself in such an old kernel,
and I don't expect users to care. So, to get the module to build with
those kernels, just revert the code to unlock_new_inode/d_instantiate,
as was done before.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2021-09-14 13:02:08 -03:00
Ernesto A. Fernández 1ee197b6f4 Advance readdir context position after each emit
I'm running generic/257 of xfstests, and it reports that getdents64()
returns repeated values for d_off. It's been years since I implemented
this stuff, but it would seem that I thought only the final value of the
context position would matter, and I neglected to update it after each
call to dir_emit(). Fix this.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2021-08-20 20:02:10 -03:00
Ernesto A. Fernández b9920bb77d Support the sync family of syscalls
Implement the methods to support all syscalls such as sync() and
fsync(). Do this by just committing the whole transaction, which is not
very performant but should be technically valid. I would like to go back
to this eventually and get it right, but it will probably be tricky and
I would need to understand how btrfs and others handle it.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2021-08-16 22:59:18 -03:00
Ernesto A. Fernández 85ee7e2e21 Defer inode metadata writes
Don't write inode metadata (including the cached extent) to the
memory-mapped blocks until the transaction is about to be committed.
This should allow several consecutive metadata changes to be done in a
single operation. Most importantly, a new extent for appended content
will only be written once, in the end. Right now it's rewritten after
each block.

The performance improvement from this is not as big as I had hoped, it's
under 20%. I'm also sure this isn't the right way to approach this,
since I could be using the dirty inode lists from the vfs. So I'll need
to revise this patch at some point, but I'm leaving it for now.

Signed-off-by: Ernesto A. Fernández <ernesto@corellium.com>
2021-08-06 21:39:21 -03:00