Most filesystems just call fscrypt_set_context on new inodes, which
usually causes a setxattr. That's a bit late for ceph, which can send
along a full set of attributes with the create request.
Doing so allows it to avoid race windows that where the new inode could
be seen by other clients without the crypto context attached. It also
avoids the separate round trip to the server.
Refactor the fscrypt code a bit to allow us to create a new crypto
context, attach it to the inode, and write it to the buffer, but without
calling set_context on it. ceph can later use this to marshal the
context into the attributes we send along with the create request.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
For ceph, we want to use our own scheme for handling filenames that are
are longer than NAME_MAX after encryption and Base64 encoding. This
allows us to have a consistent view of the encrypted filenames for
clients that don't support fscrypt and clients that do but that don't
have the key.
Currently, fs/crypto only supports encrypting filenames using
fscrypt_setup_filename, but that also handles encoding nokey names. Ceph
can't use that because it handles nokey names in a different way.
Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to
__fscrypt_fname_encrypted_size and add a new wrapper called
fscrypt_fname_encrypted_size that takes an inode argument rather than a
pointer to a fscrypt_policy union.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Unfortunately the design of fscrypt_set_test_dummy_encryption() doesn't
work properly for the new mount API, as it combines too many steps into
one function:
- Parse the argument to test_dummy_encryption
- Check the setting against the filesystem instance
- Apply the setting to the filesystem instance
The new mount API has split these into separate steps. ext4 partially
worked around this by duplicating some of the logic, but it still had
some bugs. To address this, add some new helper functions that split up
the steps of fscrypt_set_test_dummy_encryption():
- fscrypt_parse_test_dummy_encryption()
- fscrypt_dummy_policies_equal()
- fscrypt_add_test_dummy_key()
While we're add it, also add a function fscrypt_is_dummy_policy_set()
which will be useful to avoid some #ifdef's.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220501050857.538984-5-ebiggers@kernel.org
FS_CRYPTO_BLOCK_SIZE is neither the filesystem block size nor the
granularity of encryption. Rather, it defines two logically separate
constraints that both arise from the block size of the AES cipher:
- The alignment required for the lengths of file contents blocks
- The minimum input/output length for the filenames encryption modes
Since there are way too many things called the "block size", and the
connection with the AES block size is not easily understood, split
FS_CRYPTO_BLOCK_SIZE into two constants FSCRYPT_CONTENTS_ALIGNMENT and
FSCRYPT_FNAME_MIN_MSG_LEN that more clearly describe what they are.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220405010914.18519-1-ebiggers@kernel.org
Encrypted files traditionally haven't supported DIO, due to the need to
encrypt/decrypt the data. However, when the encryption is implemented
using inline encryption (blk-crypto) instead of the traditional
filesystem-layer encryption, it is straightforward to support DIO.
In preparation for supporting this, add the following functions:
- fscrypt_dio_supported() checks whether a DIO request is supported as
far as encryption is concerned. Encrypted files will only support DIO
when inline encryption is used and the I/O request is properly
aligned; this function checks these preconditions.
- fscrypt_limit_io_blocks() limits the length of a bio to avoid crossing
a place in the file that a bio with an encryption context cannot
cross due to a DUN discontiguity. This function is needed by
filesystems that use the iomap DIO implementation (which operates
directly on logical ranges, so it won't use fscrypt_mergeable_bio())
and that support FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32.
Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220128233940.79464-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Add a helper function fscrypt_symlink_getattr() which will be called
from the various filesystems' ->getattr() methods to read and decrypt
the target of encrypted symlinks in order to report the correct st_size.
Detailed explanation:
As required by POSIX and as documented in various man pages, st_size for
a symlink is supposed to be the length of the symlink target.
Unfortunately, st_size has always been wrong for encrypted symlinks
because st_size is populated from i_size from disk, which intentionally
contains the length of the encrypted symlink target. That's slightly
greater than the length of the decrypted symlink target (which is the
symlink target that userspace usually sees), and usually won't match the
length of the no-key encoded symlink target either.
This hadn't been fixed yet because reporting the correct st_size would
require reading the symlink target from disk and decrypting or encoding
it, which historically has been considered too heavyweight to do in
->getattr(). Also historically, the wrong st_size had only broken a
test (LTP lstat03) and there were no known complaints from real users.
(This is probably because the st_size of symlinks isn't used too often,
and when it is, typically it's for a hint for what buffer size to pass
to readlink() -- which a slightly-too-large size still works for.)
However, a couple things have changed now. First, there have recently
been complaints about the current behavior from real users:
- Breakage in rpmbuild:
https://github.com/rpm-software-management/rpm/issues/1682https://github.com/google/fscrypt/issues/305
- Breakage in toybox cpio:
https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html
- Breakage in libgit2: https://issuetracker.google.com/issues/189629152
(on Android public issue tracker, requires login)
Second, we now cache decrypted symlink targets in ->i_link. Therefore,
taking the performance hit of reading and decrypting the symlink target
in ->getattr() wouldn't be as big a deal as it used to be, since usually
it will just save having to do the same thing later.
Also note that eCryptfs ended up having to read and decrypt symlink
targets in ->getattr() as well, to fix this same issue; see
commit 3a60a1686f ("eCryptfs: Decrypt symlink target for stat size").
So, let's just bite the bullet, and read and decrypt the symlink target
in ->getattr() in order to report the correct st_size. Add a function
fscrypt_symlink_getattr() which the filesystems will call to do this.
(Alternatively, we could store the decrypted size of symlinks on-disk.
But there isn't a great place to do so, and encryption is meant to hide
the original size to some extent; that property would be lost.)
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210702065350.209646-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've made more work into per-file compression support.
For example, F2FS_IOC_GET | SET_COMPRESS_OPTION provides a way to
change the algorithm or cluster size per file. F2FS_IOC_COMPRESS |
DECOMPRESS_FILE provides a way to compress and decompress the existing
normal files manually.
There is also a new mount option, compress_mode=fs|user, which can
control who compresses the data.
Chao also added a checksum feature with a mount option so that
we are able to detect any corrupted cluster.
In addition, Daniel contributed casefolding with encryption patch,
which will be used for Android devices.
Summary:
Enhancements:
- add ioctls and mount option to manage per-file compression feature
- support casefolding with encryption
- support checksum for compressed cluster
- avoid IO starvation by replacing mutex with rwsem
- add sysfs, max_io_bytes, to control max bio size
Bug fixes:
- fix use-after-free issue when compression and fsverity are enabled
- fix consistency corruption during fault injection test
- fix data offset for lseek
- get rid of buffer_head which has 32bits limit in fiemap
- fix some bugs in multi-partitions support
- fix nat entry count calculation in shrinker
- fix some stat information
And, we've refactored some logics and fix minor bugs as well"
* tag 'f2fs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
f2fs: compress: fix compression chksum
f2fs: fix shift-out-of-bounds in sanity_check_raw_super()
f2fs: fix race of pending_pages in decompression
f2fs: fix to account inline xattr correctly during recovery
f2fs: inline: fix wrong inline inode stat
f2fs: inline: correct comment in f2fs_recover_inline_data
f2fs: don't check PAGE_SIZE again in sanity_check_raw_super()
f2fs: convert to F2FS_*_INO macro
f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size
f2fs: don't allow any writes on readonly mount
f2fs: avoid race condition for shrinker count
f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
f2fs: add compress_mode mount option
f2fs: Remove unnecessary unlikely()
f2fs: init dirty_secmap incorrectly
f2fs: remove buffer_head which has 32bits limit
f2fs: fix wrong block count instead of bytes
f2fs: use new conversion functions between blks and bytes
f2fs: rename logical_to_blk and blk_to_logical
f2fs: fix kbytes written stat for multi-device case
...
This shifts the responsibility of setting up dentry operations from
fscrypt to the individual filesystems, allowing them to have their own
operations while still setting fscrypt's d_revalidate as appropriate.
Most filesystems can just use generic_set_encrypted_ci_d_ops, unless
they have their own specific dentry operations as well. That operation
will set the minimal d_ops required under the circumstances.
Since the fscrypt d_ops are set later on, we must set all d_ops there,
since we cannot adjust those later on. This should not result in any
change in behavior.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently it's impossible to delete files that use an unsupported
encryption policy, as the kernel will just return an error when
performing any operation on the top-level encrypted directory, even just
a path lookup into the directory or opening the directory for readdir.
More specifically, this occurs in any of the following cases:
- The encryption context has an unrecognized version number. Current
kernels know about v1 and v2, but there could be more versions in the
future.
- The encryption context has unrecognized encryption modes
(FSCRYPT_MODE_*) or flags (FSCRYPT_POLICY_FLAG_*), an unrecognized
combination of modes, or reserved bits set.
- The encryption key has been added and the encryption modes are
recognized but aren't available in the crypto API -- for example, a
directory is encrypted with FSCRYPT_MODE_ADIANTUM but the kernel
doesn't have CONFIG_CRYPTO_ADIANTUM enabled.
It's desirable to return errors for most operations on files that use an
unsupported encryption policy, but the current behavior is too strict.
We need to allow enough to delete files, so that people can't be stuck
with undeletable files when downgrading kernel versions. That includes
allowing directories to be listed and allowing dentries to be looked up.
Fix this by modifying the key setup logic to treat an unsupported
encryption policy in the same way as "key unavailable" in the cases that
are required for a recursive delete to work: preparing for a readdir or
a dentry lookup, revalidating a dentry, or checking whether an inode has
the same encryption policy as its parent directory.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201203022041.230976-10-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Now that fscrypt_get_encryption_info() is only called from files in
fs/crypto/ (due to all key setup now being handled by higher-level
helper functions instead of directly by filesystems), unexport it and
move its declaration to fscrypt_private.h.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201203022041.230976-9-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
The last remaining use of fscrypt_get_encryption_info() from filesystems
is for readdir (->iterate_shared()). Every other call is now in
fs/crypto/ as part of some other higher-level operation.
We need to add a new argument to fscrypt_get_encryption_info() to
indicate whether the encryption policy is allowed to be unrecognized or
not. Doing this is easier if we can work with high-level operations
rather than direct filesystem use of fscrypt_get_encryption_info().
So add a function fscrypt_prepare_readdir() which wraps the call to
fscrypt_get_encryption_info() for the readdir use case.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20201203022041.230976-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
In an encrypted directory, a regular dentry (one that doesn't have the
no-key name flag) can only be created if the directory's encryption key
is available.
Therefore the calls to fscrypt_require_key() in __fscrypt_prepare_link()
and __fscrypt_prepare_rename() are unnecessary, as these functions
already check that the dentries they're given aren't no-key names.
Remove these unnecessary calls to fscrypt_require_key().
Link: https://lore.kernel.org/r/20201118075609.120337-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
It's possible to create a duplicate filename in an encrypted directory
by creating a file concurrently with adding the encryption key.
Specifically, sys_open(O_CREAT) (or sys_mkdir(), sys_mknod(), or
sys_symlink()) can lookup the target filename while the directory's
encryption key hasn't been added yet, resulting in a negative no-key
dentry. The VFS then calls ->create() (or ->mkdir(), ->mknod(), or
->symlink()) because the dentry is negative. Normally, ->create() would
return -ENOKEY due to the directory's key being unavailable. However,
if the key was added between the dentry lookup and ->create(), then the
filesystem will go ahead and try to create the file.
If the target filename happens to already exist as a normal name (not a
no-key name), a duplicate filename may be added to the directory.
In order to fix this, we need to fix the filesystems to prevent
->create(), ->mkdir(), ->mknod(), and ->symlink() on no-key names.
(->rename() and ->link() need it too, but those are already handled
correctly by fscrypt_prepare_rename() and fscrypt_prepare_link().)
In preparation for this, add a helper function fscrypt_is_nokey_name()
that filesystems can use to do this check. Use this helper function for
the existing checks that fs/crypto/ does for rename and link.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20201118075609.120337-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Dentries that represent no-key names must have a dentry_operations that
includes fscrypt_d_revalidate(). Currently, this is handled by
fscrypt_prepare_lookup() installing fscrypt_d_ops.
However, ceph support for encryption
(https://lore.kernel.org/r/20200914191707.380444-1-jlayton@kernel.org)
can't use fscrypt_d_ops, since ceph already has its own
dentry_operations.
Similarly, ext4 and f2fs support for directories that are both encrypted
and casefolded
(https://lore.kernel.org/r/20200923010151.69506-1-drosen@google.com)
can't use fscrypt_d_ops either, since casefolding requires some dentry
operations too.
To satisfy both users, we need to move the responsibility of installing
the dentry_operations to filesystems.
In preparation for this, export fscrypt_d_revalidate() and give it a
!CONFIG_FS_ENCRYPTION stub.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200924054721.187797-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Originally we used the term "encrypted name" or "ciphertext name" to
mean the encoded filename that is shown when an encrypted directory is
listed without its key. But these terms are ambiguous since they also
mean the filename stored on-disk. "Encrypted name" is especially
ambiguous since it could also be understood to mean "this filename is
encrypted on-disk", similar to "encrypted file".
So we've started calling these encoded names "no-key names" instead.
Therefore, rename DCACHE_ENCRYPTED_NAME to DCACHE_NOKEY_NAME to avoid
confusion about what this flag means.
Link: https://lore.kernel.org/r/20200924042624.98439-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Currently we're using the term "ciphertext name" ambiguously because it
can mean either the actual ciphertext filename, or the encoded filename
that is shown when an encrypted directory is listed without its key.
The latter we're now usually calling the "no-key name"; and while it's
derived from the ciphertext name, it's not the same thing.
To avoid this ambiguity, rename fscrypt_name::is_ciphertext_name to
fscrypt_name::is_nokey_name, and update comments that say "ciphertext
name" (or "encrypted name") to say "no-key name" instead when warranted.
Link: https://lore.kernel.org/r/20200924042624.98439-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
fscrypt_set_test_dummy_encryption() requires that the optional argument
to the test_dummy_encryption mount option be specified as a substring_t.
That doesn't work well with filesystems that use the new mount API,
since the new way of parsing mount options doesn't use substring_t.
Make it take the argument as a 'const char *' instead.
Instead of moving the match_strdup() into the callers in ext4 and f2fs,
make them just use arg->from directly. Since the pattern is
"test_dummy_encryption=%s", the argument will be null-terminated.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
The behavior of the test_dummy_encryption mount option is that when a
new file (or directory or symlink) is created in an unencrypted
directory, it's automatically encrypted using a dummy encryption policy.
That's it; in particular, the encryption (or lack thereof) of existing
files (or directories or symlinks) doesn't change.
Unfortunately the implementation of test_dummy_encryption is a bit weird
and confusing. When test_dummy_encryption is enabled and a file is
being created in an unencrypted directory, we set up an encryption key
(->i_crypt_info) for the directory. This isn't actually used to do any
encryption, however, since the directory is still unencrypted! Instead,
->i_crypt_info is only used for inheriting the encryption policy.
One consequence of this is that the filesystem ends up providing a
"dummy context" (policy + nonce) instead of a "dummy policy". In
commit ed318a6cc0 ("fscrypt: support test_dummy_encryption=v2"), I
mistakenly thought this was required. However, actually the nonce only
ends up being used to derive a key that is never used.
Another consequence of this implementation is that it allows for
'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
case that can be forgotten about. For example, currently
FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
dummy encryption policy when the filesystem is mounted with
test_dummy_encryption. That seems like the wrong thing to do, since
again, the directory itself is not actually encrypted.
Therefore, switch to a more logical and maintainable implementation
where the dummy encryption policy inheritance is done without setting up
keys for unencrypted directories. This involves:
- Adding a function fscrypt_policy_to_inherit() which returns the
encryption policy to inherit from a directory. This can be a real
policy, a dummy policy, or no policy.
- Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
- Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
of an inode.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
In preparation for moving the logic for "get the encryption policy
inherited by new files in this directory" to a single place, make
fscrypt_prepare_symlink() a regular function rather than an inline
function that wraps __fscrypt_prepare_symlink().
This way, the new function fscrypt_policy_to_inherit() won't need to be
exported to filesystems.
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-12-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
fscrypt_get_encryption_info() is intended to be GFP_NOFS-safe. But
actually it isn't, since it uses functions like crypto_alloc_skcipher()
which aren't GFP_NOFS-safe, even when called under memalloc_nofs_save().
Therefore it can deadlock when called from a context that needs
GFP_NOFS, e.g. during an ext4 transaction or between f2fs_lock_op() and
f2fs_unlock_op(). This happens when creating a new encrypted file.
We can't fix this by just not setting up the key for new inodes right
away, since new symlinks need their key to encrypt the symlink target.
So we need to set up the new inode's key before starting the
transaction. But just calling fscrypt_get_encryption_info() earlier
doesn't work, since it assumes the encryption context is already set,
and the encryption context can't be set until the transaction.
The recently proposed fscrypt support for the ceph filesystem
(https://lkml.kernel.org/linux-fscrypt/20200821182813.52570-1-jlayton@kernel.org/T/#u)
will have this same ordering problem too, since ceph will need to
encrypt new symlinks before setting their encryption context.
Finally, f2fs can deadlock when the filesystem is mounted with
'-o test_dummy_encryption' and a new file is created in an existing
unencrypted directory. Similarly, this is caused by holding too many
locks when calling fscrypt_get_encryption_info().
To solve all these problems, add new helper functions:
- fscrypt_prepare_new_inode() sets up a new inode's encryption key
(fscrypt_info), using the parent directory's encryption policy and a
new random nonce. It neither reads nor writes the encryption context.
- fscrypt_set_context() persists the encryption context of a new inode,
using the information from the fscrypt_info already in memory. This
replaces fscrypt_inherit_context().
Temporarily keep fscrypt_inherit_context() around until all filesystems
have been converted to use fscrypt_set_context().
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20200917041136.178600-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>