Commit Graph

95 Commits

Author SHA1 Message Date
Eric Biggers
2d985f8c6b vfs: support STATX_DIOALIGN on block devices
Add support for STATX_DIOALIGN to block devices, so that direct I/O
alignment restrictions are exposed to userspace in a generic way.

Note that this breaks the tradition of stat operating only on the block
device node, not the block device itself.  However, it was felt that
doing this is preferable, in order to make the interface useful and
avoid needing separate interfaces for regular files and block devices.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Link: https://lore.kernel.org/r/20220827065851.135710-3-ebiggers@kernel.org
2022-09-11 19:47:12 -05:00
Eric Biggers
825cf206ed statx: add direct I/O alignment information
Traditionally, the conditions for when DIO (direct I/O) is supported
were fairly simple.  For both block devices and regular files, DIO had
to be aligned to the logical block size of the block device.

However, due to filesystem features that have been added over time (e.g.
multi-device support, data journalling, inline data, encryption, verity,
compression, checkpoint disabling, log-structured mode), the conditions
for when DIO is allowed on a regular file have gotten increasingly
complex.  Whether a particular regular file supports DIO, and with what
alignment, can depend on various file attributes and filesystem mount
options, as well as which block device(s) the file's data is located on.

Moreover, the general rule of DIO needing to be aligned to the block
device's logical block size was recently relaxed to allow user buffers
(but not file offsets) aligned to the DMA alignment instead.  See
commit bf8d08532b ("iomap: add support for dma aligned direct-io").

XFS has an ioctl XFS_IOC_DIOINFO that exposes DIO alignment information.
Uplifting this to the VFS is one possibility.  However, as discussed
(https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
this ioctl is rarely used and not known to be used outside of
XFS-specific code.  It was also never intended to indicate when a file
doesn't support DIO at all, nor was it intended for block devices.

Therefore, let's expose this information via statx().  Add the
STATX_DIOALIGN flag and two new statx fields associated with it:

* stx_dio_mem_align: the alignment (in bytes) required for user memory
  buffers for DIO, or 0 if DIO is not supported on the file.

* stx_dio_offset_align: the alignment (in bytes) required for file
  offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
  on the file.  This will only be nonzero if stx_dio_mem_align is
  nonzero, and vice versa.

Note that as with other statx() extensions, if STATX_DIOALIGN isn't set
in the returned statx struct, then these new fields won't be filled in.
This will happen if the file is neither a regular file nor a block
device, or if the file is a regular file and the filesystem doesn't
support STATX_DIOALIGN.  It might also happen if the caller didn't
include STATX_DIOALIGN in the request mask, since statx() isn't required
to return unrequested information.

This commit only adds the VFS-level plumbing for STATX_DIOALIGN.  For
regular files, individual filesystems will still need to add code to
support it.  For block devices, a separate commit will wire it up too.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220827065851.135710-2-ebiggers@kernel.org
2022-09-11 19:47:04 -05:00
Linus Torvalds
35b51afd23 Merge tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:

 - Support for the Svpbmt extension, which allows memory attributes to
   be encoded in pages

 - Support for the Allwinner D1's implementation of page-based memory
   attributes

 - Support for running rv32 binaries on rv64 systems, via the compat
   subsystem

 - Support for kexec_file()

 - Support for the new generic ticket-based spinlocks, which allows us
   to also move to qrwlock. These should have already gone in through
   the asm-geneic tree as well

 - A handful of cleanups and fixes, include some larger ones around
   atomics and XIP

* tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
  riscv: compat: Using seperated vdso_maps for compat_vdso_info
  RISC-V: Fix the XIP build
  RISC-V: Split out the XIP fixups into their own file
  RISC-V: ignore xipImage
  RISC-V: Avoid empty create_*_mapping definitions
  riscv: Don't output a bogus mmu-type on a no MMU kernel
  riscv: atomic: Add custom conditional atomic operation implementation
  riscv: atomic: Optimize dec_if_positive functions
  riscv: atomic: Cleanup unnecessary definition
  RISC-V: Load purgatory in kexec_file
  RISC-V: Add purgatory
  RISC-V: Support for kexec_file on panic
  RISC-V: Add kexec_file support
  RISC-V: use memcpy for kexec_file mode
  kexec_file: Fix kexec_file.c build error for riscv platform
  riscv: compat: Add COMPAT Kbuild skeletal support
  riscv: compat: ptrace: Add compat_arch_ptrace implement
  riscv: compat: signal: Add rt_frame implementation
  riscv: add memory-type errata for T-Head
  ...
2022-05-31 14:10:54 -07:00
Guo Ren
f18ed30db2 fs: stat: compat: Add __ARCH_WANT_COMPAT_STAT
RISC-V doesn't neeed compat_stat, so using __ARCH_WANT_COMPAT_STAT
to exclude unnecessary SYSCALL functions.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Helge Deller <deller@gmx.de>  # parisc
Link: https://lore.kernel.org/r/20220405071314.3225832-6-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-04-26 13:35:45 -07:00
Mikulas Patocka
932aba1e16 stat: fix inconsistency between struct stat and struct compat_stat
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
st_dev and st_rdev; struct compat_stat (defined in
arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
a 16-bit padding.

This patch fixes struct compat_stat to match struct stat.

[ Historical note: the old x86 'struct stat' did have that 16-bit field
  that the compat layer had kept around, but it was changes back in 2003
  by "struct stat - support larger dev_t":

    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d

  and back in those days, the x86_64 port was still new, and separate
  from the i386 code, and had already picked up the old version with a
  16-bit st_dev field ]

Note that we can't change compat_dev_t because it is used by
compat_loop_info.

Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
old_valid_dev to test if the value fits into them.  This fixes
-EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
number 259.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-12 13:35:08 -10:00
Stefan Roesch
1b6fe6e0df io-uring: Make statx API stable
One of the key architectual tenets is to keep the parameters for
io-uring stable. After the call has been submitted, its value can
be changed. Unfortunaltely this is not the case for the current statx
implementation.

IO-Uring change:
This changes replaces the const char * filename pointer in the io_statx
structure with a struct filename *. In addition it also creates the
filename object during the prepare phase.

With this change, the opcode also needs to invoke cleanup, so the
filename object gets freed after processing the request.

fs change:
This replaces the const char* __user filename parameter in the two
functions do_statx and vfs_statx with a struct filename *. In addition
to be able to correctly construct a filename object a new helper
function getname_statx_lookup_flags is introduced. The function makes
sure that do_statx and vfs_statx is invoked with the correct lookup flags.

Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220225185326.1373304-2-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-10 09:33:55 -07:00
Amir Goldstein
4f911138c8 fs: add generic helper for filling statx attribute flags
The immutable and append-only properties on an inode are published on
the inode's i_flags and enforced by the VFS.

Create a helper to fill the corresponding STATX_ATTR_ flags in the kstat
structure from the inode's i_flags.

Only orange was converted to use this helper.
Other filesystems could use it in the future.

Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-08-17 11:47:43 +02:00
Theodore Ts'o
5afa7e8b70 fs: fix reporting supported extra file attributes for statx()
statx(2) notes that any attribute that is not indicated as supported
by stx_attributes_mask has no usable value.  Commits 801e523796
("fs: move generic stat response attr handling to vfs_getattr_nosec")
and 712b2698e4 ("fs/stat: Define DAX statx attribute") sets
STATX_ATTR_AUTOMOUNT and STATX_ATTR_DAX, respectively, without setting
stx_attributes_mask, which can cause xfstests generic/532 to fail.

Fix this in the same way as commit 1b9598c8fb ("xfs: fix reporting
supported extra file attributes for statx()")

Fixes: 801e523796 ("fs: move generic stat response attr handling to vfs_getattr_nosec")
Fixes: 712b2698e4 ("fs/stat: Define DAX statx attribute")
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-04-17 23:03:50 -04:00
Christian Brauner
549c729771 fs: make helpers idmap mount aware
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:20 +01:00
Christian Brauner
0d56a4518d stat: handle idmapped mounts
The generic_fillattr() helper fills in the basic attributes associated
with an inode. Enable it to handle idmapped mounts. If the inode is
accessed through an idmapped mount map it into the mount's user
namespace before we store the uid and gid. If the initial user namespace
is passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-12-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Christoph Hellwig
f2d077ff1b fs: remove KSTAT_QUERY_FLAGS
KSTAT_QUERY_FLAGS expands to AT_STATX_SYNC_TYPE, which itself already
is a mask.  Remove the double name, especially given that the prefix
is a little confusing vs the normal AT_* flags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-26 22:55:05 -04:00
Christoph Hellwig
b3f0515004 fs: remove vfs_stat_set_lookup_flags
The function really obsfucates checking for valid flags and setting the
lookup flags.  The fact that it returns -EINVAL through and unsigned
return value, which is then used as boolean really doesn't help either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-26 22:55:05 -04:00
Christoph Hellwig
09f1bde401 fs: move vfs_fstatat out of line
This allows to keep vfs_statx static in fs/stat.c to prepare for the following
changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-26 22:55:05 -04:00
Christoph Hellwig
da9aa5d96b fs: remove vfs_statx_fd
vfs_statx_fd is only used to implement vfs_fstat.  Remove vfs_statx_fd
and just implement vfs_fstat directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-26 22:55:05 -04:00
Linus Torvalds
96ed320d52 Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull DAX updates part one from Darrick Wong:
 "After many years of LKML-wrangling about how to enable programs to
  query and influence the file data access mode (DAX) when a filesystem
  resides on storage devices such as persistent memory, Ira Weiny has
  emerged with a proposed set of standard behaviors that has not been
  shot down by anyone! We're more or less standardizing on the current
  XFS behavior and adapting ext4 to do the same.

  This is the first of a handful pull requests that will make ext4 and
  XFS present a consistent interface for user programs that care about
  DAX. We add a statx attribute that programs can check to see if DAX is
  enabled on a particular file. Then, we update the DAX documentation to
  spell out the user-visible behaviors that filesystems will guarantee
  (until the next storage industry shakeup). The on-disk inode flag has
  been in XFS for a few years now.

  Summary:

   - Clean up io_is_direct.

   - Add a new statx flag to indicate when file data access is being
     done via DAX (as opposed to the page cache).

   - Update the documentation for how system administrators and
     application programmers can take advantage of the (still
     experimental DAX) feature"

Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/

* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  Documentation/dax: Update Usage section
  fs/stat: Define DAX statx attribute
  fs: Remove unneeded IS_DAX() check in io_is_direct()
2020-06-02 19:45:12 -07:00
Linus Torvalds
1ee08de1e2 Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
 "A relatively quiet round, mostly just fixes and code improvements. In
particular:

   - Make statx just use the generic statx handler, instead of open
     coding it. We don't need that anymore, as we always call it async
     safe (Bijan)

   - Enable closing of the ring itself. Also fixes O_PATH closure (me)

   - Properly name completion members (me)

   - Batch reap of dead file registrations (me)

   - Allow IORING_OP_POLL with double waitqueues (me)

   - Add tee(2) support (Pavel)

   - Remove double off read (Pavel)

   - Fix overflow cancellations (Pavel)

   - Improve CQ timeouts (Pavel)

   - Async defer drain fixes (Pavel)

   - Add support for enabling/disabling notifications on a registered
     eventfd (Stefano)

   - Remove dead state parameter (Xiaoguang)

   - Disable SQPOLL submit on dying ctx (Xiaoguang)

   - Various code cleanups"

* tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
  io_uring: fix overflowed reqs cancellation
  io_uring: off timeouts based only on completions
  io_uring: move timeouts flushing to a helper
  statx: hide interfaces no longer used by io_uring
  io_uring: call statx directly
  statx: allow system call to be invoked from io_uring
  io_uring: add io_statx structure
  io_uring: get rid of manual punting in io_close
  io_uring: separate DRAIN flushing into a cold path
  io_uring: don't re-read sqe->off in timeout_prep()
  io_uring: simplify io_timeout locking
  io_uring: fix flush req->refs underflow
  io_uring: don't submit sqes when ctx->refs is dying
  io_uring: async task poll trigger cleanup
  io_uring: add tee(2) support
  splice: export do_tee()
  io_uring: don't repeat valid flag list
  io_uring: rename io_file_put()
  io_uring: remove req->needs_fixed_files
  io_uring: cleanup io_poll_remove_one() logic
  ...
2020-06-02 15:42:50 -07:00
Bijan Mottahedeh
6f88cc176a statx: hide interfaces no longer used by io_uring
The io_uring interfaces have been replaced by do_statx() and are no
longer needed.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-26 16:48:06 -06:00
Bijan Mottahedeh
0018784fc8 statx: allow system call to be invoked from io_uring
This is a prepatory patch to allow io_uring to invoke statx directly.

Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-26 16:48:06 -06:00
Miklos Szeredi
80340fe360 statx: add mount_root
Determining whether a path or file descriptor refers to a mountpoint (or
more precisely a mount root) is not trivial using current tools.

Add a flag to statx that indicates whether the path or fd refers to the
root of a mount or not.

Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Reported-by: Lennart Poettering <mzxreary@0pointer.de>
Reported-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14 16:44:24 +02:00
Miklos Szeredi
fa2fcf4f1d statx: add mount ID
Systemd is hacking around to get it and it's trivial to add to statx, so...

Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14 16:44:24 +02:00
Miklos Szeredi
761e28fa27 statx: don't clear STATX_ATIME on SB_RDONLY
IS_NOATIME(inode) is defined as __IS_FLG(inode, SB_RDONLY|SB_NOATIME), so
generic_fillattr() will clear STATX_ATIME from the result_mask if the super
block is marked read only.

This was probably not the intention, so fix to only clear STATX_ATIME if
the fs doesn't support atime at all.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14 16:44:24 +02:00
Miklos Szeredi
581701b7ef uapi: deprecate STATX_ALL
Constants of the *_ALL type can be actively harmful due to the fact that
developers will usually fail to consider the possible effects of future
changes to the definition.

Deprecate STATX_ALL in the uapi, while no damage has been done yet.

We could keep something like this around in the kernel, but there's
actually no point, since all filesystems should be explicitly checking
flags that they support and not rely on the VFS masking unknown ones out: a
flag could be known to the VFS, yet not known to the filesystem.

Cc: David Howells <dhowells@redhat.com>
Cc: linux-api@vger.kernel.org
Cc: linux-man@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-14 16:44:24 +02:00
Ira Weiny
712b2698e4 fs/stat: Define DAX statx attribute
In order for users to determine if a file is currently operating in DAX
state (effective DAX).  Define a statx attribute value and set that
attribute if the effective DAX flag is set.

To go along with this we propose the following addition to the statx man
page:

STATX_ATTR_DAX

	The file is in the DAX (cpu direct access) state.  DAX state
	attempts to minimize software cache effects for both I/O and
	memory mappings of this file.  It requires a file system which
	has been configured to support DAX.

	DAX generally assumes all accesses are via cpu load / store
	instructions which can minimize overhead for small accesses, but
	may adversely affect cpu utilization for large transfers.

	File I/O is done directly to/from user-space buffers and memory
	mapped I/O may be performed with direct memory mappings that
	bypass kernel page cache.

	While the DAX property tends to result in data being transferred
	synchronously, it does not give the same guarantees of O_SYNC
	where data and the necessary metadata are transferred together.

	A DAX file may support being mapped with the MAP_SYNC flag,
	which enables a program to use CPU cache flush instructions to
	persist CPU store operations without an explicit fsync(2).  See
	mmap(2) for more information.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 08:49:39 -07:00
Jens Axboe
3934e36f60 fs: make two stat prep helpers available
To implement an async stat, we need to provide the flags mapping and
the statx user copy. Make them available internally, through
fs/internal.h.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20 17:03:54 -07:00
Christoph Hellwig
801e523796 fs: move generic stat response attr handling to vfs_getattr_nosec
generic_fillattr is an optional helper that isn't used by all file
systems, move handling purely based on inode flags to vfs_getattr_nosec,
which is common code.

This fixes setting this flag for file systems not using generic_fillattr
like xfs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-01 01:55:45 -05:00