Commit Graph

1559 Commits

Author SHA1 Message Date
Jan Kara 89e524c04f loop: Fix mount(2) failure due to race with LOOP_SET_FD
Commit 33ec3e53e7 ("loop: Don't change loop device under exclusive
opener") made LOOP_SET_FD ioctl acquire exclusive block device reference
while it updates loop device binding. However this can make perfectly
valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding
temporarily the exclusive bdev reference in cases like this:

for i in {a..z}{a..z}; do
        dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024
        mkfs.ext2 $i.image
        mkdir mnt$i
done

echo "Run"
for i in {a..z}{a..z}; do
        mount -o loop -t ext2 $i.image mnt$i &
done

Fix the problem by not getting full exclusive bdev reference in
LOOP_SET_FD but instead just mark the bdev as being claimed while we
update the binding information. This just blocks new exclusive openers
instead of failing them with EBUSY thus fixing the problem.

Fixes: 33ec3e53e7 ("loop: Don't change loop device under exclusive opener")
Cc: stable@vger.kernel.org
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-30 13:16:57 -06:00
Linus Torvalds 933a90bf4f Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount updates from Al Viro:
 "The first part of mount updates.

  Convert filesystems to use the new mount API"

* 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  mnt_init(): call shmem_init() unconditionally
  constify ksys_mount() string arguments
  don't bother with registering rootfs
  init_rootfs(): don't bother with init_ramfs_fs()
  vfs: Convert smackfs to use the new mount API
  vfs: Convert selinuxfs to use the new mount API
  vfs: Convert securityfs to use the new mount API
  vfs: Convert apparmorfs to use the new mount API
  vfs: Convert openpromfs to use the new mount API
  vfs: Convert xenfs to use the new mount API
  vfs: Convert gadgetfs to use the new mount API
  vfs: Convert oprofilefs to use the new mount API
  vfs: Convert ibmasmfs to use the new mount API
  vfs: Convert qib_fs/ipathfs to use the new mount API
  vfs: Convert efivarfs to use the new mount API
  vfs: Convert configfs to use the new mount API
  vfs: Convert binfmt_misc to use the new mount API
  convenience helper: get_tree_single()
  convenience helper get_tree_nodev()
  vfs: Kill sget_userns()
  ...
2019-07-19 10:42:02 -07:00
Linus Torvalds 5010fe9f09 Merge tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong:
 "Here's a patch series that sets up common parameter checking functions
  for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations.

  The goal here is to reduce the amount of behaviorial variance between
  the filesystems where those ioctls originated (ext2 and XFS,
  respectively) and everybody else.

   - Standardize parameter checking for the SETFLAGS and FSSETXATTR
     ioctls (which were the file attribute setters for ext4 and xfs and
     have now been hoisted to the vfs)

   - Only allow the DAX flag to be set on files and directories"

* tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  vfs: only allow FSSETXATTR to set DAX flag on files and dirs
  vfs: teach vfs_ioc_fssetxattr_check to check extent size hints
  vfs: teach vfs_ioc_fssetxattr_check to check project id info
  vfs: create a generic checking function for FS_IOC_FSSETXATTR
  vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
2019-07-12 16:54:37 -07:00
Linus Torvalds d2b6b4c832 Merge tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "Highlights:

   - Add a new /proc/fs/nfsd/clients/ directory which exposes some
     long-requested information about NFSv4 clients (like open files)
     and allows forced revocation of client state.

   - Replace the global duplicate reply cache by a cache per network
     namespace; previously, a request in one network namespace could
     incorrectly match an entry from another, though we haven't seen
     this in production. This is the last remaining container bug that
     I'm aware of; at this point you should be able to run separate
     nfsd's in each network namespace, each with their own set of
     exports, and everything should work.

   - Cleanup and modify lock code to show the pid of lockd as the owner
     of NLM locks. This is the correct version of the bugfix originally
     attempted in b8eee0e90f ("lockd: Show pid of lockd for remote
     locks")"

* tag 'nfsd-5.3' of git://linux-nfs.org/~bfields/linux: (34 commits)
  nfsd: Make __get_nfsdfs_client() static
  nfsd: Make two functions static
  nfsd: Fix misuse of strlcpy
  sunrpc/cache: remove the exporting of cache_seq_next
  nfsd: decode implementation id
  nfsd: create xdr_netobj_dup helper
  nfsd: allow forced expiration of NFSv4 clients
  nfsd: create get_nfsdfs_clp helper
  nfsd4: show layout stateids
  nfsd: show lock and deleg stateids
  nfsd4: add file to display list of client's opens
  nfsd: add more information to client info file
  nfsd: escape high characters in binary data
  nfsd: copy client's address including port number to cl_addr
  nfsd4: add a client info file
  nfsd: make client/ directory names small ints
  nfsd: add nfsd/clients directory
  nfsd4: use reference count to free client
  nfsd: rename cl_refcount
  nfsd: persist nfsd filesystem across mounts
  ...
2019-07-10 21:22:43 -07:00
Linus Torvalds 2e756758e5 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Many bug fixes and cleanups, and an optimization for case-insensitive
  lookups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix coverity warning on error path of filename setup
  ext4: replace ktype default_attrs with default_groups
  ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
  ext4: refactor initialize_dirent_tail()
  ext4: rename "dirent_csum" functions to use "dirblock"
  ext4: allow directory holes
  jbd2: drop declaration of journal_sync_buffer()
  ext4: use jbd2_inode dirty range scoping
  jbd2: introduce jbd2_inode dirty range scoping
  mm: add filemap_fdatawait_range_keep_errors()
  ext4: remove redundant assignment to node
  ext4: optimize case-insensitive lookups
  ext4: make __ext4_get_inode_loc plug
  ext4: clean up kerneldoc warnigns when building with W=1
  ext4: only set project inherit bit for directory
  ext4: enforce the immutable flag on open files
  ext4: don't allow any modifications to an immutable file
  jbd2: fix typo in comment of journal_submit_inode_data_buffers
  jbd2: fix some print format mistakes
  ext4: gracefully handle ext4_break_layouts() failure during truncate
2019-07-10 21:06:01 -07:00
Linus Torvalds 40f06c7995 Merge tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull copy_file_range updates from Darrick Wong:
 "This fixes numerous parameter checking problems and inconsistent
  behaviors in the new(ish) copy_file_range system call.

  Now the system call will actually check its range parameters
  correctly; refuse to copy into files for which the caller does not
  have sufficient privileges; update mtime and strip setuid like file
  writes are supposed to do; and allows copying up to the EOF of the
  source file instead of failing the call like we used to.

  Summary:

   - Create a generic copy_file_range handler and make individual
     filesystems responsible for calling it (i.e. no more assuming that
     do_splice_direct will work or is appropriate)

   - Refactor copy_file_range and remap_range parameter checking where
     they are the same

   - Install missing copy_file_range parameter checking(!)

   - Remove suid/sgid and update mtime like any other file write

   - Change the behavior so that a copy range crossing the source file's
     eof will result in a short copy to the source file's eof instead of
     EINVAL

   - Permit filesystems to decide if they want to handle
     cross-superblock copy_file_range in their local handlers"

* tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fuse: copy_file_range needs to strip setuid bits and update timestamps
  vfs: allow copy_file_range to copy across devices
  xfs: use file_modified() helper
  vfs: introduce file_modified() helper
  vfs: add missing checks to copy_file_range
  vfs: remove redundant checks from generic_remap_checks()
  vfs: introduce generic_file_rw_checks()
  vfs: no fallback for ->copy_file_range
  vfs: introduce generic_copy_file_range()
2019-07-10 20:32:37 -07:00
Linus Torvalds e6983afd92 Merge tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
 "This contains cleanups of the fsnotify name removal hook and also a
  patch to disable fanotify permission events for 'proc' filesystem"

* tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: get rid of fsnotify_nameremove()
  fsnotify: move fsnotify_nameremove() hook out of d_delete()
  configfs: call fsnotify_rmdir() hook
  debugfs: call fsnotify_{unlink,rmdir}() hooks
  debugfs: simplify __debugfs_remove_file()
  devpts: call fsnotify_unlink() hook
  tracefs: call fsnotify_{unlink,rmdir}() hooks
  rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks
  btrfs: call fsnotify_rmdir() hook
  fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
  fanotify: Disallow permission events for proc filesystem
2019-07-10 20:09:17 -07:00
Linus Torvalds 988052f47a Merge tag 'locks-v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking updates from Jeff Layton:
 "Just a couple of small lease-related patches this cycle.

  One from Ira to add a new tracepoint that fires during lease conflict
  checks, and another patch from Amir to reduce false positives when
  checking for lease conflicts"

* tag 'locks-v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: eliminate false positive conflicts for write lease
  locks: Add trace_leases_conflict
2019-07-10 19:21:38 -07:00
Benjamin Coddington f85d93385e locks: Cleanup lm_compare_owner and lm_owner_key
After the update to use nlm_lockowners for the NLM server, there are no
more users of lm_compare_owner and lm_owner_key.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-07-03 17:52:09 -04:00
Darrick J. Wong 7b0e492e6b vfs: create a generic checking function for FS_IOC_FSSETXATTR
Create a generic checking function for the incoming FS_IOC_FSSETXATTR
fsxattr values so that we can standardize some of the implementation
behaviors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-07-01 08:25:35 -07:00
Darrick J. Wong 5aca284210 vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
Create a generic function to check incoming FS_IOC_SETFLAGS flag values
and later prepare the inode for updates so that we can standardize the
implementations that follow ext4's flag values.

Note that the efivarfs implementation no longer fails a no-op SETFLAGS
without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2019-07-01 08:25:34 -07:00
Ross Zwisler aa0bfcd939 mm: add filemap_fdatawait_range_keep_errors()
In the spirit of filemap_fdatawait_range() and
filemap_fdatawait_keep_errors(), introduce
filemap_fdatawait_range_keep_errors() which both takes a range upon
which to wait and does not clear errors from the address space.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
2019-06-20 17:05:37 -04:00
Amir Goldstein 387e3746d0 locks: eliminate false positive conflicts for write lease
check_conflicting_open() is checking for existing fd's open for read or
for write before allowing to take a write lease.  The check that was
implemented using i_count and d_count is an approximation that has
several false positives.  For example, overlayfs since v4.19, takes an
extra reference on the dentry; An open with O_PATH takes a reference on
the dentry although the file cannot be read nor written.

Change the implementation to use i_readcount and i_writecount to
eliminate the false positive conflicts and allow a write lease to be
taken on an overlayfs file.

The change of behavior with existing fd's open with O_PATH is symmetric
w.r.t. current behavior of lease breakers - an open with O_PATH currently
does not break a write lease.

This increases the size of struct inode by 4 bytes on 32bit archs when
CONFIG_FILE_LOCKING is defined and CONFIG_IMA was not already
defined.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-06-19 08:49:38 -04:00
Amir Goldstein e38f7f53c3 vfs: introduce file_modified() helper
The combination of file_remove_privs() and file_update_mtime() is
quite common in filesystem ->write_iter() methods.

Modelled after the helper file_accessed(), introduce file_modified()
and use it from generic_remap_file_range_prep().

Note that the order of calling file_remove_privs() before
file_update_mtime() in the helper was matched to the more common order by
filesystems and not the current order in generic_remap_file_range_prep().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-09 10:06:19 -07:00
Amir Goldstein 96e6e8f4a6 vfs: add missing checks to copy_file_range
Like the clone and dedupe interfaces we've recently fixed, the
copy_file_range() implementation is missing basic sanity, limits and
boundary condition tests on the parameters that are passed to it
from userspace. Create a new "generic_copy_file_checks()" function
modelled on the generic_remap_checks() function to provide this
missing functionality.

[Amir] Shorten copy length instead of checking pos_in limits
because input file size already abides by the limits.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-09 10:06:19 -07:00
Amir Goldstein a31713517d vfs: introduce generic_file_rw_checks()
Factor out helper with some checks on in/out file that are
common to clone_file_range and copy_file_range.

Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-09 10:06:19 -07:00
Dave Chinner f16acc9d9b vfs: introduce generic_copy_file_range()
Right now if vfs_copy_file_range() does not use any offload
mechanism, it falls back to calling do_splice_direct(). This fails
to do basic sanity checks on the files being copied. Before we
start adding this necessarily functionality to the fallback path,
separate it out into generic_copy_file_range().

generic_copy_file_range() has the same prototype as
->copy_file_range() so that filesystems can use it in their custom
->copy_file_range() method if they so choose.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-09 10:06:18 -07:00
Mauro Carvalho Chehab 5c437fa295 docs: fs: fix broken links to vfs.txt with was renamed to vfs.rst
A recent documentation conversion renamed this file but forgot
to update the links.

Fixes: af96c1e304 ("docs: filesystems: vfs: Convert vfs.txt to RST")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-06-08 13:42:13 -06:00
Jan Kara 0b3b094ac9 fanotify: Disallow permission events for proc filesystem
Proc filesystem has special locking rules for various files. Thus
fanotify which opens files on event delivery can easily deadlock
against another process that waits for fanotify permission event to be
handled. Since permission events on /proc have doubtful value anyway,
just disallow them.

Link: https://lore.kernel.org/linux-fsdevel/20190320131642.GE9485@quack2.suse.cz/
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-05-28 18:10:07 +02:00
David Howells 023d066a0d vfs: Kill sget_userns()
Kill sget_userns(), folding it into sget() as that's the only remaining
user.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-fsdevel@vger.kernel.org
2019-05-25 18:06:17 -04:00
Al Viro 8d9e46d807 fold mount_pseudo_xattr() into pseudo_fs_get_tree()
... now that all other callers are gone

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-25 18:06:16 -04:00
David Howells bb7b6b2bbd vfs: Kill mount_ns()
Kill mount_ns() as it has been replaced by vfs_get_super() in the new mount
API.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-25 17:59:57 -04:00
Al Viro 1f58bb18f6 mount_pseudo(): drop 'name' argument, switch to d_make_root()
Once upon a time we used to set ->d_name of e.g. pipefs root
so that d_path() on pipes would work.  These days it's
completely pointless - dentries of pipes are not even connected
to pipefs root.  However, mount_pseudo() had set the root
dentry name (passed as the second argument) and callers
kept inventing names to pass to it.  Including those that
didn't *have* any non-root dentries to start with...

All of that had been pointless for about 8 years now; it's
time to get rid of that cargo-culting...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-25 17:59:24 -04:00
Linus Torvalds 5abe37954e Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Add as a feature case-insensitive directories (the casefold feature)
  using Unicode 12.1.

  Also, the usual largish number of cleanups and bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
  ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
  ext4: fix ext4_show_options for file systems w/o journal
  unicode: refactor the rule for regenerating utf8data.h
  docs: ext4.rst: document case-insensitive directories
  ext4: Support case-insensitive file name lookups
  ext4: include charset encoding information in the superblock
  MAINTAINERS: add Unicode subsystem entry
  unicode: update unicode database unicode version 12.1.0
  unicode: introduce test module for normalized utf8 implementation
  unicode: implement higher level API for string handling
  unicode: reduce the size of utf8data[]
  unicode: introduce code for UTF-8 normalization
  unicode: introduce UTF-8 character database
  ext4: actually request zeroing of inode table after grow
  ext4: cond_resched in work-heavy group loops
  ext4: fix use-after-free race with debug_want_extra_isize
  ext4: avoid drop reference to iloc.bh twice
  ext4: ignore e_value_offs for xattrs with value-in-ea-inode
  ext4: protect journal inode's blocks using block_validity
  ext4: use BUG() instead of BUG_ON(1)
  ...
2019-05-07 21:12:44 -07:00
Linus Torvalds e5fef2a973 Merge tag 'afs-next-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells:
 "A set of fix and development patches for AFS for 5.2.

  Summary:

   - Fix the AFS file locking so that sqlite can run on an AFS mount and
     also so that firefox and gnome can use a homedir that's mounted
     through AFS.

     This required emulation of fine-grained locking when the server
     will only support whole-file locks and no upgrade/downgrade. Four
     modes are provided, settable by mount parameter:

       "flock=local"   - No reference to the server

       "flock=openafs" - Fine-grained locks are local-only, whole-file
                         locks require sufficient server locks

       "flock=strict"  - All locks require sufficient server locks

       "flock=write"   - Always get an exclusive server lock

     If the volume is a read-only or backup volume, then flock=local for
     that volume.

   - Log extra information for a couple of cases where the client mucks
     up somehow: AFS vnode with undefined type and dir check failure -
     in both cases we seem to end up with unfilled data, but the issues
     happen infrequently and are difficult to reproduce at will.

   - Implement silly rename for unlink() and rename().

   - Set i_blocks so that du can get some information about usage.

   - Fix xattr handlers to return the right amount of data and to not
     overflow buffers.

   - Implement getting/setting raw AFS and YFS ACLs as xattrs"

* tag 'afs-next-20190507' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Implement YFS ACL setting
  afs: Get YFS ACLs and information through xattrs
  afs: implement acl setting
  afs: Get an AFS3 ACL as an xattr
  afs: Fix getting the afs.fid xattr
  afs: Fix the afs.cell and afs.volume xattr handlers
  afs: Calculate i_blocks based on file size
  afs: Log more information for "kAFS: AFS vnode with undefined type\n"
  afs: Provide mount-time configurable byte-range file locking emulation
  afs: Add more tracepoints
  afs: Implement sillyrename for unlink and rename
  afs: Add directory reload tracepoint
  afs: Handle lock rpc ops failing on a file that got deleted
  afs: Improve dir check failure reports
  afs: Add file locking tracepoints
  afs: Further fix file locking
  afs: Fix AFS file locking to allow fine grained locks
  afs: Calculate lock extend timer from set/extend reply reception
  afs: Split wait from afs_make_call()
2019-05-07 20:51:58 -07:00