Commit Graph

3758 Commits

Author SHA1 Message Date
Linus Torvalds
0e4656a299 Merge tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
 "The most notable changes are:

   - iomap no longer invalidates the page cache when performing a direct
     read, since doing so is unnecessary and the old directio code
     doesn't do that either.

   - iomap embraced the use of returning ENOTBLK from a direct write to
     trigger falling back to a buffered write since ext4 already did
     this and btrfs wants it for their port.

   - iomap falls back to buffered writes if we're doing a direct write
     and the page cache invalidation after the flush fails; this was
     necessary to handle a corner case in the btrfs port.

   - Remove email virus scanner detritus that was accidentally included
     in yesterday's pull request. Clearly I need(ed) to update my git
     branch checker scripts. :("

* tag 'iomap-5.9-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: fall back to buffered writes for invalidation failures
  xfs: use ENOTBLK for direct I/O to buffered I/O fallback
  iomap: Only invalidate page cache pages on direct IO writes
  iomap: Make sure iomap_end is called after iomap_begin
2020-08-06 19:35:12 -07:00
Christoph Hellwig
60263d5889 iomap: fall back to buffered writes for invalidation failures
Failing to invalid the page cache means data in incoherent, which is
a very bad state for the system.  Always fall back to buffered I/O
through the page cache if we can't invalidate mappings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
2020-08-05 09:24:16 -07:00
Eric Biggers
4f74d15fe4 ext4: add inline encryption support
Wire up ext4 to support inline encryption via the helper functions which
fs/crypto/ now provides.  This includes:

- Adding a mount option 'inlinecrypt' which enables inline encryption
  on encrypted files where it can be used.

- Setting the bio_crypt_ctx on bios that will be submitted to an
  inline-encrypted file.

  Note: submit_bh_wbc() in fs/buffer.c also needed to be patched for
  this part, since ext4 sometimes uses ll_rw_block() on file data.

- Not adding logically discontiguous data to bios that will be submitted
  to an inline-encrypted file.

- Not doing filesystem-layer crypto on inline-encrypted files.

Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20200702015607.1215430-5-satyat@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-07-08 10:29:43 -07:00
Linus Torvalds
3be20b6fc1 Merge tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull more ext4 updates from Ted Ts'o:
 "This is the second round of ext4 commits for 5.8 merge window [1].

  It includes the per-inode DAX support, which was dependant on the DAX
  infrastructure which came in via the XFS tree, and a number of
  regression and bug fixes; most notably the "BUG: using
  smp_processor_id() in preemptible code in ext4_mb_new_blocks" reported
  by syzkaller"

[1] The pull request actually came in 15 minutes after I had tagged the
    rc1 release. Tssk, tssk, late..   - Linus

* tag 'ext4-for-linus-5.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
  ext4: support xattr gnu.* namespace for the Hurd
  ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
  ext4: avoid utf8_strncasecmp() with unstable name
  ext4: stop overwrite the errcode in ext4_setup_super
  ext4: fix partial cluster initialization when splitting extent
  ext4: avoid race conditions when remounting with options that change dax
  Documentation/dax: Update DAX enablement for ext4
  fs/ext4: Introduce DAX inode flag
  fs/ext4: Remove jflag variable
  fs/ext4: Make DAX mount option a tri-state
  fs/ext4: Only change S_DAX on inode load
  fs/ext4: Update ext4_should_use_dax()
  fs/ext4: Change EXT4_MOUNT_DAX to EXT4_MOUNT_DAX_ALWAYS
  fs/ext4: Disallow verity if inode is DAX
  fs/ext4: Narrow scope of DAX check in setflags
2020-06-15 09:32:10 -07:00
zhangyi (F)
7b97d868b7 ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers
In the ext4 filesystem with errors=panic, if one process is recording
errno in the superblock when invoking jbd2_journal_abort() due to some
error cases, it could be raced by another __ext4_abort() which is
setting the SB_RDONLY flag but missing panic because errno has not been
recorded.

jbd2_journal_commit_transaction()
 jbd2_journal_abort()
  journal->j_flags |= JBD2_ABORT;
  jbd2_journal_update_sb_errno()
                                    | ext4_journal_check_start()
                                    |  __ext4_abort()
                                    |   sb->s_flags |= SB_RDONLY;
                                    |   if (!JBD2_REC_ERR)
                                    |        return;
  journal->j_flags |= JBD2_REC_ERR;

Finally, it will no longer trigger panic because the filesystem has
already been set read-only. Fix this by introduce j_abort_mutex to make
sure journal abort is completed before panic, and remove JBD2_REC_ERR
flag.

Fixes: 4327ba52af ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-12 14:51:41 -04:00
Jan (janneke) Nieuwenhuizen
88ee9d571b ext4: support xattr gnu.* namespace for the Hurd
The Hurd gained[0] support for moving the translator and author
fields out of the inode and into the "gnu.*" xattr namespace.

In anticipation of that, an xattr INDEX was reserved[1].  The Hurd has
now been brought into compliance[2] with that.

This patch adds support for reading and writing such attributes from
Linux; you can now do something like

    mkdir -p hurd-root/servers/socket
    touch hurd-root/servers/socket/1
    setfattr --name=gnu.translator --value='"/hurd/pflocal\0"' \
        hurd-root/servers/socket/1
    getfattr --name=gnu.translator hurd-root/servers/socket/1
    # file: 1
    gnu.translator="/hurd/pflocal"

to setup a pipe translator, which is being used to create[3] a
vm-image for the Hurd from GNU Guix.

[0] https://summerofcode.withgoogle.com/projects/#5869799859027968
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3980bd3b406addb327d858aebd19e229ea340b9a
[2] https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=a04c7bf83172faa7cb080fbe3b6c04a8415ca645
[3] https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-hurd-vm

Signed-off-by: Jan Nieuwenhuizen <janneke@gnu.org>
Link: https://lore.kernel.org/r/20200525193940.878-1-janneke@gnu.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-12 13:23:34 -04:00
Ritesh Harjani
8119853653 ext4: mballoc: Use this_cpu_read instead of this_cpu_ptr
Simplify reading a seq variable by directly using this_cpu_read API
instead of doing this_cpu_ptr and then dereferencing it.

This also avoid the below kernel BUG: which happens when
CONFIG_DEBUG_PREEMPT is enabled

BUG: using smp_processor_id() in preemptible [00000000] code: syz-fuzzer/6927
caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
CPU: 1 PID: 6927 Comm: syz-fuzzer Not tainted 5.7.0-next-20200602-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
 ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
 ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
 ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
 ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
 ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
 ext4_append+0x153/0x360 fs/ext4/namei.c:67
 ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
 ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
 vfs_mkdir+0x419/0x690 fs/namei.c:3632
 do_mkdirat+0x21e/0x280 fs/namei.c:3655
 do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 42f56b7a4a7d ("ext4: mballoc: introduce pcpu seqcnt for freeing PA
to improve ENOSPC handling")
Suggested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: syzbot+82f324bb69744c5f6969@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/534f275016296996f54ecf65168bb3392b6f653d.1591699601.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11 11:03:26 -04:00
Eric Biggers
2ce3ee931a ext4: avoid utf8_strncasecmp() with unstable name
If the dentry name passed to ->d_compare() fits in dentry::d_iname, then
it may be concurrently modified by a rename.  This can cause undefined
behavior (possibly out-of-bounds memory accesses or crashes) in
utf8_strncasecmp(), since fs/unicode/ isn't written to handle strings
that may be concurrently modified.

Fix this by first copying the filename to a stack buffer if needed.
This way we get a stable snapshot of the filename.

Fixes: b886ee3e77 ("ext4: Support case-insensitive file name lookups")
Cc: <stable@vger.kernel.org> # v5.2+
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Rosenberg <drosen@google.com>
Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20200601200543.59417-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11 11:01:33 -04:00
yangerkun
5adaccac46 ext4: stop overwrite the errcode in ext4_setup_super
Now the errcode from ext4_commit_super will overwrite EROFS exists in
ext4_setup_super. Actually, no need to call ext4_commit_super since we
will return EROFS. Fix it by goto done directly.

Fixes: c89128a008 ("ext4: handle errors on ext4_commit_super")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200601073404.3712492-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11 10:59:38 -04:00
Jeffle Xu
cfb3c85a60 ext4: fix partial cluster initialization when splitting extent
Fix the bug when calculating the physical block number of the first
block in the split extent.

This bug will cause xfstests shared/298 failure on ext4 with bigalloc
enabled occasionally. Ext4 error messages indicate that previously freed
blocks are being freed again, and the following fsck will fail due to
the inconsistency of block bitmap and bg descriptor.

The following is an example case:

1. First, Initialize a ext4 filesystem with cluster size '16K', block size
'4K', in which case, one cluster contains four blocks.

2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent
tree of this file is like:

...
36864:[0]4:220160
36868:[0]14332:145408
51200:[0]2:231424
...

3. Then execute PUNCH_HOLE fallocate on this file. The hole range is
like:

..
ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
...

4. Then the extent tree of this file after punching is like

...
49507:[0]37:158047
49547:[0]58:158087
...

5. Detailed procedure of punching hole [49544, 49546]

5.1. The block address space:
```
lblk        ~49505  49506   49507~49543     49544~49546    49547~
	  ---------+------+-------------+----------------+--------
	    extent | hole |   extent	|	hole	 | extent
	  ---------+------+-------------+----------------+--------
pblk       ~158045  158046  158047~158083  158084~158086   158087~
```

5.2. The detailed layout of cluster 39521:
```
		cluster 39521
	<------------------------------->

		hole		  extent
	<----------------------><--------

lblk      49544   49545   49546   49547
	+-------+-------+-------+-------+
	|	|	|	|	|
	+-------+-------+-------+-------+
pblk     158084  1580845  158086  158087
```

5.3. The ftrace output when punching hole [49544, 49546]:
- ext4_ext_remove_space (start 49544, end 49546)
  - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
    - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2]
      - ext4_free_blocks: (block 158084 count 4)
        - ext4_mballoc_free (extent 1/6753/1)

5.4. Ext4 error message in dmesg:
EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters

In this case, the whole cluster 39521 is freed mistakenly when freeing
pblock 158084~158086 (i.e., the first three blocks of this cluster),
although pblock 158087 (the last remaining block of this cluster) has
not been freed yet.

The root cause of this isuue is that, the pclu of the partial cluster is
calculated mistakenly in ext4_ext_remove_space(). The correct
partial_cluster.pclu (i.e., the cluster number of the first block in the
next extent, that is, lblock 49597 (pblock 158086)) should be 39521 rather
than 39522.

Fixes: f4226d9ea4 ("ext4: fix partial cluster initialization")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Eric Whitney <enwlinux@gmail.com>
Cc: stable@kernel.org # v3.19+
Link: https://lore.kernel.org/r/1590121124-37096-1-git-send-email-jefflexu@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11 10:57:40 -04:00
Theodore Ts'o
829b37b8cd ext4: avoid race conditions when remounting with options that change dax
Trying to change dax mount options when remounting could allow mount
options to be enabled for a small amount of time, and then the mount
option change would be reverted.

In the case of "mount -o remount,dax", this can cause a race where
files would temporarily treated as DAX --- and then not.

Cc: stable@kernel.org
Reported-by: syzbot+bca9799bf129256190da@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-11 10:54:07 -04:00
Theodore Ts'o
68cd44920d Enable ext4 support for per-file/directory dax operations
This adds the same per-file/per-directory DAX support for ext4 as was
done for xfs, now that we finally have consensus over what the
interface should be.
2020-06-11 10:51:44 -04:00
Linus Torvalds
595a56ac1b Merge tag 'linux-kselftest-kunit-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kunit updates from Shuah Khan:
 "This consists of:

   - Several config fragment fixes from Anders Roxell to improve test
     coverage.

   - Improvements to kunit run script to use defconfig as default and
     restructure the code for config/build/exec/parse from Vitor Massaru
     Iha and David Gow.

   - Miscellaneous documentation warn fix"

* tag 'linux-kselftest-kunit-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  security: apparmor: default KUNIT_* fragments to KUNIT_ALL_TESTS
  fs: ext4: default KUNIT_* fragments to KUNIT_ALL_TESTS
  drivers: base: default KUNIT_* fragments to KUNIT_ALL_TESTS
  lib: Kconfig.debug: default KUNIT_* fragments to KUNIT_ALL_TESTS
  kunit: default KUNIT_* fragments to KUNIT_ALL_TESTS
  kunit: Kconfig: enable a KUNIT_ALL_TESTS fragment
  kunit: Fix TabError, remove defconfig code and handle when there is no kunitconfig
  kunit: use KUnit defconfig by default
  kunit: use --build_dir=.kunit as default
  Documentation: test.h - fix warnings
  kunit: kunit_tool: Separate out config/build/exec/parse
2020-06-09 10:04:47 -07:00
Michel Lespinasse
c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Linus Torvalds
9daa0a27a0 Merge tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells:
 "There's some core VFS changes which affect a couple of filesystems:

   - Make the inode hash table RCU safe and providing some RCU-safe
     accessor functions. The search can then be done without taking the
     inode_hash_lock. Care must be taken because the object may be being
     deleted and no wait is made.

   - Allow iunique() to avoid taking the inode_hash_lock.

   - Allow AFS's callback processing to avoid taking the inode_hash_lock
     when using the inode table to find an inode to notify.

   - Improve Ext4's time updating. Konstantin Khlebnikov said "For now,
     I've plugged this issue with try-lock in ext4 lazy time update.
     This solution is much better."

  Then there's a set of changes to make a number of improvements to the
  AFS driver:

   - Improve callback (ie. third party change notification) processing
     by:

      (a) Relying more on the fact we're doing this under RCU and by
          using fewer locks. This makes use of the RCU-based inode
          searching outlined above.

      (b) Moving to keeping volumes in a tree indexed by volume ID
          rather than a flat list.

      (c) Making the server and volume records logically part of the
          cell. This means that a server record now points directly at
          the cell and the tree of volumes is there. This removes an N:M
          mapping table, simplifying things.

   - Improve keeping NAT or firewall channels open for the server
     callbacks to reach the client by actively polling the fileserver on
     a timed basis, instead of only doing it when we have an operation
     to process.

   - Improving detection of delayed or lost callbacks by including the
     parent directory in the list of file IDs to be queried when doing a
     bulk status fetch from lookup. We can then check to see if our copy
     of the directory has changed under us without us getting notified.

   - Determine aliasing of cells (such as a cell that is pointed to be a
     DNS alias). This allows us to avoid having ambiguity due to
     apparently different cells using the same volume and file servers.

   - Improve the fileserver rotation to do more probing when it detects
     that all of the addresses to a server are listed as non-responsive.
     It's possible that an address that previously stopped responding
     has become responsive again.

  Beyond that, lay some foundations for making some calls asynchronous:

   - Turn the fileserver cursor struct into a general operation struct
     and hang the parameters off of that rather than keeping them in
     local variables and hang results off of that rather than the call
     struct.

   - Implement some general operation handling code and simplify the
     callers of operations that affect a volume or a volume component
     (such as a file). Most of the operation is now done by core code.

   - Operations are supplied with a table of operations to issue
     different variants of RPCs and to manage the completion, where all
     the required data is held in the operation object, thereby allowing
     these to be called from a workqueue.

   - Put the standard "if (begin), while(select), call op, end" sequence
     into a canned function that just emulates the current behaviour for
     now.

  There are also some fixes interspersed:

   - Don't let the EACCES from ICMP6 mapping reach the user as such,
     since it's confusing as to whether it's a filesystem error. Convert
     it to EHOSTUNREACH.

   - Don't use the epoch value acquired through probing a server. If we
     have two servers with the same UUID but in different cells, it's
     hard to draw conclusions from them having different epoch values.

   - Don't interpret the argument to the CB.ProbeUuid RPC as a
     fileserver UUID and look up a fileserver from it.

   - Deal with servers in different cells having the same UUIDs. In the
     event that a CB.InitCallBackState3 RPC is received, we have to
     break the callback promises for every server record matching that
     UUID.

   - Don't let afs_statfs return values that go below 0.

   - Don't use running fileserver probe state to make server selection
     and address selection decisions on. Only make decisions on final
     state as the running state is cleared at the start of probing"

Acked-by: Al Viro <viro@zeniv.linux.org.uk> (fs/inode.c part)

* tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (27 commits)
  afs: Adjust the fileserver rotation algorithm to reprobe/retry more quickly
  afs: Show more a bit more server state in /proc/net/afs/servers
  afs: Don't use probe running state to make decisions outside probe code
  afs: Fix afs_statfs() to not let the values go below zero
  afs: Fix the by-UUID server tree to allow servers with the same UUID
  afs: Reorganise volume and server trees to be rooted on the cell
  afs: Add a tracepoint to track the lifetime of the afs_volume struct
  afs: Detect cell aliases 3 - YFS Cells with a canonical cell name op
  afs: Detect cell aliases 2 - Cells with no root volumes
  afs: Detect cell aliases 1 - Cells with root volumes
  afs: Implement client support for the YFSVL.GetCellName RPC op
  afs: Retain more of the VLDB record for alias detection
  afs: Fix handling of CB.ProbeUuid cache manager op
  afs: Don't get epoch from a server because it may be ambiguous
  afs: Build an abstraction around an "operation" concept
  afs: Rename struct afs_fs_cursor to afs_operation
  afs: Remove the error argument from afs_protocol_error()
  afs: Set error flag rather than return error from file status decode
  afs: Make callback processing more efficient.
  afs: Show more information in /proc/net/afs/servers
  ...
2020-06-05 16:26:36 -07:00
Linus Torvalds
0b166a57e6 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "A lot of bug fixes and cleanups for ext4, including:

   - Fix performance problems found in dioread_nolock now that it is the
     default, caused by transaction leaks.

   - Clean up fiemap handling in ext4

   - Clean up and refactor multiple block allocator (mballoc) code

   - Fix a problem with mballoc with a smaller file systems running out
     of blocks because they couldn't properly use blocks that had been
     reserved by inode preallocation.

   - Fixed a race in ext4_sync_parent() versus rename()

   - Simplify the error handling in the extent manipulation code

   - Make sure all metadata I/O errors are felected to
     ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.

   - Avoid passing an error pointer to brelse in ext4_xattr_set()

   - Fix race which could result to freeing an inode on the dirty last
     in data=journal mode.

   - Fix refcount handling if ext4_iget() fails

   - Fix a crash in generic/019 caused by a corrupted extent node"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
  ext4: avoid unnecessary transaction starts during writeback
  ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
  ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
  fs: remove the access_ok() check in ioctl_fiemap
  fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
  fs: move fiemap range validation into the file systems instances
  iomap: fix the iomap_fiemap prototype
  fs: move the fiemap definitions out of fs.h
  fs: mark __generic_block_fiemap static
  ext4: remove the call to fiemap_check_flags in ext4_fiemap
  ext4: split _ext4_fiemap
  ext4: fix fiemap size checks for bitmap files
  ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
  add comment for ext4_dir_entry_2 file_type member
  jbd2: avoid leaking transaction credits when unreserving handle
  ext4: drop ext4_journal_free_reserved()
  ext4: mballoc: use lock for checking free blocks while retrying
  ext4: mballoc: refactor ext4_mb_good_group()
  ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
  ext4: mballoc: refactor ext4_mb_discard_preallocations()
  ...
2020-06-05 16:19:28 -07:00
Jan Kara
6b8ed62008 ext4: avoid unnecessary transaction starts during writeback
ext4_writepages() currently works in a loop like:
  start a transaction
  scan inode for pages to write
  map and submit these pages
  stop the transaction

This loop results in starting transaction once more than is needed
because in the last iteration we start a transaction only to scan the
inode and find there are no pages to write. This can be significant
increase in number of transaction starts for single-extent files or
files that have all blocks already mapped. Furthermore we already know
from previous iteration whether there are more pages to write or not. So
propagate the information from mpage_prepare_extent_to_map() and avoid
unnecessary looping in case there are no more pages to write.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200525081215.29451-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:56 -04:00
Jens Axboe
6e014c621e ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
Running with some debug patches to detect illegal blocking triggered the
extend/unaligned condition in ext4. If ext4 needs to extend the file (and
hence go to buffered IO), or if the app is doing unaligned IO, then ext4
asks the iomap code to wait for IO completion. If the caller asked for
no-wait semantics by setting IOCB_NOWAIT, then ext4 should return -EAGAIN
instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

Link: https://lore.kernel.org/r/76152096-2bbb-7682-8fce-4cb498bcd909@kernel.dk
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig
ba98890393 ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
access_ok just checks we are fed a proper user pointer.  We also do that
in copy_to_user itself, so no need to do this early.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-10-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig
45dd052e67 fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
By moving FIEMAP_FLAG_SYNC handling to fiemap_prep we ensure it is
handled once instead of duplicated, but can still be done under fs locks,
like xfs/iomap intended with its duplicate handling.  Also make sure the
error value of filemap_write_and_wait is propagated to user space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-8-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig
cddf8a2c4a fs: move fiemap range validation into the file systems instances
Replace fiemap_check_flags with a fiemap_prep helper that also takes the
inode and mapped range, and performs the sanity check and truncation
previously done in fiemap_check_range.  This way the validation is inside
the file system itself and thus properly works for the stacked overlayfs
case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-7-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig
10c5db2864 fs: move the fiemap definitions out of fs.h
No need to pull the fiemap definitions into almost every file in the
kernel build.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:55 -04:00
Christoph Hellwig
da565e792b ext4: remove the call to fiemap_check_flags in ext4_fiemap
iomap_fiemap already calls fiemap_check_flags first thing, so this
additional check is redundant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-3-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Christoph Hellwig
03a5ed24c9 ext4: split _ext4_fiemap
The fiemap and EXT4_IOC_GET_ES_CACHE cases share almost no code, so split
them into entirely separate functions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200523073016.2944131-2-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00
Christoph Hellwig
328e24ae14 ext4: fix fiemap size checks for bitmap files
Add an extra validation of the len parameter, as for ext4 some files
might have smaller file size limits than others.  This also means the
redundant size check in ext4_ioctl_get_es_cache can go away, as all
size checking is done in the shared fiemap handler.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200505154324.3226743-3-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-06-03 23:16:54 -04:00