Commit Graph

45813 Commits

Author SHA1 Message Date
Deepa Dinamani 3cd886666f vfs: Add current_time() api
current_fs_time() is used for inode timestamps.

Change the signature of the function to take inode pointer
instead of superblock as per Linus's suggestion.

Also, move the api under vfs as per the discussion on the
thread: https://lkml.org/lkml/2016/6/9/36 . As per Arnd's
suggestion on the thread, changing the function name.

current_fs_time() will be deleted after all the references
to it are replaced by current_time().

There was a bug reported by kbuild test bot with the change
as some of the calls to current_time() were made before the
super_block was initialized. Catch these accidental assignments
as timespec_trunc() does for wrong granularities. This allows
for the function to work right even in these circumstances.
But, adds a warning to make the user aware of the bug.

A coccinelle script was used to identify all the current
.alloc_inode super_block callbacks that updated inode timestamps.
proc filesystem was the only one that was modifying inode times
as part of this callback. The series includes a patch to fix that.

Note that timespec_trunc() will also be moved to fs/inode.c
in a separate patch when this will need to be revamped for
bounds checking purposes.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:20 -04:00
Linus Torvalds 98ac9a608d Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "nvdimm fixes for v4.8, two of them are tagged for -stable:

   - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
     DAX pmd mappings end up with an uncached pgprot, and unusable
     performance for the device-dax interface.  The device-dax interface
     appeared in 4.7 so this is tagged for -stable.

   - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
     understand DAX pmd entries.  This fix is tagged for -stable.

   - Fix a mis-merge of the nfit machine-check handler to flip the
     polarity of an if() to match the final version of the patch that
     Vishal sent for 4.8-rc1.  Without this the nfit machine check
     handler never detects / inserts new 'badblocks' entries which
     applications use to identify lost portions of files.

   - For test purposes, fix the nvdimm_clear_poison() path to operate on
     legacy / simulated nvdimm memory ranges.  Without this fix a test
     can set badblocks, but never clear them on these ranges.

   - Fix the range checking done by dax_dev_pmd_fault().  This is not
     tagged for -stable since this problem is mitigated by specifying
     aligned resources at device-dax setup time.

  These patches have appeared in a next release over the past week.  The
  recent rebase you can see in the timestamps was to drop an invalid fix
  as identified by the updated device-dax unit tests [1].  The -mm
  touches have an ack from Andrew"

[1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
   https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm: allow legacy (e820) pmem region to clear bad blocks
  nfit, mce: Fix SPA matching logic in MCE handler
  mm: fix cache mode of dax pmd mappings
  mm: fix show_smap() for zone_device-pmd ranges
  dax: fix mapping size check
2016-09-10 09:58:52 -07:00
Linus Torvalds 6905732c80 Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull fscrypto fixes fromTed Ts'o:
 "Fix some brown-paper-bag bugs for fscrypto, including one one which
  allows a malicious user to set an encryption policy on an empty
  directory which they do not own"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fscrypto: require write access to mount to set encryption policy
  fscrypto: only allow setting encryption policy on directories
  fscrypto: add authorization check for setting encryption policy
2016-09-10 09:18:33 -07:00
Eric Biggers ba63f23d69 fscrypto: require write access to mount to set encryption policy
Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem.  This was handled correctly by f2fs but not by ext4.  Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-09-10 01:18:57 -04:00
Eric Biggers 002ced4be6 fscrypto: only allow setting encryption policy on directories
The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files.  This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.

In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.

This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later.  It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.

This patch restores the !S_ISDIR() check that was present in older
kernels.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-09 23:38:12 -04:00
Eric Biggers 163ae1c6ad fscrypto: add authorization check for setting encryption policy
On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access.  This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).

Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy.  This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.

(*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-09 23:37:14 -04:00
Dan Williams ca120cf688 mm: fix show_smap() for zone_device-pmd ranges
Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
currently results in the following VM_BUG_ONs:

 kernel BUG at mm/huge_memory.c:1105!
 task: ffff88045f16b140 task.stack: ffff88045be14000
 RIP: 0010:[<ffffffff81268f9b>]  [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
 [..]
 Call Trace:
  [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
  [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307656>] show_smap+0xa6/0x2b0

 kernel BUG at fs/proc/task_mmu.c:585!
 RIP: 0010:[<ffffffff81306469>]  [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
 Call Trace:
  [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307696>] show_smap+0xa6/0x2b0

These locations are sanity checking page flags that must be set for an
anonymous transparent huge page, but are not set for the zone_device
pages associated with dax mappings.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-09 17:34:45 -07:00
Linus Torvalds 6dc728ccd3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
 "This fixes a deadlock when fuse, direct I/O and loop device are
  combined"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: direct-io: don't dirty ITER_BVEC pages
2016-09-09 13:00:41 -07:00
Linus Torvalds 5c44ad6a35 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fix from Miklos Szeredi:
 "This fixes a regression caused by the last pull request"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: fix workdir creation
2016-09-09 12:56:28 -07:00
Linus Torvalds f4a9c169c2 Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I'm not proud of how long it took me to track down that one liner in
  btrfs_sync_log(), but the good news is the patches I was trying to
  blame for these problems were actually fine (sorry Filipe)"

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress
  btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
  btrfs: do not decrease bytes_may_use when replaying extents
2016-09-09 12:52:31 -07:00
Chris Mason b7f3c7d345 Merge branch 'for-chris' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.8 2016-09-07 12:55:36 -07:00
Wang Xiaoguang ce129655c9 btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress
In btrfs_async_reclaim_metadata_space(), we use ticket's address to
determine whether asynchronous metadata reclaim work is making progress.

	ticket = list_first_entry(&space_info->tickets,
				  struct reserve_ticket, list);
	if (last_ticket == ticket) {
		flush_state++;
	} else {
		last_ticket = ticket;
		flush_state = FLUSH_DELAYED_ITEMS_NR;
		if (commit_cycles)
			commit_cycles--;
	}

But indeed it's wrong, we should not rely on local variable's address to
do this check, because addresses may be same. In my test environment, I
dd one 168MB file in a 256MB fs, found that for this file, every time
wait_reserve_ticket() called, local variable ticket's address is same,

For above codes, assume a previous ticket's address is addrA, last_ticket
is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
wake up it, then another ticket is added, but with the same address addrA,
now last_ticket will be same to current ticket, then current ticket's flush
work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR,
which may result in some enospc issues(I have seen this in my test machine).

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-06 16:31:43 +02:00
Chris Mason cbd60aa7cd Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
We use a btrfs_log_ctx structure to pass information into the
tree log commit, and get error values out.  It gets added to a per
log-transaction list which we walk when things go bad.

Commit d1433debe added an optimization to skip waiting for the log
commit, but didn't take root_log_ctx out of the list.  This
patch makes sure we remove things before exiting.

Signed-off-by: Chris Mason <clm@fb.com>
Fixes: d1433debe7
cc: stable@vger.kernel.org # 3.15+
2016-09-06 05:57:25 -07:00
Wang Xiaoguang ed7a694839 btrfs: do not decrease bytes_may_use when replaying extents
When replaying extents, there is no need to update bytes_may_use
in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a
WARN_ON about bytes_may_use.

Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-05 17:40:41 +02:00
Nicolas Iooss 0f5aa88a7b ceph: do not modify fi->frag in need_reset_readdir()
Commit f3c4ebe65e ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.

This looks like a typo which is reported by clang when building the
kernel with some warning flags:

    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
    condition without parentheses [-Werror,-Wparentheses]
            } else if (fi->frag |= fpos_frag(new_pos)) {
                       ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
    to silence this warning
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^
                       (                             )
    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
    assignment into an inequality comparison
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^~
                                !=

Fixes: f3c4ebe65e ("ceph: using hash value to compose dentry offset")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-09-05 14:30:35 +02:00
Miklos Szeredi e1ff3dd1ae ovl: fix workdir creation
Workdir creation fails in latest kernel.

Fix by allowing EOPNOTSUPP as a valid return value from
vfs_removexattr(XATTR_NAME_POSIX_ACL_*).  Upper filesystem may not support
ACL and still be perfectly able to support overlayfs.

Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: c11b9fdd6a ("ovl: remove posix_acl_default from workdir")
Cc: <stable@vger.kernel.org>
2016-09-05 13:55:20 +02:00
Linus Torvalds 4b30b6d126 Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I'm still prepping a set of fixes for btrfs fsync, just nailing down a
  hard to trigger memory corruption.  For now, these are tested and ready."

* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
  Btrfs: fix endless loop in balancing block groups
  Btrfs: kill invalid ASSERT() in process_all_refs()
2016-09-03 12:40:45 -07:00
Linus Torvalds 41488202f1 Merge tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are three small fixes for 4.8-rc5.

  One for sysfs, one for kernfs, and one documentation fix, all for
  reported issues.  All of these have been in linux-next for a while"

* tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: correctly handle read offset on PREALLOC attrs
  documentation: drivers/core/of: fix name of of_node symlink
  kernfs: don't depend on d_find_any_alias() when generating notifications
2016-09-03 11:36:55 -07:00
Linus Torvalds 3e423945ea devpts: return NULL pts 'priv' entry for non-devpts nodes
In commit 8ead9dd547 ("devpts: more pty driver interface cleanups") I
made devpts_get_priv() just return the dentry->fs_data directly.  And
because I thought it wouldn't happen, I added a warning if you ever saw
a pts node that wasn't on devpts.

And no, that warning never triggered under any actual real use, but you
can trigger it by creating nonsensical pts nodes by hand.

So just revert the warning, and make devpts_get_priv() return NULL for
that case like it used to.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org # 4.6+
Cc: Eric W Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-03 11:02:50 -07:00
Linus Torvalds f28929ba36 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
 "Most of this is regression fixes for posix acl behavior introduced in
  4.8-rc1 (these were caught by the pjd-fstest suite).  The are also
  miscellaneous fixes marked as stable material and cleanups.

  Other than overlayfs code, it touches <linux/fs.h> to add a constant
  with which to disable posix acl caching.  No changes needed to the
  actual caching code, it automatically does the right thing, although
  later we may want to optimize this case.

  I'm now testing overlayfs with the following test suites to catch
  regressions:

   - unionmount-testsuite
   - xfstests
   - pjd-fstest"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: update doc
  ovl: listxattr: use strnlen()
  ovl: Switch to generic_getxattr
  ovl: copyattr after setting POSIX ACL
  ovl: Switch to generic_removexattr
  ovl: Get rid of ovl_xattr_noacl_handlers array
  ovl: Fix OVL_XATTR_PREFIX
  ovl: fix spelling mistake: "directries" -> "directories"
  ovl: don't cache acl on overlay layer
  ovl: use cached acl on underlying layer
  ovl: proper cleanup of workdir
  ovl: remove posix_acl_default from workdir
  ovl: handle umask and posix_acl_default correctly on creation
  ovl: don't copy up opaqueness
2016-09-02 09:32:15 -07:00
Linus Torvalds 511a8cdb65 Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
 "Two small patches to fix some bugs with the audit-by-executable
  functionality we introduced back in v4.3 (both patches are marked
  for the stable folks)"

* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
  audit: fix exe_file access in audit_exe_compare
  mm: introduce get_task_exe_file
2016-09-01 15:55:56 -07:00
Linus Torvalds 7d1ce606a3 Merge tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs and iomap fixes from Dave Chinner:
 "Most of these changes are small regression fixes that address problems
  introduced in the 4.8-rc1 window.  The two fixes that aren't (IO
  completion fix and superblock inprogress check) are fixes for problems
  introduced some time ago and need to be pushed back to stable kernels.

  Changes in this update:
   - iomap FIEMAP_EXTENT_MERGED usage fix
   - additional mount-time feature restrictions
   - rmap btree query fixes
   - freeze/unmount io completion workqueue fix
   - memory corruption fix for deferred operations handling"

* tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: track log done items directly in the deferred pending work item
  iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems
  xfs: prevent dropping ioend completions during buftarg wait
  xfs: fix superblock inprogress check
  xfs: simple btree query range should look right if LE lookup fails
  xfs: fix some key handling problems in _btree_simple_query_range
  xfs: don't log the entire end of the AGF
  xfs: disallow mounting of realtime + rmap filesystems
  xfs: don't perform lookups on zero-height btrees
2016-09-01 15:33:16 -07:00
Wang Xiaoguang e0af24849e btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true,
btrfs_async_reclaim_metadata_space() will not reclaim metadata space, just
return directly and also forget to wake up process which are waiting for
their tickets, so these processes will wait endlessly.

Fstests case generic/172 with mount option "-o compress=lzo" have revealed
this bug in my test machine. Here if we have tickets to handle, we must
handle them first.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-01 17:23:24 +02:00
Liu Bo a9b1fc851d Btrfs: fix endless loop in balancing block groups
Qgroup function may overwrite the saved error 'err' with 0
in case quota is not enabled, and this ends up with a
endless loop in balance because we keep going back to balance
the same block group.

It really should use 'ret' instead.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-01 17:16:47 +02:00
Josef Bacik 3dc09ec895 Btrfs: kill invalid ASSERT() in process_all_refs()
Suppose you have the following tree in snap1 on a file system mounted with -o
inode_cache so that inode numbers are recycled

└── [    258]  a
    └── [    257]  b

and then you remove b, rename a to c, and then re-create b in c so you have the
following tree

└── [    258]  c
    └── [    257]  b

and then you try to do an incremental send you will hit

ASSERT(pending_move == 0);

in process_all_refs().  This is because we assume that any recycling of inodes
will not have a pending change in our path, which isn't the case.  This is the
case for the DELETE side, since we want to remove the old file using the old
path, but on the create side we could have a pending move and need to do the
normal pending rename dance.  So remove this ASSERT() and put a comment about
why we ignore pending_move.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-01 17:16:47 +02:00