Commit Graph

141 Commits

Author SHA1 Message Date
Linus Torvalds
7e0fb73c52 Merge branch 'hash' of git://ftp.sciencehorizons.net/linux
Pull string hash improvements from George Spelvin:
 "This series does several related things:

   - Makes the dcache hash (fs/namei.c) useful for general kernel use.

     (Thanks to Bruce for noticing the zero-length corner case)

   - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
     above.

   - Avoids 64-bit multiplies in hash_64() on 32-bit platforms.  Two
     32-bit multiplies will do well enough.

   - Rids the world of the bad hash multipliers in hash_32.

     This finishes the job started in commit 689de1d6ca ("Minimal
     fix-up of bad hashing behavior of hash_64()")

     The vast majority of Linux architectures have hardware support for
     32x32-bit multiply and so derive no benefit from "simplified"
     multipliers.

     The few processors that do not (68000, h8/300 and some models of
     Microblaze) have arch-specific implementations added.  Those
     patches are last in the series.

   - Overhauls the dcache hash mixing.

     The patch in commit 0fed3ac866 ("namei: Improve hash mixing if
     CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
     Replaced with a much more careful design that's simultaneously
     faster and better.  (My own invention, as there was noting suitable
     in the literature I could find.  Comments welcome!)

   - Modify the hash_name() loop to skip the initial HASH_MIX().  This
     would let us salt the hash if we ever wanted to.

   - Sort out partial_name_hash().

     The hash function is declared as using a long state, even though
     it's truncated to 32 bits at the end and the extra internal state
     contributes nothing to the result.  And some callers do odd things:

      - fs/hfs/string.c only allocates 32 bits of state
      - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes

   - Modify bytemask_from_count to handle inputs of 1..sizeof(long)
     rather than 0..sizeof(long)-1.  This would simplify users other
     than full_name_hash"

  Special thanks to Bruce Fields for testing and finding bugs in v1.  (I
  learned some humbling lessons about "obviously correct" code.)

  On the arch-specific front, the m68k assembly has been tested in a
  standalone test harness, I've been in contact with the Microblaze
  maintainers who mostly don't care, as the hardware multiplier is never
  omitted in real-world applications, and I haven't heard anything from
  the H8/300 world"

* 'hash' of git://ftp.sciencehorizons.net/linux:
  h8300: Add <asm/hash.h>
  microblaze: Add <asm/hash.h>
  m68k: Add <asm/hash.h>
  <linux/hash.h>: Add support for architecture-specific functions
  fs/namei.c: Improve dcache hash function
  Eliminate bad hash multipliers from hash_32() and  hash_64()
  Change hash_64() return value to 32 bits
  <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
  fs/namei.c: Add hashlen_string() function
  Pull out string hash to <linux/stringhash.h>
2016-05-28 16:15:25 -07:00
George Spelvin
f4bcbe792b Pull out string hash to <linux/stringhash.h>
... so they can be used without the rest of <linux/dcache.h>

The hashlen_* macros will make sense next patch.

Signed-off-by: George Spelvin <linux@sciencehorizons.net>
2016-05-28 15:42:40 -04:00
Al Viro
0e0162bb8c Merge branch 'ovl-fixes' into for-linus
Backmerge to resolve a conflict in ovl_lookup_real();
"ovl_lookup_real(): use lookup_one_len_unlocked()" instead,
but it was too late in the cycle to rebase.
2016-05-17 02:17:59 -04:00
Miklos Szeredi
54d5ca871e vfs: add vfs_select_inode() helper
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
2016-05-10 23:55:01 -04:00
Al Viro
d9171b9345 parallel lookups machinery, part 4 (and last)
If we *do* run into an in-lookup match, we need to wait for it to
cease being in-lookup.  Fortunately, we do have unused space in
in-lookup dentries - d_lru is never looked at until it stops being
in-lookup.

So we can stash a pointer to wait_queue_head from stack frame of
the caller of ->lookup().  Some precautions are needed while
waiting, but it's not that hard - we do hold a reference to dentry
we are waiting for, so it can't go away.  If it's found to be
in-lookup the wait_queue_head is still alive and will remain so
at least while ->d_lock is held.  Moreover, the condition we
are waiting for becomes true at the same point where everything
on that wq gets woken up, so we can just add ourselves to the
queue once.

d_alloc_parallel() gets a pointer to wait_queue_head_t from its
caller; lookup_slow() adjusted, d_add_ci() taught to use
d_alloc_parallel() if the dentry passed to it happens to be
in-lookup one (i.e. if it's been called from the parallel lookup).

That's pretty much it - all that remains is to switch ->i_mutex
to rwsem and have lookup_slow() take it shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:27 -04:00
Al Viro
94bdd655ca parallel lookups machinery, part 3
We will need to be able to check if there is an in-lookup
dentry with matching parent/name.  Right now it's impossible,
but as soon as start locking directories shared such beasts
will appear.

Add a secondary hash for locating those.  Hash chains go through
the same space where d_alias will be once it's not in-lookup anymore.
Search is done under the same bitlock we use for modifications -
with the primary hash we can rely on d_rehash() into the wrong
chain being the worst that could happen, but here the pointers are
buggered once it's removed from the chain.  On the other hand,
the chains are not going to be long and normally we'll end up
adding to the chain anyway.  That allows us to avoid bothering with
->d_lock when doing the comparisons - everything is stable until
removed from chain.

New helper: d_alloc_parallel().  Right now it allocates, verifies
that no hashed and in-lookup matches exist and adds to in-lookup
hash.

Returns ERR_PTR() for error, hashed match (in the unlikely case it's
been found) or new dentry.  In-lookup matches trigger BUG() for
now; that will change in the next commit when we introduce waiting
for ongoing lookup to finish.  Note that in-lookup matches won't be
possible until we actually go for shared locking.

lookup_slow() switched to use of d_alloc_parallel().

Again, these commits are separated only for making it easier to
review.  All this machinery will start doing something useful only
when we go for shared locking; it's just that the combination is
too large for my taste.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:27 -04:00
Al Viro
85c7f81041 beginning of transition to parallel lookups - marking in-lookup dentries
marked as such when (would be) parallel lookup is about to pass them
to actual ->lookup(); unmarked when
	* __d_add() is about to make it hashed, positive or not.
	* __d_move() (from d_splice_alias(), directly or via
__d_unalias()) puts a preexisting dentry in its place
	* in caller of ->lookup() if it has escaped all of the
above.  Bug (WARN_ON, actually) if it reaches the final dput()
or d_instantiate() while still marked such.

As the result, we are guaranteed that for as long as the flag is
set, dentry will
	* remain negative unhashed with positive refcount
	* never have its ->d_alias looked at
	* never have its ->d_lru looked at
	* never have its ->d_parent and ->d_name changed

Right now we have at most one such for any given parent directory.
With parallel lookups that restriction will weaken to
	* only exist when parent is locked shared
	* at most one with given (parent,name) pair (comparison of
names is according to ->d_compare())
	* only exist when there's no hashed dentry with the same
(parent,name)

Transition will take the next several commits; unfortunately, we'll
only be able to switch to rwsem at the end of this series.  The
reason for not making it a single patch is to simplify review.

New primitives: d_in_lookup() (a predicate checking if dentry is in
the in-lookup state) and d_lookup_done() (tells the system that
we are done with lookup and if it's still marked as in-lookup, it
should cease to be such).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:47:51 -04:00
Miklos Szeredi
d101a12595 fs: add file_dentry()
This series fixes bugs in nfs and ext4 due to 4bacc9c923 ("overlayfs:
Make f_path always point to the overlay and f_inode to the underlay").

Regular files opened on overlayfs will result in the file being opened on
the underlying filesystem, while f_path points to the overlayfs
mount/dentry.

This confuses filesystems which get the dentry from struct file and assume
it's theirs.

Add a new helper, file_dentry() [*], to get the filesystem's own dentry
from the file.  This checks file->f_path.dentry->d_flags against
DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not
set (this is the common, non-overlayfs case).

In the uncommon case it will call into overlayfs's ->d_real() to get the
underlying dentry, matching file_inode(file).

The reason we need to check against the inode is that if the file is copied
up while being open, d_real() would return the upper dentry, while the open
file comes from the lower dentry.

[*] If possible, it's better simply to use file_inode() instead.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: <stable@vger.kernel.org> # v4.2
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Axtens <dja@axtens.net>
2016-03-26 16:14:37 -04:00
Linus Torvalds
d407574e79 Merge tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New Features:
   - uplift filesystem encryption into fs/crypto/
   - give sysfs entries to control memroy consumption

  Enhancements:
   - aio performance by preallocating blocks in ->write_iter
   - use writepages lock for only WB_SYNC_ALL
   - avoid redundant inline_data conversion
   - enhance forground GC
   - use wait_for_stable_page as possible
   - speed up SEEK_DATA and fiiemap

  Bug Fixes:
   - corner case in terms of -ENOSPC for inline_data
   - hung task caused by long latency in shrinker
   - corruption between atomic write and f2fs_trace_pid
   - avoid garbage lengths in dentries
   - revoke atomicly written pages if an error occurs

  In addition, there are various minor bug fixes and clean-ups"

* tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
  f2fs: submit node page write bios when really required
  f2fs: add missing argument to f2fs_setxattr stub
  f2fs: fix to avoid unneeded unlock_new_inode
  f2fs: clean up opened code with f2fs_update_dentry
  f2fs: declare static functions
  f2fs: use cryptoapi crc32 functions
  f2fs: modify the readahead method in ra_node_page()
  f2fs crypto: sync ext4_lookup and ext4_file_open
  fs crypto: move per-file encryption from f2fs tree to fs/crypto
  f2fs: mutex can't be used by down_write_nest_lock()
  f2fs: recovery missing dot dentries in root directory
  f2fs: fix to avoid deadlock when merging inline data
  f2fs: introduce f2fs_flush_merged_bios for cleanup
  f2fs: introduce f2fs_update_data_blkaddr for cleanup
  f2fs crypto: fix incorrect positioning for GCing encrypted data page
  f2fs: fix incorrect upper bound when iterating inode mapping tree
  f2fs: avoid hungtask problem caused by losing wake_up
  f2fs: trace old block address for CoWed page
  f2fs: try to flush inode after merging inline data
  f2fs: show more info about superblock recovery
  ...
2016-03-21 11:03:02 -07:00
Jaegeuk Kim
0b81d07790 fs crypto: move per-file encryption from f2fs tree to fs/crypto
This patch adds the renamed functions moved from the f2fs crypto files.

1. definitions for per-file encryption used by ext4 and f2fs.

2. crypto.c for encrypt/decrypt functions
 a. IO preparation:
  - fscrypt_get_ctx / fscrypt_release_ctx
 b. before IOs:
  - fscrypt_encrypt_page
  - fscrypt_decrypt_page
  - fscrypt_zeroout_range
 c. after IOs:
  - fscrypt_decrypt_bio_pages
  - fscrypt_pullback_bio_page
  - fscrypt_restore_control_page

3. policy.c supporting context management.
 a. For ioctls:
  - fscrypt_process_policy
  - fscrypt_get_policy
 b. For context permission
  - fscrypt_has_permitted_context
  - fscrypt_inherit_context

4. keyinfo.c to handle permissions
  - fscrypt_get_encryption_info
  - fscrypt_free_encryption_info

5. fname.c to support filename encryption
 a. general wrapper functions
  - fscrypt_fname_disk_to_usr
  - fscrypt_fname_usr_to_disk
  - fscrypt_setup_filename
  - fscrypt_free_filename

 b. specific filename handling functions
  - fscrypt_fname_alloc_buffer
  - fscrypt_fname_free_buffer

6. Makefile and Kconfig

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:33 -07:00
Al Viro
34d0d19dc0 uninline d_add()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14 00:17:24 -04:00
Al Viro
668d0cd56e replace d_add_unique() with saner primitive
new primitive: d_exact_alias(dentry, inode).  If there is an unhashed
dentry with the same name/parent and given inode, rehash, grab and
return it.  Otherwise, return NULL.  The only caller of d_add_unique()
switched to d_exact_alias() + d_splice_alias().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-14 00:17:20 -04:00
Al Viro
a528aca7f3 use ->d_seq to get coherency between ->d_inode and ->d_flags
Games with ordering and barriers are way too brittle.  Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-02-29 12:16:43 -05:00
Andrew Morton
2bd03e49d6 include/linux/dcache.h: remove semicolons from HASH_LEN_DECLARE
A little cleanup - the invocation site provdes the semicolon.

Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
Nicolas Iooss
8db1486065 include, lib: add __printf attributes to several function prototypes
Using __printf attributes helps to detect several format string issues
at compile time (even though -Wformat-security is currently disabled in
Makefile).  For example it can detect when formatting a pointer as a
number, like the issue fixed in commit a3fa71c40f ("wl18xx: show
rx_frames_per_rates as an array as it really is"), or when the arguments
do not match the format string, c.f.  for example commit 5ce1aca814
("reiserfs: fix __RASSERT format string").

To prevent similar bugs in the future, add a __printf attribute to every
function prototype which needs one in include/linux/ and lib/.  These
functions were mostly found by using gcc's -Wsuggest-attribute=format
flag.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-17 16:39:53 -07:00
Al Viro
dc3f4198ea make simple_positive() public
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:02:01 -04:00
David Howells
4bacc9c923 overlayfs: Make f_path always point to the overlay and f_inode to the underlay
Make file->f_path always point to the overlay dentry so that the path in
/proc/pid/fd is correct and to ensure that label-based LSMs have access to the
overlay as well as the underlay (path-based LSMs probably don't need it).

Using my union testsuite to set things up, before the patch I see:

	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
	...
	lr-x------. 1 root root 64 Jun  5 14:38 5 -> /a/foo107
	[root@andromeda union-testsuite]# stat /mnt/a/foo107
	...
	Device: 23h/35d Inode: 13381       Links: 1
	...
	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
	...
	Device: 23h/35d Inode: 13381       Links: 1
	...

After the patch:

	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
	...
	lr-x------. 1 root root 64 Jun  5 14:22 5 -> /mnt/a/foo107
	[root@andromeda union-testsuite]# stat /mnt/a/foo107
	...
	Device: 23h/35d Inode: 40346       Links: 1
	...
	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
	...
	Device: 23h/35d Inode: 40346       Links: 1
	...

Note the change in where /proc/$$/fd/5 points to in the ls command.  It was
pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
(which is correct).

The inode accessed, however, is the lower layer.  The union layer is on device
25h/37d and the upper layer on 24h/36d.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-19 03:19:32 -04:00
David Howells
4bf46a2726 VFS: Impose ordering on accesses of d_inode and d_flags
Impose ordering on accesses of d_inode and d_flags to avoid the need to do
this:

	if (!dentry->d_inode || d_is_negative(dentry)) {

when this:

	if (d_is_negative(dentry)) {

should suffice.

This check is especially problematic if a dentry can have its type field set
to something other than DENTRY_MISS_TYPE when d_inode is NULL (as in
unionmount).

What we really need to do is stick a write barrier between setting d_inode and
setting d_flags and a read barrier between reading d_flags and reading
d_inode.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:05:28 -04:00
David Howells
525d27b235 VFS: Add owner-filesystem positive/negative dentry checks
Supply two functions to test whether a filesystem's own dentries are positive
or negative (d_really_is_positive() and d_really_is_negative()).

The problem is that the DCACHE_ENTRY_TYPE field of dentry->d_flags may be
overridden by the union part of a layered filesystem and isn't thus
necessarily indicative of the type of dentry.

Normally, this would involve a negative dentry (ie. ->d_inode == NULL) having
->d_layer.lower pointed to a lower layer dentry, DCACHE_PINNING_LOWER set and
the DCACHE_ENTRY_TYPE field set to something other than DCACHE_MISS_TYPE - but
it could also involve, say, a DCACHE_SPECIAL_TYPE being overridden to
DCACHE_WHITEOUT_TYPE if a 0,0 chardev is detected in the top layer.

However, inside a filesystem, when that fs is looking at its own dentries, it
probably wants to know if they are really negative or not - and doesn't care
about the fallthrough bits used by the union.

To this end, a filesystem should normally use d_really_is_positive/negative()
when looking at its own dentries rather than d_is_positive/negative() and
should use d_inode() to get at the inode.

Anyone looking at someone else's dentries (this includes pathwalk) should use
d_is_xxx() and d_backing_inode().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:04:42 -04:00
David Howells
44bdb5e5f6 VFS: Split DCACHE_FILE_TYPE into regular and special types
Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular
files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and
socket files).

d_is_reg() and d_is_special() are added to detect these subtypes and
d_is_file() is left as the union of the two.

This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to
use d_is_reg(dentry) instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:38 -05:00
David Howells
df1a085af1 VFS: Add a fallthrough flag for marking virtual dentries
Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is
a virtual dentry that covers another one in a lower layer that should be used
instead.  This may be recorded on medium if directory integration is stored
there.

The flag can be set with d_set_fallthru() and tested with d_is_fallthru().

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:38 -05:00
David Howells
e7f7d2253c VFS: Add a whiteout dentry type
Add DCACHE_WHITEOUT_TYPE and provide a d_is_whiteout() accessor function.  A
d_is_miss() accessor is also added for ordinary cache misses and
d_is_negative() is modified to indicate either an ordinary miss or an enforced
miss (whiteout).

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:37 -05:00
David Howells
155e35d4da VFS: Introduce inode-getting helpers for layered/unioned fs environments
Introduce some function for getting the inode (and also the dentry) in an
environment where layered/unioned filesystems are in operation.

The problem is that we have places where we need *both* the union dentry and
the lower source or workspace inode or dentry available, but we can only have
a handle on one of them.  Therefore we need to derive the handle to the other
from that.

The idea is to introduce an extra field in struct dentry that allows the union
dentry to refer to and pin the lower dentry.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:15 -05:00
Al Viro
d6cb125b99 kill d_validate()
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:26 -05:00
Al Viro
41d28bca2d switch d_materialise_unique() users to d_splice_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:20 -05:00