When file attributes are changed most filesystems rely on the
setattr_prepare(), setattr_copy(), and notify_change() helpers for
initialization and permission checking. Let them handle idmapped mounts.
If the inode is accessed through an idmapped mount map it into the
mount's user namespace. Afterwards the checks are identical to
non-idmapped mounts. If the initial user namespace is passed nothing
changes so non-idmapped mounts will see identical behavior as before.
Helpers that perform checks on the ia_uid and ia_gid fields in struct
iattr assume that ia_uid and ia_gid are intended values and have already
been mapped correctly at the userspace-kernelspace boundary as we
already do today. If the initial user namespace is passed nothing
changes so non-idmapped mounts will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public licence as published by
the free software foundation either version 2 of the licence or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 114 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pass the object size in to fscache_acquire_cookie() and
fscache_write_page() rather than the netfs providing a callback by which it
can be received. This makes it easier to update the size of the object
when a new page is written that extends the object.
The current object size is also passed by fscache to the check_aux
function, obviating the need to store it in the aux data.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
Attach copies of the index key and auxiliary data to the fscache cookie so
that:
(1) The callbacks to the netfs for this stuff can be eliminated. This
can simplify things in the cache as the information is still
available, even after the cache has relinquished the cookie.
(2) Simplifies the locking requirements of accessing the information as we
don't have to worry about the netfs object going away on us.
(3) The cache can do lazy updating of the coherency information on disk.
As long as the cache is flushed before reboot/poweroff, there's no
need to update the coherency info on disk every time it changes.
(4) Cookies can be hashed or put in a tree as the index key is easily
available. This allows:
(a) Checks for duplicate cookies can be made at the top fscache layer
rather than down in the bowels of the cache backend.
(b) Caching can be added to a netfs object that has a cookie if the
cache is brought online after the netfs object is allocated.
A certain amount of space is made in the cookie for inline copies of the
data, but if it won't fit there, extra memory will be allocated for it.
The downside of this is that live cache operation requires more memory.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
Add some tracepoints to fscache:
(*) fscache_cookie - Tracks a cookie's usage count.
(*) fscache_netfs - Logs registration of a network filesystem, including
the pointer to the cookie allocated.
(*) fscache_acquire - Logs cookie acquisition.
(*) fscache_relinquish - Logs cookie relinquishment.
(*) fscache_enable - Logs enablement of a cookie.
(*) fscache_disable - Logs disablement of a cookie.
(*) fscache_osm - Tracks execution of states in the object state machine.
and cachefiles:
(*) cachefiles_ref - Tracks a cachefiles object's usage count.
(*) cachefiles_lookup - Logs result of lookup_one_len().
(*) cachefiles_mkdir - Logs result of vfs_mkdir().
(*) cachefiles_create - Logs result of vfs_create().
(*) cachefiles_unlink - Logs calls to vfs_unlink().
(*) cachefiles_rename - Logs calls to vfs_rename().
(*) cachefiles_mark_active - Logs an object becoming active.
(*) cachefiles_wait_active - Logs a wait for an old object to be
destroyed.
(*) cachefiles_mark_inactive - Logs an object becoming inactive.
(*) cachefiles_mark_buried - Logs the burial of an object.
Signed-off-by: David Howells <dhowells@redhat.com>
An NULL-pointer dereference happens in cachefiles_mark_object_inactive()
when it tries to read i_blocks so that it can tell the cachefilesd daemon
how much space it's making available.
The problem is that cachefiles_drop_object() calls
cachefiles_mark_object_inactive() after calling cachefiles_delete_object()
because the object being marked active staves off attempts to (re-)use the
file at that filename until after it has been deleted. This means that
d_inode is NULL by the time we come to try to access it.
To fix the problem, have the caller of cachefiles_mark_object_inactive()
supply the number of blocks freed up.
Without this, the following oops may occur:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
...
CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G I ------------ 3.10.0-470.el7.x86_64 #1
Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
RSP: 0018:ffff8800b77c3d70 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
FS: 0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
Call Trace:
[<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
[<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
[<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
[<ffffffff810a605b>] process_one_work+0x17b/0x470
[<ffffffff810a6e96>] worker_thread+0x126/0x410
[<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
[<ffffffff810ae64f>] kthread+0xcf/0xe0
[<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
[<ffffffff81695418>] ret_from_fork+0x58/0x90
[<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
The oopsing code shows:
callq 0xffffffff810af6a0 <wake_up_bit>
mov 0xf8(%r12),%rax
mov 0x30(%rax),%rax
mov 0x98(%rax),%rax <---- oops here
lock add %rax,0x130(%rbx)
where this is:
d_backing_inode(object->dentry)->i_blocks
Fixes: a5b3a80b89 (CacheFiles: Provide read-and-reset release counters for cachefilesd)
Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
__fscache_check_consistency() calls check_consistency() callback
and return the callback's return value. But the return type of
check_consistency() is bool. So __fscache_check_consistency()
return 1 if the cache is inconsistent. This is inconsistent with
the document.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Provide read-and-reset objects- and blocks-released counters for cachefilesd
to use to work out whether there's anything new that can be culled.
One of the problems cachefilesd has is that if all the objects in the cache
are pinned by inodes lying dormant in the kernel inode cache, there isn't
anything for it to cull. In such a case, it just spins around walking the
filesystem tree and scanning for something to cull. This eats up a lot of
CPU time.
By telling cachefilesd if there have been any releases, the daemon can
sleep until there is the possibility of something to do.
cachefilesd finds this information by the following means:
(1) When the control fd is read, the kernel presents a list of values of
interest. "freleased=N" and "breleased=N" are added to this list to
indicate the number of files released and number of blocks released
since the last read call. At this point the counters are reset.
(2) POLLIN is signalled if the number of files released becomes greater
than 0.
Note that by 'released' it just means that the kernel has released its
interest in those files for the moment, not necessarily that the files
should be deleted from the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cachefiles should perform fs modifications (eg. vfs_unlink()) on the top layer
only and should not attempt to alter the lower layer.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack
thereof) in cachefiles:
(1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as
it doesn't want to deal with automounts in its cache.
(2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in
cachefiles.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If a cache object gets killed whilst in the process of being set up - for
instance if the netfs relinquishes the cookie that the object is associated
with - then the object's state machine will transit to the DROP_OBJECT state
without necessarily going through the LOOKUP_OBJECT or CREATE_OBJECT states.
This is a problem for CacheFiles because cachefiles_drop_object() assumes that
object->dentry will be set upon reaching the DROP_OBJECT state and has an
ASSERT() to that effect (see the oops below) - but object->dentry doesn't get
set until the LOOKUP_OBJECT or CREATE_OBJECT states (and not always then if
they fail).
To fix this, just make the dentry cleanup in cachefiles_drop_object()
conditional on the dentry actually being set and remove the assertion.
CacheFiles: Assertion failed
------------[ cut here ]------------
kernel BUG at .../fs/cachefiles/namei.c:425!
...
Workqueue: fscache_object fscache_object_work_func [fscache]
...
RIP: ... cachefiles_delete_object+0xcd/0x110 [cachefiles]
...
Call Trace:
[<ffffffffa043280f>] ? cachefiles_drop_object+0xff/0x130 [cachefiles]
[<ffffffffa02ac511>] ? fscache_drop_object+0xd1/0x1d0 [fscache]
[<ffffffffa02ac697>] ? fscache_object_work_func+0x87/0x210 [fscache]
[<ffffffff81080635>] ? process_one_work+0x155/0x450
[<ffffffff81081c44>] ? worker_thread+0x114/0x370
[<ffffffff81081b30>] ? manage_workers.isra.21+0x2c0/0x2c0
[<ffffffff81087fcc>] ? kthread+0xbc/0xe0
[<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
[<ffffffff8150638c>] ? ret_from_fork+0x7c/0xb0
[<ffffffff81087f10>] ? flush_kthread_worker+0xa0/0xa0
Reported-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
Provide the ability to enable and disable fscache cookies. A disabled cookie
will reject or ignore further requests to:
Acquire a child cookie
Invalidate and update backing objects
Check the consistency of a backing object
Allocate storage for backing page
Read backing pages
Write to backing pages
but still allows:
Checks/waits on the completion of already in-progress objects
Uncaching of pages
Relinquishment of cookies
Two new operations are provided:
(1) Disable a cookie:
void fscache_disable_cookie(struct fscache_cookie *cookie,
bool invalidate);
If the cookie is not already disabled, this locks the cookie against other
dis/enablement ops, marks the cookie as being disabled, discards or
invalidates any backing objects and waits for cessation of activity on any
associated object.
This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
but it reinitialises the cookie such that it can be reenabled.
All possible failures are handled internally. The caller should consider
calling fscache_uncache_all_inode_pages() afterwards to make sure all page
markings are cleared up.
(2) Enable a cookie:
void fscache_enable_cookie(struct fscache_cookie *cookie,
bool (*can_enable)(void *data),
void *data)
If the cookie is not already enabled, this locks the cookie against other
dis/enablement ops, invokes can_enable() and, if the cookie is not an
index cookie, will begin the procedure of acquiring backing objects.
The optional can_enable() function is passed the data argument and returns
a ruling as to whether or not enablement should actually be permitted to
begin.
All possible failures are handled internally. The cookie will only be
marked as enabled if provisional backing objects are allocated.
A later patch will introduce these to NFS. Cookie enablement during nfs_open()
is then contingent on i_writecount <= 0. can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR). This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.
One operation has its API modified:
(3) Acquire a cookie.
struct fscache_cookie *fscache_acquire_cookie(
struct fscache_cookie *parent,
const struct fscache_cookie_def *def,
void *netfs_data,
bool enable);
This now has an additional argument that indicates whether the requested
cookie should be enabled by default. It doesn't need the can_enable()
function because the caller must prevent multiple calls for the same netfs
object and it doesn't need to take the enablement lock because no one else
can get at the cookie before this returns.
Signed-off-by: David Howells <dhowells@redhat.com
Simplify the way fscache cache objects retain their cookie. The way I
implemented the cookie storage handling made synchronisation a pain (ie. the
object state machine can't rely on the cookie actually still being there).
Instead of the the object being detached from the cookie and the cookie being
freed in __fscache_relinquish_cookie(), we defer both operations:
(*) The detachment of the object from the list in the cookie now takes place
in fscache_drop_object() and is thus governed by the object state machine
(fscache_detach_from_cookie() has been removed).
(*) The release of the cookie is now in fscache_object_destroy() - which is
called by the cache backend just before it frees the object.
This means that the fscache_cookie struct is now available to the cache all the
way through from ->alloc_object() to ->drop_object() and ->put_object() -
meaning that it's no longer necessary to take object->lock to guarantee access.
However, __fscache_relinquish_cookie() doesn't wait for the object to go all
the way through to destruction before letting the netfs proceed. That would
massively slow down the netfs. Since __fscache_relinquish_cookie() leaves the
cookie around, in must therefore break all attachments to the netfs - which
includes ->def, ->netfs_data and any outstanding page read/writes.
To handle this, struct fscache_cookie now has an n_active counter:
(1) This starts off initialised to 1.
(2) Any time the cache needs to get at the netfs data, it calls
fscache_use_cookie() to increment it - if it is not zero. If it was zero,
then access is not permitted.
(3) When the cache has finished with the data, it calls fscache_unuse_cookie()
to decrement it. This does a wake-up on it if it reaches 0.
(4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
reach 0. The initialisation to 1 in step (1) ensures that we only get
wake ups when we're trying to get rid of the cookie.
This leaves __fscache_relinquish_cookie() a lot simpler.
***
This fixes a problem in the current code whereby if fscache_invalidate() is
followed sufficiently quickly by fscache_relinquish_cookie() then it is
possible for __fscache_relinquish_cookie() to have detached the cookie from the
object and cleared the pointer before a thread is dispatched to process the
invalidation state in the object state machine.
Since the pending write clearance was deferred to the invalidation state to
make it asynchronous, we need to either wait in relinquishment for the stores
tree to be cleared in the invalidation state or we need to handle the clearance
in relinquishment.
Further, if the relinquishment code does clear the tree, then the invalidation
state need to make the clearance contingent on still having the cookie to hand
(since that's where the tree is rooted) and we have to prevent the cookie from
disappearing for the duration.
This can lead to an oops like the following:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
...
RIP: 0010:[<ffffffff8151023e>] _spin_lock+0xe/0x30
...
CR2: 000000000000000c ...
...
Process kslowd002 (...)
....
Call Trace:
[<ffffffffa01c3278>] fscache_invalidate_writes+0x38/0xd0 [fscache]
[<ffffffff810096f0>] ? __switch_to+0xd0/0x320
[<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
[<ffffffff8110ddd4>] ? slow_work_enqueue+0x104/0x180
[<ffffffffa01c1303>] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
[<ffffffff81096b67>] ? bit_waitqueue+0x17/0xd0
[<ffffffff8110e233>] slow_work_execute+0x233/0x310
[<ffffffff8110e515>] slow_work_thread+0x205/0x360
[<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8110e310>] ? slow_work_thread+0x0/0x360
[<ffffffff81096936>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810968a0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
The parameter to fscache_invalidate_writes() was object->cookie which is NULL.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Fix object state machine to have separate work and wait states as that makes
it easier to envision.
There are now three kinds of state:
(1) Work state. This is an execution state. No event processing is performed
by a work state. The function attached to a work state returns a pointer
indicating the next state to which the OSM should transition. Returning
NO_TRANSIT repeats the current state, but goes back to the scheduler
first.
(2) Wait state. This is an event processing state. No execution is
performed by a wait state. Wait states are just tables of "if event X
occurs, clear it and transition to state Y". The dispatcher returns to
the scheduler if none of the events in which the wait state has an
interest are currently pending.
(3) Out-of-band state. This is a special work state. Transitions to normal
states can be overridden when an unexpected event occurs (eg. I/O error).
Instead the dispatcher disables and clears the OOB event and transits to
the specified work state. This then acts as an ordinary work state,
though object->state points to the overridden destination. Returning
NO_TRANSIT resumes the overridden transition.
In addition, the states have names in their definitions, so there's no need for
tables of state names. Further, the EV_REQUEUE event is no longer necessary as
that is automatic for work states.
Since the states are now separate structs rather than values in an enum, it's
not possible to use comparisons other than (non-)equality between them, so use
some object->flags to indicate what phase an object is in.
The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
(EV_KILL). An object flag now carries the information about retirement.
Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been merged
into an KILL_OBJECT state and additional states have been added for handling
waiting dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).
A state has also been added for synchronising with parent object initialisation
(WAIT_FOR_PARENT) and another for initiating look up (PARENT_READY).
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Milosz Tanski <milosz@adfin.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Mark as cancelled an operation that is in progress rather than pending at the
time it is cancelled, and call fscache_complete_op() to cancel an operation so
that blocked ops can be started.
Signed-off-by: David Howells <dhowells@redhat.com>