Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
This patch removes the following build warning:
fs/f2fs/node.c: warning: 'nofs' may be used uninitialized in this function
[-Wuninitialized]: => 738:8
Note that this is a false alarm.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Pull f2fs cleanup patches from Al Viro:
f2fs: get rid of fake on-stack dentries
f2fs: switch init_inode_metadata() to passing parent and name separately
f2fs: switch new_inode_page() from dentry to qstr
f2fs: init_dent_inode() should take qstr
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Conflicts:
fs/f2fs/recovery.c
In the SSR case, the max gc cost should be the number of pages in a segment.
Otherwise, f2fs is able to fail getting dirty segments frequently for SSR.
In get_victim_by_default() previously,
while(1) {
...
cost = get_gc_cost(); <- cost is between 0 ~ 512.
...
if (cost == get_max_cost(sbi, &p)) <- max cost is UINT_MAX due to GC_CB type
continue;
if (nsearched++ >= MAX_VICTIM_SEARCH)
break;
}
So, if there are a number of fully valid segments in series, f2fs cannot skip
those segments by comparing the cost and max cost of each segment.
Note that, the cost is the number of valid blocks at the time of the last
checkpoint.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch makes clearer the ambiguous f2fs_gc flow as follows.
1. Remove intermediate checkpoint condition during f2fs_gc
(i.e., should_do_checkpoint() and GC_BLOCKED)
2. Remove unnecessary return values of f2fs_gc because of #1.
(i.e., GC_NODE, GC_OK, etc)
3. Simplify write_checkpoint() because of #2.
4. Clarify the main f2fs_gc flow.
o monitor how many freed sections during one iteration of do_garbage_collect().
o do GC more without checkpoints if we can't get enough free sections.
o do checkpoint once we've got enough free sections through forground GCs.
5. Adopt thread-logging (Slack-Space-Recycle) scheme more aggressively on data
log types. See. get_ssr_segement()
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Instead of evaluating the free_sections and then deciding to return
true/false from that path. We can directly use the evaluation condition
for returning proper value.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Introduce accessor to get the sections based upon the block type
(node,dents...) and modify the functions : should_do_checkpoint,
has_not_enough_free_secs to use this accessor function to get
the node sections and dent sections.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When gc thread creation is failed, mark gc_thread as NULL to avoid
crash while trying to stop invalid thread in stop_gc_thread->kthread_stop.
Instead make it return from:
if (!gc_th)
return;
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Currently GC task is started for each f2fs formatted/mounted device.
But, when we check the task list, using 'ps', there is no distinguishing
factor between the tasks. So, name the task as per the block device just
like the flusher threads.
Also, remove the macro GC_THREAD_NAME and instead use the name: f2fs_gc
to avoid name length truncation, as the command length is 16
-> TASK_COMM_LEN 16 and example name like:
f2fs_gc_task:8:16 -> this exceeds name length
Before Patch for 2 F2FS formatted partitions:
root 28061 0.0 0.0 0 0 ? S 10:31 0:00 [f2fs_gc_task]
root 28087 0.0 0.0 0 0 ? S 10:32 0:00 [f2fs_gc_task]
After Patch:
root 16756 0.0 0.0 0 0 ? S 14:57 0:00 [f2fs_gc-8:18]
root 16765 0.0 0.0 0 0 ? S 14:57 0:00 [f2fs_gc-8:19]
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
1. If f2fs is mounted with background_gc_off option, checking
BG_GC is not redundant.
2. f2fs_balance_fs is checked in f2fs_gc, so this is also redundant.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
F2FS_SET_SB_DIRT is called in inc_page_count and
it is directly called one more time in the next line.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In f2fs, there are two superblocks. So when the first superblock was
invalidate, it should try to check another.
By Jaegeuk Kim:
o Remove a white space for coding style
o Clean up for code readability
o Fix a typo
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In some system PAGE_CACHE_SIZE isn't 4K. So using F2FS_BLKSIZE to judge.
By Jaegeuk Kim:
o f2fs does not support no other 4KB page cache size.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In file status, it can't distinguish between different devices.
So add device name to do this function.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
1. Background
Previously, if f2fs tries to move data blocks of an *evicting* inode during the
cleaning process, it stops the process incompletely and then restarts the whole
process, since it needs a locked inode to grab victim data pages in its address
space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally
used, but, it waits if the inode is on freeing.
So, here is a deadlock scenario.
1. f2fs_evict_inode() <- inode "A"
2. f2fs_balance_fs()
3. f2fs_gc()
4. gc_data_segment()
5. f2fs_iget() <- inode "A" too!
If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since
the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which
skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc()
to complete f2fs_evict_inode().
1. f2fs_evict_inode() <- inode "A"
2. f2fs_balance_fs()
3. f2fs_gc()
4. gc_data_segment()
5. f2fs_iget_nowait() <- inode "A", then stop f2fs_gc() w/ -ENOENT
2. Problem and Solution
In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if:
o there are not enough free sections, and
o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly.
So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in
f2fs_evict_inode().
The f2fs_evict_inode() actually truncates all the data and node blocks, which
means that it doesn't produce any dirty node pages accordingly.
So, we don't need to do f2fs_balance_fs() in practical.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Let's remove the use of page_cache_release() in f2fs, and instead, use
f2fs_put_page(page, 0) which is exactly same but for code readability.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In f2fs_inode_info structure, the description for data_version
has a typo mistake. It should be latest instead of lastes.
So, correcting that.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
We can remove unneeded label unlock_out, avoid unnecessary jump
and reorganize the returning conditions in this function.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>