Commit Graph

334 Commits

Author SHA1 Message Date
Linus Torvalds
a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Jan Kara
1e8b212fe5 ext2: Implement freezing
The only missing piece to make freezing work reliably with ext2 is to
stop iput() of unlinked inode from deleting the inode on frozen filesystem.
So add a necessary protection to ext2_evict_inode().

We also provide appropriate ->freeze_fs and ->unfreeze_fs functions.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 09:45:53 +04:00
Akinobu Mita
ecd0afa3ce ext2: use memweight()
Convert ext2_count_free() to use memweight() instead of table lookup
based counting clear bits implementation.  This change only affects the
code segments enabled by EXT2FS_DEBUG.

Note that this memweight() call can't be replaced with a single
bitmap_weight() call, although the pointer to the memory area is aligned
to long-word boundary.  Because the size of the memory area may not be a
multiple of BITS_PER_LONG, then it returns wrong value on big-endian
architecture.

This also includes the following changes.

- Remove unnecessary map == NULL check in ext2_count_free() which
  always takes non-null pointer as the memory area.

- Fix printk format warning that only reveals with EXT2FS_DEBUG.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30 17:25:16 -07:00
Linus Torvalds
08d9329c29 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc udf, ext2, ext3, and isofs fixes from Jan Kara:
 "Assorted, mostly trivial, fixes for udf, ext2, ext3, and isofs.  I'm
  on vacation and scarcely checking email since we are expecting baby
  any day now but these fixes should be safe to go in and I don't want
  to delay them unnecessarily."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: avoid info leak on export
  isofs: avoid info leak on export
  udf: Improve table length check to avoid possible overflow
  ext3: Check return value of blkdev_issue_flush()
  jbd: Check return value of blkdev_issue_flush()
  udf: Do not decrement i_blocks when freeing indirect extent block
  udf: Fix memory leak when mounting
  ext2: cleanup the confused goto label
  UDF: Remove unnecessary variable "offset" from udf_fill_inode
  udf: stop using s_dirt
  ext3: force ro mount if ext3_setup_super() fails
  quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>
2012-07-24 17:40:44 -07:00
Al Viro
8fc37ec54c don't expose I_NEW inodes via dentry->d_inode
d_instantiate(dentry, inode);
	unlock_new_inode(inode);

is a bad idea; do it the other way round...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-23 00:00:58 +04:00
Jan Kara
a117782571 quota: Move quota syncing to ->sync_fs method
Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:34 +04:00
Al Viro
ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro
00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Wanlong Gao
e124a32043 ext2: cleanup the confused goto label
Cleanup the confused goto label, since the big lock has been removed.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Linus Torvalds
90324cc1b1 Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback tree from Wu Fengguang:
 "Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Avoid iput() from flusher thread
  vfs: Rename end_writeback() to clear_inode()
  vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
  writeback: Refactor writeback_single_inode()
  writeback: Remove wb->list_lock from writeback_single_inode()
  writeback: Separate inode requeueing after writeback
  writeback: Move I_DIRTY_PAGES handling
  writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
  writeback: Move clearing of I_SYNC into inode_sync_complete()
  writeback: initialize global_dirty_limit
  fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
  mm: page-writeback.c: local functions should not be exposed globally
2012-05-28 09:54:45 -07:00
Linus Torvalds
ece78b7df7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3 and quota fixes from Jan Kara:
 "Interesting bits are:
   - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
     quota code does not need i_mutex anymore in any unusual way.
   - backport (from ext4) of a fix of a checkpointing bug (missing cache
     flush) that could lead to fs corruption on power failure

  The rest are just random small fixes & cleanups."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: trivial fix to comment for ext2_free_blocks
  ext2: remove the redundant comment for ext2_export_ops
  ext3: return 32/64-bit dir name hash according to usage type
  quota: Get rid of nested I_MUTEX_QUOTA locking subclass
  quota: Use precomputed value of sb_dqopt in dquot_quota_sync
  ext2: Remove i_mutex use from ext2_quota_write()
  reiserfs: Remove i_mutex use from reiserfs_quota_write()
  ext4: Remove i_mutex use from ext4_quota_write()
  ext3: Remove i_mutex use from ext3_quota_write()
  quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
  jbd: Write journal superblock with WRITE_FUA after checkpointing
  jbd: protect all log tail updates with j_checkpoint_mutex
  jbd: Split updating of journal superblock and marking journal empty
  ext2: do not register write_super within VFS
  ext2: Remove s_dirt handling
  ext2: write superblock only once on unmount
  ext3: update documentation with barrier=1 default
  ext3: remove max_debt in find_group_orlov()
  jbd: Refine commit writeout logic
2012-05-25 08:14:59 -07:00
Linus Torvalds
644473e9c6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
2012-05-23 17:42:39 -07:00
Wang Sheng-Hui
0324876628 ext2: trivial fix to comment for ext2_free_blocks
The function is ext2_free_blocks(), not ext2_free_blocks_sb().

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-18 15:23:08 +02:00
Eric W. Biederman
b8a9f9e183 userns: Convert ext2 to use kuid/kgid where appropriate.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:26 -07:00
Wang Sheng-Hui
aa9e939d52 ext2: remove the redundant comment for ext2_export_ops
The comment is outdated and isn't particularly informative anyway - NULL
meaning the default behavior is very common in kernel. And we really set about
half of entries. So remove the whole comment for ext2_export_ops.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:39 +02:00
Jan Kara
e2a3fde750 ext2: Remove i_mutex use from ext2_quota_write()
We don't need i_mutex in ext2_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:38 +02:00
Linus Torvalds
26fe575028 vfs: make it possible to access the dentry hash/len as one 64-bit entry
This allows comparing hash and len in one operation on 64-bit
architectures.  Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer.  This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 19:54:35 -07:00
Jan Kara
dbd5768f87 vfs: Rename end_writeback() to clear_inode()
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:41 +08:00
Artem Bityutskiy
f72cf5e223 ext2: do not register write_super within VFS
Jan Kara removed 'sb->s_dirt' VFS flag references, so we do not need to
register the ext2 'ext2_write_super()' method in the VFS superblock operations,
because 'sb->s_dirt' won't be ever set to 1 and VFS won't ever call
'->write_super()' anyway. Thus, remove the method.

Tested using xfstests.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:46 +02:00
Jan Kara
b838ec2232 ext2: Remove s_dirt handling
Places which modify superblock feature / state fields mark the superblock
buffer dirty so it is written out by flusher thread. Thus there's no need to
set s_dirt there.

The only other fields changing in the superblock are the numbers of free
blocks, free inodes and s_wtime. There's no real need to write (or even
compute) these periodically. Free blocks / inodes counters are recomputed on
every mount from group counters anyway and value of s_wtime is only
informational and imprecise anyway. So it should be enough to write these
opportunistically on mount, remount, umount, and sync_fs times.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:45 +02:00
Artem Bityutskiy
f2b2242081 ext2: write superblock only once on unmount
Currently on unmount if we are mounted R/W, we first write the superblock to
the media if it is dirty, and then write it again, which is not optimal. This
patch makes ext2 write the superblock on unmount less times.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:45 +02:00
Al Viro
f7699f2b01 migrate ext2_fs.h guts to fs/ext2/ext2.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-31 16:03:16 -04:00
Linus Torvalds
e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00
Al Viro
48fde701af switch open-coded instances of d_make_root() to new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:35 -04:00
Al Viro
8de5277879 vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link}
New field of struct super_block - ->s_max_links.  Maximal allowed
value of ->i_nlink or 0; in the latter case all checks still need
to be done in ->link/->mkdir/->rename instances.  Note that this
limit applies both to directoris and to non-directories.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:32 -04:00