Commit Graph

1822 Commits

Author SHA1 Message Date
Jan Kara 15c34a7606 ocfs2: fix quota file corruption
Global quota files are accessed from different nodes.  Thus we cannot
cache offset of quota structure in the quota file after we drop our node
reference count to it because after that moment quota structure may be
freed and reallocated elsewhere by a different node resulting in
corruption of quota file.

Fix the problem by clearing dq_off when we are releasing dquot structure.
We also remove the DB_READ_B handling because it is useless -
DQ_ACTIVE_B is set iff DQ_READ_B is set.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-04 07:55:48 -08:00
Xue jiufei 0e048316ff ocfs2: check existence of old dentry in ocfs2_link()
System call linkat first calls user_path_at(), check the existence of
old dentry, and then calls vfs_link()->ocfs2_link() to do the actual
work.  There may exist a race when Node A create a hard link for file
while node B rm it.

         Node A                          Node B
user_path_at()
  ->ocfs2_lookup(),
find old dentry exist
                                rm file, add inode say inodeA
                                to orphan_dir

call ocfs2_link(),create a
hard link for inodeA.

                                rm the link, add inodeA to orphan_dir
                                again

When orphan_scan work start, it calls ocfs2_queue_orphans() to do the
main work.  It first tranverses entrys in orphan_dir, linking all inodes
in this orphan_dir to a list look like this:

	inodeA->inodeB->...->inodeA

When tranvering this list, it will fall into loop, calling iput() again
and again.  And finally trigger BUG_ON(inode->i_state & I_CLEAR).

Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Junxiao Bi c7d2cbc364 ocfs2: update inode size after zeroing the hole
fs-writeback will release the dirty pages without page lock whose offset
are over inode size, the release happens at
block_write_full_page_endio().  If not update, dirty pages in file holes
may be released before flushed to the disk, then file holes will contain
some non-zero data, this will cause sparse file md5sum error.

To reproduce the bug, find a big sparse file with many holes, like vm
image file, its actual size should be bigger than available mem size to
make writeback work more frequently, tar it with -S option, then keep
untar it and check its md5sum again and again until you get a wrong
md5sum.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Younger Liu d62e74be12 ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
The issue scenario is as following:

- Create a small file and fallocate a large disk space for a file with
  FALLOC_FL_KEEP_SIZE option.

- ftruncate the file back to the original size again.  but the disk free
  space is not changed back.  This is a real bug that be fixed in this
  patch.

In order to solve the issue above, we modified ocfs2_setattr(), if
attr->ia_size != i_size_read(inode), It calls ocfs2_truncate_file(), and
truncate disk space to attr->ia_size.

Signed-off-by: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Tested-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Jensen <shencanquan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Younger Liu a987c7ca7f ocfs2: fix ocfs2_sync_file() if filesystem is readonly
If filesystem is readonly, there is no need to flush drive's caches or
force any uncommitted transactions.

[akpm@linux-foundation.org: return -EROFS, not 0]
Signed-off-by: Younger Liu <younger.liucn@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Zongxun Wang fb951eb5e1 ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
Even if using the same jbd2 handle, we cannot rollback a transaction.
So once some error occurs after successfully allocating clusters, the
allocated clusters will never be used and it means they are lost.  For
example, call ocfs2_claim_clusters successfully when expanding a file,
but failed in ocfs2_insert_extent.  So we need free the allocated
clusters if they are not used indeed.

Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06 13:48:51 -08:00
Linus Torvalds f568849eda Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
2014-01-30 11:19:05 -08:00
Linus Torvalds bf3d846b78 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted stuff; the biggest pile here is Christoph's ACL series.  Plus
  assorted cleanups and fixes all over the place...

  There will be another pile later this week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
  __dentry_path() fixes
  vfs: Remove second variable named error in __dentry_path
  vfs: Is mounted should be testing mnt_ns for NULL or error.
  Fix race when checking i_size on direct i/o read
  hfsplus: remove can_set_xattr
  nfsd: use get_acl and ->set_acl
  fs: remove generic_acl
  nfs: use generic posix ACL infrastructure for v3 Posix ACLs
  gfs2: use generic posix ACL infrastructure
  jfs: use generic posix ACL infrastructure
  xfs: use generic posix ACL infrastructure
  reiserfs: use generic posix ACL infrastructure
  ocfs2: use generic posix ACL infrastructure
  jffs2: use generic posix ACL infrastructure
  hfsplus: use generic posix ACL infrastructure
  f2fs: use generic posix ACL infrastructure
  ext2/3/4: use generic posix ACL infrastructure
  btrfs: use generic posix ACL infrastructure
  fs: make posix_acl_create more useful
  fs: make posix_acl_chmod more useful
  ...
2014-01-28 08:38:04 -08:00
Xiaowei.Hu f5b258550f ocfs2: do not log ENOENT in unlink()
Suppress log message like this: (open_delete,8328,0):ocfs2_unlink:951
ERROR: status = -2

Orabug:17445485

Signed-off-by: Xiaowei Hu <xiaowei.hu@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:39 -08:00
Christoph Hellwig 702e5bc68a ocfs2: use generic posix ACL infrastructure
This contains some major refactoring for the create path so that
inodes are created with the right mode to start with instead of
fixing it up later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:21 -05:00
Christoph Hellwig 37bc15392a fs: make posix_acl_create more useful
Rename the current posix_acl_created to __posix_acl_create and add
a fully featured helper to set up the ACLs on file creation that
uses get_acl().

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Christoph Hellwig 5bf3258fd2 fs: make posix_acl_chmod more useful
Rename the current posix_acl_chmod to __posix_acl_chmod and add
a fully featured ACL chmod helper that uses the ->set_acl inode
operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-01-25 23:58:18 -05:00
Yiwen Jiang 75f82eaa50 ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously
2 nodes cluster, say Node A and Node B, mount the same ocfs2 volume, and
create a file 1.

Node A			Node B
open 1, get open lock
                        rm 1, and then add 1 to orphan_dir
storage link down,
o2hb_write_timeout
->o2quo_disk_timeout
->emergency_restart
                        at the moment, Node B dismount and do
			ocfs2rec simultaneously
                        1) ocfs2_dismount_volume
			->ocfs2_recovery_exit
			->wait_event(osb->recovery_event)
			->flush_workqueue(ocfs2_wq)
			2) ocfs2rec
			->queue_work(&journal->j_recovery_work)
                        ->ocfs2_recover_orphans
			->ocfs2_commit_truncate
                        ->queue_delayed_work(&osb->osb_truncate_log_wq)

In ocfs2_recovery_exit, it flushes workqueue and then releases system
inodes.  When doing ocfs2rec, it will call ocfs2_flush_truncate_log
which will try to get sys_root_inode, and NULL pointer dereference
occurs.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: joyce <xuejiufei@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Tariq Saeed a2a3b39824 ocfs2: punch hole should return EINVAL if the length argument in ioctl is negative
An unreserve space ioctl OCFS2_IOC_UNRESVSP/64 should reject a negative
length.

Orabug:14789508

Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com>
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Wei Yongjun 16eac4be46 ocfs2: fix sparse non static symbol warning
Fixes the following sparse warning:

  fs/ocfs2/stack_user.c:930:32: warning:
   symbol 'ocfs2_ls_ops' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Jie Liu 1ba2212bb3 ocfs2: adjust minlen with discard_granularity in the FITRIM ioctl
Adjust minlen with discard_granularity for FITRIM ioctl(2) if the given
minimum size in bytes is less than it because, discard granularity is
used to tell us that the minimum size of extent that can be discarded by
the storage device.

This is inspired by ext4 commit 5c2ed62fd4 ("ext4: Adjust minlen with
discard_granularity in the FITRIM ioctl") from Lukas Czerner.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Jie Liu aa89762c54 ocfs2: return EINVAL if the given range to discard is less than block size
For FITRIM ioctl(2), we should not keep silence if the given range
length ls less than a block size as there is no data blocks would be
discareded.  Hence it should return EINVAL instead.  This issue can be
verified via xfstests/generic/288 which is used for FITRIM argument
handling tests.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Jie Liu 19e8ac2721 ocfs2: return EOPNOTSUPP if the device does not support discard
For FITRIM ioctl(2), we should return EOPNOTSUPP to inform the user that
the storage device does not support discard if it is, otherwise return
success would confuse the user even though there is no free blocks were
trimmed at all.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Younger Liu 0a2fcd8988 ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits()
ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() are
already provided in suballoc.c.  So, the same functions in
move_extents.c are not needed any more.

Declare the functions in suballoc.h and remove redundant functions in
move_extents.c.

Signed-off-by: Younger Liu <liuyiyang@hisense.com>
Cc: Younger Liu <younger.liucn@gmail.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Goldwyn Rodrigues c994c2ebdb ocfs2: use the new DLM operation callbacks while requesting new lockspace
Attempt to use the new DLM operations.  If it is not supported, use the
traditional ocfs2_controld.

To exchange ocfs2 versioning, we use the LVB of the version dlm lock.
It first attempts to take the lock in EX mode (non-blocking).  If
successful (which means it is the first mount), it writes the version
number and downconverts to PR lock.  If it is unsuccessful, it reads the
version from the lock.

If this becomes the standard (with o2cb as well), it could simplify
userspace tools to check if the filesystem is mounted on other nodes.

Dan: Since ocfs2_protocol_version are two u8 values, the additional
checks with LONG* don't make sense.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:42 -08:00
Goldwyn Rodrigues 4150363033 ocfs2: framework for version LVB
Use the native DLM locks for version control negotiation.  Most of the
framework is taken from gfs2/lock_dlm.c

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues 3e83415164 ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node
This is done to differentiate between using and not using controld and
use the connection information accordingly.

We need to be backward compatible.  So, we use a new enum
ocfs2_connection_type to identify when controld is used and when it is
not.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues 24aa338611 ocfs2: shift allocation ocfs2_live_connection to user_connect()
We perform this because the DLM recovery callbacks will require the
ocfs2_live_connection structure to record the node information when
dlm_new_lockspace() is updated (in the last patch of the series).

Before calling dlm_new_lockspace(), we need the structure ready for the
.recover_done() callback, which would set oc_this_node.  This is the
reason we allocate ocfs2_live_connection beforehand in user_connect().

[AKPM] rc initialization is not required because it assigned in case of
errors.  It will be cleared by compiler anyways.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reveiwed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues 66e188fc31 ocfs2: add DLM recovery callbacks
These are the callbacks called by the fs/dlm code in case the membership
changes.  If there is a failure while/during calling any of these, the
DLM creates a new membership and relays to the rest of the nodes.

 - recover_prep() is called when DLM understands a node is down.
 - recover_slot() is called once all nodes have acknowledged
   recover_prep and recovery can begin.
 - recover_done() is called once the recovery is complete.  It returns
   the new membership.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00
Goldwyn Rodrigues c74a3bdd9b ocfs2: add clustername to cluster connection
This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM
handling up to the times with respect to DLM (>=4.0.1) and corosync
(2.3.x).  AFAIK, cman also is being phased out for a unified corosync
cluster stack.

fs/dlm performs all the functions with respect to fencing and node
management and provides the API's to do so for ocfs2.  For all future
references, DLM stands for fs/dlm code.

The advantages are:
 + No need to run an additional userspace daemon (ocfs2_controld)
 + No controld device handling and controld protocol
 + Shifting responsibilities of node management to DLM layer

For backward compatibility, we are keeping the controld handling code.
Once enough time has passed we can remove a significant portion of the
code.  This was tested by using the kernel with changes on older
unmodified tools.  The kernel used ocfs2_controld as expected, and
displayed the appropriate warning message.

This feature requires modification in the userspace ocfs2-tools.  The
changes can be found at: https://github.com/goldwynr/ocfs2-tools branch:
nocontrold Currently, not many checks are present in the userspace code,
but that would change soon.

This patch (of 6):

Add clustername to cluster connection.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:41 -08:00