Commit Graph

214 Commits

Author SHA1 Message Date
Linus Torvalds
aca105a697 Merge tag 'md/4.2-fixes' of git://neil.brown.name/md
Pull md fixes from Neil Brown:
 "Some md fixes for 4.2

  Several are tagged for -stable.
  A few aren't because they are not very, serious or because they are in
  the 'experimental' cluster code"

* tag 'md/4.2-fixes' of git://neil.brown.name/md:
  md/raid5: clear R5_NeedReplace when no longer needed.
  Fix read-balancing during node failure
  md-cluster: fix bitmap sub-offset in bitmap_read_sb
  md: Return error if request_module fails and returns positive value
  md: Skip cluster setup in case of error while reading bitmap
  md/raid1: fix test for 'was read error from last working device'.
  md: Skip cluster setup for dm-raid
  md: flush ->event_work before stopping array.
  md/raid10: always set reshape_safe when initializing reshape_position.
  md/raid5: avoid races when changing cache size.
2015-07-25 11:24:58 -07:00
Goldwyn Rodrigues
33e38ac688 md-cluster: fix bitmap sub-offset in bitmap_read_sb
bitmap_read_sb is modifying mddev->bitmap_info.offset. This works for
the first bitmap read. However, when multiple bitmaps need to be opened
by the same node, it ends up corrupting the offset. Fix it by using a
local variable.

Also, bitmap_read_sb is not required in bitmap_copy_from_slot since
it is called in bitmap_create. Remove bitmap_read_sb().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 13:37:55 +10:00
Goldwyn Rodrigues
f735727319 md: Skip cluster setup in case of error while reading bitmap
If the bitmap read fails, the error code set is -EINVAL. However,
we don't check for errors and go ahead with cluster_setup.
Skip the cluster setup in case of error.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-24 13:37:48 +10:00
Goldwyn Rodrigues
d3b178adb3 md: Skip cluster setup for dm-raid
There is a bug that the bitmap superblock isn't initialised properly for
dm-raid, so a new field can have garbage in new fields.
(dm-raid does initialisation in the kernel - md initialised the
 superblock in mdadm).

This means that for dm-raid we cannot currently trust the new ->nodes
field. So:
 - use __GFP_ZERO to initialise the superblock properly for all new
    arrays
 - initialise all fields in bitmap_info in bitmap_new_disk_sb
 - ignore ->nodes for dm arrays (yes, this is a hack)

This bug exposes dm-raid to bug in the (still experimental) md-cluster
code, so it is suitable for -stable.  It does cause crashes.

References: https://bugzilla.kernel.org/show_bug.cgi?id=100491
Cc: stable@vger.kernel.org (v4.1)
Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-07-23 09:22:00 +10:00
Linus Torvalds
1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Miklos Szeredi
2726d56620 vfs: add seq_file_path() helper
Turn
	seq_path(..., &file->f_path, ...);
into
	seq_file_path(..., file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:01:07 -04:00
Miklos Szeredi
9bf39ab2ad vfs: add file_path() helper
Turn
	d_path(&file->f_path, ...);
into
	file_path(file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-23 18:00:05 -04:00
NeilBrown
8532e34390 md/bitmap: remove rcu annotation from pointer arithmetic.
Evaluating  "&mddev->disks" is simple pointer arithmetic, so
it does not need 'rcu' annotations - no dereferencing is happening.

Also enhance the comment to explain that 'rdev' in that case
is not actually a pointer to an rdev.

Reported-by: Patrick Marlier <patrick.marlier@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-05-21 09:14:41 +10:00
Goldwyn Rodrigues
97f6cd39da md-cluster: re-add capabilities
When "re-add" is writted to /sys/block/mdXX/md/dev-YYY/state,
the clustered md:

1. Sends RE_ADD message with the desc_nr. Nodes receiving the message
   clear the Faulty bit in their respective rdev->flags.
2. The node initiating re-add, gathers the bitmaps of all nodes
   and copies them into the local bitmap. It does not clear the bitmap
   from which it is copying.
3. Initiating node schedules a md recovery to sync the devices.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-04-22 07:59:39 +10:00
Goldwyn Rodrigues
124eb761ed md: Fix bitmap offset calculations
The calculations of bitmap offset is incorrect with respect to bits to bytes
conversion.

Also, remove an irrelevant duplicate message.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-25 13:07:55 +11:00
Stephen Rothwell
3b0e6aacbf md/bitmap: use sector_div for sector_t divisions
neilb: modified to not corrupt ->resync_max_sectors.

sector_div usage fixed by Guoqing Jiang <gqjiang@suse.com>

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 13:08:16 +11:00
NeilBrown
935f3d4fc6 md/bitmap: fix incorrect DIV_ROUND_UP usage.
DIV_ROUTND_UP doesn't work on "long long", - and it should be
sector_t anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-03-04 12:54:29 +11:00
Goldwyn Rodrigues
11dd35daaa Copy set bits from another slot
bitmap_copy_from_slot reads the bitmap from the slot mentioned.
It then copies the set bits to the node local bitmap.

This is helper function for the resync operation on node failure.

bitmap_set_memory_bits() currently assumes it is only run at startup and that
they bitmap is currently empty.  So if it finds that a region is already
marked as dirty, it won't mark it dirty again. Change bitmap_set_memory_bits()
to always set the NEEDED_MASK bit if 'needed' is set.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:59:05 -06:00
Goldwyn Rodrigues
f9209a3235 bitmap_create returns bitmap pointer
This is done to have multiple bitmaps open at the same time.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 09:57:57 -06:00
Goldwyn Rodrigues
b97e92574c Use separate bitmaps for each nodes in the cluster
On-disk format:

0                    4k                     8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |

Bitmap super has a field nodes, which defines the maximum number
of nodes the device can use. While reading the bitmap super, if
the cluster finds out that the number of nodes is > 0:
1. Requests the md-cluster module.
2. Calls md_cluster_ops->join(), which sets up clustering such as
   joining DLM lockspace.

Since the first time, the first bitmap is read. After the call
to the cluster_setup, the bitmap offset is adjusted and the
superblock is re-read. This also ensures the bitmap is read
the bitmap lock (when bitmap lock is introduced in later patches)

Questions:
1. cluster name is repeated in all bitmap supers. Is that okay?

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues
cf921cc19c Add node recovery callbacks
DLM offers callbacks when a node fails and the lock remastery
is performed:

1. recover_prep: called when DLM discovers a node is down
2. recover_slot: called when DLM identifies the node and recovery
		can start
3. recover_done: called when all nodes have completed recover_slot

recover_slot() and recover_done() are also called when the node joins
initially in order to inform the node with its slot number. These slot
numbers start from one, so we deduct one to make it start with zero
which the cluster-md code uses.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:30:11 -06:00
Goldwyn Rodrigues
c4ce867fda Introduce md_cluster_info
md_cluster_info stores the cluster information in the MD device.

The join() is called when mddev detects it is a clustered device.
The main responsibilities are:
	1. Setup a DLM lockspace
	2. Setup all initial locks such as super block locks and bitmap lock (will come later)

The leave() clears up the lockspace and all the locks held.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2015-02-23 07:28:42 -06:00
NeilBrown
b7b17c9b67 md: remove mddev_lock() from md_attr_show()
Most attributes can be read safely without any locking.
A race might lead to a slightly out-dated value, but nothing wrong.

We already have locking in some places where needed.
All that remains is can_clear_show(), behind_writes_used_show()
and action_show() which are easily fixed.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:55 +11:00
NeilBrown
978a7a47ca md/bitmap: protect clearing of ->bitmap by mddev->lock
This makes it safe to inspect the struct while holding only
the spinlock.

Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-06 09:32:55 +11:00
NeilBrown
d959014334 md/bitmap: fix a might_sleep() warning.
commit 8eb23b9f35
    sched: Debug nested sleeps

causes false-positive warnings in RAID5 code.

This annotation removes them and adds a comment
explaining why there is no real problem.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-02 17:08:03 +11:00
NeilBrown
4b5060ddae md/bitmap: always wait for writes on unplug.
If two threads call bitmap_unplug at the same time, then
one might schedule all the writes, and the other might
decide that it doesn't need to wait.  But really it does.

It rarely hurts to wait when it isn't absolutely necessary,
and the current code doesn't really focus on 'absolutely necessary'
anyway.  So just wait always.

This can potentially lead to data corruption if a crash happens
at an awkward time and data was written before the bitmap was
updated.  It is very unlikely, but this should go to -stable
just to be safe.  Appropriate for any -stable.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org (please delay until 3.18 is released)
2014-10-09 10:07:04 +11:00
NeilBrown
f2e06c5884 md/bitmap: remove confusing code from filemap_get_page.
file_page_index(store, 0) is *always* 0.
This is because the bitmap sb, at 256 bytes, is *always* less than
one page.
So subtracting it has no effect and the code should be removed.

Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-29 16:59:47 +10:00
NeilBrown
035328c202 md/bitmap: don't abuse i_writecount for bitmap files.
md bitmap code currently tries to use i_writecount to stop any other
process from writing to out bitmap file.  But that is really an abuse
and has bit-rotted so locking is all wrong.

So discard that - root should be allowed to shoot self in foot.

Still use it in a much less intrusive way to stop the same file being
used as bitmap on two different array, and apply other checks to
ensure the file is at least vaguely usable for bitmap storage
(is regular, is open for write.  Support for ->bmap is already checked
elsewhere).

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-04-09 12:26:59 +10:00
Tejun Heo
324a56e16e kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly
kernfs has just been separated out from sysfs and we're already in
full conflict mode.  Nothing can make the situation any worse.  Let's
take the chance to name things properly.

This patch performs the following renames.

* s/sysfs_elem_dir/kernfs_elem_dir/
* s/sysfs_elem_symlink/kernfs_elem_symlink/
* s/sysfs_elem_attr/kernfs_elem_file/
* s/sysfs_dirent/kernfs_node/
* s/sd/kn/ in kernfs proper
* s/parent_sd/parent/
* s/target_sd/target/
* s/dir_sd/parent/
* s/to_sysfs_dirent()/rb_to_kn()/
* misc renames of local vars when they conflict with the above

Because md, mic and gpio dig into sysfs details, this patch ends up
modifying them.  All are sysfs_dirent renames and trivial.  While we
can avoid these by introducing a dummy wrapping struct sysfs_dirent
around kernfs_node, given the limited usage outside kernfs and sysfs
proper, I don't think such workaround is called for.

This patch is strictly rename only and doesn't introduce any
functional difference.

- mic / gpio renames were missing.  Spotted by kbuild test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-11 15:28:36 -08:00
Tejun Heo
388975ccca sysfs: clean up sysfs_get_dirent()
The pre-existing sysfs interfaces which take explicit namespace
argument are weird in that they place the optional @ns in front of
@name which is contrary to the established convention.  For example,
we end up forcing vast majority of sysfs_get_dirent() users to do
sysfs_get_dirent(parent, NULL, name), which is silly and error-prone
especially as @ns and @name may be interchanged without causing
compilation warning.

This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swap the
positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
around sysfs_get_dirent_ns().  This makes confusions a lot less
likely.

There are other interfaces which take @ns before @name.  They'll be
updated by following patches.

This patch doesn't introduce any functional changes.

v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
    error on module builds.  Reported by build test robot.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-09-26 15:33:18 -07:00