Commit Graph

245 Commits

Author SHA1 Message Date
Guoqing Jiang 4aaf7694f8 md/bitmap: don't read page from device with Bitmap_sync
The device owns Bitmap_sync flag needs recovery
to become in sync, and read page from this type
device could get stale status.

Also add comments for Bitmap_sync bit per the
suggestion from Shaohua and Neil.

Previous disscussion can be found here:
https://marc.info/?t=149760428900004&r=1&w=2

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-07-10 10:30:41 -07:00
Kyungchan Koh 4179bc30b2 md: uuid debug statement now in processor byte order.
Previously, the uuid debug statements were printed in little-endian
format, which wasn't consistent in machines that might not be in
little-endian byte order. With this change, the output will be
consistent for all machines with different byte-ordering.

Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-05-24 15:58:43 -07:00
Zhilong Liu 3560741e31 md: fix several trivial typos in comments
Signed-off-by: Zhilong Liu <zlliu@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-03-23 22:54:57 -07:00
Guoqing Jiang 48df498daf md: move bitmap_destroy to the beginning of __md_stop
Since we have switched to sync way to handle METADATA_UPDATED
msg for md-cluster, then process_metadata_update is depended
on mddev->thread->wqueue.

With the new change, clustered raid could possible hang if
array received a METADATA_UPDATED msg after array unregistered
mddev->thread, so we need to stop clustered raid (bitmap_destroy
-> bitmap_free -> md_cluster_stop) earlier than unregister
thread (mddev_detach -> md_unregister_thread).

And this change should be safe for non-clustered raid since
all writes are stopped before the destroy. Also in md_run,
we activate the personality (pers->run()) before activating
the bitmap (bitmap_create()). So it is pleasingly symmetric
to stop the bitmap (bitmap_destroy()) before stopping the
personality (__md_stop() calls pers->free()), we achieve this
by move bitmap_destroy to the beginning of __md_stop.

But we don't want to break the codes for waiting behind IO as
Shaohua mentioned, so introduce bitmap_wait_behind_writes to
call the codes, and call the new fun in both mddev_detach and
bitmap_destroy, then we will not break original behind IO code
and also fit the new condition well.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-03-16 16:55:58 -07:00
Guoqing Jiang b98938d16a md-cluster: introduce cluster_check_sync_size
Support resize is a little complex for clustered
raid, since we need to ensure all the nodes share
the same knowledge about the size of raid.

We achieve the goal by check the sync_size which
is in each node's bitmap, we can only change the
capacity after cluster_check_sync_size returns 0.

Also, get_bitmap_from_slot is added to get a slot's
bitmap. And we exported some funcs since they are
used in cluster_check_sync_size().

We can also reuse get_bitmap_from_slot to remove
redundant code existed in bitmap_copy_from_slot.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2017-03-16 16:55:50 -07:00
Shaohua Li 2953079c69 md: separate flags for superblock changes
The mddev->flags are used for different purposes. There are a lot of
places we check/change the flags without masking unrelated flags, we
could check/change unrelated flags. These usage are most for superblock
write, so spearate superblock related flags. This should make the code
clearer and also fix real bugs.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-12-08 22:01:47 -08:00
NeilBrown 46533ff7fe md: Use REQ_FAILFAST_* on metadata writes where appropriate
This can only be supported on personalities which ensure
that md_error() never causes an array to enter the 'failed'
state.  i.e. if marking a device Faulty would cause some
data to be inaccessible, the device is status is left as
non-Faulty.  This is true for RAID1 and RAID10.

If we get a failure writing metadata but the device doesn't
fail, it must be the last device so we re-write without
FAILFAST to improve chance of success.  We also flag the
device as LastDev so that future metadata updates don't
waste time on failfast writes.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-22 09:11:33 -08:00
NeilBrown 581dbd94da md/bitmap: add blktrace event for writes to the bitmap
We trace wheneven bitmap_unplug() finds that it needs to write
to the bitmap, or when bitmap_daemon_work() find there is work
to do.

This makes it easier to correlate bitmap updates with data writes.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-18 09:34:45 -08:00
NeilBrown 85c9ccd4f0 md/bitmap: Don't write bitmap while earlier writes might be in-flight
As we don't wait for writes to complete in bitmap_daemon_work, they
could still be in-flight when bitmap_unplug writes again.  Or when
bitmap_daemon_work tries to write again.
This can be confusing and could risk the wrong data being written last.

So make sure we wait for old writes to complete before new writes start.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-07 15:08:23 -08:00
NeilBrown ec0cc22685 md/bitmap: change all printk() to pr_*()
Follow err/warn distinction introduced in md.c
Join multi-part strings into single string.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-07 15:08:21 -08:00
Guoqing Jiang cbb3873236 md/bitmap: call bitmap_file_unmap once bitmap_storage_alloc returns -ENOMEM
It is possible that bitmap_storage_alloc could return -ENOMEM,
and some member inside store could be allocated such as filemap.

To avoid memory leak, we need to call bitmap_file_unmap to free
those members in the bitmap_resize.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-07 15:08:21 -08:00
Shaohua Li f71f1cf97c md/bitmap: fix wrong cleanup
if bitmap_create fails, the bitmap is already cleaned up and the returned value
is an error number. We can't do the cleanup again.

Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-09-21 09:09:44 -07:00
Shaohua Li d9dd26b20c MD: hold mddev lock to change bitmap location
Changing the location changes a lot of things. Holding the lock to avoid race.
This makes the .quiesce called with mddev lock hold too.

Acked-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-08-05 22:02:40 -07:00
Mike Christie 796a5cf083 md: use bio op accessors
Separate the op from the rq_flag_bits and have md
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie 2a222ca992 fs: have submit_bh users pass in op and flags separately
This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Guoqing Jiang 51e453aecb md-cluster: gather resync infos and enable recv_thread after bitmap is ready
The in-memory bitmap is not ready when node joins cluster,
so it doesn't make sense to make gather_all_resync_info()
called so earlier, we need to call it after the node's
bitmap is setup. Also, recv_thread could be wake up after
node joins cluster, but it could cause problem if node
receives RESYNCING message without persionality since
mddev->pers->quiesce is called in process_suspend_info.

This commit introduces a new cluster interface load_bitmaps
to fix above problems, load_bitmaps is called in bitmap_load
where bitmap and persionality are ready, and load_bitmaps
does the following tasks:

1. call gather_all_resync_info to load all the node's
   bitmap info.
2. set MD_CLUSTER_ALREADY_IN_CLUSTER bit to recv_thread
   could be wake up, and wake up recv_thread if there is
   pending recv event.

Then ack_bast only wakes up recv_thread after IN_CLUSTER
bit is ready otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is
set.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-09 09:24:03 -07:00
kbuild test robot bc47e84258 md-cluster: fix ifnullfree.cocci warnings
drivers/md/bitmap.c:2049:6-11: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-04 12:39:35 -07:00
Guoqing Jiang c84400c89f md-cluster/bitmap: unplug bitmap to sync dirty pages to disk
This patch is doing two distinct but related things.

1. It adds bitmap_unplug() for the main bitmap (mddev->bitmap).  As bit
have been set, BITMAP_PAGE_DIRTY is set so bitmap_deamon_work() will
not write those pages out in its regular scans, only bitmap_unplug()
will.  If there are no writes to the array, bitmap_unplug() won't be
called, so we need to call it explicitly here.

2. bitmap_write_all() is a bit of a confusing interface as it doesn't
actually write anything.  The current code for writing "bitmap" works
but this change makes it a bit clearer.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-04 12:39:35 -07:00
Guoqing Jiang 23cea66a37 md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and bitmap_file_set_bit
The pnum passed to set_page_attr and test_page_attr should from
0 to storage.file_pages - 1, but bitmap_file_set_bit and
bitmap_file_clear_bit call set_page_attr and test_page_attr with
page->index parameter while page->index has already added node_offset
before.

So we need to minus node_offset in both bitmap_file_clear_bit
and bitmap_file_set_bit.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-04 12:39:35 -07:00
Guoqing Jiang 7f86ffed9b md-cluster/bitmap: fix wrong calcuation of offset
The offset is wrong in bitmap_storage_alloc, we should
set it like below in bitmap_init_from_disk().

node_offset = bitmap->cluster_slot * (DIV_ROUND_UP(store->bytes, PAGE_SIZE));

Because 'offset' is only assigned to 'page->index' and
that is usually over-written by read_sb_page. So it does
not cause problem in general, but it still need to be fixed.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-04 12:39:35 -07:00
Guoqing Jiang 18c9ff7f48 md-cluster: sync bitmap when node received RESYNCING msg
If the node received RESYNCING message which means
another node will perform resync with the area, then
we don't want to do it again in another node.

Let's set RESYNC_MASK and clear NEEDED_MASK for the
region from old-low to new-low which has finished
syncing, and the region from old-hi to new-hi is about
to syncing, bitmap_sync_with_cluste is introduced for
the purpose.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-04 12:39:35 -07:00
Guoqing Jiang c9d6503228 md-cluster: always setup in-memory bitmap
The in-memory bitmap for raid is allocated on demand,
then for cluster scenario, it is possible that slave
node which received RESYNCING message doesn't have the
in-memory bitmap when master node is perform resyncing,
so we can't make bitmap is match up well among each
nodes.

So for cluster scenario, we need always preserve the
bitmap, and ensure the page will not be freed. And a
no_hijack flag is introduced to both bitmap_checkpage
and bitmap_get_counter, which makes cluster raid returns
fail once allocate failed.

And the next patch is relied on this change since it
keeps sync bitmap among each nodes during resyncing
stage.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-04 12:39:35 -07:00
Linus Torvalds 63b106a87d Merge tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
 "This update mainly fixes bugs:

   - fix error handling (Guoqing)
   - fix a crash when a disk is hotremoved (me)
   - fix a dead loop (Wei Fang)"

* tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/bitmap: clear bitmap if bitmap_create failed
  MD: add rdev reference for super write
  md: fix a trivial typo in comments
  md:raid1: fix a dead loop when read from a WriteMostly disk
2016-04-09 11:23:27 -07:00
Kirill A. Shutemov 09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Guoqing Jiang f9a67b1182 md/bitmap: clear bitmap if bitmap_create failed
If bitmap_create returns an error, we need to call
either bitmap_destroy or bitmap_free to do clean up,
and the selection is based on mddev->bitmap is set
or not.

And the sysfs_put(bitmap->sysfs_can_clear) is moved
from bitmap_destroy to bitmap_free, and the comment
of bitmap_create is changed as well.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-04-01 13:05:50 -07:00