Commit 9567366fef ("dm cache metadata: fix READ_LOCK macros and
cleanup WRITE_LOCK macros") uses down_write() instead of down_read() in
cmd_read_lock(), yet up_read() is used to release the lock in
READ_UNLOCK(). Fix it.
Fixes: 9567366fef ("dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros")
Cc: stable@vger.kernel.org
Signed-off-by: Ahmed Samy <f.fallen45@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The READ_LOCK macro was incorrectly returning -EINVAL if
dm_bm_is_read_only() was true -- it will always be true once the cache
metadata transitions to read-only by dm_cache_metadata_set_read_only().
Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0).
Also, all accesses of the 'cmd' argument passed to these related macros
are now encapsulated in parenthesis.
A follow-up patch can be developed to eliminate the use of macros in
favor of pure C code. Avoiding that now given that this needs to apply
to stable@.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: d14fcf3dd7 ("dm cache: make sure every metadata function checks fail_io")
Cc: stable@vger.kernel.org
Commit c80914e81e ("dm: return error if bio_integrity_clone() fails
in clone_bio()") changed clone_bio() such that if it does return error
then the alloc_tio() created resources (both the bio that was allocated
to be a clone and the containing dm_target_io struct) will leak.
Fix this by calling free_tio() in __clone_and_map_data_bio()'s
clone_bio() error path.
Fixes: c80914e81e ("dm: return error if bio_integrity_clone() fails in clone_bio()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull MD fixes from Shaohua Li:
"This update mainly fixes bugs:
- fix error handling (Guoqing)
- fix a crash when a disk is hotremoved (me)
- fix a dead loop (Wei Fang)"
* tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/bitmap: clear bitmap if bitmap_create failed
MD: add rdev reference for super write
md: fix a trivial typo in comments
md:raid1: fix a dead loop when read from a WriteMostly disk
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If bitmap_create returns an error, we need to call
either bitmap_destroy or bitmap_free to do clean up,
and the selection is based on mddev->bitmap is set
or not.
And the sysfs_put(bitmap->sysfs_can_clear) is moved
from bitmap_destroy to bitmap_free, and the comment
of bitmap_create is changed as well.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
If first_bad == this_sector when we get the WriteMostly disk
in read_balance(), valid disk will be returned with zero
max_sectors. It'll lead to a dead loop in make_request(), and
OOM will happen because of endless allocation of struct bio.
Since we can't get data from this disk in this case, so
continue for another disk.
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Pull MD updates from Shaohua Li:
"This update mainly fixes bugs.
- a raid5 discard related fix from Jes
- a MD multipath bio clone fix from Ming
- raid1 error handling deadlock fix from Nate and corresponding
raid10 fix from myself
- a raid5 stripe batch fix from Neil
- a patch from Sebastian to avoid unnecessary uevent
- several cleanup/debug patches"
* tag 'md/4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid5: Cleanup cpu hotplug notifier
raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang
raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang
md: fix typos for stipe
md/bitmap: remove redundant return in bitmap_checkpage
md/raid1: remove unnecessary BUG_ON
md: multipath: don't hardcopy bio in .make_request path
md/raid5: output stripe state for debug
md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list
Update MD git tree URL
md/bitmap: remove redundant check
MD: warn for potential deadlock
md: Drop sending a change uevent when stopping
RAID5: revert e9e4c377e2 to fix a livelock
RAID5: check_reshape() shouldn't call mddev_suspend
md/raid5: Compare apples to apples (or sectors to sectors)
Pull block driver updates from Jens Axboe:
"This is the block driver pull request for this merge window. It sits
on top of for-4.6/core, that was just sent out.
This contains:
- A set of fixes for lightnvm. One from Alan, fixing an overflow,
and the rest from the usual suspects, Javier and Matias.
- A set of fixes for nbd from Markus and Dan, and a fixup from Arnd
for correct usage of the signed 64-bit divider.
- A set of bug fixes for the Micron mtip32xx, from Asai.
- A fix for the brd discard handling from Bart.
- Update the maintainers entry for cciss, since that hardware has
transferred ownership.
- Three bug fixes for bcache from Eric Wheeler.
- Set of fixes for xen-blk{back,front} from Jan and Konrad.
- Removal of the cpqarray driver. It has been disabled in Kconfig
since 2013, and we were initially scheduled to remove it in 3.15.
- Various updates and fixes for NVMe, with the most important being:
- Removal of the per-device NVMe thread, replacing that with a
watchdog timer instead. From Christoph.
- Exposing the namespace WWID through sysfs, from Keith.
- Set of cleanups from Ming Lin.
- Logging the controller device name instead of the underlying
PCI device name, from Sagi.
- And a bunch of fixes and optimizations from the usual suspects
in this area"
* 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits)
NVMe: Expose ns wwid through single sysfs entry
drivers:block: cpqarray clean up
brd: Fix discard request processing
cpqarray: remove it from the kernel
cciss: update MAINTAINERS
NVMe: Remove unused sq_head read in completion path
bcache: fix cache_set_flush() NULL pointer dereference on OOM
bcache: cleaned up error handling around register_cache()
bcache: fix race of writeback thread starting before complete initialization
NVMe: Create discard zero quirk white list
nbd: use correct div_s64 helper
mtip32xx: remove unneeded variable in mtip_cmd_timeout()
lightnvm: generalize rrpc ppa calculations
lightnvm: remove struct nvm_dev->total_blocks
lightnvm: rename ->nr_pages to ->nr_sects
lightnvm: update closed list outside of intr context
xen/blback: Fit the important information of the thread in 17 characters
lightnvm: fold get bb tbl when using dual/quad plane mode
lightnvm: fix up nonsensical configure overrun checking
xen-blkback: advertise indirect segment support earlier
...
The raid456_cpu_notify() hotplug callback lacks handling of the
CPU_UP_CANCELED case. That means if CPU_UP_PREPARE fails, the scratch
buffer is leaked.
Add handling for CPU_UP_CANCELED[_FROZEN] hotplug notifier transitions
to free the scratch buffer.
CC: Shaohua Li <shli@kernel.org>
CC: linux-raid@vger.kernel.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Shaohua Li <shli@fb.com>
This is the raid10 counterpart of the bug fixed by Nate
(raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang)
Fixes: 95af587e95(md/raid10: ensure device failure recorded before write request returns)
Cc: stable@vger.kernel.org (V4.3+)
Cc: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
If raid1d is handling a mix of read and write errors, handle_read_error's
call to freeze_array can get stuck.
This can happen because, though the bio_end_io_list is initially drained,
writes can be added to it via handle_write_finished as the retry_list
is processed. These writes contribute to nr_pending but are not included
in nr_queued.
If a later entry on the retry_list triggers a call to handle_read_error,
freeze array hangs waiting for nr_pending == nr_queued+extra. The writes
on the bio_end_io_list aren't included in nr_queued so the condition will
never be satisfied.
To prevent the hang, include bio_end_io_list writes in nr_queued.
There's probably a better way to handle decrementing nr_queued, but this
seemed like the safest way to avoid breaking surrounding code.
I'm happy to supply the script I used to repro this hang.
Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.)
Cc: stable@vger.kernel.org (v4.3+)
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.6:
API:
- Convert remaining crypto_hash users to shash or ahash, also convert
blkcipher/ablkcipher users to skcipher.
- Remove crypto_hash interface.
- Remove crypto_pcomp interface.
- Add crypto engine for async cipher drivers.
- Add akcipher documentation.
- Add skcipher documentation.
Algorithms:
- Rename crypto/crc32 to avoid name clash with lib/crc32.
- Fix bug in keywrap where we zero the wrong pointer.
Drivers:
- Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
- Add PIC32 hwrng driver.
- Support BCM6368 in bcm63xx hwrng driver.
- Pack structs for 32-bit compat users in qat.
- Use crypto engine in omap-aes.
- Add support for sama5d2x SoCs in atmel-sha.
- Make atmel-sha available again.
- Make sahara hashing available again.
- Make ccp hashing available again.
- Make sha1-mb available again.
- Add support for multiple devices in ccp.
- Improve DMA performance in caam.
- Add hashing support to rockchip"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: qat - remove redundant arbiter configuration
crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
crypto: qat - Change the definition of icp_qat_uof_regtype
hwrng: exynos - use __maybe_unused to hide pm functions
crypto: ccp - Add abstraction for device-specific calls
crypto: ccp - CCP versioning support
crypto: ccp - Support for multiple CCPs
crypto: ccp - Remove check for x86 family and model
crypto: ccp - memset request context to zero during import
lib/mpi: use "static inline" instead of "extern inline"
lib/mpi: avoid assembler warning
hwrng: bcm63xx - fix non device tree compatibility
crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
crypto: qat - The AE id should be less than the maximal AE number
lib/mpi: Endianness fix
crypto: rockchip - add hash support for crypto engine in rk3288
crypto: xts - fix compile errors
crypto: doc - add skcipher API documentation
crypto: doc - update AEAD AD handling
...
An "old" (.request_fn) DM 'struct request' stores a pointer to the
associated 'struct dm_rq_target_io' in rq->special.
dm_requeue_original_request(), previously named
dm_requeue_unmapped_original_request(), called dm_unprep_request() to
reset rq->special to NULL. But rq_end_stats() would go on to hit a NULL
pointer deference because its call to tio_from_request() returned NULL.
Fix this by calling rq_end_stats() _before_ dm_unprep_request()
Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: e262f34741 ("dm stats: add support for request-based DM devices")
Cc: stable@vger.kernel.org # 4.2+
The "return 0" is not needed since bitmap_checkpage
will finally return 0 for the case.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Since bitmap_start_sync will not return until
sync_blocks is not less than PAGE_SIZE>>9, so
the BUG_ON is not needed anymore.
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Inside multipath_make_request(), multipath maps the incoming
bio into low level device's bio, but it is totally wrong to
copy the bio into mapped bio via '*mapped_bio = *bio'. For
example, .__bi_remaining is kept in the copy, especially if
the incoming bio is chained to via bio splitting, so .bi_end_io
can't be called for the mapped bio at all in the completing path
in this kind of situation.
This patch fixes the issue by using clone style.
Cc: stable@vger.kernel.org (v3.14+)
Reported-and-tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Commit 0a927c2f02 ("dm thin: return -ENOSPC when erroring retry list due
to out of data space") was a step in the right direction but didn't go
far enough.
Add a new 'out_of_data_space' flag to 'struct pool' and set it if/when
the pool runs of of data space. This fixes cell_error() and
error_retry_list() to not blindly return -EIO.
We cannot rely on the 'error_if_no_space' feature flag since it is
transient (in that it can be reset once space is added, plus it only
controls whether errors are issued, it doesn't reflect whether the
pool is actually out of space).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Otherwise operations may be attempted that will only ever go on to crash
(since the metadata device is either missing or unreliable if 'fail_io'
is set).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org