As a precaution, set bi_end_io to NULL when failing to remap.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
kill_bdev as well, so brd doesn't have to open code it. Reduce
buffer_head.h requirement accordingly.
Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving. The small comment replacing it says enough.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...
Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed
with any other target type, and once loaded into a device, it cannot be
replaced with a table containing a different type.
The thin provisioning pool device will use this.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Since set_current_state() contains a memory barrier in it,
an additional barrier isn't needed.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
printk_ratelimit() shares global ratelimiting state with all
other subsystems, so its usage is discouraged. Instead,
define and use dm's local state.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.
Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Avoid the hacks need for request based device mappers currently by simply
exporting the symbol instead of trying to get it through the back door.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
regardless of whether or not a given DM device's underlying devices
also advertised a need for them.
Block's flush-merge changes from 2.6.39 have proven to be more costly
for DM devices. Performance regressions have been reported even when
DM's underlying devices do not advertise that they have a write cache.
Fix the performance regressions by configuring a DM device's flushing
capabilities based on those of the underlying devices' capabilities.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
whether the device can accept bios larger than the size its merge
function returns. When set, use this to send large bios to snapshots
which can split them if necessary. Snapshot I/O may be significantly
fragmented and this approach seems to improve peformance.
Before the patch, dm_set_device_limits restricted bio size to page size
if the underlying device had a merge function and the target didn't
provide a merge function. After the patch, dm_set_device_limits
restricts bio size to page size if the underlying device has a merge
function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
provide a merge function.
The snapshot target can't provide a merge function because when the merge
function is called, it is impossible to determine where the bio will be
remapped. Previously this led us to impose a 4k limit, which we can
now remove if the snapshot store is located on a device without a merge
function. Together with another patch for optimizing full chunk writes,
it improves performance from 29MB/s to 40MB/s when writing to the
filesystem on snapshot store.
If the snapshot store is placed on a non-dm device with a merge function
(such as md-raid), device mapper still limits all bios to page size.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove 'discards_supported' from the dm_table structure. The same
information can be easily discovered from the table's target(s) in
dm_table_supports_discards().
Before this fix dm_table_supports_discards() would skip checking the
individual targets' 'discards_supported' flag if any one target in the
table didn't set num_discard_requests > 0. Now the per-target
'discards_supported' flag is effective at insuring the final DM device
advertises discard support. But, to be clear, targets that don't
support discards (!num_discard_requests) will not receive discard
requests.
Also DMWARN if a target sets 'discards_supported' override but forgets
to set 'num_discard_requests'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
After the stack plugging introduction, these are called lockless.
Ensure that the counters are updated atomically.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.
Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.
Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snizer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch changes spin_lock_irq() to spin_lock() in dm_request_fn().
This patch is just a clean-up and no functional change.
The spin_lock_irq() was leftover from the early request-based dm code,
where map_request() used to enable interrupts.
Since current map_request() never enables interrupts, we can change it
to spin_lock() to match the prior spin_unlock().
Auditing through the dm and block-layer code called from
map_request(), I confirmed all functions save/restore interrupt
status, so no function returning with interrupts enabled.
Also I haven't observed any problem on my test environment which
uses scsi and lpfc driver after heavy I/O testing with occasional
path down/up.
Added BUG_ON() to detect breakage in future.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
kmirrord_wq, kcopyd_work and md->wq are created per dm instance and
serve only a single work item from the dm instance, so non-reentrant
workqueues would provide the same ordering guarantees as ordered ones
while allowing CPU affinity and use of the workqueues for other
purposes. Switch them to non-reentrant workqueues.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Convert all create[_singlethread]_work() users to the new
alloc[_ordered]_workqueue(). This conversion is mechanical and
doesn't introduce any behavior change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch replaces dm_mutex with _minor_lock in dm_blk_close()
and then removes it.
During the BKL conversion, commit 6e9624b8ca
(block: push down BKL into .open and .release) pushed lock_kernel()
down into dm_blk_open/close calls.
Commit 2a48fc0ab2
(block: autoconvert trivial BKL users to private mutex) converted it to a
local mutex, but _minor_lock is sufficient.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
No longer needlessly hold md->bdev->bd_inode->i_mutex when changing the
size of a DM device. This additional locking is unnecessary because
i_size_write() is already protected by the existing critical section in
dm_swap_table(). DM already has a reference on md->bdev so the
associated bd_inode may be changed without lifetime concerns.
A negative side-effect of having held md->bdev->bd_inode->i_mutex was
that a concurrent DM device resize and flush (via fsync) would deadlock.
Dropping md->bdev->bd_inode->i_mutex eliminates this potential for
deadlock. The following reproducer no longer deadlocks:
https://www.redhat.com/archives/dm-devel/2009-July/msg00284.html
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
xen-blkfront: disable barrier/flush write support
Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
block: remove BLKDEV_IFL_WAIT
aic7xxx_old: removed unused 'req' variable
block: remove the BH_Eopnotsupp flag
block: remove the BLKDEV_IFL_BARRIER flag
block: remove the WRITE_BARRIER flag
swap: do not send discards as barriers
fat: do not send discards as barriers
ext4: do not send discards as barriers
jbd2: replace barriers with explicit flush / FUA usage
jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
jbd: replace barriers with explicit flush / FUA usage
nilfs2: replace barriers with explicit flush / FUA usage
reiserfs: replace barriers with explicit flush / FUA usage
gfs2: replace barriers with explicit flush / FUA usage
btrfs: replace barriers with explicit flush / FUA usage
xfs: replace barriers with explicit flush / FUA usage
block: pass gfp_mask and flags to sb_issue_discard
dm: convey that all flushes are processed as empty
...