The current string_get_size() overflows when the device size goes over
2^64 bytes because the string helper routine computes the suffix from
the size in bytes. However, the entirety of Linux thinks in terms of
blocks, not bytes, so this will artificially induce an overflow on very
large devices. Fix this by making the function string_get_size() take
blocks and the block size instead of bytes. This should allow us to
keep working until the current SCSI standard overflows.
Also fix virtio_blk and mmc (both of which were also artificially
multiplying by the block size to pass a byte side to string_get_size()).
The mathematics of this is pretty simple: we're taking a product of
size in blocks (S) and block size (B) and trying to re-express this in
exponential form: S*B = R*N^E (where N, the exponent is either 1000 or
1024) and R < N. Mathematically, S = RS*N^ES and B=RB*N^EB, so if RS*RB
< N it's easy to see that S*B = RS*RB*N^(ES+EB). However, if RS*BS > N,
we can see that this can be re-expressed as RS*BS = R*N (where R =
RS*BS/N < N) so the whole exponent becomes R*N^(ES+EB+1)
[jejb: fix incorrect 32 bit do_div spotted by kbuild test robot <fengguang.wu@intel.com>]
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
The device model already takes care of races between ->remove and
->shutdown vs its other methods, and we now take care about locking
them out for ->rescan as well.
This is a partial revert of commit 39b7f1 ("[SCSI] sd: Fix refcounting").
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Pull first round of SCSI updates from James Bottomley:
"This is the usual grab bag of driver updates (hpsa, storvsc, mp2sas,
megaraid_sas, ses) plus an assortment of minor updates.
There's also an update to ufs which adds new phy drivers and finally a
new logging infrastructure for SCSI"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits)
scsi_logging: return void for dev_printk() functions
scsi: print single-character strings with seq_putc
scsi: merge consecutive seq_puts calls
scsi: replace seq_printf with seq_puts
aha152x: replace seq_printf with seq_puts
advansys: replace seq_printf with seq_puts
scsi: remove SPRINTF macro
sg: remove an unused variable
hpsa: Use local workqueues instead of system workqueues
hpsa: add in P840ar controller model name
hpsa: add in gen9 controller model names
hpsa: detect and report failures changing controller transport modes
hpsa: shorten the wait for the CISS doorbell mode change ack
hpsa: refactor duplicated scan completion code into a new routine
hpsa: move SG descriptor set-up out of hpsa_scatter_gather()
hpsa: do not use function pointers in fast path command submission
hpsa: print CDBs instead of kernel virtual addresses for uncommon errors
hpsa: do not use a void pointer for scsi_cmd field of struct CommandList
hpsa: return failed from device reset/abort handlers
hpsa: check for ctlr lockup after command allocation in main io path
...
The following patch fixes an issue observed with 4k sector disks
where the max_hw_sectors attribute was getting set too large in
sd_revalidate_disk. Since sdkp->max_xfer_blocks is in units
of SCSI logical blocks and queue_max_hw_sectors is in units of
512 byte blocks, on a 4k sector disk, every time we went through
sd_revalidate_disk, we were taking the current value of
queue_max_hw_sectors and increasing it by a factor of 8. Fix
this by only shifting sdkp->max_xfer_blocks.
Cc: stable@vger.kernel.org
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Convert sense buffer logging to use the per-cpu buffer to avoid line
breakup.
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
7985090aa0 changed the discard heuristics to give preference to the
WRITE SAME commands that (unlike UNMAP) guarantee deterministic results.
Ming Lei discovered that QEMU SCSI's WRITE SAME implementation
internally relied on limits that were only communicated for the UNMAP
case. And therefore discard commands backed by WRITE SAME would fail.
Tweak the heuristics so we still pick UNMAP in the LBPRZ=0 case and only
prefer the WRITE SAME variants if the device has the LBPRZ flag set.
Reported-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
SPC-3 defines SERVICE ACTION IN(12) and SERVICE ACTION IN(16).
So rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 to be
consistent with SPC and to allow for better distinction.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The driver core driver structure has grown an owner field and now
requires it to be set for all modular drivers. Set it up for
all scsi_driver instances and get rid of the now superflous
scsi_driver owner field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Shane M Seymour <shane.seymour@hp.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
There is no reason for ULDs to pass in a flag on how to allocate the S/G
lists. While we don't need GFP_ATOMIC for the blk-mq case because we
don't hold locks, that decision can be made way down the chain without
having to pass a pointless gfp_mask argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
The T10 SBC UNMAP command does not provide any hard guarantees that
blocks will return zeroes on a subsequent READ. This is due to the fact
that the device server is free to silently ignore all or parts of the
request.
The only way to ensure that a block consistently returns zeroes after
being unmapped is to use WRITE SAME with the UNMAP bit set. Should the
device be unable to unmap one or more blocks described by the command it
is required to manually write zeroes to them.
Until now we have preferred UNMAP over the WRITE SAME variants to
accommodate thinly provisioned devices that predated the final SBC-3
spec. This patch changes the heuristic so that we favor WRITE SAME(16)
or (10) over UNMAP if these commands are marked as supported in the
Logical Block Provisioning VPD page.
The patch also disables discard_zeroes_data for devices operating in
UNMAP mode.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
No need to verify the passthrough ioctls, the real handler will
take care of that. Also make sure not to block for resets on
O_NONBLOCK fds.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
The calling conventions for this function are bad as it could return
-ENODEV both for a device not currently online and a not recognized ioctl.
Add a new scsi_ioctl_block_when_processing_errors function that wraps
scsi_block_when_processing_errors with the a special case for the
SG_SCSI_RESET ioctl command, and handle the SG_SCSI_RESET case itself
in scsi_ioctl. All callers of scsi_ioctl now must call the above helper
to check for the EH state, so that the ioctl handler itself doesn't
have to.
Reported-by: Robert Elliott <Elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Open-code scsi_print_result in sd.c, and cleanup logging to
not print duplicate informations.
Also remove the call to scsi_show_result() in ufshcd.c
to be consistent with other callers of scsi_execute().
With that we can remove scsi_show_result in constants.c
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We should be using the standard dev_printk() variants for
sense code printing.
[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
sd_done() was calling scsi_print_sense() for a sense code
of 'NO_SENSE'.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull core block layer changes from Jens Axboe:
"This is the core block IO pull request for 3.18. Apart from the new
and improved flush machinery for blk-mq, this is all mostly bug fixes
and cleanups.
- blk-mq timeout updates and fixes from Christoph.
- Removal of REQ_END, also from Christoph. We pass it through the
->queue_rq() hook for blk-mq instead, freeing up one of the request
bits. The space was overly tight on 32-bit, so Martin also killed
REQ_KERNEL since it's no longer used.
- blk integrity updates and fixes from Martin and Gu Zheng.
- Update to the flush machinery for blk-mq from Ming Lei. Now we
have a per hardware context flush request, which both cleans up the
code should scale better for flush intensive workloads on blk-mq.
- Improve the error printing, from Rob Elliott.
- Backing device improvements and cleanups from Tejun.
- Fixup of a misplaced rq_complete() tracepoint from Hannes.
- Make blk_get_request() return error pointers, fixing up issues
where we NULL deref when a device goes bad or missing. From Joe
Lawrence.
- Prep work for drastically reducing the memory consumption of dm
devices from Junichi Nomura. This allows creating clone bio sets
without preallocating a lot of memory.
- Fix a blk-mq hang on certain combinations of queue depths and
hardware queues from me.
- Limit memory consumption for blk-mq devices for crash dump
scenarios and drivers that use crazy high depths (certain SCSI
shared tag setups). We now just use a single queue and limited
depth for that"
* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
block: Remove REQ_KERNEL
blk-mq: allocate cpumask on the home node
bio-integrity: remove the needless fail handle of bip_slab creating
block: include func name in __get_request prints
block: make blk_update_request print prefix match ratelimited prefix
blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
block: fix alignment_offset math that assumes io_min is a power-of-2
blk-mq: Make bt_clear_tag() easier to read
blk-mq: fix potential hang if rolling wakeup depth is too high
block: add bioset_create_nobvec()
block: use bio_clone_fast() in blk_rq_prep_clone()
block: misplaced rq_complete tracepoint
sd: Honor block layer integrity handling flags
block: Replace strnicmp with strncasecmp
block: Add T10 Protection Information functions
block: Don't merge requests if integrity flags differ
block: Integrity checksum flag
block: Relocate bio integrity flags
block: Add a disk flag to block integrity profile
block: Add prefix to block integrity profile flags
...
Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
QUEUE_FLAG_NONROT.
Historically, all block devices have automatically made entropy
contributions. But as previously stated in commit e2e1a148 ("block: add
sysfs knob for turning off disk entropy contributions"):
- On SSD disks, the completion times aren't as random as they
are for rotational drives. So it's questionable whether they
should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.
There are more reliable sources for randomness than non-rotational block
devices. From a security perspective it is better to err on the side of
caution than to allow entropy contributions from unreliable "random"
sources.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
A set of flags introduced in the block layer enable better control over
how protection information is handled. These flags are useful for both
error injection and data recovery purposes. Checking can be enabled and
disabled for controller and disk, and the guard tag format is now a
per-I/O property.
Update sd_protect_op to communicate the relevant information to the
low-level device driver via a set of flags in scsi_cmnd.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
SCSI Well-known logical units generally don't have any scsi driver
associated with it which means no one will call scsi_autopm_put_device()
on these wlun scsi devices and this would result in keeping the
corresponding scsi device always active (hence LLD can't be suspended as
well). Same exact problem can be seen for other scsi device representing
normal logical unit whose driver is yet to be loaded. This patch fixes
the above problem with this approach:
- make the scsi_autopm_put_device call at the end of scsi_sysfs_add_sdev
to make it balance out the get earlier in the function.
- let drivers do paired get/put calls in their probe methods.
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The SYNCHRONIZE_CACHE command is a medium write command and hence can
fail when the device is write protected. Avoid sending such commands by
making sure that write-cache-enable is disabled even though the device
claim to support it.
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit ID: 7e660100d8 added code to derive the
FLUSH_TIMEOUT from the basic I/O timeout. However, this patch did not use the
basic I/O timeout of the device. Fix this bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Despite supporting modern SCSI features some storage devices continue to
claim conformance to an older version of the SPC spec. This is done for
compatibility with legacy operating systems.
Linux by default will not attempt to read VPD pages on devices that
claim SPC-2 or older. Introduce a blacklist flag that can be used to
trigger VPD page inquiries on devices that are known to support them.
Reported-by: KY Srinivasan <kys@microsoft.com>
Tested-by: KY Srinivasan <kys@microsoft.com>
Reviewed-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We currently set the field in common code based on the device type,
but then only use it in the cdrom driver which also overrides the
value previously set in the generic code.
Just leave this entirely to the CDROM driver to make everyones life
simpler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>