One of the strange things that the original sg driver did was let the
user provide both a data-out buffer (it followed the sg_header+cdb)
_and_ specify a reply length greater than zero. What happened was that
the user data-out buffer was copied into some kernel buffers and then
the mid level was told a read type operation would take place with the
data from the device overwriting the same kernel buffers. The user would
then read those kernel buffers back into the user space.
From what I can tell, the above action was broken by commit fad7f01e61
("sg: set dxferp to NULL for READ with the older SG interface") in 2008
and syzkaller found that out recently.
Make sure that a user space pointer is passed through when data follows
the sg_header structure and command. Fix the abnormal case when a
non-zero reply_len is also given.
Fixes: fad7f01e61
Cc: <stable@vger.kernel.org> #v2.6.28+
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In sg_common_write(), we free the block request and return -ENODEV if
the device is detached in the middle of the SG_IO ioctl().
Unfortunately, sg_finish_rem_req() also tries to free srp->rq, so we
end up freeing rq->cmd in the already free rq object, and then free
the object itself out from under the current user.
This ends up corrupting random memory via the list_head on the rq
object. The most common crash trace I saw is this:
------------[ cut here ]------------
kernel BUG at block/blk-core.c:1420!
Call Trace:
[<ffffffff81281eab>] blk_put_request+0x5b/0x80
[<ffffffffa0069e5b>] sg_finish_rem_req+0x6b/0x120 [sg]
[<ffffffffa006bcb9>] sg_common_write.isra.14+0x459/0x5a0 [sg]
[<ffffffff8125b328>] ? selinux_file_alloc_security+0x48/0x70
[<ffffffffa006bf95>] sg_new_write.isra.17+0x195/0x2d0 [sg]
[<ffffffffa006cef4>] sg_ioctl+0x644/0xdb0 [sg]
[<ffffffff81170f80>] do_vfs_ioctl+0x90/0x520
[<ffffffff81258967>] ? file_has_perm+0x97/0xb0
[<ffffffff811714a1>] SyS_ioctl+0x91/0xb0
[<ffffffff81602afb>] tracesys+0xdd/0xe2
RIP [<ffffffff81281e04>] __blk_put_request+0x154/0x1a0
The solution is straightforward: just set srp->rq to NULL in the
failure branch so that sg_finish_rem_req() doesn't attempt to re-free
it.
Additionally, since sg_rq_end_io() will never be called on the object
when this happens, we need to free memory backing ->cmd if it isn't
embedded in the object itself.
KASAN was extremely helpful in finding the root cause of this bug.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there. If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.
X-Coverup: TINC (and there's no lumber cartel either)
Cc: stable@vger.kernel.org # way, way back
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull misc SCSI patches from James Bottomley:
"This is a short patch set representing a couple of left overs from the
merge window (debug removal and MAINTAINER changes).
Plus one merge window regression (the local workqueue for hpsa) and a
set of bug fixes for several issues (two for scsi-mq and the rest an
assortment of long standing stuff, all cc'd to stable)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
sg: fix EWOULDBLOCK errors with scsi-mq
sg: fix unkillable I/O wait deadlock with scsi-mq
sg: fix read() error reporting
wd719x: add missing .module to wd719x_template
hpsa: correct compiler warnings introduced by hpsa-add-local-workqueue patch
fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit.
fcoe: Transition maintainership to Vasu
am53c974: remove left-over debugging code
With scsi-mq enabled, userspace programs can get unexpected EWOULDBLOCK
(a.k.a. EAGAIN) errors when submitting commands to the SCSI generic
driver. Fix by calling blk_get_request() with GFP_KERNEL instead of
GFP_ATOMIC.
Note: to avoid introducing a potential deadlock, this patch should be
applied after the patch titled "sg: fix unkillable I/O wait deadlock
with scsi-mq".
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When using the write()/read() interface for submitting commands, the
SCSI generic driver does not call blk_put_request() on a completed SCSI
command until userspace calls read() to get the command completion.
Since scsi-mq uses a fixed number of preallocated requests, this makes
it possible for userspace to exhaust the entire preallocated supply of
requests. For places in the kernel that call blk_get_request() with
GFP_KERNEL, this can cause the calling process to deadlock in a
permanent unkillable I/O wait in blk_get_request() -> ... -> bt_get().
For places in the kernel that call blk_get_request() with GFP_ATOMIC,
this can cause blk_get_request() always to return -EWOULDBLOCK. Note
that these problems happen only if scsi-mq is enabled. Prevent the
problems by calling blk_put_request() as soon as the SCSI command
completes instead of waiting for userspace to call read().
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Pull core block IO changes from Jens Axboe:
"This contains:
- A series from Christoph that cleans up and refactors various parts
of the REQ_BLOCK_PC handling. Contributions in that series from
Dongsu Park and Kent Overstreet as well.
- CFQ:
- A bug fix for cfq for realtime IO scheduling from Jeff Moyer.
- A stable patch fixing a potential crash in CFQ in OOM
situations. From Konstantin Khlebnikov.
- blk-mq:
- Add support for tag allocation policies, from Shaohua. This is
a prep patch enabling libata (and other SCSI parts) to use the
blk-mq tagging, instead of rolling their own.
- Various little tweaks from Keith and Mike, in preparation for
DM blk-mq support.
- Minor little fixes or tweaks from me.
- A double free error fix from Tony Battersby.
- The partition 4k issue fixes from Matthew and Boaz.
- Add support for zero+unprovision for blkdev_issue_zeroout() from
Martin"
* 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits)
block: remove unused function blk_bio_map_sg
block: handle the null_mapped flag correctly in blk_rq_map_user_iov
blk-mq: fix double-free in error path
block: prevent request-to-request merging with gaps if not allowed
blk-mq: make blk_mq_run_queues() static
dm: fix multipath regression due to initializing wrong request
cfq-iosched: handle failure of cfq group allocation
block: Quiesce zeroout wrapper
block: rewrite and split __bio_copy_iov()
block: merge __bio_map_user_iov into bio_map_user_iov
block: merge __bio_map_kern into bio_map_kern
block: pass iov_iter to the BLOCK_PC mapping functions
block: add a helper to free bio bounce buffer pages
block: use blk_rq_map_user_iov to implement blk_rq_map_user
block: simplify bio_map_kern
block: mark blk-mq devices as stackable
block: keep established cmd_flags when cloning into a blk-mq request
block: add blk-mq support to blk_insert_cloned_request()
block: require blk_rq_prep_clone() be given an initialized clone request
blk-mq: add tag allocation policy
...
Make use of a new interface provided by iov_iter, backed by
scatter-gather list of iovec, instead of the old interface based on
sg_iovec. Also use iov_iter_advance() instead of manual iteration.
This commit should contain only literal replacements, without
functional changes.
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park <dongsu.park@profitbricks.com>
[hch: fixed to do a deep clone of the iov_iter, and to properly use
the iov_iter direction]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The 'data_dir' variable is not used in sg_common_write(), hence
remove this variable.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The calling conventions for this function are bad as it could return
-ENODEV both for a device not currently online and a not recognized ioctl.
Add a new scsi_ioctl_block_when_processing_errors function that wraps
scsi_block_when_processing_errors with the a special case for the
SG_SCSI_RESET ioctl command, and handle the SG_SCSI_RESET case itself
in scsi_ioctl. All callers of scsi_ioctl now must call the above helper
to check for the EH state, so that the ioctl handler itself doesn't
have to.
Reported-by: Robert Elliott <Elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Pull the common code from the two callers into the function,
and rename it to scsi_ioctl_reset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
We should be using the standard dev_printk() variants for
sense code printing.
[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Like scmd_printk(), but the device name is passed in as
a string. Can be used by eg ULDs which do not have access
to the scsi_cmnd structure.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Further to a January 2013 thread titled: "[PATCH] SG_SCSI_RESET ioctl
should only perform requested operation" by Jeremy Linton a patch (v3)
is presented that expands the existing ioctl to include "no_escalate"
versions to the existing resets. This requires no changes to SCSI low
level drivers (LLDs); it adds several more finely tuned reset options
to the user space. For example:
/* This call remains the same, with the same escalating semantics
* if the device (LU) reset fail. That is: on failure to try a
* target reset and if that fails, try a bus reset, and if that fails
* try a host (i.e. LLD) reset. */
val = SG_SCSI_RESET_DEVICE;
res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val);
/* What follows is a new option introduced by this patch series. Only
* a device reset is attempted. If that fails then an appropriate
* error code is provided. N.B. There is no reset escalation. */
val = SG_SCSI_RESET_DEVICE | SG_SCSI_RESET_NO_ESCALATE;
res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val);
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Jeremy Linton <jlinton@tributary.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.
For consistency, make a similar change to the blk_mq_alloc_request leg
of blk_get_request. It may fail if the queue is dead, or the caller was
unwilling to wait.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Avoid taking the queue_lock to check the per-device queue limit. Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.
Unlike the host and target busy counters this doesn't allow us to avoid the
queue_lock in the request_fn due to the way the interface works, but it'll
allow us to prepare for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock round trip in
scsi_device_unbusy, which is still important given how busy the queue_lock
is.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
Update the sg driver to use dev_printk() variants instead of
plain printk(); this will prefix logging messages with the
appropriate device.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Doug Gilbert <dgilbert@interlog.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The SCSI standard defines 64-bit values for LUNs, and large arrays
employing large or hierarchical LUN numbers become more and more
common.
So update the linux SCSI stack to use 64-bit LUN numbers.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This addresses a problem reported by Vaughan Cao concerning
the correctness of the O_EXCL logic in the sg driver. POSIX
doesn't defined O_EXCL semantics on devices but "allow only
one open file descriptor at a time per sg device" is a rough
definition. The sg driver's semantics have been to wait
on an open() when O_NONBLOCK is not given and there are
O_EXCL headwinds. Nasty things can happen during that wait
such as the device being detached (removed). So multiple
locks are reworked in this patch making it large and hard
to break down into digestible bits.
This patch is against Linus's current git repository which
doesn't include any sg patches sent in the last few weeks.
Hence this patch touches as little as possible that it
doesn't need to and strips out most SCSI_LOG_TIMEOUT()
changes in v3 because Hannes said he was going to rework all
that stuff.
The sg3_utils package has several test programs written to
test this patch. See examples/sg_tst_excl*.cpp .
Not all the locks and flags in sg have been re-worked in
this patch, notably sg_request::done . That can wait for
a follow-up patch if this one meets with approval.
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
When the SG_IO ioctl was copied into the block layer and
later into the bsg driver, subtle differences emerged.
One difference is the way injected commands are queued through
the block layer (i.e. this is not SCSI device queueing nor SATA
NCQ). Summarizing:
- SG_IO in the block layer: blk_exec*(at_head=false)
- sg SG_IO: at_head=true
- bsg SG_IO: at_head=true
Some time ago Boaz Harrosh introduced a sg v4 flag called
BSG_FLAG_Q_AT_TAIL to override the bsg driver default.
This patch does the equivalent for the sg driver.
ChangeLog:
Introduce SG_FLAG_Q_AT_TAIL flag to cause commands
to be injected into the block layer with
at_head=false.
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>