Commit Graph

482 Commits

Author SHA1 Message Date
Ryan Kuester
aaf3b48b50 SCSI: mptsas: fix hangs caused by ATA pass-through
commit 2a1b7e575b upstream.

I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
pass-through commands, in particular by smartctl.

First, my version of the symptoms.  On an LSI SAS1068E B3 HBA running
01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
occasional task, bus, and host resets, some of which lead to hard faults of
the HBA requiring a reboot.  Abusively looping the smartctl command,

    # while true; do smartctl -a /dev/sdb > /dev/null; done

dramatically increases the frequency of these failures to nearly one per
minute.  A high IO load through the HBA while looping smartctl seems to
improve the chance of a full scsi host reset or a non-recoverable hang.

I reduced what smartctl was doing down to a simple test case which
causes the hang with a single IO when pointed at the sd interface.  See
the code at the bottom of this e-mail.  It uses an SG_IO ioctl to issue
a single pass-through ATA identify device command.  If the buffer
userspace gives for the read data has certain alignments, the task is
issued to the HBA but the HBA fails to respond.  If run against the sg
interface, neither the test code nor smartctl causes a hang.

sd and sg handle the SG_IO ioctl slightly differently.  Unless you
specifically set a flag to do direct IO, sg passes a buffer of its own,
which is page-aligned, to the block layer and later copies the result
into the userspace buffer regardless of its alignment.  sd, on the other
hand, always does direct IO unless the userspace buffer fails an
alignment test at block/blk-map.c line 57, in which case a page-aligned
buffer is created and used for the transfer.

The alignment test currently checks for word-alignment, the default
setup by scsi_lib.c; therefore, userspace buffers of almost any
alignment are given directly to the HBA as DMA targets.  The LSI 1068
hardware doesn't seem to like at least a couple of the alignments which
cross a page boundary (see the test code below).  Curiously, many
page-boundary-crossing alignments do work just fine.

So, either the hardware has an bug handling certain alignments or the
hardware has a stricter alignment requirement than the driver is
advertising.  If stricter alignment is required, then in no case should
misaligned buffers from userspace be allowed through without being
bounced or at least causing an error to be returned.

It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
a stricter alignment requirement.  If it does, sd does the right thing and
bounces misaligned buffers (see block/blk-map.c line 57).  The following
patch to 2.6.34-rc5 makes my symptoms go away.  I'm sure this is the wrong
place for this code, but it gets my idea across.

Acked-by: Kashyap Desai <Kashyap.Desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26 17:21:28 -07:00
Kashyap, Desai
3def5de896 mptspi: Fix for incorrect data underrun errata
commit 9b53b39243 upstream.

Errata:
Certain conditions on the scsi bus may casue the 53C1030 to incorrectly signal
a SCSI_DATA_UNDERRUN to the host.

Workaround 1:
For an Errata on LSI53C1030 When the length of request data
and transfer data are different with result of command (READ or VERIFY),
DID_SOFT_ERROR is set.

Workaround 2:
For potential trouble on LSI53C1030. It is checked whether the length of
request data is equal to the length of transfer and residual.
MEDIUM_ERROR is set by incorrect data.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:36 -07:00
Kashyap, Desai
8e3400b3d1 mptctl : Remove printk which floods unnecessary messages to var/log/message
commit e39e145dfb upstream.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-04-26 07:41:36 -07:00
Kashyap, Desai
36a508e6c6 SCSI: mptfusion : mptscsih_abort return value should be SUCCESS instead of value 0.
commit 9858ae3801 upstream.

retval should be SUCCESS/FAILED which is defined at scsi.h
retval = 0 is directing wrong return value. It must be retval = SUCCESS.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:55 -08:00
Anatolij Gustschin
b7a9d922a5 mptsas: Fix issue with chain pools allocation on katmai
commit f1053a7ca9 upstream.

Since commit 9d2e9d66a3
mptsas driver fails to allocate memory for the MPT chain buffers
for second LSI adapter on PPC440SPe Katmai platform:
...
ioc1: LSISAS1068E B3: Capabilities={Initiator}
mptbase: ioc1: ERROR - Unable to allocate Reply, Request, Chain Buffers!
mptbase: ioc1: ERROR - didn't initialize properly! (-3)
mptsas: probe of 0002:31:00.0 failed with error -3

This commit increased MPT_FC_CAN_QUEUE value but initChainBuffers()
doesn't differentiate between SAS and FC causing increased allocation
for SAS case, too. Later pci_alloc_consistent() fails to allocate
increased chain buffer pool size for SAS case.

Provide a fix by looking at the bus type and using appropriate
MPT_SAS_CAN_QUEUE value while calculation of the number of chain
buffers.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09 04:50:41 -08:00
Alexey Dobriyan
d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
Linus Torvalds
342ff1a1b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  trivial: fix typo in aic7xxx comment
  trivial: fix comment typo in drivers/ata/pata_hpt37x.c
  trivial: typo in kernel-parameters.txt
  trivial: fix typo in tracing documentation
  trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
  trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
  trivial: remove unnecessary semicolons
  trivial: Fix duplicated word "options" in comment
  trivial: kbuild: remove extraneous blank line after declaration of usage()
  trivial: improve help text for mm debug config options
  trivial: doc: hpfall: accept disk device to unload as argument
  trivial: doc: hpfall: reduce risk that hpfall can do harm
  trivial: SubmittingPatches: Fix reference to renumbered step
  trivial: fix typos "man[ae]g?ment" -> "management"
  trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
  trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
  trivial: fix missing printk space in amd_k7_smp_check
  trivial: fix typo s/ketymap/keymap/ in comment
  trivial: fix typo "to to" in multiple files
  trivial: fix typos in comments s/DGBU/DBGU/
  ...
2009-09-22 07:51:45 -07:00
Alexey Dobriyan
83d5cde47d const: make block_device_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:25 -07:00
Uwe Kleine-Koenig
3dbda77e6f trivial: fix typos "man[ae]g?ment" -> "management"
Signed-off-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:56 +02:00
Linus Torvalds
39695224bd Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (209 commits)
  [SCSI] fix oops during scsi scanning
  [SCSI] libsrp: fix memory leak in srp_ring_free()
  [SCSI] libiscsi, bnx2i: make bound ep check common
  [SCSI] libiscsi: add completion function for drivers that do not need pdu processing
  [SCSI] scsi_dh_rdac: changes for rdac debug logging
  [SCSI] scsi_dh_rdac: changes to collect the rdac debug information during the initialization
  [SCSI] scsi_dh_rdac: move the init code from rdac_activate to rdac_bus_attach
  [SCSI] sg: fix oops in the error path in sg_build_indirect()
  [SCSI] mptsas : Bump version to 3.04.12
  [SCSI] mptsas : FW event thread and scsi mid layer deadlock in SYNCHRONIZE CACHE command
  [SCSI] mptsas : Send DID_NO_CONNECT for pending IOs of removed device
  [SCSI] mptsas : PAE Kernel more than 4 GB kernel panic
  [SCSI] mptsas : NULL pointer on big endian systems causing Expander not to tear off
  [SCSI] mptsas : Sanity check for phyinfo is added
  [SCSI] scsi_dh_rdac: Add support for Sun StorageTek ST2500, ST2510 and ST2530
  [SCSI] pmcraid: PMC-Sierra MaxRAID driver to support 6Gb/s SAS RAID controller
  [SCSI] qla2xxx: Update version number to 8.03.01-k6.
  [SCSI] qla2xxx: Properly delete rports attached to a vport.
  [SCSI] qla2xxx: Correct various NPIV issues.
  [SCSI] qla2xxx: Correct qla2x00_eh_wait_on_command() to wait correctly.
  ...
2009-09-14 17:53:36 -07:00
Kashyap, Desai
b437b95620 [SCSI] mptsas : Bump version to 3.04.12
Bump version to 3.04.12

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-12 09:35:29 -05:00
Kashyap, Desai
9766096d33 [SCSI] mptsas : FW event thread and scsi mid layer deadlock in SYNCHRONIZE CACHE command
Normally In HBA reset path MPT driver will flush existing work in current work
queue (mpt/0) . This is just a dummy activity for MPT driver point of
view, since HBA reset will turn off Work queue events.

It means we will simply returns from work queue without doing anything.
But for the case where Work is already done (half the way), we have to have
that work to be done.

Considering above condition we stuck forever since Deadlock in scsi midlayer
and MPT driver. sd_sync_cache() will wait forever since HBA is not in
Running state, and it will never come into Running state since
sd_sync_cache() is called from HBA reset context.
Now new code will not wait for half cooked work to be finished
before returning from HBA reset.

Once we are out of HBA reset, EH thread will change host state to running from
recovery and work waiting for running state of HBA will be finished.
New code is turning ON firmware event from another special work called
Rescan toplogy.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-12 09:35:28 -05:00
Kashyap, Desai
fea984034b [SCSI] mptsas : Send DID_NO_CONNECT for pending IOs of removed device
Driver is modified to return DID_NO_CONNECT for all pending I/O
requests for bus type SAS, if it founds the target is removed at
the firmware level.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-12 09:35:28 -05:00
Kashyap, Desai
c55b89fba9 [SCSI] mptsas : PAE Kernel more than 4 GB kernel panic
This patch is solving problem for PAE kernel DMA operation.
On PAE system dma_addr and unsigned long will have different
values.
Now dma_addr is not type casted using unsigned long.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-12 09:35:27 -05:00
Kashyap, Desai
f44fd18198 [SCSI] mptsas : NULL pointer on big endian systems causing Expander not to tear off
On Big endian system kernel will crash due to address translation
is not handle properly.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-12 09:35:26 -05:00
Kashyap, Desai
9e39089b95 [SCSI] mptsas : Sanity check for phyinfo is added
Check for phyinfo->phy before calling sas_port_delete_phy.

Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-09-12 09:35:25 -05:00
Kashyap, Desai
8ea0696e92 [SCSI] mptsas : Bump version to 3.04.11
Bump version to 3.04.11

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:26 -05:00
Kashyap, Desai
d130691725 [SCSI] mptsas : Code cleanup of host page alloc and diag reset.
Code cleanup of host page alloc and diag reset.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:26 -05:00
Kashyap, Desai
79a3ec1ace [SCSI] mptsas : set max_id to infinite value.
Do not set max_id value received from FW. Once SAS transport layer is
introduced max_id value is missleading to SCSI mid layer. Use max_id to
infinite value.

logic of can queue of scsi host is changed.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:26 -05:00
Kashyap, Desai
4b97650b55 [SCSI] mptsas : Change config request timeout value to 30 seconds.
Change config request timeout value to 30 seconds.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:25 -05:00
Kashyap, Desai
d23321b488 [SCSI] mptsas : Handle INSUFFICIENT resources status as similar to IOC BUSY status
Handle insufficient resources status as similar to busy status.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:25 -05:00
Kashyap, Desai
a247fa4521 [SCSI] mptsas : Removed mptscsih_timer_expired.
Removed mptscsih_timer_expired. This timer is no more use.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:24 -05:00
Kashyap, Desai
9d2e9d66a3 [SCSI] mptsas : Change DEFINED value of can queue for FC and SAS devices.
Change DEFINED value of can queue for FC and SAS devices.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:52:24 -05:00
Patrick McHardy
6ed106549d net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions
This patch is the result of an automatic spatch transformation to convert
all ndo_start_xmit() return values of 0 to NETDEV_TX_OK.

Some occurences are missed by the automatic conversion, those will be
handled in a seperate patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-05 19:16:04 -07:00
Jiri Slaby
129dd98194 fusion: mptsas, fix lock imbalance
Fix two typos in mptsas_not_responding_devices. It was mutex_lock instead
of unlock.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-22 08:54:14 -05:00