If the driver fails to allocate the contiguous (DMAable) memory for
system reasons, we fail to load the instance, but then we try to free
the <nul> allocation in the cleanup code and we get a panic in
pci_free_consistent(). This is reported against an older kernel, hope
this is relevant for latest/greatest.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The patch is *much* smaller than the description. I am attempting to
answer to those that want to understand an issue that was reported in
May this year.
If a Sunrise Lake based card that requires an alternate reset mechanism
is set up to ignore the commanded IOP_RESET it reports 0x00000010
(IOP_RESET ignored) instead of 0x3803000F (use alternate reset mechanism
to reset all cores), and thus the reset platform function decides to
switch to IOP_RESET_ALWAYS because the reset platform function
parameters indicate that we *need* to reset the card. IOP_RESET_ALWAYS
then responds with the 0x3803000F return code, but alas we treat this as
an error instead of using the alternate reset mechanism (put a 0x03 into
the register offset 0x38). The reset fails, but the fact that the
IOP_RESET_ALWAYS command was issued has put the card in a purposeful
shutdown state in preparation for the alternate hardware reset to be
applied. Yuck.
IOP_RESET is ignored in internal production cards, typically to ensure
that we catch all adapter lockup issues without the driver progressing
further, so this would not appear to be a field issue and thus this
patch was destined to be only in the internal Adaptec source tree.
IOP_RESET_ALWAYS is reserved for
kexec/kdump/FirmwareUpdate/AutomatedTestFrames so we did not function as
expected in any case. Also in the past we have had OEMs specifically
request that cards not be resetable after a BlinkLED/FirmwareAssert for
one reason or another and To head off the possibility that the Sunrise
Lake based cards would suffer a similar fate, we propose the enclosed
fix.
Yinghai Lu of SUN had a pre-production card with IOP_RESET disabled when
he reported an issue to the linux kernel list back in May regarding a
kexec problem resulting from this reset being ignore. His fix was to
update the Firmware to one that did not ignore the IOP_RESET. Previous
kernels did not attempt to reset the adapter and that is why it surfaced
as a regression in his hands.
The current list of aacraid based cards that use Sunrise Lake:
9005:0285:9005:02b5 Adaptec 5445
9005:0285:9005:02b6 Adaptec 5805
9005:0285:9005:02b7 Adaptec 5085
9005:0285:9005:02c3 Adaptec 51205
9005:0285:9005:02c4 Adaptec 51605
9005:0285:9005:02ce Adaptec 51245
9005:0285:9005:02cf Adaptec 51645
9005:0285:9005:02d0 Adaptec 52445
9005:0285:9005:02d1 Adaptec 5405
9005:0285:9005:02b8 ICP ICP5445SL
9005:0285:9005:02b9 ICP ICP5085SL
9005:0285:9005:02ba ICP ICP5805SL
9005:0285:9005:02c5 ICP ICP5125SL
9005:0285:9005:02c6 ICP ICP5165SL
9005:0285:108e:7aac SUN STK RAID REM
9005:0285:108e:0286 SUN STK RAID INT
9005:0285:108e:0287 SUN STK RAID EXT
9005:0285:108e:7aae SUN STK RAID EM
All of these are publicly released with IOP_RESET enabled. So there is
no immediate need for this patch.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Customer running an application that issues SYNCHRONIZE_CACHE calls
directly noticed the broad stroke of the current implementation in the
aacraid driver resulting in multiple applications feeding I/O to the
storage causing the issuing application to stall for long periods of
time. By only waiting for the current WRITE commands, rather than all
commands, to complete; and those that are in range of the
SYNCHRONIZE_CACHE call that would associate more tightly with the
issuing application before telling the Firmware to flush it's dirty
cache, we managed to reduce the stalling. The Firmware itself still
flushes all the dirty cache associated with the array ignoring the
range, it just does so in a more timely manner.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Minor unimportant cuttings from the floor bundled in with a version
stamp update. Only controversial change is the dropping of Alan Cox
copyright on the nark.c module since that file has no code written by
him in it.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We need to newline terminate responses from nodes within the sysfs tree,
the Adapter status value reported by the reset adapter node is adjusted.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
On the SCSI layer ioctl path there is no implicit permissions check for
ioctls (and indeed other drivers implement unprivileged ioctls). aacraid
however allows all sorts of very admin only things to be done so should
check.
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: "Salyzyn, Mark" <mark_salyzyn@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Report VPD inquiry page 0x80 with an unique array creation serial
number (CUID). When an array is created, the metadata stored on the
physical drives gets an unique serial number. This serial number
remains constant through array morphing or migration to other
controllers. This patch is a forward port and modification to survive
morphing and migration operations, of a similar piece of
(un-attributed author) code added to the SLES10 SP1 aacraid driver.
To test the results of the patch, observe that /dev/disk/by-id/
entries will show up for the arrays resulting from the udev rules.
Also, as per the udev rules, 'scsi_id -g -x -a -s /block/sd? -d
/dev/sd?' will report the ID_SERIAL as constructed from the inquiry
data.
It was reported to me that the 'ADPT' leading the serial number was bad
form, that the inquiry vendor field was enough to differentiate the
storage uniquely. Subsequent search found that another Adaptec AAC based
driver reported the 8 hex serial number only without such adornments, so
dropped ADPT to match. Resubmitting the patch with this alteration.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Incorrect dma mask was used for blinkled (firmware assert) recovery or
user initiated reset during initialization portion. Ensure that all
callers of aac_fib_map_free null out the fib allocation references to
prevent multiple free. Although serious sounding, no reports of these
problems have surfaced...
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
During an Adapter Initiated scan request, the query disk ioctl reports a
value of 2 rather than 1 for the valid field. This presents a problem
for some legacy management applications.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits)
[SCSI] ibmvscsi: convert to use the data buffer accessors
[SCSI] dc395x: convert to use the data buffer accessors
[SCSI] ncr53c8xx: convert to use the data buffer accessors
[SCSI] sym53c8xx: convert to use the data buffer accessors
[SCSI] ppa: coding police and printk levels
[SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
[SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
[SCSI] remove the dead CYBERSTORMIII_SCSI option
[SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA
[SCSI] Clean up scsi_add_lun a bit
[SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs
[SCSI] sni_53c710: Cleanup
[SCSI] qla4xxx: Fix underrun/overrun conditions
[SCSI] megaraid_mbox: use mutex instead of semaphore
[SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation.
[SCSI] qla2xxx: update version to 8.02.00-k1.
[SCSI] qla2xxx: add support for NPIV
[SCSI] stex: use resid for xfer len information
[SCSI] Add Brownie 1200U3P to blacklist
[SCSI] scsi.c: convert to use the data buffer accessors
...
Support displaying long serial number information. Reuse sysfs handler
internally as helper.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The Dell PERC2/QC erroneously was listed as having the 31 bit limit
quirk on the interface allocations, removing the reference to repair
this oversight. Also, the 2 quad pci address (family) match catch-all
also retained the 31 bit limit and the 34 SG limit quirks in a paranoid
move. Now, many years later, we find that none of the Adapters that did
trigger with the family match had such quirks; these quirks are all
limited to the 4 quad pci address matches to select legacy adapters
already populated.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch is more like a spelling correction than a fix. It was
discovered that if we had a busy status return from the Adapter for the
SCSI srb command to a physical component, that we returned
DID_NO_CONNECT rather than what one would expect DID_BUS_BUSY.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add the ability for an application to issue a hardware reset to the
adapter via sysfs. Typical uses include restarting the adapter after it
has been flashed. Bumped revision number for the driver and added a
feature to periodically check the adapter's health (check_interval),
update the adapter's concept of time (update_interval) and block
checking/resetting of the adapter (check_reset).
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Sundry cleanups:
1) Use kzalloc instead of kmalloc.
2) Make sure probe worked before recalling the SCSI command to finalize
processing.
3) _aac_probe_container2 and _aac_probe_container1 return value goes
unused, change return to void.
4) Use a lower depth pointer reference to pick up the driver instance
variable.
5) Although effectively unused except to fake for scsicmd validity, set
the scsi_done in probe code to aac_probe_container_callback1 instead of
the less valid dummy reference to _aac_probe_container1.
6) SCp.phase is set in aac_valid_context, drop setting up this value in
caller when unnecessary.
7) take container target id at the beginning, rather than referencing
scmd_id() to pick it up.
There should be no side effects or functionality changes.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Moves quiesce, thread and interrupt shutdown into aacraid drivers'
.shutdown handler. This fix to the aac_shutdown handler will remove the
superfluous reset of the adapter during a (clean) kexec.
This fix may mitigate the active investigation 'kexec and aacraid
broken' but it is unlikely to affect the root cause (issue likely
present in both kexec and kdump). This patch reduces the chance the
problem will occur with a kexec. The fix for root cause is currently
expected to be the minimum value check to the aacraid.startup_timeout
driver variable after an adapter reset within aacraid_commit_reset.patch
submitted on 05/22/2007 and awaiting testing by Yinghai to confirm.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Inspired by Brian King's patch to the ibmvscsi driver. Adds support for
a changeable queue depth to the aacraid driver.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Conflicts:
drivers/scsi/jazz_esp.c
Same changes made by both SCSI and SPARC trees: problem with UTF-8
conversion in the copyright.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Under some conditions associated with the unclean transition to kdump,
the aacraid adapters will view the array as foreign and not export it to
prevent access and data manipulation. The solution is to submit a commit
configuration to export the devices since this is a expected behavior
when transitioning to a kdump kernel.
This patch adds the aacraid.reset_devices flag and when either this or
the global reset_devices flag is set, ensures that a commit config is
issued and extends the startup_timeout if it is set less than 5 minutes.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Back in the beginning of last year we disabled mode page 8 and mode page
3f requests through device quirk bits instead of enhancing the driver to
respond to these mode pages because there was no apparent added value.
The Firmware that supports the new communication commands supports the
ability to force a write around of the adapter cache on a command by
command basis. In the attached patch we enable mode page 8 and 3f and
spoof the results as needed in order to *convince* the layers above to
submit writes with the FUA (Force Unit Attention) bit set if the file
system or application requires it, if the Firmware supports the write
through, or instead to submit a SYNCHRONIZE_CACHE if the Firmware does
not. The added value here is for file systems that benefit from this
functionality and for clustering or redundancy scenarios.
Caveats: By convince, we are responding with a minimal short 3 byte
content mode page 8, with only the data the SCSI layer needs and that we
can fill confidently. Applications that require the customarily larger
mode page 8 results may be confused by this(?). The FUA, or the
SYNCHRONIZE_CACHE only affect the cache on the controller. Our firmware
by default ensure that the underlying physical drives of the array have
their cache turned off so normally this is not a problem.
This attached patch is against current scsi-misc-2.6 and was unit tested
on RHEL5. Since this is a feature enhancement, it should not be
considered for any current stabilization efforts.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
http://bugzilla.kernel.org/show_bug.cgi?id=8469
As discussed in the bugzilla outlined below, we have an sa based
(Mustang) RAID adapter on the system, a Dell PERC2/QC. Affected
controllers are HP NetRAID, Adaptec AAC-364, Dell PERC2/QC or Adaptec
5400S. This problem coincides with the introduction of the adapter_comm
and adapter_deliver platform functions (Message [PATCH 1/4] aacraid:
rework communication support code, January 23 2007, which initially
migrated to 2.6.21)
The panic occurs with an uninitialized adapter_deliver platform function
pointer. The enclosed patch, unmodified as tested by Rainer, solves the
problem.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>