Multiple Discovery Fixes:
- Fix race on discovery due to link events coinciding with vport_delete.
- Use NLP_FABRIC state to filter out switch-based pseudo initiators that
reuse the same WWNs.
- Correct erroneous setting of DID=0 in lpfc_matchdid()
- Correct extra reference count that was in the lookup path for the
remoteid from an unsolicited ELS.
- Correct double-free bug in els abort path.
- Correct FDMI server discovery logic for switch that return a WWN of 0.
- Fix bugs in ndlp mgmt when a node changes address
- Correct bug that did not delete RSCNs for vports upon link transitions
- Fix "0216 Link event during NS query" error which pops up when vports
are swapped to different switch ports.
- Add sanity checks on ndlp structures
- Fix devloss log message to dump WWN correctly
- Hold off mgmt commands that were interferring with discovery mailbox cmds
- Remove unnecessary FC_ESTABLISH_LINK logic.
- Correct some race conditions in the worker thread, resulting in devloss:
- Clear the work_port_events field before handling the work port events
- Clear the deferred ring event before handling a deferred ring event
- Hold the hba lock when waking up the work thread
- Send an acc for the rscn even when we aren't going to handle it
- Fix locking behavior that was not properly protecting the ACTIVE flag,
thus allowing mailbox command order to shift.
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Upon having configuration changes on vports only, the driver
handles SCR regardless physical port state and, in turn, it
results mailbox error as below:
Mar 20 11:24:20 dl585 kernel: qla2x00_mailbox_command(9): **** FAILED. mbx0=4005, mbx1=1, mbx2=8100, cmd=70 ****
With the changes, driver checks physical port loop_state and make
sure the port is ready to take commands.
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
There were several places where referencing ha structure of
virtual ports for resources. Among those refereces, certain
fields are get up-to-dated only on ha structure of physical port.
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Now that infrastructure is present within the midlayer and there
is a clear distinction between what is expected from a device and
target reset, convert the current device-reset codes to a
target-reset, and add codes to perform a proper device-reset (LUN
reset).
In the process of adding reset support, collapse and consolidate
large sections of mailbox-command (TMF issuance) codes,
generalize the two 'wait-for-commands-to-complete' functions, and
add a generic-reset routine for use by midlayer reset functions.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Some switches return 0x09 (Command not supported) as the reason
code for GPSC failure. Check for this code, and disable
additional GPSC queries if found.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The Flash Descriptor Table (FDT) present on many recent HBAs
encodes flash accessing characteristics of the flash-part used on
the HBA. Use this information during flash manipulation (writes)
rather than using specific hard-coded values based on queried
manufacturer and device IDs.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Strip unused (DEBUG-ONLY) enabled functions, inlines, useless
wrappers, and unused DPC flags from the code. Another step in
the migration towards a cleaner (less-crusty) driver.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Recent ISPs have a region within FLASH which acts as a repository
for the logging of serious hardware and software failures.
Currently, the region is large enough to support up to 255
entries.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Supported events include LIP, LIP reset, RSCN, link up, and link
down.
To support AEN (and additional forthcoming features), we also
introduce a simple deferred-work construct to manage events which
require a non-atomic sleeping-capable context. This work-list is
processed as part of the driver's standard DPC routine.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
There's no need to use the heavier (albiet safer)
*_irq[save|restore]() locking primitives within the driver's
interrupt handlers, interrupts are guaranteed to be
non-reentrant. Use lightweight spin_lock() and spin_unlock()
primitives while acquiring the hardware_lock.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
These are completely wrong because both outX and writeX do an
automatic reverse of their arguments if necessary, so having an extra
cpu_to_leX gives us the wrong ordering on BE platforms again.
Acked-by: Mark Salyzyn <Mark_Salyzyn@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Every current transport class calls transport_container_release but
ignores the return value. This is catastrophic if it returns an error
because the containers are part of a global list and the next action of
almost every transport class is to free the memory used by the
container.
Fix this by making transport_container_release a void, but making it BUG
if attribute_container_release returns an error ... this catches the
root cause of a system panic much earlier. If we don't do this, we get
an eventual BUG when the attribute container list notices the corruption
caused by the freed memory it's still referencing.
Also made attribute_container_release __must_check as a reminder.
Cc: Greg KH <greg@kroah.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Currently, if the barrier command fails, the error return isn't seen
by the block layer and it proceeds on regardless. The problem is that
SCSI always returns no error for REQ_TYPE_BLOCK_PC ... it expects the
submitter to pick the errors out of req->errors, which the block
barrier functions don't do.
Since it appears that the way SG_IO and scsi_execute_request() work
they discard the block error return and always use req->errors, the
best fix for this is to have the SCSI layer return an error to block
if one actually occurred (this also allows us to filter out spurious
errors, like deferred sense).
This patch is a bug fix that will need backporting to stable, but it's
also quite a big change and in need of testing, so we'll incubate in
the main kernel tree and backport at the -rc2 or so stage if no
problems turn up.
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>