This reverts commit a692b0eec5.
Tom reports:
[ 8.741033] ------------[ cut here ]------------
[ 8.741038] WARNING: at fs/sysfs/dir.c:508 sysfs_add_one+0xc1/0xf0()
[ 8.741040] Hardware name: To Be Filled By O.E.M.
[ 8.741041] sysfs: cannot create duplicate filename
...and missing 2 out of 4 drives connected to mvsas. Commit a692b0ee
made the assumption that all the phy ids an lldd registers to libsas are
unique. However, in the "multi-chip" case mvsas does a rather annoying
duplication of phy ids in the array passed to libsas. So, for example,
chip0 has phy0-3 at ha phy index 0-3 and chip1 has its phy0-3 at ha phy
index 4-7. The more natural model would be to create a scsi_host (and
sas_ha) per chip (controller), but for now revert the naming fix which
unfortunately means dealing with unpredictable end-device names for a
bit longer.
Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Patrick Thomson <patrick.s.thomson@intel.com>
Reported-by: Tom Rini <trini@ti.com>
Tested-by: Tom Rini <trini@ti.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Normalize phy->attached_sas_addr to return a zero-address in the case
when device-type == NO_DEVICE or the linkrate is invalid to handle
expanders that put non-zero sas addresses in the discovery response:
sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device)
sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device)
Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This changes the ordering of initialization and probing events from:
1/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
2/ allocate ata_port and schedule port probe in DISCE_PROBE
...to:
1/ allocate ata_port in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
2/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
3/ schedule port probe in DISCE_PROBE
This ordering prevents PHYE_SIGNAL_LOSS_EVENTS from sneaking in to
destrory ata devices before they have been fully initialized:
BUG: unable to handle kernel paging request at 0000000000003b10
IP: [<ffffffffa0053d7e>] sas_ata_end_eh+0x12/0x5e [libsas]
...
[<ffffffffa004d1af>] sas_unregister_common_dev+0x78/0xc9 [libsas]
[<ffffffffa004d4d4>] sas_unregister_dev+0x4f/0xad [libsas]
[<ffffffffa004d5b1>] sas_unregister_domain_devices+0x7f/0xbf [libsas]
[<ffffffffa004c487>] sas_deform_port+0x61/0x1b8 [libsas]
[<ffffffffa004bed0>] sas_phye_loss_of_signal+0x29/0x2b [libsas]
...and kills the awkward "sata domain_device briefly existing in the
domain without an ata_port" state.
Reported-by: Michal Kosciowski <michal.kosciowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The check_ready implementation in the expander-attached ata device case
polls on sas_ex_phy_discover(). The effect is that the ex_phy fields
(critically ->attached_sas_addr) can change. When ata_eh ends and
libsas comes along to revalidate the domain
sas_unregister_devs_sas_addr() can fail to lookup devices to remove, or
fail to re-add an ata device that ata_eh marked as disabled. So change
the code to skip the sas_address and change count updates when ata_eh is
active.
Cc: Jack Wang <jack_wang@usish.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Commit 899fcf4 "[SCSI] libsas: set attached device type and target
protocols for local phys" setup 'phy' to be dereferenced after
list_for_each_entry(phy, &port->phy_list, port_phy_el) (i.e. phy ==
&port->phy_list) resulting in reports like:
BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
IP: [<ffffffffa00ce948>] sas_discover_domain+0x29e/0x4fb [libsas]
...fix by deferring sas_phy_set_target() to the end of
sas_get_port_device().
Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If an expander reports 'PHY VACANT' for a phy index prior to the one
that generated a BCN libsas fails rediscovery. Since a vacant phy is
defined as a valid phy index that will never have an attached device
just continue the search.
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When requeuing work to a draining workqueue the last work instance may
not be idle, so sas_queue_work() must not touch work->entry. Introduce
sas_work with a drain_node list_head to have a private list for
collecting work deferred due to drain collision.
Fixes reports like:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff810410d4>] process_one_work+0x2e/0x338
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
SCSI updates from James Bottomley:
"The update includes the usual assortment of driver updates (lpfc,
qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge
amount of infrastructure work in the SAS library and transport class
as well as an iSCSI update. There's also a new SCSI based virtio
driver."
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits)
[SCSI] qla4xxx: Update driver version to 5.02.00-k15
[SCSI] qla4xxx: trivial cleanup
[SCSI] qla4xxx: Fix sparse warning
[SCSI] qla4xxx: Add support for multiple session per host.
[SCSI] qla4xxx: Export CHAP index as sysfs attribute
[SCSI] scsi_transport: Export CHAP index as sysfs attribute
[SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
[SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
[SCSI] pm8001: fix endian issue with code optimization.
[SCSI] pm8001: Fix possible racing condition.
[SCSI] pm8001: Fix bogus interrupt state flag issue.
[SCSI] ipr: update PCI ID definitions for new adapters
[SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
[SCSI] isci: improvements in driver unloading routine
[SCSI] isci: improve phy event warnings
[SCSI] isci: debug, provide state-enum-to-string conversions
[SCSI] scsi_transport_sas: 'enable' phys on reset
[SCSI] libsas: don't recover end devices attached to disabled phys
[SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
[SCSI] libsas: set attached device type and target protocols for local phys
...
If userspace has decided to disable a phy the kernel should honor that
and not inadvertantly re-enable the phy via error recovery. This is
more straightforward in the sata case where link recovery (via
libata-eh) is separate from sas_task cancelling in libsas-eh. Teach
libsas to accept -ENODEV as a successful response from I_T_nexus_reset
('successful' in terms of not escalating further).
This is a more comprehensive fix then "libsas: don't recover 'gone'
devices in sas_ata_hard_reset()", as it is no longer sata-specific.
aic94xx does check the return value from sas_phy_reset() so if the phy
is disabled we proceed with clearing the I_T_nexus.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If discovery returns 0 for target_port_protocols but shows an attached
sata device, just report SAS_PROTOCOL_SATA in the identify data so
userspace can reliably search for sata devices in the domain.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Before:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
none
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
none
After:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
end device
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
sata
Also downgrade the phy_list_lock to _irq instead of _irqsave since
libsas will never call sas_get_port_device with interrupts disbled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
libata issues follow up srsts when the controller has a hard time
recording the signature-fis after a reset, or if the link supports port
multipliers. libsas does not support port multipliers and no current
libsas lldds appear to need help retrieving the signature fis. Revert
it for now to remove confusion.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Until all sas_tasks are known to no longer be in-flight this flag gates late
completions from colliding with error handling. However, it must be cleared
prior to the submission of scsi_send_eh_cmnd() requests, otherwise those
commands will never be completed correctly.
This was spotted by slub debug:
=============================================================================
BUG sas_task: Objects remaining on kmem_cache_close()
-----------------------------------------------------------------------------
INFO: Slab 0xffffea001f0eba00 objects=34 used=1 fp=0xffff8807c3aecb00 flags=0x8000000000004080
Pid: 22919, comm: modprobe Not tainted 3.2.0-isci+ #2
Call Trace:
[<ffffffff810fcdcd>] slab_err+0xb0/0xd2
[<ffffffff810e1c50>] ? free_percpu+0x31/0x117
[<ffffffff81100122>] ? kzalloc+0x14/0x16
[<ffffffff81100122>] ? kzalloc+0x14/0x16
[<ffffffff81100486>] kmem_cache_destroy+0x11d/0x270
[<ffffffffa0112bdc>] sas_class_exit+0x10/0x12 [libsas]
[<ffffffff81078fba>] sys_delete_module+0x1c4/0x23c
[<ffffffff814797ba>] ? sysret_check+0x2e/0x69
[<ffffffff8126479e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81479782>] system_call_fastpath+0x16/0x1b
INFO: Object 0xffff8807c3aed280 @offset=21120
INFO: Allocated in sas_alloc_task+0x22/0x90 [libsas] age=4615311 cpu=2 pid=12966
__slab_alloc.clone.3+0x1d1/0x234
kmem_cache_alloc+0x52/0x10d
sas_alloc_task+0x22/0x90 [libsas]
sas_queuecommand+0x20e/0x230 [libsas]
scsi_send_eh_cmnd+0xd1/0x30c
scsi_eh_try_stu+0x4f/0x6b
scsi_eh_ready_devs+0xba/0x6ef
sas_scsi_recover_host+0xa35/0xab1 [libsas]
scsi_error_handler+0x14b/0x5fa
kthread+0x9d/0xa5
kernel_thread_helper+0x4/0x10
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
libsas ata error handling is already async but this does not help the
scan case. Move initial link recovery out from under host->scan_mutex,
and delay synchronization with eh until after all port probe/recovery
work has been queued.
Device ordering is maintained with scan order by still calling
sas_rphy_add() in order of domain discovery.
Since we now scan the domain list when invoking libata-eh we need to be
careful to check for fully initialized ata ports.
Acked-by: Jack Wang <jack_wang@usish.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
ata devices are always scanned after ssp. Prior to the ata error
handling reworks libsas would tend to scan devices in ascending expander
phy order. Restore this ordering by deferring ssp discovery to a
DISCE_PROBE event, and keep the probe order consistent with the
discovery order, not the placement of sata devices.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If the phy is attached to a new sas address unregister the first address
before processing the new attachment.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
libsas fails to discover all sata devices in the domain. If a device fails
negotiation and does not transmit a signature fis the link needs recovery.
libata already understands how to manage slow to come up links, so treat these
conditions as ata device attach events for the purposes of creating an
ata_port. This allows libata to manage retrying link bring up.
Rediscovery is modified to be careful about checking changes in dev_type. It
looks like libsas leaks old devices if the sas address changes, but that's a
fix for another patch.
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Make sas-port naming consistent with the expander-attached case whereby
the phy-id is the last digit in the port name. Otherwise we get the
random behavior of the allocation order.
Reported-by: Patrick Thomson <patrick.s.thomson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
It's difficult to determine which domain_device is triggering error recovery,
so convert messages like:
sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028
sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029
...
ata7: sas eh calling libata port error handler
ata8: sas eh calling libata port error handler
...into:
sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp)
sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp)
...
sas: ata7: end_device-21:1: dev error handler
sas: ata8: end_device-20:0:5: dev error handler
which shows attached link rate, device type, and associates a
domain_device with its ata_port id to correlate messages emitted from
libata-eh.
As Doug notes, we can also take the opportunity to clarify expander phy
routing capabilities.
[dgilbert@interlog.com: clarify table2table with 'U']
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Holdover from a patch rework, prior to the addition of SAS_DEV_DESTROY
we were holding a reference while the destruct was pending in case the
domain was torn down before the desctruct event ran. That case is
covered by SAS_DEV_DESTROY, and the sas_put_device() just corrupts freed
memory, or worse frees the memory while another agent holds a reference.
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If we have a domain with sas and sata devices there may still be sas
recovery actions to take after peeling off the commands to send to
libata.
Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>