Commit Graph

78 Commits

Author SHA1 Message Date
Eric W. Biederman 78b7611c4a msi: mask the msix vector before we unmap it
With these two lines in the reverse order the drives/block/ccis.c was
oopsing in msi_free_irqs.  Silly us calling writel on an area after
we unmap it.

BUG: unable to handle kernel paging request at virtual address f8b2200c
 printing eip:
c01e9cc7
*pdpt = 0000000000003001
*pde = 0000000037e48067
*pte = 0000000000000000
Oops: 0002 [#1]
SMP
Modules linked in: cciss ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core
sunrpc loop dm_multipath button battery asus_acpi ac tg3 floppy sg dm_snapshot
dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata mptsas scsi_transport_sas
mptspi scsi_transport_spi mptscsih mptbase sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c01e9cc7>]    Not tainted VLI
EFLAGS: 00010286   (2.6.22-rc2-gd2579053 #1)
EIP is at msi_free_irqs+0x81/0xbe
eax: f8b22000   ebx: f71f3180   ecx: f7fff280   edx: c1886eb8
esi: f7c4e800   edi: f7c4ec48   ebp: 00000002   esp: f5a0dec8
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process rmmod (pid: 5286, ti=f5a0d000 task=c47d2550 task.ti=f5a0d000)
Stack: 00000002 f8b72294 00000400 f8b69ca7 f8b6bc6c 00000002 00000000 00000000
       00000000 00000000 00000000 f5a997f4 f8b69d61 f7c5a4b0 f7c4e848 f7c4e848
       f7c4e800 f7c4e800 f8b72294 f7c4e848 f8b72294 c01e3cdf f7c4e848 c024c469
Call Trace:
 [<f8b69ca7>] cciss_shutdown+0xae/0xc3 [cciss]
 [<f8b69d61>] cciss_remove_one+0xa5/0x178 [cciss]
 [<c01e3cdf>] pci_device_remove+0x16/0x35
 [<c024c469>] __device_release_driver+0x71/0x8e
 [<c024c56e>] driver_detach+0xa0/0xde
 [<c024bc5c>] bus_remove_driver+0x27/0x41
 [<c01e3ef3>] pci_unregister_driver+0xb/0x13
 [<f8b6a343>] cciss_cleanup+0xf/0x51 [cciss]
 [<c0139ced>] sys_delete_module+0x110/0x135
 [<c0104c7a>] sysenter_past_esp+0x5f/0x85

Here's a patch that just reverses the 2 lines of code as Eric suggests. Please
consider this for inclusion.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Chase Maupin <chase.maupin@hp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:27 -07:00
Eric W. Biederman 0dd11f9be4 msi: fix the ordering of msix irqs
"Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net> writes:

Found what seems the problem with our vectors being listed backward.  In
drivers/pci/msi.c we should be using list_add_tail rather than list_add to
preserve the ordering across various kernels.  Please consider this for
inclusion.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Screwed-up-by: Michael Ellerman <michael@ellerman.id.au>
Cc: "Mike Miller (OS Dev)" <mikem@beardog.cca.cpqcorp.net>
Cc: Andi Kleen <ak@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-01 08:18:27 -07:00
Dan Williams 4fdadebc31 msi: fix ARM compile
In file included from drivers/pci/msi.c:22:
include/asm/smp.h:17:26: asm/arch/smp.h: No such file or directory
include/asm/smp.h:20:3: #error "<asm-arm/smp.h> included in non-SMP build"
include/asm/smp.h:23:1: warning: "raw_smp_processor_id" redefined
In file included from include/linux/sched.h:65,
                 from include/linux/mm.h:4,
                 from drivers/pci/msi.c:10:
include/linux/smp.h:85:1: warning: this is the location of the previous
definition

Tested on powerpc, i386, and x86_64.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-31 16:56:36 -07:00
David Miller b3b7cc7b41 Fix assertion failure with MSI on sparc64
Today's find is a triggered assertion in msi_free_irqs() when the system
doesn't support MSI, in which case arch_setup_msi_irqs() always returns
an error.

The problem is that when this happens we branch into msi_free_irqs(), to
which you added the following assertion loop:

	list_for_each_entry(entry, &dev->msi_list, list)
		BUG_ON(irq_has_action(entry->irq));

Well, if arch_setup_msi_irqs() fails, entry->irq will be zero and
although that's never assigned to any normal devices we use that IRQ
number for the timer interrupt on sparc64 so this assertion triggers.

Better to test for zero before doing the irq_has_action() assertion
thing.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-11 16:01:18 -07:00
Randy Dunlap e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Michael Ellerman 032de8e2fe MSI: Give archs the option to free all MSI/Xs at once.
This patch introduces an optional function, arch_teardown_msi_irqs(),
which gives an arch the opportunity to do per-device teardown for
MSI/X. If that's not required, the default version simply calls
arch_teardown_msi_irq() for each msi irq required.

arch_teardown_msi_irqs() is simply passed a pdev, attached to the pdev
is a list of msi_descs, it is up to the arch to free the irq associated
with each of these as appropriate.

For archs that _don't_ implement arch_teardown_msi_irqs(), all msi_descs
with irq == 0 are considered unallocated, and the arch teardown routine
is not called on them.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 9c8313343c MSI: Give archs the option to allocate all MSI/Xs at once.
This patch introduces an optional function, arch_setup_msi_irqs(),
(note the plural) which gives an arch the opportunity to do per-device
setup for MSI/X and then allocate all the requested MSI/Xs at once.

If that's not required by the arch, the default version simply calls
arch_setup_msi_irq() for each MSI irq required.

arch_setup_msi_irqs() is passed a pdev, attached to the pdev is a list
of msi_descs with irq == 0, it is up to the arch to connect these up to
an irq (via set_irq_msi()) or return an error. For convenience the number
of vectors and the type are passed also.

All msi_descs with irq != 0 are considered allocated, and the arch
teardown routine will be called on them when necessary.

The existing semantics of pci_enable_msix() are that if the requested
number of irqs can not be allocated, the maximum number that _could_ be
allocated is returned. To support that, we define that in case of an
error from arch_setup_msi_irqs(), the number of msi_descs with irq != 0
are considered allocated, and are counted toward the "max that could be
allocated".


Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 7fe3730de7 MSI: arch must connect the irq and the msi_desc
set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
it at some point in their setup routine, and then the generic code sets up the
reverse mapping from the msi_desc back to the irq.

set_irq_msi() should do both connections, making it the one and only call
required to connect an irq with it's MSI desc and vice versa.

The arch code MUST call set_irq_msi(), and it must do so only once it's sure
it's not going to fail the irq allocation.

Given that there's no need for the arch to return the irq anymore, the return
value from the arch setup routine just becomes 0 for success and anything else
for failure.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:38 -07:00
Michael Ellerman 314e77b3ee MSI: Remove dev->first_msi_irq
Now that we keep a list of msi descriptors, we don't need first_msi_irq
in the pci dev.

If we somehow have zero MSIs configured list_entry() will give us weird
oopes or nice memory corruption bugs. So be paranoid. Add BUG_ONs and also
a check in pci_msi_check_device() to make sure nvec > 0.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman 4aa9bc955d MSI: Use a list instead of the custom link structure
The msi descriptors are linked together with what looks a lot like
a linked list, but isn't a struct list_head list. Make it one.

The only complication is that previously we walked a list of irqs, and
got the descriptor for each with get_irq_msi(). Now we have a list of
descriptors and need to get the irq out of it, so it needs to be in the
actual struct msi_desc. We use 0 to indicate no irq is setup.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman c9953a73e9 MSI: Add an arch_msi_check_device()
Add an arch_check_device(), which gives archs a chance to check the input
to pci_enable_msi/x. The arch might be interested in the value of nvec so
pass it in. Propagate the error value returned from the arch routine out
to the caller.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:37 -07:00
Michael Ellerman 17bbc12acd MSI: Rename pci_msi_supported() to pci_msi_check_device()
As pointed out by Eric, the name pci_msi_supported() suggests it should
return a boolean value, however it doesn't. So update the name to be
a bit less confusing and update the doco too.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman 128bc5fced MSI: Consolidate precondition checks
Consolidate precondition checks into a single if statement.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman b1e2303dba MSI: Expand pci_msi_supported()
pci_enable_msi() and pci_enable_msix() both search for the MSI/MSI-X
capability, we can fold this into pci_msi_supported() by passing the
type in.

Update the code to match the comment for pci_msi_supported(). That is
it returns 0 on success, and anything else indicates an error.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman 3e916c0503 MSI: Remove msi_cache
We don't need a special cache just for msi descriptors. They're not
particularly large, under 100 bytes for sure, and don't seem to require any
special alignment etc. On most systems there will be relatively few MSIs,
and hence we waste most of a page on the cache. Better to just kzalloc the
space for the few we do need.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman 4cc086fa5b MSI: Move EXPORT_SYMBOL()s near their definition
Move EXPORT_SYMBOL()s near their definition.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman 7ede9c1fa5 MSI: Consolidate BUG_ON()s.
When freeing MSIs and MSI-Xs, we BUG_ON() if the irq has not been
freed, ie. if it still has an action. We can consolidate all of these
BUG_ON()s into msi_free_irqs() as all the code paths lead there almost
immediately anyway.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman fc4afc7b2b MSI: Consolidate MSI-X irq freeing code
For the MSI-X case we do exactly the same logic in pci_disable_msix() and
msi_remove_pci_irq_vectors(), so consolidate them.

msi_remove_pci_irq_vectors() wasn't setting dev->first_msi_irq to 0, but
I think it should have been, so the consolidated version does.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman 00ba16ab26 MSI: Simplify BUG() handling in msi_remove_pci_irq_vectors() part 2
Although it might be nice to do a printk before BUG'ing, it's really not
necessary, and it complicates the code.

The behaviour has changed slightly, in that before we set a flag if the irq
had an action, and continued freeing the other irqs. But as I see it that's
all irrelevant because we end up BUG'ing anyway.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman c31af39870 MSI: Simplify BUG() handling in msi_remove_pci_irq_vectors() part 1
Although it might be nice to do a printk before BUG'ing, it's really not
necessary, and it complicates the code.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman 54bc6c0b0e MSI: Simplify BUG() handling in pci_disable_msix()
Although it might be nice to do a printk before BUG'ing, it's really not
necessary, and it complicates the code.

The behaviour has changed slightly, in that before we set a flag if the irq
had an action, and continued freeing the other irqs. But as I see it that's
all irrelevant because we end up BUG'ing anyway.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:36 -07:00
Michael Ellerman e387b9eefe MSI: Simplify BUG() handling in pci_disable_msi()
Although it might be nice to do a printk before BUG'ing, it's really not
necessary, and it complicates the code.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:35 -07:00
Mitch Williams 988cbb15e0 PCI: Flush MSI-X table writes
This patch fixes a kernel bug which is triggered when using the
irqbalance daemon with MSI-X hardware.

Because both MSI-X interrupt messages and MSI-X table writes are posted,
it's possible for them to cross while in-flight.  This results in
interrupts being received long after the kernel thinks they're disabled,
and in interrupts being sent to stale vectors after rebalancing.

This patch performs a read flush after writes to the MSI-X table for
mask and unmask operations.  Since the SMP affinity is set while
the interrupt is masked, and since it's unmasked immediately after,
no additional flushes are required in the various affinity setting
routines.

This patch has been validated with (unreleased) network hardware which
uses MSI-X.

Revised with input from Eric Biederman.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02 19:02:34 -07:00
Eric W. Biederman 348e3fd194 [PATCH] msi: synchronously mask and unmask msi-x irqs.
This is a simplified and actually more comprehensive form of a bug
fix from Mitch Williams <mitch.a.williams@intel.com>.

When we mask or unmask a msi-x irqs the writes may be posted because
we are writing to memory mapped region.  This means the mask and
unmask don't happen immediately but at some unspecified time in the
future.  Which is out of sync with how the mask/unmask logic work
for ioapic irqs.

The practical result is that we get very subtle and hard to track down
irq migration bugs.

This patch performs a read flush after writes to the MSI-X table for mask
and unmask operations.  Since the SMP affinity is set while the interrupt
is masked, and since it's unmasked immediately after, no additional flushes
are required in the various affinity setting routines.

The testing by Mitch Williams on his especially problematic system should
still be valid as I have only simplified the code, not changed the
functionality.

We currently have 7 drivers: cciss, mthca, cxgb3, forceth, s2io,
pcie/portdrv_core, and qla2xxx in 2.6.21 that are affected by this
problem when the hardware they driver is plugged into the right slot.

Given the difficulty of reproducing this bug and tracing it down to
anything that even remotely resembles a cause, even if people are
being affected we aren't likely to see many meaningful bug reports, and
the people who see this bug aren't likely to be able to reproduce this
bug in a timely fashion.  So it is best to get this problem fixed
as soon as we can so people don't have problems.

Then if people do have a kernel message stating "No irq for vector" we
will know it is yet another novel cause that needs a complete new
investigation.

Cc: Greg KH <greg@kroah.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-03 14:02:49 -07:00
Eric W. Biederman 392ee1e6dd [PATCH] msi: Safer state caching.
There are two ways pci_save_state and pci_restore_state are used.  As
helper functions during suspend/resume, and as helper functions around
a hardware reset event.  When used as helper functions around a hardware
reset event there is no reason to believe the calls will be paired, nor
is there a good reason to believe that if we restore the msi state from
before the reset that it will match the current msi state.  Since arch
code may change the msi message without going through the driver, drivers
currently do not have enough information to even know when to call
pci_save_state to ensure they will have msi state in sync with the other
kernel irq reception data structures.

It turns out the solution is straight forward, cache the state in the
existing msi data structures (not the magic pci saved things) and
have the msi code update the cached state each time we write to the hardware.
This means we never need to read the hardware to figure out what the hardware
state should be.

By modifying the caching in this manner we get to remove our save_state
routines and only need to provide restore_state routines.

The only fields that were at all tricky to regenerate were the msi and msi-x
control registers and the way we regenerate them currently is a bit dependent
upon assumptions on how we use the allow msi registers to be configured and used
making the code a little bit brittle.  If we ever change what cases we allow
or how we configure the msi bits we can address the fragility then.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-12 16:31:50 -07:00