Kernel bugzilla 10273
As reported by Jos van der Ende, ever since commit
5a606b72a4 ("[SPARC64]: Do not ACK an
INO if it is disabled or inprogress.") sun4u interrupts
can get stuck.
What this changeset did was add the following conditional to
the various IRQ chip ->enable() handlers on sparc64:
if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS)))
return;
which is correct; however, it means that special care is needed
in the ->enable() method.
Specifically we must put the interrupt into IDLE state during
an enable, or else it might never be sent out again.
Setting the INO interrupt state to IDLE resets the state machine,
the interrupt input to the INO is retested by the hardware, and
if an interrupt is being signalled by the device, the INO
moves back into TRANSMIT state, and an interrupt vector is sent
to the cpu.
The two sun4v IRQ chip handlers were already doing this properly,
only sun4u got it wrong.
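A minimal sketch of the sun4u side of the fix (the register
helpers upa_readq()/upa_writeq(), ICLR_IDLE, and the
irq_handler_data layout are assumed from the sparc64 code of that
era, not copied from the actual patch):

static void sun4u_irq_enable_sketch(unsigned int virt_irq)
{
	struct irq_handler_data *data = get_irq_chip_data(virt_irq);

	if (likely(data)) {
		unsigned long val;

		/* Re-route the interrupt and mark the IMAP valid. */
		val = upa_readq(data->imap);
		val |= IMAP_VALID;
		upa_writeq(val, data->imap);

		/* The crucial part: force the INO back to IDLE so the
		 * hardware retests the device's interrupt line and can
		 * move back to TRANSMIT.  Without this, an interrupt
		 * that was left un-ACK'd while disabled would never be
		 * retransmitted.
		 */
		upa_writeq(ICLR_IDLE, data->iclr);
	}
}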
Signed-off-by: David S. Miller <davem@davemloft.net>
Invoke desc->handle_irq() directly in the top-level dispatch,
just like other sophisticated ports do.
This will allow us to decrease the cost of the MSI queue dispatch.
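A hedged sketch of the resulting dispatch loop (the bucket chain
walk is simplified; irq_desc[] and the two-argument handle_irq()
are the generic IRQ layer interfaces of that kernel):

void handler_irq_sketch(struct ino_bucket *bp)
{
	while (bp) {
		struct ino_bucket *next = bp->irq_chain;
		unsigned int virt_irq = bp->virt_irq;
		struct irq_desc *desc = irq_desc + virt_irq;

		bp->irq_chain = NULL;

		/* Call the flow handler directly rather than going
		 * through the heavier __do_IRQ() path.
		 */
		desc->handle_irq(virt_irq, desc);

		bp = next;
	}
}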
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not use *alloc_bootmem_low*(), because ARCH_LOW_ADDRESS_LIMIT
is 4GB and this results in boot failures if all of the physical
memory in the machine is above 4GB.
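For illustration, a hedged sketch of the allocation (the helper
name and error handling are made up; the point is only the choice
of bootmem interface):

static void __init alloc_one_queue_sketch(unsigned long *pa_ptr,
					  unsigned long qsize)
{
	/* alloc_bootmem_low_pages() constrains the allocation to be
	 * below ARCH_LOW_ADDRESS_LIMIT (4GB), which fails outright
	 * when all physical memory sits above 4GB.  The plain bootmem
	 * allocator has no such limit.
	 */
	void *page = alloc_bootmem_pages(qsize);

	if (!page)
		prom_halt();

	*pa_ptr = __pa(page);
}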
Signed-off-by: David S. Miller <davem@davemloft.net>
It no longer translates to "real irqs" (aka. INO buckets)
so reflect that by using a simpler name for it.
Signed-off-by: David S. Miller <davem@davemloft.net>
All the users go through virt_irq_to_bucket() and essentially
want to go from a virt_irq to an INO, but we have a way
to do that already via virt_to_real_irq_table[].dev_ino.
This also allows us to kill both virt_to_real_irq() and
virt_irq_to_bucket().
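A hedged sketch of the resulting lookup (the table entry layout
is assumed from the field named above):

struct virt_irq_entry_sketch {
	unsigned int	dev_handle;
	unsigned int	dev_ino;
};

static struct virt_irq_entry_sketch virt_to_real_irq_table[NR_IRQS];

static unsigned int virt_irq_to_ino_sketch(unsigned int virt_irq)
{
	/* Go straight from the virtual IRQ to its INO, with no detour
	 * through virt_to_real_irq() or virt_irq_to_bucket().
	 */
	return virt_to_real_irq_table[virt_irq].dev_ino;
}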
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a place to stick INO information in the
virt_to_real_irq_table[], which is currently only used for VIRQs.
And that is readily accessible from the one __irq_ino() call site.
Signed-off-by: David S. Miller <davem@davemloft.net>
We were simply concatenating the devhandle and devino and using that
as the cookie, which defeats the entire purpose of the VIRQ hypervisor
interfaces.
Now that we use physical addresses for the INO buckets, we can
allocate them dynamically for VIRQs and encode the cookies as
~__pa(bucket). This allows us to test for and decode the cookie with
a simple:
brlz $reg1, 1f
xnor $reg1, %g0, $reg2
sequence.
This works because the top bit (bit 63) is never set in traditional
INO vectors, and it is also never set in a physical
address. So xnor'ing the physical address of the bucket
always gives us a negative number, and thus a unique
condition we can test cheaply.
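Sketched in C (the helper names are illustrative; the assembly
above is the hot-path form of the same decode):

/* Encode: the cookie registered with the hypervisor is the bitwise
 * complement of the bucket's physical address.
 */
static unsigned long bucket_to_cookie_sketch(struct ino_bucket *bucket)
{
	return ~__pa(bucket);
}

/* Decode: viewed as a signed 64-bit value a VIRQ cookie is always
 * negative, because ~__pa() always has bit 63 set, while a
 * traditional INO vector number never does.
 */
static struct ino_bucket *cookie_to_bucket_sketch(unsigned long cookie)
{
	if ((long) cookie < 0)
		return (struct ino_bucket *) __va(~cookie);

	return NULL;	/* not a VIRQ cookie, just a plain INO */
}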
Inspired by ideas from Greg Onufer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we chain IVEC entries using 32-bit "pointers"
because we know that the ivector_table is in the main
kernel image, thus below 4GB.
This uses proper 64-bit pointers instead.
While this bloats the kernel image size, it lays the
infrastructure necessary to significantly shrink the
kernel size later by using physical addresses and
dynamically allocating the ivector table.
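A hedged before/after sketch of the bucket link (field names are
assumed; only the width of the chain link is the point):

struct ino_bucket_old_sketch {
	/* 32-bit "pointer": only valid while ivector_table lives in
	 * the kernel image, i.e. below 4GB.
	 */
	unsigned int			__irq_chain;
	unsigned int			virt_irq;
};

struct ino_bucket_new_sketch {
	/* A real 64-bit pointer, paving the way for a dynamically
	 * allocated, physically addressed table.
	 */
	struct ino_bucket_new_sketch	*irq_chain;
	unsigned int			virt_irq;
};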
Signed-off-by: David S. Miller <davem@davemloft.net>
This also makes us use the MSI queues correctly.
Each MSI queue is serviced by a normal sun4u/sun4v INO interrupt
handler. This handler runs the MSI queue and dispatches the
virtual interrupts indicated by arriving MSIs in that MSI queue.
All of the common logic is placed in pci_msi.c, with callbacks to
handle the PCI controller specific aspects of the operations.
This common infrastructure will make it much easier to add MSG
support.
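A hedged sketch of the shared queue-servicing path (struct
msi_queue, its ops callbacks, and the msi_to_virt_irq mapping are
illustrative stand-ins for the controller callbacks mentioned
above):

static irqreturn_t msiq_interrupt_sketch(int irq, void *cookie)
{
	struct msi_queue *q = cookie;
	unsigned long head = q->ops->get_head(q);

	/* Drain every MSI that landed in this queue since the INO
	 * interrupt fired, dispatching each one's virtual IRQ.
	 */
	while (head != q->ops->get_tail(q)) {
		unsigned int msi = q->ops->read_entry(q, head);

		generic_handle_irq(q->msi_to_virt_irq[msi]);
		head = q->ops->next_entry(q, head);
	}

	q->ops->set_head(q, head);

	return IRQ_HANDLED;
}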
Signed-off-by: David S. Miller <davem@davemloft.net>
The support code is identical to the hypervisor sun4v stuff,
just replacing the hypervisor calls with register reads and
writes in the Fire controller.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) sun4{u,v}_build_msi() have improper return value handling.
We should always return negative error codes, instead of
using the magic value "0" which could in fact be a valid
MSI number.
2) sun4{u,v}_build_msi() should return -ENOMEM instead of
   calling prom_halt() when kzalloc() of the interrupt
   data fails.
3) We 'remembered' the MSI number using a singleton in the
   struct device archdata area, but this doesn't work for MSI-X,
   which can have multiple MSIs associated with one device.
   Delete that archdata member, and instead store the MSI
   number in the IRQ chip data area (see the sketch below).
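A hedged sketch of the fixed allocation path (alloc_msi_number(),
free_msi_number(), virt_irq_alloc_sketch(), and struct
msi_chip_data are illustrative; the comments mark the three fixes
above):

static int build_msi_sketch(unsigned int *virt_irq_p)
{
	struct msi_chip_data *data;
	int msi_num;

	msi_num = alloc_msi_number();
	if (msi_num < 0)
		return msi_num;	/* 1) negative errno, never "0 means
				 *    error", since 0 is a valid MSI */

	data = kzalloc(sizeof(*data), GFP_ATOMIC);
	if (!data) {
		free_msi_number(msi_num);
		return -ENOMEM;	/* 2) report the failure, don't
				 *    prom_halt() */
	}

	data->msi_num = msi_num;
	*virt_irq_p = virt_irq_alloc_sketch();

	/* 3) Remember the MSI number per virtual IRQ via the chip
	 * data, so every MSI-X vector of a device gets its own slot
	 * instead of one singleton in dev->archdata.
	 */
	set_irq_chip_data(*virt_irq_p, data);

	return 0;
}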
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes we were using 32-bit values and the top bits were
getting inadvertently chopped off. This will matter for the
forthcoming Fire controller MSI support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Every time a cpu is added via hotplug, we allocate the per-cpu MONDO
queues but we never free them up. Freeing isn't easy since the first
cpu gets this memory from bootmem.
Therefore, the simplest thing to do to fix this bug is to allocate the
queues for all possible cpus at boot time.
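A hedged sketch of the boot-time loop (alloc_one_queue_set()
stands in for whatever allocates a single cpu's MONDO queues):

static void __init init_all_mondo_queues_sketch(void)
{
	int cpu;

	/* Allocate the per-cpu MONDO queues for every possible cpu at
	 * boot, so cpu hotplug never has to allocate or free them.
	 */
	for_each_possible_cpu(cpu)
		alloc_one_queue_set(cpu);
}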
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_handle and dev_ino fields don't match up exactly to
the traditional IMAP_IGN and IMAP_INO masks.
So store them away in a table and look them up directly.
Signed-off-by: David S. Miller <davem@davemloft.net>
dr-cpu unconfigure requests will walk through the enabled
IRQs and trigger ->set_affinity so that the going-down
cpu no longer has INOs targeted to it.
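A hedged sketch of that walk (irq_desc[] and the cpumask-by-value
->set_affinity signature reflect the generic IRQ layer of that
kernel; picking a still-online cpu happens inside the sparc64
->set_affinity handler):

static void fixup_irqs_sketch(void)
{
	unsigned int irq;

	for (irq = 0; irq < NR_IRQS; irq++) {
		struct irq_desc *desc = irq_desc + irq;

		/* Retarget every in-use, non-per-cpu interrupt; the
		 * chip's ->set_affinity picks an online cpu, moving
		 * INOs off the departing one.
		 */
		if (desc->action && !(desc->status & IRQ_PER_CPU) &&
		    desc->chip->set_affinity)
			desc->chip->set_affinity(irq, desc->affinity);
	}
}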
Signed-off-by: David S. Miller <davem@davemloft.net>
This is also a partial workaround for a bug in the LDOM firmware which
double-transmits RX INOs during high load. Without this, such an
event causes the kernel to loop forever in the interrupt call chain
ACK'ing but never actually running the IRQ handler (and thus clearing
the interrupt condition in the device).
There is still a bad potential effect when double INOs occur,
not covered by this changeset. Namely, if the INO is already on
the per-cpu INO vector list, we still blindly re-insert it and
thus we can end up losing interrupts already linked in after
it.
We could deal with that by traversing the list before insertion,
but that's too expensive for this edge case.
Signed-off-by: David S. Miller <davem@davemloft.net>