When userspace provides the doorbell address for an MSI to be
injected into the guest, we find a KVM device which feels responsible.
Lets check that this device is really an emulated ITS before we make
real use of the container_of-ed pointer.
[ Moved NULL-pointer check to caller of static function
- Christoffer ]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently we register an ITS device upon userland issuing the CTLR_INIT
ioctl to mark initialization of the ITS as done.
This deviates from the initialization sequence of the existing GIC
devices and does not play well with the way QEMU handles things.
To be more in line with what we are used to, register the ITS(es) just
before the first VCPU is about to run, so in the map_resources() call.
This involves iterating through the list of KVM devices and map each
ITS that we find.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
There are two problems with the current implementation of the MMIO
handlers for the propbaser and pendbaser:
First, the write to the value itself is not guaranteed to be an atomic
64-bit write so two concurrent writes to the structure field could be
intermixed.
Second, because we do a read-modify-update operation without any
synchronization, if we have two 32-bit accesses to separate parts of the
register, we can loose one of them.
By using the atomic cmpxchg64 we should cover both issues above.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Right now the following sequence of events can happen:
1. Thread X calls vgic_put_irq
2. Thread Y calls vgic_add_lpi
3. Thread Y gets lpi_list_lock
4. Thread X drops the ref count to 0 and blocks on lpi_list_lock
5. Thread Y finds the irq via the lpi_list_lock, raises the ref
count to 1, and release the lpi_list_lock.
6. Thread X proceeds and frees the irq.
Avoid this by holding the spinlock around the kref_put.
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
During low memory conditions, we could be dereferencing a NULL pointer
when vgic_add_lpi fails to allocate memory.
Consider for example this call sequence:
vgic_its_cmd_handle_mapi
itte->irq = vgic_add_lpi(kvm, lpi_nr);
update_lpi_config(kvm, itte->irq, NULL);
ret = kvm_read_guest(kvm, propbase + irq->intid
^^^^
kaboom?
Instead, return an error pointer from vgic_add_lpi and check the return
value from its single caller.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
According to the KVM API documentation a successful MSI injection
should return a value > 0 on success.
Return possible errors in vgic_its_trigger_msi() and report a
successful injection back to userland, while also reporting the
case where the MSI could not be delivered due to the guest not
having the LPI mapped, for instance.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
KVM/ARM Changes for v4.8 - Take 2
Includes GSI routing support to go along with the new VGIC and a small fix that
has been cooking in -next for a while.
Pull KVM updates from Paolo Bonzini:
- ARM: GICv3 ITS emulation and various fixes. Removal of the
old VGIC implementation.
- s390: support for trapping software breakpoints, nested
virtualization (vSIE), the STHYI opcode, initial extensions
for CPU model support.
- MIPS: support for MIPS64 hosts (32-bit guests only) and lots
of cleanups, preliminary to this and the upcoming support for
hardware virtualization extensions.
- x86: support for execute-only mappings in nested EPT; reduced
vmexit latency for TSC deadline timer (by about 30%) on Intel
hosts; support for more than 255 vCPUs.
- PPC: bugfixes.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
KVM: PPC: Introduce KVM_CAP_PPC_HTM
MIPS: Select HAVE_KVM for MIPS64_R{2,6}
MIPS: KVM: Reset CP0_PageMask during host TLB flush
MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
MIPS: KVM: Sign extend MFC0/RDHWR results
MIPS: KVM: Fix 64-bit big endian dynamic translation
MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
MIPS: KVM: Use 64-bit CP0_EBase when appropriate
MIPS: KVM: Set CP0_Status.KX on MIPS64
MIPS: KVM: Make entry code MIPS64 friendly
MIPS: KVM: Use kmap instead of CKSEG0ADDR()
MIPS: KVM: Use virt_to_phys() to get commpage PFN
MIPS: Fix definition of KSEGX() for 64-bit
KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
kvm: x86: nVMX: maintain internal copy of current VMCS
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
KVM: arm64: vgic-its: Simplify MAPI error handling
KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
...
Pull smp hotplug updates from Thomas Gleixner:
"This is the next part of the hotplug rework.
- Convert all notifiers with a priority assigned
- Convert all CPU_STARTING/DYING notifiers
The final removal of the STARTING/DYING infrastructure will happen
when the merge window closes.
Another 700 hundred line of unpenetrable maze gone :)"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
timers/core: Correct callback order during CPU hot plug
leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
powerpc/numa: Convert to hotplug state machine
arm/perf: Fix hotplug state machine conversion
irqchip/armada: Avoid unused function warnings
ARC/time: Convert to hotplug state machine
clocksource/atlas7: Convert to hotplug state machine
clocksource/armada-370-xp: Convert to hotplug state machine
clocksource/exynos_mct: Convert to hotplug state machine
clocksource/arm_global_timer: Convert to hotplug state machine
rcu: Convert rcutree to hotplug state machine
KVM/arm/arm64/vgic-new: Convert to hotplug state machine
smp/cfd: Convert core to hotplug state machine
x86/x2apic: Convert to CPU hotplug state machine
profile: Convert to hotplug state machine
timers/core: Convert to hotplug state machine
hrtimer: Convert to hotplug state machine
x86/tboot: Convert to hotplug state machine
arm64/armv8 deprecated: Convert to hotplug state machine
hwtracing/coresight-etm4x: Convert to hotplug state machine
...
kvm_set_routing_entry is changing in -next, and causes things to
explode. Add a temporary workaround that should be dropped when
we hit 4.8-rc1
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM/ARM changes for Linux 4.8
- GICv3 ITS emulation
- Simpler idmap management that fixes potential TLB conflicts
- Honor the kernel protection in HYP mode
- Removal of the old vgic implementation
Up to now, only irqchip routing entries could be set. This patch
adds the capability to insert MSI routing entries.
For ARM64, let's also increase KVM_MAX_IRQ_ROUTES to 4096: this
include SPI irqchip routes plus MSI routes. In the future this
might be extended.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch adds compilation and link against irqchip.
Main motivation behind using irqchip code is to enable MSI
routing code. In the future irqchip routing may also be useful
when targeting multiple irqchips.
Routing standard callbacks now are implemented in vgic-irqfd:
- kvm_set_routing_entry
- kvm_set_irq
- kvm_set_msi
They only are supported with new_vgic code.
Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.
So from now on IRQCHIP routing is enabled and a routing table entry
must exist for irqfd injection to succeed for a given SPI. This patch
builds a default flat irqchip routing table (gsi=irqchip.pin) covering
all the VGIC SPI indexes. This routing table is overwritten by the
first first user-space call to KVM_SET_GSI_ROUTING ioctl.
MSI routing setup is not yet allowed.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If we care to move all the checks that do not involve any memory
allocation, we can simplify the MAPI error handling. Let's do that,
it cannot hurt.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
vgic_its_cmd_handle_mapi has an extra "subcmd" argument, which is
already contained in the command buffer that all command handlers
obtain from the command queue. Let's drop it, as it is not that
useful.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
There is no need to have separate functions to validate devices
and collections, as the architecture doesn't really distinguish the
two, and they are supposed to be managed the same way.
Let's turn the DevID checker into a generic one.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Going from the ITS structure to the corresponding KVM structure
would be quite handy at times. The kvm_device pointer that is
passed at create time is quite convenient for this, so let's
keep a copy of it in the vgic_its structure.
This will be put to a good use in subsequent patches.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Instead of spreading random allocations all over the place,
consolidate allocation/init/freeing of collections in a pair
of constructor/destructor.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When checking that the storage address of a device entry is valid,
it is critical to compute the actual address of the entry, rather
than relying on the beginning of the page to match a CPU page of
the same size: for example, if the guest places the table at the
last 64kB boundary of RAM, but RAM size isn't a multiple of 64kB...
Fix this by computing the actual offset of the device ID in the
L2 page, and check the corresponding GFN.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Checking that the device_id fits if the table, and we must make
sure that the associated memory is also accessible.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The nr_entries variable in vgic_its_check_device_id actually
describe the size of the L1 table, and not the number of
entries in this table.
Rename it to l1_tbl_size, so that we can now change the code
with a better understanding of what is what.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The ITS tables are stored in LE format. If the host is reading
a L1 table entry to check its validity, it must convert it to
the CPU endianness.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The current code will fail on valid indirect tables, and happily
use the ones that are pointing out of the guest RAM. Funny what a
small "!" can do for you...
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Instead of sprinkling raw kref_get() calls everytime we cannot
do a normal vgic_get_irq(), use the existing vgic_get_irq_kref(),
which does the same thing and is paired with a vgic_put_irq().
vgic_get_irq_kref is moved to vgic.h in order to be easily shared.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>