This is necessary for qemu to be able to pass the right information
to the guest, such as the supported page sizes and corresponding
encodings in the SLB and hash table, which can vary depending
on the processor type, the type of KVM used (PR vs HV) and the
version of KVM
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix compilation on hv, adjust for newer ioctl numbers]
Signed-off-by: Alexander Graf <agraf@suse.de>
There is nothing in the code for emulating TCE tables in the kernel
that prevents it from working on "PR" KVM... other than ifdef's and
location of the code.
This and moves the bulk of the code there to a new file called
book3s_64_vio.c.
This speeds things up a bit on my G5.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[agraf: fix for hv kvm, 32bit, whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
Guest r8 register is held in the scratch register and stored correctly,
so remove the instruction that clobbers it. Guest r13 was missing from vcpu,
store it there.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
While handling an exit, we should listen for interrupts and make sure to
receive them when they arrive, to keep our latencies low.
Signed-off-by: Alexander Graf <agraf@suse.de>
When running on a system that is HV capable, some interrupts use HSRR
SPRs instead of the normal SRR SPRs. These are also used in the Linux
handlers to jump back to code after an interrupt got processed.
Unfortunately, in our "jump back to the real host handler after we've
done the context switch" code, we were only setting the SRR SPRs,
rendering Linux to jump back to some invalid IP after it's processed
the interrupt.
This fixes random crashes on p7 opal mode with PR KVM for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
Stbux writes the address it's operating on to the register specified in ra,
not into the data source register.
Signed-off-by: Alexander Graf <agraf@suse.de>
Interrupt code used PPC_LL/PPC_STL macros to load/store some of u32 fields
which led to memory overflow on 64-bit. Use lwz/stw instead.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
While messing around with the SLBs we're running in real mode. The
entry to guest space goes through rfid, which is context synchronizing,
so there's no need to manually synchronize anything through isync.
With this patch and a simple priviledged SPR access loop guest, I get
a speed bump from 2035607 to 2181301 exits per second.
Signed-off-by: Alexander Graf <agraf@suse.de>
By shuffling a few instructions around we can execute more memory
loads in parallel, giving us a small performance boost.
With this patch and a simple priviledged SPR access loop guest, I get
a speed bump from 2013052 to 2035607 exits per second.
Signed-off-by: Alexander Graf <agraf@suse.de>
For Guest accessible SPRGs 4-7, save/restore must be handled differently for 64bit and
non-64 bit case. Use the PPC_STD/PPC_LD macros for saving/restoring to/from these registers.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Introduced PPC_STD/PPC_LD macros for saving/restoring guest registers to/from their 64 bit copies.
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Time for which the hrtimer is started for decrementer emulation is calculated
using tb_ticks_per_usec. While hrtimer uses the clockevent for DEC
reprogramming (if needed) and which calculate timebase ticks using the
multiplier and shifter mechanism implemented within clockevent layer.
It was observed that this conversion (timebase->time->timebase) are not
correct because the mechanism are not consistent.
In our setup it adds 2% jitter.
With this patch clockevent multiplier and shifter mechanism are used when
starting hrtimer for decrementer emulation. Now the jitter is < 0.5%.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Keep track of minimum and maximum address mapped by tlb1.
This helps in TLBMISS handling in KVM to quick check whether the address lies in mapped range.
If address does not lies in this range then no need to look in each tlb1 entry of tlb1 array.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Let userspace know the number of max and supported cpus for kvm on s390.
Return KVM_MAX_VCPUS (currently 64) for both values.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Lets replace the old open coded version of diag 0x44 (which relied on
compat_sched_yield) with kvm_vcpu_on_spin.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch implements the directed yield hypercall found on other
System z hypervisors. It delegates execution time to the virtual cpu
specified in the instruction's parameter.
Useful to avoid long spinlock waits in the guest.
Christian Borntraeger: moved common code in virt/kvm/
Signed-off-by: Konstantin Weitz <WEITZKON@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We can't run PIT IRQ injection work in the interrupt context of the host
timer. This would allow the user to influence the handler complexity by
asking for a broadcast to a large number of VCPUs. Therefore, this work
was pushed into workqueue context in 9d244caf2e. However, this prevents
prioritizing the PIT injection over other task as workqueues share
kernel threads.
This replaces the workqueue with a kthread worker and gives that thread
a name in the format "kvm-pit/<owner-process-pid>". That allows to
identify and adjust the kthread priority according to the VM process
parameters.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This helps to identify sections and it also fixes the numbering from
4.54 to 4.61.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The patch introduces a bitmap that will hold reasons apic should be
checked during vmexit. This is in a preparation for vp eoi patch
that will add one more check on vmexit. With the bitmap we can do
if(apic_attention) to check everything simultaneously which will
add zero overhead on the fast path.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unhandy if the MSI messages are generated "on the fly" by user space,
IRQ routes are a limited resource that user space has to manage
carefully.
By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>