Commit Graph

288082 Commits

Author SHA1 Message Date
Scott Wood 29ac26efbd KVM: PPC: booke: Fix int_pending calculation for MSR[EE] paravirt
int_pending was only being lowered if a bit in pending_exceptions
was cleared during exception delivery -- but for interrupts, we clear
it during IACK/TSR emulation.  This caused paravirt for enabling
MSR[EE] to be ineffective.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood c59a6a3e4e KVM: PPC: booke: Check for MSR[WE] in prepare_to_enter
This prevents us from inappropriately blocking in a KVM_SET_REGS
ioctl -- the MSR[WE] will take effect when the guest is next entered.

It also causes SRR1[WE] to be set when we enter the guest's interrupt
handler, which is what e500 hardware is documented to do.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood 25051b5a5a KVM: PPC: Move prepare_to_enter call site into subarch code
This function should be called with interrupts disabled, to avoid
a race where an exception is delivered after we check, but the
resched kick is received before we disable interrupts (and thus doesn't
actually trigger the exit code that would recheck exceptions).

booke already does this properly in the lightweight exit case, but
not on initial entry.

For now, move the call of prepare_to_enter into subarch-specific code so
that booke can do the right thing here.  Ideally book3s would do the same
thing, but I'm having a hard time seeing where it does any interrupt
disabling of this sort (plus it has several additional call sites), so
I'm deferring the book3s fix to someone more familiar with that code.
book3s behavior should be unchanged by this patch.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:26 +02:00
Scott Wood 7e28e60ef9 KVM: PPC: Rename deliver_interrupts to prepare_to_enter
This function also updates paravirt int_pending, so rename it
to be more obvious that this is a collection of checks run prior
to (re)entering a guest.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Scott Wood 1d1ef22208 KVM: PPC: booke: check for signals in kvmppc_vcpu_run
Currently we check prior to returning from a lightweight exit,
but not prior to initial entry.

book3s already does a similar test.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Bharat Bhushan 7401f6266d KVM: PPC: booke: Do Not start decrementer when SPRN_DEC set 0
As per specification the decrementer interrupt not happen when DEC is written
with 0. Also when DEC is zero, no decrementer running. So we should not start
hrtimer for decrementer when DEC = 0.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Bharat Bhushan dc2babfea5 KVM: PPC: Fix DEC truncation for greater than 0xffff_ffff/1000
kvmppc_emulate_dec() uses dec_nsec of type unsigned long and does below calculation:

        dec_nsec = vcpu->arch.dec;
        dec_nsec *= 1000;
This will truncate if DEC value "vcpu->arch.dec" is greater than 0xffff_ffff/1000.
For example : For tb_ticks_per_usec = 4a, we can not set decrementer more than ~58ms.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Bharat Bhushan f9208427f7 PPC: Fix race in mtmsr paravirt implementation
The current implementation of mtmsr and mtmsrd are racy in that it does:

  * check (int_pending == 0)
  ---> host sets int_pending = 1 <---
  * write shared page
  * done

while instead we should check for int_pending after the shared page is written.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Alexander Graf 95325e6b19 KVM: PPC: E500: Support hugetlbfs
With hugetlbfs support emerging on e500, we should also support KVM
backing its guest memory by it.

This patch adds support for hugetlbfs into the e500 shadow mmu code.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood 841741f23b KVM: PPC: e500: Don't hardcode PIR=0
The hardcoded behavior prevents proper SMP support.

user space shall specify the vcpu's PIR as the vcpu id.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood 303b7c97e3 KVM: PPC: e500: tlbsx: fix tlb0 esel
It should contain the way, not the absolute TLB0 index.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood dc83b8bc02 KVM: PPC: e500: MMU API
This implements a shared-memory API for giving host userspace access to
the guest's TLB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood 0164c0f0c4 KVM: PPC: e500: clear up confusion between host and guest entries
Split out the portions of tlbe_priv that should be associated with host
entries into tlbe_ref.  Base victim selection on the number of hardware
entries, not guest entries.

For TLB1, where one guest entry can be mapped by multiple host entries,
we use the host tlbe_ref for tracking page references.  For the guest
TLB0 entries, we still track it with gtlb_priv, to avoid having to
retranslate if the entry is evicted from the host TLB but not the
guest TLB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Scott Wood 90b92a6f51 KVM: PPC: e500: Eliminate preempt_disable in local_sid_destroy_all
The only place it makes sense to call this function already needs
to have preemption disabled.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Scott Wood 3bf3cdcc14 KVM: PPC: e500: don't translate gfn to pfn with preemption disabled
Delay allocation of the shadow pid until we're ready to disable
preemption and write the entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Christian Borntraeger 59674c1a6a KVM: s390: provide access guest registers via kvm_run
This patch adds the access registers to the kvm_run structure.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:22 +02:00
Christian Borntraeger 5a32c1af56 KVM: s390: provide general purpose guest registers via kvm_run
This patch adds the general purpose registers to the kvm_run structure.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:22 +02:00
Christian Borntraeger 60b413c924 KVM: s390: provide the prefix register via kvm_run
Add the prefix register to the synced register field in kvm_run.
While we need the prefix register most of the time read-only, this
patch also adds handling for guest dirtying of the prefix register.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:22 +02:00
Christian Borntraeger b9e5dc8d45 KVM: provide synchronous registers in kvm_run
On some cpus the overhead for virtualization instructions is in the same
range as a system call. Having to call multiple ioctls to get set registers
will make certain userspace handled exits more expensive than necessary.
Lets provide a section in kvm_run that works as a shared save area
for guest registers.
We also provide two 64bit flags fields (architecture specific), that will
specify
1. which parts of these fields are valid.
2. which registers were modified by userspace

Each bit for these flag fields will define a group of registers (like
general purpose) or a single register.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:22 +02:00
Christian Borntraeger 8d26cf7b40 KVM: s390: rework code that sets the prefix
There are several places in the kvm module, which set the prefix register.
Since we need to flush the cpu, lets combine this operation into a helper
function. This helper will also explicitely mask out the unused bits.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:21 +02:00
Boris Ostrovsky 2b036c6b86 KVM: SVM: Add support for AMD's OSVW feature in guests
In some cases guests should not provide workarounds for errata even when the
physical processor is affected. For example, because of erratum 400 on family
10h processors a Linux guest will read an MSR (resulting in VMEXIT) before
going to idle in order to avoid getting stuck in a non-C0 state. This is not
necessary: HLT and IO instructions are intercepted and therefore there is no
reason for erratum 400 workaround in the guest.

This patch allows us to present a guest with certain errata as fixed,
regardless of the state of actual hardware.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:21 +02:00
Davidlohr Bueso 4a58ae614a KVM: MMU: unnecessary NX state assignment
We can remove the first ->nx state assignment since it is assigned afterwards anyways.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:21 +02:00
Carsten Otte 3e6afcf1d8 KVM: s390: Fix return code for unknown ioctl numbers
This patch fixes the return code of kvm_arch_vcpu_ioctl in case
of an unkown ioctl number.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:21 +02:00
Carsten Otte 1efd0f595a KVM: s390: ucontrol: announce capability for user controlled vms
This patch announces a new capability KVM_CAP_S390_UCONTROL that
indicates that kvm can now support virtual machines that are
controlled by userspace.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:20 +02:00
Carsten Otte 3777594d5a KVM: s390: fix assumption for KVM_MAX_VCPUS
This patch fixes definition of the idle_mask and the local_int array
in kvm_s390_float_interrupt. Previous definition had 64 cpus max
hardcoded instead of using KVM_MAX_VCPUS.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:20 +02:00