Commit Graph

498 Commits

Author SHA1 Message Date
Ralf Baechle 5e0e61dd2c Merge branch 'next/kvm' into mips-for-linux-next 2013-05-09 17:56:40 +02:00
Sanjay Lal 2f4d9b5442 KVM/MIPS32: Do not call vcpu_load when injecting interrupts.
Signed-off-by: Sanjay Lal <sanjayl@kymasys.com>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2013-05-09 17:48:22 +02:00
Asias He 7dac16c379 KVM: Fix kvm_irqfd_init initialization
In commit a0f155e96 'KVM: Initialize irqfd from kvm_init()', when
kvm_init() is called the second time (e.g kvm-amd.ko and kvm-intel.ko),
kvm_arch_init() will fail with -EEXIST, then kvm_irqfd_exit() will be
called on the error handling path. This way, the kvm_irqfd system will
not be ready.

This patch fix the following:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: vhost_net
CPU 6
Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK
RIP: 0010:[<ffffffff81c0721e>]  [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
RSP: 0018:ffff880221721cc8  EFLAGS: 00010046
RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950
RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000
RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002
R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8
R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000
FS:  00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640)
Stack:
 ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086
 0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000
 ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000
Call Trace:
 [<ffffffff810ac5c5>] __queue_work+0x45/0x2d0
 [<ffffffff810ac8fe>] queue_work_on+0x8e/0xa0
 [<ffffffff810ac949>] queue_work+0x19/0x20
 [<ffffffff81009b6b>] irqfd_deactivate+0x4b/0x60
 [<ffffffff8100a69d>] kvm_irqfd+0x39d/0x580
 [<ffffffff81007a27>] kvm_vm_ioctl+0x207/0x5b0
 [<ffffffff810c9545>] ? update_curr+0xf5/0x180
 [<ffffffff811b66e8>] do_vfs_ioctl+0x98/0x550
 [<ffffffff810c1f5e>] ? finish_task_switch+0x4e/0xe0
 [<ffffffff81c054aa>] ? __schedule+0x2ea/0x710
 [<ffffffff811b6bf7>] sys_ioctl+0x57/0x90
 [<ffffffff8140ae9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
 [<ffffffff81c0f602>] system_call_fastpath+0x16/0x1b
Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 <f0> 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f
RIP  [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
RSP <ffff880221721cc8>
CR2: 0000000000000000
---[ end trace 13fb1e4b6e5ab21f ]---

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-08 13:15:35 +03:00
Linus Torvalds 01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Scott Wood db6ae61581 kvm: Add compat_ioctl for device control API
This API shouldn't have 32/64-bit issues, but VFS assumes it does
unless told otherwise.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-05 12:14:15 +03:00
Paul Mackerras 5975a2e095 KVM: PPC: Book3S: Add API for in-kernel XICS emulation
This adds the API for userspace to instantiate an XICS device in a VM
and connect VCPUs to it.  The API consists of a new device type for
the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
which is used to assert and deassert interrupt inputs of the XICS.

The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
Each attribute within this group corresponds to the state of one
interrupt source.  The attribute number is the same as the interrupt
source number.

This does not support irq routing or irqfd yet.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-05-02 15:28:36 +02:00
Scott Wood 07f0a7bdec kvm: destroy emulated devices on VM exit
The hassle of getting refcounting right was greater than the hassle
of keeping a list of devices to destroy on VM exit.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:28 +02:00
Scott Wood 5df554ad5b kvm/ppc/mpic: in-kernel MPIC emulation
Hook the MPIC code up to the KVM interfaces, add locking, etc.

Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit]
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:23 +02:00
Scott Wood 852b6d57dc kvm: add device control API
Currently, devices that are emulated inside KVM are configured in a
hardcoded manner based on an assumption that any given architecture
only has one way to do it.  If there's any need to access device state,
it is done through inflexible one-purpose-only IOCTLs (e.g.
KVM_GET/SET_LAPIC).  Defining new IOCTLs for every little thing is
cumbersome and depletes a limited numberspace.

This API provides a mechanism to instantiate a device of a certain
type, returning an ID that can be used to set/get attributes of the
device.  Attributes may include configuration parameters (e.g.
register base address), device state, operational commands, etc.  It
is similar to the ONE_REG API, except that it acts on devices rather
than vcpus.

Both device types and individual attributes can be tested without having
to create the device or get/set the attribute, without the need for
separately managing enumerated capabilities.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:20 +02:00
Alexander Graf 7df35f5496 KVM: Move irqfd resample cap handling to generic code
Now that we have most irqfd code completely platform agnostic, let's move
irqfd's resample capability return to generic code as well.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26 20:27:19 +02:00
Alexander Graf aa8d5944b8 KVM: Move irq routing to generic code
The IRQ routing set ioctl lives in the hacky device assignment code inside
of KVM today. This is definitely the wrong place for it. Move it to the much
more natural kvm_main.c.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26 20:27:17 +02:00
Alexander Graf a725d56a02 KVM: Introduce CONFIG_HAVE_KVM_IRQ_ROUTING
Quite a bit of code in KVM has been conditionalized on availability of
IOAPIC emulation. However, most of it is generically applicable to
platforms that don't have an IOPIC, but a different type of irq chip.

Make code that only relies on IRQ routing, not an APIC itself, on
CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
2013-04-26 20:27:14 +02:00
Yang Zhang a20ed54d6e KVM: VMX: Add the deliver posted interrupt algorithm
Only deliver the posted interrupt when target vcpu is running
and there is no previous interrupt pending in pir.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16 16:32:40 -03:00
Yang Zhang 3d81bc7e96 KVM: Call common update function when ioapic entry changed.
Both TMR and EOI exit bitmap need to be updated when ioapic changed
or vcpu's id/ldr/dfr changed. So use common function instead eoi exit
bitmap specific function.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-16 16:32:40 -03:00
Yang Zhang aa2fbe6d44 KVM: Let ioapic know the irq line status
Userspace may deliver RTC interrupt without query the status. So we
want to track RTC EOI for this case.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-04-15 23:20:34 -03:00
Geoff Levand e3ba45b804 KVM: Move kvm_spurious_fault to x86.c
The routine kvm_spurious_fault() is an x86 specific routine, so
move it from virt/kvm/kvm_main.c to arch/x86/kvm/x86.c.

Fixes this sparse warning when building on arm64:

  virt/kvm/kvm_main.c⚠️ symbol 'kvm_spurious_fault' was not declared. Should it be static?

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-08 13:02:06 +03:00
Geoff Levand 39369f7a8b KVM: Make local routines static
The routines get_user_page_nowait(), kvm_io_bus_sort_cmp(), kvm_io_bus_insert_dev()
and kvm_io_bus_get_first_dev() are only referenced within kvm_main.c, so give them
static linkage.

Fixes sparse warnings like these:

  virt/kvm/kvm_main.c: warning: symbol 'get_user_page_nowait' was not declared. Should it be static?

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-08 13:02:04 +03:00
Andrew Honig 8f964525a1 KVM: Allow cross page reads and writes from cached translations.
This patch adds support for kvm_gfn_to_hva_cache_init functions for
reads and writes that will cross a page.  If the range falls within
the same memslot, then this will be a fast operation.  If the range
is split between two memslots, then the slower kvm_read_guest and
kvm_write_guest are used.

Tested: Test against kvm_clock unit tests.

Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-04-07 13:05:35 +03:00
Raghavendra K T 7bc7ae25b1 kvm: Iterate over only vcpus that are preempted
This helps in filtering out the eligible candidates further and
thus potentially helps in quickly allowing preempted lockholders to run.
Note that if a vcpu was spinning during preemption we filter them
by checking whether they are preempted due to pause loop exit.

Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-03-11 11:37:22 +02:00
Raghavendra K T 3a08a8f9f0 kvm: Record the preemption status of vcpus using preempt notifiers
Note that we mark as preempted only when vcpu's task state was
Running during preemption.

Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ
for their precious suggestions. Thanks Srikar for an idea on avoiding
rcu lock while checking task state that improved overcommit numbers.

Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-03-11 11:37:08 +02:00
Cornelia Huck a0f155e964 KVM: Initialize irqfd from kvm_init().
Currently, eventfd introduces module_init/module_exit functions
to initialize/cleanup the irqfd workqueue. This only works, however,
if no other module_init/module_exit functions are built into the
same module.

Let's just move the initialization and cleanup to kvm_init and kvm_exit.
This way, it is also clearer where kvm startup may fail.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-05 19:12:16 -03:00
Takuya Yoshikawa 8482644aea KVM: set_memory_region: Refactor commit_memory_region()
This patch makes the parameter old a const pointer to the old memory
slot and adds a new parameter named change to know the change being
requested: the former is for removing extra copying and the latter is
for cleaning up the code.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-04 20:21:08 -03:00
Takuya Yoshikawa 7b6195a91d KVM: set_memory_region: Refactor prepare_memory_region()
This patch drops the parameter old, a copy of the old memory slot, and
adds a new parameter named change to know the change being requested.

This not only cleans up the code but also removes extra copying of the
memory slot structure.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-04 20:21:08 -03:00
Takuya Yoshikawa 74d0727cb7 KVM: set_memory_region: Make kvm_mr_change available to arch code
This will be used for cleaning up prepare/commit_memory_region() later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-04 20:21:08 -03:00
Takuya Yoshikawa 47ae31e257 KVM: set_memory_region: Drop user_alloc from set_memory_region()
Except ia64's stale code, KVM_SET_MEMORY_REGION support, this is only
used for sanity checks in __kvm_set_memory_region() which can easily
be changed to use slot id instead.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-03-04 20:21:08 -03:00