Commit Graph

237 Commits

Author SHA1 Message Date
Linus Torvalds
53861af9a1 Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
 "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

  On top of tht is the major rework of lguest, to use PCI and virtio
  1.0, to double-check the implementation.

  Then comes the inevitable fixes and cleanups from that work"

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
  virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
  virtio_net: unconditionally define struct virtio_net_hdr_v1.
  tools/lguest: don't use legacy definitions for net device in example launcher.
  virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
  tools/lguest: use common error macros in the example launcher.
  tools/lguest: give virtqueues names for better error messages
  tools/lguest: more documentation and checking of virtio 1.0 compliance.
  lguest: don't look in console features to find emerg_wr.
  tools/lguest: don't start devices until DRIVER_OK status set.
  tools/lguest: handle indirect partway through chain.
  tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
  tools/lguest: rename virtio_pci_cfg_cap field to match spec.
  tools/lguest: fix features_accepted logic in example launcher.
  tools/lguest: handle device reset correctly in example launcher.
  virtual: Documentation: simplify and generalize paravirt_ops.txt
  lguest: remove NOTIFY call and eventfd facility.
  lguest: remove NOTIFY facility from demonstration launcher.
  lguest: use the PCI console device's emerg_wr for early boot messages.
  lguest: always put console in PCI slot #1.
  ...
2015-02-18 09:24:01 -08:00
Luis R. Rodriguez
a2e1999157 virtual: Documentation: simplify and generalize paravirt_ops.txt
The general documentation we have for pv_ops is currenty present
on the IA64 docs, but since this documentation covers IA64 xen
enablement and IA64 Xen support got ripped out a while ago
through commit d52eefb47 present since v3.14-rc1 lets just
simplify, generalize and move the pv_ops documentation to a
shared place.

Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-02-13 17:15:44 +10:30
Michael Mueller
658b6eda20 KVM: s390: add cpu model support
This patch enables cpu model support in kvm/s390 via the vm attribute
interface.

During KVM initialization, the host properties cpuid, IBC value and the
facility list are stored in the architecture specific cpu model structure.

During vcpu setup, these properties are taken to initialize the related SIE
state. This mechanism allows to adjust the properties from user space and thus
to implement different selectable cpu models.

This patch uses the IBC functionality to block instructions that have not
been implemented at the requested CPU type and GA level compared to the
full host capability.

Userspace has to initialize the cpu model before vcpu creation. A cpu model
change of running vcpus is not possible.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:13 +01:00
Paolo Bonzini
8fff5e374a Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
KVM: s390: fixes and features for kvm/next (3.20)

1. Generic
- sparse warning (make function static)
- optimize locking
- bugfixes for interrupt injection
- fix MVPG addressing modes

2. hrtimer/wakeup fun
A recent change can cause KVM hangs if adjtime is used in the host.
The hrtimer might wake up too early or too late. Too early is fatal
as vcpu_block will see that the wakeup condition is not met and
sleep again. This CPU might never wake up again.
This series addresses this problem. adjclock slowing down the host
clock will result in too late wakeups. This will require more work.
In addition to that we also change the hrtimer from REALTIME to
MONOTONIC to avoid similar problems with timedatectl set-time.

3. sigp rework
We will move all "slow" sigps to QEMU (protected with a capability that
can be enabled) to avoid several races between concurrent SIGP orders.

4. Optimize the shadow page table
Provide an interface to announce the maximum guest size. The kernel
will use that to make the pagetable 2,3,4 (or theoretically) 5 levels.

5. Provide an interface to set the guest TOD
We now use two vm attributes instead of two oneregs, as oneregs are
vcpu ioctl and we don't want to call them from other threads.

6. Protected key functions
The real HMC allows to enable/disable protected key CPACF functions.
Lets provide an implementation + an interface for QEMU to activate
this the protected key instructions.
2015-01-23 14:33:36 +01:00
David Hildenbrand
2444b352c3 KVM: s390: forward most SIGP orders to user space
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU

We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).

This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand
2822545f9f KVM: s390: new parameter for SIGP STOP irqs
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.

For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
Dominik Dingel
8c0a7ce606 KVM: s390: Allow userspace to limit guest memory size
With commit c6c956b80b ("KVM: s390/mm: support gmap page tables with less
than 5 levels") we are able to define a limit for the guest memory size.

As we round up the guest size in respect to the levels of page tables
we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB.
We currently limit the guest size to 16 TB, which means we end up
creating a page table structure supporting guest sizes up to 8192 TB.

This patch introduces an interface that allows userspace to tune
this limit. This may bring performance improvements for small guests.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:30 +01:00
Andre Przywara
4fa96afd94 arm/arm64: KVM: force alignment of VGIC dist/CPU/redist addresses
Although the GIC architecture requires us to map the MMIO regions
only at page aligned addresses, we currently do not enforce this from
the kernel side.
Restrict any vGICv2 regions to be 4K aligned and any GICv3 regions
to be 64K aligned. Document this requirement.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:33 +01:00
Andre Przywara
ac3d373564 arm/arm64: KVM: allow userland to request a virtual GICv3
With all of the GICv3 code in place now we allow userland to ask the
kernel for using a virtual GICv3 in the guest.
Also we provide the necessary support for guests setting the memory
addresses for the virtual distributor and redistributors.
This requires some userland code to make use of that feature and
explicitly ask for a virtual GICv3.
Document that KVM_CREATE_IRQCHIP only works for GICv2, but is
considered legacy and using KVM_CREATE_DEVICE is preferred.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-20 18:25:33 +01:00
Eric Auger
065c003482 KVM: arm/arm64: vgic: add init entry to VGIC KVM device
Since the advent of VGIC dynamic initialization, this latter is
initialized quite late on the first vcpu run or "on-demand", when
injecting an IRQ or when the guest sets its registers.

This initialization could be initiated explicitly much earlier
by the users-space, as soon as it has provided the requested
dimensioning parameters.

This patch adds a new entry to the VGIC KVM device that allows
the user to manually request the VGIC init:
- a new KVM_DEV_ARM_VGIC_GRP_CTRL group is introduced.
- Its first attribute is KVM_DEV_ARM_VGIC_CTRL_INIT

The rationale behind introducing a group is to be able to add other
controls later on, if needed.

Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-01-11 14:12:15 +01:00
Paolo Bonzini
333bce5aac Merge tag 'kvm-arm-for-3.19-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
VGIC init flow.

Conflicts:
	arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]
2014-12-15 13:06:40 +01:00
Christoffer Dall
cf5d318865 arm/arm64: KVM: Turn off vcpus on PSCI shutdown/reboot
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
should really be turned off for the VM adhering to the suggestions in
the PSCI spec, and it's the sane thing to do.

Also, clarify the behavior and expectations for exits to user space with
the KVM_EXIT_SYSTEM_EVENT case.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-12-13 14:15:27 +01:00
Christoffer Dall
f7fa034dc8 arm/arm64: KVM: Clarify KVM_ARM_VCPU_INIT ABI
It is not clear that this ioctl can be called multiple times for a given
vcpu.  Userspace already does this, so clarify the ABI.

Also specify that userspace is expected to always make secondary and
subsequent calls to the ioctl with the same parameters for the VCPU as
the initial call (which userspace also already does).

Add code to check that userspace doesn't violate that ABI in the future,
and move the kvm_vcpu_set_target() function which is currently
duplicated between the 32-bit and 64-bit versions in guest.c to a common
static function in arm.c, shared between both architectures.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-12-13 14:15:26 +01:00
Christoffer Dall
3ad8b3de52 arm/arm64: KVM: Correct KVM_ARM_VCPU_INIT power off option
The implementation of KVM_ARM_VCPU_INIT is currently not doing what
userspace expects, namely making sure that a vcpu which may have been
turned off using PSCI is returned to its initial state, which would be
powered on if userspace does not set the KVM_ARM_VCPU_POWER_OFF flag.

Implement the expected functionality and clarify the ABI.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-12-13 14:15:25 +01:00
Tiejun Chen
c32a42721c kvm: Documentation: remove ia64
kvm/ia64 is gone, clean up Documentation too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-20 11:08:55 +01:00
Paolo Bonzini
173ede4ddd Merge tag 'kvm-s390-next-20141107' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes for kvm/next (3.19) and stable

1. We should flush TLBs for load control instruction emulation (stable)
2. A workaround for a compiler bug that renders ACCESS_ONCE broken (stable)
3. Fix program check handling for load control
4. Documentation Fix
2014-11-07 15:39:44 +01:00
Dominik Dingel
365dc16335 KVM: fix vm device attribute documentation
Documentation uses incorrect attribute names for some vm device
attributes: fix this.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-07 11:11:11 +01:00
Michael S. Tsirkin
7f05db6a20 kvm: drop unsupported capabilities, fix documentation
No kernel ever reported KVM_CAP_DEVICE_MSIX, KVM_CAP_DEVICE_MSI,
KVM_CAP_DEVICE_ASSIGNMENT, KVM_CAP_DEVICE_DEASSIGNMENT.

This makes the documentation wrong, and no application ever
written to use these capabilities has a chance to work correctly.
The only way to detect support is to try, and test errno for ENOTTY.
That's unfortunate, but we can't fix the past.

Document the actual semantics, and drop the definitions from
the exported header to make it easier for application
developers to note and fix the bug.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:29 +01:00
Tiejun Chen
91690bf32e Documentation: virtual: kvm: correct one bit description in APF case
When commit 6adba52742 (KVM: Let host know whether the guest can
handle async PF in non-userspace context.) is introduced, actually
bit 2 still is reserved and should be zero. Instead, bit 1 is 1 to
indicate if asynchronous page faults can be injected when vcpu is
in cpl == 0, and also please see this,

in the file kvm_para.h, #define KVM_ASYNC_PF_SEND_ALWAYS (1 << 1).

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:27 +01:00
Paolo Bonzini
e77d99d4a4 Merge tag 'kvm-arm-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next
Changes for KVM for arm/arm64 for 3.18

This includes a bunch of changes:
 - Support read-only memory slots on arm/arm64
 - Various changes to fix Sparse warnings
 - Correctly detect write vs. read Stage-2 faults
 - Various VGIC cleanups and fixes
 - Dynamic VGIC data strcuture sizing
 - Fix SGI set_clear_pend offset bug
 - Fix VTTBR_BADDR Mask
 - Correctly report the FSC on Stage-2 faults

Conflicts:
	virt/kvm/eventfd.c
	[duplicate, different patch where the kvm-arm version broke x86.
	 The kvm tree instead has the right one]
2014-09-27 11:03:33 +02:00
Bharat Bhushan
bc8a4e5c25 KVM: PPC: BOOKE: Add one_reg documentation of SPRG9 and DBSR
This was missed in respective one_reg implementation patch.

Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-22 10:11:32 +02:00
Marc Zyngier
a98f26f183 arm/arm64: KVM: vgic: make number of irqs a configurable attribute
In order to make the number of interrupts configurable, use the new
fancy device management API to add KVM_DEV_ARM_VGIC_GRP_NR_IRQS as
a VGIC configurable attribute.

Userspace can now specify the exact size of the GIC (by increments
of 32 interrupts).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2014-09-18 18:48:58 -07:00
Alex Bennée
209cf19fcd KVM: fix api documentation of KVM_GET_EMULATED_CPUID
It looks like when this was initially merged it got accidentally included
in the following section. I've just moved it back in the correct section
and re-numbered it as other ioctls have been added since.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-10 11:34:39 +02:00
Alex Bennée
4bd9d3441e KVM: document KVM_SET_GUEST_DEBUG api
In preparation for working on the ARM implementation I noticed the debug
interface was missing from the API document. I've pieced together the
expected behaviour from the code and commit messages written it up as
best I can.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-10 11:33:12 +02:00
David Matlack
ee3d1570b5 kvm: fix potentially corrupt mmio cache
vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_unlock(&kvm->srcu)             |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
   <action based on cache occurs even       |
    though the caching decision was based   |
    on the old memslots>                    |
...                                         |
   <action *continues* to occur until next  |
    memslot generation change, which may    |
    be never>                               |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).

Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots.  It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.

Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03 10:03:41 +02:00