Commit Graph

572 Commits

Author SHA1 Message Date
Gleb Natapov
a2ac07fe29 Fix NULL dereference in gfn_to_hva_prot()
gfn_to_memslot() can return NULL or invalid slot. We need to check slot
validity before accessing it.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-10-03 10:08:52 +03:00
Radim Krčmář
28b441e240 kvm: free resources after canceling async_pf
When we cancel 'async_pf_execute()', we should behave as if the work was
never scheduled in 'kvm_setup_async_pf()'.
Fixes a bug when we can't unload module because the vm wasn't destroyed.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17 12:53:15 +03:00
Paolo Bonzini
ba6a354154 KVM: mmu: allow page tables to be in read-only slots
Page tables in a read-only memory slot will currently cause a triple
fault because the page walker uses gfn_to_hva and it fails on such a slot.

OVMF uses such a page table; however, real hardware seems to be fine with
that as long as the accessed/dirty bits are set.  Save whether the slot
is readonly, and later check it when updating the accessed and dirty bits.

Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-17 12:52:31 +03:00
Linus Torvalds
45d9a2220f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "Unfortunately, this merge window it'll have a be a lot of small piles -
  my fault, actually, for not keeping #for-next in anything that would
  resemble a sane shape ;-/

  This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
  cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
  components) + several long-standing patches from various folks.

  There definitely will be a lot more (starting with Miklos'
  check_submount_and_drop() series)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  direct-io: Handle O_(D)SYNC AIO
  direct-io: Implement generic deferred AIO completions
  add formats for dentry/file pathnames
  kvm eventfd: switch to fdget
  powerpc kvm: use fdget
  switch fchmod() to fdget
  switch epoll_ctl() to fdget
  switch copy_module_from_fd() to fdget
  git simplify nilfs check for busy subtree
  ibmasmfs: don't bother passing superblock when not needed
  don't pass superblock to hypfs_{mkdir,create*}
  don't pass superblock to hypfs_diag_create_files
  don't pass superblock to hypfs_vm_create_files()
  oprofile: get rid of pointless forward declarations of struct super_block
  oprofilefs_create_...() do not need superblock argument
  oprofilefs_mkdir() doesn't need superblock argument
  don't bother with passing superblock to oprofile_create_stats_files()
  oprofile: don't bother with passing superblock to ->create_files()
  don't bother passing sb to oprofile_create_files()
  coh901318: don't open-code simple_read_from_buffer()
  ...
2013-09-05 08:50:26 -07:00
Al Viro
cffe78d92c kvm eventfd: switch to fdget
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-03 23:04:45 -04:00
Christoffer Dall
8d98915b6b ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs
For bytemaps each IRQ field is 1 byte wide, so we pack 4 irq fields in
one word and since there are 32 private (per cpu) irqs, we have 8
private u32 fields on the vgic_bytemap struct.  We shift the offset from
the base of the register group right by 2, giving us the word index
instead of the field index.  But then there are 8 private words, not 4,
which is also why we subtract 8 words from the offset of the shared
words.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30 16:12:38 +03:00
Marc Zyngier
6545eae3d7 ARM: KVM: vgic: fix GICD_ICFGRn access
All the code in handle_mmio_cfg_reg() assumes the offset has
been shifted right to accomodate for the 2:1 bit compression,
but this is only done when getting the register address.

Shift the offset early so the code works mostly unchanged.

Reported-by: Zhaobo (Bob, ERC) <zhaobo@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30 16:12:16 +03:00
Marc Zyngier
986af8e078 ARM: KVM: vgic: simplify vgic_get_target_reg
vgic_get_target_reg is quite complicated, for no good reason.
Actually, it is fairly easy to write it in a much more efficient
way by using the target CPU array instead of the bitmap.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-30 16:03:07 +03:00
Paolo Bonzini
c21fbff16b KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp
This is the type-safe comparison function, so the double-underscore is
not related.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-28 09:39:40 +03:00
Andrea Arcangeli
11feeb4980 kvm: optimize away THP checks in kvm_is_mmio_pfn()
The checks on PG_reserved in the page structure on head and tail pages
aren't necessary because split_huge_page wouldn't transfer the
PG_reserved bit from head to tail anyway.

This was a forward-thinking check done in the case PageReserved was
set by a driver-owned page mapped in userland with something like
remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
possible right now). It was meant to be very safe, but it's overkill
as it's unlikely split_huge_page could ever run without the driver
noticing and tearing down the hugepage itself.

And if a driver in the future will really want to map a reserved
hugepage in userland using an huge pmd it should simply take care of
marking all subpages reserved too to keep KVM safe. This of course
would require such a hypothetical driver to tear down the huge pmd
itself and splitting the hugepage itself, instead of relaying on
split_huge_page, but that sounds very reasonable, especially
considering split_huge_page wouldn't currently transfer the reserved
bit anyway.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-27 11:01:10 +03:00
Yann Droneaud
24009b0549 kvm: use anon_inode_getfd() with O_CLOEXEC flag
KVM uses anon_inode_get() to allocate file descriptors as part
of some of its ioctls. But those ioctls are lacking a flag argument
allowing userspace to choose options for the newly opened file descriptor.

In such case it's advised to use O_CLOEXEC by default so that
userspace is allowed to choose, without race, if the file descriptor
is going to be inherited across exec().

This patch set O_CLOEXEC flag on all file descriptors created
with anon_inode_getfd() to not leak file descriptors across exec().

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-08-26 13:19:35 +03:00
Paolo Bonzini
a343c9b767 KVM: introduce __kvm_io_bus_sort_cmp
kvm_io_bus_sort_cmp is used also directly, not just as a callback for
sort and bsearch.  In these cases, it is handy to have a type-safe
variant.  This patch introduces such a variant, __kvm_io_bus_sort_cmp,
and uses it throughout kvm_main.c.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-29 09:01:14 +02:00
Takuya Yoshikawa
e59dbe09f8 KVM: Introduce kvm_arch_memslots_updated()
This is called right after the memslots is updated, i.e. when the result
of update_memslots() gets installed in install_new_memslots().  Since
the memslots needs to be updated twice when we delete or move a memslot,
kvm_arch_commit_memory_region() does not correspond to this exactly.

In the following patch, x86 will use this new API to check if the mmio
generation has reached its maximum value, in which case mmio sptes need
to be flushed out.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Acked-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-18 12:29:25 +02:00
Cornelia Huck
126a5af520 KVM: kvm-io: support cookies
Add new functions kvm_io_bus_{read,write}_cookie() that allows users of
the kvm io infrastructure to use a cookie value to speed up lookup of a
device on an io bus.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-07-18 12:29:23 +02:00
Linus Torvalds
fe489bf450 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "On the x86 side, there are some optimizations and documentation
  updates.  The big ARM/KVM change for 3.11, support for AArch64, will
  come through Catalin Marinas's tree.  s390 and PPC have misc cleanups
  and bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits)
  KVM: PPC: Ignore PIR writes
  KVM: PPC: Book3S PR: Invalidate SLB entries properly
  KVM: PPC: Book3S PR: Allow guest to use 1TB segments
  KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match
  KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry
  KVM: PPC: Book3S PR: Fix proto-VSID calculations
  KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL
  KVM: Fix RTC interrupt coalescing tracking
  kvm: Add a tracepoint write_tsc_offset
  KVM: MMU: Inform users of mmio generation wraparound
  KVM: MMU: document fast invalidate all mmio sptes
  KVM: MMU: document fast invalidate all pages
  KVM: MMU: document fast page fault
  KVM: MMU: document mmio page fault
  KVM: MMU: document write_flooding_count
  KVM: MMU: document clear_spte_count
  KVM: MMU: drop kvm_mmu_zap_mmio_sptes
  KVM: MMU: init kvm generation close to mmio wrap-around value
  KVM: MMU: add tracepoint for check_mmio_spte
  KVM: MMU: fast invalidate all mmio sptes
  ...
2013-07-03 13:21:40 -07:00
Gleb Natapov
96f7edf9a5 Merge git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-3.11 into queue
KVM/ARM pull request for 3.11 merge window

* tag 'kvm-arm-3.11' of git://git.linaro.org/people/cdall/linux-kvm-arm.git:
  ARM: kvm: don't include drivers/virtio/Kconfig
  Update MAINTAINERS: KVM/ARM work now funded by Linaro
  arm/kvm: Cleanup KVM_ARM_MAX_VCPUS logic
  ARM: KVM: clear exclusive monitor on all exception returns
  ARM: KVM: add missing dsb before invalidating Stage-2 TLBs
  ARM: KVM: perform save/restore of PAR
  ARM: KVM: get rid of S2_PGD_SIZE
  ARM: KVM: don't special case PC when doing an MMIO
  ARM: KVM: use phys_addr_t instead of unsigned long long for HYP PGDs
  ARM: KVM: remove dead prototype for __kvm_tlb_flush_vmid
  ARM: KVM: Don't handle PSCI calls via SMC
  ARM: KVM: Allow host virt timer irq to be different from guest timer virt irq
2013-06-27 14:20:54 +03:00
Gleb Natapov
24f7bb52e9 KVM: Fix RTC interrupt coalescing tracking
This reverts most of the f1ed0450a5. After
the commit kvm_apic_set_irq() no longer returns accurate information
about interrupt injection status if injection is done into disabled
APIC. RTC interrupt coalescing tracking relies on the information to be
accurate and cannot recover if it is not.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-27 14:20:53 +03:00
Anup Patel
5ae7f87a56 ARM: KVM: Allow host virt timer irq to be different from guest timer virt irq
The arch_timer irq numbers (or PPI numbers) are implementation dependent,
so the host virtual timer irq number can be different from guest virtual
timer irq number.

This patch ensures that host virtual timer irq number is read from DTB and
guest virtual timer irq is determined based on vcpu target type.

Signed-off-by: Anup Patel <anup.patel@linaro.org>
Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
2013-06-26 10:50:02 -07:00
Amos Kong
6ea34c9b78 kvm: exclude ioeventfd from counting kvm_io_range limit
We can easily reach the 1000 limit by start VM with a couple
hundred I/O devices (multifunction=on). The hardcode limit
already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000).

In userspace, we already have maximum file descriptor to
limit ioeventfd count. But kvm_io_bus devices also are used
for pit, pic, ioapic, coalesced_mmio. They couldn't be limited
by maximum file descriptor.

Currently only ioeventfds take too much kvm_io_bus devices,
so just exclude it from counting kvm_io_range limit.

Also fixed one indent issue in kvm_host.h

Signed-off-by: Amos Kong <akong@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-04 11:49:38 +03:00
Marc Zyngier
7275acdfe2 ARM: KVM: move GIC/timer code to a common location
As KVM/arm64 is looming on the horizon, it makes sense to move some
of the common code to a single location in order to reduce duplication.

The code could live anywhere. Actually, most of KVM is already built
with a bunch of ugly ../../.. hacks in the various Makefiles, so we're
not exactly talking about style here. But maybe it is time to start
moving into a less ugly direction.

The include files must be in a "public" location, as they are accessed
from non-KVM files (arch/arm/kernel/asm-offsets.c).

For this purpose, introduce two new locations:
- virt/kvm/arm/ : x86 and ia64 already share the ioapic code in
  virt/kvm, so this could be seen as a (very ugly) precedent.
- include/kvm/  : there is already an include/xen, and while the
  intent is slightly different, this seems as good a location as
  any

Eventually, we should probably have independant Makefiles at every
levels (just like everywhere else in the kernel), but this is just
the first step.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-19 15:13:08 +03:00
Jan Kiszka
f1ed0450a5 KVM: x86: Remove support for reporting coalesced APIC IRQs
Since the arrival of posted interrupt support we can no longer guarantee
that coalesced IRQs are always reported to the IRQ source. Moreover,
accumulated APIC timer events could cause a busy loop when a VCPU should
rather be halted. The consensus is to remove coalesced tracking from the
LAPIC.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-14 12:09:02 +03:00
Wei Yongjun
afc2f792cd KVM: add missing misc_deregister() on error in kvm_init()
Add the missing misc_deregister() before return from kvm_init()
in the debugfs init error handling case.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-12 12:06:24 +03:00
Linus Torvalds
c67723ebbb Merge tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Gleb Natapov:
 "Most of the fixes are in the emulator since now we emulate more than
  we did before for correctness sake we see more bugs there, but there
  is also an OOPS fixed and corruption of xcr0 register."

* tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: emulator: emulate SALC
  KVM: emulator: emulate XLAT
  KVM: emulator: emulate AAM
  KVM: VMX: fix halt emulation while emulating invalid guest sate
  KVM: Fix kvm_irqfd_init initialization
  KVM: x86: fix maintenance of guest/host xcr0 state
2013-05-10 09:08:21 -07:00
Linus Torvalds
daf799cca8 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:

 - More work on DT support for various platforms

 - Various fixes that were to late to make it straight into 3.9

 - Improved platform support, in particular the Netlogic XLR and
   BCM63xx, and the SEAD3 and Malta eval boards.

 - Support for several Ralink SOC families.

 - Complete support for the microMIPS ASE which basically reencodes the
   existing MIPS32/MIPS64 ISA to use non-constant size instructions.

 - Some fallout from LTO work which remove old cruft and will generally
   make the MIPS kernel easier to maintain and resistant to compiler
   optimization, even in absence of LTO.

 - KVM support.  While MIPS has announced hardware virtualization
   extensions this KVM extension uses trap and emulate mode for
   virtualization of MIPS32.  More KVM work to add support for VZ
   hardware virtualizaiton extensions and MIPS64 will probably already
   be merged for 3.11.

Most of this has been sitting in -next for a long time.  All defconfigs
have been build or run time tested except three for which fixes are being
sent by other maintainers.

Semantic conflict with kvm updates done as per Ralf

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (118 commits)
  MIPS: Add new GIC clockevent driver.
  MIPS: Formatting clean-ups for clocksources.
  MIPS: Refactor GIC clocksource code.
  MIPS: Move 'gic_frequency' to common location.
  MIPS: Move 'gic_present' to common location.
  MIPS: MIPS16e: Add unaligned access support.
  MIPS: MIPS16e: Support handling of delay slots.
  MIPS: MIPS16e: Add instruction formats.
  MIPS: microMIPS: Optimise 'strnlen' core library function.
  MIPS: microMIPS: Optimise 'strlen' core library function.
  MIPS: microMIPS: Optimise 'strncpy' core library function.
  MIPS: microMIPS: Optimise 'memset' core library function.
  MIPS: microMIPS: Add configuration option for microMIPS kernel.
  MIPS: microMIPS: Disable LL/SC and fix linker bug.
  MIPS: microMIPS: Add vdso support.
  MIPS: microMIPS: Add unaligned access support.
  MIPS: microMIPS: Support handling of delay slots.
  MIPS: microMIPS: Add support for exception handling.
  MIPS: microMIPS: Floating point support.
  MIPS: microMIPS: Fix macro naming in micro-assembler.
  ...
2013-05-10 07:48:05 -07:00
Ralf Baechle
5e0e61dd2c Merge branch 'next/kvm' into mips-for-linux-next 2013-05-09 17:56:40 +02:00