Commit Graph

40476 Commits

Author SHA1 Message Date
Jan Kiszka
1e001d49f9 KVM: Refactor IRQ names of assigned devices
Cosmetic change, but it helps to correlate IRQs with PCI devices.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:21 +02:00
Jan Kiszka
0645211c43 KVM: Switch assigned device IRQ forwarding to threaded handler
This improves the IRQ forwarding for assigned devices: By using the
kernel's threaded IRQ scheme, we can get rid of the latency-prone work
queue and simplify the code in the same run.

Moreover, we no longer have to hold assigned_dev_lock while raising the
guest IRQ, which can be a lenghty operation as we may have to iterate
over all VCPUs. The lock is now only used for synchronizing masking vs.
unmasking of INTx-type IRQs, thus is renames to intx_lock.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:20 +02:00
Jan Kiszka
d89f5eff70 KVM: Clean up vm creation and release
IA64 support forces us to abstract the allocation of the kvm structure.
But instead of mixing this up with arch-specific initialization and
doing the same on destruction, split both steps. This allows to move
generic destruction calls into generic code.

It also fixes error clean-up on failures of kvm_create_vm for IA64.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:29:09 +02:00
Xiao Guangrong
0730388b97 KVM: cleanup async_pf tracepoints
Use 'DECLARE_EVENT_CLASS' to cleanup async_pf tracepoints

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:57 +02:00
Xiao Guangrong
c9b263d2be KVM: fix tracing kvm_try_async_get_page
Tracing 'async' and *pfn is useless, since 'async' is always true,
and '*pfn' is always "fault_pfn'

We can trace 'gva' and 'gfn' instead, it can help us to see the
life-cycle of an async_pf

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:56 +02:00
Takuya Yoshikawa
515a01279a KVM: pre-allocate one more dirty bitmap to avoid vmalloc()
Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging and this has been causing
bad effect to VGA and live-migration: vmalloc() consumes extra systime,
triggers tlb flush, etc.

This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.

Performance improvement:
  I measured performance for the case of VGA update by trace-cmd.
  The result was 1.5 times faster than the original one.

  In the case of live migration, the improvement ratio depends on the workload
  and the guest memory size. In general, the larger the memory size is the more
  benefits we get.

Note:
  This does not change other architectures's logic but the allocation size
  becomes twice. This will increase the actual memory consumption only when
  the new size changes the number of pages allocated by vmalloc().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:46 +02:00
Gleb Natapov
64be500706 KVM: x86: trace "exit to userspace" event
Add tracepoint for userspace exit.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:44 +02:00
Marcelo Tosatti
612819c3c6 KVM: propagate fault r/w information to gup(), allow read-only memory
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:28:40 +02:00
Gleb Natapov
7c90705bf2 KVM: Inject asynchronous page fault into a PV guest if page is swapped out.
Send async page fault to a PV guest if it accesses swapped out memory.
Guest will choose another task to run upon receiving the fault.

Allow async page fault injection only when guest is in user mode since
otherwise guest may be in non-sleepable context and will not be able
to reschedule.

Vcpu will be halted if guest will fault on the same page again or if
vcpu executes kernel code.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:17 +02:00
Gleb Natapov
344d9588a9 KVM: Add PV MSR to enable asynchronous page faults delivery.
Guest enables async PF vcpu functionality using this MSR.

Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:12 +02:00
Gleb Natapov
49c7754ce5 KVM: Add memory slot versioning and use it to provide fast guest write interface
Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:08 +02:00
Gleb Natapov
af585b921e KVM: Halt vcpu if page it tries to access is swapped out
If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if interrupt will
cause reschedule guest will continue to run another task.

[avi: remove call to get_user_pages_noio(), nacked by Linus; this
      makes everything synchrnous again]

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:21:39 +02:00
Linus Torvalds
46bdfe6a50 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86: avoid high BIOS area when allocating address space
  x86: avoid E820 regions when allocating address space
  x86: avoid low BIOS area when allocating address space
  resources: add arch hook for preventing allocation in reserved areas
  Revert "resources: support allocating space within a region from the top down"
  Revert "PCI: allocate bus resources from the top down"
  Revert "x86/PCI: allocate space from the end of a region, not the beginning"
  Revert "x86: allocate space within a region top-down"
  Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
  PCI: Update MCP55 quirk to not affect non HyperTransport variants
2010-12-18 10:13:24 -08:00
Bjorn Helgaas
fcb119183c resources: add arch hook for preventing allocation in reserved areas
This adds arch_remove_reservations(), which an arch can implement if it
needs to protect part of the address space from allocation.

Sometimes that can be done by just putting a region in the resource tree,
but there are cases where that doesn't work well.  For example, x86 BIOS
E820 reservations are not related to devices, so they may overlap part of,
all of, or more than a device resource, so they may not end up at the
correct spot in the resource tree.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:01:09 -08:00
Bjorn Helgaas
c0f5ac5426 Revert "resources: support allocating space within a region from the top down"
This reverts commit e7f8567db9.

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-12-17 10:00:59 -08:00
Linus Torvalds
508817e7cd Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM / Runtime: Fix pm_runtime_suspended()
  PM / Hibernate: Restore old swap signature to avoid user space breakage
  PM / Hibernate: Fix PM_POST_* notification with user-space suspend
2010-12-17 09:31:59 -08:00
Linus Torvalds
8efc1a1a22 Merge branch 'bkl_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'bkl_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  [media] uvcvideo: Convert to unlocked_ioctl
  [media] uvcvideo: Lock stream mutex when accessing format-related information
  [media] uvcvideo: Move mmap() handler to uvc_queue.c
  [media] uvcvideo: Move mutex lock/unlock inside uvc_free_buffers
  [media] uvcvideo: Lock controls mutex when querying menus
  [media] v4l2-dev: fix race condition
  [media] V4L: improve the BKL replacement heuristic
  [media] v4l2-dev: use mutex_lock_interruptible instead of plain mutex_lock
  [media] cx18: convert to unlocked_ioctl
  [media] radio-timb: convert to unlocked_ioctl
  [media] sh_vou: convert to unlocked_ioctl
  [media] cafe_ccic: replace ioctl by unlocked_ioctl
  [media] et61x251_core: trivial conversion to unlocked_ioctl
  [media] sn9c102: convert to unlocked_ioctl
  [media] BKL: trivial ioctl -> unlocked_ioctl video driver conversions
  [media] typhoon: convert to unlocked_ioctl
  [media] si4713: convert to unlocked_ioctl
  [media] tea5764: convert to unlocked_ioctl
  [media] cadet: use unlocked_ioctl
  [media] BKL: trivial BKL removal from V4L2 radio drivers
2010-12-17 09:28:17 -08:00
Linus Torvalds
a3383e8372 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  fanotify: fill in the metadata_len field on struct fanotify_event_metadata
  fanotify: split version into version and metadata_len
  fanotify: Dont try to open a file descriptor for the overflow event
  fanotify: Introduce FAN_NOFD
  fanotify: do not leak user reference on allocation failure
  inotify: stop kernel memory leak on file creation failure
  fanotify: on group destroy allow all waiters to bypass permission check
  fanotify: Dont allow a mask of 0 if setting or removing a mark
  fanotify: correct broken ref counting in case adding a mark failed
  fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called
  fanotify: remove packed from access response message
  fanotify: deny permissions when no event was sent
2010-12-16 15:45:49 -08:00
Linus Torvalds
68a4ec9c03 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus: (28 commits)
  MIPS: Add a CONFIG_FORCE_MAX_ZONEORDER Kconfig option.
  MIPS: LD/SD o32 macro GAS fix update
  MIPS: Alchemy: fix build with SERIAL_8250=n
  MIPS: Rename mips_dma_cache_sync back to dma_cache_sync
  MIPS: MT: Fix typo in comment.
  SSB: Fix nvram_get on BCM47xx platform
  MIPS: BCM47xx: Swap serial console if ttyS1 was specified.
  MIPS: BCM47xx: Use sscanf for parsing mac address
  MIPS: BCM47xx: Fill values for b43 into SSB sprom
  MIPS: BCM47xx: Do not read config from CFE
  MIPS: FDT size is a be32
  MIPS: Fix CP0 COUNTER clockevent race
  MIPS: Fix regression on BCM4710 processor detection
  MIPS: JZ4740: Fix pcm device name
  MIPS: Separate two consecutive loads in memset.S
  MIPS: Send proper signal and siginfo on FP emulator faults.
  MIPS: AR7: Fix loops per jiffies on TNETD7200 devices
  MIPS: AR7: Fix double ar7_gpio_init declaration
  MIPS: Rework GENERIC_HARDIRQS Kconfig.
  MIPS: Alchemy: Add return value check for strict_strtoul()
  ...
2010-12-16 15:45:25 -08:00
Hauke Mehrtens
3f84622d7c SSB: Fix nvram_get on BCM47xx platform
The nvram_get function was never in the mainline kernel, it only existed in
an external OpenWrt patch. Use nvram_getenv function, which is in mainline
and use an include instead of an extra function declaration.  et0macaddr
contains the mac address in text from like 00:11:22:33:44:55. We have to
parse it before adding it into macaddr.

nvram_parse_macaddr will be merged into asm/mach-bcm47xx/nvram.h through
the MIPS git tree and will be available soon. It will not build now without
nvram_parse_macaddr, but it hasn't before either.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
To: linux-mips@linux-mips.org
Cc: mb@bu3sch.de
Cc: netdev@vger.kernel.org
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Michael Buesch <mb@bu3sch.de>
Patchwork: https://patchwork.linux-mips.org/patch/1849/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-12-16 18:11:00 +00:00
Linus Torvalds
9fe4145530 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2
  Input: wacom - add another Bamboo Pen ID (0xd4)
2010-12-16 08:33:44 -08:00
Rafael J. Wysocki
f08f5a0add PM / Runtime: Fix pm_runtime_suspended()
There are some situations (e.g. in __pm_generic_call()), where
pm_runtime_suspended() is used to decide whether or not to execute
a device's (system) ->suspend() callback.  The callback is not
executed if pm_runtime_suspended() returns true, but it does so
for devices that don't even support runtime PM, because the
power.disable_depth device field is ignored by it.  This leads to
problems (i.e. devices are not suspened when they should), so rework
pm_runtime_suspended() so that it returns false if the device's
power.disable_depth field is different from zero.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
2010-12-16 17:12:25 +01:00
Jeremy Fitzhardinge
667c78afae xen: Provide a variant of __RING_SIZE() that is an integer constant expression
Without this, gcc 4.5 won't compile xen-netfront and xen-blkfront, where
this is being used to specify array sizes.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Miller <davem@davemloft.net>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-15 12:34:28 -08:00
Alexey Zaytsev
62731fa0c8 fanotify: split version into version and metadata_len
To implement per event type optional headers we are interested in
knowing how long the metadata structure is.  This patch slits the __u32
version field into a __u8 version and a __u16 metadata_len field (with
__u8 left over).  This should allow for backwards compat ABI.

Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
[rewrote descrtion and changed object sizes and ordering - eparis]
Signed-off-by: Eric Paris <eparis@redhat.com>
2010-12-15 13:56:33 -05:00
Dmitry Torokhov
ab4e019219 Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2
The desire to keep old names for the EVIOCGKEYCODE/EVIOCSKEYCODE while
extending them to support large scancodes was a mistake. While we tried
to keep ABI intact (and we succeeded in doing that, programs compiled
on older kernels will work on newer ones) there is still a problem with
recompiling existing software with newer kernel headers.

New kernel headers will supply updated ioctl numbers and kernel will
expect that userspace will use struct input_keymap_entry to set and
retrieve keymap data. But since the names of ioctls are still the same
userspace will happily compile even if not adjusted to make use of the
new structure and will start miraculously fail in the field.

To avoid this issue let's revert EVIOCGKEYCODE/EVIOCSKEYCODE definitions
and add EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 so that userspace can explicitly
select the style of ioctls it wants to employ.

Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Acked-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-12-14 23:55:21 -08:00