Commit Graph

37829 Commits

Author SHA1 Message Date
Dexuan Cui
2acf923e38 KVM: VMX: Enable XSAVE/XRSTOR for guest
This patch enable guest to use XSAVE/XRSTOR instructions.

We assume that host_xcr0 would use all possible bits that OS supported.

And we loaded xcr0 in the same way we handled fpu - do it as late as we can.

Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:46:31 +03:00
Avi Kivity
d94e1dc9af KVM: Get rid of KVM_REQ_KICK
KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal
operation.  This causes the fast path check for a clear vcpu->requests
to fail all the time, triggering tons of atomic operations.

Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:37 +03:00
Huang Ying
bf998156d2 KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages
In common cases, guest SRAO MCE will cause corresponding poisoned page
be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay
the MCE to guest OS.

But it is reported that if the poisoned page is accessed in guest
after unmapping and before MCE is relayed to guest OS, userspace will
be killed.

The reason is as follows. Because poisoned page has been un-mapped,
guest access will cause guest exit and kvm_mmu_page_fault will be
called. kvm_mmu_page_fault can not get the poisoned page for fault
address, so kernel and user space MMIO processing is tried in turn. In
user MMIO processing, poisoned page is accessed again, then userspace
is killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM
and do not try kernel and user space MMIO processing for poisoned
page.

[xiao: fix warning introduced by avi]

Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-01 10:35:26 +03:00
Linus Torvalds
8785eb1e7c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  davinci: da850/omap-l138 evm: account for DEFDCDC{2,3} being tied high
  regulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register
  wm8350-regulator: fix wm8350_register_regulator error handling
  ab3100: fix off-by-one value range checking for voltage selector
2010-07-28 19:59:55 -07:00
Anuj Aggarwal
7d14831e21 regulator: tps6507x: allow driver to use DEFDCDC{2,3}_HIGH register
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

In TPS6507x, depending on the status of DEFDCDC{2,3} pin either
DEFDCDC{2,3}_LOW or DEFDCDC{2,3}_HIGH register needs to be read or
programmed to change the output voltage.

The current driver assumes DEFDCDC{2,3} pins are always tied low
and thus operates only on DEFDCDC{2,3}_LOW register. This need
not always be the case (as is found on OMAP-L138 EVM).

Unfortunately, software cannot read the status of DEFDCDC{2,3} pins.
So, this information is passed through platform data depending on
how the board is wired.

Signed-off-by: Anuj Aggarwal <anuj.aggarwal@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2010-07-28 15:09:26 +01:00
Linus Torvalds
a376bca610 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  s2io: fixing DBG_PRINT() macro
  ath9k: fix dma direction for map/unmap in ath_rx_tasklet
  net: dev_forward_skb should call nf_reset
  net sched: fix race in mirred device removal
  tun: avoid BUG, dump packet on GSO errors
  bonding: set device in RLB ARP packet handler
  wimax/i2400m: Add PID & VID for Intel WiMAX 6250
  ipv6: Don't add routes to ipv6 disabled interfaces.
  net: Fix skb_copy_expand() handling of ->csum_start
  net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.c
  macvtap: Limit packet queue length
  ixgbe/igb: catch invalid VF settings
  bnx2x: Advance a module version
  bnx2x: Protect statistics ramrod and sequence number
  bnx2x: Protect a SM state change
  wireless: use netif_rx_ni in ieee80211_send_layer2_update
2010-07-27 09:21:00 -07:00
Linus Torvalds
dbbe4649d6 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
  ACPI: create "processor.bm_check_disable" boot param
  ACPI: skip checking BM_STS if the BIOS doesn't ask for it
  ACPI: fix unused function warning
  ACPI: processor: fix processor_physically_present on UP
  ACPI video: fix string mismatch for Sony SR290 laptop
  ACPI battery: don't invoke power_supply_changed twice when battery is hot-added
  ACPI: handle systems which asynchoronously enable ACPI mode
2010-07-26 08:10:00 -07:00
stephen hemminger
3b87956ea6 net sched: fix race in mirred device removal
This fixes hang when target device of mirred packet classifier
action is removed.

If a mirror or redirection action is configured to cause packets
to go to another device, the classifier holds a ref count, but was assuming
the adminstrator cleaned up all redirections before removing. The fix
is to add a notifier and cleanup during unregister.

The new list is implicitly protected by RTNL mutex because
it is held during filter add/delete as well as notifier.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-24 21:04:20 -07:00
Len Brown
0e1cf38889 Merge branch 'bugzilla-16396' into release 2010-07-24 23:26:22 -04:00
Rafael J. Wysocki
72ad5d77fb ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
Commit 2a6b69765a
(ACPI: Store NVS state even when entering suspend to RAM) caused the
ACPI suspend code save the NVS area during suspend and restore it
during resume unconditionally, although it is known that some systems
need to use acpi_sleep=s4_nonvs for hibernation to work.  To allow
the affected systems to avoid saving and restoring the NVS area
during suspend to RAM and resume, introduce kernel command line
option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
file).

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-24 23:26:09 -04:00
Linus Torvalds
86c65a7857 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  vmlinux.lds: fix .data..init_task output section (fix popwerpc boot)
  powerpc: Fix erroneous lmb->memblock conversions
  powerpc/mm: Add some debug output when hash insertion fails
  powerpc/mm: Fix bugs in huge page hashing
  powerpc/mm: Move around testing of _PAGE_PRESENT in hash code
  powerpc/mm: Handle hypervisor pte insert failure in __hash_page_huge
  powerpc/kexec: Fix boundary case for book-e kexec memory limits
2010-07-23 13:26:16 -07:00
Linus Torvalds
339a2afcaa Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available
  perf annotate: Fix handling of goto labels that are valid hex numbers
  tracing: Properly align linker defined symbols
  perf symbols: Fix directory descriptor leaking
  perf: Fix various display bugs with parent filtering
2010-07-23 13:24:02 -07:00
Sam Ravnborg
da5e37efe8 vmlinux.lds: fix .data..init_task output section (fix popwerpc boot)
The .data..init_task output section was missing
a load offset causing a popwerpc target to fail to boot.

Sean MacLennan tracked it down to the definition of
INIT_TASK_DATA_SECTION().

There are only two users of INIT_TASK_DATA_SECTION()
in the kernel today: cris and popwerpc.
cris do not support relocatable kernels and is thus not
impacted by this change.

Fix INIT_TASK_DATA_SECTION() to specify load offset like
all other output sections.

Reported-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23 13:45:12 +10:00
Len Brown
4a973f2495 Merge branch 'bugzilla-15886' into release 2010-07-22 18:18:28 -04:00
Len Brown
718be4aaf3 ACPI: skip checking BM_STS if the BIOS doesn't ask for it
It turns out that there is a bit in the _CST for Intel FFH C3
that tells the OS if we should be checking BM_STS or not.

Linux has been unconditionally checking BM_STS.
If the chip-set is configured to enable BM_STS,
it can retard or completely prevent entry into
deep C-states -- as illustrated by turbostat:

http://userweb.kernel.org/~lenb/acpi/utils/pmtools/turbostat/

ref: Intel Processor Vendor-Specific ACPI Interface Specification
table 4 "_CST FFH GAS Field Encoding"
Bit 1: Set to 1 if OSPM should use Bus Master avoidance for this C-state

https://bugzilla.kernel.org/show_bug.cgi?id=15886

Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-22 16:54:27 -04:00
Herbert Xu
8a35747a5d macvtap: Limit packet queue length
Mark Wagner reported OOM symptoms when sending UDP traffic over
a macvtap link to a kvm receiver.

This appears to be caused by the fact that macvtap packet queues
are unlimited in length.  This means that if the receiver can't
keep up with the rate of flow, then we will hit OOM. Of course
it gets worse if the OOM killer then decides to kill the receiver.

This patch imposes a cap on the packet queue length, in the same
way as the tuntap driver, using the device TX queue length.

Please note that macvtap currently has no way of giving congestion
notification, that means the software device TX queue cannot be
used and packets will always be dropped once the macvtap driver
queue fills up.

This shouldn't be a great problem for the scenario where macvtap
is used to feed a kvm receiver, as the traffic is most likely
external in origin so congestion notification can't be applied
anyway.

Of course, if anybody decides to complain about guest-to-guest
UDP packet loss down the track, then we may have to revisit this.

Incidentally, this patch also fixes a real memory leak when
macvtap_get_queue fails.

Chris Wright noticed that for this patch to work, we need a
non-zero TX queue length.  This patch includes his work to change
the default macvtap TX queue length to 500.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:08:56 -07:00
Linus Torvalds
e916beab22 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
  debug_core,kdb: fix kgdb_connected bit set in the wrong place
  Fix merge regression from external kdb to upstream kdb
  repair gdbstub to match the gdbserial protocol specification
  kdb: break out of kdb_ll() when command is terminated
2010-07-22 11:44:26 -07:00
Jason Wessel
edd63cb6b9 sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function
The kdb code should not toggle the sysrq state in case an end user
wants to try and resume the normal kernel execution.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2010-07-21 19:27:07 -05:00
Linus Torvalds
95977d0ef2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  math-emu: correct test for downshifting fraction in _FP_FROM_INT()
  perf: Add DWARF register lookup for sparc
  MAINTAINERS: Add SBUS driver path to sparc entry.
  drivers/sbus: Remove unnecessary casts of private_data
  sparc: remove homegrown L1_CACHE_ALIGN macro
  sparc64: fix the build error due to smp_kgdb_capture_client()
  sparc64: Fix maybe_change_configuration() PCR setting.
  arch/sparc/kernel: Eliminate what looks like a NULL pointer dereference
  sparc64: Update defconfig.
  sunsu: Fix use after free in su_remove().
  sunserial: Don't call add_preferred_console() when console= is specified.
  sparc32: Kill none_mask, it's bogus.
2010-07-21 09:28:50 -07:00
Mikael Pettersson
f8324e20f8 math-emu: correct test for downshifting fraction in _FP_FROM_INT()
The kernel's math-emu code contains a macro _FP_FROM_INT() which is
used to convert an integer to a raw normalized floating-point value.
It does this basically in three steps:

1. Compute the exponent from the number of leading zero bits.
2. Downshift large fractions to put the MSB in the right position
   for normalized fractions.
3. Upshift small fractions to put the MSB in the right position.

There is an boundary error in step 2, causing a fraction with its
MSB exactly one bit above the normalized MSB position to not be
downshifted.  This results in a non-normalized raw float, which when
packed becomes a massively inaccurate representation for that input.

The impact of this depends on a number of arch-specific factors,
but it is known to have broken emulation of FXTOD instructions
on UltraSPARC III, which was originally reported as GCC bug 44631
<http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44631>.

Any arch which uses math-emu to emulate conversions from integers to
same-size floats may be affected.

The fix is simple: the exponent comparison used to determine if the
fraction should be downshifted must be "<=" not "<".

I'm sending a kernel module to test this as a reply to this message.
There are also SPARC user-space test cases in the GCC bug entry.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-20 18:45:14 -07:00
Linus Torvalds
f4b23cc2d5 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/r600: fix possible NULL pointer derefernce
  drm/radeon/kms: add quirk for ASUS HD 3600 board
  include/linux/vgaarb.h: add missing part of include guard
  drm/nouveau: Fix crashes during fbcon init on single head cards.
  drm/nouveau: fix pcirom vbios shadow breakage from acpi rom patch
  drm/radeon/kms: fix shared ddc harder
  drm/i915: enable low power render writes on GEN3 hardware.
  drm/i915: Define MI_ARB_STATE bits
  vmwgfx: return -EFAULT if copy_to_user fails
  fb: handle allocation failure in alloc_apertures()
  drm: radeon: check kzalloc() result
  drm/ttm: Fix build on architectures without AGP
  drm/radeon/kms: fix gtt MC base alignment on rs4xx/rs690/rs740 asics
  drm/radeon/kms: fix possible mis-detection of sideport on rs690/rs740
  drm/radeon/kms: fix legacy tv-out pal mode
2010-07-20 18:29:25 -07:00
Doug Goldstein
a6a1a095ec include/linux/vgaarb.h: add missing part of include guard
vgaarb.h was missing the #define of the #ifndef at the top for the guard
to prevent multiple #include's from causing re-define errors

Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-07-21 09:51:15 +10:00
Linus Torvalds
516bd66415 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
  bridge: Partially disable netpoll support
  tcp: fix crash in tcp_xmit_retransmit_queue
  IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input())
  ibmveth: lost IRQ while closing/opening device leads to service loss
  rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()
  vhost: avoid pr_err on condition guest can trigger
  ipmr: Don't leak memory if fib lookup fails.
  vhost-net: avoid flush under lock
  net: fix problem in reading sock TX queue
  net/core: neighbour update Oops
  net: skb_tx_hash() fix relative to skb_orphan_try()
  rfs: call sock_rps_record_flow() in tcp_splice_read()
  xfrm: do not assume that template resolving always returns xfrms
  hostap_pci: set dev->base_addr during probe
  axnet_cs: use spin_lock_irqsave in ax_interrupt
  dsa: Fix Kconfig dependencies.
  act_nat: not all of the ICMP packets need an IP header payload
  r8169: incorrect identifier for a 8168dp
  Phonet: fix skb leak in pipe endpoint accept()
  Bluetooth: Update sec_level/auth_type for already existing connections
  ...
2010-07-20 16:26:42 -07:00
Paul E. McKenney
844b9a8707 vfs: fix RCU-lockdep false positive due to /proc
If a single-threaded process does a file-descriptor operation, and some
other process accesses that same file descriptor via /proc, the current
rcu_dereference_check_fdtable() can give a false-positive RCU-lockdep
splat due to the reference count being increased by the /proc access after
the reference-count check in fget_light() but before the check in
rcu_dereference_check_fdtable().

This commit prevents this false positive by checking for a single-threaded
process.  To avoid #include hell, this commit uses the wrapper for
thread_group_empty(current) defined by rcu_my_thread_group_empty()
provided in a separate commit.

Located-by: Miles Lane <miles.lane@gmail.com>
Located-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-20 16:25:41 -07:00
Sam Ravnborg
07fca0e57f tracing: Properly align linker defined symbols
We define a number of symbols in the linker scipt like this:

    __start_syscalls_metadata = .;
    *(__syscalls_metadata)

But we do not know the alignment of "." when we assign
the __start_syscalls_metadata symbol.
gcc started to uses bigger alignment for structs (32 bytes),
so we saw situations where the linker due to alignment
constraints increased the value of "." after the symbol assignment.

This resulted in boot fails.

Fix this by forcing a 32 byte alignment of "." before the
assignment.

This patch introduces the forced alignment for
ftrace_events and syscalls_metadata.
It may be required in more places.

Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <20100710063459.GA14596@merkur.ravnborg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-07-20 19:02:52 -04:00