This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.
The area is binary compatible with xen, as we use the same shadow_info
structure.
[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The load_pdptrs() function is required in the SVM module for NPT support.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch give the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.
[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
OK, so 25-mm1 gave a lockdep error which made me look into this.
The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561
The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.
So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:
idle routines are entered with IRQs disabled
idle routines will exit with IRQs enabled
Nearly all already did this in one form or another.
Merge the 32 and 64 bit bits so they no longer have different bugs.
As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.
this can fix a system without _CRS for multi pci root bus.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently, on an amd k8 system with multi ht chains, the numa_node of
pci devices under /sys/devices/pci0000:80/* is always 0, even if that
chain is on node 1 or 2 or 3.
Workaround: pcibus_to_node(bus) is used when we want to get the node that
pci_device is on.
In struct device, we already have numa_node member, and we could use
dev_to_node()/set_dev_node() to get and set numa_node in the device.
set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
and pcibus_to_node uses bus->sysdata for nodeid.
The problem is when pci_add_device is called, bus->sysdata is not assigned
correct nodeid yet. The result is that numa_node will always be 0.
pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
mp_bus_to_node mapping before these two are called, and thus
get_mp_bus_to_node could get correct node for sysdata in root bus.
In scanning of the root bus, all child busses will take parent bus sysdata.
So all pci_device->dev.numa_node will be assigned correctly and automatically.
Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
could also could make other bus specific device get the correct numa_node
too.
This is an updated version of pci_sysdata and Jeff's pci_domain patch.
[ mingo@elte.hu: build fix ]
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
x86_64/mm: check and print vmemmap allocation continuous
x86_64: fix setup_node_bootmem to support big mem excluding with memmap
x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
mm: allow reserve_bootmem() cross nodes
mm: offset align in alloc_bootmem()
mm: fix alloc_bootmem_core to use fast searching for all nodes
mm: make mem_map allocation continuous
typical case: four sockets system, every node has 4g ram, and we are using:
memmap=10g$4g
to mask out memory on node1 and node2
when numa is enabled, early_node_mem is used to get node_data and node_bootmap.
if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.
so check it and print out some info about it.
need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.
depends on "mm: make reserve_bootmem can crossed the nodes".
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
x86, bitops: select the generic bitmap search functions
x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
x86: finalize bitops unification
x86, UML: remove x86-specific implementations of find_first_bit
x86: optimize find_first_bit for small bitmaps
x86: switch 64-bit to generic find_first_bit
x86: generic versions of find_first_(zero_)bit, convert i386
bitops: use __fls for fls64 on 64-bit archs
generic: implement __fls on all 64-bit archs
generic: introduce a generic __fls implementation
x86: merge the simple bitops and move them to bitops.h
x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
x86, uml: fix uml with generic find_next_bit for x86
x86: change x86 to use generic find_next_bit
uml: Kconfig cleanup
uml: fix build error
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
include/asm-x86/bitops_32.h and include/asm-x86/bitops_64.h are now
almost identical. The 64-bit version sets ARCH_HAS_FAST_MULTIPLIER
and has an extra inline function set_bit_string. The define currently
has no influence on the generated code, but it can be argued that
setting it on i386 is the right thing to do anyhow. The addition
of the extra inline function on i386 does not hurt either.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 has been switched to the generic versions of find_first_bit
and find_first_zero_bit, but the original versions were retained.
This patch just removes the now unused x86-specific versions.
also update UML.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch x86_64 to generic find_first_bit. The x86_64-specific
implementation is not removed.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.
The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.
This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use __fls for fls64 on 64-bit archs. The implementation for
64-bit archs is moved from x86_64 to asm-generic.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some of those can be written in such a way that the same
inline assembly can be used to generate both 32 bit and
64 bit code.
For ffs and fls, x86_64 unconditionally used the cmov
instruction and i386 unconditionally used a conditional
branch over a mov instruction. In the current patch I
chose to select the version based on the availability
of the cmov instruction instead. A small detail here is
that x86_64 did not previously set CONFIG_X86_CMOV=y.
Improved comments for ffs, ffz, fls and variations.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.
On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:
In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
On x86_64, 52 locations are optimized with a minimal increase in
code size:
Current #testing defconfig:
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392637 846592 724424 6963653 6a41c5 vmlinux
After removing the x86_64 specific optimization for find_next_*bit:
94 x bsf, 79 x find_next_*bit
text data bss dec hex filename
5392358 846592 724424 6963374 6a40ae vmlinux
After this patch (making the optimization generic):
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392396 846592 724424 6963412 6a40d4 vmlinux
[ tglx@linutronix.de: build fixes ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.
Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.
Athlon Xeon Opteron 32/64bit
x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s
generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s
If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.
Using the generic version does give a slightly bigger kernel, though.
defconfig: text data bss dec hex filename
x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit)
generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit)
x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit)
generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit)
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes section mismatch warnings in smpboot_setup_io_apic().
WARNING: arch/x86/kernel/built-in.o(.text+0x11781): Section mismatch in reference from the function smpboot_setup_io_apic()
to the function .init.text:setup_IO_APIC()
The function smpboot_setup_io_apic() references
the function __init setup_IO_APIC().
This is often because smpboot_setup_io_apic lacks a __init
annotation or the annotation of setup_IO_APIC is wrong.
Signed-off-by: Jacek Luczak <luczak.jacek@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>