Pull x86/xsave changes from Peter Anvin:
"This is a patchset to support the XSAVES instruction required to
support context switch of supervisor-only features in upcoming
silicon.
This patchset missed the 3.16 merge window, which is why it is based
on 3.15-rc7"
* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, xsave: Add forgotten inline annotation
x86/xsaves: Clean up code in xstate offsets computation in xsave area
x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
Define kernel API to get address of each state in xsave area
x86/xsaves: Enable xsaves/xrstors
x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
x86/xsaves: Add xsaves and xrstors support for booting time
x86/xsaves: Clear reserved bits in xsave header
x86/xsaves: Use xsave/xrstor for saving and restoring user space context
x86/xsaves: Use xsaves/xrstors for context switch
x86/xsaves: Use xsaves/xrstors to save and restore xsave area
x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
x86/xsaves: Define macros for xsave instructions
x86/xsaves: Change compacted format xsave area header
x86/alternative: Add alternative_input_2 to support alternative with two features and input
x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
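The core of the series is a set of inline-asm wrappers for the new instructions. A minimal sketch of the idea, under the (%rdi) convention noted in the titles above - simplified, with no alternatives patching and no fault handling, and with a hypothetical function name:

  #include <linux/types.h>

  /*
   * Sketch only: XSAVES is emitted as raw bytes because older
   * assemblers lack the mnemonic.  The component mask goes in
   * edx:eax and the save buffer pointer in %rdi.
   */
  static inline void xsaves_sketch(void *buf, u64 mask)
  {
          u32 lmask = mask;
          u32 hmask = mask >> 32;

          asm volatile(".byte 0x48, 0x0f, 0xc7, 0x2f" /* xsaves64 (%rdi) */
                       : : "D" (buf), "a" (lmask), "d" (hmask)
                       : "memory");
  }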
Pull x86 mm changes from Ingo Molnar:
"The main change in this cycle is the rework of the TLB range flushing
code, to simplify, fix and consolidate the code. By Dave Hansen"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Set TLB flush tunable to sane value (33)
x86/mm: New tunable for single vs full TLB flush
x86/mm: Add tracepoints for TLB flushes
x86/mm: Unify remote INVLPG code
x86/mm: Fix missed global TLB flush stat
x86/mm: Rip out complicated, out-of-date, buggy TLB flushing
x86/mm: Clean up the TLB flushing code
x86/smep: Be more informative when signalling an SMEP fault
I think the flush_tlb_mm_range() code that tries to tune the
flush sizes based on the CPU needs to get ripped out for
several reasons:
1. It is obviously buggy. It uses mm->total_vm to judge the
task's footprint in the TLB. It should certainly be using
some measure of RSS, *NOT* ->total_vm since only resident
memory can populate the TLB.
2. Haswell and several other CPUs are missing from the
intel_tlb_flushall_shift_set() function. Thus, it has been
demonstrated to bitrot quickly in practice.
3. It is plain wrong in my vm:
[ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.037444] tlb_flushall_shift: 6
Which leads it to never use invlpg.
4. The assumptions about TLB refill costs are wrong:
http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com
(more on this in later patches)
5. I cannot reproduce the original data: https://lkml.org/lkml/2012/5/17/59
I believe the sample times were too short. Running the
benchmark in a loop yields times that vary quite a bit.
Note that this leaves us with a static ceiling of 1 page. This
is a conservative, dumb setting, and will be revised in a later
patch.
This also removes the code which attempts to predict whether we
are flushing data or instructions. We expect instruction flushes
to be relatively rare and not worth tuning for explicitly.
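The policy that remains after this patch is easy to sketch (names
follow arch/x86/mm/tlb.c, but this is a simplification of
flush_tlb_mm_range(), which also handles the mm-wide and global
cases):

  static unsigned long tlb_single_page_flush_ceiling = 1; /* revised later */

  static void flush_range_sketch(unsigned long start, unsigned long end)
  {
          unsigned long nr_pages = (end - start) >> PAGE_SHIFT;

          if (nr_pages > tlb_single_page_flush_ceiling) {
                  local_flush_tlb();              /* flush everything */
          } else {
                  unsigned long addr;

                  for (addr = start; addr < end; addr += PAGE_SIZE)
                          __flush_tlb_single(addr); /* one invlpg per page */
          }
  }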
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The xsave area header is changed to support both the compacted and the
standard format of the xsave area.
The xsave header of an xsave area comprises the 64 bytes starting at
offset 512 from the area's base address (see the struct sketch after
the list below):
- Bytes 7:0 of the xsave header form a state-component bitmap called
xstate_bv. It identifies the state components in the xsave area.
- Bytes 15:8 of the xsave header form a state-component bitmap called
xcomp_bv. It is used as follows:
- xcomp_bv[63] indicates the format of the extended region of
the xsave area. If it is clear, the standard format is used.
If it is set, the compacted format is used.
- xcomp_bv[62:0] indicate which features (starting at feature 2)
have space allocated for them in the compacted format.
- Bytes 63:16 of the xsave header are reserved.
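As a C struct, the layout described above looks roughly like this
(field names follow the patch; the struct name here is illustrative):

  struct xsave_header_sketch {
          u64 xstate_bv;          /* bytes 7:0   - state components present */
          u64 xcomp_bv;           /* bytes 15:8  - bit 63: compacted format;
                                   * bits 62:0: space allocated per feature */
          u64 reserved[6];        /* bytes 63:16 - must be zero */
  } __attribute__((packed));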
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-6-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.
x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception
handler. Having two different methods requires #ifdefs, and the x86_32
way is a bit of a pain to maintain. By converting x86_32 to the same
method as x86_64, we can remove the #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.
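For reference, this is roughly how the x86_64 lookup works - and what
x86_32 now adopts (a sketch modeled on the x86_64 current_thread_info()
of this era):

  static inline struct thread_info *current_thread_info_sketch(void)
  {
          /* kernel_stack always tracks current's thread stack, so this
           * works even while running on an irq or exception stack. */
          return (struct thread_info *)
                  (this_cpu_read_stable(kernel_stack) +
                   KERNEL_STACK_OFFSET - THREAD_SIZE);
  }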
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull x86 cpufeature and mpx updates from Peter Anvin:
"This includes the basic infrastructure for MPX (Memory Protection
Extensions) support, but does not include MPX support itself. It is,
however, a prerequisite for KVM support for MPX, which I believe will
be pushed later this merge window by the KVM team.
This includes moving the functionality in
futex_atomic_cmpxchg_inatomic() into a new function in uaccess.h so it
can be reused - this will be used by the final MPX patches.
The actual MPX functionality (map management and so on) will be pushed
in a future merge window, when ready"
* 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel/mpx: Remove unused LWP structure
x86, mpx: Add MPX related opcodes to the x86 opcode map
x86: replace futex_atomic_cmpxchg_inatomic() with user_atomic_cmpxchg_inatomic
x86: add user_atomic_cmpxchg_inatomic at uaccess.h
x86, xsave: Support eager-only xsave features, add MPX support
x86, cpufeature: Define the Intel MPX feature flag
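A hedged usage sketch of the new uaccess.h helper (the wrapper, its
variable names, and the retry policy are illustrative; the helper
stores the prior user-memory value through its first argument and
returns 0 or -EFAULT):

  static int cmpxchg_user_sketch(u32 __user *uaddr, u32 oldval, u32 newval)
  {
          u32 curval;
          int ret;

          pagefault_disable();
          ret = user_atomic_cmpxchg_inatomic(&curval, uaddr, oldval, newval);
          pagefault_enable();

          if (ret)
                  return ret;     /* -EFAULT: user page not accessible */
          if (curval != oldval)
                  return -EAGAIN; /* lost a race; caller may retry */
          return 0;               /* exchange succeeded */
  }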
Pull x86 TLB detection update from Ingo Molnar:
"A single change that extends our TLB cache size detection+reporting
code"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cpu: Detect more TLB configuration
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cpu, amd: Fix a shadowed variable situation
um, x86: Fix vDSO build
x86: Delete non-required instances of include <linux/init.h>
x86, realmode: Pointer walk cleanups, pull out invariant use of __pa()
x86/traps: Clean up error exception handler definitions
Only a couple of arches (sh, x86) use fpu_counter in task_struct, so it
can be moved out into the arch-specific thread_struct, reducing the
size of task_struct for the other arches.
Compile-tested with i386_defconfig + gcc 4.7.3.
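Sketch of the resulting layout on x86 (surrounding fields omitted; the
struct name is abbreviated here):

  struct thread_struct_sketch {
          /* ... existing arch-specific state ... */
          unsigned char fpu_counter;      /* consecutive context switches
                                           * with the FPU in use; moved
                                           * here from task_struct */
  };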
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 paravirt changes from Ingo Molnar:
"Hypervisor signature detection cleanup and fixes - the goal is to make
KVM guests run better on MS/Hyperv and to generalize and factor out
the code a bit"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Correctly detect hypervisor
x86, kvm: Switch to use hypervisor_cpuid_base()
xen: Switch to use hypervisor_cpuid_base()
x86: Introduce hypervisor_cpuid_base()
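The new helper amounts to a signature scan over the hypervisor CPUID
leaf range; roughly (following the series, using the kernel's cpuid()
helper):

  static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
  {
          uint32_t base, eax, signature[3];

          for (base = 0x40000000; base < 0x40010000; base += 0x100) {
                  cpuid(base, &eax, &signature[0], &signature[1], &signature[2]);

                  /* match the 12-byte vendor signature and require that
                   * enough leaves exist past the base */
                  if (!memcmp(sig, signature, 12) &&
                      (leaves == 0 || ((eax - base) >= leaves)))
                          return base;
          }

          return 0;
  }

KVM and Xen detection then reduce to calls like
hypervisor_cpuid_base("KVMKVMKVM\0\0\0", 0), which is the factoring-out
the changelog refers to.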
Pull x86/asmlinkage changes from Ingo Molnar:
"As a preparation for Andi Kleen's LTO patchset (link time
optimizations using GCC's -flto which build time optimization has
steadily increased in quality over the past few years and might
eventually be usable for the kernel too) this tree includes a handful
of preparatory patches that make function calling convention
annotations consistent again:
- Mark every function without arguments (or 64bit only) that is used
by assembly code with asmlinkage()
- Mark every function with parameters or variables that is used by
assembly code as __visible.
For the vanilla kernel this has documentation, consistency and
debuggability advantages, for the time being"
* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asmlinkage: Fix warning in xen asmlinkage change
x86, asmlinkage, vdso: Mark vdso variables __visible
x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
x86, asmlinkage: Make dump_stack visible
x86, asmlinkage: Make 64bit checksum functions visible
x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
x86, asmlinkage, apm: Make APM data structure used from assembler visible
x86, asmlinkage: Make syscall tables visible
x86, asmlinkage: Make several variables used from assembler/linker script visible
x86, asmlinkage: Make kprobes code visible and fix assembler code
x86, asmlinkage: Make various syscalls asmlinkage
x86, asmlinkage: Make 32bit/64bit __switch_to visible
x86, asmlinkage: Make _*_start_kernel visible
x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
x86, asmlinkage: Change dotraplinkage into __visible on 32bit
x86: Fix sys_call_table type in asm/syscall.h
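An illustrative before/after of the two rules (the names below are
hypothetical, not from the series):

  /* Rule 1: argument-less function called from assembly - pin the C
   * calling convention and keep the symbol around under LTO. */
  asmlinkage void example_syscall_helper(void);

  /* Rule 2: has parameters and is referenced from assembly - it must
   * stay visible even when LTO sees no C-level caller. */
  __visible void example_trap_handler(struct pt_regs *regs);

  /* Variables referenced from assembler or linker scripts get the
   * same treatment. */
  __visible unsigned long example_saved_context;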
The target frequency calculation method in the ondemand governor has
changed and it is now independent of the measured average frequency.
Consequently, the APERF/MPERF support in cpufreq is not used any
more, so drop it.
[rjw: Changelog]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. The fix in commit
5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good
example of the nasty type of bugs that can be created with improper
use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch-independent (kernel/cpu.c)
and are flagged as __cpuinit -- so if we remove the __cpuinit from the
arch-specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/x86 uses of the __cpuinit macros from
all C files. x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Pull x86 FPU changes from Ingo Molnar:
"There are two bigger changes in this tree:
- Add an [early-use-]safe static_cpu_has() variant and other
robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS
configurable debugging facility, motivated by recent obscure FPU
code bugs, by Borislav Petkov
- Reimplement FPU detection code in C and drop the old asm code, by
Peter Anvin."
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu: Use static_cpu_has_safe before alternatives
x86: Add a static_cpu_has_safe variant
x86: Sanity-check static_cpu_has usage
x86, cpu: Add a synthetic, always true, cpu feature
x86: Get rid of ->hard_math and all the FPU asm fu
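A hedged sketch of the early-boot problem the _safe variant solves:
plain static_cpu_has() is only correct after alternatives patching, so
code that can run before apply_alternatives() takes the dynamic
fallback instead (the function below is hypothetical):

  static void early_fpu_probe_sketch(void)
  {
          /* Safe even before apply_alternatives(): falls back to a
           * dynamic boot_cpu_data test until patching has run. */
          if (static_cpu_has_safe(X86_FEATURE_FPU)) {
                  /* hardware FPU present: normal FPU state handling */
          } else {
                  /* no FPU: soft-float / math-emulation paths */
          }
  }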
It is sometimes very helpful to be able to pinpoint the location which
causes a double fault before it turns into a triple fault and the
machine reboots. We have this for 32-bit already, so extend it to 64-bit.
On 64-bit we get the register snapshot at #DF time and not from the
first exception which actually causes the #DF. It should be close
enough, though.
[ hpa: and definitely better than nothing, which is what we have now. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1368093749-31296-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Convert AMD erratum 400 to the bug infrastructure. Then, retract all
exports for modules since they're not needed now and make the AMD
erratum checking machinery local to amd.c. Use forward declarations to
avoid shuffling too much code around needlessly.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-7-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>