Pull x86/trace changes from Ingo Molnar:
"This adds page fault tracepoints which have zero runtime cost in the
disabled case via IDT trickery (no NOPs in the page fault hotpath)"
* 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, trace: Change user|kernel_page_fault to page_fault_user|kernel
x86, trace: Add page fault tracepoints
x86, trace: Delete __trace_alloc_intr_gate()
x86, trace: Register exception handler to trace IDT
x86, trace: Remove __alloc_intr_gate()
Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
be moved out into ARCH specific thread_struct, reducing the size of
task_struct for other arches.
Compile tested i386_defconfig + gcc 4.7.3
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch registers exception handlers for tracing to a trace IDT.
To implemented it in set_intr_gate(), this patch does followings.
- Register the exception handlers to
the trace IDT by prepending "trace_" to the handler's names.
- Also, newly introduce trace_page_fault() to add tracepoints
in a subsequent patch.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/52716DEC.5050204@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Rewrite the preempt_count macros in order to extract the 3 basic
preempt_count value modifiers:
__preempt_count_add()
__preempt_count_sub()
and the new:
__preempt_count_dec_and_test()
And since we're at it anyway, replace the unconventional
$op_preempt_count names with the more conventional preempt_count_$op.
Since these basic operators are equivalent to the previous _notrace()
variants, do away with the _notrace() versions.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In fd4363fff3 ("x86: Introduce int3 (breakpoint)-based
instruction patching"), the mechanism that was introduced for
notifying alternatives code from int3 exception handler that and
exception occured was die_notifier.
This is however problematic, as early code might be using jump
labels even before the notifier registration has been performed,
which will then lead to an oops due to unhandled exception. One
of such occurences has been encountered by Fengguang:
int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
Stack:
ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
Call Trace:
<IRQ>
[<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
[<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
[<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
[<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
[<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
<EOI>
[<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
[<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
[<ffffffff81015f10>] default_idle+0x147/0x282
[<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
[<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
[<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
[<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
[...]
Fix this by directly calling poke_int3_handler() from the int3
exception handler (analogically to what ftrace has been doing
already), instead of relying on notifier, registration of which
might not have yet been finalized by the time of the first trap.
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since the IDT is referenced from a fixmap, make sure it is page aligned.
Merge with 32-bit one, since it was already aligned to deal with F00F
bug. Since bss is cleared before IDT setup, it can live there. This also
moves the other *_idt_table variables into common locations.
This avoids the risk of the IDT ever being moved in the bss and having
the mapping be offset, resulting in calling incorrect handlers. In the
current upstream kernel this is not a manifested bug, but heavily patched
kernels (such as those using the PaX patch series) did encounter this bug.
The tables other than idt_table technically do not need to be page
aligned, at least not at the current time, but using a common
declaration avoids mistakes. On 64 bits the table is exactly one page
long, anyway.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net
Reported-by: PaX Team <pageexec@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull x86 tracing updates from Ingo Molnar:
"This tree adds IRQ vector tracepoints that are named after the handler
and which output the vector #, based on a zero-overhead approach that
relies on changing the IDT entries, by Seiji Aguchi.
The new tracepoints look like this:
# perf list | grep -i irq_vector
irq_vectors:local_timer_entry [Tracepoint event]
irq_vectors:local_timer_exit [Tracepoint event]
irq_vectors:reschedule_entry [Tracepoint event]
irq_vectors:reschedule_exit [Tracepoint event]
irq_vectors:spurious_apic_entry [Tracepoint event]
irq_vectors:spurious_apic_exit [Tracepoint event]
irq_vectors:error_apic_entry [Tracepoint event]
irq_vectors:error_apic_exit [Tracepoint event]
[...]"
* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tracing: Add config option checking to the definitions of mce handlers
trace,x86: Do not call local_irq_save() in load_current_idt()
trace,x86: Move creation of irq tracepoints from apic.c to irq.c
x86, trace: Add irq vector tracepoints
x86: Rename variables for debugging
x86, trace: Introduce entering/exiting_irq()
tracing: Add DEFINE_EVENT_FN() macro
Pull x86 debug update from Ingo Molnar:
"Misc debuggability improvements:
- Optimize the x86 CPU register printout a bit
- Expose the tboot TXT log via debugfs
- Small do_debug() cleanup"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tboot: Provide debugfs interfaces to access TXT log
x86: Remove weird PTR_ERR() in do_debug
x86/debug: Only print out DR registers if they are not power-on defaults
It is sometimes very helpful to be able to pinpoint the location which
causes a double fault before it turns into a triple fault and the
machine reboots. We have this for 32-bit already so extend it to 64-bit.
On 64-bit we get the register snapshot at #DF time and not from the
first exception which actually causes the #DF. It should be close
enough, though.
[ hpa: and definitely better than nothing, which is what we have now. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1368093749-31296-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull perparatory x86 kasrl changes from Ingo Molnar:
"This contains changes from the ongoing KASLR work, by Kees Cook.
The main changes are the use of a read-only IDT on x86 (which
decouples the userspace visible virtual IDT address from the physical
address), and a rework of ELF relocation support, in preparation of
random, boot-time kernel image relocation."
* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF
x86, relocs: Build separate 32/64-bit tools
x86, relocs: Add 64-bit ELF support to relocs tool
x86, relocs: Consolidate processing logic
x86, relocs: Generalize ELF structure names
x86: Use a read-only IDT alias on all CPUs
Make a copy of the IDT (as seen via the "sidt" instruction) read-only.
This primarily removes the IDT from being a target for arbitrary memory
write attacks, and has the added benefit of also not leaking the kernel
base offset, if it has been relocated.
We already did this on vendor == Intel and family == 5 because of the
F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or
not. Since the workaround was so cheap, there simply was no reason to
be very specific. This patch extends the readonly alias to all CPUs,
but does not activate the #PF to #UD conversion code needed to deliver
the proper exception in the F0 0F case except on Intel family 5
processors.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net
Cc: Eric Northup <digitaleric@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
On exception exit, we restore the previous context tracking state based on
the regs of the interrupted frame. Iff that frame is in user mode as
stated by user_mode() helper, we restore the context tracking user mode.
However there is a tiny chunck of low level arch code after we pass through
user_enter() and until the CPU eventually resumes userspace.
If an exception happens in this tiny area, exception_enter() correctly
exits the context tracking user mode but exception_exit() won't restore
it because of the value returned by user_mode(regs).
As a result we may return to userspace with the wrong context tracking
state.
To fix this, change exception_enter() to return the context tracking state
prior to its call and pass this saved state to exception_exit(). This restores
the real context tracking state of the interrupted frame.
(May be this patch was suggested to me, I don't recall exactly. If so,
sorry for the missing credit).
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Mats Liljegren <mats.liljegren@enea.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all
64-bit code has to use page tables. This makes it awkward before we
have first set up properly all-covering page tables to access objects
that are outside the static kernel range.
So far we have dealt with that simply by mapping a fixed amount of
low memory, but that fails in at least two upcoming use cases:
1. We will support load and run kernel, struct boot_params, ramdisk,
command line, etc. above the 4 GiB mark.
2. need to access ramdisk early to get microcode to update that as
early possible.
We could use early_iomap to access them too, but it will make code to
messy and hard to be unified with 32 bit.
Hence, set up a #PF table and use a fixed number of buffers to set up
page tables on demand. If the buffers fill up then we simply flush
them and start over. These buffers are all in __initdata, so it does
not increase RAM usage at runtime.
Thus, with the help of the #PF handler, we can set the final kernel
mapping from blank, and switch to init_level4_pgt later.
During the switchover in head_64.S, before #PF handler is available,
we use three pages to handle kernel crossing 1G, 512G boundaries with
sharing page by playing games with page aliasing: the same page is
mapped twice in the higher-level tables with appropriate wraparound.
The kernel region itself will be properly mapped; other mappings may
be spurious.
early_make_pgtable is using kernel high mapping address to access pages
to set page table.
-v4: Add phys_base offset to make kexec happy, and add
init_mapping_kernel() - Yinghai
-v5: fix compiling with xen, and add back ident level3 and level2 for xen
also move back init_level4_pgt from BSS to DATA again.
because we have to clear it anyway. - Yinghai
-v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai
-v7: remove not needed clear_page for init_level4_page
it is with fill 512,8,0 already in head_64.S - Yinghai
-v8: we need to keep that handler alive until init_mem_mapping and don't
let early_trap_init to trash that early #PF handler.
So split early_trap_pf_init out and move it down. - Yinghai
-v9: switchover only cover kernel space instead of 1G so could avoid
touch possible mem holes. - Yinghai
-v11: change far jmp back to far return to initial_code, that is needed
to fix failure that is reported by Konrad on AMD systems. - Yinghai
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pull one final 386 removal patch from Peter Anvin.
IRQ 13 FPU error handling is gone. That was not one of the proudest
moments in PC history.
* 'x86/nuke386' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, 386 removal: Remove support for IRQ 13 FPU error reporting
Remove support for FPU error reporting via IRQ 13, as opposed to
exception 16 (#MF). One last remnant of i386 gone.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Create a new subsystem that probes on kernel boundaries
to keep track of the transitions between level contexts
with two basic initial contexts: user or kernel.
This is an abstraction of some RCU code that use such tracking
to implement its userspace extended quiescent state.
We need to pull this up from RCU into this new level of indirection
because this tracking is also going to be used to implement an "on
demand" generic virtual cputime accounting. A necessary step to
shutdown the tick while still accounting the cputime.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: fix whitespace error and email address. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull x86/fpu update from Ingo Molnar:
"The biggest change is the addition of the non-lazy (eager) FPU saving
support model and enabling it on CPUs with optimized xsaveopt/xrstor
FPU state saving instructions.
There are also various Sparse fixes"
Fix up trivial add-add conflict in arch/x86/kernel/traps.c
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, kvm: fix kvm's usage of kernel_fpu_begin/end()
x86, fpu: remove cpu_has_xmm check in the fx_finit()
x86, fpu: make eagerfpu= boot param tri-state
x86, fpu: enable eagerfpu by default for xsaveopt
x86, fpu: decouple non-lazy/eager fpu restore from xsave
x86, fpu: use non-lazy fpu restore for processors supporting xsave
lguest, x86: handle guest TS bit for lazy/non-lazy fpu host models
x86, fpu: always use kernel_fpu_begin/end() for in-kernel FPU usage
x86, kvm: use kernel_fpu_begin/end() in kvm_load/put_guest_fpu()
x86, fpu: remove unnecessary user_fpu_end() in save_xstate_sig()
x86, fpu: drop_fpu() before restoring new state from sigframe
x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels
x86, fpu: Consolidate inline asm routines for saving/restoring fpu state
x86, signal: Cleanup ifdefs and is_ia32, is_x32
Pull x86/asm changes from Ingo Molnar:
"The one change that stands out is the alternatives patching change
that prevents us from ever patching back instructions from SMP to UP:
this simplifies things and speeds up CPU hotplug.
Other than that it's smaller fixes, cleanups and improvements."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Unspaghettize do_trap()
x86_64: Work around old GAS bug
x86: Use REP BSF unconditionally
x86: Prefer TZCNT over BFS
x86/64: Adjust types of temporaries used by ffs()/fls()/fls64()
x86: Drop unnecessary kernel_eflags variable on 64-bit
x86/smp: Don't ever patch back to UP if we unplug cpus
There is some unnatural label based layout in this function.
Convert the unnecessary goto to readable conditional blocks.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>