There are three choices:
vsyscall=native: Vsyscalls are native code that issues the
corresponding syscalls.
vsyscall=emulate (default): Vsyscalls are emulated by instruction
fault traps, tested in the bad_area path. The actual contents of
the vsyscall page is the same as the vsyscall=native case except
that it's marked NX. This way programs that make assumptions about
what the code in the page does will not be confused when they read
that code.
vsyscall=none: Trying to execute a vsyscall will segfault.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Gold has trouble assigning numbers to the location counter inside of
an output section description. The bug was triggered by
9fd67b4ed0, which consolidated all of
the vsyscall sections into a single section. The workaround is IMO
still nicer than the old way of doing it.
This produces an apparently valid kernel image and passes my vdso
tests on both GNU ld version 2.21.51.0.6-2.fc15 20110118 and GNU
gold (version 2.21.51.0.6-2.fc15 20110118) 1.10 as distributed by
Fedora 15.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/0b260cb806f1f9a25c00ce8377a5f035d57f557a.1312378163.git.luto@mit.edu
Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: vdso: Remove unused variable
x86-64: Optimize vDSO time()
x86-64: Add time to vDSO
x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSO
x86-64: Move vread_tsc into a new file with sensible options
x86-64: Vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0
x86-64: Don't generate cmov in vread_tsc
x86-64: Remove unnecessary barrier in vread_tsc
x86-64: Clean up vdso/kernel shared variables
* 'for-2.6.40' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: Unify input section names
percpu: Avoid extra NOP in percpu_cmpxchg16b_double
percpu: Cast away printk format warning
percpu: Always align percpu output section to PAGE_SIZE
Fix up fairly trivial conflict in arch/x86/include/asm/percpu.h as per Tejun
Variables that are shared between the vdso and the kernel are
currently a bit of a mess. They are each defined with their own
magic, they are accessed differently in the kernel, the vsyscall page,
and the vdso, and one of them (vsyscall_clock) doesn't even really
exist.
This changes them all to use a common mechanism. All of them are
delcared in vvar.h with a fixed address (validated by the linker
script). In the kernel (as before), they look like ordinary
read-write variables. In the vsyscall page and the vdso, they are
accessed through a new macro VVAR, which gives read-only access.
The vdso is now loaded verbatim into memory without any fixups. As a
side bonus, access from the vdso is faster because a level of
indirection is removed.
While we're at it, pack jiffies and vgetcpu_mode into the same
cacheline.
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Percpu allocator honors alignment request upto PAGE_SIZE and both the
percpu addresses in the percpu address space and the translated kernel
addresses should be aligned accordingly. The calculation of the
former depends on the alignment of percpu output section in the kernel
image.
The linker script macros PERCPU_VADDR() and PERCPU() are used to
define this output section and the latter takes @align parameter.
Several architectures are using @align smaller than PAGE_SIZE breaking
percpu memory alignment.
This patch removes @align parameter from PERCPU(), renames it to
PERCPU_SECTION() and makes it always align to PAGE_SIZE. While at it,
add PCPU_SETUP_BUG_ON() checks such that alignment problems are
reliably detected and remove percpu alignment comment recently added
in workqueue.c as the condition would trigger BUG way before reaching
there.
For um, this patch raises the alignment of percpu area. As the area
is in .init, there shouldn't be any noticeable difference.
This problem was discovered by David Howells while debugging boot
failure on mn10300.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: uclinux-dist-devel@blackfin.uclinux.org
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: user-mode-linux-devel@lists.sourceforge.net
* 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix binutils-2.21 symbol related build failures
x86-64, trampoline: Remove unused variable
x86, reboot: Fix the use of passed arguments in 32-bit BIOS reboot
x86, reboot: Move the real-mode reboot code to an assembly file
x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly
x86, trampoline: Use the unified trampoline setup for ACPI wakeup
x86, trampoline: Common infrastructure for low memory trampolines
Fix up trivial conflicts in arch/x86/kernel/Makefile
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu, x86: Add arch-specific this_cpu_cmpxchg_double() support
percpu: Generic support for this_cpu_cmpxchg_double()
alpha: use L1_CACHE_BYTES for cacheline size in the linker script
percpu: align percpu readmostly subsection to cacheline
Fix up trivial conflict in arch/x86/kernel/vmlinux.lds.S due to the
percpu alignment having changed ("x86: Reduce back the alignment of the
per-CPU data section")
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, binutils, xen: Fix another wrong size directive
x86: Remove dead config option X86_CPU
x86: Really print supported CPUs if PROCESSOR_SELECT=y
x86: Fix a bogus unwind annotation in lib/semaphore_32.S
um, x86-64: Fix UML build after adding CFI annotations to lib/rwsem_64.S
x86: Remove unused bits from lib/thunk_*.S
x86: Use {push,pop}_cfi in more places
x86-64: Add CFI annotations to lib/rwsem_64.S
x86, asm: Cleanup unnecssary macros in asm-offsets.c
x86, system.h: Drop unused __SAVE/__RESTORE macros
x86: Use bitmap library functions
x86: Partly unify asm-offsets_{32,64}.c
x86: Reduce back the alignment of the per-CPU data section
Put x86 entry code into a separate link section: .entry.text.
Separating the entry text section seems to have performance
benefits - caused by more efficient instruction cache usage.
Running hackbench with perf stat --repeat showed that the change
compresses the icache footprint. The icache load miss rate went
down by about 15%:
before patch:
19417627 L1-icache-load-misses ( +- 0.147% )
after patch:
16490788 L1-icache-load-misses ( +- 0.180% )
The motivation of the patch was to fix a particular kprobes
bug that relates to the entry text section, the performance
advantage was discovered accidentally.
Whole perf output follows:
- results for current tip tree:
Performance counter stats for './hackbench/hackbench 10' (500 runs):
19417627 L1-icache-load-misses ( +- 0.147% )
2676914223 instructions # 0.497 IPC ( +- 0.079% )
5389516026 cycles ( +- 0.144% )
0.206267711 seconds time elapsed ( +- 0.138% )
- results for current tip tree with the patch applied:
Performance counter stats for './hackbench/hackbench 10' (500 runs):
16490788 L1-icache-load-misses ( +- 0.180% )
2717734941 instructions # 0.502 IPC ( +- 0.079% )
5414756975 cycles ( +- 0.148% )
0.206747566 seconds time elapsed ( +- 0.137% )
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Common infrastructure for low memory trampolines. This code installs
the trampolines permanently in low memory very early. It also permits
multiple pieces of code to be used for this purpose.
This code also introduces a standard infrastructure for computing
symbol addresses in the trampoline code.
The only change to the actual SMP trampolines themselves is that the
64-bit trampoline has been made reusable -- the previous version would
overwrite the code with a status variable; this moves the status
variable to a separate location.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4D5DFBE4.7090104@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthieu Castet <castet.matthieu@free.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Currently percpu readmostly subsection may share cachelines with other
percpu subsections which may result in unnecessary cacheline bounce
and performance degradation.
This patch adds @cacheline parameter to PERCPU() and PERCPU_VADDR()
linker macros, makes each arch linker scripts specify its cacheline
size and use it to align percpu subsections.
This is based on Shaohua's x86 only patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shaohua Li <shaohua.li@intel.com>
This reverts commit 86b1e8dd83 ("x86: Make relocatable kernel work with
new binutils").
Markus Trippelsdorf reported a boot failure caused by this patch.
The real solution to the original patch will likely involve an
arch-generic solution to define an overlaid jiffies_64 and jiffies
variables.
Until that's done and tested on all architectures revert this commit to
solve the regression.
Reported-and-bisected-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Lu, Hongjiu" <hongjiu.lu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <4D36A759.60704@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The CONFIG_RELOCATABLE=y option is broken with new binutils, which will make
boot panic.
According to Lu Hongjiu, the affected binutils are from 2.20.51.0.12 to
2.21.51.0.3, which are release since Oct 22 this year. At least ubuntu 10.10 is
using such binutils. See:
http://sourceware.org/bugzilla/show_bug.cgi?id=12327
The reason of the boot panic is that we have 'jiffies = jiffies_64;' in
vmlinux.lds.S. The jiffies isn't in any section. In kernel build, there is
warning saying jiffies is an absolute address and can't be relocatable. At
runtime, jiffies will have virtual address 0.
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Lu Hongjiu<hongjiu.lu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1295312269.1949.725.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch expands functionality of CONFIG_DEBUG_RODATA to set main
(static) kernel data area as NX.
The following steps are taken to achieve this:
1. Linker script is adjusted so .text always starts and ends on a page bound
2. Linker script is adjusted so .rodata always start and end on a page boundary
3. NX is set for all pages from _etext through _end in mark_rodata_ro.
4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
5. bios rom is set to x when pcibios is used.
The results of patch application may be observed in the diff of kernel page
table dumps:
pcibios:
-- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400
++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400
0x00000000-0xc0000000 3G pmd
---[ Kernel Mapping ]---
-0xc0000000-0xc0100000 1M RW GLB x pte
+0xc0000000-0xc00a0000 640K RW GLB NX pte
+0xc00a0000-0xc0100000 384K RW GLB x pte
-0xc0100000-0xc03d7000 2908K ro GLB x pte
+0xc0100000-0xc0318000 2144K ro GLB x pte
+0xc0318000-0xc03d7000 764K ro GLB NX pte
-0xc03d7000-0xc0600000 2212K RW GLB x pte
+0xc03d7000-0xc0600000 2212K RW GLB NX pte
0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd
0xf7a00000-0xf7bfe000 2040K RW GLB NX pte
0xf7bfe000-0xf7c00000 8K pte
No pcibios:
-- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400
++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400
0x00000000-0xc0000000 3G pmd
---[ Kernel Mapping ]---
-0xc0000000-0xc0100000 1M RW GLB x pte
+0xc0000000-0xc0100000 1M RW GLB NX pte
-0xc0100000-0xc03d7000 2908K ro GLB x pte
+0xc0100000-0xc0318000 2144K ro GLB x pte
+0xc0318000-0xc03d7000 764K ro GLB NX pte
-0xc03d7000-0xc0600000 2212K RW GLB x pte
+0xc03d7000-0xc0600000 2212K RW GLB NX pte
0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd
0xf7a00000-0xf7bfe000 2040K RW GLB NX pte
0xf7bfe000-0xf7c00000 8K pte
The patch has been originally developed for Linux 2.6.34-rc2 x86 by
Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
-v1: initial patch for 2.6.30
-v2: patch for 2.6.31-rc7
-v3: moved all code into arch/x86, adjusted credits
-v4: fixed ifdef, removed credits from CREDITS
-v5: fixed an address calculation bug in mark_nxdata_nx()
-v6: added acked-by and PT dump diff to commit log
-v7: minor adjustments for -tip
-v8: rework with the merge of "Set first MB as RW+NX"
Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: James Morris <jmorris@namei.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dave Jones <davej@redhat.com>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4CE2F82E.60601@free.fr>
[ minor cleanliness edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The irq stacks, located in the percpu-area, need to be
THREAD_SIZE aligned. Add the infrastucture to align percpu
variables to larger-than-pagesize amounts within the percpu
area, and use it to specify the alignment for the irq stacks.
Also align the percpu area itself to THREAD_SIZE.
This should make irq stacks work with 8K THREAD_SIZE.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Tejun Heo <tj@kernel.org>
Cc: hch@lst.de
LKML-Reference: <1283799222.15941.1393621887@webmail.messagingengine.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>