Pull x86 asm updates from Ingo Molnar:
- Introduce the ORC unwinder, which can be enabled via
CONFIG_ORC_UNWINDER=y.
The ORC unwinder is a lightweight, Linux kernel specific debuginfo
implementation, which aims to be DWARF done right for unwinding.
Objtool is used to generate the ORC unwinder tables during build, so
the data format is flexible and kernel internal: there's no
dependency on debuginfo created by an external toolchain.
The ORC unwinder is almost two orders of magnitude faster than the
(out of tree) DWARF unwinder - which is important for perf call graph
profiling. It is also significantly simpler and is coded defensively:
there has not been a single ORC related kernel crash so far, even
with early versions. (knock on wood!)
But the main advantage is that enabling the ORC unwinder allows
CONFIG_FRAME_POINTERS to be turned off - which speeds up the kernel
measurably:
With frame pointers disabled, GCC does not have to add frame pointer
instrumentation code to every function in the kernel. The kernel's
.text size decreases by about 3.2%, resulting in better cache
utilization and fewer instructions executed, resulting in a broad
kernel-wide speedup. Average speedup of system calls should be
roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
a speedup of 5-10% for some function execution intense workloads.
The main cost of the unwinder is that the unwinder data has to be
stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
config - which is a modest cost on modern x86 systems.
Given how young the ORC unwinder code is it's not enabled by default
- but given the performance advantages the plan is to eventually make
it the default unwinder on x86.
See Documentation/x86/orc-unwinder.txt for more details.
- Remove lguest support: its intended role was that of a temporary
proof of concept for virtualization, plus its removal will enable the
reduction (removal) of the paravirt API as well, so Rusty agreed to
its removal. (Juergen Gross)
- Clean up and fix FSGS related functionality (Andy Lutomirski)
- Clean up IO access APIs (Andy Shevchenko)
- Enhance the symbol namespace (Jiri Slaby)
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
objtool: Handle GCC stack pointer adjustment bug
x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
x86/fpu/math-emu: Add ENDPROC to functions
x86/boot/64: Extract efi_pe_entry() from startup_64()
x86/boot/32: Extract efi_pe_entry() from startup_32()
x86/lguest: Remove lguest support
x86/paravirt/xen: Remove xen_patch()
objtool: Fix objtool fallthrough detection with function padding
x86/xen/64: Fix the reported SS and CS in SYSCALL
objtool: Track DRAP separately from callee-saved registers
objtool: Fix validate_branch() return codes
x86: Clarify/fix no-op barriers for text_poke_bp()
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
selftests/x86/fsgsbase: Test selectors 1, 2, and 3
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
x86/asm/32: Fix regs_get_register() on segment registers
x86/xen/64: Rearrange the SYSCALL entries
x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
...
Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
It plugs into the existing x86 unwinder framework.
It relies on objtool to generate the needed .orc_unwind and
.orc_unwind_ip sections.
For more details on why ORC is used instead of DWARF, see
Documentation/x86/orc-unwinder.txt - but the short version is
that it's a simplified, fundamentally more robust debugninfo
data structure, which also allows up to two orders of magnitude
faster lookups than the DWARF unwinder - which matters to
profiling workloads like perf.
Thanks to Andy Lutomirski for the performance improvement ideas:
splitting the ORC unwind table into two parallel arrays and creating a
fast lookup table to search a subset of the unwind table.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
[ Extended the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Objtool tries to silence 'unreachable instruction' warnings when it
detects gcov is enabled, because gcov produces a lot of unreachable
instructions and they don't really matter.
However, the 0-day bot is still reporting some unreachable instruction
warnings with CONFIG_GCOV_KERNEL=y on GCC 4.6.4.
As it turns out, objtool's gcov detection doesn't work with older
versions of GCC because they don't create a bunch of symbols with the
'gcov.' prefix like newer versions of GCC do.
Move the gcov check out of objtool and instead just create a new
'--no-unreachable' flag which can be passed in by the kernel Makefile
when CONFIG_GCOV_KERNEL is defined.
Also rename the 'nofp' variable to 'no_fp' for consistency with the new
'no_unreachable' variable.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 9cfffb1168 ("objtool: Skip all "unreachable instruction" warnings for gcov kernels")
Link: http://lkml.kernel.org/r/c243dc78eb2ffdabb6e927844dea39b6033cd395.1500939244.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The P option makes ar do full path name matching and can prevent ar
from discarding files with duplicate names in some cases of creating
thin archives from thin archives. The sh architecture in particular
loses some object files from its kernel/cpu/sh*/ directories without
this option.
This could be a bug in binutils ar, but the P option should not cause
any negative effects so it is safe to use to work around this with.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
It is currently impossible to see what is going on with objtool when
building, so call echo-cmd to see the actions:
gcc -Wp,-MD,arch/x86/entry/.entry_64.o.d -nostdinc -isystem ...
./tools/objtool/objtool check "arch/x86/entry/entry_64.o";
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Michal Marek <mmarek@suse.com>
Cc: linux-kbuild@vger.kernel.org
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
This add the kbuild infrastructure that will allow architectures to emit
vmlinux symbol CRCs as 32-bit offsets to another location in the kernel
where the actual value is stored. This works around problems with CRCs
being mistaken for relocatable symbols on kernels that self relocate at
runtime (i.e., powerpc with CONFIG_RELOCATABLE=y)
For the kbuild side of things, this comes down to the following:
- introducing a Kconfig symbol MODULE_REL_CRCS
- adding a -R switch to genksyms to instruct it to emit the CRC symbols
as references into the .rodata section
- making modpost distinguish such references from absolute CRC symbols
by the section index (SHN_ABS)
- making kallsyms disregard non-absolute symbols with a __crc_ prefix
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kbuild updates from Michal Marek:
- prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
about missing CRCs (Nick Piggin)
- asm-exports fix for LTO (Nicolas Pitre)
- thin archives improvements (Nick Piggin)
- linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
Piggin)
- genksyms support for __builtin_va_list keyword
- misc minor fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
x86/kbuild: enable modversions for symbols exported from asm
kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
scripts/kallsyms: remove last remnants of --page-offset option
make use of make variable CURDIR instead of calling pwd
kbuild: cmd_export_list: tighten the sed script
kbuild: minor improvement for thin archives build
kbuild: modpost warn if export version crc is missing
kbuild: keep data tables through dead code elimination
kbuild: improve linker compatibility with lib-ksyms.o build
genksyms: Regenerate parser
kbuild/genksyms: handle va_list type
kbuild: thin archives for multi-y targets
kbuild: kallsyms allow 3-pass generation if symbols size has changed
When LTO is used, some ___ksymtab_string sections are seen by this sed
script, creating lines containing a single ) such as:
EXPORT(foo)
)
)
EXPORT(bar)
Let's make it so the + character is also required for any line to be
printed.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Michal Marek <mmarek@suse.com>
lib-ksyms.o is created by linking an empty input file with a linker
script containing the interesting bits. Currently the empty input file
is an archive containing nothing, however this causes the gold linker
to segfault.
I have opened a bug against gold
https://sourceware.org/bugzilla/show_bug.cgi?id=20767
However this can be worked around by assembling an empty file to link
with instead. The resulting lib-ksyms.o is slightly larger (seemingly
due to empty .text, .data, .bss setions added), but final linked
output should not be changed.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
THIN_ARCHIVES builds archives for built-in.o targets, have it build
multi-y targets as archives as well.
This saves another ~15% of the size of intermediate artifacts in the
build tree. After this patch, the linker is only used in final link,
and special cases like vdsos.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
The CRC code for asm exports grabs the preprocessed asm, finds the
___EXPORT_SYMBOL and turns those into EXPORT_SYMBOL in a C program
that can be preprocessed and parsed to create the CRC signatures from
the type.
The existing regex matching and replacement is too strict, and doesn't
deal well with whitespace among other things. The line
" EXPORT_SYMBOL(sym)" in a .S file would not match due to initial
whitespace, for example, which resulted in x86's ___preempt_schedule
failing to get CRCs.
Reported-by: Philip Müller <philm@manjaro.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Allow architectures to create asm/asm-prototypes.h file that
provides C prototypes for exported asm functions, which enables
proper CRC versions to be generated for them.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
ld -r is an incremental link used to create built-in.o files in build
subdirectories. It produces relocatable object files containing all
its input files, and these are are then pulled together and relocated
in the final link. Aside from the bloat, this constrains the final
link relocations, which has bitten large powerpc builds with
unresolvable relocations in the final link.
Alan Modra has recommended the kernel use thin archives for linking.
This is an alternative and means that the linker has more information
available to it when it links the kernel.
This patch enables a config option architectures can select, which
causes all built-in.o files to be built as thin archives. built-in.o
files in subdirectories do not get symbol table or index attached,
which improves speed and size. The final link pass creates a
built-in.o archive in the root output directory which includes the
symbol table and index. The linker then uses takes this file to link.
The --whole-archive linker option is required, because the linker now
has visibility to every individual object file, and it will otherwise
just completely avoid including those without external references
(consider a file with EXPORT_SYMBOL or initcall or hardware exceptions
as its only entry points). The traditional built works "by luck" as
built-in.o files are large enough that they're going to get external
references. However this optimisation is unpredictable for the kernel
(due to above external references), ineffective at culling unused, and
costly because the .o files have to be searched for references.
Superior alternatives for link-time culling should be used instead.
Build characteristics for inclink vs thinarc, on a small powerpc64le
pseries VM with a modest .config:
inclink thinarc
sizes
vmlinux 15 618 680 15 625 028
sum of all built-in.o 56 091 808 1 054 334
sum excluding root built-in.o 151 430
find -name built-in.o | xargs rm ; time make vmlinux
real 22.772s 21.143s
user 13.280s 13.430s
sys 4.310s 2.750s
- Final kernel pulled in only about 6K more, which shows how
ineffective the object file culling is.
- Build performance looks improved due to less pagecache activity.
On IO constrained systems it could be a bigger win.
- Build size saving is significant.
Side note, the toochain understands archives, so there's some tricks,
$ ar t built-in.o # list all files you linked with
$ size built-in.o # and their sizes
$ objdump -d built-in.o # disassembly (unrelocated) with filenames
Implementation by sfr, minor tweaks by npiggin.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Collect the symbols exported by anything that goes into lib.a and
add an empty object (lib-exports.o) with explicit undefs for each
of those to obj-y.
That allows to relax the rules regarding the use of exports in
lib-* objects - right now an object with export can be in lib-*
only if we are guaranteed that there always will be users in
built-in parts of the tree, otherwise it needs to be in obj-*.
As the result, we have an unholy mix of lib- and obj- in lib/Makefile
and (especially) in arch/*/lib/Makefile. Moreover, a change in
generic part of the kernel can lead to mysteriously missing exports
on some configs. With this change we don't have to worry about
that anymore.
One side effect is that built-in.o now pulls everything with exports
from the corresponding lib.a (if such exists). That's exactly what
we want for linking vmlinux and fortunately it's almost the only thing
built-in.o is used in. arch/ia64/hp/sim/boot/bootloader is the only
exception and it's easy to get rid of now - just turn everything in
arch/ia64/lib into lib-* and don't bother with arch/ia64/lib/built-in.o
anymore.
[AV: stylistic fix from Michal folded in]
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Infrastructure for building independent shared library targets.
Based on work created by the PaX Team.
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Marek <mmarek@suse.com>
This command just preprocesses .S files into .s files, so cmd_cpp_s_S
seems more suitable.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
This command just preprocesses .c files into .i files, so cmd_cpp_i_c
seems more suitable.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
The generation and postprocessing of automatic dependency rules is
duplicated in rule_cc_o_c, rule_as_o_S and if_changed_dep. Since
this is not a trivial one-liner action, it is now abstracted under
cmd_and_fixdep to simplify things and make future changes in this area
easier.
In the rule_cc_o_c and rule_as_o_S cases that means the order of some
commands has been altered, namely fixdep and related file manipulations
are executed earlier, but they didn't depend on those commands that now
execute later.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Kernel modules are partially linked object files with some undefined
symbols that are expected to be matched with EXPORT_SYMBOL() entries
from elsewhere.
Each .tmp_versions/*.mod file currently contains two line of text
separated by a newline character. The first line has the actual module
file name while the second line has a list of object files constituting
that module. Those files are parsed by modpost (scripts/mod/sumversion.c),
scripts/Makefile.modpost, scripts/Makefile.modsign, etc. Only the
modpost utility cares about the second line while the others retrieve
only the first line.
Therefore we can add a third line to record the list of undefined symbols
aka required EXPORT_SYMBOL() entries for each module into that file
without breaking anything. Like for the second line, symbols are separated
by a blank and the list is terminated with a newline character.
To avoid needless build overhead, the undefined symbols extraction is
performed only when CONFIG_TRIM_UNUSED_KSYMS is selected.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>