Normally objtool will now follow indirect calls; there is no need.
However, this becomes a problem with noinstr validation; if there's an
indirect call from noinstr code, we very much need to know it is to
another noinstr function. Luckily there aren't many indirect calls in
entry code with the obvious exception of paravirt. As such, noinstr
validation didn't work with paravirt kernels.
In order to track pv_ops[] call targets, objtool reads the static
pv_ops[] tables as well as direct assignments to the pv_ops[] array,
provided the compiler makes them a single instruction like:
bf87: 48 c7 05 00 00 00 00 00 00 00 00 movq $0x0,0x0(%rip)
bf92 <xen_init_spinlocks+0x5f>
bf8a: R_X86_64_PC32 pv_ops+0x268
There are, as of yet, no warnings for when this goes wrong :/
Using the functions found with the above means, all pv_ops[] calls are
now subject to noinstr validation.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095149.118815755@infradead.org
Andi reported that objtool on vmlinux.o consumes more memory than his
system has, leading to horrific performance.
This is in part because we keep a struct instruction for every
instruction in the file in-memory. Shrink struct instruction by
removing the CFI state (which includes full register state) from it
and demand allocating it.
Given most instructions don't actually change CFI state, there's lots
of repetition there, so add a hash table to find previous CFI
instances.
Reduces memory consumption (and runtime) for processing an
x86_64-allyesconfig:
pre: 4:40.84 real, 143.99 user, 44.18 sys, 30624988 mem
post: 2:14.61 real, 108.58 user, 25.04 sys, 16396184 mem
Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210624095147.756759107@infradead.org
Nathan reported that LLVM ThinLTO builds have a performance regression
with commit 25cf0d8aa2 ("objtool: Rewrite hashtable sizing"). Sami
was quick to note that this is due to their use of -ffunction-sections.
As a result the .text section is small and basing the number of relocs
off of that no longer works. Instead have read_sections() compute the
sum of all SHF_EXECINSTR sections and use that.
Fixes: 25cf0d8aa2 ("objtool: Rewrite hashtable sizing")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lkml.kernel.org/r/YMJpGLuGNsGtA5JJ@hirez.programming.kicks-ass.net
Teach objtool about the the low bits in the struct static_key pointer.
That is, the low two bits of @key in:
struct jump_entry {
s32 code;
s32 target;
long key;
}
as found in the __jump_table section. Since @key has a relocation to
the variable (to be resolved by the linker), the low two bits will be
reflected in the relocation's addend.
As such, find the reloc and store the addend, such that we can access
these bits.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506194158.028024143@infradead.org
Currently objtool has 5 hashtables and sizes them 16 or 20 bits
depending on the --vmlinux argument.
However, a single side doesn't really work well for the 5 tables,
which among them, cover 3 different uses. Also, while vmlinux is
larger, there is still a very wide difference between a defconfig and
allyesconfig build, which again isn't optimally covered by a single
size.
Another aspect is the cost of elf_hash_init(), which for large tables
dominates the runtime for small input files. It turns out that all it
does it assign NULL, something that is required when using malloc().
However, when we allocate memory using mmap(), we're guaranteed to get
zero filled pages.
Therefore, rewrite the whole thing to:
1) use more dynamic sized tables, depending on the input file,
2) avoid the need for elf_hash_init() entirely by using mmap().
This speeds up a regular kernel build (100s to 98s for
x86_64-defconfig), and potentially dramatically speeds up vmlinux
processing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506194157.452881700@infradead.org
Pull objtool updates from Ingo Molnar:
- Standardize the crypto asm code so that it looks like compiler-
generated code to objtool - so that it can understand it. This
enables unwinding from crypto asm code - and also fixes the last
known remaining objtool warnings for LTO and more.
- x86 decoder fixes: clean up and fix the decoder, and also extend it a
bit
- Misc fixes and cleanups
* tag 'objtool-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/crypto: Enable objtool in crypto code
x86/crypto/sha512-ssse3: Standardize stack alignment prologue
x86/crypto/sha512-avx2: Standardize stack alignment prologue
x86/crypto/sha512-avx: Standardize stack alignment prologue
x86/crypto/sha256-avx2: Standardize stack alignment prologue
x86/crypto/sha1_avx2: Standardize stack alignment prologue
x86/crypto/sha_ni: Standardize stack alignment prologue
x86/crypto/crc32c-pcl-intel: Standardize jump table
x86/crypto/camellia-aesni-avx2: Unconditionally allocate stack buffer
x86/crypto/aesni-intel_avx: Standardize stack alignment prologue
x86/crypto/aesni-intel_avx: Fix register usage comments
x86/crypto/aesni-intel_avx: Remove unused macros
objtool: Support asm jump tables
objtool: Parse options from OBJTOOL_ARGS
objtool: Collate parse_options() users
objtool: Add --backup
objtool,x86: More ModRM sugar
objtool,x86: Rewrite ADD/SUB/AND
objtool,x86: Support %riz encodings
objtool,x86: Simplify register decode
...
Provide infrastructure for architectures to rewrite/augment compiler
generated retpoline calls. Similar to what we do for static_call()s,
keep track of the instructions that are retpoline calls.
Use the same list_head, since a retpoline call cannot also be a
static_call.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151300.130805730@infradead.org
Instead of manually calling elf_rebuild_reloc_section() on sections
we've called elf_add_reloc() on, have elf_write() DTRT.
This makes it easier to add random relocations in places without
carefully tracking when we're done and need to flush what section.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.754213408@infradead.org
Pull more clang LTO updates from Kees Cook:
"Clang LTO x86 enablement.
Full disclosure: while this has _not_ been in linux-next (since it
initially looked like the objtool dependencies weren't going to make
v5.12), it has been under daily build and runtime testing by Sami for
quite some time. These x86 portions have been discussed on lkml, with
Peter, Josh, and others helping nail things down.
The bulk of the changes are to get objtool working happily. The rest
of the x86 enablement is very small.
Summary:
- Generate __mcount_loc in objtool (Peter Zijlstra)
- Support running objtool against vmlinux.o (Sami Tolvanen)
- Clang LTO enablement for x86 (Sami Tolvanen)"
Link: https://lore.kernel.org/lkml/20201013003203.4168817-26-samitolvanen@google.com/
Link: https://lore.kernel.org/lkml/cover.1611263461.git.jpoimboe@redhat.com/
* tag 'clang-lto-v5.12-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: lto: force rebuilds when switching CONFIG_LTO
x86, build: allow LTO to be selected
x86, cpu: disable LTO for cpu.c
x86, vdso: disable LTO only for vDSO
kbuild: lto: postpone objtool
objtool: Split noinstr validation from --vmlinux
x86, build: use objtool mcount
tracing: add support for objtool mcount
objtool: Don't autodetect vmlinux.o
objtool: Fix __mcount_loc generation with Clang's assembler
objtool: Add a pass for generating __mcount_loc
The ORC metadata generated for UNWIND_HINT_FUNC isn't actually very
func-like. With certain usages it can cause stack state mismatches
because it doesn't set the return address (CFI_RA).
Also, users of UNWIND_HINT_RET_OFFSET no longer need to set a custom
return stack offset. Instead they just need to specify a func-like
situation, so the current ret_offset code is hacky for no good reason.
Solve both problems by simplifying the RET_OFFSET handling and
converting it into a more useful UNWIND_HINT_FUNC.
If we end up needing the old 'ret_offset' functionality again in the
future, we should be able to support it pretty easily with the addition
of a custom 'sp_offset' in UNWIND_HINT_FUNC.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/db9d1f5d79dddfbb3725ef6d8ec3477ad199948d.1611263462.git.jpoimboe@redhat.com