Occasionally objtool driven code patching (think .static_call_sites
.retpoline_sites etc..) goes sideways and it tries to patch an
instruction that doesn't match.
Much head-scatching and cursing later the problem is as outlined below
and affects every section that objtool generates for us, very much
including the ORC data. The below uses .static_call_sites because it's
convenient for demonstration purposes, but as mentioned the ORC
sections, .retpoline_sites and __mount_loc are all similarly affected.
Consider:
foo-weak.c:
extern void __SCT__foo(void);
__attribute__((weak)) void foo(void)
{
return __SCT__foo();
}
foo.c:
extern void __SCT__foo(void);
extern void my_foo(void);
void foo(void)
{
my_foo();
return __SCT__foo();
}
These generate the obvious code
(gcc -O2 -fcf-protection=none -fno-asynchronous-unwind-tables -c foo*.c):
foo-weak.o:
0000000000000000 <foo>:
0: e9 00 00 00 00 jmpq 5 <foo+0x5> 1: R_X86_64_PLT32 __SCT__foo-0x4
foo.o:
0000000000000000 <foo>:
0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <foo+0x9> 5: R_X86_64_PLT32 my_foo-0x4
9: 48 83 c4 08 add $0x8,%rsp
d: e9 00 00 00 00 jmpq 12 <foo+0x12> e: R_X86_64_PLT32 __SCT__foo-0x4
Now, when we link these two files together, you get something like
(ld -r -o foos.o foo-weak.o foo.o):
foos.o:
0000000000000000 <foo-0x10>:
0: e9 00 00 00 00 jmpq 5 <foo-0xb> 1: R_X86_64_PLT32 __SCT__foo-0x4
5: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:0x0(%rax,%rax,1)
f: 90 nop
0000000000000010 <foo>:
10: 48 83 ec 08 sub $0x8,%rsp
14: e8 00 00 00 00 callq 19 <foo+0x9> 15: R_X86_64_PLT32 my_foo-0x4
19: 48 83 c4 08 add $0x8,%rsp
1d: e9 00 00 00 00 jmpq 22 <foo+0x12> 1e: R_X86_64_PLT32 __SCT__foo-0x4
Noting that ld preserves the weak function text, but strips the symbol
off of it (hence objdump doing that funny negative offset thing). This
does lead to 'interesting' unused code issues with objtool when ran on
linked objects, but that seems to be working (fingers crossed).
So far so good.. Now lets consider the objtool static_call output
section (readelf output, old binutils):
foo-weak.o:
Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 .text + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foo.o:
Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 .text + d
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foos.o:
Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000100000002 R_X86_64_PC32 0000000000000000 .text + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
0000000000000008 0000000100000002 R_X86_64_PC32 0000000000000000 .text + 1d
000000000000000c 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
So we have two patch sites, one in the dead code of the weak foo and one
in the real foo. All is well.
*HOWEVER*, when the toolchain strips unused section symbols it
generates things like this (using new enough binutils):
foo-weak.o:
Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 foo + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foo.o:
Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 foo + d
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foos.o:
Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000100000002 R_X86_64_PC32 0000000000000000 foo + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
0000000000000008 0000000100000002 R_X86_64_PC32 0000000000000000 foo + d
000000000000000c 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
And now we can see how that foos.o .static_call_sites goes side-ways, we
now have _two_ patch sites in foo. One for the weak symbol at foo+0
(which is no longer a static_call site!) and one at foo+d which is in
fact the right location.
This seems to happen when objtool cannot find a section symbol, in which
case it falls back to any other symbol to key off of, however in this
case that goes terribly wrong!
As such, teach objtool to create a section symbol when there isn't
one.
Fixes: 44f6a7c075 ("objtool: Fix seg fault with Clang non-section symbols")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20220419203807.655552918@infradead.org
There's a fun implementation detail on linking STB_WEAK symbols. When
the linker combines two translation units, where one contains a weak
function and the other an override for it. It simply strips the
STB_WEAK symbol from the symbol table, but doesn't actually remove the
code.
The result is that when objtool is ran in a whole-archive kind of way,
it will encounter *heaps* of unused (and unreferenced) code. All
rudiments of weak functions.
Additionally, when a weak implementation is split into a .cold
subfunction that .cold symbol is left in place, even though completely
unused.
Teach objtool to ignore such rudiments by searching for symbol holes;
that is, code ranges that fall outside the given symbol bounds.
Specifically, ignore a sequence of unreachable instruction iff they
occupy a single hole, additionally ignore any .cold subfunctions
referenced.
Both ld.bfd and ld.lld behave like this. LTO builds otoh can (and do)
properly DCE weak functions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220308154319.232019347@infradead.org
Boris reported that in one of his randconfig builds, objtool got
infinitely stuck. Turns out there's trivial list corruption in the
pv_ops tracking when a function is both in a static table and in a code
assignment.
Avoid re-adding function to the pv_ops[] lists when they're already on
it.
Fixes: db2b0c5d7b ("objtool: Support pv_opsindirect calls for noinstr")
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20211202204534.GA16608@worktop.programming.kicks-ass.net
Pull objtool updates from Thomas Gleixner:
- Improve retpoline code patching by separating it from alternatives
which reduces memory footprint and allows to do better optimizations
in the actual runtime patching.
- Add proper retpoline support for x86/BPF
- Address noinstr warnings in x86/kvm, lockdep and paravirtualization
code
- Add support to handle pv_opsindirect calls in the noinstr analysis
- Classify symbols upfront and cache the result to avoid redundant
str*cmp() invocations.
- Add a CFI hash to reduce memory consumption which also reduces
runtime on a allyesconfig by ~50%
- Adjust XEN code to make objtool handling more robust and as a side
effect to prevent text fragmentation due to placement of the
hypercall page.
* tag 'objtool-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
bpf,x86: Respect X86_FEATURE_RETPOLINE*
bpf,x86: Simplify computing label offsets
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
x86/alternative: Add debug prints to apply_retpolines()
x86/alternative: Try inline spectre_v2=retpoline,amd
x86/alternative: Handle Jcc __x86_indirect_thunk_\reg
x86/alternative: Implement .retpoline_sites support
x86/retpoline: Create a retpoline thunk array
x86/retpoline: Move the retpoline thunk declarations to nospec-branch.h
x86/asm: Fixup odd GEN-for-each-reg.h usage
x86/asm: Fix register order
x86/retpoline: Remove unused replacement symbols
objtool,x86: Replace alternatives with .retpoline_sites
objtool: Shrink struct instruction
objtool: Explicitly avoid self modifying code in .altinstr_replacement
objtool: Classify symbols
objtool: Support pv_opsindirect calls for noinstr
x86/xen: Rework the xen_{cpu,irq,mmu}_opsarrays
x86/xen: Mark xen_force_evtchn_callback() noinstr
x86/xen: Make irq_disable() noinstr
...
Pull objtool fix and updates from Ingo Molnar:
"An ELF format fix for a section flags mismatch bug that breaks kernel
tooling such as kpatch-build.
The biggest change in this cycle is the new code to handle and rewrite
variable sized jump labels - which results in slightly tighter code
generation in hot paths, through the use of short(er) NOPs.
Also a number of cleanups and fixes, and a change to the generic
include/linux/compiler.h to handle a s390 GCC quirk"
* tag 'objtool-urgent-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Don't make .altinstructions writable
* tag 'objtool-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Improve reloc hash size guestimate
instrumentation.h: Avoid using inline asm operand modifiers
compiler.h: Avoid using inline asm operand modifiers
kbuild: Fix objtool dependency for 'OBJECT_FILES_NON_STANDARD_<obj> := n'
objtool: Reflow handle_jump_alt()
jump_label/x86: Remove unused JUMP_LABEL_NOP_SIZE
jump_label, x86: Allow short NOPs
objtool: Provide stats for jump_labels
objtool: Rewrite jump_label instructions
objtool: Decode jump_entry::key addend
jump_label, x86: Emit short JMP
jump_label: Free jump_entry::key bit1 for build use
jump_label, x86: Add variable length patching support
jump_label, x86: Introduce jump_entry_size()
jump_label, x86: Improve error when we fail expected text
jump_label, x86: Factor out the __jump_table generation
jump_label, x86: Strip ASM jump_label support
x86, objtool: Dont exclude arch/x86/realmode/
objtool: Rewrite hashtable sizing
Nathan reported that LLVM ThinLTO builds have a performance regression
with commit 25cf0d8aa2 ("objtool: Rewrite hashtable sizing"). Sami
was quick to note that this is due to their use of -ffunction-sections.
As a result the .text section is small and basing the number of relocs
off of that no longer works. Instead have read_sections() compute the
sum of all SHF_EXECINSTR sections and use that.
Fixes: 25cf0d8aa2 ("objtool: Rewrite hashtable sizing")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Debugged-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lkml.kernel.org/r/YMJpGLuGNsGtA5JJ@hirez.programming.kicks-ass.net
When an ELF object uses extended symbol section indexes (IOW it has a
.symtab_shndx section), these must be kept in sync with the regular
symbol table (.symtab).
So for every new symbol we emit, make sure to also emit a
.symtab_shndx value to keep the arrays of equal size.
Note: since we're writing an UNDEF symbol, most GElf_Sym fields will
be 0 and we can repurpose one (st_size) to host the 0 for the xshndx
value.
Fixes: 2f2f7e47f0 ("objtool: Add elf_create_undef_symbol()")
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/YL3q1qFO9QIRL/BA@hirez.programming.kicks-ass.net
Currently objtool has 5 hashtables and sizes them 16 or 20 bits
depending on the --vmlinux argument.
However, a single side doesn't really work well for the 5 tables,
which among them, cover 3 different uses. Also, while vmlinux is
larger, there is still a very wide difference between a defconfig and
allyesconfig build, which again isn't optimally covered by a single
size.
Another aspect is the cost of elf_hash_init(), which for large tables
dominates the runtime for small input files. It turns out that all it
does it assign NULL, something that is required when using malloc().
However, when we allocate memory using mmap(), we're guaranteed to get
zero filled pages.
Therefore, rewrite the whole thing to:
1) use more dynamic sized tables, depending on the input file,
2) avoid the need for elf_hash_init() entirely by using mmap().
This speeds up a regular kernel build (100s to 98s for
x86_64-defconfig), and potentially dramatically speeds up vmlinux
processing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210506194157.452881700@infradead.org
Instead of manually calling elf_rebuild_reloc_section() on sections
we've called elf_add_reloc() on, have elf_write() DTRT.
This makes it easier to add random relocations in places without
carefully tracking when we're done and need to flush what section.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/20210326151259.754213408@infradead.org
Pull objtool updates from Thomas Gleixner:
- Make objtool work for big-endian cross compiles
- Make stack tracking via stack pointer memory operations match
push/pop semantics to prepare for architectures w/o PUSH/POP
instructions.
- Add support for analyzing alternatives
- Improve retpoline detection and handling
- Improve assembly code coverage on x86
- Provide support for inlined stack switching
* tag 'objtool-core-2021-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
objtool: Support stack-swizzle
objtool,x86: Additionally decode: mov %rsp, (%reg)
x86/unwind/orc: Change REG_SP_INDIRECT
x86/power: Support objtool validation in hibernate_asm_64.S
x86/power: Move restore_registers() to top of the file
x86/power: Annotate indirect branches as safe
x86/acpi: Support objtool validation in wakeup_64.S
x86/acpi: Annotate indirect branch as safe
x86/ftrace: Support objtool vmlinux.o validation in ftrace_64.S
x86/xen/pvh: Annotate indirect branch as safe
x86/xen: Support objtool vmlinux.o validation in xen-head.S
x86/xen: Support objtool validation in xen-asm.S
objtool: Add xen_start_kernel() to noreturn list
objtool: Combine UNWIND_HINT_RET_OFFSET and UNWIND_HINT_FUNC
objtool: Add asm version of STACK_FRAME_NON_STANDARD
objtool: Assume only ELF functions do sibling calls
x86/ftrace: Add UNWIND_HINT_FUNC annotation for ftrace_stub
objtool: Support retpoline jump detection for vmlinux.o
objtool: Fix ".cold" section suffix check for newer versions of GCC
objtool: Fix retpoline detection in asm code
...
I've always been bothered by the endless (fragile) boilerplate for
rbtree, and I recently wrote some rbtree helpers for objtool and
figured I should lift them into the kernel and use them more widely.
Provide:
partial-order; less() based:
- rb_add(): add a new entry to the rbtree
- rb_add_cached(): like rb_add(), but for a rb_root_cached
total-order; cmp() based:
- rb_find(): find an entry in an rbtree
- rb_find_add(): find an entry, and add if not found
- rb_find_first(): find the first (leftmost) matching entry
- rb_next_match(): continue from rb_find_first()
- rb_for_each(): iterate a sub-tree using the previous two
Inlining and constant propagation should see the compiler inline the
whole thing, including the various compare functions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Thanks to a recent binutils change which doesn't generate unused
symbols, it's now possible for thunk_64.o be completely empty without
CONFIG_PREEMPTION: no text, no data, no symbols.
We could edit the Makefile to only build that file when
CONFIG_PREEMPTION is enabled, but that will likely create confusion
if/when the thunks end up getting used by some other code again.
Just ignore it and move on.
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1254
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Currently objtool headers are being included either by their base name
or included via ../ from a parent directory. In case of a base name usage:
#include "warn.h"
#include "arch_elf.h"
it does not make it apparent from which directory the file comes from.
To make it slightly better, and actually to avoid name clashes some arch
specific files have "arch_" suffix. And files from an arch folder have
to revert to including via ../ e.g:
#include "../../elf.h"
With additional architectures support and the code base growth there is
a need for clearer headers naming scheme for multiple reasons:
1. to make it instantly obvious where these files come from (objtool
itself / objtool arch|generic folders / some other external files),
2. to avoid name clashes of objtool arch specific headers, potential
obtool arch generic headers and the system header files (there is
/usr/include/elf.h already),
3. to avoid ../ includes and improve code readability.
4. to give a warm fuzzy feeling to developers who are mostly kernel
developers and are accustomed to linux kernel headers arranging
scheme.
Doesn't this make it instantly obvious where are these files come from?
#include <objtool/warn.h>
#include <arch/elf.h>
And doesn't it look nicer to avoid ugly ../ includes? Which also
guarantees this is elf.h from the objtool and not /usr/include/elf.h.
#include <objtool/elf.h>
This patch defines and implements new objtool headers arranging
scheme. Which is:
- all generic headers go to include/objtool (similar to include/linux)
- all arch headers go to arch/$(SRCARCH)/include/arch (to get arch
prefix). This is similar to linux arch specific "asm/*" headers but we
are not abusing "asm" name and calling it what it is. This also helps
to prevent name clashes (arch is not used in system headers or kernel
exports).
To bring objtool to this state the following things are done:
1. current top level tools/objtool/ headers are moved into
include/objtool/ subdirectory,
2. arch specific headers, currently only arch/x86/include/ are moved into
arch/x86/include/arch/ and were stripped of "arch_" suffix,
3. new -I$(srctree)/tools/objtool/include include path to make
includes like <objtool/warn.h> possible,
4. rewriting file includes,
5. make git not to ignore include/objtool/ subdirectory.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>