Merge tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:

 - Add BPF uprobe session support (Jiri Olsa)
 - Optimize uprobe performance (Andrii Nakryiko)
 - Add bpf_fastcall support to helpers and kfuncs (Eduard Zingerman)
 - Avoid calling free_htab_elem() under hash map bucket lock (Hou Tao)
 - Prevent tailcall infinite loop caused by freplace (Leon Hwang)
 - Mark raw_tracepoint arguments as nullable (Kumar Kartikeya Dwivedi)
 - Introduce uptr support in the task local storage map (Martin KaFai Lau)
 - Stringify errno log messages in libbpf (Mykyta Yatsenko)
 - Add kmem_cache BPF iterator for perf's lock profiling (Namhyung Kim)
 - Support BPF objects of either endianness in libbpf (Tony Ambardar)
 - Add ksym to struct_ops trampoline to fix stack trace (Xu Kuohai)
 - Introduce private stack for eligible BPF programs (Yonghong Song)
 - Migrate samples/bpf tests to selftests/bpf test_progs (Daniel T. Lee)
 - Migrate test_sock to selftests/bpf test_progs (Jordan Rife)

* tag 'bpf-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (152 commits)
  libbpf: Change hash_combine parameters from long to unsigned long
  selftests/bpf: Fix build error with llvm 19
  libbpf: Fix memory leak in bpf_program__attach_uprobe_multi
  bpf: use common instruction history across all states
  bpf: Add necessary migrate_disable to range_tree.
  bpf: Do not alloc arena on unsupported arches
  selftests/bpf: Set test path for token/obj_priv_implicit_token_envvar
  selftests/bpf: Add a test for arena range tree algorithm
  bpf: Introduce range_tree data structure and use it in bpf arena
  samples/bpf: Remove unused variable in xdp2skb_meta_kern.c
  samples/bpf: Remove unused variables in tc_l2_redirect_kern.c
  bpftool: Cast variable `var` to long long
  bpf, x86: Propagate tailcall info only for subprogs
  bpf: Add kernel symbol for struct_ops trampoline
  bpf: Use function pointers count as struct_ops links count
  bpf: Remove unused member rcu from bpf_struct_ops_map
  selftests/bpf: Add struct_ops prog private stack tests
  bpf: Support private stack for struct_ops progs
  selftests/bpf: Add tracing prog private stack tests
  bpf, x86: Support private stack in jit
  ...
@@ -835,7 +835,7 @@ section named by ``btf_ext_info_sec->sec_name_off``.
See :ref:`Documentation/bpf/llvm_reloc.rst <btf-co-re-relocations>`
for more information on CO-RE relocations.

4.2 .BTF_ids section
4.3 .BTF_ids section
--------------------

The .BTF_ids section encodes BTF ID values that are used within the kernel.

@@ -896,6 +896,81 @@ and is used as a filter when resolving the BTF ID value.
All the BTF ID lists and sets are compiled in the .BTF_ids section and
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.

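(Editor-added sketch, not part of this patch.) Kernel code declares such BTF ID
lists and sets with the macros from include/linux/btf_ids.h; the list and set
names below are made up for illustration::

  #include <linux/btf_ids.h>

  /* Each BTF_ID() entry reserves a 4-byte slot in the .BTF_ids section;
   * the slot is zero at compile time and is patched with the resolved
   * BTF type ID by the resolve_btfids tool at kernel link time.
   */
  BTF_ID_LIST(demo_btf_ids)              /* hypothetical list name */
  BTF_ID(struct, task_struct)
  BTF_ID(struct, sk_buff)

  /* A named set additionally records the number of IDs and is sorted
   * by resolve_btfids, so the kernel can binary-search it at runtime.
   */
  BTF_SET_START(demo_allowed_structs)    /* hypothetical set name */
  BTF_ID(struct, file)
  BTF_ID(struct, inode)
  BTF_SET_END(demo_allowed_structs)
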
4.4 .BTF.base section
---------------------

Split BTF - where the .BTF section only contains types not in the associated
base .BTF section - is an extremely efficient way to encode type information
for kernel modules, since they generally consist of a few module-specific
types along with a large set of shared kernel types. The former are encoded
in split BTF, while the latter are encoded in base BTF, resulting in more
compact representations. A type in split BTF that refers to a type in
base BTF refers to it using its base BTF ID, and split BTF IDs start
at last_base_BTF_ID + 1.

The downside of this approach, however, is that it makes the split BTF
somewhat brittle - when the base BTF changes, base BTF ID references are
no longer valid and the split BTF itself becomes useless. The role of the
.BTF.base section is to make split BTF more resilient for cases where
the base BTF may change, as is the case for kernel modules that are not
rebuilt every time the kernel is, for example. .BTF.base contains named
base types: INTs, FLOATs, STRUCTs, UNIONs, ENUM[64]s and FWDs. INTs and
FLOATs are fully described in .BTF.base sections, while composite types
like structs and unions are not fully defined - the .BTF.base type simply
serves as a description of the type the split BTF referred to, so
structs/unions have 0 members in the .BTF.base section. ENUM[64]s are
similarly recorded with 0 members. Any other types are added to the split
BTF. This distillation process then leaves us with a .BTF.base section
containing such minimal descriptions of base types and a split .BTF
section which refers to those base types. Later, we can relocate the
split BTF using both the information stored in the .BTF.base section and
the new .BTF base; the type information in the .BTF.base section allows
us to update the split BTF references to point at the corresponding new
base BTF IDs.

BTF relocation happens on kernel module load when a kernel module has a
.BTF.base section, and libbpf also provides a btf__relocate() API to
accomplish this.

As an example consider the following base BTF::

  [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [2] STRUCT 'foo' size=8 vlen=2
        'f1' type_id=1 bits_offset=0
        'f2' type_id=1 bits_offset=32

...and associated split BTF::

  [3] PTR '(anon)' type_id=2

i.e. split BTF describes a pointer to struct foo { int f1; int f2 };

.BTF.base will consist of::

  [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [2] STRUCT 'foo' size=8 vlen=0

If we relocate the split BTF later using the following new base BTF::

  [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64 encoding=(none)
  [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [3] STRUCT 'foo' size=8 vlen=2
        'f1' type_id=2 bits_offset=0
        'f2' type_id=2 bits_offset=32

...we can use our .BTF.base description to know that the split BTF reference
is to struct foo, and relocation results in new split BTF::

  [4] PTR '(anon)' type_id=3

Note that both the BTF ID and the start BTF ID of the split BTF had to be
updated.

So we see how .BTF.base plays the role of facilitating later relocation,
leading to more resilient split BTF.

.BTF.base sections will be generated automatically for out-of-tree kernel module
builds - i.e. where KBUILD_EXTMOD is set (as it would be for "make M=path/2/mod"
cases). .BTF.base generation requires pahole support for the "distilled_base"
BTF feature; this is available in pahole v1.28 and later.

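(Editor-added sketch, not part of the kernel documentation patch.) In userspace,
the relocation step mentioned above can look roughly like the following; it
assumes a libbpf version that provides btf__relocate() and understands the
.BTF.base section (v1.5+), error handling is minimal, and the module path is a
placeholder::

  #include <stdio.h>
  #include <bpf/btf.h>

  int main(void)
  {
          struct btf *base, *split;
          int err;

          /* the new base BTF, e.g. the currently running kernel */
          base = btf__parse("/sys/kernel/btf/vmlinux", NULL);
          if (!base)
                  return 1;

          /* the module's split BTF; when a .BTF.base section is present,
           * recent libbpf parses it against that embedded distilled base
           */
          split = btf__parse("some_module.ko", NULL);
          if (!split) {
                  btf__free(base);
                  return 1;
          }

          /* rewrite split BTF references to point at the new base's type
           * IDs, using the .BTF.base descriptions
           */
          err = btf__relocate(split, base);
          if (err)
                  fprintf(stderr, "relocation failed: %d\n", err);

          btf__free(split);
          btf__free(base);
          return err ? 1 : 0;
  }
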
5. Using BTF
============

@@ -507,7 +507,7 @@ Notes:
  from the parent state to the current state.

* Details about REG_LIVE_READ32 are omitted.

* Function ``propagate_liveness()`` (see section :ref:`read_marks_for_cache_hits`)
  might override the first parent link. Please refer to the comments in the
  ``propagate_liveness()`` and ``mark_reg_read()`` source code for further

@@ -571,7 +571,7 @@ works::
are considered equivalent.

.. _read_marks_for_cache_hits:

Read marks propagation for cache hits
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2094,6 +2094,12 @@ static void restore_args(struct jit_ctx *ctx, int args_off, int nregs)
|
||||
}
|
||||
}
|
||||
|
||||
static bool is_struct_ops_tramp(const struct bpf_tramp_links *fentry_links)
|
||||
{
|
||||
return fentry_links->nr_links == 1 &&
|
||||
fentry_links->links[0]->link.type == BPF_LINK_TYPE_STRUCT_OPS;
|
||||
}
|
||||
|
||||
/* Based on the x86's implementation of arch_prepare_bpf_trampoline().
|
||||
*
|
||||
* bpf prog and function entry before bpf trampoline hooked:
|
||||
@@ -2123,6 +2129,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
|
||||
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
|
||||
bool save_ret;
|
||||
__le32 **branches = NULL;
|
||||
bool is_struct_ops = is_struct_ops_tramp(fentry);
|
||||
|
||||
/* trampoline stack layout:
|
||||
* [ parent ip ]
|
||||
@@ -2191,11 +2198,14 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
|
||||
*/
|
||||
emit_bti(A64_BTI_JC, ctx);
|
||||
|
||||
/* frame for parent function */
|
||||
emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
|
||||
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
|
||||
/* x9 is not set for struct_ops */
|
||||
if (!is_struct_ops) {
|
||||
/* frame for parent function */
|
||||
emit(A64_PUSH(A64_FP, A64_R(9), A64_SP), ctx);
|
||||
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
|
||||
}
|
||||
|
||||
/* frame for patched function */
|
||||
/* frame for patched function for tracing, or caller for struct_ops */
|
||||
emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
|
||||
emit(A64_MOV(1, A64_FP, A64_SP), ctx);
|
||||
|
||||
@@ -2289,19 +2299,24 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im,
|
||||
/* reset SP */
|
||||
emit(A64_MOV(1, A64_SP, A64_FP), ctx);
|
||||
|
||||
/* pop frames */
|
||||
emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
|
||||
emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
|
||||
|
||||
if (flags & BPF_TRAMP_F_SKIP_FRAME) {
|
||||
/* skip patched function, return to parent */
|
||||
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
|
||||
emit(A64_RET(A64_R(9)), ctx);
|
||||
if (is_struct_ops) {
|
||||
emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
|
||||
emit(A64_RET(A64_LR), ctx);
|
||||
} else {
|
||||
/* return to patched function */
|
||||
emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
|
||||
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
|
||||
emit(A64_RET(A64_R(10)), ctx);
|
||||
/* pop frames */
|
||||
emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
|
||||
emit(A64_POP(A64_FP, A64_R(9), A64_SP), ctx);
|
||||
|
||||
if (flags & BPF_TRAMP_F_SKIP_FRAME) {
|
||||
/* skip patched function, return to parent */
|
||||
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
|
||||
emit(A64_RET(A64_R(9)), ctx);
|
||||
} else {
|
||||
/* return to patched function */
|
||||
emit(A64_MOV(1, A64_R(10), A64_LR), ctx);
|
||||
emit(A64_MOV(1, A64_LR, A64_R(9)), ctx);
|
||||
emit(A64_RET(A64_R(10)), ctx);
|
||||
}
|
||||
}
|
||||
|
||||
kfree(branches);
|
||||
|
||||
@@ -325,6 +325,22 @@ struct jit_context {
|
||||
/* Number of bytes that will be skipped on tailcall */
|
||||
#define X86_TAIL_CALL_OFFSET (12 + ENDBR_INSN_SIZE)
|
||||
|
||||
static void push_r9(u8 **pprog)
|
||||
{
|
||||
u8 *prog = *pprog;
|
||||
|
||||
EMIT2(0x41, 0x51); /* push r9 */
|
||||
*pprog = prog;
|
||||
}
|
||||
|
||||
static void pop_r9(u8 **pprog)
|
||||
{
|
||||
u8 *prog = *pprog;
|
||||
|
||||
EMIT2(0x41, 0x59); /* pop r9 */
|
||||
*pprog = prog;
|
||||
}
|
||||
|
||||
static void push_r12(u8 **pprog)
|
||||
{
|
||||
u8 *prog = *pprog;
|
||||
@@ -1404,6 +1420,24 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
|
||||
*pprog = prog;
|
||||
}
|
||||
|
||||
static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
|
||||
{
|
||||
u8 *prog = *pprog;
|
||||
|
||||
/* movabs r9, priv_frame_ptr */
|
||||
emit_mov_imm64(&prog, X86_REG_R9, (__force long) priv_frame_ptr >> 32,
|
||||
(u32) (__force long) priv_frame_ptr);
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
/* add <r9>, gs:[<off>] */
|
||||
EMIT2(0x65, 0x4c);
|
||||
EMIT3(0x03, 0x0c, 0x25);
|
||||
EMIT((u32)(unsigned long)&this_cpu_off, 4);
|
||||
#endif
|
||||
|
||||
*pprog = prog;
|
||||
}
|
||||
|
||||
#define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
|
||||
|
||||
#define __LOAD_TCC_PTR(off) \
|
||||
@@ -1412,6 +1446,10 @@ static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
|
||||
#define LOAD_TAIL_CALL_CNT_PTR(stack) \
|
||||
__LOAD_TCC_PTR(BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack))
|
||||
|
||||
/* Memory size/value to protect private stack overflow/underflow */
|
||||
#define PRIV_STACK_GUARD_SZ 8
|
||||
#define PRIV_STACK_GUARD_VAL 0xEB9F12345678eb9fULL
|
||||
|
||||
static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
|
||||
int oldproglen, struct jit_context *ctx, bool jmp_padding)
|
||||
{
|
||||
@@ -1421,18 +1459,28 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
|
||||
int insn_cnt = bpf_prog->len;
|
||||
bool seen_exit = false;
|
||||
u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
|
||||
void __percpu *priv_frame_ptr = NULL;
|
||||
u64 arena_vm_start, user_vm_start;
|
||||
void __percpu *priv_stack_ptr;
|
||||
int i, excnt = 0;
|
||||
int ilen, proglen = 0;
|
||||
u8 *prog = temp;
|
||||
u32 stack_depth;
|
||||
int err;
|
||||
|
||||
stack_depth = bpf_prog->aux->stack_depth;
|
||||
priv_stack_ptr = bpf_prog->aux->priv_stack_ptr;
|
||||
if (priv_stack_ptr) {
|
||||
priv_frame_ptr = priv_stack_ptr + PRIV_STACK_GUARD_SZ + round_up(stack_depth, 8);
|
||||
stack_depth = 0;
|
||||
}
|
||||
|
||||
arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
|
||||
user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
|
||||
|
||||
detect_reg_usage(insn, insn_cnt, callee_regs_used);
|
||||
|
||||
emit_prologue(&prog, bpf_prog->aux->stack_depth,
|
||||
emit_prologue(&prog, stack_depth,
|
||||
bpf_prog_was_classic(bpf_prog), tail_call_reachable,
|
||||
bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
|
||||
/* Exception callback will clobber callee regs for its own use, and
|
||||
@@ -1454,6 +1502,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
|
||||
emit_mov_imm64(&prog, X86_REG_R12,
|
||||
arena_vm_start >> 32, (u32) arena_vm_start);
|
||||
|
||||
if (priv_frame_ptr)
|
||||
emit_priv_frame_ptr(&prog, priv_frame_ptr);
|
||||
|
||||
ilen = prog - temp;
|
||||
if (rw_image)
|
||||
memcpy(rw_image + proglen, temp, ilen);
|
||||
@@ -1473,6 +1524,14 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
|
||||
u8 *func;
|
||||
int nops;
|
||||
|
||||
if (priv_frame_ptr) {
|
||||
if (src_reg == BPF_REG_FP)
|
||||
src_reg = X86_REG_R9;
|
||||
|
||||
if (dst_reg == BPF_REG_FP)
|
||||
dst_reg = X86_REG_R9;
|
||||
}
|
||||
|
||||
switch (insn->code) {
|
||||
/* ALU */
|
||||
case BPF_ALU | BPF_ADD | BPF_X:
|
||||
@@ -2127,15 +2186,21 @@ populate_extable:
|
||||
u8 *ip = image + addrs[i - 1];
|
||||
|
||||
func = (u8 *) __bpf_call_base + imm32;
|
||||
if (tail_call_reachable) {
|
||||
LOAD_TAIL_CALL_CNT_PTR(bpf_prog->aux->stack_depth);
|
||||
if (src_reg == BPF_PSEUDO_CALL && tail_call_reachable) {
|
||||
LOAD_TAIL_CALL_CNT_PTR(stack_depth);
|
||||
ip += 7;
|
||||
}
|
||||
if (!imm32)
|
||||
return -EINVAL;
|
||||
if (priv_frame_ptr) {
|
||||
push_r9(&prog);
|
||||
ip += 2;
|
||||
}
|
||||
ip += x86_call_depth_emit_accounting(&prog, func, ip);
|
||||
if (emit_call(&prog, func, ip))
|
||||
return -EINVAL;
|
||||
if (priv_frame_ptr)
|
||||
pop_r9(&prog);
|
||||
break;
|
||||
}
|
||||
|
||||
@@ -2145,13 +2210,13 @@ populate_extable:
|
||||
&bpf_prog->aux->poke_tab[imm32 - 1],
|
||||
&prog, image + addrs[i - 1],
|
||||
callee_regs_used,
|
||||
bpf_prog->aux->stack_depth,
|
||||
stack_depth,
|
||||
ctx);
|
||||
else
|
||||
emit_bpf_tail_call_indirect(bpf_prog,
|
||||
&prog,
|
||||
callee_regs_used,
|
||||
bpf_prog->aux->stack_depth,
|
||||
stack_depth,
|
||||
image + addrs[i - 1],
|
||||
ctx);
|
||||
break;
|
||||
@@ -3303,6 +3368,42 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func
|
||||
return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs, image, buf);
|
||||
}
|
||||
|
||||
static const char *bpf_get_prog_name(struct bpf_prog *prog)
|
||||
{
|
||||
if (prog->aux->ksym.prog)
|
||||
return prog->aux->ksym.name;
|
||||
return prog->aux->name;
|
||||
}
|
||||
|
||||
static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
|
||||
{
|
||||
int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
|
||||
u64 *stack_ptr;
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
|
||||
stack_ptr[0] = PRIV_STACK_GUARD_VAL;
|
||||
stack_ptr[underflow_idx] = PRIV_STACK_GUARD_VAL;
|
||||
}
|
||||
}
|
||||
|
||||
static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size,
|
||||
struct bpf_prog *prog)
|
||||
{
|
||||
int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
|
||||
u64 *stack_ptr;
|
||||
|
||||
for_each_possible_cpu(cpu) {
|
||||
stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
|
||||
if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
|
||||
stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL) {
|
||||
pr_err("BPF private stack overflow/underflow detected for prog %sx\n",
|
||||
bpf_get_prog_name(prog));
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
struct x64_jit_data {
|
||||
struct bpf_binary_header *rw_header;
|
||||
struct bpf_binary_header *header;
|
||||
@@ -3320,7 +3421,9 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
|
||||
struct bpf_binary_header *rw_header = NULL;
|
||||
struct bpf_binary_header *header = NULL;
|
||||
struct bpf_prog *tmp, *orig_prog = prog;
|
||||
void __percpu *priv_stack_ptr = NULL;
|
||||
struct x64_jit_data *jit_data;
|
||||
int priv_stack_alloc_sz;
|
||||
int proglen, oldproglen = 0;
|
||||
struct jit_context ctx = {};
|
||||
bool tmp_blinded = false;
|
||||
@@ -3356,6 +3459,23 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
|
||||
}
|
||||
prog->aux->jit_data = jit_data;
|
||||
}
|
||||
priv_stack_ptr = prog->aux->priv_stack_ptr;
|
||||
if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
|
||||
/* Allocate actual private stack size with verifier-calculated
|
||||
* stack size plus two memory guards to protect overflow and
|
||||
* underflow.
|
||||
*/
|
||||
priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 8) +
|
||||
2 * PRIV_STACK_GUARD_SZ;
|
||||
priv_stack_ptr = __alloc_percpu_gfp(priv_stack_alloc_sz, 8, GFP_KERNEL);
|
||||
if (!priv_stack_ptr) {
|
||||
prog = orig_prog;
|
||||
goto out_priv_stack;
|
||||
}
|
||||
|
||||
priv_stack_init_guard(priv_stack_ptr, priv_stack_alloc_sz);
|
||||
prog->aux->priv_stack_ptr = priv_stack_ptr;
|
||||
}
|
||||
addrs = jit_data->addrs;
|
||||
if (addrs) {
|
||||
ctx = jit_data->ctx;
|
||||
@@ -3491,6 +3611,11 @@ out_image:
|
||||
bpf_prog_fill_jited_linfo(prog, addrs + 1);
|
||||
out_addrs:
|
||||
kvfree(addrs);
|
||||
if (!image && priv_stack_ptr) {
|
||||
free_percpu(priv_stack_ptr);
|
||||
prog->aux->priv_stack_ptr = NULL;
|
||||
}
|
||||
out_priv_stack:
|
||||
kfree(jit_data);
|
||||
prog->aux->jit_data = NULL;
|
||||
}
|
||||
@@ -3529,6 +3654,8 @@ void bpf_jit_free(struct bpf_prog *prog)
|
||||
if (prog->jited) {
|
||||
struct x64_jit_data *jit_data = prog->aux->jit_data;
|
||||
struct bpf_binary_header *hdr;
|
||||
void __percpu *priv_stack_ptr;
|
||||
int priv_stack_alloc_sz;
|
||||
|
||||
/*
|
||||
* If we fail the final pass of JIT (from jit_subprogs),
|
||||
@@ -3544,6 +3671,13 @@ void bpf_jit_free(struct bpf_prog *prog)
|
||||
prog->bpf_func = (void *)prog->bpf_func - cfi_get_offset();
|
||||
hdr = bpf_jit_binary_pack_hdr(prog);
|
||||
bpf_jit_binary_pack_free(hdr, NULL);
|
||||
priv_stack_ptr = prog->aux->priv_stack_ptr;
|
||||
if (priv_stack_ptr) {
|
||||
priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 8) +
|
||||
2 * PRIV_STACK_GUARD_SZ;
|
||||
priv_stack_check_guard(priv_stack_ptr, priv_stack_alloc_sz, prog);
|
||||
free_percpu(prog->aux->priv_stack_ptr);
|
||||
}
|
||||
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
|
||||
}
|
||||
|
||||
@@ -3559,6 +3693,11 @@ bool bpf_jit_supports_exceptions(void)
|
||||
return IS_ENABLED(CONFIG_UNWINDER_ORC);
|
||||
}
|
||||
|
||||
bool bpf_jit_supports_private_stack(void)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
|
||||
{
|
||||
#if defined(CONFIG_UNWINDER_ORC)
|
||||
|
||||
@@ -203,6 +203,7 @@ enum btf_field_type {
|
||||
BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
|
||||
BPF_REFCOUNT = (1 << 9),
|
||||
BPF_WORKQUEUE = (1 << 10),
|
||||
BPF_UPTR = (1 << 11),
|
||||
};
|
||||
|
||||
typedef void (*btf_dtor_kfunc_t)(void *);
|
||||
@@ -322,6 +323,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
|
||||
return "kptr";
|
||||
case BPF_KPTR_PERCPU:
|
||||
return "percpu_kptr";
|
||||
case BPF_UPTR:
|
||||
return "uptr";
|
||||
case BPF_LIST_HEAD:
|
||||
return "bpf_list_head";
|
||||
case BPF_LIST_NODE:
|
||||
@@ -350,6 +353,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
|
||||
case BPF_KPTR_UNREF:
|
||||
case BPF_KPTR_REF:
|
||||
case BPF_KPTR_PERCPU:
|
||||
case BPF_UPTR:
|
||||
return sizeof(u64);
|
||||
case BPF_LIST_HEAD:
|
||||
return sizeof(struct bpf_list_head);
|
||||
@@ -379,6 +383,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
|
||||
case BPF_KPTR_UNREF:
|
||||
case BPF_KPTR_REF:
|
||||
case BPF_KPTR_PERCPU:
|
||||
case BPF_UPTR:
|
||||
return __alignof__(u64);
|
||||
case BPF_LIST_HEAD:
|
||||
return __alignof__(struct bpf_list_head);
|
||||
@@ -419,6 +424,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
|
||||
case BPF_KPTR_UNREF:
|
||||
case BPF_KPTR_REF:
|
||||
case BPF_KPTR_PERCPU:
|
||||
case BPF_UPTR:
|
||||
break;
|
||||
default:
|
||||
WARN_ON_ONCE(1);
|
||||
@@ -507,6 +513,25 @@ static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src
|
||||
bpf_obj_memcpy(map->record, dst, src, map->value_size, true);
|
||||
}
|
||||
|
||||
static inline void bpf_obj_swap_uptrs(const struct btf_record *rec, void *dst, void *src)
|
||||
{
|
||||
unsigned long *src_uptr, *dst_uptr;
|
||||
const struct btf_field *field;
|
||||
int i;
|
||||
|
||||
if (!btf_record_has_field(rec, BPF_UPTR))
|
||||
return;
|
||||
|
||||
for (i = 0, field = rec->fields; i < rec->cnt; i++, field++) {
|
||||
if (field->type != BPF_UPTR)
|
||||
continue;
|
||||
|
||||
src_uptr = src + field->offset;
|
||||
dst_uptr = dst + field->offset;
|
||||
swap(*src_uptr, *dst_uptr);
|
||||
}
|
||||
}
|
||||
|
||||
static inline void bpf_obj_memzero(struct btf_record *rec, void *dst, u32 size)
|
||||
{
|
||||
u32 curr_off = 0;
|
||||
@@ -907,10 +932,6 @@ enum bpf_reg_type {
|
||||
* additional context, assume the value is non-null.
|
||||
*/
|
||||
PTR_TO_BTF_ID,
|
||||
/* PTR_TO_BTF_ID_OR_NULL points to a kernel struct that has not
|
||||
* been checked for null. Used primarily to inform the verifier
|
||||
* an explicit null check is required for this struct.
|
||||
*/
|
||||
PTR_TO_MEM, /* reg points to valid memory region */
|
||||
PTR_TO_ARENA,
|
||||
PTR_TO_BUF, /* reg points to a read/write buffer */
|
||||
@@ -923,6 +944,10 @@ enum bpf_reg_type {
|
||||
PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | PTR_TO_SOCKET,
|
||||
PTR_TO_SOCK_COMMON_OR_NULL = PTR_MAYBE_NULL | PTR_TO_SOCK_COMMON,
|
||||
PTR_TO_TCP_SOCK_OR_NULL = PTR_MAYBE_NULL | PTR_TO_TCP_SOCK,
|
||||
/* PTR_TO_BTF_ID_OR_NULL points to a kernel struct that has not
|
||||
* been checked for null. Used primarily to inform the verifier
|
||||
* an explicit null check is required for this struct.
|
||||
*/
|
||||
PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | PTR_TO_BTF_ID,
|
||||
|
||||
/* This must be the last entry. Its purpose is to ensure the enum is
|
||||
@@ -1300,8 +1325,12 @@ void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len);
|
||||
bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr);
|
||||
|
||||
#ifdef CONFIG_BPF_JIT
|
||||
int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
|
||||
int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr);
|
||||
int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
|
||||
struct bpf_trampoline *tr,
|
||||
struct bpf_prog *tgt_prog);
|
||||
int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
|
||||
struct bpf_trampoline *tr,
|
||||
struct bpf_prog *tgt_prog);
|
||||
struct bpf_trampoline *bpf_trampoline_get(u64 key,
|
||||
struct bpf_attach_target_info *tgt_info);
|
||||
void bpf_trampoline_put(struct bpf_trampoline *tr);
|
||||
@@ -1373,7 +1402,8 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func
|
||||
void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
|
||||
struct bpf_prog *to);
|
||||
/* Called only from JIT-enabled code, so there's no need for stubs. */
|
||||
void bpf_image_ksym_add(void *data, unsigned int size, struct bpf_ksym *ksym);
|
||||
void bpf_image_ksym_init(void *data, unsigned int size, struct bpf_ksym *ksym);
|
||||
void bpf_image_ksym_add(struct bpf_ksym *ksym);
|
||||
void bpf_image_ksym_del(struct bpf_ksym *ksym);
|
||||
void bpf_ksym_add(struct bpf_ksym *ksym);
|
||||
void bpf_ksym_del(struct bpf_ksym *ksym);
|
||||
@@ -1382,12 +1412,14 @@ void bpf_jit_uncharge_modmem(u32 size);
|
||||
bool bpf_prog_has_trampoline(const struct bpf_prog *prog);
|
||||
#else
|
||||
static inline int bpf_trampoline_link_prog(struct bpf_tramp_link *link,
|
||||
struct bpf_trampoline *tr)
|
||||
struct bpf_trampoline *tr,
|
||||
struct bpf_prog *tgt_prog)
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
static inline int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link,
|
||||
struct bpf_trampoline *tr)
|
||||
struct bpf_trampoline *tr,
|
||||
struct bpf_prog *tgt_prog)
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
@@ -1476,6 +1508,7 @@ struct bpf_prog_aux {
|
||||
u32 max_rdwr_access;
|
||||
struct btf *attach_btf;
|
||||
const struct bpf_ctx_arg_aux *ctx_arg_info;
|
||||
void __percpu *priv_stack_ptr;
|
||||
struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */
|
||||
struct bpf_prog *dst_prog;
|
||||
struct bpf_trampoline *dst_trampoline;
|
||||
@@ -1491,7 +1524,13 @@ struct bpf_prog_aux {
|
||||
bool xdp_has_frags;
|
||||
bool exception_cb;
|
||||
bool exception_boundary;
|
||||
bool is_extended; /* true if extended by freplace program */
|
||||
bool jits_use_priv_stack;
|
||||
bool priv_stack_requested;
|
||||
u64 prog_array_member_cnt; /* counts how many times as member of prog_array */
|
||||
struct mutex ext_mutex; /* mutex for is_extended and prog_array_member_cnt */
|
||||
struct bpf_arena *arena;
|
||||
void (*recursion_detected)(struct bpf_prog *prog); /* callback if recursion is detected */
|
||||
/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
|
||||
const struct btf_type *attach_func_proto;
|
||||
/* function name for valid attach_btf_id */
|
||||
@@ -3461,4 +3500,10 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
|
||||
return prog->aux->func_idx != 0;
|
||||
}
|
||||
|
||||
static inline bool bpf_prog_is_raw_tp(const struct bpf_prog *prog)
|
||||
{
|
||||
return prog->type == BPF_PROG_TYPE_TRACING &&
|
||||
prog->expected_attach_type == BPF_TRACE_RAW_TP;
|
||||
}
|
||||
|
||||
#endif /* _LINUX_BPF_H */
|
||||
|
||||
@@ -77,7 +77,13 @@ struct bpf_local_storage_elem {
|
||||
struct hlist_node map_node; /* Linked to bpf_local_storage_map */
|
||||
struct hlist_node snode; /* Linked to bpf_local_storage */
|
||||
struct bpf_local_storage __rcu *local_storage;
|
||||
struct rcu_head rcu;
|
||||
union {
|
||||
struct rcu_head rcu;
|
||||
struct hlist_node free_node; /* used to postpone
|
||||
* bpf_selem_free
|
||||
* after raw_spin_unlock
|
||||
*/
|
||||
};
|
||||
/* 8 bytes hole */
|
||||
/* The data is stored in another cacheline to minimize
|
||||
* the number of cachelines access during a cache hit.
|
||||
@@ -181,7 +187,7 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
|
||||
|
||||
struct bpf_local_storage_elem *
|
||||
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
|
||||
bool charge_mem, gfp_t gfp_flags);
|
||||
bool charge_mem, bool swap_uptrs, gfp_t gfp_flags);
|
||||
|
||||
void bpf_selem_free(struct bpf_local_storage_elem *selem,
|
||||
struct bpf_local_storage_map *smap,
|
||||
@@ -195,7 +201,7 @@ bpf_local_storage_alloc(void *owner,
|
||||
|
||||
struct bpf_local_storage_data *
|
||||
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
void *value, u64 map_flags, gfp_t gfp_flags);
|
||||
void *value, u64 map_flags, bool swap_uptrs, gfp_t gfp_flags);
|
||||
|
||||
u64 bpf_local_storage_map_mem_usage(const struct bpf_map *map);
|
||||
|
||||
|
||||
@@ -48,22 +48,6 @@ enum bpf_reg_liveness {
|
||||
REG_LIVE_DONE = 0x8, /* liveness won't be updating this register anymore */
|
||||
};
|
||||
|
||||
/* For every reg representing a map value or allocated object pointer,
|
||||
* we consider the tuple of (ptr, id) for them to be unique in verifier
|
||||
* context and conside them to not alias each other for the purposes of
|
||||
* tracking lock state.
|
||||
*/
|
||||
struct bpf_active_lock {
|
||||
/* This can either be reg->map_ptr or reg->btf. If ptr is NULL,
|
||||
* there's no active lock held, and other fields have no
|
||||
* meaning. If non-NULL, it indicates that a lock is held and
|
||||
* id member has the reg->id of the register which can be >= 0.
|
||||
*/
|
||||
void *ptr;
|
||||
/* This will be reg->id */
|
||||
u32 id;
|
||||
};
|
||||
|
||||
#define ITER_PREFIX "bpf_iter_"
|
||||
|
||||
enum bpf_iter_state {
|
||||
@@ -266,6 +250,13 @@ struct bpf_stack_state {
|
||||
};
|
||||
|
||||
struct bpf_reference_state {
|
||||
/* Each reference object has a type. Ensure REF_TYPE_PTR is zero to
|
||||
* default to pointer reference on zero initialization of a state.
|
||||
*/
|
||||
enum ref_state_type {
|
||||
REF_TYPE_PTR = 0,
|
||||
REF_TYPE_LOCK,
|
||||
} type;
|
||||
/* Track each reference created with a unique id, even if the same
|
||||
* instruction creates the reference multiple times (eg, via CALL).
|
||||
*/
|
||||
@@ -274,17 +265,10 @@ struct bpf_reference_state {
|
||||
* is used purely to inform the user of a reference leak.
|
||||
*/
|
||||
int insn_idx;
|
||||
/* There can be a case like:
|
||||
* main (frame 0)
|
||||
* cb (frame 1)
|
||||
* func (frame 3)
|
||||
* cb (frame 4)
|
||||
* Hence for frame 4, if callback_ref just stored boolean, it would be
|
||||
* impossible to distinguish nested callback refs. Hence store the
|
||||
* frameno and compare that to callback_ref in check_reference_leak when
|
||||
* exiting a callback function.
|
||||
/* Use to keep track of the source object of a lock, to ensure
|
||||
* it matches on unlock.
|
||||
*/
|
||||
int callback_ref;
|
||||
void *ptr;
|
||||
};
|
||||
|
||||
struct bpf_retval_range {
|
||||
@@ -332,6 +316,7 @@ struct bpf_func_state {
|
||||
|
||||
/* The following fields should be last. See copy_func_state() */
|
||||
int acquired_refs;
|
||||
int active_locks;
|
||||
struct bpf_reference_state *refs;
|
||||
/* The state of the stack. Each element of the array describes BPF_REG_SIZE
|
||||
* (i.e. 8) bytes worth of stack memory.
|
||||
@@ -349,7 +334,7 @@ struct bpf_func_state {
|
||||
|
||||
#define MAX_CALL_FRAMES 8
|
||||
|
||||
/* instruction history flags, used in bpf_jmp_history_entry.flags field */
|
||||
/* instruction history flags, used in bpf_insn_hist_entry.flags field */
|
||||
enum {
|
||||
/* instruction references stack slot through PTR_TO_STACK register;
|
||||
* we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8)
|
||||
@@ -367,7 +352,7 @@ enum {
|
||||
static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES);
|
||||
static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8);
|
||||
|
||||
struct bpf_jmp_history_entry {
|
||||
struct bpf_insn_hist_entry {
|
||||
u32 idx;
|
||||
/* insn idx can't be bigger than 1 million */
|
||||
u32 prev_idx : 22;
|
||||
@@ -434,7 +419,6 @@ struct bpf_verifier_state {
|
||||
u32 insn_idx;
|
||||
u32 curframe;
|
||||
|
||||
struct bpf_active_lock active_lock;
|
||||
bool speculative;
|
||||
bool active_rcu_lock;
|
||||
u32 active_preempt_lock;
|
||||
@@ -458,13 +442,14 @@ struct bpf_verifier_state {
|
||||
* See get_loop_entry() for more information.
|
||||
*/
|
||||
struct bpf_verifier_state *loop_entry;
|
||||
/* jmp history recorded from first to last.
|
||||
* backtracking is using it to go from last to first.
|
||||
* For most states jmp_history_cnt is [0-3].
|
||||
/* Sub-range of env->insn_hist[] corresponding to this state's
|
||||
* instruction history.
|
||||
* Backtracking is using it to go from last to first.
|
||||
* For most states instruction history is short, 0-3 instructions.
|
||||
* For loops can go up to ~40.
|
||||
*/
|
||||
struct bpf_jmp_history_entry *jmp_history;
|
||||
u32 jmp_history_cnt;
|
||||
u32 insn_hist_start;
|
||||
u32 insn_hist_end;
|
||||
u32 dfs_depth;
|
||||
u32 callback_unroll_depth;
|
||||
u32 may_goto_depth;
|
||||
@@ -649,6 +634,12 @@ struct bpf_subprog_arg_info {
|
||||
};
|
||||
};
|
||||
|
||||
enum priv_stack_mode {
|
||||
PRIV_STACK_UNKNOWN,
|
||||
NO_PRIV_STACK,
|
||||
PRIV_STACK_ADAPTIVE,
|
||||
};
|
||||
|
||||
struct bpf_subprog_info {
|
||||
/* 'start' has to be the first field otherwise find_subprog() won't work */
|
||||
u32 start; /* insn idx of function entry point */
|
||||
@@ -669,6 +660,7 @@ struct bpf_subprog_info {
|
||||
/* true if bpf_fastcall stack region is used by functions that can't be inlined */
|
||||
bool keep_fastcall_stack: 1;
|
||||
|
||||
enum priv_stack_mode priv_stack_mode;
|
||||
u8 arg_cnt;
|
||||
struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
|
||||
};
|
||||
@@ -747,7 +739,9 @@ struct bpf_verifier_env {
|
||||
int cur_stack;
|
||||
} cfg;
|
||||
struct backtrack_state bt;
|
||||
struct bpf_jmp_history_entry *cur_hist_ent;
|
||||
struct bpf_insn_hist_entry *insn_hist;
|
||||
struct bpf_insn_hist_entry *cur_hist_ent;
|
||||
u32 insn_hist_cap;
|
||||
u32 pass_cnt; /* number of times do_check() was called */
|
||||
u32 subprog_cnt;
|
||||
/* number of instructions analyzed by the verifier */
|
||||
@@ -888,6 +882,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
|
||||
case BPF_PROG_TYPE_TRACING:
|
||||
return prog->expected_attach_type != BPF_TRACE_ITER;
|
||||
case BPF_PROG_TYPE_STRUCT_OPS:
|
||||
return prog->aux->jits_use_priv_stack;
|
||||
case BPF_PROG_TYPE_LSM:
|
||||
return false;
|
||||
default:
|
||||
|
||||
@@ -75,6 +75,7 @@
|
||||
#define KF_ITER_NEXT (1 << 9) /* kfunc implements BPF iter next method */
|
||||
#define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */
|
||||
#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */
|
||||
#define KF_FASTCALL (1 << 12) /* kfunc supports bpf_fastcall protocol */
|
||||
|
||||
/*
|
||||
* Tag marking a kernel function as a kfunc. This is meant to minimize the
|
||||
@@ -581,6 +582,16 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
|
||||
bool btf_types_are_same(const struct btf *btf1, u32 id1,
|
||||
const struct btf *btf2, u32 id2);
|
||||
int btf_check_iter_arg(struct btf *btf, const struct btf_type *func, int arg_idx);
|
||||
|
||||
static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
|
||||
{
|
||||
if (!btf_type_is_ptr(t))
|
||||
return false;
|
||||
|
||||
t = btf_type_skip_modifiers(btf, t->type, NULL);
|
||||
|
||||
return btf_type_is_struct(t);
|
||||
}
|
||||
#else
|
||||
static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
|
||||
u32 type_id)
|
||||
@@ -660,15 +671,4 @@ static inline int btf_check_iter_arg(struct btf *btf, const struct btf_type *fun
|
||||
return -EOPNOTSUPP;
|
||||
}
|
||||
#endif
|
||||
|
||||
static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
|
||||
{
|
||||
if (!btf_type_is_ptr(t))
|
||||
return false;
|
||||
|
||||
t = btf_type_skip_modifiers(btf, t->type, NULL);
|
||||
|
||||
return btf_type_is_struct(t);
|
||||
}
|
||||
|
||||
#endif
|
||||
|
||||
@@ -283,5 +283,6 @@ extern u32 btf_tracing_ids[];
|
||||
extern u32 bpf_cgroup_btf_id[];
|
||||
extern u32 bpf_local_storage_map_btf_id[];
|
||||
extern u32 btf_bpf_map_id[];
|
||||
extern u32 bpf_kmem_cache_btf_id[];
|
||||
|
||||
#endif
|
||||
|
||||
@@ -1119,6 +1119,7 @@ bool bpf_jit_supports_exceptions(void);
|
||||
bool bpf_jit_supports_ptr_xchg(void);
|
||||
bool bpf_jit_supports_arena(void);
|
||||
bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena);
|
||||
bool bpf_jit_supports_private_stack(void);
|
||||
u64 bpf_arch_uaddress_limit(void);
|
||||
void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
|
||||
bool bpf_helper_changes_pkt_data(void *func);
|
||||
|
||||
@@ -1116,6 +1116,7 @@ enum bpf_attach_type {
|
||||
BPF_NETKIT_PRIMARY,
|
||||
BPF_NETKIT_PEER,
|
||||
BPF_TRACE_KPROBE_SESSION,
|
||||
BPF_TRACE_UPROBE_SESSION,
|
||||
__MAX_BPF_ATTACH_TYPE
|
||||
};
|
||||
|
||||
@@ -1973,6 +1974,8 @@ union bpf_attr {
|
||||
* program.
|
||||
* Return
|
||||
* The SMP id of the processor running the program.
|
||||
* Attributes
|
||||
* __bpf_fastcall
|
||||
*
|
||||
* long bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
|
||||
* Description
|
||||
@@ -3104,10 +3107,6 @@ union bpf_attr {
|
||||
* with the **CONFIG_BPF_KPROBE_OVERRIDE** configuration
|
||||
* option, and in this case it only works on functions tagged with
|
||||
* **ALLOW_ERROR_INJECTION** in the kernel code.
|
||||
*
|
||||
* Also, the helper is only available for the architectures having
|
||||
* the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing,
|
||||
* x86 architecture is the only one to support this feature.
|
||||
* Return
|
||||
* 0
|
||||
*
|
||||
@@ -5372,7 +5371,7 @@ union bpf_attr
 * Currently, the **flags** must be 0. Currently, nr_loops is
 * limited to 1 << 23 (~8 million) loops.
 *
 * long (\*callback_fn)(u32 index, void \*ctx);
 * long (\*callback_fn)(u64 index, void \*ctx);
 *
 * where **index** is the current index in the loop. The index
 * is zero-indexed.

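(Editor-added sketch for context, not part of this patch: a minimal BPF program
using bpf_loop() with the u64-index callback documented above; the section,
function, and variable names are illustrative only.)

	// SPDX-License-Identifier: GPL-2.0
	/* illustrative example, not from the kernel tree */
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct sum_ctx {
		__u64 sum;
	};

	/* Called once per iteration; return 0 to continue, 1 to stop early. */
	static long add_index(__u64 index, void *ctx)
	{
		struct sum_ctx *c = ctx;

		c->sum += index;
		return 0;
	}

	SEC("tp/syscalls/sys_enter_getpid")
	int sum_first_hundred(void *raw_ctx)
	{
		struct sum_ctx c = { .sum = 0 };

		/* run the callback 100 times; flags must currently be 0 */
		bpf_loop(100, add_index, &c, 0);
		return 0;
	}

	char LICENSE[] SEC("license") = "GPL";
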
@@ -16,7 +16,7 @@ obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
|
||||
obj-$(CONFIG_BPF_JIT) += trampoline.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o
|
||||
ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
|
||||
obj-$(CONFIG_BPF_SYSCALL) += arena.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o
|
||||
endif
|
||||
obj-$(CONFIG_BPF_JIT) += dispatcher.o
|
||||
ifeq ($(CONFIG_NET),y)
|
||||
@@ -52,3 +52,4 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
|
||||
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
|
||||
obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o
|
||||
|
||||
@@ -3,9 +3,11 @@
|
||||
#include <linux/bpf.h>
|
||||
#include <linux/btf.h>
|
||||
#include <linux/err.h>
|
||||
#include "linux/filter.h"
|
||||
#include <linux/btf_ids.h>
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/pagemap.h>
|
||||
#include "range_tree.h"
|
||||
|
||||
/*
|
||||
* bpf_arena is a sparsely populated shared memory region between bpf program and
|
||||
@@ -45,7 +47,7 @@ struct bpf_arena {
|
||||
u64 user_vm_start;
|
||||
u64 user_vm_end;
|
||||
struct vm_struct *kern_vm;
|
||||
struct maple_tree mt;
|
||||
struct range_tree rt;
|
||||
struct list_head vma_list;
|
||||
struct mutex lock;
|
||||
};
|
||||
@@ -98,6 +100,9 @@ static struct bpf_map *arena_map_alloc(union bpf_attr *attr)
|
||||
u64 vm_range;
|
||||
int err = -ENOMEM;
|
||||
|
||||
if (!bpf_jit_supports_arena())
|
||||
return ERR_PTR(-EOPNOTSUPP);
|
||||
|
||||
if (attr->key_size || attr->value_size || attr->max_entries == 0 ||
|
||||
/* BPF_F_MMAPABLE must be set */
|
||||
!(attr->map_flags & BPF_F_MMAPABLE) ||
|
||||
@@ -132,7 +137,8 @@ static struct bpf_map *arena_map_alloc(union bpf_attr *attr)
|
||||
|
||||
INIT_LIST_HEAD(&arena->vma_list);
|
||||
bpf_map_init_from_attr(&arena->map, attr);
|
||||
mt_init_flags(&arena->mt, MT_FLAGS_ALLOC_RANGE);
|
||||
range_tree_init(&arena->rt);
|
||||
range_tree_set(&arena->rt, 0, attr->max_entries);
|
||||
mutex_init(&arena->lock);
|
||||
|
||||
return &arena->map;
|
||||
@@ -183,7 +189,7 @@ static void arena_map_free(struct bpf_map *map)
|
||||
apply_to_existing_page_range(&init_mm, bpf_arena_get_kern_vm_start(arena),
|
||||
KERN_VM_SZ - GUARD_SZ, existing_page_cb, NULL);
|
||||
free_vm_area(arena->kern_vm);
|
||||
mtree_destroy(&arena->mt);
|
||||
range_tree_destroy(&arena->rt);
|
||||
bpf_map_area_free(arena);
|
||||
}
|
||||
|
||||
@@ -274,20 +280,20 @@ static vm_fault_t arena_vm_fault(struct vm_fault *vmf)
|
||||
/* User space requested to segfault when page is not allocated by bpf prog */
|
||||
return VM_FAULT_SIGSEGV;
|
||||
|
||||
ret = mtree_insert(&arena->mt, vmf->pgoff, MT_ENTRY, GFP_KERNEL);
|
||||
ret = range_tree_clear(&arena->rt, vmf->pgoff, 1);
|
||||
if (ret)
|
||||
return VM_FAULT_SIGSEGV;
|
||||
|
||||
/* Account into memcg of the process that created bpf_arena */
|
||||
ret = bpf_map_alloc_pages(map, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE, 1, &page);
|
||||
if (ret) {
|
||||
mtree_erase(&arena->mt, vmf->pgoff);
|
||||
range_tree_set(&arena->rt, vmf->pgoff, 1);
|
||||
return VM_FAULT_SIGSEGV;
|
||||
}
|
||||
|
||||
ret = vm_area_map_pages(arena->kern_vm, kaddr, kaddr + PAGE_SIZE, &page);
|
||||
if (ret) {
|
||||
mtree_erase(&arena->mt, vmf->pgoff);
|
||||
range_tree_set(&arena->rt, vmf->pgoff, 1);
|
||||
__free_page(page);
|
||||
return VM_FAULT_SIGSEGV;
|
||||
}
|
||||
@@ -444,12 +450,16 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
|
||||
|
||||
guard(mutex)(&arena->lock);
|
||||
|
||||
if (uaddr)
|
||||
ret = mtree_insert_range(&arena->mt, pgoff, pgoff + page_cnt - 1,
|
||||
MT_ENTRY, GFP_KERNEL);
|
||||
else
|
||||
ret = mtree_alloc_range(&arena->mt, &pgoff, MT_ENTRY,
|
||||
page_cnt, 0, page_cnt_max - 1, GFP_KERNEL);
|
||||
if (uaddr) {
|
||||
ret = is_range_tree_set(&arena->rt, pgoff, page_cnt);
|
||||
if (ret)
|
||||
goto out_free_pages;
|
||||
ret = range_tree_clear(&arena->rt, pgoff, page_cnt);
|
||||
} else {
|
||||
ret = pgoff = range_tree_find(&arena->rt, page_cnt);
|
||||
if (pgoff >= 0)
|
||||
ret = range_tree_clear(&arena->rt, pgoff, page_cnt);
|
||||
}
|
||||
if (ret)
|
||||
goto out_free_pages;
|
||||
|
||||
@@ -476,7 +486,7 @@ static long arena_alloc_pages(struct bpf_arena *arena, long uaddr, long page_cnt
|
||||
kvfree(pages);
|
||||
return clear_lo32(arena->user_vm_start) + uaddr32;
|
||||
out:
|
||||
mtree_erase(&arena->mt, pgoff);
|
||||
range_tree_set(&arena->rt, pgoff, page_cnt);
|
||||
out_free_pages:
|
||||
kvfree(pages);
|
||||
return 0;
|
||||
@@ -516,7 +526,7 @@ static void arena_free_pages(struct bpf_arena *arena, long uaddr, long page_cnt)
|
||||
|
||||
pgoff = compute_pgoff(arena, uaddr);
|
||||
/* clear range */
|
||||
mtree_store_range(&arena->mt, pgoff, pgoff + page_cnt - 1, NULL, GFP_KERNEL);
|
||||
range_tree_set(&arena->rt, pgoff, page_cnt);
|
||||
|
||||
if (page_cnt > 1)
|
||||
/* bulk zap if multiple pages being freed */
|
||||
|
||||
@@ -947,22 +947,44 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map,
|
||||
struct file *map_file, int fd)
|
||||
{
|
||||
struct bpf_prog *prog = bpf_prog_get(fd);
|
||||
bool is_extended;
|
||||
|
||||
if (IS_ERR(prog))
|
||||
return prog;
|
||||
|
||||
if (!bpf_prog_map_compatible(map, prog)) {
|
||||
if (prog->type == BPF_PROG_TYPE_EXT ||
|
||||
!bpf_prog_map_compatible(map, prog)) {
|
||||
bpf_prog_put(prog);
|
||||
return ERR_PTR(-EINVAL);
|
||||
}
|
||||
|
||||
mutex_lock(&prog->aux->ext_mutex);
|
||||
is_extended = prog->aux->is_extended;
|
||||
if (!is_extended)
|
||||
prog->aux->prog_array_member_cnt++;
|
||||
mutex_unlock(&prog->aux->ext_mutex);
|
||||
if (is_extended) {
|
||||
/* Extended prog can not be tail callee. It's to prevent a
|
||||
* potential infinite loop like:
|
||||
* tail callee prog entry -> tail callee prog subprog ->
|
||||
* freplace prog entry --tailcall-> tail callee prog entry.
|
||||
*/
|
||||
bpf_prog_put(prog);
|
||||
return ERR_PTR(-EBUSY);
|
||||
}
|
||||
|
||||
return prog;
|
||||
}
|
||||
|
||||
static void prog_fd_array_put_ptr(struct bpf_map *map, void *ptr, bool need_defer)
|
||||
{
|
||||
struct bpf_prog *prog = ptr;
|
||||
|
||||
mutex_lock(&prog->aux->ext_mutex);
|
||||
prog->aux->prog_array_member_cnt--;
|
||||
mutex_unlock(&prog->aux->ext_mutex);
|
||||
/* bpf_prog is freed after one RCU or tasks trace grace period */
|
||||
bpf_prog_put(ptr);
|
||||
bpf_prog_put(prog);
|
||||
}
|
||||
|
||||
static u32 prog_fd_array_sys_lookup_elem(void *ptr)
|
||||
|
||||
@@ -107,7 +107,7 @@ static long bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
|
||||
|
||||
bpf_cgrp_storage_lock();
|
||||
sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
|
||||
value, map_flags, GFP_ATOMIC);
|
||||
value, map_flags, false, GFP_ATOMIC);
|
||||
bpf_cgrp_storage_unlock();
|
||||
cgroup_put(cgroup);
|
||||
return PTR_ERR_OR_ZERO(sdata);
|
||||
@@ -181,7 +181,7 @@ BPF_CALL_5(bpf_cgrp_storage_get, struct bpf_map *, map, struct cgroup *, cgroup,
|
||||
if (!percpu_ref_is_dying(&cgroup->self.refcnt) &&
|
||||
(flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
|
||||
sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
|
||||
value, BPF_NOEXIST, gfp_flags);
|
||||
value, BPF_NOEXIST, false, gfp_flags);
|
||||
|
||||
unlock:
|
||||
bpf_cgrp_storage_unlock();
|
||||
|
||||
@@ -99,7 +99,7 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
|
||||
|
||||
sdata = bpf_local_storage_update(file_inode(fd_file(f)),
|
||||
(struct bpf_local_storage_map *)map,
|
||||
value, map_flags, GFP_ATOMIC);
|
||||
value, map_flags, false, GFP_ATOMIC);
|
||||
return PTR_ERR_OR_ZERO(sdata);
|
||||
}
|
||||
|
||||
@@ -153,7 +153,7 @@ BPF_CALL_5(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
|
||||
if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
|
||||
sdata = bpf_local_storage_update(
|
||||
inode, (struct bpf_local_storage_map *)map, value,
|
||||
BPF_NOEXIST, gfp_flags);
|
||||
BPF_NOEXIST, false, gfp_flags);
|
||||
return IS_ERR(sdata) ? (unsigned long)NULL :
|
||||
(unsigned long)sdata->data;
|
||||
}
|
||||
|
||||
@@ -73,7 +73,7 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
|
||||
|
||||
struct bpf_local_storage_elem *
|
||||
bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
|
||||
void *value, bool charge_mem, gfp_t gfp_flags)
|
||||
void *value, bool charge_mem, bool swap_uptrs, gfp_t gfp_flags)
|
||||
{
|
||||
struct bpf_local_storage_elem *selem;
|
||||
|
||||
@@ -99,9 +99,12 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
|
||||
}
|
||||
|
||||
if (selem) {
|
||||
if (value)
|
||||
if (value) {
|
||||
/* No need to call check_and_init_map_value as memory is zero init */
|
||||
copy_map_value(&smap->map, SDATA(selem)->data, value);
|
||||
/* No need to call check_and_init_map_value as memory is zero init */
|
||||
if (swap_uptrs)
|
||||
bpf_obj_swap_uptrs(smap->map.record, SDATA(selem)->data, value);
|
||||
}
|
||||
return selem;
|
||||
}
|
||||
|
||||
@@ -209,8 +212,12 @@ static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
|
||||
static void bpf_selem_free_rcu(struct rcu_head *rcu)
|
||||
{
|
||||
struct bpf_local_storage_elem *selem;
|
||||
struct bpf_local_storage_map *smap;
|
||||
|
||||
selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
|
||||
/* The bpf_local_storage_map_free will wait for rcu_barrier */
|
||||
smap = rcu_dereference_check(SDATA(selem)->smap, 1);
|
||||
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
|
||||
bpf_mem_cache_raw_free(selem);
|
||||
}
|
||||
|
||||
@@ -226,16 +233,25 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
|
||||
struct bpf_local_storage_map *smap,
|
||||
bool reuse_now)
|
||||
{
|
||||
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
|
||||
|
||||
if (!smap->bpf_ma) {
|
||||
/* Only task storage has uptrs and task storage
|
||||
* has moved to bpf_mem_alloc. Meaning smap->bpf_ma == true
|
||||
* for task storage, so this bpf_obj_free_fields() won't unpin
|
||||
* any uptr.
|
||||
*/
|
||||
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
|
||||
__bpf_selem_free(selem, reuse_now);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!reuse_now) {
|
||||
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
|
||||
} else {
|
||||
if (reuse_now) {
|
||||
/* reuse_now == true only happens when the storage owner
|
||||
* (e.g. task_struct) is being destructed or the map itself
|
||||
* is being destructed (ie map_free). In both cases,
|
||||
* no bpf prog can have a hold on the selem. It is
|
||||
* safe to unpin the uptrs and free the selem now.
|
||||
*/
|
||||
bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
|
||||
/* Instead of using the vanilla call_rcu(),
|
||||
* bpf_mem_cache_free will be able to reuse selem
|
||||
* immediately.
|
||||
@@ -243,6 +259,26 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
|
||||
migrate_disable();
|
||||
bpf_mem_cache_free(&smap->selem_ma, selem);
|
||||
migrate_enable();
|
||||
return;
|
||||
}
|
||||
|
||||
call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);
|
||||
}
|
||||
|
||||
static void bpf_selem_free_list(struct hlist_head *list, bool reuse_now)
|
||||
{
|
||||
struct bpf_local_storage_elem *selem;
|
||||
struct bpf_local_storage_map *smap;
|
||||
struct hlist_node *n;
|
||||
|
||||
/* The "_safe" iteration is needed.
|
||||
* The loop is not removing the selem from the list
|
||||
* but bpf_selem_free will use the selem->rcu_head
|
||||
* which is union-ized with the selem->free_node.
|
||||
*/
|
||||
hlist_for_each_entry_safe(selem, n, list, free_node) {
|
||||
smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
|
||||
bpf_selem_free(selem, smap, reuse_now);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -252,7 +288,7 @@ void bpf_selem_free(struct bpf_local_storage_elem *selem,
|
||||
*/
|
||||
static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
|
||||
struct bpf_local_storage_elem *selem,
|
||||
bool uncharge_mem, bool reuse_now)
|
||||
bool uncharge_mem, struct hlist_head *free_selem_list)
|
||||
{
|
||||
struct bpf_local_storage_map *smap;
|
||||
bool free_local_storage;
|
||||
@@ -296,7 +332,7 @@ static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_stor
|
||||
SDATA(selem))
|
||||
RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
|
||||
|
||||
bpf_selem_free(selem, smap, reuse_now);
|
||||
hlist_add_head(&selem->free_node, free_selem_list);
|
||||
|
||||
if (rcu_access_pointer(local_storage->smap) == smap)
|
||||
RCU_INIT_POINTER(local_storage->smap, NULL);
|
||||
@@ -345,6 +381,7 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
|
||||
struct bpf_local_storage_map *storage_smap;
|
||||
struct bpf_local_storage *local_storage;
|
||||
bool bpf_ma, free_local_storage = false;
|
||||
HLIST_HEAD(selem_free_list);
|
||||
unsigned long flags;
|
||||
|
||||
if (unlikely(!selem_linked_to_storage_lockless(selem)))
|
||||
@@ -360,9 +397,11 @@ static void bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem,
|
||||
raw_spin_lock_irqsave(&local_storage->lock, flags);
|
||||
if (likely(selem_linked_to_storage(selem)))
|
||||
free_local_storage = bpf_selem_unlink_storage_nolock(
|
||||
local_storage, selem, true, reuse_now);
|
||||
local_storage, selem, true, &selem_free_list);
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
|
||||
bpf_selem_free_list(&selem_free_list, reuse_now);
|
||||
|
||||
if (free_local_storage)
|
||||
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, reuse_now);
|
||||
}
|
||||
@@ -524,11 +563,12 @@ uncharge:
|
||||
*/
|
||||
struct bpf_local_storage_data *
|
||||
bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
void *value, u64 map_flags, gfp_t gfp_flags)
|
||||
void *value, u64 map_flags, bool swap_uptrs, gfp_t gfp_flags)
|
||||
{
|
||||
struct bpf_local_storage_data *old_sdata = NULL;
|
||||
struct bpf_local_storage_elem *alloc_selem, *selem = NULL;
|
||||
struct bpf_local_storage *local_storage;
|
||||
HLIST_HEAD(old_selem_free_list);
|
||||
unsigned long flags;
|
||||
int err;
|
||||
|
||||
@@ -550,7 +590,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
if (err)
|
||||
return ERR_PTR(err);
|
||||
|
||||
selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
|
||||
selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
|
||||
if (!selem)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
@@ -584,7 +624,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
/* A lookup has just been done before and concluded a new selem is
|
||||
* needed. The chance of an unnecessary alloc is unlikely.
|
||||
*/
|
||||
alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, gfp_flags);
|
||||
alloc_selem = selem = bpf_selem_alloc(smap, owner, value, true, swap_uptrs, gfp_flags);
|
||||
if (!alloc_selem)
|
||||
return ERR_PTR(-ENOMEM);
|
||||
|
||||
@@ -624,11 +664,12 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
|
||||
if (old_sdata) {
|
||||
bpf_selem_unlink_map(SELEM(old_sdata));
|
||||
bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
|
||||
true, false);
|
||||
true, &old_selem_free_list);
|
||||
}
|
||||
|
||||
unlock:
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
bpf_selem_free_list(&old_selem_free_list, false);
|
||||
if (alloc_selem) {
|
||||
mem_uncharge(smap, owner, smap->elem_size);
|
||||
bpf_selem_free(alloc_selem, smap, true);
|
||||
@@ -706,6 +747,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
|
||||
struct bpf_local_storage_map *storage_smap;
|
||||
struct bpf_local_storage_elem *selem;
|
||||
bool bpf_ma, free_storage = false;
|
||||
HLIST_HEAD(free_selem_list);
|
||||
struct hlist_node *n;
|
||||
unsigned long flags;
|
||||
|
||||
@@ -734,10 +776,12 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
|
||||
* of the loop will set the free_cgroup_storage to true.
|
||||
*/
|
||||
free_storage = bpf_selem_unlink_storage_nolock(
|
||||
local_storage, selem, true, true);
|
||||
local_storage, selem, true, &free_selem_list);
|
||||
}
|
||||
raw_spin_unlock_irqrestore(&local_storage->lock, flags);
|
||||
|
||||
bpf_selem_free_list(&free_selem_list, true);
|
||||
|
||||
if (free_storage)
|
||||
bpf_local_storage_free(local_storage, storage_smap, bpf_ma, true);
|
||||
}
|
||||
@@ -883,6 +927,9 @@ void bpf_local_storage_map_free(struct bpf_map *map,
|
||||
synchronize_rcu();
|
||||
|
||||
if (smap->bpf_ma) {
|
||||
rcu_barrier_tasks_trace();
|
||||
if (!rcu_trace_implies_rcu_gp())
|
||||
rcu_barrier();
|
||||
bpf_mem_alloc_destroy(&smap->selem_ma);
|
||||
bpf_mem_alloc_destroy(&smap->storage_ma);
|
||||
}
|
||||
|
||||
@@ -23,7 +23,6 @@ struct bpf_struct_ops_value {

struct bpf_struct_ops_map {
	struct bpf_map map;
-	struct rcu_head rcu;
	const struct bpf_struct_ops_desc *st_ops_desc;
	/* protect map_update */
	struct mutex lock;
@@ -32,7 +31,9 @@ struct bpf_struct_ops_map {
	 * (in kvalue.data).
	 */
	struct bpf_link **links;
-	u32 links_cnt;
+	/* ksyms for bpf trampolines */
+	struct bpf_ksym **ksyms;
+	u32 funcs_cnt;
	u32 image_pages_cnt;
	/* image_pages is an array of pages that has all the trampolines
	 * that stores the func args before calling the bpf_prog.
@@ -481,11 +482,11 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map)
{
	u32 i;

-	for (i = 0; i < st_map->links_cnt; i++) {
-		if (st_map->links[i]) {
-			bpf_link_put(st_map->links[i]);
-			st_map->links[i] = NULL;
-		}
+	for (i = 0; i < st_map->funcs_cnt; i++) {
+		if (!st_map->links[i])
+			break;
+		bpf_link_put(st_map->links[i]);
+		st_map->links[i] = NULL;
	}
}

@@ -586,6 +587,49 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks,
	return 0;
}

+static void bpf_struct_ops_ksym_init(const char *tname, const char *mname,
+				     void *image, unsigned int size,
+				     struct bpf_ksym *ksym)
+{
+	snprintf(ksym->name, KSYM_NAME_LEN, "bpf__%s_%s", tname, mname);
+	INIT_LIST_HEAD_RCU(&ksym->lnode);
+	bpf_image_ksym_init(image, size, ksym);
+}
+
+static void bpf_struct_ops_map_add_ksyms(struct bpf_struct_ops_map *st_map)
+{
+	u32 i;
+
+	for (i = 0; i < st_map->funcs_cnt; i++) {
+		if (!st_map->ksyms[i])
+			break;
+		bpf_image_ksym_add(st_map->ksyms[i]);
+	}
+}
+
+static void bpf_struct_ops_map_del_ksyms(struct bpf_struct_ops_map *st_map)
+{
+	u32 i;
+
+	for (i = 0; i < st_map->funcs_cnt; i++) {
+		if (!st_map->ksyms[i])
+			break;
+		bpf_image_ksym_del(st_map->ksyms[i]);
+	}
+}
+
+static void bpf_struct_ops_map_free_ksyms(struct bpf_struct_ops_map *st_map)
+{
+	u32 i;
+
+	for (i = 0; i < st_map->funcs_cnt; i++) {
+		if (!st_map->ksyms[i])
+			break;
+		kfree(st_map->ksyms[i]);
+		st_map->ksyms[i] = NULL;
+	}
+}
+
static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
					   void *value, u64 flags)
{
@@ -601,6 +645,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
	int prog_fd, err;
	u32 i, trampoline_start, image_off = 0;
	void *cur_image = NULL, *image = NULL;
+	struct bpf_link **plink;
+	struct bpf_ksym **pksym;
+	const char *tname, *mname;

	if (flags)
		return -EINVAL;
@@ -639,14 +686,19 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
	udata = &uvalue->data;
	kdata = &kvalue->data;

+	plink = st_map->links;
+	pksym = st_map->ksyms;
+	tname = btf_name_by_offset(st_map->btf, t->name_off);
	module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]);
	for_each_member(i, t, member) {
		const struct btf_type *mtype, *ptype;
		struct bpf_prog *prog;
		struct bpf_tramp_link *link;
+		struct bpf_ksym *ksym;
		u32 moff;

		moff = __btf_member_bit_offset(t, member) / 8;
+		mname = btf_name_by_offset(st_map->btf, member->name_off);
		ptype = btf_type_resolve_ptr(st_map->btf, member->type, NULL);
		if (ptype == module_type) {
			if (*(void **)(udata + moff))
@@ -714,7 +766,14 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
		}
		bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS,
			      &bpf_struct_ops_link_lops, prog);
-		st_map->links[i] = &link->link;
+		*plink++ = &link->link;

+		ksym = kzalloc(sizeof(*ksym), GFP_USER);
+		if (!ksym) {
+			err = -ENOMEM;
+			goto reset_unlock;
+		}
+		*pksym++ = ksym;

		trampoline_start = image_off;
		err = bpf_struct_ops_prepare_trampoline(tlinks, link,
@@ -735,6 +794,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,

		/* put prog_id to udata */
		*(unsigned long *)(udata + moff) = prog->aux->id;

+		/* init ksym for this trampoline */
+		bpf_struct_ops_ksym_init(tname, mname,
+					 image + trampoline_start,
+					 image_off - trampoline_start,
+					 ksym);
	}

	if (st_ops->validate) {
@@ -783,6 +848,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
	 */

reset_unlock:
+	bpf_struct_ops_map_free_ksyms(st_map);
	bpf_struct_ops_map_free_image(st_map);
	bpf_struct_ops_map_put_progs(st_map);
	memset(uvalue, 0, map->value_size);
@@ -790,6 +856,8 @@ reset_unlock:
unlock:
	kfree(tlinks);
	mutex_unlock(&st_map->lock);
+	if (!err)
+		bpf_struct_ops_map_add_ksyms(st_map);
	return err;
}

@@ -849,7 +917,10 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map)

	if (st_map->links)
		bpf_struct_ops_map_put_progs(st_map);
+	if (st_map->ksyms)
+		bpf_struct_ops_map_free_ksyms(st_map);
	bpf_map_area_free(st_map->links);
+	bpf_map_area_free(st_map->ksyms);
	bpf_struct_ops_map_free_image(st_map);
	bpf_map_area_free(st_map->uvalue);
	bpf_map_area_free(st_map);
@@ -866,6 +937,8 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
	if (btf_is_module(st_map->btf))
		module_put(st_map->st_ops_desc->st_ops->owner);

+	bpf_struct_ops_map_del_ksyms(st_map);
+
	/* The struct_ops's function may switch to another struct_ops.
	 *
	 * For example, bpf_tcp_cc_x->init() may switch to
@@ -895,6 +968,19 @@ static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr)
	return 0;
}

+static u32 count_func_ptrs(const struct btf *btf, const struct btf_type *t)
+{
+	int i;
+	u32 count;
+	const struct btf_member *member;
+
+	count = 0;
+	for_each_member(i, t, member)
+		if (btf_type_resolve_func_ptr(btf, member->type, NULL))
+			count++;
+	return count;
+}
+
static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
{
	const struct bpf_struct_ops_desc *st_ops_desc;
@@ -961,11 +1047,15 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
	map = &st_map->map;

	st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE);
-	st_map->links_cnt = btf_type_vlen(t);
+	st_map->funcs_cnt = count_func_ptrs(btf, t);
	st_map->links =
-		bpf_map_area_alloc(st_map->links_cnt * sizeof(struct bpf_links *),
+		bpf_map_area_alloc(st_map->funcs_cnt * sizeof(struct bpf_link *),
				   NUMA_NO_NODE);
-	if (!st_map->uvalue || !st_map->links) {
+
+	st_map->ksyms =
+		bpf_map_area_alloc(st_map->funcs_cnt * sizeof(struct bpf_ksym *),
+				   NUMA_NO_NODE);
+	if (!st_map->uvalue || !st_map->links || !st_map->ksyms) {
		ret = -ENOMEM;
		goto errout_free;
	}
@@ -994,7 +1084,8 @@ static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map)
	usage = sizeof(*st_map) +
		vt->size - sizeof(struct bpf_struct_ops_value);
	usage += vt->size;
-	usage += btf_type_vlen(vt) * sizeof(struct bpf_links *);
+	usage += st_map->funcs_cnt * sizeof(struct bpf_link *);
+	usage += st_map->funcs_cnt * sizeof(struct bpf_ksym *);
	usage += PAGE_SIZE;
	return usage;
}
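The net effect of the struct_ops hunks above is that every trampoline generated for a struct_ops member gets its own kernel symbol, named "bpf__<struct_ops type>_<member>" by bpf_struct_ops_ksym_init(), added when the map update succeeds and removed when the map is freed. As a rough illustration, below is a sketch of a minimal tcp_congestion_ops struct_ops; it is not a complete or loadable congestion control, the names are hypothetical, and vmlinux.h plus libbpf's bpf_helpers.h/bpf_tracing.h are assumed:

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("struct_ops/sketch_ssthresh")
__u32 BPF_PROG(sketch_ssthresh, struct sock *sk)
{
	struct tcp_sock *tp = (struct tcp_sock *)sk;
	__u32 cwnd = tp->snd_cwnd >> 1;	/* halve the window, floor of 2 */

	return cwnd > 2 ? cwnd : 2;
}

SEC("struct_ops/sketch_undo_cwnd")
__u32 BPF_PROG(sketch_undo_cwnd, struct sock *sk)
{
	struct tcp_sock *tp = (struct tcp_sock *)sk;

	return tp->snd_cwnd > tp->prior_cwnd ? tp->snd_cwnd : tp->prior_cwnd;
}

SEC(".struct_ops.link")
struct tcp_congestion_ops sketch_cc = {
	.ssthresh  = (void *)sketch_ssthresh,
	.undo_cwnd = (void *)sketch_undo_cwnd,
	.name      = "sketch_cc",
};

Once such a map is attached, symbols like bpf__tcp_congestion_ops_ssthresh would be expected to appear in /proc/kallsyms and in perf stack traces that pass through the trampolines, per the snprintf() format in the hunk above.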
@@ -128,6 +128,9 @@ static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
	struct pid *pid;
	int fd, err;

+	if ((map_flags & BPF_F_LOCK) && btf_record_has_field(map->record, BPF_UPTR))
+		return -EOPNOTSUPP;
+
	fd = *(int *)key;
	pid = pidfd_get_pid(fd, &f_flags);
	if (IS_ERR(pid))
@@ -146,7 +149,7 @@ static long bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
	bpf_task_storage_lock();
	sdata = bpf_local_storage_update(
		task, (struct bpf_local_storage_map *)map, value, map_flags,
-		GFP_ATOMIC);
+		true, GFP_ATOMIC);
	bpf_task_storage_unlock();

	err = PTR_ERR_OR_ZERO(sdata);
@@ -218,7 +221,7 @@ static void *__bpf_task_storage_get(struct bpf_map *map,
	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) && nobusy) {
		sdata = bpf_local_storage_update(
			task, (struct bpf_local_storage_map *)map, value,
-			BPF_NOEXIST, gfp_flags);
+			BPF_NOEXIST, false, gfp_flags);
		return IS_ERR(sdata) ? NULL : sdata->data;
	}
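These task-storage hunks, together with the btf.c changes below, wire up the new user-pointer ("uptr") field type: a task local storage value may embed a pointer tagged with the "uptr" BTF type tag, the referenced user memory (at most one page, never a kernel struct) is pinned when the element is updated from the syscall path, and BPF_F_LOCK updates are rejected for such maps. A hedged sketch of what the BPF side might look like; the struct and map names are made up, and the __uptr macro is defined locally in case the toolchain's bpf_helpers.h does not provide it:

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#ifndef __uptr
#define __uptr __attribute__((btf_type_tag("uptr")))
#endif

char _license[] SEC("license") = "GPL";

/* user-space object; must fit in one page and must not be a kernel struct */
struct user_counters {
	__u64 hits;
};

struct value {
	struct user_counters __uptr *counters;	/* pinned by the kernel on update */
};

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, struct value);
} task_store SEC(".maps");

SEC("tp_btf/sys_enter")
int BPF_PROG(count_syscalls)
{
	struct task_struct *task = bpf_get_current_task_btf();
	struct value *v;

	v = bpf_task_storage_get(&task_store, task, 0, 0);
	if (v && v->counters)
		v->counters->hits++;	/* store straight into the pinned user page */
	return 0;
}

The userspace side would create the element with bpf_map_update_elem(), placing a pointer to its own struct user_counters in the value; the kernel is assumed to pin that page at update time, based on the swap_uptrs argument threaded through bpf_local_storage_update() above.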
@@ -2808,7 +2808,7 @@ static void btf_ref_type_log(struct btf_verifier_env *env,
	btf_verifier_log(env, "type_id=%u", t->type);
}

-static struct btf_kind_operations modifier_ops = {
+static const struct btf_kind_operations modifier_ops = {
	.check_meta = btf_ref_type_check_meta,
	.resolve = btf_modifier_resolve,
	.check_member = btf_modifier_check_member,
@@ -2817,7 +2817,7 @@ static struct btf_kind_operations modifier_ops = {
	.show = btf_modifier_show,
};

-static struct btf_kind_operations ptr_ops = {
+static const struct btf_kind_operations ptr_ops = {
	.check_meta = btf_ref_type_check_meta,
	.resolve = btf_ptr_resolve,
	.check_member = btf_ptr_check_member,
@@ -2858,7 +2858,7 @@ static void btf_fwd_type_log(struct btf_verifier_env *env,
	btf_verifier_log(env, "%s", btf_type_kflag(t) ? "union" : "struct");
}

-static struct btf_kind_operations fwd_ops = {
+static const struct btf_kind_operations fwd_ops = {
	.check_meta = btf_fwd_check_meta,
	.resolve = btf_df_resolve,
	.check_member = btf_df_check_member,
@@ -3109,7 +3109,7 @@ static void btf_array_show(const struct btf *btf, const struct btf_type *t,
	__btf_array_show(btf, t, type_id, data, bits_offset, show);
}

-static struct btf_kind_operations array_ops = {
+static const struct btf_kind_operations array_ops = {
	.check_meta = btf_array_check_meta,
	.resolve = btf_array_resolve,
	.check_member = btf_array_check_member,
@@ -3334,7 +3334,7 @@ static int btf_find_struct(const struct btf *btf, const struct btf_type *t,
}

static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
-			 u32 off, int sz, struct btf_field_info *info)
+			 u32 off, int sz, struct btf_field_info *info, u32 field_mask)
{
	enum btf_field_type type;
	u32 res_id;
@@ -3358,9 +3358,14 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
		type = BPF_KPTR_REF;
	else if (!strcmp("percpu_kptr", __btf_name_by_offset(btf, t->name_off)))
		type = BPF_KPTR_PERCPU;
+	else if (!strcmp("uptr", __btf_name_by_offset(btf, t->name_off)))
+		type = BPF_UPTR;
	else
		return -EINVAL;

+	if (!(type & field_mask))
+		return BTF_FIELD_IGNORE;
+
	/* Get the base type */
	t = btf_type_skip_modifiers(btf, t->type, &res_id);
	/* Only pointer to struct is allowed */
@@ -3502,7 +3507,7 @@ static int btf_get_field_type(const struct btf *btf, const struct btf_type *var_
		field_mask_test_name(BPF_REFCOUNT, "bpf_refcount");

	/* Only return BPF_KPTR when all other types with matchable names fail */
-	if (field_mask & BPF_KPTR && !__btf_type_is_struct(var_type)) {
+	if (field_mask & (BPF_KPTR | BPF_UPTR) && !__btf_type_is_struct(var_type)) {
		type = BPF_KPTR_REF;
		goto end;
	}
@@ -3535,6 +3540,7 @@ static int btf_repeat_fields(struct btf_field_info *info, int info_cnt,
	case BPF_KPTR_UNREF:
	case BPF_KPTR_REF:
	case BPF_KPTR_PERCPU:
+	case BPF_UPTR:
	case BPF_LIST_HEAD:
	case BPF_RB_ROOT:
		break;
@@ -3667,8 +3673,9 @@ static int btf_find_field_one(const struct btf *btf,
	case BPF_KPTR_UNREF:
	case BPF_KPTR_REF:
	case BPF_KPTR_PERCPU:
+	case BPF_UPTR:
		ret = btf_find_kptr(btf, var_type, off, sz,
-				    info_cnt ? &info[0] : &tmp);
+				    info_cnt ? &info[0] : &tmp, field_mask);
		if (ret < 0)
			return ret;
		break;
@@ -3991,6 +3998,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
	case BPF_KPTR_UNREF:
	case BPF_KPTR_REF:
	case BPF_KPTR_PERCPU:
+	case BPF_UPTR:
		ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
		if (ret < 0)
			goto end;
@@ -4050,12 +4058,28 @@ int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec)
	 * Hence we only need to ensure that bpf_{list_head,rb_root} ownership
	 * does not form cycles.
	 */
-	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & BPF_GRAPH_ROOT))
+	if (IS_ERR_OR_NULL(rec) || !(rec->field_mask & (BPF_GRAPH_ROOT | BPF_UPTR)))
		return 0;
	for (i = 0; i < rec->cnt; i++) {
		struct btf_struct_meta *meta;
		const struct btf_type *t;
		u32 btf_id;

+		if (rec->fields[i].type == BPF_UPTR) {
+			/* The uptr only supports pinning one page and cannot
+			 * point to a kernel struct
+			 */
+			if (btf_is_kernel(rec->fields[i].kptr.btf))
+				return -EINVAL;
+			t = btf_type_by_id(rec->fields[i].kptr.btf,
+					   rec->fields[i].kptr.btf_id);
+			if (!t->size)
+				return -EINVAL;
+			if (t->size > PAGE_SIZE)
+				return -E2BIG;
+			continue;
+		}
+
		if (!(rec->fields[i].type & BPF_GRAPH_ROOT))
			continue;
		btf_id = rec->fields[i].graph_root.value_btf_id;
@@ -4191,7 +4215,7 @@ static void btf_struct_show(const struct btf *btf, const struct btf_type *t,
	__btf_struct_show(btf, t, type_id, data, bits_offset, show);
}

-static struct btf_kind_operations struct_ops = {
+static const struct btf_kind_operations struct_ops = {
	.check_meta = btf_struct_check_meta,
	.resolve = btf_struct_resolve,
	.check_member = btf_struct_check_member,
@@ -4359,7 +4383,7 @@ static void btf_enum_show(const struct btf *btf, const struct btf_type *t,
	btf_show_end_type(show);
}

-static struct btf_kind_operations enum_ops = {
+static const struct btf_kind_operations enum_ops = {
	.check_meta = btf_enum_check_meta,
	.resolve = btf_df_resolve,
	.check_member = btf_enum_check_member,
@@ -4462,7 +4486,7 @@ static void btf_enum64_show(const struct btf *btf, const struct btf_type *t,
	btf_show_end_type(show);
}

-static struct btf_kind_operations enum64_ops = {
+static const struct btf_kind_operations enum64_ops = {
	.check_meta = btf_enum64_check_meta,
	.resolve = btf_df_resolve,
	.check_member = btf_enum_check_member,
@@ -4540,7 +4564,7 @@ done:
	btf_verifier_log(env, ")");
}

-static struct btf_kind_operations func_proto_ops = {
+static const struct btf_kind_operations func_proto_ops = {
	.check_meta = btf_func_proto_check_meta,
	.resolve = btf_df_resolve,
	/*
@@ -4598,7 +4622,7 @@ static int btf_func_resolve(struct btf_verifier_env *env,
	return 0;
}

-static struct btf_kind_operations func_ops = {
+static const struct btf_kind_operations func_ops = {
	.check_meta = btf_func_check_meta,
	.resolve = btf_func_resolve,
	.check_member = btf_df_check_member,
@@ -5566,7 +5590,7 @@ btf_parse_struct_metas(struct bpf_verifier_log *log, struct btf *btf)
		goto free_aof;
	}

-	ret = btf_find_kptr(btf, t, 0, 0, &tmp);
+	ret = btf_find_kptr(btf, t, 0, 0, &tmp, BPF_KPTR);
	if (ret != BTF_FIELD_FOUND)
		continue;

@@ -6564,7 +6588,10 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
	if (prog_args_trusted(prog))
		info->reg_type |= PTR_TRUSTED;

-	if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
+	/* Raw tracepoint arguments always get marked as maybe NULL */
+	if (bpf_prog_is_raw_tp(prog))
+		info->reg_type |= PTR_MAYBE_NULL;
+	else if (btf_param_match_suffix(btf, &args[arg], "__nullable"))
		info->reg_type |= PTR_MAYBE_NULL;

	if (tgt_prog) {
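The final hunk marks every raw tracepoint argument as PTR_MAYBE_NULL, so raw_tp and tp_btf programs now have to NULL-check pointer arguments before dereferencing them instead of relying on the verifier treating them as always valid. A small sketch of the resulting program-side pattern (the tracepoint choice and names are just an example, assuming vmlinux.h and libbpf headers):

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("tp_btf/sched_switch")
int BPF_PROG(on_switch, bool preempt, struct task_struct *prev,
	     struct task_struct *next)
{
	int next_pid;

	if (!next)		/* args may be NULL now; check before use */
		return 0;

	next_pid = next->pid;
	bpf_printk("switching to pid %d", next_pid);
	return 0;
}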
Some files were not shown because too many files have changed in this diff.