A longstanding issue with genksyms is that it has hidden syntax errors.
For example, genksyms fails to parse the following valid code:
int x, __attribute__((__section__(".init.data")))y;
Here, only 'y' is annotated by the attribute, although I am not aware
of actual uses of this pattern in the kernel tree.
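As an illustration of what this construct means to the compiler, here is
a minimal sketch. It swaps the section attribute for __aligned__ so that
the effect can be observed at run time. The helper names are hypothetical,
and the behavior shown assumes GCC or Clang attribute semantics.

```c
#include <stddef.h>

/* Only 'y' carries the attribute; 'x' keeps the default alignment of int. */
int x, __attribute__((__aligned__(16))) y;

/* Hypothetical helpers that report the alignment of each variable. */
size_t alignment_of_x(void) { return __alignof__(x); }
size_t alignment_of_y(void) { return __alignof__(y); }
```

On common targets, alignment_of_y() should report 16 while
alignment_of_x() reports the natural alignment of int.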
When a syntax error occurs, yyerror() is called. However,
error_with_pos() is a no-op unless the -w option is provided.
You can observe syntax errors by manually passing the -w option.
$ echo 'int x, __attribute__((__section__(".init.data")))y;' | scripts/genksyms/genksyms -w
<stdin>:1: syntax error
This commit allows attributes to be placed between a comma and
init_declarator.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
A longstanding issue with genksyms is that it has hidden syntax errors.
When a syntax error occurs, yyerror() is called. However,
error_with_pos() is a no-op unless the -w option is provided.
You can observe syntax errors by manually passing the -w option.
For example, genksyms fails to parse the following code in
arch/arm64/lib/xor-neon.c:
static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
{
[ snip ]
}
The syntax error occurs because genksyms does not recognize the
uint64x2_t keyword.
This commit adds support for builtin types described in Arm Neon
Intrinsics Reference.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
A longstanding issue with genksyms is that it has hidden syntax errors.
When a syntax error occurs, yyerror() is called. However,
error_with_pos() is a no-op unless the -w option is provided.
You can observe syntax errors by manually passing the -w option.
For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:
$ make -s KCFLAGS=-D__GENKSYMS__ fs/lockd/svc.i
$ cat fs/lockd/svc.i | scripts/genksyms/genksyms -w
[ snip ]
./include/net/addrconf.h:35: syntax error
The syntax error occurs in the following code in include/net/addrconf.h:
union __packed {
[ snip ]
};
The issue arises from __packed, which is defined as
__attribute__((__packed__)), immediately after the 'union' keyword.
This commit allows the 'union' keyword to be followed by attributes.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
A longstanding issue with genksyms is that it has hidden syntax errors.
When a syntax error occurs, yyerror() is called. However,
error_with_pos() is a no-op unless the -w option is provided.
You can observe syntax errors by manually passing the -w option.
For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:
$ make -s KCFLAGS=-D__GENKSYMS__ arch/x86/kernel/cpu/mshyperv.i
$ cat arch/x86/kernel/cpu/mshyperv.i | scripts/genksyms/genksyms -w
[ snip ]
./arch/x86/include/asm/svm.h:122: syntax error
The syntax error occurs in the following code in arch/x86/include/asm/svm.h:
struct __attribute__ ((__packed__)) vmcb_control_area {
[ snip ]
};
The issue arises from __attribute__ immediately after the 'struct'
keyword.
This commit allows the 'struct' keyword to be followed by attributes.
The lexer must be adjusted because dont_want_brace_phase should not be
decremented while processing attributes.
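For reference, a minimal sketch of the syntax in question, using a
hypothetical record type rather than vmcb_control_area. It assumes GCC or
Clang, where an attribute placed right after 'struct' applies to the type.

```c
#include <stddef.h>

/* Hypothetical record type; the attribute sits right after 'struct'. */
struct __attribute__((__packed__)) packed_record {
	char tag;
	int value;
};

/* The same members without the attribute, for comparison. */
struct plain_record {
	char tag;
	int value;
};

size_t packed_record_size(void) { return sizeof(struct packed_record); }
size_t plain_record_size(void) { return sizeof(struct plain_record); }
```

The packed variant has no padding between 'tag' and 'value', so its size
is the sum of the member sizes.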
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
A longstanding issue with genksyms is that it has hidden syntax errors.
When a syntax error occurs, yyerror() is called. However,
error_with_pos() is a no-op unless the -w option is provided.
You can observe syntax errors by manually passing the -w option.
For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:
$ make -s KCFLAGS=-D__GENKSYMS__ kernel/module/main.i
$ cat kernel/module/main.i | scripts/genksyms/genksyms -w
[ snip ]
kernel/module/main.c:97: syntax error
The syntax error occurs in the following code in kernel/module/main.c:
static void __mod_update_bounds(enum mod_mem_type type __maybe_unused, void *base,
unsigned int size, struct mod_tree_root *tree)
{
[ snip ]
}
The issue arises from __maybe_unused, which is defined as
__attribute__((__unused__)).
This commit allows direct_abstract_declarator to be followed by
attributes.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
A longstanding issue with genksyms is that it has hidden syntax errors.
When a syntax error occurs, yyerror() is called. However,
error_with_pos() is a no-op unless the -w option is provided.
You can observe syntax errors by manually passing the -w option.
For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:
$ make -s KCFLAGS=-D__GENKSYMS__ drivers/acpi/prmt.i
$ cat drivers/acpi/prmt.i | scripts/genksyms/genksyms -w
[ snip ]
drivers/acpi/prmt.c:56: syntax error
The syntax error occurs in the following code in drivers/acpi/prmt.c:
struct prm_handler_info {
[ snip ]
efi_status_t (__efiapi *handler_addr)(u64, void *);
[ snip ]
};
The issue arises from __efiapi, which is defined as either
__attribute__((ms_abi)) or __attribute__((regparm(0))).
This commit allows nested_declarator to be prefixed with attributes.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
A longstanding issue with genksyms is that it has hidden syntax errors.
When a syntax error occurs, yyerror() is called. However,
error_with_pos() is a no-op unless the -w option is provided.
You can observe syntax errors by manually passing the -w option.
For example, with CONFIG_MODVERSIONS=y on v6.13-rc1:
$ make -s KCFLAGS=-D__GENKSYMS__ init/main.i
$ cat init/main.i | scripts/genksyms/genksyms -w
[ snip ]
./include/linux/efi.h:1225: syntax error
The syntax error occurs in the following code in include/linux/efi.h:
efi_status_t
efi_call_acpi_prm_handler(efi_status_t (__efiapi *handler_addr)(u64, void *),
u64 param_buffer_addr, void *context);
The issue arises from __efiapi, which is defined as either
__attribute__((ms_abi)) or __attribute__((regparm(0))).
This commit allows abstract_declarator to be prefixed with attributes.
To avoid conflicts, I tweaked the rule for decl_specifier_seq. Due to
this change, a standalone attribute cannot become decl_specifier_seq.
Otherwise, I do not know how to resolve the conflicts.
The following code, which was previously accepted by genksyms, will now
result in a syntax error:
void my_func(__attribute__((unused))x);
I do not think it is a big deal because GCC also fails to parse it.
$ echo 'void my_func(__attribute__((unused))x);' | gcc -c -x c -
<stdin>:1:37: error: unknown type name 'x'
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
The __attribute__ keyword can appear in more contexts than 'const' or
'volatile'.
To avoid grammatical conflicts with future changes, ATTRIBUTE_PHRASE
should not be reduced into type_qualifier.
No functional changes are intended.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
I believe the missing action here is a bug.
For rules with no explicit action, the following default is used:
{ $$ = $1; }
However, in this case, $1 is the value of attribute_opt itself. As a
result, the value of attribute_opt is always NULL.
The following test code demonstrates inconsistent behavior.
int x __attribute__((__aligned__(4)));
int y __attribute__((__aligned__(4))) = 0;
The attribute is recorded only when followed by an initializer.
This commit adds the correct action to propagate the value of the
ATTRIBUTE_PHRASE token.
With this change, the attribute in the example above is consistently
recorded for both 'x' and 'y'.
[Before]
$ cat <<EOF | scripts/genksyms/genksyms -d
int x __attribute__((__aligned__(4)));
int y __attribute__((__aligned__(4))) = 0;
EOF
Defn for type0 x == <int x >
Defn for type0 y == <int y __attribute__ ( ( __aligned__ ( 4 ) ) ) >
Hash table occupancy 2/4096 = 0.000488281
[After]
$ cat <<EOF | scripts/genksyms/genksyms -d
int x __attribute__((__aligned__(4)));
int y __attribute__((__aligned__(4))) = 0;
EOF
Defn for type0 x == <int x __attribute__ ( ( __aligned__ ( 4 ) ) ) >
Defn for type0 y == <int y __attribute__ ( ( __aligned__ ( 4 ) ) ) >
Hash table occupancy 2/4096 = 0.000488281
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
Similar to the previous commit, this change makes the parser logic a
little more accurate.
Currently, genksyms accepts the following invalid code:
struct foo {
int (*callback)(int)(int)(int);
};
A direct-declarator should not recursively absorb multiple
( parameter-type-list ) constructs.
In the example above, (*callback) should be followed by at most one
(int).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
While there is no more grammatical ambiguity in genksyms, the parser
logic is still inaccurate.
For example, genksyms accepts the following invalid C code:
void my_func(int ()(int));
This should result in a syntax error because () cannot be reduced to
<direct-abstract-declarator>.
( <abstract-declarator> ) can be reduced, but <abstract-declarator>
must not be empty in the following grammar from K&R [1]:
    <direct-abstract-declarator> ::= ( <abstract-declarator> )
                                   | {<direct-abstract-declarator>}? [ {<constant-expression>}? ]
                                   | {<direct-abstract-declarator>}? ( {<parameter-type-list>}? )
Furthermore, genksyms accepts the following weird code:
void my_func(int (*callback)(int)(int)(int));
The parser allows <direct-abstract-declarator> to recursively absorb
multiple ( {<parameter-type-list>}? ), but this behavior is incorrect.
In the example above, (*callback) should be followed by at most one
(int).
[1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
This workaround was introduced for suppressing the reduce/reduce conflict
warnings because the %expect-rr directive, which is applicable only to GLR
parsers, cannot be used for genksyms.
Since there are no longer any conflicts, this Makefile hack is now
unnecessary.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
The genksyms parser has ambiguities in its grammar, which are currently
suppressed by a workaround in scripts/genksyms/Makefile.
Building genksyms with W=1 generates the following warnings:
YACC scripts/genksyms/parse.tab.[ch]
scripts/genksyms/parse.y: warning: 3 shift/reduce conflicts [-Wconflicts-sr]
scripts/genksyms/parse.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
The ambiguity arises when decl_specifier_seq is followed by '(' because
the following two interpretations are possible:
- decl_specifier_seq direct_abstract_declarator '(' parameter_declaration_clause ')'
- decl_specifier_seq '(' abstract_declarator ')'
This issue occurs because the current parser allows an empty string to
be reduced to direct_abstract_declarator, which is incorrect.
K&R [1] explains the correct grammar:
    <parameter-declaration> ::= {<declaration-specifier>}+ <declarator>
                              | {<declaration-specifier>}+ <abstract-declarator>
                              | {<declaration-specifier>}+

    <abstract-declarator> ::= <pointer>
                            | <pointer> <direct-abstract-declarator>
                            | <direct-abstract-declarator>

    <direct-abstract-declarator> ::= ( <abstract-declarator> )
                                   | {<direct-abstract-declarator>}? [ {<constant-expression>}? ]
                                   | {<direct-abstract-declarator>}? ( {<parameter-type-list>}? )
This commit resolves all remaining conflicts.
We need to consider the difference between the following two examples:
[Example 1] ( <abstract-declarator> ) can become <direct-abstract-declarator>
void my_func(int (foo));
... is equivalent to:
void my_func(int foo);
[Example 2] ( <parameter-type-list> ) can become <direct-abstract-declarator>
typedef int foo;
void my_func(int (foo));
... is equivalent to:
void my_func(int (*callback)(int));
Please note that the function declaration is identical in both examples,
but the preceding typedef creates the distinction. I introduced a new
term, open_paren, to enable the type lookup immediately after the '('
token. Without this, we cannot distinguish between [Example 1] and
[Example 2].
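The distinction can be checked with a C compiler. The sketch below is only
an illustration with hypothetical function names: the same parameter
spelling, int (foo), is declared once before and once after the typedef,
and each definition only compiles if the prototype was parsed as
described above.

```c
/* Example 1: no typedef named 'foo' is visible yet, so (foo) is a
 * parenthesized parameter name and the parameter has type int. */
int takes_int(int (foo));

typedef int foo;

/* Example 2: 'foo' now names a type, so (foo) is a parameter list and
 * the parameter is adjusted to a function pointer, int (*)(int). */
int takes_callback(int (foo));

/* These definitions must be compatible with the prototypes above. */
int takes_int(int value)
{
	return value + 1;
}

int takes_callback(int (*callback)(int))
{
	return callback(41);
}
```

Passing takes_int as the callback, as in takes_callback(takes_int),
exercises both interpretations in one call.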
[1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
The genksyms parser has ambiguities in its grammar, which are currently
suppressed by a workaround in scripts/genksyms/Makefile.
Building genksyms with W=1 generates the following warnings:
YACC scripts/genksyms/parse.tab.[ch]
scripts/genksyms/parse.y: warning: 9 shift/reduce conflicts [-Wconflicts-sr]
scripts/genksyms/parse.y: warning: 5 reduce/reduce conflicts [-Wconflicts-rr]
scripts/genksyms/parse.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
The comment in the parser describes the current problem:
/* This wasn't really a typedef name but an identifier that
shadows one. */
Consider the following simple C code:
typedef int foo;
void my_func(foo foo) {}
In the function parameter list (foo foo), the first 'foo' is a type
specifier (typedef'ed as 'int'), while the second 'foo' is an identifier.
However, the lexer cannot distinguish between the two. Since 'foo' is
already typedef'ed, the lexer returns TYPE for both instances, instead
of returning IDENT for the second one.
To support shadowed identifiers, TYPE can be reduced to either a
simple_type_specifier or a direct_abstract_declarator, which creates
a grammatical ambiguity.
Without analyzing the grammar context, it is very difficult to resolve
this correctly.
This commit introduces a flag, dont_want_type_specifier, which allows
the parser to inform the lexer whether an identifier is expected. When
dont_want_type_specifier is true, the type lookup is suppressed, and
the lexer returns IDENT regardless of any preceding typedef.
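The idea can be sketched as follows. This is a simplified illustration,
not the genksyms lexer: the typedef table is reduced to a single hardcoded
name, classify_word() and classify_parameter() are hypothetical helpers,
and the point at which the flag is set (right after the first type
specifier) is an assumption made for the example.

```c
#include <stdbool.h>
#include <string.h>

enum token { IDENT, TYPE };

/* Hypothetical stand-in for the typedef table: only "foo" is registered. */
static bool is_typedef_name(const char *word)
{
	return strcmp(word, "foo") == 0;
}

/* Set by the parser when the grammar context calls for an identifier. */
static bool dont_want_type_specifier;

/* Look up the typedef table only when a type specifier is still wanted. */
static enum token classify_word(const char *word)
{
	if (!dont_want_type_specifier && is_typedef_name(word))
		return TYPE;
	return IDENT;
}

/* Classify the two words of a parameter such as "foo foo". */
static void classify_parameter(const char *first, const char *second,
			       enum token out[2])
{
	dont_want_type_specifier = false;
	out[0] = classify_word(first);
	/* Assumption: a type specifier has been consumed at this point. */
	dont_want_type_specifier = true;
	out[1] = classify_word(second);
	dont_want_type_specifier = false;
}
```

For the parameter (foo foo), the first word is classified as TYPE and the
second as IDENT, even though both spell a typedef'ed name.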
After this commit, only 3 shift/reduce conflicts will remain.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Nicolas Schier <n.schier@avm.de>
Currently, 'unsigned long' is used for intermediate variables when
calculating CRCs.
The size of 'long' differs depending on the architecture: it is 32 bits
on 32-bit architectures and 64 bits on 64-bit architectures.
The CRC values generated by genksyms represent the compatibility of
exported symbols. Therefore, reproducibility is important. In other
words, we need to ensure that the output is the same when the kernel
source is identical, regardless of whether genksyms is running on a
32-bit or 64-bit build machine.
Fortunately, the output from genksyms is not affected by the build
machine's architecture because only the lower 32 bits of the
'unsigned long' variables are used.
To make it even clearer that the CRC calculation is independent of
the build machine's architecture, this commit explicitly uses the
fixed-width type, uint32_t.
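The following sketch illustrates why the width of the intermediate
variable does not matter. It is not the genksyms implementation: it
assumes the standard reflected CRC-32 (polynomial 0xEDB88320) computed
bit by bit, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32 using fixed-width 32-bit arithmetic throughout. */
static uint32_t crc32_fixed(const char *data, size_t len)
{
	uint32_t crc = 0xffffffffu;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint8_t)data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
	}
	return crc ^ 0xffffffffu;
}

/* Same steps with a 64-bit intermediate, as 'unsigned long' would be on
 * a 64-bit host. Every operand is at most 32 bits wide and the only
 * shift is to the right, so the upper 32 bits stay zero. */
static uint64_t crc32_wide(const char *data, size_t len)
{
	uint64_t crc = 0xffffffffu;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint8_t)data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
	}
	return crc ^ 0xffffffffu;
}
```

Both functions return the same value for any input, which is why moving
to the fixed-width type documents the existing behavior rather than
changing it.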
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
free_list() must be called before returning from this for-loop.
Swap 'break' and the combination of free_list() and 'return'.
This reduces the code and minimizes the risk of introducing memory
leaks in future changes.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
To improve readability, reduce the indentation as follows:
- Use 'continue' earlier when the symbol does not match
- Flip !sym->is_declared to flatten the if-else chain
No functional changes are intended.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When a symbol that is already registered is read again from a *.symref
file, __add_symbol() removes the previous one from the hash table without
freeing it.
[Test Case]
$ cat foo.c
#include <linux/export.h>
void foo(void);
void foo(void) {}
EXPORT_SYMBOL(foo);
$ cat foo.symref
foo void foo ( void )
foo void foo ( void )
When a symbol is removed from the hash table, it must be freed along
with its ->name and ->defn members. However, sym->name cannot be freed
because it is sometimes shared with node->string, but not always. If
sym->name and node->string share the same memory, free(sym->name) could
lead to a double-free bug.
To resolve this issue, always assign a strdup'ed string to sym->name.
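The ownership rule can be sketched as below. This is a simplified
illustration with a hypothetical struct and helpers, not the genksyms
code; dup_string() stands in for strdup() so that the example depends
only on standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified symbol record. */
struct symbol {
	char *name;	/* always a private copy owned by the symbol */
};

/* Equivalent to POSIX strdup(): return a heap copy of 'text'. */
static char *dup_string(const char *text)
{
	size_t len = strlen(text) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, text, len);
	return copy;
}

/* Create a symbol that never shares 'name' with the caller. */
static struct symbol *symbol_new(const char *name)
{
	struct symbol *sym = malloc(sizeof(*sym));

	if (!sym)
		return NULL;
	sym->name = dup_string(name);
	if (!sym->name) {
		free(sym);
		return NULL;
	}
	return sym;
}

/* Safe to call unconditionally because the name is never shared. */
static void symbol_free(struct symbol *sym)
{
	if (!sym)
		return;
	free(sym->name);
	free(sym);
}
```

Because the symbol always owns its name, freeing it cannot invalidate a
string that another data structure still references.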
Fixes: 64e6c1e123 ("genksyms: track symbol checksum changes")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
When a symbol that is already registered is added again, __add_symbol()
returns without freeing the symbol definition, making it unreachable.
The following test cases demonstrate different memory leak points.
[Test Case 1]
Forward declaration with exactly the same definition
$ cat foo.c
#include <linux/export.h>
void foo(void);
void foo(void) {}
EXPORT_SYMBOL(foo);
[Test Case 2]
Forward declaration with a different definition (e.g. attribute)
$ cat foo.c
#include <linux/export.h>
void foo(void);
__attribute__((__section__(".ref.text"))) void foo(void) {}
EXPORT_SYMBOL(foo);
[Test Case 3]
Preserving an overridden symbol (compile with KBUILD_PRESERVE=1)
$ cat foo.c
#include <linux/export.h>
void foo(void);
void foo(void) { }
EXPORT_SYMBOL(foo);
$ cat foo.symref
override foo void foo ( int )
The memory leaks in Test Cases 1 and 2 have existed since the introduction
of genksyms into the kernel tree. [1]
The memory leak in Test Case 3 was introduced by commit 5dae9a550a
("genksyms: allow to ignore symbol checksum changes").
When multiple init_declarators are reduced to an init_declarator_list,
the decl_spec must be duplicated. Otherwise, the following Test Case 4
would result in a double-free bug.
[Test Case 4]
$ cat foo.c
#include <linux/export.h>
extern int foo, bar;
int foo, bar;
EXPORT_SYMBOL(foo);
In this case, 'foo' and 'bar' share the same decl_spec, 'int'. It must
be unshared before being passed to add_symbol().
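A minimal sketch of the unsharing step follows. The list type and helper
names are hypothetical simplifications and do not match the genksyms
definitions; the point is only that each consumer receives its own deep
copy, which it can free independently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified token list. */
struct node {
	struct node *next;
	char *text;
};

/* Free every node of a list together with its text. */
static void list_free(struct node *list)
{
	while (list) {
		struct node *next = list->next;

		free(list->text);
		free(list);
		list = next;
	}
}

/* Deep-copy a list so that the copy can be freed independently. */
static struct node *list_copy(const struct node *list)
{
	struct node *head = NULL, **tail = &head;

	for (; list; list = list->next) {
		size_t len = strlen(list->text) + 1;
		struct node *n = malloc(sizeof(*n));

		if (!n)
			goto fail;
		n->next = NULL;
		n->text = malloc(len);
		if (!n->text) {
			free(n);
			goto fail;
		}
		memcpy(n->text, list->text, len);
		*tail = n;
		tail = &n->next;
	}
	return head;
fail:
	list_free(head);
	return NULL;
}
```

With this pattern, freeing the list handed over for 'foo' leaves the copy
used for 'bar' intact.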
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=46bd1da672d66ccd8a639d3c1f8a166048cca608
Fixes: 5dae9a550a ("genksyms: allow to ignore symbol checksum changes")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Modify this function to return earlier when find_symbol() returns NULL,
reducing the indentation level to improve readability.
No functional changes are intended.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:
src := $(obj)
When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.
This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.
To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.
Going forward, the variables used in Makefiles will have the following
meanings:
  $(obj)     - directory in the object tree
  $(src)     - directory in the source tree (changed by this commit)
  $(objtree) - the top of the kernel object tree
  $(srctree) - the top of the kernel source tree
Consequently, $(srctree)/$(src) in upstream Makefiles needs to be replaced
with $(src).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>