Since CHANGEUPPER can now fail, add support for it in the newly
introduced netdev notifier error injection infrastructure.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module allows to insert errors in some of netdevice's notifier
events. All network drivers use these notifiers to signal various events
and to check if they are allowed, e.g. PRECHANGEMTU and CHANGEMTU
afterwards. Until recently I had to run failure tests by injecting
a custom module, but now this infrastructure makes it trivial to test
these failure paths. Some of the recent bugs I fixed were found using
this module.
Here's an example:
$ cd /sys/kernel/debug/notifier-error-inject/netdev
$ echo -22 > actions/NETDEV_CHANGEMTU/error
$ ip link set eth0 mtu 1024
RTNETLINK answers: Invalid argument
CC: Akinobu Mita <akinobu.mita@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev <netdev@vger.kernel.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is rather a hack to expose the current issue with rhashtable to
under high pressure sometimes return -ENOMEM even though system memory
is not exhausted and a consecutive insert may succeed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
A maximum table size of 64k entries is insufficient for the multiple
threads test even in default configuration (10 threads * 50000 objects =
500000 objects in total). Since we know how many objects will be
inserted, calculate the max size unless overridden by parameter.
Note that specifying the exact number of objects upon table init won't
suffice as that value is being rounded down to the next power of two -
anticipate this by rounding up to the next power of two in beforehand.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
After adding cond_resched() calls to threadfunc(), a surprisingly high
rate of insert failures occurred probably due to table resizes getting a
better chance to run in background. To not soften up the remaining
tests, retry inserts until they either succeed or fail permanently.
Also change the non-threaded test to retry insert operations, too.
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should fix for soft lockup bugs triggered on slow systems.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge final patch-bomb from Andrew Morton:
"Various leftovers, mainly Christoph's pci_dma_supported() removals"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
pci: remove pci_dma_supported
usbnet: remove ifdefed out call to dma_supported
kaweth: remove ifdefed out call to dma_supported
sfc: don't call dma_supported
nouveau: don't call pci_dma_supported
netup_unidvb: use pci_set_dma_mask insted of pci_dma_supported
cx23885: use pci_set_dma_mask insted of pci_dma_supported
cx25821: use pci_set_dma_mask insted of pci_dma_supported
cx88: use pci_set_dma_mask insted of pci_dma_supported
saa7134: use pci_set_dma_mask insted of pci_dma_supported
saa7164: use pci_set_dma_mask insted of pci_dma_supported
tw68-core: use pci_set_dma_mask insted of pci_dma_supported
pcnet32: use pci_set_dma_mask insted of pci_dma_supported
lib/string.c: add ULL suffix to the constant definition
hugetlb: trivial comment fix
selftests/mlock2: add ULL suffix to 64-bit constants
selftests/mlock2: add missing #define _GNU_SOURCE
Pull networking fixes from David Miller:
1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.
2) Several spots need to get to the original listner for SYN-ACK
packets, most spots got this ok but some were not. Whilst covering
the remaining cases, create a helper to do this. From Eric Dumazet.
3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
from Rasmus Villemoes.
4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.
5) Use after free in mvneta driver, from Justin Maggard.
6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.
7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd
Bergmann.
8) Fix multicast getsockopt deadlock, from WANG Cong.
9) Fix deadlock in btusb, from Kuba Pawlak.
10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
counter state. From Sabrina Dubroca.
11) Fix packet_bind() race, which can cause lost notifications, from
Francesco Ruggeri.
12) Fix MAC restoration in qlcnic driver during bonding mode changes,
from Jarod Wilson.
13) Revert bridging forward delay change which broke libvirt and other
userspace things, from Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
Revert "bridge: Allow forward delay to be cfgd when STP enabled"
bpf_trace: Make dependent on PERF_EVENTS
qed: select ZLIB_INFLATE
net: fix a race in dst_release()
net: mvneta: Fix memory use after free.
net: Documentation: Fix default value tcp_limit_output_bytes
macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
mvneta: add FIXED_PHY dependency
net: caif: check return value of alloc_netdev
net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
drivers: net: xgene: fix RGMII 10/100Mb mode
netfilter: nft_meta: use skb_to_full_sk() helper
net_sched: em_meta: use skb_to_full_sk() helper
sched: cls_flow: use skb_to_full_sk() helper
netfilter: xt_owner: use skb_to_full_sk() helper
smack: use skb_to_full_sk() helper
net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
bpf: doc: correct arch list for supported eBPF JIT
dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
bonding: fix panic on non-ARPHRD_ETHER enslave failure
...
8-byte constant is too big for long and compiler complains about this.
lib/string.c:907:20: warning: constant 0x0101010101010101 is so big it is long
Append ULL suffix to explicitly show its type.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull module updates from Rusty Russell:
"Nothing exciting, minor tweaks and cleanups"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
scripts: [modpost] add new sections to white list
modpost: Add flag -E for making section mismatches fatal
params: don't ignore the rest of cmdline if parse_one() fails
modpost: abort if a module symbol is too long
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
There is a classical off-by-one error in case when we try to place, for
example, 1+1 bytes as hex in the buffer of size 6. The expected result is
to get an output truncated, but in the reality we get 6 bytes filed
followed by terminating NUL.
Change the logic how we fill the output in case of byte dumping into
limited space. This will follow the snprintf() behaviour by truncating
output even on half bytes.
Fixes: 114fc1afb2 (hexdump: make it return number of bytes placed in buffer)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes kobject_set_name_vargs is called with a format string conaining
no %, or a format string of precisely "%s", where the single vararg
happens to point to .rodata. kvasprintf_const detects these cases for us
and returns a copy of that pointer instead of duplicating the string, thus
saving some run-time memory. Otherwise, it falls back to kvasprintf. We
just need to always deallocate ->name using kfree_const.
Unfortunately, the dance we need to do to perform the '/' -> '!'
sanitization makes the resulting code rather ugly.
I instrumented kstrdup_const to provide some statistics on the memory
saved, and for me this gave an additional ~14KB after boot (306KB was
already saved; this patch bumped that to 320KB). I have
KMALLOC_SHIFT_LOW==3, and since 80% of the kvasprintf_const hits were
satisfied by an 8-byte allocation, the 14K would roughly be quadrupled
when KMALLOC_SHIFT_LOW==5. Whether these numbers are sufficient to
justify the ugliness I'll leave to others to decide.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds kvasprintf_const which tries to use kstrdup_const if possible:
If the format string contains no % characters, or if the format string is
exactly "%s", we delegate to kstrdup_const. Otherwise, we fall back to
kvasprintf.
Just as for kstrdup_const, the main motivation is to save memory by
reusing .rodata when possible.
The return value should be freed by kfree_const, just like for
kstrdup_const.
There is deliberately no kasprintf_const: In the vast majority of cases,
the format string argument is a literal, so one can determine statically
whether one could instead use kstrdup_const directly (which would also
require one to change all corresponding kfree calls to kfree_const).
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
llist_del_first reads entry->next, but it did not acquire visibility over
the entry node. As the result it can get a stale value of entry->next
(e.g. NULL or whatever garbage was there before the appending thread
wrote correct value). And then commit that value as llist head with
cmpxchg. That will corrupt llist.
Note there is a control-dependency between read of head->first and read of
entry->next, but it does not make the code correct. Kernel memory model
unambiguously says: "A load-load control dependency requires a full read
memory barrier".
Use smp_load_acquire to acquire visibility over the entry node.
The data race was found with KernelThreadSanitizer (KTSAN).
Here is an example of KTSAN report:
ThreadSanitizer: data-race in llist_del_first
Read of size 1 by thread T389 (K2630, CPU0):
[<ffffffff8156b8a9>] llist_del_first+0x39/0x70 lib/llist.c:74
[< inlined >] tty_buffer_alloc drivers/tty/tty_buffer.c:181
[<ffffffff81664af4>] __tty_buffer_request_room+0xb4/0x250 drivers/tty/tty_buffer.c:292
[<ffffffff81664e6c>] tty_insert_flip_string_fixed_flag+0x6c/0x150 drivers/tty/tty_buffer.c:337
[< inlined >] tty_insert_flip_string include/linux/tty_flip.h:35
[<ffffffff81667422>] pty_write+0x72/0xc0 drivers/tty/pty.c:110
[< inlined >] process_output_block drivers/tty/n_tty.c:611
[<ffffffff8165c016>] n_tty_write+0x346/0x7f0 drivers/tty/n_tty.c:2401
[< inlined >] do_tty_write drivers/tty/tty_io.c:1159
[<ffffffff816568df>] tty_write+0x21f/0x3f0 drivers/tty/tty_io.c:1245
[<ffffffff8125f00f>] __vfs_write+0x5f/0x1f0 fs/read_write.c:489
[<ffffffff8125ff8f>] vfs_write+0xef/0x280 fs/read_write.c:538
[< inlined >] SYSC_write fs/read_write.c:585
[<ffffffff81261390>] SyS_write+0x70/0xe0 fs/read_write.c:577
[<ffffffff81ee862e>] entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186
Previous write of size 8 by thread T226 (K761, CPU0):
[<ffffffff8156b832>] llist_add_batch+0x32/0x70 lib/llist.c:44 (discriminator 16)
[< inlined >] llist_add include/linux/llist.h:180
[<ffffffff816649fc>] tty_buffer_free+0x6c/0xb0 drivers/tty/tty_buffer.c:221
[<ffffffff816651e7>] flush_to_ldisc+0x107/0x300 drivers/tty/tty_buffer.c:514
[<ffffffff810b20ee>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
[<ffffffff810b2650>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
[<ffffffff810bbe20>] kthread+0x150/0x170 kernel/kthread.c:209
[<ffffffff81ee8a1f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a simple module for testing the kernel's printf facilities.
Previously, some %p extensions have caused a wrong return value in case
the entire output didn't fit and/or been unusable in kasprintf(). This
should help catch such issues. Also, it should help ensure that changes
to the formatting algorithms don't break anything.
I'm not sure if we have a struct dentry or struct file lying around at
boot time or if we can fake one, but most %p extensions should be
testable, as should the ordinary number and string formatting.
The nature of vararg functions means we can't use a more conventional
table-driven approach.
For now, this is mostly a skeleton; contributions are very
welcome. Some tests are/will be slightly annoying to write, since the
expected output depends on stuff like CONFIG_*, sizeof(long), runtime
values etc.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a quick
git grep -E '%[ +0#-]*#[ +0#-]*(\*|[0-9]+)?(\.(\*|[0-9]+)?)?p'
shows, nobody uses the # flag with %p. Should one try to do so, one
will be met with
warning: `#' flag used with `%p' gnu_printf format [-Wformat]
(POSIX and C99 both say "... For other conversion specifiers, the
behavior is undefined.". Obviously, the kernel can choose to define
the behaviour however it wants, but as long as gcc issues that
warning, users are unlikely to show up.)
Since default_width is effectively always 2*sizeof(void*), we can
simplify the prologue of pointer() and save a few instructions.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quoting from 2aa2f9e21e ("lib/vsprintf.c: improve sanity check in
vsnprintf()"):
On 64 bit, size may very well be huge even if bit 31 happens to be 0.
Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
3 GiB one. So cap at INT_MAX as was probably the intention all along.
This is also the made-up value passed by sprintf and vsprintf.
I should have seen this copy-pasted instance back then, but let's just
do it now.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>