Pull networking fixes from David Miller:
"This pull request is dedicated to the upcoming snowpocalypse parts 2
and 3 in the Pacific Northwest:
1) Drop profiles are broken because some drivers use dev_kfree_skb*
instead of dev_consume_skb*, from Yang Wei.
2) Fix IWLWIFI kconfig deps, from Luca Coelho.
3) Fix percpu maps updating in bpftool, from Paolo Abeni.
4) Missing station release in batman-adv, from Felix Fietkau.
5) Fix some networking compat ioctl bugs, from Johannes Berg.
6) ucc_geth must reset the BQL queue state when stopping the device,
from Mathias Thore.
7) Several XDP bug fixes in virtio_net from Toshiaki Makita.
8) TSO packets must be sent always on queue 0 in stmmac, from Jose
Abreu.
9) Fix socket refcounting bug in RDS, from Eric Dumazet.
10) Handle sparse cpu allocations in bpf selftests, from Martynas
Pumputis.
11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
Feitkau.
12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
Greg Kroah-Hartman.
13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
ccid, from Eric Dumazet.
14) Need to reload WoL password into bcmsysport device after deep
sleeps, from Florian Fainelli.
15) Remove filter from mask before freeing in cls_flower, from Petr
Machata.
16) Missing release and use after free in error paths of s390 qeth
code, from Julian Wiedmann.
17) Fix lockdep false positive in dsa code, from Marc Zyngier.
18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.
19) Fix EQ firmware assert in qed driver, from Manish Chopra.
20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: dsa: b53: Fix for failure when irq is not defined in dt
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
net: Don't default Cavium PTP driver to 'y'
net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net/mlx5e: Don't overwrite pedit action when multiple pedit used
net/mlx5e: Update hw flows when encap source mac changed
qed*: Advance drivers version to 8.37.0.20
qed: Change verbosity for coalescing message.
qede: Fix system crash on configuring channels.
qed: Consider TX tcs while deriving the max num_queues for PF.
...
The test_insert_dup() function from lib/test_rhashtable.c passes a
pointer to a stack object to rhltable_init(). Allocate the hash table
dynamically to avoid that the following is reported with object
debugging enabled:
ODEBUG: object (ptrval) is on stack (ptrval), but NOT annotated.
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:368 __debug_object_init+0x312/0x480
Modules linked in:
EIP: __debug_object_init+0x312/0x480
Call Trace:
? debug_object_init+0x1a/0x20
? __init_work+0x16/0x30
? rhashtable_init+0x1e1/0x460
? sched_clock_cpu+0x57/0xe0
? rhltable_init+0xb/0x20
? test_insert_dup+0x32/0x20f
? trace_hardirqs_on+0x38/0xf0
? ida_dump+0x10/0x10
? jhash+0x130/0x130
? my_hashfn+0x30/0x30
? test_rht_init+0x6aa/0xab4
? ida_dump+0x10/0x10
? test_rhltable+0xc5c/0xc5c
? do_one_initcall+0x67/0x28e
? trace_hardirqs_off+0x22/0xe0
? restore_all_kernel+0xf/0x70
? trace_hardirqs_on_thunk+0xc/0x10
? restore_all_kernel+0xf/0x70
? kernel_init_freeable+0x142/0x213
? rest_init+0x230/0x230
? kernel_init+0x10/0x110
? schedule_tail_wrapper+0x9/0xc
? ret_from_fork+0x19/0x24
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull XArray fixes from Matthew Wilcox:
"Fix some oversights in the XArray porcelain API:
- support for m68k's two-byte aligned pointers
- reserving entries using xa_insert()
- missing xa_insert_bh() and xa_insert_irq() functions
- simplify using xa_for_each()
- use lockdep correctly
- a few other minor fixes and improvements"
* tag 'xarray-5.0-rc3' of git://git.infradead.org/users/willy/linux-dax:
XArray: Fix an arithmetic error in xa_is_err
XArray tests: Check mark 2 gets squashed
XArray: Fix typo in comment
XArray: Honour reserved entries in xa_insert
XArray: Permit storing 2-byte-aligned pointers
XArray: Change xa_for_each iterator
XArray: Turn xa_init_flags into a static inline
XArray tests: Add RCU locking
If an input number x for int_sqrt64() has the highest bit set, then
fls64(x) is 64. (1UL << 64) is an overflow and breaks the algorithm.
Subtracting 1 is a better guess for the initial value of m anyway and
that's what also done in int_sqrt() implicitly [*].
[*] Note how int_sqrt() uses __fls() with two underscores, which already
returns the proper raw bit number.
In contrast, int_sqrt64() used fls64(), and that returns bit numbers
illogically starting at 1, because of error handling for the "no
bits set" case. Will points out that he bug probably is due to a
copy-and-paste error from the regular int_sqrt() case.
Signed-off-by: Florian La Roche <Florian.LaRoche@googlemail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Because we may call blk_mq_get_driver_tag() directly from
blk_mq_dispatch_rq_list() without holding any lock, then HARDIRQ may
come and the above DEADLOCK is triggered.
Commit ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq") tries to
fix this issue by using 'spin_lock_bh', which isn't enough because we
complete request from hardirq context direclty in case of multiqueue.
Cc: Clark Williams <williams@redhat.com>
Fixes: ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We do not currently check that the loop in xas_squash_marks() doesn't have
an off-by-one error in it. It didn't, but a patch which introduced an
off-by-one error wasn't caught by any existing test. Switch the roles
of XA_MARK_1 and XA_MARK_2 to catch that bug.
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
The swap_lock used by sbitmap has a chain with locks taken from softirq,
but the swap_lock is not protected from being preempted by softirqs.
A chain exists of:
sbq->ws[i].wait -> dispatch_wait_lock -> swap_lock
Where the sbq->ws[i].wait lock can be taken from softirq context, which
means all locks below it in the chain must also be protected from
softirqs.
Reported-by: Clark Williams <williams@redhat.com>
Fixes: 58ab5e32e6 ("sbitmap: silence bogus lockdep IRQ warning")
Fixes: ea86ea2cdc ("sbitmap: amortize cost of clearing bits")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xa_insert() should treat reserved entries as occupied, not as available.
Also, it should treat requests to insert a NULL pointer as a request
to reserve the slot. Add xa_insert_bh() and xa_insert_irq() for
completeness.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
On m68k, statically allocated pointers may only be two-byte aligned.
This clashes with the XArray's method for tagging internal pointers.
Permit storing these pointers in single slots (ie not in multislots).
Signed-off-by: Matthew Wilcox <willy@infradead.org>
There were three problems with this API:
1. It took too many arguments; almost all users wanted to iterate over
every element in the array rather than a subset.
2. It required that 'index' be initialised before use, and there's no
realistic way to make GCC catch that.
3. 'index' and 'entry' were the opposite way round from every other
member of the XArray APIs.
So split it into three different APIs:
xa_for_each(xa, index, entry)
xa_for_each_start(xa, index, entry, start)
xa_for_each_marked(xa, index, entry, filter)
Signed-off-by: Matthew Wilcox <willy@infradead.org>
A regular xa_init_flags() put all dynamically-initialised XArrays into
the same locking class. That leads to lockdep believing that taking
one XArray lock while holding another is a deadlock. It's possible to
work around some of these situations with separate locking classes for
irq/bh/regular XArrays, and SINGLE_DEPTH_NESTING, but that's ugly, and
it doesn't work for all situations (where we have completely unrelated
XArrays).
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Pull more Kbuild updates from Masahiro Yamada:
- improve boolinit.cocci and use_after_iter.cocci semantic patches
- fix alignment for kallsyms
- move 'asm goto' compiler test to Kconfig and clean up jump_label
CONFIG option
- generate asm-generic wrappers automatically if arch does not
implement mandatory UAPI headers
- remove redundant generic-y defines
- misc cleanups
* tag 'kbuild-v4.21-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: rename generated .*conf-cfg to *conf-cfg
kbuild: remove unnecessary stubs for archheader and archscripts
kbuild: use assignment instead of define ... endef for filechk_* rules
arch: remove redundant UAPI generic-y defines
kbuild: generate asm-generic wrappers if mandatory headers are missing
arch: remove stale comments "UAPI Header export list"
riscv: remove redundant kernel-space generic-y
kbuild: change filechk to surround the given command with { }
kbuild: remove redundant target cleaning on failure
kbuild: clean up rule_dtc_dt_yaml
kbuild: remove UIMAGE_IN and UIMAGE_OUT
jump_label: move 'asm goto' support test to Kconfig
kallsyms: lower alignment on ARM
scripts: coccinelle: boolinit: drop warnings on named constants
scripts: coccinelle: check for redeclaration
kconfig: remove unused "file" field of yylval union
nds32: remove redundant kernel-space generic-y
nios2: remove unneeded HAS_DMA define
Pull block updates and fixes from Jens Axboe:
- Pulled in MD changes that Shaohua had queued up for 4.21.
Unfortunately we lost Shaohua late 2018, I'm sending these in on his
behalf.
- In conjunction with the above, I added a CREDITS entry for Shaoua.
- sunvdc queue restart fix (Ming)
* tag 'for-linus-20190104' of git://git.kernel.dk/linux-block:
Add CREDITS entry for Shaohua Li
block: sunvdc: don't run hw queue synchronously from irq context
md: fix raid10 hang issue caused by barrier
raid10: refactor common wait code from regular read/write request
md: remvoe redundant condition check
lib/raid6: add option to skip algo benchmarking
lib/raid6: sort algos in rough performance order
lib/raid6: check for assembler SSSE3 support
lib/raid6: avoid __attribute_const__ redefinition
lib/raid6: add missing include for raid6test
md: remove set but not used variable 'bi_rdev'
Since commit 9c2af1c737 ("kbuild: add .DELETE_ON_ERROR special
target"), the target file is automatically deleted on failure.
The boilerplate code
... || { rm -f $@; false; }
is unneeded.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Fixes build break on most ARM/ARM64 defconfigs:
lib/genalloc.c: In function 'gen_pool_add_virt':
lib/genalloc.c:190:10: error: implicit declaration of function 'vzalloc_node'; did you mean 'kzalloc_node'?
lib/genalloc.c:190:8: warning: assignment to 'struct gen_pool_chunk *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
lib/genalloc.c: In function 'gen_pool_destroy':
lib/genalloc.c:254:3: error: implicit declaration of function 'vfree'; did you mean 'kfree'?
Fixes: 6862d2fc81 ('lib/genalloc.c: use vzalloc_node() to allocate the bitmap')
Cc: Huang Shijie <sjhuang@iluvatar.ai>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Skidanov <alexey.skidanov@intel.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull trivial vfs updates from Al Viro:
"A few cleanups + Neil's namespace_unlock() optimization"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
exec: make prepare_bprm_creds static
genheaders: %-<width>s had been there since v6; %-*s - since v7
VFS: use synchronize_rcu_expedited() in namespace_unlock()
iov_iter: reduce code duplication
Merge more updates from Andrew Morton:
- procfs updates
- various misc bits
- lib/ updates
- epoll updates
- autofs
- fatfs
- a few more MM bits
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
mm/page_io.c: fix polled swap page in
checkpatch: add Co-developed-by to signature tags
docs: fix Co-Developed-by docs
drivers/base/platform.c: kmemleak ignore a known leak
fs: don't open code lru_to_page()
fs/: remove caller signal_pending branch predictions
mm/: remove caller signal_pending branch predictions
arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
kernel/sched/: remove caller signal_pending branch predictions
kernel/locking/mutex.c: remove caller signal_pending branch predictions
mm: select HAVE_MOVE_PMD on x86 for faster mremap
mm: speed up mremap by 20x on large regions
mm: treewide: remove unused address argument from pte_alloc functions
initramfs: cleanup incomplete rootfs
scripts/gdb: fix lx-version string output
kernel/kcov.c: mark write_comp_data() as notrace
kernel/sysctl: add panic_print into sysctl
panic: add options to print system info when panic happens
bfs: extra sanity checking and static inode bitmap
exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
...
gen_pool_alloc_algo() uses different allocation functions implementing
different allocation algorithms. With gen_pool_first_fit_align()
allocation function, the returned address should be aligned on the
requested boundary.
If chunk start address isn't aligned on the requested boundary, the
returned address isn't aligned too. The only way to get properly
aligned address is to initialize the pool with chunks aligned on the
requested boundary. If want to have an ability to allocate buffers
aligned on different boundaries (for example, 4K, 1MB, ...), the chunk
start address should be aligned on the max possible alignment.
This happens because gen_pool_first_fit_align() looks for properly
aligned memory block without taking into account the chunk start address
alignment.
To fix this, we provide chunk start address to
gen_pool_first_fit_align() and change its implementation such that it
starts looking for properly aligned block with appropriate offset
(exactly as is done in CMA).
Link: https://lkml.kernel.org/lkml/a170cf65-6884-3592-1de9-4c235888cc8a@intel.com
Link: http://lkml.kernel.org/r/1541690953-4623-1-git-send-email-alexey.skidanov@intel.com
Signed-off-by: Alexey Skidanov <alexey.skidanov@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.
But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar. Which makes it very hard to verify that the access has
actually been range-checked.
If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin(). But
nothing really forces the range check.
By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check vill be visible
near the actual accesses. We have way too long a history of people
trying to avoid them.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>