The sprint_oid() utility function doesn't properly check the buffer size
that it causes that the warning in vsnprintf() be triggered. For
example on v4.1 kernel:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 vsnprintf+0x5a7/0x5c0()
...
We can trigger this issue by injecting maliciously crafted x509 cert in
DER format. Just using hex editor to change the length of OID to over
the length of the SEQUENCE container. For example:
0:d=0 hl=4 l= 980 cons: SEQUENCE
4:d=1 hl=4 l= 700 cons: SEQUENCE
8:d=2 hl=2 l= 3 cons: cont [ 0 ]
10:d=3 hl=2 l= 1 prim: INTEGER :02
13:d=2 hl=2 l= 9 prim: INTEGER :9B47FAF791E7D1E3
24:d=2 hl=2 l= 13 cons: SEQUENCE
26:d=3 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption
37:d=3 hl=2 l= 0 prim: NULL
39:d=2 hl=2 l= 121 cons: SEQUENCE
41:d=3 hl=2 l= 22 cons: SET
43:d=4 hl=2 l= 20 cons: SEQUENCE <=== the SEQ length is 20
45:d=5 hl=2 l= 3 prim: OBJECT :organizationName
<=== the original length is 3, change the length of OID to over the length of SEQUENCE
Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided
patch to fix it by checking the bufsize in sprint_oid().
Link: http://lkml.kernel.org/r/20170903021646.2080-1-jlee@suse.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
Reported-by: Pawel Wieczorkiewicz <pwieczorkiewicz@suse.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Pawel Wieczorkiewicz <pwieczorkiewicz@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current implementation of bitmap_parselist() uses a static variable to
save local state while setting bits in the bitmap. It is obviously wrong
if we assume execution in multiprocessor environment. Fortunately, it's
possible to rewrite this portion of code to avoid using the static
variable.
It is also possible to set bits in the mask per-range with bitmap_set(),
not per-bit, as it is implemented now, with set_bit(); which is way
faster.
The important side effect of this change is that setting bits in this
function from now is not per-bit atomic and less memory-ordered. This is
because set_bit() guarantees the order of memory accesses, while
bitmap_set() does not. I think that it is the advantage of the new
approach, because the bitmap_parselist() is intended to initialise bit
arrays, and user should protect the whole bitmap during initialisation if
needed. So protecting individual bits looks expensive and useless. Also,
other range-oriented functions in lib/bitmap.c don't worry much about
atomicity.
With all that, setting 2k bits in map with the pattern like 0-2047:128/256
becomes ~50 times faster after applying the patch in my testing
environment (arm64 hosted on qemu).
The second patch of the series adds the test for bitmap_parselist(). It's
not intended to cover all tricky cases, just to make sure that I didn't
screw up during rework.
Link: http://lkml.kernel.org/r/20170807225438.16161-1-ynorov@caviumnetworks.com
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Cc: Noam Camus <noamca@mellanox.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only times the nil-parent (root node) condition is true is when the
node is the first in the tree, or after fixing rbtree rule #4 and the
case 1 rebalancing made the node the root. Such conditions do not apply
most of the time:
(i) The common case in an rbtree is to have more than a single node,
so this is only true for the first rb_insert().
(ii) While there is a chance only one first rotation is needed, cases
where the node's uncle is black (cases 2,3) are more common as we can
have the following scenarios during the rotation looping:
case1 only, case1+1, case2+3, case1+2+3, case3 only, etc.
This patch, therefore, adds an unlikely() optimization to this
conditional. When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES, a
kernel build shows that the incorrect rate is less than 15%, and for
workloads that involve insert mostly trees overtime tend to have less
than 2% incorrect rate.
Link: http://lkml.kernel.org/r/20170719014603.19029-3-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "rbtree: Cache leftmost node internally", v4.
A series to extending rbtrees to internally cache the leftmost node such
that we can have fast overlap check optimization for all interval tree
users[1]. The benefits of this series are that:
(i) Unify users that do internal leftmost node caching.
(ii) Optimize all interval tree users.
(iii) Convert at least two new users (epoll and procfs) to the new interface.
This patch (of 16):
Red-black tree semantics imply that nodes with smaller or greater (or
equal for duplicates) keys always be to the left and right,
respectively. For the kernel this is extremely evident when considering
our rb_first() semantics. Enabling lookups for the smallest node in the
tree in O(1) can save a good chunk of cycles in not having to walk down
the tree each time. To this end there are a few core users that
explicitly do this, such as the scheduler and rtmutexes. There is also
the desire for interval trees to have this optimization allowing faster
overlap checking.
This patch introduces a new 'struct rb_root_cached' which is just the
root with a cached pointer to the leftmost node. The reason why the
regular rb_root was not extended instead of adding a new structure was
that this allows the user to have the choice between memory footprint
and actual tree performance. The new wrappers on top of the regular
rb_root calls are:
- rb_first_cached(cached_root) -- which is a fast replacement
for rb_first.
- rb_insert_color_cached(node, cached_root, new)
- rb_erase_cached(node, cached_root)
In addition, augmented cached interfaces are also added for basic
insertion and deletion operations; which becomes important for the
interval tree changes.
With the exception of the inserts, which adds a bool for updating the
new leftmost, the interfaces are kept the same. To this end, porting rb
users to the cached version becomes really trivial, and keeping current
rbtree semantics for users that don't care about the optimization
requires zero overhead.
Link: http://lkml.kernel.org/r/20170719014603.19029-2-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "Multibyte memset variations", v4.
A relatively common idiom we're missing is a function to fill an area of
memory with a pattern which is larger than a single byte. I first
noticed this with a zram patch which wanted to fill a page with an
'unsigned long' value. There turn out to be quite a few places in the
kernel which can benefit from using an optimised function rather than a
loop; sometimes text size, sometimes speed, and sometimes both. The
optimised PowerPC version (not included here) improves performance by
about 30% on POWER8 on just the raw memset_l().
Most of the extra lines of code come from the three testcases I added.
This patch (of 8):
memset16(), memset32() and memset64() are like memset(), but allow the
caller to fill the destination with a value larger than a single byte.
memset_l() and memset_p() allow the caller to use unsigned long and
pointer values respectively.
Link: http://lkml.kernel.org/r/20170720184539.31609-2-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull MD updates from Shaohua Li:
"This update mainly fixes bugs:
- Make raid5 ppl support several ppl from Pawel
- Several raid5-cache bug fixes from Song
- Bitmap fixes from Neil and Me
- One raid1/10 regression fix since 4.12 from Me
- Other small fixes and cleanup"
* tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/bitmap: disable bitmap_resize for file-backed bitmaps.
raid5-ppl: Recovery support for multiple partial parity logs
md: Runtime support for multiple ppls
md/raid0: attach correct cgroup info in bio
lib/raid6: align AVX512 constants to 512 bits, not bytes
raid5: remove raid5_build_block
md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
md: replace seq_release_private with seq_release
md: notify about new spare disk in the container
md/raid1/10: reset bio allocated from mempool
md/raid5: release/flush io in raid5_do_work()
md/bitmap: copy correct data for bitmap super
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
Pull writeback error handling updates from Jeff Layton:
"This pile continues the work from last cycle on better tracking
writeback errors. In v4.13 we added some basic errseq_t infrastructure
and converted a few filesystems to use it.
This set continues refining that infrastructure, adds documentation,
and converts most of the other filesystems to use it. The main
exception at this point is the NFS client"
* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
ecryptfs: convert to file_write_and_wait in ->fsync
mm: remove optimizations based on i_size in mapping writeback waits
fs: convert a pile of fsync routines to errseq_t based reporting
gfs2: convert to errseq_t based writeback error reporting for fsync
fs: convert sync_file_range to use errseq_t based error-tracking
mm: add file_fdatawait_range and file_write_and_wait
fuse: convert to errseq_t based error tracking for fsync
mm: consolidate dax / non-dax checks for writeback
Documentation: add some docs for errseq_t
errseq: rename __errseq_set to errseq_set
Pull driver core update from Greg KH:
"Here is the "big" driver core update for 4.14-rc1.
It's really not all that big, the largest thing here being some
firmware tests to help ensure that that crazy api is working properly.
There's also a new uevent for when a driver is bound or unbound from a
device, fixing a hole in the driver model that's been there since the
very beginning. Many thanks to Dmitry for being persistent and
pointing out how wrong I was about this all along :)
Patches for the new uevents are already in the systemd tree, if people
want to play around with them.
Otherwise just a number of other small api changes and updates here,
nothing major. All of these patches have been in linux-next for a
while with no reported issues"
* tag 'driver-core-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits)
driver core: bus: Fix a potential double free
Do not disable driver and bus shutdown hook when class shutdown hook is set.
base: topology: constify attribute_group structures.
base: Convert to using %pOF instead of full_name
kernfs: Clarify lockdep name for kn->count
fbdev: uvesafb: remove DRIVER_ATTR() usage
xen: xen-pciback: remove DRIVER_ATTR() usage
driver core: Document struct device:dma_ops
mod_devicetable: Remove excess description from structured comment
test_firmware: add batched firmware tests
firmware: enable a debug print for batched requests
firmware: define pr_fmt
firmware: send -EINTR on signal abort on fallback mechanism
test_firmware: add test case for SIGCHLD on sync fallback
initcall_debug: add deferred probe times
Input: axp20x-pek - switch to using devm_device_add_group()
Input: synaptics_rmi4 - use devm_device_add_group() for attributes in F01
Input: gpio_keys - use devm_device_add_group() for attributes
driver core: add devm_device_add_group() and friends
driver core: add device_{add|remove}_group() helpers
...