Commit Graph

4483 Commits

Author SHA1 Message Date
Linus Torvalds
60a6ca6c94 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Two lockdep fixes for bugs introduced by the cross-release dependency
  tracking feature - plus a commit that disables it because performance
  regressed in an absymal fashion on some systems"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Disable cross-release features for now
  locking/selftest: Avoid false BUG report
  locking/lockdep: Fix stacktrace mess
2017-10-14 15:14:20 -04:00
Ingo Molnar
b483cf3bc2 locking/lockdep: Disable cross-release features for now
Johan Hovold reported a big lockdep slowdown on his system, caused by lockdep:

> I had noticed that the BeagleBone Black boot time appeared to have
> increased significantly with 4.14 and yesterday I finally had time to
> investigate it.
>
> Boot time (from "Linux version" to login prompt) had in fact doubled
> since 4.13 where it took 17 seconds (with my current config) compared to
> the 35 seconds I now see with 4.14-rc4.
>
> I quick bisect pointed to lockdep and specifically the following commit:
>
>	28a903f63e ("locking/lockdep: Handle non(or multi)-acquisition of a crosslock")

Because the final v4.14 release is close, disable the cross-release lockdep
features for now.

Bisected-by: Johan Hovold <johan@kernel.org>
Debugged-by: Johan Hovold <johan@kernel.org>
Reported-by: Johan Hovold <johan@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Lindgren <tony@atomide.com>
Cc: kernel-team@lge.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mm@kvack.org
Cc: linux-omap@vger.kernel.org
Link: http://lkml.kernel.org/r/20171014072659.f2yr6mhm5ha3eou7@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14 12:50:26 +02:00
Randy Dunlap
cc3fa84045 lib/Kconfig.debug: kernel hacking menu: runtime testing: keep tests together
Expand the "Runtime testing" menu by including more entries inside it
instead of after it.  This is just Kconfig symbol movement.

This causes the (arch-independent) Runtime tests to be presented
(listed) all in one place instead of in multiple places.

Link: http://lkml.kernel.org/r/c194e5c4-2042-bf94-a2d8-7aa13756e257@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-13 16:18:32 -07:00
Peter Zijlstra
c7e2f69d3e locking/selftest: Avoid false BUG report
The work-around for the expected failure is providing another failure :/

Only when CONFIG_PROVE_LOCKING=y do we increment unexpected_testcase_failures,
so only then do we need to decrement, otherwise we'll end up with a negative
number and that will again trigger a BUG (printout, not crash).

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: d82fed7529 ("locking/lockdep/selftests: Fix mixed read-write ABBA tests")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 10:04:29 +02:00
Sergey Senozhatsky
656d61ce96 lib/ratelimit.c: use deferred printk() version
printk_ratelimit() invokes ___ratelimit() which may invoke a normal
printk() (pr_warn() in this particular case) to warn about suppressed
output.  Given that printk_ratelimit() may be called from anywhere, that
pr_warn() is dangerous - it may end up deadlocking the system.  Fix
___ratelimit() by using deferred printk().

Sasha reported the following lockdep error:

 : Unregister pv shared memory for cpu 8
 : select_fallback_rq: 3 callbacks suppressed
 : process 8583 (trinity-c78) no longer affine to cpu8
 :
 : ======================================================
 : WARNING: possible circular locking dependency detected
 : 4.14.0-rc2-next-20170927+ #252 Not tainted
 : ------------------------------------------------------
 : migration/8/62 is trying to acquire lock:
 : (&port_lock_key){-.-.}, at: serial8250_console_write()
 :
 : but task is already holding lock:
 : (&rq->lock){-.-.}, at: sched_cpu_dying()
 :
 : which lock already depends on the new lock.
 :
 :
 : the existing dependency chain (in reverse order) is:
 :
 : -> #3 (&rq->lock){-.-.}:
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock()
 : task_fork_fair()
 : sched_fork()
 : copy_process.part.31()
 : _do_fork()
 : kernel_thread()
 : rest_init()
 : start_kernel()
 : x86_64_start_reservations()
 : x86_64_start_kernel()
 : verify_cpu()
 :
 : -> #2 (&p->pi_lock){-.-.}:
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock_irqsave()
 : try_to_wake_up()
 : default_wake_function()
 : woken_wake_function()
 : __wake_up_common()
 : __wake_up_common_lock()
 : __wake_up()
 : tty_wakeup()
 : tty_port_default_wakeup()
 : tty_port_tty_wakeup()
 : uart_write_wakeup()
 : serial8250_tx_chars()
 : serial8250_handle_irq.part.25()
 : serial8250_default_handle_irq()
 : serial8250_interrupt()
 : __handle_irq_event_percpu()
 : handle_irq_event_percpu()
 : handle_irq_event()
 : handle_level_irq()
 : handle_irq()
 : do_IRQ()
 : ret_from_intr()
 : native_safe_halt()
 : default_idle()
 : arch_cpu_idle()
 : default_idle_call()
 : do_idle()
 : cpu_startup_entry()
 : rest_init()
 : start_kernel()
 : x86_64_start_reservations()
 : x86_64_start_kernel()
 : verify_cpu()
 :
 : -> #1 (&tty->write_wait){-.-.}:
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock_irqsave()
 : __wake_up_common_lock()
 : __wake_up()
 : tty_wakeup()
 : tty_port_default_wakeup()
 : tty_port_tty_wakeup()
 : uart_write_wakeup()
 : serial8250_tx_chars()
 : serial8250_handle_irq.part.25()
 : serial8250_default_handle_irq()
 : serial8250_interrupt()
 : __handle_irq_event_percpu()
 : handle_irq_event_percpu()
 : handle_irq_event()
 : handle_level_irq()
 : handle_irq()
 : do_IRQ()
 : ret_from_intr()
 : native_safe_halt()
 : default_idle()
 : arch_cpu_idle()
 : default_idle_call()
 : do_idle()
 : cpu_startup_entry()
 : rest_init()
 : start_kernel()
 : x86_64_start_reservations()
 : x86_64_start_kernel()
 : verify_cpu()
 :
 : -> #0 (&port_lock_key){-.-.}:
 : check_prev_add()
 : __lock_acquire()
 : lock_acquire()
 : _raw_spin_lock_irqsave()
 : serial8250_console_write()
 : univ8250_console_write()
 : console_unlock()
 : vprintk_emit()
 : vprintk_default()
 : vprintk_func()
 : printk()
 : ___ratelimit()
 : __printk_ratelimit()
 : select_fallback_rq()
 : sched_cpu_dying()
 : cpuhp_invoke_callback()
 : take_cpu_down()
 : multi_cpu_stop()
 : cpu_stopper_thread()
 : smpboot_thread_fn()
 : kthread()
 : ret_from_fork()
 :
 : other info that might help us debug this:
 :
 : Chain exists of:
 :   &port_lock_key --> &p->pi_lock --> &rq->lock
 :
 :  Possible unsafe locking scenario:
 :
 :        CPU0                    CPU1
 :        ----                    ----
 :   lock(&rq->lock);
 :                                lock(&p->pi_lock);
 :                                lock(&rq->lock);
 :   lock(&port_lock_key);
 :
 :  *** DEADLOCK ***
 :
 : 4 locks held by migration/8/62:
 : #0: (&p->pi_lock){-.-.}, at: sched_cpu_dying()
 : #1: (&rq->lock){-.-.}, at: sched_cpu_dying()
 : #2: (printk_ratelimit_state.lock){....}, at: ___ratelimit()
 : #3: (console_lock){+.+.}, at: vprintk_emit()
 :
 : stack backtrace:
 : CPU: 8 PID: 62 Comm: migration/8 Not tainted 4.14.0-rc2-next-20170927+ #252
 : Call Trace:
 : dump_stack()
 : print_circular_bug()
 : check_prev_add()
 : ? add_lock_to_list.isra.26()
 : ? check_usage()
 : ? kvm_clock_read()
 : ? kvm_sched_clock_read()
 : ? sched_clock()
 : ? check_preemption_disabled()
 : __lock_acquire()
 : ? __lock_acquire()
 : ? add_lock_to_list.isra.26()
 : ? debug_check_no_locks_freed()
 : ? memcpy()
 : lock_acquire()
 : ? serial8250_console_write()
 : _raw_spin_lock_irqsave()
 : ? serial8250_console_write()
 : serial8250_console_write()
 : ? serial8250_start_tx()
 : ? lock_acquire()
 : ? memcpy()
 : univ8250_console_write()
 : console_unlock()
 : ? __down_trylock_console_sem()
 : vprintk_emit()
 : vprintk_default()
 : vprintk_func()
 : printk()
 : ? show_regs_print_info()
 : ? lock_acquire()
 : ___ratelimit()
 : __printk_ratelimit()
 : select_fallback_rq()
 : sched_cpu_dying()
 : ? sched_cpu_starting()
 : ? rcutree_dying_cpu()
 : ? sched_cpu_starting()
 : cpuhp_invoke_callback()
 : ? cpu_disable_common()
 : take_cpu_down()
 : ? trace_hardirqs_off_caller()
 : ? cpuhp_invoke_callback()
 : multi_cpu_stop()
 : ? __this_cpu_preempt_check()
 : ? cpu_stop_queue_work()
 : cpu_stopper_thread()
 : ? cpu_stop_create()
 : smpboot_thread_fn()
 : ? sort_range()
 : ? schedule()
 : ? __kthread_parkme()
 : kthread()
 : ? sort_range()
 : ? kthread_create_on_node()
 : ret_from_fork()
 : process 9121 (trinity-c78) no longer affine to cpu8
 : smpboot: CPU 8 is now offline

Link: http://lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhatsky@gmail.com
Fixes: 6b1d174b0c ("ratelimit: extend to print suppressed messages on release")
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Borislav Petkov <bp@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:26 -07:00
Eric Biggers
a70e43a59d lib/idr.c: fix comment for idr_replace()
idr_replace() returns the old value on success, not 0.

Link: http://lkml.kernel.org/r/20170918162642.37511-1-ebiggers3@gmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:25 -07:00
Colin Ian King
8cb5d74828 lib/lz4: make arrays static const, reduces object code size
Don't populate the read-only arrays dec32table and dec64table on the
stack, instead make them both static const.  Makes the object code
smaller by over 10K bytes:

  Before:
     text	   data	    bss	    dec	    hex	filename
    31500	      0	      0	  31500	   7b0c	lib/lz4/lz4_decompress.o

  After:
     text	   data	    bss	    dec	    hex	filename
    20237	    176	      0	  20413	   4fbd	lib/lz4/lz4_decompress.o

(gcc version 7.2.0 x86_64)

Link: http://lkml.kernel.org/r/20170921221939.20820-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:25 -07:00
Linus Torvalds
c4142ed602 Merge tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
 "Here are a few small fixes for 4.14-rc4.

  The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
  straggler made it in through some other tree (odds are, one of
  mine...) So there's a simple removal of the last user, and then
  finally the macro is removed from the tree.

  There's a fix for old crazy udev instances that insist on reloading a
  module when it is removed from the kernel due to the new uevents for
  bind/unbind. This fixes the reported regression, hopefully some year
  in the future we can drop the workaround, once users update to the
  latest version, but I'm not holding my breath.

  And then there's a build fix for a linker warning, and a buffer
  overflow fix to match the PCI fixes you took through the PCI tree in
  the same area.

  All of these have been in linux-next for a few weeks while I've been
  traveling, sorry for the delay"

* tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver core: remove DRIVER_ATTR
  fpga: altera-cvp: remove DRIVER_ATTR() usage
  driver core: platform: Don't read past the end of "driver_override" buffer
  base: arch_topology: fix section mismatch build warnings
  driver core: suppress sending MODALIAS in UNBIND uevents
2017-10-03 08:57:07 -07:00
Linus Torvalds
cd4175b116 Merge branch 'parisc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:

 - Unbreak parisc bootloader by avoiding a gcc-7 optimization to convert
   multiple byte-accesses into one word-access.

 - Add missing HWPOISON page fault handler code. I completely missed
   that when I added HWPOISON support during this merge window and it
   only showed up now with the madvise07 LTP test case.

 - Fix backtrace unwinding to stop when stack start has been reached.

 - Issue warning if initrd has been loaded into memory regions with
   broken RAM modules.

 - Fix HPMC handler (parisc hardware fault handler) to comply with
   architecture specification.

 - Avoid compiler warnings about too large frame sizes.

 - Minor init-section fixes.

* 'parisc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Unbreak bootloader due to gcc-7 optimizations
  parisc: Reintroduce option to gzip-compress the kernel
  parisc: Add HWPOISON page fault handler code
  parisc: Move init_per_cpu() into init section
  parisc: Check if initrd was loaded into broken RAM
  parisc: Add PDCE_CHECK instruction to HPMC handler
  parisc: Add wrapper for pdc_instr() firmware function
  parisc: Move start_parisc() into init section
  parisc: Stop unwinding at start of stack
  parisc: Fix too large frame size warnings
2017-09-23 06:14:06 -10:00
Linus Torvalds
71aa60f67f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix NAPI poll list corruption in enic driver, from Christian
    Lamparter.

 2) Fix route use after free, from Eric Dumazet.

 3) Fix regression in reuseaddr handling, from Josef Bacik.

 4) Assert the size of control messages in compat handling since we copy
    it in from userspace twice. From Meng Xu.

 5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.)
    from Ursula Braun.

 6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn.

 7) Don't use ARRAY_SIZE on spinlock array which might have zero
    entries, from Geert Uytterhoeven.

 8) Fix miscomputation of checksum in ipv6 udp code, from Subash Abhinov
    Kasiviswanathan.

 9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from Xin
    Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
  inet: fix improper empty comparison
  net: use inet6_rcv_saddr to compare sockets
  net: set tb->fast_sk_family
  net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
  MAINTAINERS: update git tree locations for ieee802154 subsystem
  net: prevent dst uses after free
  net: phy: Fix truncation of large IRQ numbers in phy_attached_print()
  net/smc: no close wait in case of process shut down
  net/smc: introduce a delay
  net/smc: terminate link group if out-of-sync is received
  net/smc: longer delay for client link group removal
  net/smc: adapt send request completion notification
  net/smc: adjust net_device refcount
  net/smc: take RCU read lock for routing cache lookup
  net/smc: add receive timeout check
  net/smc: add missing dev_put
  net: stmmac: Cocci spatch "of_table"
  lan78xx: Use default values loaded from EEPROM/OTP after reset
  lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
  lan78xx: Fix for eeprom read/write when device auto suspend
  ...
2017-09-23 05:41:27 -10:00
Helge Deller
432654df90 parisc: Fix too large frame size warnings
The parisc architecture has larger stack frames than most other
architectures on 32-bit kernels.

Increase the maximum allowed stack frame to 1280 bytes for parisc to
avoid warnings in the do_sys_poll() and pat_memconfig() functions.

Signed-off-by: Helge Deller <deller@gmx.de>
2017-09-22 19:46:07 +02:00
Petar Penkov
a90bcb86ae iov_iter: fix page_copy_sane for compound pages
Issue is that if the data crosses a page boundary inside a compound
page, this check will incorrectly trigger a WARN_ON.

To fix this, compute the order using the head of the compound page and
adjust the offset to be relative to that head.

Fixes: 72e809ed81 ("iov_iter: sanity checks for copy to/from page
primitives")

Signed-off-by: Petar Penkov <ppenkov@google.com>
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-20 23:27:48 -04:00
Andreas Gruenbacher
0647169cf9 rhashtable: Documentation tweak
Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
the beginning of the hash table.  Confusion between rhashtable_walk_enter
and rhashtable_walk_start has already lead to a bug.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 15:18:33 -07:00
Dmitry Torokhov
6878e7de6a driver core: suppress sending MODALIAS in UNBIND uevents
The current udev rules cause modules to be loaded on all device events save
for "remove". With the introduction of KOBJ_BIND/KOBJ_UNBIND this causes
issues, as driver modules that have devices bound to their drivers get
immediately reloaded, and it appears to the user that module unloading doe
snot work.

The standard udev matching rule is foillowing:

ENV{MODALIAS}=="?*", RUN{builtin}+="kmod load $env{MODALIAS}"

Given that MODALIAS data is not terribly useful for UNBIND event, let's zap
it from the generated uevent environment until we get userspace updated
with the correct udev rule that only loads modules on "add" event.

Reported-by: Jakub Kicinski <kubakici@wp.pl>
Tested-by: Jakub Kicinski <kubakici@wp.pl>
Fixes: 1455cf8dbf ("driver core: emit uevents when device is bound ...")
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-18 16:48:33 +02:00
Linus Torvalds
e7cdb60fd2 Merge branch 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull zstd support from Chris Mason:
 "Nick Terrell's patch series to add zstd support to the kernel has been
  floating around for a while. After talking with Dave Sterba, Herbert
  and Phillip, we decided to send the whole thing in as one pull
  request.

  zstd is a big win in speed over zlib and in compression ratio over
  lzo, and the compression team here at FB has gotten great results
  using it in production. Nick will continue to update the kernel side
  with new improvements from the open source zstd userland code.

  Nick has a number of benchmarks for the main zstd code in his lib/zstd
  commit:

      I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB
      of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel
      Core i7 processor, 16 GB of RAM, and a SSD. I benchmarked using
      `silesia.tar` [3], which is 211,988,480 B large. Run the following
      commands for the benchmark:

        sudo modprobe zstd_compress_test
        sudo mknod zstd_compress_test c 245 0
        sudo cp silesia.tar zstd_compress_test

      The time is reported by the time of the userland `cp`.
      The MB/s is computed with

        1,536,217,008 B / time(buffer size, hash)

      which includes the time to copy from userland.
      The Adjusted MB/s is computed with

        1,536,217,088 B / (time(buffer size, hash) - time(buffer size, none)).

      The memory reported is the amount of memory the compressor
      requests.

        | Method   | Size (B) | Time (s) | Ratio | MB/s    | Adj MB/s | Mem (MB) |
        |----------|----------|----------|-------|---------|----------|----------|
        | none     | 11988480 |    0.100 |     1 | 2119.88 |        - |        - |
        | zstd -1  | 73645762 |    1.044 | 2.878 |  203.05 |   224.56 |     1.23 |
        | zstd -3  | 66988878 |    1.761 | 3.165 |  120.38 |   127.63 |     2.47 |
        | zstd -5  | 65001259 |    2.563 | 3.261 |   82.71 |    86.07 |     2.86 |
        | zstd -10 | 60165346 |   13.242 | 3.523 |   16.01 |    16.13 |    13.22 |
        | zstd -15 | 58009756 |   47.601 | 3.654 |    4.45 |     4.46 |    21.61 |
        | zstd -19 | 54014593 |  102.835 | 3.925 |    2.06 |     2.06 |    60.15 |
        | zlib -1  | 77260026 |    2.895 | 2.744 |   73.23 |    75.85 |     0.27 |
        | zlib -3  | 72972206 |    4.116 | 2.905 |   51.50 |    52.79 |     0.27 |
        | zlib -6  | 68190360 |    9.633 | 3.109 |   22.01 |    22.24 |     0.27 |
        | zlib -9  | 67613382 |   22.554 | 3.135 |    9.40 |     9.44 |     0.27 |

      I benchmarked zstd decompression using the same method on the same
      machine. The benchmark file is located in the upstream zstd repo
      under `contrib/linux-kernel/zstd_decompress_test.c` [4]. The
      memory reported is the amount of memory required to decompress
      data compressed with the given compression level. If you know the
      maximum size of your input, you can reduce the memory usage of
      decompression irrespective of the compression level.

        | Method   | Time (s) | MB/s    | Adjusted MB/s | Memory (MB) |
        |----------|----------|---------|---------------|-------------|
        | none     |    0.025 | 8479.54 |             - |           - |
        | zstd -1  |    0.358 |  592.15 |        636.60 |        0.84 |
        | zstd -3  |    0.396 |  535.32 |        571.40 |        1.46 |
        | zstd -5  |    0.396 |  535.32 |        571.40 |        1.46 |
        | zstd -10 |    0.374 |  566.81 |        607.42 |        2.51 |
        | zstd -15 |    0.379 |  559.34 |        598.84 |        4.61 |
        | zstd -19 |    0.412 |  514.54 |        547.77 |        8.80 |
        | zlib -1  |    0.940 |  225.52 |        231.68 |        0.04 |
        | zlib -3  |    0.883 |  240.08 |        247.07 |        0.04 |
        | zlib -6  |    0.844 |  251.17 |        258.84 |        0.04 |
        | zlib -9  |    0.837 |  253.27 |        287.64 |        0.04 |

  I ran a long series of tests and benchmarks on the btrfs side and the
  gains are very similar to the core benchmarks Nick ran"

* 'zstd-minimal' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  squashfs: Add zstd support
  btrfs: Add zstd support
  lib: Add zstd modules
  lib: Add xxhash module
2017-09-14 17:30:49 -07:00
Michal Hocko
0ee931c4e3 mm: treewide: remove GFP_TEMPORARY allocation flag
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:16 -07:00
Geert Uytterhoeven
8185f5708d lib/test_bitmap.c: use ULL suffix for 64-bit constants
With gcc 4.1.2:

  lib/test_bitmap.c:189: warning: integer constant is too large for `long' type
  lib/test_bitmap.c:190: warning: integer constant is too large for `long' type
  lib/test_bitmap.c:194: warning: integer constant is too large for `long' type
  lib/test_bitmap.c:195: warning: integer constant is too large for `long' type

Add the missing "ULL" suffix to fix this.

Link: http://lkml.kernel.org/r/1505040523-31230-1-git-send-email-geert@linux-m68k.org
Fixes: 60ef690018 ("bitmap: introduce BITMAP_FROM_U64()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:15 -07:00
Eric Biggers
a47f68d6a9 idr: remove WARN_ON_ONCE() when trying to replace negative ID
IDR only supports non-negative IDs.  There used to be a 'WARN_ON_ONCE(id <
0)' in idr_replace(), but it was intentionally removed by commit
2e1c9b2867 ("idr: remove WARN_ON_ONCE() on negative IDs").

Then it was added back by commit 0a835c4f09 ("Reimplement IDR and IDA
using the radix tree").  However it seems that adding it back was a
mistake, given that some users such as drm_gem_handle_delete()
(DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
allowing the WARN_ON_ONCE to be triggered.  drm_gem_handle_delete()
actually just wants idr_replace() to return an error code if the ID is
not allocated, including in the case where the ID is invalid (negative).

So once again remove the bogus WARN_ON_ONCE().

This bug was found by syzkaller, which encountered the following
warning:

    WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
     do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
     do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
     do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
     do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
     invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
    RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
    RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
    RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
    RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
    R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
    R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
     drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
     drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
     drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
     drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
     vfs_ioctl fs/ioctl.c:45 [inline]
     do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
     SYSC_ioctl fs/ioctl.c:700 [inline]
     SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
     entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    int main(void)
    {
            int cardfd = open("/dev/dri/card0", O_RDONLY);

            ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
                  &(struct drm_gem_close) { .handle = -1 } );
    }

Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
Fixes: 0a835c4f09 ("Reimplement IDR and IDA using the radix tree")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [v4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-13 18:53:15 -07:00
Linus Torvalds
89fd915c40 Merge tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm from Dan Williams:
 "A rework of media error handling in the BTT driver and other updates.
  It has appeared in a few -next releases and collected some late-
  breaking build-error and warning fixups as a result.

  Summary:

   - Media error handling support in the Block Translation Table (BTT)
     driver is reworked to address sleeping-while-atomic locking and
     memory-allocation-context conflicts.

   - The dax_device lookup overhead for xfs and ext4 is moved out of the
     iomap hot-path to a mount-time lookup.

   - A new 'ecc_unit_size' sysfs attribute is added to advertise the
     read-modify-write boundary property of a persistent memory range.

   - Preparatory fix-ups for arm and powerpc pmem support are included
     along with other miscellaneous fixes"

* tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
  libnvdimm, btt: fix format string warnings
  libnvdimm, btt: clean up warning and error messages
  ext4: fix null pointer dereference on sbi
  libnvdimm, nfit: move the check on nd_reserved2 to the endpoint
  dax: fix FS_DAX=n BLOCK=y compilation
  libnvdimm: fix integer overflow static analysis warning
  libnvdimm, nd_blk: remove mmio_flush_range()
  libnvdimm, btt: rework error clearing
  libnvdimm: fix potential deadlock while clearing errors
  libnvdimm, btt: cache sector_size in arena_info
  libnvdimm, btt: ensure that flags were also unchanged during a map_read
  libnvdimm, btt: refactor map entry operations with macros
  libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
  libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute
  ext4: perform dax_device lookup at mount
  ext2: perform dax_device lookup at mount
  xfs: perform dax_device lookup at mount
  dax: introduce a fs_dax_get_by_bdev() helper
  libnvdimm, btt: check memory allocation failure
  libnvdimm, label: fix index block size calculation
  ...
2017-09-11 13:10:57 -07:00
Alexey Dobriyan
f22ef333c3 cpumask: make cpumask_next() out-of-line
Every for_each_XXX_cpu() invocation calls cpumask_next() which is an
inline function:

	static inline unsigned int cpumask_next(int n, const struct cpumask *srcp)
	{
	        /* -1 is a legal arg here. */
	        if (n != -1)
	                cpumask_check(n);
	        return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
	}

However!

find_next_bit() is regular out-of-line function which means "nr_cpu_ids"
load and increment happen at the caller resulting in a lot of bloat

x86_64 defconfig:
	add/remove: 3/0 grow/shrink: 8/373 up/down: 155/-5668 (-5513)
x86_64 allyesconfig-ish:
	add/remove: 3/1 grow/shrink: 57/634 up/down: 3515/-28177 (-24662) !!!

Some archs redefine find_next_bit() but it is OK:

	m68k		inline but SMP is not supported
	arm		out-of-line
	unicore32	out-of-line

Function call will happen anyway, so move load and increment into callee.

Link: http://lkml.kernel.org/r/20170824230010.GA1593@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:51 -07:00
Dan Carpenter
52c270d358 test_kmod: flip INT checks to be consistent
Most checks will check for min and then max, except the int check.  Flip
the checks to be consistent with the other code.

[mcgrof@kernel.org: massaged commit log]
Link: http://lkml.kernel.org/r/20170802211707.28020-3-mcgrof@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:50 -07:00
Dan Carpenter
f520409cfd test_kmod: remove paranoid UINT_MAX check on uint range processing
The UINT_MAX comparison is not needed because "max" is already an unsigned
int, and we expect developer C code max value input to have a sensible 0 -
UINT_MAX range.  Note that if it so happens to be UINT_MAX + 1 it would
lead to an issue, but we expect the developer to know this.

[mcgrof@kernel.org: massaged commit log]
Link: http://lkml.kernel.org/r/20170802211707.28020-2-mcgrof@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:50 -07:00
Takashi Iwai
afdb05e9d6 lib/oid_registry.c: X.509: fix the buffer overflow in the utility function for OID string
The sprint_oid() utility function doesn't properly check the buffer size
that it causes that the warning in vsnprintf() be triggered.  For
example on v4.1 kernel:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 2357 at lib/vsprintf.c:1867 vsnprintf+0x5a7/0x5c0()
  ...

We can trigger this issue by injecting maliciously crafted x509 cert in
DER format.  Just using hex editor to change the length of OID to over
the length of the SEQUENCE container.  For example:

    0:d=0  hl=4 l= 980 cons: SEQUENCE
    4:d=1  hl=4 l= 700 cons:  SEQUENCE
    8:d=2  hl=2 l=   3 cons:   cont [ 0 ]
   10:d=3  hl=2 l=   1 prim:    INTEGER           :02
   13:d=2  hl=2 l=   9 prim:   INTEGER           :9B47FAF791E7D1E3
   24:d=2  hl=2 l=  13 cons:   SEQUENCE
   26:d=3  hl=2 l=   9 prim:    OBJECT            :sha256WithRSAEncryption
   37:d=3  hl=2 l=   0 prim:    NULL
   39:d=2  hl=2 l= 121 cons:   SEQUENCE
   41:d=3  hl=2 l=  22 cons:    SET
   43:d=4  hl=2 l=  20 cons:     SEQUENCE      <=== the SEQ length is 20
   45:d=5  hl=2 l=   3 prim:      OBJECT            :organizationName
	<=== the original length is 3, change the length of OID to over the length of SEQUENCE

Pawel Wieczorkiewicz reported this problem and Takashi Iwai provided
patch to fix it by checking the bufsize in sprint_oid().

Link: http://lkml.kernel.org/r/20170903021646.2080-1-jlee@suse.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
Reported-by: Pawel Wieczorkiewicz <pwieczorkiewicz@suse.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Pawel Wieczorkiewicz <pwieczorkiewicz@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:49 -07:00
Eric Dumazet
bc9ae2247a radix-tree: must check __radix_tree_preload() return value
__radix_tree_preload() only disables preemption if no error is returned.

So we really need to make sure callers always check the return value.

idr_preload() contract is to always disable preemption, so we need
to add a missing preempt_disable() if an error happened.

Similarly, ida_pre_get() only needs to call preempt_enable() in the
case no error happened.

Link: http://lkml.kernel.org/r/1504637190.15310.62.camel@edumazet-glaptop3.roam.corp.google.com
Fixes: 0a835c4f09 ("Reimplement IDR and IDA using the radix tree")
Fixes: 7ad3d4d85c ("ida: Move ida_bitmap to a percpu variable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>    [4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:49 -07:00
Baoquan He
7c61bd6983 lib/cmdline.c: remove meaningless comment
One line of code was commented out by c++ style comment for debugging, but
forgot removing it.

Clean it up.

Link: http://lkml.kernel.org/r/1503312113-11843-1-git-send-email-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:49 -07:00