Commit Graph

25841 Commits

Author SHA1 Message Date
Linus Torvalds
ff33952e4d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix object leak on IPSEC offload failure, from Steffen Klassert.

 2) Fix range checks in ipset address range addition operations, from
    Jozsef Kadlecsik.

 3) Fix pernet ops unregistration order in ipset, from Florian Westphal.

 4) Add missing netlink attribute policy for nl80211 packet pattern
    attrs, from Peng Xu.

 5) Fix PPP device destruction race, from Guillaume Nault.

 6) Write marks get lost when BPF verifier processes R1=R2 register
    assignments, causing incorrect liveness information and less state
    pruning. Fix from Alexei Starovoitov.

 7) Fix blockhole routes so that they are marked dead and therefore not
    cached in sockets, otherwise IPSEC stops working. From Steffen
    Klassert.

 8) Fix broadcast handling of UDP socket early demux, from Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
  cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan
  net: thunderx: mark expected switch fall-throughs in nicvf_main()
  udp: fix bcast packet reception
  netlink: do not set cb_running if dump's start() errs
  ipv4: Fix traffic triggered IPsec connections.
  ipv6: Fix traffic triggered IPsec connections.
  ixgbe: incorrect XDP ring accounting in ethtool tx_frame param
  net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  Revert commit 1a8b6d76dc ("net:add one common config...")
  ixgbe: fix masking of bits read from IXGBE_VXLANCTRL register
  ixgbe: Return error when getting PHY address if PHY access is not supported
  netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
  netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook
  tipc: Unclone message at secondary destination lookup
  tipc: correct initialization of skb list
  gso: fix payload length when gso_size is zero
  mlxsw: spectrum_router: Avoid expensive lookup during route removal
  bpf: fix liveness marking
  doc: Fix typo "8023.ad" in bonding documentation
  ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real
  ...
2017-10-09 16:25:00 -07:00
David S. Miller
fb60bccc06 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Fix packet drops due to incorrect ECN handling in IPVS, from Vadim
   Fedorenko.

2) Fix splat with mark restoration in xt_socket with non-full-sock,
   patch from Subash Abhinov Kasiviswanathan.

3) ipset bogusly bails out when adding IPv4 range containing more than
   2^31 addresses, from Jozsef Kadlecsik.

4) Incorrect pernet unregistration order in ipset, from Florian Westphal.

5) Races between dump and swap in ipset results in BUG_ON splats, from
   Ross Lagerwall.

6) Fix chain renames in nf_tables, from JingPiao Chen.

7) Fix race in pernet codepath with ebtables table registration, from
   Artem Savkov.

8) Memory leak in error path in set name allocation in nf_tables, patch
   from Arvind Yadav.

9) Don't dump chain counters if they are not available, this fixes a
   crash when listing the ruleset.

10) Fix out of bound memory read in strlcpy() in x_tables compat code,
    from Eric Dumazet.

11) Make sure we only process TCP packets in SYNPROXY hooks, patch from
    Lin Zhang.

12) Cannot load rules incrementally anymore after xt_bpf with pinned
    objects, added in revision 1. From Shmulik Ladkani.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:39:52 -07:00
Shmulik Ladkani
98589a0998 netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'
Commit 2c16d60332 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2

Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-10-09 15:18:04 +02:00
Alexei Starovoitov
8fe2d6ccd5 bpf: fix liveness marking
while processing Rx = Ry instruction the verifier does
regs[insn->dst_reg] = regs[insn->src_reg]
which often clears write mark (when Ry doesn't have it)
that was just set by check_reg_arg(Rx) prior to the assignment.
That causes mark_reg_read() to keep marking Rx in this block as
REG_LIVE_READ (since the logic incorrectly misses that it's
screened by the write) and in many of its parents (until lucky
write into the same Rx or beginning of the program).
That causes is_state_visited() logic to miss many pruning opportunities.

Furthermore mark_reg_read() logic propagates the read mark
for BPF_REG_FP as well (though it's readonly) which causes
harmless but unnecssary work during is_state_visited().
Note that do_propagate_liveness() skips FP correctly,
so do the same in mark_reg_read() as well.
It saves 0.2 seconds for the test below

program               before  after
bpf_lb-DLB_L3.o       2604    2304
bpf_lb-DLB_L4.o       11159   3723
bpf_lb-DUNKNOWN.o     1116    1110
bpf_lxc-DDROP_ALL.o   34566   28004
bpf_lxc-DUNKNOWN.o    53267   39026
bpf_netdev.o          17843   16943
bpf_overlay.o         8672    7929
time                  ~11 sec  ~4 sec

Fixes: dc503a8ad9 ("bpf/verifier: track liveness for pruning")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 23:25:17 +01:00
Linus Torvalds
27efed3e83 Merge branch 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchddog clean-up and fixes from Thomas Gleixner:
 "The watchdog (hard/softlockup detector) code is pretty much broken in
  its current state. The patch series addresses this by removing all
  duct tape and refactoring it into a workable state.

  The reasons why I ask for inclusion that late in the cycle are:

   1) The code causes lockdep splats vs. hotplug locking which get
      reported over and over. Unfortunately there is no easy fix.

   2) The risk of breakage is minimal because it's already broken

   3) As 4.14 is a long term stable kernel, I prefer to have working
      watchdog code in that and the lockdep issues resolved. I wouldn't
      ask you to pull if 4.14 wouldn't be a LTS kernel or if the
      solution would be easy to backport.

   4) The series was around before the merge window opened, but then got
      delayed due to the UP failure caused by the for_each_cpu()
      surprise which we discussed recently.

  Changes vs. V1:

   - Addressed your review points

   - Addressed the warning in the powerpc code which was discovered late

   - Changed two function names which made sense up to a certain point
     in the series. Now they match what they do in the end.

   - Fixed a 'unused variable' warning, which got not detected by the
     intel robot. I triggered it when trying all possible related config
     combinations manually. Randconfig testing seems not random enough.

  The changes have been tested by and reviewed by Don Zickus and tested
  and acked by Micheal Ellerman for powerpc"

* 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  watchdog/core: Put softlockup_threads_initialized under ifdef guard
  watchdog/core: Rename some softlockup_* functions
  powerpc/watchdog: Make use of watchdog_nmi_probe()
  watchdog/core, powerpc: Lock cpus across reconfiguration
  watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
  watchdog/hardlockup/perf: Fix spelling mistake: "permanetely" -> "permanently"
  watchdog/hardlockup/perf: Cure UP damage
  watchdog/hardlockup: Clean up hotplug locking mess
  watchdog/hardlockup/perf: Simplify deferred event destroy
  watchdog/hardlockup/perf: Use new perf CPU enable mechanism
  watchdog/hardlockup/perf: Implement CPU enable replacement
  watchdog/hardlockup/perf: Implement init time detection of perf
  watchdog/hardlockup/perf: Implement init time perf validation
  watchdog/core: Get rid of the racy update loop
  watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage
  watchdog/sysctl: Clean up sysctl variable name space
  watchdog/sysctl: Get rid of the #ifdeffery
  watchdog/core: Clean up header mess
  watchdog/core: Further simplify sysctl handling
  watchdog/core: Get rid of the thread teardown/setup dance
  ...
2017-10-06 08:36:41 -07:00
Linus Torvalds
7a92616c0b Merge tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
 "This fixes a code ordering issue in the main suspend-to-idle loop that
  causes some "low power S0 idle" conditions to be incorrectly reported
  as unmet with suspend/resume debug messages enabled"

* tag 'pm-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / s2idle: Invoke the ->wake() platform callback earlier
2017-10-05 15:51:37 -07:00
Rafael J. Wysocki
ca935f8e76 Merge branch 'pm-sleep'
* pm-sleep:
  PM / s2idle: Invoke the ->wake() platform callback earlier
2017-10-06 00:24:14 +02:00
Linus Torvalds
9a431ef962 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Check iwlwifi 9000 reorder buffer out-of-space condition properly,
    from Sara Sharon.

 2) Fix RCU splat in qualcomm rmnet driver, from Subash Abhinov
    Kasiviswanathan.

 3) Fix session and tunnel release races in l2tp, from Guillaume Nault
    and Sabrina Dubroca.

 4) Fix endian bug in sctp_diag_dump(), from Dan Carpenter.

 5) Several mlx5 driver fixes from the Mellanox folks (max flow counters
    cap check, invalid memory access in IPoIB support, etc.)

 6) tun_get_user() should bail if skb->len is zero, from Alexander
    Potapenko.

 7) Fix RCU lookups in inetpeer, from Eric Dumazet.

 8) Fix locking in packet_do_bund().

 9) Handle cb->start() error properly in netlink dump code, from Jason
    A. Donenfeld.

10) Handle multicast properly in UDP socket early demux code. From Paolo
    Abeni.

11) Several erspan bug fixes in ip_gre, from Xin Long.

12) Fix use-after-free in socket filter code, in order to handle the
    fact that listener lock is no longer taken during the three-way TCP
    handshake. From Eric Dumazet.

13) Fix infoleak in RTM_GETSTATS, from Nikolay Aleksandrov.

14) Fix tail call generation in x86-64 BPF JIT, from Alexei Starovoitov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
  net: 8021q: skip packets if the vlan is down
  bpf: fix bpf_tail_call() x64 JIT
  net: stmmac: dwmac-rk: Add RK3128 GMAC support
  rndis_host: support Novatel Verizon USB730L
  net: rtnetlink: fix info leak in RTM_GETSTATS call
  socket, bpf: fix possible use after free
  mlxsw: spectrum_router: Track RIF of IPIP next hops
  mlxsw: spectrum_router: Move VRF refcounting
  net: hns3: Fix an error handling path in 'hclge_rss_init_hw()'
  net: mvpp2: Fix clock resource by adding an optional bus clock
  r8152: add Linksys USB3GIGV1 id
  l2tp: fix l2tp_eth module loading
  ip_gre: erspan device should keep dst
  ip_gre: set tunnel hlen properly in erspan_tunnel_init
  ip_gre: check packet length and mtu correctly in erspan_xmit
  ip_gre: get key from session_id correctly in erspan_rcv
  tipc: use only positive error codes in messages
  ppp: fix __percpu annotation
  udp: perform source validation for mcast early demux
  IPv4: early demux can return an error code
  ...
2017-10-05 08:40:09 -07:00
Linus Torvalds
b7e1416441 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "A lot of stuff, sorry about that. A week on a beach, then a bunch of
  time catching up then more time letting it bake in -next. Shan't do
  that again!"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (51 commits)
  include/linux/fs.h: fix comment about struct address_space
  checkpatch: fix ignoring cover-letter logic
  m32r: fix build failure
  lib/ratelimit.c: use deferred printk() version
  kernel/params.c: improve STANDARD_PARAM_DEF readability
  kernel/params.c: fix an overflow in param_attr_show
  kernel/params.c: fix the maximum length in param_get_string
  mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long
  mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function
  kernel/kcmp.c: drop branch leftover typo
  memremap: add scheduling point to devm_memremap_pages
  mm, page_alloc: add scheduling point to memmap_init_zone
  mm, memory_hotplug: add scheduling point to __add_pages
  lib/idr.c: fix comment for idr_replace()
  mm: memcontrol: use vmalloc fallback for large kmem memcg arrays
  kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()
  include/linux/bitfield.h: remove 32bit from FIELD_GET comment block
  lib/lz4: make arrays static const, reduces object code size
  exec: binfmt_misc: kill the onstack iname[BINPRM_BUF_SIZE] array
  exec: binfmt_misc: fix race between load_misc_binary() and kill_node()
  ...
2017-10-04 09:30:50 -07:00
Linus Torvalds
013a8ee628 Merge tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixlets from Steven Rostedt:
 "Two updates:

   - A memory fix with left over code from spliting out ftrace_ops and
     function graph tracer, where the function graph tracer could reset
     the trampoline pointer, leaving the old trampoline not to be freed
     (memory leak).

   - The update to Paul's patch that added the unnecessary READ_ONCE().
     This removes the unnecessary READ_ONCE() instead of having to
     rebase the branch to update the patch that added it"

* tag 'trace-v4.14-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  rcu: Remove extraneous READ_ONCE()s from rcu_irq_{enter,exit}()
  ftrace: Fix kmemleak in unregister_ftrace_graph
2017-10-04 08:34:01 -07:00
Thomas Gleixner
0b62bf862d watchdog/core: Put softlockup_threads_initialized under ifdef guard
The variable is unused when the softlockup detector is disabled in Kconfig.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-10-04 11:30:50 +02:00
Thomas Gleixner
5587185ddb watchdog/core: Rename some softlockup_* functions
The function names made sense up to the point where the watchdog
(re)configuration was unified to use softlockup_reconfigure_threads() for
all configuration purposes. But that includes scenarios which solely
configure the nmi watchdog.

Rename softlockup_reconfigure_threads() and softlockup_init_threads() so
the function names match the functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
2017-10-04 10:53:54 +02:00
Thomas Gleixner
34ddaa3e5c powerpc/watchdog: Make use of watchdog_nmi_probe()
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu()
on powerpc because it is called multiple times for the boot CPU.

The first call is via:

  start_wd_on_cpu+0x80/0x2f0
  watchdog_nmi_reconfigure+0x124/0x170
  softlockup_reconfigure_threads+0x110/0x130
  lockup_detector_init+0xbc/0xe0
  kernel_init_freeable+0x18c/0x37c
  kernel_init+0x2c/0x160
  ret_from_kernel_thread+0x5c/0xbc

And then again via the CPU hotplug registration:

  start_wd_on_cpu+0x80/0x2f0
  cpuhp_invoke_callback+0x194/0x620
  cpuhp_thread_fun+0x7c/0x1b0
  smpboot_thread_fn+0x290/0x2a0
  kthread+0x168/0x1b0
  ret_from_kernel_thread+0x5c/0xbc

This can be avoided by setting up the cpu hotplug state with nocalls and
move the initialization to the watchdog_nmi_probe() function. That
initializes the hotplug callbacks without invoking the callback and the
following core initialization function then configures the watchdog for the
online CPUs (in this case CPU0) via softlockup_reconfigure_threads().

Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
2017-10-04 10:53:54 +02:00
Thomas Gleixner
e31d6883f2 watchdog/core, powerpc: Lock cpus across reconfiguration
Instead of dropping the cpu hotplug lock after stopping NMI watchdog and
threads and reaquiring for restart, the code and the protection rules
become more obvious when holding cpu hotplug lock across the full
reconfiguration.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
2017-10-04 10:53:54 +02:00
Thomas Gleixner
6b9dc4806b watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure()
into two stages. One to stop the NMI and one to restart it after
reconfiguration. That was done by adding a boolean 'run' argument to the
code, which is functionally correct but not necessarily a piece of art.

Replace it by two explicit functions: watchdog_nmi_stop() and
watchdog_nmi_start().

Fixes: 6592ad2fcc ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage")
Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
2017-10-04 10:53:53 +02:00
Jean Delvare
e0596c80f4 kernel/params.c: improve STANDARD_PARAM_DEF readability
Align the parameters passed to STANDARD_PARAM_DEF for clarity.

Link: http://lkml.kernel.org/r/20170928162728.756143cc@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:26 -07:00
Jean Delvare
96802e6b1d kernel/params.c: fix an overflow in param_attr_show
Function param_attr_show could overflow the buffer it is operating on.

The buffer size is PAGE_SIZE, and the string returned by
attribute->param->ops->get is generated by scnprintf(buffer, PAGE_SIZE,
...) so it could be PAGE_SIZE - 1 long, with the terminating '\0' at the
very end of the buffer.  Calling strcat(..., "\n") on this isn't safe, as
the '\0' will be replaced by '\n' (OK) and then another '\0' will be added
past the end of the buffer (not OK.)

Simply add the trailing '\n' when writing the attribute contents to the
buffer originally.  This is safe, and also faster.

Credits to Teradata for discovering this issue.

Link: http://lkml.kernel.org/r/20170928162602.60c379c7@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:26 -07:00
Jean Delvare
90ceb2a3ad kernel/params.c: fix the maximum length in param_get_string
The length parameter of strlcpy() is supposed to reflect the size of the
target buffer, not of the source string.  Harmless in this case as the
buffer is PAGE_SIZE long and the source string is always much shorter than
this, but conceptually wrong, so let's fix it.

Link: http://lkml.kernel.org/r/20170928162515.24846b4f@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:26 -07:00
Cyrill Gorcunov
c9653850c9 kernel/kcmp.c: drop branch leftover typo
The else branch been left over and escaped the source code refresh.  Not
a problem but better clean it up.

Fixes: 0791e3644e ("kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files")
Link: http://lkml.kernel.org/r/20170917165838.GA1887@uranus.lan
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:25 -07:00
Michal Hocko
1fdcce6e16 memremap: add scheduling point to devm_memremap_pages
devm_memremap_pages is initializing struct pages in for_each_device_pfn
and that can take quite some time.  We have even seen a soft lockup
triggering on a non preemptive kernel

  NMI watchdog: BUG: soft lockup - CPU#61 stuck for 22s! [kworker/u641:11:1808]
  [...]
  RIP: 0010:[<ffffffff8118b6b7>]  [<ffffffff8118b6b7>] devm_memremap_pages+0x327/0x430
  [...]
  Call Trace:
    pmem_attach_disk+0x2fd/0x3f0 [nd_pmem]
    nvdimm_bus_probe+0x64/0x110 [libnvdimm]
    driver_probe_device+0x1f7/0x420
    bus_for_each_drv+0x52/0x80
    __device_attach+0xb0/0x130
    bus_probe_device+0x87/0xa0
    device_add+0x3fc/0x5f0
    nd_async_device_register+0xe/0x40 [libnvdimm]
    async_run_entry_fn+0x43/0x150
    process_one_work+0x14e/0x410
    worker_thread+0x116/0x490
    kthread+0xc7/0xe0
    ret_from_fork+0x3f/0x70

fix this by adding cond_resched every 1024 pages.

Link: http://lkml.kernel.org/r/20170918121410.24466-4-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Dan Williams <dan.j.williams@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:25 -07:00
Luis R. Rodriguez
3181c38e4d kernel/sysctl.c: remove duplicate UINT_MAX check on do_proc_douintvec_conv()
do_proc_douintvec_conv() has two UINT_MAX checks, we can remove one.
This has no functional changes other than fixing a compiler warning:

  kernel/sysctl.c:2190]: (warning) Identical condition '*lvalp>UINT_MAX', second condition is always false

Fixes: 4f2fec00af ("sysctl: simplify unsigned int support")
Link: http://lkml.kernel.org/r/20170919072918.12066-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:25 -07:00
Sherry Yang
a1b2289cef android: binder: drop lru lock in isolate callback
Drop the global lru lock in isolate callback before calling
zap_page_range which calls cond_resched, and re-acquire the global lru
lock before returning.  Also change return code to LRU_REMOVED_RETRY.

Use mmput_async when fail to acquire mmap sem in an atomic context.

Fix "BUG: sleeping function called from invalid context"
errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

Also restore mmput_async, which was initially introduced in commit
ec8d7c14ea ("mm, oom_reaper: do not mmput synchronously from the oom
reaper context"), and was removed in commit 2129258024 ("mm: oom: let
oom_reap_task and exit_mmap run concurrently").

Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com
Fixes: f2517eb76f ("android: binder: Add global lru shrinker to binder")
Signed-off-by: Sherry Yang <sherryy@android.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Kyle Yan <kyan@codeaurora.org>
Acked-by: Arve Hjønnevåg <arve@android.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Martijn Coenen <maco@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Riley Andrews <riandrews@android.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hoeun Ryu <hoeun.ryu@gmail.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:24 -07:00
Jean Delvare
630cc2b30a kernel/params.c: align add_sysfs_param documentation with code
This parameter is named kp, so the documentation should use that.

Fixes: 9b473de872 ("param: Fix duplicate module prefixes")
Link: http://lkml.kernel.org/r/20170919142656.64aea59e@endymion
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-03 17:54:23 -07:00
Alexei Starovoitov
90caccdd8c bpf: fix bpf_tail_call() x64 JIT
- bpf prog_array just like all other types of bpf array accepts 32-bit index.
  Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent

The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40 in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40 ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 16:04:44 -07:00
Linus Torvalds
847d9fb477 Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "The recent migration code updates assumed that migrations always
  execute from the top to the bottom once and didn't clean up internal
  states after each migration round; however, cgroup_transfer_tasks()
  repeats the inner steps multiple times and the garbage internal states
  from the previous iteration led to OOPS.

  Waiman fixed the bug by reinitializing the relevant states at the end
  of each migration round"

* 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Reinit cgroup_taskset structure before cgroup_migrate_execute() returns
2017-10-03 10:40:36 -07:00