Commit Graph

534757 Commits

Author SHA1 Message Date
Guillaume Nault 8cb775bc0a ppp: fix device unregistration upon netns deletion
PPP devices may get automatically unregistered when their network
namespace is getting removed. This happens if the ppp control plane
daemon (e.g. pppd) exits while it is the last user of this namespace.

This leads to several races:

  * ppp_exit_net() may destroy the per namespace idr (pn->units_idr)
    before all file descriptors were released. Successive ppp_release()
    calls may then cleanup PPP devices with ppp_shutdown_interface() and
    try to use the already destroyed idr.

  * Automatic device unregistration may also happen before the
    ppp_release() call for that device gets executed. Once called on
    the file owning the device, ppp_release() will then clean it up and
    try to unregister it a second time.

To fix these issues, operations defined in ppp_shutdown_interface() are
moved to the PPP device's ndo_uninit() callback. This allows PPP
devices to be properly cleaned up by unregister_netdev() and friends.
So checking for ppp->owner is now an accurate test to decide if a PPP
device should be unregistered.

Setting ppp->owner is done in ppp_create_interface(), before device
registration, in order to avoid unprotected modification of this field.

Finally, ppp_exit_net() now starts by unregistering all remaining PPP
devices to ensure that none will get unregistered after the call to
idr_destroy().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 12:22:20 -07:00
Shaohui Xie 11e122cbe9 net: phy: fix PHY_RUNNING in phy_state_machine
Currently, if phy state is PHY_RUNNING, we always register a CHANGE
when phy works in polling or interrupt ignored, this will make the
adjust_link being called even the phy link did Not changed.

checking the phy link to make sure the link did changed before we
register a CHANGE, if link did not changed, we do nothing.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 12:18:09 -07:00
Calvin Owens 5d37852bf7 Revert "net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN"
Commit 8133534c76 ("net: limit tcp/udp rmem/wmem to
SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
written to them are not less than SOCK_MIN_{RCV,SND}BUF.

That change causes 4096 to no longer be accepted as a valid value for
'min' in tcp_wmem and udp_wmem_min. 4096 has been the default for both
of those sysctls for a long time, and unfortunately seems to be an
extremely popular setting. This change breaks a large number of sysctl
configurations at Facebook.

That commit referred to b1cb59cf2e ("net: sysctl_net_core: check
SNDBUF and RCVBUF for min length"), which choose to use the SOCK_MIN
constants as the lower limits to avoid nasty bugs. But AFAICS, a limit
of SOCK_MIN_SNDBUF isn't necessary to do that: the BUG_ON cited in the
commit message seems to have happened because unix_stream_sendmsg()
expects a minimum of a full page (ie SK_MEM_QUANTUM) and the math broke,
not because it had less than SOCK_MIN_SNDBUF allocated.

This particular issue doesn't seem to affect TCP however: using a
setting of "1 1 1" for tcp_{r,w}mem works, although it's obviously
suboptimal. SK_MEM_QUANTUM would be a nice minimum, but it's 64K on
some archs, so there would still be breakage.

Since a value of one doesn't seem to cause any problems, we can drop the
minimum 8133534c added to fix this.

This reverts commit 8133534c76.

Fixes: 8133534c76 ("net: limit tcp/udp rmem/wmem to SOCK_MIN...")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sorin Dumitru <sorin@returnze.ro>
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-17 12:10:30 -07:00
Eric Dumazet 83fccfc394 inet: fix potential deadlock in reqsk_queue_unlink()
When replacing del_timer() with del_timer_sync(), I introduced
a deadlock condition :

reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()

inet_csk_reqsk_queue_drop() can be called from many contexts,
one being the timer handler itself (reqsk_timer_handler()).

In this case, del_timer_sync() loops forever.

Simple fix is to test if timer is pending.

Fixes: 2235f2ac75 ("inet: fix races with reqsk timers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 22:46:22 -07:00
Claudiu Manoil 2a4eebf0c4 gianfar: Restore link state settings after MAC reset
There are some MAC registers that need to be kept in sync
with the link state parameters, see adjust_link().
However, after a MAC soft reset default values for
these registers are assumed.  In some cases (excepting
if down/ if up for example) adjust_link() does not see
that these values were reset to default because the
priv->old* link parameters were left unchanged.
So, reset the priv->old* link params as well during a
MAC reset to let adjust_link() restore the MAC link
settings to the actual link state values.

Fixes following case, for example:
Setting link to 100M, changing MTU (implies MAC reset),
link state remains unchanged to 100M but MAC registers
were reset to default (1G) breaking the connectivity w/
the PHY.  Closing and re-opening the interface would
restore the MAC link parameters to the correct values.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 21:26:23 -07:00
Andy Whitcroft 25b97c016b ipv4: off-by-one in continuation handling in /proc/net/route
When generating /proc/net/route we emit a header followed by a line for
each route.  When a short read is performed we will restart this process
based on the open file descriptor.  When calculating the start point we
fail to take into account that the 0th entry is the header.  This leads
us to skip the first entry when doing a continuation read.

This can be easily seen with the comparison below:

  while read l; do echo "$l"; done </proc/net/route >A
  cat /proc/net/route >B
  diff -bu A B | grep '^[+-]'

On my example machine I have approximatly 10KB of route output.  There we
see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:

  +wlan0 00000000 02021EAC 0003 0 0 400 00000000 0 0 0
  -tun1  00C0AC0A 00000000 0001 0 0 950 00C0FFFF 0 0 0

Fix up the off-by-one when reaquiring position on continuation.

Fixes: 8be33e955c ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
BugLink: http://bugs.launchpad.net/bugs/1483440
Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 21:16:29 -07:00
Linus Lüssing a516993f0a net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:

I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.

Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.

Fixes: 9afd85c9e4 ("net: Export IGMP/MLD message validation code")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 17:08:39 -07:00
Linus Torvalds 5b3e2e14ea Merge tag 'dm-4.2-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:

 - two stable fixes for corruption seen in a snapshot of thinp metadata;
   metadata snapshots aren't widely used but help provide a consistent
   view of the metadata associated with an active thin-pool.

 - a dm-cache fix for the 4.2 "default" policy switch from "mq" to "smq"

* tag 'dm-4.2-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache policy smq: move 'dm-cache-default' module alias to SMQ
  dm btree: add ref counting ops for the leaves of top level btrees
  dm thin metadata: delete btrees when releasing metadata snapshot
2015-08-13 13:52:46 -07:00
Linus Torvalds ebcbf1664c Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull xen block driver fixes from Jens Axboe:
 "A few small bug fixes for xen-blk{front,back} that have been sitting
  over my vacation"

* 'for-linus' of git://git.kernel.dk/linux-block:
  xen-blkback: replace work_pending with work_busy in purge_persistent_gnt()
  xen-blkfront: don't add indirect pages to list when !feature_persistent
  xen-blkfront: introduce blkfront_gather_backend_features()
2015-08-13 13:44:32 -07:00
Linus Torvalds 6b476e1140 Merge tag 'for-linus-4.2-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:

 - revert a fix from 4.2-rc5 that was causing lots of WARNING spam.

 - fix a memory leak affecting backends in HVM guests.

 - fix PV domU hang with certain configurations.

* tag 'for-linus-4.2-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: Don't leak memory when unmapping the ring on HVM backend
  Revert "xen/events/fifo: Handle linked events when closing a port"
  x86/xen: build "Xen PV" APIC driver for domU as well
2015-08-13 13:36:22 -07:00
Linus Torvalds ed596cde94 Revert x86 sigcontext cleanups
This reverts commits 9a036b93a3 ("x86/signal/64: Remove 'fs' and 'gs'
from sigcontext") and c6f2062935 ("x86/signal/64: Fix SS handling for
signals delivered to 64-bit programs").

They were cleanups, but they break dosemu by changing the signal return
behavior (and removing 'fs' and 'gs' from the sigcontext struct - while
not actually changing any behavior - causes build problems).

Reported-and-tested-by: Stas Sergeev <stsp@list.ru>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-13 12:42:22 -07:00
Linus Torvalds 26b552e0a8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Workaround hw bug when acquiring PCI bos ownership of iwlwifi
    devices, from Emmanuel Grumbach.

 2) Falling back to vmalloc in conntrack should not emit a warning, from
    Pablo Neira Ayuso.

 3) Fix NULL deref when rtlwifi driver is used as an AP, from Luis
    Felipe Dominguez Vega.

 4) Rocker doesn't free netdev on device removal, from Ido Schimmel.

 5) UDP multicast early sock demux has route handling races, from Eric
    Dumazet.

 6) Fix L4 checksum handling in openvswitch, from Glenn Griffin.

 7) Fix use-after-free in skb_set_peeked, from Herbert Xu.

 8) Don't advertize NETIF_F_FRAGLIST in virtio_net driver, this can lead
    to fraglists longer than the driver can support.  From Jason Wang.

 9) Fix mlx5 on non-4k-pagesize systems, from Carol L Soto.

10) Fix interrupt storm in bna driver, from Ivan Vecera.

11) Don't propagate -EBUSY from netlink_insert(), from Daniel Borkmann.

12) Fix inet request sock leak, from Eric Dumazet.

13) Fix TX interrupt masking and marking in TX descriptors of fs_enet
    driver, from LEROY Christophe.

14) Get rid of rule optimizer in gianfar driver, it's buggy and unlikely
    to get fixed any time soon.  From Jakub Kicinski

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  cosa: missing error code on failure in probe()
  gianfar: remove faulty filer optimizer
  gianfar: correct list membership accounting
  gianfar: correct filer table writing
  bonding: Gratuitous ARP gets dropped when first slave added
  net: dsa: Do not override PHY interface if already configured
  net: fs_enet: mask interrupts for TX partial frames.
  net: fs_enet: explicitly remove I flag on TX partial frames
  inet: fix possible request socket leak
  inet: fix races with reqsk timers
  mkiss: Fix error handling in mkiss_open()
  bnx2x: Free NVRAM lock at end of each page
  bnx2x: Prevent null pointer dereference on SKB release
  cxgb4: missing curly braces in t4_setup_debugfs()
  net-timestamp: Update skb_complete_tx_timestamp comment
  ipv6: don't reject link-local nexthop on other interface
  netlink: make sure -EBUSY won't escape from netlink_insert
  bna: fix interrupts storm caused by erroneous packets
  net: mvpp2: replace TX coalescing interrupts with hrtimer
  net: mvpp2: enable proper per-CPU TX buffers unmapping
  ...
2015-08-13 10:46:39 -07:00
Linus Torvalds 2331d30dc8 Merge tag 'edac_fix_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fix from Borislav Petkov:
 "A ppc4xx_edac fix for accessing ->csrows properly.  This driver was
  missed during the conversion a couple of years ago"

* tag 'edac_fix_for_4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, ppc4xx: Access mci->csrows array elements properly
2015-08-13 10:22:11 -07:00
Michael Walle 5c16179b55 EDAC, ppc4xx: Access mci->csrows array elements properly
The commit

  de3910eb79 ("edac: change the mem allocation scheme to
		 make Documentation/kobject.txt happy")

changed the memory allocation for the csrows member. But ppc4xx_edac was
forgotten in the patch. Fix it.

Signed-off-by: Michael Walle <michael@walle.cc>
Cc: <stable@vger.kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1437469253-8611-1-git-send-email-michael@walle.cc
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-08-13 06:02:19 +02:00
Dan Carpenter e6d006938c cosa: missing error code on failure in probe()
If register_hdlc_device() fails, the current code returns 0 but we
should return an error code instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 16:53:11 -07:00
David S. Miller e941ba8650 Merge branch 'gianfar-fixes'
Jakub Kicinski says:

====================
gianfar: filer changes

respinning with examples as requested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 14:47:06 -07:00
Jakub Kicinski 1f2b729334 gianfar: remove faulty filer optimizer
Current filer rule optimization is broken in several ways:
 (1) Can perform reads/writes beyond end of allocated tables.
     (gianfar_ethtool.c:1326).

(2) It breaks badly for rules with more than 2 specifiers
     (e.g. matching ip, port, tos).

Example:
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1 tos 1 action 1
Added rule with ID 254
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2 tos 2 action 9
Added rule with ID 253
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3 tos 3 action 17
Added rule with ID 252
# ./filer_decode /sys/kernel/debug/gfar1/filer_raw
00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
04: TOS  == 00000003 AND         Q:00           ctrl:0000008a prop:00000003
05: DIA  == 0a000003 AND         Q:11           ctrl:0000448c prop:0a000003
06: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
07: TOS  == 00000002 AND         Q:00           ctrl:0000008a prop:00000002
08: DIA  == 0a000002 AND         Q:09           ctrl:0000248c prop:0a000002
09: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
0a: DPT  == 00000001 AND         Q:00           ctrl:0000008e prop:00000001
0b: TOS  == 00000001     CLE     Q:01           ctrl:0000060a prop:00000001
ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000

(Entire cluster gets AND-ed together).

 (3) We observed that the masking rules it generates do not
     play well with clustering on P2020.  Only first rule
     of the cluster would ever fire.  Given that optimizer
     relies heavily on masking this is very hard to fix.

Example:
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1  action 1
Added rule with ID 254
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2  action 9
Added rule with ID 253
# ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3  action 17
Added rule with ID 252
# ./filer_decode /sys/kernel/debug/gfar1/filer_raw
00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
04: DIA  == 0a000003             Q:11           ctrl:0000440c prop:0a000003
05: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
06: DIA  == 0a000002             Q:09           ctrl:0000240c prop:0a000002
07: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
08: DPT  == 00000001     CLE     Q:01           ctrl:0000060e prop:00000001
ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000

Which looks correct according to the spec but only the first
(eth id 252)/last added rule for 10.0.0.3 will ever trigger.
As if filer did not treat the AND CLE as cluster start but
also kept AND-ing the rules.  We found no errata covering this.

The fact that nobody noticed (2) or (3) makes me think
that this feature is not very widely used and we should just
remove it.

Reported-by: Aleksander Dutkowski <adutkowski@gmail.com>
Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Acked-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 14:47:06 -07:00
Jakub Kicinski b5c8c8906e gianfar: correct list membership accounting
At a cost of one line let's make sure .count is correct
when calling gfar_process_filer_changes().

Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 14:47:05 -07:00
Jakub Kicinski a898fe040f gianfar: correct filer table writing
MAX_FILER_IDX is the last usable index.  Using less-than
will already guarantee that one entry for catch-all rule
will be left, no need to subtract 1 here.

Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 14:47:05 -07:00
Venkat Venkatsubra b02e3e948d bonding: Gratuitous ARP gets dropped when first slave added
When the first slave is added (such as during bootup) the first
gratuitous ARP gets dropped. We don't see this drop during a failover.
The packet gets dropped in qdisc (noop_enqueue).

The fix is to delay the sending of gratuitous ARPs till the bond dev's
carrier is present.

It can also be worked around by setting num_grat_arp to more than 1.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 14:37:33 -07:00
Florian Fainelli 211c504a44 net: dsa: Do not override PHY interface if already configured
In case we need to divert reads/writes using the slave MII bus, we may have
already fetched a valid PHY interface property from Device Tree, and that
mode is used by the PHY driver to make configuration decisions.

If we could not fetch the "phy-mode" property, we will assign p->phy_interface
to PHY_INTERFACE_MODE_NA, such that we can actually check for that condition as
to whether or not we should override the interface value.

Fixes: 19334920ea ("net: dsa: Set valid phy interface type")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 14:29:50 -07:00
Linus Torvalds 30065bfda9 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
 "Fix coarse clock monotonicity (VDSO timestamp off by one jiffy
  compared to the syscall one)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: VDSO: fix coarse clock monotonicity regression
2015-08-12 11:25:01 -07:00
Linus Torvalds 04da002d98 Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux
Pull amd drm fixes from Alex Deucher:
 "Dave is on vacation at the moment, so please pull these radeon and
  amdgpu fixes directly.

  Just a few minor things for 4.2:

   - add a new radeon pci id
   - fix a power management regression in amdgpu
   - fix HEVC command buffer validation in amdgpu"

* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: add new OLAND pci id
  Revert "drm/amdgpu: Configure doorbell to maximum slots"
  drm/amdgpu: add context buffer size check for HEVC
2015-08-12 11:13:54 -07:00
Alex Deucher e037239e5e drm/radeon: add new OLAND pci id
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2015-08-12 12:24:05 -04:00
Alex Deucher b8826b0cbf Revert "drm/amdgpu: Configure doorbell to maximum slots"
This reverts commit 78ad5cdd21.
This commit breaks dpm and suspend/resume on CZ.
2015-08-12 12:24:04 -04:00