Commit Graph

634193 Commits

Author SHA1 Message Date
Baruch Siach 10b217681d net: bpqether.h: remove if_ether.h guard
__LINUX_IF_ETHER_H is not defined anywhere, and if_ether.h can keep itself from
double inclusion, though it uses a single underscore prefix.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13 00:57:53 -05:00
Eric Dumazet 34fad54c25 net: __skb_flow_dissect() must cap its return value
After Tom patch, thoff field could point past the end of the buffer,
this could fool some callers.

If an skb was provided, skb->len should be the upper limit.
If not, hlen is supposed to be the upper limit.

Fixes: a6e544b0a8 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yibin Yang <yibyang@cisco.com
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12 23:41:53 -05:00
David S. Miller 79774d6bfa Merge branch 'fix-bpf_redirect'
Martin KaFai Lau says:

====================
bpf: Fix bpf_redirect to an ipip/ip6tnl dev

This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an
ipip/ip6tnl.  The current problem is IP-EthHdr-IP is sent out instead of
IP-IP.

Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit()
in act_mirred.c which is only available in net-next.  We can consider to
refactor it once this patch is pulled into net-next from net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12 23:38:08 -05:00
Martin KaFai Lau 90e02896f1 bpf: Add test for bpf_redirect to ipip/ip6tnl
The test creates two netns, ns1 and ns2.  The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.

    ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)

The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.

The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].

At ns1, the VIPs are routed _via_ the host.

At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl.  The test is configured in a way so
that both ingress and egress can be tested.

At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified.  The return path is routed to the dev ipip/ip6tnl.

During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).

Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12 23:38:07 -05:00
Martin KaFai Lau 4e3264d21b bpf: Fix bpf_redirect to an ipip/ip6tnl dev
If the bpf program calls bpf_redirect(dev, 0) and dev is
an ipip/ip6tnl, it currently includes the mac header.
e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
of IP-IP.

The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
is not needed because the ethhdr should have been pulled once already
and then got pushed back just before calling the bpf_prog.
At egress, this patch calls skb_postpull_rcsum().

If bpf_redirect(dev, BPF_F_INGRESS) is called,
it also fails now because it calls dev_forward_skb() which
eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
will set skb->type = PACKET_OTHERHOST because the mac address
does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
will eventually cause the ip_rcv() errors out.  To fix this,
____dev_forward_skb() is added.

Joint work with Daniel Borkmann.

Fixes: cfc7381b30 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Fixes: 8d79266bc4 ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-12 23:38:07 -05:00
David S. Miller 23dd831548 Merge branch 'mlxsw-fixes'
Jiri Pirko says:

====================
mlxsw: Couple of router fixes

v1->v2:
- patch2:
 - use net_eq
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 13:02:25 -05:00
Jiri Pirko 0e3715c9c2 mlxsw: spectrum_router: Ignore FIB notification events for non-init namespaces
Since now, the table with same id in multiple netnamespaces were squashed
to a single virtual router. That is not only incorrect, it also causes
error messages when trying to use RALUE register to do double remove
of FIB entries, like this one:

mlxsw_spectrum 0000:03:00.0: EMAD reg access failed (tid=facb831c00007b20,reg_id=8013(ralue),type=write,status=7(bad parameter))

Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL),
and the infrastructure is not yet prepared to handle netnamespaces, just
ignore FIB notification events for non-init namespaces. That is clear to
do since we don't need to offload them.

Fixes: b45f64d16d ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 13:02:15 -05:00
Jiri Pirko 33b1341cd1 mlxsw: spectrum_router: Fix handling of neighbour structure
__neigh_create function works in a different way than assumed.
It passes "n" as a parameter to ndo_neigh_construct. But this "n" might
be destroyed right away before __neigh_create() returns in case there is
already another neighbour struct in the hashtable with the same dev and
primary key. That is not expected by mlxsw_sp_router_neigh_construct()
and the stored "n" points to freed memory, eventually leading to crash.

Fix this by doing tight 1:1 coupling between neighbour struct and
internal driver neigh_entry. That allows to narrow down the key in
internal driver hashtable to do lookups by "n" only.

Fixes: 6cf3c971dc ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 13:02:15 -05:00
David S. Miller 2ce0af8fd0 Merge branch 'qed-fixes'
Yuval Mintz says:

====================
qed: Fix RoCE infrastructure

This series fixes 2 basic issues with RoCE support,
one handles a missing configuration in the initial infrastructure
support while the other is a regression introduced by one of the
initial fix submissions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 12:55:26 -05:00
Ram Amrani 5c5f260908 qed: Correct rdma params configuration
Previous fix has broken RoCE support as the rdma_pf_params are now
being set into the parameters only after the params are alrady assigned
into the hw-function.

Fixes: 0189efb8f4 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 12:55:20 -05:00
Ram Amrani 8d1d8fcb21 qed: configure ll2 RoCE v1/v2 flavor correctly
Currently RoCE v2 won't operate with RDMA CM due to missing setting of
the roce-flavour in the ll2 configuration.
This patch properly sets the flavour, and deletes incorrect HSI
that doesn't [yet] exist.

Fixes: abd49676c7 ("qed: Add RoCE ll2 & GSI support")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 12:55:20 -05:00
Lance Richardson 0ace81ec71 ipv4: update comment to document GSO fragmentation cases.
This is a follow-up to commit 9ee6c5dc81 ("ipv4: allow local
fragmentation in ip_finish_output_gso()"), updating the comment
documenting cases in which fragmentation is needed for egress
GSO packets.

Suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-10 12:01:54 -05:00
David Ahern 9b6c14d51b net: tcp response should set oif only if it is L3 master
Lorenzo noted an Android unit test failed due to e0d56fdd73:
"The expectation in the test was that the RST replying to a SYN sent to a
closed port should be generated with oif=0. In other words it should not
prefer the interface where the SYN came in on, but instead should follow
whatever the routing table says it should do."

Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
that the oif in the flow is set to the skb_iif only if skb_iif is an L3
master.

Fixes: e0d56fdd73 ("net: l3mdev: remove redundant calls")
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 22:32:10 -05:00
Allan Chou 8da3cf2a49 Net Driver: Add Cypress GX3 VID=04b4 PID=3610.
Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
Bridge Controller (Vendor=04b4 ProdID=3610).

Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems
with the Kensington SD4600P USB-C Universal Dock with Power,
which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge
Controller.

A similar patch was signed-off and tested-by Allan Chou
<allan@asix.com.tw> on 2015-12-01.

Allan verified his similar patch on x86 Linux kernel 4.1.6 system
with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.

Tested-by: Allan Chou <allan@asix.com.tw>
Tested-by: Chris Roth <chris.roth@usask.ca>
Tested-by: Artjom Simon <artjom.simon@gmail.com>

Signed-off-by: Allan Chou <allan@asix.com.tw>
Signed-off-by: Chris Roth <chris.roth@usask.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 21:45:34 -05:00
David S. Miller 9fa684ec86 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a larger than usual batch of Netfilter
fixes for your net tree. This series contains a mixture of old bugs and
recently introduced bugs, they are:

1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't
   support the set element updates from the packet path. From Liping
   Zhang.

2) Fix leak when nft_expr_clone() fails, from Liping Zhang.

3) Fix a race when inserting new elements to the set hash from the
   packet path, also from Liping.

4) Handle segmented TCP SIP packets properly, basically avoid that the
   INVITE in the allow header create bogus expectations by performing
   stricter SIP message parsing, from Ulrich Weber.

5) nft_parse_u32_check() should return signed integer for errors, from
   John Linville.

6) Fix wrong allocation instead of connlabels, allocate 16 instead of
   32 bytes, from Florian Westphal.

7) Fix compilation breakage when building the ip_vs_sync code with
   CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann.

8) Destroy the new set if the transaction object cannot be allocated,
   also from Liping Zhang.

9) Use device to route duplicated packets via nft_dup only when set by
   the user, otherwise packets may not follow the right route, again
   from Liping.

10) Fix wrong maximum genetlink attribute definition in IPVS, from
    WANG Cong.

11) Ignore untracked conntrack objects from xt_connmark, from Florian
    Westphal.

12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC
    via CT target, otherwise we cannot use the h.245 helper, from
    Florian.

13) Revisit garbage collection heuristic in the new workqueue-based
    timer approach for conntrack to evict objects earlier, again from
    Florian.

14) Fix crash in nf_tables when inserting an element into a verdict map,
    from Liping Zhang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:38:18 -05:00
Mathias Krause f567e950bf rtnl: reset calcit fptr in rtnl_unregister()
To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.

This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.

Fixes: c7ac8679be ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 20:18:19 -05:00
Arnd Bergmann 4053ab1bf9 vxlan: hide unused local variable
A bugfix introduced a harmless warning in v4.9-rc4:

drivers/net/vxlan.c: In function 'vxlan_group_used':
drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable]

This hides the variable inside of the same #ifdef that is
around its user. The extraneous initialization is removed
at the same time, it was accidentally introduced in the
same commit.

Fixes: c6fcc4fc5f ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:59:50 -05:00
John Allen 6dbcd8fb59 ibmvnic: Start completion queue negotiation at server-provided optimum values
Use the opt_* fields to determine the starting point for negotiating the
number of tx/rx completion queues with the vnic server. These contain the
number of queues that the vnic server estimates that it will be able to
allocate. While renegotiation may still occur, using the opt_* fields will
reduce the number of times this needs to happen and will prevent driver
probe timeout on systems using large numbers of ibmvnic client devices per
vnic port.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:52:41 -05:00
David Ahern 9d1a6c4ea4 net: icmp_route_lookup should use rt dev to determine L3 domain
icmp_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have an rt.
Update icmp_route_lookup to use the rt on the skb to determine L3
domain.

Fixes: 613d09b30f ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:49:39 -05:00
David S. Miller fd6f24d75a Merge branch 'qcom-emac-pause'
Timur Tabi says:

====================
net: qcom/emac: ensure that pause frames are enabled

The qcom emac driver experiences significant packet loss (through frame
check sequence errors) if flow control is not enabled and the phy is
not configured to allow pause frames to pass through it.  Therefore, we
need to enable flow control and force the phy to pass pause frames.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:45:36 -05:00
Timur Tabi df63022e18 net: qcom/emac: enable flow control if requested
If the PHY has been configured to allow pause frames, then the MAC
should be configured to generate and/or accept those frames.

Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:45:24 -05:00
Timur Tabi 3e88449344 net: qcom/emac: configure the external phy to allow pause frames
Pause frames are used to enable flow control.  A MAC can send and
receive pause frames in order to throttle traffic.  However, the PHY
must be configured to allow those frames to pass through.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 18:45:24 -05:00
Rafał Miłecki cdb26d3387 net: bgmac: fix reversed checks for clock control flag
This fixes regression introduced by patch adding feature flags. It was
already reported and patch followed (it got accepted) but it appears it
was incorrect. Instead of fixing reversed condition it broke a good one.

This patch was verified to actually fix SoC hanges caused by bgmac on
BCM47186B0.

Fixes: db791eb297 ("net: ethernet: bgmac: convert to feature flags")
Fixes: 4af1474e61 ("net: bgmac: Fix errant feature flag check")
Cc: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:32:06 -05:00
Benjamin Poirier d667f78514 bna: Add synchronization for tx ring.
We received two reports of BUG_ON in bnad_txcmpl_process() where
hw_consumer_index appeared to be ahead of producer_index. Out of order
write/read of these variables could explain these reports.

bnad_start_xmit(), as a producer of tx descriptors, has a few memory
barriers sprinkled around writes to producer_index and the device's
doorbell but they're not paired with anything in bnad_txcmpl_process(), a
consumer.

Since we are synchronizing with a device, we must use mandatory barriers,
not smp_*. Also, I didn't see the purpose of the last smp_mb() in
bnad_start_xmit().

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:31:10 -05:00
Tariq Toukan f91d718156 Revert "net/mlx4_en: Fix panic during reboot"
This reverts commit 9d2afba058.

The original issue would possibly exist if an external module
tried calling our "ethtool_ops" without checking if it still
exists.

The right way of solving it is by simply doing the check in
the caller side.
Currently, no action is required as there's no such use case.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 13:29:32 -05:00