Now when users shutdown a sock with SEND_SHUTDOWN in sctp, even if
this sock has no connection (assoc), sk state would be changed to
SCTP_SS_CLOSING, which is not as we expect.
Besides, after that if users try to listen on this sock, kernel
could even panic when it dereference sctp_sk(sk)->bind_hash in
sctp_inet_listen, as bind_hash is null when sock has no assoc.
This patch is to move sk state change after checking sk assocs
is not empty, and also merge these two if() conditions and reduce
indent level.
Fixes: d46e416c11 ("sctp: sctp should change socket state when shutdown is received")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Baoquan He says:
====================
bnx2: Wait for in-flight DMA to complete at probe stage
This is v2 post.
In commit 3e1be7a ("bnx2: Reset device during driver initialization"),
firmware requesting code was moved from open stage to probe stage.
The reason is in kdump kernel hardware iommu need device be reset in
driver probe stage, otherwise those in-flight DMA from 1st kernel
will continue going and look up into the newly created io-page tables.
However bnx2 chip resetting involves firmware requesting issue, that
need be done in open stage.
Michale Chan suggested we can just wait for the old in-flight DMA to
complete at probe stage, then though without device resetting, we
don't need to worry the old in-flight DMA could continue looking up
the newly created io-page tables.
v1->v2:
Michael suggested to wait for the in-flight DMA to complete at probe
stage. So give up the old method of trying to reset chip at probe
stage, take the new way accordingly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In-flight DMA from 1st kernel could continue going in kdump kernel.
New io-page table has been created before bnx2 does reset at open stage.
We have to wait for the in-flight DMA to complete to avoid it look up
into the newly created io-page table at probe stage.
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 3e1be7ad2d.
When people build bnx2 driver into kernel, it will fail to detect
and load firmware because firmware is contained in initramfs and
initramfs has not been uncompressed yet during do_initcalls. So
revert commit 3e1be7a and work out a new way in the later patch.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake "unmached" to "unmatched" in
debug message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake "successed" to "succeeded"
in debug message. Also unwrap multi-line literal string.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This mistake was causing debugfs directory creation
failures when multiple ibmvnic devices were probed.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_copy_channel() doesn't correctly clear the napi_hash related state.
This means that when napi_hash_add is called for that channel nothing is
done, and we are left with a copy of the napi_hash_node from the old
channel. When we later call napi_hash_del() on this channel we have a
stale napi_hash_node.
Corruption is only seen when there are multiple entries in one of the
napi_hash lists. This is made more likely by having a very large number
of channels. Testing was carried out with 512 channels - 32 channels on
each of 16 ports.
This failure typically appears as protection faults within napi_by_id()
or napi_hash_add(). efx_copy_channel() is only used when tx or rx ring
sizes are changed (ethtool -G).
Fixes: 36763266bb ("sfc: Add support for busy polling")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: Couple of fixes
Please, queue-up both for stable. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The device's neighbour table is periodically dumped in order to update
the kernel about active neighbours. A single dump session may span
multiple queries, until the response carries less records than requested
or when a record (can contain up to four neighbour entries) is not full.
Current code stops the session when the number of returned records is
zero, which can result in infinite loop in case of high packet rate.
Fix this by stopping the session according to the above logic.
Fixes: c723c735fa ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When binding port to a newly created span entry, its refcount is
initialized to zero even though it has a bound port. That leads
to unexpected behaviour when the user tries to delete that port
from the span entry.
Fix this by initializing the reference count to 1.
Also add a warning to put function.
Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: 2 bug fixes.
Bug fixes in bnxt_setup_tc() and VF vitual link state.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If the physical link is down and the VF virtual link is set to "enable",
the current code does not always work. If the link is down but the
cable is attached, the firmware returns LINK_SIGNAL instead of
NO_LINK. The current code is treating LINK_SIGNAL as link up.
The fix is to treat link as down when the link_status != LINK.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic is missing the check on whether the tx and rx rings are sharing
completion rings or not.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit cf00713a65 ("include/uapi/linux/atm_zatm.h: include
linux/time.h").
This attempted to fix userspace breakage that no longer existed when
the patch was merged. Almost one year earlier, commit 70ba07b675
("atm: remove 'struct zatm_t_hist'") deleted the struct in question.
After this patch was merged, we now have to deal with people being
unable to include this header in conjunction with standard C library
headers like stdlib.h (which linux-atm does). Example breakage:
x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \
-I. -DCPPFLAGS_TEST -I../../src/include -O2 -march=native -pipe -g \
-frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \
-Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \
-Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \
-Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c
In file included from /usr/include/linux/atm_zatm.h:17:0,
from zntune.c:17:
/usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’
struct timespec {
^
In file included from /usr/include/sys/select.h:43:0,
from /usr/include/sys/types.h:219,
from /usr/include/stdlib.h:314,
from zntune.c:9:
/usr/include/time.h:120:8: note: originally defined here
struct timespec
^
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()
Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.
We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq
Many thanks to syzkaller team and Marco for giving us a reproducer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().
After commit 5943634fc5 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is: the
new_gw ARP may never gets resolved and the traffic is blackholed.
So, use the new_gw for neigh lookup.
Changes from v1:
- use __ipv4_neigh_lookup instead (per Eric Dumazet).
Fixes: 5943634fc5 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If usb_submit_urb() called from the open function fails, the following
crash may be observed.
r8152 8-1:1.0 eth0: intr_urb submit failed: -19
...
r8152 8-1:1.0 eth0: v1.08.3
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b
pgd = ffffffc0e7305000
[6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
...
PC is at notifier_chain_register+0x2c/0x58
LR is at blocking_notifier_chain_register+0x54/0x70
...
Call trace:
[<ffffffc0002407f8>] notifier_chain_register+0x2c/0x58
[<ffffffc000240bdc>] blocking_notifier_chain_register+0x54/0x70
[<ffffffc00026991c>] register_pm_notifier+0x24/0x2c
[<ffffffbffc183200>] rtl8152_open+0x3dc/0x3f8 [r8152]
[<ffffffc000808000>] __dev_open+0xac/0x104
[<ffffffc0008082f8>] __dev_change_flags+0xb0/0x148
[<ffffffc0008083c4>] dev_change_flags+0x34/0x70
[<ffffffc000818344>] do_setlink+0x2c8/0x888
[<ffffffc0008199d4>] rtnl_newlink+0x328/0x644
[<ffffffc000819e98>] rtnetlink_rcv_msg+0x1a8/0x1d4
[<ffffffc0008373c8>] netlink_rcv_skb+0x68/0xd0
[<ffffffc000817990>] rtnetlink_rcv+0x2c/0x3c
[<ffffffc000836d1c>] netlink_unicast+0x16c/0x234
[<ffffffc00083720c>] netlink_sendmsg+0x340/0x364
[<ffffffc0007e85d0>] sock_sendmsg+0x48/0x60
[<ffffffc0007e9c30>] SyS_sendto+0xe0/0x120
[<ffffffc0007e9cb0>] SyS_send+0x40/0x4c
[<ffffffc000203e34>] el0_svc_naked+0x24/0x28
Clean up error handling to avoid registering the notifier if the open
function is going to fail.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
__LINUX_IF_ETHER_H is not defined anywhere, and if_ether.h can keep itself from
double inclusion, though it uses a single underscore prefix.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
After Tom patch, thoff field could point past the end of the buffer,
this could fool some callers.
If an skb was provided, skb->len should be the upper limit.
If not, hlen is supposed to be the upper limit.
Fixes: a6e544b0a8 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yibin Yang <yibyang@cisco.com
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau says:
====================
bpf: Fix bpf_redirect to an ipip/ip6tnl dev
This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an
ipip/ip6tnl. The current problem is IP-EthHdr-IP is sent out instead of
IP-IP.
Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit()
in act_mirred.c which is only available in net-next. We can consider to
refactor it once this patch is pulled into net-next from net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The test creates two netns, ns1 and ns2. The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.
The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].
At ns1, the VIPs are routed _via_ the host.
At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl. The test is configured in a way so
that both ingress and egress can be tested.
At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified. The return path is routed to the dev ipip/ip6tnl.
During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the bpf program calls bpf_redirect(dev, 0) and dev is
an ipip/ip6tnl, it currently includes the mac header.
e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
of IP-IP.
The fix is to pull the mac header. At ingress, skb_postpull_rcsum()
is not needed because the ethhdr should have been pulled once already
and then got pushed back just before calling the bpf_prog.
At egress, this patch calls skb_postpull_rcsum().
If bpf_redirect(dev, BPF_F_INGRESS) is called,
it also fails now because it calls dev_forward_skb() which
eventually calls eth_type_trans(skb, dev). The eth_type_trans()
will set skb->type = PACKET_OTHERHOST because the mac address
does not match the redirecting dev->dev_addr. The PACKET_OTHERHOST
will eventually cause the ip_rcv() errors out. To fix this,
____dev_forward_skb() is added.
Joint work with Daniel Borkmann.
Fixes: cfc7381b30 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Fixes: 8d79266bc4 ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: Couple of router fixes
v1->v2:
- patch2:
- use net_eq
====================
Signed-off-by: David S. Miller <davem@davemloft.net>