Commit dc975382 "net: fec: add napi support to improve proformance"
converted the fec driver to the napi model. However, that commit
forgot to remove the call to skb_defer_rx_timestamp which is only
needed in non-napi drivers.
(The function napi_gro_receive eventually calls netif_receive_skb,
which in turn calls skb_defer_rx_timestamp.)
This patch should also be applied to the 3.9 and 3.10 kernels.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While looking into MLDv1/v2 code, I noticed that bridging code does
not convert it's max delay into jiffies for MLDv2 messages as we do
in core IPv6' multicast code.
RFC3810, 5.1.3. Maximum Response Code says:
The Maximum Response Code field specifies the maximum time allowed
before sending a responding Report. The actual time allowed, called
the Maximum Response Delay, is represented in units of milliseconds,
and is derived from the Maximum Response Code as follows: [...]
As we update timers that work with jiffies, we need to convert it.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Linus Lüssing <linus.luessing@web.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If skb->len is too short then we should return an error. Otherwise we
read beyond the end of skb->data for several bytes.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8728c544a9 ("net: dev_pick_tx() fix") and commit
b6fe83e952 ("bonding: refine IFF_XMIT_DST_RELEASE capability")
are quite incompatible : Queue selection is disabled because skb
dst was dropped before entering bonding device.
This causes major performance regression, mainly because TCP packets
for a given flow can be sent to multiple queues.
This is particularly visible when using the new FQ packet scheduler
with MQ + FQ setup on the slaves.
We can safely revert the first commit now that 416186fbf8
("net: Split core bits of netdev_pick_tx into __netdev_pick_tx")
properly caps the queue_index.
Reported-by: Xi Wang <xii@google.com>
Diagnosed-by: Xi Wang <xii@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1f324e3887.
It seems to cause regressions, and in particular the output path
really depends upon there being a socket attached to skb->sk for
checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2().
Reported-by: Stephen Warren <swarren@wwwdotorg.org>
Reported-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 3d7b46cd20 (ip_tunnel: push generic protocol handling to
ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on
an IPv4 over IPv4 tunnel.
xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But
this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb).
Signed-off-by: Li Hongjun <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should a connect fail, if the publication/server is unavailable or
due to some other error, a positive value will be returned and errno
is never set. If the application code checks for an explicit zero
return from connect (success) or a negative return (failure), it
will not catch the error and subsequent send() calls will fail as
shown from the strace snippet below.
socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)
The reason for this behaviour is that TIPC wrongly inverts error
codes set in sk_err.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed
the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so,
the netfilter owner match lost its ability to block the SYNACK packet on
outbound listening sockets. Revert the change, restoring the owner match
functionality.
This closes netfilter bugzilla #847.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we would still potentially suffer multicast packet loss if there
is just either an IGMP or an MLD querier: For the former case, we would
possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is
because we are currently assuming that if either an IGMP or MLD querier
is present that the other one is present, too.
This patch makes the behaviour and fix added in
"bridge: disable snooping if there is no querier" (b00589af3b)
to also work if there is either just an IGMP or an MLD querier on the
link: It refines the deactivation of the snooping to be protocol
specific by using separate timers for the snooped IGMP and MLD queries
as well as separate timers for our internal IGMP and MLD queriers.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
This pull request fixes some issues that arise when 6in4 or 4in6 tunnels
are used in combination with IPsec, all from Hannes Frederic Sowa and a
null pointer dereference when queueing packets to the policy hold queue.
1) We might access the local error handler of the wrong address family if
6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer
to the correct local_error to xfrm_state_afinet.
2) Add a helper function to always refer to the correct interpretation
of skb->sk.
3) Call skb_reset_inner_headers to record the position of the inner headers
when adding a new one in various ipv6 tunnels. This is needed to identify
the addresses where to send back errors in the xfrm layer.
4) Dereference inner ipv6 header if encapsulated to always call the
right error handler.
5) Choose protocol family by skb protocol to not call the wrong
xfrm{4,6}_local_error handler in case an ipv6 sockets is used
in ipv4 mode.
6) Partly revert "xfrm: introduce helper for safe determination of mtu"
because this introduced pmtu discovery problems.
7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs.
We need this to get the correct mtu informations in xfrm.
8) Fix null pointer dereference in xdst_queue_output.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus share a socket wmem buffer space.
If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will cosnume
all of the wmem space and will thus prevent from any more skbs to
be allocated even for netdevices that are able to transmit packets.
The number of neighbour discovery messages sent is very limited,
simply use alloc_skb() and don't depend on any socket wmem space any
longer.
This patch has orginally been posted by Eric Dumazet in a modified
form.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: raw_sendmsg: don't use header's destination address
A sendto() regression was bisected and found to start with commit
f8126f1d51 (ipv4: Adjust semantics of rt->rt_gateway.)
The problem is that it tries to ARP-lookup the constructed packet's
destination address rather than the explicitly provided address.
Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used.
cf. commit 2ad5b9e4bd
Reported-by: Chris Clark <chris.clark@alcatel-lucent.com>
Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com>
Tested-by: Chris Clark <chris.clark@alcatel-lucent.com>
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The zero value means that tsecr is not valid, so it's a special case.
tsoffset is used to customize tcp_time_stamp for one socket.
tsoffset is usually zero, it's used when a socket was moved from one
host to another host.
Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to
incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to
TCP_RTO_MAX.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
u32 rcv_tstamp; /* timestamp of last received ACK */
Its value used in tcp_retransmit_timer, which closes socket
if the last ack was received more then TCP_RTO_MAX ago.
Currently rcv_tstamp is initialized to zero and if tcp_retransmit_timer
is called before receiving a first ack, the connection is closed.
This patch initializes rcv_tstamp to a timestamp, when a socket was
restored.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds another entry (HP hs2434 Mobile Broadband) to the list
of exceptional devices that require a zero length packet in order to
function properly. This list was added in commit 844e88f0. The hs2434
is manufactured by Sierra Wireless, who also produces the MC7710,
which the ZLP exception list was created for in the first place. So
hopefully it is just this one producer's devices that will need this
workaround.
Tested on a DM1-4310NR HP notebook, which does not function without this
change.
Signed-off-by: Rob Gardner <robmatic@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a cpu_relaxt to sk_busy_loop.
Julie Cummings reported performance issues when hyperthreading is on.
Arjan van de Ven observed that we should have a cpu_relax() in the
busy poll loop.
Reported-by: Julie Cummings <julie.a.cummings@intel.com>
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixed the pbl(programmable burst length) setting
using DT. Even though the default pbl is 8, If there is no
pbl property in device tree file, pbl is set 0 and it causes
bandwidth degradation.
Signed-off-by: Byungho An <bh74.an@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink dump operations take module as parameter to hold
reference for entire netlink dump duration.
Currently it holds ref only on genl module which is not correct
when we use ops registered to genl from another module.
Following patch adds module pointer to genl_ops so that netlink
can hold ref count on it.
CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of genl-family with parallel ops off, dumpif() callback
is expected to run under genl_lock, But commit def3117493
(genl: Allow concurrent genl callbacks.) changed this behaviour
where only first dumpit() op was called under genl-lock.
For subsequent dump, only nlk->cb_lock was taken.
Following patch fixes it by defining locked dumpit() and done()
callback which takes care of genl-locking.
CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The net_device might be not set on the skb when we try refcounting.
This leads to a null pointer dereference in xdst_queue_output().
It turned out that the refcount to the net_device is not needed
after all. The dst_entry has a refcount to the net_device before
we queue the skb, so it can't go away. Therefore we can remove the
refcount on queueing to fix the null pointer dereference.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Since the PF gathers statistics for the VF, when the VF is about to unload
we must synchronize the release of its statistics buffer with the PF, so that
no DMA operation will be made to that address after the buffer release.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to incorrect VF/PF conditions, when unloading a VF it will not release
part of the memory it has previously allocated.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check on return code of bnx2x_vfop_config_vlan0() would lead to error
handling flow as the return value indicating an existing pending ramrod would
be erroneously considered as an error.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If driver will fail to allocate all queues, it will shrink the number of
queues and move the storage queue to its correct place (i.e., the last
queue among the newly supported number).
When changing the pointers of the new location of the FCoE queue, we need
to pay special attention to the aggregations pointer - that memory is allocated
during probe and released upon driver removal. Current implementation has 2
pointers pointing to the same chunk of allocated memory, meaning upon removal
there will be two kfree() of the same chunk while the other won't be released.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>