TCP coalescing added a regression in splice(socket->pipe) performance,
for some workloads because of the way tcp_read_sock() is implemented.
The reason for this is the break when (offset + 1 != skb->len).
As we released the socket lock, this condition is possible if TCP stack
added a fragment to the skb, which can happen with TCP coalescing.
So let's go back to the beginning of the loop when this happens,
to give a chance to splice more frags per system call.
Doing so fixes the issue and makes GRO 10% faster than LRO
on CPU-bound splice() workloads instead of the opposite.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Woodhouse says:
====================
This is the result of pulling on the thread started by Krzysztof Mazur's
original patch 'pppoatm: don't send frames to destroyed vcc'.
Various problems in the pppoatm and br2684 code are solved, some of which
were easily triggered and would panic the kernel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't need to schedule the wakeup tasklet on *every* unlock; only if we
actually blocked the channel in the first place.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
The br2684 does not check if used vcc is in connected state,
causing potential Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.
Now br2684 can be assigned only on connected sockets; otherwise
-EINVAL error is returned.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The br2684 code used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.
Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
Now that we can return zero from pppoatm_send() for reasons *other* than
the queue being full, that means we can't depend on a subsequent call to
pppoatm_pop() waking the queue, and we might leave it stalled
indefinitely.
Use the ->release_cb() callback to wake the queue after the sock is
unlocked.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
Avoid submitting packets to a vcc which is being closed. Things go badly
wrong when the ->pop method gets later called after everything's been
torn down.
Use the ATM socket lock for synchronisation with vcc_destroy_socket(),
which clears the ATM_VF_READY bit under the same lock. Otherwise, we
could end up submitting a packet to the device driver even after its
->ops->close method has been called. And it could call the vcc's ->pop
method after the protocol has been shut down. Which leads to a panic.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
The immediate use case for this is that it will allow us to ensure that a
pppoatm queue is woken after it has to drop a packet due to the sock being
locked.
Note that 'release_cb' is called when the socket is *unlocked*. This is
not to be confused with vcc_release() — which probably ought to be called
vcc_close().
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the
logic determining whether to send Router Solicitations is identical
to the logic determining whether kernel accepts Router Advertisements.
However the condition itself is repeated in several code locations.
Unify it by introducing 'ipv6_accept_ra()' accessor.
Also, simplify the condition expression, making it more readable.
No semantic change.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As time passed, available memory increased faster than number of
concurrent tcp sockets.
As a result, a machine with 4GB of ram gets a hash table
with 524288 slots, using 8388608 bytes of memory.
Lets change that by a 16x factor (one slot for 128 KB of ram)
Even if a small machine needs a _lot_ of sockets, tcp lookups are now
very efficient, using one cache line per socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68835aba4d (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.
This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses in lookups.
Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before addresses match, as this check is more discriminant.
Remove the hash check from MATCH() macros because we dont need to
re validate the hash value after taking a refcount on socket, and
use likely/unlikely compiler hints, as the sk_hash/hash check
makes the following conditional tests 100% predicted by cpu.
Introduce skc_addrpair/skc_portpair pair values to better
document the alignment requirements of the port/addr pairs
used in the various MATCH() macros, and remove some casts.
The namespace check can also be done at last.
This slightly improves TCP/UDP lookup times.
IP/TCP early demux needs inet->rx_dst_ifindex and
TCP needs inet->min_ttl, lets group them together in same cache line.
With help from Ben Hutchings & Joe Perches.
Idea of this patch came after Ling Ma proposal to move skc_hash
to the beginning of struct sock_common, and should allow him
to submit a final version of his patch. My tests show an improvement
doing so.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the variable parameter length provided in the mandatory
heartbeat information parameter exceeds the calculated payload
length the packet has been corrupted. Reply with a parameter
length protocol violation message.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes an unused parameter (src_net) from rtnl_create_link()
method and from the method single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which do not get any
parameter. This was done in commit d40156aa5e by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kfree_skb() indicates failure, which is where this is being used.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
kfree_skb() was not getting called in the case of some failures.
This was pointed out by Eric Dumazet.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the threshold for framentation of a lowpan packet from
using the MTU size to now use the MTU size minus the checksum length,
which is added by the hardware. For IEEE 802.15.4, this effectively
changes it from 127 bytes to 125 bytes.
Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug is observed on running FCoE over a VLAN device associated w/
a real device that has IFF_UNICAST_FLT set since FCoE would add unicast
address such as FLOGI MAC to the VLAN interface that FCoE is on. Since
currently, VLAN device is not inheriting the IFF_UNICAST_FLT flag from the
parent real device even though the real device is capable of doing unicast
filtering. This forces the VLAN device and its real device go to promiscuous
mode unnecessarily even the added address is actually being added to the
available unicast filter table in real device.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Cc: devel@open-fcoe.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv6/exthdrs_core.c
Jesse Gross says:
====================
This series of improvements for 3.8/net-next contains four components:
* Support for modifying IPv6 headers
* Support for matching and setting skb->mark for better integration with
things like iptables
* Ability to recognize the EtherType for RARP packets
* Two small performance enhancements
The movement of ipv6_find_hdr() into exthdrs_core.c causes two small merge
conflicts. I left it as is but can do the merge if you want. The conflicts
are:
* ipv6_find_hdr() and ipv6_find_tlv() were both moved to the bottom of
exthdrs_core.c. Both should stay.
* A new use of ipv6_find_hdr() was added to net/netfilter/ipvs/ip_vs_core.c
after this patch. The IPVS user has two instances of the old constant
name IP6T_FH_F_FRAG which has been renamed to IP6_FH_F_FRAG.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The pppoatm_may_send() is quite heavy and it's called three times
in pppoatm_send() and inlining costs more than 200 bytes of code
(more than 10% of total pppoatm driver code size).
add/remove: 1/0 grow/shrink: 0/1 up/down: 132/-367 (-235)
function old new delta
pppoatm_may_send - 132 +132
pppoatm_send 900 533 -367
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The vcc_destroy_socket() closes vcc before the protocol is detached
from vcc by calling vcc->push() with NULL skb. This leaves some time
window, where the protocol may call vcc->send() on closed vcc
and crash.
Now pppoatm_send(), like vcc_sendmsg(), checks for vcc flags that
indicate that vcc is not ready. If the vcc is not ready we just
drop frame. Queueing frames is much more complicated because we
don't have callbacks that inform us about vcc flags changes.
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The ETH_P_BATMAN ethertype is now defined kernel-wide. Use it instead
of the private BATADV_ETH_P_BATMAN define.
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
This patch changes three methods to be static and removes their
EXPORT_SYMBOLs in core/dev.c and their external declaration in
netdevice.h. The methods, dev_gro_receive(), napi_frags_finish() and
napi_skb_finish(), which are in the GRO rx path, are not used
outside core/dev.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>