Commit Graph

696 Commits

Author SHA1 Message Date
Jesse Gross c8d5bcd1af offloading: Support multiple vlan tags in GSO.
We assume that hardware TSO can't support multiple levels of vlan tags
but we allow it to be done.  Therefore, enable GSO to parse these tags
so we can fallback to software.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 09:22:53 -08:00
Jesse Gross e1e78db628 offloading: Make scatter/gather more tolerant of vlans.
When checking if it is necessary to linearize a packet, we currently
use vlan_features if the packet contains either an in-band or out-
of-band vlan tag.  However, in-band tags aren't special in any way
for scatter/gather since they are part of the packet buffer and are
simply more data to DMA.  Therefore, only use vlan_features for out-
of-band tags, which could potentially have some interaction with
scatter/gather.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 09:22:52 -08:00
Joe Perches b194a3674f net/core/dev.c: Update WARN uses
Coalesce long formats.
Add missing newlines.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-09 09:22:31 -08:00
Tom Herbert df32cc193a net: check queue_index from sock is valid for device
In dev_pick_tx recompute the queue index if the value stored in the
socket is greater than or equal to the number of real queues for the
device.  The saved index in the sock structure is not guaranteed to
be appropriate for the egress device (this could happen on a route
change or in presence of tunnelling).  The result of the queue index
being bad would be to return a bogus queue (crash could presumably
follow).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-01 12:55:52 -07:00
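The check described in df32cc193a can be sketched roughly as follows. This is a minimal userspace illustration, not the kernel code: `toy_dev`, `toy_sock`, `toy_hash_queue` and `toy_pick_tx` are hypothetical names, and the modulo-based fallback is only a stand-in for whatever hash-based selection the real dev_pick_tx performs.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures (assumption, not the real layout). */
struct toy_dev {
    unsigned int real_num_tx_queues; /* assumed to be >= 1 */
};

struct toy_sock {
    int cached_queue; /* -1 means no queue has been recorded yet */
};

/* Placeholder for the hash-based queue selection: a plain modulo. */
static unsigned int toy_hash_queue(const struct toy_dev *dev, uint32_t flow_hash)
{
    return flow_hash % dev->real_num_tx_queues;
}

/*
 * Reuse the index cached in the socket only if it is valid for this
 * device; otherwise recompute it and refresh the cache.
 */
static unsigned int toy_pick_tx(const struct toy_dev *dev, struct toy_sock *sk,
                                uint32_t flow_hash)
{
    int idx = sk->cached_queue;

    if (idx < 0 || (unsigned int)idx >= dev->real_num_tx_queues) {
        idx = (int)toy_hash_queue(dev, flow_hash);
        sk->cached_queue = idx;
    }
    return (unsigned int)idx;
}
```

The point of the sketch is only the `>=` comparison against the number of real queues: a cached index that was valid for one egress device is never trusted blindly on another.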
Uwe Kleine-König b595076a18 tree-wide: fix comment/printk typos
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-01 15:38:34 -04:00
Ben Hutchings 66c68bcc48 net: NETIF_F_HW_CSUM does not imply FCoE CRC offload
NETIF_F_HW_CSUM indicates the ability to update a TCP/IP-style 16-bit
checksum with the checksum of an arbitrary part of the packet data,
whereas the FCoE CRC is something entirely different.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [2.6.32+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:29 -07:00
Ben Hutchings af1905dbec net: Fix some corner cases in dev_can_checksum()
dev_can_checksum() incorrectly returns true in these cases:

1. The skb has both out-of-band and in-band VLAN tags and the device
   supports checksum offload for the encapsulated protocol but only with
   one layer of encapsulation.
2. The skb has a VLAN tag and the device supports generic checksumming
   but not in conjunction with VLAN encapsulation.

Rearrange the VLAN tag checks to avoid these.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:29 -07:00
Eric Dumazet 6e3f7faf3e rps: add __rcu annotations
Add __rcu annotations to :
	(struct netdev_rx_queue)->rps_map
	(struct netdev_rx_queue)->rps_flow_table
	struct rps_sock_flow_table *rps_sock_flow_table;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:27 -07:00
Eric Dumazet 198caeca3e ipv6: ip6_ptr rcu annotations
(struct net_device)->ip6_ptr is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:43 -07:00
David S. Miller 11a766ce91 net: Increase xmit RECURSION_LIMIT to 10.
Three is definitely too low, and we know from reports that GRE tunnels
stacked as deeply as 37 levels cause stack overflows, so pick some
reasonable value between those two.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 12:51:55 -07:00
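The idea behind a transmit recursion limit, as in 11a766ce91, can be illustrated with a small userspace sketch. The names `TOY_RECURSION_LIMIT`, `toy_xmit_depth` and `toy_xmit` are hypothetical, and the depth counter is a plain global here, whereas a kernel implementation would presumably track it per CPU.

```c
#define TOY_RECURSION_LIMIT 10

/* Current nesting depth of transmit calls (a plain global in this sketch). */
static int toy_xmit_depth;

/*
 * Simulate transmitting through `levels` stacked tunnel devices.
 * Returns 0 on success, -1 if the nesting exceeds the limit.
 */
static int toy_xmit(int levels)
{
    int rc = 0;

    if (toy_xmit_depth > TOY_RECURSION_LIMIT)
        return -1; /* too deep: refuse instead of overflowing the stack */

    toy_xmit_depth++;
    if (levels > 0)
        rc = toy_xmit(levels - 1); /* the inner device transmits in turn */
    toy_xmit_depth--;

    return rc;
}
```

With this sketch, a handful of stacked devices transmit normally, while a 37-level stack like the one mentioned in the commit message is rejected before it can exhaust the stack.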
Linus Torvalds 5f05647dd8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
  bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
  vlan: Calling vlan_hwaccel_do_receive() is always valid.
  tproxy: use the interface primary IP address as a default value for --on-ip
  tproxy: added IPv6 support to the socket match
  cxgb3: function namespace cleanup
  tproxy: added IPv6 support to the TPROXY target
  tproxy: added IPv6 socket lookup function to nf_tproxy_core
  be2net: Changes to use only priority codes allowed by f/w
  tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
  tproxy: added tproxy sockopt interface in the IPV6 layer
  tproxy: added udp6_lib_lookup function
  tproxy: added const specifiers to udp lookup functions
  tproxy: split off ipv6 defragmentation to a separate module
  l2tp: small cleanup
  nf_nat: restrict ICMP translation for embedded header
  can: mcp251x: fix generation of error frames
  can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
  can-raw: add msg_flags to distinguish local traffic
  9p: client code cleanup
  rds: make local functions/variables static
  ...

Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
drivers/net/wireless/ath/ath9k/debug.c as per David
2010-10-23 11:47:02 -07:00
David S. Miller 2198a10b50 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/core/dev.c
2010-10-21 08:43:05 -07:00
stephen hemminger d0c2b0d265 napi: unexport napi_reuse_skb
The function napi_reuse_skb is only used inside core.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 04:26:38 -07:00
Ben Greear d2ed817766 net/core: Allow tagged VLAN packets to flow through VETH devices.
When there are VLANs on a VETH device, the packets being transmitted
through the VETH device may be 4 bytes bigger than MTU.  A check
in dev_forward_skb did not take this into account and so dropped
these packets.

This patch is needed at least as far back as 2.6.34.7 and should
be considered for -stable.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 04:06:29 -07:00
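The length check that d2ed817766 relaxes can be sketched as below. This is an assumption-laden illustration rather than the actual dev_forward_skb code: `toy_forward_len_ok` is a hypothetical helper, and the constants simply encode a 14-byte Ethernet header and a 4-byte VLAN tag.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_ETH_HLEN  14 /* Ethernet header length in bytes */
#define TOY_VLAN_HLEN 4  /* one VLAN tag in bytes */

/*
 * Accept a frame if it fits in the MTU plus the link-layer header,
 * leaving room for one VLAN tag so tagged frames are not dropped.
 */
static bool toy_forward_len_ok(size_t frame_len, size_t mtu)
{
    return frame_len <= mtu + TOY_ETH_HLEN + TOY_VLAN_HLEN;
}
```

For an MTU of 1500, a tagged frame of 1518 bytes passes, whereas a check without the extra 4 bytes would have rejected it.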
Jesse Gross 3701e51382 vlan: Centralize handling of hardware acceleration.
Currently each driver that is capable of vlan hardware acceleration
must be aware of the vlan groups that are configured and then pass
the stripped tag to a specialized receive function.  This is
different from other types of hardware offload in that it places a
significant amount of knowledge in the driver itself rather than keeping
it in the networking core.

This makes vlan offloading function more similarly to other forms
of offloading (such as checksum offloading or TSO) by doing the
following:
* On receive, stripped vlans are passed directly to the network
core, without attempting to check for vlan groups or reconstructing
the header if no group
* vlans are made less special by folding the logic into the main
receive routines
* On transmit, the device layer will add the vlan header in software
if the hardware doesn't support it, instead of spreading that logic
out in upper layers, such as bonding.

There are a number of advantages to this:
* Fixes all bugs with drivers incorrectly dropping vlan headers at once.
* Avoids having to disable VLAN acceleration when in promiscuous mode
(good for bridging since it always puts devices in promiscuous mode).
* Keeps VLAN tag separate until given to ultimate consumer, which
avoids needing to do header reconstruction as in tg3 unless absolutely
necessary.
* Consolidates common code in core networking.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 01:26:53 -07:00
Jesse Gross 7b9c609037 vlan: Enable software emulation for vlan acceleration.
Currently users of hardware vlan acceleration need to know whether
the device supports it before generating packets.  However, vlan
acceleration will soon be available in a more flexible manner so
knowing ahead of time becomes much more difficult.  This adds
a software fallback path for vlan packets on devices without the
necessary offloading support, similar to other types of hardware
acceleration.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-21 01:26:52 -07:00
Tom Herbert e6484930d7 net: allocate tx queues in register_netdevice
This patch introduces netif_alloc_netdev_queues which is called from
register_netdevice instead of alloc_netdev_mq.  This makes TX queue
allocation symmetric with RX allocation.  Also, queue locks allocation
is done in netdev_init_one_queue.  Change set_real_num_tx_queues to
fail if requested number < 1 or greater than number of allocated
queues.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20 02:27:59 -07:00
Tom Herbert bd25fa7ba5 net: cleanups in RX queue allocation
Clean up RX queue allocation.  In netif_set_real_num_rx_queues,
return an error on an attempt to set zero queues or when the requested
number is greater than the number of allocated queues.  In
netif_alloc_rx_queues, do BUG_ON if queue_count is zero.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20 02:27:59 -07:00
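The bounds check described in bd25fa7ba5 (and mirrored for TX queues in e6484930d7) might look roughly like this. The struct and function names are hypothetical, and returning -EINVAL is an assumption; the commit messages only say that the call fails.

```c
#include <errno.h>

/* Hypothetical, reduced view of a network device's queue bookkeeping. */
struct toy_netdev {
    unsigned int num_rx_queues;      /* queues allocated at registration */
    unsigned int real_num_rx_queues; /* queues currently in use */
};

/*
 * Change the number of active RX queues. Rejects zero and anything
 * above the allocated count; the allocated count itself is untouched.
 */
static int toy_set_real_num_rx_queues(struct toy_netdev *dev, unsigned int rxq)
{
    if (rxq < 1 || rxq > dev->num_rx_queues)
        return -EINVAL;

    dev->real_num_rx_queues = rxq;
    return 0;
}
```

Leaving `num_rx_queues` alone means a driver can lower the active count now and raise it again later, up to what was allocated.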
Tom Herbert 55513fb428 net: fail alloc_netdev_mq if queue count < 1
In alloc_netdev_mq fail if requested queue_count < 1.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-20 02:27:58 -07:00
Eric Dumazet 29b4433d99 net: percpu net_device refcount
We tried very hard to remove all possible dev_hold()/dev_put() pairs in
network stack, using RCU conversions.

There is still an unavoidable device refcount change for every dst we
create/destroy, and this can slow down some workloads (routers or some
app servers, mmap af_packet)

We can switch to a percpu refcount implementation, now that the dynamic
per_cpu infrastructure is mature. On a 64-CPU machine, this consumes
256 bytes per device.

On x86, dev_hold(dev) code :

before:
        lock    incl 0x280(%ebx)
after:
        movl    0x260(%ebx),%eax
        incl    fs:(%eax)

Stress bench :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE)

Before:

real    1m1.662s
user    0m14.373s
sys     12m55.960s

After:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-12 12:35:25 -07:00
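The per-CPU refcount idea from 29b4433d99 can be illustrated with a single-threaded userspace sketch. All names are hypothetical, the CPU index is passed explicitly instead of being taken from the running CPU, and a fixed array stands in for the kernel's dynamic per-CPU allocation.

```c
#include <stdint.h>

#define TOY_NR_CPUS 64

/* One 4-byte counter per CPU: 64 * 4 = 256 bytes per device. */
struct toy_dev_ref {
    int32_t pcpu_refcnt[TOY_NR_CPUS];
};

/* Take a reference: touches only the caller's slot, so no locked instruction is needed. */
static void toy_dev_hold(struct toy_dev_ref *d, int cpu)
{
    d->pcpu_refcnt[cpu]++;
}

/* Drop a reference: may run on a different CPU than the matching hold. */
static void toy_dev_put(struct toy_dev_ref *d, int cpu)
{
    d->pcpu_refcnt[cpu]--;
}

/* The true reference count is the sum over all CPUs. */
static int toy_dev_refcnt_read(const struct toy_dev_ref *d)
{
    int sum = 0;

    for (int i = 0; i < TOY_NR_CPUS; i++)
        sum += d->pcpu_refcnt[i];
    return sum;
}
```

Individual slots can go negative when a reference is released on another CPU; only the sum is meaningful. The trade-off is that reading the count becomes a loop over all CPUs, which is acceptable when reads are rare compared with hold/put operations.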
Tom Herbert 4315d834c1 net: Fix rxq ref counting
The rx->count reference is used to track reference counts to the
number of rx-queue kobjects created for the device.  This patch
eliminates initialization of the counter in netif_alloc_rx_queues
and instead increments the counter each time a kobject is created.
This is now symmetric with the decrement that is done when an object is
released.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-08 14:34:32 -07:00
Ben Hutchings 4e7f79511e net: Update kernel-doc for netif_set_real_num_rx_queues()
Synchronise the comment with the preceding implementation change.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-08 10:33:39 -07:00
John Fastabend 3d3211ef5c net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
Do not set num_rx_queues in netif_set_real_num_rx_queues().  Some
drivers will increase real_num_rx_queues later due to feature
changes or an increase in available interrupts.  Setting num_rx_queues
here ends up capping the number of rx queues available.

For example the ixgbe driver sets the max number of queues it intends
to use ever then sets the current number in use with the
netif_set_num_{rx|tx}_queues calls. With the current implementation
the number of rx queues gets limited so when a feature such as DCB
or FCoE is enabled the queues are no longer available.

kobjects will only be allocated for real_num_rx_queues so the waste
in memory is minimal.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-06 23:35:15 -07:00
Eric Dumazet caf586e5f2 net: add a core netdev->rx_dropped counter
In various situations, a device provides a packet to our stack and we
drop it before it enters protocol stack :
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)

We can handle a per-device counter of such dropped frames at core level,
and automatically add it to the device provided stats (rx_dropped), so
that standard tools can be used (ifconfig, ip link, cat /proc/net/dev).

This is a generalization of commit 8990f468a (net: rx_dropped
accounting), thus reverting it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05 14:47:55 -07:00
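How a core-level drop counter could be folded into the driver-provided statistics, as caf586e5f2 describes, is sketched below. The struct layout and the names `toy_rx_stats`, `toy_counted_dev`, `toy_core_drop` and `toy_get_stats` are hypothetical and chosen only for illustration.

```c
/* Hypothetical subset of the statistics a driver reports. */
struct toy_rx_stats {
    unsigned long rx_packets;
    unsigned long rx_dropped;
};

/* Hypothetical device: driver stats plus a counter owned by the core. */
struct toy_counted_dev {
    struct toy_rx_stats driver_stats; /* maintained by the driver */
    unsigned long core_rx_dropped;    /* drops made by the stack itself */
};

/* Called by the core when it drops a frame before the protocol stack. */
static void toy_core_drop(struct toy_counted_dev *dev)
{
    dev->core_rx_dropped++;
}

/* Report a copy of the driver stats with the core drops added in. */
static struct toy_rx_stats toy_get_stats(const struct toy_counted_dev *dev)
{
    struct toy_rx_stats out = dev->driver_stats;

    out.rx_dropped += dev->core_rx_dropped;
    return out;
}
```

Because the sum is computed when stats are read, the driver's own counters stay untouched and existing tools see a single rx_dropped value.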
Eric Dumazet 24824a09e3 net: dynamic ingress_queue allocation
Since ingress is not used very much and net_device->ingress_queue is
quite a big object (128 or 256 bytes), allocate it dynamically only when
needed (tc qdisc add dev eth0 ingress ...).

dev_ingress_queue(dev) helper should be used only with RTNL taken.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-05 00:23:44 -07:00