Commit Graph

1472 Commits

Author SHA1 Message Date
Herbert Xu
372ee0be3e gro: Fix bogus gso_size on the first fraglist entry
commit 622e0ca1cd upstream.

When GRO produces fraglist entries, and the resulting skb hits
an interface that is incapable of TSO but capable of FRAGLIST,
we end up producing a bogus packet with gso_size non-zero.

This was reported in the field with older versions of KVM that
did not set the TSO bits on tuntap.

This patch fixes that.

Reported-by: Igor Zhang <yugzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26 17:21:37 -07:00
Jarek Poplawski
602b219309 gro: Re-fix different skb headrooms
[ Upstream commit 64289c8e68 ]

The patch: "gro: fix different skb headrooms" in its part:
"2) allocate a minimal skb for head of frag_list" is buggy. The copied
skb has p->data set at the ip header at the moment, and skb_gro_offset
is the length of ip + tcp headers. So, after the change the length of
mac header is skipped. Later skb_set_mac_header() sets it into the
NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
original skb was wrongly allocated, so let's copy it as it was.

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
fixes commit: 3d3be4333f

Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26 17:21:15 -07:00
Eric Dumazet
c69a7edf62 gro: fix different skb headrooms
[ Upstream commit 3d3be4333f ]

Packets entering GRO might have different headrooms, even for a given
flow (because of implementation details in drivers, like copybreak).
We cant force drivers to deliver packets with a fixed headroom.

1) fix skb_segment()

skb_segment() makes the false assumption headrooms of fragments are same
than the head. When CHECKSUM_PARTIAL is used, this can give csum_start
errors, and crash later in skb_copy_and_csum_dev()

2) allocate a minimal skb for head of frag_list

skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
allocate a fresh skb. This adds NET_SKB_PAD to a padding already
provided by netdevice, depending on various things, like copybreak.

Use alloc_skb() to allocate an exact padding, to reduce cache line
needs:
NET_SKB_PAD + NET_IP_ALIGN

bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626

Many thanks to Plamen Petrov, testing many debugging patches !
With help of Jarek Poplawski.

Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-09-26 17:21:15 -07:00
Jarek Poplawski
28917a0280 net: Fix a memmove bug in dev_gro_receive()
[ Upstream commit e5093aec2e ]

>Xin Xiaohui wrote:
> I looked into the code dev_gro_receive(), found the code here:
> if the frags[0] is pulled to 0, then the page will be released,
> and memmove() frags left.
> Is that right? I'm not sure if memmove do right or not, but
> frags[0].size is never set after memove at least. what I think
> a simple way is not to do anything if we found frags[0].size == 0.
> The patch is as followed.
...

This version of the patch fixes the bug directly in memmove.

Reported-by: "Xin, Xiaohui" <xiaohui.xin@intel.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-26 16:41:50 -07:00
Ben Hutchings
2441cdd9bb ethtool: Fix potential user buffer overflow for ETHTOOL_{G, S}RXFH
commit bf988435bd upstream.

struct ethtool_rxnfc was originally defined in 2.6.27 for the
ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
fields.  It was then extended in 2.6.30 to support various additional
commands.  These commands should have been defined to use a new
structure, but it is too late to change that now.

Since user-space may still be using the old structure definition
for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
additional fields, only copy the originally defined fields to and
from user-space.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02 10:21:06 -07:00
Ben Hutchings
81cb675339 ethtool: Fix potential kernel buffer overflow in ETHTOOL_GRXCLSRLALL
commit db048b6903 upstream.

On a 32-bit machine, info.rule_cnt >= 0x40000000 leads to integer
overflow and the buffer may be smaller than needed.  Since
ETHTOOL_GRXCLSRLALL is unprivileged, this can presumably be used for at
least denial of service.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02 10:20:53 -07:00
Doug Kehn
d7091c2bbd net/core: neighbour update Oops
commit 91a72a7059 upstream.

When configuring DMVPN (GRE + openNHRP) and a GRE remote
address is configured a kernel Oops is observed.  The
obserseved Oops is caused by a NULL header_ops pointer
(neigh->dev->header_ops) in neigh_update_hhs() when

void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
= neigh->dev->header_ops->cache_update;

is executed.  The dev associated with the NULL header_ops is
the GRE interface.  This patch guards against the
possibility that header_ops is NULL.

This Oops was first observed in kernel version 2.6.26.8.

Signed-off-by: Doug Kehn <rdkehn@yahoo.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-02 10:20:53 -07:00
Eric W. Biederman
c6476d517e scm: Only support SCM_RIGHTS on unix domain sockets.
commit 76dadd76c2 upstream.

We use scm_send and scm_recv on both unix domain and
netlink sockets, but only unix domain sockets support
everything required for file descriptor passing,
so error if someone attempts to pass file descriptors
over netlink sockets.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:49:58 -07:00
Ajit Khaparde
3118b8e153 net: bug fix for vlan + gro issue
[ Upstream commit e76b69cc01 ]

Traffic (tcp) doesnot start on a vlan interface when gro is enabled.
Even the tcp handshake was not taking place.
This is because, the eth_type_trans call before the netif_receive_skb
in napi_gro_finish() resets the skb->dev to napi->dev from the previously
set vlan netdev interface. This causes the ip_route_input to drop the
incoming packet considering it as a packet coming from a martian source.

I could repro this on 2.6.32.7 (stable) and 2.6.33-rc7.
With this fix, the traffic starts and the test runs fine on both vlan
and non-vlan interfaces.

CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:49:39 -07:00
Eric W. Biederman
9c42239c50 net-sysfs: Use rtnl_trylock in wireless sysfs methods.
[ Upstream commit b8afe64161 ]

The wireless sysfs methods like the rest of the networking sysfs
methods are removed with the rtnl_lock held and block until
the existing methods stop executing.  So use rtnl_trylock
and restart_syscall so that the code continues to work.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15 08:49:38 -07:00
Rafael J. Wysocki
6d14e6b46a pktgen: Fix freezing problem
commit 1b3f720bf0 upstream.

Add missing try_to_freeze() to one of the pktgen_thread_worker() code
paths so that it doesn't block suspend/hibernation.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=15006

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: Ciprian Dorin Craciun <ciprian.craciun@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:56 -08:00
Eric Dumazet
b49b8e39e9 dst: call cond_resched() in dst_gc_task()
commit 2fc1b5dd99 upstream.

Kernel bugzilla #15239

On some workloads, it is quite possible to get a huge dst list to
process in dst_gc_task(), and trigger soft lockup detection.

Fix is to call cond_resched(), as we run in process context.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23 07:37:55 -08:00
Octavian Purdila
16b8efad28 tcp: update the netstamp_needed counter when cloning sockets
[ Upstream commit 704da560c0 ]

This fixes a netstamp_needed accounting issue when the listen socket
has SO_TIMESTAMP set:

    s = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(s, SOL_SOCKET, SO_TIMESTAMP, 1); -> netstamp_needed = 1
    bind(s, ...);
    listen(s, ...);
    s2 = accept(s, ...); -> netstamp_needed = 1
    close(s2); -> netstamp_needed = 0
    close(s); -> netstamp_needed = -1

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09 04:50:55 -08:00
Eric W. Biederman
ad496b34c6 net: Fix userspace RTM_NEWLINK notifications.
commit d90a909e1f upstream.

I received some bug reports about userspace programs having problems
because after RTM_NEWLINK was received they could not immeidate
access files under /proc/sys/net/ because they had not been
registered yet.

The problem was trivailly fixed by moving the userspace
notification from rtnetlink_event to the end of register_netdevice.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-18 14:05:38 -08:00
Eric Dumazet
3e9848403a pktgen: Fix netdevice unregister
When multi queue compatable names are used by pktgen (eg eth0@0),
we currently cannot unload a NIC driver if one of its device
is currently in use.

Allow pktgen_find_dev() to find pktgen devices by their suffix (netdev name)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-24 14:50:53 -08:00
Eric Dumazet
593f63b0be pktgen: Fix device name compares
Commit e6fce5b916 (pktgen: multiqueue etc.) tried to relax
the pktgen restriction of one device per kernel thread, adding a '@'
tag to device names.

Problem is we dont perform check on full pktgen device name.
This allows adding many time same 'device' to pktgen thread

 pgset "add_device eth0@0"

one session later :

 pgset "add_device eth0@0"

(This doesnt find previous device)

This consumes ~1.5 MBytes of vmalloc memory per round and also triggers
this warning :

[  673.186380] proc_dir_entry 'pktgen/eth0@0' already registered
[  673.186383] Modules linked in: pktgen ixgbe ehci_hcd psmouse mdio mousedev evdev [last unloaded: pktgen]
[  673.186406] Pid: 6219, comm: bash Tainted: G        W  2.6.32-rc7-03302-g41cec6f-dirty #16
[  673.186410] Call Trace:
[  673.186417]  [<ffffffff8104a29b>] warn_slowpath_common+0x7b/0xc0
[  673.186422]  [<ffffffff8104a341>] warn_slowpath_fmt+0x41/0x50
[  673.186426]  [<ffffffff8114e789>] proc_register+0x109/0x210
[  673.186433]  [<ffffffff8100bf2e>] ? apic_timer_interrupt+0xe/0x20
[  673.186438]  [<ffffffff8114e905>] proc_create_data+0x75/0xd0
[  673.186444]  [<ffffffffa006ad38>] pktgen_thread_write+0x568/0x640 [pktgen]
[  673.186449]  [<ffffffffa006a7d0>] ? pktgen_thread_write+0x0/0x640 [pktgen]
[  673.186453]  [<ffffffff81149144>] proc_reg_write+0x84/0xc0
[  673.186458]  [<ffffffff810f5a58>] vfs_write+0xb8/0x180
[  673.186463]  [<ffffffff810f5c11>] sys_write+0x51/0x90
[  673.186468]  [<ffffffff8100b51b>] system_call_fastpath+0x16/0x1b
[  673.186470] ---[ end trace ccbb991b0a8d994d ]---

Solution to this problem is to use a odevname field (includes @ tag and suffix),
instead of using netdevice name.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-23 10:39:35 -08:00
Herbert Xu
69c0cab120 gro: Fix illegal merging of trailer trash
When we've merged skb's with page frags, and subsequently receive
a trailer skb (< MSS) that is not completely non-linear (this can
occur on Intel NICs if the packet size falls below the threshold),
GRO ends up producing an illegal GSO skb with a frag_list.

This is harmless unless the skb is then forwarded through an
interface that requires software GSO, whereupon the GSO code
will BUG.

This patch detects this case in GRO and avoids merging the
trailer skb.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 05:18:18 -08:00
Eric Dumazet
91e9c07bd6 net: Fix the rollback test in dev_change_name()
net: Fix the rollback test in dev_change_name()

In dev_change_name() an err variable is used for storing the original
call_netdevice_notifiers() errno (negative) and testing for a rollback
error later, but the test for non-zero is wrong, because the err might
have positive value as well - from dev_alloc_name(). It means the
rollback for a netdevice with a number > 0 will never happen. (The err
test is reordered btw. to make it more readable.)

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-16 03:30:35 -08:00
Eric Dumazet
9d410c7960 net: fix sk_forward_alloc corruption
On UDP sockets, we must call skb_free_datagram() with socket locked,
or risk sk_forward_alloc corruption. This requirement is not respected
in SUNRPC.

Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC

Reported-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-30 12:25:12 -07:00
Eric Dumazet
66ed1e5ec1 pktgen: Dont leak kernel memory
While playing with pktgen, I realized IP ID was not filled and a
random value was taken, possibly leaking 2 bytes of kernel memory.
 
We can use an increasing ID, this can help diagnostics anyway.

Also clear packet payload, instead of leaking kernel memory.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-24 06:55:20 -07:00
Johannes Berg
a160ee69c6 wext: let get_wireless_stats() sleep
A number of drivers (recently including cfg80211-based ones)
assume that all wireless handlers, including statistics, can
sleep and they often also implicitly assume that the rtnl is
held around their invocation. This is almost always true now
except when reading from sysfs:

  BUG: sleeping function called from invalid context at kernel/mutex.c:280
  in_atomic(): 1, irqs_disabled(): 0, pid: 10450, name: head
  2 locks held by head/10450:
   #0:  (&buffer->mutex){+.+.+.}, at: [<c10ceb99>] sysfs_read_file+0x24/0xf4
   #1:  (dev_base_lock){++.?..}, at: [<c12844ee>] wireless_show+0x1a/0x4c
  Pid: 10450, comm: head Not tainted 2.6.32-rc3 #1
  Call Trace:
   [<c102301c>] __might_sleep+0xf0/0xf7
   [<c1324355>] mutex_lock_nested+0x1a/0x33
   [<f8cea53b>] wdev_lock+0xd/0xf [cfg80211]
   [<f8cea58f>] cfg80211_wireless_stats+0x45/0x12d [cfg80211]
   [<c13118d6>] get_wireless_stats+0x16/0x1c
   [<c12844fe>] wireless_show+0x2a/0x4c

Fix this by using the rtnl instead of dev_base_lock.

Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-05 02:22:23 -07:00
Eric Dumazet
9240d7154e pktgen: restore nanosec delays
Commit fd29cf72 (pktgen: convert to use ktime_t)
inadvertantly converted "delay" parameter from nanosec to microsec.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-04 21:08:58 -07:00
Eric Dumazet
896a7cf8d8 pktgen: Fix multiqueue handling
It is not currently possible to instruct pktgen to use one selected tx queue.

When Robert added multiqueue support in commit 45b270f8, he added
an interval (queue_map_min, queue_map_max), and his code doesnt take
into account the case of min = max, to select one tx queue exactly.

I suspect a high performance setup on a eight txqueue device wants
to use exactly eight cpus, and assign one tx queue to each sender.

This patchs makes pktgen select the right tx queue, not the first one.

Also updates Documentation to reflect Robert changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-04 21:08:54 -07:00
Eric Dumazet
417bc4b855 pktgen: Fix delay handling
After last pktgen changes, delay handling is wrong.

pktgen actually sends packets at full line speed.

Fix is to update pkt_dev->next_tx even if spin() returns early,
so that next spin() calls have a chance to see a positive delay.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-01 09:29:45 -07:00
Eric Dumazet
81bbb3d404 net: restore tx timestamping for accelerated vlans
Since commit 9b22ea5609
( net: fix packet socket delivery in rx irq handler )

We lost rx timestamping of packets received on accelerated vlans.

Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps
too late (ie at skb dequeueing time, not at skb queueing time)

14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1
14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1

14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2
14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2

14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3
14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30 16:42:42 -07:00