Commit Graph

1004 Commits

Author SHA1 Message Date
David S. Miller c773e847ea netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue.
Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:13:53 -07:00
David S. Miller eb6aafe3f8 pkt_sched: Make qdisc_run take a netdev_queue.
This allows us to use this calling convention all the way down into
qdisc_restart().

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:12:38 -07:00
David S. Miller 86d804e10a netdev: Make netif_schedule() routines work with netdev_queue objects.
Only plain netif_schedule() remains taking a net_device, mostly as a
compatability item while we transition the rest of these interfaces.

Everything else calls netif_schedule_queue() or __netif_schedule(),
both of which take a netdev_queue pointer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:11:25 -07:00
David S. Miller 6fa9864b53 net: Clean up explicit ->tx_queue references in link watch.
First, we add a qdisc_tx_changing() helper which returns true if the
qdisc attachment is in transition.

Second, we remove an assertion warning which is of limited value and
is hard to express precisely in a multiqueue environment.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 23:01:06 -07:00
David S. Miller ee609cb362 netdev: Move next_sched into struct netdev_queue.
We schedule queues, not the device, for output queue processing in BH.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 22:58:37 -07:00
David S. Miller 816f3258e7 netdev: Kill qdisc_ingress, use netdev->rx_queue.qdisc instead.
Now that our qdisc management is bi-directional, per-queue, and fully
orthogonal, there is no reason to have a special ingress qdisc pointer
in struct net_device.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 22:49:00 -07:00
David S. Miller b0e1e6462d netdev: Move rest of qdisc state into struct netdev_queue
Now qdisc, qdisc_sleeping, and qdisc_list also live there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:42:10 -07:00
David S. Miller 555353cfa1 netdev: The ingress_lock member is no longer needed.
Every qdisc is assosciated with a queue, and in the case of ingress
qdiscs that will now be netdev->rx_queue so using that queue's lock is
the thing to do.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:33:13 -07:00
David S. Miller dc2b48475a netdev: Move queue_lock into struct netdev_queue.
The lock is now an attribute of the device queue.

One thing to notice is that "suspicious" places
emerge which will need specific training about
multiple queue handling.  They are so marked with
explicit "netdev->rx_queue" and "netdev->tx_queue"
references.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 17:18:23 -07:00
David S. Miller bb949fbd18 netdev: Create netdev_queue abstraction.
A netdev_queue is an entity managed by a qdisc.

Currently there is one RX and one TX queue, and a netdev_queue merely
contains a backpointer to the net_device.

The Qdisc struct is augmented with a netdev_queue pointer as well.

Eventually the 'dev' Qdisc member will go away and we will have the
resulting hierarchy:

	net_device --> netdev_queue --> Qdisc

Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue
pointer argument.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-08 16:55:56 -07:00
Patrick McHardy 4b5a698ef4 net: fix dev_set_promiscuity() breakage
Commit dad9b335 (netdevice: Fix promiscuity and allmulti overflow) broke
dev_set_promiscuity() by returning on success without reprogramming the
device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-06 15:49:08 -07:00
David S. Miller ea2aca084b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	Documentation/feature-removal-schedule.txt
	drivers/net/wan/hdlc_fr.c
	drivers/net/wireless/iwlwifi/iwl-4965.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2008-07-05 23:08:07 -07:00
Denis V. Lunev ae299fc051 net: add fib_rules_ops to flush_cache method
This is required to pass namespace context into rt_cache_flush called from
->flush_cache.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-05 19:01:28 -07:00
Santwona Behera 0853ad66b1 netdev: Add support for rx flow hash configuration, using ethtool.
Added new interfaces to ethtool to configure receive network flow
distribution across multiple rx rings using hashing.

Signed-off-by: Santwona Behera <santwona.behera@sun.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-02 03:47:41 -07:00
Patrick McHardy 2fe195cfe3 net: fib_rules: fix error code for unsupported families
The errno code returned must be negative.

Fixes "RTNETLINK answers: Unknown error 18446744073709551519".

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:59:37 -07:00
Wang Chen 93b3cff991 netdevice: Fix wrong string handle in kernel command line parsing
v1->v2: Use strlcpy() to ensure s[i].name be null-termination.

1. In netdev_boot_setup_add(), a long name will leak.
   ex. : dev=21,0x1234,0x1234,0x2345,eth123456789verylongname.........
2. In netdev_boot_setup_check(), mismatch will happen if s[i].name
   is a substring of dev->name.
   ex. : dev=...eth1 dev=...eth11

[ With feedback from Ben Hutchings. ]

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:57:19 -07:00
Wang Chen 8fde8a0769 net: Tyop of sk_filter() comment
Parameter "needlock" no long exists.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-01 19:55:40 -07:00
David S. Miller 1b63ba8a86 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/iwlwifi/iwl4965-base.c
2008-06-28 01:19:40 -07:00
Wang Chen 5dbaec5dc6 netdevice: Fix typo of dev_unicast_add() comment
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 19:35:16 -07:00
Octavian Purdila db43a282d3 tcp: fix for splice receive when used with software LRO
If an skb has nr_frags set to zero but its frag_list is not empty (as
it can happen if software LRO is enabled), and a previous
tcp_read_sock has consumed the linear part of the skb, then
__skb_splice_bits:

(a) incorrectly reports an error and

(b) forgets to update the offset to account for the linear part

Any of the two problems will cause the subsequent __skb_splice_bits
call (the one that handles the frag_list skbs) to either skip data,
or, if the unadjusted offset is greater then the size of the next skb
in the frag_list, make tcp_splice_read loop forever.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-27 17:27:21 -07:00
Eric W. Biederman b9f75f45a6 netns: Don't receive new packets in a dead network namespace.
Alexey Dobriyan <adobriyan@gmail.com> writes:
> Subject: ICMP sockets destruction vs ICMP packets oops

> After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
> icmp_reply() wants ICMP socket.
>
> Steps to reproduce:
>
> 	launch shell in new netns
> 	move real NIC to netns
> 	setup routing
> 	ping -i 0
> 	exit from shell
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
> PGD 17f3cd067 PUD 17f3ce067 PMD 0 
> Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU 0 
> Modules linked in: usblp usbcore
> Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
> RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
> RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
> RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
> RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
> R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
> R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
> FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
> Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
>  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
>  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
> Call Trace:
>  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
>  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
>  [<ffffffff803fd645>] icmp_echo+0x65/0x70
>  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
>  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
>  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
>  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
>  [<ffffffff803be57d>] process_backlog+0x9d/0x100
>  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
>  [<ffffffff80237be5>] __do_softirq+0x75/0x100
>  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
>  [<ffffffff8020f085>] do_softirq+0x65/0xa0
>  [<ffffffff80237af7>] irq_exit+0x97/0xa0
>  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
>  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
>  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
>  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
>  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
>  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
>  [<ffffffff8040f380>] ? rest_init+0x70/0x80
> Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
> 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
> 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
> RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
>  RSP <ffffffff8057fc30>
> CR2: 0000000000000000
> ---[ end trace ea161157b76b33e8 ]---
> Kernel panic - not syncing: Aiee, killing interrupt handler!

Receiving packets while we are cleaning up a network namespace is a
racy proposition. It is possible when the packet arrives that we have
removed some but not all of the state we need to fully process it.  We
have the choice of either playing wack-a-mole with the cleanup routines
or simply dropping packets when we don't have a network namespace to
handle them.

Since the check looks inexpensive in netif_receive_skb let's just
drop the incoming packets.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20 22:16:51 -07:00
Ben Hutchings 4497b0763c net: Discard and warn about LRO'd skbs received for forwarding
Add skb_warn_if_lro() to test whether an skb was received with LRO and
warn if so.

Change br_forward(), ip_forward() and ip6_forward() to call it) and
discard the skb if it returns true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:22:28 -07:00
Ben Hutchings 0187bdfb05 net: Disable LRO on devices that are forwarding
Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded.  It can also confuse the GSO on output.

Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.

Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:15:47 -07:00
Wang Chen dad9b335c6 netdevice: Fix promiscuity and allmulti overflow
Max of promiscuity and allmulti plus positive @inc can cause overflow.
Fox example: when allmulti=0xFFFFFFFF, any caller give dev_set_allmulti() a
positive @inc will cause allmulti be off.
This is not what we want, though it's rare case.
The fix is that only negative @inc will cause allmulti or promiscuity be off
and when any caller makes the counters touch the roof, we return error.

Change of v2:
Change void function dev_set_promiscuity/allmulti to return int.
So callers can get the overflow error.
Caller's fix will be done later.

Change of v3:
1. Since we return error to caller, we don't need to print KERN_ERROR,
KERN_WARNING is enough.
2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we
return at once.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-18 01:48:28 -07:00
David S. Miller 972692e0db net: Add sk_set_socket() helper.
In order to more easily grep for all things that set
sk->sk_socket, add sk_set_socket() helper inline function.

Suggested (although only half-seriously) by Evgeniy Polyakov.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 22:41:38 -07:00