Commit Graph

2287 Commits

Author SHA1 Message Date
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Tony Lindgren
bc417e30f8 net: Add back alignment for size for __alloc_skb
Commit 87fb4b7b53 (net: more
accurate skb truesize) changed the alignment of size. This
can cause problems at least on some machines with NFS root:

Unhandled fault: alignment exception (0x801) at 0xc183a43a
Internal error: : 801 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (3.1.0-08784-g5eeee4a #733)
pc : [<c02fbba0>]    lr : [<c02fbb9c>]    psr: 60000013
sp : c180fef8  ip : 00000000  fp : c181f580
r10: 00000000  r9 : c044b28c  r8 : 00000001
r7 : c183a3a0  r6 : c1835be0  r5 : c183a412  r4 : 000001f2
r3 : 00000000  r2 : 00000000  r1 : ffffffe6  r0 : c183a43a
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 10004000  DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc180e270)
Stack: (0xc180fef8 to 0xc1810000)
fee0:                                                       00000024 00000000
ff00: 00000000 c183b9c0 c183b8e0 c044b28c c0507ccc c019dfc4 c180ff2c c0503cf8
ff20: c180ff4c c180ff4c 00000000 c1835420 c182c740 c18349c0 c05233c0 00000000
ff40: 00000000 c00e6bb8 c180e000 00000000 c04dd82c c0507e7c c050cc18 c183b9c0
ff60: c05233c0 00000000 00000000 c01f34f4 c0430d70 c019d364 c04dd898 c04dd898
ff80: c04dd82c c0507e7c c180e000 00000000 c04c584c c01f4918 c04dd898 c04dd82c
ffa0: c04ddd28 c180e000 00000000 c0008758 c181fa60 3231d82c 00000037 00000000
ffc0: 00000000 c04dd898 c04dd82c c04ddd28 00000013 00000000 00000000 00000000
ffe0: 00000000 c04b2224 00000000 c04b21a0 c001056c c001056c 00000000 00000000
Function entered at [<c02fbba0>] from [<c019dfc4>]
Function entered at [<c019dfc4>] from [<c01f34f4>]
Function entered at [<c01f34f4>] from [<c01f4918>]
Function entered at [<c01f4918>] from [<c0008758>]
Function entered at [<c0008758>] from [<c04b2224>]
Function entered at [<c04b2224>] from [<c001056c>]
Code: e1a00005 e3a01028 ebfa7cb0 e35a0000 (e5858028)

Here PC is at __alloc_skb and &shinfo->dataref is unaligned because
skb->end can be unaligned without this patch.

As explained by Eric Dumazet <eric.dumazet@gmail.com>, this happens
only with SLOB, and not with SLAB or SLUB:

* Eric Dumazet <eric.dumazet@gmail.com> [111102 15:56]:
>
> Your patch is absolutely needed, I completely forgot about SLOB :(
>
> since, kmalloc(386) on SLOB gives exactly ksize=386 bytes, not nearest
> power of two.
>
> [   60.305763] malloc(size=385)->ffff880112c11e38 ksize=386 -> nsize=2
> [   60.305921] malloc(size=385)->ffff88007c92ce28 ksize=386 -> nsize=2
> [   60.306898] malloc(size=656)->ffff88007c44ad28 ksize=656 -> nsize=272
> [   60.325385] malloc(size=656)->ffff88007c575868 ksize=656 -> nsize=272
> [   60.325531] malloc(size=656)->ffff88011c777230 ksize=656 -> nsize=272
> [   60.325701] malloc(size=656)->ffff880114011008 ksize=656 -> nsize=272
> [   60.346716] malloc(size=385)->ffff880114142008 ksize=386 -> nsize=2
> [   60.346900] malloc(size=385)->ffff88011c777690 ksize=386 -> nsize=2

Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-03 18:09:16 -04:00
David S. Miller
045f7b3b00 neigh: Kill bogus SMP protected debugging message.
Whatever situations make this state legitimate when SMP
also would be legitimate when !SMP and f.e. preemption is
enabled.

This is dubious enough that we should just delete it entirely.  If we
want to add debugging for neigh timer races, better more thorough
mechanisms are needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-01 17:45:55 -04:00
Paul Gortmaker
bc3b2d7fb9 net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules
These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:30 -04:00
Paul Gortmaker
3a9a231d97 net: Fix files explicitly needing to include module.h
With calls to modular infrastructure, these files really
needs the full module.h header.  Call it out so some of the
cleanups of implicit and unrequired includes elsewhere can be
cleaned up.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:28 -04:00
Eric Dumazet
6a32e4f9dd vlan: allow nested vlan_do_receive()
commit 2425717b27 (net: allow vlan traffic to be received under bond)
broke ARP processing on vlan on top of bonding.

       +-------+
eth0 --| bond0 |---bond0.103
eth1 --|       |
       +-------+

52870.115435: skb_gro_reset_offset <-napi_gro_receive
52870.115435: dev_gro_receive <-napi_gro_receive
52870.115435: napi_skb_finish <-napi_gro_receive
52870.115435: netif_receive_skb <-napi_skb_finish
52870.115435: get_rps_cpu <-netif_receive_skb
52870.115435: __netif_receive_skb <-netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: bond_handle_frame <-__netif_receive_skb
52870.115436: vlan_do_receive <-__netif_receive_skb
52870.115436: arp_rcv <-__netif_receive_skb
52870.115436: kfree_skb <-arp_rcv

Packet is dropped in arp_rcv() because its pkt_type was set to
PACKET_OTHERHOST in the first vlan_do_receive() call, since no eth0.103
exists.

We really need to change pkt_type only if no more rx_handler is about to
be called for the packet.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-30 04:43:30 -04:00
Thomas Gleixner
b0691c8ee7 net: Unlock sock before calling sk_free()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-25 19:17:25 -04:00
Linus Torvalds
8a9ea3237e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
  dp83640: free packet queues on remove
  dp83640: use proper function to free transmit time stamping packets
  ipv6: Do not use routes from locally generated RAs
  |PATCH net-next] tg3: add tx_dropped counter
  be2net: don't create multiple RX/TX rings in multi channel mode
  be2net: don't create multiple TXQs in BE2
  be2net: refactor VF setup/teardown code into be_vf_setup/clear()
  be2net: add vlan/rx-mode/flow-control config to be_setup()
  net_sched: cls_flow: use skb_header_pointer()
  ipv4: avoid useless call of the function check_peer_pmtu
  TCP: remove TCP_DEBUG
  net: Fix driver name for mdio-gpio.c
  ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
  rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
  ipv4: fix ipsec forward performance regression
  jme: fix irq storm after suspend/resume
  route: fix ICMP redirect validation
  net: hold sock reference while processing tx timestamps
  tcp: md5: add more const attributes
  Add ethtool -g support to virtio_net
  ...

Fix up conflicts in:
 - drivers/net/Kconfig:
	The split-up generated a trivial conflict with removal of a
	stale reference to Documentation/networking/net-modules.txt.
	Remove it from the new location instead.
 - fs/sysfs/dir.c:
	Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
	with Eric Biederman's changes for tagged directories.
2011-10-25 13:25:22 +02:00
Linus Torvalds
2d03423b23 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits)
  mm: memory hotplug: Check if pages are correctly reserved on a per-section basis
  Revert "memory hotplug: Correct page reservation checking"
  Update email address for stable patch submission
  dynamic_debug: fix undefined reference to `__netdev_printk'
  dynamic_debug: use a single printk() to emit messages
  dynamic_debug: remove num_enabled accounting
  dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions
  uio: Support physical addresses >32 bits on 32-bit systems
  sysfs: add unsigned long cast to prevent compile warning
  drivers: base: print rejected matches with DEBUG_DRIVER
  memory hotplug: Correct page reservation checking
  memory hotplug: Refuse to add unaligned memory regions
  remove the messy code file Documentation/zh_CN/SubmitChecklist
  ARM: mxc: convert device creation to use platform_device_register_full
  new helper to create platform devices with dma mask
  docs/driver-model: Update device class docs
  docs/driver-model: Document device.groups
  kobj_uevent: Ignore if some listeners cannot handle message
  dynamic_debug: make netif_dbg() call __netdev_printk()
  dynamic_debug: make netdev_dbg() call __netdev_printk()
  ...
2011-10-25 12:13:59 +02:00
David S. Miller
1805b2f048 Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-10-24 18:18:09 -04:00
Eric W. Biederman
d2237d3574 rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
Renato Westphal noticed that since commit a2835763e1
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.

Fix this by adding the missing manual notification in dev_change_net_namespaces.

Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.

Cc: stable@kernel.org
Tested-by: Renato Westphal <renatowestphal@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 03:03:37 -04:00
Richard Cochran
da92b194cc net: hold sock reference while processing tx timestamps
The pair of functions,

 * skb_clone_tx_timestamp()
 * skb_complete_tx_timestamp()

were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.

As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.

These functions first appeared in v2.6.36.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-24 02:54:50 -04:00
Eric Dumazet
cf533ea53e tcp: add const qualifiers where possible
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.

For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 05:22:42 -04:00
Mihai Maruseac
f04565ddf5 dev: use name hash for dev_seq_ops
Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.

Tests revealed the following results for ifconfig > /dev/null
	* 1000 interfaces:
		* 0.114s without patch
		* 0.089s with patch
	* 3000 interfaces:
		* 0.489s without patch
		* 0.110s with patch
	* 5000 interfaces:
		* 1.363s without patch
		* 0.250s with patch
	* 128000 interfaces (other setup):
		* ~100s without patch
		* ~30s with patch

Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:54:38 -04:00
Ian Campbell
a8605c6063 net: add opaque struct around skb frag page
I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:52:53 -04:00
Eric Dumazet
33136d12be pktgen: remove ndelay() call
Daniel Turull reported inaccuracies in pktgen when using low packet
rates, because we call ndelay(val) with values bigger than 20000.

Instead of calling ndelay() for delays < 100us, we can instead loop
calling ktime_now() only.

Reported-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-20 17:00:21 -04:00
Ian Campbell
a0bec1cd8f net: do not take an additional reference in skb_frag_set_page
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.

In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:

6a930b9f16 cxgb3: convert to SKB paged frag API.
5dc3e196ea myri10ge: convert to SKB paged frag API.
0e0634d20d vmxnet3: convert to SKB paged frag API.
86ee8130a4 virtionet: convert to SKB paged frag API.
4a22c4c919 sfc: convert to SKB paged frag API.
18324d690d cassini: convert to SKB paged frag API.
b061b39e3a benet: convert to SKB paged frag API.
b7b6a688d2 bnx2: convert to SKB paged frag API.
804cf14ea5 net: xfrm: convert to SKB frag APIs
ea2ab69379 net: convert core to skb paged frag APIs

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:40:39 -04:00
roy.qing.li@gmail.com
e049f28883 neigh: fix rcu splat in neigh_update()
when use dst_get_neighbour to get neighbour, we need
rcu_read_lock to protect, since dst_get_neighbour uses
rcu_dereference.

The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com>

[  105.612095]
[  105.612096] ===================================================
[  105.612100] [ INFO: suspicious rcu_dereference_check() usage. ]
[  105.612101] ---------------------------------------------------
[  105.612103] include/net/dst.h:91 invoked rcu_dereference_check()
without protection!
[  105.612105]
[  105.612106] other info that might help us debug this:
[  105.612106]
[  105.612108]
[  105.612108] rcu_scheduler_active = 1, debug_locks = 0
[  105.612110] 1 lock held by dnsmasq/2618:
[  105.612111]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815df8c7>]
rtnl_lock+0x17/0x20
[  105.612120]
[  105.612121] stack backtrace:
[  105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41
[  105.612125] Call Trace:
[  105.612129]  [<ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0
[  105.612132]  [<ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0
[  105.612135]  [<ffffffff815da001>] ? neigh_lookup+0xe1/0x220
[  105.612139]  [<ffffffff81639298>] arp_req_set+0xb8/0x230
[  105.612142]  [<ffffffff8163a59f>] arp_ioctl+0x1bf/0x310
[  105.612146]  [<ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60
[  105.612150]  [<ffffffff8163fb75>] inet_ioctl+0x85/0x90
[  105.612154]  [<ffffffff815b5520>] sock_do_ioctl+0x30/0x70
[  105.612157]  [<ffffffff815b55d3>] sock_ioctl+0x73/0x280
[  105.612162]  [<ffffffff811b7698>] do_vfs_ioctl+0x98/0x570
[  105.612165]  [<ffffffff811a5c40>] ? fget_light+0x340/0x3a0
[  105.612168]  [<ffffffff811b7bbf>] sys_ioctl+0x4f/0x80
[  105.612172]  [<ffffffff816fdcab>] system_call_fastpath+0x16/0x1b

Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:38:51 -04:00
Dan Carpenter
4f25af2782 filter: use unsigned int to silence static checker warning
This is just a cleanup.

My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
	warn: check 'flen' for negative values

flen comes from the user.  We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative.  This is a bug, but it's not exploitable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:35:51 -04:00
Yan, Zheng
afaef734e5 fib_rules: fix unresolved_rules counting
we should decrease ops->unresolved_rules when deleting a unresolved rule.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 19:17:41 -04:00
Richard Cochran
4dc360c5e7 net: validate HWTSTAMP ioctl parameters
This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 17:00:35 -04:00
Eric W. Biederman
850a545bd8 net: Move rcu_barrier from rollback_registered_many to netdev_run_todo.
This patch moves the rcu_barrier from rollback_registered_many
(inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
This allows us to gain the full benefit of sychronize_net calling
synchronize_rcu_expedited when the rtnl_lock is held.

The rcu_barrier in rollback_registered_many was originally a synchronize_net
but was promoted to be a rcu_barrier() when it was found that people were
unnecessarily hitting the 250ms wait in netdev_wait_allrefs().  Changing
the rcu_barrier back to a synchronize_net is therefore safe.

Since we only care about waiting for the rcu callbacks before we get
to netdev_wait_allrefs() it is also safe to move the wait into
netdev_run_todo.

This was tested by creating and destroying 1000 tap devices and observing
/proc/lock_stat.  /proc/lock_stat reports this change reduces the hold
times of the rtnl_lock by a factor of 10.  There was no observable
difference in the amount of time it takes to destroy a network device.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 16:59:42 -04:00
Andy Fleming
3d153a7c8b net: Allow skb_recycle_check to be done in stages
skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility.  We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 15:59:45 -04:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
John Fastabend
2425717b27 net: allow vlan traffic to be received under bond
The following configuration used to work as I expected. At least
we could use the fcoe interfaces to do MPIO and the bond0 iface
to do load balancing or failover.

       ---eth2.228-fcoe
       |
eth2 -----|
          |
          |---- bond0
          |
eth3 -----|
       |
       ---eth3.228-fcoe

This worked because of a change we added to allow inactive slaves
to rx 'exact' matches. This functionality was kept intact with the
rx_handler mechanism. However now the vlan interface attached to the
active slave never receives traffic because the bonding rx_handler
updates the skb->dev and goto's another_round. Previously, the
vlan_do_receive() logic was called before the bonding rx_handler.

Now by the time vlan_do_receive calls vlan_find_dev() the
skb->dev is set to bond0 and it is clear no vlan is attached
to this iface. The vlan lookup fails.

This patch moves the VLAN check above the rx_handler. A VLAN
tagged frame is now routed to the eth2.228-fcoe iface in the
above schematic. Untagged frames continue to the bond0 as
normal. This case also remains intact,

eth2 --> bond0 --> vlan.228

Here the skb is VLAN tagged but the vlan lookup fails on eth2
causing the bonding rx_handler to be called. On the second
pass the vlan lookup is on the bond0 iface and completes as
expected.

Putting a VLAN.228 on both the bond0 and eth2 device will
result in eth2.228 receiving the skb. I don't think this is
completely unexpected and was the result prior to the rx_handler
result.

Note, the same setup is also used for other storage traffic that
MPIO is used with eg. iSCSI and similar setups can be contrived
without storage protocols.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Tested-by: Hans Schillstrom <hams.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-18 23:46:46 -04:00