Commit Graph

54 Commits

Author SHA1 Message Date
Jiri Pirko ab95bfe01f net: replace hooks in __netif_receive_skb V5
What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-02 07:11:15 -07:00
Jiri Pirko f16d3d5748 macvlan: do proper cleanup in macvlan_common_newlink() V2
Fixes possible memory leak.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 18:42:12 -07:00
Eric Dumazet 2d6c9ffcca net: congestion notifications are not dropped packets
vlan/macvlan start_xmit() can inform caller of congestion with
NET_XMIT_CN return value. This doesnt mean packet was dropped.
Increment normal stat counters instead of tx_dropped.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:42:15 -07:00
Jiri Pirko a14462f1bd net: adjust handle_macvlan to pass port struct to hook
Now there's null check here and also again in the hook. Looking at bridge bits
which are simmilar, port structure is rcu_dereferenced right away in
handle_bridge and passed to hook. Looks nicer.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:48:02 -07:00
Jiri Pirko a748ee2426 net: move address list functions to a separate file
+little renaming of unicast functions to be smooth with multicast ones

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:22:11 -07:00
Jiri Pirko 1c01fe14a8 net: forbid underlaying devices to change its type
It's not desired for underlaying devices to change type. At the time,
there is for example possible to have bond with changed type from
Ethernet to Infiniband as a port of a bridge. This patch fixes this.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:00:02 -07:00
Arnd Bergmann fc0663d6b5 macvlan: allow multiple driver backends
This makes it possible to hook into the macvlan driver
from another kernel module. In particular, the goal is
to extend it with the macvtap backend that provides
a tun/tap compatible interface directly on the macvlan
device.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03 20:20:33 -08:00
Arnd Bergmann 8a83a00b07 net: maintain namespace isolation between vlan and real device
In the vlan and macvlan drivers, the start_xmit function forwards
data to the dev_queue_xmit function for another device, which may
potentially belong to a different namespace.

To make sure that classification stays within a single namespace,
this resets the potentially critical fields.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-03 20:20:32 -08:00
Patrick Mullaney 6eb3a85533 macvlan: add GRO bit to features mask
Allow macvlan devices to support GRO.

Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-16 01:05:38 -08:00
Patrick Mullaney fc4a748966 netdevice: provide common routine for macvlan and vlan operstate management
Provide common routine for the transition of operational state for a leaf
device during a root device transition.

Signed-off-by: Patrick Mullaney <pmullaney@novell.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-03 15:59:22 -08:00
David S. Miller 9b963e5d0e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/ieee802154/fakehard.c
	drivers/net/e1000e/ich8lan.c
	drivers/net/e1000e/phy.c
	drivers/net/netxen/netxen_nic_init.c
	drivers/net/wireless/ath/ath9k/main.c
2009-11-29 00:57:15 -08:00
Arnd Bergmann 27c0b1a850 macvlan: export macvlan mode through netlink
In order to support all three modes of macvlan at
runtime, extend the existing netlink protocol
to allow choosing the mode per macvlan slave
interface.

This depends on a matching patch to iproute2
in order to become accessible in user land.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-26 15:53:10 -08:00
Arnd Bergmann 618e1b7482 macvlan: implement bridge, VEPA and private mode
This allows each macvlan slave device to be in one
of three modes, depending on the use case:

MACVLAN_PRIVATE:
  The device never communicates with any other device
  on the same upper_dev. This even includes frames
  coming back from a reflective relay, where supported
  by the adjacent bridge.

MACVLAN_VEPA:
  The new Virtual Ethernet Port Aggregator (VEPA) mode,
  we assume that the adjacent bridge returns all frames
  where both source and destination are local to the
  macvlan port, i.e. the bridge is set up as a reflective
  relay.
  Broadcast frames coming in from the upper_dev get
  flooded to all macvlan interfaces in VEPA mode.
  We never deliver any frames locally.

MACVLAN_BRIDGE:
  We provide the behavior of a simple bridge between
  different macvlan interfaces on the same port. Frames
  from one interface to another one get delivered directly
  and are not sent out externally. Broadcast frames get
  flooded to all other bridge ports and to the external
  interface, but when they come back from a reflective
  relay, we don't deliver them again.
  Since we know all the MAC addresses, the macvlan bridge
  mode does not require learning or STP like the bridge
  module does.

Based on an earlier patch "macvlan: Reflect macvlan packets
meant for other macvlan devices" by Eric Biederman.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-26 15:53:07 -08:00
Arnd Bergmann a1e514c5d0 macvlan: cleanup rx statistics
We have very similar code for rx statistics in
two places in the macvlan driver, with a third
one being added in the next patch.

Consolidate them into one function to improve
overall readability of the driver.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-26 15:53:02 -08:00
Patrick McHardy 8c2acc53fd macvlan: fix gso_max_size setting
gso_max_size must be set based on the value of the underlying device to
support devices not using the full 64k.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-23 14:18:53 -08:00
Eric Dumazet fccaf71011 macvlan: Precise RX stats accounting
With multi queue devices, its possible that several cpus call
macvlan RX routines simultaneously for the same macvlan device.

We update RX stats counter without any locking, so we can
get slightly wrong counters.

One possible fix is to use percpu counters, to get precise
accounting and also get guarantee of no cache line ping pongs
between cpus.

Note: this adds 16 bytes (32 bytes on 64bit arches) of percpu
data per macvlan device.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 23:51:57 -08:00
Patrick McHardy cbbef5e183 vlan/macvlan: propagate transmission state to upper layers
Both vlan and macvlan devices usually don't use a qdisc and immediately
queue packets to the underlying device. Propagate transmission state of
the underlying device to the upper layers so they can react on congestion
and/or inform the sending process.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 14:07:33 -08:00
Eric W. Biederman 81adee47df net: Support specifying the network namespace upon device creation.
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.

We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call.  To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.

In addtion we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-11-08 00:53:51 -08:00
Eric Dumazet 23289a37e2 net: add a list_head parameter to dellink() method
Adding a list_head parameter to rtnl_link_ops->dellink() methods
allow us to queue devices on a list, in order to dismantle
them all at once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-28 02:22:07 -07:00
Eric Dumazet 2c11455321 macvlan: add multiqueue capability
macvlan devices are currently not multi-queue capable.

We can do that defining rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()

This new method gets num_tx_queues/real_num_tx_queues
from lower device.

macvlan_get_tx_queues() is a copy of vlan_get_tx_queues().

Because macvlan_start_xmit() has to update netdev_queue
stats only (and not dev->stats), I chose to change
tx_errors/tx_aborted_errors accounting to tx_dropped,
since netdev_queue structure doesnt define tx_errors /
tx_aborted_errors.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03 20:02:13 -07:00
Eric Dumazet ac06713d55 macvlan: Use compare_ether_addr_64bits()
To speedup ether addresses compares, we can use compare_ether_addr_64bits()
(all operands are guaranteed to be at least 8 bytes long)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:25 -07:00
Stephen Hemminger 424efe9caf netdev: convert pseudo drivers to netdev_tx_t
These are all drivers that don't touch real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 01:13:40 -07:00
sg.tweak@gmail.com ef5c89967d drivers/net/macvlan.c: fix cloning of tagged VLAN interfaces
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13348

akpm: the reporter disappeared, so I typed it in again.

It is not possible to make clone of tagged VLAN interface to be used as
mac-based vlan interfave.

How reproducible:
Use any 802.1q tagged vlan interface, e.g. eth2.700 and clone it:

  ip link add link eth2.700 address 00:04:75:cb:38:09 macvlan0 type macvlan
  ip link set dev macvlan0 up
  ip addr add 10.195.1.1/24 dev macvlan0

So far, so good. Now try to ping anything via macvlan0:

  ping 10.195.1.2

Actual results:
For every attempted packet tx kernel writes to console:

------------[ cut here ]------------
WARNING: at net/8021q/vlan_dev.c:254 vlan_dev_hard_header+0x36/0x126 [8021q]()
Hardware name: M22ES
Modules linked in: arptable_filter arp_tables bridge veth macvlan arc4 ecb
ppp_mppe ppp_async crc_ccitt ppp_generic slhc autofs4 sunrpc 8021q garp stp
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
x_tables dm_mirror dm_region_hash dm_log dm_multipath dm_mod sbs sbshc lp
floppy snd_intel8x0 joydev snd_seq_dummy snd_intel8x0m snd_ac97_codec
ide_cd_mod ac97_bus snd_seq_oss cdrom snd_seq_midi_event serio_raw snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss parport_pc snd_pcm parport battery
8139cp snd_timer i2c_sis96x ac button snd rtc_cmos rtc_core 8139too soundcore
rtc_lib mii i2c_core pcspkr snd_page_alloc pata_sis libata sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: ip_tables]
Pid: 0, comm: swapper Tainted: G        W  2.6.29.3 #1
Call Trace:
 [<c0425f48>] warn_slowpath+0x60/0x9f
 [<c0425f6f>] warn_slowpath+0x87/0x9f
 [<dffb850d>] vlan_dev_hard_header+0x0/0x126 [8021q]
 [<dffb8543>] vlan_dev_hard_header+0x36/0x126 [8021q]
 [<dffb850d>] vlan_dev_hard_header+0x0/0x126 [8021q]
 [<df83155d>] macvlan_hard_header+0x3c/0x47 [macvlan]
 [<df831521>] macvlan_hard_header+0x0/0x47 [macvlan]
 [<c062bf3f>] arp_create+0xef/0x1ff
 [<c062c08c>] arp_send+0x3d/0x54
 [<c062c916>] arp_solicit+0x16c/0x177
 [<c05fadd2>] neigh_timer_handler+0x227/0x269
 [<c05fabab>] neigh_timer_handler+0x0/0x269
 [<c042ce4d>] run_timer_softirq+0xf0/0x141
 [<c0429e5a>] __do_softirq+0x76/0xf8
 [<c0429de4>] __do_softirq+0x0/0xf8
 <IRQ>  [<c044fb67>] handle_level_irq+0x0/0xad
 [<c0429db7>] irq_exit+0x35/0x62
 [<c04046bb>] do_IRQ+0xdf/0xf4
 [<c04035a7>] common_interrupt+0x27/0x2c
 [<c04079c5>] default_idle+0x2a/0x3d
 [<c0401bb6>] cpu_idle+0x57/0x70

Macvlan driver always uses standard ethernet header length for all types
of interface to which it is linked.  This patch fixes this problem.

Reported-by: <sg.tweak@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-11 02:32:39 -07:00
Jiri Pirko ccffad25b5 net: convert unicast addr list
This patch converts unicast address list to standard list_head using
previously introduced struct netdev_hw_addr. It also relaxes the
locking. Original spinlock (still used for multicast addresses) is not
needed and is no longer used for a protection of this list. All
reading and writing takes place under rtnl (with no changes).

I also removed a possibility to specify the length of the address
while adding or deleting unicast address. It's always dev->addr_len.

The convertion touched especially e1000 and ixgbe codes when the
change is not so trivial.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 drivers/net/bnx2.c               |   13 +--
 drivers/net/e1000/e1000_main.c   |   24 +++--
 drivers/net/ixgbe/ixgbe_common.c |   14 ++--
 drivers/net/ixgbe/ixgbe_common.h |    4 +-
 drivers/net/ixgbe/ixgbe_main.c   |    6 +-
 drivers/net/ixgbe/ixgbe_type.h   |    4 +-
 drivers/net/macvlan.c            |   11 +-
 drivers/net/mv643xx_eth.c        |   11 +-
 drivers/net/niu.c                |    7 +-
 drivers/net/virtio_net.c         |    7 +-
 drivers/s390/net/qeth_l2_main.c  |    6 +-
 drivers/scsi/fcoe/fcoe.c         |   16 ++--
 include/linux/netdevice.h        |   18 ++--
 net/8021q/vlan.c                 |    4 +-
 net/8021q/vlan_dev.c             |   10 +-
 net/core/dev.c                   |  195 +++++++++++++++++++++++++++-----------
 net/dsa/slave.c                  |   10 +-
 net/packet/af_packet.c           |    4 +-
 18 files changed, 227 insertions(+), 137 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-29 22:12:32 -07:00
Eric Dumazet 93f154b594 net: release dst entry in dev_hard_start_xmit()
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).

CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.

It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.

David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag is :

- loopback device, because it calls netif_rx() and quoting Patrick :
    "ip_route_input() doesn't accept loopback addresses, so loopback packets
     already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function

- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:19:19 -07:00