Commit Graph

442 Commits

Author SHA1 Message Date
Eric Dumazet
449f454426 macvlan: remove one synchronize_rcu() call
When one macvlan device is dismantled, we can avoid one
synchronize_rcu() call done after deletion from hash list, since caller
will perform a synchronize_net() call after its ndo_stop() call.

Add a new netdev->dismantle field to signal this dismantle intent.

Reduces RTNL hold time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-20 00:33:18 -04:00
Joe Perches
bdc220da32 netdevice.h: Align struct net_device members
Save a bit of space.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 14:07:23 -04:00
Michał Mirosław
afe12cc86b net: introduce netdev_change_features()
It will be needed by bonding and other drivers changing vlan_features
after ndo_init callback.

As a bonus, this includes kernel-doc for netdev_update_features().

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 18:40:56 -04:00
Yi Zou
7f267051bd net: group FCoE related feature flags
Michał Mirosław's patch (http://patchwork.ozlabs.org/patch/94421/) fixes the
issue (http://patchwork.ozlabs.org/patch/94188/) about not populating FCoE related
flags correctly on vlan devices. However, only NETIF_F_FCOE_CRC is part of the
NETIF_F_ALL_TX_OFFLOADS right now, where weed NETIF_F_FCOE_MTU and NETIF_F_FSO
as well.

Therefore, add NETIF_F_ALL_FCOE to indicate feature flags used by FCoE TX offloads.
These include NETIF_F_FCOE_CRC, NETIF_F_FCOE_MTU, and NETIF_F_FSO and add them to
be part of NETIF_F_ALL_TX_OFFLOADS. This would eventually make sure all FCoE needed
flags are populated properly to vlan devices.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 17:54:16 -04:00
Mahesh Bandewar
eed2a12f1e net: Allow ethtool to set interface in loopback mode.
This patch enables ethtool to set the loopback mode on a given interface.
By configuring the interface in loopback mode in conjunction with a policy
route / rule, a userland application can stress the egress / ingress path
exposing the flows of the change in progress and potentially help developer(s)
understand the impact of those changes without even sending a packet out
on the network.

Following set of commands illustrates one such example -
    a) ip -4 addr add 192.168.1.1/24 dev eth1
    b) ip -4 rule add from all iif eth1 lookup 250
    c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
    d) arp -Ds 192.168.1.100 eth1
    e) arp -Ds 192.168.1.200 eth1
    f) sysctl -w net.ipv4.ip_nonlocal_bind=1
    g) sysctl -w net.ipv4.conf.all.accept_local=1
    # Assuming that the machine has 8 cores
    h) taskset 000f netserver -L 192.168.1.200
    i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08 15:59:12 -07:00
David Decotigny
8ae6daca85 ethtool: Call ethtool's get/set_settings callbacks with cleaned data
This makes sure that when a driver calls the ethtool's
get/set_settings() callback of another driver, the data passed to it
is clean. This guarantees that speed_hi will be zeroed correctly if
the called callback doesn't explicitely set it: we are sure we don't
get a corrupted speed from the underlying driver. We also take care of
setting the cmd field appropriately (ETHTOOL_GSET/SSET).

This applies to dev_ethtool_get_settings(), which now makes sure it
sets up that ethtool command parameter correctly before passing it to
drivers. This also means that whoever calls dev_ethtool_get_settings()
does not have to clean the ethtool command parameter. This function
also becomes an exported symbol instead of an inline.

All drivers visible to make allyesconfig under x86_64 have been
updated.

Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-29 14:01:30 -07:00
Michał Mirosław
fa2bd7ff92 net: allow user to change NETIF_F_HIGHDMA
NETIF_F_HIGHDMA is like any other TX offloads, so allow user to toggle it.
This is needed later for bridge and bonding convertsion to hw_features.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28 13:33:08 -07:00
Michał Mirosław
1742f183fc net: fix netdev_increment_features()
Simplify and fix netdev_increment_features() to conform to what is
stated in netdevice.h comments about NETIF_F_ONE_FOR_ALL.
Include FCoE segmentation and VLAN-challedged flags in computation.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-28 13:33:07 -07:00
Eric Dumazet
0a14842f5a net: filter: Just In Time compiler for x86-64
In order to speedup packet filtering, here is an implementation of a
JIT compiler for x86_64

It is disabled by default, and must be enabled by the admin.

echo 1 >/proc/sys/net/core/bpf_jit_enable

It uses module_alloc() and module_free() to get memory in the 2GB text
kernel range since we call helpers functions from the generated code.

EAX : BPF A accumulator
EBX : BPF X accumulator
RDI : pointer to skb   (first argument given to JIT function)
RBP : frame pointer (even if CONFIG_FRAME_POINTER=n)
r9d : skb->len - skb->data_len (headlen)
r8  : skb->data

To get a trace of generated code, use :

echo 2 >/proc/sys/net/core/bpf_jit_enable

Example of generated code :

# tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24

flen=18 proglen=147 pass=3 image=ffffffffa00b5000
JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60
JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00
JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00
JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be
JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0
JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00
JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00
JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24
JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31
JIT code: ffffffffa00b5090: c0 c9 c3

BPF program is 144 bytes long, so native program is almost same size ;)

(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 8
(002) ld       [26]
(003) and      #0xffffff00
(004) jeq      #0xc0a81400      jt 16   jf 5
(005) ld       [30]
(006) and      #0xffffff00
(007) jeq      #0xc0a81400      jt 16   jf 17
(008) jeq      #0x806           jt 10   jf 9
(009) jeq      #0x8035          jt 10   jf 17
(010) ld       [28]
(011) and      #0xffffff00
(012) jeq      #0xc0a81400      jt 16   jf 13
(013) ld       [38]
(014) and      #0xffffff00
(015) jeq      #0xc0a81400      jt 16   jf 17
(016) ret      #65535
(017) ret      #0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-27 23:05:08 -07:00
Michał Mirosław
1aac626716 net: vlan_features comment clarification
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-12 14:54:24 -07:00
Tom Herbert
c6e1a0d12c net: Allow no-cache copy from user on transmit
This patch uses __copy_from_user_nocache on transmit to bypass data
cache for a performance improvement.  skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature, initial support is in tcp_sendmsg.  This functionality is
configurable per device using ethtool.

Presumably, this feature would only be useful when the driver does
not touch the data.  The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate no checksum
is necessary.  For the former case copy-checksum is probably done
anyway, in the latter case the device is likely loopback in which case
the no cache copy is probably not beneficial.

This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply.  Platform is 16 core AMD x86.

No-cache copy disabled:
   672703 tps, 97.13% utilization
   50/90/99% latency:244.31 484.205 1028.41

No-cache copy enabled:
   702113 tps, 96.16% utilization,
   50/90/99% latency 238.56 467.56 956.955

Using 14000 byte request and response sizes demonstrate the
effects more dramatically:

No-cache copy disabled:
   79571 tps, 34.34 %utlization
   50/90/95% latency 1584.46 2319.59 5001.76

No-cache copy enabled:
   83856 tps, 34.81% utilization
   50/90/95% latency 2508.42 2622.62 2735.88

Note especially the effect on latency tail (95th percentile).

This seems to provide a nice performance improvement and is
consistent in the tests I ran.  Presumably, this would provide
the greatest benfits in the presence of an application workload
stressing the cache and a lot of transmit data happening.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-04 22:30:30 -07:00
David S. Miller
083dd8b8aa Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-04-04 10:39:12 -07:00
Michał Mirosław
6cb6a27c45 net: Call netdev_features_change() from netdev_update_features()
Issue FEAT_CHANGE notification when features are changed by
netdev_update_features().  This will allow changes made by extra constraints
on e.g. MTU change to be properly propagated like changes via ethtool.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-02 22:48:47 -07:00
Michał Mirosław
4dd5ffe4fc net: Fix dev dev_ethtool_get_rx_csum() for forced NETIF_F_RXCSUM
dev_ethtool_get_rx_csum() won't report rx checksumming when it's not
changeable and driver is converted to hw_features and friends. Fix this.

(dev->hw_features & NETIF_F_RXCSUM) check is dropped - if the
ethtool_ops->get_rx_csum is set, then driver is not coverted, yet.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-01 20:23:25 -07:00
Jiri Pirko
8a4eb5734e net: introduce rx_handler results and logic around that
This patch allows rx_handlers to better signalize what to do next to
it's caller. That makes skb->deliver_no_wcard no longer needed.

kernel-doc for rx_handler_result is taken from Nicolas' patch.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-16 12:53:54 -07:00
Yi Zou
e9bce845c0 net: add proper documentation for previously added net_device_ops for FCoE
Add proper documentation for previously added net_device_ops ops for FCoE.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-11 01:17:50 -08:00
David S. Miller
33175d84ee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_cmn.c
2011-03-10 14:26:00 -08:00
Vasiliy Kulikov
8909c9ad8f net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules
Since a8f80e8ff9 any process with
CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't
allow anybody load any module not related to networking.

This patch restricts an ability of autoloading modules to netdev modules
with explicit aliases.  This fixes CVE-2011-1019.

Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE to maintain the compatibility with network scripts
that use autoloading netdev modules by aliases like "eth0", "wlan0".

Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	fffffff800001000
    CapEff:	fffffff800001000
    CapBnd:	fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0      Link encap:IPv6-in-IPv4
	      NOARP  MTU:1480  Metric:1

    root@albatros:~# lsmod | grep sit
    sit                    10457  0
    tunnel4                 2957  1 sit

For CAP_SYS_MODULE module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh:	0000000000000000
    CapPrm:	ffffffffffffffff
    CapEff:	ffffffffffffffff
    CapBnd:	ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs                   745319  0

Reference: https://lkml.org/lkml/2011/2/24/203

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2011-03-10 10:25:19 +11:00
Yi Zou
6247e08618 net: add ndo_fcoe_ddp_target() to support FCoE DDP in target mode
The Fiber Channel over Ethernet (FCoE) Direct Data Placement (DDP) can also be
used for FCoE target, where the DDP used for read I/O on an initiator can be
used on an FCoE target to speed up the write I/O to the target from the initiator.
The added ndo_fcoe_ddp_target() works in the similar way as the existing
ndo_fcoe_ddp_setup() to allow the underlying hardware set up the DDP context
accordingly when it gets called from the FCoE target implementation on top
the existing Open-FCoE fcoe/libfc protocol stack so without losing the ability
to provide DDP for read I/O as an initiator, it can also provide DDP offload
to the write I/O coming from the initiator as a target.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-03-07 18:00:17 -08:00
Joe Perches
256ee435b9 netdevice: Convert printk to pr_info in netif_tx_stop_queue
This allows any caller to be prefaced by any specific
pr_fmt to better identify which device driver is using
this function inappropriately.

Add terminating newline.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-07 12:34:01 -08:00
Vlad Dogaru
23b41168fc netdevice: make initial group visible to userspace
INIT_NETDEV_GROUP is needed by userspace, move it outside __KERNEL__
guards.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 21:55:52 -08:00
Michał Mirosław
e83d360d9a net: introduce NETIF_F_RXCSUM
Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
offload. Integrate it with ndo_fix_features.

ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:35 -08:00
Michał Mirosław
5455c6998d net: Introduce new feature setting ops
This introduces a new framework to handle device features setting.
It consists of:
  - new fields in struct net_device:
	+ hw_features - features that hw/driver supports toggling
	+ wanted_features - features that user wants enabled, when possible
  - new netdev_ops:
	+ feat = ndo_fix_features(dev, feat) - API checking constraints for
		enabling features or their combinations
	+ ndo_set_features(dev) - API updating hardware state to match
		changed dev->features
  - new ethtool commands:
	+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
		and trigger device reconfiguration if resulting dev->features
		changed
	+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:33 -08:00
Michał Mirosław
0a41770477 ethtool: factorize get/set_one_feature
This allows to enable GRO even if RX csum is disabled. GRO will not
be used for packets without hardware checksum anyway.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:33 -08:00
Michał Mirosław
212b573f55 ethtool: enable GSO and GRO by default
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:32 -08:00