When one macvlan device is dismantled, we can avoid one
synchronize_rcu() call done after deletion from hash list, since caller
will perform a synchronize_net() call after its ndo_stop() call.
Add a new netdev->dismantle field to signal this dismantle intent.
Reduces RTNL hold time.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will be needed by bonding and other drivers changing vlan_features
after ndo_init callback.
As a bonus, this includes kernel-doc for netdev_update_features().
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław's patch (http://patchwork.ozlabs.org/patch/94421/) fixes the
issue (http://patchwork.ozlabs.org/patch/94188/) about not populating FCoE related
flags correctly on vlan devices. However, only NETIF_F_FCOE_CRC is part of the
NETIF_F_ALL_TX_OFFLOADS right now, where weed NETIF_F_FCOE_MTU and NETIF_F_FSO
as well.
Therefore, add NETIF_F_ALL_FCOE to indicate feature flags used by FCoE TX offloads.
These include NETIF_F_FCOE_CRC, NETIF_F_FCOE_MTU, and NETIF_F_FSO and add them to
be part of NETIF_F_ALL_TX_OFFLOADS. This would eventually make sure all FCoE needed
flags are populated properly to vlan devices.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables ethtool to set the loopback mode on a given interface.
By configuring the interface in loopback mode in conjunction with a policy
route / rule, a userland application can stress the egress / ingress path
exposing the flows of the change in progress and potentially help developer(s)
understand the impact of those changes without even sending a packet out
on the network.
Following set of commands illustrates one such example -
a) ip -4 addr add 192.168.1.1/24 dev eth1
b) ip -4 rule add from all iif eth1 lookup 250
c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
d) arp -Ds 192.168.1.100 eth1
e) arp -Ds 192.168.1.200 eth1
f) sysctl -w net.ipv4.ip_nonlocal_bind=1
g) sysctl -w net.ipv4.conf.all.accept_local=1
# Assuming that the machine has 8 cores
h) taskset 000f netserver -L 192.168.1.200
i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes sure that when a driver calls the ethtool's
get/set_settings() callback of another driver, the data passed to it
is clean. This guarantees that speed_hi will be zeroed correctly if
the called callback doesn't explicitely set it: we are sure we don't
get a corrupted speed from the underlying driver. We also take care of
setting the cmd field appropriately (ETHTOOL_GSET/SSET).
This applies to dev_ethtool_get_settings(), which now makes sure it
sets up that ethtool command parameter correctly before passing it to
drivers. This also means that whoever calls dev_ethtool_get_settings()
does not have to clean the ethtool command parameter. This function
also becomes an exported symbol instead of an inline.
All drivers visible to make allyesconfig under x86_64 have been
updated.
Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NETIF_F_HIGHDMA is like any other TX offloads, so allow user to toggle it.
This is needed later for bridge and bonding convertsion to hw_features.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify and fix netdev_increment_features() to conform to what is
stated in netdevice.h comments about NETIF_F_ONE_FOR_ALL.
Include FCoE segmentation and VLAN-challedged flags in computation.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses __copy_from_user_nocache on transmit to bypass data
cache for a performance improvement. skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature, initial support is in tcp_sendmsg. This functionality is
configurable per device using ethtool.
Presumably, this feature would only be useful when the driver does
not touch the data. The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate no checksum
is necessary. For the former case copy-checksum is probably done
anyway, in the latter case the device is likely loopback in which case
the no cache copy is probably not beneficial.
This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply. Platform is 16 core AMD x86.
No-cache copy disabled:
672703 tps, 97.13% utilization
50/90/99% latency:244.31 484.205 1028.41
No-cache copy enabled:
702113 tps, 96.16% utilization,
50/90/99% latency 238.56 467.56 956.955
Using 14000 byte request and response sizes demonstrate the
effects more dramatically:
No-cache copy disabled:
79571 tps, 34.34 %utlization
50/90/95% latency 1584.46 2319.59 5001.76
No-cache copy enabled:
83856 tps, 34.81% utilization
50/90/95% latency 2508.42 2622.62 2735.88
Note especially the effect on latency tail (95th percentile).
This seems to provide a nice performance improvement and is
consistent in the tests I ran. Presumably, this would provide
the greatest benfits in the presence of an application workload
stressing the cache and a lot of transmit data happening.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Issue FEAT_CHANGE notification when features are changed by
netdev_update_features(). This will allow changes made by extra constraints
on e.g. MTU change to be properly propagated like changes via ethtool.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_ethtool_get_rx_csum() won't report rx checksumming when it's not
changeable and driver is converted to hw_features and friends. Fix this.
(dev->hw_features & NETIF_F_RXCSUM) check is dropped - if the
ethtool_ops->get_rx_csum is set, then driver is not coverted, yet.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows rx_handlers to better signalize what to do next to
it's caller. That makes skb->deliver_no_wcard no longer needed.
kernel-doc for rx_handler_result is taken from Nicolas' patch.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add proper documentation for previously added net_device_ops ops for FCoE.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since a8f80e8ff9 any process with
CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean
that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't
allow anybody load any module not related to networking.
This patch restricts an ability of autoloading modules to netdev modules
with explicit aliases. This fixes CVE-2011-1019.
Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
of loading netdev modules by name (without any prefix) for processes
with CAP_SYS_MODULE to maintain the compatibility with network scripts
that use autoloading netdev modules by aliases like "eth0", "wlan0".
Currently there are only three users of the feature in the upstream
kernel: ipip, ip_gre and sit.
root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
root@albatros:~# grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: fffffff800001000
CapEff: fffffff800001000
CapBnd: fffffff800001000
root@albatros:~# modprobe xfs
FATAL: Error inserting xfs
(/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
root@albatros:~# lsmod | grep xfs
root@albatros:~# ifconfig xfs
xfs: error fetching interface information: Device not found
root@albatros:~# lsmod | grep xfs
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit
sit: error fetching interface information: Device not found
root@albatros:~# lsmod | grep sit
root@albatros:~# ifconfig sit0
sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
root@albatros:~# lsmod | grep sit
sit 10457 0
tunnel4 2957 1 sit
For CAP_SYS_MODULE module loading is still relaxed:
root@albatros:~# grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
root@albatros:~# ifconfig xfs
xfs: error fetching interface information: Device not found
root@albatros:~# lsmod | grep xfs
xfs 745319 0
Reference: https://lkml.org/lkml/2011/2/24/203
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
The Fiber Channel over Ethernet (FCoE) Direct Data Placement (DDP) can also be
used for FCoE target, where the DDP used for read I/O on an initiator can be
used on an FCoE target to speed up the write I/O to the target from the initiator.
The added ndo_fcoe_ddp_target() works in the similar way as the existing
ndo_fcoe_ddp_setup() to allow the underlying hardware set up the DDP context
accordingly when it gets called from the FCoE target implementation on top
the existing Open-FCoE fcoe/libfc protocol stack so without losing the ability
to provide DDP for read I/O as an initiator, it can also provide DDP offload
to the write I/O coming from the initiator as a target.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This allows any caller to be prefaced by any specific
pr_fmt to better identify which device driver is using
this function inappropriately.
Add terminating newline.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
INIT_NETDEV_GROUP is needed by userspace, move it outside __KERNEL__
guards.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
offload. Integrate it with ndo_fix_features.
ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces a new framework to handle device features setting.
It consists of:
- new fields in struct net_device:
+ hw_features - features that hw/driver supports toggling
+ wanted_features - features that user wants enabled, when possible
- new netdev_ops:
+ feat = ndo_fix_features(dev, feat) - API checking constraints for
enabling features or their combinations
+ ndo_set_features(dev) - API updating hardware state to match
changed dev->features
- new ethtool commands:
+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
and trigger device reconfiguration if resulting dev->features
changed
+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to enable GRO even if RX csum is disabled. GRO will not
be used for packets without hardware checksum anyway.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>