Commit Graph

43192 Commits

Author SHA1 Message Date
sjur.brandeland@stericsson.com bd30ce4bc0 caif: Use RCU instead of spin-lock in caif_dev.c
RCU read_lock and refcount is used to protect in-flight packets.

Use RCU and counters to manage freeing lower part of the CAIF stack if
CAIF-link layer is removed. Old solution based on delaying removal of
device is removed.

When CAIF link layer goes down the use of CAIF link layer is disabled
(by calling caif_set_phy_state()), but removal and freeing of the
lower part of the CAIF stack is done when Link layer is unregistered.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:54 -04:00
sjur.brandeland@stericsson.com 0b1e9738de caif: Use rcu_read_lock in CAIF mux layer.
Replace spin_lock with rcu_read_lock when accessing lists to layers
and cache. While packets are in flight rcu_read_lock should not be held,
instead ref-counters are used in combination with RCU.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:54 -04:00
David S. Miller 8e36360ae8 ipv4: Remove route key identity dependencies in ip_rt_get_source().
Pass in the sk_buff so that we can fetch the necessary keys from
the packet header when working with input routes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 17:29:41 -04:00
Vasiliy Kulikov c319b4d76b net: ipv4: add IPPROTO_ICMP socket kind
This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges.  In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).

Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/

A new ping socket is created with

    socket(PF_INET, SOCK_DGRAM, PROT_ICMP)

Message identifiers (octets 4-5 of ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes, port 0 is reserved for API ("let the
kernel pick a free number"). There is no notion of remote ports, remote
port numbers provided by the user (e.g. in connect()) are ignored.

Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport headers values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.

ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, the
checksum is always recomputed.

ICMP reply packets received from the network are demultiplexed according
to their id's, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).

socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets.  Setting it to "100
100" would grant permissions to the single group (to either make
/sbin/ping g+s and owned by this group or to grant permissions to the
"netadmins" group), "0 4294967295" would enable it for the world, "100
4294967295" would enable it for the users, but not daemons.

The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).

Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping

For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/

Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.

All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with
the patch.

PATCH v3:
    - switched to flowi4.
    - minor changes to be consistent with raw sockets code.

PATCH v2:
    - changed ping_debug() to pr_debug().
    - removed CONFIG_IP_PING.
    - removed ping_seq_fops.owner field (unused for procfs).
    - switched to proc_net_fops_create().
    - switched to %pK in seq_printf().

PATCH v1:
    - fixed checksumming bug.
    - CAP_NET_RAW may not create icmp sockets anymore.

RFC v2:
    - minor cleanups.
    - introduced sysctl'able group range to restrict socket(2).

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 16:08:13 -04:00
Anirban Chakraborty 29dd54b72b ethtool: Added support for FW dump
Added code to take FW dump via ethtool. Dump level can be controlled via setting the
dump flag. A get function is provided to query the current setting of the dump flag.
Dump data is obtained from the driver via a separate get function.

Changes from v3:
Fixed buffer length issue in ethtool_get_dump_data function.
Updated kernel doc for ethtool_dump struct and get_dump_flag function.

Changes from v2:
Provided separate commands for get flag and data.
Check for minimum of the two buffer length obtained via ethtool and driver and
use that for dump buffer
Pass up the driver return error codes up to the caller.
Added kernel doc comments.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 14:37:28 -04:00
Joe Perches bdc220da32 netdevice.h: Align struct net_device members
Save a bit of space.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 14:07:23 -04:00
Michał Mirosław afe12cc86b net: introduce netdev_change_features()
It will be needed by bonding and other drivers changing vlan_features
after ndo_init callback.

As a bonus, this includes kernel-doc for netdev_update_features().

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 18:40:56 -04:00
Julian Anastasov c92f5ca2e5 ipvs: Remove all remaining references to rt->rt_{src,dst}
Remove all remaining references to rt->rt_{src,dst}
by using dest->dst_saddr to cache saddr (used for TUN mode).
For ICMP in FORWARD hook just restrict the rt_mode for NAT
to disable LOCALNODE. All other modes do not allow
IP_VS_RT_MODE_RDR, so we should be safe with the ICMP
forwarding. Using cp->daddr as replacement for rt_dst
is safe for all modes except BYPASS, even when cp->dest is
NULL because it is cp->daddr that is used to assign cp->dest
for sync-ed connections.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 18:24:46 -04:00
Yi Zou 7f267051bd net: group FCoE related feature flags
Michał Mirosław's patch (http://patchwork.ozlabs.org/patch/94421/) fixes the
issue (http://patchwork.ozlabs.org/patch/94188/) about not populating FCoE related
flags correctly on vlan devices. However, only NETIF_F_FCOE_CRC is part of the
NETIF_F_ALL_TX_OFFLOADS right now, where weed NETIF_F_FCOE_MTU and NETIF_F_FSO
as well.

Therefore, add NETIF_F_ALL_FCOE to indicate feature flags used by FCoE TX offloads.
These include NETIF_F_FCOE_CRC, NETIF_F_FCOE_MTU, and NETIF_F_FSO and add them to
be part of NETIF_F_ALL_TX_OFFLOADS. This would eventually make sure all FCoE needed
flags are populated properly to vlan devices.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 17:54:16 -04:00
Eric Dumazet f607a15800 garp: remove last synchronize_rcu() call
When removing last vlan from a device, garp_uninit_applicant() calls
synchronize_rcu() to make sure no user can still manipulate struct
garp_applicant before we free it.

Use call_rcu() instead, as a step to further net_device dismantle
optimizations.

Add the temporary garp_cleanup_module() function to make sure no pending
call_rcu() are left at module unload time [ this will be removed when
kfree_rcu() is available ]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 17:46:56 -04:00
David S. Miller 3c709f8fb4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6
Conflicts:
	drivers/net/benet/be_main.c
2011-05-11 14:26:58 -04:00
David S. Miller 0074820978 Merge branch 'tipc-May10-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6 2011-05-11 12:41:28 -04:00
David S. Miller 9bbc052d5e Merge branch 'pablo/nf-2.6-updates' of git://1984.lsi.us.es/net-2.6 2011-05-10 15:04:35 -07:00
Steffen Klassert 43a4dea4c9 xfrm: Assign the inner mode output function to the dst entry
As it is, we assign the outer modes output function to the dst entry
when we create the xfrm bundle. This leads to two problems on interfamily
scenarios. We might insert ipv4 packets into ip6_fragment when called
from xfrm6_output. The system crashes if we try to fragment an ipv4
packet with ip6_fragment. This issue was introduced with git commit
ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
as needed). The second issue is, that we might insert ipv4 packets in
netfilter6 and vice versa on interfamily scenarios.

With this patch we assign the inner mode output function to the dst entry
when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
mode is used and the right fragmentation and netfilter functions are called.
We switch then to outer mode with the output_finish functions.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10 15:03:34 -07:00
David S. Miller 0a5ebb8000 ipv4: Pass explicit daddr arg to ip_send_reply().
This eliminates an access to rt->rt_src.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10 13:32:46 -07:00
Allan Stephens c29c3f70c9 tipc: Abort excessive send requests as early as possible
Adds checks to TIPC's socket send routines to promptly detect and
abort attempts to send more than 66,000 bytes in a single TIPC
message or more than 2**31-1 bytes in a single TIPC byte stream request.
In addition, this ensures that the number of iovecs in a send request
does not exceed the limits of a standard integer variable.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-05-10 16:03:56 -04:00
Hans Schillstrom 7a4f0761fc IPVS: init and cleanup restructuring
DESCRIPTION
This patch tries to restore the initial init and cleanup
sequences that was before namspace patch.
Netns also requires action when net devices unregister
which has never been implemented. I.e this patch also
covers when a device moves into a network namespace,
and has to be released.

IMPLEMENTATION
The number of calls to register_pernet_device have been
reduced to one for the ip_vs.ko
Schedulers still have their own calls.

This patch adds a function __ip_vs_service_cleanup()
and an enable flag for the netfilter hooks.

The nf hooks will be enabled when the first service is loaded
and never disabled again, except when a namespace exit starts.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
[horms@verge.net.au: minor edit to changelog]
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-05-10 09:52:47 +02:00
Alexey Dobriyan 4940fc889e net: add mac_pton() for parsing MAC address
mac_pton() parses MAC address in form XX:XX:XX:XX:XX:XX and only in that form.

mac_pton() doesn't dirty result until it's sure string representation is valid.

mac_pton() doesn't care about characters _after_ last octet,
it's up to caller to deal with it.

mac_pton() diverges from 0/-E return value convention.
Target usage:

	if (!mac_pton(str, whatever->mac))
		return -EINVAL;
	/* ->mac being u8 [ETH_ALEN] is filled at this point. */
	/* optionally check str[3 * ETH_ALEN - 1] for termination */

Use mac_pton() in pktgen and netconsole for start.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09 12:10:49 -07:00
Eric Dumazet 48752e1b18 vlan: remove one synchronize_net() call
At VLAN dismantle phase, unregister_vlan_dev() makes one
synchronize_net() call after vlan_group_set_device(grp, vlan_id, NULL).

This call can be safely removed because we are calling
unregister_netdevice_queue() to queue device for deletion, and this
process needs at least one rcu grace period to complete.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09 11:41:41 -07:00
Eric Dumazet da37e36876 garp: remove one synchronize_rcu() call
Speedup vlan dismantling in CONFIG_VLAN_8021Q_GVRP=y cases,
by using a call_rcu() to free the memory instead of waiting with
expensive synchronize_rcu() [ while RTNL is held ]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-09 11:41:40 -07:00
David S. Miller f5fca60865 ipv4: Pass flow key down into ip_append_*().
This way rt->rt_dst accesses are unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08 21:24:07 -07:00
David S. Miller 77968b7824 ipv4: Pass flow keys down into datagram packet building engine.
This way ip_output.c no longer needs rt->rt_{src,dst}.

We already have these keys sitting, ready and waiting, on the stack or
in a socket structure.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08 21:24:06 -07:00
Mahesh Bandewar eed2a12f1e net: Allow ethtool to set interface in loopback mode.
This patch enables ethtool to set the loopback mode on a given interface.
By configuring the interface in loopback mode in conjunction with a policy
route / rule, a userland application can stress the egress / ingress path
exposing the flows of the change in progress and potentially help developer(s)
understand the impact of those changes without even sending a packet out
on the network.

Following set of commands illustrates one such example -
    a) ip -4 addr add 192.168.1.1/24 dev eth1
    b) ip -4 rule add from all iif eth1 lookup 250
    c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
    d) arp -Ds 192.168.1.100 eth1
    e) arp -Ds 192.168.1.200 eth1
    f) sysctl -w net.ipv4.ip_nonlocal_bind=1
    g) sysctl -w net.ipv4.conf.all.accept_local=1
    # Assuming that the machine has 8 cores
    h) taskset 000f netserver -L 192.168.1.200
    i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08 15:59:12 -07:00
Yaniv Rosner 57cc71bc3c ethtool: Add 20G bit definitions
Add 20G supported and advertising bit definitions.
20G will be supported with the 57840 chips.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
------
 include/linux/ethtool.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08 15:42:43 -07:00
David S. Miller d9d8da805d inet: Pass flowi to ->queue_xmit().
This allows us to acquire the exact route keying information from the
protocol, however that might be managed.

It handles all of the possibilities, from the simplest case of storing
the key in inet->cork.fl to the more complex setup SCTP has where
individual transports determine the flow.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-08 15:28:28 -07:00