Commit Graph

19231 Commits

Author SHA1 Message Date
KOSAKI Motohiro 2142c131a3 net: convert to new cpumask API
We plan to remove cpu_xx() old api later. Thus this patch
convert it.

This patch has no functional change.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 11:53:49 -07:00
Eric Dumazet 5173cc0577 ipv4: more compliant RFC 3168 support
Commit 6623e3b24a (ipv4: IP defragmentation must be ECN aware) was an
attempt to not lose "Congestion Experienced" (CE) indications when
performing datagram defragmentation.

Stefanos Harhalakis raised the point that RFC 3168 requirements were not
completely met by this commit.

In particular, we MUST detect invalid combinations and eventually drop
illegal frames.

Reported-by: Stefanos Harhalakis <v13@v13.gr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 14:49:14 -04:00
David S. Miller c5be24ff62 ipv4: Trivial rt->rt_src conversions in net/ipv4/route.c
At these points we have a fully filled in value via the IP
header the form of ip_hdr(skb)->saddr

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 13:49:05 -04:00
Eric Dumazet 1a8218e962 net: ping: dont call udp_ioctl()
udp_ioctl() really handles UDP and UDPLite protocols.

1) It can increment UDP_MIB_INERRORS in case first_packet_length() finds
a frame with bad checksum.

2) It has a dependency on sizeof(struct udphdr), not applicable to
ICMP/PING

If ping sockets need to handle SIOCINQ/SIOCOUTQ ioctl, this should be
done differently.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16 11:49:39 -04:00
sjur.brandeland@stericsson.com 3f874adc4a caif: remove unesesarry exports
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:56 -04:00
sjur.brandeland@stericsson.com 33b2f5598b caif: Bugfix debugfs directory name must be unique.
Race condition caused debugfs_create_dir() to fail due to duplicate
name. Use atomic counter to create unique directory name.

net_ratelimit() is introduced to limit debug printouts.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:56 -04:00
sjur.brandeland@stericsson.com c85c2951d4 caif: Handle dev_queue_xmit errors.
Do proper handling of dev_queue_xmit errors in order to
avoid double free of skb and leaks in error conditions.
In cfctrl pending requests are removed when CAIF Link layer goes down.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:56 -04:00
sjur.brandeland@stericsson.com bee925db9a caif: prepare support for namespaces
Use struct net to reference CAIF configuration object instead of static variables.
Refactor functions caif_connect_client, caif_disconnect_client and squach
files cfcnfg.c and caif_config_utils.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:55 -04:00
sjur.brandeland@stericsson.com b3ccfbe409 caif: Protected in-flight packets using dev or sock refcont.
CAIF Socket Layer and ip-interface registers reference counters
in CAIF service layer. The functions sock_hold, sock_put and
dev_hold, dev_put are used by CAIF Stack to protect from freeing
memory while packets are in-flight.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:55 -04:00
sjur.brandeland@stericsson.com 43e3692101 caif: Move refcount from service layer to sock and dev.
Instead of having reference counts in caif service layers,
we hook into existing refcount handling in socket layer and netdevice.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:55 -04:00
sjur.brandeland@stericsson.com cb3cb423a0 caif: Add ref-count to framing layer
Introduce Per-cpu reference for lower part of CAIF Stack.
Before freeing payload is disabled, synchronize_rcu() is called,
and then ref-count verified to be zero.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:55 -04:00
sjur.brandeland@stericsson.com f362144084 caif: Use RCU and lists in cfcnfg.c for managing caif link layers
RCU lists are used for handling the link layers instead of array.
When generating CAIF phy-id, ifindex is used as base. Legal range is 1-6.
Introduced set_phy_state() for managing CAIF Link layer state.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:54 -04:00
sjur.brandeland@stericsson.com bd30ce4bc0 caif: Use RCU instead of spin-lock in caif_dev.c
RCU read_lock and refcount is used to protect in-flight packets.

Use RCU and counters to manage freeing lower part of the CAIF stack if
CAIF-link layer is removed. Old solution based on delaying removal of
device is removed.

When CAIF link layer goes down the use of CAIF link layer is disabled
(by calling caif_set_phy_state()), but removal and freeing of the
lower part of the CAIF stack is done when Link layer is unregistered.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:54 -04:00
sjur.brandeland@stericsson.com 0b1e9738de caif: Use rcu_read_lock in CAIF mux layer.
Replace spin_lock with rcu_read_lock when accessing lists to layers
and cache. While packets are in flight rcu_read_lock should not be held,
instead ref-counters are used in combination with RCU.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 17:45:54 -04:00
Eric Dumazet 1b1cb1f78a net: ping: small changes
ping_table is not __read_mostly, since it contains one rwlock,
and is static to ping.c

ping_port_rover & ping_v4_lookup are static

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15 01:22:21 -04:00
David S. Miller 4dc6ec26fe Merge branch 'batman-adv/next' of git://git.open-mesh.org/ecsv/linux-merge 2011-05-14 22:47:51 -04:00
Marek Lindner ca06c6eb9a batman-adv: reset broadcast flood protection on error
The broadcast flood protection should be reset to its original value
if the primary interface could not be retrieved.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-05-15 00:02:06 +02:00
Sven Eckelmann 6d5808d4ae batman-adv: Add missing hardif_free_ref in forw_packet_free
add_bcast_packet_to_list increases the refcount for if_incoming but the
reference count is never decreased. The reference count must be
increased for all kinds of forwarded packets which have the primary
interface stored and forw_packet_free must decrease them. Also
purge_outstanding_packets has to invoke forw_packet_free when a work
item was really cancelled.

This regression was introduced in
32ae9b221e.

Reported-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-05-15 00:02:06 +02:00
David S. Miller 7be799a70b ipv4: Remove rt->rt_dst reference from ip_forward_options().
At this point iph->daddr equals what rt->rt_dst would hold.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 17:31:02 -04:00
David S. Miller 8e36360ae8 ipv4: Remove route key identity dependencies in ip_rt_get_source().
Pass in the sk_buff so that we can fetch the necessary keys from
the packet header when working with input routes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 17:29:41 -04:00
David S. Miller 22f728f8f3 ipv4: Always call ip_options_build() after rest of IP header is filled in.
This will allow ip_options_build() to reliably look at the values of
iph->{daddr,saddr}

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 17:21:27 -04:00
David S. Miller 0374d9ceb0 ipv4: Kill spurious write to iph->daddr in ip_forward_options().
This code block executes when opt->srr_is_hit is set.  It will be
set only by ip_options_rcv_srr().

ip_options_rcv_srr() walks until it hits a matching nexthop in the SRR
option addresses, and when it matches one 1) looks up the route for
that nexthop and 2) on route lookup success it writes that nexthop
value into iph->daddr.

ip_forward_options() runs later, and again walks the SRR option
addresses looking for the option matching the destination of the route
stored in skb_rtable().  This route will be the same exact one looked
up for the nexthop by ip_options_rcv_srr().

Therefore "rt->rt_dst == iph->daddr" must be true.

All it really needs to do is record the route's source address in the
matching SRR option adddress.  It need not write iph->daddr again,
since that has already been done by ip_options_rcv_srr() as detailed
above.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 17:15:50 -04:00
Peter Pan(潘卫平) 0696c3a8ac net:set valid name before calling ndo_init()
In commit 1c5cae815d (net: call dev_alloc_name from register_netdevice),
a bug of bonding was involved, see example 1 and 2.

In register_netdevice(), the name of net_device is not valid until
dev_get_valid_name() is called. But dev->netdev_ops->ndo_init(that is
bond_init) is called before dev_get_valid_name(),
and it uses the invalid name of net_device.

I think register_netdevice() should make sure that the name of net_device is
valid before calling ndo_init().

example 1:
modprobe bonding
ls  /proc/net/bonding/bond%d

ps -eLf
root      3398     2  3398  0    1 21:34 ?        00:00:00 [bond%d]

example 2:
modprobe bonding max_bonds=3

[  170.100292] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[  170.101090] bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
[  170.102469] ------------[ cut here ]------------
[  170.103150] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157()
[  170.104075] Hardware name: VirtualBox
[  170.105065] proc_dir_entry 'bonding/bond%d' already registered
[  170.105613] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding]
[  170.108397] Pid: 3457, comm: modprobe Not tainted 2.6.39-rc2+ #14
[  170.108935] Call Trace:
[  170.109382]  [<c0438f3b>] warn_slowpath_common+0x6a/0x7f
[  170.109911]  [<c051a42a>] ? proc_register+0x126/0x157
[  170.110329]  [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f
[  170.110846]  [<c051a42a>] proc_register+0x126/0x157
[  170.111870]  [<c051a4dd>] proc_create_data+0x82/0x98
[  170.112335]  [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding]
[  170.112905]  [<f94dd806>] bond_init+0x77/0xa5 [bonding]
[  170.113319]  [<c0721ac6>] register_netdevice+0x8c/0x1d3
[  170.113848]  [<f94e0e30>] bond_create+0x6c/0x90 [bonding]
[  170.114322]  [<f94f4763>] bonding_init+0x763/0x7b1 [bonding]
[  170.114879]  [<c0401240>] do_one_initcall+0x76/0x122
[  170.115317]  [<f94f4000>] ? 0xf94f3fff
[  170.115799]  [<c0463f1e>] sys_init_module+0x1286/0x140d
[  170.116879]  [<c07c6d9f>] sysenter_do_call+0x12/0x28
[  170.117404] ---[ end trace 64e4fac3ae5fff1a ]---
[  170.117924] bond%d: Warning: failed to register to debugfs
[  170.128728] ------------[ cut here ]------------
[  170.129360] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157()
[  170.130323] Hardware name: VirtualBox
[  170.130797] proc_dir_entry 'bonding/bond%d' already registered
[  170.131315] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding]
[  170.133731] Pid: 3457, comm: modprobe Tainted: G        W   2.6.39-rc2+ #14
[  170.134308] Call Trace:
[  170.134743]  [<c0438f3b>] warn_slowpath_common+0x6a/0x7f
[  170.135305]  [<c051a42a>] ? proc_register+0x126/0x157
[  170.135820]  [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f
[  170.137168]  [<c051a42a>] proc_register+0x126/0x157
[  170.137700]  [<c051a4dd>] proc_create_data+0x82/0x98
[  170.138174]  [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding]
[  170.138745]  [<f94dd806>] bond_init+0x77/0xa5 [bonding]
[  170.139278]  [<c0721ac6>] register_netdevice+0x8c/0x1d3
[  170.139828]  [<f94e0e30>] bond_create+0x6c/0x90 [bonding]
[  170.140361]  [<f94f4763>] bonding_init+0x763/0x7b1 [bonding]
[  170.140927]  [<c0401240>] do_one_initcall+0x76/0x122
[  170.141494]  [<f94f4000>] ? 0xf94f3fff
[  170.141975]  [<c0463f1e>] sys_init_module+0x1286/0x140d
[  170.142463]  [<c07c6d9f>] sysenter_do_call+0x12/0x28
[  170.142974] ---[ end trace 64e4fac3ae5fff1b ]---
[  170.144949] bond%d: Warning: failed to register to debugfs

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 16:49:49 -04:00
Vasiliy Kulikov c319b4d76b net: ipv4: add IPPROTO_ICMP socket kind
This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges.  In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).

Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/

A new ping socket is created with

    socket(PF_INET, SOCK_DGRAM, PROT_ICMP)

Message identifiers (octets 4-5 of ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes, port 0 is reserved for API ("let the
kernel pick a free number"). There is no notion of remote ports, remote
port numbers provided by the user (e.g. in connect()) are ignored.

Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport headers values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.

ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, the
checksum is always recomputed.

ICMP reply packets received from the network are demultiplexed according
to their id's, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).

socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets.  Setting it to "100
100" would grant permissions to the single group (to either make
/sbin/ping g+s and owned by this group or to grant permissions to the
"netadmins" group), "0 4294967295" would enable it for the world, "100
4294967295" would enable it for the users, but not daemons.

The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).

Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping

For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/

Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.

All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with
the patch.

PATCH v3:
    - switched to flowi4.
    - minor changes to be consistent with raw sockets code.

PATCH v2:
    - changed ping_debug() to pr_debug().
    - removed CONFIG_IP_PING.
    - removed ping_seq_fops.owner field (unused for procfs).
    - switched to proc_net_fops_create().
    - switched to %pK in seq_printf().

PATCH v1:
    - fixed checksumming bug.
    - CAP_NET_RAW may not create icmp sockets anymore.

RFC v2:
    - minor cleanups.
    - introduced sysctl'able group range to restrict socket(2).

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 16:08:13 -04:00
KOSAKI Motohiro f20190302e convert old cpumask API into new one
Adapt new API.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13 14:55:21 -04:00