Commit Graph

196 Commits

Author SHA1 Message Date
Eric W. Biederman
15e473046c netlink: Rename pid to portid to avoid confusion
It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10 15:30:41 -04:00
David S. Miller
c32f38619a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:14:18 -04:00
David S. Miller
e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
Fengguang Wu
a0dfb2634e af_packet: match_fanout_group() can be static
cc: Eric Leblond <eric@regit.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:27:12 -07:00
Pavel Emelyanov
0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
danborkmann@iogearbox.net
9e67030af3 af_packet: use define instead of constant
Instead of using a hard-coded value for the status variable, it would make
the code more readable to use its destined define from linux/if_packet.h.

Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
David S. Miller
1304a7343b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
Eric Leblond
c0de08d042 af_packet: don't emit packet on orig fanout group
If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This result in a loop for software which
generate packets when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.

This patch fixes the issue by changing the transmission check to
take fanout group info account.

Reported-by: Aleksandr Kotov <a1k@mail.ru>
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:37:29 -07:00
Pavel Emelyanov
fff3321d75 packet: Report fanout status via diag engine
Reported value is the same reported by the FANOUT getsockoption, but
unlike it, the absent fanout setup results in absent nlattr, rather
than in nlattr with zero value. This is done so, since zero fanout
report may mean both -- no fanout, and fanout with both id and type zero.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:23:14 -07:00
Pavel Emelyanov
16f01365fa packet: Report rings cfg via diag engine
One extension bit may result in two nlattrs -- one per ring type.
If some ring type is not configured, then the respective nlatts
will be empty.

The structure reported contains the data, that is given to the
corresponding ring setup socket option.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:23:14 -07:00
Eric W. Biederman
a7cb5a49bf userns: Print out socket uids in a user namespace aware fashion.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:48:06 -07:00
Pavel Emelyanov
eea68e2f1a packet: Report socket mclist info via diag module
The info is reported as an array of packet_diag_mclist structures. Each
includes not only the directly configured values (index, type, etc), but
also the "count".

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
8a360be0c5 packet: Report more packet sk info via diag module
This reports in one rtattr message all the other scalar values, that can be
set on a packet socket with setsockopt.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
96ec632714 packet: Diag core and basic socket info dumping
The diag module can be built independently from the af_packet.ko one,
just like it's done in unix sockets.

The core dumping message carries the info available at socket creation
time, i.e. family, type and protocol (in the same byte order as shown in
the proc file).

The socket inode number and cookie is reserved for future per-socket info
retrieving. The per-protocol filtering is also reserved for future by
requiring the sdiag_protocol to be zero.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
2787b04b6c packet: Introduce net/packet/internal.h header
The diag module will need to access some private packet_sock data, so
move it to a header in advance. This file will be shared between the
af_packet.c and the diag.c

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
danborkmann@iogearbox.net
7f5c3e3a80 af_packet: remove BUG statement in tpacket_destruct_skb
Here's a quote of the comment about the BUG macro from asm-generic/bug.h:

 Don't use BUG() or BUG_ON() unless there's really no way out; one
 example might be detecting data structure corruption in the middle
 of an operation that can't be backed out of.  If the (sub)system
 can somehow continue operating, perhaps with reduced functionality,
 it's probably not BUG-worthy.

 If you're tempted to BUG(), think again:  is completely giving up
 really the *only* solution?  There are usually better options, where
 users don't need to reboot ASAP and can mostly shut down cleanly.

In our case, the status flag of a ring buffer slot is managed from both sides,
the kernel space and the user space. This means that even though the kernel
side might work as expected, the user space screws up and changes this flag
right between the send(2) is triggered when the flag is changed to
TP_STATUS_SENDING and a given skb is destructed after some time. Then, this
will hit the BUG macro. As David suggested, the best solution is to simply
remove this statement since it cannot be used for kernel side internal
consistency checks. I've tested it and the system still behaves /stable/ in
this case, so in accordance with the above comment, we should rather remove it.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-12 13:42:17 -07:00
Ying Xue
99aa3473e6 af_packet: Quiet sparse noise about using plain integer as NULL pointer
Quiets the sparse warning:
warning: Using plain integer as NULL pointer

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-08 15:43:22 -07:00
parav.pandit@emulex.com
e440cf2ca0 net: added support for 40GbE link.
1. removed code replication for tov calculation for 1G, 10G and
made is common for speed > 1G (1G, 10G, 40G, 100G).
2. defines values for #4 different 40G Phys (KR4, LF4, SR4, CR4)

Signed-off-by: Parav Pandit <parav.pandit@emulex.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27 15:42:24 -07:00
danborkmann@iogearbox.net
de74e92aa8 af_packet: use sizeof instead of constant in spkt_device
This small patch removes access to the last element of the spkt_device
array through a constant. Instead, it is accessed by sizeof() to respect
possible changes in if_packet.h.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11 16:51:51 -07:00
Joe Perches
e3192690a3 net: Remove casts to same type
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.

For example, this cast:

	int y;
	int *p = (int *)&y;

I used the coccinelle script below to find and remove these
unnecessary casts.  I manually removed the conversions this
script produces of casts with __force and __user.

@@
type T;
T *p;
@@

-	(T *)p
+	p

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-04 11:45:11 -04:00
Eric Dumazet
c06fff6e17 af_packet: packet_getsockopt() cleanup
Factorize code, since most fetched values are int type.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Eric Dumazet
abc4e4fa29 packet: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet
95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
David Howells
9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Ben Greear
3bdc0eba0b net: Add framework to allow sending packets with customized CRC.
This is useful for testing RX handling of frames with bad
CRCs.

Requires driver support to actually put the packet on the
wire properly.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-02-24 01:37:35 -08:00