Commit Graph

31 Commits

Author SHA1 Message Date
Al Viro 4040153087 security: trim security.h
Trim security.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-14 10:45:42 +11:00
Krishna Kumar ef0002b577 macvtap: Fix macvtap_get_queue to use rxhash first
It was reported that the macvtap device selects a
Acked-by: Michael S. Tsirkin <mst@redhat.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20 13:45:55 -05:00
Eric Dumazet 2cfa5a0471 net: treewide use of RCU_INIT_POINTER
rcu_assign_pointer(ptr, NULL) can be safely replaced by
RCU_INIT_POINTER(ptr, NULL)

(old rcu_assign_pointer() macro was testing the NULL value and could
omit the smp_wmb(), but this had to be removed because of compiler
warnings)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-23 18:48:19 -05:00
Eric W. Biederman e09eff7fc1 macvtap: Fix the minor device number allocation
On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices.  I have systems running
automated tests that that hit this condition in just a few days.

Use a linux idr allocator to track which mavtap minor numbers
are available and and to track the association between macvtap
minor numbers and macvtap network devices.

Remove the unnecessary unneccessary check to see if the network
device we have found is indeed a macvtap device.  With macvtap
specific data structures it is impossible to find any other
kind of networking device.

Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers.  It doesn't solve the
original problem but there is no penalty for a larger minor
device range.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:53:07 -04:00
Eric W. Biederman 9bf1907f42 macvtap: Rewrite macvtap_newlink so the error handling works.
Place macvlan_common_newlink at the end of macvtap_newlink because
failing in newlink after registering your network device is not
supported.

Move device_create into a netdevice creation notifier.   The network device
notifier is the only hook that is called after the network device has been
registered with the device layer and before register_network_device returns
success.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:53:07 -04:00
Eric W. Biederman 2259fef0bb macvtap: Don't leak unreceived packets when we delete a macvtap device.
To avoid leaking packets in the receive queue.  Add a socket destructor
that will run whenever destroy a macvtap socket.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:53:07 -04:00
Eric W. Biederman 047af9cfed macvtap: Fix macvtap_open races in the zero copy enable code.
To see if it is appropriate to enable the macvtap zero copy feature
don't test the lowerdev network device flags.   Instead test the
macvtap network device flags which are a direct copy of the lowerdev
flags.  This is important because nothing holds a reference to lowerdev
and on a very bad day we lowerdev could be a pointer to stale memory.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:53:07 -04:00
Eric W. Biederman 99f34b38cd macvtap: Close a race between macvtap_open and macvtap_dellink.
There is a small window in macvtap_open between looking up a
networking device and calling macvtap_set_queue in which
macvtap_del_queues called from macvtap_dellink.   After
calling macvtap_del_queues it is totally incorrect to
allow macvtap_set_queue to proceed so prevent success by
reporting that all of the available queues are in use.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-21 02:53:06 -04:00
Jason Wang 653fc91557 macvtap: fix the uninitialized var using in macvtap_alloc_skb()
Commit d1b08284 use new frag API but would leave f to be used
uninitialized, this patch fix it.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-20 14:37:22 -04:00
Ian Campbell d1b08284ad macvtap: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15 15:34:59 -04:00
Shirley Ma 97bc3633be macvtap: macvtapTX zero-copy support
Only 128 bytes is copied, the rest of data is DMA mapped directly from
userspace.

Signed-off-by: Shirley Ma <xma@...ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-07 04:41:24 -07:00
Jason Wang 10a8d94a95 virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID
There's no need for the guest to validate the checksum if it have been
validated by host nics. So this patch introduces a new flag -
VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum
examing in guest. The backend (tap/macvtap) may set this flag when
met skbs with CHECKSUM_UNNECESSARY to save cpu utilization.

No feature negotiation is needed as old driver just ignore this flag.

Iperf shows 12%-30% performance improvement for UDP traffic. For TCP,
when gro is on no difference as it produces skb with partial
checksum. But when gro is disabled, 20% or even higher improvement
could be measured by netperf.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-11 15:57:47 -07:00
David S. Miller 33175d84ee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_cmn.c
2011-03-10 14:26:00 -08:00
Nicolas Kaiser ce3c869283 drivers/net/macvtap: fix error check
'len' is unsigned of type size_t and can't be negative.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-07 15:57:58 -08:00
Eric Dumazet 13707f9e5e drivers/net: remove some rcu sparse warnings
Add missing __rcu annotations and helpers.
minor : Fix some rcu_dereference() calls in macvtap

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 15:02:57 -08:00
Eric Dumazet 1ac9ad1394 net: remove dev_txq_stats_fold()
After recent changes, (percpu stats on vlan/tunnels...), we dont need
anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.

Only remaining users are ixgbe, sch_teql, gianfar & macvlan :

1) ixgbe can be converted to use existing tx_ring counters.

2) macvlan incremented txq->tx_dropped, it can use the
dev->stats.tx_dropped counter.

3) sch_teql : almost revert ab35cd4b8f (Use net_device internal stats)
    Now we have ndo_get_stats64(), use it, even for "unsigned long"
fields (No need to bring back a struct net_device_stats)

4) gianfar adds a stats structure per tx queue to hold
tx_bytes/tx_packets

This removes a lockdep warning (and possible lockup) in rndis gadget,
calling dev_get_stats() from hard IRQ context.

Ref: http://www.spinics.net/lists/netdev/msg149202.html

Reported-by: Neil Jones <neiljay@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-13 21:44:34 -08:00
Michał Mirosław 55508d601d net: Use skb_checksum_start_offset()
Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:43:14 -08:00
Krishna Kumar 1565c7c1c4 macvtap: Implement multiqueue for macvtap driver
Implement multiqueue facility for macvtap driver. The idea is that
a macvtap device can be opened multiple times and the fd's can be
used to register eg, as backend for vhost.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-08-16 21:06:25 -07:00
David S. Miller bb7e95c8fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x_main.c

Merge bnx2x bug fixes in by hand... :-/

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-27 21:01:35 -07:00
Herbert Xu 8a35747a5d macvtap: Limit packet queue length
Mark Wagner reported OOM symptoms when sending UDP traffic over
a macvtap link to a kvm receiver.

This appears to be caused by the fact that macvtap packet queues
are unlimited in length.  This means that if the receiver can't
keep up with the rate of flow, then we will hit OOM. Of course
it gets worse if the OOM killer then decides to kill the receiver.

This patch imposes a cap on the packet queue length, in the same
way as the tuntap driver, using the device TX queue length.

Please note that macvtap currently has no way of giving congestion
notification, that means the software device TX queue cannot be
used and packets will always be dropped once the macvtap driver
queue fills up.

This shouldn't be a great problem for the scenario where macvtap
is used to feed a kvm receiver, as the traffic is most likely
external in origin so congestion notification can't be applied
anyway.

Of course, if anybody decides to complain about guest-to-guest
UDP packet loss down the track, then we may have to revisit this.

Incidentally, this patch also fixes a real memory leak when
macvtap_get_queue fails.

Chris Wright noticed that for this patch to work, we need a
non-zero TX queue length.  This patch includes his work to change
the default macvtap TX queue length to 500.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:08:56 -07:00
David S. Miller 1ebed71ae2 macvtap: Use dev_t for macvtap_major.
Reported-by: "Robert P. J. Day" <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-10 19:25:50 -07:00
Michael S. Tsirkin 55afbd0810 macvtap: add ioctl to modify vnet header size
This adds TUNSETVNETHDRSZ/TUNGETVNETHDRSZ support
to macvtap.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
2010-05-04 01:35:47 +03:00
Eric Dumazet 4381548237 net: sock_def_readable() and friends RCU conversion
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
  - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
  - Use wq_has_sleeper() to eventually wakeup tasks.
  - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
  macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-01 15:00:15 -07:00
Eric Dumazet 4a4771a58e net: use sk_sleep()
Commit aa395145 (net: sk_sleep() helper) missed three files in the
conversion.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26 11:18:45 -07:00
Eric Dumazet aa39514516 net: sk_sleep() helper
Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep wont be a field directly
available.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 16:37:13 -07:00