Commit Graph

309 Commits

Author SHA1 Message Date
Craig Gallek 86dfb4acb3 tun: Don't assume type tun in tun_device_event
The referenced change added a netlink notifier for processing
device queue size events.  These events are fired for all devices
but the registered callback assumed they only occurred for tun
devices.  This fix adds a check (borrowed from macvtap.c) to discard
non-tun device events.

For reference, this fixes the following splat:
[   71.505935] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   71.513870] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[   71.519906] PGD 3f41f56067 PUD 3f264b7067 PMD 0
[   71.524497] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
[   71.529374] gsmi: Log Shutdown Reason 0x03
[   71.533417] Modules linked in:[   71.533826] mlx4_en: eth1: Link Up

[   71.539616]  bonding w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
[   71.549282] CPU: 12 PID: 7915 Comm: set.ixion-haswe Not tainted 4.7.0-dbx-DEV #8
[   71.556586] Hardware name: Intel Grantley,Wellsburg/Ixion_IT_15, BIOS 2.58.0 05/03/2016
[   71.564495] task: ffff887f00bb20c0 ti: ffff887f00798000 task.ti: ffff887f00798000
[   71.571894] RIP: 0010:[<ffffffff8153c1a0>]  [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[   71.580327] RSP: 0018:ffff887f0079bbd8  EFLAGS: 00010202
[   71.585576] RAX: fffffffffffffae8 RBX: ffff887ef6d03378 RCX: 0000000000000000
[   71.592624] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
[   71.599675] RBP: ffff887f0079bc48 R08: 0000000000000000 R09: 0000000000000001
[   71.606730] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000010
[   71.613780] R13: 0000000000000000 R14: 0000000000000001 R15: ffff887f0079bd00
[   71.620832] FS:  00007f5cdc581700(0000) GS:ffff883f7f700000(0000) knlGS:0000000000000000
[   71.628826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   71.634500] CR2: 0000000000000010 CR3: 0000003f3eb62000 CR4: 00000000001406e0
[   71.641549] Stack:
[   71.643533]  ffff887f0079bc08 0000000000000246 000000000000001e ffff887ef6d00000
[   71.650871]  ffff887f0079bd00 0000000000000000 0000000000000000 ffffffff00000000
[   71.658210]  ffff887f0079bc48 ffffffff81d24070 00000000fffffff9 ffffffff81cec7a0
[   71.665549] Call Trace:
[   71.667975]  [<ffffffff810eeb0d>] notifier_call_chain+0x5d/0x80
[   71.673823]  [<ffffffff816365d0>] ? show_tx_maxrate+0x30/0x30
[   71.679502]  [<ffffffff810eeb3e>] __raw_notifier_call_chain+0xe/0x10
[   71.685778]  [<ffffffff810eeb56>] raw_notifier_call_chain+0x16/0x20
[   71.691976]  [<ffffffff8160eb30>] call_netdevice_notifiers_info+0x40/0x70
[   71.698681]  [<ffffffff8160ec36>] call_netdevice_notifiers+0x16/0x20
[   71.704956]  [<ffffffff81636636>] change_tx_queue_len+0x66/0x90
[   71.710807]  [<ffffffff816381ef>] netdev_store.isra.5+0xbf/0xd0
[   71.716658]  [<ffffffff81638350>] tx_queue_len_store+0x50/0x60
[   71.722431]  [<ffffffff814a6798>] dev_attr_store+0x18/0x30
[   71.727857]  [<ffffffff812ea3ff>] sysfs_kf_write+0x4f/0x70
[   71.733274]  [<ffffffff812e9507>] kernfs_fop_write+0x147/0x1d0
[   71.739045]  [<ffffffff81134a4f>] ? rcu_read_lock_sched_held+0x8f/0xa0
[   71.745499]  [<ffffffff8125a108>] __vfs_write+0x28/0x120
[   71.750748]  [<ffffffff8111b137>] ? percpu_down_read+0x57/0x90
[   71.756516]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
[   71.762278]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
[   71.768038]  [<ffffffff8125bd5e>] vfs_write+0xbe/0x1b0
[   71.773113]  [<ffffffff8125c092>] SyS_write+0x52/0xa0
[   71.778110]  [<ffffffff817528e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[   71.784472] Code: 45 31 f6 48 8b 93 78 33 00 00 48 81 c3 78 33 00 00 48 39 d3 48 8d 82 e8 fa ff ff 74 25 48 8d b0 40 05 00 00 49 63 d6 41 83 c6 01 <49> 89 34 d4 48 8b 90 18 05 00 00 48 39 d3 48 8d 82 e8 fa ff ff
[   71.803655] RIP  [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
[   71.809769]  RSP <ffff887f0079bbd8>
[   71.813213] CR2: 0000000000000010
[   71.816512] ---[ end trace 4db6449606319f73 ]---

Fixes: 1576d98605 ("tun: switch to use skb array for tx")
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08 23:58:57 -04:00
Jason Wang f48cc6b266 tun: fix build warnings
Stephen Rothwell reports a build warnings(powerpc ppc64_defconfig)

drivers/net/tun.c: In function 'tun_do_read.part.5':
/home/sfr/next/next/drivers/net/tun.c:1491:6: warning: 'err' may be
used uninitialized in this function [-Wmaybe-uninitialized]
   int err;

This is because tun_ring_recv() may return an uninitialized err, fix this.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04 17:19:27 -07:00
Jason Wang 1576d98605 tun: switch to use skb array for tx
We used to queue tx packets in sk_receive_queue, this is less
efficient since it requires spinlocks to synchronize between producer
and consumer.

This patch tries to address this by:

- switch from sk_receive_queue to a skb_array, and resize it when
  tx_queue_len was changed.
- introduce a new proto_ops peek_len which was used for peeking the
  skb length.
- implement a tun version of peek_len for vhost_net to use and convert
  vhost_net to use peek_len if possible.

Pktgen test shows about 15.3% improvement on guest receiving pps for small
buffers:

Before: ~1300000pps
After : ~1500000pps

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01 05:32:17 -04:00
Paolo Abeni df10db98ab tun: fix csum generation for tap devices
The commit 3416609363 ("tuntap: use common code for virtio_net_hdr
and skb GSO conversion") replaced the tun code for header manipulation
with the generic helpers. While doing so, it implictly moved the
skb_partial_csum_set() invocation after eth_type_trans(), which
invalidate the current gso start/offset values.
Fix it by moving the helper invocation before the mac pulling.

Fixes: 3416609363 ("tuntap: use common code for virtio_net_hdr and skb GSO conversion")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 14:00:33 -07:00
Mike Rapoport 3416609363 tuntap: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with
virtio_net_hdr_{from,to}_skb

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:03:55 -07:00
Jason Wang addf8fc4ac tuntap: correctly wake up process during uninit
We used to check dev->reg_state against NETREG_REGISTERED after each
time we are woke up. But after commit 9e641bdcfa ("net-tun:
restructure tun_do_read for better sleep/wakeup efficiency"), it uses
skb_recv_datagram() which does not check dev->reg_state. This will
result if we delete a tun/tap device after a process is blocked in the
reading. The device will wait for the reference count which was held
by that process for ever.

Fixes this by using RCV_SHUTDOWN which will be checked during
sk_recv_datagram() before trying to wake up the process during uninit.

Fixes: 9e641bdcfa ("net-tun: restructure tun_do_read for better
sleep/wakeup efficiency")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Xi Wang <xii@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 19:28:37 -04:00
Jason Wang 3df97ba830 tuntap: calculate rps hash only when needed
There's no need to calculate rps hash if it was not enabled. So this
patch export rps_needed and check it before trying to get rps
hash. Tests (using pktgen to inject packets to guest) shows this can
improve pps about 13% (when rps is disabled).

Before:
~1150000 pps
After:
~1300000 pps

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
----
Changes from V1:
- Fix build when CONFIG_RPS is not set
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-28 16:38:54 -04:00
Paolo Abeni 2a2bbf1700 tun: don't require serialization lock on tx
The current tun_net_xmit() implementation don't need any external
lock since it relies on rcu protection for the tun data structure
and on socket queue lock for skb queuing.

This patch set the NETIF_F_LLTX feature bit in the tun device, so
that on xmit, in absence of qdisc, no serialization lock is acquired
by the caller.

The user space can remove the default tun qdisc with:

tc qdisc replace dev <tun device name> root noqueue

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-18 14:36:26 -04:00
Paolo Abeni 608b997726 tun: use per cpu variables for stats accounting
Currently the tun device accounting uses dev->stats without applying any
kind of protection, regardless that accounting happens in preemptible
process context.
This patch move the tun stats to a per cpu data structure, and protect
the updates with  u64_stats_update_begin()/u64_stats_update_end() or
this_cpu_inc according to the stat type. The per cpu stats are
aggregated by the newly added ndo_get_stats64 ops.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-14 22:55:25 -04:00
David S. Miller ae95d71261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
Jason Wang 016adb7260 tuntap: restore default qdisc
After commit f84bb1eac0 ("net: fix IFF_NO_QUEUE for drivers using
alloc_netdev"), default qdisc was changed to noqueue because
tuntap does not set tx_queue_len during .setup(). This patch restores
default qdisc by setting tx_queue_len in tun_setup().

Fixes: f84bb1eac0 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-08 15:52:45 -04:00
Hannes Frederic Sowa 8ced425ee6 tun: use socket locks for sk_{attach,detatch}_filter
This reverts commit 5a5abb1fa3 ("tun, bpf: fix suspicious RCU usage
in tun_{attach, detach}_filter") and replaces it to use lock_sock around
sk_{attach,detach}_filter. The checks inside filter.c are updated with
lockdep_sock_is_held to check for proper socket locks.

It keeps the code cleaner by ensuring that only one lock governs the
socket filter instead of two independent locks.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 16:44:14 -04:00
Soheil Hassas Yeganeh c14ac9451c sock: enable timestamping using control messages
Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
This is very costly when users want to sample writes to gather
tx timestamps.

Add support for enabling SO_TIMESTAMPING via control messages by
using tsflags added in `struct sockcm_cookie` (added in the previous
patches in this series) to set the tx_flags of the last skb created in
a sendmsg. With this patch, the timestamp recording bits in tx_flags
of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.

Please note that this is only effective for overriding the recording
timestamps flags. Users should enable timestamp reporting (e.g.,
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
socket options and then should ask for SOF_TIMESTAMPING_TX_*
using control messages per sendmsg to sample timestamps for each
write.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-04 15:50:30 -04:00
Daniel Borkmann 5a5abb1fa3 tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter
Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:

  [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
  [   52.765688] other info that might help us debug this:
  [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
  [   52.765701] 1 lock held by a.out/1525:
  [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
  [   52.765721] stack backtrace:
  [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
  [...]
  [   52.765768] Call Trace:
  [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
  [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
  [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
  [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
  [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
  [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
  [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
  [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
  [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
  [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
  [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
  [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25

Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.

Since the fix in f91ff5b9ff ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.

Since its introduction in 9940516259 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.

Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-01 14:33:46 -04:00
Paolo Abeni eaea34b23c net/tun: implement ndo_set_rx_headroom
ndo_set_rx_headroom controls the align value used by tun devices to
allocate skbs on frame reception.
When the xmit device adds a large encapsulation, this avoids an skb
head reallocation on forwarding.

The measured improvement when forwarding towards a vxlan dev with
frame size below the egress device MTU is as follow:

vxlan over ipv6, bridged: +6%
vxlan over ipv6, ovs: +7%

In case of ipv4 tunnels there is no improvement, since the tun
device default alignment provides enough headroom to avoid the skb
head reallocation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 15:54:30 -05:00
Eric Dumazet 1bd4978a88 tun: honor IFF_UP in tun_get_user()
If a tun interface is turned down, we should not allow packet injection
into the kernel.

Kernel does not send packets to the tun already.

TUNATTACHFILTER can not be used as only tun_net_xmit() is taking care
of it.

Reported-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-17 15:25:57 -05:00
Eric Dumazet 9cd3e072b0 net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
This patch is a cleanup to make following patch easier to
review.

Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 15:45:05 -05:00
Eric Dumazet 5fcd2d8be4 tun: use sk_fullsock() before reading sk->sk_tsflags
timewait or request sockets are small and do not contain sk->sk_tsflags

Without this fix, we might read garbage, and crash later in

__skb_complete_tx_timestamp()
 -> sock_queue_err_skb()

(These pseudo sockets do not have an error queue either)

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:45:48 -07:00
Toshiaki Makita 5e52796a9a tuntap: Don't segment multiple tagged packets on tap device
Tap devices don't need to segment multiple tagged packets.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-03 14:24:50 -07:00
Linus Torvalds 5fc835284d Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost cross endian support from Michael Tsirkin:
 "I have just queued some more bugfix patches today but none fix
  regressions and none are related to these ones, so it looks like a
  good time for a merge for -rc1.

  The motivation for this is support for legacy BE guests on the new LE
  hosts.  There are two redeeming properties that made me merge this:

   - It's a trivial amount of code: since we wrap host/guest accesses
     anyway, almost all of it is well hidden from drivers.

   - Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY,
     and when it's clear, there's zero overhead (as some point it was
     tested by compiling with and without the patches, got the same
     stripped binary).

  Maybe we could create a Kconfig symbol to enforce the second point:
  prevent people from enabling it eg on x86.  I will look into this"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-pci: alloc only resources actually used.
  macvtap/tun: cross-endian support for little-endian hosts
  vhost: cross-endian support for legacy devices
  virtio: add explicit big-endian support to memory accessors
  vhost: introduce vhost_is_little_endian() helper
  vringh: introduce vringh_is_little_endian() helper
  macvtap: introduce macvtap_is_little_endian() helper
  tun: add tun_is_little_endian() helper
  virtio: introduce virtio_is_little_endian() helper
2015-07-03 16:02:25 -07:00
Greg Kurz 8b8e658b16 macvtap/tun: cross-endian support for little-endian hosts
The VNET_LE flag was introduced to fix accesses to virtio 1.0 headers
that are always little-endian. It can also be used to handle the special
case of a legacy little-endian device implemented by a big-endian host.

Let's add a flag and ioctls for big-endian devices as well. If both flags
are set, little-endian wins.

Since this is isn't a common usecase, the feature is controlled by a kernel
config option (not set by default).

Both macvtap and tun are covered by this patch since they share the same
API with userland.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:56 +02:00
Greg Kurz 7d82410950 virtio: add explicit big-endian support to memory accessors
The current memory accessors logic is:
- little endian if little_endian
- native endian (i.e. no byteswap) if !little_endian

If we want to fully support cross-endian vhost, we also need to be
able to convert to big endian.

Instead of changing the little_endian argument to some 3-value enum, this
patch changes the logic to:
- little endian if little_endian
- big endian if !little_endian

The native endian case is handled by all users with a trivial helper. This
patch doesn't change any functionality, nor it does add overhead.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:54 +02:00
Greg Kurz 25bd55bbab tun: add tun_is_little_endian() helper
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
2015-06-01 15:48:50 +02:00
Eric W. Biederman 11aa9c28b4 net: Pass kern from net_proto_family.create to sk_alloc
In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:17 -04:00
Eric W. Biederman 140e807da1 tun: Utilize the normal socket network namespace refcounting.
There is no need for tun to do the weird network namespace refcounting.
The existing network namespace refcounting in tfile has almost exactly
the same lifetime.  So rewrite the code to use the struct sock network
namespace refcounting and remove the unnecessary hand rolled network
namespace refcounting and the unncesary tfile->net.

This change allows the tun code to directly call sock_put bypassing
sock_release and making SOCK_EXTERNALLY_ALLOCATED unnecessary.

Remove the now unncessary tun_release so that if anything tries to use
the sock_release code path the kernel will oops, and let us know about
the bug.

The macvtap code already uses it's internal socket this way.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-11 10:50:16 -04:00