When I made the tun driver use non-linear packets as the preferred
option, it broke non-GSO users because they would end up allocating
a completely non-linear packet, which triggers a crash when we call
eth_type_trans.
This patch reverts non-GSO users to using linear packets and adds
a check to ensure that GSO users can't cause crashes in the same
way.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When forward-porting the tun accounting patch I managed to break
the send path compltely by dropping the tun_get call.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tap devices can make use of a small MAC filter set via the
TUNSETTXFILTER ioctl. The filter has a set of exact matches
plus a hash for imperfect filtering of additional multicast
addresses. The current code is unbalanced, adding unicast
addresses to the multicast hash, but only checking the hash
against multicast addresses. This results in the filter
dropping unicast addresses that overflow the exact filter.
The fix is simply to disable the filter by leaving count set
to zero if we find non-multicast addresses after the exact
match table is filled.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike a normal socket path, the tuntap device send path does
not have any accounting. This means that the user-space sender
may be able to pin down arbitrary amounts of kernel memory by
continuing to send data to an end-point that is congested.
Even when this isn't an issue because of limited queueing at
most end points, this can also be a problem because its only
response to congestion is packet loss. That is, when those
local queues at the end-point fills up, the tuntap device will
start wasting system time because it will continue to send
data there which simply gets dropped straight away.
Of course one could argue that everybody should do congestion
control end-to-end, unfortunately there are people in this world
still hooked on UDP, and they don't appear to be going away
anywhere fast. In fact, we've always helped them by performing
accounting in our UDP code, the sole purpose of which is to
provide congestion feedback other than through packet loss.
This patch attempts to apply the same bandaid to the tuntap device.
It creates a pseudo-socket object which is used to account our
packets just as a normal socket does for UDP. Of course things
are a little complex because we're actually reinjecting traffic
back into the stack rather than out of the stack.
The stack complexities however should have been resolved by preceding
patches. So this one can simply start using skb_set_owner_w.
For now the accounting is essentially disabled by default for
backwards compatibility. In particular, we set the cap to INT_MAX.
This is so that existing applications don't get confused by the
sudden arrival EAGAIN errors.
In future we may wish (or be forced to) do this by default.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Tokarev wrote:
[]
> 2, and this is the main one: How about supplementary groups?
>
> Here I have a valid usage case: a group of testers running various
> versions of windows using KVM (kernel virtual machine), 1 at a time,
> to test some software. kvm is set up to use bridge with a tap device
> (there should be a way to connect to the machine). Anyone on that group
> has to be able to start/stop the virtual machines.
>
> My first attempt - pretty obvious when I saw -g option of tunctl - is
> to add group ownership for the tun device and add a supplementary group
> to each user (their primary group should be different). But that fails,
> since kernel only checks for egid, not any other group ids.
>
> What's the reasoning to not allow supplementary groups and to only check
> for egid?
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This greatly simplifies testing to verify I have fixed the problems
with a tun device disappearing when the tun file descriptor is still
held open.
Further it allows removal network namespace operations for the tun
driver. Reducing the network namespace handling in the driver to the
minimum. i.e. When we are creating a tun device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the awkward case between free_netdev and dev_chr_close fixed
there is no longer any need to limit tun and tap devices to the
network namespace they were created in. So remove the
NETIF_F_NETNS_LOCAL flag on the network device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tun code does not cope gracefully if the network device goes away before
the tun file descriptor is closed. It looks like we can trigger this with
rmmod, and moving tun devices between network namespaces will allow this
to be triggered when network namespaces exit.
To fix this I introduce an intermediate data structure tun_file which
holds a count of users and a pointer to the struct tun_struct. tun_get
increments that reference count if it is greater than 0. tun_put decrements
that reference count and detaches from the network device if the count is 0.
While we have a file attached to the network device I hold a reference
to the network device keeping it from going away completely.
When a network device is unregistered I decrement the count of the
attached tun_file and if that was the last user I detach the tun_file,
and all processes on read_wait are woken up to ensure they do not
sleep indefinitely. As some of those sleeps happen with the count on
the tun device elevated waking up the read waiters ensures that
tun_file will be detached in a timely manner.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The poll interface requires that the waitqueue exist while the struct
file is open. In the rare case when a tun device disappears before
the tun file closes we fail to provide this property, so move
read_wait.
This is safe now that tun_net_xmit is atomic with tun_detach.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently this small race allows for a packet to be received when we
detach from an tun device and still be enqueued. Not especially
important but not what the code is trying to do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Grabbing namespaces in open, and putting them in close always seems to
be the cleanest approach with the fewest surprises.
So now that we have tun_file so we have somepleace to put the network
namespace, let's grab the network namespace on file open and put on
file close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun code suffers from only having a single word of
data that exists for the entire life of the tun file descriptor.
This results in peculiar holding of references to the network namespace
as well as races between free_netdevice and tun_chr_close.
Fix this by introducing tun_file which will hold the per file state.
For the moment it still holds just a single word so the differences
are all logic changes with no changes in semantics.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EBADF is meaningless in the context of a poll mask so use POLLERR
instead.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for two different tasks with access to the same file
descriptor to call tun_set_iff on it at the same time and race to
attach to a tap device. Prevent this by placing all of the logic to
attach to a file descriptor in one function and testing the file
descriptor to be certain it is not already attached to another tun
device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the tun driver keeps a private list of tun devices for what
appears to be a small gain in performance when reconnecting a file
descriptor to an existing tun or tap device. So simplify the code by
removing it.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
register_pernet_gen_device() expects 'int*', found via sparse.
CHECK drivers/net/tun.c
drivers/net/tun.c:1245:36: warning: incorrect type in argument 1 (different signedness)
drivers/net/tun.c:1245:36: expected int *id
drivers/net/tun.c:1245:36: got unsigned int static [toplevel] *<noident>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the TUN/TAP tunnel driver to net_device_ops.
Split the ops in two, and retain compatability.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: netdev@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>