It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.
Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb. So remove net_alive allowing
packet reception run a little faster.
Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The untracked conntrack actually does usually have events marked for
delivery as its not special-cased in that part of the code. Skip the
actual delivery since it impacts performance noticeably.
Signed-off-by: Patrick McHardy <kaber@trash.net>
A long time ago we had bugs, primarily in TCP, where we would modify
skb->truesize (for TSO queue collapsing) in ways which would corrupt
the socket memory accounting.
skb_truesize_check() was added in order to try and catch this error
more systematically.
However this debugging check has morphed into a Frankenstein of sorts
and these days it does nothing other than catch false-positives.
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that in_atomic() will return false inside spinlocks if
CONFIG_PREEMPT=n. This will lead to deadlockable GFP_KERNEL allocations
from spinlocked regions.
Secondly, if CONFIG_PREEMPT=y, this bug solves itself because networking
will instead use GFP_ATOMIC from this callsite. Hence we won't get the
might_sleep() debugging warnings which would have informed us of the buggy
callsites.
Solve both these problems by switching to in_interrupt(). Now, if someone
runs a gfp_any() allocation from inside spinlock we will get the warning
if CONFIG_PREEMPT=y.
I reviewed all callsites and most of them were too complex for my little
brain and none of them documented their interface requirements. I have no
idea what this patch will do.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 9db66bdcc8 (net: convert
TCP/DCCP ehash rwlocks to spinlocks), I forgot to change one
occurrence of rwlock_t to spinlock_t
I believe sizeof(raw_spinlock_t) might be > 0 on !CONFIG_SMP if
CONFIG_DEBUG_SPINLOCK while sizeof(raw_rwlock_t) should be 0 in this
case.
Fortunatly, CONFIG_DEBUG_SPINLOCK adds fields to both spinlock_t and
rwlock_t, but at this might change in the future (being able to debug
spinlocks but not rwlocks for example), better to be safe.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
crc32c algorithm provides a byteswaped result. On little-endian
arches, the result ends up in big-endian/network byte order.
On big-endinan arches, the result ends up in little-endian
order and needs to be byte swapped again. Thus calling cpu_to_le32
gives the right output.
Tested-by: Jukka Taimisto <jukka.taimisto@mail.suomi.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix (delete) more mac80211 kernel-doc:
Warning(linux-2.6.28-git13//include/net/mac80211.h:375): Excess struct/union/enum/typedef member 'retry_count' description in 'ieee80211_tx_info'
Warning(linux-2.6.28-git13//net/mac80211/sta_info.h:308): Excess struct/union/enum/typedef member 'last_txrate' description in 'sta_info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The kernel-doc was referring to member @debufs_dentry instead of
@debugfs_dentry.
Reported by Randy Dunlap http://marc.info/?l=linux-netdev&m=123147942302885&w=2
As well, escape the colon in the field's text description, as it is
causing the generated text to be erraticly broken up (with paragraphs
moved down). Could not find a reason why it is happening so, even when
other field descriptions use colons and work as expected.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
ioat: fix self test for multi-channel case
dmaengine: bump initcall level to arch_initcall
dmaengine: advertise all channels on a device to dma_filter_fn
dmaengine: use idr for registering dma device numbers
dmaengine: add a release for dma class devices and dependent infrastructure
ioat: do not perform removal actions at shutdown
iop-adma: enable module removal
iop-adma: kill debug BUG_ON
iop-adma: let devm do its job, don't duplicate free
dmaengine: kill enum dma_state_client
dmaengine: remove 'bigref' infrastructure
dmaengine: kill struct dma_client and supporting infrastructure
dmaengine: replace dma_async_client_register with dmaengine_get
atmel-mci: convert to dma_request_channel and down-level dma_slave
dmatest: convert to dma_request_channel
dmaengine: introduce dma_request_channel and private channels
net_dma: convert to dma_find_channel
dmaengine: provide a common 'issue_pending_all' implementation
dmaengine: centralize channel allocation, introduce dma_find_channel
dmaengine: up-level reference counting to the module level
...
Reported by Randy Dunlap from a warning in the v2.6.29 merge window
tree as of 2009/1/8.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GRO support for IPv6. IPv6 GRO supports extension
headers in the same way as GSO (by using the same infrastructure).
It's also simpler compared to IPv4 since we no longer have to worry
about fragmentation attributes or header checksums.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Definitions for the user/kernel API protocol through generic
netlink. User space can copy it verbatim and use it.
Kernel API definition declares the main data types and calls for the
drivers to integrate into the WiMAX stack. Provides usage
documentation.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Simply, if a client wants any dmaengine channel then prevent all dmaengine
modules from being removed. Once the clients are done re-enable module
removal.
Why?, beyond reducing complication:
1/ Tracking reference counts per-transaction in an efficient manner, as
is currently done, requires a complicated scheme to avoid cache-line
bouncing effects.
2/ Per-transaction ref-counting gives the false impression that a
dma-driver can be gracefully removed ahead of its user (net, md, or
dma-slave)
3/ None of the in-tree dma-drivers talk to hot pluggable hardware, but
if such an engine were built one day we still would not need to notify
clients of remove events. The driver can simply return NULL to a
->prep() request, something that is much easier for a client to handle.
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Thanks to excellent diagnosis by Eduard Guzovsky.
The core problem is that on a network with lots of active
multicast traffic, the neighbour cache can fill up. If
we try to allocate a new route and thus neighbour cache
entry, the bog-standard GC attempt the neighbour layer does
in ineffective because route entries hold a reference
to the existing neighbour entries and GC can only liberate
entries with no references.
IPV4 already has a way to handle this, by doing a route cache
GC in such situations (when neigh attach returns -ENOBUFS).
So simply mimick this on the ipv6 side.
Tested-by: Eduard Guzovsky <eguzovsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the NetLabel kernel API to expose the new features added in kernel
releases 2.6.25 and 2.6.28: the static/fallback label functionality and network
address based selectors.
Signed-off-by: Paul Moore <paul.moore@hp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
See commit 1045b03e07 ("netlink: fix
overrun in attribute iteration") for a detailed explanation of why
this patch is necessary.
In short, nlmsg_next() can make "remaining" go negative, and the
remaining >= sizeof(...) comparison will promote "remaining" to an
unsigned type, which means that the expression will evaluate to
true for negative numbers, even though it was not intended.
I put "theoretical" in the title because I have no evidence that
this can actually happen, but I suspect that a crafted netlink
packet can trigger some badness.
Note that the last test, which seemingly has the exact same
problem (also true for nla_ok()), is perfectly OK, since we
already know that remaining is positive.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket
extensions API draft.
8.2.5. Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER)
This option gets the current number of associations that are attached
to a one-to-many style socket. The option value is an uint32_t.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a locking free version of iucv_message_receive and iucv_message_send
that do not call local_bh_enable in a spin_lock_(bh|irqsave)() context.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>