Commit Graph

199942 Commits

Author SHA1 Message Date
Herbert Xu d5f31fbfd8 netpoll: Use correct primitives for RCU dereferencing
Now that RCU debugging checks for matching rcu_dereference calls
and rcu_read_lock, we need to use the correct primitives or face
nasty warnings.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:44:29 -07:00
Herbert Xu 9f70b0fcec bridge: Add const to dummy br_netpoll_send_skb
The version of br_netpoll_send_skb used when netpoll is off is
missing a const thus causing a warning.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 21:43:48 -07:00
Eric Dumazet 5933dd2f02 net: NET_SKB_PAD should depend on L1_CACHE_BYTES
In old kernels, NET_SKB_PAD was defined to 16.

Then commit d6301d3dd1 (net: Increase default NET_SKB_PAD to 32), and
commit 18e8c134f4 (net: Increase NET_SKB_PAD to 64 bytes) increased it
to 64.

While first patch was governed by network stack needs, second was more
driven by performance issues on current hardware. Real intent was to
align data on a cache line boundary.

So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic.

Remove microblaze and powerpc own NET_SKB_PAD definitions.

Thanks to Alexander Duyck and David Miller for their comments.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:16:43 -07:00
Eric Dumazet a95d8c88be ipfrag : frag_kfree_skb() cleanup
Third param (work) is unused, remove it.

Remove __inline__ and inline qualifiers.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:12:44 -07:00
Eric Dumazet d27f9b3582 ip_frag: Remove some atomic ops
Instead of doing one atomic operation per frag, we can factorize them.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:12:44 -07:00
Florian Westphal 2bbdf389a9 ipv6: syncookies: do not skip ->iif initialization
When syncookies are in effect, req->iif is left uninitialized.
In case of e.g. link-local addresses the route lookup then fails
and no syn-ack is sent.

Rearrange things so ->iif is also initialized in the syncookie case.

want_cookie can only be true when the isn was zero, thus move the want_cookie
check into the "!isn" branch.

Cc: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 18:10:29 -07:00
Ben Hutchings 82695d9b18 net: Fix error in comment on net_device_ops::ndo_get_stats
ndo_get_stats still returns struct net_device_stats *; there is
no struct net_device_stats64.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 15:08:48 -07:00
Sonic Zhang 4fcc3d3409 netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
SKBs hold onto resources that can't be held indefinitely, such as TCP
socket references and netfilter conntrack state.  So if a packet is left
in TX ring for a long time, there might be a TCP socket that cannot be
closed and freed up.

Current blackfin EMAC driver always reclaim and free used tx skbs in future
transfers. The problem is that future transfer may not come as soon as
possible. This patch start a timer after transfer to reclaim and free skb.
There is nearly no performance drop with this patch.

TX interrupt is not enabled because of a strange behavior of the Blackfin EMAC.
If EMAC TX transfer control is turned on, endless TX interrupts are triggered
no matter if TX DMA is enabled or not. Since DMA walks down the ring automatically,
TX transfer control can't be turned off in the middle. The only way is to disable
TX interrupt completely.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 15:04:10 -07:00
Eric Dumazet aa1039e73c inetpeer: RCU conversion
inetpeer currently uses an AVL tree protected by an rwlock.

It's possible to make most lookups use RCU

1) Add a struct rcu_head to struct inet_peer

2) add a lookup_rcu_bh() helper to perform lockless and opportunistic
lookup. This is a normal function, not a macro like lookup().

3) Add a limit to number of links followed by lookup_rcu_bh(). This is
needed in case we fall in a loop.

4) add an smp_wmb() in link_to_pool() right before node insert.

5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take
last reference to an inet_peer, since lockless readers could increase
refcount, even while we hold peers.lock.

6) Delay struct inet_peer freeing after rcu grace period so that
lookup_rcu_bh() cannot crash.

7) inet_getpeer() first attempts lockless lookup.
   Note this lookup can fail even if target is in AVL tree, but a
concurrent writer can let tree in a non correct form.
   If this attemps fails, lock is taken a regular lookup is performed
again.

8) convert peers.lock from rwlock to a spinlock

9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 ->
128 bytes)
In a future patch, this is probably possible to revert this part, if rcu
field is put in an union to share space with rid, ip_id_count, tcp_ts &
tcp_ts_stamp. These fields being manipulated only with refcnt > 0.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:38 -07:00
Michael Chan 7b34a4644b cnic: Fix cnic_cm_abort() error handling.
Fix the code that handles the error case when cnic_cm_abort() cannot
proceed normally.  We cannot just set the csk->state and we must
go through cnic_ready_to_close() to handle all the conditions.  We
also add error return code in cnic_cm_abort().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:37 -07:00
Michael Chan 943189f1d5 cnic: Refactor and fix cnic_ready_to_close().
Combine RESET_RECEIVED and RESET_COMP logic and fix race condition
between these 2 events and cnic_cm_close().  In particular, we need
to (test_and_clear_bit(SK_F_OFFLD_COMPLETE, &csk->flags)) before we
update csk->state.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:37 -07:00
Michael Chan a1e621bf6d cnic: Refactor code in cnic_cm_process_kcqe().
Move chip-specific code to the respective chip's ->close_conn() functions
for better code organization.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:36 -07:00
Michael Chan ed99daa5a0 cnic: Return error code in cnic_cm_close() if unsuccessful.
So that bnx2i can handle the error condition immediately and not have to
wait for timeout.

Signed-off-by: Michael Chan <mchan@broadcom.com.
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:36 -07:00
Alexander Duyck 2850062af1 ixgbe: update set_rx_mode to fix issues w/ macvlan
This change corrects issues where macvlan was not correctly triggering
promiscuous mode on ixgbe due to the filters not being correctly set.  It
also corrects the fact that VF rar filters were being overwritten when the
PF was reset.

CC: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 14:23:35 -07:00
David S. Miller 16fb62b6b4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-06-15 13:49:24 -07:00
Changli Gao a3433f35a5 tcp: unify tcp flag macros
unify tcp flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH,
TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCBCB_FLAG_* are replaced
with the corresponding TCPHDR_*.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/tcp.h                      |   24 ++++++-------
 net/ipv4/tcp.c                         |    8 ++--
 net/ipv4/tcp_input.c                   |    2 -
 net/ipv4/tcp_output.c                  |   59 ++++++++++++++++-----------------
 net/netfilter/nf_conntrack_proto_tcp.c |   32 ++++++-----------
 net/netfilter/xt_TCPMSS.c              |    4 --
 6 files changed, 58 insertions(+), 71 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:56:19 -07:00
Jiri Pirko f350a0a873 bridge: use rx_handler_data pointer to store net_bridge_port pointer
Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:48:58 -07:00
Jiri Pirko a35e2c1b6d macvlan: use rx_handler_data pointer to store macvlan_port pointer V2
Register macvlan_port pointer as rx_handler data pointer. As macvlan_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a macvlan port.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:47:12 -07:00
Jiri Pirko 93e2c32b5c net: add rx_handler data pointer
Add possibility to register rx_handler data pointer along with a rx_handler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:47:11 -07:00
Herbert Xu 91d2c34a4e bridge: Fix netpoll support
There are multiple problems with the newly added netpoll support:

1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.

This patch fixes all of these problems.  In particular, we now
allocate proper netpoll structures for each underlying device.

We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll.  Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).

This allows us to do away with the npinfo juggling that caused
problem number 1.

Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.

Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 11:00:40 -07:00
Herbert Xu c18370f5b2 netpoll: Add netpoll_tx_running
This patch adds the helper netpoll_tx_running for use within
ndo_start_xmit.  It returns non-zero if ndo_start_xmit is being
invoked by netpoll, and zero otherwise.

This is currently implemented by simply looking at the hardirq
count.  This is because for all non-netpoll uses of ndo_start_xmit,
IRQs must be enabled while netpoll always disables IRQs before
calling ndo_start_xmit.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:40 -07:00
Herbert Xu 8fdd95ec16 netpoll: Allow netpoll_setup/cleanup recursion
This patch adds the functions __netpoll_setup/__netpoll_cleanup
which is designed to be called recursively through ndo_netpoll_seutp.

They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:40 -07:00
Herbert Xu 4247e161b1 netpoll: Add ndo_netpoll_setup
This patch adds ndo_netpoll_setup as the initialisation primitive
to complement ndo_netpoll_cleanup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:39 -07:00
Herbert Xu dbaa154178 netpoll: Add locking for netpoll_setup/cleanup
As it stands, netpoll_setup and netpoll_cleanup have no locking
protection whatsoever.  So chaos ensures if two entities try to
perform them on the same device.

This patch adds RTNL to the equation.  The code has been rearranged so
that bits that do not need RTNL protection are now moved to the top of
netpoll_setup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:39 -07:00
Herbert Xu de85d99eb7 netpoll: Fix RCU usage
The use of RCU in netpoll is incorrect in a number of places:

1) The initial setting is lacking a write barrier.
2) The synchronize_rcu is in the wrong place.
3) Read barriers are missing.
4) Some places are even missing rcu_read_lock.
5) npinfo is zeroed after freeing.

This patch fixes those issues.  As most users are in BH context,
this also converts the RCU usage to the BH variant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-15 10:58:38 -07:00