This fix triggering the WARN_ON_ONCE(in_irq() || irqs_disabled()); in
local_bh_enable().
Here is no need to grab this lock, this was wrong at all and may
cause a deadlock and access to freed memory, since on a TEI remove
the current listelement can be deleted under us. So this is clearly
a case for list_for_each_entry_safe.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
If we get no interrupts for after 3 resets we need to unregister
the interrupt function, which is already done outside the loop.
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Some users still load bond module multiple times to create bonding
devices. This accidentally was broken by a later patch about
the time sysfs was fixed. According to Jay, it was broken
by:
commit b8a9787edd
Author: Jay Vosburgh <fubar@us.ibm.com>
Date: Fri Jun 13 18:12:04 2008 -0700
bonding: Allow setting max_bonds to zero
Note: sysfs and procfs still produce WARN() messages when this is done
so the sysfs method is the recommended API.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code errors out the INCOMPLETE neigh entry skb queue only from
the timer if maximum probes have been attempted and there has been no reply.
This also causes the transtion to FAILED state.
However, the neigh entry can be also updated via Netlink to inform that the
address is unavailable. Currently, neigh_update() just stops the timers and
leaves the pending skb's unreleased. This results that the clean up code in
the timer callback is never called, preventing also proper garbage collection.
This fixes neigh_update() to process the pending skb queue immediately if
INCOMPLETE -> FAILED state transtion occurs due to a Netlink request.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.
This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.
We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.
We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.
As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.
skb_set_owner_w() doesnt call sock_hold() anymore
sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.
Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.
Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vfree() does its own 'NULL' check, so no need for check before
calling it.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the tx queue destroy path, be_tx_q_clean() is currently called after the tx queues are freed; it must be called before.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
be_rx_compl_get() must not reset(via the valid word) the rx_compl as the rx_compl is not processed yet; it must be reset after it is processed.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rx stats are not getting updated when an rx_compl with only one frag is rcvd in non-lro path.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix netdev stat rx_errors to cover length related errors and checksum errors and rx_dropped to the pkts dropped due to lack of buffers
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use cancel_delayed_work_sycn instead of cancel_delayed_work() to reliably kill be_worker() as it rearms itself.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vfree() does its own 'NULL' check, so no need for check before
calling it.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vfree() does its own 'NULL' check, so no need for check before
calling it.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-allocate a skb at init time to be used for control messages to the HW
if skb allocation fails.
Tolerate failures to send messages initializing some memories at the cost of
parity error detection for these memories.
Retry sending connection id release messages if both alloc_skb(GFP_ATOMIC)
and alloc_skb(GFP_KERNEL) fail.
Do not bring the interface up if messages binding queue set to port fail to
be sent.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtl8169_tx_interrupt() is used from NAPI context, it can
directly free skbs. dev_kfree_skb_irq() is a leftover from
pre-NAPI times of this driver.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RX buffer rings can be comprised of non-contiguous fixed
size chunks of memory. The ring is given to the hardware
as a pointer to a location that stores the location of
the queue. If the queue is greater than 4096 bytes then
the hardware gets a list of said pointers.
This patch addes the necessary logic to generate the list if
the queue size exceeds 4096.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The alignment was on size of queue boundary, but the hardware
only requires 4-byte alignment.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a configurable Descriptor Skip Length for systems that lack cache
coherence.
(akpm: I think this should be done as a module parameter, not a
compile-tinme option)
Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>