Bug report from Steven Jan Springl:
Issuing the following command causes a kernel oops:
tc qdisc add dev eth0 handle ffff: ingress
The problem mostly stems from all of the special case handling of
ingress qdiscs.
So, to fix this, do the grafting operation the same way we do for TX
qdiscs. Which means that dev_activate() and dev_deactivate() now do
the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too.
Future simplifications are possible now, mainly because it is
impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There
are NULL checks all over to handle the ingress qdisc special case
that used to exist before this commit.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netns: fix ip_rt_frag_needed rt_is_expired
netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences
netfilter: fix double-free and use-after free
netfilter: arptables in netns for real
netfilter: ip{,6}tables_security: fix future section mismatch
selinux: use nf_register_hooks()
netfilter: ebtables: use nf_register_hooks()
Revert "pkt_sched: sch_sfq: dump a real number of flows"
qeth: use dev->ml_priv instead of dev->priv
syncookies: Make sure ECN is disabled
net: drop unused BUG_TRAP()
net: convert BUG_TRAP to generic WARN_ON
drivers/net: convert BUG_TRAP to generic WARN_ON
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.
I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
pkt_sched: sch_sfq: dump a real number of flows
atm: [fore200e] use MODULE_FIRMWARE() and other suggested cleanups
netfilter: make security table depend on NETFILTER_ADVANCED
tcp: Clear probes_out more aggressively in tcp_ack().
e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call
net: Update entry in af_family_clock_key_strings
netdev: Remove warning from __netif_schedule().
sky2: don't stop queue on shutdown
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
It isn't helping anything and we aren't going to be able to change all
the drivers that do queue wakeups in strange situations.
Just letting a noop_qdisc get scheduled will work because when
qdisc_run() executes via net_tx_work() it will simply find no packets
pending when it makes the ->dequeue() call in qdisc_restart.
Signed-off-by: David S. Miller <davem@davemloft.net>
The new address list lock needs to handle the same device layering
issues that the _xmit_lock one does.
This integrates work done by Patrick McHardy.
Signed-off-by: David S. Miller <davem@davemloft.net>
The function header comments have to go with the functions
they are documenting, or things go horribly wrong when we
try to process them with the docbook tools.
Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq'
Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; '
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Dave:
This patch adds a function to get the driver name from a struct net_device,
and consequently uses this in the watchdog timeout handler to print as
part of the message.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor nit, use size_t for allocation size and kcalloc to allocate
an array. Probably makes no actual code difference.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon feedback from Eric Dumazet and Andi Kleen.
Cure several deficiencies in simple_tx_hash() by using
jhash + reciprocol multiply.
1) Eliminates expensive modulus operation.
2) Makes hash less attackable by using random seed.
3) Eliminates endianness hash distribution issues.
Signed-off-by: David S. Miller <davem@davemloft.net>
Idea is from Patrick McHardy.
Instead of managing the list of qdiscs on the device level, manage it
in the root qdisc of a netdev_queue. This solves all kinds of
visibility issues during qdisc destruction.
The way to iterate over all qdiscs of a netdev_queue is to visit
the netdev_queue->qdisc, and then traverse it's list.
The only special case is to ignore builting qdiscs at the root when
dumping or doing a qdisc_lookup(). That was not needed previously
because builtin qdiscs were not added to the device's qdisc_list.
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.
Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.
Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just xor hashes over IPv4/IPv6 addresses and ports of transport.
The only assumption it makes is that skb_network_header() is set
correctly.
With bug fixes from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().
This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.
This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.
Signed-off-by: David S. Miller <davem@davemloft.net>
This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.
Non-multiqueue drivers need no changes. The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.
Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.
pktgen and netpoll required a little bit more surgery than the others.
In particular the pktgen changes, whilst functional, could be largely
improved. The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless. The thing to do is probably to
invoke fill_packet() earlier.
The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by by the SKB queue mapping.
Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx(). If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.
Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.
With IGB changes from Jeff Kirsher.
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.
Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces. This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.
Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.
Signed-off-by: David S. Miller <davem@davemloft.net>