* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
cls_cgroup: Fix oops when user send improperly 'tc filter add' request
r8169: fix crash when large packets are received
I found a bug in cls_cgroup_change() in cls_cgroup.c.
cls_cgroup_change() expected tca[TCA_OPTIONS] was set from user space properly,
but tc in iproute2-2.6.29-1 (which I used) didn't set it.
In the current source code of tc in git, it set tca[TCA_OPTIONS].
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
If we always use a newest iproute2 in git when we use cls_cgroup,
we don't face this oops probably.
But I think, kernel shouldn't panic regardless of use program's behaviour.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a bug which unconfigured struct tcf_proto keeps
chaining in tc_ctl_tfilter(), and avoids kernel panic in
cls_cgroup_classify() when we use cls_cgroup.
When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained. After it's chained,
tc_ctl_tfilter() calls classifier's change(). When classifier's
change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto.
In addition, cls_cgroup is initialized in change() not in init(). It
accesses unconfigured struct tcf_proto which is chained before
change(), then hits Oops.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
svcrdma: dma unmap the correct length for the RPCRDMA header page.
nfsd: Revert "svcrpc: take advantage of tcp autotuning"
nfsd: fix hung up of nfs client while sync write data to nfs server
The svcrdma module was incorrectly unmapping the RPCRDMA header page.
On IBM pserver systems this causes a resource leak that results in
running out of bus address space (10 cthon iterations will reproduce it).
The code was mapping the full page but only unmapping the actual header
length. The fix is to only map the header length.
I also cleaned up the use of ib_dma_map_page() calls since the unmap
logic always uses ib_dma_unmap_single(). I made these symmetrical.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reverts commit 47a14ef1af "svcrpc:
take advantage of tcp autotuning", which uncovered some further problems
in the server rpc code, causing significant performance regressions in
common cases.
We will likely reinstate this patch after releasing 2.6.30 and applying
some work on the underlying fixes to the problem (developed by Trond).
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: Jim Rees <rees@umich.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This problem was introduced in 72961ecf84
since no space was reserved for the new attributes NFULA_HWTYPE,
NFULA_HWLEN and NFULA_HWHEADER.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The function dl_seq_show() returns 1 (equal to SEQ_SKIP) in case
a seq_printf() call return -1. It should return -1.
This SEQ_SKIP behavior brakes processing the proc file e.g. via a
pipe or just through less.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The calls to flush_work() are pointless in a single thread workqueue
and they are actually causing a lockdep warning.
=============================================
[ INFO: possible recursive locking detected ]
2.6.30-rc6-02911-gbb803cf #16
---------------------------------------------
bluetooth/2518 is trying to acquire lock:
(bluetooth){+.+.+.}, at: [<c0130c14>] flush_work+0x28/0xb0
but task is already holding lock:
(bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e
other info that might help us debug this:
2 locks held by bluetooth/2518:
#0: (bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e
#1: (&conn->work_del){+.+...}, at: [<c0130424>] worker_thread+0x149/0x25e
stack backtrace:
Pid: 2518, comm: bluetooth Not tainted 2.6.30-rc6-02911-gbb803cf #16
Call Trace:
[<c03d64d9>] ? printk+0xf/0x11
[<c0140d96>] __lock_acquire+0x7ce/0xb1b
[<c0141173>] lock_acquire+0x90/0xad
[<c0130c14>] ? flush_work+0x28/0xb0
[<c0130c2e>] flush_work+0x42/0xb0
[<c0130c14>] ? flush_work+0x28/0xb0
[<f8b84966>] del_conn+0x1c/0x84 [bluetooth]
[<c0130469>] worker_thread+0x18e/0x25e
[<c0130424>] ? worker_thread+0x149/0x25e
[<f8b8494a>] ? del_conn+0x0/0x84 [bluetooth]
[<c0133843>] ? autoremove_wake_function+0x0/0x33
[<c01302db>] ? worker_thread+0x0/0x25e
[<c013355a>] kthread+0x45/0x6b
[<c0133515>] ? kthread+0x0/0x6b
[<c01034a7>] kernel_thread_helper+0x7/0x10
Based on a report by Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Avoid reading the unsynchronized value cs->classid multiple times,
since it could change concurrently from non-zero to zero; this would
result in the classifier returning a positive result with a bogus
(zero) classid.
Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFSv4: Fix the case where NFSv4 renewal fails
nfs: fix build error in nfsroot with initconst
XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices
mlx4/connectX FRMR requires local write enable together with remote
rdma write enable. This fixes NFS/RDMA operation over the ConnectX
Infiniband HCA in the default memreg mode.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch fixes ssthresh accounting issues in tcp_vegas when cwnd decreases
Signed-off-by: Doug Leith <doug.leith@nuim.ie>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert L Mathews discovered that some clients send evil TCP RST segments,
which are accepted by netfilter conntrack but discarded by the
destination. Thus the conntrack entry is destroyed but the destination
retransmits data until timeout.
The same technique, i.e. sending properly crafted RST segments, can easily
be used to bypass connlimit/connbytes based restrictions (the sample
script written by Robert can be found in the netfilter mailing list
archives).
The patch below adds a new flag and new field to struct ip_ct_tcp_state so
that checking RST segments can be made more strict and thus TCP conntrack
can catch the invalid ones: the RST segment is accepted only if its
sequence number higher than or equal to the highest ack we seen from the
other direction. (The last_ack field cannot be reused because it is used
to catch resent packets.)
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
rxrpc_alloc_connection() doesn't return an error code on failure, it just
returns NULL. IS_ERR(NULL) is false.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems we can fix this by disabling preemption while we re-balance the
trie. This is with the CONFIG_CLASSIC_RCU. It's been stress-tested at high
loads continuesly taking a full BGP table up/down via iproute -batch.
Note. fib_trie is not updated for CONFIG_PREEMPT_RCU
Reported-by: Andrei Popa
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
typo -- pkt_dev->nflows is for stats only, the number of concurrent
flows is stored in cflows.
Reported-By: Vladimir Ivashchenko <hazard@francoudi.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of unspecified protocol in IPv6 initial route prevents quagga to
install IPv6 default route:
# show ipv6 route
S ::/0 [1/0] via fe80::1, eth1_0
K>* ::/0 is directly connected, lo, rej
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0
# ip -6 route
fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1
unreachable default dev lo proto none metric -1 error -101 hoplimit 255
The attached patch ensures RTPROT_KERNEL to the default initial route
and fixes the problem for quagga.
This is similar to "ipv6: protocol for address routes"
f410a1fba7.
# show ipv6 route
S>* ::/0 [1/0] via fe80::1, eth1_0
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0
# ip -6 route
fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440
hoplimit -1
fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1
default via fe80::1 dev eth1_0 proto zebra metric 1024 mtu 1500
advmss 1440 hoplimit -1
unreachable default dev lo proto kernel metric -1 error -101 hoplimit 255
Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :
begin_of_quotation
2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
patch has at least one critical flaw, and another problem.
rt_intern_hash calculates rthi pointer, which is later used for new entry
insertion. The same loop calculates cand pointer which is used to clean the
list. If the pointers are the same, rtable leak occurs, as first the cand is
removed then the new entry is appended to it.
This leak leads to unregister_netdevice problem (usage count > 0).
Another problem of the patch is that it tries to insert the entries in certain
order, to facilitate counting of entries distinct by all but QoS parameters.
Unfortunately, referencing an existing rtable entry moves it to list beginning,
to speed up further lookups, so the carefully built order is destroyed.
For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
it will also destroy the ordering.
end_of_quotation
Problematic commit is 1080d709fb
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)
Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.
A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.
Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>