The RCU callback xt_rateest_free_rcu() just calls kfree(), so we can
use kfree_rcu() instead of call_rcu(). This also allows us to dispense
with an rcu_barrier() call, speeding up unloading of this module.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Patrick McHardy <kaber@trash.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
sctp: fix missing send up SCTP_SENDER_DRY_EVENT when subscribe it
net: refine {udp|tcp|sctp}_mem limits
vmxnet3: round down # of queues to power of two
net: sh_eth: fix the parameter for the ETHER of SH7757
net: sh_eth: fix cannot work half-duplex mode
net: vlan: enable soft features regardless of underlying device
vmxnet3: fix starving rx ring whenoc_skb kb fails
bridge: Always flood broadcast packets
greth: greth_set_mac_add would corrupt the MAC address.
net: bind() fix error return on wrong address family
natsemi: silence dma-debug warnings
net: 8139too: Initial necessary vlan_features to support vlan
Fix call trace when interrupts are disabled while sleeping function kzalloc is called
qlge:Version change to v1.00.00.29
qlge: Fix printk priority so chip fatal errors are always reported.
qlge:Fix crash caused by mailbox execution on wedged chip.
xfrm4: Don't call icmp_send on local error
ipv4: Don't use ufo handling on later transformed packets
xfrm: Remove family arg from xfrm_bundle_ok
ipv6: Don't put artificial limit on routing table size.
...
We forgot to send up SCTP_SENDER_DRY_EVENT notification when
user app subscribes to this event, and there is no data to be
sent or retransmit.
This is required by the Socket API and used by the DTLS/SCTP
implementation.
Reported-by: Michael Tüxen <Michael.Tuexen@lurchi.franken.de>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Tested-by: Robin Seggelmann <seggelmann@fh-muenster.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current tcp/udp/sctp global memory limits are not taking into account
hugepages allocations, and allow 50% of ram to be used by buffers of a
single protocol [ not counting space used by sockets / inodes ...]
Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram
per protocol, and a minimum of 128 pages.
Heavy duty machines sysadmins probably need to tweak limits anyway.
References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032
Reported-by: starlight <starlight@binnacle.cx>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If gso/gro feature of underlying device is turned off,
then new created vlan device never can turn gso/gro on.
Although underlying device don't support TSO, we still
should use software segments for vlan device.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As is_multicast_ether_addr returns true on broadcast packets as
well, we need to explicitly exclude broadcast packets so that
they're always flooded. This wasn't an issue before as broadcast
packets were considered to be an unregistered multicast group,
which were always flooded. However, as we now only flood such
packets to router ports, this is no longer acceptable.
Reported-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: fix sync and dio writes across stripe boundaries
libceph: fix page calculation for non-page-aligned io
ceph: fix page alignment corrections
Hi,
Reinhard Max also pointed out that the error should EAFNOSUPPORT according
to POSIX.
The Linux manpages have it as EINVAL, some other OSes (Minix, HPUX, perhaps BSD) use
EAFNOSUPPORT. Windows uses WSAEFAULT according to MSDN.
Other protocols error values in their af bind() methods in current mainline git as far
as a brief look shows:
EAFNOSUPPORT: atm, appletalk, l2tp, llc, phonet, rxrpc
EINVAL: ax25, bluetooth, decnet, econet, ieee802154, iucv, netlink, netrom, packet, rds, rose, unix, x25,
No check?: can/raw, ipv6/raw, irda, l2tp/l2tp_ip
Ciao, Marcus
Signed-off-by: Marcus Meissner <meissner@suse.de>
Cc: Reinhard Max <max@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling icmp_send() on a local message size error leads to
an incorrect update of the path mtu. So use ip_local_error()
instead to notify the socket about the error.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We might call ip_ufo_append_data() for packets that will be IPsec
transformed later. This function should be used just for real
udp packets. So we check for rt->dst.header_len which is only
nonzero on IPsec handling and call ip_ufo_append_data() just
if rt->dst.header_len is zero.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPV6, unlike IPV4, doesn't have a routing cache.
Routing table entries, as well as clones made in response
to route lookup requests, all live in the same table. And
all of these things are together collected in the destination
cache table for ipv6.
This means that routing table entries count against the garbage
collection limits, even though such entries cannot ever be reclaimed
and are added explicitly by the administrator (rather than being
created in response to lookups).
Therefore it makes no sense to count ipv6 routing table entries
against the GC limits.
Add a DST_NOCOUNT destination cache entry flag, and skip the counting
if it is set. Use this flag bit in ipv6 when adding routing table
entries.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
usbnet: Remove over-broad module alias from zaurus.
MAINTAINERS: drop Michael from bfin_mac driver
net/can: activate bit-timing calculation and netlink based drivers by default
rionet: fix NULL pointer dereference in rionet_remove
net+crypto: Use vmalloc for zlib inflate buffers.
netfilter: Fix ip_route_me_harder triggering ip_rt_bug
ipv4: Fix IPsec slowpath fragmentation problem
ipv4: Fix packet size calculation in __ip_append_data
cxgb3: skb_record_rx_queue now records the queue index relative to the net_device.
bridge: Only flood unregistered groups to routers
qlge: Add maintainer.
MAINTAINERS: mark socketcan-core lists as subscribers-only
MAINTAINERS: Remove Sven Eckelmann from BATMAN ADVANCED
r8169: fix wrong register use.
net/usb/kalmia: signedness bug in kalmia_bind()
net/usb: kalmia: Various fixes for better support of non-x86 architectures.
rtl8192cu: Fix missing firmware load
udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packet
ipv6/udp: Use the correct variable to determine non-blocking condition
netconsole: fix build when CONFIG_NETCONSOLE_DYNAMIC is turned on
...
Avoid creating input routes with ip_route_me_harder.
It does not work for locally generated packets. Instead,
restrict sockets to provide valid saddr for output route (or
unicast saddr for transparent proxy). For other traffic
allow saddr to be unicast or local but if callers forget
to check saddr type use 0 for the output route.
The resulting handling should be:
- REJECT TCP:
- in INPUT we can provide addr_type = RTN_LOCAL but
better allow rejecting traffic delivered with
local route (no IP address => use RTN_UNSPEC to
allow also RTN_UNICAST).
- FORWARD: RTN_UNSPEC => allow RTN_LOCAL/RTN_UNICAST
saddr, add fix to ignore RTN_BROADCAST and RTN_MULTICAST
- OUTPUT: RTN_UNSPEC
- NAT, mangle, ip_queue, nf_ip_reroute: RTN_UNSPEC in LOCAL_OUT
- IPVS:
- use RTN_LOCAL in LOCAL_OUT and FORWARD after SNAT
to restrict saddr to be local
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
On IPsec the effective mtu is lower because we need to add the protocol
headers and trailers later when we do the IPsec transformations. So after
the IPsec transformations the packet might be too big, which leads to a
slowpath fragmentation then. This patch fixes this by building the packets
based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
handling to this.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Git commit 59104f06 (ip: take care of last fragment in ip_append_data)
added a check to see if we exceed the mtu when we add trailer_len.
However, the mtu is already subtracted by the trailer length when the
xfrm transfomation bundles are set up. So IPsec packets with mtu
size get fragmented, or if the DF bit is set the packets will not
be send even though they match the mtu perfectly fine. This patch
actually reverts commit 59104f06.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes when reporting a MIC failure rx->key may be unset. This
code path is hit when receiving a packet meant for a multicast
address, and decryption is performed in HW.
Fortunately, the failing key_idx is not used for anything up to
(and including) usermode, so we allow ourselves to drop it on the
way up when a key cannot be retrieved.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The bridge currently floods packets to groups that we have never
seen before to all ports. This is not required by RFC4541 and
in fact it is not desirable in environment where traffic to
unregistered group is always present.
This patch changes the behaviour so that we only send traffic
to unregistered groups to ports marked as routers.
The user can always force flooding behaviour to any given port
by marking it as a router.
Note that this change does not apply to traffic to 224.0.0.X
as traffic to those groups must always be flooded to all ports.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consider this scenario: When the size of the first received udp packet
is bigger than the receive buffer, MSG_TRUNC bit is set in msg->msg_flags.
However, if checksum error happens and this is a blocking socket, it will
goto try_again loop to receive the next packet. But if the size of the
next udp packet is smaller than receive buffer, MSG_TRUNC flag should not
be set, but because MSG_TRUNC bit is not cleared in msg->msg_flags before
receive the next packet, MSG_TRUNC is still set, which is wrong.
Fix this problem by clearing MSG_TRUNC flag when starting over for a
new packet.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
udpv6_recvmsg() function is not using the correct variable to determine
whether or not the socket is in non-blocking operation, this will lead
to unexpected behavior when a UDP checksum error occurs.
Consider a non-blocking udp receive scenario: when udpv6_recvmsg() is
called by sock_common_recvmsg(), MSG_DONTWAIT bit of flags variable in
udpv6_recvmsg() is cleared by "flags & ~MSG_DONTWAIT" in this call:
err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);
i.e. with udpv6_recvmsg() getting these values:
int noblock = flags & MSG_DONTWAIT
int flags = flags & ~MSG_DONTWAIT
So, when udp checksum error occurs, the execution will go to
csum_copy_err, and then the problem happens:
csum_copy_err:
...............
if (flags & MSG_DONTWAIT)
return -EAGAIN;
goto try_again;
...............
But it will always go to try_again as MSG_DONTWAIT has been cleared
from flags at call time -- only noblock contains the original value
of MSG_DONTWAIT, so the test should be:
if (noblock)
return -EAGAIN;
This is also consistent with what the ipv4/udp code does.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix decode_secinfo_maxsz
NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test
NFSv4.1: Fix some issues with pnfs_generic_pg_test
NFSv4.1: file layout must consider pg_bsize for coalescing
pnfs-obj: No longer needed to take an extra ref at add_device
SUNRPC: Ensure the RPC client only quits on fatal signals
NFSv4: Fix a readdir regression
nfs4.1: mark layout as bad on error path in _pnfs_return_layout
nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout
NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path
NFS: (d)printks should use %zd for ssize_t arguments
NFSv4.1: fix break condition in pnfs_find_lseg
nfs4.1: fix several problems with _pnfs_return_layout
NFSv4.1: allow zero fh array in filelayout decode layout
NFSv4.1: allow nfs_fhget to succeed with mounted on fileid
NFSv4.1: Fix a refcounting issue in the pNFS device id cache
NFSv4.1: deprecate headerpadsz in CREATE_SESSION
NFS41: do not update isize if inode needs layoutcommit
NLM: Don't hang forever on NLM unlock requests
NFS: fix umount of pnfs filesystems