The pmtu information on the inetpeer is visible to both output and
input routes. On packet forwarding, we might propagate a learned
pmtu to the sender. Since we update the pmtu information of the
inetpeer on demand, the original sender of the forwarded packets
might never notice when the pmtu to that inetpeer increases.
So use the mtu of the outgoing device on packet forwarding instead
of the pmtu to the final destination.
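A minimal sketch of the idea on the forwarding path, assuming a
check similar to the one in ip_forward() (the mtu variable and the
drop label are illustrative):

    /* Size the check to the outgoing link, not to the learned
     * path mtu of the final destination.
     */
    mtu = dst->dev->mtu;    /* instead of dst_mtu(dst) */
    if (skb->len > mtu && !skb_is_gso(skb) &&
        (ip_hdr(skb)->frag_off & htons(IP_DF))) {
            icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                      htonl(mtu));
            goto drop;
    }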
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move all mtu handling from dst_mtu() down to the protocol
layer, so each protocol can implement mtu handling in its own
manner.
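After the move, dst_mtu() becomes a thin wrapper that delegates to
the protocol's dst_ops; a minimal sketch:

    static inline u32 dst_mtu(const struct dst_entry *dst)
    {
            /* each protocol (ipv4, ipv6, ...) supplies its own
             * mtu method in its dst_ops
             */
            return dst->ops->mtu(dst);
    }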
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We plan to invoke the dst_ops->default_mtu() method unconditionally
from dst_mtu(). So rename the method to dst_ops->mtu() to match
the name with the new meaning.
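The rename only touches the method pointer; a sketch of the
relevant dst_ops member (surrounding fields elided):

    struct dst_ops {
            ...
            unsigned int    (*mtu)(const struct dst_entry *); /* was default_mtu */
            ...
    };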
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it is, we return zero as the default mtu of blackhole routes.
This may lead to a propagation of a bogus pmtu if the default_mtu
method of a blackhole route is invoked. So return dst->dev->mtu
as the default mtu instead.
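A minimal sketch of the blackhole mtu method, assuming the usual
fall-back-to-metric pattern (the function name is illustrative):

    static unsigned int ipv4_blackhole_mtu(const struct dst_entry *dst)
    {
            unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);

            /* fall back to the device mtu instead of returning 0 */
            return mtu ? : dst->dev->mtu;
    }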
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Skip entries from foreign network namespaces.
Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was copied and pasted from the IPv4 code: we're calling the
IPv4 version of that function, and map4 is NULL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We cannot update iph->daddr in ip_options_rcv_srr(); it is too early.
When an exception occurs later (e.g. in ip_forward() when we goto
sr_failed) we need the IP header to be identical to the original one,
as ICMP needs it.
Add a field 'nexthop' to struct ip_options to save the nexthop of the
LSRR or SSRR option.
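A sketch of the new field (its placement within the struct is
illustrative):

    struct ip_options {
            __be32          faddr;      /* saved first hop address */
            __be32          nexthop;    /* saved nexthop of LSRR/SSRR */
            unsigned char   optlen;
            ...
    };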
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use round_jiffies_relative to align the ehea workqueue and avoid
extra wakeups.
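A sketch of the pattern for a periodic delayed work item (the work
item name and period are illustrative):

    /* round the expiry so periodic work fires on jiffies
     * boundaries shared with other timers, avoiding extra wakeups
     */
    schedule_delayed_work(&port->stats_work,
                          round_jiffies_relative(msecs_to_jiffies(1000)));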
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we enable multiqueue by default, the ehea driver is using
quite a lot of memory for its buffer pools. With 4 queues we
consume 64MB in the jumbo packet ring, 16MB in the medium packet
ring and 16MB in the tiny packet ring.
We should only fill the jumbo ring once the MTU is increased, but
for now halve its size so it consumes 32MB. Also reduce the tiny
packet ring; with 4 queues we had 16k entries, which is overkill.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When transmitting a fragmented skb, qlge fills a descriptor with the
fragment addresses after DMA-mapping them. If there are more than eight
fragments, it will use the eighth descriptor as a pointer to an external
list, called the OAL. After mapping this external list to a structure
containing more descriptors, it fills it with the extra fragments.
However, on the assumption that systems with pages larger than 8KiB
would have fewer than 8 fragments, which was true before commit
a715dea3c8, a macro defined the OAL size as 0 in those cases.
Now, if an skb has more than 8 fragments (counting skb->data as one
fragment), the driver starts overwriting the list of addresses already
mapped and fails to properly unmap the right addresses on architectures
with pages larger than 8KiB.
Besides that, the list of mappings was one entry too small, since it
must hold a mapping for the maximum number of skb fragments, plus one
for skb->data and another for the OAL. So even on architectures with
4KiB and 8KiB pages, an skb with the maximum number of fragments would
make the driver overwrite its counter for the number of mappings,
which, again, would make it fail to unmap the mapped DMA addresses.
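A sketch of the corrected sizing for the per-descriptor mapping
table (struct and field names follow qlge's tx_ring_desc but are
abbreviated here):

    struct tx_ring_desc {
            ...
            /* one mapping per page fragment, plus one for
             * skb->data and one for the OAL itself
             */
            struct map_list map[MAX_SKB_FRAGS + 2];
            ...
    };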
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding sources to an interface fails, we need to roll back
sfcount[MODE] to its previous state, and the rollback must match the
additions exactly.
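A hedged sketch of the rollback in the error path of ip_mc_add_src()
(the exact error path may differ; the loop index is the point):

    if (err) {
            int j;

            pmc->sfcount[sfmode]--;
            /* undo exactly the sources added so far */
            for (j = 0; j < i; j++)
                    (void) ip_mc_del1_src(pmc, sfmode, &psfsrc[j]);
    }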
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since linux 2.6.26 (commit c6aefafb7e : Add IPv6 support to TCP SYN
cookies), we can drop a SYN packet that reuses a TIME_WAIT socket.
(As a matter of fact, we fail to send the SYNACK answer.)
As the client resends its SYN packet after a one second timeout, we
accept it, because the first packet removed the TIME_WAIT socket
before being dropped.
This probably explains why nobody ever noticed or complained.
Reported-by: Jesse Young <jlyo@jlyo.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Issues were reported when using dev_kfree_skb() on UP systems and
systems with low numbers of cores. dev_kfree_skb_irq() will
properly save IRQ state before freeing the skb.
Tested on 3.1.1 and 3.2_rc2.
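The fix is a one-line substitution in b44's NAPI poll path (the
call site is shown schematically):

    /* dev_kfree_skb() may re-enable softirqs; use the irq-safe
     * variant, which defers the free via the completion queue
     */
    dev_kfree_skb_irq(skb);     /* was: dev_kfree_skb(skb); */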
Example of a reproducible trace on kernel 3.1.1:
------------[ cut here ]------------
WARNING: at kernel/softirq.c:159 local_bh_enable+0x32/0x79()
...
Pid: 0, comm: swapper Not tainted 3.1.1-gentoo #1
Call Trace:
[<c1022970>] warn_slowpath_common+0x65/0x7a
[<c102699e>] ? local_bh_enable+0x32/0x79
[<c1022994>] warn_slowpath_null+0xf/0x13
[<c102699e>] local_bh_enable+0x32/0x79
[<c134bfd8>] destroy_conntrack+0x7c/0x9b
[<c134890b>] nf_conntrack_destroy+0x1f/0x26
[<c132e3a6>] skb_release_head_state+0x74/0x83
[<c132e286>] __kfree_skb+0xb/0x6b
[<c132e30a>] consume_skb+0x24/0x26
[<c127c925>] b44_poll+0xaa/0x449
[<c1333ca1>] net_rx_action+0x3f/0xea
[<c1026a44>] __do_softirq+0x5f/0xd5
[<c10269e5>] ? local_bh_enable+0x79/0x79
<IRQ> [<c1026c32>] ? irq_exit+0x34/0x8d
[<c1003628>] ? do_IRQ+0x74/0x87
[<c13f5329>] ? common_interrupt+0x29/0x30
[<c1006e18>] ? default_idle+0x29/0x3e
[<c10015a7>] ? cpu_idle+0x2f/0x5d
[<c13e91c5>] ? rest_init+0x79/0x7b
[<c15c66a9>] ? start_kernel+0x297/0x29c
[<c15c60b0>] ? i386_start_kernel+0xb0/0xb7
---[ end trace 583f33bb1aa207a9 ]---
Signed-off-by: Xander Hover <LKML@hover.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Distributions are using these in their default scripts, so don't hide
them behind the advanced setting.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 72a3effaf6 ([NET]: Size listen hash tables using backlog
hint) added a bug allowing inet6_synq_hash() to return an out-of-bounds
array index, because of a u16 overflow.
The bug can happen if system admins set the net.core.somaxconn and
net.ipv4.tcp_max_syn_backlog sysctls to values greater than 65536.
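A worked example of the truncation (the values are hypothetical):

    u32 backlog = 131072;       /* sysctl set above 65536 */
    u16 truncated = backlog;    /* 131072 & 0xffff == 0 */

    /* a table size or hash mask derived from the truncated value
     * no longer matches the index range callers expect, so
     * inet6_synq_hash() can return an out-of-bound index
     */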
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 4ba7d99978.
The original patch was a misguided attempt to improve performance on
some hardware that is apparently prone to spurious interrupt generation.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 23085d5796.
The original patch was a misguided attempt to improve performance on
some hardware that is apparently prone to spurious interrupt generation.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In skb_shift() we want to shift paged data from skb to tgt's frag
area; the original comment reversed the shift direction.
Signed-off-by: Feng King <kinwin2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "tmp" variable here is used to store the result of cpu_to_le16(),
so it should be an __le16 instead of an int. We want the high bits
set, and the current code works on little endian systems but not on
big endian systems.
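A sketch of why the type matters (the value and buffer are
illustrative):

    __le16 tmp = cpu_to_le16(0x8000);   /* two bytes, LE layout  */
    memcpy(buf, &tmp, sizeof(tmp));     /* copies the right bytes */

    /* with 'int tmp', the 16 meaningful bits sit in a 32-bit slot;
     * on big endian machines the first two bytes at &tmp are zero,
     * so copying two bytes loses the high bits
     */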
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>