Commit Graph

2514 Commits

Author SHA1 Message Date
Ilpo Järvinen
29d0a309d1 [TCP]: Fix two off-by-one errors in fackets_out adjusting logic
1) Passing wrong skb to tcp_adjust_fackets_out could corrupt
fastpath_cnt_hint as tcp_skb_pcount(next_skb) is not included
to it if hint points exactly to the next_skb (it's lagging
behind, see sacktag).

2) When fastpath_skb_hint is put backwards to avoid dangling
skb reference, the skb's pcount must also be removed from count
(not included like above).

Reported by Cedric Le Goater <legoater@free.fr>

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:47 -07:00
Ilpo Järvinen
3de96471bd [TCP]: Wrap-safed reordering detection FRTO check
In case somebody has a suggestion about a better place for this
check, which must guarantee execution "early enough" (i.e,
before the wrap can occur), I'm very open to them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:54:00 -07:00
Ilpo Järvinen
0e835331e3 [TCP]: Update comment of SACK block validator
Just came across what RFC2018 states about generation of valid
SACK blocks in case of reneging. Alter comment a bit to point
out clearly.

IMHO, there isn't any reason to change code because the
validation is there for a purpose (counters will inform user
about decision TCP made if this case ever surfaces).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:59 -07:00
Ilpo Järvinen
95eacd27e2 [TCP]: fix comments that got messed up during code move
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:59 -07:00
Ilpo Järvinen
dc86967b54 [TCP]: No fackets_out/highest_sack tuning when SACK isn't enabled
This was found due to bug report from Cedric Le Goater though
it turned this turned out to be unrelated bug.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:58 -07:00
Patrick McHardy
f73e924cdd [NETFILTER]: ctnetlink: use netlink policy
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:35 -07:00
Patrick McHardy
fdf708322d [NETFILTER]: nfnetlink: rename functions containing 'nfattr'
There is no struct nfattr anymore, rename functions to 'nlattr'.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:32 -07:00
Patrick McHardy
df6fb868d6 [NETFILTER]: nfnetlink: convert to generic netlink attribute functions
Get rid of the duplicated rtnetlink macros and use the generic netlink
attribute functions. The old duplicated stuff is moved to a new header
file that exists just for userspace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:53:31 -07:00
Stephen Hemminger
3b04ddde02 [NET]: Move hardware header operations out of netdevice.
Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:52 -07:00
Stephen Hemminger
b95cce3576 [NET]: Wrap hard_header_parse
Wrap the hard_header_parse function to simplify next step of
header_ops conversion.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:51 -07:00
Stephen Hemminger
0c4e85813d [NET]: Wrap netdevice hardware header creation.
Add inline for common usage of hardware header creation, and
fix bug in IPV6 mcast where the assumption about negative return is
an errno. Negative return from hard_header means not enough space
was available,(ie -N bytes).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:50 -07:00
Eric W. Biederman
2774c7aba6 [NET]: Make the loopback device per network namespace.
This patch makes loopback_dev per network namespace.  Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.

This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working.  A later pass will be needed to
update the users to use something other than the initial network
namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:49 -07:00
Eric W. Biederman
0cc217e16c [IPV4]: When possible test for IFF_LOOPBACK and not dev == loopback_dev
Now that multiple loopback devices are becoming possible it makes
the code a little cleaner and more maintainable to test if a deivice
is th a loopback device by testing dev->flags & IFF_LOOPBACK instead
of dev == loopback_dev.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:48 -07:00
Eric W. Biederman
5967789dbc [IPV4]: Remove unnecessary test for the loopback device from inetdev_destroy
Currently we never call unregister_netdev for the loopback device so
it is impossible for us to reach inetdev_destroy with the loopback
device.  So the test in inetdev_destroy is unnecessary.

Further when testing with my network namespace patches removing
unregistering the loopback device and calling inetdev_destroy works
fine so there appears to be no reason for avoiding unregistering the
loopback device.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:48 -07:00
Ilpo Järvinen
912d8f0b1f [TCP] MIB: Count FRTO's successfully detected spurious RTOs
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:39 -07:00
Ilpo Järvinen
93e6802029 [TCP]: Reordered ACK's (old) SACKs not included to discarded MIB
In case of ACK reordering, the SACK block might be valid in it's
time but is already obsoleted since we've received another kind
of confirmation about arrival of the segments through snd_una
advancement of an earlier packet.

I didn't bother to build distinguishing of valid and invalid
SACK blocks but simply made reordered SACK blocks that are too
old always not counted regardless of their "real" validity which
could be determined by using the ack field of the reordered
packet (won't be significant IMHO).

DSACKs can very well be considered useful even in this situation,
so won't do any of this for them.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:38 -07:00
Ilpo Järvinen
a6963a6b3d [TCP]: Re-place highest_sack check to a more robust position
I previously added checking to position that is rather poor as
state has already been adjusted quite a bit. Re-placing it above
all state changes should be more robust though the return should
never ever get executed regardless of its place :-).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:38 -07:00
Daniel Lezcano
de3cb747ff [NET]: Dynamically allocate the loopback device, part 1.
This patch replaces all occurences to the static variable
loopback_dev to a pointer loopback_dev. That provides the
mindless, trivial, uninteressting change part for the dynamic
allocation for the loopback.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-By: Kirill Korotaev <dev@sw.ru>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:14 -07:00
Ilpo Järvinen
b76892051c [TCP]: Avoid clearing sacktag hint in trivial situations
There's no reason to clear the sacktag skb hint when small part
of the rexmit queue changes. Account changes (if any) instead when
fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so
no need to discard SACK tag hint at all.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:12 -07:00
Ilpo Järvinen
c96fd3d461 [TCP]: Enable SACK enhanced FRTO (RFC4138) by default
Most of the description that follows comes from my mail to
netdev (some editing done):

Main obstacle to FRTO use is its deployment as it has to be on
the sender side where as wireless link is often the receiver's
access link. Take initiative on behalf of unlucky receivers and
enable it by default in future Linux TCP senders. Also IETF
seems to interested in advancing FRTO from experimental [1].

How does FRTO help?
===================

FRTO detects spurious RTOs and avoids a number of unnecessary
retransmissions and a couple of other problems that can arise
due to incorrect guess made at RTO (i.e., that segments were
lost when they actually got delayed which is likely to occur
e.g. in wireless environments with link-layer retransmission).
Though FRTO cannot prevent the first (potentially unnecessary)
retransmission at RTO, I suspect that it won't cost that much
even if you have to pay for each bit (won't be that high
percentage out of all packets after all :-)). However, usually
when you have a spurious RTO, not only the first segment
unnecessarily retransmitted but the *whole window*. It goes like
this: all cumulative ACKs got delayed due to in-order delivery,
then TCP will actually send 1.5*original cwnd worth of data in
the RTO's slow-start when the delayed ACKs arrive (basically the
original cwnd worth of it unnecessarily). In case one is
interested in minimizing unnecessary retransmissions e.g. due to
cost, those rexmissions must never see daylight. Besides, in the
worst case the generated burst overloads the bottleneck buffers
which is likely to significantly delay the further progress of
the flow. In case of ll rexmissions, ACK compression often
occurs at the same time making the burst very "sharp edged" (in
that case TCP often loses most of the segments above high_seq
=> very bad performance too). When FRTO is enabled, those
unnecessary retransmissions are fully avoided except for the
first segment and the cwnd behavior after detected spurious RTO
is determined by the response (one can tune that by sysctl).

Basic version (non-SACK enhanced one), FRTO can fail to detect
spurious RTO as spurious and falls back to conservative
behavior. ACK lossage is much less significant than reordering,
usually the FRTO can detect spurious RTO if at least 2
cumulative ACKs from original window are preserved (excluding
the ACK that advances to high_seq). With SACK-enhanced version,
the detection is quite robust.

FRTO should remove the need to set a high lower bound for the
RTO estimator due to delay spikes that occur relatively common
in some environments (esp. in wireless/cellular ones).

[1] http://www1.ietf.org/mail-archive/web/tcpm/current/msg02862.html

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:12 -07:00
Ilpo Järvinen
009a2e3e4e [TCP] FRTO: Improve interoperability with other undo_marker users
Basically this change enables it, previously other undo_marker
users were left with nothing. Reverse undo_marker logic
completely to get it set right in CA_Loss. On the other hand,
when spurious RTO is detected, clear it. Clearing might be too
heavy for some scenarios but seems safe enough starting point
for now and shouldn't have much effect except in majority of
cases (if in any).

By adding a new FLAG_ we avoid looping through write_queue when
RTO occurs.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:11 -07:00
Ilpo Järvinen
7c46a03e67 [TCP]: Cleanup tcp_tso_acked and tcp_clean_rtx_queue
Implements following cleanups:
- Comment re-placement (CodingStyle)
- tcp_tso_acked() local (wrapper-like) variable removal
  (readability)
- __-types removed (IMHO they make local variables jumpy looking
  and just was space)
- acked -> flag (naming conventions elsewhere in TCP code)
- linebreak adjustments (readability)
- nested if()s combined (reduced indentation)
- clarifying newlines added

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:10 -07:00
Ilpo Järvinen
13fcf850cc [TCP]: Move accounting from tso_acked to clean_rtx_queue
The accounting code is pretty much the same, so it's a shame
we do it in two places.

I'm not too sure if added fully_acked check in MTU probing is
really what we want perhaps the added end_seq could be used in
the after() comparison.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:09 -07:00
Ilpo Järvinen
5af4ec236f [TCP]: clear_all_retrans_hints prefixed by tcp_
In addition, fix its function comment spacing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
2007-10-10 16:52:09 -07:00
Ilpo Järvinen
91fed7a15c [TCP]: Make fackets_out accurate
Substraction for fackets_out is unconditional when snd_una
advances, thus there's no need to do it inside the loop. Just
make sure correct bounds are honored.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:52:08 -07:00