Linux controls socket send/receive buffers using a few sysctl variables
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_max
- net.core.wmem_default
- net.ipv4.tcp_rmem
- net.ipv4.tcp_wmem
The first 4 control the default socket buffer sizes for all sockets
raw/packet/tcp/udp and also the maximum permitted socket buffer that can be
specified in setsockopt(SOL_SOCKET, SO_(RCV|SND)BUF,...).
The last two control the TCP auto-tuning limits and override the default
specified in rmem_default/wmem_default as well as the max limits.
Netstack today only implements tcp_rmem/tcp_wmem and incorrectly uses it
to limit the maximum size in setsockopt() as well as uses it for raw/udp
sockets.
This changelist introduces the other 4 and updates the udp/raw sockets to use
the newly introduced variables. The values for min/max match the current
tcp_rmem/wmem values and the default value buffers for UDP/RAW sockets is
updated to match the linux value of 212KiB up from the really low current value
of 32 KiB.
Updates #3043Fixes#3043
PiperOrigin-RevId: 318089805
... to help reduce flakes.
When waiting for an event to occur, use a timeout of 10s. When waiting
for an event to not occur, use a timeout of 1s.
Test: Ran test locally w/ run count of 1000 with and without gotsan.
PiperOrigin-RevId: 316998128
Netstack has traditionally parsed headers on-demand as a packet moves up the
stack. This is conceptually simple and convenient, but incompatible with
iptables, where headers can be inspected and mangled before even a routing
decision is made.
This changes header parsing to happen early in the incoming packet path, as soon
as the NIC gets the packet from a link endpoint. Even if an invalid packet is
found (e.g. a TCP header of insufficient length), the packet is passed up the
stack for proper stats bookkeeping.
PiperOrigin-RevId: 315179302
Historically we've been passing PacketBuffer by shallow copying through out
the stack. Right now, this is only correct as the caller would not use
PacketBuffer after passing into the next layer in netstack.
With new buffer management effort in gVisor/netstack, PacketBuffer will
own a Buffer (to be added). Internally, both PacketBuffer and Buffer may
have pointers and shallow copying shouldn't be used.
Updates #2404.
PiperOrigin-RevId: 314610879
Implement rule 7 of Source Address Selection RFC 6724 section 5. This
makes temporary (short-lived) addresses preferred over non-temporary
addresses when earlier rules are equal.
Test: stack_test.TestIPv6SourceAddressSelectionScopeAndSameAddress
PiperOrigin-RevId: 309250975
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.
The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
PiperOrigin-RevId: 308163542
These methods let users eaily break the VectorisedView abstraction, and
allowed netstack to slip into pseudo-enforcement of the "all headers are
in the first View" invariant. Removing them and replacing with PullUp(n)
breaks this reliance and will make it easier to add iptables support and
rework network buffer management.
The new View.PullUp(n) method is low cost in the common case, when when
all the headers fit in the first View.
Software GSO implementation currently has a complicated code path with
implicit assumptions that all packets to WritePackets carry same Data
and it does this to avoid allocations on the path etc. But this makes it
hard to reuse the WritePackets API.
This change breaks all such assumptions by introducing a new Vectorised
View API ReadToVV which can be used to cleanly split a VV into multiple
independent VVs. Further this change also makes packet buffers linkable
to form an intrusive list. This allows us to get rid of the array of
packet buffers that are passed in the WritePackets API call and replace
it with a list of packet buffers.
While this code does introduce some more allocations in the benchmarks
it doesn't cause any degradation.
Updates #231
PiperOrigin-RevId: 304731742
Timeouts were increased to deflake pkg/tcpip/stack:stack_x_test tests
that depend on timers. Some timeouts used previously were intended for
tests that do not depend on timers, so this change updates those
timeouts to give more time for a timer-based event to occur. This
change also de-parallelizes non-subtests to reduce the number of active
timers.
Test: bazel test //pkg/tcpip/stack:stack_x_test --runs_per_test=500
PiperOrigin-RevId: 304287622
This is a precursor to be being able to build an intrusive list
of PacketBuffers for use in queuing disciplines being implemented.
Updates #2214
PiperOrigin-RevId: 302677662
When a NIC is removed, attempt to disable the NIC first to cleanup
dynamic state and stop ongoing periodic tasks (e.g. IPv6 router
solicitations, DAD) so that a removed NIC does not attempt to send
packets.
Tests:
- stack_test.TestRemoveUnknownNIC
- stack_test.TestRemoveNIC
- stack_test.TestDADStop
- stack_test.TestCleanupNDPState
- stack_test.TestRouteWithDownNIC
- stack_test.TestStopStartSolicitingRouters
PiperOrigin-RevId: 300805857
LinkEndpoints may expect/assume that the a tcpip.PacketBuffer's Header
has enough capacity for its own headers, as per documentation for
LinkEndpoint.MaxHeaderLength.
Test: stack_test.TestNICForwarding
PiperOrigin-RevId: 300784192
This change also updates where the IP packet buffer is held in an
outbound tcpip.PacketBuffer from Header to Data. This change removes
unncessary copying of the IP packet buffer when forwarding.
Test: stack_test.TestNICForwarding
PiperOrigin-RevId: 300217972
Protocol dispatchers were previously leaked. Bypassing TIME_WAIT is required to
test this change.
Also fix a race when a socket in SYN-RCVD is closed. This is also required to
test this change.
PiperOrigin-RevId: 296922548
- Disabled NICs will have their associated NDP state cleared.
- Disabled NICs will not accept incoming packets.
- Writes through a Route with a disabled NIC will return an invalid
endpoint state error.
- stack.Stack.FindRoute will not return a route with a disabled NIC.
- NIC's Running flag will report the NIC's enabled status.
Tests:
- stack_test.TestDisableUnknownNIC
- stack_test.TestDisabledNICsNICInfoAndCheckNIC
- stack_test.TestRoutesWithDisabledNIC
- stack_test.TestRouteWritePacketWithDisabledNIC
- stack_test.TestStopStartSolicitingRouters
- stack_test.TestCleanupNDPState
- stack_test.TestAddRemoveIPv4BroadcastAddressOnNICEnableDisable
- stack_test.TestJoinLeaveAllNodesMulticastOnNICEnableDisable
PiperOrigin-RevId: 296298588
Previously, a DAD event would not be sent if DAD was disabled.
This allows integrators to do some work when an IPv6 address is bound to
a NIC without special logic that checks if DAD is enabled.
Without this change, integrators would need to check if a NIC has DAD
enabled when an address is auto-generated. If DAD is enabled, it would
need to delay the work until the DAD completion event; otherwise, it
would need to do the work in the address auto-generated event handler.
Test: stack_test.TestDADDisabled
PiperOrigin-RevId: 293732914