- Do not create expectation when forwarding the PORT
command to avoid blocking the connection. The problem is that
nf_conntrack_ftp.c:help() tries to create the same expectation later in
POST_ROUTING and drops the packet with "dropping packet" message after
failure in nf_ct_expect_related.
- Change ip_vs_update_conntrack to alter the conntrack
for related connections from real server. If we do not alter the reply in
this direction the next packet from client sent to vport 20 comes as NEW
connection. We alter it but may be some collision happens for both
conntracks and the second conntrack gets destroyed immediately. The
connection stucks too.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix Passive FTP problem in ip_vs_ftp:
- Do not oops in nf_nat_set_seq_adjust (adjust_tcp_sequence) when
iptable_nat module is not loaded
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This removes duplicate code by providing a default implementation
which is used by 3 of the 4 modules that provide these call.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
IPVS was merged into the kernel quite a long time ago and
has been seeing wide-spread production use for even longer.
It seems appropriate for it to be no longer tagged as EXPERIMENTAL
Signed-off-as: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Use nf_conntrack/nf_nat code to do the packet mangling and the TCP
sequence adjusting. The function 'ip_vs_skb_replace' is now dead
code, so it is removed.
To SNAT FTP, use something like:
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
--vport 21 -j SNAT --to-source 192.168.10.10
and for the data connections in passive mode:
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 \
--vportctl 21 -j SNAT --to-source 192.168.10.10
using '-m state --state RELATED' would also works.
Make sure the kernel modules ip_vs_ftp, nf_conntrack_ftp, and
nf_nat_ftp are loaded.
[ up-port and minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Update the nf_conntrack tuple in reply direction, as we will see
traffic from the real server (RIP) to the client (CIP). Once this is
done we can use netfilters SNAT in POSTROUTING, especially with
xt_ipvs, to do source NAT, e.g.:
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 80 \
-j SNAT --to-source 192.168.10.10
[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This implements the kernel-space side of the netfilter matcher xt_ipvs.
[ minor fixes by Simon Horman <horms@verge.net.au> ]
Signed-off-by: Hannes Eder <heder@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
[ Patrick: added xt_ipvs.h to Kbuild ]
Signed-off-by: Patrick McHardy <kaber@trash.net>
lvs sctp protocol handler is incorrectly invoked ip_vs_app_pkt_out
Since there's no sctp helpers at present, it does the same thing as
ip_vs_app_pkt_in.
Signed-off-by: Xiaoyu Du <tingsrain@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
IP_VS_PROTO_AH_ESP should be set iff either of IP_VS_PROTO_{AH,ESP} is
selected. Express this with standard kconfig syntax.
Signed-off-by: Michal Marek <mmarek@suse.cz>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Allow one-packet scheduling for UDP connections. When the fwmark-based or
normal virtual service is marked with '-o' or '--ops' options all
connections are created only to schedule one packet. Useful to schedule UDP
packets from same client port to different real servers. Recommended with
RR or WRR schedulers (the connections are not visible with ipvsadm -L).
Signed-off-by: Nick Chalk <nick@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
remove useless union keyword in rtable, rt6_info and dn_route.
Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code that hashes and unhashes connections from the connection table
is missing locking of the connection being modified, which opens up a
race condition and results in memory corruption when this race condition
is hit.
Here is what happens in pretty verbose form:
CPU 0 CPU 1
------------ ------------
An active connection is terminated and
we schedule ip_vs_conn_expire() on this
CPU to expire this connection.
IRQ assignment is changed to this CPU,
but the expire timer stays scheduled on
the other CPU.
New connection from same ip:port comes
in right before the timer expires, we
find the inactive connection in our
connection table and get a reference to
it. We proper lock the connection in
tcp_state_transition() and read the
connection flags in set_tcp_state().
ip_vs_conn_expire() gets called, we
unhash the connection from our
connection table and remove the hashed
flag in ip_vs_conn_unhash(), without
proper locking!
While still holding proper locks we
write the connection flags in
set_tcp_state() and this sets the hashed
flag again.
ip_vs_conn_expire() fails to expire the
connection, because the other CPU has
incremented the reference count. We try
to re-insert the connection into our
connection table, but this fails in
ip_vs_conn_hash(), because the hashed
flag has been set by the other CPU. We
re-schedule execution of
ip_vs_conn_expire(). Now this connection
has the hashed flag set, but isn't
actually hashed in our connection table
and has a dangling list_head.
We drop the reference we held on the
connection and schedule the expire timer
for timeouting the connection on this
CPU. Further packets won't be able to
find this connection in our connection
table.
ip_vs_conn_expire() gets called again,
we think it's already hashed, but the
list_head is dangling and while removing
the connection from our connection table
we write to the memory location where
this list_head points to.
The result will probably be a kernel oops at some other point in time.
This race condition is pretty subtle, but it can be triggered remotely.
It needs the IRQ assignment change or another circumstance where packets
coming from the same ip:port for the same service are being processed on
different CPUs. And it involves hitting the exact time at which
ip_vs_conn_expire() gets called. It can be avoided by making sure that
all packets from one connection are always processed on the same CPU and
can be made harder to exploit by changing the connection timeouts to
some custom values.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: stable@kernel.org
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Define a new function to return the waitqueue of a "struct sock".
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
return sk->sk_sleep;
}
Change all read occurrences of sk_sleep by a call to this function.
Needed for a future RCU conversion. sk_sleep wont be a field directly
available.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When protocols use very long names, the sprintf calls might overflow
the on-stack buffer. No protocol in the kernel does this however.
Print the protocol name in the pr_debug statement directly to avoid
this.
Based on patch by Zhitong Wang <zhitong.wangzt@alibaba-inc.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
NIPQUAD has very few uses left.
Remove this use and make the code have the identical form of the only
other use of "%u,%u,%u,%u,%u,%u" in net/ipv4/netfilter/nf_nat_ftp.c
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>