Robert L Mathews discovered that some clients send evil TCP RST segments,
which are accepted by netfilter conntrack but discarded by the
destination. Thus the conntrack entry is destroyed but the destination
retransmits data until timeout.
The same technique, i.e. sending properly crafted RST segments, can easily
be used to bypass connlimit/connbytes based restrictions (the sample
script written by Robert can be found in the netfilter mailing list
archives).
The patch below adds a new flag and new field to struct ip_ct_tcp_state so
that checking RST segments can be made more strict and thus TCP conntrack
can catch the invalid ones: the RST segment is accepted only if its
sequence number higher than or equal to the highest ack we seen from the
other direction. (The last_ack field cannot be reused because it is used
to catch resent packets.)
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes the wrong message type that are triggered by
user updates, the following commands:
(term1)# conntrack -I -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state LISTEN
(term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_SENT
(term1)# conntrack -U -p tcp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20 --state SYN_RECV
only trigger event message of type NEW, when only the first is NEW
while others should be UPDATE.
(term2)# conntrack -E
[NEW] tcp 6 10 LISTEN src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
[NEW] tcp 6 10 SYN_SENT src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
[NEW] tcp 6 10 SYN_RECV src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20 [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0
This patch also removes IPCT_REFRESH from the bitmask since it is
not of any use.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes a problem when you use 32 nodes in the cluster
match:
% iptables -I PREROUTING -t mangle -i eth0 -m cluster \
--cluster-total-nodes 32 --cluster-local-node 32 \
--cluster-hash-seed 0xdeadbeef -j MARK --set-mark 0xffff
iptables: Invalid argument. Run `dmesg' for more information.
% dmesg | tail -1
xt_cluster: this node mask cannot be higher than the total number of nodes
The problem is related to this checking:
if (info->node_mask >= (1 << info->total_nodes)) {
printk(KERN_ERR "xt_cluster: this node mask cannot be "
"higher than the total number of nodes\n");
return false;
}
(1 << 32) is 1. Thus, the checking fails.
BTW, I said this before but I insist: I have only tested the cluster
match with 2 nodes getting ~45% extra performance in an active-active setup.
The maximum limit of 32 nodes is still completely arbitrary. I'd really
appreciate if people that have more nodes in their setups let me know.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
As packets ending with NEXTHDR_NONE don't have a last extension header,
the check for the length needs to be after the check for NEXTHDR_NONE.
Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Pointed out by Dave Miller:
CHECK include/linux/netfilter (57 files)
/home/davem/src/GIT/net-2.6/usr/include/linux/netfilter/xt_LED.h:6: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Related-to: commit 325fb5b4d2
The compat path suffers from a similar problem. It only uses a __be32
when all of the recent code uses, and expects, an nf_inet_addr
everywhere. As a result, addresses stored by xt_recents were
filled with whatever other stuff was on the stack following the be32.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
With a minor compile fix from Roman.
Reported-and-tested-by: Roman Hoog Antink <rha@open.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds missing role attribute to the DCCP type, otherwise
the creation of entries is not of any use.
The attribute added is CTA_PROTOINFO_DCCP_ROLE which contains the
role of the conntrack original tuple.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit d0dba725 (netfilter: ctnetlink: add callbacks to the per-proto
nlattrs) changed the protocol registration function to abort if the
to-be registered protocol doesn't provide a new callback function.
The DCCP and UDP-Lite IPv6 protocols were missed in this conversion,
add the required callback pointer.
Reported-and-tested-by: Steven Jan Springl <steven@springl.ukfsn.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
br_nf_dev_queue_xmit only checks for ETH_P_IP packets for fragmenting but not
VLAN packets. This results in dropping of large VLAN packets. This can be
observed when connection tracking is enabled. Connection tracking re-assembles
fragmented packets, and these have to re-fragmented when transmitting out. Also,
make sure only refragmented packets are defragmented as per suggestion from
Patrick McHardy.
Signed-off-by: Saikiran Madugula <hummerbliss@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
With this patch, nfnetlink returns -ENOMEM instead of -EPERM if we
fail to create the nfnetlink netlink socket during the module
loading. This is exactly what rtnetlink does in this case.
Ideally, it would be better if we propagate the error that has
happened in netlink_kernel_create(), however, this function still
does not implement this yet.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes an inconsistency that results in no error reports
to user-space listeners if we fail to allocate the event message.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The removal of the SAME target accidentally removed one feature that is
not available from the normal NAT targets so far, having multi-range
mappings that use the same mapping for each connection from a single
client. The current behaviour is to choose the address from the range
based on source and destination IP, which breaks when communicating
with sites having multiple addresses that require all connections to
originate from the same IP address.
Introduce a IP_NAT_RANGE_PERSISTENT option that controls whether the
destination address is taken into account for selecting addresses.
http://bugzilla.kernel.org/show_bug.cgi?id=12954
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit ea781f197d (netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and)
get rid of call_rcu() was missing one conversion to the hlist_nulls
functions, causing a crash when unloading conntrack helper modules.
Reported-and-tested-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
commit ca735b3aaa
'netfilter: use a linked list of loggers'
introduced an array of list_head in "struct nf_logger", but
forgot to initialize it in nf_log_register(). This resulted
in oops when calling nf_log_unregister() at module unload time.
Reported-and-tested-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes a regression (introduced by myself in commit 19abb7b:
netfilter: ctnetlink: deliver events for conntracks changed from
userspace) that results in an expectation re-insertion since
__nf_ct_expect_check() may return 0 for expectation timer refreshing.
This patch also removes a unnecessary refcount bump that
pretended to avoid a possible race condition with event delivery
and expectation timers (as said, not needed since we hold a
reference to the object since until we finish the expectation
setup). This also merges nf_ct_expect_related_report() and
nf_ct_expect_related() which look basically the same.
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* 'audit.b62' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
Audit: remove spaces from audit_log_d_path
audit: audit_set_auditable defined but not used
audit: incorrect ref counting in audit tree tag_chunk
audit: Fix possible return value truncation in audit_get_context()
audit: ignore terminating NUL in AUDIT_USER_TTY messages
Audit: fix handling of 'strings' with NULL characters
make the e->rule.xxx shorter in kernel auditfilter.c
auditsc: fix kernel-doc notation
audit: EXECVE record - removed bogus newline
* 'for-next' of git://git.o-hand.com/linux-mfd:
mfd: fix da903x warning
mfd: fix MAINTAINERS entry
mfd: Use the value of the final spin when reading the AUXADC
mfd: Storage class should be before const qualifier
mfd: PASIC3: supply clock_rate to DS1WM via driver_data
mfd: remove DS1WM clock handling
mfd: remove unused PASIC3 bus_shift field
pxa/magician: remove deprecated .bus_shift from PASIC3 platform_data
mfd: convert PASIC3 to use MFD core
mfd: convert DS1WM to use MFD core
mfd: Support active high IRQs on WM835x
mfd: Use bulk read to fill WM8350 register cache
mfd: remove duplicated #include from pcf50633
* 'for-linus' of git://repo.or.cz/cris-mirror:
CRISv32: Remove extraneous space between -I and the path.
cris: convert obsolete hw_interrupt_type to struct irq_chip
BUG to BUG_ON changes
cpumask: use mm_cpumask() wrapper: cris
cpumask: Use accessors code.: cris
cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.: cris
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (23 commits)
sh: sh7785lcr: Map whole PCI address space.
sh: Fix up DSP context save/restore.
sh: Fix up number of on-chip DMA channels on SH7091.
sh: update defconfigs.
sh: Kill off broken direct-mapped cache mode.
sh: Wire up ARCH_HAS_DEFAULT_IDLE for cpuidle.
sh: Add a command line option for disabling I/O trapping.
sh: Select ARCH_HIBERNATION_POSSIBLE.
sh: migor: Fix up CEU use flags.
input: migor_ts: add wakeup support
rtc: rtc-sh: use set_irq_wake()
input: sh_keysc: use enable/disable_irq_wake()
sh: intc: set_irq_wake() support
sh: intc: install enable, disable and shutdown callbacks
clocksource: sh_cmt: use remove_irq() and remove clockevent workaround
sh: ap325 and Migo-R use new sh_mobile_ceu_info flags
sh: Fix up -Wformat-security whining.
sh: ap325rxa: Add ov772x support, again.
sh: Sanitize asm/mmu.h for assembly use.
sh: Tidy up sh7786 pinmux table.
...
* 'avr32-arch' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
avr32: add hardware handshake support to atmel_serial
avr32: add RTS/CTS/CLK pin selection for the USARTs
Add RTC support for Merisc boards
avr32: at32ap700x: setup DMA for AC97C in the machine code
avr32: at32ap700x: setup DMA for ABDAC in the machine code
Add Merisc board support
avr32: use gpio_is_valid() to check USBA vbus_pin I/O line
atmel-usba-udc: use gpio_is_valid() to check vbus_pin I/O line
avr32: fix timing LCD parameters for EVKLCD10X boards
avr32: use GPIO line PB15 on EVKLCD10x boards for backlight
avr32: configure MCI detect and write protect pins for EVKLCD10x boards
avr32: set pin mask to alternative 18 bpp for EVKLCD10x boards
avr32: add pin mask for 18-bit color on the LCD controller
avr32: fix 15-bit LCDC pin mask to use MSB lines