"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.
nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8
I have seen "len" up to 280 and my host has crashes w/o this patch.
The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.
Fixes: 5b423f6a40 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
There are no these aliases, so kernel can not request appropriate
match table:
$ iptables -I INPUT -p tcp -m osf --genre Windows --ttl 2 -j DROP
iptables: No chain/target/match by that name.
setsockopt() requests ipt_osf module, which is not present. Add
the aliases.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This simple modification allows iptables to work with INPUT chain
in combination with cgroup module. It could be useful for counting
ingress traffic per cgroup with nfacct netfilter module. There
were no problems to count the egress traffic that way formerly.
It's possible to get classified sk_buff after PREROUTING, due to
socket lookup being done in early_demux (tcp_v4_early_demux). Also
it works for udp as well.
Trivial usage example, assuming we're in the same shell every step
and we have enough permissions:
1) Classic net_cls cgroup initialization:
mkdir /sys/fs/cgroup/net_cls
mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
2) Set up cgroup for interesting application:
mkdir /sys/fs/cgroup/net_cls/wget
echo 1 > /sys/fs/cgroup/net_cls/wget/net_cls.classid
echo $BASHPID > /sys/fs/cgroup/net_cls/wget/cgroup.procs
3) Create kernel counters:
nfacct add wget-cgroup-in
iptables -A INPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-in
nfacct add wget-cgroup-out
iptables -A OUTPUT -m cgroup ! --cgroup 1 -m nfacct --nfacct-name wget-cgroup-out
4) Network usage:
wget https://www.kernel.org/pub/linux/kernel/v3.x/testing/linux-3.14-rc6.tar.xz
5) Check results:
nfacct list
Cgroup approach is being used for the DataUsage (counting & blocking
traffic) feature for Samsung's modification of the Tizen OS.
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Eric points out that the locks can be global.
Moreover, both Jesper and Eric note that using only 32 locks increases
false sharing as only two cache lines are used.
This increases locks to 256 (16 cache lines assuming 64byte cacheline and
4 bytes per spinlock).
Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
cannot use ARRAY_SIZE() if spinlock_t is empty struct.
Fixes: 1442e7507d ("netfilter: connlimit: use keyed locks")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Recycling skb always had been very tough...
This time it appears GRO layer can accumulate skb->truesize
adjustments made by drivers when they attach a fragment to skb.
skb_gro_receive() can only subtract from skb->truesize the used part
of a fragment.
I spotted this problem seeing TcpExtPruneCalled and
TcpExtTCPRcvCollapsed that were unexpected with a recent kernel, where
TCP receive window should be sized properly to accept traffic coming
from a driver not overshooting skb->truesize.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the Micrel KSZ8864RMN switch to the spi_ks8995
driver. The KSZ8864RMN switch has a wider 256-byte register space.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5902385a24 ("tipc: obsolete
the remote management feature") introduces a regression where node
topology events are not being generated because the publication
that triggers this: {0, <z.c.n>, <z.c.n>} is no longer available.
This will break applications that rely on node events to discover
when nodes join/leave a cluster.
We fix this by advertising the node publication when TIPC enters
networking mode, and withdraws it upon shutdown.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is no way how to find out if a device supports busy
polling. So add a feature and make it dependent on ndo_busy_poll
existence.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, in packet_direct_xmit() we test the assigned netdevice queue
for netif_xmit_frozen_or_stopped() before doing an ndo_start_xmit().
This can have the side-effect that BQL enabled drivers which make use
of netdev_tx_sent_queue() internally, set __QUEUE_STATE_STACK_XOFF from
within the stack and would not fully fill the device's TX ring from
packet sockets with PACKET_QDISC_BYPASS enabled.
Instead, use a test without BQL bit so that bursts can be absorbed
into the NICs TX ring. Fix and code suggested by Eric Dumazet, thanks!
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 015f0688f5 ("net: net: add a core netdev->tx_dropped
counter"), we can now account for TX drops from within the core
stack instead of drivers.
Therefore, fix packet_direct_xmit() and increase drop count when we
encounter a problem before driver's xmit function was called (we do
not want to doubly account for it).
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An old inefficiency of the TX path that we are grant mapping the first slot,
and then copy the header part to the linear area. Instead, doing a grant copy
for that header straight on is more reasonable. Especially because there are
ongoing efforts to make Xen avoiding TLB flush after unmap when the page were
not touched in Dom0. In the original way the memcpy ruined that.
The key changes:
- the vif has a tx_copy_ops array again
- xenvif_tx_build_gops sets up the grant copy operations
- we don't have to figure out whether the header and first frag are on the same
grant mapped page or not
Note, we only grant copy PKT_PROT_LEN bytes from the first slot, the rest (if
any) will be on the first frag, which is grant mapped. If the first slot is
smaller than PKT_PROT_LEN, then we grant copy that, and later __pskb_pull_tail
will pull more from the frags (if any)
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qlcnic driver fails to build on ARM with errors like:
In file included from drivers/net/ethernet/qlogic/qlcnic/qlcnic.h:36:0,
from drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c:8:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h:585:1: error: unknown type name 'irqreturn_t'
irqreturn_t qlcnic_83xx_clear_legacy_intr(struct qlcnic_adapter *);
^
Nothing in the driver is explicitly including the irq definitions, so we
add an include of linux/irq.h to pick them up.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The enic driver fails to build on ARM with:
In file included from drivers/net/ethernet/cisco/enic/enic_res.c:40:0:
drivers/net/ethernet/cisco/enic/enic.h:48:2: error: expected specifier-qualifier-list before 'irqreturn_t'
irqreturn_t (*isr)(int, void *);
^
Nothing in the driver is explicitly including the irq definitions, so we add
an include of linux/irq.h to pick them up.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bnx2x driver fails to build on ARM with:
In file included from drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:28:0:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h:243:1: error: unknown type name 'irqreturn_t'
irqreturn_t bnx2x_msix_sp_int(int irq, void *dev_instance);
^
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h:251:1: error: unknown type name 'irqreturn_t'
irqreturn_t bnx2x_interrupt(int irq, void *dev_instance);
^
Nothing in bnx2x_link.c or bnx2x_cmn.h is explicitly including the irq
definitions, so we add an include of linux/irq.h to pick them up.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return -EINVAL unless all of user-given strings are correctly
NUL-terminated.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix build errors:
drivers/net/ethernet/ti/cpts.c:266:12: error: 'ETH_HLEN' undeclared (first use in this function)
drivers/net/ethernet/ti/cpts.c:276:23: error: 'VLAN_HLEN' undeclared (first use in this function)
Fixes: 408eccce32 ("net: ptp: move PTP classifier in its own file")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Suggested-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Here is my initial pull request for the networking subsystem during
this merge window:
1) Support for ESN in AH (RFC 4302) from Fan Du.
2) Add full kernel doc for ethtool command structures, from Ben
Hutchings.
3) Add BCM7xxx PHY driver, from Florian Fainelli.
4) Export computed TCP rate information in netlink socket dumps, from
Eric Dumazet.
5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
Dichtel.
6) Convert many drivers to pci_enable_msix_range(), from Alexander
Gordeev.
7) Record SKB timestamps more efficiently, from Eric Dumazet.
8) Switch to microsecond resolution for TCP round trip times, also
from Eric Dumazet.
9) Clean up and fix 6lowpan fragmentation handling by making use of
the existing inet_frag api for it's implementation.
10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
11) Auto size SKB lengths when composing netlink messages based upon
past message sizes used, from Eric Dumazet.
12) qdisc dumps can take a long time, add a cond_resched(), From Eric
Dumazet.
13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
Get rid of never-used-in-tree netpoll RX handling. From Eric W
Biederman.
14) Support inter-address-family and namespace changing in VTI tunnel
driver(s). From Steffen Klassert.
15) Add Altera TSE driver, from Vince Bridgers.
16) Optimizing csum_replace2() so that it doesn't adjust the checksum
by checksumming the entire header, from Eric Dumazet.
17) Expand BPF internal implementation for faster interpreting, more
direct translations into JIT'd code, and much cleaner uses of BPF
filtering in non-socket ocntexts. From Daniel Borkmann and Alexei
Starovoitov"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
net: Add a test to see if a skb is freeable in irq context
qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
net: ptp: move PTP classifier in its own file
net: sxgbe: make "core_ops" static
net: sxgbe: fix logical vs bitwise operation
net: sxgbe: sxgbe_mdio_register() frees the bus
Call efx_set_channels() before efx->type->dimension_resources()
xen-netback: disable rogue vif in kthread context
net/mlx4: Set proper build dependancy with vxlan
be2net: fix build dependency on VxLAN
mac802154: make csma/cca parameters per-wpan
mac802154: allow only one WPAN to be up at any given time
net: filter: minor: fix kdoc in __sk_run_filter
netlink: don't compare the nul-termination in nla_strcmp
can: c_can: Avoid led toggling for every packet.
can: c_can: Simplify TX interrupt cleanup
can: c_can: Store dlc private
can: c_can: Reduce register access
can: c_can: Make the code readable
...
Pull HID updates from Jiri Kosina:
- substantial cleanup of the generic and transport layers, in the
direction of an ultimate goal of making struct hid_device completely
transport independent, by Benjamin Tissoires
- cp2112 driver from David Barksdale
- a lot of fixes and new hardware support (Dualshock 4) to hid-sony
driver, by Frank Praznik
- support for Win 8.1 multitouch protocol by Andrew Duggan
- other smaller fixes / device ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (75 commits)
HID: sony: fix force feedback mismerge
HID: sony: Set the quriks flag for Bluetooth controllers
HID: sony: Fix Sixaxis cable state detection
HID: uhid: Add UHID_CREATE2 + UHID_INPUT2
HID: hyperv: fix _raw_request() prototype
HID: hyperv: Implement a stub raw_request() entry point
HID: hid-sensor-hub: fix sleeping function called from invalid context
HID: multitouch: add support for Win 8.1 multitouch touchpads
HID: remove hid_output_raw_report transport implementations
HID: sony: do not rely on hid_output_raw_report
HID: cp2112: remove the last hid_output_raw_report() call
HID: cp2112: remove various hid_out_raw_report calls
HID: multitouch: add support of other generic collections in hid-mt
HID: multitouch: remove pen special handling
HID: multitouch: remove registered devices with default behavior
HID: hidp: Add a comment that some devices depend on the current behavior of uniq
HID: sony: Prevent duplicate controller connections.
HID: sony: Perform a boundry check on the sixaxis battery level index.
HID: sony: Fix work queue issues
HID: sony: Fix multi-line comment styling
...
Pull sched/idle changes from Ingo Molnar:
"More idle code reorganization, to prepare for more integration.
(Sent separately because it depended on pending timer work, which is
now upstream)"
* 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/idle: Add more comments to the code
sched/idle: Move idle conditions in cpuidle_idle main function
sched/idle: Reorganize the idle loop
cpuidle/idle: Move the cpuidle_idle_call function to idle.c
idle/cpuidle: Split cpuidle_idle_call main function into smaller functions