Packets may race when create the new element in nft_hash_update:
CPU0 CPU1
lookup_fast - fail lookup_fast - fail
new - ok new - ok
insert - ok insert - fail(EEXIST)
So when race happened, we reuse the existing element. Otherwise,
these *racing* packets will not be handled properly.
Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When nft_expr_clone failed, a series of problems will happen:
1. module refcnt will leak, we call __module_get at the beginning but
we forget to put it back if ops->clone returns fail
2. memory will be leaked, if clone fail, we just return NULL and forget
to free the alloced element
3. set->nelems will become incorrect when set->size is specified. If
clone fail, we should decrease the set->nelems
Now this patch fixes these problems. And fortunately, clone fail will
only happen on counter expression when memory is exhausted.
Fixes: 086f332167 ("netfilter: nf_tables: add clone interface to expression operations")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When CONFIG_NFT_SET_HASH is not enabled and I input the following rule:
"nft add rule filter output flow table test {ip daddr counter }", kernel
panic happened on my system:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Call Trace:
[<ffffffffa0590466>] ? nft_dynset_eval+0x56/0x100 [nf_tables]
[<ffffffffa05851bb>] nft_do_chain+0xfb/0x4e0 [nf_tables]
[<ffffffffa0432f01>] ? nf_conntrack_tuple_taken+0x61/0x210 [nf_conntrack]
[<ffffffffa0459ea6>] ? get_unique_tuple+0x136/0x560 [nf_nat]
[<ffffffffa043bca1>] ? __nf_ct_ext_add_length+0x111/0x130 [nf_conntrack]
[<ffffffffa045a357>] ? nf_nat_setup_info+0x87/0x3b0 [nf_nat]
[<ffffffff81761e27>] ? ipt_do_table+0x327/0x610
[<ffffffffa045a6d7>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
[<ffffffffa059f21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
[<ffffffff81702515>] nf_iterate+0x55/0x60
[<ffffffff81702593>] nf_hook_slow+0x73/0xd0
Because in rbtree type set, ops->update is not implemented. So just keep
it simple, in such case, report -EOPNOTSUPP to the user space.
Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Following Cavium's acquisition of qlogic we need to update all the qlogic
drivers maintainer's entries to point to our new e-mail addresses,
as well as update some of the driver's maintainers as those are no longer
working for Cavium.
I would like to thank Sony Chacko and Rajesh Borundia for their support
and development of our various networking drivers.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Hyper-V netvsc driver was looking at the incorrect status bits
in the checksum info. It was setting the receive checksum unnecessary
flag based on the IP header checksum being correct. The checksum
flag is skb is about TCP and UDP checksum status. Because of this
bug, any packet received with bad TCP checksum would be passed
up the stack and to the application causing data corruption.
The problem is reproducible via netcat and netem.
This had a side effect of not doing receive checksum offload
on IPv6. The driver was also also always doing checksum offload
independent of the checksum setting done via ethtool.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
First bug was added in commit ad6f939ab1 ("ip: Add offset parameter to
ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));
Then commit e6afc8ace6 ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now UDP headers are pulled
before skb are put in receive queue.
Fixes: ad6f939ab1 ("ip: Add offset parameter to ip_cmsg_recv")
Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sam Kumar <samanthakumar@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7303a14750 ("sctp: identify chunks that need to be fragmented
at IP level") made the chunk be fragmented at IP level in the next round
if it's size exceed PMTU.
But there still is another case, PMTU can be updated if transport's dst
expires and transport's pmtu_pending is set in sctp_packet_transmit. If
the new PMTU is less than the chunk, the same issue with that commit can
be triggered.
So we should drop this packet and let it retransmit in another round
where it would be fragmented at IP level.
This patch is to fix it by checking the chunk size after PMTU may be
updated and dropping this packet if it's size exceed PMTU.
Fixes: 90017accff ("sctp: Add GSO support")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skbuff and sock structure both had missing parameter annotation
values.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: e420114eef ("rocker: introduce worlds infrastructure")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2016-10-21
Here are some more Bluetooth fixes for the 4.9 kernel:
- Fix to btwilink driver probe function return value
- Power management fix to hci_bcm
- Fix to encoding name in scan response data
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of getsockopt handlers in net/sctp/socket.c check len against
sizeof some structure like:
if (len < sizeof(int))
return -EINVAL;
On the first look, the check seems to be correct. But since len is int
and sizeof returns size_t, int gets promoted to unsigned size_t too. So
the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
false.
Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.
Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.
If not checked, this is the result:
UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
shift exponent 52 is too large for 32-bit type 'int'
CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
Call Trace:
[<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
...
[<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
[<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
[<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
[<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
[<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
[<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
[<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3ac72b7b63 ("net: fec: align IP header in hardware") breaks
networking on mx28.
There is an erratum on mx28 (ENGR121613 - ENET big endian mode
not compatible with ARM little endian) that requires an additional
byte-swap operation to workaround this problem.
So call swap_buffer() prior to performing the IP header alignment
to restore network functionality on mx28.
Fixes: 3ac72b7b63 ("net: fec: align IP header in hardware")
Reported-and-tested-by: Henri Roosen <henri.roosen@ginzinger.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we'll overflow the integer. This occurs when layer 3 tunneled
packets are handed off to the IPv6 layer.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Time Sync (PTP) implementation uses the divisor/shift value for converting
the clock ticks to nanoseconds. Driver currently defines shift value as 1,
this results in the nanoseconds value to be calculated as half the actual
value. Hence the user application fails to synchronize the device clock
value with the PTP master device clock. Need to use the 'shift' value of 0.
Signed-off-by: Sony.Chacko <Sony.Chacko@cavium.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru says:
====================
qed*: fix series.
The patch series contains several minor bug fixes for qed/qede modules.
Please consider applying this to 'net' branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver uses incorrect APIs to unmap DMA memory which were
mapped using dma_map_single(). This patch fixes it to use
appropriate APIs for un-mapping DMA memory.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qed_dcbx_query_params() implementation populate the values to input
buffer based on the dcbx mode and, the current negotiated state/params,
the caller of this API need to memset the buffer to zero.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rx indirection table entries are in the range [0, (rss_count - 1)]. If
user reduces the rss count, the table entries may not be in the ccorrect
range. Need to reconfigure the table with new rss_count as a basis.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the current default values for Rx path i.e., 8 queues of 8Kb entries
each with 4Kb size, interface will consume 256Mb for Rx. The default values
causing the driver probe to fail when the system memory is low. Based on
the perforamnce results, rx-ring count value of 1Kb gives the comparable
performance with Rx coalesce timeout of 12 seconds. Updating the default
values.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the execution of loopback test, driver may receive the packets which
are not originated by this test, loopback implementation need to skip those
packets.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes in kernel ethtool implementation requires the driver
callback for get_channels() has to populate the values for max tx/rx
coalesce fields.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit a681574c99
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).
Then, since we already have a lock for ping_group_range, those
using ip_local_ports.lock for ping_group_range are clearly typos.
We might consider to share a same lock for both ping_group_range
and local_port_range w.r.t. space saving, but that should be for
net-next.
Fixes: a681574c99 ("ipv4: disable BH in set_ping_group_range()")
Fixes: ba6b918ab2 ("ping: move ping_group_range out of CONFIG_SYSCTL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric Salo <salo@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: dsa: bcm_sf2: Do not rely on kexec_in_progress
These are the two patches following the discussing we had on kexec_in_progress.
Feel free to apply or discard them, thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After discussing with Eric, it turns out that, while using
kexec_in_progress is a nice optimization, which prevents us from always
powering on the integrated PHY, let's just turn it on in the shutdown
path.
This removes a dependency on kexec_in_progress which, according to Eric
should not be used by modules
Fixes: 2399d6143f ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>