The timestamping API has separate bits for generating and reporting
timestamps. A software timestamp should only be reported for a packet
when the packet has the relevant generation flag (SKBTX_..) set
and the socket has reporting bit SOF_TIMESTAMPING_SOFTWARE set.
The second check was accidentally removed. Reinstitute the original
behavior.
Tested:
Without this patch, Documentation/networking/txtimestamp reports
timestamps regardless of whether SOF_TIMESTAMPING_SOFTWARE is set.
After the patch, it only reports them when the flag is set.
Fixes: f24b9be595 ("net-timestamp: extend SCM_TIMESTAMPING ancillary data struct")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling setsockopt with IPV6_JOIN_ANYCAST or IPV6_LEAVE_ANYCAST
triggers the assertion in addrconf_join_solict()/addrconf_leave_solict()
ipv6_sock_ac_join(), ipv6_sock_ac_drop(), ipv6_sock_ac_close() need to
take RTNL before calling ipv6_dev_ac_inc/dec. Same thing with
ipv6_sock_mc_join(), ipv6_sock_mc_drop(), ipv6_sock_mc_close() before
calling ipv6_dev_mc_inc/dec.
This patch moves ASSERT_RTNL() up a level in the call stack.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The user_skb maybe be leaked if the operation on it failed and codes
skipped into the label "out:" without calling genlmsg_unicast.
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
make defconfig reports:
warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NETFILTER_ADVANCED)
Fixes: d79a61d netfilter: NETFILTER_XT_TARGET_LOG selects NF_LOG_*
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
pull request: Netfilter/IPVS fixes for net
The following patchset contains seven Netfilter fixes for your net
tree, they are:
1) Make the NAT infrastructure independent of x_tables, some users are
already starting to test nf_tables with NAT without enabling x_tables.
Without this patch for Kconfig, there's a superfluous dependency
between NAT and x_tables.
2) Allow to use 0 in the cgroup match, the kernel rejects with -EINVAL
with no good reason. From Daniel Borkmann.
3) Select CONFIG_NF_NAT from the nf_tables NAT expression, this also
resolves another NAT dependency with x_tables.
4) Use HAVE_JUMP_LABEL instead of CONFIG_JUMP_LABEL in the Netfilter hook
code as elsewhere in the kernel to resolve toolchain problems, from
Zhouyi Zhou.
5) Use iptunnel_handle_offloads() to set up tunnel encapsulation
depending on the offload capabilities, reported by Alex Gartrell
patch from Julian Anastasov.
6) Fix wrong family when registering the ip_vs_local_reply6() hook,
also from Julian.
7) Select the NF_LOG_* symbols from NETFILTER_XT_TARGET_LOG. Rafał
Miłecki reported that when jumping from 3.16 to 3.17-rc, his log
target is not selected anymore due to changes in the previous
development cycle to accomodate the full logging support for
nf_tables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless 2014-08-28
Please pull this batch of fixes intended for the 3.17 stream.
For the Bluetooth/6LowPAN/802.15.4 bits, Johan says:
'It contains a connection reference counting fix for LE where a
connection might stay up even though it should get disconnected.
The other 802.15.4 6LoWPAN related patches were sent to the bluetooth
tree by Alexander Aring and described as follows by him:
"
these patches contains patches for the bluetooth branch.
This series includes memory leak fixes and an errno value fix.
Also there are two patches for sending and receiving 1280 6LoWPAN
packets, which makes the IEEE 802.15.4 6LoWPAN stack more RFC
compliant.
"'
Along with that...
Alexey Khoroshilov fixes a use-after-free bug on at76c50x-usb.
Hauke Mehrtens adds a PCI ID to bcma.
Himangi Saraogi fixes a silly "A || A" test in rtlwifi.
Larry Finger adds a device ID to rtl8192cu.
Maks Naumov fixes a strncmp argument in ath9k.
Álvaro Fernández Rojas adds a PCI ID to ssb.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_NETFILTER_XT_TARGET_LOG is not selected anymore when jumping
from 3.16 to 3.17-rc1 if you don't set on the new NF_LOG_IPV4 and
NF_LOG_IPV6 switches.
Change this to select the three new symbols NF_LOG_COMMON, NF_LOG_IPV4
and NF_LOG_IPV6 instead, so NETFILTER_XT_TARGET_LOG remains enabled
when moving from old to new kernels.
Reported-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
tree, the official <netinet/sctp.h> header carries a copy of enum
sctp_sstat_state that looks like (compared to the current in-kernel
enumeration):
User definition: Kernel definition:
enum sctp_sstat_state { typedef enum {
SCTP_EMPTY = 0, <removed>
SCTP_CLOSED = 1, SCTP_STATE_CLOSED = 0,
SCTP_COOKIE_WAIT = 2, SCTP_STATE_COOKIE_WAIT = 1,
SCTP_COOKIE_ECHOED = 3, SCTP_STATE_COOKIE_ECHOED = 2,
SCTP_ESTABLISHED = 4, SCTP_STATE_ESTABLISHED = 3,
SCTP_SHUTDOWN_PENDING = 5, SCTP_STATE_SHUTDOWN_PENDING = 4,
SCTP_SHUTDOWN_SENT = 6, SCTP_STATE_SHUTDOWN_SENT = 5,
SCTP_SHUTDOWN_RECEIVED = 7, SCTP_STATE_SHUTDOWN_RECEIVED = 6,
SCTP_SHUTDOWN_ACK_SENT = 8, SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
}; } sctp_state_t;
This header was later on also placed into the uapi, so that user space
programs can compile without having <netinet/sctp.h>, but the shipped
with <linux/sctp.h> instead.
While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that
sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
nevertheless have a what it appears to be dummy SCTP_EMPTY state from
the very early days.
While it seems to do just nothing, commit 0b8f9e25b0 ("sctp: remove
completely unsed EMPTY state") did the right thing and removed this dead
code. That however, causes an off-by-one when the user asks the SCTP
stack via SCTP_STATUS API and checks for the current socket state thus
yielding possibly undefined behaviour in applications as they expect
the kernel to tell the right thing.
The enumeration had to be changed however as based on the current socket
state, we access a function pointer lookup-table through this. Therefore,
I think the best way to deal with this is just to add a helper function
sctp_assoc_to_state() to encapsulate the off-by-one quirk.
Reported-by: Tristan Su <sooqing@gmail.com>
Fixes: 0b8f9e25b0 ("sctp: remove completely unsed EMPTY state")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit ed98df3361 ("net: use __GFP_NORETRY for high order
allocations") we tried to address one issue caused by order-3
allocations.
We still observe high latencies and system overhead in situations where
compaction is not successful.
Instead of trying order-3, order-2, and order-1, do a single order-3
best effort and immediately fallback to plain order-0.
This mimics slub strategy to fallback to slab min order if the high
order allocation used for performance failed.
Order-3 allocations give a performance boost only if they can be done
without recurring and expensive memory scan.
Quoting David :
The page allocator relies on synchronous (sync light) memory compaction
after direct reclaim for allocations that don't retry and deferred
compaction doesn't work with this strategy because the allocation order
is always decreasing from the previous failed attempt.
This means sync light compaction will always be encountered if memory
cannot be defragmented or reclaimed several times during the
skb_page_frag_refill() iteration.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit fc60476761
("ipvs: changes for local real server") from 2.6.37
introduced DNAT support to local real server but the
IPv6 LOCAL_OUT handler ip_vs_local_reply6() is
registered incorrectly as IPv4 hook causing any outgoing
IPv4 traffic to be dropped depending on the IP header values.
Chris tracked down the problem to CONFIG_IP_VS_IPV6=y
Bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349768
Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Tested-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The tunneling method should properly use tunnel encapsulation.
Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
offload is supported.
Thanks to Alex Gartrell for reporting the problem, providing
solution and for all suggestions.
Reported-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
This is follow-up to
da08143b85 ("vlan: more careful checksum features handling")
which introduced more careful feature intersection in vlan code,
taking into account that HW_CSUM should be considered superset
of IP_CSUM/IPV6_CSUM. The same is needed in netif_skb_features()
in order to avoid offloading mismatch warning when vlan is
created on top of a bond consisting of slaves supporting IP/IPv6
checksumming but not vlan Tx offloading.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code manipulating sysfs symlinks on adjacent net_devices(s)
currently doesn't take into account that devices potentially
belong to different namespaces.
This patch trying to fix an issue as follows:
- check for net_ns before creating / deleting symlink.
for now only netdev_adjacent_rename_links and
__netdev_adjacent_dev_remove are affected, afaics
__netdev_adjacent_dev_insert implies both net_devs
belong to the same namespace.
- Drop all existing symlinks to / from all adj_devs before
switching namespace and recreate them just after.
Signed-off-by: Alexander Y. Fomichev <git.user@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use HAVE_JUMP_LABEL as elsewhere in the kernel to ensure
that the toolchain has the required support in addition to
CONFIG_JUMP_LABEL being set.
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The new_ctx pointer is set only for non-chanctx drivers. This yielded a
crash for chanctx-based drivers during channel switch finalization:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
Use an adequate chanctx pointer to fix this.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In SCTP, selection of active (T.ACT) and retransmission (T.RET)
transports is being done whenever transport control operations
(UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
Commits 4c47af4d5e ("net: sctp: rework multihoming retransmission
path selection to rfc4960") and a7288c4dd5 ("net: sctp: improve
sctp_select_active_and_retran_path selection") have both improved
it towards a more fine-grained and optimal path selection.
Currently, the selection algorithm for T.ACT and T.RET is as follows:
1) Elect the two most recently used ACTIVE transports T1, T2 for
T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
T.ACT<-T.PRI and T.RET<-T1
3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI
Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
still slightly suboptimal as it can lead to the following scenario:
Setup:
<A> <B>
T1: p1p1 (10.0.10.10) <==> .'`) <==> p1p1 (10.0.10.12) <= T.PRI
T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)
net.sctp.rto_min = 1000
net.sctp.path_max_retrans = 2
net.sctp.pf_retrans = 0
net.sctp.hb_interval = 1000
T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
link flapping). Here, the first time transmission is sent over PF path
T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
are sent over the INACTIVE path T1 (T.PRI), which is not good.
After the patch, it's choosing better transports in both cases by
modifying step 4):
4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
the most recently used (if avail) in PF, set T.RET<-T.ACT_new
This will still select a best possible path in PF if available (which
can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.
In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
for a very short while before going back ACTIVE, it will guarantee that
this path will be reselected for T.ACT/T.RET since T3 (PF) is not
available.
Previously, this was not possible, as we would only select between T.PRI
and T.RET, and a possible T3 would be NULL due to the fact that we have
just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
and would select a suboptimal path when T.PRI/T.RET have worse properties.
In the case that T.ACT_old permanently went to INACTIVE during this
transition and there's no PF path available, plus T.PRI and T.RET are
INACTIVE as well, we would now camp on T.ACT_old, but if everything is
being INACTIVE there's really not much we can do except hoping for a
successful HB to bring one of the transports back up again and, thus
cause a new selection through sctp_assoc_control_transport().
Now both tests work fine:
Case 1:
1. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
2. T1 S(ACTIVE) T.ACT, T.RET
T2 S(PF)
3. T1 S(ACTIVE) T.ACT, T.RET
T2 S(INACTIVE)
5. T1 S(PF) T.ACT, T.RET
T2 S(INACTIVE)
[ 5.1 T1 S(INACTIVE) T.ACT, T.RET
T2 S(INACTIVE) ]
6. T1 S(ACTIVE) T.ACT, T.RET
T2 S(INACTIVE)
7. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
Case 2:
1. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
2. T1 S(PF)
T2 S(ACTIVE) T.ACT, T.RET
3. T1 S(INACTIVE)
T2 S(ACTIVE) T.ACT, T.RET
5. T1 S(INACTIVE)
T2 S(PF) T.ACT, T.RET
[ 5.1 T1 S(INACTIVE)
T2 S(INACTIVE) T.ACT, T.RET ]
6. T1 S(INACTIVE)
T2 S(ACTIVE) T.ACT, T.RET
7. T1 S(ACTIVE) T.ACT
T2 S(ACTIVE) T.RET
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When both transports are the same, we don't have to go down that
road only to realize that we will return the very same transport.
We are guaranteed that curr is always non-NULL. Therefore, just
short-circuit this special case.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there are multiple vlan headers present in a received frame, the first
one is put into vlan_tci and protocol is set to ETH_P_8021Q. Anything in the
skb beyond the VLAN TPID may be still non-linear, including the inner TCI
and ethertype. While ovs_flow_extract takes care of IP and IPv6 headers, it
does nothing with ETH_P_8021Q. Later, if OVS_ACTION_ATTR_POP_VLAN is
executed, __pop_vlan_tci pulls the next vlan header into vlan_tci.
This leads to two things:
1. Part of the resulting ethernet header is in the non-linear part of the
skb. When eth_type_trans is called later as the result of
OVS_ACTION_ATTR_OUTPUT, kernel BUGs in __skb_pull. Also, __pop_vlan_tci
is in fact accessing random data when it reads past the TPID.
2. network_header points into the ethernet header instead of behind it.
mac_len is set to a wrong value (10), too.
Reported-by: Yulong Pei <ypei@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af_packet can currently overwrite kernel memory by out of bound
accesses, because it assumed a [new] block can always hold one frame.
This is not generally the case, even if most existing tools do it right.
This patch clamps too long frames as API permits, and issue a one time
error on syslog.
[ 394.357639] tpacket_rcv: packet too big, clamped from 5042 to 3966. macoff=82
In this example, packet header tp_snaplen was set to 3966,
and tp_len was set to 5042 (skb->len)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>