Netlink FDB messages are sent in the link netns. The header of these messages
contains the ifindex (ndm_ifindex) of the netdevice, but this ifindex is
unusable in case of x-netns vxlan.
I named the new attribute NDA_NDM_IFINDEX_NETNSID, to avoid confusion with
NDA_IFINDEX.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some further updates for net-next:
* fix network-manager which was broken by the previous changes
* fix delete-station events, which were broken by me making the
genlmsg_end() mistake
* fix a timer left running during suspend in some race conditions
that would cause an annoying (but harmless) warning
* (less important, but in the tree already) remove 80+80 MHz rate
reporting since the spec doesn't distinguish it from 160 MHz;
as the bitrate they're both 160 MHz bandwidth
Signed-off-by: David S. Miller <davem@davemloft.net>
This tc action allows you to retrieve the connection tracking mark
This action has been used heavily by openwrt for a few years now.
There are known limitations currently:
doesn't work for initial packets, since we only query the ct table.
Fine given use case is for returning packets
no implicit defrag.
frags should be rare so fix later..
won't work for more complex tasks, e.g. lookup of other extensions
since we have no means to store results
we still have a 2nd lookup later on via normal conntrack path.
This shouldn't break anything though since skb->nfct isn't altered.
V2:
remove unnecessary braces (Jiri)
change the action identifier to 14 (Jiri)
Fix some stylistic issues caught by checkpatch
V3:
Move module params to bottom (Cong)
Get rid of tcf_hashinfo_init and friends and conform to newer API (Cong)
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new attribute (IFLA_LINK_NETNSID) which contains the 'link'
netns id when this netns is different from the netns where the interface
stands (for example for x-net interfaces like ip tunnels).
With this attribute, it's possible to interpret correctly all advertised
information (like IFLA_LINK, etc.).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch, a user can define an id for a peer netns by providing a FD or a
PID. These ids are local to the netns where it is added (ie valid only into this
netns).
The main function (ie the one exported to other module), peernet2id(), allows to
get the id of a peer netns. If no id has been assigned by the user, this
function allocates one.
These ids will be used in netlink messages to point to a peer netns, for example
in case of a x-netns interface.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This action provides a possibility to exec custom BPF code.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here's a big pile of changes for this round.
We have
* a lot of regulatory code changes to deal with the
way newer Intel devices handle this
* a change to drop packets while disconnecting from
an AP instead of trying to wait for them
* a new attempt at improving the tailroom accounting
to not kick in too much for performance reasons
* improvements in wireless link statistics
* many other small improvements and small fixes that
didn't seem necessary for 3.19 (e.g. in hwsim which
is testing only code)
Conflicts:
drivers/staging/rtl8723au/os_dep/ioctl_cfg80211.c
Minor overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
For some reason, we made the bandwidth separate flags, which
is rather confusing - a single rate cannot have different
bandwidths at the same time.
Change this to no longer be flags but use a separate field
for the bandwidth ('bw') instead.
While at it, add support for 5 and 10 MHz rates - these are
reported as regular legacy rates with their real bitrate,
but tagged as 5/10 now to make it easier to distinguish them.
In the nl80211 API, the flags are preserved, but the code
now can also clearly only set a single one of the flags.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These rates are treated the same as 160 MHz in the spec, so
it makes no sense to distinguish them. As no driver uses them
yet, this is also not a problem, just remove them.
In the userspace API the field remains reserved to preserve
API and ABI.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduces support for the group policy extension to the VXLAN virtual
port. The extension is disabled by default and only enabled if the user
has provided the respective configuration.
ovs-vsctl add-port br0 vxlan0 -- \
set Interface vxlan0 type=vxlan options:exts=gbp
The configuration interface to enable the extension is based on a new
attribute OVS_VXLAN_EXT_GBP nested inside OVS_TUNNEL_ATTR_EXTENSION
which can carry additional extensions as needed in the future.
The group policy metadata is stored as binary blob (struct ovs_vxlan_opts)
internally just like Geneve options but transported as nested Netlink
attributes to user space.
Renames the existing TUNNEL_OPTIONS_PRESENT to TUNNEL_GENEVE_OPT with the
binary value kept intact, a new flag TUNNEL_VXLAN_OPT is introduced.
The attributes OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS and existing
OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS are implemented mutually exclusive.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements supports for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata is mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.
The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.
SELinux allows to manage label to secure local resources. However,
distributed applications require ACLs to implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow to map security contexts to
universal labels. However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.
Host 1: Host 2:
Group A Group B Group B Group A
+-----+ +-------------+ +-------+ +-----+
| lxc | | SELinux CTX | | httpd | | VM |
+--+--+ +--+----------+ +---+---+ +--+--+
\---+---/ \----+---/
| |
+---+---+ +---+---+
| vxlan | | vxlan |
+---+---+ +---+---+
+------------------------------+
Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP frames. The extension is therefore disabled by default
and needs to be specifically enabled:
ip link add [...] type vxlan [...] gbp
In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.
Examples:
iptables:
host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
OVS:
# ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
# ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/xen-netfront.c
Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Don't use uninitialized data in IPVS, from Dan Carpenter.
2) conntrack race fixes from Pablo Neira Ayuso.
3) Fix TX hangs with i40e, from Jesse Brandeburg.
4) Fix budget return from poll calls in dnet and alx, from Eric
Dumazet.
5) Fix bugus "if (unlikely(x) < 0)" test in AF_PACKET, from Christoph
Jaeger.
6) Fix bug introduced by conversion to list_head in TIPC retransmit
code, from Jon Paul Maloy.
7) Don't use GFP_NOIO under spinlock in USB kaweth driver, from Alexey
Khoroshilov.
8) Fix bridge build with INET disabled, from Arnd Bergmann.
9) Fix netlink array overrun for PROBE attributes in openvswitch, from
Thomas Graf.
10) Don't hold spinlock across synchronize_irq() in tg3 driver, from
Prashant Sreedharan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
tg3: Release tp->lock before invoking synchronize_irq()
tg3: tg3_reset_task() needs to use rtnl_lock to synchronize
tg3: tg3_timer() should grab tp->lock before checking for tp->irq_sync
team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin
openvswitch: packet messages need their own probe attribtue
i40e: adds FCoE configure option
cxgb4vf: Fix queue allocation for 40G adapter
netdevice: Add missing parentheses in macro
bridge: only provide proxy ARP when CONFIG_INET is enabled
neighbour: fix base_reachable_time(_ms) not effective immediatly when changed
net: fec: fix MDIO bus assignement for dual fec SoC's
xen-netfront: use different locks for Rx and Tx stats
drivers: net: cpsw: fix multicast flush in dual emac mode
cxgb4vf: Initialize mdio_addr before using it
net: Corrected the comment describing the ndo operations to reflect the actual prototype for couple of operations
usb/kaweth: use GFP_ATOMIC under spin_lock in usb_start_wait_urb()
MAINTAINERS: add me as ibmveth maintainer
tipc: fix bug in broadcast retransmit code
update ip-sysctl.txt documentation (v2)
net/at91_ether: prepare and unprepare clock
...
User space is currently sending a OVS_FLOW_ATTR_PROBE for both flow
and packet messages. This leads to an out-of-bounds access in
ovs_packet_cmd_execute() because OVS_FLOW_ATTR_PROBE >
OVS_PACKET_ATTR_MAX.
Introduce a new OVS_PACKET_ATTR_PROBE with the same numeric value
as OVS_FLOW_ATTR_PROBE to grow the range of accepted packet attributes
while maintaining to be binary compatible with existing OVS binaries.
Fixes: 05da589 ("openvswitch: Add support for OVS_FLOW_ATTR_PROBE.")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tracked-down-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for remote checksum offload in VXLAN. This uses a
reserved bit to indicate that RCO is being done, and uses the low order
reserved eight bits of the VNI to hold the start and offset values in a
compressed manner.
Start is encoded in the low order seven bits of VNI. This is start >> 1
so that the checksum start offset is 0-254 using even values only.
Checksum offset (transport checksum field) is indicated in the high
order bit in the low order byte of the VNI. If the bit is set, the
checksum field is for UDP (so offset = start + 6), else checksum
field is for TCP (so offset = start + 16). Only TCP and UDP are
supported in this implementation.
Remote checksum offload for VXLAN is described in:
https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).
With UDP checksums and Remote Checksum Offload
IPv4
Client
11.84% CPU utilization
Server
12.96% CPU utilization
9197 Mbps
IPv6
Client
12.46% CPU utilization
Server
14.48% CPU utilization
8963 Mbps
With UDP checksums, no remote checksum offload
IPv4
Client
15.67% CPU utilization
Server
14.83% CPU utilization
9094 Mbps
IPv6
Client
16.21% CPU utilization
Server
14.32% CPU utilization
9058 Mbps
No UDP checksums
IPv4
Client
15.03% CPU utilization
Server
23.09% CPU utilization
9089 Mbps
IPv6
Client
16.18% CPU utilization
Server
26.57% CPU utilization
8954 Mbps
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send the netdetect configuration information in the response to
NL8021_CMD_GET_WOWLAN commands. This includes the scan interval,
SSIDs to match and frequencies to scan.
Additionally, add the NL80211_WOWLAN_TRIG_NET_DETECT with
NL80211_ATTR_WOWLAN_TRIGGERS_SUPPORTED.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Patch 3b50d90298 ("ipv6: fix redefinition of in6_pktinfo ...")
fixed a libc compatibility issue in ipv6 structure definitions
as described in include/uapi/linux/libc-compat.h.
It relies on including linux/in6.h to include libc-compat.h itself.
Include that file directly to clearly communicate the dependency
(libc-compat.h: "This include must be as early as possible").
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
As discussed in http://patchwork.ozlabs.org/patch/427384/
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This filter is same as RTEXT_FILTER_BRVLAN except that it tries
to compress the consecutive vlans into ranges.
This helps on systems with large number of configured vlans.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes bridge IFLA_AF_SPEC netlink attribute parser to
look for more than one IFLA_BRIDGE_VLAN_INFO attribute. This allows
userspace to pack more than one vlan in the setlink msg.
The dumps were already sending more than one vlan info in the getlink msg.
This patch also adds bridge_vlan_info flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
BRIDGE_VLAN_INFO_RANGE_END to indicate start and end of vlan range
This patch also deletes unused ifla_br_policy.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both netinet/in.h and linux/ipv6.h define these two structs,
if we include both of them, we got:
/usr/include/linux/ipv6.h:19:8: error: redefinition of ‘struct in6_pktinfo’
struct in6_pktinfo {
^
In file included from /usr/include/arpa/inet.h:22:0,
from txtimestamp.c:33:
/usr/include/netinet/in.h:524:8: note: originally defined here
struct in6_pktinfo
^
In file included from txtimestamp.c:40:0:
/usr/include/linux/ipv6.h:24:8: error: redefinition of ‘struct ip6_mtuinfo’
struct ip6_mtuinfo {
^
In file included from /usr/include/arpa/inet.h:22:0,
from txtimestamp.c:33:
/usr/include/netinet/in.h:531:8: note: originally defined here
struct ip6_mtuinfo
^
So similarly to what we did for in6_addr, we need to sync with
libc header on their definitions.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The base for the current statistics is pretty mixed up, support
exporting RX/TX statistics for MSDUs per TID. This (currently)
covers received MSDUs, transmitted MSDUs and retries/failures
thereof.
Doing it per TID for MSDUs makes more sense than say only per AC
because it's symmetric - we could export per-AC statistics for all
frames (which AC we used for transmission can be determined also
for management frames) but per TID is better and usually data
frames are really the ones we care about. Also, on RX we can't
determine the AC - but we do know the TID for any QoS MPDU we
received.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current statistics we keep aren't very clear, some are on
MPDUs and some on MSDUs/MMPDUs. Clarify the descriptions based
on the counters mac80211 keeps.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add these two values:
* BEACON_RX: number of beacons received from this peer
* BEACON_SIGNAL_AVG: signal strength average for beacons only
These can then be used for Android Lollipop's statistics request.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is really just duplicating the list of information that's
already available in the nl80211 attribute, so remove the list.
Two small changes are needed:
* remove STATION_INFO_ASSOC_REQ_IES complete, but the length
(assoc_req_ies_len) can be used instead
* add NL80211_STA_INFO_RX_DROP_MISC which exists internally
but not in nl80211 yet
This gets rid of the duplicate maintenance of the two lists.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add the time spent scanning to the survey data so it can be
reported by drivers that collect such information.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>