Commit Graph

126 Commits

Author SHA1 Message Date
Xin Long
81f6cb3122 ipv6: add ipv6_dev_find()
This is to add an ip_dev_find like function for ipv6, used to find
the dev by saddr.

It will be used by TIPC protocol. So also export it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:19:52 -07:00
Cong Wang
8c0de6e96c ipv6: fix memory leaks on IPV6_ADDRFORM path
IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket
to IPv4, particularly struct ipv6_ac_socklist. Similar to
struct ipv6_mc_socklist, we should just close it on this path.

This bug can be easily reproduced with the following C program:

  #include <stdio.h>
  #include <string.h>
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <arpa/inet.h>

  int main()
  {
    int s, value;
    struct sockaddr_in6 addr;
    struct ipv6_mreq m6;

    s = socket(AF_INET6, SOCK_DGRAM, 0);
    addr.sin6_family = AF_INET6;
    addr.sin6_port = htons(5000);
    inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr);
    connect(s, (struct sockaddr *)&addr, sizeof(addr));

    inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr);
    m6.ipv6mr_interface = 5;
    setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6));

    value = AF_INET;
    setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value));

    close(s);
    return 0;
  }

Reported-by: ch3332xr@gmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30 16:30:55 -07:00
Fernando Gont
b75326c201 ipv6: Honor all IPv6 PIO Valid Lifetime values
RFC4862 5.5.3 e) prevents received Router Advertisements from reducing
the Valid Lifetime of configured addresses to less than two hours, thus
preventing hosts from reacting to the information provided by a router
that has positive knowledge that a prefix has become invalid.

This patch makes hosts honor all Valid Lifetime values, as per
draft-gont-6man-slaac-renum-06, Section 4.2. This is meant to help
mitigate the problem discussed in draft-ietf-v6ops-slaac-renum.

Note: Attacks aiming at disabling an advertised prefix via a Valid
Lifetime of 0 are not really more harmful than other attacks
that can be performed via forged RA messages, such as those
aiming at completely disabling a next-hop router via an RA that
advertises a Router Lifetime of 0, or performing a Denial of
Service (DoS) attack by advertising illegitimate prefixes via
forged PIOs.  In scenarios where RA-based attacks are of concern,
proper mitigations such as RA-Guard [RFC6105] [RFC7113] should
be implemented.

Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-23 12:29:21 -07:00
Alexander Aring
f37c605936 addrconf: add functionality to check on rpl requirements
This patch adds a functionality to addrconf to check on a specific RPL
address configuration. According to RFC 6554:

To detect loops in the SRH, a router MUST determine if the SRH
includes multiple addresses assigned to any interface on that
router. If such addresses appear more than once and are separated by
at least one address not assigned to that router.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:30:57 -07:00
Sven Eckelmann
54e1f08bdd ipv6: Annotate ipv6_addr_is_* bitwise pointer casts
The sparse commit 6002ded74587 ("add a flag to warn on casts to/from
bitwise pointers") introduced a check for non-direct casts from/to
restricted datatypes (when -Wbitwise-pointer is enabled).

This triggered a warning in the 64 bit optimized ipv6_addr_is_*() functions
because sparse doesn't know that the buffer already points to some data in
the correct bitwise integer format. But these were correct and can
therefore be marked with __force to signalize sparse an intended cast to a
specific bitwise type.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16 16:09:44 -08:00
David S. Miller
3289086210 ipv6: Make ipv6_mc_may_pull() return bool.
Consistent with how pskb_may_pull() also now does so.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-07 09:37:27 -04:00
Stefano Brivio
3a7ef457e8 ipv6: Fix return value of ipv6_mc_may_pull() for malformed packets
Commit ba5ea61462 ("bridge: simplify ip_mc_check_igmp() and
ipv6_mc_check_mld() calls") replaces direct calls to pskb_may_pull()
in br_ipv6_multicast_mld2_report() with calls to ipv6_mc_may_pull(),
that returns -EINVAL on buffers too short to be valid IPv6 packets,
while maintaining the previous handling of the return code.

This leads to the direct opposite of the intended effect: if the
packet is malformed, -EINVAL evaluates as true, and we'll happily
proceed with the processing.

Return 0 if the packet is too short, in the same way as this was
fixed for IPv4 by commit 083b78a9ed ("ip: fix ip_mc_may_pull()
return value").

I don't have a reproducer for this, unlike the one referred to by
the IPv4 commit, but this is clearly broken.

Fixes: ba5ea61462 ("bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 17:19:46 -07:00
Stephen Suryaputra
e1ae5c2ea4 vrf: Increment Icmp6InMsgs on the original netdev
Get the ingress interface and increment ICMP counters based on that
instead of skb->dev when the the dev is a VRF device.

This is a follow up on the following message:
https://www.spinics.net/lists/netdev/msg560268.html

v2: Avoid changing skb->dev since it has unintended effect for local
    delivery (David Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-12 11:00:11 -07:00
David Ahern
3616d08bcb ipv6: Move ipv6 stubs to a separate header file
The number of stubs is growing and has nothing to do with addrconf.
Move the definition of the stubs to a separate header file and update
users. In the move, drop the vxlan specific comment before ipv6_stub.

Code move only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 10:53:45 -07:00
David Ahern
6d3d07b45c ipv6: Refactor fib6_ignore_linkdown
fib6_ignore_linkdown takes a fib6_info but only looks at the net_device
and its IPv6 config. Change it to take a net_device over a fib6_info as
its input argument.

In addition, move it to a header file to make the check inline and usable
later with IPv4 code without going through the ipv6 stub, and rename to
ip6_ignore_linkdown since it is only checking the setting based on the
ipv6 struct on a device.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-29 10:48:04 -07:00
Peter Oskolkov
9b0a6a9dba ipv6_stub: add ipv6_route_input stub/proxy.
Proxy ip6_route_input via ipv6_stub, for later use by lwt bpf ip encap
(see the next patch in the patchset).

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-13 18:27:55 -08:00
Linus Lüssing
4b3087c7e3 bridge: Snoop Multicast Router Advertisements
When multiple multicast routers are present in a broadcast domain then
only one of them will be detectable via IGMP/MLD query snooping. The
multicast router with the lowest IP address will become the selected and
active querier while all other multicast routers will then refrain from
sending queries.

To detect such rather silent multicast routers, too, RFC4286
("Multicast Router Discovery") provides a standardized protocol to
detect multicast routers for multicast snooping switches.

This patch implements the necessary MRD Advertisement message parsing
and after successful processing adds such routers to the internal
multicast router list.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:09 -08:00
Linus Lüssing
ba5ea61462 bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls
This patch refactors ip_mc_check_igmp(), ipv6_mc_check_mld() and
their callers (more precisely, the Linux bridge) to not rely on
the skb_trimmed parameter anymore.

An skb with its tail trimmed to the IP packet length was initially
introduced for the following three reasons:

1) To be able to verify the ICMPv6 checksum.
2) To be able to distinguish the version of an IGMP or MLD query.
   They are distinguishable only by their size.
3) To avoid parsing data for an IGMPv3 or MLDv2 report that is
   beyond the IP packet but still within the skb.

The first case still uses a cloned and potentially trimmed skb to
verfiy. However, there is no need to propagate it to the caller.
For the second and third case explicit IP packet length checks were
added.

This hopefully makes ip_mc_check_igmp() and ipv6_mc_check_mld() easier
to read and verfiy, as well as easier to use.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-22 17:18:08 -08:00
Jeff Barnhill
2384d02520 net/ipv6: Add anycast addresses to a global hashtable
icmp6_send() function is expensive on systems with a large number of
interfaces. Every time it’s called, it has to verify that the source
address does not correspond to an existing anycast address by looping
through every device and every anycast address on the device.  This can
result in significant delays for a CPU when there are a large number of
neighbors and ND timers are frequently timing out and calling
neigh_invalidate().

Add anycast addresses to a global hashtable to allow quick searching for
matching anycast addresses.  This is based on inet6_addr_lst in addrconf.c.

Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-02 23:54:56 -07:00
Joe Stringer
8a615c6b03 bpf: Allow sk_lookup with IPv6 module
This is a more complete fix than d71019b54b ("net: core: Fix build
with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
module is loaded (not just if it's compiled in).

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15 16:08:39 -07:00
Martin KaFai Lau
2dbb9b9e6d bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT
This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select
a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY.  Like other
non SK_FILTER/CGROUP_SKB program, it requires CAP_SYS_ADMIN.

BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern"
to store the bpf context instead of using the skb->cb[48].

At the SO_REUSEPORT sk lookup time, it is in the middle of transiting
from a lower layer (ipv4/ipv6) to a upper layer (udp/tcp).  At this
point,  it is not always clear where the bpf context can be appended
in the skb->cb[48] to avoid saving-and-restoring cb[].  Even putting
aside the difference between ipv4-vs-ipv6 and udp-vs-tcp.  It is not
clear if the lower layer is only ipv4 and ipv6 in the future and
will it not touch the cb[] again before transiting to the upper
layer.

For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB
instead of IP[6]CB and it may still modify the cb[] after calling
the udp[46]_lib_lookup_skb().  Because of the above reason, if
sk->cb is used for the bpf ctx, saving-and-restoring is needed
and likely the whole 48 bytes cb[] has to be saved and restored.

Instead of saving, setting and restoring the cb[], this patch opts
to create a new "struct sk_reuseport_kern" and setting the needed
values in there.

The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern|md)"
will serve all ipv4/ipv6 + udp/tcp combinations.  There is no protocol
specific usage at this point and it is also inline with the current
sock_reuseport.c implementation (i.e. no protocol specific requirement).

In "struct sk_reuseport_md", this patch exposes data/data_end/len
with semantic similar to other existing usages.  Together
with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()",
the bpf prog can peek anywhere in the skb.  The "bind_inany" tells
the bpf prog that the reuseport group is bind-ed to a local
INANY address which cannot be learned from skb.

The new "bind_inany" is added to "struct sock_reuseport" which will be
used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order
to avoid repeating the "bind INANY" test on
"sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run.  It can
only be properly initialized when a "sk->sk_reuseport" enabled sk is
adding to a hashtable (i.e. during "reuseport_alloc()" and
"reuseport_add_sock()").

The new "sk_select_reuseport()" is the main helper that the
bpf prog will use to select a SO_REUSEPORT sk.  It is the only function
that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY.  As mentioned in
the earlier patch, the validity of a selected sk is checked in
run time in "sk_select_reuseport()".  Doing the check in
verification time is difficult and inflexible (consider the map-in-map
use case).  The runtime check is to compare the selected sk's reuseport_id
with the reuseport_id that we want.  This helper will return -EXXX if the
selected sk cannot serve the incoming request (e.g. reuseport_id
not match).  The bpf prog can decide if it wants to do SK_DROP as its
discretion.

When the bpf prog returns SK_PASS, the kernel will check if a
valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL").
If it does , it will use the selected sk.  If not, the kernel
will select one from "reuse->socks[]" (as before this patch).

The SK_DROP and SK_PASS handling logic will be in the next patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-11 01:58:46 +02:00
David Ahern
8308f3ff17 net/ipv6: Add support for specifying metric of connected routes
Add support for IFA_RT_PRIORITY to ipv6 addresses.

If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be atomically replaced.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:12:45 -04:00
David Ahern
e6464b8c63 net/ipv6: Convert ipv6_add_addr to struct ifa6_config
Move config parameters for adding an ipv6 address to a struct. struct
names stem from inet6_rtm_newaddr which is the modern handler for
adding an address.

Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
move only; no functional change intended. Mapping of variable changes:

    addr      -->  cfg->pfx
    peer_addr -->  cfg->peer_pfx
    pfxlen    -->  cfg->plen
    flags     -->  cfg->ifa_flags

scope, valid_lft, prefered_lft have the same names within cfg
(with corrected spelling).

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:12:44 -04:00
David Ahern
901731b882 net/ipv6: Add helper to return path MTU based on fib result
Determine path MTU from a FIB lookup result. Logic is based on
ip6_dst_mtu_forward plus lookup of nexthop exception.

Add ip6_dst_mtu_forward to ipv6_stubs to handle access by core
bpf code.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-22 10:51:09 +02:00
David Ahern
65a2022e89 net/ipv6: Add fib lookup stubs for use in bpf helper
Add stubs to retrieve a handle to an IPv6 FIB table, fib6_get_table,
a stub to do a lookup in a specific table, fib6_table_lookup, and
a stub for a full route lookup.

The stubs are needed for core bpf code to handle the case when the
IPv6 module is not builtin.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11 00:10:57 +02:00
Stephen Suryaputra
bdb7cc643f ipv6: Count interface receive statistics on the ingress netdev
The statistics such as InHdrErrors should be counted on the ingress
netdev rather than on the dev from the dst, which is the egress.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 13:39:51 -04:00
Andrey Ignatov
d74bad4e74 bpf: Hooks for sys_connect
== The problem ==

See description of the problem in the initial patch of this patch set.

== The solution ==

The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.

It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.

Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
  significantly;
* there is no use-case for port.

As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.

Support is added for IPv4 and IPv6, for TCP and UDP.

IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.

== Implementation notes ==

The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk->sk_prot->connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk->sk_prot->connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.

bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.

bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-31 02:15:54 +02:00
David Ahern
232378e8db net/ipv6: Change address check to always take a device argument
ipv6_chk_addr_and_flags determines if an address is a local address and
optionally if it is an address on a specific device. For example, it is
called by ip6_route_info_create to determine if a given gateway address
is a local address. The address check currently does not consider L3
domains and as a result does not allow a route to be added in one VRF
if the nexthop points to an address in a second VRF. e.g.,

    $ ip route add 2001:db8:1::/64 vrf r2 via 2001:db8:102::23
    Error: Invalid gateway address.

where 2001:db8:102::23 is an address on an interface in vrf r1.

ipv6_chk_addr_and_flags needs to allow callers to always pass in a device
with a separate argument to not limit the address to the specific device.
The device is used used to determine the L3 domain of interest.

To that end add an argument to skip the device check and update callers
to always pass a device where possible and use the new argument to mean
any address in the domain.

Update a handful of users of ipv6_chk_addr with a NULL dev argument. This
patch handles the change to these callers without adding the domain check.

ip6_validate_gw needs to handle 2 cases - one where the device is given
as part of the nexthop spec and the other where the device is resolved.
There is at least 1 VRF case where deferring the check to only after
the route lookup has resolved the device fails with an unintuitive error
"RTNETLINK answers: No route to host" as opposed to the preferred
"Error: Gateway can not be a local address." The 'no route to host'
error is because of the fallback to a full lookup. The check is done
twice to avoid this error.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-16 11:28:38 -04:00
Florian Westphal
a3fde2addd rtnetlink: ipv6: convert remaining users to rtnl_register_module
convert remaining users of rtnl_register to rtnl_register_module
and un-export rtnl_register.

Requested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 13:35:36 -05:00
Eric Dumazet
27c565ae9d ipv6: remove IN6_ADDR_HSIZE from addrconf.h
IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid
confusion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05 09:17:27 +09:00