Eric Dumazet wrote:
| Some strange crashes happen in rt6_check_expired(), with access
| to random addresses.
|
| At first glance, it looks like the RTF_EXPIRES and
| stuff added in commit 1716a96101
| (ipv6: fix problem with expired dst cache)
| are racy : same dst could be manipulated at the same time
| on different cpus.
|
| At some point, our stack believes rt->dst.from contains a dst pointer,
| while its really a jiffie value (as rt->dst.expires shares the same area
| of memory)
|
| rt6_update_expires() should be fixed, or am I missing something ?
|
| CC Neil because of https://bugzilla.redhat.com/show_bug.cgi?id=892060
Because we do not have any locks for dst_entry, we cannot change
essential structure in the entry; e.g., we cannot change reference
to other entity.
To fix this issue, split 'from' and 'expires' field in dst_entry
out of union. Once it is 'from' is assigned in the constructor,
keep the reference until the very last stage of the life time of
the object.
Of course, it is unsafe to change 'from', so make rt6_set_from simple
just for fresh entries.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Neil Horman <nhorman@tuxdriver.com>
CC: Gao Feng <gaofeng@cn.fujitsu.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reported-by: Steinar H. Gunderson <sesse@google.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6431cbc25f(Create a mechanism for upward inetpeer propagation into routes)
introduces these codes, but this mechanism is never enabled since
rt6i_peer_genid always is zero whether it is not assigned or assigned by
rt6_peer_genid(). After 5943634fc5 (ipv4: Maintain redirect and PMTU info
in struct rtable again), the ipv4 related codes of this mechanism has been
removed, I think we maybe able to remove them now.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Eric, we could introduce a helper function
for ipv6 too, to avoid checking if rt is NULL before
dst_release().
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each nexthop is added like a single route in the routing table. All routes
that have the same metric/weight and destination but not the same gateway
are considering as ECMP routes. They are linked together, through a list called
rt6i_siblings.
ECMP routes can be added in one shot, with RTA_MULTIPATH attribute or one after
the other (in both case, the flag NLM_F_EXCL should not be set).
The patch is based on a previous work from
Luc Saillard <luc.saillard@6wind.com>.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 dst should take care of rt_genid too. When a xfrm policy is inserted or
deleted, all dst should be invalidated.
To force the validation, dst entries should be created with ->obsolete set to
DST_OBSOLETE_FORCE_CHK. This was already the case for all functions calling
ip6_dst_alloc(), except for ip6_rt_copy().
As a consequence, we can remove the specific code in inet6_connection_sock.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When adding a blackhole or a prohibit route, they were handling like classic
routes. Moreover, it was only possible to add this kind of routes by specifying
an interface.
Bug already reported here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498
Before the patch:
$ ip route add blackhole 2001::1/128
RTNETLINK answers: No such device
$ ip route add blackhole 2001::1/128 dev eth0
$ ip -6 route | grep 2001
2001::1 dev eth0 metric 1024
After:
$ ip route add blackhole 2001::1/128
$ ip -6 route | grep 2001
blackhole 2001::1 dev lo metric 1024 error -22
v2: wrong patch
v3: add a field fc_type in struct fib6_config to store RTN_* type
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes for a simplified conversion away from dst_get_neighbour*().
All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv6/route.c
Pull in 'net' again to get the revert of Thomas's change
which introduced regressions.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2a0c451ade.
It causes crashes, because now ip6_null_entry is used before
it is initialized.
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv6/route.c
This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.
Signed-off-by: David S. Miller <davem@davemloft.net>
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc
handler is installed in ip6_route_net_init() whereas fib_table_hash is
allocated in fib6_net_init() _after_ the proc handler has been installed.
This opens up a short time frame to access fib_table_hash with its pants
down.
fib6_init() as a whole can't be moved to an earlier position as it also
registers the rtnetlink message handlers which should be registered at
the end. Therefore split it into fib6_init() which is run early and
fib6_init_late() to register the rtnetlink message handlers.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We encode the pointer(s) into an unsigned long with one state bit.
The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.
Later the peer roots will be per-FIB table, and this change works to
facilitate that.
Signed-off-by: David S. Miller <davem@davemloft.net>
Functionally, this change is a NOP.
Semantically, rt6_clean_expires() wants to do rt->dst.from = NULL instead of
rt->dst.expires = 0. It is clearing the RTF_EXPIRES flag, so the union is going
to be treated as a pointer (dst.from) not a long (dst.expires).
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1716a961 (ipv6: fix problem with expired dst cache) broke PMTU
discovery. rt6_update_expires() calls dst_set_expires(), which only updates
dst->expires if it has not been set previously (expires == 0) or if the new
expires is earlier than the current dst->expires.
rt6_update_expires() needs to zero rt->dst.expires, otherwise it will contain
ivalid data left over from rt->dst.from and will confuse dst_set_expires().
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet.
this dst cache will not check expire because it has no RTF_EXPIRES flag.
So this dst cache will always be used until the dst gc run.
Change the struct dst_entry,add a union contains new pointer from and expires.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use.
we can use this field to point to where the dst cache copy from.
The dst.from is only used in IPV6.
rt6_check_expired check if rt6_info.dst.from is expired.
ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.
ip6_dst_destroy release the ort.
Add some functions to operate the RTF_EXPIRES flag and expires(from) together.
and change the code to use these new adding functions.
Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use new adding functions.
Only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During some debugging I needed to look into how /proc/net/ipv6_route
operated and in my digging I found its calling fib6_clean_all() which uses
"write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
found this on 2.6.32, but reading the code I believe the same basic idea
exists currently. Looking at the rtnetlink code they are only calling
"read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
reading from proc isn't the recommended way of fetching the ipv6 route
table; taking a write lock seems unnecessary and would probably cause
network performance issues.
To verify this I loaded up the ipv6 route table and then ran iperf in 3
cases:
* doing nothing
* reading ipv6 route table via proc
(while :; do cat /proc/net/ipv6_route > /dev/null; done)
* reading ipv6 route table via rtnetlink
(while :; do ip -6 route show table all > /dev/null; done)
* Load the ipv6 route table up with:
* for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
* iperf commands:
* client: iperf -i 1 -V -c <ipv6 addr>
* server: iperf -V -s
* iperf results - 3 runs each (in Mbits/sec)
* nothing: client: 927,927,927 server: 927,927,927
* proc: client: 179,97,96,113 server: 142,112,133
* iproute: client: 928,927,928 server: 927,927,927
lock_stat shows taking the write lock is causing the slowdown. Using this
info I decided to write a version of fib6_clean_all() which replaces
write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
this new function I see the same results as with my rtnetlink iperf test.
Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.
And it makes grepping for dst_entry member uses much harder too.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just makes it harder to see 1) what the code is doing
and 2) grep for all users of dst{->,.}neighbour
Signed-off-by: David S. Miller <davem@davemloft.net>
These header files are never installed to user consumption, so any
__KERNEL__ cpp checks are superfluous.
Projects should also not copy these files into their userland utility
sources and try to use them there. If they insist on doing so, the
onus is on them to sanitize the headers as needed.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ipv6] Add support for RTA_PREFSRC
This patch allows a user to select the preferred source address
for a specific IPv6-Route. It can be set via a netlink message
setting RTA_PREFSRC to a valid IPv6 address which must be
up on the device the route will be bound to.
Signed-off-by: Daniel Walter <dwalter@barracuda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>