This patch releases the lock on the state before calling
x->type->input. It also adds the lock to the spots where they're
currently needed.
Most of those places (all except mip6) are expected to disappear with
async crypto.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the moving out of the replay processing on the output, this
patch moves the integrity stat collectin from x->type->input into
xfrm_input.
This would eventually allow transforms such as AH/ESP to be lockless.
The error value EBADMSG (currently unused in the crypto layer) is used
to indicate a failed integrity check. In future this error can be
directly returned by the crypto layer once we switch to aead
algorithms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the work on asynchronous cryptographic operations, we need
to be able to resume from the spot where they occur. As such, it
helps if we isolate them to one spot.
This patch moves most of the remaining family-specific processing into
the common input code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur. As such, it
helps if we isolate them to one spot.
This patch moves most of the remaining family-specific processing into
the common output code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so. They also share the same output
function dst_output.
This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.
Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
With inter-family transforms the inner mode differs from the outer
mode. Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.
This patch separates the two parts on the input path so that each
function deals with one family only.
In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb. This is then used by the inner mode
input functions to modify the inner IP header. In this way the input
function no longer has to know about the outer address family.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
With inter-family transforms the inner mode differs from the outer
mode. Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.
This patch separates the two parts on the output path so that each
function deals with one family only.
In particular, the functions xfrm4_extract_output/xfrm6_extract_output
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb. This is then used by the outer mode
output functions to write the outer IP header. In this way the output
function no longer has to know about the inner address family.
Since the extract functions are only called by tunnel modes (the only
modes that can support inter-family transforms), I've also moved the
xfrm*_tunnel_check_size calls into them. This allows the correct ICMP
message to be sent as opposed to now where you might call icmp_send
with an IPv6 packet and vice versa.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
that they directly take the outer DSCP rather than the outer IP header.
This will help us to unify the code for inter-family tunnels.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
While BEET can theoretically work with IPComp the current code can't
do that because it tries to construct a BEET mode tunnel type which
doesn't (and cannot) exist. In fact as it is it won't even attach a
tunnel object at all for BEET which is bogus.
To support this fully we'd also need to change the policy checks on
input to recognise a plain tunnel as a legal variant of an optional
BEET transform.
This patch simply fails such constructions for now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common. This patch extracts that logic and puts it into
xfrm_bundle_create. The rest of it are then accessed through afinfo.
As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.
This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function. It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.
This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we took the device from the bottom route and idev from the
top route. This is bad because idev may well point to a different
device. This patch changes it so that we get the idev from the device
directly.
It also makes it an error if either dev or idev is NULL. This is
consistent with the rest of the routing code which also treats these
cases as errors.
I've removed the err initialisation in xfrm6_policy.c because it
achieves no purpose and hid a bug when an initial version of this
patch neglected to set err to -ENODEV (fortunately the IPv4 version
warned about it).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input function should never be invoked on IPsec dst objects. This
is because we don't apply IPsec on input until after we've made the
routing decision.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The neighbour field is only used by dst_confirm which only ever happens on
the top-most xfrm dst. So it's a waste to duplicate for every other xfrm
dst. This patch moves its setting out of the loop so that only the top one
gets set.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a number of copies of dst_discard scattered around the place
which all do the same thing, namely free a packet on the input or
output paths.
This patch deletes all of them except dst_discard and points all the
users to it.
The only non-trivial bit is decnet where it returns an error.
However, conceptually this is identical to the blackhole functions
used in IPv4 and IPv6 which do not return errors. So they should
either all return errors or all return zero. For now I've stuck with
the majority and picked zero as the return value.
It doesn't really matter in practice since few if any driver would
react differently depending on a zero return value or NET_RX_DROP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dst member nfheader_len is only used by IPv6. It's also currently
creating a rather ugly alignment hole in struct dst. Therefore this patch
moves it from there into struct rt6_info.
It also reorders the fields in rt6_info to minimize holes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many-many code in the kernel initialized the timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.
The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.
The next step will be to add a way to configure the scope for an IPoIB
interface.
Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As it is ip_append_data only counts page fragments to the skb that
allocated it. As such it means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it while
all subsequent skb's that use the same page gets away with no charge
at all.
This bug was exposed by the UDP accounting patch.
[ The wmem_alloc bumping needs to be moved with the truesize,
noticed by Takahiro Yasui. -DaveM ]
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit "96793b482540f3a26e2188eaf75cb56b7829d3e3" (Add ICMPMsgStats
MIB (RFC 4293)) made a mistake.
In that patch, David L added a icmp_out_count() in
ip_push_pending_frames(), remove icmp_out_count() from
icmp_reply(). But he forgot to remove icmp_out_count() from
icmp_send() too. Since icmp_send and icmp_reply will call
icmp_push_reply, which will call ip_push_pending_frames, a duplicated
increment happened in icmp_send.
This patch remove the icmp_out_count from icmp_send too.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed "ip route list" was slower than "cat /proc/net/route" on a
machine with a full Internet routing table (214392 entries : Special
thanks to Robert ;) )
This is similar to problem reported in commit
d8c9283089 ("[IPV4] ROUTE: ip_rt_dump()
is unecessary slow")
Fix is to avoid scanning the begining of fz_hash table, but directly
seek to the right offset.
Before patch :
time ip route >/tmp/ROUTE
real 0m1.285s
user 0m0.712s
sys 0m0.436s
After patch
# time ip route >/tmp/ROUTE
real 0m0.835s
user 0m0.692s
sys 0m0.124s
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>