This patch makes two changes to protect applications from either removing or
tampering with the CIPSOv4 IP option on a socket. The first is the requirement
that applications have the CAP_NET_RAW capability to set an IPOPT_CIPSO option
on a socket; this prevents untrusted applications from setting their own
CIPSOv4 security attributes on the packets they send. The second change is to
SELinux and it prevents applications from setting any IPv4 options when there
is an IPOPT_CIPSO option already present on the socket; this prevents
applications from removing CIPSOv4 security attributes from the packets they
send.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes bug in iptables modules refcounting on compat error way.
As we are getting modules in check_compat_entry_size_and_hooks(), in case of
later error, we should put them all in translate_compat_table(), not in the
compat_copy_entry_from_user() or compat_copy_match_from_user(), as it is now.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Vasily Averin <vvs@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds forgotten compat_flush_offset() call to error way of
translate_compat_table(). May lead to table corruption on the next
compat_do_replace().
Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a number of issues in parsing user-provided table in
translate_table(). Malicious user with CAP_NET_ADMIN may crash system by
passing special-crafted table to the *_tables.
The first issue is that mark_source_chains() function is called before entry
content checks. In case of standard target, mark_source_chains() function
uses t->verdict field in order to determine new position. But the check, that
this field leads no further, than the table end, is in check_entry(), which
is called later, than mark_source_chains().
The second issue, that there is no check that target_offset points inside
entry. If so, *_ITERATE_MATCH macro will follow further, than the entry
ends. As a result, we'll have oops or memory disclosure.
And the third issue, that there is no check that the target is completely
inside entry. Results are the same, as in previous issue.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using H-TCP with a single flow on a 500Mbit connection (or less
actually), alpha can exceed 65000, so alpha needs to be a u32.
Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
Signed-off-by: Doug Leith <doug.leith@nuim.ie>
Signed-off-by: David S. Miller <davem@davemloft.net>
memcpy 4 bytes to address of auto unsigned long variable followed
by comparison with u32 is a bloody bad idea.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
I believe this NET_INC_STATS() call can be replaced by
NET_INC_STATS_BH(), a little bit cheaper.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 32bit compatibility layer has no CAP_NET_ADMIN check in
compat_do_ipt_get_ctl, which for example allows to list the current
iptables rules even without having that capability (the non-compat
version requires it). Other capabilities might be required to exploit
the bug (eg. CAP_NET_RAW to get the nfnetlink socket?), so a plain user
can't exploit it, but a setup actually using the posix capability system
might very well hit such a constellation of granted capabilities.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch limits the amount of time you will defer sending a TSO segment
to less than two clock ticks, or the time between two acks, whichever is
longer.
On slow links, deferring causes significant bursts. See attached plots,
which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue
for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness
causes significant jitter, tends to overflow queues early (bad for short
queues), and makes delay-based congestion control more difficult.
Deferring by a couple clock ticks I believe will have a relatively small
impact on performance.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CIPSO passthrough mapping had a problem when sending categories which
would cause no or incorrect categories to be sent on the wire with a packet.
This patch fixes the problem which was a simple off-by-one bug.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fix several places in the CIPSO code where it was dereferencing fields which
did not have valid pointers by moving those pointer dereferences into code
blocks where the pointers are valid.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Remove (compilation-breaking) debugging messages introduced at early
development stage.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even though the tos field is only a single byte large, the values need to
be converted to net-endian for the checkum update so they are in the
corrent byte position. Also fix incorrect endian annotations.
Reported by Stephane Chazelas <Stephane_Chazelas@yahoo.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
They are not necessarily initialized to zero by the compiler,
for example when using run-time initializers of automatic
on-stack variables.
Noticed by Eric Dumazet and Patrick McHardy.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.
The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.
This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.
With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).
Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL. We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.
The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.
Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).
This patch: Fix the selinux side of things.
This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.
Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
Testing revealed a problem with the NetLabel cache where a cached entry could
be freed while in use by the LSM layer causing an oops and other problems.
This patch fixes that problem by introducing a reference counter to the cache
entry so that it is only freed when it is no longer in use.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>