Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.
nf_reset() is used in the following cases:
- when passing packets up the the socket layer, at which point we want to
release all netfilter references that might keep modules pinned while
the packet is queued. nf_trace doesn't matter anymore at this point.
- when encapsulating or decapsulating IPsec packets. We want to continue
tracing these packets after IPsec processing.
- when passing packets through virtual network devices. Only devices on
that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
used anymore. Its not entirely clear whether those packets should
be traced after that, however we've always done that.
- when passing packets through virtual network devices that make the
packet cross network namespace boundaries. This is the only cases
where we clearly want to reset nf_trace and is also what the
original patch intended to fix.
Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
if userspace changes lifetime of address, send netlink notification and
call notifier.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coallescing unix stream messages when the application was
expecting us to.
The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.
The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coallescing and then the coallescing failed because
the second message had no credentials.
Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coallesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.
I have tested this and verified that the in the LSB test case mentioned
above that the messages do coallesce now, while the were failing to
coallesce without this change.
Reported-by: Karel Srot <ksrot@redhat.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 14134f6584.
The problem that the above patch was meant to address is that af_unix
messages are not being coallesced because we are sending unnecesarry
credentials. Not sending credentials in maybe_add_creds totally
breaks unconnected unix domain sockets that wish to send credentails
to other sockets.
In practice this break some versions of udev because they receive a
message and the sending uid is bogus so they drop the message.
Reported-by: Sven Joachim <svenjoac@gmx.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a race condition if we try to rmmod bonding and simultaneously add
a bond master through sysfs. In bonding_exit() we first remove the devices
(through rtnl_link_unregister() ) and only after that we remove the sysfs.
If we manage to add a device through sysfs after that the devices were
removed - we'll end up with that device/sysfs structure and with the module
unloaded.
Fix this by first removing the sysfs and only after that calling
rtnl_link_unregister().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices. In
some cases (bond/team) a single address list is synched down
to multiple devices. At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.
Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It would cause no link after suspending or shutdowning when the
nic changes the speed to 10M and connects to a link partner which
forces the speed to 100M.
Check the link partner ability to determine which speed to set.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains netfilter updates for your net tree,
they are:
* Fix missing the skb->trace reset in nf_reset, noticed by Gao Feng
while using the TRACE target with several net namespaces.
* Fix prefix translation in IPv6 NPT if non-multiple of 32 prefixes
are used, from Matthias Schiffer.
* Fix invalid nfacct objects with empty name, they are now rejected
with -EINVAL, spotted by Michael Zintakis, patch from myself.
* A couple of fixes for wrong return values in the error path of
nfnetlink_queue and nf_conntrack, from Wei Yongjun.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Here are some more fixes intended for the 3.9 stream...
Regarding the mac80211 bits, Johannes says:
"I had changed the idle handling to simplify it, but broken the
sequencing of commands, at least for ath9k-htc, one patch restores the
sequence. The other patch fixes a crash Jouni found while stress-testing
the remain-on-channel code, when an item is deleted the work struct can
run twice and crash the second time."
As for the iwlwifi bits, Johannes says:
"The only fix here is to the passive-no-RX firmware regulatory
enforcement driver support code to not drop auth frames in quick
succession, leading to not being able to connect to APs on passive
channels in certain circumstances."
Don't forget the NFC bits, about which Samuel says:
"This time we have:
- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
is physically removed.
- A potential double skb free when the LLCP socket receive queue is full.
- A fix for properly handling multiple and consecutive LLCP connections, and
not trash the socket ack log.
- A build failure for the MEI microread physical layer, now that the MEI bus
APIs have been merged into char-misc-next."
On top of that, Stone Piao provides an mwifiex fix to avoid accessing
beyond the end of a buffer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The bitmask used for the prefix mangling was being calculated
incorrectly, leading to the wrong part of the address being replaced
when the prefix length wasn't a multiple of 32.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull networking fixes from David Miller:
1) Fix VSOCK layer handling of context ID changes, from Reilly Grant.
2) Now that we have a synchronize_net() in netdev_rx_handler_unregister(),
we can't let any call sites hold locks. Unfortunately bonding does,
so we have to drop the rwlock there a little bit earlier, fix from
Veaceslav Falico.
3) MAC address setting loop exits one iteration too early in mlx4
driver, from Yan Burman.
4) Restore ipv6 routes properly upon ifdown/ifup of loopback, from
Balakumaran Kannan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
VSOCK: Handle changes to the VMCI context ID.
net IPv6 : Fix broken IPv6 routing table after loopback down-up
cbq: incorrect processing of high limits
net/mlx4_en: Fix setting initial MAC address
bonding: get netdev_rx_handler_unregister out of locks
Pull regmap fixes from Mark Brown:
"A small collection of fixes. The most important ones are those from
Stephen and Lars-Peter both of which fix cache issues that have been
lurking for a while but not manifesting noticably enough for anyone to
report them."
* tag 'regmap-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: async: Add missing return
regmap: don't corrupt work buffer in _regmap_raw_write()
regmap: cache Fix regcache-rbtree sync
regmap: Initialize `map->debugfs' before regcache
Pull DRM fixes from Dave Airlie:
"Two core fixes, both regressions, along with some intel and some
nouveau fixes for regressions and oopses"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: correctly restore mappings if drm_open fails
drm/nouveau: fix NULL ptr dereference from nv50_disp_intr()
drm/nouveau: fix handling empty channel list in ioctl's
drm: don't unlock in the addfb error paths
drm/i915: Fix build failure
drm/i915: Be sure to turn hsync/vsync back on at crt enable (v2)
drm/i915: duct-tape locking when eDP init fails
Pull MIPS fixes from Ralf Baechle:
"A collection of fixes pretty much across the MIPS code. Even the
change to include/linux/signal.h by David Howells' 2a1486981c ("Fix
breakage in MIPS siginfo handling") should be considered MIPS-specific
as it touches an ifdefed segment that is only relevant to MIPS and
which unfortunately can't be made to go away entirely."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
Fix breakage in MIPS siginfo handling
Revert "MIPS: BCM63XX: Call board_register_device from device_initcall()"
MIPS: BCM63XX: Make nvram checksum failure non fatal
MIPS: Fix code generation for non-DSP capable CPUs
MIPS: Fix inconsistent formatting inside /proc/cpuinfo
MIPS: SEAD3: Enable LL/SC.
MIPS: Get rid of CONFIG_CPU_HAS_LLSC again
MIPS: Add dependencies for HAVE_ARCH_TRANSPARENT_HUGEPAGE
MIPS: VR4133: Fix probe for LL/SC.
MIPS: Fix logic errors in bitops.c
MIPS: Use CONFIG_CPU_MIPSR2 in csum_partial.S
MIPS: compat: Return same error ENOSYS as native for invalid operation.
One locking regression fix, and a couple of other i915 ones.
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm: don't unlock in the addfb error paths
drm/i915: Fix build failure
drm/i915: Be sure to turn hsync/vsync back on at crt enable (v2)
drm/i915: duct-tape locking when eDP init fails
The VMCI context ID of a virtual machine may change at any time. There
is a VMCI event which signals this but datagrams may be processed before
this is handled. It is therefore necessary to be flexible about the
destination context ID of any datagrams received. (It can be assumed to
be correct because it is provided by the hypervisor.) The context ID on
existing sockets should be updated to reflect how the hypervisor is
currently referring to the system.
Signed-off-by: Reilly Grant <grantr@vmware.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 Routing table becomes broken once we do ifdown, ifup of the loopback(lo)
interface. After down-up, routes of other interface's IPv6 addresses through
'lo' are lost.
IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from routing
table. But those removed entries are not being re-created properly when 'lo' is
brought up. So IPv6 addresses of other interfaces becomes unreachable from the
same machine. Also this breaks communication with other machines because of
NDISC packet processing failure.
This patch fixes this issue by reading all interface's IPv6 addresses and adding
them to IPv6 routing table while bringing up 'lo'.
==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$
After applying the patch:
$ route -A inet6
Kernel IPv6 routing
table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$
Signed-off-by: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
currently cbq works incorrectly for limits > 10% real link bandwidth,
and practically does not work for limits > 50% real link bandwidth.
Below are results of experiments taken on 1 Gbit link
In shaper | Actual Result
-----------+---------------
100M | 108 Mbps
200M | 244 Mbps
300M | 412 Mbps
500M | 893 Mbps
This happen because of q->now changes incorrectly in cbq_dequeue():
when it is called before real end of packet transmitting,
L2T is greater than real time delay, q_now gets an extra boost
but never compensate it.
To fix this problem we prevent change of q->now until its synchronization
with real time.
Signed-off-by: Vasily Averin <vvs@openvz.org>
Reviewed-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that msg pointer is set back to error value in case of
MSG_COPY flag is set and desired message to copy wasn't found. This
garantees that msg is either a error pointer or a copy address.
Otherwise the last message in queue will be freed without unlinking from
the queue (which leads to memory corruption) and the dummy allocated
copy won't be released.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 6bbb6d9 "net/mlx4_en: Optimize Rx fast path filter checks" introduced a regression
under which the MAC address read from the card was not converted correctly
(the most significant byte was not handled), fix that.
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>