We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform with this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
First, let's explain the problem.
Suppose you have an ipip interface that stands in the netns foo and its link
part in the netns bar (so the netns bar has an nsid into the netns foo).
Now, you remove the netns bar:
- the bar nsid into the netns foo is removed
- the netns exit method of ipip is called, thus our ipip iface is removed:
=> a netlink message is built in the netns foo to advertise this deletion
=> this netlink message requests an nsid for bar, thus a new nsid is
allocated for bar and never removed.
This patch adds a check in peernet2id() so that an id cannot be allocated for
a netns which is currently destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts
commit 4217291e59 ("netns: don't clear nsid too early on removal").
This is not the right fix, it introduces races.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to hold rtnl lock for fib_rules_unregister()
otherwise the following race could happen:
fib_rules_unregister(): fib_nl_delrule():
... ...
... ops = lookup_rules_ops();
list_del_rcu(&ops->list);
list_for_each_entry(ops->rules) {
fib_rules_cleanup_ops(ops); ...
list_del_rcu(); list_del_rcu();
}
Note, net->rules_mod_lock is actually not needed at all,
either upper layer netns code or rtnl lock guarantees
we are safe.
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On processing cumulative ACKs, the FRTO code was not checking the
SACKed bit, meaning that there could be a spurious FRTO undo on a
cumulative ACK of a previously SACKed skb.
The FRTO code should only consider a cumulative ACK to indicate that
an original/unretransmitted skb is newly ACKed if the skb was not yet
SACKed.
The effect of the spurious FRTO undo would typically be to make the
connection think that all previously-sent packets were in flight when
they really weren't, leading to a stall and an RTO.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: e33099f96d ("tcp: implement RFC5682 F-RTO")
Signed-off-by: David S. Miller <davem@davemloft.net>
xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
bytes each. This slight reduction in the size of packets means a slight loss in
efficiency.
Since c/s 9ecd1a75d, xen-netfront sets gso_max_size to
XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
6c09fa09d) in determining when to split an skb into two is
sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
So the maximum permitted size of an skb is calculated to be
(XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
Instead, there is no need to deviate from the default gso_max_size of 65536 as
this already accommodates the size of the header.
Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
skbs of up to 65160 bytes (45 segments of 1448 bytes each).
Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
it relates to the size of the whole packet, including the header.
Fixes: 9ecd1a75d9 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull dmaengine fixes from Vinod Koul:
"This time we have addition of caps for jz4740 which fixes intentional
warning at boot. Then we have memory leak issues in drivers using
virt-dma by Peter on few drive"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: moxart-dma: Fix memory leak when stopping a running transfer
dmaengine: bcm2835-dma: Fix memory leak when stopping a running transfer
dmaengine: omap-dma: Fix memory leak when terminating running transfer
dmaengine: edma: fix memory leak when terminating running transfers
dmaengine: jz4740: Define capabilities
Pull networking fixes from David Miller:
1) Fix use-after-free with mac80211 RX A-MPDU reorder timer, from
Johannes Berg.
2) iwlwifi leaks memory every module load/unload cycles, fix from Larry
Finger.
3) Need to use for_each_netdev_safe() in rtnl_group_changelink()
otherwise we can crash, from WANG Cong.
4) mlx4 driver does register_netdev() too early in the probe sequence,
from Ido Shamay.
5) Don't allow router discovery hop limit to decrease the interface's
hop limit, from D.S. Ljungmark.
6) tx_packets and tx_bytes improperly accounted for certain classes of
USB network devices, fix from Ben Hutchings.
7) ip{6}mr_rules_init() mistakenly use plain kfree to release the ipmr
tables in the error path, they must instead use ip{6}mr_free_table().
Fix from WANG Cong.
8) cxgb4 doesn't properly quiesce all RX activity before unregistering
the netdevice. Fix from Hariprasad Shenai.
9) Fix hash corruptions in ipvlan driver, from Jiri Benc.
10) nla_memcpy(), like a real memcpy, should fully initialize the
destination buffer, even if the source attribute is smaller. Fix
from Jiri Benc.
11) Fix wrong error code returned from iucv_sock_sendmsg(). We should
use whatever sock_alloc_send_skb() put into 'err'. From Eugene
Crosser.
12) Fix slab object leak on module unload in TIPC, from Ying Xue.
13) Need a READ_ONCE() when reading the cached RX socket route in
tcp_v{4,6}_early_demux(). From Michal Kubecek.
14) Still too many problems with TPC support in the ath9k driver, so
disable it for now. From Felix Fietkau.
15) When in AP mode the rtlwifi driver can leak DMA mappings, fix from
Larry Finger.
16) Missing kzalloc() failure check in gs_usb CAN driver, from Colin Ian
King.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
cxgb4: Fix to dump devlog, even if FW is crashed
cxgb4: Firmware macro changes for fw verison 1.13.32.0
bnx2x: Fix kdump when iommu=on
bnx2x: Fix kdump on 4-port device
mac80211: fix RX A-MPDU session reorder timer deletion
MAINTAINERS: Update Intel Wired Ethernet Driver info
tipc: fix a slab object leak
net/usb/r8152: add device id for Lenovo TP USB 3.0 Ethernet
af_iucv: fix AF_IUCV sendmsg() errno
openvswitch: Return vport module ref before destruction
netlink: pad nla_memcpy dest buffer with zeroes
bonding: Bonding Overriding Configuration logic restored.
ipvlan: fix check for IP addresses in control path
ipvlan: do not use rcu operations for address list
ipvlan: protect against concurrent link removal
ipvlan: fix addr hash list corruption
net: fec: setup right value for mdio hold time
net: tcp6: fix double call of tcp_v6_fill_cb()
cxgb4vf: Fix sparse warnings
netns: don't clear nsid too early on removal
...
Kalle Valo says:
====================
iwlwifi:
* fix a memory leak, we leaked memory each time the module
was loaded.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai says:
====================
cxgb4 FW macro changes for new FW
Fix to dump device log even in the case of firmware crash. Also
incorporates changes for new FW.
This patch series has been created against net tree and includes patches on
cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new Common Code routines to retrieve Firmware Device Log
parameters from PCIE_FW_PF[7]. The firmware initializes its Device Log very
early on and stores the parameters for its location/size in that register.
Using the parameters from the register allows us to access the Firmware
Device Log even when the firmware crashes very early on or we're not
attached to the firmware
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds new macro and few macro changes for fw version 1.13.32.0 also
changes version string in driver to match 1.13.32.0
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg says:
====================
This contains just a single fix for a crash I happened to randomly
run into today during testing. It's clearly been around for a while,
but is pretty hard to trigger, even when I tried explicitly (and
modified the code to make it more likely) it rarely did.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IOMMU fixes from Joerg Roedel:
"This contains fixes for:
- a VT-d issue where hardware domain-ids might be freed while still
in use.
- an ipmmu-vmsa issue where where the device-table was not zero
terminated
- unchecked register access issue in the arm-smmu driver"
* tag 'iommu-fixes-v4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Remove unused variable
iommu: ipmmu-vmsa: Add terminating entry for ipmmu_of_ids
iommu/vt-d: Detach domain *only* from attached iommus
iommu/arm-smmu: fix ARM_SMMU_FEAT_TRANS_OPS condition
Pull lazytime fixes from Ted Ts'o:
"This fixes a problem in the lazy time patches, which can cause
frequently updated inods to never have their timestamps updated.
These changes guarantee that no timestamp on disk will be stale by
more than 24 hours"
* tag 'lazytime_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fs: add dirtytime_expire_seconds sysctl
fs: make sure the timestamps for lazytime inodes eventually get written
Pull nfsd fixes from Bruce Fields:
"Two main issues:
- We found that turning on pNFS by default (when it's configured at
build time) was too aggressive, so we want to switch the default
before the 4.0 release.
- Recent client changes to increase open parallelism uncovered a
serious bug lurking in the server's open code.
Also fix a krb5/selinux regression.
The rest is mainly smaller pNFS fixes"
* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
sunrpc: make debugfs file creation failure non-fatal
nfsd: require an explicit option to enable pNFS
NFSD: Fix bad update of layout in nfsd4_return_file_layout
NFSD: Take care the return value from nfsd4_encode_stateid
NFSD: Printk blocklayout length and offset as format 0x%llx
nfsd: return correct lockowner when there is a race on hash insert
nfsd: return correct openowner when there is a race to put one in the hash
NFSD: Put exports after nfsd4_layout_verify fail
NFSD: Error out when register_shrinker() fail
NFSD: Take care the return value from nfsd4_decode_stateid
NFSD: Check layout type when returning client layouts
NFSD: restore trace event lost in mismerge
Yuval Mintz says:
====================
bnx2x: kdump related fixes
This patch series aims to fix bnx2x driver issues when loading in kdump kernel.
Both issues fixed here would be fatal to the device, requiring full reset of
the system in order to recover, preventing the device from serving its purpose
in the kdump environment.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When IOMM-vtd is active, once main kernel crashes unfinished DMAE transactions
will be blocked, putting the HW in an error state which will cause further
transactions to timeout.
Current employed logic uses wrong macros, causing the first function to be the
only function that cleanups that error state during its probe/load.
This patch allows all the functions to successfully re-load in kdump kernel.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running in a kdump kernel, it's very likely that due to sync. loss with
management firmware the first PCI function to probe and reach the previous
unload flow would decide it can reset the chip and continue onward. While doing
so, it will only close its own Rx port.
On a 4-port device where 2nd port on engine is a 1g-port, the 2nd port would
allow ingress traffic after the chip is reset [assuming it was active on the
first kernel]. This would later cause a HW attention.
This changes driver flow to close both ports' 1g capabilities during the
previous driver unload flow prior to the chip reset.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's an issue with the way the RX A-MPDU reorder timer is
deleted that can cause a kernel crash like this:
* tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
* station is destroyed
* reorder timer fires before ieee80211_free_tid_rx() runs,
accessing the station, thus potentially crashing due to
the use-after-free
The station deletion is protected by synchronize_net(), but
that isn't enough -- ieee80211_free_tid_rx() need not have
run when that returns (it deletes the timer.) We could use
rcu_barrier() instead of synchronize_net(), but that's much
more expensive.
Instead, to fix this, add a field tracking that the session
is being deleted. In this case, the only re-arming of the
timer happens with the reorder spinlock held, so make that
code not rearm it if the session is being deleted and also
delete the timer after setting that field. This ensures the
timer cannot fire after ___ieee80211_stop_rx_ba_session()
returns, which fixes the problem.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Update the git tree info with a recent change in tree names. Also
add our new mailing list created solely for Linux kernel patches
and kernel development, as well as the new patchwork project for
tracking patches. Lastly update the list of "reviewers" since a
couple of developers have moved on to different projects.
Made an update to the section header so that it is more manageable
going forward as we add new drivers.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When remove TIPC module, there is a warning to remind us that a slab
object is leaked like:
root@localhost:~# rmmod tipc
[ 19.056226] =============================================================================
[ 19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close()
[ 19.058736] -----------------------------------------------------------------------------
[ 19.058736]
[ 19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080
[ 19.061915] INFO: Object 0xffff880014668000 @offset=0
[ 19.062717] kmem_cache_destroy TIPC: Slab cache still has objects
This is because the listening socket of TIPC topology server is not
closed before TIPC proto handler is unregistered with proto_unregister().
However, as the socket is closed in tipc_exit_net() which is called by
unregister_pernet_subsys() during unregistering TIPC namespace operation,
the warning can be eliminated if calling unregister_pernet_subsys() is
moved before calling proto_unregister().
Fixes: e05b31f4bf ("tipc: make tipc socket support net namespace")
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>