Rename skb_dst_set_noref to __skb_dst_set_noref and
add force flag as suggested by David Miller. The new wrapper
skb_dst_set_noref_force will force dst entries that are not
cached to be attached as skb dst without taking reference
as long as provided dst is reclaimed after RCU grace period.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
Tokenring support was deleted in v3.5. One last holdout of the macro
CONFIG_TR escaped that fate. Until now.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 40893fd(net: switch to use skb_probe_transport_header())
involes a new error accidently. When NET_SKBUFF_DATA_USES_OFFSE is
not enabled, below compile error happens:
CC net/packet/af_packet.o
net/packet/af_packet.c: In function ‘packet_sendmsg_spkt’:
net/packet/af_packet.c:1516:2: error: implicit declaration of function ‘skb_probe_transport_header’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[2]: *** [net/packet/af_packet.o] Error 1
make[1]: *** [net/packet] Error 2
make: *** [net] Error 2
As it seems skb_probe_transport_header() is not related to
NET_SKBUFF_DATA_USES_OFFSE, we should move the definition of
skb_probe_transport_header() out of scope of
NET_SKBUFF_DATA_USES_OFFSE macro.
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
include/net/ipip.h
The changes made to ipip.h in 'net' were already included
in 'net-next' before that header was moved to another location.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes, we need probe and set the transport header for packets (e.g from
untrusted source). This patch introduces a new helper
skb_probe_transport_header() which tries to probe and set the l4 header through
skb_flow_dissect(), if not just set the transport header to the hint passed by
caller.
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Always increment IPV4 ID field in encapsulated GSO packets, even
when DF is set. Regression fix from Pravin B Shelar.
2) Fix per-net subsystem initialization in netfilter conntrack,
otherwise we may access dynamically allocated memory before it is
actually allocated. From Gao Feng.
3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka.
4) Fix race between submission of sync vs async commands in mwifiex
driver, from Amitkumar Karwar.
5) Add missing cancel of command timer in mwifiex driver, from Bing
Zhao.
6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna.
7) Thermal layer tries to use a genetlink multicast string that is
longer than the 16 character limit. Fix it and add a BUG check to
prevent this kind of thing from happening in the future.
From Masatake YAMATO.
8) Fix many bugs in the handling of the teardown of L2TP connections,
UDP encapsulation instances, and sockets. From Tom Parkin.
9) Missing socket release in IRDA, from Kees Cook.
10) Fix fec driver modular build, from Fabio Estevam.
11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop,
from Wei Yongjun.
12) Fix bugs in handling of queue numbers and steering rules in mlx4
driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz.
13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey
Vagin.
14) TCP segmentation deferral is unintentionally done too strongly,
breaking ACK clocking. Fix from Eric Dumazet.
15) net_enable_timestamp() can legitimately be invoked from software
interrupts, and in a way that is safe, so remove the WARN_ON().
Also from Eric Dumazet.
16) Fix use after free in VLANs, from Cong Wang.
17) Fix TCP slow start retransmit storms after SACK reneging, from
Yuchung Cheng.
18) Unix socket release should mark a socket dead before NULL'ing out
sock->sk, otherwise we can race. Fix from Paul Moore.
19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo.
20) Fix register mis-programming, NULL pointer derefs, and wrong PHC
clock frequency in IGB driver. From Lior LevyAlex Williamson, Jiri
Benc, and Jeff Kirsher.
21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet
forwarding. Fix from Veaceslav Falico.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
ipv4: Fix ip-header identification for gso packets.
bonding: remove already created master sysfs link on failure
af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
pch_gbe: fix ip_summed checksum reporting on rx
igb: fix PHC stopping on max freq
igb: make sensor info static
igb: SR-IOV init reordering
igb: Fix null pointer dereference
igb: fix i350 anti spoofing config
ixgbevf: don't release the soft entries
ipv6: fix bad free of addrconf_init_net
unix: fix a race condition in unix_release()
tcp: undo spurious timeout after SACK reneging
bnx2x: fix assignment of signed expression to unsigned variable
bridge: fix crash when set mac address of br interface
8021q: fix a potential use-after-free
net: remove a WARN_ON() in net_enable_timestamp()
tcp: preserve ACK clocking in TSO
net: fix *_DIAG_MAX constants
net/mlx4_core: Disallow releasing VF QPs which have steering rules
...
Inspection of upper layer protocol is considered harmful, especially
if it is about ARP or other stateful upper layer protocol; driver
cannot (and should not) have full state of them.
IPv4 over Firewire module used to inspect ARP (both in sending path
and in receiving path), and record peer's GUID, max packet size, max
speed and fifo address. This patch removes such inspection by extending
our "hardware address" definition to include other information as well:
max packet size, max speed and fifo. By doing this, The neighbour
module in networking subsystem can cache them.
Note: As we have started ignoring sspd and max_rec in ARP/NDP, those
information will not be used in the driver when sending.
When a packet is being sent, the IP layer fills our pseudo header with
the extended "hardware address", including GUID and fifo. The driver
can look-up node-id (the real but rather volatile low-level address)
by GUID, and then the module can send the packet to the wire using
parameters provided in the extendedn hardware address.
This approach is realistic because IP over IEEE1394 (RFC2734) and IPv6
over IEEE1394 (RFC3146) share same "hardware address" format
in their address resolution protocols.
Here, extended "hardware address" is defined as follows:
union fwnet_hwaddr {
u8 u[16];
struct {
__be64 uniq_id; /* EUI-64 */
u8 max_rec; /* max packet size */
u8 sspd; /* max speed */
__be16 fifo_hi; /* hi 16bits of FIFO addr */
__be32 fifo_lo; /* lo 32bits of FIFO addr */
} __packed uc;
};
Note that Hardware address is declared as union, so that we can map full
IP address into this, when implementing MCAP (Multicast Cannel Allocation
Protocol) for IPv6, but IP and ARP subsystem do not need to know this
format in detail.
One difference between original ARP (RFC826) and 1394 ARP (RFC2734)
is that 1394 ARP Request/Reply do not contain the target hardware address
field (aka ar$tha). This difference is handled in the ARP subsystem.
CC: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARM SoC bug fixes from Arnd Bergmann:
"Four patches for arm-soc this week:
- Kevin Hilman is no longer reachable under his previous email
address. He submitted the patch earlier, but nobody felt
responsible to pick it up.
- One Tegra fix for an incorect register address in device tree.
- IMX multiplatform support exposes a configuration option that leads
to unbootable kernels on all other machines and that needs to
depend on that platform.
- A nontrivial bug fix for the setup of the mxs video output."
* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: update email address for Kevin Hilman
ARM: tegra: fix register address of slink controller
ARM: imx: add dependency check for DEBUG_IMX_UART_PORT
ARM: video: mxs: Fix mxsfb misconfiguring VDCTRL0
Pull NVMe driver update from Matthew Wilcox:
"These patches have mostly been baking for a few months; sorry I didn't
get them in during the merge window. They're all bug fixes, except
for the addition of the SMART log and the addition to MAINTAINERS."
* git://git.infradead.org/users/willy/linux-nvme:
NVMe: Add namespaces with no LBA range feature
MAINTAINERS: Add entry for the NVMe driver
NVMe: Initialize iod nents to 0
NVMe: Define SMART log
NVMe: Add result to nvme_get_features
NVMe: Set result from user admin command
NVMe: End queued bio requests when freeing queue
NVMe: Free cmdid on nvme_submit_bio error
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mqueue: sys_mq_open: do not call mnt_drop_write() if read-only
mm/hotplug: only free wait_table if it's allocated by vmalloc
dma-debug: update DMA debug API to better handle multiple mappings of a buffer
dma-debug: fix locking bug in check_unmap()
drivers/rtc/rtc-at91rm9200.c: use a variable for storing IMR
drivers/video/ep93xx-fb.c: include <linux/io.h> for devm_ioremap()
drivers/rtc/rtc-da9052.c: fix for rtc device registration
mm: zone_end_pfn is too small
poweroff: change orderly_poweroff() to use schedule_work()
mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting
printk: Provide a wake_up_klogd() off-case
irq_work.h: fix warning when CONFIG_IRQ_WORK=n
wake_up_klogd() is useless when CONFIG_PRINTK=n because neither printk()
nor printk_sched() are in use and there are actually no waiter on
log_wait waitqueue. It should be a stub in this case for users like
bust_spinlocks().
Otherwise this results in this warning when CONFIG_PRINTK=n and
CONFIG_IRQ_WORK=n:
kernel/built-in.o In function `wake_up_klogd':
(.text.wake_up_klogd+0xb4): undefined reference to `irq_work_queue'
To fix this, provide an off-case for wake_up_klogd() when
CONFIG_PRINTK=n.
There is much more from console_unlock() and other console related code
in printk.c that should be moved under CONFIG_PRINTK. But for now,
focus on a minimal fix as we passed the merged window already.
[akpm@linux-foundation.org: include printk.h in bust_spinlocks.c]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reported-by: James Hogan <james.hogan@imgtec.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull KVM fix from Marcelo Tosatti:
"Fix compilation on PPC with !CONFIG_KVM"
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
Revert "KVM: allow host header to be included even for !CONFIG_KVM"
Pull USB fixes from Greg Kroah-Hartman:
"Here are a number of USB fixes that resolve issues that have been
reported against 3.9-rc3."
* tag 'usb-3.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (37 commits)
USB: ti_usb_3410_5052: fix use-after-free in TIOCMIWAIT
USB: ssu100: fix use-after-free in TIOCMIWAIT
USB: spcp8x5: fix use-after-free in TIOCMIWAIT
USB: quatech2: fix use-after-free in TIOCMIWAIT
USB: pl2303: fix use-after-free in TIOCMIWAIT
USB: oti6858: fix use-after-free in TIOCMIWAIT
USB: mos7840: fix use-after-free in TIOCMIWAIT
USB: mos7840: fix broken TIOCMIWAIT
USB: mct_u232: fix use-after-free in TIOCMIWAIT
USB: io_ti: fix use-after-free in TIOCMIWAIT
USB: io_edgeport: fix use-after-free in TIOCMIWAIT
USB: ftdi_sio: fix use-after-free in TIOCMIWAIT
USB: f81232: fix use-after-free in TIOCMIWAIT
USB: cypress_m8: fix use-after-free in TIOCMIWAIT
USB: ch341: fix use-after-free in TIOCMIWAIT
USB: ark3116: fix use-after-free in TIOCMIWAIT
USB: serial: add modem-status-change wait queue
USB: serial: fix interface refcounting
USB: io_ti: fix get_icount for two port adapters
USB: garmin_gps: fix memory leak on disconnect
...
Pull EDAC fixes from Borislav Petkov:
"A fix from Mauro to correct csrow size accounting in sysfs and a
sparse fix from Stephen Hemminger."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Merge mci.mem_is_per_rank with mci.csbased
amd64_edac: Correct DIMM sizes
EDAC: Make sysfs functions static
Pull to get the thermal netlink multicast group name fix, otherwise
the assertion added in net-next to netlink to detect that kind of bug
makes systems unbootable for some folks.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the Marvell MV643XX ethernet driver to use the
Marvell Orion MDIO driver. As a result, PowerPC and ARM platforms
registering the Marvell MV643XX ethernet driver are also updated to
register a Marvell Orion MDIO driver. This driver voluntarily overlaps
with the Marvell Ethernet shared registers because it will use a subset
of this shared register (shared_base + 0x4 to shared_base + 0x84). The
Ethernet driver is also updated to look up for a PHY device using the
Orion MDIO bus driver.
For ARM and PowerPC we register a single instance of the "mvmdio" driver
in the system like it used to be done with the use of the "shared_smi"
platform_data cookie on ARM.
Note that it is safe to register the mvmdio driver only for the "ge00"
instance of the driver because this "ge00" interface is guaranteed to
always be explicitely registered by consumers of
arch/arm/plat-orion/common.c and other instances (ge01, ge10 and ge11)
were all pointing their shared_smi to ge00. For PowerPC the in-tree
Device Tree Source files mention only one MV643XX ethernet MAC instance
so the MDIO bus driver is registered only when id == 0.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add modem-status-change wait queue to struct usb_serial_port that
subdrivers can use to implement TIOCMIWAIT.
Currently subdrivers use a private wait queue which may have been
released when waking up after device disconnected.
Note that we're adding a new wait queue rather than reusing the tty-port
one as we do not want to get woken up at hangup (yet).
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If bpf_jit_enable > 1, then we dump the emitted JIT compiled image
after creation. Currently, only SPARC and PowerPC has similar output
as in the reference implementation on x86_64. Make a small helper
function in order to reduce duplicated code and make the dump output
uniform across architectures x86_64, SPARC, PPC, ARM (e.g. on ARM
flen, pass and proglen are currently not shown, but would be
interesting to know as well), also for future BPF JIT implementations
on other archs.
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Eric Dumazet <eric.dumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements F-RTO (foward RTO recovery):
When the first retransmission after timeout is acknowledged, F-RTO
sends new data instead of old data. If the next ACK acknowledges
some never-retransmitted data, then the timeout was spurious and the
congestion state is reverted. Otherwise if the next ACK selectively
acknowledges the new data, then the timeout was genuine and the
loss recovery continues. This idea applies to recurring timeouts
as well. While F-RTO sends different data during timeout recovery,
it does not (and should not) change the congestion control.
The implementaion follows the three steps of SACK enhanced algorithm
(section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and
3 are in tcp_process_loss(). The basic version is not supported
because SACK enhanced version also works for non-SACK connections.
The new implementation is functionally in parity with the old F-RTO
implementation except the one case where it increases undo events:
In addition to the RFC algorithm, a spurious timeout may be detected
without sending data in step 2, as long as the SACK confirms not
all the original data are dropped. When this happens, the sender
will undo the cwnd and perhaps enter fast recovery instead. This
additional check increases the F-RTO undo events by 5x compared
to the prior implementation on Google Web servers, since the sender
often does not have new data to send for HTTP.
Note F-RTO may detect spurious timeout before Eifel with timestamps
does so.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch series refactor the F-RTO feature (RFC4138/5682).
This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features. It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).
The new code implements newer F-RTO RFC5682 using CA_Loss processing
path. F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determines how many to send
independently. F-RTO only chooses what to send in order to detect
spurious retranmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.
The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation. Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>