In order to allow users to invoke netdev_cap_txqueue, it needs to
be moved into netdevice.h header file. While at it, also add kernel
doc header to document the API.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).
This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
I saw the following BUG when ->newlink() fails in rtnl_newlink():
[ 40.240058] kernel BUG at net/core/dev.c:6438!
this is due to free_netdev() is not supposed to be called before
netdev is completely unregistered, therefore it is not correct
to call free_netdev() here, at least for ops->newlink!=NULL case,
many drivers call it in ->destructor so that rtnl_unlock() will
take care of it, we probably don't need to do anything here.
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark functions as static in core/dev.c because they are not used outside
this file.
This eliminates the following warning in core/dev.c:
net/core/dev.c:2806:5: warning: no previous prototype for ‘__dev_queue_xmit’ [-Wmissing-prototypes]
net/core/dev.c:4640:5: warning: no previous prototype for ‘netdev_adjacent_sysfs_add’ [-Wmissing-prototypes]
net/core/dev.c:4650:6: warning: no previous prototype for ‘netdev_adjacent_sysfs_del’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_alloc_send_pskb() & sk_page_frag_refill()
have a loop trying high order allocations to prepare
skb with low number of fragments as this increases performance.
Problem is that under memory pressure/fragmentation, this can
trigger OOM while the intent was only to try the high order
allocations, then fallback to order-0 allocations.
We had various reports from unexpected regressions.
According to David, setting __GFP_NORETRY should be fine,
as the asynchronous compaction is still enabled, and this
will prevent OOM from kicking as in :
CFSClientEventm invoked oom-killer: gfp_mask=0x42d0, order=3, oom_adj=0,
oom_score_adj=0, oom_score_badness=2 (enabled),memcg_scoring=disabled
CFSClientEventm
Call Trace:
[<ffffffff8043766c>] dump_header+0xe1/0x23e
[<ffffffff80437a02>] oom_kill_process+0x6a/0x323
[<ffffffff80438443>] out_of_memory+0x4b3/0x50d
[<ffffffff8043a4a6>] __alloc_pages_may_oom+0xa2/0xc7
[<ffffffff80236f42>] __alloc_pages_nodemask+0x1002/0x17f0
[<ffffffff8024bd23>] alloc_pages_current+0x103/0x2b0
[<ffffffff8028567f>] sk_page_frag_refill+0x8f/0x160
[<ffffffff80295fa0>] tcp_sendmsg+0x560/0xee0
[<ffffffff802a5037>] inet_sendmsg+0x67/0x100
[<ffffffff80283c9c>] __sock_sendmsg_nosec+0x6c/0x90
[<ffffffff80283e85>] sock_sendmsg+0xc5/0xf0
[<ffffffff802847b6>] __sys_sendmsg+0x136/0x430
[<ffffffff80284ec8>] sys_sendmsg+0x88/0x110
[<ffffffff80711472>] system_call_fastpath+0x16/0x1b
Out of Memory: Kill process 2856 (bash) score 9999 or sacrifice child
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, to make netconsole start over IPv6, the source address
needs to be specified. Without a source address, netpoll_parse_options
assumes we're setting up over IPv4 and the destination IPv6 address is
rejected.
Check if the IP version has been forced by a source address before
checking for a version mismatch when parsing the destination address.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixed following Warning while executing "make htmldocs".
Warning(/net/core/skbuff.c:2164): No description found for parameter 'from'
Warning(/net/core/skbuff.c:2164): Excess function parameter 'source'
description in 'skb_zerocopy'
Replace "@source" with "@from" fixed the warning.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by upcoming ip forwarding path patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) BPF debugger and asm tool by Daniel Borkmann.
2) Speed up create/bind in AF_PACKET, also from Daniel Borkmann.
3) Correct reciprocal_divide and update users, from Hannes Frederic
Sowa and Daniel Borkmann.
4) Currently we only have a "set" operation for the hw timestamp socket
ioctl, add a "get" operation to match. From Ben Hutchings.
5) Add better trace events for debugging driver datapath problems, also
from Ben Hutchings.
6) Implement auto corking in TCP, from Eric Dumazet. Basically, if we
have a small send and a previous packet is already in the qdisc or
device queue, defer until TX completion or we get more data.
7) Allow userspace to manage ipv6 temporary addresses, from Jiri Pirko.
8) Add a qdisc bypass option for AF_PACKET sockets, from Daniel
Borkmann.
9) Share IP header compression code between Bluetooth and IEEE802154
layers, from Jukka Rissanen.
10) Fix ipv6 router reachability probing, from Jiri Benc.
11) Allow packets to be captured on macvtap devices, from Vlad Yasevich.
12) Support tunneling in GRO layer, from Jerry Chu.
13) Allow bonding to be configured fully using netlink, from Scott
Feldman.
14) Allow AF_PACKET users to obtain the VLAN TPID, just like they can
already get the TCI. From Atzm Watanabe.
15) New "Heavy Hitter" qdisc, from Terry Lam.
16) Significantly improve the IPSEC support in pktgen, from Fan Du.
17) Allow ipv4 tunnels to cache routes, just like sockets. From Tom
Herbert.
18) Add Proportional Integral Enhanced packet scheduler, from Vijay
Subramanian.
19) Allow openvswitch to mmap'd netlink, from Thomas Graf.
20) Key TCP metrics blobs also by source address, not just destination
address. From Christoph Paasch.
21) Support 10G in generic phylib. From Andy Fleming.
22) Try to short-circuit GRO flow compares using device provided RX
hash, if provided. From Tom Herbert.
The wireless and netfilter folks have been busy little bees too.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2064 commits)
net/cxgb4: Fix referencing freed adapter
ipv6: reallocate addrconf router for ipv6 address when lo device up
fib_frontend: fix possible NULL pointer dereference
rtnetlink: remove IFLA_BOND_SLAVE definition
rtnetlink: remove check for fill_slave_info in rtnl_have_link_slave_info
qlcnic: update version to 5.3.55
qlcnic: Enhance logic to calculate msix vectors.
qlcnic: Refactor interrupt coalescing code for all adapters.
qlcnic: Update poll controller code path
qlcnic: Interrupt code cleanup
qlcnic: Enhance Tx timeout debugging.
qlcnic: Use bool for rx_mac_learn.
bonding: fix u64 division
rtnetlink: add missing IFLA_BOND_AD_INFO_UNSPEC
sfc: Use the correct maximum TX DMA ring size for SFC9100
Add Shradha Shah as the sfc driver maintainer.
net/vxlan: Share RX skb de-marking and checksum checks with ovs
tulip: cleanup by using ARRAY_SIZE()
ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
net/cxgb4: Don't retrieve stats during recovery
...
This check is not needed because the same check is done before
fill_slave_info is used in rtnl_link_slave_info_fill.
Also, by removing this check, kernel will fillup IFLA_INFO_SLAVE_KIND
even for slaves of masters which does not implement fill_slave_info.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have multiple devices attempting to sync the same address
to a single destination, each device should be permitted to sync
it once. To accomplish this, pass the 'sync_cnt' of the source
address when adding the addresss to the lower device. 'sync_cnt'
tracks how many time a given address has been succefully synced.
This way, we know that if the 'sync_cnt' passed in is 0, we should
sync this address.
Also, turn 'synced' member back into the counter as was originally
done in
commit 4543fbefe6.
net: count hw_addr syncs so that unsync works properly.
It tracks how many time a given address has been added via a
'sync' operation. For every successfull 'sync' the counter is
incremented, and for ever 'unsync', the counter is decremented.
This makes sure that the address will be properly removed from
the the lower device when all the upper devices have removed it.
Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
CC: Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru>
CC: Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull fuse update from Miklos Szeredi:
"This contains a fix for a potential use-after-module-unload bug
noticed by Al and caching improvements for read-only fuse filesystems
by Andrew Gallagher"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: support clients that don't implement 'open'
fuse: don't invalidate attrs when not using atime
fuse: fix SetPageUptodate() condition in STORE
fuse: fix pipe_buf_operations
Recent patch
bonding: add netlink attributes to slave link dev (1d3ee88ae0)
Introduced yet another device specific way to access slave information
over rtnetlink. There is one already there for bridge.
This patch introduces generic way to do this, for getting and setting
info as well by extending link_ops. Later on, this new interface will
be used for bridge ports as well.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Workqueue used in neighbour layer have no real dependency of scheduling these on
the cpu which scheduled them.
On a idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.
This patch replaces normal workqueues with power efficient versions. This
doesn't change existing behavior of code unless CONFIG_WQ_POWER_EFFICIENT is
enabled.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull trivial tree updates from Jiri Kosina:
"Usual rocket science stuff from trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
neighbour.h: fix comment
sched: Fix warning on make htmldocs caused by wait.h
slab: struct kmem_cache is protected by slab_mutex
doc: Fix typo in USB Gadget Documentation
of/Kconfig: Spelling s/one/once/
mkregtable: Fix sscanf handling
lp5523, lp8501: comment improvements
thermal: rcar: comment spelling
treewide: fix comments and printk msgs
IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart()
Documentation: update /proc/uptime field description
Documentation: Fix size parameter for snprintf
arm: fix comment header and macro name
asm-generic: uaccess: Spelling s/a ny/any/
mtd: onenand: fix comment header
doc: driver-model/platform.txt: fix a typo
drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text
doc: Fix typo (acces_process_vm -> access_process_vm)
treewide: Fix typos in printk
drivers/gpu/drm/qxl/Kconfig: reformat the help text
...
Having this struct in module memory could Oops when if the module is
unloaded while the buffer still persists in a pipe.
Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal
merge them into nosteal_pipe_buf_ops (this is the same as
default_pipe_buf_ops except stealing the page from the buffer is not
allowed).
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to
save some cycles on arm and powerpc.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export the gro_find_receive/complete_by_type helpers to they can be invoked
by the gro callbacks of encapsulation protocols such as vxlan.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.
For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.
The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.
Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
code twice on a packet. This solves the problem of udp encapsulated packets whose
inner VM packet is udp and happen to carry a port which has registered offloads.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>