kernel

mirror of https://github.com/ukui/kernel.git synced 2026-03-09 10:07:04 -07:00

Author	SHA1	Message	Date
Eric Dumazet	520ac30f45	net_sched: drop packets after root qdisc lock is released Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-25 12:19:35 -04:00
Wei Tang	be4da0e340	net: the space is required after ',' The space is missing after ',', and this will introduce much more noise when checking patch around. Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-16 17:41:23 -07:00
Wei Tang	84d15ae57d	net: do not initialise statics to 0 This patch fixes the checkpatch.pl error to dev.c: ERROR: do not initialise statics to 0 Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-16 17:41:22 -07:00
Daniel Borkmann	a70b506efe	bpf: enforce recursion limit on redirects Respect the stack's xmit_recursion limit for calls into dev_queue_xmit(). Currently, they are not handled by the limiter when attached to clsact's egress parent, for example, and a buggy program redirecting it to the same device again could run into stack overflow eventually. It would be good if we could notify an admin to give him a chance to react. We reuse xmit_recursion instead of having one private to eBPF, so that the stack's current recursion depth will be taken into account as well. Follow-up to commit `3896d655f4` ("bpf: introduce bpf_clone_redirect() helper") and `27b29f6305` ("bpf: add bpf_redirect() helper"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-10 18:00:57 -07:00
Hariprasad Shenai	40e4e713eb	net: Reduce queue allocation to one in kdump kernel When in kdump kernel, reduce memory usage by only using a single Queue Set for multiqueue devices. So make netif_get_num_default_rss_queues() return one, when in kdump kernel. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-08 11:13:58 -07:00
Eric Dumazet	f9eb8aea2a	net_sched: transform qdisc running bit into a seqcount Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 16:37:13 -07:00
Eric Dumazet	3bcb846ca4	net: get rid of spin_trylock() in net_tx_action() Note: Tom Herbert posted almost same patch 3 months back, but for different reasons. The reasons we want to get rid of this spin_trylock() are : 1) Under high qdisc pressure, the spin_trylock() has almost no chance to succeed. 2) We loop multiple times in softirq handler, eventually reaching the max retry count (10), and we schedule ksoftirqd. Since we want to adhere more strictly to ksoftirqd being waked up in the future (https://lwn.net/Articles/687617/), better avoid spurious wakeups. 3) calls to __netif_reschedule() dirty the cache line containing q->next_sched, slowing down the owner of qdisc. 4) RT kernels can not use the spin_trylock() here. With help of busylock, we get the qdisc spinlock fast enough, and the trylock trick brings only performance penalty. Depending on qdisc setup, I observed a gain of up to 19 % in qdisc performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel) ("mpstat -I SCPU 1" is much happier now) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-07 15:32:03 -07:00
Daniel Borkmann	7e2c3aea43	net: also make sch_handle_egress() drop monitor ready Follow-up for `8a3a4c6e7b` ("net: make sch_handle_ingress() drop monitor ready") to also make the egress side drop monitor ready. Also here only TC_ACT_SHOT is a clear indication that something went wrong. Hence don't provide false positives to drop monitors such as 'perf record -e skb:kfree_skb ...'. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-16 14:02:44 -04:00
David Ahern	74b20582ac	net: l3mdev: Add hook in ip and ipv6 Currently the VRF driver uses the rx_handler to switch the skb device to the VRF device. Switching the dev prior to the ip / ipv6 layer means the VRF driver has to duplicate IP/IPv6 processing which adds overhead and makes features such as retaining the ingress device index more complicated than necessary. This patch moves the hook to the L3 layer just after the first NF_HOOK for PRE_ROUTING. This location makes exposing the original ingress device trivial (next patch) and allows adding other NF_HOOKs to the VRF driver in the future. dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb with the switched device through the packet taps to maintain current behavior (tcpdump can be used on either the vrf device or the enslaved devices). Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-11 19:31:40 -04:00
Eric Dumazet	8a3a4c6e7b	net: make sch_handle_ingress() drop monitor ready TC_ACT_STOLEN is used when ingress traffic is mirred/redirected to say ifb. Packet is not dropped, but consumed. Only TC_ACT_SHOT is a clear indication something went wrong. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-08 23:53:22 -04:00
Alexander Duyck	b1dc497b28	net: Fix netdev_fix_features so that TSO_MANGLEID is only available with TSO This change makes it so that we will strip the TSO_MANGLEID bit if TSO is not present. This way we will also handle ECN correctly if TSO is not present. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:27 -04:00
David S. Miller	cba6532100	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 00:52:29 -04:00
Alexander Duyck	996e802187	net: Disable segmentation if checksumming is not supported In the case of the mlx4 and mlx5 driver they do not support IPv6 checksum offload for tunnels. With this being the case we should disable GSO in addition to the checksum offload features when we find that a device cannot perform a checksum on a given packet type. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-03 16:00:54 -04:00
Nikolay Aleksandrov	f4b05d27ec	net: constify is_skb_forwardable's arguments is_skb_forwardable is not supposed to change anything so constify its arguments Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:13:36 -04:00
Jason Wang	3df97ba830	tuntap: calculate rps hash only when needed There's no need to calculate rps hash if it was not enabled. So this patch exports rps_needed and checks it before trying to get rps hash. Tests (using pktgen to inject packets to guest) show this can improve pps by about 13% (when rps is disabled). Before: ~1150000 pps After: ~1300000 pps Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> ---- Changes from V1: - Fix build when CONFIG_RPS is not set Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:38:54 -04:00
Eric Dumazet	02a1d6e7a6	net: rename NET_{ADD|INC}_STATS_BH() Rename NET_INC_STATS_BH() to __NET_INC_STATS() and NET_ADD_STATS_BH() to __NET_ADD_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:24 -04:00
Alexander Duyck	7f348a6076	net: Add support for IP ID mangling TSO in cases that require encapsulation This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports NETIF_F_TSO. This way if needed a device can then later enable the TSO with IP ID mangling and the tunnels on top of that device can then also make use of the IP ID mangling as well. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:11:07 -04:00
Eric Dumazet	d21fd63ea3	net: validate_xmit_skb() changes skbs given to validate_xmit_skb() should not have a next pointer anymore. Also if a packet is dropped, increment dev->tx_dropped __dev_queue_xmit() no longer has to change tx_dropped in this case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:40:24 -04:00
Alexander Duyck	802ab55adc	GSO: Support partial segmentation offload This patch adds support for something I am referring to as GSO partial. The basic idea is that we can support a broader range of devices for segmentation if we use fixed outer headers and have the hardware only really deal with segmenting the inner header. The idea behind the naming is due to the fact that everything before csum_start will be fixed headers, and everything after will be the region that is handled by hardware. With the current implementation it allows us to add support for the following GSO types with an inner TSO_MANGLEID or TSO6 offload: NETIF_F_GSO_GRE NETIF_F_GSO_GRE_CSUM NETIF_F_GSO_IPIP NETIF_F_GSO_SIT NETIF_F_UDP_TUNNEL NETIF_F_UDP_TUNNEL_CSUM In the case of hardware that already supports tunneling we may be able to extend this further to support TSO_TCPV4 without TSO_MANGLEID if the hardware can support updating inner IPv4 headers. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:23:41 -04:00
Alexander Duyck	1530545ed6	GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values This patch does two things. First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field. As a result we should now be able to aggregate flows that were converted from IPv6 to IPv4. In addition this allows us more flexibility for future implementations of segmentation as we may be able to use a fixed IP ID when segmenting the flow. The second thing this does is that it places limitations on the outer IPv4 ID header in the case of tunneled frames. Specifically it forces the IP ID to be incrementing by 1 unless the DF bit is set in the outer IPv4 header. This way we can avoid creating overlapping series of IP IDs that could possibly be fragmented if the frame goes through GRO and is then resegmented via GSO. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:23:41 -04:00
Alexander Duyck	cbc53e08a7	GSO: Add GSO type for fixed IPv4 ID This patch adds support for TSO using IPv4 headers with a fixed IP ID field. This is meant to allow us to do a lossless GRO in the case of TCP flows that use a fixed IP ID such as those that convert IPv6 header to IPv4 headers. In addition I am adding a feature that for now I am referring to TSO with IP ID mangling. Basically when this flag is enabled the device has the option to either output the flow with incrementing IP IDs or with a fixed IP ID regardless of what the original IP ID ordering was. This is useful in cases where the DF bit is set and we do not care if the original IP ID value is maintained. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:23:40 -04:00
Eric Dumazet	743b03a832	net: remove netdevice gso_min_segs After introduction of ndo_features_check(), we believe that very specific checks for rare features should not be done in core networking stack. No driver uses gso_min_segs yet, so we revert this feature and save few instructions per tx packet in fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 00:37:08 -04:00
David S. Miller	ae95d71261	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-04-09 17:41:41 -04:00
Alexander Duyck	a0ca153f98	GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU This patch fixes an issue I found in which we were dropping frames if we had enabled checksums on GRE headers that were encapsulated by either FOU or GUE. Without this patch I was barely able to get 1 Gb/s of throughput. With this patch applied I am now at least getting around 6 Gb/s. The issue is due to the fact that with FOU or GUE applied we do not provide a transport offset pointing to the GRE header, nor do we offload it in software as the GRE header is completely skipped by GSO and treated like a VXLAN or GENEVE type header. As such we need to prevent the stack from generating it and also prevent GRE from generating it via any interface we create. Fixes: `c3483384ee` ("gro: Allow tunnel stacking in the case of FOU/GUE") Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 16:56:33 -04:00
Aaron Conole	4da46cebbd	net/core/dev: Warn on a too-short GRO frame When signaling that a GRO frame is ready to be processed, the network stack correctly checks length and aborts processing when a frame is less than 14 bytes. However, such a condition is really indicative of a broken driver, and should be loudly signaled, rather than silently dropped as the case is today. Convert the condition to use net_warn_ratelimited() to ensure the stack loudly complains about such broken drivers. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-05 19:58:39 -04:00
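
Commit 520ac30f45 in the log above describes passing a 'struct sk_buff **to_free' parameter to enqueue() so that dropped packets are freed only after the root qdisc lock is released. The following is a minimal user-space sketch of that idea, not the kernel code: struct pkt, struct toy_qdisc, and every toy_* function are hypothetical names invented for illustration, and a C11 atomic_flag spinlock stands in for the qdisc lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
    struct pkt *next;
    int id;
};

/* Hypothetical bounded FIFO standing in for a qdisc. */
struct toy_qdisc {
    atomic_flag lock;        /* stand-in for the root qdisc spinlock */
    struct pkt *head, *tail;
    size_t qlen, limit;
};

static void toy_lock(struct toy_qdisc *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void toy_unlock(struct toy_qdisc *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static struct pkt *toy_pkt_new(int id)
{
    struct pkt *p = malloc(sizeof(*p));
    if (!p)
        abort();
    p->next = NULL;
    p->id = id;
    return p;
}

/* Call with the lock held. A packet that does not fit is not freed here;
 * it is chained onto *to_free so the caller can free it later. */
static int toy_enqueue(struct toy_qdisc *q, struct pkt *p, struct pkt **to_free)
{
    assert(to_free != NULL);
    if (q->qlen >= q->limit) {
        p->next = *to_free;
        *to_free = p;
        return -1;           /* dropped */
    }
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
    return 0;                /* queued */
}

/* Free a chain of dropped packets and return how many were freed. */
static size_t toy_free_list(struct pkt *list)
{
    size_t n = 0;
    while (list) {
        struct pkt *next = list->next;
        free(list);
        list = next;
        n++;
    }
    return n;
}

/* Caller side: enqueue under the lock, free the drops after unlocking.
 * Returns the number of packets dropped by this call. */
static size_t toy_xmit(struct toy_qdisc *q, struct pkt *p)
{
    struct pkt *to_free = NULL;

    toy_lock(q);
    toy_enqueue(q, p, &to_free);
    toy_unlock(q);

    return toy_free_list(to_free); /* lock is no longer held here */
}
```

In this sketch the critical section contains only pointer updates, so a concurrent dequeue would not have to wait for free() to finish; that is the effect the commit message attributes to moving kfree_skb() out from under the lock.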


1290 Commits