linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Martin KaFai Lau	695ba2651a	bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 After doing map_perf_test with a much bigger BPF_F_NO_COMMON_LRU map, the perf report shows a lot of time spent in rotating the inactive list (i.e. __bpf_lru_list_rotate_inactive): > map_perf_test 32 8 10000 1000000 \| awk '{sum += $3}END{print sum}' 19644783 (19M/s) > map_perf_test 32 8 10000000 10000000 \| awk '{sum += $3}END{print sum}' 6283930 (6.28M/s) By inactive, it usually means the element is not in cache. Hence, there is a need to tune the PERCPU_NR_SCANS value. This patch finds a better number of elements to scan during each list rotation. The PERCPU_NR_SCANS (which is defined the same as PERCPU_FREE_TARGET) decreases from 16 elements to 4 elements. This change only affects the BPF_F_NO_COMMON_LRU map. The test_lru_dist does not show meaningful difference between 16 and 4. Our production L4 load balancer which uses the LRU map for conntrack-ing also shows little change in cache hit rate. Since both benchmark and production data show no cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4. We can consider making it configurable if we find a usecase later that shows another value works better and/or use a different rotation strategy. After this change: > map_perf_test 32 8 10000000 10000000 \| awk '{sum += $3}END{print sum}' 9240324 (9.2M/s) i.e. 6.28M/s -> 9.2M/s The test_lru_dist has not shown meaningful difference: > test_lru_dist zipf.100k.a1_01.out 4000 1: nr_misses: 31575 (Before) vs 31566 (After) > test_lru_dist zipf.100k.a0_01.out 40000 1 nr_misses: 67036 (Before) vs 67031 (After) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	9fd63d05f3	bpf: Allow bpf sample programs (_user.c) to change bpf_map_def The current bpf_map_def is statically defined during compile time. This patch allows the _user.c program to change it during runtime. It is done by adding load_bpf_file_fixup_map() which takes a callback. The callback will be called before creating each map so that it has a chance to modify the bpf_map_def. The current usecase is to change max_entries in map_perf_test. It is interesting to test with a much bigger map size in some cases (e.g. the following patch on bpf_lru_map.c). However, it is hard to find one size to fit all testing environment. Hence, it is handy to take the max_entries as a cmdline arg and then configure the bpf_map_def during runtime. This patch adds two cmdline args. One is to configure the map's max_entries. Another is to configure the max_cnt which controls how many times a syscall is called. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	bf8db5d243	bpf: lru: Refactor LRU map tests in map_perf_test One more LRU test will be added later in this patch series. In this patch, we first move all existing LRU map tests into a single syscall (connect) first so that the future new LRU test can be added without hunting another syscall. One of the map name is also changed from percpu_lru_hash_map to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	6467acbc70	bpf: lru: Cleanup test_lru_map.c This patch does the following cleanup on test_lru_map.c 1) Fix indentation (Replace spaces by tabs) 2) Remove redundant BPF_F_NO_COMMON_LRU test 3) Simplify some comments Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Martin KaFai Lau	9746f85686	bpf: lru: Add test_lru_sanity6 for BPF_F_NO_COMMON_LRU test_lru_sanity3 is not applicable to BPF_F_NO_COMMON_LRU. It just happens to work when PERCPU_FREE_TARGET == 16. This patch: 1) Disable test_lru_sanity3 for BPF_F_NO_COMMON_LRU 2) Add test_lru_sanity6 to test list rotation for the BPF_F_NO_COMMON_LRU map. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:55:52 -04:00
Jisheng Zhang	82960fff09	net: mvneta: fix failed to suspend if WOL is enabled Recently, suspend/resume and WOL support are added into mvneta driver. If we enable WOL, then we get some error as below on Marvell BG4CT platforms during suspend: [ 184.149723] dpm_run_callback(): mdio_bus_suspend+0x0/0x50 returns -16 [ 184.149727] PM: Device f7b62004.mdio-mi:00 failed to suspend: error -16 -16 means -EBUSY, phy_suspend() will return -EBUSY if it finds the device has WOL enabled. We fix this issue by properly setting the netdev's power.can_wakeup and power.wakeup, i.e 1. in mvneta_mdio_probe(), call device_set_wakeup_capable() to set power.can_wakeup if the phy support WOL. 2. in mvneta_ethtool_set_wol(), call device_set_wakeup_enable() to set power.wakeup if WOL has been successfully enabled in phy. Signed-off-by: Jisheng Zhang <jszhang@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:46:35 -04:00
Nikolay Aleksandrov	cab93af0ed	net: bridge: notify on hw fdb takeover Recently we added support for SW fdbs to take over HW ones, but that results in changing a user-visible fdb flag thus we need to send a notification, also it's consistent with how HW takes over SW entries. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:45:34 -04:00
WANG Cong	f5001ceab8	kcm: remove a useless copy_from_user() struct kcm_clone only contains fd, and kcm_clone() only writes this struct, so there is no need to copy it from user. Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:28:48 -04:00
Jiri Pirko	6b2af241f0	MAINTAINERS: rename TC entry and add couple of header files The section is not specific only to "TC classifiers", but applies to the whole TC subsystem. Also, add couple of forgotten headers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:26:21 -04:00
Russell King	786df9c2a4	net: phy: simplify phy_supported_speeds() Simplify the loop in phy_supported_speeds(). Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:25:07 -04:00
Russell King	d06130377c	net: phy: improve phylib correctness for non-autoneg settings phylib has some undesirable behaviour when forcing a link mode through ethtool. phylib uses this code: idx = phy_find_valid(phy_find_setting(phydev->speed, phydev->duplex), features); to find an index in the settings table. phy_find_setting() starts at index 0, and scans upwards looking for an exact speed and duplex match. When it doesn't find it, it returns MAX_NUM_SETTINGS - 1, which is 10baseT-Half duplex. phy_find_valid() then scans from the point (and effectively only checks one entry) before bailing out, returning MAX_NUM_SETTINGS - 1. phy_sanitize_settings() then sets ->speed to SPEED_10 and ->duplex to DUPLEX_HALF whether or not 10baseT-Half is supported or not. This goes against all the comments against these functions, and 10baseT-Half may not even be supported by the hardware. Rework these functions, introducing a new method of scanning the table. There are two modes of lookup that phylib wants: exact, and inexact. - in exact mode, we return either an exact match or failure - in inexact mode, we return an exact match if it exists, a match at the highest speed that is not greater than the requested speed (ignoring duplex), or failing that, the lowest supported speed, or failure. The biggest difference is that we always check whether the entry is supported before further consideration, so all unsupported entries are not considered as candidates. This results in arguably saner behaviour, better matches the comments, and is probably what users would expect. This becomes important as ethernet speeds increase, PHYs exist which do not support the 10Mbit speeds, and half-duplex is likely to become obsolete - it's already not even an option on 10Gbit and faster links. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:25:07 -04:00
stephen hemminger	8ea3e43911	Subject: net: allow configuring default qdisc Since 3.12 it has been possible to configure the default queuing discipline via sysctl. This patch adds ability to configure the default queue discipline in kernel configuration. This is useful for environments where configuring the value from userspace is difficult to manage. The default is still the same as before (pfifo_fast) and it is possible to change after kernel init with sysctl. This is similar to how TCP congestion control works. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:23:06 -04:00
David S. Miller	7ca9511813	Merge branch 'qed-arfs' Manish Chopra says: ==================== qed/qede: aRFS support This series adds support for Accelerated Flow Steering in qede driver for TCP/UDP over IPv4/IPv6 protocols. Please consider applying this series to "net-next" ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:06:19 -04:00
Chopra, Manish	e4917d46a6	qede: Add aRFS support This patch adds support for aRFS for TCP and UDP protocols with IPv4/IPv6. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:06:18 -04:00
Chopra, Manish	d51e4af5c2	qed: aRFS infrastructure support This patch adds necessary APIs to interface with qede aRFS support in successive patch. It also reserves separate PTT entry for aRFS, [as being in fastpath flow] for hardware access instead of trying to acquire it at run time from the ptt pool. Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:06:18 -04:00
Martin Wetterwald	53a759c89b	smsc95xx: Add comments to the registers definition This chip is used by a lot of embedded devices and also by the Raspberry Pi 1, 2 & 3 which were created to promote the study of computer sciences. Students wanting to learn kernel / network device driver programming through those devices can only rely on the Linux kernel driver source to make their own. This commit adds a lot of comments to the registers definition to expand the register names. Cc: Steve Glendinning <steve.glendinning@shawell.net> Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> CC: David Miller <davem@davemloft.net> Signed-off-by: Martin Wetterwald <martin@wetterwald.eu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Steve Glendinning <steve.glendinning@shawell.net> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:04:52 -04:00
R. Parameswaran	57240d0078	l2tp: device MTU setup, tunnel socket needs a lock The MTU overhead calculation in L2TP device set-up merged via commit `b784e7ebfc` needs to be adjusted to lock the tunnel socket while referencing the sub-data structures to derive the socket's IP overhead. Reported-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: R. Parameswaran <rparames@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 13:01:48 -04:00
David Ahern	4a6e3c5def	net: ipv6: send unsolicited NA on admin up ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is set to 1, gratuitous arp requests are sent when the device is brought up. The same is expected when ndisc_notify is set to 1 (per ndisc_notify in Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP event; add it. Fixes: `5cb04436ee` ("ipv6: add knob to send unsolicited ND on link-layer address change") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 12:44:55 -04:00
David S. Miller	70d40b366d	Merge branch 'mlx5-RDMA-netdevice' Saeed Mahameed says: ==================== Mellanox, mlx5 RDMA net device support This series provides the lower level mlx5 support of RDMA netdevice creation API [1] suggested and introduced by Intel's HFI OPA VNIC netdevice driver [2], to enable IPoIB mlx5 RDMA netdevice creation. mlx5 IPoIB RDMA netdev will serve as an acceleration netdevice for the current IPoIB ULP generic netdevice, providing: - mlx5 RSS support. - mlx5 HW RX,TX offloads (checksum, TSO, LRO, etc ..). - Full mlx5 HW features transparent to the ULP itself. The idea here is to reuse and benefit from the already implemented mlx5e netdevice management and channels API for both Ethernet and RDMA netdevices, since both IPoIB and Ethernet netdevices share same common mlx5 HW resources (with some small exceptions) and share most of the control/data path logic, it is more natural to have them share the same code. The differences between IPoIB and Ethernet netdevices can be summarized to: Steering: In mlx5, IPoIB traffic is sent and received from an underlay special QP, and in Ethernet the traffic is handled by vports and vport steering is managed by e-switch or FW. For IPoIB traffic to get steered correctly the only thing we need to do is to create RSS HW contexts for RX and TX HW contexts for TX (similar to mlx5e) with the underlay QP attached to them (underlay QP will be 0 in case of Ethernet). RX,TX: Since IPoIB traffic is different, slightly modified RX and TX handlers are required, still we do some code reuse in data path via common helper functions. All of the other generic netdevice and mlx5 aspects will be shared between mlx5 Ethernet and IPoIB netdevices, e.g. - Channels creation and handling (RQs,SQs,CQs, NAPI, interrupt moderation, etc..) - Offloads, checksum, GRO, LRO, TSO, and more. - netdevice logic and non Ethernet specific ndos (open/close, etc..) 
In order to achieve what we want: In patches 1 to 3, Erez added the support for underlay QP in mlx5_ifc and refactored the mlx5 steering code to accept the underlay QP as a parameter for creating steering objects and enabled flow steering for IB link. Then we are going to use the mlx5e netdevice profile, which is already used to separate between NIC and VF representors netdevices, to create new type of IPoIB netdevice profile. For that, one small refactoring is required to make mlx5e netdevice profile management more generic and agnostic to link type which is done in patch #4. In patch #5, we introduce ipoib.c to host all of mlx5 IPoIB (mlx5i) specific logic and a skeleton for the IPoIB mlx5 netdevice profile, and we will start filling it in next patches, using mlx5e already existing APIs. Patch #6 and #7, Implement init/cleanup RX mlx5i netdev profile handlers to create mlx5 RSS resources, same as mlx5e but without vlan and L2 steering tables. Patch #8, Implement init/cleanup TX mlx5i netdev profile handlers, to create TX resources same as mlx5e but with one TC (tc = 0) support. Patch #9, Implement mlx5i open/close ndos, where we reuse the mlx5e channels API, to start/stop TX/RX channels. Patch #10, Create the underlay QP and attach it to mlx5i RSS and TX HW contexts. Patch #11 and #12, Break down the mlx5e xmit flow into smaller helper function and implement the mlx5i IPoIB xmit routine. Patch #13 and #14, Have an RX handler per netdevice profile. We already do this before this series in a non clean way to separate between NIC netdev and VF representor RX handlers, in patch 13 we make the RX handler generic and bound to a profile and in patch 14 we implement the IPoIB RX handlers. Patch #15, Small cleanup to avoid e-switch with IPoIB netdev. 
In order to enable mlx5 IPoIB, a merge between the IPoIB RDMA netdev offload support [3] - which was already submitted to the rdma mailing list - and this series is required plus an extra small patch [4] which will connect between both sides and actually enables the offload. Once both patch-sets are merged into linux we will have to submit the extra small patch [4], to enable the feature. Thanks, Saeed. [1] https://patchwork.kernel.org/patch/9676637/ [2] https://lwn.net/Articles/715453/ https://patchwork.kernel.org/patch/9587815/ [3] https://patchwork.kernel.org/patch/9672069/ [4] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/commit/?id=0141db6a686e32294dee015b7d07706162ba48d8 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 11:08:33 -04:00
Erez Shitrit	93d576af3c	hw/mlx5: Add New bit to check over QP creation Add check for bit IB_QP_CREATE_NETIF_QP while creating QP. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 11:08:32 -04:00
Saeed Mahameed	955bc48081	net/mlx5e: E-switch vport manager is valid for ethernet only Currently the driver support only ethernet eswitch, and we want to protect downstream IPoIB netdev from trying to access it in IB link. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 11:08:32 -04:00
Saeed Mahameed	9d6bd752c6	net/mlx5e: IPoIB, RX handler Implement IPoIB RX SKB handler. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 11:08:32 -04:00
Saeed Mahameed	20fd0c193f	net/mlx5e: RX handlers per netdev profile In order to have different RX handler per profile, fix and refactor the current code to take the rx handler directly from the netdevice profile rather than computing it on runtime as it was done with the switchdev mode representor rx handler. This will also remove the current wrong assumption in mlx5e_alloc_rq code that mlx5e_priv->ppriv is of the type vport_rep. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 11:08:31 -04:00
Saeed Mahameed	258545449b	net/mlx5e: IPoIB, Xmit flow Implement mlx5e's IPoIB SKB transmit using the helper functions provided by mlx5e ethernet tx flow, the only difference in the code between mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill (UD datagram segment) in the TX descriptor (WQE) and it doesn't need to have any vlan handling. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 11:08:31 -04:00
Saeed Mahameed	77bdf8950b	net/mlx5e: Xmit flow break down Break current mlx5e xmit flow into smaller blocks (helper functions) in order to reuse them for IPoIB SKB transmission. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-17 11:08:31 -04:00
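
Note on d06130377c (net: phy: improve phylib correctness for non-autoneg settings): the exact and inexact lookup modes described in that commit message can be illustrated with a small standalone sketch. This is not the kernel code. The table contents, the `struct setting` layout, the `supported` bitmask and the name `lookup_setting()` are all assumptions made for illustration; the only things taken from the commit message are the lookup rules themselves (check support first, prefer an exact match, otherwise the highest supported speed not above the request, otherwise the lowest supported speed, otherwise failure).

```c
#include <stdbool.h>
#include <stddef.h>

enum { HALF = 0, FULL = 1 };

/* Hypothetical settings entry; 'bit' indexes a made-up capability mask. */
struct setting {
	int speed;	/* Mbit/s */
	int duplex;
	unsigned bit;
};

/* Assumed ordering: fastest first, full duplex before half at each speed. */
static const struct setting table[] = {
	{ 10000, FULL, 0 },
	{  1000, FULL, 1 },
	{  1000, HALF, 2 },
	{   100, FULL, 3 },
	{   100, HALF, 4 },
	{    10, FULL, 5 },
	{    10, HALF, 6 },
};

static const struct setting *lookup_setting(int speed, int duplex,
					    unsigned long supported, bool exact)
{
	const struct setting *match = NULL, *last = NULL;
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		const struct setting *p = &table[i];

		/* Unsupported entries are never considered as candidates. */
		if (!(supported & (1UL << p->bit)))
			continue;

		last = p;	/* ends up as the lowest supported entry seen */

		if (p->speed == speed && p->duplex == duplex)
			return p;	/* exact speed and duplex match */

		if (!exact) {
			/* First supported entry not faster than the request. */
			if (!match && p->speed <= speed)
				match = p;
			/* Slower entries cannot improve on the candidate. */
			if (p->speed < speed)
				break;
		}
	}

	/* Inexact mode only: fall back to the lowest supported speed. */
	if (!match && !exact)
		match = last;

	return match;	/* NULL means failure */
}
```

With this sketch, a forced 10/Half request on a device that supports only 1000/Full and 100/Full resolves to 100/Full in inexact mode and to failure in exact mode, rather than silently selecting an unsupported 10baseT-Half entry.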

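Note on 9fd63d05f3 (bpf: Allow bpf sample programs (_user.c) to change bpf_map_def): the commit message describes a loader that invokes a callback before creating each map, so the user-space program can adjust the definition at runtime (for example, max_entries taken from a command-line argument). The following is a minimal sketch of that pattern only. It does not use the real `load_bpf_file_fixup_map()` or `struct bpf_map_def`; `struct map_def`, `create_maps()`, `fixup_lru()` and `cmdline_max_entries` are hypothetical stand-ins, and map creation is simulated by recording the final size.

```c
#include <string.h>

#define MAX_MAPS 8

/* Hypothetical stand-in for a statically defined map definition. */
struct map_def {
	const char *name;
	unsigned int max_entries;
};

/* Callback type: may modify the definition before the map is created. */
typedef void (*fixup_cb)(struct map_def *def, int idx);

/* Records the size each simulated map was "created" with. */
static unsigned int created_entries[MAX_MAPS];

/* Hypothetical loader: runs the callback, then "creates" each map. */
static int create_maps(struct map_def *defs, int n, fixup_cb fixup)
{
	int i;

	if (n > MAX_MAPS)
		return -1;

	for (i = 0; i < n; i++) {
		if (fixup)
			fixup(&defs[i], i);
		created_entries[i] = defs[i].max_entries;
	}
	return 0;
}

/* Would be filled from a command-line argument; 0 keeps the defaults. */
static unsigned int cmdline_max_entries;

/* Example callback: resize only the maps whose name contains "lru". */
static void fixup_lru(struct map_def *def, int idx)
{
	(void)idx;
	if (cmdline_max_entries && strstr(def->name, "lru"))
		def->max_entries = cmdline_max_entries;
}
```

Passing a callback rather than a fixed size keeps the loader generic: the decision about which maps to resize stays in the test program.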

664686 Commits