-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.
There are currently a couple of objects (`alloc_head` and `bundle`) in
`struct bundle_priv` whose types are flexible structures, i.e.
structures that end in a flexible-array member:
struct bundle_priv {
        /* Must be first */
        struct bundle_alloc_head alloc_head;
        ...
        /*
         * Must be last. bundle ends in a flex array which overlaps
         * internal_buffer.
         */
        struct uverbs_attr_bundle bundle;
        u64 internal_buffer[32];
};
So, in order to avoid ending up with a couple of flexible-array members
in the middle of a struct, we use the `struct_group_tagged()` helper to
separate the flexible array from the rest of the members in the flexible
structures:
struct uverbs_attr_bundle {
        struct_group_tagged(uverbs_attr_bundle_hdr, hdr,
                ... the rest of the members
        );
        struct uverbs_attr attrs[];
};
With the change described above, we can now declare objects of the
tagged struct types, and thus avoid embedding flexible arrays in the
middle of another struct:
struct bundle_priv {
        /* Must be first */
        struct bundle_alloc_head_hdr alloc_head;
        ...
        struct uverbs_attr_bundle_hdr bundle;
        u64 internal_buffer[32];
};
We also use `container_of()` whenever we need to retrieve a pointer
to the flexible structures.
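For illustration, recovering the full flexible structure from the
embedded tagged header looks like this (a minimal sketch; `pbundle` is
just a local variable name, not code taken from the driver):

        /* Sketch: pbundle points at an already-allocated bundle_priv */
        struct uverbs_attr_bundle *bundle =
                container_of(&pbundle->bundle, struct uverbs_attr_bundle, hdr);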
Notice that the `bundle_size` computed in `uapi_compute_bundle_size()`
remains the same.
So, with these changes, fix the following warnings:
drivers/infiniband/core/uverbs_ioctl.c:45:34: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
45 | struct bundle_alloc_head alloc_head;
| ^~~~~~~~~~
drivers/infiniband/core/uverbs_ioctl.c:67:35: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
67 | struct uverbs_attr_bundle bundle;
| ^~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZeIgeZ5Sb0IZTOyt@neat
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When a struct containing a flexible array is included in another struct,
and there is a member after the struct-with-flex-array, there is a
possibility of memory overlap. These cases must be audited [1]. See:
struct inner {
        ...
        int flex[];
};

struct outer {
        ...
        struct inner header;
        int overlap;
        ...
};
This is the scenario for all the "struct *_filter" structures that are
included in the following "struct ib_flow_spec_*" structures:
struct ib_flow_spec_eth
struct ib_flow_spec_ib
struct ib_flow_spec_ipv4
struct ib_flow_spec_ipv6
struct ib_flow_spec_tcp_udp
struct ib_flow_spec_tunnel
struct ib_flow_spec_esp
struct ib_flow_spec_gre
struct ib_flow_spec_mpls
The pattern is like the one shown below:
struct *_filter {
        ...
        u8 real_sz[];
};

struct ib_flow_spec_* {
        ...
        struct *_filter val;
        struct *_filter mask;
};
In this case, the trailing flexible array "real_sz" is never allocated
and is only used to calculate the size of the structures. Here the
"offsetof" helper can be replaced with the "sizeof" operator because the
goal is to get the size of these structures. Therefore, the trailing
flexible arrays can also be removed.
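For example, a size computation changes shape roughly like this
(illustrative only; the variable name is not from the actual patch):

        -       filter_sz = offsetof(struct ib_flow_eth_filter, real_sz);
        +       filter_sz = sizeof(struct ib_flow_eth_filter);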
However, due to the trailing padding that can be induced in structs, it
is possible that:
offsetof(struct *_filter, real_sz) != sizeof(struct *_filter)
This situation happens with "struct ib_flow_ipv6_filter", and to avoid
it the "__packed" macro is used in this structure. But now
"sizeof(struct ib_flow_ipv6_filter)" has changed. This is not a problem
since this size is not used in the code.
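A hypothetical struct makes the padding issue concrete (illustrative
only, not a structure from the tree):

        struct example_filter {
                __be32  a;
                u8      b;
                /* u8 real_sz[]; previously started at offset 5 */
        } __packed;     /* without __packed, sizeof() would be 8, not 5 */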
The situation now is that "sizeof(struct ib_flow_spec_ipv6)" has also
changed (this struct contains the struct ib_flow_ipv6_filter). This is
also not a problem since it is only used to set the size of the "union
ib_flow_spec", which can store all the "ib_flow_spec_*" structures.
Link: https://lore.kernel.org/r/20240217142913.4285-1-erick.archer@gmx.com
Signed-off-by: Erick Archer <erick.archer@gmx.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
64k pages introduce the situation in this diagram when the HCA 4k page
size is being used:
+-------------------------------------------+ <--- 64k aligned VA
|                                           |
|                HCA 4k page                |
|                                           |
+-------------------------------------------+
|                     o                     |
|                                           |
|                     o                     |
|                                           |
|                     o                     |
+-------------------------------------------+
|                                           |
|                HCA 4k page                |
|                                           |
+-------------------------------------------+ <--- Live HCA page
|OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO| <--- offset
|                                           | <--- VA
|                  MR data                  |
+-------------------------------------------+
|                                           |
|                HCA 4k page                |
|                                           |
+-------------------------------------------+
|                     o                     |
|                                           |
|                     o                     |
|                                           |
|                     o                     |
+-------------------------------------------+
|                                           |
|                HCA 4k page                |
|                                           |
+-------------------------------------------+
The VA addresses coming from rdma-core in this diagram can be
arbitrary, but for 64k pages, the VA may be offset by some number of
HCA 4k pages and followed by some number of HCA 4k pages.
The current iterator doesn't account for either the preceding 4k pages or
the following 4k pages.
Fix the issue by extending the ib_block_iter to contain the number of DMA
pages like comment [1] says and by using __sg_advance to start the
iterator at the first live HCA page.
The changes are contained in a parallel set of iterator start and next
functions that are umem aware and specific to umem, since there is one
user of rdma_for_each_block() that has no umem.
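For reference, the umem-aware walk that drivers rely on follows this
pattern (a sketch; set_hca_page() is a hypothetical driver helper):

        struct ib_block_iter biter;

        /* Walk the MR in HCA-page-size blocks; with this fix the walk
         * starts at the first live HCA page and stops after the last.
         */
        rdma_umem_for_each_dma_block(umem, &biter, page_size)
                set_hca_page(mr, rdma_block_iter_dma_address(&biter));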
These two fixes prevent the extra pages before and after the user MR
data.
Fix the preceding pages by using the __sg_advance field to start at the
first 4k page containing MR data.
Fix the following pages by saving the number of pgsz blocks in the
iterator state and downcounting on each next.
This fix allows for the elimination of the small page crutch noted in the
Fixes.
Fixes: 10c75ccb54 ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()")
Link: https://lore.kernel.org/r/20231129202143.1434-2-shiraz.saleem@intel.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add support for dumping the SRQ resource in raw format. It enables
drivers to return the entire device-specific SRQ context without
setting each field separately.
Example:
$ rdma res show srq -r
dev hns3 149000...
$ rdma res show srq -j -r
[{"ifindex":0,"ifname":"hns3","data":[149,0,0,...]}]
Signed-off-by: wenglianfa <wenglianfa@huawei.com>
Link: https://lore.kernel.org/r/20230918131110.3987498-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
After a change to the bnxt_re driver, it fails to link when
CONFIG_INFINIBAND_USER_ACCESS is disabled:
aarch64-linux-ld: drivers/infiniband/hw/bnxt_re/ib_verbs.o: in function `bnxt_re_handler_BNXT_RE_METHOD_ALLOC_PAGE':
ib_verbs.c:(.text+0xd64): undefined reference to `ib_uverbs_get_ucontext_file'
aarch64-linux-ld: drivers/infiniband/hw/bnxt_re/ib_verbs.o:(.rodata+0x168): undefined reference to `uverbs_idr_class'
aarch64-linux-ld: drivers/infiniband/hw/bnxt_re/ib_verbs.o:(.rodata+0x1a8): undefined reference to `uverbs_destroy_def_handler'
The problem is that the 'bnxt_re_uapi_defs' structure is built
unconditionally and references a couple of functions that are never
actually called in this configuration but in turn require other
functions that are left out of the build.
Adding an #ifdef around the new code, or a Kconfig dependency would
address this problem, but adding the compile-time check inside of the
UAPI_DEF_CHAIN_OBJ_TREE_NAMED() macro seems best because that also
addresses the problem in other drivers that may run into the same
dependency.
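As an illustration of the idea (a conceptual sketch with hypothetical
names, not the exact macro change), a reference guarded by IS_ENABLED()
constant-folds away when the option is off, so the linker never has to
resolve the uverbs-only symbols:

        extern const struct uapi_definition example_uapi_defs[];

        /* Folds to NULL at compile time when user access is disabled,
         * so no reference to example_uapi_defs is emitted.
         */
        #define EXAMPLE_UAPI_DEFS() \
                (IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) ? \
                 example_uapi_defs : NULL)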
Fixes: 360da60d6c ("RDMA/bnxt_re: Enable low latency push")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it clearer what is going on by adding a function to go back from
the "virtual" dma_addr to a kva and another to a struct page. These are
used in the ib_uses_virt_dma() style drivers (siw, rxe, hfi, qib).
Call them instead of naked casts and virt_to_page() when working with
dma_addr values encoded by the various ib_map functions.
This also fixes the virt_to_page() casting problem Linus Walleij has been
chasing.
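A sketch of what such helpers can look like (the names here are
assumptions for illustration, not necessarily the exact ones added):

        /* In ib_uses_virt_dma() mode the "dma_addr" is really a kva */
        static inline void *ib_virt_dma_to_ptr(u64 dma_addr)
        {
                return (void *)(uintptr_t)dma_addr;
        }

        static inline struct page *ib_virt_dma_to_page(u64 dma_addr)
        {
                return virt_to_page(ib_virt_dma_to_ptr(dma_addr));
        }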
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-05ea785520ed+10-ib_virt_page_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
This commit extends the RDMA kernel verbs ABI to support the flush
operation defined in IBA A19.4.1. These changes are
backward compatible with the existing RDMA kernel verbs ABI.
It allows the device/HCA to support the new FLUSH
attributes/capabilities, and it allows memory regions to support the
new FLUSH access flags.
Users can use ibv_reg_mr(3) to register the flush access flags. Only
access flags that are also supported by the device's capabilities can
be registered successfully.
Once registered successfully, the MR is flushable. A flushable MR
should also have one or both of the GLOBAL_VISIBILITY and PERSISTENT
attributes/capabilities, just like the device/HCA.
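A userspace sketch of registering a flushable MR (the flag spellings
below are assumptions; check ibv_reg_mr(3) in rdma-core for the exact
names):

        /* Request FLUSH with both placement types; registration fails
         * if the device does not advertise matching capabilities.
         */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_WRITE |
                                       IBV_ACCESS_FLUSH_GLOBAL |
                                       IBV_ACCESS_FLUSH_PERSISTENT);
        if (!mr)
                fprintf(stderr, "flush MR registration not supported\n");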
Link: https://lore.kernel.org/r/20221206130201.30986-3-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This will cause an informative backtrace to print if the user of
ib_device_set_netdev() isn't careful about tearing down the ibdevice
before its netdevice parent is destroyed. Such as this:
unregister_netdevice: waiting for vlan0 to become free. Usage count = 2
leaked reference.
ib_device_set_netdev+0x266/0x730
siw_newlink+0x4e0/0xfd0
nldev_newlink+0x35c/0x5c0
rdma_nl_rcv_msg+0x36d/0x690
rdma_nl_rcv+0x2ee/0x430
netlink_unicast+0x543/0x7f0
netlink_sendmsg+0x918/0xe20
sock_sendmsg+0xcf/0x120
____sys_sendmsg+0x70d/0x8b0
___sys_sendmsg+0x11d/0x1b0
__sys_sendmsg+0xfa/0x1d0
do_syscall_64+0x35/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
This will help debug the issues syzkaller is seeing.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v1-a7c81b3842ce+e5-netdev_tracker_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
This uses the same passing protocol as UVERBS_ATTR_FD (e.g. len = 0,
data_s64 = fd), except that the FD is not required to be a uverbs
object and the core code does not convert the FD to an object handle
automatically.
Access to the int fd is provided by uverbs_get_raw_fd().
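A sketch of a handler consuming such an attribute (the method and
attribute names are hypothetical; check uverbs_ioctl.h for the helper's
exact signature):

        static int UVERBS_HANDLER(EXAMPLE_METHOD_IMPORT_FD)(
                struct uverbs_attr_bundle *attrs)
        {
                int fd, ret;

                /* Unlike UVERBS_ATTR_FD, this yields the plain integer
                 * fd; the core does no uverbs object lookup.
                 */
                ret = uverbs_get_raw_fd(&fd, attrs, EXAMPLE_ATTR_FD);
                if (ret)
                        return ret;

                /* use fd here, e.g. pass it to dma_buf_get() */
                return 0;
        }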
Link: https://lore.kernel.org/r/2-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In inter-subnet cases, when inbound/outbound PRs are available,
outbound_PR.dlid is used as the requestor's datapath DLID and
inbound_PR.dlid is used as the responder's DLID. The inbound_PR.dlid is
passed to the responder side in the "ConnectReq.Primary_Local_Port_LID"
field. With this solution the PERMISSIVE_LID is no longer used in the
Primary Local LID field.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/b3f6cac685bce9dde37c610be82e2c19d9e51d9e.1662631201.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Support receiving inbound and outbound IB path records (along with GMP
PathRecord) from user-space service through the RDMA netlink channel.
The LIDs in these 3 PRs can be used in this way:
1. GMP PR: used as the standard local/remote LIDs;
2. DLID of outbound PR: Used as the "dlid" field for outbound traffic;
3. DLID of inbound PR: Used as the "dlid" field for outbound traffic on
the responder side.
This is aimed at supporting adaptive routing. With the current IB
routing solution, when a packet goes out it is assigned a fixed DLID
per target, meaning a fixed router will be used.
The LIDs in inbound/outbound path records can be used to identify a
group of routers that allow communication with another subnet's entity.
With them, packets of an inter-subnet connection may travel through any
router in the set to reach the target.
As confirmed with Jason, when sending a netlink request the kernel uses
LS_RESOLVE_PATH_USE_ALL so that the service knows the kernel supports
multiple PRs.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://lore.kernel.org/r/2fa2b6c93c4c16c8915bac3cfc4f27be1d60519d.1662631201.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>