In Rust, producing an invalid value of any type is immediate undefined
behavior (UB); this includes via zeroing memory. Therefore, since an
uninhabited type has no valid values, producing any values at all for it is
UB.
The Rust standard library type `core::convert::Infallible` is uninhabited,
by virtue of having been declared as an enum with no cases, which always
produces uninhabited types in Rust.
The current kernel code allows this UB to be triggered, for example by code
like `Box::<core::convert::Infallible>::init(kernel::init::zeroed())`.
Thus, remove the implementation of `Zeroable` for `Infallible`, thereby
avoiding the unsoundness (potential for future UB).
Cc: stable@vger.kernel.org
Fixes: 38cde0bd7b ("rust: init: add `Zeroable` trait and `init::zeroed` function")
Closes: https://github.com/Rust-for-Linux/pinned-init/pull/13
Signed-off-by: Laine Taffin Altman <alexanderaltman@me.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Link: https://lore.kernel.org/r/CA160A4E-561E-4918-837E-3DCEBA74F808@me.com
[ Reformatted the comment slightly. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
Pull workqueue updates from Tejun Heo:
"This cycle, a lot of workqueue changes including some that are
significant and invasive.
- During v6.6 cycle, unbound workqueues were updated so that they are
more topology aware and flexible, which among other things improved
workqueue behavior on modern multi-L3 CPUs. In the process, commit
636b927eba ("workqueue: Make unbound workqueues to use per-cpu
pool_workqueues") switched unbound workqueues to use per-CPU
frontend pool_workqueues as a part of increasing front-back mapping
flexibility.
An unwelcome side effect of this change was that this made max
concurrency enforcement per-CPU blowing up the maximum number of
allowed concurrent executions. I incorrectly assumed that this
wouldn't cause practical problems as most unbound workqueue users
are self-regulate max concurrency; however, there definitely are
which don't (e.g. on IO paths) and the drastic increase in the
allowed max concurrency led to noticeable perf regressions in some
use cases.
This is now addressed by separating out max concurrency enforcement
to a separate struct - wq_node_nr_active - which makes @max_active
consistently mean system-wide max concurrency regardless of the
number of CPUs or (finally) NUMA nodes. This is a rather invasive
and, in places, a bit clunky; however, the clunkiness rises from
the the inherent requirement to handle the disagreement between the
execution locality domain and max concurrency enforcement domain on
some modern machines.
See commit 5797b1c189 ("workqueue: Implement system-wide
nr_active enforcement for unbound workqueues") for more details.
- BH workqueue support is added.
They are similar to per-CPU workqueues but execute work items in
the softirq context. This is expected to replace tasklet. However,
currently, it's missing the ability to disable and enable work
items which is needed to convert many tasklet users. To avoid
crowding this merge window too much, this will be included in the
next merge window. A separate pull request will be sent for the
couple conversion patches that are currently pending.
- Waiman plugged a long-standing hole in workqueue CPU isolation
where ordered workqueues didn't follow wq_unbound_cpumask updates.
Ordered workqueues now follow the same rules as other unbound
workqueues.
- More CPU isolation improvements: Juri fixed another deficit in
workqueue isolation where unbound rescuers don't respect
wq_unbound_cpumask. Leonardo fixed delayed_work timers firing on
isolated CPUs.
- Other misc changes"
* tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (54 commits)
workqueue: Drain BH work items on hot-unplugged CPUs
workqueue: Introduce from_work() helper for cleaner callback declarations
workqueue: Control intensive warning threshold through cmdline
workqueue: Make @flags handling consistent across set_work_data() and friends
workqueue: Remove clear_work_data()
workqueue: Factor out work_grab_pending() from __cancel_work_sync()
workqueue: Clean up enum work_bits and related constants
workqueue: Introduce work_cancel_flags
workqueue: Use variable name irq_flags for saving local irq flags
workqueue: Reorganize flush and cancel[_sync] functions
workqueue: Rename __cancel_work_timer() to __cancel_timer_sync()
workqueue: Use rcu_read_lock_any_held() instead of rcu_read_lock_held()
workqueue: Cosmetic changes
workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK
workqueue: Fix queue_work_on() with BH workqueues
async: Use a dedicated unbound workqueue with raised min_active
workqueue: Implement workqueue_set_min_active()
workqueue: Fix kernel-doc comment of unplug_oldest_pwq()
workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask
kernel/workqueue: Let rescuers follow unbound wq cpumask changes
...
Currently, `BStr` is just a type alias of `[u8]`, limiting its
representation to a byte list rather than a character list, which is not
ideal for printing and debugging.
Implement `Display` and `Debug` traits for `BStr` to facilitate easier
printing and debugging.
Also, for this purpose, change `BStr` from a type alias of `[u8]` to a
struct wrapper of `[u8]`.
Co-developed-by: Virgile Andreani <armavica@ulminfo.fr>
Signed-off-by: Virgile Andreani <armavica@ulminfo.fr>
Signed-off-by: Yutaro Ohno <yutaro.ono.418@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/ZcSlGMGP-e9HqybA@ohnotp
[ Formatted code comment. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Currently `ForeignOwnable::from_foreign()` only works for non-null
pointers for the existing `impl`s (e.g. `Box`, `Arc`). In turn, this
means callers may write code like:
```rust
// `p` is a pointer that may be null.
if p.is_null() {
None
} else {
unsafe { Some(Self::from_foreign(ptr)) }
}
```
Add a `try_from_foreign()` method to the trait with a default
implementation that returns `None` if `ptr` is null, otherwise
`Some(from_foreign(ptr))`, so that it can be used by callers instead.
Link: https://github.com/Rust-for-Linux/linux/issues/1057
Signed-off-by: Obei Sideg <linux@obei.io>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/0100018d53f737f8-80c1fe97-0019-40d7-ab69-b1b192785cd7-000000@email.amazonses.com
[ Fixed intra-doc links, improved `SAFETY` comment and reworded commit. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Currently, all macros are reexported with #[macro_export] only, which
means that to access `new_work!` from the workqueue, you need to import
it from the path `kernel::new_work` instead of importing it from the
workqueue module like all other items in the workqueue. By adding
reexports of the macros, it becomes possible to import the macros from
the correct modules.
It's still possible to import the macros from the root, but I don't
think we can do anything about that.
There is no functional change. This is merely a code cleanliness
improvement.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20240129145837.1419880-1-aliceryhl@google.com
[ Removed new `use kernel::prelude::*`s, reworded title. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Replace instances of 'ref-count[ed]' with 'refcount[ed]' to increase
consistency within the Rust documentation. The latter form is used more
widely in the rest of the kernel:
```console
$ rg '(\*|//).*?\srefcount(|ed)[\s,.]' | wc -l
1605
$ rg '(\*|//).*?\sref-count(|ed)[\s,.]' | wc -l
43
```
(numbers are for commit 052d534373 ("Merge tag 'exfat-for-6.8-rc1'
of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat"))
Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-7-0c8af94ed7de@valentinobst.de
[ Reworded to use the kernel's commit description style. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Fixes multiple trivial typos in documentation and comments of the
kernel crate.
allocator:
- Fix a trivial list item alignment issue in the last SAFETY comment of
`krealloc_aligned`.
init:
- Replace 'type' with 'trait' in the doc comments of the `PinInit` and
`Init` traits.
- Add colons before starting lists.
- Add spaces between the type and equal sign to respect the code
formatting rules in example code.
- End a sentence with a full stop instead of a colon.
ioctl:
- Replace 'an' with 'a' where appropriate.
str:
- Replace 'Return' with 'Returns' in the doc comment of `bytes_written`
as the text describes what the function does.
sync/lock:
- Fix a trivial list item alignment issue in the Safety section of the
`Backend` trait's description.
sync/lock/spinlock:
- The code in this module operates on spinlocks, not mutexes. Thus,
replace 'mutex' with 'spinlock' in the SAFETY comment of `unlock`.
workqueue:
- Replace "wont" with "won't" in the doc comment of `__enqueue`.
Signed-off-by: Valentin Obst <kernel@valentinobst.de>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240131-doc-fixes-v3-v3-1-0c8af94ed7de@valentinobst.de
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Commit e563d0a7cd ("workqueue: Break up enum definitions and give
names to the types") gives a name to the `enum` where `WORK_CPU_UNBOUND`
was defined, so `bindgen` changes its output from e.g.:
pub type _bindgen_ty_10 = core::ffi::c_uint;
pub const WORK_CPU_UNBOUND: _bindgen_ty_10 = 64;
to e.g.:
pub type wq_misc_consts = core::ffi::c_uint;
pub const wq_misc_consts_WORK_CPU_UNBOUND: wq_misc_consts = 64;
Thus update Rust's side to match the change (which requires a slight
reformat of the code), fixing the build error.
Closes: https://lore.kernel.org/rust-for-linux/CANiq72=9PZ89bCAVX0ZV4cqrYSLoZWyn-d_K4KpBMHjwUMdC3A@mail.gmail.com/
Fixes: e563d0a7cd ("workqueue: Break up enum definitions and give names to the types")
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reduce the chances of compilation failures due to integer type
mismatches in `CondVar`.
When an integer is defined using a #define in C, bindgen doesn't know
which integer type it is supposed to be, so it will just use `u32` by
default (if it fits in an u32). Whenever the right type is something
else, we insert a cast in Rust. However, this means that the code has a
lot of extra casts, and sometimes the code will be missing casts if u32
happens to be correct on the developer's machine, even though the type
might be something else on a different platform.
This patch updates all uses of such constants in
`rust/kernel/sync/condvar.rs` to use constants defined with the right
type. This allows us to remove various unnecessary casts, while also
future-proofing for the case where `unsigned int != u32` (even though
that is unlikely to ever happen in the kernel).
I wrote this patch at the suggestion of Benno in [1].
Link: https://lore.kernel.org/all/nAEg-6vbtX72ZY3oirDhrSEf06TBWmMiTt73EklMzEAzN4FD4mF3TPEyAOxBZgZtjzoiaBYtYr3s8sa9wp1uYH9vEWRf2M-Lf4I0BY9rAgk=@proton.me/ [1]
Suggested-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Tiago Lam <tiagolam@gmail.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-4-88e0c871cc05@google.com
[ Added note on the unlikeliness of `sizeof(int)` changing. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Sleep on a condition variable with a timeout.
This is used by Rust Binder for process freezing. There, we want to
sleep until the freeze operation completes, but we want to be able to
abort the process freezing if it doesn't complete within some timeout.
Note that it is not enough to avoid jiffies by introducing a variant of
`CondVar::wait_timeout` that takes the timeout in msecs because we need
to be able to restart the sleep with the remaining sleep duration if it
is interrupted, and if the API takes msecs rather than jiffies, then
that would require a conversion roundtrip jiffies->msecs->jiffies that
is best avoided.
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Reviewed-by: Tiago Lam <tiagolam@gmail.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Link: https://lore.kernel.org/r/20240108-rb-new-condvar-methods-v4-3-88e0c871cc05@google.com
[ Added `CondVarTimeoutResult` re-export and fixed typo. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>