mirror of
https://github.com/ukui/kernel.git
synced 2026-03-09 10:07:04 -07:00
c443514a7d6d648bc497efbe502e2a49738b94de
86 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
169e77764a |
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
|
||
|
|
6ee64cc302 |
fprobe: Add sample program for fprobe
Add a sample program for the fprobe. The sample_fprobe puts a fprobe on kernel_clone() by default. This dump stack and some called address info at the function entry and exit. The sample_fprobe.ko gets 2 parameters. - symbol: you can specify the comma separated symbols or wildcard symbol pattern (in this case you can not use comma) - stackdump: a bool value to enable or disable stack dump in the fprobe handler. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735291987.1084943.4449670993752806840.stgit@devnote2 |
||
|
|
953c2f0521 |
tracing: Add sample code for custom trace events
Add sample code to show how to create custom trace events in the tracefs directory that can be enabled and modified like any event in tracefs (including triggers, histograms, synthetic events and event probes). The example is creating a custom sched_switch and a sched_waking to limit what is recorded: If the custom sched switch only records the prev_prio, next_prio and next_pid, it can bring the size from 64 bytes per event, down to just 16 bytes! If sched_waking only records the prio and pid of the woken event, it will bring the size down from 36 bytes to 12 bytes per event. This will allow for a much smaller footprint into the ring buffer and keep more events from dropping. Link: https://lkml.kernel.org/r/20220303220625.369226746@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Suggested-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
fdcee305c0 |
Merge tag 'coresight-next-v5.17' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into char-misc-next
Mathieu writes: Coresight changes for v5.17 This pull request includes: - A patch that uses devm_bitmap_zalloc() instead of the open-coded equivalent. - Work to make coresight complex configuration loadable via modules. - Some coresight documentation updates. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> * tag 'coresight-next-v5.17' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux: coresight: core: Fix typo in a comment Documentation: coresight: Update coresight configuration docs coresight: configfs: Allow configfs to activate configuration coresight: syscfg: Example CoreSight configuration loadable module coresight: syscfg: Update load API for config loadable modules coresight: configuration: Update API to permit dynamic load/unload coresight: configuration: Update API to introduce load owner concept Documentation: coresight: Fix documentation issue coresight: Use devm_bitmap_zalloc when applicable |
||
|
|
ede5bab874 |
coresight: syscfg: Example CoreSight configuration loadable module
An example of creating a loadable module to add CoreSight configurations into a system. In the Kernel samples/coresight directory. Signed-off-by: Mike Leach <mike.leach@linaro.org> Link: https://lore.kernel.org/r/20211124200038.28662-5-mike.leach@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> |
||
|
|
503e451084 |
ftrace/samples: add missing Kconfig option for ftrace direct multi sample
Currently it is not possible to build the ftrace direct multi example
anymore due to broken config dependencies. Fix this by adding
SAMPLE_FTRACE_DIRECT_MULTI config option.
This broke when merging s390-5.16-1 due to an incorrect merge conflict
resolution proposed by me.
Also rename SAMPLE_FTRACE_MULTI_DIRECT to SAMPLE_FTRACE_DIRECT_MULTI
so it matches the module name.
Fixes:
|
||
|
|
2acda7549e |
Merge tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara: "Support for reporting filesystem errors through fanotify so that system health monitoring daemons can watch for these and act instead of scraping system logs" * tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits) samples: remove duplicate include in fs-monitor.c samples: Fix warning in fsnotify sample docs: Fix formatting of literal sections in fanotify docs samples: Make fs-monitor depend on libc and headers docs: Document the FAN_FS_ERROR event samples: Add fs error monitoring example ext4: Send notifications on error fanotify: Allow users to request FAN_FS_ERROR events fanotify: Emit generic error info for error event fanotify: Report fid info for file related file system errors fanotify: WARN_ON against too large file handles fanotify: Add helpers to decide whether to report FID/DFID fanotify: Wrap object_fh inline space in a creator macro fanotify: Support merging of error events fanotify: Support enqueueing of error events fanotify: Pre-allocate pool of error events fanotify: Reserve UAPI bits for FAN_FS_ERROR fsnotify: Support FS_ERROR event type fanotify: Require fid_mode for any non-fd event fanotify: Encode empty file handle when no inode is provided ... |
||
|
|
0b707e572a |
Merge tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik: - Add support for ftrace with direct call and ftrace direct call samples. - Add support for kernel command lines longer than current 896 bytes and make its length configurable. - Add support for BEAR enhancement facility to improve last breaking event instruction tracking. - Add kprobes sanity checks and testcases to prevent kprobe in the mid of an instruction. - Allow concurrent access to /dev/hwc for the CPUMF users. - Various ftrace / jump label improvements. - Convert unwinder tests to KUnit. - Add s390_iommu_aperture kernel parameter to tweak the limits on concurrently usable DMA mappings. - Add ap.useirq AP module option which can be used to disable interrupt use. - Add add_disk() error handling support to block device drivers. - Drop arch specific and use generic implementation of strlcpy and strrchr. - Several __pa/__va usages fixes. - Various cio, crypto, pci, kernel doc and other small fixes and improvements all over the code. [ Merge fixup as per https://lore.kernel.org/all/YXAqZ%2FEszRisunQw@osiris/ ] * tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (63 commits) s390: make command line configurable s390: support command lines longer than 896 bytes s390/kexec_file: move kernel image size check s390/pci: add s390_iommu_aperture kernel parameter s390/spinlock: remove incorrect kernel doc indicator s390/string: use generic strlcpy s390/string: use generic strrchr s390/ap: function rework based on compiler warning s390/cio: make ccw_device_dma_* more robust s390/vfio-ap: s390/crypto: fix all kernel-doc warnings s390/hmcdrv: fix kernel doc comments s390/ap: new module option ap.useirq s390/cpumf: Allow multiple processes to access /dev/hwc s390/bitops: return true/false (not 1/0) from bool functions s390: add support for BEAR enhancement facility s390: introduce nospec_uses_trampoline() s390: rename last_break to pgm_last_break s390/ptrace: add last_break member to pt_regs s390/sclp: sort out physical vs virtual pointers usage s390/setup: convert start and end initrd pointers to virtual ... |
||
|
|
8fc70b3a14 |
samples: Make fs-monitor depend on libc and headers
Prevent build errors when headers or libc are not available, such as on
kernel build bots, like the below:
samples/fanotify/fs-monitor.c:7:10: fatal error: errno.h: No such file
or directory
7 | #include <errno.h>
| ^~~~~~~~~
Link: https://lore.kernel.org/r/87fsslasgz.fsf@collabora.com
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
||
|
|
5451093081 |
samples: Add fs error monitoring example
Introduce an example of a FAN_FS_ERROR fanotify user to track filesystem errors. Link: https://lore.kernel.org/r/20211025192746.66445-31-krisman@collabora.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz> |
||
|
|
c316eb4460 |
samples: add HAVE_SAMPLE_FTRACE_DIRECT config option
Add HAVE_SAMPLE_FTRACE_DIRECT config option which can be selected by architectures which have support for ftrace direct call samples. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20211012133802.2460757-4-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> |
||
|
|
af3ab3f9b9 |
vfio/mdev: Remove CONFIG_VFIO_MDEV_DEVICE
For some reason the vfio_mdev shim mdev_driver has its own module and kconfig. As the next patch requires access to it from mdev.ko merge the two modules together and remove VFIO_MDEV_DEVICE. A later patch deletes this driver entirely. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20210617142218.1877096-7-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com> |
||
|
|
ba84b0bf5a |
samples/landlock: Add a sandbox manager example
Add a basic sandbox tool to launch a command which can only access a list of file hierarchies in a read-only or read-write way. Cc: James Morris <jmorris@namei.org> Cc: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210422154123.13086-12-mic@digikod.net Signed-off-by: James Morris <jamorris@linux.microsoft.com> |
||
|
|
8fe62e0c0e |
watch_queue: Drop references to /dev/watch_queue
The merged API doesn't use a watch_queue device, but instead relies on
pipes, so let the documentation reflect that.
Fixes:
|
||
|
|
214377e9b7 |
samples: watch_queue: build sample program for target architecture
This userspace program includes UAPI headers exported to usr/include/. 'make headers' always works for the target architecture (i.e. the same architecture as the kernel), so the sample program should be built for the target as well. Kbuild now supports 'userprogs' for that. I also guarded the CONFIG option by 'depends on CC_CAN_LINK' because $(CC) may not provide libc. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
|
|
6adc19fd13 |
Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada: - fix build rules in binderfs sample - fix build errors when Kbuild recurses to the top Makefile - covert '---help---' in Kconfig to 'help' * tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: treewide: replace '---help---' in Kconfig files with 'help' kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables samples: binderfs: really compile this sample and fix build issues |
||
|
|
6c32978414 |
Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull notification queue from David Howells:
"This adds a general notification queue concept and adds an event
source for keys/keyrings, such as linking and unlinking keys and
changing their attributes.
Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:
https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47
Without this, g-o-a has to constantly poll a keyring-based kerberos
cache to find out if kinit has changed anything.
[ There are other notification pending: mount/sb fsinfo notifications
for libmount that Karel Zak and Ian Kent have been working on, and
Christian Brauner would like to use them in lxc, but let's see how
this one works first ]
LSM hooks are included:
- A set of hooks are provided that allow an LSM to rule on whether or
not a watch may be set. Each of these hooks takes a different
"watched object" parameter, so they're not really shareable. The
LSM should use current's credentials. [Wanted by SELinux & Smack]
- A hook is provided to allow an LSM to rule on whether or not a
particular message may be posted to a particular queue. This is
given the credentials from the event generator (which may be the
system) and the watch setter. [Wanted by Smack]
I've provided SELinux and Smack with implementations of some of these
hooks.
WHY
===
Key/keyring notifications are desirable because if you have your
kerberos tickets in a file/directory, your Gnome desktop will monitor
that using something like fanotify and tell you if your credentials
cache changes.
However, we also have the ability to cache your kerberos tickets in
the session, user or persistent keyring so that it isn't left around
on disk across a reboot or logout. Keyrings, however, cannot currently
be monitored asynchronously, so the desktop has to poll for it - not
so good on a laptop. This facility will allow the desktop to avoid the
need to poll.
DESIGN DECISIONS
================
- The notification queue is built on top of a standard pipe. Messages
are effectively spliced in. The pipe is opened with a special flag:
pipe2(fds, O_NOTIFICATION_PIPE);
The special flag has the same value as O_EXCL (which doesn't seem
like it will ever be applicable in this context)[?]. It is given up
front to make it a lot easier to prohibit splice&co from accessing
the pipe.
[?] Should this be done some other way? I'd rather not use up a new
O_* flag if I can avoid it - should I add a pipe3() system call
instead?
The pipe is then configured::
ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
Messages are then read out of the pipe using read().
- It should be possible to allow write() to insert data into the
notification pipes too, but this is currently disabled as the
kernel has to be able to insert messages into the pipe *without*
holding pipe->mutex and the code to make this work needs careful
auditing.
- sendfile(), splice() and vmsplice() are disabled on notification
pipes because of the pipe->mutex issue and also because they
sometimes want to revert what they just did - but one or more
notification messages might've been interleaved in the ring.
- The kernel inserts messages with the wait queue spinlock held. This
means that pipe_read() and pipe_write() have to take the spinlock
to update the queue pointers.
- Records in the buffer are binary, typed and have a length so that
they can be of varying size.
This allows multiple heterogeneous sources to share a common
buffer; there are 16 million types available, of which I've used
just a few, so there is scope for others to be used. Tags may be
specified when a watchpoint is created to help distinguish the
sources.
- Records are filterable as types have up to 256 subtypes that can be
individually filtered. Other filtration is also available.
- Notification pipes don't interfere with each other; each may be
bound to a different set of watches. Any particular notification
will be copied to all the queues that are currently watching for it
- and only those that are watching for it.
- When recording a notification, the kernel will not sleep, but will
rather mark a queue as having lost a message if there's
insufficient space. read() will fabricate a loss notification
message at an appropriate point later.
- The notification pipe is created and then watchpoints are attached
to it, using one of:
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);
where in both cases, fd indicates the queue and the number after is
a tag between 0 and 255.
- Watches are removed if either the notification pipe is destroyed or
the watched object is destroyed. In the latter case, a message will
be generated indicating the enforced watch removal.
Things I want to avoid:
- Introducing features that make the core VFS dependent on the
network stack or networking namespaces (ie. usage of netlink).
- Dumping all this stuff into dmesg and having a daemon that sits
there parsing the output and distributing it as this then puts the
responsibility for security into userspace and makes handling
namespaces tricky. Further, dmesg might not exist or might be
inaccessible inside a container.
- Letting users see events they shouldn't be able to see.
TESTING AND MANPAGES
====================
- The keyutils tree has a pipe-watch branch that has keyctl commands
for making use of notifications. Proposed manual pages can also be
found on this branch, though a couple of them really need to go to
the main manpages repository instead.
If the kernel supports the watching of keys, then running "make
test" on that branch will cause the testing infrastructure to spawn
a monitoring process on the side that monitors a notifications pipe
for all the key/keyring changes induced by the tests and they'll
all be checked off to make sure they happened.
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch
- A test program is provided (samples/watch_queue/watch_test) that
can be used to monitor for keyrings, mount and superblock events.
Information on the notifications is simply logged to stdout"
* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
smack: Implement the watch_key and post_notification hooks
selinux: Implement the watch_key security hook
keys: Make the KEY_NEED_* perms an enum rather than a mask
pipe: Add notification lossage handling
pipe: Allow buffers to be marked read-whole-or-error for notifications
Add sample notification program
watch_queue: Add a key/keyring notification facility
security: Add hooks to rule on setting a watch
pipe: Add general notification queue support
pipe: Add O_NOTIFICATION_PIPE
security: Add a hook for the point of notification insertion
uapi: General notification queue definitions
|
||
|
|
fca5e94921 |
samples: binderfs: really compile this sample and fix build issues
Even after commit |
||
|
|
cff11abeca |
Merge tag 'kbuild-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- fix warnings in 'make clean' for ARCH=um, hexagon, h8300, unicore32
- ensure to rebuild all objects when the compiler is upgraded
- exclude system headers from dependency tracking and fixdep processing
- fix potential bit-size mismatch between the kernel and BPF user-mode
helper
- add the new syntax 'userprogs' to build user-space programs for the
target architecture (the same arch as the kernel)
- compile user-space sample code under samples/ for the target arch
instead of the host arch
- make headers_install fail if a CONFIG option is leaked to user-space
- sanitize the output format of scripts/checkstack.pl
- handle ARM 'push' instruction in scripts/checkstack.pl
- error out before modpost if a module name conflict is found
- error out when multiple directories are passed to M= because this
feature is broken for a long time
- add CONFIG_DEBUG_INFO_COMPRESSED to support compressed debug info
- a lot of cleanups of modpost
- dump vmlinux symbols out into vmlinux.symvers, and reuse it in the
second pass of modpost
- do not run the second pass of modpost if nothing in modules is
updated
- install modules.builtin(.modinfo) by 'make install' as well as by
'make modules_install' because it is useful even when
CONFIG_MODULES=n
- add new command line variables, GZIP, BZIP2, LZOP, LZMA, LZ4, and XZ
to allow users to use alternatives such as pigz, pbzip2, etc.
* tag 'kbuild-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (96 commits)
kbuild: add variables for compression tools
Makefile: install modules.builtin even if CONFIG_MODULES=n
mksysmap: Fix the mismatch of '.L' symbols in System.map
kbuild: doc: rename LDFLAGS to KBUILD_LDFLAGS
modpost: change elf_info->size to size_t
modpost: remove is_vmlinux() helper
modpost: strip .o from modname before calling new_module()
modpost: set have_vmlinux in new_module()
modpost: remove mod->skip struct member
modpost: add mod->is_vmlinux struct member
modpost: remove is_vmlinux() call in check_for_{gpl_usage,unused}()
modpost: remove mod->is_dot_o struct member
modpost: move -d option in scripts/Makefile.modpost
modpost: remove -s option
modpost: remove get_next_text() and make {grab,release_}file static
modpost: use read_text_file() and get_line() for reading text files
modpost: avoid false-positive file open error
modpost: fix potential mmap'ed file overrun in get_src_version()
modpost: add read_text_file() and get_line() helpers
modpost: do not call get_modinfo() for vmlinux(.o)
...
|
||
|
|
f5b5a164f9 |
Add sample notification program
The sample program is run like: ./samples/watch_queue/watch_test and watches "/" for mount changes and the current session keyring for key changes: # keyctl add user a a @s 1035096409 # keyctl unlink 1035096409 @s producing: # ./watch_test read() = 16 NOTIFY[000]: ty=000001 sy=02 i=00000110 KEY 2ffc2e5d change=2[linked] aux=1035096409 read() = 16 NOTIFY[000]: ty=000001 sy=02 i=00000110 KEY 2ffc2e5d change=3[unlinked] aux=1035096409 Other events may be produced, such as with a failing disk: read() = 22 NOTIFY[000]: ty=000003 sy=02 i=00000416 USB 3-7.7 dev-reset e=0 r=0 read() = 24 NOTIFY[000]: ty=000002 sy=06 i=00000418 BLOCK 00800050 e=6[critical medium] s=64000ef8 This corresponds to: blk_update_request: critical medium error, dev sdf, sector 1677725432 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 in dmesg. Signed-off-by: David Howells <dhowells@redhat.com> |
||
|
|
88a8e278ff |
samples: watchdog: use 'userprogs' syntax
Kbuild now supports the 'userprogs' syntax to compile userspace programs for the same architecture as the kernel. Add the entry to samples/Makefile to put this into the build bot coverage. I also added the CONFIG option guarded by 'depends on CC_CAN_LINK' because $(CC) may not provide libc. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> |
||
|
|
b98ccc7150 |
samples: timers: use 'userprogs' syntax
Kbuild now supports the 'userprogs' syntax to compile userspace programs for the same architecture as the kernel. Add the entry to samples/Makefile to put this into the build bot coverage. I also added the CONFIG option guarded by 'depends on CC_CAN_LINK' because $(CC) may not provide libc. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> |
||
|
|
87ffbba9a9 |
samples: auxdisplay: use 'userprogs' syntax
Kbuild now supports the 'userprogs' syntax to compile userspace programs for the same architecture as the kernel. Add the entry to samples/Makefile to put this into the build bot coverage. I also added the CONFIG option guarded by 'depends on CC_CAN_LINK' because $(CC) may not provide libc. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> |
||
|
|
c4c10996b1 |
samples: mei: build sample program for target architecture
This userspace program includes UAPI headers exported to usr/include/. 'make headers' always works for the target architecture (i.e. the same architecture as the kernel), so the sample program should be built for the target as well. Kbuild now supports 'userprogs' for that. I also guarded the CONFIG option by 'depends on CC_CAN_LINK' because $(CC) may not provide libc. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> |
||
|
|
60fb0b1239 |
samples: pidfd: build sample program for target architecture
This userspace program includes UAPI headers exported to usr/include/. 'make headers' always works for the target architecture (i.e. the same architecture as the kernel), so the sample program should be built for the target as well. Kbuild now supports 'userprogs' for that. I also guarded the CONFIG option by 'depends on CC_CAN_LINK' because $(CC) may not provide libc. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> |