18508 Commits

Author SHA1 Message Date
izzy2lost
d2c6059df3 integrate libsamplerate and other audio fixes 2026-03-20 09:06:52 -04:00
izzy2lost
3a58ffe0c8 enable real NV2A transform-program. Fixes spy vs spy, metal slug 3 and other games 2026-03-20 05:20:28 -04:00
izzy2lost
ddc32eaea4 added opengl es option 2026-03-06 14:09:09 -05:00
izzy2lost
b75429b103 Video working slow bad audio 2026-02-06 19:51:23 -05:00
Matt Borgerson
ebe582768e util/mstring: Use G_GNUC_PRINTF macro for format attribute specification 2026-01-18 19:06:37 -07:00
Matt Borgerson
704ece9ac6 Merge QEMU v10.2.0 2026-01-18 16:36:55 -07:00
Philippe Mathieu-Daudé
efd6b3d176 Revert "hw/net/virtio-net: make VirtIONet.vlans an array instead of a pointer"
Per https://lore.kernel.org/qemu-devel/7798584d-e861-47b7-af52-2c2efb67a4de@proxmox.com/:

Loading a VM state taken with v10.1.2 or older doesn't work anymore,
using the script [*] we get:

  kvm: VQ 1 size 0x100 < last_avail_idx 0x9 - used_idx 0x3e30
  kvm: load of migration failed: Operation not permitted: error while loading state for instance 0x0 of device '0000:00:13.0/virtio-net': Failed to load element of type virtio for virtio: -1
  qemu-system-x86_64: Missing section footer for 0000:00:13.0/virtio-net
  qemu-system-x86_64: Section footer error, section_id: 41

[*]:

  #!/bin/bash
  rm /tmp/disk.qcow2
  args="
    -netdev type=tap,id=net1,ifname=tap104i1,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on
    -device virtio-net-pci,mac=BC:24:11:32:3C:69,netdev=net1,bus=pci.0,addr=0x13,id=net1
    -machine type=pc-i440fx-10.1
  "
  $1/qemu-img create -f qcow2 /tmp/disk.qcow2 1G
  $1/qemu-system-x86_64 --qmp stdio --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 $args <<EOF
  {"execute": "qmp_capabilities"}
  {"execute": "snapshot-save", "arguments": { "job-id": "save0", "tag": "snap", "vmstate": "node0", "devices": ["node0"] } }
  {"execute": "quit"}
  EOF
  $2/qemu-system-x86_64 --qmp stdio --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 $args -loadvm snap

This reverts commit 3a9cd2a4a1.

Reported-by: Fiona Ebner <f.ebner@proxmox.com>
Suggested-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-12-09 21:00:15 +01:00
Philippe Mathieu-Daudé
0d42e48c73 Revert "migration/vmstate: remove VMSTATE_BUFFER_POINTER_UNSAFE macro"
Next commit will re-use VMSTATE_BUFFER_POINTER_UNSAFE().

This reverts commit 58341158d0.

Suggested-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-12-09 21:00:15 +01:00
Stefan Weil
4fdff25625 hw/pci: Fix typo in documentation
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-ID: <20251209125759.764296-1-sw@weilnetz.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-12-09 20:56:14 +01:00
Philippe Mathieu-Daudé
df3b304605 osdep: Undefine FSCALE definition to fix Solaris builds
Solaris defines FSCALE in <sys/param.h>:

  301 /*
  302  * Scale factor for scaled integers used to count
  303  * %cpu time and load averages.
  304  */
  305 #define FSHIFT  8               /* bits to right of fixed binary point */
  306 #define FSCALE  (1<<FSHIFT)

When emulating the SVE FSCALE instruction, we defines the same name
in decodetree format in target/arm/tcg/sve.decode:

  1129:FSCALE          01100101 .. 00 1001 100 ... ..... .....    @rdn_pg_rm

This leads to a definition clash:

  In file included from ../target/arm/tcg/translate-sve.c:21:
  ../target/arm/tcg/translate.h:875:17: error: pasting "trans_" and "(" does not give a valid preprocessing token
    875 |     static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \
        |                 ^~~~~~
  ../target/arm/tcg/translate-sve.c:4205:5: note: in expansion of macro 'TRANS_FEAT'
   4205 |     TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
        |     ^~~~~~~~~~
  ../target/arm/tcg/translate-sve.c:4249:1: note: in expansion of macro 'DO_ZPZZ_FP'
   4249 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
        | ^~~~~~~~~~
  ../target/arm/tcg/translate-sve.c:4249:12: error: expected declaration specifiers or '...' before numeric constant
   4249 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
        |            ^~~~~~
  ../target/arm/tcg/translate.h:875:25: note: in definition of macro 'TRANS_FEAT'
    875 |     static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \
        |                         ^~~~
  ../target/arm/tcg/translate-sve.c:4249:1: note: in expansion of macro 'DO_ZPZZ_FP'
   4249 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
        | ^~~~~~~~~~
  ../target/arm/tcg/translate.h:875:47: error: pasting "arg_" and "(" does not give a valid preprocessing token
    875 |     static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \
        |                                               ^~~~
  ../target/arm/tcg/translate-sve.c:4205:5: note: in expansion of macro 'TRANS_FEAT'
   4205 |     TRANS_FEAT(NAME, FEAT, gen_gvec_fpst_arg_zpzz, name##_zpzz_fns[a->esz], a)
        |     ^~~~~~~~~~
  ../target/arm/tcg/translate-sve.c:4249:1: note: in expansion of macro 'DO_ZPZZ_FP'
   4249 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
        | ^~~~~~~~~~
  In file included from ../target/arm/tcg/translate-sve.c💯
  libqemu-aarch64-softmmu.a.p/decode-sve.c.inc:1227:13: warning: 'trans_FSCALE' used but never defined
   1227 | static bool trans_FSCALE(DisasContext *ctx, arg_FSCALE *a);
        |             ^~~~~~~~~~~~
  ../target/arm/tcg/translate-sve.c:4249:30: warning: 'sve_fscalbn_zpzz_fns' defined but not used [-Wunused-const-variable=]
   4249 | DO_ZPZZ_FP(FSCALE, aa64_sve, sve_fscalbn)
        |                              ^~~~~~~~~~~
  ../target/arm/tcg/translate-sve.c:4201:42: note: in definition of macro 'DO_ZPZZ_FP'
   4201 |     static gen_helper_gvec_4_ptr * const name##_zpzz_fns[4] = { \
        |                                          ^~~~

As a kludge, undefine it globally in <qemu/osdep.h>.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20251203120315.62889-1-philmd@linaro.org>
2025-12-09 20:42:53 +01:00
Richard Henderson
8c00f56fca tcg/tci: Disable -Wundef FFI_GO_CLOSURES warning
Since we build TCI with FFI (commit 22f15579fa "tcg: Build ffi data
structures for helpers") we get on Darwin:

  In file included from ../../tcg/tci.c:22:
  In file included from include/tcg/helper-info.h:13:
  /Library/Developer/CommandLineTools/SDKs/MacOSX15.sdk/usr/include/ffi/ffi.h:483:5: warning: 'FFI_GO_CLOSURES' is not defined, evaluates to 0 [-Wundef]
    483 | #if FFI_GO_CLOSURES
        |     ^
  1 warning generated.

This was fixed in upstream libffi in 2023, but not backported to MacOSX.
Simply disable the warning locally.

Reported-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2025-12-05 07:50:15 -06:00
Peter Maydell
ef44cc0a76 hw/pci: Make msix_init take a uint32_t for nentries
msix_init() and msix_init_exclusive_bar() take an "unsigned short"
argument for the number of MSI-X vectors to try to use.  This is big
enough for the maximum permitted number of vectors, which is 2048.
Unfortunately, we have several devices (most notably virtio) which
allow the user to specify the desired number of vectors, and which
use uint32_t properties for this.  If the user sets the property to a
value that is too big for a uint16_t, the value will be truncated
when it is passed to msix_init(), and msix_init() may then return
success if the truncated value is a valid one.

The resulting mismatch between the number of vectors the msix code
thinks the device has and the number of vectors the device itself
thinks it has can cause assertions, such as the one in issue 2631,
where "-device virtio-mouse-pci,vectors=19923041" is interpreted by
msix as "97 vectors" and by the virtio-pci layer as "19923041
vectors"; a guest attempt to access vector 97 thus passes the
virtio-pci bounds checking and hits an essertion in
msix_vector_use().

Avoid this by making msix_init() and its wrapper function
msix_init_exclusive_bar() take the number of vectors as a uint32_t.
The erroneous command line will now produce the warning

 qemu-system-i386: -device virtio-mouse-pci,vectors=19923041:
   warning: unable to init msix vectors to 19923041

and proceed without crashing.  (The virtio device warns and falls
back to not using MSIX, rather than complaining that the option is
not a valid value this is the same as the existing behaviour for
values that are beyond the MSI-X maximum possible value but fit into
a 16-bit integer, like 2049.)

To ensure this doesn't result in potential overflows in calculation
of the BAR size in msix_init_exclusive_bar(), we duplicate the
nentries error-check from msix_init() at the top of
msix_init_exclusive_bar(), so we know nentries is sane before we
start using it.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2631
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251107131044.1321637-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-11-25 22:41:40 +01:00
Richard Henderson
a8d023be62 Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging
Block layer patches

- Image creation: Honour pwrite_zeroes_alignment for zeroing first sector
- block-backend: Fix race (causing a crash) when resuming queued requests

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCgAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmklvQMRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9byFA//d9VtU3wLZpJRL2mnYH2qJME3WeqJaSB+
# FzkG32gkCb0JtH5yr427oJYKhZsKpNkz20E7z4+1ZT4ovcjo7mddJYW7DwaMjUmO
# G3UXWE33ayLNZFMDrsMRV5tfiQkSb7Y0ekYfwU7GjC3qhMhRIX9eCRBrCLD6jdUx
# mg2h0ML0smE9AV5AEuunwSoqp+rD+OpRQ6EBkkCVF5iMlIHeiewP/TQbJtKBtxdK
# AumiIcYgPbH7QFG8kDTmVCCGPDC0v2i1G6Owtptbt9RmWTEGp++Ngm8F+7u/kPMk
# weRhlVhnxwDxVxmHzvysh0m+n08oVJyA2vB4QJrti6ZmgDcJYulxFfQgPCKxjvGd
# 6va02q0DYrCbO3YiViaAtnudEuqqaB1to57jeQq6tP9KrpH8uzAddrFWeb3TY4gN
# CvWr+p4V7bYvteNASJt/+VC5T3haR+U5eCRD5nOKPyXqCbMT+z6zZRuYxP2q1W6i
# VwQLIjuWIx+bXVRUrHkf9VNy1clB4ga+ZDbTGFrl0NOLDcn6u3Vcr4GQ7VvQ31Pj
# ulGA9F+DXjPRQpZC+WnCZsBSLwVBrNeYPyxsCSk2ORH930djgb7e1lxX5OawT7MT
# lNzbQ+N7PXCd5Yt0UyJ3uCF6gqlpvmUV7IZMbyoYHceoCnz8+McqvGORYfzkLwk9
# HUDS3UTI8Ks=
# =57x4
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 25 Nov 2025 06:28:19 AM PST
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin:
  iotests: add Linux loop device image creation test
  block: use pwrite_zeroes_alignment when writing first sector
  file-posix: populate pwrite_zeroes_alignment
  block-backend: Fix race when resuming queued requests

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2025-11-25 10:25:16 -08:00
Stefan Hajnoczi
d704a13d2c block: use pwrite_zeroes_alignment when writing first sector
Since commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t
writeback"), qemu-img create errors out on a Linux loop block device
with a 4 KB sector size:

  # dd if=/dev/zero of=blockfile bs=1M count=1024
  # losetup --sector-size 4096 /dev/loop0 blockfile
  # qemu-img create -f raw /dev/loop0 1G
  Formatting '/dev/loop0', fmt=raw size=1073741824
  qemu-img: /dev/loop0: Failed to clear the new image's first sector: Invalid argument

Use the pwrite_zeroes_alignment block limit to avoid misaligned
fallocate(2) or ioctl(BLKZEROOUT) in the block/file-posix.c block
driver.

Cc: qemu-stable@nongnu.org
Fixes: 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/3127
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20251007141700.71891-3-stefanha@redhat.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-25 15:26:22 +01:00
Klaus Jensen
3050b34921 hw/nvme: fix namespace atomic parameter setup
Coverity complains about a possible copy-paste error in the verification
of the namespace atomic parameters (CID 1642811). While the check is
correct, the code (and the intention) is unclear.

Fix this by reworking how the parameters are verified. Peter also
identified that the realize function was not correctly erroring out if
parameters were misconfigured, so fix that too.

Lastly, change the error messages to be more describing.

Coverity: CID 1642811
Fixes: bce51b8370 ("hw/nvme: add atomic boundary support")
Fixes: 3b41acc962 ("hw/nvme: enable ns atomic writes")
Reviewed-by: Jesper Wendel Devantier <foss@defmacro.it>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-11-25 09:21:35 +01:00
Richard Henderson
5e0242e9a8 Merge tag 'hw-misc-20251118' of https://github.com/philmd/qemu into staging
Misc HW patches

- Re-enable xenpvh machine in qemu-system-arm/aarch64 binaries
- Correct Xilinx Zynq DMA Devcfg registers range size
- Correct ACCEL_KERNEL_GSI_IRQFD_POSSIBLE typo
- Allow for multiple CHR_EVENT_CLOSED events in QTest framework
- Fix ACMD41 state machine for SD cards in SPI mode
- Avoid confusing address calculation around eMMC RPMB HMAC
- Fix a pair of build failures on Solaris (guest-agent and RDMA migration)
- Correct QOM parent of LASI south bridge
- Clarify MIPS / PPC 32-bit hosts removal in documentation
- Prevent further uses of DEVICE_NATIVE_ENDIAN definition
- Fix Error uses in eBPF
- Update David Hildenbrand's email address

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmkcwjAACgkQ4+MsLN6t
# wN7Wxg//UMbpEgp92clPcGUX1RFHViEYu5DDM96nwjLpOR8nNAJvLZ5+qxDfyZRQ
# qfVGaE0cm5a/rXRMgFAzeJw5ptcSwLJXsUvnRuNLEpKlIAfqInqqk+JTi/r7hJSq
# W8m07IrdtADwoas0OYKur0XwF+k1hqVOENQWPxiLiyViEH2tR8MFA+nrqQhZzgwo
# Emu3ICc01wX+hhY2R51mf+GdVcmr8RACc07lmG7MnMtvQW8vzCkA/VJ5jWWQv6Xj
# ADKBTciYEK/PKD5vbbwMadZfxaWRiH1l+unfpw0qXC46YOAMvpe3+0mRqk7VeSRc
# anqdXQk9dbqw7qwJ+L+RVdUjNf1bLc9LxOePeMOgsNzd8wxlsBia9PDNxvVTRFmh
# /JxLYO9bM4vRojaGOCFoppoF++JSdZzI6WM56hY465L3VCx36V1p2YESX8x/5F1B
# +w/JPV0dUGeq+MFUNKg/pBy9dgRYIGJfcbcp2jwMxyEB5d0np53zXbMaZmqX/cEO
# AjE/haqtpu/yAqSK7oklx1gJEI9gRE0cJp2B/7l/3RwW3fcMsN3HJB3GH8f+3vg2
# VQMYDrAWBF5wA/5HQtsGNrfImlYGHa535KnLujTcNLVwS+2gZ6N6FwfwhM2fwXQh
# +X7nQZbBsAVa0jDqck8zkIarVuISocC10DWfuP5k4hlKxeyg71M=
# =K5DF
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 18 Nov 2025 08:00:00 PM CET
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* tag 'hw-misc-20251118' of https://github.com/philmd/qemu:
  ebpf: Make ebpf_rss_load() return value consistent with @errp
  ebpf: Clean up useless error check in ebpf_rss_set_all()
  ebpf: Fix stubs to set an error when they return failure
  scripts/checkpatch: Check DEVICE_NATIVE_ENDIAN
  docs: Mention 32-bit PPC host as removed
  docs: Correct release of MIPS deprecations / removals
  migration/rdma: Check ntohll() availability with meson
  buildsys: Remove dead 'mips' entry in supported_cpus[] array
  hw/southbridge/lasi: Correct LasiState parent
  qga/commands: Include proper Solaris header for getloadavg()
  hw/sd/sdcard: Avoid confusing address calculation in rpmb_calc_hmac
  hw/arm: Re-enable xenpvh machine in qemu-system-arm/aarch64 binaries
  hw/dma/zynq-devcfg: Fix register memory
  hw/sd: Fix ACMD41 state machine in SPI mode
  hw/sd: Fix incorrect idle state reporting in R1 response for SPI mode
  system/qtest.c: Allow for multiple CHR_EVENT_CLOSED events
  hw/intc/ioapic: Fix ACCEL_KERNEL_GSI_IRQFD_POSSIBLE typo
  MAINTAINERS: Update David Hildenbrand's email address

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2025-11-19 07:38:44 +01:00
Philippe Mathieu-Daudé
9c3b76a0d4 hw/southbridge/lasi: Correct LasiState parent
TYPE_LASI_CHIP inherits from TYPE_SYS_BUS_DEVICE, not
TYPE_PCI_HOST_BRIDGE, so its parent structure is of
SysBusDevice type.

Cc: qemu-stable@nongnu.org
Fixes: 376b851909 ("hppa: Add support for LASI chip with i82596 NIC")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20251117091804.56529-1-philmd@linaro.org>
2025-11-18 19:59:36 +01:00
Hanna Czenczek
d45b2c65f2 block: Note in which AioContext AIO CBs are called
This doesn’t seem to be specified anywhere, but is something we probably
want to be clear.  I believe it is reasonable to implicitly assume that
callbacks are run in the current thread (unless explicitly noted
otherwise), so codify that assumption.

Some implementations don’t actually fulfill this contract yet.  The next
patches should rectify that.

Note: I don’t know of any user-visible bugs produced by not running AIO
callbacks in the original context.  AIO functionality is generally
mapped to coroutines through the use of bdrv_co_io_em_complete(), which
can run in any AioContext, and will always wake the yielding coroutine
in its original context.  The only benefit here is that running
bdrv_co_io_em_complete() in the original context will make that
aio_co_wake() most likely a simpler qemu_coroutine_enter() instead of
scheduling the wakeup through AioContext.co_schedule_bh.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-17-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:55 +01:00
Hanna Czenczek
aed74d3d62 block: Note on aio_co_wake use if not yet yielding
aio_co_wake() is generally safe to call regardless of whether the
coroutine is already yielding or not.  If it is not yet yielding, it
will be scheduled to run when it does yield.

Caveats:
- The caller must be independent of the coroutine (to ensure the
  coroutine must be yielding if both are in the same AioContext), i.e.
  must not be the same coroutine
- The coroutine must yield at some point

Make note of this so callers can reason that their use is safe.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20251110154854.151484-2-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-18 18:01:39 +01:00
Eric Blake
de252f7993 qio: Add QIONetListener API for using AioContext
The user calling himself "John Doe" reported a deadlock when
attempting to use qemu-storage-daemon to serve both a base file over
NBD, and a qcow2 file with that NBD export as its backing file, from
the same process, even though it worked just fine when there were two
q-s-d processes.  The bulk of the NBD server code properly uses
coroutines to make progress in an event-driven manner, but the code
for spawning a new coroutine at the point when listen(2) detects a new
client was hard-coded to use the global GMainContext; in other words,
the callback that triggers nbd_client_new to let the server start the
negotiation sequence with the client requires the main loop to be
making progress.  However, the code for bdrv_open of a qcow2 image
with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
ensure that the entire qcow2 backing chain is either fully loaded or
rejected, without any side effects from the main loop causing unwanted
changes to the disk being loaded (in short, an AioContext represents
the set of actions that are known to be safe while handling block
layer I/O, while excluding any other pending actions in the global
main loop with potentially larger risk of unwanted side effects).

This creates a classic case of deadlock: the server can't progress to
the point of accept(2)ing the client to write to the NBD socket
because the main loop is being starved until the AIO_WAIT_WHILE
completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
it is blocked on the client coroutine stuck in a read() of the
expected magic number from the server side of the socket.

This patch adds a new API to allow clients to opt in to listening via
an AioContext rather than a GMainContext.  This will allow NBD to fix
the deadlock by performing all actions during bdrv_open in the main
loop AioContext.

Technical debt warning: I would have loved to utilize a notify
function with AioContext to guarantee that we don't finalize listener
due to an object_unref if there is any callback still running (the way
GSource does), but wiring up notify functions into AioContext is a
bigger task that will be deferred to a later QEMU release.  But for
solving the NBD deadlock, it is sufficient to note that the QMP
commands for enabling and disabling the NBD server are really the only
points where we want to change the listener's callback.  Furthermore,
those commands are serviced in the main loop, which is the same
AioContext that is also listening for connections.  Since a thread
cannot interrupt itself, we are ensured that at the point where we are
changing the watch, there are no callbacks active.  This is NOT as
powerful as the GSource cross-thread safety, but sufficient for the
needs of today.

An upcoming patch will then add a unit test (kept separate to make it
easier to rearrange the series to demonstrate the deadlock without
this patch).

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-26-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2025-11-13 10:58:26 -06:00
Eric Blake
cc0faf8273 qio: Prepare NetListener to use AioContext
For ease of review, this patch adds an AioContext pointer to the
QIONetListener struct, the code to trace it, and refactors
listener->io_source to instead be an array of utility structs; but the
aio_context pointer is always NULL until the next patch adds an API to
set it.  There should be no semantic change in this patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-25-eblake@redhat.com>
2025-11-13 10:54:50 -06:00
Eric Blake
ec59a65a4d qio: Provide accessor around QIONetListener->sioc
An upcoming patch needs to pass more than just sioc as the opaque
pointer to an AioContext; but since our AioContext code in general
(and its QIO Channel wrapper code) lacks a notify callback present
with GSource, we do not have the trivial option of just g_malloc'ing a
small struct to hold all that data coupled with a notify of g_free.
Instead, the data pointer must outlive the registered handler; in
fact, having the data pointer have the same lifetime as QIONetListener
is adequate.

But the cleanest way to stick such a helper struct in QIONetListener
will be to rearrange internal struct members.  And that in turn means
that all existing code that currently directly accesses
listener->nsioc and listener->sioc[] should instead go through
accessor functions, to be immune to the upcoming struct layout
changes.  So this patch adds accessor methods qio_net_listener_nsioc()
and qio_net_listener_sioc(), and puts them to use.

While at it, notice that the pattern of grabbing an sioc from the
listener only to turn around can call
qio_channel_socket_get_local_address is common enough to also warrant
the helper of qio_net_listener_get_local_address, and fix a copy-paste
error in the corresponding documentation.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-24-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2025-11-13 10:54:44 -06:00
Eric Blake
9d86181874 qio: Protect NetListener callback with mutex
Without a mutex, NetListener can run into this data race between a
thread changing the async callback callback function to use when a
client connects, and the thread servicing polling of the listening
sockets:

  Thread 1:
       qio_net_listener_set_client_func(lstnr, f1, ...);
           => foreach sock: socket
               => object_ref(lstnr)
               => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);

  Thread 2:
       poll()
          => event POLLIN on socket
               => ref(GSourceCallback)
               => if (lstnr->io_func) // while lstnr->io_func is f1
                    ...interrupt..

  Thread 1:
       qio_net_listener_set_client_func(lstnr, f2, ...);
          => foreach sock: socket
               => g_source_unref(sock_src)
          => foreach sock: socket
               => object_ref(lstnr)
               => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);

  Thread 2:
               => call lstnr->io_func(lstnr->io_data) // now sees f2
               => return dispatch(sock)
               => unref(GSourceCallback)
                  => destroy-notify
                     => object_unref

Found by inspection; I did not spend the time trying to add sleeps or
execute under gdb to try and actually trigger the race in practice.
This is a SEGFAULT waiting to happen if f2 can become NULL because
thread 1 deregisters the user's callback while thread 2 is trying to
service the callback.  Other messes are also theoretically possible,
such as running callback f1 with an opaque pointer that should only be
passed to f2 (if the client code were to use more than just a binary
choice between a single async function or NULL).

Mitigating factor: if the code that modifies the QIONetListener can
only be reached by the same thread that is executing the polling and
async callbacks, then we are not in a two-thread race documented above
(even though poll can see two clients trying to connect in the same
window of time, any changes made to the listener by the first async
callback will be completed before the thread moves on to the second
client).  However, QEMU is complex enough that this is hard to
generically analyze.  If QMP commands (like nbd-server-stop) are run
in the main loop and the listener uses the main loop, things should be
okay.  But when a client uses an alternative GMainContext, or if
servicing a QMP command hands off to a coroutine to avoid blocking, I
am unable to state with certainty whether a given net listener can be
modified by a thread different from the polling thread running
callbacks.

At any rate, it is worth having the API be robust.  To ensure that
modifying a NetListener can be safely done from any thread, add a
mutex that guarantees atomicity to all members of a listener object
related to callbacks.  This problem has been present since
QIONetListener was introduced.

Note that this does NOT prevent the case of a second round of the
user's old async callback being invoked with the old opaque data, even
when the user has already tried to change the async callback during
the first async callback; it is only about ensuring that there is no
sharding (the eventual io_func(io_data) call that does get made will
correspond to a particular combination that the user had requested at
some point in time, and not be sharded to a combination that never
existed in practice).  In other words, this patch maintains the status
quo that a user's async callback function already needs to be robust
to parallel clients landing in the same window of poll servicing, even
when only one client is desired, if that particular listener can be
amended in a thread other than the one doing the polling.

CC: qemu-stable@nongnu.org
Fixes: 53047392 ("io: introduce a network socket listener API", v2.12.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-20-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: minor commit message wording improvements]
Signed-off-by: Eric Blake <eblake@redhat.com>
2025-11-13 08:46:41 -06:00
Eric Blake
b5676493a0 qio: Remember context of qio_net_listener_set_client_func_full
io/net-listener.c has two modes of use: asynchronous (the user calls
qio_net_listener_set_client_func to wake up the callback via the
global GMainContext, or qio_net_listener_set_client_func_full to wake
up the callback via the caller's own alternative GMainContext), and
synchronous (the user calls qio_net_listener_wait_client which creates
its own GMainContext and waits for the first client connection before
returning, with no need for a user's callback).  But commit 938c8b79
has a latent logic flaw: when qio_net_listener_wait_client finishes on
its temporary context, it reverts all of the siocs back to the global
GMainContext rather than the potentially non-NULL context they might
have been originally registered with.  Similarly, if the user creates
a net-listener, adds initial addresses, registers an async callback
with a non-default context (which ties to all siocs for the initial
addresses), then adds more addresses with qio_net_listener_add, the
siocs for later addresses are blindly placed in the global context,
rather than sharing the context of the earlier ones.

In practice, I don't think this has caused issues.  As pointed out by
the original commit, all async callers prior to that commit were
already okay with the NULL default context; and the typical usage
pattern is to first add ALL the addresses the listener will pay
attention to before ever setting the async callback.  Likewise, if a
file uses only qio_net_listener_set_client_func instead of
qio_net_listener_set_client_func_full, then it is never using a custom
context, so later assignments of async callbacks will still be to the
same global context as earlier ones.  Meanwhile, any callers that want
to do the sync operation to grab the first client are unlikely to
register an async callback; altogether bypassing the question of
whether later assignments of a GSource are being tied to a different
context over time.

I do note that chardev/char-socket.c is the only file that calls both
qio_net_listener_wait_client (sync for a single client in
tcp_chr_accept_server_sync), and qio_net_listener_set_client_func_full
(several places, all with chr->gcontext, but sometimes with a NULL
callback function during teardown).  But as far as I can tell, the two
uses are mutually exclusive, based on the is_waitconnect parameter to
qmp_chardev_open_socket_server.

That said, it is more robust to remember when an async callback
function is tied to a non-default context, and have both the sync wait
and any late address additions honor that same context.  That way, the
code will be robust even if a later user performs a sync wait for a
specific client in the middle of servicing a longer-lived
QIONetListener that has an async callback for all other clients.

CC: qemu-stable@nongnu.org
Fixes: 938c8b79 ("qio: store gsources for net listeners", v2.12.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-ID: <20251113011625.878876-19-eblake@redhat.com>
2025-11-13 08:29:46 -06:00
Eric Blake
1bd7bfbc2b block: Allow drivers to control protocol prefix at creation
This patch is pure refactoring: instead of hard-coding permission to
use a protocol prefix when creating an image, the drivers can now pass
in a parameter, comparable to what they could already do for opening a
pre-existing image.  This patch is purely mechanical (all drivers pass
in true for now), but it will enable the next patch to cater to
drivers that want to differ in behavior for the primary image vs. any
secondary images that are opened at the same time as creating the
primary image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250915213919.3121401-5-eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-11-11 22:06:09 +01:00