devtmpfs_delete_node() calls devnode() callback with mode==NULL but
vfio still tries to write there.
The patch fixes this.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull vfio updates from Alex Williamson:
"Changes include extension to support PCI AER notification to
userspace, byte granularity of PCI config space and access to
unarchitected PCI config space, better protection around IOMMU driver
accesses, default file mode fix, and a few misc cleanups."
* tag 'vfio-for-v3.10' of git://github.com/awilliam/linux-vfio:
vfio: Set container device mode
vfio: Use down_reads to protect iommu disconnects
vfio: Convert container->group_lock to rwsem
PCI/VFIO: use pcie_flags_reg instead of access PCI-E Capabilities Register
vfio-pci: Enable raw access to unassigned config space
vfio-pci: Use byte granularity in config map
vfio: make local function vfio_pci_intx_unmask_handler() static
VFIO-AER: Vfio-pci driver changes for supporting AER
VFIO: Wrapper for getting reference to vfio_device
Minor 0 is the VFIO container device (/dev/vfio/vfio). On it's own
the container does not provide a user with any privileged access. It
only supports API version check and extension check ioctls. Only by
attaching a VFIO group to the container does it gain any access. Set
the mode of the container to allow access.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull PCI updates from Bjorn Helgaas:
"PCI changes for the v3.10 merge window:
PCI device hotplug
- Remove ACPI PCI subdrivers (Jiang Liu, Myron Stowe)
- Make acpiphp builtin only, not modular (Jiang Liu)
- Add acpiphp mutual exclusion (Jiang Liu)
Power management
- Skip "PME enabled/disabled" messages when not supported (Rafael
Wysocki)
- Fix fallback to PCI_D0 (Rafael Wysocki)
Miscellaneous
- Factor quirk_io_region (Yinghai Lu)
- Cache MSI capability offsets & cleanup (Gavin Shan, Bjorn Helgaas)
- Clean up EISA resource initialization and logging (Bjorn Helgaas)
- Fix prototype warnings (Andy Shevchenko, Bjorn Helgaas)
- MIPS: Initialize of_node before scanning bus (Gabor Juhos)
- Fix pcibios_get_phb_of_node() declaration "weak" annotation (Gabor
Juhos)
- Add MSI INTX_DISABLE quirks for AR8161/AR8162/etc (Xiong Huang)
- Fix aer_inject return values (Prarit Bhargava)
- Remove PME/ACPI dependency (Andrew Murray)
- Use shared PCI_BUS_NUM() and PCI_DEVID() (Shuah Khan)"
* tag 'pci-v3.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (63 commits)
vfio-pci: Use cached MSI/MSI-X capabilities
vfio-pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Remove "extern" from function declarations
PCI: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
PCI: Drop msi_mask_reg() and remove drivers/pci/msi.h
PCI: Use msix_table_size() directly, drop multi_msix_capable()
PCI: Drop msix_table_offset_reg() and msix_pba_offset_reg() macros
PCI: Drop is_64bit_address() and is_mask_bit_support() macros
PCI: Drop msi_data_reg() macro
PCI: Drop msi_lower_address_reg() and msi_upper_address_reg() macros
PCI: Drop msi_control_reg() macro and use PCI_MSI_FLAGS directly
PCI: Use cached MSI/MSI-X offsets from dev, not from msi_desc
PCI: Clean up MSI/MSI-X capability #defines
PCI: Use cached MSI-X cap while enabling MSI-X
PCI: Use cached MSI cap while enabling MSI interrupts
PCI: Remove MSI/MSI-X cap check in pci_msi_check_device()
PCI: Cache MSI/MSI-X capability offsets in struct pci_dev
PCI: Use u8, not int, for PM capability offset
[SCSI] megaraid_sas: Use correct #define for MSI-X capability
PCI: Remove "extern" from function declarations
...
If a group or device is released or a container is unset from a group
it can race against file ops on the container. Protect these with
down_reads to allow concurrent users.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Michael S. Tsirkin <mst@redhat.com>
All current users are writers, maintaining current mutual exclusion.
This lets us add read users next.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We now cache the MSI/MSI-X capability offsets in the struct pci_dev,
so no need to find the capabilities again.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the
Table Offset register, not the flags ("Message Control" per spec)
register.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Currently, we use pcie_flags_reg to cache PCI-E Capabilities Register,
because PCI-E Capabilities Register bits are almost read-only. This patch
use pcie_caps_reg() instead of another access PCI-E Capabilities Register.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Devices like be2net hide registers between the gaps in capabilities
and architected regions of PCI config space. Our choices to support
such devices is to either build an ever growing and unmanageable white
list or rely on hardware isolation to protect us. These registers are
really no different than MMIO or I/O port space registers, which we
don't attempt to regulate, so treat PCI config space in the same way.
Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
The config map previously used a byte per dword to map regions of
config space to capabilities. Modulo a bug where we round the length
of capabilities down instead of up, this theoretically works well and
saves space so long as devices don't try to hide registers in the gaps
between capabilities. Unfortunately they do exactly that so we need
byte granularity on our config space map. Increase the allocation of
the config map and split accesses at capability region boundaries.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Gavin Shan <shangw@linux.vnet.ibm.com>
The VFIO_DEVICE_SET_IRQS ioctl takes a start and count parameter, both
of which are unsigned. We attempt to bounds check these, but fail to
account for the case where start is a very large number, allowing
start + count to wrap back into the valid range. Bounds check both
start and start + count.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The vfio drivers call kmalloc or kzalloc, but do not
include <linux/slab.h>, which causes build errors on
ARM.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm@vger.kernel.org
- New VFIO_SET_IRQ ioctl option to pass the eventfd that is signaled when
an error occurs in the vfio_pci_device
- Register pci_error_handler for the vfio_pci driver
- When the device encounters an error, the error handler registered by
the vfio_pci driver gets invoked by the AER infrastructure
- In the error handler, signal the eventfd registered for the device.
- This results in the qemu eventfd handler getting invoked and
appropriate action taken for the guest.
Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
- Added vfio_device_get_from_dev() as wrapper to get
reference to vfio_device from struct device.
- Added vfio_device_data() as a wrapper to get device_data from
vfio_device.
Signed-off-by: Vijay Mohan Pandarathil <vijaymohan.pandarathil@hp.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
PCI defines display class VGA regions at I/O port address 0x3b0, 0x3c0
and MMIO address 0xa0000. As these are non-overlapping, we can ignore
the I/O port vs MMIO difference and expose them both in a single
region. We make use of the VGA arbiter around each access to
configure chipset access as necessary.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We give the user access to change the power state of the device but
certain transitions result in an uninitialized state which the user
cannot resolve. To fix this we need to mark the PowerState field of
the PMCSR register read-only and effect the requested change on behalf
of the user. This has the added benefit that pdev->current_state
remains accurate while controlled by the user.
The primary example of this bug is a QEMU guest doing a reboot where
the device it put into D3 on shutdown and becomes unusable on the next
boot because the device did a soft reset on D3->D0 (NoSoftRst-).
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
pcieport does nice things like manage AER and we know it doesn't do
DMA or expose any user accessible devices on the host. It also keeps
the Memory, I/O, and Busmaster bits enabled, which is pretty handy
when trying to use anyting below it. Devices owned by pcieport cannot
be given to users via vfio, but we can tolerate them not being owned
by vfio-pci.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_dev_present is meant to give us a wait_event callback so that we
can block removing a device from vfio until it becomes unused. The
root of this check depends on being able to get the iommu group from
the device. Unfortunately if the BUS_NOTIFY_DEL_DEVICE notifier has
fired then the device-group reference is no longer searchable and we
fail the lookup.
We don't need to go to such extents for this though. We have a
reference to the device, from which we can acquire a reference to the
group. We can then use the group reference to search for the device
and properly block removal.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
We can actually handle MMIO and I/O port from the same access function
since PCI already does abstraction of this. The ROM BAR only requires
a minor difference, so it gets included too. vfio_pci_config_readwrite
gets renamed for consistency.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The read and write functions are nearly identical, combine them
and convert to a switch statement. This also makes it easy to
narrow the scope of when we use the io/mem accessors in case new
regions are added.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>