vfio.c abuses (misuses) "/**", which indicates the beginning of
kernel-doc notation in the kernel tree. This causes a bunch of
kernel-doc complaints about this source file, so quieten all of
them by changing all "/**" to "/*".
vfio.c:236: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* IOMMU driver registration
vfio.c:236: warning: missing initial short description on line:
* IOMMU driver registration
vfio.c:295: warning: expecting prototype for Container objects(). Prototype was for vfio_container_get() instead
vfio.c:317: warning: expecting prototype for Group objects(). Prototype was for __vfio_group_get_from_iommu() instead
vfio.c:496: warning: Function parameter or member 'device' not described in 'vfio_device_put'
vfio.c:496: warning: expecting prototype for Device objects(). Prototype was for vfio_device_put() instead
vfio.c:599: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Async device support
vfio.c:599: warning: missing initial short description on line:
* Async device support
vfio.c:693: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* VFIO driver API
vfio.c:693: warning: missing initial short description on line:
* VFIO driver API
vfio.c:835: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Get a reference to the vfio_device for a device. Even if the
vfio.c:835: warning: missing initial short description on line:
* Get a reference to the vfio_device for a device. Even if the
vfio.c:969: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* VFIO base fd, /dev/vfio/vfio
vfio.c:969: warning: missing initial short description on line:
* VFIO base fd, /dev/vfio/vfio
vfio.c:1187: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* VFIO Group fd, /dev/vfio/$GROUP
vfio.c:1187: warning: missing initial short description on line:
* VFIO Group fd, /dev/vfio/$GROUP
vfio.c:1540: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* VFIO Device fd
vfio.c:1540: warning: missing initial short description on line:
* VFIO Device fd
vfio.c:1615: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* External user API, exported by symbols to be linked dynamically.
vfio.c:1615: warning: missing initial short description on line:
* External user API, exported by symbols to be linked dynamically.
vfio.c:1663: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* External user API, exported by symbols to be linked dynamically.
vfio.c:1663: warning: missing initial short description on line:
* External user API, exported by symbols to be linked dynamically.
vfio.c:1742: warning: Function parameter or member 'caps' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'size' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'id' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'version' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: expecting prototype for Sub(). Prototype was for vfio_info_cap_add() instead
vfio.c:2276: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* Module/class support
vfio.c:2276: warning: missing initial short description on line:
* Module/class support
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/38a9cb92-a473-40bf-b8f9-85cc5cfc2da4@infradead.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Modernize how vfio is creating the group char dev and sysfs presence.
These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime.
This API requires the driver to put the struct device and struct cdev
inside its state struct (vfio_group), and then use the usual
device_initialize()/cdev_device_add()/cdev_device_del() sequence.
Split the code to make this possible:
- vfio_group_alloc()/vfio_group_release() are pair'd functions to
alloc/free the vfio_group. release is done under the struct device
kref.
- vfio_create_group()/vfio_group_put() are pairs that manage the
sysfs/cdev lifetime. Once the uses count is zero the vfio group's
userspace presence is destroyed.
- The IDR is replaced with an IDA. container_of(inode->i_cdev)
is used to get back to the vfio_group during fops open. The IDA
assigns unique minor numbers.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The next patch adds a struct device to the struct vfio_group, and it is
confusing/bad practice to have two krefs in the same struct. This kref is
controlling the period when the vfio_group is registered in sysfs, and
visible in the internal lookup. Switch it to a refcount_t instead.
The refcount_dec_and_mutex_lock() is still required because we need
atomicity of the list searches and sysfs presence.
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
If vfio_create_group() searches the group list and returns an already
existing group it does not put back the iommu_group reference that the
caller passed in.
Change the semantic of vfio_create_group() to not move the reference in
from the caller, but instead obtain a new reference inside and leave the
caller's reference alone. The two callers must now call iommu_group_put().
This is an unlikely race as the only caller that could hit it has already
searched the group list before attempting to create the group.
Fixes: cba3345cc4 ("vfio: VFIO core")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
iommu_group_register_notifier()/iommu_group_unregister_notifier() are
built using a blocking_notifier_chain which integrates a rwsem. The
notifier function cannot be running outside its registration.
When considering how the notifier function interacts with create/destroy
of the group there are two fringe cases, the notifier starts before
list_add(&vfio.group_list) and the notifier runs after the kref
becomes 0.
Prior to vfio_create_group() unlocking and returning we have
container_users == 0
device_list == empty
And this cannot change until the mutex is unlocked.
After the kref goes to zero we must also have
container_users == 0
device_list == empty
Both are required because they are balanced operations and a 0 kref means
some caller became unbalanced. Add the missing assertion that
container_users must be zero as well.
These two facts are important because when checking each operation we see:
- IOMMU_GROUP_NOTIFY_ADD_DEVICE
Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
0 container_users ends the call
- IOMMU_GROUP_NOTIFY_BOUND_DRIVER
0 container_users ends the call
Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
items from the unbound list. During creation this list is empty, during
kref == 0 nothing can read this list, and it will be freed soon.
Since the vfio_group_release() doesn't hold the appropriate lock to
manipulate the unbound_list and could race with the notifier, move the
cleanup to directly before the kfree.
This allows deleting all of the deferred group put code.
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Due to historical reason, some legacy shipped system doesn't follow
OpRegion 2.1 spec but still stick to OpRegion 2.0, in which the extended
VBT is not contiguous after OpRegion in physical address, but any
location pointed by RVDA via absolute address. Also although current
OpRegion 2.1+ systems appears that the extended VBT follows OpRegion,
RVDA is the relative address to OpRegion head, the extended VBT location
may change to non-contiguous to OpRegion. In both cases, it's impossible
to map a contiguous range to hold both OpRegion and the extended VBT and
expose via one vfio region.
The only difference between OpRegion 2.0 and 2.1 is where extended
VBT is stored: For 2.0, RVDA is the absolute address of extended VBT
while for 2.1, RVDA is the relative address of extended VBT to OpRegion
baes, and there is no other difference between OpRegion 2.0 and 2.1.
To support the non-contiguous region case as described, the updated read
op will patch OpRegion version and RVDA on-the-fly accordingly. So that
from vfio igd OpRegion view, only 2.1+ with contiguous extended VBT
after OpRegion is exposed, regardless the underneath host OpRegion is
2.0 or 2.1+. The mechanism makes it possible to support legacy OpRegion
2.0 extended VBT systems with on the market, and support OpRegion 2.1+
where the extended VBT isn't contiguous after OpRegion.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Hang Yuan <hang.yuan@linux.intel.com>
Cc: Swee Yee Fonn <swee.yee.fonn@intel.com>
Cc: Fred Gao <fred.gao@intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Link: https://lore.kernel.org/r/20211012124855.52463-1-colin.xu@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Ensure pgsize_bitmap is always valid by initializing it to PAGE_MASK
in vfio_iommu_type1_open and remove the now pointless update for
the external domain case in vfio_iommu_type1_attach_group, which was
just setting pgsize_bitmap to PAGE_MASK when only external domains
were attached.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210924155705.4258-14-hch@lst.de
[aw: s/ULONG_MAX/PAGE_MASK/ per discussion in link]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reuse the logic in vfio_noiommu_group_alloc to allocate a fake
single-device iommu group for mediated devices by factoring out a common
function, and replacing the noiommu boolean field in struct vfio_group
with an enum to distinguish the three different kinds of groups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-8-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
vfio_noiommu_attach_group has two callers:
1) __vfio_container_attach_groups is called by vfio_ioctl_set_iommu,
which just called vfio_iommu_driver_allowed
2) vfio_group_set_container requires already checks ->noiommu on the
vfio_group, which is propagated from the iommudata in
vfio_create_group
so this check is entirely superflous and can be removed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-4-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>