Pull VFIO updates from Alex Williamson:

 - Prune private items from vfio_pci_core.h to a new internal header,
   fix missed function rename, and refactor vfio-pci interrupt defines
   (Jason Gunthorpe)

 - Create consistent naming and handling of ioctls with a function per
   ioctl for vfio-pci and vfio group handling, use proper type args
   where available (Jason Gunthorpe)

 - Implement a set of low power device feature ioctls allowing userspace
   to make use of power states such as D3cold where supported (Abhishek
   Sahu)

 - Remove device counter on vfio groups, which had restricted the page
   pinning interface to singleton groups to account for limitations in
   the type1 IOMMU backend. Document usage as limited to emulated IOMMU
   devices, ie. traditional mdev devices where this restriction is
   consistent (Jason Gunthorpe)

 - Correct function prefix in hisi_acc driver incurred during previous
   refactoring (Shameer Kolothum)

 - Correct typo and remove redundant warning triggers in vfio-fsl driver
   (Christophe JAILLET)

 - Introduce device level DMA dirty tracking uAPI and implementation in
   the mlx5 variant driver (Yishai Hadas & Joao Martins)

 - Move much of the vfio_device life cycle management into vfio core,
   simplifying and avoiding duplication across drivers. This also
   facilitates adding a struct device to vfio_device which begins the
   introduction of device rather than group level user support and fills
   a gap allowing userspace identify devices as vfio capable without
   implicit knowledge of the driver (Kevin Tian & Yi Liu)

 - Split vfio container handling to a separate file, creating a more
   well defined API between the core and container code, masking IOMMU
   backend implementation from the core, allowing for an easier future
   transition to an iommufd based implementation of the same (Jason
   Gunthorpe)

 - Attempt to resolve race accessing the iommu_group for a device
   between vfio releasing DMA ownership and removal of the device from
   the IOMMU driver. Follow-up with support to allow vfio_group to exist
   with NULL iommu_group pointer to support existing userspace use cases
   of holding the group file open (Jason Gunthorpe)

 - Fix error code and hi/lo register manipulation issues in the hisi_acc
   variant driver, along with various code cleanups (Longfang Liu)

 - Fix a prior regression in GVT-g group teardown, resulting in
   unreleased resources (Jason Gunthorpe)

 - A significant cleanup and simplification of the mdev interface,
   consolidating much of the open coded per driver sysfs interface
   support into the mdev core (Christoph Hellwig)

 - Simplification of tracking and locking around vfio_groups that fall
   out from previous refactoring (Jason Gunthorpe)

 - Replace trivial open coded f_ops tests with new helper (Alex
   Williamson)

* tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits)
  vfio: More vfio_file_is_group() use cases
  vfio: Make the group FD disassociate from the iommu_group
  vfio: Hold a reference to the iommu_group in kvm for SPAPR
  vfio: Add vfio_file_is_group()
  vfio: Change vfio_group->group_rwsem to a mutex
  vfio: Remove the vfio_group->users and users_comp
  vfio/mdev: add mdev available instance checking to the core
  vfio/mdev: consolidate all the description sysfs into the core code
  vfio/mdev: consolidate all the available_instance sysfs into the core code
  vfio/mdev: consolidate all the name sysfs into the core code
  vfio/mdev: consolidate all the device_api sysfs into the core code
  vfio/mdev: remove mtype_get_parent_dev
  vfio/mdev: remove mdev_parent_dev
  vfio/mdev: unexport mdev_bus_type
  vfio/mdev: remove mdev_from_dev
  vfio/mdev: simplify mdev_type handling
  vfio/mdev: embedd struct mdev_parent in the parent data structure
  vfio/mdev: make mdev.h standalone includable
  drm/i915/gvt: simplify vgpu configuration management
  drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types
  ...
This commit is contained in:
Linus Torvalds
2022-10-12 14:46:48 -07:00
51 changed files with 4940 additions and 2773 deletions

View File

@@ -0,0 +1,8 @@
What: /sys/.../<device>/vfio-dev/vfioX/
Date: September 2022
Contact: Yi Liu <yi.l.liu@intel.com>
Description:
This directory is created when the device is bound to a
vfio driver. The layout under this directory matches what
exists for a standard 'struct device'. 'X' is a unique
index marking this device in vfio.

View File

@@ -58,19 +58,19 @@ devices as examples, as these devices are the first devices to use this module::
| MDEV CORE |
| MODULE |
| mdev.ko |
| +-----------+ | mdev_register_device() +--------------+
| +-----------+ | mdev_register_parent() +--------------+
| | | +<------------------------+ |
| | | | | nvidia.ko |<-> physical
| | | +------------------------>+ | device
| | | | callbacks +--------------+
| | Physical | |
| | device | | mdev_register_device() +--------------+
| | device | | mdev_register_parent() +--------------+
| | interface | |<------------------------+ |
| | | | | i915.ko |<-> physical
| | | +------------------------>+ | device
| | | | callbacks +--------------+
| | | |
| | | | mdev_register_device() +--------------+
| | | | mdev_register_parent() +--------------+
| | | +<------------------------+ |
| | | | | ccw_device.ko|<-> physical
| | | +------------------------>+ | device
@@ -103,7 +103,8 @@ structure to represent a mediated device's driver::
struct mdev_driver {
int (*probe) (struct mdev_device *dev);
void (*remove) (struct mdev_device *dev);
struct attribute_group **supported_type_groups;
unsigned int (*get_available)(struct mdev_type *mtype);
ssize_t (*show_description)(struct mdev_type *mtype, char *buf);
struct device_driver driver;
};
@@ -125,8 +126,8 @@ vfio_device_ops.
When a driver wants to add the GUID creation sysfs to an existing device it has
probe'd to then it should call::
int mdev_register_device(struct device *dev,
struct mdev_driver *mdev_driver);
int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
struct mdev_driver *mdev_driver);
This will provide the 'mdev_supported_types/XX/create' files which can then be
used to trigger the creation of a mdev_device. The created mdev_device will be
@@ -134,7 +135,7 @@ attached to the specified driver.
When the driver needs to remove itself it calls::
void mdev_unregister_device(struct device *dev);
void mdev_unregister_parent(struct mdev_parent *parent);
Which will unbind and destroy all the created mdevs and remove the sysfs files.
@@ -200,17 +201,14 @@ Directories and files under the sysfs for Each Physical Device
sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);
(or using mdev_parent_dev(mdev) to arrive at the parent device outside
of the core mdev code)
* device_api
This attribute should show which device API is being created, for example,
This attribute shows which device API is being created, for example,
"vfio-pci" for a PCI device.
* available_instances
This attribute should show the number of devices of type <type-id> that can be
This attribute shows the number of devices of type <type-id> that can be
created.
* [device]
@@ -220,11 +218,11 @@ Directories and files under the sysfs for Each Physical Device
* name
This attribute should show human readable name. This is optional attribute.
This attribute shows a human readable name.
* description
This attribute should show brief features/description of the type. This is
This attribute can show brief features/description of the type. This is an
optional attribute.
Directories and Files Under the sysfs for Each mdev Device

View File

@@ -297,7 +297,7 @@ of the VFIO AP mediated device driver::
| MDEV CORE |
| MODULE |
| mdev.ko |
| +---------+ | mdev_register_device() +--------------+
| +---------+ | mdev_register_parent() +--------------+
| |Physical | +<-----------------------+ |
| | device | | | vfio_ap.ko |<-> matrix
| |interface| +----------------------->+ | device

View File

@@ -156,7 +156,7 @@ Below is a high Level block diagram::
| MDEV CORE |
| MODULE |
| mdev.ko |
| +---------+ | mdev_register_device() +--------------+
| +---------+ | mdev_register_parent() +--------------+
| |Physical | +<-----------------------+ |
| | device | | | vfio_ccw.ko |<-> subchannel
| |interface| +----------------------->+ | device

View File

@@ -21558,6 +21558,7 @@ R: Cornelia Huck <cohuck@redhat.com>
L: kvm@vger.kernel.org
S: Maintained
T: git git://github.com/awilliam/linux-vfio.git
F: Documentation/ABI/testing/sysfs-devices-vfio-dev
F: Documentation/driver-api/vfio.rst
F: drivers/vfio/
F: include/linux/vfio.h

View File

@@ -240,13 +240,13 @@ static void free_resource(struct intel_vgpu *vgpu)
}
static int alloc_resource(struct intel_vgpu *vgpu,
struct intel_vgpu_creation_params *param)
const struct intel_vgpu_config *conf)
{
struct intel_gvt *gvt = vgpu->gvt;
unsigned long request, avail, max, taken;
const char *item;
if (!param->low_gm_sz || !param->high_gm_sz || !param->fence_sz) {
if (!conf->low_mm || !conf->high_mm || !conf->fence) {
gvt_vgpu_err("Invalid vGPU creation params\n");
return -EINVAL;
}
@@ -255,7 +255,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
max = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
taken = gvt->gm.vgpu_allocated_low_gm_size;
avail = max - taken;
request = MB_TO_BYTES(param->low_gm_sz);
request = conf->low_mm;
if (request > avail)
goto no_enough_resource;
@@ -266,7 +266,7 @@ static int alloc_resource(struct intel_vgpu *vgpu,
max = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
taken = gvt->gm.vgpu_allocated_high_gm_size;
avail = max - taken;
request = MB_TO_BYTES(param->high_gm_sz);
request = conf->high_mm;
if (request > avail)
goto no_enough_resource;
@@ -277,16 +277,16 @@ static int alloc_resource(struct intel_vgpu *vgpu,
max = gvt_fence_sz(gvt) - HOST_FENCE;
taken = gvt->fence.vgpu_allocated_fence_num;
avail = max - taken;
request = param->fence_sz;
request = conf->fence;
if (request > avail)
goto no_enough_resource;
vgpu_fence_sz(vgpu) = request;
gvt->gm.vgpu_allocated_low_gm_size += MB_TO_BYTES(param->low_gm_sz);
gvt->gm.vgpu_allocated_high_gm_size += MB_TO_BYTES(param->high_gm_sz);
gvt->fence.vgpu_allocated_fence_num += param->fence_sz;
gvt->gm.vgpu_allocated_low_gm_size += conf->low_mm;
gvt->gm.vgpu_allocated_high_gm_size += conf->high_mm;
gvt->fence.vgpu_allocated_fence_num += conf->fence;
return 0;
no_enough_resource:
@@ -340,11 +340,11 @@ void intel_vgpu_reset_resource(struct intel_vgpu *vgpu)
*
*/
int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
struct intel_vgpu_creation_params *param)
const struct intel_vgpu_config *conf)
{
int ret;
ret = alloc_resource(vgpu, param);
ret = alloc_resource(vgpu, conf);
if (ret)
return ret;

View File

@@ -36,6 +36,7 @@
#include <uapi/linux/pci_regs.h>
#include <linux/kvm_host.h>
#include <linux/vfio.h>
#include <linux/mdev.h>
#include "i915_drv.h"
#include "intel_gvt.h"
@@ -172,6 +173,7 @@ struct intel_vgpu_submission {
#define KVMGT_DEBUGFS_FILENAME "kvmgt_nr_cache_entries"
struct intel_vgpu {
struct vfio_device vfio_device;
struct intel_gvt *gvt;
struct mutex vgpu_lock;
int id;
@@ -211,7 +213,6 @@ struct intel_vgpu {
u32 scan_nonprivbb;
struct vfio_device vfio_device;
struct vfio_region *region;
int num_regions;
struct eventfd_ctx *intx_trigger;
@@ -294,15 +295,25 @@ struct intel_gvt_firmware {
bool firmware_loaded;
};
#define NR_MAX_INTEL_VGPU_TYPES 20
struct intel_vgpu_type {
char name[16];
unsigned int avail_instance;
unsigned int low_gm_size;
unsigned int high_gm_size;
struct intel_vgpu_config {
unsigned int low_mm;
unsigned int high_mm;
unsigned int fence;
/*
* A vGPU with a weight of 8 will get twice as much GPU as a vGPU with
* a weight of 4 on a contended host, different vGPU type has different
* weight set. Legal weights range from 1 to 16.
*/
unsigned int weight;
enum intel_vgpu_edid resolution;
enum intel_vgpu_edid edid;
const char *name;
};
struct intel_vgpu_type {
struct mdev_type type;
char name[16];
const struct intel_vgpu_config *conf;
};
struct intel_gvt {
@@ -326,6 +337,8 @@ struct intel_gvt {
struct intel_gvt_workload_scheduler scheduler;
struct notifier_block shadow_ctx_notifier_block[I915_NUM_ENGINES];
DECLARE_HASHTABLE(cmd_table, GVT_CMD_HASH_BITS);
struct mdev_parent parent;
struct mdev_type **mdev_types;
struct intel_vgpu_type *types;
unsigned int num_types;
struct intel_vgpu *idle_vgpu;
@@ -436,19 +449,8 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt);
/* ring context size i.e. the first 0x50 dwords*/
#define RING_CTX_SIZE 320
struct intel_vgpu_creation_params {
__u64 low_gm_sz; /* in MB */
__u64 high_gm_sz; /* in MB */
__u64 fence_sz;
__u64 resolution;
__s32 primary;
__u64 vgpu_id;
__u32 weight;
};
int intel_vgpu_alloc_resource(struct intel_vgpu *vgpu,
struct intel_vgpu_creation_params *param);
const struct intel_vgpu_config *conf);
void intel_vgpu_reset_resource(struct intel_vgpu *vgpu);
void intel_vgpu_free_resource(struct intel_vgpu *vgpu);
void intel_vgpu_write_fence(struct intel_vgpu *vgpu,
@@ -494,8 +496,8 @@ void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt);
struct intel_vgpu *intel_gvt_create_idle_vgpu(struct intel_gvt *gvt);
void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu);
struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
struct intel_vgpu_type *type);
int intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
const struct intel_vgpu_config *conf);
void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,

View File

@@ -34,7 +34,6 @@
*/
#include <linux/init.h>
#include <linux/device.h>
#include <linux/mm.h>
#include <linux/kthread.h>
#include <linux/sched/mm.h>
@@ -43,7 +42,6 @@
#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/eventfd.h>
#include <linux/uuid.h>
#include <linux/mdev.h>
#include <linux/debugfs.h>
@@ -115,117 +113,18 @@ static void kvmgt_page_track_flush_slot(struct kvm *kvm,
struct kvm_memory_slot *slot,
struct kvm_page_track_notifier_node *node);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
static ssize_t intel_vgpu_show_description(struct mdev_type *mtype, char *buf)
{
struct intel_vgpu_type *type;
unsigned int num = 0;
struct intel_gvt *gvt = kdev_to_i915(mtype_get_parent_dev(mtype))->gvt;
type = &gvt->types[mtype_get_type_group_id(mtype)];
if (!type)
num = 0;
else
num = type->avail_instance;
return sprintf(buf, "%u\n", num);
}
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
}
static ssize_t description_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
struct intel_vgpu_type *type;
struct intel_gvt *gvt = kdev_to_i915(mtype_get_parent_dev(mtype))->gvt;
type = &gvt->types[mtype_get_type_group_id(mtype)];
if (!type)
return 0;
struct intel_vgpu_type *type =
container_of(mtype, struct intel_vgpu_type, type);
return sprintf(buf, "low_gm_size: %dMB\nhigh_gm_size: %dMB\n"
"fence: %d\nresolution: %s\n"
"weight: %d\n",
BYTES_TO_MB(type->low_gm_size),
BYTES_TO_MB(type->high_gm_size),
type->fence, vgpu_edid_str(type->resolution),
type->weight);
}
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
struct intel_vgpu_type *type;
struct intel_gvt *gvt = kdev_to_i915(mtype_get_parent_dev(mtype))->gvt;
type = &gvt->types[mtype_get_type_group_id(mtype)];
if (!type)
return 0;
return sprintf(buf, "%s\n", type->name);
}
static MDEV_TYPE_ATTR_RO(available_instances);
static MDEV_TYPE_ATTR_RO(device_api);
static MDEV_TYPE_ATTR_RO(description);
static MDEV_TYPE_ATTR_RO(name);
static struct attribute *gvt_type_attrs[] = {
&mdev_type_attr_available_instances.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_description.attr,
&mdev_type_attr_name.attr,
NULL,
};
static struct attribute_group *gvt_vgpu_type_groups[] = {
[0 ... NR_MAX_INTEL_VGPU_TYPES - 1] = NULL,
};
static int intel_gvt_init_vgpu_type_groups(struct intel_gvt *gvt)
{
int i, j;
struct intel_vgpu_type *type;
struct attribute_group *group;
for (i = 0; i < gvt->num_types; i++) {
type = &gvt->types[i];
group = kzalloc(sizeof(struct attribute_group), GFP_KERNEL);
if (!group)
goto unwind;
group->name = type->name;
group->attrs = gvt_type_attrs;
gvt_vgpu_type_groups[i] = group;
}
return 0;
unwind:
for (j = 0; j < i; j++) {
group = gvt_vgpu_type_groups[j];
kfree(group);
}
return -ENOMEM;
}
static void intel_gvt_cleanup_vgpu_type_groups(struct intel_gvt *gvt)
{
int i;
struct attribute_group *group;
for (i = 0; i < gvt->num_types; i++) {
group = gvt_vgpu_type_groups[i];
gvt_vgpu_type_groups[i] = NULL;
kfree(group);
}
BYTES_TO_MB(type->conf->low_mm),
BYTES_TO_MB(type->conf->high_mm),
type->conf->fence, vgpu_edid_str(type->conf->edid),
type->conf->weight);
}
static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
@@ -1546,7 +1445,28 @@ static const struct attribute_group *intel_vgpu_groups[] = {
NULL,
};
static int intel_vgpu_init_dev(struct vfio_device *vfio_dev)
{
struct mdev_device *mdev = to_mdev_device(vfio_dev->dev);
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
struct intel_vgpu_type *type =
container_of(mdev->type, struct intel_vgpu_type, type);
vgpu->gvt = kdev_to_i915(mdev->type->parent->dev)->gvt;
return intel_gvt_create_vgpu(vgpu, type->conf);
}
static void intel_vgpu_release_dev(struct vfio_device *vfio_dev)
{
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
intel_gvt_destroy_vgpu(vgpu);
vfio_free_device(vfio_dev);
}
static const struct vfio_device_ops intel_vgpu_dev_ops = {
.init = intel_vgpu_init_dev,
.release = intel_vgpu_release_dev,
.open_device = intel_vgpu_open_device,
.close_device = intel_vgpu_close_device,
.read = intel_vgpu_read,
@@ -1558,35 +1478,28 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
static int intel_vgpu_probe(struct mdev_device *mdev)
{
struct device *pdev = mdev_parent_dev(mdev);
struct intel_gvt *gvt = kdev_to_i915(pdev)->gvt;
struct intel_vgpu_type *type;
struct intel_vgpu *vgpu;
int ret;
type = &gvt->types[mdev_get_type_group_id(mdev)];
if (!type)
return -EINVAL;
vgpu = intel_gvt_create_vgpu(gvt, type);
vgpu = vfio_alloc_device(intel_vgpu, vfio_device, &mdev->dev,
&intel_vgpu_dev_ops);
if (IS_ERR(vgpu)) {
gvt_err("failed to create intel vgpu: %ld\n", PTR_ERR(vgpu));
return PTR_ERR(vgpu);
}
vfio_init_group_dev(&vgpu->vfio_device, &mdev->dev,
&intel_vgpu_dev_ops);
dev_set_drvdata(&mdev->dev, vgpu);
ret = vfio_register_emulated_iommu_dev(&vgpu->vfio_device);
if (ret) {
intel_gvt_destroy_vgpu(vgpu);
return ret;
}
if (ret)
goto out_put_vdev;
gvt_dbg_core("intel_vgpu_create succeeded for mdev: %s\n",
dev_name(mdev_dev(mdev)));
return 0;
out_put_vdev:
vfio_put_device(&vgpu->vfio_device);
return ret;
}
static void intel_vgpu_remove(struct mdev_device *mdev)
@@ -1595,18 +1508,43 @@ static void intel_vgpu_remove(struct mdev_device *mdev)
if (WARN_ON_ONCE(vgpu->attached))
return;
intel_gvt_destroy_vgpu(vgpu);
vfio_unregister_group_dev(&vgpu->vfio_device);
vfio_put_device(&vgpu->vfio_device);
}
static unsigned int intel_vgpu_get_available(struct mdev_type *mtype)
{
struct intel_vgpu_type *type =
container_of(mtype, struct intel_vgpu_type, type);
struct intel_gvt *gvt = kdev_to_i915(mtype->parent->dev)->gvt;
unsigned int low_gm_avail, high_gm_avail, fence_avail;
mutex_lock(&gvt->lock);
low_gm_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE -
gvt->gm.vgpu_allocated_low_gm_size;
high_gm_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE -
gvt->gm.vgpu_allocated_high_gm_size;
fence_avail = gvt_fence_sz(gvt) - HOST_FENCE -
gvt->fence.vgpu_allocated_fence_num;
mutex_unlock(&gvt->lock);
return min3(low_gm_avail / type->conf->low_mm,
high_gm_avail / type->conf->high_mm,
fence_avail / type->conf->fence);
}
static struct mdev_driver intel_vgpu_mdev_driver = {
.device_api = VFIO_DEVICE_API_PCI_STRING,
.driver = {
.name = "intel_vgpu_mdev",
.owner = THIS_MODULE,
.dev_groups = intel_vgpu_groups,
},
.probe = intel_vgpu_probe,
.remove = intel_vgpu_remove,
.supported_type_groups = gvt_vgpu_type_groups,
.probe = intel_vgpu_probe,
.remove = intel_vgpu_remove,
.get_available = intel_vgpu_get_available,
.show_description = intel_vgpu_show_description,
};
int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
@@ -1904,8 +1842,7 @@ static void intel_gvt_clean_device(struct drm_i915_private *i915)
if (drm_WARN_ON(&i915->drm, !gvt))
return;
mdev_unregister_device(i915->drm.dev);
intel_gvt_cleanup_vgpu_type_groups(gvt);
mdev_unregister_parent(&gvt->parent);
intel_gvt_destroy_idle_vgpu(gvt->idle_vgpu);
intel_gvt_clean_vgpu_types(gvt);
@@ -2005,19 +1942,15 @@ static int intel_gvt_init_device(struct drm_i915_private *i915)
intel_gvt_debugfs_init(gvt);
ret = intel_gvt_init_vgpu_type_groups(gvt);
ret = mdev_register_parent(&gvt->parent, i915->drm.dev,
&intel_vgpu_mdev_driver,
gvt->mdev_types, gvt->num_types);
if (ret)
goto out_destroy_idle_vgpu;
ret = mdev_register_device(i915->drm.dev, &intel_vgpu_mdev_driver);
if (ret)
goto out_cleanup_vgpu_type_groups;
gvt_dbg_core("gvt device initialization is done\n");
return 0;
out_cleanup_vgpu_type_groups:
intel_gvt_cleanup_vgpu_type_groups(gvt);
out_destroy_idle_vgpu:
intel_gvt_destroy_idle_vgpu(gvt->idle_vgpu);
intel_gvt_debugfs_clean(gvt);

View File

@@ -73,24 +73,21 @@ void populate_pvinfo_page(struct intel_vgpu *vgpu)
drm_WARN_ON(&i915->drm, sizeof(struct vgt_if) != VGT_PVINFO_SIZE);
}
/*
* vGPU type name is defined as GVTg_Vx_y which contains the physical GPU
* generation type (e.g V4 as BDW server, V5 as SKL server).
*
* Depening on the physical SKU resource, we might see vGPU types like
* GVTg_V4_8, GVTg_V4_4, GVTg_V4_2, etc. We can create different types of
* vGPU on same physical GPU depending on available resource. Each vGPU
* type will have a different number of avail_instance to indicate how
* many vGPU instance can be created for this type.
*/
#define VGPU_MAX_WEIGHT 16
#define VGPU_WEIGHT(vgpu_num) \
(VGPU_MAX_WEIGHT / (vgpu_num))
static const struct {
unsigned int low_mm;
unsigned int high_mm;
unsigned int fence;
/* A vGPU with a weight of 8 will get twice as much GPU as a vGPU
* with a weight of 4 on a contended host, different vGPU type has
* different weight set. Legal weights range from 1 to 16.
*/
unsigned int weight;
enum intel_vgpu_edid edid;
const char *name;
} vgpu_types[] = {
/* Fixed vGPU type table */
static const struct intel_vgpu_config intel_vgpu_configs[] = {
{ MB_TO_BYTES(64), MB_TO_BYTES(384), 4, VGPU_WEIGHT(8), GVT_EDID_1024_768, "8" },
{ MB_TO_BYTES(128), MB_TO_BYTES(512), 4, VGPU_WEIGHT(4), GVT_EDID_1920_1200, "4" },
{ MB_TO_BYTES(256), MB_TO_BYTES(1024), 4, VGPU_WEIGHT(2), GVT_EDID_1920_1200, "2" },
@@ -106,104 +103,60 @@ static const struct {
*/
int intel_gvt_init_vgpu_types(struct intel_gvt *gvt)
{
unsigned int num_types;
unsigned int i, low_avail, high_avail;
unsigned int min_low;
/* vGPU type name is defined as GVTg_Vx_y which contains
* physical GPU generation type (e.g V4 as BDW server, V5 as
* SKL server).
*
* Depend on physical SKU resource, might see vGPU types like
* GVTg_V4_8, GVTg_V4_4, GVTg_V4_2, etc. We can create
* different types of vGPU on same physical GPU depending on
* available resource. Each vGPU type will have "avail_instance"
* to indicate how many vGPU instance can be created for this
* type.
*
*/
low_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
high_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
num_types = ARRAY_SIZE(vgpu_types);
unsigned int low_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE;
unsigned int high_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE;
unsigned int num_types = ARRAY_SIZE(intel_vgpu_configs);
unsigned int i;
gvt->types = kcalloc(num_types, sizeof(struct intel_vgpu_type),
GFP_KERNEL);
if (!gvt->types)
return -ENOMEM;
min_low = MB_TO_BYTES(32);
gvt->mdev_types = kcalloc(num_types, sizeof(*gvt->mdev_types),
GFP_KERNEL);
if (!gvt->mdev_types)
goto out_free_types;
for (i = 0; i < num_types; ++i) {
if (low_avail / vgpu_types[i].low_mm == 0)
const struct intel_vgpu_config *conf = &intel_vgpu_configs[i];
if (low_avail / conf->low_mm == 0)
break;
if (conf->weight < 1 || conf->weight > VGPU_MAX_WEIGHT)
goto out_free_mdev_types;
gvt->types[i].low_gm_size = vgpu_types[i].low_mm;
gvt->types[i].high_gm_size = vgpu_types[i].high_mm;
gvt->types[i].fence = vgpu_types[i].fence;
if (vgpu_types[i].weight < 1 ||
vgpu_types[i].weight > VGPU_MAX_WEIGHT)
return -EINVAL;
gvt->types[i].weight = vgpu_types[i].weight;
gvt->types[i].resolution = vgpu_types[i].edid;
gvt->types[i].avail_instance = min(low_avail / vgpu_types[i].low_mm,
high_avail / vgpu_types[i].high_mm);
if (GRAPHICS_VER(gvt->gt->i915) == 8)
sprintf(gvt->types[i].name, "GVTg_V4_%s",
vgpu_types[i].name);
else if (GRAPHICS_VER(gvt->gt->i915) == 9)
sprintf(gvt->types[i].name, "GVTg_V5_%s",
vgpu_types[i].name);
sprintf(gvt->types[i].name, "GVTg_V%u_%s",
GRAPHICS_VER(gvt->gt->i915) == 8 ? 4 : 5, conf->name);
gvt->types[i].conf = conf;
gvt_dbg_core("type[%d]: %s avail %u low %u high %u fence %u weight %u res %s\n",
i, gvt->types[i].name,
gvt->types[i].avail_instance,
gvt->types[i].low_gm_size,
gvt->types[i].high_gm_size, gvt->types[i].fence,
gvt->types[i].weight,
vgpu_edid_str(gvt->types[i].resolution));
min(low_avail / conf->low_mm,
high_avail / conf->high_mm),
conf->low_mm, conf->high_mm, conf->fence,
conf->weight, vgpu_edid_str(conf->edid));
gvt->mdev_types[i] = &gvt->types[i].type;
gvt->mdev_types[i]->sysfs_name = gvt->types[i].name;
}
gvt->num_types = i;
return 0;
out_free_mdev_types:
kfree(gvt->mdev_types);
out_free_types:
kfree(gvt->types);
return -EINVAL;
}
void intel_gvt_clean_vgpu_types(struct intel_gvt *gvt)
{
kfree(gvt->mdev_types);
kfree(gvt->types);
}
static void intel_gvt_update_vgpu_types(struct intel_gvt *gvt)
{
int i;
unsigned int low_gm_avail, high_gm_avail, fence_avail;
unsigned int low_gm_min, high_gm_min, fence_min;
/* Need to depend on maxium hw resource size but keep on
* static config for now.
*/
low_gm_avail = gvt_aperture_sz(gvt) - HOST_LOW_GM_SIZE -
gvt->gm.vgpu_allocated_low_gm_size;
high_gm_avail = gvt_hidden_sz(gvt) - HOST_HIGH_GM_SIZE -
gvt->gm.vgpu_allocated_high_gm_size;
fence_avail = gvt_fence_sz(gvt) - HOST_FENCE -
gvt->fence.vgpu_allocated_fence_num;
for (i = 0; i < gvt->num_types; i++) {
low_gm_min = low_gm_avail / gvt->types[i].low_gm_size;
high_gm_min = high_gm_avail / gvt->types[i].high_gm_size;
fence_min = fence_avail / gvt->types[i].fence;
gvt->types[i].avail_instance = min(min(low_gm_min, high_gm_min),
fence_min);
gvt_dbg_core("update type[%d]: %s avail %u low %u high %u fence %u\n",
i, gvt->types[i].name,
gvt->types[i].avail_instance, gvt->types[i].low_gm_size,
gvt->types[i].high_gm_size, gvt->types[i].fence);
}
}
/**
* intel_gvt_active_vgpu - activate a virtual GPU
* @vgpu: virtual GPU
@@ -298,12 +251,6 @@ void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu)
intel_vgpu_clean_mmio(vgpu);
intel_vgpu_dmabuf_cleanup(vgpu);
mutex_unlock(&vgpu->vgpu_lock);
mutex_lock(&gvt->lock);
intel_gvt_update_vgpu_types(gvt);
mutex_unlock(&gvt->lock);
vfree(vgpu);
}
#define IDLE_VGPU_IDR 0
@@ -363,42 +310,38 @@ void intel_gvt_destroy_idle_vgpu(struct intel_vgpu *vgpu)
vfree(vgpu);
}
static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
struct intel_vgpu_creation_params *param)
int intel_gvt_create_vgpu(struct intel_vgpu *vgpu,
const struct intel_vgpu_config *conf)
{
struct intel_gvt *gvt = vgpu->gvt;
struct drm_i915_private *dev_priv = gvt->gt->i915;
struct intel_vgpu *vgpu;
int ret;
gvt_dbg_core("low %llu MB high %llu MB fence %llu\n",
param->low_gm_sz, param->high_gm_sz,
param->fence_sz);
vgpu = vzalloc(sizeof(*vgpu));
if (!vgpu)
return ERR_PTR(-ENOMEM);
gvt_dbg_core("low %u MB high %u MB fence %u\n",
BYTES_TO_MB(conf->low_mm), BYTES_TO_MB(conf->high_mm),
conf->fence);
mutex_lock(&gvt->lock);
ret = idr_alloc(&gvt->vgpu_idr, vgpu, IDLE_VGPU_IDR + 1, GVT_MAX_VGPU,
GFP_KERNEL);
if (ret < 0)
goto out_free_vgpu;
goto out_unlock;;
vgpu->id = ret;
vgpu->gvt = gvt;
vgpu->sched_ctl.weight = param->weight;
vgpu->sched_ctl.weight = conf->weight;
mutex_init(&vgpu->vgpu_lock);
mutex_init(&vgpu->dmabuf_lock);
INIT_LIST_HEAD(&vgpu->dmabuf_obj_list_head);
INIT_RADIX_TREE(&vgpu->page_track_tree, GFP_KERNEL);
idr_init_base(&vgpu->object_idr, 1);
intel_vgpu_init_cfg_space(vgpu, param->primary);
intel_vgpu_init_cfg_space(vgpu, 1);
vgpu->d3_entered = false;
ret = intel_vgpu_init_mmio(vgpu);
if (ret)
goto out_clean_idr;
ret = intel_vgpu_alloc_resource(vgpu, param);
ret = intel_vgpu_alloc_resource(vgpu, conf);
if (ret)
goto out_clean_vgpu_mmio;
@@ -412,7 +355,7 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
if (ret)
goto out_clean_gtt;
ret = intel_vgpu_init_display(vgpu, param->resolution);
ret = intel_vgpu_init_display(vgpu, conf->edid);
if (ret)
goto out_clean_opregion;
@@ -437,7 +380,9 @@ static struct intel_vgpu *__intel_gvt_create_vgpu(struct intel_gvt *gvt,
if (ret)
goto out_clean_sched_policy;
return vgpu;
intel_gvt_update_reg_whitelist(vgpu);
mutex_unlock(&gvt->lock);
return 0;
out_clean_sched_policy:
intel_vgpu_clean_sched_policy(vgpu);
@@ -455,48 +400,9 @@ out_clean_vgpu_mmio:
intel_vgpu_clean_mmio(vgpu);
out_clean_idr:
idr_remove(&gvt->vgpu_idr, vgpu->id);
out_free_vgpu:
vfree(vgpu);
return ERR_PTR(ret);
}
/**
* intel_gvt_create_vgpu - create a virtual GPU
* @gvt: GVT device
* @type: type of the vGPU to create
*
* This function is called when user wants to create a virtual GPU.
*
* Returns:
* pointer to intel_vgpu, error pointer if failed.
*/
struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
struct intel_vgpu_type *type)
{
struct intel_vgpu_creation_params param;
struct intel_vgpu *vgpu;
param.primary = 1;
param.low_gm_sz = type->low_gm_size;
param.high_gm_sz = type->high_gm_size;
param.fence_sz = type->fence;
param.weight = type->weight;
param.resolution = type->resolution;
/* XXX current param based on MB */
param.low_gm_sz = BYTES_TO_MB(param.low_gm_sz);
param.high_gm_sz = BYTES_TO_MB(param.high_gm_sz);
mutex_lock(&gvt->lock);
vgpu = __intel_gvt_create_vgpu(gvt, &param);
if (!IS_ERR(vgpu)) {
/* calculate left instance change for types */
intel_gvt_update_vgpu_types(gvt);
intel_gvt_update_reg_whitelist(vgpu);
}
out_unlock:
mutex_unlock(&gvt->lock);
return vgpu;
return ret;
}
/**

View File

@@ -12,7 +12,6 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/mdev.h>
@@ -142,7 +141,6 @@ static struct vfio_ccw_private *vfio_ccw_alloc_private(struct subchannel *sch)
INIT_LIST_HEAD(&private->crw);
INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
INIT_WORK(&private->crw_work, vfio_ccw_crw_todo);
atomic_set(&private->avail, 1);
private->cp.guest_cp = kcalloc(CCWCHAIN_LEN_MAX, sizeof(struct ccw1),
GFP_KERNEL);
@@ -203,7 +201,6 @@ static void vfio_ccw_free_private(struct vfio_ccw_private *private)
mutex_destroy(&private->io_mutex);
kfree(private);
}
static int vfio_ccw_sch_probe(struct subchannel *sch)
{
struct pmcw *pmcw = &sch->schib.pmcw;
@@ -222,7 +219,12 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
dev_set_drvdata(&sch->dev, private);
ret = mdev_register_device(&sch->dev, &vfio_ccw_mdev_driver);
private->mdev_type.sysfs_name = "io";
private->mdev_type.pretty_name = "I/O subchannel (Non-QDIO)";
private->mdev_types[0] = &private->mdev_type;
ret = mdev_register_parent(&private->parent, &sch->dev,
&vfio_ccw_mdev_driver,
private->mdev_types, 1);
if (ret)
goto out_free;
@@ -241,7 +243,7 @@ static void vfio_ccw_sch_remove(struct subchannel *sch)
{
struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
mdev_unregister_device(&sch->dev);
mdev_unregister_parent(&private->parent);
dev_set_drvdata(&sch->dev, NULL);

View File

@@ -11,7 +11,6 @@
*/
#include <linux/vfio.h>
#include <linux/mdev.h>
#include <linux/nospec.h>
#include <linux/slab.h>
@@ -45,47 +44,14 @@ static void vfio_ccw_dma_unmap(struct vfio_device *vdev, u64 iova, u64 length)
vfio_ccw_mdev_reset(private);
}
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "I/O subchannel (Non-QDIO)\n");
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_CCW_STRING);
}
static MDEV_TYPE_ATTR_RO(device_api);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
static int vfio_ccw_mdev_init_dev(struct vfio_device *vdev)
{
struct vfio_ccw_private *private =
dev_get_drvdata(mtype_get_parent_dev(mtype));
container_of(vdev, struct vfio_ccw_private, vdev);
return sprintf(buf, "%d\n", atomic_read(&private->avail));
init_completion(&private->release_comp);
return 0;
}
static MDEV_TYPE_ATTR_RO(available_instances);
static struct attribute *mdev_types_attrs[] = {
&mdev_type_attr_name.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_available_instances.attr,
NULL,
};
static struct attribute_group mdev_type_group = {
.name = "io",
.attrs = mdev_types_attrs,
};
static struct attribute_group *mdev_type_groups[] = {
&mdev_type_group,
NULL,
};
static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
{
@@ -95,12 +61,9 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
if (private->state == VFIO_CCW_STATE_NOT_OPER)
return -ENODEV;
if (atomic_dec_if_positive(&private->avail) < 0)
return -EPERM;
memset(&private->vdev, 0, sizeof(private->vdev));
vfio_init_group_dev(&private->vdev, &mdev->dev,
&vfio_ccw_dev_ops);
ret = vfio_init_device(&private->vdev, &mdev->dev, &vfio_ccw_dev_ops);
if (ret)
return ret;
VFIO_CCW_MSG_EVENT(2, "sch %x.%x.%04x: create\n",
private->sch->schid.cssid,
@@ -109,16 +72,32 @@ static int vfio_ccw_mdev_probe(struct mdev_device *mdev)
ret = vfio_register_emulated_iommu_dev(&private->vdev);
if (ret)
goto err_atomic;
goto err_put_vdev;
dev_set_drvdata(&mdev->dev, private);
return 0;
err_atomic:
vfio_uninit_group_dev(&private->vdev);
atomic_inc(&private->avail);
err_put_vdev:
vfio_put_device(&private->vdev);
return ret;
}
static void vfio_ccw_mdev_release_dev(struct vfio_device *vdev)
{
struct vfio_ccw_private *private =
container_of(vdev, struct vfio_ccw_private, vdev);
/*
* We cannot free vfio_ccw_private here because it includes
* parent info which must be free'ed by css driver.
*
* Use a workaround by memset'ing the core device part and
* then notifying the remove path that all active references
* to this device have been released.
*/
memset(vdev, 0, sizeof(*vdev));
complete(&private->release_comp);
}
static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
{
struct vfio_ccw_private *private = dev_get_drvdata(mdev->dev.parent);
@@ -130,8 +109,16 @@ static void vfio_ccw_mdev_remove(struct mdev_device *mdev)
vfio_unregister_group_dev(&private->vdev);
vfio_uninit_group_dev(&private->vdev);
atomic_inc(&private->avail);
vfio_put_device(&private->vdev);
/*
* Wait for all active references on mdev are released so it
* is safe to defer kfree() to a later point.
*
* TODO: the clean fix is to split parent/mdev info from ccw
* private structure so each can be managed in its own life
* cycle.
*/
wait_for_completion(&private->release_comp);
}
static int vfio_ccw_mdev_open_device(struct vfio_device *vdev)
@@ -592,6 +579,8 @@ static void vfio_ccw_mdev_request(struct vfio_device *vdev, unsigned int count)
}
static const struct vfio_device_ops vfio_ccw_dev_ops = {
.init = vfio_ccw_mdev_init_dev,
.release = vfio_ccw_mdev_release_dev,
.open_device = vfio_ccw_mdev_open_device,
.close_device = vfio_ccw_mdev_close_device,
.read = vfio_ccw_mdev_read,
@@ -602,6 +591,8 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
};
struct mdev_driver vfio_ccw_mdev_driver = {
.device_api = VFIO_DEVICE_API_CCW_STRING,
.max_instances = 1,
.driver = {
.name = "vfio_ccw_mdev",
.owner = THIS_MODULE,
@@ -609,5 +600,4 @@ struct mdev_driver vfio_ccw_mdev_driver = {
},
.probe = vfio_ccw_mdev_probe,
.remove = vfio_ccw_mdev_remove,
.supported_type_groups = mdev_type_groups,
};

View File

@@ -18,6 +18,7 @@
#include <linux/workqueue.h>
#include <linux/vfio_ccw.h>
#include <linux/vfio.h>
#include <linux/mdev.h>
#include <asm/crw.h>
#include <asm/debug.h>
@@ -72,7 +73,6 @@ struct vfio_ccw_crw {
* @sch: pointer to the subchannel
* @state: internal state of the device
* @completion: synchronization helper of the I/O completion
* @avail: available for creating a mediated device
* @io_region: MMIO region to input/output I/O arguments/results
* @io_mutex: protect against concurrent update of I/O regions
* @region: additional regions for other subchannel operations
@@ -88,13 +88,14 @@ struct vfio_ccw_crw {
* @req_trigger: eventfd ctx for signaling userspace to return device
* @io_work: work for deferral process of I/O handling
* @crw_work: work for deferral process of CRW handling
* @release_comp: synchronization helper for vfio device release
* @parent: parent data structures for mdevs created
*/
struct vfio_ccw_private {
struct vfio_device vdev;
struct subchannel *sch;
int state;
struct completion *completion;
atomic_t avail;
struct ccw_io_region *io_region;
struct mutex io_mutex;
struct vfio_ccw_region *region;
@@ -113,6 +114,12 @@ struct vfio_ccw_private {
struct eventfd_ctx *req_trigger;
struct work_struct io_work;
struct work_struct crw_work;
struct completion release_comp;
struct mdev_parent parent;
struct mdev_type mdev_type;
struct mdev_type *mdev_types[1];
} __aligned(8);
int vfio_ccw_sch_quiesce(struct subchannel *sch);

View File

@@ -684,42 +684,41 @@ static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm,
AP_DOMAINS);
}
static int vfio_ap_mdev_probe(struct mdev_device *mdev)
static int vfio_ap_mdev_init_dev(struct vfio_device *vdev)
{
struct ap_matrix_mdev *matrix_mdev;
int ret;
struct ap_matrix_mdev *matrix_mdev =
container_of(vdev, struct ap_matrix_mdev, vdev);
if ((atomic_dec_if_positive(&matrix_dev->available_instances) < 0))
return -EPERM;
matrix_mdev = kzalloc(sizeof(*matrix_mdev), GFP_KERNEL);
if (!matrix_mdev) {
ret = -ENOMEM;
goto err_dec_available;
}
vfio_init_group_dev(&matrix_mdev->vdev, &mdev->dev,
&vfio_ap_matrix_dev_ops);
matrix_mdev->mdev = mdev;
matrix_mdev->mdev = to_mdev_device(vdev->dev);
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
matrix_mdev->pqap_hook = handle_pqap;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable.queues);
return 0;
}
static int vfio_ap_mdev_probe(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
int ret;
matrix_mdev = vfio_alloc_device(ap_matrix_mdev, vdev, &mdev->dev,
&vfio_ap_matrix_dev_ops);
if (IS_ERR(matrix_mdev))
return PTR_ERR(matrix_mdev);
ret = vfio_register_emulated_iommu_dev(&matrix_mdev->vdev);
if (ret)
goto err_list;
goto err_put_vdev;
dev_set_drvdata(&mdev->dev, matrix_mdev);
mutex_lock(&matrix_dev->mdevs_lock);
list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
mutex_unlock(&matrix_dev->mdevs_lock);
return 0;
err_list:
vfio_uninit_group_dev(&matrix_mdev->vdev);
kfree(matrix_mdev);
err_dec_available:
atomic_inc(&matrix_dev->available_instances);
err_put_vdev:
vfio_put_device(&matrix_mdev->vdev);
return ret;
}
@@ -766,6 +765,11 @@ static void vfio_ap_mdev_unlink_fr_queues(struct ap_matrix_mdev *matrix_mdev)
}
}
static void vfio_ap_mdev_release_dev(struct vfio_device *vdev)
{
vfio_free_device(vdev);
}
static void vfio_ap_mdev_remove(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(&mdev->dev);
@@ -779,54 +783,9 @@ static void vfio_ap_mdev_remove(struct mdev_device *mdev)
list_del(&matrix_mdev->node);
mutex_unlock(&matrix_dev->mdevs_lock);
mutex_unlock(&matrix_dev->guests_lock);
vfio_uninit_group_dev(&matrix_mdev->vdev);
kfree(matrix_mdev);
atomic_inc(&matrix_dev->available_instances);
vfio_put_device(&matrix_mdev->vdev);
}
static ssize_t name_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_AP_MDEV_NAME_HWVIRT);
}
static MDEV_TYPE_ATTR_RO(name);
static ssize_t available_instances_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr,
char *buf)
{
return sprintf(buf, "%d\n",
atomic_read(&matrix_dev->available_instances));
}
static MDEV_TYPE_ATTR_RO(available_instances);
static ssize_t device_api_show(struct mdev_type *mtype,
struct mdev_type_attribute *attr, char *buf)
{
return sprintf(buf, "%s\n", VFIO_DEVICE_API_AP_STRING);
}
static MDEV_TYPE_ATTR_RO(device_api);
static struct attribute *vfio_ap_mdev_type_attrs[] = {
&mdev_type_attr_name.attr,
&mdev_type_attr_device_api.attr,
&mdev_type_attr_available_instances.attr,
NULL,
};
static struct attribute_group vfio_ap_mdev_hwvirt_type_group = {
.name = VFIO_AP_MDEV_TYPE_HWVIRT,
.attrs = vfio_ap_mdev_type_attrs,
};
static struct attribute_group *vfio_ap_mdev_type_groups[] = {
&vfio_ap_mdev_hwvirt_type_group,
NULL,
};
#define MDEV_SHARING_ERR "Userspace may not re-assign queue %02lx.%04lx " \
"already assigned to %s"
@@ -1824,6 +1783,8 @@ static const struct attribute_group vfio_queue_attr_group = {
};
static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
.init = vfio_ap_mdev_init_dev,
.release = vfio_ap_mdev_release_dev,
.open_device = vfio_ap_mdev_open_device,
.close_device = vfio_ap_mdev_close_device,
.ioctl = vfio_ap_mdev_ioctl,
@@ -1831,6 +1792,8 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
};
static struct mdev_driver vfio_ap_matrix_driver = {
.device_api = VFIO_DEVICE_API_AP_STRING,
.max_instances = MAX_ZDEV_ENTRIES_EXT,
.driver = {
.name = "vfio_ap_mdev",
.owner = THIS_MODULE,
@@ -1839,20 +1802,22 @@ static struct mdev_driver vfio_ap_matrix_driver = {
},
.probe = vfio_ap_mdev_probe,
.remove = vfio_ap_mdev_remove,
.supported_type_groups = vfio_ap_mdev_type_groups,
};
int vfio_ap_mdev_register(void)
{
int ret;
atomic_set(&matrix_dev->available_instances, MAX_ZDEV_ENTRIES_EXT);
ret = mdev_register_driver(&vfio_ap_matrix_driver);
if (ret)
return ret;
ret = mdev_register_device(&matrix_dev->device, &vfio_ap_matrix_driver);
matrix_dev->mdev_type.sysfs_name = VFIO_AP_MDEV_TYPE_HWVIRT;
matrix_dev->mdev_type.pretty_name = VFIO_AP_MDEV_NAME_HWVIRT;
matrix_dev->mdev_types[0] = &matrix_dev->mdev_type;
ret = mdev_register_parent(&matrix_dev->parent, &matrix_dev->device,
&vfio_ap_matrix_driver,
matrix_dev->mdev_types, 1);
if (ret)
goto err_driver;
return 0;
@@ -1864,7 +1829,7 @@ err_driver:
void vfio_ap_mdev_unregister(void)
{
mdev_unregister_device(&matrix_dev->device);
mdev_unregister_parent(&matrix_dev->parent);
mdev_unregister_driver(&vfio_ap_matrix_driver);
}

View File

@@ -13,7 +13,6 @@
#define _VFIO_AP_PRIVATE_H_
#include <linux/types.h>
#include <linux/device.h>
#include <linux/mdev.h>
#include <linux/delay.h>
#include <linux/mutex.h>
@@ -30,7 +29,6 @@
* struct ap_matrix_dev - Contains the data for the matrix device.
*
* @device: generic device structure associated with the AP matrix device
* @available_instances: number of mediated matrix devices that can be created
* @info: the struct containing the output from the PQAP(QCI) instruction
* @mdev_list: the list of mediated matrix devices created
* @mdevs_lock: mutex for locking the AP matrix device. This lock will be
@@ -47,12 +45,14 @@
*/
struct ap_matrix_dev {
struct device device;
atomic_t available_instances;
struct ap_config_info info;
struct list_head mdev_list;
struct mutex mdevs_lock; /* serializes access to each ap_matrix_mdev */
struct ap_driver *vfio_ap_drv;
struct mutex guests_lock; /* serializes access to each KVM guest */
struct mdev_parent parent;
struct mdev_type mdev_type;
struct mdev_type *mdev_types[];
};
extern struct ap_matrix_dev *matrix_dev;

View File

@@ -3,6 +3,7 @@ menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
select IOMMU_API
select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
select INTERVAL_TREE
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/driver-api/vfio.rst for more details.

View File

@@ -1,9 +1,12 @@
# SPDX-License-Identifier: GPL-2.0
vfio_virqfd-y := virqfd.o
vfio-y += vfio_main.o
obj-$(CONFIG_VFIO) += vfio.o
vfio-y += vfio_main.o \
iova_bitmap.o \
container.o
obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o

680
drivers/vfio/container.c Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -108,9 +108,9 @@ static void vfio_fsl_mc_close_device(struct vfio_device *core_vdev)
/* reset the device before cleaning up the interrupts */
ret = vfio_fsl_mc_reset_device(vdev);
if (WARN_ON(ret))
if (ret)
dev_warn(&mc_cont->dev,
"VFIO_FLS_MC: reset device has failed (%d)\n", ret);
"VFIO_FSL_MC: reset device has failed (%d)\n", ret);
vfio_fsl_mc_irqs_cleanup(vdev);
@@ -418,16 +418,7 @@ static int vfio_fsl_mc_mmap(struct vfio_device *core_vdev,
return vfio_fsl_mc_mmap_mmio(vdev->regions[index], vma);
}
static const struct vfio_device_ops vfio_fsl_mc_ops = {
.name = "vfio-fsl-mc",
.open_device = vfio_fsl_mc_open_device,
.close_device = vfio_fsl_mc_close_device,
.ioctl = vfio_fsl_mc_ioctl,
.read = vfio_fsl_mc_read,
.write = vfio_fsl_mc_write,
.mmap = vfio_fsl_mc_mmap,
};
static const struct vfio_device_ops vfio_fsl_mc_ops;
static int vfio_fsl_mc_bus_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -518,35 +509,43 @@ static void vfio_fsl_uninit_device(struct vfio_fsl_mc_device *vdev)
bus_unregister_notifier(&fsl_mc_bus_type, &vdev->nb);
}
static int vfio_fsl_mc_init_dev(struct vfio_device *core_vdev)
{
struct vfio_fsl_mc_device *vdev =
container_of(core_vdev, struct vfio_fsl_mc_device, vdev);
struct fsl_mc_device *mc_dev = to_fsl_mc_device(core_vdev->dev);
int ret;
vdev->mc_dev = mc_dev;
mutex_init(&vdev->igate);
if (is_fsl_mc_bus_dprc(mc_dev))
ret = vfio_assign_device_set(core_vdev, &mc_dev->dev);
else
ret = vfio_assign_device_set(core_vdev, mc_dev->dev.parent);
if (ret)
return ret;
/* device_set is released by vfio core if @init fails */
return vfio_fsl_mc_init_device(vdev);
}
static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev)
{
struct vfio_fsl_mc_device *vdev;
struct device *dev = &mc_dev->dev;
int ret;
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return -ENOMEM;
vfio_init_group_dev(&vdev->vdev, dev, &vfio_fsl_mc_ops);
vdev->mc_dev = mc_dev;
mutex_init(&vdev->igate);
if (is_fsl_mc_bus_dprc(mc_dev))
ret = vfio_assign_device_set(&vdev->vdev, &mc_dev->dev);
else
ret = vfio_assign_device_set(&vdev->vdev, mc_dev->dev.parent);
if (ret)
goto out_uninit;
ret = vfio_fsl_mc_init_device(vdev);
if (ret)
goto out_uninit;
vdev = vfio_alloc_device(vfio_fsl_mc_device, vdev, dev,
&vfio_fsl_mc_ops);
if (IS_ERR(vdev))
return PTR_ERR(vdev);
ret = vfio_register_group_dev(&vdev->vdev);
if (ret) {
dev_err(dev, "VFIO_FSL_MC: Failed to add to vfio group\n");
goto out_device;
goto out_put_vdev;
}
ret = vfio_fsl_mc_scan_container(mc_dev);
@@ -557,30 +556,44 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev)
out_group_dev:
vfio_unregister_group_dev(&vdev->vdev);
out_device:
vfio_fsl_uninit_device(vdev);
out_uninit:
vfio_uninit_group_dev(&vdev->vdev);
kfree(vdev);
out_put_vdev:
vfio_put_device(&vdev->vdev);
return ret;
}
static void vfio_fsl_mc_release_dev(struct vfio_device *core_vdev)
{
struct vfio_fsl_mc_device *vdev =
container_of(core_vdev, struct vfio_fsl_mc_device, vdev);
vfio_fsl_uninit_device(vdev);
mutex_destroy(&vdev->igate);
vfio_free_device(core_vdev);
}
static int vfio_fsl_mc_remove(struct fsl_mc_device *mc_dev)
{
struct device *dev = &mc_dev->dev;
struct vfio_fsl_mc_device *vdev = dev_get_drvdata(dev);
vfio_unregister_group_dev(&vdev->vdev);
mutex_destroy(&vdev->igate);
dprc_remove_devices(mc_dev, NULL, 0);
vfio_fsl_uninit_device(vdev);
vfio_uninit_group_dev(&vdev->vdev);
kfree(vdev);
vfio_put_device(&vdev->vdev);
return 0;
}
static const struct vfio_device_ops vfio_fsl_mc_ops = {
.name = "vfio-fsl-mc",
.init = vfio_fsl_mc_init_dev,
.release = vfio_fsl_mc_release_dev,
.open_device = vfio_fsl_mc_open_device,
.close_device = vfio_fsl_mc_close_device,
.ioctl = vfio_fsl_mc_ioctl,
.read = vfio_fsl_mc_read,
.write = vfio_fsl_mc_write,
.mmap = vfio_fsl_mc_mmap,
};
static struct fsl_mc_driver vfio_fsl_mc_driver = {
.probe = vfio_fsl_mc_probe,
.remove = vfio_fsl_mc_remove,

422
drivers/vfio/iova_bitmap.c Normal file
View File

@@ -0,0 +1,422 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2022, Oracle and/or its affiliates.
* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
*/
#include <linux/iova_bitmap.h>
#include <linux/mm.h>
#include <linux/highmem.h>
#define BITS_PER_PAGE (PAGE_SIZE * BITS_PER_BYTE)
/*
* struct iova_bitmap_map - A bitmap representing an IOVA range
*
* Main data structure for tracking mapped user pages of bitmap data.
*
* For example, for something recording dirty IOVAs, it will be provided a
* struct iova_bitmap structure, as a general structure for iterating the
* total IOVA range. The struct iova_bitmap_map, though, represents the
* subset of said IOVA space that is pinned by its parent structure (struct
* iova_bitmap).
*
* The user does not need to exact location of the bits in the bitmap.
* From user perspective the only API available is iova_bitmap_set() which
* records the IOVA *range* in the bitmap by setting the corresponding
* bits.
*
* The bitmap is an array of u64 whereas each bit represents an IOVA of
* range of (1 << pgshift). Thus formula for the bitmap data to be set is:
*
* data[(iova / page_size) / 64] & (1ULL << (iova % 64))
*/
struct iova_bitmap_map {
/* base IOVA representing bit 0 of the first page */
unsigned long iova;
/* page size order that each bit granules to */
unsigned long pgshift;
/* page offset of the first user page pinned */
unsigned long pgoff;
/* number of pages pinned */
unsigned long npages;
/* pinned pages representing the bitmap data */
struct page **pages;
};
/*
* struct iova_bitmap - The IOVA bitmap object
*
* Main data structure for iterating over the bitmap data.
*
* Abstracts the pinning work and iterates in IOVA ranges.
* It uses a windowing scheme and pins the bitmap in relatively
* big ranges e.g.
*
* The bitmap object uses one base page to store all the pinned pages
* pointers related to the bitmap. For sizeof(struct page*) == 8 it stores
* 512 struct page pointers which, if the base page size is 4K, it means
* 2M of bitmap data is pinned at a time. If the iova_bitmap page size is
* also 4K then the range window to iterate is 64G.
*
* For example iterating on a total IOVA range of 4G..128G, it will walk
* through this set of ranges:
*
* 4G - 68G-1 (64G)
* 68G - 128G-1 (64G)
*
* An example of the APIs on how to use/iterate over the IOVA bitmap:
*
* bitmap = iova_bitmap_alloc(iova, length, page_size, data);
* if (IS_ERR(bitmap))
* return PTR_ERR(bitmap);
*
* ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn);
*
* iova_bitmap_free(bitmap);
*
* Each iteration of the @dirty_reporter_fn is called with a unique @iova
* and @length argument, indicating the current range available through the
* iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty
* areas (@iova_length) within that provided range, as following:
*
* iova_bitmap_set(bitmap, iova, iova_length);
*
* The internals of the object uses an index @mapped_base_index that indexes
* which u64 word of the bitmap is mapped, up to @mapped_total_index.
* Those keep being incremented until @mapped_total_index is reached while
* mapping up to PAGE_SIZE / sizeof(struct page*) maximum of pages.
*
* The IOVA bitmap is usually located on what tracks DMA mapped ranges or
* some form of IOVA range tracking that co-relates to the user passed
* bitmap.
*/
struct iova_bitmap {
/* IOVA range representing the currently mapped bitmap data */
struct iova_bitmap_map mapped;
/* userspace address of the bitmap */
u64 __user *bitmap;
/* u64 index that @mapped points to */
unsigned long mapped_base_index;
/* how many u64 can we walk in total */
unsigned long mapped_total_index;
/* base IOVA of the whole bitmap */
unsigned long iova;
/* length of the IOVA range for the whole bitmap */
size_t length;
};
/*
* Converts a relative IOVA to a bitmap index.
* This function provides the index into the u64 array (bitmap::bitmap)
* for a given IOVA offset.
* Relative IOVA means relative to the bitmap::mapped base IOVA
* (stored in mapped::iova). All computations in this file are done using
* relative IOVAs and thus avoid an extra subtraction against mapped::iova.
* The user API iova_bitmap_set() always uses a regular absolute IOVAs.
*/
static unsigned long iova_bitmap_offset_to_index(struct iova_bitmap *bitmap,
unsigned long iova)
{
unsigned long pgsize = 1 << bitmap->mapped.pgshift;
return iova / (BITS_PER_TYPE(*bitmap->bitmap) * pgsize);
}
/*
* Converts a bitmap index to a *relative* IOVA.
*/
static unsigned long iova_bitmap_index_to_offset(struct iova_bitmap *bitmap,
unsigned long index)
{
unsigned long pgshift = bitmap->mapped.pgshift;
return (index * BITS_PER_TYPE(*bitmap->bitmap)) << pgshift;
}
/*
* Returns the base IOVA of the mapped range.
*/
static unsigned long iova_bitmap_mapped_iova(struct iova_bitmap *bitmap)
{
unsigned long skip = bitmap->mapped_base_index;
return bitmap->iova + iova_bitmap_index_to_offset(bitmap, skip);
}
/*
* Pins the bitmap user pages for the current range window.
* This is internal to IOVA bitmap and called when advancing the
* index (@mapped_base_index) or allocating the bitmap.
*/
static int iova_bitmap_get(struct iova_bitmap *bitmap)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
unsigned long npages;
u64 __user *addr;
long ret;
/*
* @mapped_base_index is the index of the currently mapped u64 words
* that we have access. Anything before @mapped_base_index is not
* mapped. The range @mapped_base_index .. @mapped_total_index-1 is
* mapped but capped at a maximum number of pages.
*/
npages = DIV_ROUND_UP((bitmap->mapped_total_index -
bitmap->mapped_base_index) *
sizeof(*bitmap->bitmap), PAGE_SIZE);
/*
* We always cap at max number of 'struct page' a base page can fit.
* This is, for example, on x86 means 2M of bitmap data max.
*/
npages = min(npages, PAGE_SIZE / sizeof(struct page *));
/*
* Bitmap address to be pinned is calculated via pointer arithmetic
* with bitmap u64 word index.
*/
addr = bitmap->bitmap + bitmap->mapped_base_index;
ret = pin_user_pages_fast((unsigned long)addr, npages,
FOLL_WRITE, mapped->pages);
if (ret <= 0)
return -EFAULT;
mapped->npages = (unsigned long)ret;
/* Base IOVA where @pages point to i.e. bit 0 of the first page */
mapped->iova = iova_bitmap_mapped_iova(bitmap);
/*
* offset of the page where pinned pages bit 0 is located.
* This handles the case where the bitmap is not PAGE_SIZE
* aligned.
*/
mapped->pgoff = offset_in_page(addr);
return 0;
}
/*
* Unpins the bitmap user pages and clears @npages
* (un)pinning is abstracted from API user and it's done when advancing
* the index or freeing the bitmap.
*/
static void iova_bitmap_put(struct iova_bitmap *bitmap)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
if (mapped->npages) {
unpin_user_pages(mapped->pages, mapped->npages);
mapped->npages = 0;
}
}
/**
* iova_bitmap_alloc() - Allocates an IOVA bitmap object
* @iova: Start address of the IOVA range
* @length: Length of the IOVA range
* @page_size: Page size of the IOVA bitmap. It defines what each bit
* granularity represents
* @data: Userspace address of the bitmap
*
* Allocates an IOVA object and initializes all its fields including the
* first user pages of @data.
*
* Return: A pointer to a newly allocated struct iova_bitmap
* or ERR_PTR() on error.
*/
struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
unsigned long page_size, u64 __user *data)
{
struct iova_bitmap_map *mapped;
struct iova_bitmap *bitmap;
int rc;
bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
if (!bitmap)
return ERR_PTR(-ENOMEM);
mapped = &bitmap->mapped;
mapped->pgshift = __ffs(page_size);
bitmap->bitmap = data;
bitmap->mapped_total_index =
iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
bitmap->iova = iova;
bitmap->length = length;
mapped->iova = iova;
mapped->pages = (struct page **)__get_free_page(GFP_KERNEL);
if (!mapped->pages) {
rc = -ENOMEM;
goto err;
}
rc = iova_bitmap_get(bitmap);
if (rc)
goto err;
return bitmap;
err:
iova_bitmap_free(bitmap);
return ERR_PTR(rc);
}
/**
* iova_bitmap_free() - Frees an IOVA bitmap object
* @bitmap: IOVA bitmap to free
*
* It unpins and releases pages array memory and clears any leftover
* state.
*/
void iova_bitmap_free(struct iova_bitmap *bitmap)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
iova_bitmap_put(bitmap);
if (mapped->pages) {
free_page((unsigned long)mapped->pages);
mapped->pages = NULL;
}
kfree(bitmap);
}
/*
* Returns the remaining bitmap indexes from mapped_total_index to process for
* the currently pinned bitmap pages.
*/
static unsigned long iova_bitmap_mapped_remaining(struct iova_bitmap *bitmap)
{
unsigned long remaining;
remaining = bitmap->mapped_total_index - bitmap->mapped_base_index;
remaining = min_t(unsigned long, remaining,
(bitmap->mapped.npages << PAGE_SHIFT) / sizeof(*bitmap->bitmap));
return remaining;
}
/*
* Returns the length of the mapped IOVA range.
*/
static unsigned long iova_bitmap_mapped_length(struct iova_bitmap *bitmap)
{
unsigned long max_iova = bitmap->iova + bitmap->length - 1;
unsigned long iova = iova_bitmap_mapped_iova(bitmap);
unsigned long remaining;
/*
* iova_bitmap_mapped_remaining() returns a number of indexes which
* when converted to IOVA gives us a max length that the bitmap
* pinned data can cover. Afterwards, that is capped to
* only cover the IOVA range in @bitmap::iova .. @bitmap::length.
*/
remaining = iova_bitmap_index_to_offset(bitmap,
iova_bitmap_mapped_remaining(bitmap));
if (iova + remaining - 1 > max_iova)
remaining -= ((iova + remaining - 1) - max_iova);
return remaining;
}
/*
* Returns true if there's not more data to iterate.
*/
static bool iova_bitmap_done(struct iova_bitmap *bitmap)
{
return bitmap->mapped_base_index >= bitmap->mapped_total_index;
}
/*
* Advances to the next range, releases the current pinned
* pages and pins the next set of bitmap pages.
* Returns 0 on success or otherwise errno.
*/
static int iova_bitmap_advance(struct iova_bitmap *bitmap)
{
unsigned long iova = iova_bitmap_mapped_length(bitmap) - 1;
unsigned long count = iova_bitmap_offset_to_index(bitmap, iova) + 1;
bitmap->mapped_base_index += count;
iova_bitmap_put(bitmap);
if (iova_bitmap_done(bitmap))
return 0;
/* When advancing the index we pin the next set of bitmap pages */
return iova_bitmap_get(bitmap);
}
/**
* iova_bitmap_for_each() - Iterates over the bitmap
* @bitmap: IOVA bitmap to iterate
* @opaque: Additional argument to pass to the callback
* @fn: Function that gets called for each IOVA range
*
* Helper function to iterate over bitmap data representing a portion of IOVA
* space. It hides the complexity of iterating bitmaps and translating the
* mapped bitmap user pages into IOVA ranges to process.
*
* Return: 0 on success, and an error on failure either upon
* iteration or when the callback returns an error.
*/
int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
iova_bitmap_fn_t fn)
{
int ret = 0;
for (; !iova_bitmap_done(bitmap) && !ret;
ret = iova_bitmap_advance(bitmap)) {
ret = fn(bitmap, iova_bitmap_mapped_iova(bitmap),
iova_bitmap_mapped_length(bitmap), opaque);
if (ret)
break;
}
return ret;
}
/**
* iova_bitmap_set() - Records an IOVA range in bitmap
* @bitmap: IOVA bitmap
* @iova: IOVA to start
* @length: IOVA range length
*
* Set the bits corresponding to the range [iova .. iova+length-1] in
* the user bitmap.
*
* Return: The number of bits set.
*/
void iova_bitmap_set(struct iova_bitmap *bitmap,
unsigned long iova, size_t length)
{
struct iova_bitmap_map *mapped = &bitmap->mapped;
unsigned long offset = (iova - mapped->iova) >> mapped->pgshift;
unsigned long nbits = max_t(unsigned long, 1, length >> mapped->pgshift);
unsigned long page_idx = offset / BITS_PER_PAGE;
unsigned long page_offset = mapped->pgoff;
void *kaddr;
offset = offset % BITS_PER_PAGE;
do {
unsigned long size = min(BITS_PER_PAGE - offset, nbits);
kaddr = kmap_local_page(mapped->pages[page_idx]);
bitmap_set(kaddr + page_offset, offset, size);
kunmap_local(kaddr);
page_offset = offset = 0;
nbits -= size;
page_idx++;
} while (nbits > 0);
}
EXPORT_SYMBOL_GPL(iova_bitmap_set);

View File

@@ -8,9 +8,7 @@
*/
#include <linux/module.h>
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/uuid.h>
#include <linux/sysfs.h>
#include <linux/mdev.h>
@@ -20,71 +18,11 @@
#define DRIVER_AUTHOR "NVIDIA Corporation"
#define DRIVER_DESC "Mediated device Core Driver"
static LIST_HEAD(parent_list);
static DEFINE_MUTEX(parent_list_lock);
static struct class_compat *mdev_bus_compat_class;
static LIST_HEAD(mdev_list);
static DEFINE_MUTEX(mdev_list_lock);
struct device *mdev_parent_dev(struct mdev_device *mdev)
{
return mdev->type->parent->dev;
}
EXPORT_SYMBOL(mdev_parent_dev);
/*
* Return the index in supported_type_groups that this mdev_device was created
* from.
*/
unsigned int mdev_get_type_group_id(struct mdev_device *mdev)
{
return mdev->type->type_group_id;
}
EXPORT_SYMBOL(mdev_get_type_group_id);
/*
* Used in mdev_type_attribute sysfs functions to return the index in the
* supported_type_groups that the sysfs is called from.
*/
unsigned int mtype_get_type_group_id(struct mdev_type *mtype)
{
return mtype->type_group_id;
}
EXPORT_SYMBOL(mtype_get_type_group_id);
/*
* Used in mdev_type_attribute sysfs functions to return the parent struct
* device
*/
struct device *mtype_get_parent_dev(struct mdev_type *mtype)
{
return mtype->parent->dev;
}
EXPORT_SYMBOL(mtype_get_parent_dev);
/* Should be called holding parent_list_lock */
static struct mdev_parent *__find_parent_device(struct device *dev)
{
struct mdev_parent *parent;
list_for_each_entry(parent, &parent_list, next) {
if (parent->dev == dev)
return parent;
}
return NULL;
}
void mdev_release_parent(struct kref *kref)
{
struct mdev_parent *parent = container_of(kref, struct mdev_parent,
ref);
struct device *dev = parent->dev;
kfree(parent);
put_device(dev);
}
/* Caller must hold parent unreg_sem read or write lock */
static void mdev_device_remove_common(struct mdev_device *mdev)
{
@@ -99,145 +37,96 @@ static void mdev_device_remove_common(struct mdev_device *mdev)
static int mdev_device_remove_cb(struct device *dev, void *data)
{
struct mdev_device *mdev = mdev_from_dev(dev);
if (mdev)
mdev_device_remove_common(mdev);
if (dev->bus == &mdev_bus_type)
mdev_device_remove_common(to_mdev_device(dev));
return 0;
}
/*
* mdev_register_device : Register a device
* mdev_register_parent: Register a device as parent for mdevs
* @parent: parent structure registered
* @dev: device structure representing parent device.
* @mdev_driver: Device driver to bind to the newly created mdev
* @types: Array of supported mdev types
* @nr_types: Number of entries in @types
*
* Registers the @parent stucture as a parent for mdev types and thus mdev
* devices. The caller needs to hold a reference on @dev that must not be
* released until after the call to mdev_unregister_parent().
*
* Add device to list of registered parent devices.
* Returns a negative value on error, otherwise 0.
*/
int mdev_register_device(struct device *dev, struct mdev_driver *mdev_driver)
int mdev_register_parent(struct mdev_parent *parent, struct device *dev,
struct mdev_driver *mdev_driver, struct mdev_type **types,
unsigned int nr_types)
{
int ret;
struct mdev_parent *parent;
char *env_string = "MDEV_STATE=registered";
char *envp[] = { env_string, NULL };
int ret;
/* check for mandatory ops */
if (!mdev_driver->supported_type_groups)
return -EINVAL;
dev = get_device(dev);
if (!dev)
return -EINVAL;
mutex_lock(&parent_list_lock);
/* Check for duplicate */
parent = __find_parent_device(dev);
if (parent) {
parent = NULL;
ret = -EEXIST;
goto add_dev_err;
}
parent = kzalloc(sizeof(*parent), GFP_KERNEL);
if (!parent) {
ret = -ENOMEM;
goto add_dev_err;
}
kref_init(&parent->ref);
memset(parent, 0, sizeof(*parent));
init_rwsem(&parent->unreg_sem);
parent->dev = dev;
parent->mdev_driver = mdev_driver;
parent->types = types;
parent->nr_types = nr_types;
atomic_set(&parent->available_instances, mdev_driver->max_instances);
if (!mdev_bus_compat_class) {
mdev_bus_compat_class = class_compat_register("mdev_bus");
if (!mdev_bus_compat_class) {
ret = -ENOMEM;
goto add_dev_err;
}
if (!mdev_bus_compat_class)
return -ENOMEM;
}
ret = parent_create_sysfs_files(parent);
if (ret)
goto add_dev_err;
return ret;
ret = class_compat_create_link(mdev_bus_compat_class, dev, NULL);
if (ret)
dev_warn(dev, "Failed to create compatibility class link\n");
list_add(&parent->next, &parent_list);
mutex_unlock(&parent_list_lock);
dev_info(dev, "MDEV: Registered\n");
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
return 0;
add_dev_err:
mutex_unlock(&parent_list_lock);
if (parent)
mdev_put_parent(parent);
else
put_device(dev);
return ret;
}
EXPORT_SYMBOL(mdev_register_device);
EXPORT_SYMBOL(mdev_register_parent);
/*
* mdev_unregister_device : Unregister a parent device
* @dev: device structure representing parent device.
*
* Remove device from list of registered parent devices. Give a chance to free
* existing mediated devices for given device.
* mdev_unregister_parent : Unregister a parent device
* @parent: parent structure to unregister
*/
void mdev_unregister_device(struct device *dev)
void mdev_unregister_parent(struct mdev_parent *parent)
{
struct mdev_parent *parent;
char *env_string = "MDEV_STATE=unregistered";
char *envp[] = { env_string, NULL };
mutex_lock(&parent_list_lock);
parent = __find_parent_device(dev);
if (!parent) {
mutex_unlock(&parent_list_lock);
return;
}
dev_info(dev, "MDEV: Unregistering\n");
list_del(&parent->next);
mutex_unlock(&parent_list_lock);
dev_info(parent->dev, "MDEV: Unregistering\n");
down_write(&parent->unreg_sem);
class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
device_for_each_child(dev, NULL, mdev_device_remove_cb);
class_compat_remove_link(mdev_bus_compat_class, parent->dev, NULL);
device_for_each_child(parent->dev, NULL, mdev_device_remove_cb);
parent_remove_sysfs_files(parent);
up_write(&parent->unreg_sem);
mdev_put_parent(parent);
/* We still have the caller's reference to use for the uevent */
kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
kobject_uevent_env(&parent->dev->kobj, KOBJ_CHANGE, envp);
}
EXPORT_SYMBOL(mdev_unregister_device);
EXPORT_SYMBOL(mdev_unregister_parent);
static void mdev_device_release(struct device *dev)
{
struct mdev_device *mdev = to_mdev_device(dev);
/* Pairs with the get in mdev_device_create() */
kobject_put(&mdev->type->kobj);
struct mdev_parent *parent = mdev->type->parent;
mutex_lock(&mdev_list_lock);
list_del(&mdev->next);
if (!parent->mdev_driver->get_available)
atomic_inc(&parent->available_instances);
mutex_unlock(&mdev_list_lock);
/* Pairs with the get in mdev_device_create() */
kobject_put(&mdev->type->kobj);
dev_dbg(&mdev->dev, "MDEV: destroying\n");
kfree(mdev);
}
@@ -259,6 +148,18 @@ int mdev_device_create(struct mdev_type *type, const guid_t *uuid)
}
}
if (!drv->get_available) {
/*
* Note: that non-atomic read and dec is fine here because
* all modifications are under mdev_list_lock.
*/
if (!atomic_read(&parent->available_instances)) {
mutex_unlock(&mdev_list_lock);
return -EUSERS;
}
atomic_dec(&parent->available_instances);
}
mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
if (!mdev) {
mutex_unlock(&mdev_list_lock);

Some files were not shown because too many files have changed in this diff Show More