Merge tag 'cgroup-dmem-drm-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-next

DMEM cgroup pull request

This introduces a new cgroup controller to limit the device memory.
Notable users would be DRM, dma-buf heaps, or v4l2.

This pull request is based on the series developped by Maarten
Lankhorst, Friedrich Vock, and I:
https://lore.kernel.org/all/20241204134410.1161769-1-dev@lankhorst.se/

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250110-cryptic-warm-mandrill-b71f5d@houat
This commit is contained in:
Dave Airlie
2025-01-11 07:20:22 +10:00
20 changed files with 1194 additions and 32 deletions

View File

@@ -64,13 +64,14 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-6. Device
5-7. RDMA
5-7-1. RDMA Interface Files
5-8. HugeTLB
5.8-1. HugeTLB Interface Files
5-9. Misc
5.9-1 Miscellaneous cgroup Interface Files
5.9-2 Migration and Ownership
5-10. Others
5-10-1. perf_event
5-8. DMEM
5-9. HugeTLB
5.9-1. HugeTLB Interface Files
5-10. Misc
5.10-1 Miscellaneous cgroup Interface Files
5.10-2 Migration and Ownership
5-11. Others
5-11-1. perf_event
5-N. Non-normative information
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour
@@ -2626,6 +2627,49 @@ RDMA Interface Files
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23
DMEM
----
The "dmem" controller regulates the distribution and accounting of
device memory regions. Because each memory region may have its own page size,
which does not have to be equal to the system page size, the units are always bytes.
DMEM Interface Files
~~~~~~~~~~~~~~~~~~~~
dmem.max, dmem.min, dmem.low
A readwrite nested-keyed file that exists for all the cgroups
except root that describes current configured resource limit
for a region.
An example for xe follows::
drm/0000:03:00.0/vram0 1073741824
drm/0000:03:00.0/stolen max
The semantics are the same as for the memory cgroup controller, and are
calculated in the same way.
dmem.capacity
A read-only file that describes maximum region capacity.
It only exists on the root cgroup. Not all memory can be
allocated by cgroups, as the kernel reserves some for
internal use.
An example for xe follows::
drm/0000:03:00.0/vram0 8514437120
drm/0000:03:00.0/stolen 67108864
dmem.current
A read-only file that describes current resource usage.
It exists for all the cgroup except root.
An example for xe follows::
drm/0000:03:00.0/vram0 12550144
drm/0000:03:00.0/stolen 8650752
HugeTLB
-------

View File

@@ -0,0 +1,9 @@
==================
Cgroup Kernel APIs
==================
Device Memory Cgroup API (dmemcg)
=========================
.. kernel-doc:: kernel/cgroup/dmem.c
:export:

View File

@@ -109,6 +109,7 @@ more memory-management documentation in Documentation/mm/index.rst.
dma-isa-lpc
swiotlb
mm-api
cgroup
genalloc
pin_user_pages
boot-time-mm

View File

@@ -0,0 +1,54 @@
==================================
Long running workloads and compute
==================================
Long running workloads (compute) are workloads that will not complete in 10
seconds. (The time let the user wait before he reaches for the power button).
This means that other techniques need to be used to manage those workloads,
that cannot use fences.
Some hardware may schedule compute jobs, and have no way to pre-empt them, or
have their memory swapped out from them. Or they simply want their workload
not to be preempted or swapped out at all.
This means that it differs from what is described in driver-api/dma-buf.rst.
As with normal compute jobs, dma-fence may not be used at all. In this case,
not even to force preemption. The driver with is simply forced to unmap a BO
from the long compute job's address space on unbind immediately, not even
waiting for the workload to complete. Effectively this terminates the workload
when there is no hardware support to recover.
Since this is undesirable, there need to be mitigations to prevent a workload
from being terminated. There are several possible approach, all with their
advantages and drawbacks.
The first approach you will likely try is to pin all buffers used by compute.
This guarantees that the job will run uninterrupted, but also allows a very
denial of service attack by pinning as much memory as possible, hogging the
all GPU memory, and possibly a huge chunk of CPU memory.
A second approach that will work slightly better on its own is adding an option
not to evict when creating a new job (any kind). If all of userspace opts in
to this flag, it would prevent cooperating userspace from forced terminating
older compute jobs to start a new one.
If job preemption and recoverable pagefaults are not available, those are the
only approaches possible. So even with those, you want a separate way of
controlling resources. The standard kernel way of doing so is cgroups.
This creates a third option, using cgroups to prevent eviction. Both GPU and
driver-allocated CPU memory would be accounted to the correct cgroup, and
eviction would be made cgroup aware. This allows the GPU to be partitioned
into cgroups, that will allow jobs to run next to each other without
interference.
The interface to the cgroup would be similar to the current CPU memory
interface, with similar semantics for min/low/high/max, if eviction can
be made cgroup aware.
What should be noted is that each memory region (tiled memory for example)
should have its own accounting.
The key is set to the regionid set by the driver, for example "tile0".
For the value of $card, we use drmGetUnique().

View File

@@ -26,6 +26,7 @@
* DEALINGS IN THE SOFTWARE.
*/
#include <linux/cgroup_dmem.h>
#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/module.h>
@@ -820,6 +821,37 @@ void drm_dev_put(struct drm_device *dev)
}
EXPORT_SYMBOL(drm_dev_put);
static void drmm_cg_unregister_region(struct drm_device *dev, void *arg)
{
dmem_cgroup_unregister_region(arg);
}
/**
* drmm_cgroup_register_region - Register a region of a DRM device to cgroups
* @dev: device for region
* @region_name: Region name for registering
* @size: Size of region in bytes
*
* This decreases the ref-count of @dev by one. The device is destroyed if the
* ref-count drops to zero.
*/
struct dmem_cgroup_region *drmm_cgroup_register_region(struct drm_device *dev, const char *region_name, u64 size)
{
struct dmem_cgroup_region *region;
int ret;
region = dmem_cgroup_register_region(size, "drm/%s/%s", dev->unique, region_name);
if (IS_ERR_OR_NULL(region))
return region;
ret = drmm_add_action_or_reset(dev, drmm_cg_unregister_region, region);
if (ret)
return ERR_PTR(ret);
return region;
}
EXPORT_SYMBOL_GPL(drmm_cgroup_register_region);
static int create_compat_control_link(struct drm_device *dev)
{
struct drm_minor *minor;

View File

@@ -258,13 +258,13 @@ static void ttm_bo_unreserve_basic(struct kunit *test)
bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE, NULL);
bo->priority = bo_prio;
err = ttm_resource_alloc(bo, place, &res1);
err = ttm_resource_alloc(bo, place, &res1, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
bo->resource = res1;
/* Add a dummy resource to populate LRU */
ttm_resource_alloc(bo, place, &res2);
ttm_resource_alloc(bo, place, &res2, NULL);
dma_resv_lock(bo->base.resv, NULL);
ttm_bo_unreserve(bo);
@@ -300,12 +300,12 @@ static void ttm_bo_unreserve_pinned(struct kunit *test)
dma_resv_lock(bo->base.resv, NULL);
ttm_bo_pin(bo);
err = ttm_resource_alloc(bo, place, &res1);
err = ttm_resource_alloc(bo, place, &res1, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
bo->resource = res1;
/* Add a dummy resource to the pinned list */
err = ttm_resource_alloc(bo, place, &res2);
err = ttm_resource_alloc(bo, place, &res2, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
KUNIT_ASSERT_EQ(test,
list_is_last(&res2->lru.link, &priv->ttm_dev->unevictable), 1);
@@ -355,7 +355,7 @@ static void ttm_bo_unreserve_bulk(struct kunit *test)
ttm_bo_set_bulk_move(bo1, &lru_bulk_move);
dma_resv_unlock(bo1->base.resv);
err = ttm_resource_alloc(bo1, place, &res1);
err = ttm_resource_alloc(bo1, place, &res1, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
bo1->resource = res1;
@@ -363,7 +363,7 @@ static void ttm_bo_unreserve_bulk(struct kunit *test)
ttm_bo_set_bulk_move(bo2, &lru_bulk_move);
dma_resv_unlock(bo2->base.resv);
err = ttm_resource_alloc(bo2, place, &res2);
err = ttm_resource_alloc(bo2, place, &res2, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
bo2->resource = res2;
@@ -401,7 +401,7 @@ static void ttm_bo_put_basic(struct kunit *test)
bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE, NULL);
bo->type = ttm_bo_type_device;
err = ttm_resource_alloc(bo, place, &res);
err = ttm_resource_alloc(bo, place, &res, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
bo->resource = res;
@@ -518,7 +518,7 @@ static void ttm_bo_pin_unpin_resource(struct kunit *test)
bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE, NULL);
err = ttm_resource_alloc(bo, place, &res);
err = ttm_resource_alloc(bo, place, &res, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
bo->resource = res;
@@ -569,7 +569,7 @@ static void ttm_bo_multiple_pin_one_unpin(struct kunit *test)
bo = ttm_bo_kunit_init(test, test->priv, BO_SIZE, NULL);
err = ttm_resource_alloc(bo, place, &res);
err = ttm_resource_alloc(bo, place, &res, NULL);
KUNIT_ASSERT_EQ(test, err, 0);
bo->resource = res;

View File

@@ -542,7 +542,7 @@ static void ttm_bo_validate_no_placement_signaled(struct kunit *test)
bo->ttm = old_tt;
}
err = ttm_resource_alloc(bo, place, &bo->resource);
err = ttm_resource_alloc(bo, place, &bo->resource, NULL);
KUNIT_EXPECT_EQ(test, err, 0);
KUNIT_ASSERT_EQ(test, man->usage, size);
@@ -603,7 +603,7 @@ static void ttm_bo_validate_no_placement_not_signaled(struct kunit *test)
bo = ttm_bo_kunit_init(test, test->priv, size, NULL);
bo->type = params->bo_type;
err = ttm_resource_alloc(bo, place, &bo->resource);
err = ttm_resource_alloc(bo, place, &bo->resource, NULL);
KUNIT_EXPECT_EQ(test, err, 0);
placement = kunit_kzalloc(test, sizeof(*placement), GFP_KERNEL);

View File

@@ -302,7 +302,7 @@ static void ttm_sys_man_free_basic(struct kunit *test)
res = kunit_kzalloc(test, sizeof(*res), GFP_KERNEL);
KUNIT_ASSERT_NOT_NULL(test, res);
ttm_resource_alloc(bo, place, &res);
ttm_resource_alloc(bo, place, &res, NULL);
man = ttm_manager_type(priv->devs->ttm_dev, mem_type);
man->func->free(man, res);

View File

@@ -42,6 +42,7 @@
#include <linux/file.h>
#include <linux/module.h>
#include <linux/atomic.h>
#include <linux/cgroup_dmem.h>
#include <linux/dma-resv.h>
#include "ttm_module.h"
@@ -499,6 +500,13 @@ struct ttm_bo_evict_walk {
struct ttm_resource **res;
/** @evicted: Number of successful evictions. */
unsigned long evicted;
/** @limit_pool: Which pool limit we should test against */
struct dmem_cgroup_pool_state *limit_pool;
/** @try_low: Whether we should attempt to evict BO's with low watermark threshold */
bool try_low;
/** @hit_low: If we cannot evict a bo when @try_low is false (first pass) */
bool hit_low;
};
static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo)
@@ -507,6 +515,10 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
container_of(walk, typeof(*evict_walk), walk);
s64 lret;
if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
evict_walk->try_low, &evict_walk->hit_low))
return 0;
if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
return 0;
@@ -524,7 +536,7 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
evict_walk->evicted++;
if (evict_walk->res)
lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
evict_walk->res);
evict_walk->res, NULL);
if (lret == 0)
return 1;
out:
@@ -545,7 +557,8 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
struct ttm_buffer_object *evictor,
struct ttm_operation_ctx *ctx,
struct ww_acquire_ctx *ticket,
struct ttm_resource **res)
struct ttm_resource **res,
struct dmem_cgroup_pool_state *limit_pool)
{
struct ttm_bo_evict_walk evict_walk = {
.walk = {
@@ -556,22 +569,39 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
.place = place,
.evictor = evictor,
.res = res,
.limit_pool = limit_pool,
};
s64 lret;
evict_walk.walk.trylock_only = true;
lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
/* One more attempt if we hit low limit? */
if (!lret && evict_walk.hit_low) {
evict_walk.try_low = true;
lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
}
if (lret || !ticket)
goto out;
/* Reset low limit */
evict_walk.try_low = evict_walk.hit_low = false;
/* If ticket-locking, repeat while making progress. */
evict_walk.walk.trylock_only = false;
retry:
do {
/* The walk may clear the evict_walk.walk.ticket field */
evict_walk.walk.ticket = ticket;
evict_walk.evicted = 0;
lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
} while (!lret && evict_walk.evicted);
/* We hit the low limit? Try once more */
if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
evict_walk.try_low = true;
goto retry;
}
out:
if (lret < 0)
return lret;
@@ -689,6 +719,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
for (i = 0; i < placement->num_placement; ++i) {
const struct ttm_place *place = &placement->placement[i];
struct dmem_cgroup_pool_state *limit_pool = NULL;
struct ttm_resource_manager *man;
bool may_evict;
@@ -701,15 +732,20 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
continue;
may_evict = (force_space && place->mem_type != TTM_PL_SYSTEM);
ret = ttm_resource_alloc(bo, place, res);
ret = ttm_resource_alloc(bo, place, res, force_space ? &limit_pool : NULL);
if (ret) {
if (ret != -ENOSPC)
if (ret != -ENOSPC && ret != -EAGAIN) {
dmem_cgroup_pool_state_put(limit_pool);
return ret;
if (!may_evict)
}
if (!may_evict) {
dmem_cgroup_pool_state_put(limit_pool);
continue;
}
ret = ttm_bo_evict_alloc(bdev, man, place, bo, ctx,
ticket, res);
ticket, res, limit_pool);
dmem_cgroup_pool_state_put(limit_pool);
if (ret == -EBUSY)
continue;
if (ret)
@@ -1056,6 +1092,8 @@ struct ttm_bo_swapout_walk {
struct ttm_lru_walk walk;
/** @gfp_flags: The gfp flags to use for ttm_tt_swapout() */
gfp_t gfp_flags;
bool hit_low, evict_low;
};
static s64
@@ -1106,7 +1144,7 @@ ttm_bo_swapout_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo)
memset(&hop, 0, sizeof(hop));
place.mem_type = TTM_PL_SYSTEM;
ret = ttm_resource_alloc(bo, &place, &evict_mem);
ret = ttm_resource_alloc(bo, &place, &evict_mem, NULL);
if (ret)
goto out;

View File

@@ -26,6 +26,7 @@
#include <linux/io-mapping.h>
#include <linux/iosys-map.h>
#include <linux/scatterlist.h>
#include <linux/cgroup_dmem.h>
#include <drm/ttm/ttm_bo.h>
#include <drm/ttm/ttm_placement.h>
@@ -350,15 +351,28 @@ EXPORT_SYMBOL(ttm_resource_fini);
int ttm_resource_alloc(struct ttm_buffer_object *bo,
const struct ttm_place *place,
struct ttm_resource **res_ptr)
struct ttm_resource **res_ptr,
struct dmem_cgroup_pool_state **ret_limit_pool)
{
struct ttm_resource_manager *man =
ttm_manager_type(bo->bdev, place->mem_type);
struct dmem_cgroup_pool_state *pool = NULL;
int ret;
if (man->cg) {
ret = dmem_cgroup_try_charge(man->cg, bo->base.size, &pool, ret_limit_pool);
if (ret)
return ret;
}
ret = man->func->alloc(man, bo, place, res_ptr);
if (ret)
if (ret) {
if (pool)
dmem_cgroup_uncharge(pool, bo->base.size);
return ret;
}
(*res_ptr)->css = pool;
spin_lock(&bo->bdev->lru_lock);
ttm_resource_add_bulk_move(*res_ptr, bo);
@@ -370,6 +384,7 @@ EXPORT_SYMBOL_FOR_TESTS_ONLY(ttm_resource_alloc);
void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res)
{
struct ttm_resource_manager *man;
struct dmem_cgroup_pool_state *pool;
if (!*res)
return;
@@ -377,9 +392,13 @@ void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res)
spin_lock(&bo->bdev->lru_lock);
ttm_resource_del_bulk_move(*res, bo);
spin_unlock(&bo->bdev->lru_lock);
pool = (*res)->css;
man = ttm_manager_type(bo->bdev, (*res)->mem_type);
man->func->free(man, *res);
*res = NULL;
if (man->cg)
dmem_cgroup_uncharge(pool, bo->base.size);
}
EXPORT_SYMBOL(ttm_resource_free);

View File

@@ -5,6 +5,7 @@
*/
#include <drm/drm_managed.h>
#include <drm/drm_drv.h>
#include <drm/ttm/ttm_placement.h>
#include <drm/ttm/ttm_range_manager.h>
@@ -311,6 +312,13 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
struct ttm_resource_manager *man = &mgr->manager;
int err;
if (mem_type != XE_PL_STOLEN) {
const char *name = mem_type == XE_PL_VRAM0 ? "vram0" : "vram1";
man->cg = drmm_cgroup_register_region(&xe->drm, name, size);
if (IS_ERR(man->cg))
return PTR_ERR(man->cg);
}
man->func = &xe_ttm_vram_mgr_func;
mgr->mem_type = mem_type;
mutex_init(&mgr->lock);

View File

@@ -34,6 +34,7 @@
#include <drm/drm_device.h>
struct dmem_cgroup_region;
struct drm_fb_helper;
struct drm_fb_helper_surface_size;
struct drm_file;
@@ -436,6 +437,10 @@ void *__devm_drm_dev_alloc(struct device *parent,
const struct drm_driver *driver,
size_t size, size_t offset);
struct dmem_cgroup_region *
drmm_cgroup_register_region(struct drm_device *dev,
const char *region_name, u64 size);
/**
* devm_drm_dev_alloc - Resource managed allocation of a &drm_device instance
* @parent: Parent device object

View File

@@ -38,6 +38,7 @@
#define TTM_MAX_BO_PRIORITY 4U
#define TTM_NUM_MEM_TYPES 8
struct dmem_cgroup_device;
struct ttm_device;
struct ttm_resource_manager;
struct ttm_resource;
@@ -211,6 +212,11 @@ struct ttm_resource_manager {
* bdev->lru_lock.
*/
uint64_t usage;
/**
* @cg: &dmem_cgroup_region used for memory accounting, if not NULL.
*/
struct dmem_cgroup_region *cg;
};
/**
@@ -239,6 +245,7 @@ struct ttm_bus_placement {
* @placement: Placement flags.
* @bus: Placement on io bus accessible to the CPU
* @bo: weak reference to the BO, protected by ttm_device::lru_lock
* @css: cgroup state this resource is charged to
*
* Structure indicating the placement and space resources used by a
* buffer object.
@@ -251,6 +258,8 @@ struct ttm_resource {
struct ttm_bus_placement bus;
struct ttm_buffer_object *bo;
struct dmem_cgroup_pool_state *css;
/**
* @lru: Least recently used list, see &ttm_resource_manager.lru
*/
@@ -432,7 +441,8 @@ void ttm_resource_fini(struct ttm_resource_manager *man,
int ttm_resource_alloc(struct ttm_buffer_object *bo,
const struct ttm_place *place,
struct ttm_resource **res);
struct ttm_resource **res,
struct dmem_cgroup_pool_state **ret_limit_pool);
void ttm_resource_free(struct ttm_buffer_object *bo, struct ttm_resource **res);
bool ttm_resource_intersects(struct ttm_device *bdev,
struct ttm_resource *res,

View File

@@ -0,0 +1,66 @@
/* SPDX-License-Identifier: MIT */
/*
* Copyright © 2023-2024 Intel Corporation
*/
#ifndef _CGROUP_DMEM_H
#define _CGROUP_DMEM_H
#include <linux/types.h>
#include <linux/llist.h>
struct dmem_cgroup_pool_state;
/* Opaque definition of a cgroup region, used internally */
struct dmem_cgroup_region;
#if IS_ENABLED(CONFIG_CGROUP_DMEM)
struct dmem_cgroup_region *dmem_cgroup_register_region(u64 size, const char *name_fmt, ...) __printf(2,3);
void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region);
int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
struct dmem_cgroup_pool_state **ret_pool,
struct dmem_cgroup_pool_state **ret_limit_pool);
void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
struct dmem_cgroup_pool_state *test_pool,
bool ignore_low, bool *ret_hit_low);
void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
#else
static inline __printf(2,3) struct dmem_cgroup_region *
dmem_cgroup_register_region(u64 size, const char *name_fmt, ...)
{
return NULL;
}
static inline void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region)
{ }
static inline int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
struct dmem_cgroup_pool_state **ret_pool,
struct dmem_cgroup_pool_state **ret_limit_pool)
{
*ret_pool = NULL;
if (ret_limit_pool)
*ret_limit_pool = NULL;
return 0;
}
static inline void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size)
{ }
static inline
bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
struct dmem_cgroup_pool_state *test_pool,
bool ignore_low, bool *ret_hit_low)
{
return true;
}
static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
{ }
#endif
#endif /* _CGROUP_DMEM_H */

View File

@@ -65,6 +65,10 @@ SUBSYS(rdma)
SUBSYS(misc)
#endif
#if IS_ENABLED(CONFIG_CGROUP_DMEM)
SUBSYS(dmem)
#endif
/*
* The following subsystems are not supported on the default hierarchy.
*/

View File

@@ -96,7 +96,7 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
counter->watermark = usage;
}
#ifdef CONFIG_MEMCG
#if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
void page_counter_calculate_protection(struct page_counter *root,
struct page_counter *counter,
bool recursive_protection);

View File

@@ -1128,6 +1128,7 @@ config CGROUP_PIDS
config CGROUP_RDMA
bool "RDMA controller"
select PAGE_COUNTER
help
Provides enforcement of RDMA resources defined by IB stack.
It is fairly easy for consumers to exhaust RDMA resources, which
@@ -1136,6 +1137,15 @@ config CGROUP_RDMA
Attaching processes with active RDMA resources to the cgroup
hierarchy is allowed even if can cross the hierarchy's limit.
config CGROUP_DMEM
bool "Device memory controller (DMEM)"
help
The DMEM controller allows compatible devices to restrict device
memory usage based on the cgroup hierarchy.
As an example, it allows you to restrict VRAM usage for applications
in the DRM subsystem.
config CGROUP_FREEZER
bool "Freezer controller"
help

View File

@@ -7,4 +7,5 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o
obj-$(CONFIG_CPUSETS) += cpuset.o
obj-$(CONFIG_CPUSETS_V1) += cpuset-v1.o
obj-$(CONFIG_CGROUP_MISC) += misc.o
obj-$(CONFIG_CGROUP_DMEM) += dmem.o
obj-$(CONFIG_CGROUP_DEBUG) += debug.o

861
kernel/cgroup/dmem.c Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -288,7 +288,7 @@ int page_counter_memparse(const char *buf, const char *max,
}
#ifdef CONFIG_MEMCG
#if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
/*
* This function calculates an individual page counter's effective
* protection which is derived from its own memory.min/low, its
@@ -460,4 +460,4 @@ void page_counter_calculate_protection(struct page_counter *root,
atomic_long_read(&parent->children_low_usage),
recursive_protection));
}
#endif /* CONFIG_MEMCG */
#endif /* CONFIG_MEMCG || CONFIG_CGROUP_DMEM */