This means something different and is only relevant for gen6 and the
reason why we cant use anything else than aliasing ppgtt there.
Note that the currently implemented logic for secure batches is
broken: Userspace wants the buffer both in ppgtt (for self-referencing
relocations) and in ggtt (for priveledge operations).
This is the same issue the command parser is also facing.
Unfortunately our coverage for corner-cases of self-referencing
batches is spotty.
Note that this will break vsync'ed Xv and DRI2 copies.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts the abi-change from
commit 67e3d2979b
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Fri Dec 6 14:11:01 2013 -0800
drm/i915: Permit contexts on all rings
We don't actually need this, only the internal changes to allow
contexts on all rings for the purpose of ppgtt switching are required.
And I'm not sure whether this is the right thing to do given some of
the hw features in the pipeline.
Also, new abi needs userspace patches as a proof-of-need, which is
completely lacking here.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2Gb-2k total).
For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.
One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have it's own address
space (provided the HW can do it).
v2: Disable BDW on rebase
NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.
Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I need the tricky do_switch fix before I can merge the final piece of
the ppgtt enabling puzzle. Otherwise the conflict will be a real pain
to resolve since the do_switch hunk from -fixes must be placed at the
exact right place within a hunk in the next patch.
Conflicts:
drivers/gpu/drm/i915/i915_gem_context.c
drivers/gpu/drm/i915/i915_gem_execbuffer.c
drivers/gpu/drm/i915/intel_display.c
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We need to have the address space when reserving space for the objects.
Since the address space and context are tied together, and reserve
occurs before context switch (for good reason), we must lookup our
context earlier in the process.
This leaves some room for optimizations where we no longer need to use
ctx_id in certain places. This will be addressed in a subsequent patch.
Important tricky bit:
Because slow relocations during execbuffer drop struct_mutex
Perhaps it would be best to acquire the reference when we get the
context, but I'll save that for another day (note I have written the
patch before, and I found the changes required to be uglier than this).
Note that since we currently access everything via context id, and not
the data structure this is fine, though not desirable. The next change
attempts to get the context only once via the context ID idr lookup, and
as such, the following can happen:
CTX-A is created, refcount = 1
CTX-A execbuf, mutex dropped
close IOCTL called on CTX-A, refcount = 0
CTX-A resumes in execbuf.
v2: Rebased on top of
commit b6359918b8
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Wed Oct 30 15:44:16 2013 +0200
drm/i915: add i915_get_reset_stats_ioctl
v3: Rebased on top of
commit 25b3dfc87b
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Tue Nov 12 11:57:30 2013 +0200
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Nov 26 16:14:33 2013 +0200
drm/i915: check context reset stats before relocations
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we want to use contexts in more abstract terms (specifically with
PPGTT in mind), we need to allow them to be specified for any ring.
Since the upcoming patches will bring about the use of multiple address
spaces, and each ring needs to have an address space programmed (which
we intend to do at context switch time), we can no longer only use RCS.
With multiple rings having a last context, we must now unreference these
contexts.
NOTE: This commit requires an update to intel-gpu-tools to make it not
fail.
v2: Rebased with some logical conflicts.
Squashed in the context fini refcount patch
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The only place we were using it was for GEN6, which won't have PPGTT
support anyway (ie. the VM is always the same). To clear things up,
(it only added confusion for me since it doesn't allow us to assert
vma->vm is what we always want, when just looking at the code).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: Use the new vm [un]bind functions
Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.
Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.
v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initialy, but we cannot.
v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)
At this point the original mailing list thread diverges. ie.
v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: reduce vm->insert_entries() usage
FKA: drm/i915: eliminate vm->insert_entries()
With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.
v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As the execbuffer dispatch grows ever more complex and involves multiple
stages of moving objects into the aperture, we need to take greater care
that we do not evict our execbuffer objects prior to dispatch. This is
relatively simple as we can just keep the objects pinned for not just
the relocation but until we are finished.
One such example is the possibility of the context switch causing an
eviction or hitting the shrinker in order to fit its object into the
aperture.
Link: http://lists.freedesktop.org/archives/intel-gfx/2013-November/036166.html
Reported-by: "Siluvery, Arun" <arun.siluvery@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: stable@vger.kernel.org
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add the additional explanations from Chris to the commit
message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This belonged in
commit 07fe0b1280
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Jul 31 17:00:10 2013 -0700
drm/i915: plumb VM into bind/unbind code
But it was somehow missed along the way.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
i915_gem_execbuffer_relocate became defunct in:
commit 27173f1f95
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Aug 14 11:38:36 2013 +0200
drm/i915: Convert execbuf code to use vmas
eb_create: never used?
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: The lingering vm parameter to eb_create might have been back
from the days where we didn't yet keep both vmas and obj lists in the
eb struct. But I didn't check really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Squash in fix from Ben: Set PPGTT batches as necessary
This fixes the regression in the last couple of days when we enabled
PPGTT.
v3: Squash in fixup to still use GTT for secure batches from Ville:
BDW doesn't have a separate secure vs. non-secure bit in
MI_BATCH_BUFFER_START. So for secure batches we have to simply
leave the PPGTT bit unset. Fortunately older generations (except
HSW) had similar limitations so execbuffer already creates a GTT
mapping for all secure batches.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We don't actually return any to userspace yet, however we can pretend
like we do now so userspace will support it when it happens.
This is just to please Chris as the code itself isn't ready for > 64b
relocations.
v2: Rebase on top of the refactored relocate_entry_gtt|cpu functions.
v3: Squash in fixup from Rafal Barbalho for 64 byte relocs using cpu
relocs and those crossing a page boundary.
v4: Squash in a fixup for the fixup from Rafael.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Barbalho, Rafael <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Even though we track object activity and not VMA, because we have the
active_list be based on the VM, it makes the most sense to use VMAs in
the APIs.
NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.
v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There's actually no real risk since we already check for stricter
constraints earlier (using UINT_MAX / sizeof (struct
drm_i915_gem_exec_object2) as the limit). But in eb_create we use
signed integers, which steals a factor of 2. Luckily struct
drm_i915_gem_exec_object2 for this to not matter.
Still, be consistent and use unsigned integers.
Similar use unsinged integers when checking for overflows in the
relocation entry processing.
I've also added a new subtests to igt/gem_reloc_overflow to also
test for overflowing args->buffer_count values.
v2: Give the variables again tighter scope to make it clear that the
computation is purely local and doesn't leak out to the 2nd block.
Requested by Chris Wilson.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
No buffer overflows here, but better safe than sorry.
v2:
- Fixup the sizeof conversion, I've missed the pointer deref (Jani).
- Drop the redundant GFP_ZERO, kcalloc alreads memsets (Jani).
- Use kmalloc_array for the execbuf fastpath to avoid the memset
(Chris). I've opted to leave all other conversions as-is since they
aren't in a fastpath and dealing with cleared memory instead of
random garbage is just generally nicer.
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop the contentious kmalloc_array hunk in execbuf.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When reserving objects during execbuf, it is possible to come across an
object which will not fit given the current fragmentation of the address
space. We do not have any defragment in drm_mm, so the strategy is to
instead evict everything, and reallocate objects.
With the upcoming addition of multiple VMs, there is no point to evict
everything since doing so is overkill for the specific case mentioned
above.
Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: One additional s/evict_everything/evict_vm/ to update a
comment in the code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now when we have mechanism in place to track which context
was guilty of hanging the gpu, it is possible to punish
for bad behaviour.
If context has recently submitted a faulty batchbuffers guilty of
gpu hang and submits another batch which hangs gpu in quick
succession, ban it permanently. If ctx is banned, no more
batchbuffers will be queued for execution.
There is no need for global wedge machinery anymore and
it would be unwise to wedge the whole gpu if we have multiple
hanging batches queued for execution. Instead just ban
the guilty ones and carry on.
v2: Store guilty ban status bool in gpu_error instead of pointers
that might become danling before hang is declared.
v3: Use return value for banned status instead of stashing state
into gpu_error (Chris Wilson)
v4: - rebase on top of fixed hang stats api
- add define for ban period
- rename commit and improve commit msg
v5: - rely context banning instead of wedging the gpu
- beautification and fix for ban calculation (Chris)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A follow-on to the update of the LLC coherency logic is that we can rely
on the LLC being coherent with the CS for rewriting batchbuffers
irrespective of their cache domain. (This should have no effect
currently as all the batch buffers are expected to be I915_CACHE_LLC and
so using the cpu relocation path anyway.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.
This is just a safety measure to prevent surprises in the future.
Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.
v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.
The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.
This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.
The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.
Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.
WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.
v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)
v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Aug 7 18:30:54 2013 +0100
drm/i915: List objects allocated from stolen memory in debugfs
v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>