I've found this by accident. The docs don't really come out and say you
need to do this. What the docs do tell you is you need to flush the TLBs
before you set the PP_DIR_BASE, and that the RCS will invalidate its
TLBs upon setting the new PP_DIR_BASE. It makes no such comment about
any of the other rings.
Empirically, this indeed fixes a really obvious bug whereby the batches
being sent to the blitter were not executing (we were executing the
HSWP somehow instead).
NOTE: This should make no difference with the current code. It only
applies when we start using multiple VMs.
NOTE2: HSW appears to be immune to this.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The docs seem to suggest this is the appropriate method (though it
doesn't say so outright). In other words, we probably should have done
this before. We certainly must do this for switching VMs on the fly,
since synchronizing the rings to MMIO updates isn't acceptable.
v2:
Make the reset code actually work for all rings. Note that this was
fixed in subsequent commits, but was indeed broken for this commit.
Add a posting read to the reset case. It probably should have existed
before hand, but since we have no failures; there is no reason to make
it a separate commit.
Make IS_GEN6 not use the ring because I am seeing crashes when using it.
It is a bit of a hack in this patch, it will get fixed up in a couple of
patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In order to do the full context switch with address space, it's
convenient to have a way to switch the address space. We already have
this in our code - just pull it out to be called by the context switch
code later.
v2: Rebased on BDW support. Required adding BDW.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The patch before this changed the way in which we allocate space for the
PPGTT PDEs. It began carving out the PPGTT PDEs (which live in the
Global GTT) from the GGTT's drm_mm. Prior to that patch, the PDEs were
hidden from the drm_mm, and therefore could never fail to be allocated.
In unfortunate cases, the drm_mm may be full when we want to allocate
the space. This can technically occur whenever we try to allocate, which
happens in two places currently. Practically, it can only really ever
happen at GPU reset.
Later, when we allocate more PDEs for multiple PPGTTs this will
potentially even more useful.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When PPGTT support was originally enabled, it was only designed to
support 1 PPGTT. It therefore made sense to simply hide the GGTT space
required to enable this from the drm_mm allocator.
Since we intend to support full PPGTT, which means more than 1, and they
can be created and destroyed ad hoc it will be required to use the
proper allocation techniques we already have.
The first step here is to make the existing single PPGTT use the
allocator.
The astute observer will notice that we are reserving space in the GGTT
for the PDEs for the lifetime of the address space, and would be right
to question whether or not this is a good idea. It does not make a
difference with this current patch only the aliasing PPGTT (indeed the
PDEs should still be hidden from the shrinker). For the future, we are
allocating from top to bottom to avoid using the precious "gtt
space" The GGTT space at that point should only be used for scanout, HW
contexts, ringbuffers, HWSP, PDEs, and a couple of other small buffers
(potentially) used by the kernel. Everything else should be mapped into
a PPGTT. To put the consumption in more tangible terms, it takes
approximately 4 sets of PDEs to equal one 19x10 framebuffer (with no
fancy stride or alignment constraints). 3/4 of the total [average] GGTT
can be used for PDEs, and hopefully never touch the 1/4 that the
framebuffer needs.
The astute, and persistent observer might ask about the page tables
which are also pinned for the address space. This waste is unfortunate.
We use 2MB of memory per address space. We leave wrapping the PDEs as a
real GEM object as a TODO.
v2: Align PDEs to 64b in GTT
Allocate the node dynamically so we can use drm_mm_put_block
Now tested on IGT
Allocate node at the top to avoid fragmentation (Chris)
v3: Use Chris' top down allocator
v4: Embed drm_mm_node into ppgtt struct (Jesse)
Remove hunks which didn't belong (Jesse)
v5: Don't subtract guard page since we now killed the guard page prior
to this patch. (Ben)
v6: Rebased and removed guard page stuff.
Added a chunk to the commit message
Allow adding a context to mappable region
v7: Undo v3, so we can make the drm patch last in the series
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v4)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
squash: drm/i915: allow PPGTT to use mappable
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding, versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It serves as the per
platform pte writing basically. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.
What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.
drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage
drm/i915: Add bind/unbind object functions to VMA
As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.
Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.
v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)
v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.
v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)
v5: Update the comment to not suck (Chris)
v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: Use the new vm [un]bind functions
Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.
Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.
v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to global gtt when mappable and fenceable. I thought
we could get away without this initialy, but we cannot.
v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)
At this point the original mailing list thread diverges. ie.
v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound Bos
Properly handle secure dispatch
Rebased on vma bind/unbind conversion
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
drm/i915: reduce vm->insert_entries() usage
FKA: drm/i915: eliminate vm->insert_entries()
With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.
v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The initial implementation of this function used MMIO to write the PDPs.
Upon review it was determined (correctly) that the docs say to use LRI.
The issue is there are times where we want to do a synchronous write
(GPU reset).
I've tested this, and it works. I've verified with as many people as
possible that it should work.
This should fix the failing reset problems.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Our VM code already has a cleanup function, and this is a nice place to
put the drm_mm_takedown. This should have no functional impact, it just
leaves the unload function a bit cleaer, and is more logical IMO
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This should really have been added in BDW integration, as well as:
commit 93bd8649db
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Tue Jul 16 16:50:06 2013 -0700
drm/i915: Put the mm in the parent address space
It didn't really matter before, but it will in the future.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we fail for some reason on loading the PDPs, it would be wise to
disable the PPGTT in the ring registers. If we do not do this, we have
undefined results.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pull in Jani's backlight rework branch. This was merged through a
separate branch to be able to sort out the Broadwell conflicts
properly before pulling it into the main development branch.
Conflicts:
drivers/gpu/drm/i915/intel_display.c
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Because of the way in which we're allocating the pages for the Aliasing
PPGTT, we cannot actually successfully alloc enough space for anything
greater than 2GB.
Instead of a quick hack to fix this, we should defer until we have the
real solution in place (allocating much less contiguous space).
This wasn't found sooner because we didn't not have any systems
supporting more than a 2GB GTT.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Preallocated objects will already have been added to the vma_list when
creating their ggtt vma entry, and coincidentally also marked as holding
a ggtt mapping. Repeating the vma_list manipulation when setting up the
ggtt after preallocation is a recipe for an unhappy kernel.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Use the improve commit message suggest by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Resolve rebase conflicts and switch to gen < 8 color for GenX
checking.
v3: Rebase on top of the address space refactoring.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Squash in fix from Ben: Set PPGTT batches as necessary
This fixes the regression in the last couple of days when we enabled
PPGTT.
v3: Squash in fixup to still use GTT for secure batches from Ville:
BDW doesn't have a separate secure vs. non-secure bit in
MI_BATCH_BUFFER_START. So for secure batches we have to simply
leave the PPGTT bit unset. Fortunately older generations (except
HSW) had similar limitations so execbuffer already creates a GTT
mapping for all secure batches.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Legacy PPGTT on GEN8 requires programming 4 PDP registers per ring.
Since all rings are using the same address space with the current code
the logic is simply to program all the tables we've setup for the PPGTT.
v2: Turn on PPGTT in GFX_MODE
v3: v2 was the wrong patch
v4: Resolve conflicts due to patch series reordering.
v5: Squash in fixup from Ben: Use LRI to write PDPs
The docs (and simulator seems to back up) suggest that we can only
program legacy PPGTT PDPs with LRI commands.
v6: Rebase around context differences conflicts.
v7: Use #defines for per ring PDPs. (Damien)
v8: Don't use typede'f private_t.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (up to v3 and v7)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GEN8 insertion is very similar to GEN6.
v2: Rebase on top of Imre's for_each_sg_page helpers.
v3: Fixup my conversion (spotted by Ville).
v4: Rebase on top of the address space refactoring.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GEN8 PPGTT range clearing is very similar to GEN6 if we assume that our
PDEs are all valid, which they should be.
v2: Rebase on top of the address space refactoring.
v3: Rebase on top of the bool use_scratch addition to the clear_range interface.
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The upcoming clear and insert routines will expect that PDEs all point
to valid Page Directories. Doing that lazily doesn't really buy us
anything.
The page allocation is done regardless earlier in init so it shouldn't
hurt set the PDEs.
v2: Squash in patches to implement fixed PDE write function:
- If I had done this in the first place, the bug that's going to be
fixed in an upcoming patch would have been much easier to find.
- Use WB for PDEs.
The PAT bit is used for page size. 2ME PDEs aren't even supported in
BDW, so this was completely invalid. The solution is to make our
PDEs WB+LLC instead of the pervious WB+eLLC. As far as I can guess,
this change won't matter for performance.
Thanks to Ville for the quick correction when discussing on IRC.
v3: Return the pde type for pde encoding (Damien)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Aside from the potential size increase of the PPGTT, the primary
difference from previous hardware is the Page Directories are no longer
carved out of the Global GTT.
Note that the PDE allocation is done as a 8MB contiguous allocation,
this needs to be eventually fixed (since driver reloading will be a
pain otherwise). Also, this will be a no-go for real PPGTT support.
v2: Move vtable initialization
v3: Resolve conflicts due to patch series reordering.
v4: Rebase on top of the address space refactoring of the PPGTT
support. Drop Imre's r-b tag for v2, too outdated by now.
v5: Free the correct amount of memory, "get_order takes size not a page
count." (Imre)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
BDW caching works differently than the previous generations. Instead of
having bits in the PTE which directly control how the page is cached,
the 3 PTE bits PWT PCD and PAT provide an index into a PAT defined by
register 0x40e0. This style of caching is functionally equivalent to how
it works on HSW and before.
v2: Tiny bikeshed as discussed on internal irc.
v3: Squash in patch from Ville to mirror the x86 PAT setup more like
in arch/x86/mm/pat.c. Primarily, the 0th index will be WB, and not
uncached.
v4: Comment for reason to not use a 64b write on the PPAT.
v5: Add a FIXME comment that the caching bits in the PAT registers
might be wrong due to doc confusion.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the PTE clarifications, the bind and clear functions can now be
added for gen8.
v2: Use for_each_sg_pages in gen8_ggtt_insert_entries.
v3: Drop dev argument to pte encode functions, upstream lost it. Also
rebase on top of the scratch page movement.
v4: Rebase on top of the new address space vfuncs.
v5: Add the bool use_scratch argument to clear_range and the bool valid argument
to the PTE encode function to follow upstream changes.
v6: Add a FIXME(BDW) about the size mismatch of the readback check
that Jon Bloomfield spotted.
v7: Squash in fixup patch from Ben for the posting read to match the
64bit ptes and so shut up the WARN.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>