The fence register value also depends upon the stride of the object, so we
need to clear the fence if that is changed as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: Added 8xx and 965 paths, and renamed the confusing
i915_gem_object_tiling_ok function to i915_gem_object_fence_offset_ok]
Signed-off-by: Eric Anholt <eric@anholt.net>
With the work by Jesse Barnes to eliminate allocation of fences during
execbuffer, it becomes possible to write to the scan-out buffer with it
never acquiring a fence (simply by only ever writing to the object using
tiled GPU commands and never writing to it via the GTT). So for pre-i965
chipsets which require fenced access for tiled scan-out buffers, we need
to obtain a fence register.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
After performing an operation over the page list for a buffer retrieved by
i915_gem_object_get_pages() the pages need to be returned with
i915_gem_object_put_pages(). This was not being observed for the phys
objects which were thus leaking references to their backing pages.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: Dave Airlie <airlied@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
To differentiate between encountering an out-of-memory error with running
out of space in the aperture, use ENOSPC for the later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
The batch buffer may be shared with another read buffer, so we should not
ignore any previously set domains, but just or in the command domain (and
check that the buffer is not writable).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
By sending a broken execbuffer (its length was not suitably aligned) I
triggered an operation upon a freed object. The invalid alignment was
discovered after updating the write_domain on the object but before the
object was placed on the active queue. So during the unwind process
following the error, the now freed object attempts to flush its
non-existent, but outstanding, GPU writes causing this use-after-free.
[drm:i915_dispatch_gem_execbuffer] *ERROR* alignment
[drm:i915_gem_execbuffer] *ERROR* dispatch failed -22
WARNING: at lib/kref.c:43 warn_slowpath_null+0x10/0x15()
Modules linked in:
Pid: 4552, comm: lt-csi-drm Not tainted 2.6.30-rc6 #423
Call Trace:
[<c0119ef3>] warn_slowpath_fmt+0x57/0x6d
[<c014de24>] ? get_pageblock_migratetype+0x18/0x1e
[<c014e8fd>] ? free_hot_page+0xa/0xc
[<c014e915>] ? __free_pages+0x16/0x1f
[<c0153ebf>] ? shmem_truncate_range+0x63e/0x656
[<c015fb2f>] ? slob_page_alloc+0x146/0x1c8
[<c0119f19>] warn_slowpath_null+0x10/0x15
[<c01f55f2>] kref_get+0x1b/0x21
[<c02605db>] i915_gem_object_move_to_active+0x1f/0x56
[<c0261302>] i915_add_request+0x156/0x19a
[<c026136e>] i915_gem_object_flush_gpu_write_domain+0x28/0x3f
[<c0261eca>] i915_gem_object_unbind+0x4a/0x124
[<c0261fd7>] i915_gem_free_object+0x33/0x9b
[<c0250d6b>] drm_gem_object_free+0x28/0x4a
[<c0250d43>] ? drm_gem_object_free+0x0/0x4a
[<c01f55ce>] kref_put+0x38/0x41
[<c0250cbf>] drm_gem_object_unreference+0x11/0x13
[<c0250d06>] drm_gem_object_handle_unreference+0x1e/0x21
[<c0250d13>] drm_gem_object_release_handle+0xa/0xe
[<c01f3e6b>] idr_for_each+0x5f/0x98
[<c0250d09>] ? drm_gem_object_release_handle+0x0/0xe
[<c0250daf>] drm_gem_release+0x22/0x34
[<c025046f>] drm_release+0x1e8/0x3c4
[<c0162d25>] __fput+0xaf/0x146
[<c0162dce>] fput+0x12/0x14
[<c01605ef>] filp_close+0x48/0x52
[<c011b182>] put_files_struct+0x57/0x9b
[<c011b1e4>] exit_files+0x1e/0x20
[<c011c6b6>] do_exit+0x16d/0x511
[<c03704ab>] ? __schedule+0x3d4/0x3e5
[<c0103f0d>] ? handle_irq+0xd/0x69
[<c011caa7>] do_group_exit+0x4d/0x73
[<c011cae0>] sys_exit_group+0x13/0x17
[<c010268c>] sysenter_do_call+0x12/0x2b
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Update interrupt handling methods for IGDNG with new registers
for display and graphics interrupt functions. As we won't use
irq-based vblank sync in dri2, so display interrupt on new chip
will be used for hotplug only in future.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
keithp didn't like the original 20ms plan because a cooperative client could
be starved by an uncooperative client. There may even have been problems
with cooperative clients versus cooperative clients. So keithp changed
throttle to just wait for the second to last seqno emitted by that client.
It worked well, until we started getting more round-trips to the server
due to DRI2 -- the server throttles in BlockHandler, and so if you did more
than one round trip after finishing your frame, you'd end up unintentionally
syncing to the swap.
Fix this by keeping track of the client's requests, so the client can wait
when it has an outstanding request over 20ms old. This should have
non-starving behavior, good behavior in the presence of restarts, and less
waiting. Improves high-settings openarena performance on my GM45 by 50%.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This could be triggered by a gtt mapping fault on 965 that decides to
remove the fence from another object that happens to be active currently.
Since the other object doesn't rely on the fence reg for its execution, we
don't wait for it to finish. We'll soon be not waiting on 915 most of the
time as well, so just drop the BUG_ON.
Signed-off-by: Eric Anholt <eric@anholt.net>
When a GEM object is evicted from the GTT we set it to the CPU domain,
as it might get swapped in and out or ever mmapped regularly. If the
object is mmapped through the GTT it can still get evicted in this way
by other objects requiring GTT space. When the GTT mapping is touched
again we fault it back into the GTT, but fail to set it back to the
GTT domain. This means we fail to flush any cached CPU writes to the
pages backing the object which will then happen "eventually", typically
after we write to the page through the uncached GTT mapping.
[anholt: Note that userland does do a set_domain(GTT, GTT) when starting
to access the GTT mapping. That covers getting the existing mapping of the
object synchronized if it's bound to the GTT. But set_domain(GTT, GTT)
doesn't do anything if the object is currently unbound. This fix covers the
transition to being bound for GTT mapping.]
Fixes glyph and other pixmap corruption during swapping. fd.o bug #21790
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
On the 865, but not the 855, the clflush we do appears to not actually make
it out to the hardware all the time. An easy way to safely reproduce was
X -retro, which would show that some of the blits involved in drawing the
lovely root weave didn't make it out to the hardware. Those blits are 32
bytes each, and 1-2 would be missing at various points around the screen.
Other experimentation (doing more clflush, doing more AGP chipset flush,
poking at some more device registers to maybe trigger more flushing) didn't
help. krh came up with the wbinvd as a way to successfully get all those
blits to appear.
Signed-off-by: Eric Anholt <eric@anholt.net>
The pitch field is an exponent on pre-965, so we were rejecting buffers
on 8xx that we shouldn't have. 915 got lucky in that the largest legal
value happened to match (8KB / 512 = 0x10), but 8xx has a smaller tile width.
Additionally, we programmed that bad value into the register on 8xx, so the
only pitch that would work correctly was 4096 (512-1023 pixels), while others
would probably give bad rendering or hangs.
Signed-off-by: Eric Anholt <eric@anholt.net>
fd.o bug #20473.
For some reason we never added 8xx desktop cursor support to the
kernel. This patch fixes that.
[krh: Also set the size on pre-i915 hw.]
Tested-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
For awhile now, many of the GEM code paths have allocated page or
object arrays with the slab allocator. This is nice and fast, but
won't work well if memory is fragmented, since the slab allocator works
with physically contiguous memory (i.e. order > 2 allocations are
likely to fail fairly early after booting and doing some work).
This patch works around the issue by falling back to vmalloc for
>PAGE_SIZE allocations. This is ugly, but much less work than chaining
a bunch of pages together by hand (suprisingly there's not a bunch of
generic kernel helpers for this yet afaik). vmalloc space is somewhat
precious on 32 bit kernels, but our allocations shouldn't be big enough
to cause problems, though they're routinely more than a page.
Note that this patch doesn't address the unchecked
alloc-based-on-ioctl-args in GEM; that needs to be fixed in a separate
patch.
Also, I've deliberately ignored the DRM's "area" junk. I don't think
anyone actually uses it anymore and I'm hoping it gets ripped out soon.
[Updated: removed size arg to new free function. We could unify the
free functions as well once the DRM mem tracking is ripped out.]
fd.o bug #20152 (part 1/3)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
We might sleep here anyway so I hope an extra uncached read is ok to
add.
In #20896 we found that vbetool clobbers the IER. In KMS mode this is
particularly bad since we don't set the interrupt regs late (in
EnterVT), so we'd fail to get *any* interrupts at all after X started
(since some distros have scripts that call vbetool at X startup
apparently).
So this patch checks IER at wait_request time, and re-enables
interrupts if it's been clobbered. In a proper config this check
should never be triggered.
This is really a distro issue, but having a sanity check is nice, as
long as it doesn't have a real performance hit.
Tested-by: Mateusz Kaduk <mateusz.kaduk@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[anholt: Moved the check inside of the sleeping case to avoid perf cost]
Signed-off-by: Eric Anholt <eric@anholt.net>
regression caused by commit 5e118f4139:
i915_gem_object_move_to_inactive() should be called in task context,
as it calls fput();
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
[anholt: Add more detail to the comment about the lock break that's added]
Signed-off-by: Eric Anholt <eric@anholt.net>
Save the bit 17 state of the pages when freeing the page list, and
reswizzle them if necessary when rebinding the pages (in case they were
swapped out). Since we have userland with expectations that the swizzle
enums let it pread and pwrite contents accurately, we can't expose a new
swizzle enum for bit 17 (which it would have to GTT map to handle), so we
handle it down in pread and pwrite by swizzling the copy when bit 17 of the
page address is set.
Signed-off-by: Eric Anholt <eric@anholt.net>
Otherwise, the results of our read didn't show up when we were faulting in
the page being read into (as happened with a testcase reading into a big
stack area). Likely accounts for some conformance test failures.
Signed-off-by: Eric Anholt <eric@anholt.net>
i915_gem_put_relocs_to_user returned an uninitialized value which
got returned to userspace. This caused libdrm in my setup to never
get out of a do{}while() loop retrying i915_gem_execbuffer.
result was hanging X, overheating of cpu and 2-3gb of logfile-spam.
This patch adresses the issue by
1. initializing vars in this file where necessary
2. correcting wrongly interpreted return values of copy_[from/to]_user
Signed-off-by: Florian Mickler <florian@mickler.org>
[anholt: cleanups of unnecessary changes, consistency in APIs]
Signed-off-by: Eric Anholt <eric@anholt.net>
We create a debugfs node (i915_ringbuffer_data) to expose a hex dump
of the ring buffer itself. We also expose another debugfs node
(i915_ringbuffer_info) with information on the state (i.e. head, tail
addresses) of the ringbuffer.
For batchbuffer dumping, we look at the device's active_list, dumping
each object which has I915_GEM_DOMAIN_COMMAND in its read
domains. This is all exposed through the dri/i915_batchbuffers debugfs
file with a header for each object (giving the objects gtt_offset so
that it can be matched against the offset given in the
BATCH_BUFFER_START command.
Signed-off-by: Ben Gamari <bgamari@gmail.com>
Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This is a baby-step in the direction of having finer-grained
locking than the struct_mutex. Specifically, this will enable
new debugging code to read the active list for printing out
GPU state when the GPU is wedged, (while the struct_mutex is
held, of course).
Signed-off-by: Carl Worth <cworth@cworth.org>
[anholt: indentation fix]
Signed-off-by: Eric Anholt <eric@anholt.net>
Indicates something is wrong with the mapping; and apparently triggers
in current kernels.
Signed-off-by: Jesse Barnes <jbarnes@virtuosugeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>