Commit Graph

7086 Commits

Author SHA1 Message Date
Daniel Vetter
f56f821feb mm: extend prefault helpers to fault in more than PAGE_SIZE
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStlye and fix spelling.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:36:30 +02:00
Daniel Vetter
d174bd6472 drm/i915: extract copy helpers from shmem_pread|pwrite
While moving around things, this two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.

v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:30:33 +02:00
Daniel Vetter
117babcdd5 drm/i915: use uncached writes in pwrite
It's around 20% faster.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:29:38 +02:00
Daniel Vetter
ffc62976d2 drm/i915: fall back to shmem pwrite when the buffer is not accessible
It's too expensive to move it around just for that pwrite, especially
when we're trashing on the mappable gtt part like crazy.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:29:08 +02:00
Daniel Vetter
586428852a drm/i915: implement inline clflush for pwrite
In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffers size. The trick is
that we can avoid clflush cachelines that we will overwrite completely
anyway.

Furthermore for partial pwrites it gives a proportional speedup on top
of the 30% percent because we only clflush back the part of the buffer
we're actually writing.

v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.

v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
  uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
  differentiate it from needs_clflush_before.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:28:45 +02:00
Daniel Vetter
96d79b5270 drm/i915: don't clobber userspace memory before commiting to the pread
The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet commited to to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
  deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
  outsanding gpu writes.

Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:28:32 +02:00
Daniel Vetter
935aaa692e drm/i915: drop gtt slowpath
With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.

So just kill it and use the shmem_pwrite path as fallback.

To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:27:21 +02:00
Daniel Vetter
692a576b9d drm/i915: don't call shmem_read_mapping unnecessarily
This speeds up pwrite and pread from ~120 µs ro ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).

v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.

v3: Unconditionaly grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:27:03 +02:00
Daniel Vetter
3ae5378330 drm/i915: don't use gtt_pwrite on LLC cached objects
~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:25:45 +02:00
Daniel Vetter
a0356fc373 drm/i915: kill ranged cpu read domain support
No longer needed.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:25:32 +02:00
Daniel Vetter
8489731c9b drm/i915: move clflushing into shmem_pread
This is obviously gonna slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.

So all this ranged clflush tracking is just a waste of time.

No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.

As an added bonus, this avoids clflush on read on coherent objects.
Which means that partial preads on snb are now roughly 4x as fast.
This will be usefull for e.g. the libva encoder - when I finally get
around to fix that up.

v2: Properly sync with the gpu on LLC machines.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:20:01 +02:00
Daniel Vetter
6d5cd9cb1e drm: add helper to clflush a virtual address range
Useful when the page is already mapped to copy date in/out.

For -stable because the next patch (fixing phys obj pwrite) needs this
little helper function.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:19:45 +02:00
Daniel Vetter
dbf7bff074 drm/i915: merge shmem_pread slow&fast-path
With the previous rewrite, they've become essential identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:19:11 +02:00
Daniel Vetter
e244a443bf drm/i915: merge shmem_pwrite slow&fast-path
With the previous rewrite, they've become essential identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:18:58 +02:00
Chris Wilson
dabdfe021a drm/i915: Avoid using mappable space for relocation processing through the CPU
We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:16:17 +02:00
Daniel Vetter
d1dd20a965 drm/i915: clear the entire gtt when using gem
We've lost our guard page somewhere in the gtt rewrite, this patch
here will restore it.

Exercised by i-g-t/tests/gem_cs_prefetch.

v2: Substract the guard page from the range we're supposed to manage
with gem. Suggested by Chris Wilson to increase the odds of old ums +
gem userspace not blowing up. To compensate for the loss of a page,
don't substract the guard page in the modeset init code any longer.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44748
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:15:24 +02:00
Daniel Vetter
9021f284e9 drm/i915: the intel gtt is _not_ an agp bridge!
So don't call it like that.

Also rip out a confusing comment and instead explain what's really
going on.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:15:07 +02:00
Daniel Vetter
644ec02b5d drm/i915: s/i915_gem_do_init/i915_gem_init_global_gtt
... because this is what it actually doesn now that we have the global
gtt vs. ppgtt split.

Also move it to the other global gtt functions in i915_gem_gtt.c

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:14:59 +02:00
Daniel Vetter
d42c9e2c24 drm/i915: reinstate GM45 TV detection fix
This reverts commmit d4b74bf078 which
reverted the origin fix fb8b5a39b6.

We have at least 3 different bug reports that this fixes things and no
indication what is exactly wrong with this. So try again.

To make matters slightly more fun, the commit itself was cc: stable
whereas the revert has not been.

According to Peter Clifton he discussed this with Zhao Yakui and this
seems to be in contradiction of the GM45 PRM, but rumours have it that
this is how the BIOS does it ... let's see.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Tested-by: Peter Clifton <Peter.Clifton@clifton-electronics.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16236
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=25913
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=14792
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:12:28 +02:00
Jerome Glisse
88f50c8074 drm/radeon/kms: add htile support to the cs checker v3
For 6xx+.  Required for mesa to use htile support for HiZ/HiS.
Userspace will check radeon version 2.14 with is bumped either
by tiling patch or stream out patch. This patch only add support
for htile relocation which should be enough for any userspace
to implement the hyperz (using htile buffer) feature.

v2: Jerome: Fix size checking for htile buffer.
v3: Jerome: Adapt on top of r600/evergreen cs checker changes,
            also check htile surface in case only stencil is
            present.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-03-26 09:53:22 +01:00
Alex Deucher
017d213f64 drm/radeon/kms/atom: force bpc to 8 for now
Using the bpc (bits per color) specified by the monitor
can cause problems in some cases.  Until we get a better
handle on how to deal with those cases, just use a bpc of 8.

Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-03-26 09:53:12 +01:00
Ben Skeggs
5936567146 drm/nouveau/i2c: fix thinko/regression on really old chipsets
Fixes i2c on my TNT2.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-03-26 09:36:07 +01:00
Ben Skeggs
c8435362f2 drm/nouveau: default to 8bpc for non-LVDS panels if EDID isn't useful
A few reports of bad behaviour since the autodetection defaulted to 6bpc,
lets fix this.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-03-26 09:36:03 +01:00
Ben Skeggs
c61205b24b drm/nouveau: fix thinko causing init to fail on cards without accel
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-03-26 09:35:56 +01:00
Kirill A. Shutemov
a1978f74da gma500: medfield: fix build without CONFIG_BACKLIGHT_CLASS_DEVICE
drivers/built-in.o: In function `mdfld_dsi_connector_set_property':
mdfld_dsi_output.c:(.text+0x6e909): undefined reference to `mdfld_set_brightness'
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-03-26 09:33:24 +01:00