Commit Graph

402 Commits

Author SHA1 Message Date
Unknown W. Brackets
355bad666c softjit: Optimize common case bloom blending.
Bloom often uses fixed ONE + ONE, which is a lot less work for us.  And
bloom often runs over and over again on pixels, so saving work is good.
2022-01-02 08:47:04 -08:00
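
The saving described in the commit above can be illustrated with a minimal sketch. It assumes 8-bit channels and blend factors in the 0..255 range; the type and function names are hypothetical and are not taken from the softjit code. When both factors are the fixed value ONE, the multiplies drop out and only a saturating add remains.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical 8-bit-per-channel color, used only for this illustration.
struct Color8 {
    uint8_t r, g, b;
};

// General path: scale each side by its factor (0..255), add, then clamp.
inline uint8_t BlendChannel(uint8_t src, uint8_t srcFactor, uint8_t dst, uint8_t dstFactor) {
    int sum = (src * srcFactor + dst * dstFactor) / 255;
    return static_cast<uint8_t>(std::min(sum, 255));
}

// Fast path for fixed ONE + ONE: both factors are 255, so no multiply or divide.
inline uint8_t BlendChannelOneOne(uint8_t src, uint8_t dst) {
    return static_cast<uint8_t>(std::min(src + dst, 255));
}

// Picks the fast path when both factors are ONE, otherwise the general path.
inline Color8 BlendAdd(Color8 src, Color8 dst, uint8_t srcFactor, uint8_t dstFactor) {
    if (srcFactor == 255 && dstFactor == 255) {
        return { BlendChannelOneOne(src.r, dst.r),
                 BlendChannelOneOne(src.g, dst.g),
                 BlendChannelOneOne(src.b, dst.b) };
    }
    return { BlendChannel(src.r, srcFactor, dst.r, dstFactor),
             BlendChannel(src.g, srcFactor, dst.g, dstFactor),
             BlendChannel(src.b, srcFactor, dst.b, dstFactor) };
}
```

In this sketch both paths give the same result for ONE + ONE, since (src * 255 + dst * 255) / 255 equals src + dst exactly. The real hardware rounding may differ.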
Unknown W. Brackets
496545e55c softgpu: Add code for tracking GPU writes.
Unfortunately, it has a pretty noticeable speed impact, even at the basic
"assume everything's written" level.  Compiled off by default, but at
least it's there.

Doesn't account for tests (i.e. alpha test skipping write) so still not
perfectly accurate.
2022-01-02 08:28:30 -08:00
Henrik Rydgård
cb1f26122d Merge pull request #15269 from unknownbrackets/softgpu-opt
softgpu: Reduce interpolation if not needed
2022-01-02 09:47:19 +01:00
Henrik Rydgård
da38c027b5 Merge pull request #15268 from unknownbrackets/samplerjit-nearest
Implement nearest in samplerjit, like linear
2022-01-02 09:46:29 +01:00
Unknown W. Brackets
025ac99f2f softgpu: Reduce interpolation if not needed.
About 3% gain in some areas.
2022-01-01 18:34:04 -08:00
Unknown W. Brackets
40240be91c samplerjit: Update nearest args, temp disable jit.
This temporarily disables jit for nearest, but refactors to use the new
arg structure.  It now matches linear.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
06e954fe2a samplerjit: Create a separate fetch func.
This allows nearest to become more similar to linear, where it applies the
texture function.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
d41e42d247 softgpu: Correct off-by-one scissor mask.
Fixes Brave Story in the software renderer.  Was overwriting display list
data in the stride gap.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
b35ca3d472 softgpu: Cleanup min/max tri range handling.
The previous code looked like it had off-by-one errors.  This is simpler.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
12405709f0 softgpu: Skip processing scissored triangles.
If only one side was scissored (common), we might even put it on a thread,
which ended up as a lot of overhead.  Gives 3-4% improvement in some
places.
2022-01-01 16:40:34 -08:00
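
One way to reject scissored triangles early, as the commit above describes, is a bounding-box test against the scissor rectangle. The sketch below is an assumption about the general technique, not the actual softgpu code; the types, the inclusive rectangle convention, and the function name are hypothetical.

```cpp
#include <algorithm>

// Hypothetical integer screen-space types for this illustration.
struct Vec2i {
    int x, y;
};

// Scissor rectangle with inclusive bounds (an assumed convention).
struct Rect {
    int x1, y1, x2, y2;
};

// Returns false when the triangle's bounding box lies entirely outside the
// scissor rectangle, so the triangle can be skipped before any further
// setup or threading. The test is conservative: a true result only means
// the bounding box overlaps, not that any pixel will be drawn.
inline bool TriangleTouchesScissor(const Vec2i &a, const Vec2i &b, const Vec2i &c, const Rect &scissor) {
    int minX = std::min({a.x, b.x, c.x});
    int maxX = std::max({a.x, b.x, c.x});
    int minY = std::min({a.y, b.y, c.y});
    int maxY = std::max({a.y, b.y, c.y});
    return maxX >= scissor.x1 && minX <= scissor.x2 &&
           maxY >= scissor.y1 && minY <= scissor.y2;
}
```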
Unknown W. Brackets
33e9841a4a softgpu: Skip zero size triangles.
These were drawing before, incorrectly, which caused artifacts.
Noticeable in Blade Dancer.
2021-12-31 00:20:12 -08:00
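
The commit above does not say how a zero size triangle is detected. A common approach, shown here purely as an assumption, is to compute twice the signed area from a 2D cross product and skip the triangle when it is zero. The names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical integer screen-space vertex for this illustration.
struct Vec2i {
    int x, y;
};

// Twice the signed area of triangle (a, b, c), via the 2D cross product of
// the edges a->b and a->c. Widened to 64 bits to avoid overflow.
constexpr int64_t DoubledArea(Vec2i a, Vec2i b, Vec2i c) {
    return static_cast<int64_t>(b.x - a.x) * (c.y - a.y) -
           static_cast<int64_t>(b.y - a.y) * (c.x - a.x);
}

// A triangle with zero area (collinear or coincident vertices) covers no
// pixels and can be skipped.
constexpr bool IsZeroSizeTriangle(Vec2i a, Vec2i b, Vec2i c) {
    return DoubledArea(a, b, c) == 0;
}
```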
Unknown W. Brackets
4bd94a4e5e samplerjit: Pass funcs as an argument.
Computing the ID shows up in some profiles, so avoid computing it per
thread/invocation.
2021-12-29 07:11:53 -08:00
Unknown W. Brackets
74eb450e76 samplerjit: Move texture function into jit.
Could do this also for nearest, might end up with a third set of functions
there for a direct sample lookup (for debug funcs.)
2021-12-28 17:52:17 -08:00
Unknown W. Brackets
940e6bb1d7 samplerjit: Lookup both mip tex values.
2021-12-28 16:22:54 -08:00
Unknown W. Brackets
6b55d328e5 samplerjit: Use regcache for linear filtering.
This makes it easier to reuse for mipmap filtering.
2021-12-28 15:37:25 -08:00
Unknown W. Brackets
a4558a5736 samplerjit: Take texptr/bufw as arrays.
Prep for moving mip map sampling into linear.
2021-12-28 12:04:16 -08:00
Unknown W. Brackets
a84accf713 samplerjit: Move S/T calculation into jit.
Gives a pretty decent 5-10% improvement in many places.
2021-12-28 09:58:23 -08:00
Unknown W. Brackets
9cc0883d53 softgpu: Correct non-SSE T clamp.
2021-12-27 15:31:37 -08:00
Unknown W. Brackets
39d5b1c221 softgpu: Reduce mipmap fraction to 4 bits.
For CONST (and SLOPE with flat w), this produces accurate values.
SLOPE is still wrong in its handling of w, and AUTO seems to calculate
using a different and less accurate ramp.  But they both produce values
with 16 steps, in any case.
2021-12-27 11:37:33 -08:00
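
The "16 steps" mentioned in the commit above correspond to a 4-bit blend weight between two mip levels. The sketch below shows what such a quantized blend could look like. The 8-bit input fraction, the truncating shift, and the function names are all assumptions for illustration, not the softgpu implementation.

```cpp
#include <cstdint>

// Reduce a hypothetical 8-bit mip fraction (0..255) to 4 bits (0..15),
// giving 16 distinct blend steps.
constexpr int MipFrac4(int frac8) {
    return (frac8 >> 4) & 0xF;
}

// Blend one channel from two adjacent mip levels using the 4-bit fraction.
// A weight of 0 selects level0 exactly; 15 is the closest step to level1.
constexpr uint8_t BlendMip(uint8_t level0, uint8_t level1, int frac4) {
    return static_cast<uint8_t>((level0 * (16 - frac4) + level1 * frac4) >> 4);
}
```

For example, under these assumptions a fraction of 0x80 becomes weight 8, which blends the two levels evenly.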
Unknown W. Brackets
d6b6ef4cb1 softgpu: Correct nearest filtering too.
Turns out to have the same behavior as linear, when it comes to the
subpixel offset.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
1dfaea9062 softgpu: Remove no longer possible report.
Also, it's known how this behaves, now.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
75f105f84b softgpu: Make linear filtering more accurate.
This matches tests for various u/v offsets and x/y subpixel offsets.
Mipmaps are probably still wrong.
2021-12-27 11:37:32 -08:00
Unknown W. Brackets
b00a66e34c samplerjit: Pass u/v coords as vector.
2021-12-27 11:37:32 -08:00
Unknown W. Brackets
3180e6c043 softgpu: Correct alpha on add + invalid texfuncs.
2021-12-05 16:28:37 -08:00
Unknown W. Brackets
325a1f75aa softgpu: Match texenv blend texfunc accurately.
2021-12-05 16:09:26 -08:00