1212 Commits

Author SHA1 Message Date
Unknown W. Brackets
079b67e7ed softgpu: Use common SIMD matrix multiplies. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
cba2374abd softgpu: Separate calculation of S/T.
We could probably reuse, but we're not right now and it complicates the
logic.
2022-01-06 21:19:47 -08:00
Henrik Rydgård
683289402c Merge pull request #15279 from unknownbrackets/samplerjit-fastpath
softgpu: Correct mirroring in fastpath+nearest
2022-01-05 09:43:20 +01:00
Henrik Rydgård
f82f24a9bb Merge pull request #15280 from unknownbrackets/samplerjit-dxt
Correct some recent regressions in samplerjit
2022-01-05 09:42:30 +01:00
Unknown W. Brackets
0993771104 samplerjit: Fix standard bufw check.
Oops, bufw could be intentionally higher while w is 16 bytes.
2022-01-05 00:11:34 -08:00
Unknown W. Brackets
741a9b0a4d samplerjit: Fix DXT compilation. 2022-01-05 00:00:03 -08:00
Unknown W. Brackets
19998976c7 samplerjit: Correct linear compile failure.
It was resetting to nullptr, because `nearest` was nullptr.
2022-01-04 23:58:07 -08:00
Unknown W. Brackets
e2f8cf8bf2 softgpu: Correct mirroring in fastpath+nearest. 2022-01-04 23:42:31 -08:00
Unknown W. Brackets
d98e5bfc97 softgpu: Improve usage of SSE for lighting.
Gives about a 2% improvement in many places.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
2aa57679fa softjit: Keep mip S/T calc in SIMD.
This is only a tiny bit faster, though.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
a309ed791b softjit: Use RIP access in color/depth off.
Seems to help, though it's small.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
612cc0ab5c softjit: Optimize depth range checks.
This was higher than I expected on the profile.  Not a huge improvement,
but a bit faster.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
961cfcd75c softjit: Add describes here too.
Helpful to aggregate when there are multiple rasterizers.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
26e7768a67 samplerjit: Remove old linear nearest paths.
We only use it for DXT now, so let's not keep the dead code around.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
5e3bef7e14 samplerjit: Avoid gather if overread could crash.
This should be rare, but a game could easily shove a CLUT4 texture at the
end of VRAM, and then accessing the last index would segfault.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
7806dfddea samplerjit: Use VPGATHERDD for all types. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
ce6ea8da11 samplerjit: Apply gather lookup to all CLUT4. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
22f770c828 samplerjit: Use VPGATHERDD for simple CLUT4 loads.
Planning to expand this to more paths.
2022-01-02 17:19:17 -08:00
Unknown W. Brackets
65c84d5dd5 samplerjit: Avoid a couple more copies in AVX.
From looking at assembly, just trying to keep it small.
2022-01-02 17:01:14 -08:00
Unknown W. Brackets
7594187538 softgpu: Skip sample lookup if masked.
Was hoping making other things faster would make this unnecessary or
worse, but it hasn't seemed to.  This gives a pretty decent improvement in
most places (~4%).
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
a0fe4d06bf softgpu: Stop specializing on miplevels.
Now that samplerjit is processing mips, it no longer helps.  Just
complexity now.
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
e4673a5fa4 softgpu: Separately profile verts and lighting. 2022-01-02 13:46:11 -08:00
Henrik Rydgård
d3f0af7458 Merge pull request #15273 from unknownbrackets/softjit-bloom
Optimize software renderer handling of common bloom operations
2022-01-02 18:11:07 +01:00
Henrik Rydgård
c07ca2d89d Merge pull request #15272 from unknownbrackets/softgpu-meminfo
softgpu: Add code for tracking GPU writes
2022-01-02 18:09:16 +01:00
Henrik Rydgård
c7062d7063 Merge pull request #15271 from unknownbrackets/samplerjit-color16
samplerjit: Decode colors in parallel
2022-01-02 17:55:46 +01:00