@page performance Performance Results

# Philosophy

The base version of F3DEX3 was created for RDP-bound games like OoT, where new
visual effects are desired and a modest increase in RSP time does not affect
overall performance. If your game is RSP-bound, using the base version of F3DEX3
will make it slower.

Conversely, F3DEX3_LVP_NOC matches or beats the RSP performance of F3DEX2 on
**all** critical paths in the microcode, including command dispatch, vertex
processing, and triangle processing. On top of that, the RDP and memory traffic
improvements of F3DEX3 (56-vertex buffer, auto-batched rendering, etc.) should
improve performance further. This means that switching from F3DEX2 to
F3DEX3_LVP_NOC should always improve performance, regardless of whether your
game is RSP-bound or RDP-bound.
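
The "matches or beats" claim can be spot-checked against the hand-counted
numbers in the Cycle Counts section of this page. The Python sketch below is
illustrative only: the values are copied from that table, rows that F3DEX2
cannot execute are left out, and the names `ROWS` and `compare` are made up for
this example.

```python
# (path, F3DEX2 cycles, F3DEX3_LVP_NOC cycles), copied from the Cycle Counts
# table on this page. Rows marked "Can't" for F3DEX2 are omitted.
ROWS = [
    ("Command dispatch", 12, 12),
    ("Small RDP command", 14, 5),
    ("Vtx before DMA start", 16, 17),
    ("Vtx pair, no lighting", 54, 54),
    ("Vtx pair, 1 dir lt", 73, 70),
    ("Vtx pair, 2 dir lts", 76, 77),
    ("Vtx pair, 3 dir lts", 88, 84),
    ("Vtx pair, 4 dir lts", 91, 91),
    ("Vtx pair, 5 dir lts", 103, 98),
    ("Vtx pair, 6 dir lts", 106, 105),
    ("Vtx pair, 7 dir lts", 118, 112),
    ("Light dir xfrm, 1 dir lt", 141, 95),
    ("Light dir xfrm, 2 dir lts", 180, 96),
    ("Light dir xfrm, 3 dir lts", 219, 121),
    ("Light dir xfrm, 4 dir lts", 258, 122),
    ("Light dir xfrm, 5 dir lts", 297, 147),
    ("Light dir xfrm, 6 dir lts", 336, 148),
    ("Light dir xfrm, 7 dir lts", 375, 173),
    ("Only/2nd tri to offscreen", 27, 26),
    ("1st tri to offscreen", 28, 27),
    ("Only/2nd tri to clip", 32, 31),
    ("1st tri to clip", 33, 32),
    ("Only/2nd tri to backface", 38, 38),
    ("1st tri to backface", 39, 39),
    ("Only/2nd tri to degenerate", 42, 40),
    ("1st tri to degenerate", 43, 41),
    ("Only/2nd tri to draw", 172, 160),
    ("1st tri to draw", 173, 160),
]


def compare(rows):
    """Group paths by whether F3DEX3_LVP_NOC is faster, equal, or slower."""
    result = {"faster": [], "equal": [], "slower": []}
    for name, ex2, lvp_noc in rows:
        delta = lvp_noc - ex2  # negative means F3DEX3_LVP_NOC saves cycles
        if delta < 0:
            result["faster"].append((name, delta))
        elif delta == 0:
            result["equal"].append((name, delta))
        else:
            result["slower"].append((name, delta))
    return result


summary = compare(ROWS)
for group, items in summary.items():
    print(f"{group}: {len(items)} paths")
    for name, delta in items:
        print(f"  {name}: {delta:+d} cycles")
```

With the table values as listed, this reports 21 paths faster, 5 equal, and 2
paths (`Vtx before DMA start` and `Vtx pair, 2 dir lts`) one cycle slower than
F3DEX2.
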

# Performance Results

## Cycle Counts

These are cycle counts for many key paths in the microcode. Lower numbers are
better. The timings are hand-counted, taking into account all pipeline stalls
and all dual-issue conditions. Instruction alignment after branches is sometimes
taken into account and is otherwise assumed to be optimal.

Vertex / lighting numbers assume no special features (texgen, packed normals,
etc.). Tri numbers assume texture, shade, and Z, and not flushing the buffer.
All numbers assume the default profiling configuration. Empty cells have not
been measured yet.

|                            | F3DEX2 | F3DEX3_LVP_NOC | F3DEX3_LVP | F3DEX3_NOC | F3DEX3 |
|----------------------------|--------|----------------|------------|------------|--------|
| Command dispatch           | 12     | 12             | 12         | 12         | 12     |
| Small RDP command          | 14     | 5              | 5          | 5          | 5      |
| Vtx before DMA start       | 16     | 17             | 17         | 17         | 17     |
| Vtx pair, no lighting      | 54     | 54             | 81         | 79         | 98     |
| Vtx pair, 0 dir lts        | Can't  | 64             |            |            |        |
| Vtx pair, 1 dir lt         | 73     | 70             | 96         | 182        | 201    |
| Vtx pair, 2 dir lts        | 76     | 77             | 103        | 211        | 230    |
| Vtx pair, 3 dir lts        | 88     | 84             | 110        | 240        | 259    |
| Vtx pair, 4 dir lts        | 91     | 91             | 117        | 269        | 288    |
| Vtx pair, 5 dir lts        | 103    | 98             | 124        | 298        | 317    |
| Vtx pair, 6 dir lts        | 106    | 105            | 131        | 327        | 346    |
| Vtx pair, 7 dir lts        | 118    | 112            | 138        | 356        | 375    |
| Vtx pair, 8 dir lts        | Can't  | 119            | 145        | 385        | 404    |
| Vtx pair, 9 dir lts        | Can't  | 126            | 152        | 414        | 433    |
| Light dir xfrm, 0 dir lts  | Can't  | 95             | 95         | None       | None   |
| Light dir xfrm, 1 dir lt   | 141    | 95             | 95         | None       | None   |
| Light dir xfrm, 2 dir lts  | 180    | 96             | 96         | None       | None   |
| Light dir xfrm, 3 dir lts  | 219    | 121            | 121        | None       | None   |
| Light dir xfrm, 4 dir lts  | 258    | 122            | 122        | None       | None   |
| Light dir xfrm, 5 dir lts  | 297    | 147            | 147        | None       | None   |
| Light dir xfrm, 6 dir lts  | 336    | 148            | 148        | None       | None   |
| Light dir xfrm, 7 dir lts  | 375    | 173            | 173        | None       | None   |
| Light dir xfrm, 8 dir lts  | Can't  | 174            | 174        | None       | None   |
| Light dir xfrm, 9 dir lts  | Can't  | 199            | 199        | None       | None   |
| Only/2nd tri to offscreen  | 27     | 26             | 26         | 26         | 26     |
| 1st tri to offscreen       | 28     | 27             | 27         | 27         | 27     |
| Only/2nd tri to clip       | 32     | 31             | 31         | 31         | 31     |
| 1st tri to clip            | 33     | 32             | 32         | 32         | 32     |
| Only/2nd tri to backface   | 38     | 38             | 38         | 38         | 38     |
| 1st tri to backface        | 39     | 39             | 39         | 39         | 39     |
| Only/2nd tri to degenerate | 42     | 40             | 40         | 40         | 40     |
| 1st tri to degenerate      | 43     | 41             | 41         | 41         | 41     |
| Only/2nd tri to occluded   | Can't  | Can't          | 49         | Can't      | 49     |
| 1st tri to occluded        | Can't  | Can't          | 50         | Can't      | 50     |
| Only/2nd tri to draw       | 172    | 160            | 163        | 160        | 163    |
| 1st tri to draw            | 173    | 160            | 163        | 160        | 163    |

Tri numbers are measured from the first cycle of the command handler
(inclusive) to the first cycle of whatever is after `$ra` (exclusive). This is
done in order to capture the extra latency and stalls in F3DEX2.
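
The "Vtx pair" rows in the table above can be read as a base cost plus a
per-light increment. The Python sketch below derives that increment by
differencing consecutive rows. It only re-uses numbers from the table; the
names `VTX_PAIR` and `per_light_deltas` are hypothetical and not part of
F3DEX3.

```python
# Cycles per vertex pair, keyed by number of directional lights. Values are
# copied from the "Vtx pair, N dir lts" rows of the Cycle Counts table; cells
# that are "Can't" or not measured are omitted.
VTX_PAIR = {
    "F3DEX2": {1: 73, 2: 76, 3: 88, 4: 91, 5: 103, 6: 106, 7: 118},
    "F3DEX3_LVP_NOC": {0: 64, 1: 70, 2: 77, 3: 84, 4: 91, 5: 98,
                       6: 105, 7: 112, 8: 119, 9: 126},
    "F3DEX3_LVP": {1: 96, 2: 103, 3: 110, 4: 117, 5: 124, 6: 131,
                   7: 138, 8: 145, 9: 152},
    "F3DEX3_NOC": {1: 182, 2: 211, 3: 240, 4: 269, 5: 298, 6: 327,
                   7: 356, 8: 385, 9: 414},
    "F3DEX3": {1: 201, 2: 230, 3: 259, 4: 288, 5: 317, 6: 346,
               7: 375, 8: 404, 9: 433},
}


def per_light_deltas(counts):
    """Return the extra cycles added by each additional directional light."""
    lights = sorted(counts)
    return [counts[b] - counts[a] for a, b in zip(lights, lights[1:])]


for ucode, counts in VTX_PAIR.items():
    print(f"{ucode}: {per_light_deltas(counts)}")
```

Read this way, each extra directional light costs 7 cycles per vertex pair in
the two LVP variants (6 for the first light in F3DEX3_LVP_NOC) and 29 cycles in
the two non-LVP variants, while F3DEX2 alternates between 3 and 12 cycles.
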

## Measurements

Vertex processing time, as reported by the performance counter in the `PA`
configuration:

- Scene 1: Kakariko, adult day, from DMT entrance
- Scene 2: Custom empty scene with Suzanne monkey head with 1 dir light
- Scene 3: Same as Scene 2, but Suzanne has vertex colors instead of lighting
  (Link is still on screen and has lighting)

| Microcode      | Scene 1 | Scene 2 | Scene 3 |
|----------------|---------|---------|---------|
| F3DEX3         | 7.41 ms | 2.99 ms | 2.22 ms |
| F3DEX3_NOC     | 6.85 ms | 2.75 ms | 1.98 ms |
| F3DEX3_LVP     | 4.12 ms | 1.59 ms | 1.48 ms |
| F3DEX3_LVP_NOC | 3.34 ms | 1.27 ms | 1.16 ms |
| F3DEX2         | Can't*  | Can't*  | Can't*  |
| Vertex count   | 3557    | 1548    | 1548    |

*F3DEX2 does not contain performance counters, so the portion of the RSP time
taken for vertex processing cannot be measured.
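
To compare scenes with different vertex counts, the measured times can be
normalized per vertex. The Python sketch below does this with the numbers from
the table above. The conversion to RSP cycles assumes a 62.5 MHz RSP clock,
which is not stated on this page, and all function and variable names are
hypothetical.

```python
# Measured vertex processing time in milliseconds for scenes 1, 2, 3,
# copied from the Measurements table.
VERTEX_TIME_MS = {
    "F3DEX3": [7.41, 2.99, 2.22],
    "F3DEX3_NOC": [6.85, 2.75, 1.98],
    "F3DEX3_LVP": [4.12, 1.59, 1.48],
    "F3DEX3_LVP_NOC": [3.34, 1.27, 1.16],
}
VERTEX_COUNT = [3557, 1548, 1548]

# Assumption: the RSP runs at 62.5 MHz. Adjust if your setup differs.
RSP_CLOCK_HZ = 62_500_000


def us_per_vertex(ucode, scene):
    """Average microseconds of vertex processing per vertex (scene is 0-based)."""
    return VERTEX_TIME_MS[ucode][scene] * 1000.0 / VERTEX_COUNT[scene]


def cycles_per_vertex_pair(ucode, scene):
    """Average RSP cycles per pair of vertices, under the clock assumption."""
    return us_per_vertex(ucode, scene) * 1e-6 * RSP_CLOCK_HZ * 2


def relative_time(ucode, baseline, scene):
    """Vertex time of `ucode` as a fraction of `baseline` in the same scene."""
    return VERTEX_TIME_MS[ucode][scene] / VERTEX_TIME_MS[baseline][scene]


for ucode in VERTEX_TIME_MS:
    for scene in range(3):
        print(
            f"{ucode:15s} scene {scene + 1}: "
            f"{us_per_vertex(ucode, scene):.3f} us/vtx, "
            f"{cycles_per_vertex_pair(ucode, scene):.1f} cycles/pair, "
            f"{relative_time(ucode, 'F3DEX3', scene):.0%} of F3DEX3"
        )
```

By this measure, F3DEX3_LVP_NOC spends roughly 42% to 52% of the vertex time of
base F3DEX3 across the three scenes. The per-pair cycle figures are averages
over whole scenes, including command overhead and a mix of lit and unlit
geometry, so they are not expected to match the hand-counted "Vtx pair" rows
exactly.
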