Updated documentation

Sauraen
2024-11-17 22:30:52 -08:00
parent 6182a8540b
commit a10d3dbe06
9 changed files with 76 additions and 76 deletions

@@ -7,6 +7,11 @@
F3DEX3 is backwards compatible with F3DEX2 at the C GBI level for all features
and commands except:
- The viewport Y scale has been negated, and `G_MAXZ` has been renamed as its
value has changed. See the comment near `G_MAXZ` in the GBI.
- For the same reason, in `BrZ` configuration, any Z threshold values in
`SPBranchLessZ*` which are hard-coded into display lists (not based on
`G_MAXZ`) must be multiplied by 0x20.
- The `G_SPECIAL_*` command IDs have been removed. `G_SPECIAL_2` and
`G_SPECIAL_3` were no-ops in F3DEX2, and `G_SPECIAL_1` was a trigger to
recalculate the MVP matrix. There is no MVP matrix in F3DEX3 so this is

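The `BrZ` note above says hard-coded Z thresholds must be multiplied by 0x20. A minimal sketch of that scaling follows. The macro and function names are hypothetical helpers for illustration, not part of the GBI, and the threshold is assumed to fit in an unsigned 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the GBI: the factor named in the note
 * above for Z thresholds that were hard-coded for F3DEX2. */
#define EXAMPLE_BRZ_Z_SCALE 0x20u

/* Convert a Z threshold hard-coded for F3DEX2 into the value to pass to
 * SPBranchLessZ* under F3DEX3 in the BrZ configuration. */
static inline uint32_t example_scale_z_threshold(uint32_t f3dex2_threshold) {
    /* Guard against overflow of the 32-bit result. */
    assert(f3dex2_threshold <= UINT32_MAX / EXAMPLE_BRZ_Z_SCALE);
    return f3dex2_threshold * EXAMPLE_BRZ_Z_SCALE;
}
```

Thresholds that are already derived from `G_MAXZ` should not need this helper, since the note only covers values hard-coded into display lists.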
@@ -1,4 +1,4 @@
@page microcode Microcode Configuration
@page configuration Microcode Configuration
# Microcode Configuration
@@ -35,30 +35,29 @@ and otherwise use the base version.
The primary tradeoff for all the new lighting features in F3DEX3 is increased
RSP time for vertex processing. The base version of F3DEX3 takes about
**2-2.5x** more RSP time for vertex processing than F3DEX2 (see Performance
Results section below), assuming no lighting or directional lights only.
However, under most circumstances, this does not affect the game's overall
framerate:
- This only applies to vertex processing, not triangle processing or other
miscellaneous microcode tasks. So the total RSP cycles spent doing useful work
during the frame is only modestly increased.
Results section below), assuming no lighting or directional lights only. You
should use the F3DEX3 performance counters (see below) to determine whether your
game is usually RSP or RDP bound.
If your game is usually RDP bound--like OoT--this generally will not affect the
game's overall framerate, so you should stick with base F3DEX3:
- The increased time only applies to vertex processing, not triangle processing
or other miscellaneous microcode tasks. So the total RSP cycles spent doing
useful work during the frame is only modestly increased.
- The increase in time is only RSP cycles; there is no additional memory
traffic, so the RDP time is not directly affected.
- In scenes which are complex enough to fill the RSP->RDP FIFO in DRAM, the RSP
usually spends a significant fraction of time waiting for the FIFO to not be
full (as revealed by the F3DEX3 performance counters, see below). In these
cases, slower vertex processing simply means less time spent waiting, and
little to no change in total RSP time.
full, as revealed by the performance counters. In these cases, slower vertex
processing simply means less time spent waiting, and little to no change in
total RSP time.
- When the FIFO does not fill up, usually the RSP takes significantly less time
during the frame compared to the RDP, so increased RSP time usually does not
affect the overall framerate.
As a result, you should always start with the base version of F3DEX3 in your
romhack, and if the RSP never becomes the bottleneck, you can stick with that.
However, if you have done extreme optimizations in your game to reduce RDP time
(i.e. if you are Kaze Emanuar), it's possible for the RSP to sometimes become
the bottleneck with F3DEX3's advanced vertex processing. As a result, the Legacy
Vertex Pipeline (LVP) configuration has been introduced.
However, for RSP bound or extremely optimized (Kaze Emanuar) games, base F3DEX3
can become a bottleneck, so the Legacy Vertex Pipeline (LVP) configuration has
been introduced.
This configuration replaces F3DEX3's native vertex and lighting code with a
faster version based on the same algorithms as F3DEX2. This removes:
@@ -70,12 +69,11 @@ faster version based on the same algorithms as F3DEX2. This removes:
However, it retains all other F3DEX3 features:
- 56 verts, 9 directional lights
- Occlusion plane (optional with NOC configuration)
- Z attribute offsets
- All features not related to vertex/lighting: auto-batched rendering, packed 5
triangles commands, hints system, etc.
The performance of F3DEX3 vertex processing with both LVP and NOC is nearly
identical to that of F3DEX2; see the Performance page.
With both LVP and NOC enabled, F3DEX3 is faster on the RSP than F3DEX2 (see
@ref performance).
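The guidance above (stay on base F3DEX3 when usually RDP bound, consider LVP when RSP bound) can be sketched as a simple decision over per-frame timings. Everything here is an assumption for illustration: the `ExampleFrameTiming` struct, its field names, and the "more than half of the frames" rule are hypothetical and do not reflect the actual layout of the F3DEX3 performance counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame timings gathered by the game's profiler.
 * Both fields are assumed to use the same time unit. */
typedef struct {
    uint32_t rsp_time; /* time the RSP was busy during the frame */
    uint32_t rdp_time; /* time the RDP was busy during the frame */
} ExampleFrameTiming;

typedef enum {
    EXAMPLE_UCODE_BASE, /* base F3DEX3 */
    EXAMPLE_UCODE_LVP   /* Legacy Vertex Pipeline configuration */
} ExampleUcodeChoice;

/* A frame counts as RSP bound when the RSP took longer than the RDP. */
static bool example_frame_is_rsp_bound(const ExampleFrameTiming *t) {
    return t->rsp_time > t->rdp_time;
}

/* Suggest LVP only if more than half of the sampled frames are RSP bound. */
static ExampleUcodeChoice example_choose_ucode(const ExampleFrameTiming *frames,
                                               size_t count) {
    size_t rsp_bound = 0;
    assert(frames != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        if (example_frame_is_rsp_bound(&frames[i])) {
            rsp_bound++;
        }
    }
    return (rsp_bound * 2 > count) ? EXAMPLE_UCODE_LVP : EXAMPLE_UCODE_BASE;
}
```

In practice the decision is a judgment call made from profiling representative scenes; the majority rule is only one way to put "usually RSP bound" into numbers.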
## Profiling
@@ -107,12 +105,6 @@ SM64), or `BrW` if the microcode is replacing F3DZEX (i.e. OoT or MM). This
controls whether `SPBranchLessZ*` uses the vertex's W coordinate or screen Z
coordinate.
## Extra Precision (`XP`)
This configuration attempts to reproduce F3DEX(1) numerical behavior for Z
buffer coefficients, potentially improving Z fighting in some cases of decals or
opaque surfaces intended to behave like decals.
## Debug Normals (`dbgN`)
Debug Normals has been moved out of the Makefile as it is not a microcode

@@ -82,12 +82,6 @@ It is recommended to use `G_NORMALS_MODE_FAST` (the default) for most things,
and use `G_NORMALS_MODE_AUTO` only for objects while they currently have a
nonuniform scale (e.g. Mario only while he is squashed).
## Optimizing for RSP code size
A number of optimizations in F3DEX2 which saved a few cycles but took several
more instructions have been removed. Outside of vertex processing, these have a
very small impact on overall RSP time and no impact on RDP time.
## Far clipping removal
Far clipping is completely removed in F3DEX3. Far clipping is not intentionally
@@ -165,12 +159,11 @@ segment 0 must always be 0x00000000 so that this address resolves to e.g.
In F3DEX2, the RSP time for drawing non-textured tris was significantly lower
than for textured tris, by skipping a chunk of computation for the texture
coefficients if they were disabled. In F3DEX3, little to no computation is
skipped when textures are disabled, which means that the performance gain from
disabling textures in F3DEX2 has been mostly eliminated. (RDP time savings from
avoiding loading a texture are unaffected of course.) However, almost all
materials use textures, and F3DEX3 is a little faster at drawing textured tris
than F3DEX2, so this is still a benefit overall.
coefficients if they were disabled. In F3DEX3, no computation is skipped when
textures are disabled. However, almost all materials use textures, and F3DEX3 is
a little faster at drawing textured tris than F3DEX2. Plus, the DRAM access time
for the RSP -> FIFO and FIFO -> RDP transfers is still saved by not sending the
coefficients, and the RDP time savings from avoiding loading a texture are
unaffected of course.
## Obscure semantic differences from F3DEX2 that should never matter in practice

@@ -7,17 +7,19 @@ visual effects are desired and increasing the RSP time a bit does not affect the
overall performance. If your game is RSP bound, using the base version of F3DEX3
will make it slower.
Conversely, F3DEX3_LVP_NOC matches or beats the RSP performance of F3DEX2 on all
critical paths in the microcode, including command dispatch, vertex processing,
and triangle processing. Then, the RDP and memory traffic performance
improvements of F3DEX3--56 vertex buffer, auto-batched rendering, etc.--should
further improve performance from there. This means that switching from F3DEX2 to
F3DEX3_LVP_NOC should always improve performance regardless of whether your game
is RSP bound or RDP bound.
Conversely, F3DEX3_LVP_NOC matches or beats the RSP performance of F3DEX2 on
**all** critical paths in the microcode, including command dispatch, vertex
processing, and triangle processing. Then, the RDP and memory traffic
performance improvements of F3DEX3--56 vertex buffer, auto-batched rendering,
etc.--should further improve performance from there. This means that switching
from F3DEX2 to F3DEX3_LVP_NOC should always improve performance regardless of
whether your game is RSP bound or RDP bound.
# Performance Results
## Cycle Counts
These are cycle counts for many key paths in the microcode. Lower numbers are
better. The timings are hand-counted taking into account all pipeline stalls and
all dual-issue conditions. Instruction alignment after branches is sometimes
@@ -72,6 +74,7 @@ Tri numbers are measured from the first cycle of the command handler inclusive,
to the first cycle of whatever is after $ra exclusive. This is in order
to capture the extra latency and stalls in F3DEX2.
## Measurements
Vertex processing time as reported by the performance counter in the `PA`
configuration.

@@ -37,13 +37,15 @@ similar for other games):
Both OoT and SM64:
- Remove uses of internal GBI features which have been removed in F3DEX3 (see @ref compatibility for full list). In OoT, the only changes
needed are:
- Remove uses of internal GBI features which have been removed in F3DEX3 (see
@ref compatibility for full list). In OoT, the only changes needed are:
- In `src/code/ucode_disas.c`, remove the switch statement cases for
`G_LINE3D`, `G_MW_CLIP`, `G_MV_MATRIX`, `G_MVO_LOOKATX`, `G_MVO_LOOKATY`,
and `G_MW_PERSPNORM`.
- In `src/libultra/gu/lookathil.c`, remove the lines which set the `col`,
`colc`, and `pad` fields.
- In each place `G_MAXZ` is used, a compiler error will be generated;
negate the Y scale in each related viewport and change to `G_NEW_MAXZ`.
- Change your game engine lighting code to set the `type` (formerly `pad1`)
field to 0 in the initialization of any directional light (`Light_t` and
derived structs like `Light` or `Lightsn`). F3DEX3 ignores the state of the

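The lighting change above (set `type`, formerly `pad1`, to 0 when initializing a directional light) might look like the sketch below. `ExampleLight` is a stand-in struct defined only for this example. Its layout is an assumption and is not copied from the F3DEX3 GBI, so real code should use `Light_t` from the GBI header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the GBI's Light_t; the field layout here is an assumption. */
typedef struct {
    unsigned char col[3];  /* light color */
    unsigned char type;    /* formerly pad1; must be 0 for directional lights */
    unsigned char colc[3]; /* copy of the light color */
    char pad2;
    signed char dir[3];    /* light direction */
    char pad3;
} ExampleLight;

/* Initialize a directional light so that no stale data is left in `type`. */
static void example_init_directional_light(ExampleLight *light,
                                           unsigned char r, unsigned char g,
                                           unsigned char b, signed char x,
                                           signed char y, signed char z) {
    assert(light != NULL);
    memset(light, 0, sizeof(*light)); /* clears type and the padding */
    light->col[0] = light->colc[0] = r;
    light->col[1] = light->colc[1] = g;
    light->col[2] = light->colc[2] = b;
    light->dir[0] = x;
    light->dir[1] = y;
    light->dir[2] = z;
    light->type = 0; /* explicit, in case the memset is ever removed */
}
```

Zeroing the whole struct first is a defensive choice, so any light built from reused or uninitialized memory still ends up with `type` set to 0.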
@@ -1,6 +1,10 @@
@page minimal-scanlines What happened to the clipping minimal scanlines algorithm?
@page removed Removed Features
# What happened to the clipping minimal scanlines algorithm?
# Removed Features
These features were present in earlier F3DEX3 versions, but have been removed.
## Clipping minimal scanlines algorithm
Earlier F3DEX3 versions included a modified algorithm for triangulating the
polygon which was formed as the result of clipping. This algorithm broke up the
@@ -57,3 +61,11 @@ The best we can do, which is what all previous F3D family microcodes did and
F3DEX3 does now, is to triangulate in a consistent way, based on the winding
of the input triangles. The results are still wrong, but they're wrong the same
way every frame, so there are no abrupt changes visible.
## Z attribute offsets
Earlier F3DEX3 versions included attribute offsets for vertex Z as well as ST.
By setting this to -2 and drawing an opaque tri, the tri would appear like a
decal, but with no Z-fighting. This has been removed and replaced with the decal
fix, which is automatic and does not require any special setup in the display
list.

@@ -1,7 +1,7 @@
# Documentation
- @subpage compatibility
- @subpage microcode
- @subpage configuration
- @subpage design-tradeoffs
- @subpage minimal-scanlines
- @subpage removed
- @subpage performance
- @subpage porting