Lots more documentation

Sauraen
2025-07-13 16:37:22 -07:00
parent 35f7faf653
commit 4c3af75485
9 changed files with 218 additions and 323 deletions


@@ -29,8 +29,8 @@ all at the same time!
and normals/lighting on the same mesh**, by encoding the normals in the unused
2 bytes of each vertex using the 5-6-5 bit encoding by HailToDodongo from
[Tiny3D](https://github.com/HailToDodongo/tiny3d). Model-space precision of
the normals is reduced, but this is rarely noticeable and there is barely any
performance penalty compared to regular normals without vertex colors.
the normals is reduced, but this is rarely noticeable, and the performance is
nearly identical to vanilla normals (without simultaneous vertex colors).
- New geometry mode bit `G_AMBOCCLUSION` enables **ambient occlusion** for
opaque materials. Paint the shadow map into the vertex alpha channel; separate
factors (set with `SPAmbOcclusion`) control how much this affects the ambient
@@ -73,7 +73,7 @@ all at the same time!
RSP time as fewer verts have to be reloaded and re-transformed, and also makes
display lists shorter.
- New **occlusion plane** system allows the placement of a 3D quadrilateral
where objects behind this plane in screen space are culled. This can
where triangles behind this plane in screen space are culled. This can
dramatically improve RDP performance by reducing overdraw in scenes with walls
in the middle, such as a city or an indoor scene.
- If a material display list being drawn is the same as the last material, the
@@ -92,7 +92,8 @@ all at the same time!
shade alpha values are all below or above a settable threshold. This
**substantially reduces the performance penalty of cel shading**--only tris
which "straddle" the cel threshold are drawn twice, the others are only drawn
once.
once. This can also be used to **cull tris which are fully in fog**, replacing
far clipping which is removed in F3DEX3.
- A new "hints" system encodes the expected size of the target display list into
call, branch, and return DL commands. This allows only the needed number of DL
commands in the next DL to be fetched, rather than always fetching full
@@ -107,26 +108,30 @@ all at the same time!
value. This can be used for clearing the Z buffer or filling the framebuffer
or the letterbox with a solid color **faster than the RDP can in fill mode**.
Practical performance may vary due to scheduling constraints.
- The key codepaths for triangle draw and vertex processing (assuming lighting
enabled and the occlusion plane disabled with the `NOC` configuration) are
**slightly faster than in F3DEX2**.
- New `SPFlush` command can ensure that the RDP starts clearing the framebuffer
as soon as possible during the frame, instead of waiting a short time for
further RSP processing.
- The key codepaths for command dispatch, triangle draw, and vertex processing
(assuming lighting enabled and the occlusion plane disabled with the `NOC`
configuration) are **slightly faster than in F3DEX2**.
### Miscellaneous
- **Z-fighting of decals has been nearly eliminated**, with only a modest
increase in overdraw of very close occluding geometry. This is based on a
technique developed by SGI, neglected and removed by Nintendo, and re-added
by Rare; the F3DEX3 version improves upon it by choosing optimal parameters
and automatically enabling it for all decals with no code or DL changes. In
addition, the reduction in Z buffer precision from F3DEX(1) to F3DEX2 has been
reversed, and additional Z buffer precision beyond F3DEX(1) has been added.
increase in overdraw onto the decal of very close occluding geometry. This is
based on a technique developed by SGI, neglected and removed by Nintendo, and
re-added by Rare; the F3DEX3 version improves upon it by choosing optimal
parameters and automatically enabling it for all decals with no code or DL
changes.
- The reduction in Z buffer precision from F3DEX(1) to F3DEX2 has been reversed,
and **additional Z buffer precision** beyond F3DEX(1) has been added.
- **Point lighting** has been redesigned. The appearance when a light is close
to an object has been improved. Fixed a bug in F3DEX2/ZEX point lighting where
a Z component was accidentally doubled in the point lighting calculations. The
quadratic point light attenuation factor is now an E3M5 floating-point number.
The performance penalty for using large numbers of point lights has been
reduced.
- Maximum number of directional / point **lights raised from 7 to 9**. Minimum
quadratic point light attenuation factor is now an E3M5 floating-point number
for a wider representable range. The performance penalty for using large
numbers of point lights has been reduced.
- Maximum number of directional / point lights **raised from 7 to 9**. Minimum
number of directional / point lights lowered from 1 to 0 (F3DEX2 required at
least one). Also supports loading all lights in one DMA transfer
(`SPSetLights`), rather than one per light.
@@ -136,15 +141,15 @@ all at the same time!
parameters are encoded in the command. With some limitations, this allows the
tint colors of cel shading to **match scene lighting** with no code
intervention. Also useful for other lighting-dependent effects.
- The microcode automatically switches between two lighting implementations
- The microcode automatically switches between **two lighting implementations**
depending on which visual features are selected in the particular material.
The "basic lighting" codepath--which is roughly the same speed as F3DEX2--
supports all F3DEX2 features (directional lights, texgen), plus packed
normals, ambient occlusion, and light-to-alpha. The "advanced lighting"
codepath, which is slower, adds support for point lights, specular, and
Fresnel. You only pay the performance penalty for the features you use, and
only for the objects which use them.
Fresnel. You only pay the performance penalty for the objects which use these
advanced features.
### Profiling


@@ -20,9 +20,9 @@ update the occlusion plane after updating the camera and write the pointer to
this occlusion plane into the existing DL command near the beginning.
3. Create a system in your game engine for dynamically choosing or creating an
occlusion plane. For example, you might have a set of pre-determined occlusion
planes in the scene, and at runtime pick the one which you think is most
optimal. Some criteria to use for this include:
occlusion plane. (See the implementation in HackerOoT.) For example, you might
have a set of pre-determined occlusion planes in the scene, and at runtime pick
the one which you think is most optimal. Some criteria to use for this include:
- whether the camera is on the correct side of the occlusion plane
- the distance from the camera to the full (infinite) plane
- how far the point of the camera projected onto the full (infinite) plane is


@@ -29,8 +29,8 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `DPLoadTLUT*` | = | = | Up | Load is not sent to RDP if repeated in auto-batched rendering. See the GBI comment near `SPDontSkipTexLoadsAcross`. This is a performance optimization only and doesn't affect on-screen output unless the game is buggy / misusing the feature, so this behavior need not be emulated in HLE. |
| `DPLoadBlock*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `DPLoadTile*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `DPLoadBlock*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `DPLoadTile*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `SPSetOtherMode` | = | = | | |
| All other `DP*` | = | = | | Microcode generally can't change RDP command behavior. |
@@ -48,13 +48,13 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MV_POINT` | Rem | Rem | | Removed because the internal vertex format is no longer a multiple of 8 (DMA word). |
| `SPTexture` | = | = | | |
| `SPTextureL` | = | = | | HW V1 workaround; long since deprecated. |
| `SP1Triangle` | = | = | Up | Some of the new features in F3DEX3 (occlusion plane, alpha compare culling, decal fix) are during triangle processing.
| `SP1Triangle` | = | = | Up | Some of the new features in F3DEX3 (occlusion plane, alpha compare culling, decal fix) are during triangle processing. |
| `SP2Triangles` | = | = | Up | Same as `SP1Triangle` above. |
| `SP1Quadrangle` | = | = | Up | Same as `SP1Triangle` above. |
| `SPTriStrip` | New | New | Up | New command that draws 5 tris from 7 indexes, see GBI. |
| `SPTriFan` | New | New | Up | New command that draws 5 tris from 7 indexes, see GBI. |
| `SPMemset` | New | New | Up | New command that memsets a RDRAM region faster than the RDP can, for framebuffer or Z-buffer clear. |
| `G_LINE3D` | Rem | Rem | | Removed; no-op in F3DEX2. |
| `G_LINE3D` | Rem | Rem | | Removed; was a no-op in F3DEX2. |
### Control Logic
@@ -74,7 +74,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MW_SEGMENT` | = | = | | |
| `G_MWO_SEGMENT_*` | = | = | | These were never needed. |
| `SPFlush` | New | New | Up | This is a performance optimization only and can't be HLE emulated, so it should be treated as a no-op. |
| `G*` (`Gfx` subtypes) | ? | ? | | Deprecated. These did not fully reflect the bits usage in actual commands even in F3DEX2. These have mostly not been updated for F3DEX3. |
| `G*` (`Gfx` subtypes) | ? | ? | | Deprecated. These did not fully reflect the bits usage in actual commands even in F3DEX2. Almost none of these have been updated for F3DEX3. |
### 3D Space
@@ -82,7 +82,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
|----------------------|-----|-----|------|-------|
| `Mtx` | = | = | | |
| `SPMatrix` | Chg | = | * | Encoding changed due to multiple flags below changing. |
| `G_MTX_PUSH` | = | = | Down | `SPMatrix` processing with `G_MTX_PUSH` set is moved to Overlay 3 (slower) as games should not use the RSP matrix stack for accuracy and performance reasons (see GBI). |
| `G_MTX_PUSH` | = | = | Down | `SPMatrix` processing with `G_MTX_PUSH` set is moved to Overlay 3 (slower) as games generally should not use the RSP matrix stack for accuracy and performance reasons (see GBI). |
| `G_MTX_NOPUSH` | = | = | | |
| `G_MTX_LOAD` | Chg | = | | Encoding inverted (in SPMatrix, not in the definition of `G_MTX_LOAD`). |
| `G_MTX_MUL` | Chg | = | | Encoding inverted (in SPMatrix, not in the definition of `G_MTX_MUL`). |
@@ -92,20 +92,20 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MV_TEMPMTX0` | Chg | = | | Encoding changed. |
| `G_MV_VPMTX` | Chg | New | | New name for `G_MV_PMTX`, encoding changed. |
| `G_MV_TEMPMTX1` | Chg | = | | Encoding changed. |
| `SPPopMatrix*` | Chg | = | Down | Moved to Overlay 3 (slower) as games should not use the RSP matrix stack for accuracy and performance reasons (see GBI). Encoding is changed due to `G_MV_MMTX` changing. |
| `SPPopMatrix*` | Chg | = | Down | Moved to Overlay 3 (slower) as games generally should not use the RSP matrix stack for accuracy and performance reasons (see GBI). Encoding is changed due to `G_MV_MMTX` changing. |
| `SPForceMatrix` | Chg | Chg | | Converted into no-op. |
| `G_MV_MATRIX` | Rem | Rem | | Removed. |
| `G_MW_MATRIX` | Rem | Rem | | Removed. |
| `G_MW_FORCEMTX` | Rem | Rem | | Removed. |
| `SPViewport` | * | * | | Command itself is the same, but see `Vp` below. |
| `Vp_t` / `Vp` | Chg | Chg | | The Y scale is now negated, and the Z values are different due to the change from `G_MAXZ` to `G_NEW_MAXZ`.
| `Vp_t` / `Vp` | Chg | Chg | | The Y scale is now negated, and the Z values are different due to the change from `G_MAXZ` to `G_NEW_MAXZ`. |
| `G_MAXZ` | Rem | Rem | | Replaced with `G_NEW_MAXZ`. The name change is to force you to update your code--especially viewport definitions with hardcoded constants which are NOT defined in terms of `G_MAXZ`. |
| `G_NEW_MAXZ` | New | New | | The equivalent of `G_MAXZ` constant used in viewport calculations. |
| `G_MV_VIEWPORT` | = | = | | |
| `SPPerspNormalize` | Chg | = | | Encoding changed. |
| `G_MW_PERSPNORM` | Rem | Rem | | Removed. The perspective normalization factor is set via `G_MW_FX` with the changed encoding of `SPPerspNormalize`. |
| `G_MWO_PERSPNORM` | New | New | | |
| `SPClipRatio` | Chg | Chg | | Converted into no-op. It is not possible to change the clip ratio from 2 in F3DEX3. |
| `SPClipRatio` | Chg | Chg | | Converted into no-op. It is not possible to change the clip ratio from 2 in F3DEX3. Changing the clip ratio was rarely used in production games. |
| `G_MW_CLIP` | Rem | Rem | | Removed. See `SPClipRatio` above. |
### Lighting
@@ -113,9 +113,9 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `Light_t`, `Light` | Chg | * | | `type` field must be set to 0 (`LIGHT_TYPE_DIR`) to indicate directional light. `size` field for specular added. Otherwise the same, though note that now there is not an extra 8 bytes of padding between lights (the offset between them is 16, not 24). |
| `LIGHT_TYPE_DIR` | New | New | | New macro, but the encoding is the same as F3DEX2_PL. |
| `PointLight_t` | Chg | * | | Same changes as `Light_t`. Also note that the `kq` field is now interpreted as an E3M5 floating-point number. |
| `LIGHT_TYPE_POINT` | New | New | | New macro, but the encoding is the same as F3DEX2_PL. |
| `LIGHT_TYPE_DIR` | New | New | | New macro, but the encoding is the same as in F3DEX2_PL. |
| `PointLight_t` | Chg | * | | Same changes as `Light_t`. Also the `kq` field is now interpreted as an E3M5 floating-point number. |
| `LIGHT_TYPE_POINT` | New | New | | New macro, but the encoding is the same as in F3DEX2_PL. |
| `Ambient_t`, `Ambient` | = | = | | Note that you must use `Ambient`, not `Light`, for the ambient light if you have 9 directional/point lights. |
| `Lights1`, `Lights2`, ... | Chg | * | | The ambient light is at the end, not the beginning. The data layout matches the RSP internal data layout to enable `SPSetLights`. |
| `Lightsn` | Chg | * | | Same as `Lights1` etc. Also, now 9 directional/point lights. |
@@ -127,7 +127,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MWO_NUMLIGHT` | = | = | | |
| `NUML` | Chg | = | | Encoding changed. |
| `NUMLIGHTS_*` | Chg | = | | Deprecated as these are just defined equal to their number, because F3DEX3 supports zero lights. |
| `LIGHT_*` | = | = | | Deprecated and were never useful. |
| `LIGHT_*` | = | = | | Deprecated and were not useful in F3DEX2 either. |
| `SPLight` | Chg | = | | Encoding changed. Note that you must use `SPAmbient`, not `SPLight`, for the ambient light if you have 9 directional/point lights. Also note that you should usually use `SPSetLights` unless you need to set individual lights without affecting the others. |
| `SPAmbient` | New | New | | New command to upload the ambient light. If you have 0-8 directional/point lights, you can also use `SPLight` for this (slightly slower), but if you have 9 directional/point lights you must use `SPAmbient`. |
| `SPLightColor*` | Chg | = | | Encoding changed. |
@@ -138,7 +138,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MWO_bLIGHT_*` | Chg | = | | Encodings changed. No longer needed. |
| `G_MVO_L*` | Rem | Rem | | Removed. |
| `SPCameraWorld` | New | New | | New command to set the camera position for Fresnel. |
| `PlainVtx` | New | New | | For `SPCameraWorld`.
| `PlainVtx` | New | New | | For `SPCameraWorld`. |
| `SPLookAt` | New | New | | Replaces `SPLookAtX` and `SPLookAtY`. |
| `SPLookAtX` | Chg | * | | Encoding changed; in an attempt at backwards compatibility, defined as `SPLookAt`, which works with basic usage. |
| `SPLookAtY` | Chg | * | | Converted to no-op. |
@@ -155,7 +155,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
|--------------------------|-----|-----|------|-------|
| `SP*GeometryMode*` | * | * | | Commands themselves are the same, but many new geometry mode flags, see below. |
| `G_ZBUFFER` | = | = | | |
| `G_TEXTURE_ENABLE` | = | = | | Very old (F3D / HW v1) display lists with this bit set will no longer crash on F3DEX3, unlike F3DEX2. |
| `G_TEXTURE_ENABLE` | = | = | | Very old (F3D / HW v1) display lists with this bit set will crash on F3DEX2, but not on F3DEX3. |
| `G_SHADE` | = | = | | |
| `G_ATTROFFSET_ST_ENABLE` | New | New | | New geometry mode bit that enables ST attribute offsets, usually for smooth scrolling. |
| `SPAttrOffsetST` | New | New | | New command which writes ST attribute offsets using `G_MWO_ATTR_OFFSET_*`. |
@@ -199,13 +199,11 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `SPDontSkipTexLoadsAcross` | New | New | Up | New command which locally cancels auto-batched rendering by writing an invalid address to `G_MWO_LAST_MAT_DL_ADDR`. |
| `G_MWO_LAST_MAT_DL_ADDR` | New | New | | |
| `SPAlphaCompareCull` | New | New | Up | New command which enables culling of tris based on shade alpha values, for cel shading. Normal use of this command in cel shading is a performance optimization only and doesn't affect on-screen output, so it can be treated as a no-op by an initial HLE implementation. But it is easy to write a display list where it does affect on-screen output, so a good HLE implementation should emulate it. |
| `G_ALPHA_COMPARE_CULL_DISABLE` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_ALPHA_COMPARE_CULL_BELOW` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_ALPHA_COMPARE_CULL_ABOVE` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_ALPHA_COMPARE_CULL_*` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_MWO_ALPHA_COMPARE_CULL` | New | New | | |
| `MoveWd` | = | = | | Regular/valid encodings are the same. |
| `MoveHalfwd` | New | New | | Like `MoveWd` but writes 2 bytes instead of 4. |
| `G_MW_FX` | New | New | | New moveword table index for base address for many parameters. |
| `G_SPECIAL_1` | Rem | Rem | | Removed; in F3DEX2, triggered MVP matrix recalculation. |
| `G_SPECIAL_2` | Rem | Rem | | Removed; no-op in F3DEX2. |
| `G_SPECIAL_3` | Rem | Rem | | Removed; no-op in F3DEX2. |
| `G_SPECIAL_2` | Rem | Rem | | Removed; was a no-op in F3DEX2. |
| `G_SPECIAL_3` | Rem | Rem | | Removed; was a no-op in F3DEX2. |


@@ -2,7 +2,7 @@
# Microcode Configuration
There are several selectable configuration settings when building F3DEX3, which
There are a few selectable configuration settings when building F3DEX3, which
can be enabled in any combination. With a couple minor exceptions, none of these
settings affect the GBI--in fact, you can swap between the microcode versions on
a per-frame basis if you build multiple versions into your romhack.
@@ -30,65 +30,21 @@ which version to use on the profiling results from the previous frame: if the
RSP is the bottleneck (e.g. the RDP `CLK - CMD` is high), use the NOC version,
and otherwise use the base version.
## Legacy Vertex Pipeline (LVP)
The primary tradeoff for all the new lighting features in F3DEX3 is increased
RSP time for vertex processing. The base version of F3DEX3 takes about
**2-2.5x** more RSP time for vertex processing than F3DEX2 (see Performance
Results section below), assuming no lighting or directional lights only. You
should use the F3DEX3 performance counters (see below) to determine whether your
game is usually RSP or RDP bound.
If your game is usually RDP bound--like OoT--this generally will not affect the
game's overall framerate, so you should stick with base F3DEX3:
- The increased time only applies to vertex processing, not triangle processing
or other miscellaneous microcode tasks. So the total RSP cycles spent doing
useful work during the frame is only modestly increased.
- The increase in time is only RSP cycles; there is no additional memory
traffic, so the RDP time is not directly affected.
- In scenes which are complex enough to fill the RSP->RDP FIFO in DRAM, the RSP
usually spends a significant fraction of time waiting for the FIFO to not be
full, as revealed by the performance counters. In these cases, slower vertex
processing simply means less time spent waiting, and little to no change in
total RSP time.
- When the FIFO does not fill up, usually the RSP takes significantly less time
during the frame compared to the RDP, so increased RSP time usually does not
affect the overall framerate.
However, for RSP bound or extremely optimized (Kaze Emanuar) games, base F3DEX3
can become a bottleneck, so the Legacy Vertex Pipeline (LVP) configuration has
been introduced.
This configuration replaces F3DEX3's native vertex and lighting code with a
faster version based on the same algorithms as F3DEX2. This removes:
- Point lighting
- F3DEX3 lighting features: packed normals, ambient occlusion, light-to-alpha
(cel shading), Fresnel, and specular lighting
- ST attribute offsets
However, it retains all other F3DEX3 features:
- 56 verts, 9 directional lights
- Occlusion plane (optional with NOC configuration)
- All features not related to vertex/lighting: auto-batched rendering, packed
5-triangle commands, hints system, etc.
With both LVP and NOC enabled, F3DEX3 is faster on the RSP than F3DEX2 (see
@ref performance).
## Profiling
As mentioned above, F3DEX3 includes many performance counters. There are far too
many counters for a single microcode to maintain, so multiple configurations of
the microcode can be built, each containing a different set of performance
counters. These can be swapped while the game is running so the full set of
counters can be effectively accessed over multiple frames.
F3DEX3 includes many performance counters. There are far too many counters for a
single microcode to maintain, so multiple configurations of the microcode can be
built, each containing a different set of performance counters. These can be
swapped while the game is running so the full set of counters can be effectively
accessed over multiple frames.
There are a total of 21 performance counters, including:
- Counts of vertices, triangles, rectangles, matrices, DL commands, etc.
- Times the microcode was processing vertices, processing triangles, stalled
because the RDP FIFO in DMEM was full, and stalled waiting for DMAs to finish
- A counter enabling a rough measurement of how long the RDP was stalled
waiting for RDRAM for I/O to the framebuffer / Z buffer
waiting for RDRAM for I/O to the framebuffer / Z buffer (spoiler: often
half to two thirds of the total RDP time!)
The default configuration of F3DEX3 provides a few of the most basic counters.
The additional profiling configurations, called A, B, and C (for example
@@ -103,7 +59,9 @@ because their removal does not affect the RDP render time.
Use `BrZ` if the microcode is replacing F3DEX2 or an earlier F3D version (e.g.
SM64), or `BrW` if the microcode is replacing F3DZEX (i.e. OoT or MM). This
controls whether `SPBranchLessZ*` uses the vertex's W coordinate or screen Z
coordinate.
coordinate. If you are creating a new project for any game without using vanilla
scenes, and you're considering using this instruction for LoD, you should use
`BrW`.
## Debug Normals (`dbgN`)
@@ -113,10 +71,12 @@ version intended to be shipped. It can still be enabled by changing
To help debug lighting issues when integrating F3DEX3 into your romhack, this
feature causes the vertex colors of any material with lighting enabled to be set
to the transformed, normalized world space normals. The X, Y, and Z components
map to R, G, and B, with each dimension's conceptual (-1.0 ... 1.0) range mapped
to (0 ... 255). This is not compatible with LVP as world space normals do not
exist in that pipeline. This also breaks vertex alpha and texgen / lookat.
to the normals. When F3DEX3 is using the "basic" lighting codepath, these are
the model space normals, and when it is using the "advanced" lighting codepath
(point lights, specular, or Fresnel) these are transformed, normalized world
space normals. The X, Y, and Z components map to R, G, and B, with each
dimension's conceptual (-1.0 ... 1.0) range mapped to (0 ... 255). This also
breaks vertex alpha and texgen / lookat.
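As a rough illustration of the mapping described here, the following C sketch converts a normal to the color you would expect to see on screen. The names `normal_component_to_color`, `debug_normal_color`, and `DebugColor` are hypothetical helpers for CPU-side reasoning, not part of F3DEX3 or the GBI, and the exact rounding performed by the microcode is an assumption.

```c
#include <stdint.h>

/* Hypothetical CPU-side helper (not part of F3DEX3 or the GBI).
 * Maps one normal component from the conceptual range -1.0 ... 1.0
 * to 0 ... 255. The rounding used here is an assumption. */
static uint8_t normal_component_to_color(float n)
{
    if (n < -1.0f) n = -1.0f;   /* clamp out-of-range input */
    if (n > 1.0f) n = 1.0f;
    /* -1.0 -> 0, 0.0 -> 128 (127.5 rounded up), 1.0 -> 255 */
    return (uint8_t)(int)((n + 1.0f) * 127.5f + 0.5f);
}

typedef struct { uint8_t r, g, b; } DebugColor;

/* X, Y, Z of the normal map to R, G, B respectively. */
static DebugColor debug_normal_color(float nx, float ny, float nz)
{
    DebugColor c;
    c.r = normal_component_to_color(nx);
    c.g = normal_component_to_color(ny);
    c.b = normal_component_to_color(nz);
    return c;
}
```

With this sketch, a normal pointing along +X maps to (255, 128, 128), which is handy for working out what color a given face of a test object should be before checking it in game.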
Some ways to use this for debugging are:
- If the normals have obvious problems (e.g. flickering, or not changing
@@ -124,9 +84,9 @@ Some ways to use this for debugging are:
model space normals or the M matrix. Conversely, if there is a problem with
the standard lighting results (e.g. flickering) but the normals don't have
this problem, the problem is likely in the lighting data.
- Check that the colors don't change based on the camera position, but DO change
as the object rotates, so that the same side of an object in world space is
always the same color.
- If using the "advanced" lighting codepath, check that the colors don't change
based on the camera position, but DO change as the object rotates, so that the
same side of an object in world space is always the same color.
- Make a simple object like an octahedron or sphere, view it in game, and check
that the normals are correct. A normal pointing along +X would be
(1.0, 0.0, 0.0), meaning (255, 128, 128) or pink. A normal pointing along -X

View File

@@ -2,85 +2,39 @@
# What are the tradeoffs for all these new features?
## Vertex Processing RSP Time
In other words, when is F3DEX3 worse than F3DEX2?
See the Microcode Configuration and Performance Results sections above.
## Vertex processing RSP time for occlusion plane
## Overlay 4
In the occlusion plane F3DEX3 configuration, vertex processing is slower than
in F3DEX2. If using this configuration and there is no occlusion plane or it is
occluding almost nothing, the RSP will be slower with no other benefit.
(Note that in the LVP configuration, Overlay 4 is absent; there is no M inverse
transpose matrix discussed below, and the other commands mentioned below are
directly in the microcode without an overlay, due to there being enough IMEM
space.)
However, when the occlusion plane is occluding even a few percent of the
triangles in the scene, the situation changes. This saves RDP time, and most
games are RDP bound, so this trades off RSP time for RDP time and makes the game
faster overall. Plus, RSP time is also saved for the tris which are not drawn,
which can approximately cancel out the extra RSP time for computing the
occlusion plane for all vertices.
F3DEX2 contains Overlay 2, which does lighting, and Overlay 3, which does
clipping (run on any large triangle which extends a large distance offscreen).
These overlays are additional RSP assembly code, and both are loaded into the
same space in IMEM. If one overlay is loaded when the other is needed, the proper
one is loaded and then execution jumps to it. Display lists which do not use lighting
can stay on Overlay 3 at all times. Display lists for things that are typically
relatively small on screen, such as characters, can stay on Overlay 2 at all
times, because even when a triangle overlaps the edge of the screen, it
typically moves fully off the screen and is discarded before it reaches the
clipping bounds (2x the screen size).
## Functionality in Overlay 3
In F3DEX2, the only case where the overlays are swapped frequently is for
scenes with lighting, because they have large triangles which often extend far
offscreen (Overlay 3) but also need lighting (Overlay 2). Worst case, the RSP
will load Overlay 2 once for every `SPVertex` command and then load Overlay 3
for every set of `SP*Triangle*` commands.
The following commands are moved to Overlay 3 in F3DEX3 to save IMEM space. This
means that code will have to be loaded from DRAM to run them if Overlays 2 or 4
(for lighting) happen to be loaded already.
- Push and multiply codepaths for `SPMatrix`
- `SPPopMatrix*`
- `SPDma*`
- `SPMemset`
(If you're curious, Overlays 0 and 1 are not related to 2 and 3, and have to do
with starting and stopping RSP tasks. During normal display list execution,
Overlay 1 is always loaded.)
However:
- Multiplying, pushing, and popping matrices is not recommended for performance
or accuracy, and these are not used for most 3D objects in SM64 or OoT.
- `SPDma*` is rarely used except at startup for HLE detection.
- `SPMemset` is a new F3DEX3 command which can improve performance. Plus, it is
typically run shortly after render start, when Overlay 3 is already in IMEM.
F3DEX3 introduces Overlay 4, which can occupy the same IMEM as Overlays 2 and 3.
This overlay contains handlers for:
- Computing the inverse transpose of the model matrix M (abbreviated as mIT),
discussed below
- The codepath for `SPMatrix` with `G_MTX_MUL` set (base version only; this is
moved out of the overlay to normal microcode in the NOC configuration due to
having extra IMEM space available)
- `SPBranchLessZ*`
- `SPDma_io`
Whenever any of these features is needed, the RSP has to swap to Overlay 4. The
next time lighting or clipping is needed, the RSP has to then swap back to
Overlay 2 or 3. The round-trip of these two overlay loads takes about 5
microseconds of DRAM time including overheads. Fortunately, all the above
features other than the mIT matrix are rarely or never used.
The mIT matrix is needed in F3DEX3 because normals are covectors--they stretch
in the opposite direction of an object's scaling. So while you multiply a vertex
by M to transform it from model space to world space, you have to multiply a
normal by M inverse transpose to go to world space. F3DEX2 solves this problem
by instead transforming light directions into model space with M transpose, and
computing the lighting in model space. However, this requires extra DMEM to
store the transformed lights, and adds an additional performance penalty for
point lighting which is absent in F3DEX3. Plus, having world space normals in
F3DEX3 enables Fresnel and specular lighting.
If an object's transformation matrix stack only includes translations,
rotations, and uniform scale (i.e. same scale in X, Y, and Z), then M inverse
transpose is just a rescaled version of M, and the normals can be transformed
with M directly. It is only when the matrix includes nonuniform scales or shear
that M inverse transpose differs from M. The difference gets larger as the scale
or shear gets more extreme.
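To make the difference concrete, here is a minimal CPU-side sketch in C that computes M inverse transpose for the 3x3 part of a matrix and applies it to a normal. It is only an illustration of the math, not the microcode's implementation; the types `Mat3` and `Vec3` and all function names are hypothetical, and a column-vector convention is assumed for simplicity.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not GBI types. */
typedef struct { float m[3][3]; } Mat3;
typedef struct { float x, y, z; } Vec3;

/* Inverse transpose of a 3x3 matrix via cofactors.
 * inverse = cofactor^T / det, so inverse transpose = cofactor / det.
 * Returns false if the matrix is singular. */
static bool mat3_inverse_transpose(const Mat3 *a, Mat3 *out)
{
    const float (*m)[3] = a->m;
    float c[3][3];
    c[0][0] = m[1][1] * m[2][2] - m[1][2] * m[2][1];
    c[0][1] = m[1][2] * m[2][0] - m[1][0] * m[2][2];
    c[0][2] = m[1][0] * m[2][1] - m[1][1] * m[2][0];
    c[1][0] = m[0][2] * m[2][1] - m[0][1] * m[2][2];
    c[1][1] = m[0][0] * m[2][2] - m[0][2] * m[2][0];
    c[1][2] = m[0][1] * m[2][0] - m[0][0] * m[2][1];
    c[2][0] = m[0][1] * m[1][2] - m[0][2] * m[1][1];
    c[2][1] = m[0][2] * m[1][0] - m[0][0] * m[1][2];
    c[2][2] = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    float det = m[0][0] * c[0][0] + m[0][1] * c[0][1] + m[0][2] * c[0][2];
    if (det == 0.0f) return false;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            out->m[i][j] = c[i][j] / det;
    return true;
}

/* Matrix times column vector. */
static Vec3 mat3_mul_vec(const Mat3 *a, Vec3 v)
{
    Vec3 r;
    r.x = a->m[0][0] * v.x + a->m[0][1] * v.y + a->m[0][2] * v.z;
    r.y = a->m[1][0] * v.x + a->m[1][1] * v.y + a->m[1][2] * v.z;
    r.z = a->m[2][0] * v.x + a->m[2][1] * v.y + a->m[2][2] * v.z;
    return r;
}

static float vec3_dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

For example, with a scale of (2, 1, 1), a surface with normal (1, 1, 0) and tangent (1, -1, 0) has its tangent transformed to (2, -1, 0). Transforming the normal with M inverse transpose gives (0.5, 1, 0), which is still perpendicular to the transformed tangent, while transforming it with M gives (2, 1, 0), which is not. With a uniform scale of 2, M inverse transpose is simply 0.25 times M, so both approaches agree after normalization.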
F3DEX3 provides three options for handling this (see `SPNormalsMode`):
- `G_NORMALS_MODE_FAST`: Use M to transform normals. No performance penalty.
Lighting will be somewhat distorted for objects with nonuniform scale or
shear.
- `G_NORMALS_MODE_AUTO`: The RSP will automatically compute M inverse transpose
whenever M changes. Costs about 3.5 microseconds of DRAM time per matrix, i.e.
per object or skeleton limb which has lighting enabled. Lighting is correct
for nonuniform scale or shear.
- `G_NORMALS_MODE_MANUAL`: You compute M inverse transpose on the CPU and
manually upload it to the RSP every time M changes.
It is recommended to use `G_NORMALS_MODE_FAST` (the default) for most things,
and use `G_NORMALS_MODE_AUTO` only for objects while they have a nonuniform
scale (e.g. Mario only while he is squashed).
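One possible way to apply this recommendation on the CPU is to test whether the 3x3 part of an object's matrix is a rotation with uniform scale, and pick the mode accordingly. The sketch below is an assumption about how a game engine might do this, not something F3DEX3 provides: `ModelMtx3`, `NormalsModeChoice`, and `choose_normals_mode` are hypothetical names, and the enum values are stand-ins for `G_NORMALS_MODE_FAST` / `G_NORMALS_MODE_AUTO` so that the example compiles without the GBI header.

```c
/* Hypothetical stand-ins so this compiles without gbi.h. In a real
 * project the result would select which mode to pass to SPNormalsMode. */
typedef enum {
    NORMALS_CHOICE_FAST, /* stand-in for G_NORMALS_MODE_FAST */
    NORMALS_CHOICE_AUTO  /* stand-in for G_NORMALS_MODE_AUTO */
} NormalsModeChoice;

typedef struct { float m[3][3]; } ModelMtx3;

/* If M is rotation times uniform scale s, then M^T M = s^2 * I.
 * Any deviation beyond the relative tolerance indicates nonuniform
 * scale or shear, where M inverse transpose differs from M. */
static NormalsModeChoice choose_normals_mode(const ModelMtx3 *mtx,
                                             float tolerance)
{
    float g[3][3];
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            g[i][j] = 0.0f;
            for (int k = 0; k < 3; k++)
                g[i][j] += mtx->m[k][i] * mtx->m[k][j];
        }
    }
    float s2 = g[0][0]; /* squared scale along the first axis */
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            float expected = (i == j) ? s2 : 0.0f;
            float diff = g[i][j] - expected;
            if (diff < 0.0f) diff = -diff;
            if (diff > tolerance * s2)
                return NORMALS_CHOICE_AUTO;
        }
    }
    return NORMALS_CHOICE_FAST;
}
```

In practice a game usually already knows when it applies a squash or stretch to an object, so setting the mode explicitly at that point is likely cheaper than checking every matrix; a check like this is mainly useful as a debugging aid or for data-driven content.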
So there is not a significant practical performance impact from these changes.
## Far clipping removal
@@ -90,6 +44,13 @@ though it can be seen in certain extreme cases. However, it is used on the SM64
title screen for the zoom-in on Mario's face, so this will look slightly
different.
Far clipping can be used to cull tris which are fully "fogged out" if the
background color (no skybox) is also the fog color, for performance benefits.
This effect has a bad reputation in '90s era games for being used as a cheap
trick to hide performance problems, though it's occasionally used in "spooky"
levels in romhacks. In F3DEX3, `SPAlphaCompareCull` can be used instead of far
clipping to cull these triangles which are fully in fog.
The removal of far clipping saved a bunch of DMEM space, and enabled other
changes to the clipping implementation which saved even more DMEM space.
@@ -102,11 +63,11 @@ distance in front of the camera plane.
A few clever romhackers figured out that you could shrink the normals on verts
in your mesh (so their length is less than "1") to make the lighting on those
verts dimmer and create a version of ambient occlusion. In the base vertex
pipeline, F3DEX3 normalizes vertex normals after transforming them, which is
required for most features of the lighting system including packed normals, so
this no longer works. However, F3DEX3 has support for ambient occlusion via
vertex alpha, which accomplishes the same goal with some extra benefits:
verts dimmer and create a version of ambient occlusion. In the "advanced"
lighting codepath, F3DEX3 normalizes vertex normals after transforming them,
which is required for point lights, specular, and Fresnel, so this no longer
works. However, F3DEX3 has support for ambient occlusion via vertex alpha, which
accomplishes the same goal with some extra benefits:
- Much easier to create: just paint the vertex alpha in Blender / fast64. The
scaled normals approach was not supported in fast64 and had to be done with
scripts or by hand.
@@ -118,16 +79,12 @@ vertex alpha, which accomplishes the same goal with some extra benefits:
scaled normals never affect the ambient light, contrary to the concept of
ambient occlusion.
Furthermore, for partial HLE compatibility, the same mesh can have the ambient
occlusion information encoded in both scaled normals and vertex alpha at the
same time. HLE will ignore the vertex alpha AO but use the scaled normals;
F3DEX3 will fix the normals' scale but then apply the AO.
The only case where scaled normals work but F3DEX3 AO doesn't work is for meshes
with vertex alpha actually used for transparency (therefore also no fog).
Note that in LVP mode, scaled normals are supported and work the same way as in
F3DEX2, while ambient occlusion is not supported.
Note that in the "basic" lighting codepath in F3DEX3, vertex normals are treated
the same way as in F3DEX2, so scaled normals are supported there. Ambient
occlusion is also supported there.
## RDP temporary buffers shrinking
@@ -161,20 +118,13 @@ In F3DEX2, the RSP time for drawing non-textured tris was significantly lower
than for textured tris, by skipping a chunk of computation for the texture
coefficients if they were disabled. In F3DEX3, no computation is skipped when
textures are disabled. However, almost all materials use textures, and F3DEX3 is
a little faster at drawing textured tris than F3DEX2. Plus, DRAM access time RSP
-> FIFO and FIFO -> RDP is still saved from not sending the coefficients, and
RDP time savings from avoiding loading a texture are unaffected of course.
a little faster at drawing textured tris than F3DEX2. Plus, F3DEX3 still does
not send the texture coefficients if they are disabled, saving DRAM access time
for RSP -> FIFO and FIFO -> RDP. RDP time savings from avoiding loading a
texture are unaffected of course.
## Obscure semantic differences from F3DEX2 that should never matter in practice
- `SPLoadUcode*` corrupts the current M inverse transpose matrix state. If using
`G_NORMALS_MODE_FAST`, this doesn't matter. If using `G_NORMALS_MODE_AUTO`,
you must send the M matrix to the RSP again after returning to F3DEX3 from the
other microcode (which would normally be done anyway when starting to draw the
next object). If using `G_NORMALS_MODE_MANUAL`, you must send the updated
M inverse transpose matrix to the RSP after returning to F3DEX3 from the other
microcode (which would normally be done anyway when starting to draw the next
object).
- Changing fog settings--i.e. enabling or disabling `G_FOG` in the geometry mode
or executing `SPFogFactor` or `SPFogPosition`--between loading verts and
drawing tris with those verts will lead to incorrect fog values for those


@@ -1,96 +1,71 @@
@page performance Performance Results
# Philosophy
The base version of F3DEX3 was created for RDP bound games like OoT, where new
visual effects are desired and increasing the RSP time a bit does not affect the
overall performance. If your game is RSP bound, using the base version of F3DEX3
will make it slower.
Conversely, F3DEX3_LVP_NOC matches or beats the RSP performance of F3DEX2 on
**all** critical paths in the microcode, including command dispatch, vertex
processing, and triangle processing. Then, the RDP and memory traffic
performance improvements of F3DEX3--56 vertex buffer, auto-batched rendering,
etc.--should further improve performance from there. This means that switching
from F3DEX2 to F3DEX3_LVP_NOC should always improve performance regardless of
whether your game is RSP bound or RDP bound.
# Performance Results
F3DEX3_NOC matches or beats the RSP performance of F3DEX2 on **all** critical
paths in the microcode, including command dispatch, vertex processing, and
triangle processing. Then, the RDP and memory traffic performance improvements
of F3DEX3--56 vertex buffer, auto-batched rendering, etc.--should further
improve overall game performance from there.
## Cycle Counts
These are cycle counts for many key paths in the microcode. Lower numbers are
better. The timings are hand-counted taking into account all pipeline stalls and
all dual-issue conditions. Instruction alignment after branches is sometimes
taken into account, otherwise assumed to be optimal.
all dual-issue conditions. Instruction alignment after branches is usually taken
into account, but in some cases it is assumed to be optimal.
Vertex / lighting numbers assume no special features (texgen, packed normals,
etc.) Tri numbers assume texture, shade, and Z, and not flushing the buffer.
All numbers assume default profiling configuration. Empty cells are "not
measured yet".
All numbers assume default profiling configuration. Tri numbers assume texture,
shade, and Z, and not flushing the buffer. Tri numbers are measured from the
first cycle of the command handler inclusive, to the first cycle of whatever is
after $ra exclusive; this is in order to capture the extra latency and stalls in
F3DEX2.
| | F3DEX2 | F3DEX3_LVP_NOC | F3DEX3_LVP | F3DEX3_NOC | F3DEX3 |
|----------------------------|--------|----------------|------------|------------|--------|
| Command dispatch | 12 | 12 | 12 | 12 | 12 |
| Small RDP command | 14 | 5 | 5 | 5 | 5 |
| Vtx before DMA start | 16 | 17 | 17 | 17 | 17 |
| Vtx pair, no lighting | 54 | 54 | 81 | 79 | 98 |
| Vtx pair, 0 dir lts | Can't | 64 | | | |
| Vtx pair, 1 dir lt | 73 | 70 | 96 | 182 | 201 |
| Vtx pair, 2 dir lts | 76 | 77 | 103 | 211 | 230 |
| Vtx pair, 3 dir lts | 88 | 84 | 110 | 240 | 259 |
| Vtx pair, 4 dir lts | 91 | 91 | 117 | 269 | 288 |
| Vtx pair, 5 dir lts | 103 | 98 | 124 | 298 | 317 |
| Vtx pair, 6 dir lts | 106 | 105 | 131 | 327 | 346 |
| Vtx pair, 7 dir lts | 118 | 112 | 138 | 356 | 375 |
| Vtx pair, 8 dir lts | Can't | 119 | 145 | 385 | 404 |
| Vtx pair, 9 dir lts | Can't | 126 | 152 | 414 | 433 |
| Light dir xfrm, 0 dir lts | Can't | 95 | 95 | None | None |
| Light dir xfrm, 1 dir lt | 141 | 95 | 95 | None | None |
| Light dir xfrm, 2 dir lts | 180 | 96 | 96 | None | None |
| Light dir xfrm, 3 dir lts | 219 | 121 | 121 | None | None |
| Light dir xfrm, 4 dir lts | 258 | 122 | 122 | None | None |
| Light dir xfrm, 5 dir lts | 297 | 147 | 147 | None | None |
| Light dir xfrm, 6 dir lts | 336 | 148 | 148 | None | None |
| Light dir xfrm, 7 dir lts | 375 | 173 | 173 | None | None |
| Light dir xfrm, 8 dir lts | Can't | 174 | 174 | None | None |
| Light dir xfrm, 9 dir lts | Can't | 199 | 199 | None | None |
| Only/2nd tri to offscreen | 27 | 26 | 26 | 26 | 26 |
| 1st tri to offscreen | 28 | 27 | 27 | 27 | 27 |
| Only/2nd tri to clip | 32 | 31 | 31 | 31 | 31 |
| 1st tri to clip | 33 | 32 | 32 | 32 | 32 |
| Only/2nd tri to backface | 38 | 38 | 38 | 38 | 38 |
| 1st tri to backface | 39 | 39 | 39 | 39 | 39 |
| Only/2nd tri to degenerate | 42 | 40 | 40 | 40 | 40 |
| 1st tri to degenerate | 43 | 41 | 41 | 41 | 41 |
| Only/2nd tri to occluded | Can't | Can't | 49 | Can't | 49 |
| 1st tri to occluded | Can't | Can't | 50 | Can't | 50 |
| Only/2nd tri to draw | 172 | 160 | 163 | 160 | 163 |
| 1st tri to draw | 173 | 160 | 163 | 160 | 163 |
Tri numbers are measured from the first cycle of the command handler inclusive,
to the first cycle of whatever is after $ra exclusive. This is in order
to capture the extra latency and stalls in F3DEX2.
## Measurements
Vertex processing time as reported by the performance counter in the `PA`
configuration.
- Scene 1: Kakariko, adult day, from DMT entrance
- Scene 2: Custom empty scene with Suzanne monkey head with 1 dir light
- Scene 3: Same but Suzanne has vertex colors instead of lighting (Link is still
on screen and has lighting)
| Microcode | Scene 1 | Scene 2 | Scene 3 |
|----------------|---------|---------|---------|
| F3DEX3 | 7.41ms | 2.99ms | 2.22ms |
| F3DEX3_NOC | 6.85ms | 2.75ms | 1.98ms |
| F3DEX3_LVP | 4.12ms | 1.59ms | 1.48ms |
| F3DEX3_LVP_NOC | 3.34ms | 1.27ms | 1.16ms |
| F3DEX2 | Can't* | Can't* | Can't* |
| Vertex count | 3557 | 1548 | 1548 |
*F3DEX2 does not contain performance counters, so the portion of the RSP time
taken for vertex processing cannot be measured.
| | F3DEX2 | F3DEX3_NOC | F3DEX3 |
|----------------------------|--------|------------|--------|
| Command dispatch | 12 | 12 | 12 |
| Small RDP command | 14 | 5 | 5 |
| Only/2nd tri to offscreen | 27 | 26 | 26 |
| 1st tri to offscreen | 28 | 27 | 27 |
| Only/2nd tri to clip | 32 | 31 | 31 |
| 1st tri to clip | 33 | 32 | 32 |
| Only/2nd tri to backface | 38 | 38 | 38 |
| 1st tri to backface | 39 | 39 | 39 |
| Only/2nd tri to degenerate | 42 | 40 | 40 |
| 1st tri to degenerate | 43 | 41 | 41 |
| Only/2nd tri to occluded | Can't | Can't | 49 |
| 1st tri to occluded | Can't | Can't | 50 |
| Only/2nd tri to draw | 172 | 159 | 162 |
| 1st tri to draw | 173 | 160 | 163 |
| Vtx before DMA start | 16 | 17 | 17 |
| Vtx pair, no lighting | 54 | 54 | 70 |
| Vtx pair, 0 dir lts | Can't | 65 | 81 |
| Vtx pair, 1 dir lt | 73 | 70 | 86 |
| Vtx pair, 2 dir lts | 76 | 77 | 93 |
| Vtx pair, 3 dir lts | 88 | 84 | 100 |
| Vtx pair, 4 dir lts | 91 | 91 | 107 |
| Vtx pair, 5 dir lts | 103 | 98 | 114 |
| Vtx pair, 6 dir lts | 106 | 105 | 121 |
| Vtx pair, 7 dir lts | 118 | 112 | 128 |
| Vtx pair, 8 dir lts | Can't | 119 | 135 |
| Vtx pair, 9 dir lts | Can't | 126 | 142 |
| Vtx pair, 0 point lts | Can't | TODO | +16 |
| Vtx pair, 1 point lt | TODO | TODO | +16 |
| Vtx pair, 2 point lts | TODO | TODO | +16 |
| Vtx pair, 3 point lts | TODO | TODO | +16 |
| Vtx pair, 4 point lts | TODO | TODO | +16 |
| Vtx pair, 5 point lts | TODO | TODO | +16 |
| Vtx pair, 6 point lts | TODO | TODO | +16 |
| Vtx pair, 7 point lts | TODO | TODO | +16 |
| Vtx pair, 8 point lts | Can't | TODO | +16 |
| Vtx pair, 9 point lts | Can't | TODO | +16 |
| Light dir xfrm, 0 dir lts | Can't | 92 | 92 |
| Light dir xfrm, 1 dir lt | 141 | 92 | 92 |
| Light dir xfrm, 2 dir lts | 180 | 93 | 93 |
| Light dir xfrm, 3 dir lts | 219 | 118 | 118 |
| Light dir xfrm, 4 dir lts | 258 | 119 | 119 |
| Light dir xfrm, 5 dir lts | 297 | 144 | 144 |
| Light dir xfrm, 6 dir lts | 336 | 145 | 145 |
| Light dir xfrm, 7 dir lts | 375 | 170 | 170 |
| Light dir xfrm, 8 dir lts | Can't | 171 | 171 |
| Light dir xfrm, 9 dir lts | Can't | 196 | 196 |


@@ -6,14 +6,14 @@ For an OoT codebase, only a few minor changes are required to use F3DEX3.
However, more changes are recommended to increase performance and enable new
features.
How to modify the microcode in your HackerOoT based romhack (steps may be
similar for other games):
How to modify the microcode in your HackerOoT-based romhack (note that this is
already done in HackerOoT, so this is provided as a guide for other games):
- Replace `include/ultra64/gbi.h` in your romhack with `gbi.h` from this repo.
- Make the "Required Changes" listed below.
- Build this repo: install the latest version of `armips`, then `make
F3DEX3_BrZ` or `make F3DEX3_BrW`.
- Copy the microcode binaries (`build/F3DEX3_X/F3DEX3_X.code` and
`build/F3DEX3_X/F3DEX3_X.data`) to somewhere in your romhack repo, e.g. `data`.
`build/F3DEX3_X/F3DEX3_X.data`) to `data` in your romhack repo.
- In `data/rsp.rodata.s`, change the line between `fifoTextStart` and
`fifoTextEnd` to `.incbin "data/F3DEX3_X.code"` (or wherever you put the
binary), and similarly change the line between `fifoDataStart` and
@@ -41,9 +41,12 @@ Both OoT and SM64:
dynamically) (search for `Vp` case-sensitive, `SPViewport`, and `G_MAXZ`),
change the maximum Z value from `G_MAXZ` to `G_NEW_MAXZ` and negate the
Y scale. For more information, see the comment next to `G_MAXZ` in the GBI.
Note that your romhack codebase may have the constant hardcoded, usually as
`511` which is supposed to be `(G_MAXZ/2)`, instead of actually writing
`G_MAXZ`; you need to change these too, there are several of these in SM64.
Note that your romhack codebase may have the constant hardcoded (usually as
`511`, which is supposed to be `(G_MAXZ/2)`) instead of actually writing an
expression containing `G_MAXZ`; you need to change these too. There are
several of these in SM64. Fortunately, it is easy to notice if you have failed
to update a Y scale, as anything drawn using that viewport will be upside
down.
- Remove uses of internal GBI features which have been removed in F3DEX3 (see
@ref compatibility for full list). In OoT, the only changes needed are:
- In `src/code/ucode_disas.c`, remove the switch statement cases for
@@ -51,20 +54,22 @@ Both OoT and SM64:
and `G_MW_PERSPNORM`.
- In `src/libultra/gu/lookathil.c`, remove the lines which set the `col`,
`colc`, and `pad` fields.
- In each place `G_MAXZ` is used, a compiler error will be generated;
negate the Y scale in each related viewport and change to `G_NEW_MAXZ`.
- As mentioned above, in each place `G_MAXZ` is used, a compiler error will
be generated; negate the Y scale in each related viewport and change the
Z scale and offset to use `G_NEW_MAXZ`.
- Change your game engine lighting code to set the `type` (formerly `pad1`)
field to 0 in the initialization of any directional light (`Light_t` and
derived structs like `Light` or `Lightsn`). F3DEX3 ignores the state of the
`G_LIGHTING_POSITIONAL` geometry mode bit in all display lists, meaning both
directional and point lights are supported for all display lists (including
vanilla). The light is identified as directional if `type` == 0 or point if
`kc` > 0 (`kc` and `type` are the same byte). This change is required because
otherwise garbage nonzero values may be put in the padding byte, leading
directional lights to be misinterpreted as point lights.
derived structs like `Light` or `Lightsn`). This change is required because
otherwise garbage nonzero values may be put in this byte, which was a padding
byte for a non-point-light microcode but is used to identify the light as
point or directional in a point light microcode.
- The change needed in OoT is: in `src/code/z_lights.c`, in
`Lights_BindPoint`, `Lights_BindDirectional`, and `Lights_NewAndDraw`, set
`l.type` to 0 right before setting `l.col`.
- If your game already had point lighting, use `ENABLE_POINT_LIGHTS` instead
of `G_LIGHTING_POSITIONAL` to indicate that point lights are currently active.
(Static uses of `G_LIGHTING_POSITIONAL` in display lists need not be removed
as this bit is ignored.)
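The viewport change from the list above can be sketched as follows. This is a hedged illustration rather than code from any particular codebase: `ExampleVp` is a stand-in for the usual viewport struct layout (four scale values followed by four translation values), `convert_viewport_to_f3dex3` is a hypothetical helper, and the value used for `EXAMPLE_NEW_MAXZ` is an assumption; always take the real `G_NEW_MAXZ` from the GBI. It also assumes the common case where both the Z scale and Z offset were `G_MAXZ/2`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the viewport struct: scale x, y, z, pad, then translation. */
typedef struct {
    int16_t vscale[4];
    int16_t vtrans[4];
} ExampleVp;

#define EXAMPLE_OLD_MAXZ 0x03FF /* F3DEX2 G_MAXZ; half of it is the familiar 511 */
#define EXAMPLE_NEW_MAXZ 0x7FFF /* ASSUMED value; use G_NEW_MAXZ from gbi.h */

/* Convert a typical F3DEX2-style viewport in place:
 * negate the Y scale, and switch the Z scale and Z offset to the new max Z. */
static void convert_viewport_to_f3dex3(ExampleVp *vp) {
    assert(vp != NULL);
    vp->vscale[1] = (int16_t)-vp->vscale[1];
    vp->vscale[2] = EXAMPLE_NEW_MAXZ / 2;
    vp->vtrans[2] = EXAMPLE_NEW_MAXZ / 2;
}
```

For statically defined viewports the same change is made by hand in the initializer. The helper form is mainly useful for viewports which the game engine builds dynamically.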
SM64 only:
@@ -72,15 +77,15 @@ SM64 only:
fixed, the vanilla permanent light direction of `{0x28, 0x28, 0x28}` must be
changed to `{0x49, 0x49, 0x49}`, or everything will be too dark. The former
vector is not properly normalized, but F3D through F3DEX2 normalize light
directions in the microcode, so it doesn't matter with those microcodes. In
contrast, F3DEX3 normalizes vertex normals (after transforming them), but
assumes light directions have already been normalized.
directions in the microcode, so it doesn't matter with those microcodes. The
two lighting codepaths in F3DEX3 treat light directions and vertex normals
differently: the fast one works like F3DEX2, but the slow one normalizes
vertex normals after transforming them and does not modify light directions.
Thus in this case, the light directions must already be normalized.
- Matrix stack fix (world space lighting / view matrix in VP instead of in M) is
basically required. If you *really* want camera space lighting, use matrix
stack fix, transform the fixed camera space light direction by V inverse each
frame, and send that to the RSP. This will be faster than the alternative (not
using matrix stack fix and enabling `G_NORMALS_MODE_AUTO` to correct the
matrix).
frame, and send that to the RSP.
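The light direction change for SM64 described above, from `{0x28, 0x28, 0x28}` to `{0x49, 0x49, 0x49}`, is consistent with rescaling the vector to a length of 127 in signed 8-bit components. The sketch below shows that arithmetic; the target length of 127 is an assumption inferred from those two values, and `normalize_light_dir` and `sqrt_newton` are hypothetical helpers, not part of the GBI.

```c
#include <assert.h>
#include <stdint.h>

/* Square root by Newton iteration, to avoid depending on libm. */
static double sqrt_newton(double x) {
    if (x <= 0.0) return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + x / r);
    return r;
}

/* Rescale a signed 8-bit light direction so its length is about 127
 * (ASSUMED target length), rounding each component to nearest. */
static void normalize_light_dir(int8_t dir[3]) {
    double len = sqrt_newton((double)dir[0] * dir[0] +
                             (double)dir[1] * dir[1] +
                             (double)dir[2] * dir[2]);
    assert(len > 0.0); /* a zero vector has no direction */
    for (int i = 0; i < 3; i++) {
        double v = dir[i] * 127.0 / len;
        dir[i] = (int8_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
    }
}
```

Passing `{0x28, 0x28, 0x28}` through `normalize_light_dir` produces `{0x49, 0x49, 0x49}`: each component is 40 * 127 / (40 * sqrt(3)), which is about 73.3 and rounds to 73, i.e. `0x49`.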
## Recommended Changes (Non-Lighting)
@@ -88,18 +93,14 @@ SM64 only:
use `SPLookAt` instead (this is only a few lines change). Also remove any
code which writes `SPClipRatio` or `SPForceMatrix`--these are now no-ops, so
you might as well not write them.
- Avoid using `G_MTX_MUL` in `SPMatrix`. That is, make sure your game engine
computes a matrix stack on the CPU and sends the final matrix for each object
/ limb to the RSP, rather than multiplying matrices on the RSP. OoT already
usually does the former for precision / accuracy reasons and only uses
`G_MTX_MUL` in a couple places (e.g. view * perspective matrix); it is okay to
leave those. This change is recommended because the `G_MTX_MUL` mode of
`SPMatrix` has been moved to Overlay 4 in F3DEX3 (see below), making it
substantially slower than it was in F3DEX2. It still functions the same though
so you can use it if it's really needed.
- Avoid using `G_MTX_MUL` and `G_MTX_PUSH` in `SPMatrix`, and `SPPopMatrix*`,
for performance and accuracy reasons. See the GBI for more information. If
these are only used in a couple non-critical places such as for GUIs, that's
okay.
- Re-export as many display lists (scenes, objects, skeletons, etc.) as possible
with fast64 set to F3DEX3 mode, to take advantage of the substantially larger
vertex buffer, triangle packing commands, "hints" system, etc.
vertex buffer (and eventually, when supported by community tools, the triangle
packing commands and "hints" system).
- `#define REQUIRE_SEMICOLONS_AFTER_GBI_COMMANDS` (at the top of, or before
including, the GBI) for a more modern, OoT-style codebase where uses of GBI
commands require semicolons after them. SM64 omits the semicolons sometimes,
@@ -137,10 +138,9 @@ SM64 only:
emulate point lights in a scene with a directional light recomputed per actor.
You can now just send those to the RSP as real point lights, regardless of
whether the display lists are vanilla or new.
- If you are porting a game which already had point lighting (e.g. Majora's
Mask), note that the point light kc, kl, and kq factors have been changed, so
you will need to redesign how game engine light parameters (e.g. "light
radius") map to these parameters.
- If your game already had point lighting, note that the point light kc, kl, and
kq factors have been changed, so you will need to redesign how game engine
light parameters (e.g. "light radius") map to these parameters.
## Changes Required for New Features


@@ -1346,7 +1346,7 @@ tri_noinit: // ra is next cmd, second tri in TRI2, or middle of clipping
vsub $v11, $v4, $v6 // v11 = vertex 2 - vertex 1 (x, y, addr)
vlt $v13, $v2, $v4[1] // v13 = min(v1.y, v2.y), VCO = v1.y < v2.y
bnez $11, return_and_end_mat // Then the whole tri is offscreen, cull
// 22 cycles
// 22 cycles (for tri2 first tri; tri1/only subtract 1 from counts)
vmrg tHPos, $v6, $v4 // v14 = v1.y < v2.y ? v1 : v2 (lower vertex of v1, v2)
vmudh $v29, $v10, $v12[1] // x = (v1 - v2).x * (v1 - v3).y ...
lhu $24, activeClipPlanes
@@ -3147,6 +3147,7 @@ ltbasic_setup_after_xfrm:
j vtx_after_lt_setup
li lbAfter, ltbasic_ao
.align 8
xfrm_light_store_lookat:
vmadh $v29, $v9, lpWrld[1h]
spv lpFinal[0], (xfrmLookatDirs)($zero) // Store lookat. 1st time garbage, 2nd real

gbi.h

@@ -2954,8 +2954,14 @@ _DW({ \
/**
* Alpha compare culling. Optimization for cel shading, could also be used for
* other scenarios where tris are being drawn with alpha compare.
* Alpha compare culling. This was originally created as an optimization for cel
* shading, but it can also be used for other scenarios. In particular, it can
* be used with fog to cull tris which are entirely in the fog. This could also
* be accomplished with far clipping, but far clipping is removed in F3DEX3.
* ```
* // Cull tris where all three vertex shade alpha are >= 0xFF
* gSPAlphaCompareCull(..., G_ALPHA_COMPARE_CULL_ABOVE, 0xFF);
* ```
*
* If mode == G_ALPHA_COMPARE_CULL_DISABLE, tris are drawn normally.
*