mirror of
https://github.com/HackerN64/F3DEX3.git
synced 2026-01-21 10:37:45 -08:00
Lots more documentation
README.md
@@ -29,8 +29,8 @@ all at the same time!
and normals/lighting on the same mesh**, by encoding the normals in the unused
2 bytes of each vertex using the 5-6-5 bit encoding by HailToDodongo from
[Tiny3D](https://github.com/HailToDodongo/tiny3d). Model-space precision of
the normals is reduced, but this is rarely noticeable, and the performance is
nearly identical to vanilla normals (without simultaneous vertex colors).
- New geometry mode bit `G_AMBOCCLUSION` enables **ambient occlusion** for
opaque materials. Paint the shadow map into the vertex alpha channel; separate
factors (set with `SPAmbOcclusion`) control how much this affects the ambient

@@ -73,7 +73,7 @@ all at the same time!
RSP time as fewer verts have to be reloaded and re-transformed, and also makes
display lists shorter.
- New **occlusion plane** system allows the placement of a 3D quadrilateral
where triangles behind this plane in screen space are culled. This can
dramatically improve RDP performance by reducing overdraw in scenes with walls
in the middle, such as a city or an indoor scene.
- If a material display list being drawn is the same as the last material, the

@@ -92,7 +92,8 @@ all at the same time!
shade alpha values are all below or above a settable threshold. This
**substantially reduces the performance penalty of cel shading**--only tris
which "straddle" the cel threshold are drawn twice, the others are only drawn
once. This can also be used to **cull tris which are fully in fog**, replacing
far clipping which is removed in F3DEX3.
- A new "hints" system encodes the expected size of the target display list into
call, branch, and return DL commands. This allows only the needed number of DL
commands in the next DL to be fetched, rather than always fetching full

@@ -107,26 +108,30 @@ all at the same time!
value. This can be used for clearing the Z buffer or filling the framebuffer
or the letterbox with a solid color **faster than the RDP can in fill mode**.
Practical performance may vary due to scheduling constraints.
- New `SPFlush` command can ensure that the RDP starts clearing the framebuffer
as soon as possible during the frame, instead of waiting a short time for
further RSP processing.
- The key codepaths for command dispatch, triangle draw, and vertex processing
(assuming lighting enabled and the occlusion plane disabled with the `NOC`
configuration) are **slightly faster than in F3DEX2**.

### Miscellaneous

- **Z-fighting of decals has been nearly eliminated**, with only a modest
increase in overdraw onto the decal of very close occluding geometry. This is
based on a technique developed by SGI, neglected and removed by Nintendo, and
re-added by Rare; the F3DEX3 version improves upon it by choosing optimal
parameters and automatically enabling it for all decals with no code or DL
changes.
- The reduction in Z buffer precision from F3DEX(1) to F3DEX2 has been reversed,
and **additional Z buffer precision** beyond F3DEX(1) has been added.
- **Point lighting** has been redesigned. The appearance when a light is close
to an object has been improved. Fixed a bug in F3DEX2/ZEX point lighting where
a Z component was accidentally doubled in the point lighting calculations. The
quadratic point light attenuation factor is now an E3M5 floating-point number
for a wider representable range. The performance penalty for using large
numbers of point lights has been reduced.
- Maximum number of directional / point lights **raised from 7 to 9**. Minimum
number of directional / point lights lowered from 1 to 0 (F3DEX2 required at
least one). Also supports loading all lights in one DMA transfer
(`SPSetLights`), rather than one per light.

@@ -136,15 +141,15 @@ all at the same time!
parameters are encoded in the command. With some limitations, this allows the
tint colors of cel shading to **match scene lighting** with no code
intervention. Also useful for other lighting-dependent effects.
- The microcode automatically switches between **two lighting implementations**
depending on which visual features are selected in the particular material.
The "basic lighting" codepath--which is roughly the same speed as F3DEX2--
supports all F3DEX2 features (directional lights, texgen), plus packed
normals, ambient occlusion, and light-to-alpha. The "advanced lighting"
codepath, which is slower, adds support for point lights, specular, and
Fresnel. You only pay the performance penalty for the objects which use these
advanced features.
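
The basic/advanced selection rule described in the last bullet fits in a few lines of C. This is a minimal sketch. The `FEAT_*` flags, `LightingPath`, and `choose_lighting_path` are hypothetical names made up for illustration and are not part of the F3DEX3 GBI. The real microcode makes this decision internally from the material's state.

```c
#include <stdint.h>

/* Hypothetical feature flags, for illustration only (not part of the GBI). */
enum {
    FEAT_DIRECTIONAL_LIGHTS = 1 << 0,
    FEAT_TEXGEN             = 1 << 1,
    FEAT_PACKED_NORMALS     = 1 << 2,
    FEAT_AMBIENT_OCCLUSION  = 1 << 3,
    FEAT_LIGHT_TO_ALPHA     = 1 << 4,
    FEAT_POINT_LIGHTS       = 1 << 5,
    FEAT_SPECULAR           = 1 << 6,
    FEAT_FRESNEL            = 1 << 7
};

/* Per the text, only these features require the slower "advanced" codepath. */
#define FEAT_ADVANCED_ONLY (FEAT_POINT_LIGHTS | FEAT_SPECULAR | FEAT_FRESNEL)

typedef enum { LIGHTING_BASIC, LIGHTING_ADVANCED } LightingPath;

/* Each material is classified on its own, so only objects that use an
 * advanced-only feature pay for the advanced codepath. */
static LightingPath choose_lighting_path(uint32_t material_features)
{
    return (material_features & FEAT_ADVANCED_ONLY) ? LIGHTING_ADVANCED
                                                    : LIGHTING_BASIC;
}
```

For example, a material that uses directional lights, packed normals, and ambient occlusion stays on the basic path. Adding specular to the same material moves it to the advanced path.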

### Profiling

@@ -20,9 +20,9 @@ update the occlusion plane after updating the camera and write the pointer to
this occlusion plane into the existing DL command near the beginning.

3. Create a system in your game engine for dynamically choosing or creating an
occlusion plane. (See the implementation in HackerOoT.) For example, you might
have a set of pre-determined occlusion planes in the scene, and at runtime pick
the one which you think is best. Some criteria to use for this include:
- whether the camera is on the correct side of the occlusion plane
- the distance from the camera to the full (infinite) plane
- how far the point of the camera projected onto the full (infinite) plane is
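
The following sketch computes the geometric quantities behind the criteria listed above for one candidate occluder. It assumes a hypothetical `OccluderQuad` made of four world-space corners, wound so that `(c[1] - c[0]) x (c[3] - c[0])` points toward the side the camera must be on. These types are invented for illustration and are not the microcode's occlusion plane data format. The last criterion is cut off in this excerpt, so the distance from the projected point to the quad's centroid is only one possible measure. Squared distances are used because they are enough to compare candidates and need no square root.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only. */
typedef struct { float x, y, z; } Vec3;

static Vec3 vadd(Vec3 a, Vec3 b) { Vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z }; return r; }
static Vec3 vsub(Vec3 a, Vec3 b) { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static Vec3 vscale(Vec3 a, float s) { Vec3 r = { a.x * s, a.y * s, a.z * s }; return r; }
static float vdot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 vcross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* Four world-space corners; winding convention assumed as described above. */
typedef struct { Vec3 c[4]; } OccluderQuad;

typedef struct {
    bool  camera_on_correct_side;
    float dist_sq_to_plane;          /* squared camera distance to the infinite plane */
    Vec3  camera_projected;          /* camera projected onto the infinite plane */
    float projected_dist_sq_center;  /* squared distance from that point to the centroid */
} OccluderScore;

static OccluderScore score_occluder(const OccluderQuad *q, Vec3 cam)
{
    OccluderScore s = { false, 0.0f, { 0.0f, 0.0f, 0.0f }, 0.0f };
    Vec3 n = vcross(vsub(q->c[1], q->c[0]), vsub(q->c[3], q->c[0]));
    float n_len_sq = vdot(n, n);
    if (n_len_sq == 0.0f) return s;            /* degenerate quad: reject */

    float side = vdot(n, vsub(cam, q->c[0]));  /* signed distance times |n| */
    s.camera_on_correct_side = side > 0.0f;
    s.dist_sq_to_plane = side * side / n_len_sq;
    s.camera_projected = vsub(cam, vscale(n, side / n_len_sq));

    Vec3 centroid = vscale(vadd(vadd(q->c[0], q->c[1]), vadd(q->c[2], q->c[3])), 0.25f);
    Vec3 d = vsub(s.camera_projected, centroid);
    s.projected_dist_sq_center = vdot(d, d);
    return s;
}
```

A selection loop could, for example, discard candidates whose `camera_on_correct_side` is false and then rank the remaining ones by the other two values. How to weight them is a per-game tuning choice.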
@@ -29,8 +29,8 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `DPLoadTLUT*` | = | = | Up | Load is not sent to RDP if repeated in auto-batched rendering. See the GBI comment near `SPDontSkipTexLoadsAcross`. This is a performance optimization only and doesn't affect on-screen output unless the game is buggy / misusing the feature, so this behavior need not be emulated in HLE. |
| `DPLoadBlock*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `DPLoadTile*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `SPSetOtherMode` | = | = | | |
| All other `DP*` | = | = | | Microcode generally can't change RDP command behavior. |

@@ -48,13 +48,13 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MV_POINT` | Rem | Rem | | Removed because the internal vertex format is no longer a multiple of 8 (DMA word). |
| `SPTexture` | = | = | | |
| `SPTextureL` | = | = | | HW V1 workaround; long since deprecated. |
| `SP1Triangle` | = | = | Up | Some of the new features in F3DEX3 (occlusion plane, alpha compare culling, decal fix) are implemented during triangle processing. |
| `SP2Triangles` | = | = | Up | Same as `SP1Triangle` above. |
| `SP1Quadrangle` | = | = | Up | Same as `SP1Triangle` above. |
| `SPTriStrip` | New | New | Up | New command that draws 5 tris from 7 indexes, see GBI. |
| `SPTriFan` | New | New | Up | New command that draws 5 tris from 7 indexes, see GBI. |
| `SPMemset` | New | New | Up | New command that memsets an RDRAM region faster than the RDP can, for framebuffer or Z-buffer clear. |
| `G_LINE3D` | Rem | Rem | | Removed; was a no-op in F3DEX2. |

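`SPTriStrip` and `SPTriFan` each describe five triangles using seven indices. The sketch below shows that expansion, assuming the conventional strip and fan orderings. The exact index order and winding F3DEX3 uses are defined in its GBI, and `Tri`, `expand_strip7`, and `expand_fan7` are hypothetical helper names.

```c
#include <stdint.h>

typedef struct { uint8_t a, b, c; } Tri;

/* Strip: triangle i uses indices i, i+1, i+2. Every odd triangle swaps its first
 * two indices so that all five triangles keep the same winding. */
static void expand_strip7(const uint8_t idx[7], Tri out[5])
{
    for (int i = 0; i < 5; i++) {
        if (i % 2 == 0) { out[i].a = idx[i];     out[i].b = idx[i + 1]; }
        else            { out[i].a = idx[i + 1]; out[i].b = idx[i];     }
        out[i].c = idx[i + 2];
    }
}

/* Fan: every triangle shares the first index. */
static void expand_fan7(const uint8_t idx[7], Tri out[5])
{
    for (int i = 0; i < 5; i++) {
        out[i].a = idx[0];
        out[i].b = idx[i + 1];
        out[i].c = idx[i + 2];
    }
}
```

With indices `0..6`, the strip yields `(0,1,2), (2,1,3), (2,3,4), (4,3,5), (4,5,6)` and the fan yields `(0,1,2)` through `(0,5,6)`. Either way, a single command carries 5 triangles instead of the 1 or 2 carried by `SP1Triangle` / `SP2Triangles`.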
### Control Logic
@@ -74,7 +74,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MW_SEGMENT` | = | = | | |
| `G_MWO_SEGMENT_*` | = | = | | These were never needed. |
| `SPFlush` | New | New | Up | This is a performance optimization only and can't be HLE emulated, so it should be treated as a no-op. |
| `G*` (`Gfx` subtypes) | ? | ? | | Deprecated. These did not fully reflect the bit usage in actual commands even in F3DEX2. Almost none of these have been updated for F3DEX3. |

### 3D Space
@@ -82,7 +82,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
|----------------------|-----|-----|------|-------|
| `Mtx` | = | = | | |
| `SPMatrix` | Chg | = | * | Encoding changed due to multiple flags below changing. |
| `G_MTX_PUSH` | = | = | Down | `SPMatrix` processing with `G_MTX_PUSH` set is moved to Overlay 3 (slower) as games generally should not use the RSP matrix stack for accuracy and performance reasons (see GBI). |
| `G_MTX_NOPUSH` | = | = | | |
| `G_MTX_LOAD` | Chg | = | | Encoding inverted (in `SPMatrix`, not in the definition of `G_MTX_LOAD`). |
| `G_MTX_MUL` | Chg | = | | Encoding inverted (in `SPMatrix`, not in the definition of `G_MTX_MUL`). |

@@ -92,20 +92,20 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MV_TEMPMTX0` | Chg | = | | Encoding changed. |
| `G_MV_VPMTX` | Chg | New | | New name for `G_MV_PMTX`, encoding changed. |
| `G_MV_TEMPMTX1` | Chg | = | | Encoding changed. |
| `SPPopMatrix*` | Chg | = | Down | Moved to Overlay 3 (slower) as games generally should not use the RSP matrix stack for accuracy and performance reasons (see GBI). Encoding is changed due to `G_MV_MMTX` changing. |
| `SPForceMatrix` | Chg | Chg | | Converted into no-op. |
| `G_MV_MATRIX` | Rem | Rem | | Removed. |
| `G_MW_MATRIX` | Rem | Rem | | Removed. |
| `G_MW_FORCEMTX` | Rem | Rem | | Removed. |
| `SPViewport` | * | * | | Command itself is the same, but see `Vp` below. |
| `Vp_t` / `Vp` | Chg | Chg | | The Y scale is now negated, and the Z values are different due to the change from `G_MAXZ` to `G_NEW_MAXZ`. |
| `G_MAXZ` | Rem | Rem | | Replaced with `G_NEW_MAXZ`. The name change is to force you to update your code--especially viewport definitions with hardcoded constants which are NOT defined in terms of `G_MAXZ`. |
| `G_NEW_MAXZ` | New | New | | The equivalent of the `G_MAXZ` constant used in viewport calculations. |
| `G_MV_VIEWPORT` | = | = | | |
| `SPPerspNormalize` | Chg | = | | Encoding changed. |
| `G_MW_PERSPNORM` | Rem | Rem | | Removed. The perspective normalization factor is set via `G_MW_FX` with the changed encoding of `SPPerspNormalize`. |
| `G_MWO_PERSPNORM` | New | New | | |
| `SPClipRatio` | Chg | Chg | | Converted into no-op. It is not possible to change the clip ratio from 2 in F3DEX3. Changing the clip ratio was rarely used in production games. |
| `G_MW_CLIP` | Rem | Rem | | Removed. See `SPClipRatio` above. |

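The fixed clip ratio of 2 means the clipping bounds sit at twice the screen size. Triangles that stay within those bounds are never clipped on the RSP, even if they extend partly offscreen. The sketch below classifies a single vertex against these regions. It assumes the usual clip-space convention, in which the visible screen is `|x| <= w` and `|y| <= w` with `w > 0`. `VtxRegion` and `classify_vertex` are hypothetical names, and the code only illustrates the idea. It is not the microcode's clipping logic.

```c
/* Fixed in F3DEX3; SPClipRatio can no longer change it. */
#define CLIP_RATIO 2.0f

typedef enum { VTX_ON_SCREEN, VTX_IN_GUARD_BAND, VTX_BEYOND_CLIP_BOUNDS } VtxRegion;

/* Classifies a clip-space vertex (x, y, w) against the screen and the 2x bounds.
 * Vertices behind the camera (w <= 0) fail both tests and land in the last case. */
static VtxRegion classify_vertex(float x, float y, float w)
{
    float ax = x < 0.0f ? -x : x;
    float ay = y < 0.0f ? -y : y;
    if (ax <= w && ay <= w)
        return VTX_ON_SCREEN;
    if (ax <= CLIP_RATIO * w && ay <= CLIP_RATIO * w)
        return VTX_IN_GUARD_BAND;
    return VTX_BEYOND_CLIP_BOUNDS;
}
```

Only triangles with a vertex in the last region, and which are not discarded as entirely offscreen, would need the clipping code (Overlay 3 in F3DEX2).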
### Lighting
@@ -113,9 +113,9 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `Light_t`, `Light` | Chg | * | | `type` field must be set to 0 (`LIGHT_TYPE_DIR`) to indicate directional light. `size` field for specular added. Otherwise the same, though note that there are no longer 8 extra bytes of padding between lights (the offset between them is 16, not 24). |
| `LIGHT_TYPE_DIR` | New | New | | New macro, but the encoding is the same as in F3DEX2_PL. |
| `PointLight_t` | Chg | * | | Same changes as `Light_t`. Also the `kq` field is now interpreted as an E3M5 floating-point number. |
| `LIGHT_TYPE_POINT` | New | New | | New macro, but the encoding is the same as in F3DEX2_PL. |
| `Ambient_t`, `Ambient` | = | = | | Note that you must use `Ambient`, not `Light`, for the ambient light if you have 9 directional/point lights. |
| `Lights1`, `Lights2`, ... | Chg | * | | The ambient light is at the end, not the beginning. The data layout matches the RSP internal data layout to enable `SPSetLights`. |
| `Lightsn` | Chg | * | | Same as `Lights1` etc. Also, now 9 directional/point lights. |

@@ -127,7 +127,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MWO_NUMLIGHT` | = | = | | |
| `NUML` | Chg | = | | Encoding changed. |
| `NUMLIGHTS_*` | Chg | = | | Deprecated, as these are just defined equal to their number because F3DEX3 supports zero lights. |
| `LIGHT_*` | = | = | | Deprecated and were not useful in F3DEX2 either. |
| `SPLight` | Chg | = | | Encoding changed. Note that you must use `SPAmbient`, not `SPLight`, for the ambient light if you have 9 directional/point lights. Also note that you should usually use `SPSetLights` unless you need to set individual lights without affecting the others. |
| `SPAmbient` | New | New | | New command to upload the ambient light. If you have 0-8 directional/point lights, you can also use `SPLight` for this (slightly slower), but if you have 9 directional/point lights you must use `SPAmbient`. |
| `SPLightColor*` | Chg | = | | Encoding changed. |

@@ -138,7 +138,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `G_MWO_bLIGHT_*` | Chg | = | | Encodings changed. No longer needed. |
| `G_MVO_L*` | Rem | Rem | | Removed. |
| `SPCameraWorld` | New | New | | New command to set the camera position for Fresnel. |
| `PlainVtx` | New | New | | For `SPCameraWorld`. |
| `SPLookAt` | New | New | | Replaces `SPLookAtX` and `SPLookAtY`. |
| `SPLookAtX` | Chg | * | | Encoding changed; in an attempt at backwards compatibility, defined as `SPLookAt`, which works with basic usage. |
| `SPLookAtY` | Chg | * | | Converted to no-op. |

@@ -155,7 +155,7 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
|--------------------------|-----|-----|------|-------|
| `SP*GeometryMode*` | * | * | | Commands themselves are the same, but many new geometry mode flags, see below. |
| `G_ZBUFFER` | = | = | | |
| `G_TEXTURE_ENABLE` | = | = | | Very old (F3D / HW v1) display lists with this bit set will crash on F3DEX2, but not on F3DEX3. |
| `G_SHADE` | = | = | | |
| `G_ATTROFFSET_ST_ENABLE` | New | New | | New geometry mode bit that enables ST attribute offsets, usually for smooth scrolling. |
| `SPAttrOffsetST` | New | New | | New command which writes ST attribute offsets using `G_MWO_ATTR_OFFSET_*`. |

@@ -199,13 +199,11 @@ e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
| `SPDontSkipTexLoadsAcross` | New | New | Up | New command which locally cancels auto-batched rendering by writing an invalid address to `G_MWO_LAST_MAT_DL_ADDR`. |
| `G_MWO_LAST_MAT_DL_ADDR` | New | New | | |
| `SPAlphaCompareCull` | New | New | Up | New command which enables culling of tris based on shade alpha values, for cel shading. Normal use of this command in cel shading is a performance optimization only and doesn't affect on-screen output, so it can be treated as a no-op by an initial HLE implementation. But it is easy to write a display list where it does affect on-screen output, so a good HLE implementation should emulate it. |
| `G_ALPHA_COMPARE_CULL_*` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_MWO_ALPHA_COMPARE_CULL` | New | New | | |
| `MoveWd` | = | = | | Regular/valid encodings are the same. |
| `MoveHalfwd` | New | New | | Like `MoveWd` but writes 2 bytes instead of 4. |
| `G_MW_FX` | New | New | | New moveword table index for the base address of many parameters. |
| `G_SPECIAL_1` | Rem | Rem | | Removed; in F3DEX2, triggered MVP matrix recalculation. |
| `G_SPECIAL_2` | Rem | Rem | | Removed; was a no-op in F3DEX2. |
| `G_SPECIAL_3` | Rem | Rem | | Removed; was a no-op in F3DEX2. |
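
The `SPAlphaCompareCull` test can be sketched as a per-triangle predicate. An HLE implementation would need something like this. The mode names below only mirror the `G_ALPHA_COMPARE_CULL_*` settings and are hypothetical. The boundary semantics are also assumptions: "below" is taken to mean `alpha < threshold` and "above" to mean `alpha >= threshold`. Check the GBI for the real encoding and comparisons.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { CULL_DISABLE, CULL_BELOW, CULL_ABOVE } AlphaCullMode;

/* Returns true if the triangle should be culled: all three vertex shade alphas
 * fall on the same side of the threshold, as selected by the mode. */
static bool alpha_compare_cull(AlphaCullMode mode, uint8_t threshold,
                               uint8_t a0, uint8_t a1, uint8_t a2)
{
    switch (mode) {
    case CULL_BELOW:
        return a0 < threshold && a1 < threshold && a2 < threshold;
    case CULL_ABOVE:
        return a0 >= threshold && a1 >= threshold && a2 >= threshold;
    default:
        return false;
    }
}
```

Consider a cel-shading pass that only keeps pixels whose light-to-alpha value is at or above the threshold. Shade alpha is interpolated across the triangle, so a triangle whose three vertex alphas are all below the threshold cannot produce any pixels in that pass, and culling it does not change the output. Only triangles that straddle the threshold are drawn in both passes.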
@@ -2,7 +2,7 @@

# Microcode Configuration

There are a few selectable configuration settings when building F3DEX3, which
can be enabled in any combination. With a couple of minor exceptions, none of
these settings affect the GBI--in fact, you can swap between the microcode
versions on a per-frame basis if you build multiple versions into your romhack.

@@ -30,65 +30,21 @@ which version to use on the profiling results from the previous frame: if the
RSP is the bottleneck (e.g. the RDP `CLK - CMD` is high), use the NOC version,
and otherwise use the base version.
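
The per-frame heuristic above can be sketched as a small function that picks the next frame's microcode from the previous frame's RDP counters. The sketch normalizes `CLK - CMD` by `CLK`, so "high" becomes a fraction of the frame. That normalization, the threshold value, the `RdpCounters` field names, and the `MicrocodeChoice` enum are all assumptions for illustration. Tune the threshold against your own profiling data.

```c
#include <stdint.h>

typedef enum { UCODE_BASE, UCODE_NOC } MicrocodeChoice;

/* Hypothetical container for last frame's RDP counter readings. */
typedef struct {
    uint32_t clk; /* RDP CLK counter for the frame */
    uint32_t cmd; /* RDP CMD counter for the frame */
} RdpCounters;

/* Per the text, a high CLK - CMD means the RSP is the bottleneck, so the NOC
 * version is preferred; otherwise the base version is used. */
static MicrocodeChoice choose_microcode(RdpCounters prev, float threshold)
{
    if (prev.clk == 0 || prev.cmd > prev.clk)
        return UCODE_BASE; /* no usable data: fall back to the base version */
    float idle_fraction = (float)(prev.clk - prev.cmd) / (float)prev.clk;
    return idle_fraction > threshold ? UCODE_NOC : UCODE_BASE;
}
```

Because the decision lags the measurement by one frame, a game might want to add hysteresis (for example, requiring several consecutive frames before switching) to avoid flipping back and forth.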

## Legacy Vertex Pipeline (LVP)

The primary tradeoff for all the new lighting features in F3DEX3 is increased
RSP time for vertex processing. The base version of F3DEX3 takes about
**2-2.5x** more RSP time for vertex processing than F3DEX2 (see Performance
Results section below), assuming no lighting or directional lights only. You
should use the F3DEX3 performance counters (see below) to determine whether your
game is usually RSP or RDP bound.

If your game is usually RDP bound--like OoT--this generally will not affect the
game's overall framerate, so you should stick with base F3DEX3:
- The increased time only applies to vertex processing, not triangle processing
or other miscellaneous microcode tasks. So the total RSP cycles spent doing
useful work during the frame are only modestly increased.
- The increase in time is only RSP cycles; there is no additional memory
traffic, so the RDP time is not directly affected.
- In scenes which are complex enough to fill the RSP->RDP FIFO in DRAM, the RSP
usually spends a significant fraction of time waiting for space in the FIFO, as
revealed by the performance counters. In these cases, slower vertex processing
simply means less time spent waiting, and little to no change in total RSP
time.
- When the FIFO does not fill up, usually the RSP takes significantly less time
during the frame compared to the RDP, so increased RSP time usually does not
affect the overall framerate.

However, for RSP bound or extremely optimized (Kaze Emanuar) games, base F3DEX3
can become a bottleneck, so the Legacy Vertex Pipeline (LVP) configuration has
been introduced.

This configuration replaces F3DEX3's native vertex and lighting code with a
faster version based on the same algorithms as F3DEX2. This removes:
- Point lighting
- F3DEX3 lighting features: packed normals, ambient occlusion, light-to-alpha
(cel shading), Fresnel, and specular lighting
- ST attribute offsets

However, it retains all other F3DEX3 features:
- 56 verts, 9 directional lights
- Occlusion plane (optional with NOC configuration)
- All features not related to vertex/lighting: auto-batched rendering, packed
5-triangle commands, hints system, etc.

With both LVP and NOC enabled, F3DEX3 is faster on the RSP than F3DEX2 (see
@ref performance).

## Profiling

F3DEX3 includes many performance counters. There are far too many counters for a
single microcode to maintain, so multiple configurations of the microcode can be
built, each containing a different set of performance counters. These can be
swapped while the game is running so the full set of counters can be effectively
accessed over multiple frames.

There are a total of 21 performance counters, including:
- Counts of vertices, triangles, rectangles, matrices, DL commands, etc.
- Times the microcode was processing vertices, processing triangles, stalled
because the RDP FIFO in DMEM was full, and stalled waiting for DMAs to finish
- A counter enabling a rough measurement of how long the RDP was stalled
waiting for RDRAM for I/O to the framebuffer / Z buffer (spoiler: often
half to two thirds of the total RDP time!)

The default configuration of F3DEX3 provides a few of the most basic counters.
The additional profiling configurations, called A, B, and C (for example

@@ -103,7 +59,9 @@ because their removal does not affect the RDP render time.
Use `BrZ` if the microcode is replacing F3DEX2 or an earlier F3D version (e.g.
SM64), or `BrW` if the microcode is replacing F3DZEX (e.g. OoT or MM). This
controls whether `SPBranchLessZ*` uses the vertex's W coordinate or screen Z
coordinate. If you are creating a new project for any game without using vanilla
scenes, and you're considering using this command for LoD, you should use
`BrW`.

## Debug Normals (`dbgN`)
@@ -113,10 +71,12 @@ version intended to be shipped. It can still be enabled by changing

To help debug lighting issues when integrating F3DEX3 into your romhack, this
feature causes the vertex colors of any material with lighting enabled to be set
to the normals. When F3DEX3 is using the "basic" lighting codepath, these are
the model space normals, and when it is using the "advanced" lighting codepath
(point lights, specular, or Fresnel) these are transformed, normalized world
space normals. The X, Y, and Z components map to R, G, and B, with each
dimension's conceptual (-1.0 ... 1.0) range mapped to (0 ... 255). This also
breaks vertex alpha and texgen / lookat.

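When reading `dbgN` colors off the screen or from a capture, it helps to have both directions of the mapping at hand. The sketch below assumes a linear map of -1.0 to 0, 0.0 to 128, and 1.0 to 255. This is consistent with the (-1.0 ... 1.0) to (0 ... 255) range above and with the +X example of (255, 128, 128) given below. The microcode's exact rounding may differ, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Normal component (-1.0 ... 1.0) to color channel (0 ... 255). */
static uint8_t normal_component_to_color(float n)
{
    if (n > 1.0f) n = 1.0f;
    if (n < -1.0f) n = -1.0f;
    float c = 127.5f + n * 127.5f; /* -1.0 -> 0.0, 0.0 -> 127.5, 1.0 -> 255.0 */
    return (uint8_t)(c + 0.5f);    /* round to nearest; c is never negative */
}

/* Color channel back to an approximate normal component. */
static float color_to_normal_component(uint8_t c)
{
    return ((float)c - 127.5f) / 127.5f;
}
```

With this mapping, a normal of (1.0, 0.0, 0.0) becomes (255, 128, 128). A captured channel value of 128 decodes to about 0.004, which is as close to zero as an 8-bit channel can represent.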
Some ways to use this for debugging are:
- If the normals have obvious problems (e.g. flickering, or not changing
@@ -124,9 +84,9 @@ Some ways to use this for debugging are:
model space normals or the M matrix. Conversely, if there is a problem with
the standard lighting results (e.g. flickering) but the normals don't have
this problem, the problem is likely in the lighting data.
- If using the "advanced" lighting codepath, check that the colors don't change
based on the camera position, but DO change as the object rotates, so that the
same side of an object in world space is always the same color.
- Make a simple object like an octahedron or sphere, view it in game, and check
that the normals are correct. A normal pointing along +X would be
(1.0, 0.0, 0.0), meaning (255, 128, 128) or pink. A normal pointing along -X

@@ -2,85 +2,39 @@
|
||||
|
||||
# What are the tradeoffs for all these new features?
|
||||
|
||||
## Vertex Processing RSP Time
|
||||
In other words, when is F3DEX3 worse than F3DEX2?
|
||||
|
||||
See the Microcode Configuration and Performance Results sections above.
|
||||
## Vertex processing RSP time for occlusion plane
|
||||
|
||||
## Overlay 4
|
||||
In the occlusion plane F3DEX3 configuration, vertex processing is slower than
|
||||
in F3DEX2. If using this configuration and there is no occlusion plane or it is
|
||||
occluding almost nothing, the RSP will be slower with no other benefit.
|
||||
|
||||
(Note that in the LVP configuration, Overlay 4 is absent; there is no M inverse
|
||||
transpose matrix discussed below, and the other commands mentioned below are
|
||||
directly in the microcode without an overlay, due to there being enough IMEM
|
||||
space.)
|
||||
However, when the occlusion plane is occluding even a few percent of the
|
||||
triangles in the scene, the situation changes. This saves RDP time, and most
|
||||
games are RDP bound, so this trades off RSP time for RDP time and makes the game
|
||||
faster overall. Plus, RSP time is also saved for the tris which are not drawn,
|
||||
which can approximately cancel out the extra RSP time for computing the
|
||||
occlusion plane for all vertices.
|
||||
|
||||
F3DEX2 contains Overlay 2, which does lighting, and Overlay 3, which does
|
||||
clipping (run on any large triangle which extends a large distance offscreen).
|
||||
These overlays are more RSP assembly code which are loaded into the same space
|
||||
in IMEM. If the wrong overlay is loaded when the other is needed, the proper
|
||||
one is loaded and then code jumps to it. Display lists which do not use lighting
|
||||
can stay on Overlay 3 at all times. Display lists for things that are typically
|
||||
relatively small on screen, such as characters, can stay on Overlay 2 at all
|
||||
times, because even when a triangle overlaps the edge of the screen, it
|
||||
typically moves fully off the screen and is discarded before it reaches the
|
||||
clipping bounds (2x the screen size).
|
||||
## Functionality in Overlay 3
|
||||
|
||||
In F3DEX2, the only case where the overlays are swapped frequently is for
scenes with lighting, because they have large triangles which often extend far
offscreen (Overlay 3) but also need lighting (Overlay 2). Worst case, the RSP
will load Overlay 2 once for every `SPVertex` command and then load Overlay 3
for every set of `SP*Triangle*` commands.

The following commands are moved to Overlay 3 in F3DEX3 to save IMEM space. This
means that code will have to be loaded from DRAM to run them if Overlays 2 or 4
(for lighting) happen to be loaded already.
- Push and multiply codepaths for `SPMatrix`
- `SPPopMatrix*`
- `SPDma*`
- `SPMemset`

(If you're curious, Overlays 0 and 1 are not related to 2 and 3, and have to do
with starting and stopping RSP tasks. During normal display list execution,
Overlay 1 is always loaded.)

However, this rarely matters in practice:
- Multiplying, pushing, and popping matrices is not recommended for performance
  or accuracy, and these are not used for most 3D objects in SM64 or OoT.
- `SPDma*` is rarely used except at startup for HLE detection.
- `SPMemset` is a new F3DEX3 command which can improve performance. Plus, it is
  typically run shortly after render start, when Overlay 3 is already in IMEM.

F3DEX3 introduces Overlay 4, which can occupy the same IMEM as Overlays 2 and 3.
This overlay contains handlers for:
- Computing the inverse transpose of the model matrix M (abbreviated as mIT),
  discussed below
- The codepath for `SPMatrix` with `G_MTX_MUL` set (base version only; this is
  moved out of the overlay to normal microcode in the NOC configuration due to
  having extra IMEM space available)
- `SPBranchLessZ*`
- `SPDma_io`

Whenever any of these features is needed, the RSP has to swap to Overlay 4. The
next time lighting or clipping is needed, the RSP then has to swap back to
Overlay 2 or 3. The round trip of these two overlay loads takes about 5
microseconds of DRAM time including overheads. Fortunately, all the above
features other than the mIT matrix are rarely or never used.

The mIT matrix is needed in F3DEX3 because normals are covectors: they stretch
in the opposite direction of an object's scaling. So while you multiply a vertex
by M to transform it from model space to world space, you have to multiply a
normal by M inverse transpose to go to world space. F3DEX2 solves this problem
by instead transforming light directions into model space with M transpose and
computing the lighting in model space. However, this requires extra DMEM to
store the transformed lights, and it adds a performance penalty for point
lighting which F3DEX3 avoids. Plus, having world space normals in F3DEX3 enables
Fresnel and specular lighting.

If an object's transformation matrix stack only includes translations,
rotations, and uniform scale (i.e. the same scale in X, Y, and Z), then M inverse
transpose is just a rescaled version of M, and the normals can be transformed
with M directly. It is only when the matrix includes nonuniform scale or shear
that M inverse transpose differs from M. The difference gets larger as the scale
or shear gets more extreme.

F3DEX3 provides three options for handling this (see `SPNormalsMode`):
- `G_NORMALS_MODE_FAST`: Use M to transform normals. No performance penalty.
  Lighting will be somewhat distorted for objects with nonuniform scale or
  shear.
- `G_NORMALS_MODE_AUTO`: The RSP will automatically compute M inverse transpose
  whenever M changes. Costs about 3.5 microseconds of DRAM time per matrix, i.e.
  per object or skeleton limb which has lighting enabled. Lighting is correct
  for nonuniform scale or shear.
- `G_NORMALS_MODE_MANUAL`: You compute M inverse transpose on the CPU and
  manually upload it to the RSP every time M changes.

It is recommended to use `G_NORMALS_MODE_FAST` (the default) for most things,
and to use `G_NORMALS_MODE_AUTO` only for objects while they currently have a
nonuniform scale (e.g. Mario only while he is squashed).

So, in practice, these overlay changes do not have a significant performance
impact.
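
Following this recommendation requires the game engine to tell, per object or limb, whether the matrix currently has nonuniform scale or shear. The C sketch below shows one possible CPU-side check. It assumes the game has the 3x3 linear part of M available as floats, and the function name and tolerance are hypothetical. M is rotation times uniform scale exactly when its columns are mutually orthogonal and equally long, and this holds whether the engine uses row or column vectors. Switching the mode itself would then be done with `SPNormalsMode` as documented in the GBI.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if the 3x3 linear part m is NOT just
 * rotation * uniform scale, i.e. if transforming normals by M alone would
 * distort lighting. tol is a relative tolerance, e.g. 1e-4f. */
static bool matrix_needs_auto_normals(const float m[3][3], float tol) {
    float col[3][3];
    for (int c = 0; c < 3; c++) {
        for (int r = 0; r < 3; r++) {
            col[c][r] = m[r][c];
        }
    }
    float len2[3];
    for (int c = 0; c < 3; c++) {
        len2[c] = col[c][0] * col[c][0] + col[c][1] * col[c][1] +
                  col[c][2] * col[c][2];
    }
    float ref = len2[0];
    if (ref == 0.0f) {
        return true; /* degenerate matrix: flag it rather than assume FAST */
    }
    /* Nonuniform scale: the columns have different lengths. */
    for (int c = 1; c < 3; c++) {
        if (fabsf(len2[c] - ref) > tol * ref) {
            return true;
        }
    }
    /* Shear: the columns are not mutually orthogonal. */
    for (int a = 0; a < 3; a++) {
        for (int b = a + 1; b < 3; b++) {
            float d = col[a][0] * col[b][0] + col[a][1] * col[b][1] +
                      col[a][2] * col[b][2];
            if (fabsf(d) > tol * ref) {
                return true;
            }
        }
    }
    return false;
}
```

An engine might run this only when an object's scale changes, so that the extra per-matrix cost of `G_NORMALS_MODE_AUTO` is paid only while the object is actually squashed or sheared.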

## Far clipping removal

@@ -90,6 +44,13 @@ though it can be seen in certain extreme cases. However, it is used on the SM64
title screen for the zoom-in on Mario's face, so this will look slightly
different.

Far clipping can be used to cull tris which are fully "fogged out" if the
background color (no skybox) is also the fog color, for performance benefits.
This effect has a bad reputation in '90s-era games for being used as a cheap
trick to hide performance problems, though it's occasionally used in "spooky"
levels in romhacks. In F3DEX3, `SPAlphaCompareCull` can be used instead of far
clipping to cull these triangles which are fully in fog.
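
The culling rule can be modeled on the CPU as below. This is a hedged sketch, not the microcode. The enum names are hypothetical stand-ins for the `G_ALPHA_COMPARE_CULL_*` modes. The `>=` test for "above" follows the `gbi.h` comment in this commit ("Cull tris where all three vertex shade alpha are >= 0xFF"), while the strict `<` test for "below" is an assumption. For the fog use case, the sketch relies on `G_FOG` writing the fog factor into vertex shade alpha, with 0xFF meaning fully fogged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GBI's alpha compare cull modes. */
typedef enum {
    SKETCH_CULL_DISABLE,
    SKETCH_CULL_BELOW,
    SKETCH_CULL_ABOVE
} AlphaCullSketchMode;

/* Returns true if a tri with vertex shade alphas a0, a1, a2 would be culled.
 * A tri that straddles the threshold is never culled. */
static bool alpha_compare_cull(AlphaCullSketchMode mode, uint8_t thresh,
                               uint8_t a0, uint8_t a1, uint8_t a2) {
    switch (mode) {
    case SKETCH_CULL_ABOVE: /* all three >= thresh, per the gbi.h example */
        return a0 >= thresh && a1 >= thresh && a2 >= thresh;
    case SKETCH_CULL_BELOW: /* assumed: all three strictly < thresh */
        return a0 < thresh && a1 < thresh && a2 < thresh;
    case SKETCH_CULL_DISABLE:
    default:
        return false; /* tris are drawn normally */
    }
}
```

With "above" mode and a threshold of 0xFF, only tris whose three vertices are all fully fogged are dropped. A tri with even one vertex partially visible through the fog is still drawn.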

The removal of far clipping saved a bunch of DMEM space, and enabled other
changes to the clipping implementation which saved even more DMEM space.

@@ -102,11 +63,11 @@ distance in front of the camera plane.

A few clever romhackers figured out that you could shrink the normals on verts
in your mesh (so their length is less than "1") to make the lighting on those
verts dimmer and create a version of ambient occlusion. In the "advanced"
lighting codepath, F3DEX3 normalizes vertex normals after transforming them,
which is required for point lights, specular, and Fresnel, so this no longer
works. However, F3DEX3 has support for ambient occlusion via vertex alpha, which
accomplishes the same goal with some extra benefits:
- Much easier to create: just paint the vertex alpha in Blender / fast64. The
  scaled normals approach was not supported in fast64 and had to be done with
  scripts or by hand.
@@ -118,16 +79,12 @@ vertex alpha, which accomplishes the same goal with some extra benefits:
  scaled normals never affect the ambient light, contrary to the concept of
  ambient occlusion.

Furthermore, for partial HLE compatibility, the same mesh can have the ambient
occlusion information encoded in both scaled normals and vertex alpha at the
same time. HLE will ignore the vertex alpha AO but use the scaled normals;
F3DEX3 will fix the normals' scale but then apply the AO.

The only case where scaled normals work but F3DEX3 AO doesn't work is for meshes
with vertex alpha actually used for transparency (therefore also no fog).

Note that in LVP mode, scaled normals are supported and work the same way as in
F3DEX2, while ambient occlusion is not supported.

Note that in the "basic" lighting codepath in F3DEX3, vertex normals are treated
the same way as in F3DEX2, so scaled normals are supported there. Ambient
occlusion is also supported there.

## RDP temporary buffers shrinking

@@ -161,20 +118,13 @@ In F3DEX2, the RSP time for drawing non-textured tris was significantly lower
than for textured tris, by skipping a chunk of computation for the texture
coefficients if they were disabled. In F3DEX3, no computation is skipped when
textures are disabled. However, almost all materials use textures, and F3DEX3 is
a little faster at drawing textured tris than F3DEX2. Plus, F3DEX3 still does
not send the texture coefficients if they are disabled, saving DRAM access time
for RSP -> FIFO and FIFO -> RDP. RDP time savings from avoiding loading a
texture are unaffected, of course.

## Obscure semantic differences from F3DEX2 that should never matter in practice

- `SPLoadUcode*` corrupts the current M inverse transpose matrix state. If using
  `G_NORMALS_MODE_FAST`, this doesn't matter. If using `G_NORMALS_MODE_AUTO`,
  you must send the M matrix to the RSP again after returning to F3DEX3 from the
  other microcode (which would normally be done anyway when starting to draw the
  next object). If using `G_NORMALS_MODE_MANUAL`, you must send the updated
  M inverse transpose matrix to the RSP after returning to F3DEX3 from the other
  microcode (which would normally be done anyway when starting to draw the next
  object).
- Changing fog settings (i.e. enabling or disabling `G_FOG` in the geometry mode
  or executing `SPFogFactor` or `SPFogPosition`) between loading verts and
  drawing tris with those verts will lead to incorrect fog values for those

@@ -1,96 +1,71 @@
@page performance Performance Results

# Philosophy

The base version of F3DEX3 was created for RDP-bound games like OoT, where new
visual effects are desired and increasing the RSP time a bit does not affect the
overall performance. If your game is RSP-bound, using the base version of F3DEX3
will make it slower.

# Performance Results

F3DEX3_NOC matches or beats the RSP performance of F3DEX2 on **all** critical
paths in the microcode, including command dispatch, vertex processing, and
triangle processing. Then, the RDP and memory traffic performance improvements
of F3DEX3 (56-vertex buffer, auto-batched rendering, etc.) should further
improve overall game performance from there.

## Cycle Counts

These are cycle counts for many key paths in the microcode. Lower numbers are
better. The timings are hand-counted taking into account all pipeline stalls and
all dual-issue conditions. Instruction alignment after branches is usually taken
into account, but in some cases it is assumed to be optimal.

Vertex / lighting numbers assume no special features (texgen, packed normals,
etc.). All numbers assume default profiling configuration. Tri numbers assume
texture, shade, and Z, and not flushing the buffer. Tri numbers are measured
from the first cycle of the command handler inclusive, to the first cycle of
whatever is after `$ra` exclusive; this is in order to capture the extra
latency and stalls in F3DEX2. Cells marked TODO have not been measured yet.

| | F3DEX2 | F3DEX3_NOC | F3DEX3 |
|----------------------------|--------|------------|--------|
| Command dispatch | 12 | 12 | 12 |
| Small RDP command | 14 | 5 | 5 |
| Only/2nd tri to offscreen | 27 | 26 | 26 |
| 1st tri to offscreen | 28 | 27 | 27 |
| Only/2nd tri to clip | 32 | 31 | 31 |
| 1st tri to clip | 33 | 32 | 32 |
| Only/2nd tri to backface | 38 | 38 | 38 |
| 1st tri to backface | 39 | 39 | 39 |
| Only/2nd tri to degenerate | 42 | 40 | 40 |
| 1st tri to degenerate | 43 | 41 | 41 |
| Only/2nd tri to occluded | Can't | Can't | 49 |
| 1st tri to occluded | Can't | Can't | 50 |
| Only/2nd tri to draw | 172 | 159 | 162 |
| 1st tri to draw | 173 | 160 | 163 |
| Vtx before DMA start | 16 | 17 | 17 |
| Vtx pair, no lighting | 54 | 54 | 70 |
| Vtx pair, 0 dir lts | Can't | 65 | 81 |
| Vtx pair, 1 dir lt | 73 | 70 | 86 |
| Vtx pair, 2 dir lts | 76 | 77 | 93 |
| Vtx pair, 3 dir lts | 88 | 84 | 100 |
| Vtx pair, 4 dir lts | 91 | 91 | 107 |
| Vtx pair, 5 dir lts | 103 | 98 | 114 |
| Vtx pair, 6 dir lts | 106 | 105 | 121 |
| Vtx pair, 7 dir lts | 118 | 112 | 128 |
| Vtx pair, 8 dir lts | Can't | 119 | 135 |
| Vtx pair, 9 dir lts | Can't | 126 | 142 |
| Vtx pair, 0 point lts | Can't | TODO | +16 |
| Vtx pair, 1 point lt | TODO | TODO | +16 |
| Vtx pair, 2 point lts | TODO | TODO | +16 |
| Vtx pair, 3 point lts | TODO | TODO | +16 |
| Vtx pair, 4 point lts | TODO | TODO | +16 |
| Vtx pair, 5 point lts | TODO | TODO | +16 |
| Vtx pair, 6 point lts | TODO | TODO | +16 |
| Vtx pair, 7 point lts | TODO | TODO | +16 |
| Vtx pair, 8 point lts | Can't | TODO | +16 |
| Vtx pair, 9 point lts | Can't | TODO | +16 |
| Light dir xfrm, 0 dir lts | Can't | 92 | 92 |
| Light dir xfrm, 1 dir lt | 141 | 92 | 92 |
| Light dir xfrm, 2 dir lts | 180 | 93 | 93 |
| Light dir xfrm, 3 dir lts | 219 | 118 | 118 |
| Light dir xfrm, 4 dir lts | 258 | 119 | 119 |
| Light dir xfrm, 5 dir lts | 297 | 144 | 144 |
| Light dir xfrm, 6 dir lts | 336 | 145 | 145 |
| Light dir xfrm, 7 dir lts | 375 | 170 | 170 |
| Light dir xfrm, 8 dir lts | Can't | 171 | 171 |
| Light dir xfrm, 9 dir lts | Can't | 196 | 196 |

## Measurements

Vertex processing time as reported by the performance counter in the `PA`
configuration:
- Scene 1: Kakariko, adult day, from DMT entrance
- Scene 2: Custom empty scene with Suzanne monkey head with 1 dir light
- Scene 3: Same but Suzanne has vertex colors instead of lighting (Link is still
  on screen and has lighting)

| Microcode | Scene 1 | Scene 2 | Scene 3 |
|----------------|---------|---------|---------|
| F3DEX3 | 7.41ms | 2.99ms | 2.22ms |
| F3DEX3_NOC | 6.85ms | 2.75ms | 1.98ms |
| F3DEX3_LVP | 4.12ms | 1.59ms | 1.48ms |
| F3DEX3_LVP_NOC | 3.34ms | 1.27ms | 1.16ms |
| F3DEX2 | Can't* | Can't* | Can't* |
| Vertex count | 3557 | 1548 | 1548 |

*F3DEX2 does not contain performance counters, so the portion of the RSP time
taken for vertex processing cannot be measured.
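
The "Vtx pair" rows of the cycle count table can be turned into a rough lower bound on vertex processing time. The C sketch below makes two assumptions: the commonly cited 62.5 MHz RSP clock (62.5 cycles per microsecond), and that an odd vertex count still costs a whole pair. It ignores DMA waits, command dispatch, and all other overheads, so it is not a prediction of measured times. The function names are hypothetical.

```c
/* Assumption: RSP clock of 62.5 MHz, i.e. 62.5 cycles per microsecond. */
#define RSP_CYCLES_PER_US 62.5

/* Total cycles for processing vertex_count vertices at cycles_per_pair,
 * assuming an odd vertex still costs a whole pair. */
static unsigned long vtx_cycles(unsigned vertex_count, unsigned cycles_per_pair) {
    unsigned long pairs = (vertex_count + 1ul) / 2ul;
    return pairs * (unsigned long)cycles_per_pair;
}

/* Lower-bound time estimate in microseconds (pure vertex pipeline only). */
static double vtx_time_us(unsigned vertex_count, unsigned cycles_per_pair) {
    return (double)vtx_cycles(vertex_count, cycles_per_pair) / RSP_CYCLES_PER_US;
}
```

For example, a 1548-vertex mesh with one directional light comes to 56502 cycles (about 904 microseconds) at F3DEX2's 73 cycles per pair, and 66564 cycles (about 1065 microseconds) at F3DEX3's 86 cycles per pair, by this estimate.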

@@ -6,14 +6,14 @@ For an OoT codebase, only a few minor changes are required to use F3DEX3.
However, more changes are recommended to increase performance and enable new
features.

How to modify the microcode in your HackerOoT-based romhack (note that this is
already done in HackerOoT, so this is provided as a guide for other games):
- Replace `include/ultra64/gbi.h` in your romhack with `gbi.h` from this repo.
- Make the "Required Changes" listed below.
- Build this repo: install the latest version of `armips`, then `make
  F3DEX3_BrZ` or `make F3DEX3_BrW`.
- Copy the microcode binaries (`build/F3DEX3_X/F3DEX3_X.code` and
  `build/F3DEX3_X/F3DEX3_X.data`) to `data` in your romhack repo.
- In `data/rsp.rodata.s`, change the line between `fifoTextStart` and
  `fifoTextEnd` to `.incbin "data/F3DEX3_X.code"` (or wherever you put the
  binary), and similarly change the line between `fifoDataStart` and
@@ -41,9 +41,12 @@ Both OoT and SM64:
  dynamically) (search for `Vp` case-sensitive, `SPViewport`, and `G_MAXZ`),
  change the maximum Z value from `G_MAXZ` to `G_NEW_MAXZ` and negate the
  Y scale. For more information, see the comment next to `G_MAXZ` in the GBI.
  Note that your romhack codebase may have the constant hardcoded (usually as
  `511`, which is supposed to be `(G_MAXZ/2)`), instead of actually writing an
  expression containing `G_MAXZ`; you need to change these too, and there are
  several of these in SM64. Fortunately, it is easy to notice if you have failed
  to update a Y scale, as anything drawn using that viewport will be upside
  down.
- Remove uses of internal GBI features which have been removed in F3DEX3 (see
  @ref compatibility for full list). In OoT, the only changes needed are:
  - In `src/code/ucode_disas.c`, remove the switch statement cases for
@@ -51,20 +54,22 @@ Both OoT and SM64:
    and `G_MW_PERSPNORM`.
  - In `src/libultra/gu/lookathil.c`, remove the lines which set the `col`,
    `colc`, and `pad` fields.
  - As mentioned above, in each place `G_MAXZ` is used, a compiler error will
    be generated; negate the Y scale in each related viewport and change the
    Z scale and offset to use `G_NEW_MAXZ`.
- Change your game engine lighting code to set the `type` (formerly `pad1`)
  field to 0 in the initialization of any directional light (`Light_t` and
  derived structs like `Light` or `Lightsn`). This change is required because
  otherwise garbage nonzero values may be put in this byte, which was a padding
  byte for a non-point-light microcode but is used to identify the light as
  point or directional in a point light microcode.
  - The change needed in OoT is: in `src/code/z_lights.c`, in
    `Lights_BindPoint`, `Lights_BindDirectional`, and `Lights_NewAndDraw`, set
    `l.type` to 0 right before setting `l.col`.
- If your game already had point lighting, use `ENABLE_POINT_LIGHTS` instead
  of `G_LIGHTING_POSITIONAL` to indicate that point lights are currently active.
  (Static uses of `G_LIGHTING_POSITIONAL` in display lists need not be removed,
  as this bit is ignored.)
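
The two required data changes above, a negated Y scale with the new max Z and a zeroed `type` byte in directional lights, can be sketched in C. The structs here are hypothetical mirrors of libultra's `Vp_t` and `Light_t` layouts, with `pad1` renamed to `type` as described above, so that the example compiles on its own. In a real codebase you would use the GBI types and pass `G_NEW_MAXZ` from the F3DEX3 `gbi.h` as `maxz`; its value is not assumed here. Viewport values use the usual libultra quarter-pixel units, so half the screen width times 4 is `width * 2`.

```c
#include <string.h>

/* Hypothetical mirror of libultra's Vp_t (values in 1/4 pixel units). */
typedef struct {
    short vscale[4];
    short vtrans[4];
} VpSketch;

/* Full-screen viewport in the F3DEX3 convention: Y scale negated, Z scale and
 * offset derived from the caller-supplied max Z (G_NEW_MAXZ in a real build). */
static VpSketch make_viewport_f3dex3(int width, int height, int maxz) {
    VpSketch vp = {
        { (short)(width * 2), (short)(-height * 2), (short)(maxz / 2), 0 },
        { (short)(width * 2), (short)(height * 2), (short)(maxz / 2), 0 },
    };
    return vp;
}

/* Hypothetical mirror of Light_t with pad1 renamed to type, as in F3DEX3. */
typedef struct {
    unsigned char col[3];
    unsigned char type; /* must be 0 for a directional light */
    unsigned char colc[3];
    unsigned char pad2;
    signed char dir[3];
    unsigned char pad3;
} LightSketch;

static void init_directional_light(LightSketch *l, const unsigned char rgb[3],
                                   const signed char dir[3]) {
    memset(l, 0, sizeof(*l)); /* clear every padding byte, not just type */
    l->type = 0;              /* explicit: directional, never a point light */
    memcpy(l->col, rgb, 3);
    memcpy(l->colc, rgb, 3);
    memcpy(l->dir, dir, 3);
}
```

Zeroing the whole struct before filling it also clears the other padding bytes. This is one simple way to avoid the garbage values the text warns about.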

SM64 only:

@@ -72,15 +77,15 @@ SM64 only:
  fixed, the vanilla permanent light direction of `{0x28, 0x28, 0x28}` must be
  changed to `{0x49, 0x49, 0x49}`, or everything will be too dark. The former
  vector is not properly normalized, but F3D through F3DEX2 normalize light
  directions in the microcode, so it doesn't matter with those microcodes. The
  two lighting codepaths in F3DEX3 treat light directions and vertex normals
  differently: the fast ("basic") one works like F3DEX2, but the slow
  ("advanced") one normalizes vertex normals after transforming them and does
  not modify light directions. Since the slow codepath may be used, the light
  directions must already be normalized.
- Matrix stack fix (world space lighting / view matrix in VP instead of in M) is
  basically required. If you *really* want camera space lighting, use matrix
  stack fix, transform the fixed camera space light direction by V inverse each
  frame, and send that to the RSP.

## Recommended Changes (Non-Lighting)

@@ -88,18 +93,14 @@ SM64 only:
  use `SPLookAt` instead (this is only a few lines of changes). Also remove any
  code which writes `SPClipRatio` or `SPForceMatrix`; these are now no-ops, so
  you might as well not write them.
- Avoid using `G_MTX_MUL` and `G_MTX_PUSH` in `SPMatrix`, and `SPPopMatrix*`,
  for performance and accuracy reasons. See the GBI for more information. If
  these are only used in a couple of non-critical places, such as for GUIs,
  that's okay.
- Re-export as many display lists (scenes, objects, skeletons, etc.) as possible
  with fast64 set to F3DEX3 mode, to take advantage of the substantially larger
  vertex buffer (and, once community tools support them, the triangle packing
  commands and "hints" system).
- `#define REQUIRE_SEMICOLONS_AFTER_GBI_COMMANDS` (at the top of, or before
  including, the GBI) for a more modern, OoT-style codebase where uses of GBI
  commands require semicolons after them. SM64 omits the semicolons sometimes,
@@ -137,10 +138,9 @@ SM64 only:
  emulate point lights in a scene with a directional light recomputed per actor.
  You can now just send those to the RSP as real point lights, regardless of
  whether the display lists are vanilla or new.
- If your game already had point lighting, note that the point light kc, kl, and
  kq factors have been changed, so you will need to redesign how game engine
  light parameters (e.g. "light radius") map to these parameters.

## Changes Required for New Features

3
f3dex3.s
@@ -1346,7 +1346,7 @@ tri_noinit: // ra is next cmd, second tri in TRI2, or middle of clipping
vsub $v11, $v4, $v6 // v11 = vertex 2 - vertex 1 (x, y, addr)
vlt $v13, $v2, $v4[1] // v13 = min(v1.y, v2.y), VCO = v1.y < v2.y
bnez $11, return_and_end_mat // Then the whole tri is offscreen, cull
// 22 cycles (for tri2 first tri; tri1/only subtract 1 from counts)
vmrg tHPos, $v6, $v4 // v14 = v1.y < v2.y ? v1 : v2 (lower vertex of v1, v2)
vmudh $v29, $v10, $v12[1] // x = (v1 - v2).x * (v1 - v3).y ...
lhu $24, activeClipPlanes
@@ -3147,6 +3147,7 @@ ltbasic_setup_after_xfrm:
j vtx_after_lt_setup
li lbAfter, ltbasic_ao

.align 8
xfrm_light_store_lookat:
vmadh $v29, $v9, lpWrld[1h]
spv lpFinal[0], (xfrmLookatDirs)($zero) // Store lookat. 1st time garbage, 2nd real

10
gbi.h
@@ -2954,8 +2954,14 @@ _DW({ \


/**
 * Alpha compare culling. This was originally created as an optimization for cel
 * shading, but it can also be used for other scenarios. In particular, it can
 * be used with fog to cull tris which are entirely in the fog. This could also
 * be accomplished with far clipping, but far clipping is removed in F3DEX3.
 * ```
 * // Cull tris where all three vertex shade alpha are >= 0xFF
 * gSPAlphaCompareCull(..., G_ALPHA_COMPARE_CULL_ABOVE, 0xFF);
 * ```
 *
 * If mode == G_ALPHA_COMPARE_CULL_DISABLE, tris are drawn normally.
 *