Created backwards compatibility table, and ltadv optimizations

Sauraen
2025-06-28 21:45:38 -07:00
parent 274509f103
commit 22fb8a71c9
4 changed files with 278 additions and 123 deletions


@@ -2,68 +2,210 @@
# Backwards Compatibility with F3DEX2
## C GBI Compatibility
F3DEX3 is backwards compatible with F3DEX2 at the C GBI level for all features
and commands except:
- The viewport Y scale has been negated, and `G_MAXZ` has been renamed as its
value has changed. See the comment near `G_MAXZ` in the GBI.
- For the same reason, in `BrZ` configuration, any Z threshold values in
`SPBranchLessZ*` which are hard-coded into display lists (not based on
`G_MAXZ`) must be multiplied by 0x20.
- The `G_SPECIAL_*` command IDs have been removed. `G_SPECIAL_2` and
`G_SPECIAL_3` were no-ops in F3DEX2, and `G_SPECIAL_1` was a trigger to
recalculate the MVP matrix. There is no MVP matrix in F3DEX3 so this is
useless.
- `G_LINE3D` (and `Gfx.line`) has been removed. This command did not actually
work in F3DEX2 (it behaved as a no-op).
- `G_MW_CLIP` has been removed, and `SPClipRatio` has been converted into a
no-op. Clipping is handled differently in F3DEX3 and the clip ratio cannot be
changed from 2.
- `G_MV_MATRIX`, `G_MW_MATRIX`, and `G_MW_FORCEMTX` have been removed, and
`SPForceMatrix` has been converted into a no-op. This is because there is no
MVP matrix in F3DEX3.
- `G_MV_POINT` has been removed. This was not used in any command; it would have
likely been used for debugging to copy vertices from DMEM to examine them.
This does not affect `SPModifyVertex`, which is still supported.
- `G_MW_PERSPNORM` has been removed; `SPPerspNormalize` is still supported but
is encoded differently, no longer using this define.
- `G_MVO_LOOKATX` and `G_MVO_LOOKATY` have been removed, and `SPLookAtX` and
`SPLookAtY` are deprecated. `SPLookAtX` has been changed to set both
directions and `SPLookAtY` has been converted to a no-op. To set the lookat
directions, use `SPLookAt`. The lookat directions are now in one 8-byte DMA
word, so they must always be set at the same time as each other. Most of the
non-functional fields (e.g. color) of `LookAt` and its sub-types have been
removed, so code which accesses these fields needs to change. Code which only
accesses lookat directions should be compatible with no changes.
- As discussed above, the `pad1` field of `Light_t` is renamed to `type` and
must be set to zero.
- If you do not raise the maximum number of lights from 7 to 9, the lighting GBI
commands are backwards compatible. However, if you do raise the number of
lights, you must use `SPAmbient` to write the ambient light, as discussed
above. Note that you can now load all your lights with one command,
`SPSetLights`, so it is not usually necessary to use `SPLight` and `SPAmbient`
at all.
## Binary Display List Compatibility
F3DEX3 is backwards compatible with F3DEX2 at the C GBI level for almost all
features and commands. See @ref porting for the relatively small list of code
changes you have to make to your romhack codebase to move from F3DEX2 to F3DEX3.
Also, some relatively obscure internal GBI definitions have been removed.
F3DEX3 is generally binary backwards compatible with OoT-style display lists for
objects, scenes, etc. **It is not compatible at the binary level with SM64-style
objects, scenes, etc. **It is not binary compatible with vanilla SM64-style
display lists which encode object colors as light colors**, as all the command
encodings related to lighting have changed. Of course, if you recompile these
display lists with the new `gbi.h`, it can run them.
encodings related to lighting have changed.
The deprecated commands mentioned above in the C GBI section have had their
encodings changed (the original encodings will do bad things / crash). In
addition, all lighting-related commands--e.g. `gdSPDefLights*`, `SPNumLights`,
`SPLight`, `SPLightColor`, `SPLookAt`--have had their encodings changed, making
them binary incompatible. The lighting data structures, e.g. `Light_t`,
`PosLight_t`, `LookAt_t`, `Lightsn`, `Lights*`, `PosLights*`, etc., have also
changed--generally only slightly, so most code is compatible with no changes.
## GBI Changes Reference
`SPSegment` has been given a different command id (`G_RELSEGMENT` vs.
`G_MOVEWORD`) to facilitate relative segmented address translation. The
original binary encoding is still valid, but does not support relative
translation like the new encoding. However, recompiling with the C GBI will
always use the new encoding.
This is a reference if you run into GBI-related problems when building your
romhack after porting it to F3DEX3, or for HLE emulator authors implementing
changes from F3DEX2 to F3DEX3. The "Bin" and "C" columns indicate whether there
have been changes from F3DEX2 to F3DEX3 in binary encoding and C GBI usage
respectively. The "Perf" column indicates whether the performance of this
command (or the overall game performance if this command is used properly) has
significantly increased or decreased relative to F3DEX2 in a way that the
programmer should be aware of. The `g`, `gs`, or `gd` prefixes are all omitted,
e.g. `SPMatrix` refers to `gSPMatrix` and `gsSPMatrix`. `*` means wildcard.
### RDP Commands
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `DPLoadTLUT*` | = | = | Up | Load is not sent to RDP if repeated in auto-batched rendering. See the GBI comment near `SPDontSkipTexLoadsAcross`. This is a performance optimization only and doesn't affect on-screen output unless the game is buggy / misusing the feature, so this behavior need not be emulated in HLE. |
| `DPLoadBlock*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `DPLoadTile*` | = | = | Up | Same as `DPLoadTLUT*` above. |
| `SPSetOtherMode` | = | = | | |
| All other `DP*` | = | = | | Microcode generally can't change RDP command behavior. |
### Main Drawing
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `SPVertex` | = | = | Up | Vertex buffer size in F3DEX3 is 56, up from 32 in F3DEX2. Also, many of the new features in F3DEX3 (new lighting, occlusion plane, etc.) are applied during `SPVertex` processing. |
| `Vtx_t` / `Vtx` | * | * | | Generally the same, but packed normals go in the `flag` field if enabled. |
| `SPModifyVertex` | = | = | | |
| `G_MWO_POINT_RGBA` | = | = | | |
| `G_MWO_POINT_ST` | = | = | | |
| `G_MWO_POINT_XYSCREEN` | = | = | | |
| `G_MWO_POINT_ZSCREEN` | = | = | | |
| `G_MV_POINT` | Rem | Rem | | Removed because the internal vertex format is no longer a multiple of 8 (DMA word). |
| `SPTexture` | = | = | | |
| `SPTextureL` | = | = | | HW V1 workaround; long since deprecated. |
| `SP1Triangle` | = | = | Up | Some of the new features in F3DEX3 (occlusion plane, alpha compare culling, decal fix) are applied during triangle processing. |
| `SP2Triangles` | = | = | Up | Same as `SP1Triangle` above. |
| `SP1Quadrangle` | = | = | Up | Same as `SP1Triangle` above. |
| `SPTriStrip` | New | New | Up | New command that draws 5 tris from 7 indexes, see GBI. |
| `SPTriFan` | New | New | Up | New command that draws 5 tris from 7 indexes, see GBI. |
| `SPMemset` | New | New | Up | New command that memsets a RDRAM region faster than the RDP can, for framebuffer or Z-buffer clear. |
| `G_LINE3D` | Rem | Rem | | Removed; no-op in F3DEX2. |
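
The "5 tris from 7 indexes" figure for `SPTriStrip` and `SPTriFan` matches the usual strip and fan expansion, where `n` indexes yield `n - 2` triangles. The following is a minimal sketch of that conventional expansion only. `DemoTri`, `demo_expand_strip`, and `demo_expand_fan` are hypothetical names, not part of the GBI, and the winding order and any early-termination rules of the real commands are not modeled here; see the GBI for the actual behavior.

```c
#include <stddef.h>

/* Hypothetical triangle record for illustration; not a type from gbi.h. */
typedef struct { int v0, v1, v2; } DemoTri;

/* Conventional strip expansion: triangle i uses indexes i, i+1, i+2.
 * Returns the number of triangles written (n - 2, or 0 if n < 3).
 * Winding order is NOT alternated here (assumption: left out for brevity). */
static size_t demo_expand_strip(const int *idx, size_t n, DemoTri *out) {
    if (n < 3) return 0;
    for (size_t i = 0; i + 2 < n; i++) {
        out[i] = (DemoTri){ idx[i], idx[i + 1], idx[i + 2] };
    }
    return n - 2;
}

/* Conventional fan expansion: every triangle shares the first index. */
static size_t demo_expand_fan(const int *idx, size_t n, DemoTri *out) {
    if (n < 3) return 0;
    for (size_t i = 0; i + 2 < n; i++) {
        out[i] = (DemoTri){ idx[0], idx[i + 1], idx[i + 2] };
    }
    return n - 2;
}
```

With 7 indexes, both helpers write 5 triangles, which is where the count in the table comes from.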
### Control Logic
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `SPNoOp` | = | = | | |
| `SPDisplayList*` | = | = | | Hints are encoded into previously unused bits, but this is a performance optimization only and will never affect on-screen output, so the hints encoding can be ignored by HLE. |
| `G_DL_PUSH` | = | = | | |
| `SPBranchList*` | = | = | | Same as `SPDisplayList*` above. |
| `G_DL_NOPUSH` | = | = | | |
| `SPEndDisplayList*` | = | = | | Same as `SPDisplayList*` above. |
| `SPCullDisplayList` | = | = | | |
| `SPBranchLess*` | * | * | | In `BrZ` configuration, Z threshold values which are hard-coded into display lists (not based on `G_MAXZ`) must be multiplied by 0x20. See `G_MAXZ` below. |
| `SPLoadUcode*` | = | = | | Note that F3DEX3_PC (CFG_PROFILING_C) may have compatibility problems with other microcodes. It is specially designed to work with S2DEX for OoT but other microcodes are not guaranteed to work. This is not a limitation in other F3DEX3 variants. |
| `SPDma*` | = | = | Down | Moved to Overlay 3 (slower) as it is rarely used. HLE can't emulate this by definition so must treat it as a no-op; games therefore use it for HLE/LLE detection. |
| `SPSegment` | * | * | | F3DEX3 still accepts the F3DEX2 binary encoding for `SPSegment`, but that encoding does not have the relative segment resolution behavior. The new behavior is obtained with the new command encoding with `G_RELSEGMENT`. |
| `G_MW_SEGMENT` | = | = | | |
| `G_MWO_SEGMENT_*` | = | = | | These were never needed. |
| `SPFlush` | New | New | Up | This is a performance optimization only and can't be HLE emulated, so it should be treated as a no-op. |
| `G*` (`Gfx` subtypes) | ? | ? | | Deprecated. These did not fully reflect the bit usage in actual commands even in F3DEX2. These have mostly not been updated for F3DEX3. |
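
For the `SPBranchLess*` note above, the adjustment for hard-coded Z thresholds in the `BrZ` configuration is a plain multiplication by 0x20. A minimal sketch, assuming the threshold is held in an unsigned integer; `demo_scale_brz_threshold` is a hypothetical helper, not a GBI macro:

```c
#include <stdint.h>

/* Convert a Z threshold that was hard-coded for F3DEX2 (i.e. not derived
 * from G_MAXZ) to the F3DEX3 BrZ scale by multiplying it by 0x20.
 * Hypothetical helper for illustration only. */
static uint32_t demo_scale_brz_threshold(uint32_t f3dex2_zval) {
    return f3dex2_zval * 0x20u;
}
```

Thresholds that are already expressed in terms of the maximum-Z constant do not need this step, since the constant itself has changed.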
### 3D Space
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `Mtx` | = | = | | |
| `SPMatrix` | Chg | = | * | Encoding changed due to multiple flags below changing. |
| `G_MTX_PUSH` | = | = | Down | `SPMatrix` processing with `G_MTX_PUSH` set is moved to Overlay 3 (slower) as games should not use the RSP matrix stack for accuracy and performance reasons (see GBI). |
| `G_MTX_NOPUSH` | = | = | | |
| `G_MTX_LOAD` | Chg | = | | Encoding inverted (in SPMatrix, not in the definition of `G_MTX_LOAD`). |
| `G_MTX_MUL` | Chg | = | | Encoding inverted (in SPMatrix, not in the definition of `G_MTX_MUL`). |
| `G_MTX_MODEL` | = | New | | New name for `G_MTX_MODELVIEW` as the view matrix must be multiplied into the projection matrix stack in F3DEX3. |
| `G_MTX_VIEWPROJECTION` | = | New | | New name for `G_MTX_PROJECTION` as the view matrix must be multiplied into the projection matrix stack in F3DEX3. |
| `G_MV_MMTX` | Chg | New | | Encoding changed. |
| `G_MV_TEMPMTX0` | Chg | = | | Encoding changed. |
| `G_MV_VPMTX` | Chg | New | | New name for `G_MV_PMTX`, encoding changed. |
| `G_MV_TEMPMTX1` | Chg | = | | Encoding changed. |
| `SPPopMatrix*` | Chg | = | Down | Moved to Overlay 3 (slower) as games should not use the RSP matrix stack for accuracy and performance reasons (see GBI). Encoding is changed due to `G_MV_MMTX` changing. |
| `SPForceMatrix` | Chg | Chg | | Converted into no-op. |
| `G_MV_MATRIX` | Rem | Rem | | Removed. |
| `G_MW_MATRIX` | Rem | Rem | | Removed. |
| `G_MW_FORCEMTX` | Rem | Rem | | Removed. |
| `SPViewport` | * | * | | Command itself is the same, but see `Vp` below. |
| `Vp_t` / `Vp` | Chg | Chg | | The Y scale is now negated, and the Z values are different due to the change from `G_MAXZ` to `G_NEW_MAXZ`. |
| `G_MAXZ` | Rem | Rem | | Replaced with `G_NEW_MAXZ`. The name change is to force you to update your code--especially viewport definitions with hardcoded constants which are NOT defined in terms of `G_MAXZ`. |
| `G_NEW_MAXZ` | New | New | | The equivalent of `G_MAXZ` constant used in viewport calculations. |
| `G_MV_VIEWPORT` | = | = | | |
| `SPPerspNormalize` | Chg | = | | Encoding changed. |
| `G_MW_PERSPNORM` | Rem | Rem | | Removed. The perspective normalization factor is set via `G_MW_FX` with the changed encoding of `SPPerspNormalize`. |
| `G_MWO_PERSPNORM` | New | New | | |
| `SPClipRatio` | Chg | Chg | | Converted into no-op. It is not possible to change the clip ratio from 2 in F3DEX3. |
| `G_MW_CLIP` | Rem | Rem | | Removed. See `SPClipRatio` above. |
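
As a rough illustration of the `Vp` change in the table above, the sketch below negates the Y scale of an F3DEX2-style viewport. `DemoVp` is a stand-in struct assumed to mirror the usual scale/translate layout of four 16-bit values each, and `demo_negate_y_scale` is a hypothetical helper. The Z scale and translation also need updating for `G_NEW_MAXZ`, which this sketch deliberately leaves to the caller.

```c
#include <stdint.h>

/* Stand-in for a viewport: scale and translation, four 16-bit values each.
 * Assumed layout for illustration; use the real Vp from gbi.h in a game. */
typedef struct {
    int16_t vscale[4];
    int16_t vtrans[4];
} DemoVp;

/* Return a copy of an F3DEX2-style viewport with the Y scale negated.
 * X, Z, and the translation are left untouched here; the Z entries must
 * separately be redefined in terms of G_NEW_MAXZ. */
static DemoVp demo_negate_y_scale(DemoVp vp) {
    vp.vscale[1] = (int16_t)(-vp.vscale[1]);
    return vp;
}
```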
### Lighting
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `Light_t`, `Light` | Chg | * | | `type` field must be set to 0 (`LIGHT_TYPE_DIR`) to indicate directional light. `size` field for specular added. Otherwise the same, though note that now there is not an extra 8 bytes of padding between lights (the offset between them is 16, not 24). |
| `LIGHT_TYPE_DIR` | New | New | | New macro, but the encoding is the same as F3DEX2_PL. |
| `PointLight_t` | Chg | * | | Same changes as `Light_t`. Also note that the `kq` field is now interpreted as an E3M5 floating-point number. |
| `LIGHT_TYPE_POINT` | New | New | | New macro, but the encoding is the same as F3DEX2_PL. |
| `Ambient_t`, `Ambient` | = | = | | Note that you must use `Ambient`, not `Light`, for the ambient light if you have 9 directional/point lights. |
| `Lights1`, `Lights2`, ... | Chg | * | | The ambient light is at the end, not the beginning. The data layout matches the RSP internal data layout to enable `SPSetLights`. |
| `Lightsn` | Chg | * | | Same as `Lights1` etc. Also, now 9 directional/point lights. |
| `Lights0` | Chg | Chg | | Now only contains Ambient (no Light) because F3DEX3 properly supports zero directional/point lights. |
| `SPDefLights*` | Chg | = | | Matches changes in `Lights*`. Also, there is no need for these in a game with a real lighting engine. |
| `SPDefPointLights*` | Chg | = | | Matches changes in `Lights*`. Also, there is no need for these in a game with a real lighting engine. |
| `SPNumLights` | Chg | Chg | | Encoding changed. `ENABLE_POINT_LIGHTS` can now be included. Zero lights is properly supported unlike in F3DEX2. The maximum number of point/directional lights is 9, up from 7. |
| `G_MW_NUMLIGHT` | = | = | | |
| `G_MWO_NUMLIGHT` | = | = | | |
| `NUML` | Chg | = | | Encoding changed. |
| `NUMLIGHTS_*` | Chg | = | | Deprecated as these are just defined equal to their number, because F3DEX3 supports zero lights. |
| `LIGHT_*` | = | = | | Deprecated and were never useful. |
| `SPLight` | Chg | = | | Encoding changed. Note that you must use `SPAmbient`, not `SPLight`, for the ambient light if you have 9 directional/point lights. Also note that you should usually use `SPSetLights` unless you need to set individual lights without affecting the others. |
| `SPAmbient` | New | New | | New command to upload the ambient light. If you have 0-8 directional/point lights, you can also use `SPLight` for this (slightly slower), but if you have 9 directional/point lights you must use `SPAmbient`. |
| `SPLightColor*` | Chg | = | | Encoding changed. |
| `G_MW_LIGHTCOL` | = | = | | |
| `G_MV_LIGHT` | = | = | | |
| `SPSetLights*` | Chg | * | Up | Implementation completely different from F3DEX2, uses one DMA transaction regardless of the number of lights. In C, you can/should use dynamically allocated memory for the `Lights*` struct being uploaded, as opposed to `SPDefLights*`, but you need to dereference the pointer passed to `SPSetLights*`. |
| `G_MWO_aLIGHT_*` | Chg | = | | Encodings changed. No longer needed. |
| `G_MWO_bLIGHT_*` | Chg | = | | Encodings changed. No longer needed. |
| `G_MVO_L*` | Rem | Rem | | Removed. |
| `SPCameraWorld` | New | New | | New command to set the camera position for Fresnel. |
| `PlainVtx` | New | New | | For `SPCameraWorld`. |
| `SPLookAt` | New | New | | Replaces `SPLookAtX` and `SPLookAtY`. |
| `SPLookAtX` | Chg | * | | Encoding changed; in an attempt at backwards compatibility, defined as `SPLookAt`, which works with basic usage. |
| `SPLookAtY` | Chg | * | | Converted to no-op. |
| `G_MVO_LOOKAT*` | Rem | Rem | | Removed with `SPLookAt` changes. |
| `LookAt_t`, `LookAt` | Chg | * | | The size is different and most of the non-functional fields have been removed. Code which only accesses the functional fields does not need to change. |
| `Hilite_t`, `Hilite` | = | = | | |
| `SPFog*` | = | = | | |
| `G_MW_FOG` | = | = | | |
| `G_MWO_FOG` | = | = | | |
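
The `Lights*` layout change described above (directional/point lights first, ambient light last) can be pictured with stand-in structs. This is a sketch under assumptions: `DemoLight` is taken to be 16 bytes, matching the 16-byte offset between lights mentioned in the table, `DemoAmbient` is assumed to be 8 bytes, and none of these names come from `gbi.h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; sizes are assumptions for illustration. */
typedef struct { uint8_t bytes[16]; } DemoLight;   /* one directional/point light */
typedef struct { uint8_t bytes[8];  } DemoAmbient; /* ambient light */

/* F3DEX3-style ordering: lights first, ambient at the end. */
typedef struct {
    DemoLight   l[2];
    DemoAmbient a;
} DemoLights2;

/* Byte offset of the ambient light in a block holding num_lights lights. */
static size_t demo_ambient_offset(size_t num_lights) {
    return num_lights * sizeof(DemoLight);
}
```

Because the whole block is contiguous and ends with the ambient light, a single upload of the struct can cover every light, which is consistent with `SPSetLights*` using one DMA transaction.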
### Geometry Mode and New Effect Parameters
| Command | Bin | C | Perf | Notes |
|--------------------------|-----|-----|------|-------|
| `SP*GeometryMode*` | * | * | | Commands themselves are the same, but many new geometry mode flags, see below. |
| `G_ZBUFFER` | = | = | | |
| `G_TEXTURE_ENABLE` | = | = | | Very old (F3D / HW v1) display lists with this bit set will no longer crash on F3DEX3, unlike F3DEX2. |
| `G_SHADE` | = | = | | |
| `G_ATTROFFSET_ST_ENABLE` | New | New | | New geometry mode bit that enables ST attribute offsets, usually for smooth scrolling. |
| `SPAttrOffsetST` | New | New | | New command which writes ST attribute offsets using `G_MWO_ATTR_OFFSET_*`. |
| `G_MWO_ATTR_OFFSET_S` | New | New | | |
| `G_MWO_ATTR_OFFSET_T` | New | New | | |
| `G_AMBOCCLUSION` | New | New | | |
| `SPAmbOcclusion*` | New | New | | New commands which write ambient occlusion parameters using `G_MWO_AO_*`. |
| `G_MWO_AO_AMBIENT` | New | New | | |
| `G_MWO_AO_DIRECTIONAL` | New | New | | |
| `G_MWO_AO_POINT` | New | New | | |
| `G_CULL_NEITHER` | = | = | | |
| `G_CULL_FRONT` | = | = | | |
| `G_CULL_BACK` | = | = | | |
| `G_CULL_BOTH` | = | = | | |
| `G_PACKED_NORMALS` | New | New | | New geometry mode bit that enables packed normals (simultaneous lighting and vertex colors). |
| `G_LIGHTTOALPHA` | New | New | | New geometry mode bit that moves the maximum of the three light color channels to shade alpha, usually for cel shading. |
| `G_LIGHTING_SPECULAR` | New | New | | New geometry mode bit that changes lighting from diffuse to specular. |
| `G_FRESNEL_COLOR` | New | New | | New geometry mode bit that computes Fresnel and places it in all three shade color channels. |
| `G_FRESNEL_ALPHA` | New | New | | New geometry mode bit that computes Fresnel and places it in shade alpha. |
| `SPFresnel*` | New | New | | New commands which write Fresnel parameters using `G_MWO_FRESNEL_*`. |
| `G_MWO_FRESNEL_SCALE` | New | New | | |
| `G_MWO_FRESNEL_OFFSET` | New | New | | |
| `G_FOG` | = | = | | |
| `G_LIGHTING` | = | = | | |
| `G_TEXTURE_GEN` | = | = | | |
| `G_TEXTURE_GEN_LINEAR` | = | = | | |
| `G_LOD` | = | = | | Ignored by all F3DEX* variants. |
| `G_SHADING_SMOOTH` | = | = | | |
| `G_LIGHTING_POSITIONAL` | Chg | Chg | | This bit is ignored by F3DEX3--both in order to allow point lighting on all vanilla geometry, and because the F3DEX2_PL design of having this as a property of an object/model rather than a property of the lights state is poor design. In F3DEX3, whether point lights are present or not is determined by the `ENABLE_POINT_LIGHTS` flag in `SPNumLights` and `SPSetLights*`. |
| `G_CLIPPING` | = | = | | Ignored by all F3DEX* variants. |
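
The `G_LIGHTTOALPHA` row above describes taking the maximum of the three light color channels as shade alpha. A minimal scalar sketch of that reduction, assuming 8-bit channels; `demo_light_to_alpha` is a hypothetical name and this does not reproduce the microcode's vector implementation:

```c
#include <stdint.h>

/* Shade alpha for cel shading: the largest of the R, G, B light levels.
 * Hypothetical helper for illustration only. */
static uint8_t demo_light_to_alpha(uint8_t r, uint8_t g, uint8_t b) {
    uint8_t m = (r > g) ? r : g;
    return (m > b) ? m : b;
}
```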
### Miscellaneous
| Command | Bin | C | Perf | Notes |
|----------------------|-----|-----|------|-------|
| `SPOcclusionPlane` | New | New | | New command that uploads the occlusion plane coefficients. |
| `OcclusionPlane*` | New | New | | Structs for occlusion plane. |
| `SPLightToRDP` | New | New | | New command that copies RSP light color to RDP color, see GBI. |
| `SPLightToPrimColor` | New | New | | Same as `SPLightToRDP` above. |
| `SPLightToFogColor` | New | New | | Same as `SPLightToRDP` above. |
| `SPDontSkipTexLoadsAcross` | New | New | Up | New command which locally cancels auto-batched rendering by writing an invalid address to `G_MWO_LAST_MAT_DL_ADDR`. |
| `G_MWO_LAST_MAT_DL_ADDR` | New | New | | |
| `SPAlphaCompareCull` | New | New | Up | New command which enables culling of tris based on shade alpha values, for cel shading. Normal use of this command in cel shading is a performance optimization only and doesn't affect on-screen output, so it can be treated as a no-op by an initial HLE implementation. But it is easy to write a display list where it does affect on-screen output, so a good HLE implementation should emulate it. |
| `G_ALPHA_COMPARE_CULL_DISABLE` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_ALPHA_COMPARE_CULL_BELOW` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_ALPHA_COMPARE_CULL_ABOVE` | New | New | | Settings for `SPAlphaCompareCull`. |
| `G_MWO_ALPHA_COMPARE_CULL` | New | New | | |
| `MoveWd` | = | = | | Regular/valid encodings are the same. |
| `MoveHalfwd` | New | New | | Like `MoveWd` but writes 2 bytes instead of 4. |
| `G_MW_FX` | New | New | | New moveword table index for base address for many parameters. |
| `G_SPECIAL_1` | Rem | Rem | | Removed; in F3DEX2, triggered MVP matrix recalculation. |
| `G_SPECIAL_2` | Rem | Rem | | Removed; no-op in F3DEX2. |
| `G_SPECIAL_3` | Rem | Rem | | Removed; no-op in F3DEX2. |
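
To illustrate the difference noted for `MoveHalfwd` (2 bytes written instead of 4), the sketch below writes big-endian values into a plain byte buffer standing in for the target memory. The buffer, offsets, and function names are hypothetical; the real commands address RSP data through a moveword table index and offset, which is not modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Write 4 bytes, big-endian, at mem[off..off+3] (MoveWd-like). */
static void demo_move_word(uint8_t *mem, size_t off, uint32_t v) {
    mem[off + 0] = (uint8_t)(v >> 24);
    mem[off + 1] = (uint8_t)(v >> 16);
    mem[off + 2] = (uint8_t)(v >> 8);
    mem[off + 3] = (uint8_t)(v);
}

/* Write only 2 bytes, big-endian, at mem[off..off+1] (MoveHalfwd-like).
 * The neighbouring bytes are left untouched. */
static void demo_move_halfword(uint8_t *mem, size_t off, uint16_t v) {
    mem[off + 0] = (uint8_t)(v >> 8);
    mem[off + 1] = (uint8_t)(v);
}
```

The practical point is that a halfword write can update one 16-bit parameter without disturbing the value stored next to it.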

123
f3dex3.s

@@ -853,6 +853,7 @@ vOne equ $v28 // All elements = 1; global
// Vertex / lighting vector regs:
// Prefixes: v = vector register, vp = vertex pair, s = vertex store,
// l = basic lighting, a = advanced lighting
// Sadly, "vp" stands for vertex pair, view*projection matrix, and viewport
vMTX0I equ $v0 // Matrix rows int/frac; MVP normally, or M in ltadv
vMTX1I equ $v1
@@ -960,7 +961,7 @@ sSTS equ vPerm4
// ltadv:
aPNScl equ $v8 // ltadv Packed Normals Scales = (1<<0),(1<<5),(1<<11),XX, repeat
aNrmSc equ $v9 // ltadv Normals Scale = [0h:1h] scale to normalize all normals; elems 2,3,6,7 used for point light factors
aLen2F equ $v10 // ltadv Length 2quared Frac part
aDOT equ $v10 // ltadv Dot product = normals dot direction; also briefly light dir
aLen2I equ $v11 // ltadv Length 2quared Int part
// Uses vpMdl = $v12
vpWrlF equ $v13 // vertex pair World position Frac part
@@ -971,20 +972,20 @@ aOAFrs equ $v17 // ltadv Offset Alpha (elem 3,7) and Fresnel (elem 0,4)
// Uses vpRGBA, vpLtTot, vpNrmlX, vpNrmlY, vpNrmlZ = $v18, $v19, $v20, $v21, $v22
aParam equ $v23 // ltadv Parameters = AO, texgen, and Fresnel params
aAOF2 equ aLen2F // Version of aAOF in init, can't be aDPosI/F or vpMdl there
aPLFcF equ aLen2F // ltadv Point Light Factor Frac part
aLTC equ aLen2F // ltadv Light Color
aLkDt0 equ aLen2F // ltadv Lookat Dot product 0 for texgen
aAOF2 equ aDOT // Version of aAOF in init, can't be aDPosI/F or vpMdl there
aPLFcI equ aLen2I // ltadv Point Light Factor Int part
aLkDt1 equ aLen2I // ltadv Lookat Dot product 1 for texgen
aDOT equ vpMdl // ltadv Dot product = normals dot direction; also briefly light dir
aLen2F equ vpMdl // ltadv Length 2quared Frac part
aPLFcF equ vpMdl // ltadv Point Light Factor Frac part
aLTC equ vpMdl // ltadv Light Color
aClOut equ vpWrlF // ltadv Color Out
aAlOut equ vpWrlI // ltadv Alpha Out
aDIR equ aDPosF // ltadv Direction = normalize(light or cam - vertex)
aDotSc equ aDPosF // ltadv Dot product Scale factor
aLkDt0 equ aDPosF // ltadv Lookat Dot product 0 for texgen
aLenF equ aDPosI // ltadv Length Frac part
aAOF equ aDPosI // ltadv Ambient Occlusion Factor
aProj equ aDPosI // ltadv Projection
aLkDt1 equ aDPosI // ltadv Lookat Dot product 1 for texgen
// vpST equ aOAFrs // ST used in texgen
vpWNrm equ vpNrmlX // vertex pair World space Normals
aRcpLn equ $v29 // ltadv Reciprocal of Length
@@ -1263,6 +1264,7 @@ G_MODIFYVTX_handler:
j do_moveword // Moveword adds cmd_w0 to $10 for final addr
lbu cmd_w0, (inputBufferEnd - 0x07)(inputBufferPos) // offset in vtx, bit 15 clear
// TODO check vtx 1 behavior
G_TRIFAN_handler: // 17
li $1, 0x8000 // $ra negative = flag for G_TRIFAN
G_TRISTRIP_handler:
@@ -3376,7 +3378,36 @@ ovl234_clipmisc_entrypoint_ovl4ver: // same IMEM address as ovl234_clipmi
jal load_overlays_2_3_4 // Not a call; returns to $ra-8 = here
li cmd_w1_dram, orga(ovl3_start) // set up a load for overlay 3
ltadv_after_mtx:
ltadv_spec_fres_setup: // Odd instruction
// Get aDIR = normalize(camera - vertex), aDOT = (vpWNrm dot aDIR)
ldv aDPosI[0], (cameraWorldPos - altBase)(altBaseReg) // Camera world pos
j ltadv_normal_to_vertex
ldv aDPosI[8], (cameraWorldPos - altBase)(altBaseReg)
ltadv_after_camera:
vmov aOAFrs[0], aDOT[0] // Save Fresnel dot product in aOAFrs[0h]
vmov aOAFrs[4], aDOT[4] // elems 0, 4
bgez laSpecular, ltadv_loop // Sign bit clear = not specular
li laSpecFres, 0 // Clear flag for specular or fresnel
// aProj <- aLenF
vmulf aProj, vpWNrm, aDOT[0h] // Projection of camera vec onto normal
vmudh $v29, aDIR, $v31[1] // -camera vec
j ltadv_normals_to_regs // For specular, replace vpWNrm with reflected vector
vmadh vpWNrm, aProj, $v31[3] // + 2 * projection
// aDPosI <- aProj
ltadv_xfrm: // Even instruction
vmudn $v29, vMTX0F, vpMdl[0h]
lbu curLight, numLightsxSize // Scalar instructions here must be OK to do twice
vmadh $v29, vMTX0I, vpMdl[0h]
luv vpRGBA, (VTX_IN_TC + 0 * inputVtxSize)(laPtr) // Vtx 2:1 RGBA
vmadn $v29, vMTX1F, vpMdl[1h]
vmadh $v29, vMTX1I, vpMdl[1h]
addi curLight, curLight, altBase // Point to ambient light
vmadn aDPosF, vMTX2F, vpMdl[2h]
jr $ra
vmadh aDPosI, vMTX2I, vpMdl[2h]
ltadv_after_mtx: // Even instruction
move laPtr, inVtx
vcopy aPNScl, vOne
move laVtxLeft, vtxLeft
@@ -3389,55 +3420,63 @@ ltadv_after_mtx:
ltadv_continue_setup:
lqv aParam, (fxParams - altBase)(altBaseReg)
vcopy aNrmSc, aRcpLn // aRcpLn[0:1] is int:frac scale (1 / length)
andi $11, vGeomMid, G_AMBOCCLUSION >> 8
lsv aPNScl[6], (packedNormalsMaskConstant - altBase)(altBaseReg) // F800
vge $v29, $v31, $v31[3] // Set VCC to 00011111
andi $11, vGeomMid, G_AMBOCCLUSION >> 8
bnez $11, @@skip_zero_ao
andi laL2A, vGeomMid, G_LIGHTTOALPHA >> 8
vmrg aParam, aParam, $v31[2] // 0
@@skip_zero_ao:
jal while_wait_dma_busy
andi laTexgen, vGeomMid, G_TEXTURE_GEN >> 8
ltadv_vtx_loop:
ldv vpMdl[0], (VTX_IN_OB + 1 * inputVtxSize)(laPtr) // Vtx 2 Model pos + PN
ldv vpMdl[8], (VTX_IN_OB + 0 * inputVtxSize)(laPtr) // Vtx 1 Model pos + PN
align_with_warning 8, "One instruction of padding before ltadv_vtx_loop"
ltadv_vtx_loop: // Even instruction
vmudm $v29, aPNScl, vpMdl[3h] // Packed normals from elem 3,7 of model pos
lw $11, (VTX_IN_CN + 1 * inputVtxSize)(laPtr) // Vtx 2 RGBA
vmadn vpNrmlY, $v31, $v31[2] // 0; load lower (vpMdl unsigned but must be T operand)
lw laSTKept,(VTX_IN_TC + 0 * inputVtxSize)(laPtr) // Vtx 1 ST
vand vpNrmlX, vpMdl, aPNScl[3] // 0xF800; X component masked in elem 3, 7
jal ltadv_xfrm
sw $11, (VTX_IN_TC + 0 * inputVtxSize)(laPtr) // Vtx 2 RGBA -> Vtx 1 ST
vmadn vpWrlF, vMTX3F, vOne // Finish vertex pos transform
andi laPacked, vGeomMid, G_PACKED_NORMALS >> 8
vmadh vpWrlI, vMTX3I, vOne
luv vpLtTot, (ltBufOfs + 0)(curLight) // Total light level, init to ambient
andi laPacked, vGeomMid, G_PACKED_NORMALS >> 8
// aOAFrs <- vpST
vsub aOAFrs, vpRGBA, $v31[7] // 0x7FFF; offset alpha elems 3, 7
vmudm $v29, aPNScl, vpMdl[3h] // Packed normals from elem 3,7 of model pos
bnez laPacked, @@skip_regular_normals
vmadn vpMdl, $v31, $v31[2] // 0; load lower (vpMdl unsigned but must be T operand)
lpv vpMdl, (VTX_IN_TC + 0 * inputVtxSize)(laPtr) // Vtx 2:1 regular normals
@@skip_regular_normals:
luv vpLtTot, (ltBufOfs + 0)(curLight) // Total light level, init to ambient
vne $v29, $v31, $v31[0h] // Set VCC to 01110111
beqz laPacked, @@skip_packed_normals
lpv vpMdl, (VTX_IN_TC + 0 * inputVtxSize)(laPtr) // Vtx 2:1 regular normals
vmrg vpMdl, vpNrmlY, vpNrmlX[3h] // Masked X to 0, 4; multiplied Y, Z in 1, 2, 5, 6
@@skip_packed_normals:
vmudh $v29, vOne, $v31[7] // Load accum mid with 0x7FFF (1 in s.15)
jal ltadv_xfrm
// aAOF2 <- aDOT
vmadm aAOF2, aOAFrs, aParam[0] // + (alpha - 1) * aoAmb factor; elems 3, 7
// aDOT <- vpMdl
// aLTC <- vpMdl
vmulf vpLtTot, vpLtTot, aAOF2[3h] // light color *= ambient factor
// aLTC <- aAOF2
// aDOT <- aAOF2
vmudn $v29, aDPosF, aNrmSc[0h] // Vec frac * int scaling, discard result
// aDIR <- aDPosF
addi laPtr, laPtr, 2 * inputVtxSize
vmadm $v29, aDPosI, aNrmSc[1h] // Vec int * frac scaling, discard result
addi laVtxLeft, laVtxLeft, -2 * inputVtxSize
// vpWNrm <- vpNrmlX
vmadh vpWNrm, aDPosI, aNrmSc[0h] // Vec int * int scaling
sll laSpecular, vGeomMid, (31 - 5) // G_LIGHTING_SPECULAR to sign bit
vmudn vpWrlF, vpWrlF, $v31[1] // -1; negate world pos so add light/cam pos to it
andi laSpecFres, vGeomMid, (G_LIGHTING_SPECULAR | G_FRESNEL_COLOR | G_FRESNEL_ALPHA) >> 8
vmadh vpWrlI, vpWrlI, $v31[1] // -1
ltadv_normals_to_regs:
// vpNrmlX <- vpWNrm
vmudh vpNrmlY, vOne, vpWNrm[1h] // Move normals to separate registers
bnez laSpecFres, ltadv_spec_fres_setup
vmudh vpNrmlZ, vOne, vpWNrm[2h] // per component, in elems 0-3, 4-7
// vpNrmlX <- vpWNrm
// aAOF <- aDPosI
ltadv_loop:
align_with_warning 8, "One instruction of padding before ltadv_loop"
ltadv_loop: // Even instruction
vmudh $v29, vOne, $v31[7] // Load accum mid with 0x7FFF (1 in s.15)
lbu $11, (ltBufOfs + 3 - lightSize)(curLight) // Light type / constant attenuation
vmadm aAOF, aOAFrs, aParam[1] // + (alpha - 1) * aoDir factor; elems 3, 7
@@ -3457,35 +3496,6 @@ ltadv_finish_light:
j ltadv_loop
vmacf vpLtTot, aLTC, aDOT[0h] // + light color * dot product
ltadv_xfrm:
vmudn $v29, vMTX0F, vpMdl[0h]
lbu curLight, numLightsxSize // Scalar instructions here must be OK to do twice
vmadh $v29, vMTX0I, vpMdl[0h]
luv vpRGBA, (VTX_IN_TC + 0 * inputVtxSize)(laPtr) // Vtx 2:1 RGBA
vmadn $v29, vMTX1F, vpMdl[1h]
vmadh $v29, vMTX1I, vpMdl[1h]
addi curLight, curLight, altBase // Point to ambient light
vmadn aDPosF, vMTX2F, vpMdl[2h]
jr $ra
vmadh aDPosI, vMTX2I, vpMdl[2h]
ltadv_spec_fres_setup:
// Get aDIR = normalize(camera - vertex), aDOT = (vpWNrm dot aDIR)
ldv aDPosI[0], (cameraWorldPos - altBase)(altBaseReg) // Camera world pos
j ltadv_normal_to_vertex
ldv aDPosI[8], (cameraWorldPos - altBase)(altBaseReg)
ltadv_after_camera:
vmov aOAFrs[0], aDOT[0] // Save Fresnel dot product in aOAFrs[0h]
vmov aOAFrs[4], aDOT[4] // elems 0, 4
bgez laSpecular, ltadv_loop // Sign bit clear = not specular
li laSpecFres, 0 // Clear flag for specular or fresnel
// aProj <- aLenF
vmulf aProj, vpWNrm, aDOT[0h] // Projection of camera vec onto normal
vmudh $v29, aDIR, $v31[1] // -camera vec
j ltadv_normals_to_regs // For specular, replace vpWNrm with reflected vector
vmadh vpWNrm, aProj, $v31[3] // + 2 * projection
// aDPosI <- aProj
ltadv_specular: // aDOT in/out, uses vpLtTot[3] and $11 as temps
lb $11, (ltBufOfs + 0xF - lightSize)(curLight) // Light size factor
mtc2 $11, vpLtTot[6] // Light size factor in elem 3 as temp
@@ -3494,25 +3504,29 @@ ltadv_specular: // aDOT in/out, uses vpLtTot[3] and $11 as temps
jr $ra
vxor aDOT, aDOT, $v31[7] // = 0x7FFF - result
align_with_warning 8, "One instruction of padding before ltadv_post"
ltadv_post:
// aClOut <- vpWrlF
// aAlOut <- vpWrlI
// vpMdl <- aLTC
vge aAOF, vpLtTot, vpLtTot[1h] // elem 0 = max(R0, G0); elem 4 = max(R1, G1)
ldv vpMdl[0], (VTX_IN_OB + 1 * inputVtxSize)(laPtr) // Vtx 2 Model pos + PN
vmulf aClOut, vpRGBA, vpLtTot // RGB output is RGB * light
beqz laL2A, @@skip_cel
vcopy aAlOut, vpRGBA // Alpha output = vertex alpha (only 3, 7 matter)
// Cel: alpha = max of light components, RGB = vertex color
vge aAlOut, vpLtTot, vpLtTot[1h] // elem 0 = max(R0, G0); elem 4 = max(R1, G1)
vge aAlOut, aAlOut, aAlOut[2h] // elem 0 = max(R0, G0, B0); equiv for elem 4
vge aAOF, aAOF, aAOF[2h] // elem 0 = max(R0, G0, B0); equiv for elem 4
vcopy aClOut, vpRGBA // RGB output is vertex color
vmudh aAlOut, vOne, aAlOut[0h] // move light level elem 0, 4 to 3, 7
vmudh aAlOut, vOne, aAOF[0h] // move light level elem 0, 4 to 3, 7
@@skip_cel:
vne $v29, $v31, $v31[3h] // Set VCC to 11101110
bnez laPacked, @@skip_novtxcolor
andi $11, vGeomMid, (G_FRESNEL_COLOR | G_FRESNEL_ALPHA) >> 8
vcopy aClOut, vpLtTot // If no packed normals, base output is just light
@@skip_novtxcolor:
vmrg vpRGBA, aClOut, aAlOut // Merge base output and alpha output
beqz $11, @@skip_fresnel
vmrg vpRGBA, aClOut, aAlOut // Merge base output and alpha output
ldv vpMdl[8], (VTX_IN_OB + 0 * inputVtxSize)(laPtr) // Vtx 1 Model pos + PN
lsv aAOF[0], (vTRC_0100_addr - altBase)(altBaseReg) // Load constant 0x0100 to temp
vabs aOAFrs, aOAFrs, aOAFrs // Fresnel dot in aOAFrs[0h]; absolute value for underwater
andi $11, vGeomMid, G_FRESNEL_COLOR >> 8
@@ -3538,7 +3552,6 @@ ltadv_texgen_end: // Vtx 2 ST in vpST elem 0, 1; vtx 1 ST in vpST elem 4, 5
lw laSTKept, (tempVtx1ST)(rdpCmdBufEndP1) // Overwrite stored Vtx 1 ST
slv vpST[0], (VTX_IN_TC - 1 * inputVtxSize)(laPtr) // Vtx 2 ST
ltadv_after_texgen:
// vpMdl <- aDOT
lw $11, (VTX_IN_TC - 2 * inputVtxSize)(laPtr) // Vtx 2 RGBA from vtx 1 ST slot
bltz laVtxLeft, vtx_setup_no_lighting
sw laSTKept, (VTX_IN_TC - 2 * inputVtxSize)(laPtr) // Restore vtx 1 ST

4
gbi.h

@@ -2775,7 +2775,7 @@ other segments. */
/**
* @brief Load new MVP matrix directly.
*
* This is no longer supported as there is no MVP matrix in F3DEX3.
* This is no longer supported as it was not used in production games.
* @deprecated
*/
#define gSPForceMatrix(pkt, mptr) gSPNoOp(pkt)
@@ -2949,7 +2949,7 @@ _DW({ \
/**
* Alpha compare culling. Optimization for cel shading, could also be used for
* other scenarios where lots of tris are being drawn with alpha compare.
* other scenarios where tris are being drawn with alpha compare.
*
* If mode == G_ALPHA_COMPARE_CULL_DISABLE, tris are drawn normally.
*


@@ -56,14 +56,14 @@ G_RDPFULLSYNC equ 0xe9
G_SETKEYGB equ 0xea
G_SETKEYR equ 0xeb
G_SETCONVERT equ 0xec
G_SETSCISSOR equ 0xed // scis/oth handler
G_SETSCISSOR equ 0xed
G_SETPRIMDEPTH equ 0xee
G_RDPSETOTHERMODE equ 0xef // scis/oth handler
G_LOADTLUT equ 0xf0 // load handler
G_RDPHALF_2 equ 0xf1 // own handler
G_RDPSETOTHERMODE equ 0xef
G_LOADTLUT equ 0xf0
G_RDPHALF_2 equ 0xf1
G_SETTILESIZE equ 0xf2
G_LOADBLOCK equ 0xf3 // load handler
G_LOADTILE equ 0xf4 // load handler
G_LOADBLOCK equ 0xf3
G_LOADTILE equ 0xf4
G_SETTILE equ 0xf5
G_FILLRECT equ 0xf6
G_SETFILLCOLOR equ 0xf7