# F3DEX3

Modern microcode for N64 romhacks. Will make you want to finally ditch HLE.
Heavily modified version of F3DEX2, partially rewritten from scratch.

**F3DEX3 is in alpha. It is not guaranteed to be bug-free, and updates may bring
breaking changes.**

## Features

### New visual features

- New geometry mode bit `G_PACKED_NORMALS` enables **simultaneous vertex colors
  and normals/lighting on the same mesh**, by encoding the normals in the unused
  2 bytes of each vertex using a variant of [octahedral encoding](https://knarkowicz.wordpress.com/2014/04/16/octahedron-normal-vector-encoding/).
  The normals are effectively as precise as with the vanilla method of replacing
  vertex RGB with normal XYZ.
- New geometry mode bit `G_AMBOCCLUSION` enables **ambient occlusion** for
  opaque materials. Paint the shadow map into the vertex alpha channel; separate
  factors (set with `SPAmbOcclusion`) control how much this affects the ambient
  light, all directional lights, and all point lights.
- New geometry mode bit `G_LIGHTTOALPHA` moves light intensity (maximum of R, G,
  and B of what would normally be the shade color after lighting) to shade
  alpha. Then, if `G_PACKED_NORMALS` is also enabled, the shade RGB is set to
  the vertex RGB. Together with alpha compare and some special display lists
  from fast64 which draw triangles two or more times with different CC settings,
  this enables **cel shading**. Besides cel shading, `G_LIGHTTOALPHA` can also
  be used for [bump mapping](https://renderu.com/en/spookyiluhablog/post/23631)
  or other unusual CC effects (e.g. texture minus vertex color times lighting).
- New geometry mode bits `G_FRESNEL_COLOR` and `G_FRESNEL_ALPHA` enable
  **Fresnel**. The dot product between a vertex normal and the vector from the
  vertex to the camera is computed; this is then scaled and offset with settable
  factors. The resulting value is then stored to shade color
  (`G_FRESNEL_COLOR`) or shade alpha (`G_FRESNEL_ALPHA`). This is useful for:
  - making surfaces like water and glass fade between transparent when viewed
    straight-on and opaque when viewed at a large angle
  - applying a fake "outline" around the border of meshes
  - the N64 bump mapping implementation mentioned above
- New geometry mode bit `G_LIGHTING_SPECULAR` changes lighting computation from
  diffuse to **specular**. If enabled, the vertex normal for lighting is
  replaced with the reflection of the vertex-to-camera vector over the vertex
  normal. Also, a new size value for each light controls how large the light
  reflection appears to be. This technique is lower fidelity in some ways than
  the vanilla `hilite` system, as it is per-vertex rather than per-pixel, but it
  allows the material to be textured normally. Plus, it supports all scene
  lights (including point) with different dynamic colors, whereas the vanilla
  system supports at most two directional lights and makes more than one
  dynamic color difficult.
- New geometry mode bits `G_ATTROFFSET_ST_ENABLE` and `G_ATTROFFSET_Z_ENABLE`
  apply settable offsets to vertex ST (`SPAttrOffsetST`) and/or Z
  (`SPAttrOffsetZ`) values. These offsets are applied after their respective
  scales. For Z, this enables a method of drawing coplanar surfaces like decals
  but **without the Z fighting** which can happen with the RDP's native decal
  mode. For ST, this enables **UV scrolling** without CPU intervention.

### Performance improvements

- **56 verts** can fit into DMEM at once, up from 32 verts in F3DEX2, and only
  13% below the 64 verts of reject microcodes. This reduces DRAM traffic and
  RSP time as fewer verts have to be reloaded and re-transformed, and also makes
  display lists shorter.
- New **occlusion plane** system allows the placement of a 3D quadrilateral
  where objects behind this plane in screen space are culled. This can
  substantially reduce the performance penalty of overdraw in scenes with walls
  in the middle, such as a city or an indoor scene.
- If a material display list being drawn is the same as the last material, the
  texture loads in the material are skipped (the second time). This effectively
  results in **auto-batched rendering** of repeated objects, as long as each
  object uses only one material. This system supports multitexture and all
  types of loads.
- New `SPTriangleStrip` and `SPTriangleFan` commands **pack up to 5 tris** into
  one 64-bit GBI command (up from 2 tris in F3DEX2). In any given object, most
  tris can be drawn with these commands, with only a few at the end drawn with
  `SP2Triangles` or `SP1Triangle`. So, this cuts the triangle portion of display
  lists roughly in half, saving DRAM traffic and ROM space.
- New `SPAlphaCompareCull` command enables culling of triangles whose computed
  shade alpha values are all below or above a settable threshold. This
  **substantially reduces the performance penalty of cel shading**: only tris
  which "straddle" the cel threshold are drawn twice, while the others are drawn
  only once.
- A new "hints" system encodes the expected size of the target display list into
  call, branch, and return DL commands. This allows only the needed number of DL
  commands in the next DL to be fetched, rather than always fetching full
  buffers, **saving some DRAM traffic** (maybe around 100 us per frame). The
  bits used for this are ignored by HLE.
- Segment addresses are now resolved relative to other segments (feature by
  Tharo). This enables a strategy for skipping repeated material DLs: call the
  material through a segment, and have the material remap that segment to a
  display list that immediately returns, so if the material is called again it
  won't run.
- Clipped triangles are drawn with a minimal overlapping scanlines algorithm;
  this **slightly improves RDP draw time** for large tris (max of about 500 us
  per frame, usually much less or zero).

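
A strip or fan of N triangles is defined by N + 2 vertex indices, so 5
triangles need 7 indices. The C sketch below shows how such an index list
expands into individual triangles, using the common convention of swapping the
first two vertices of every other strip triangle to keep a consistent winding.
It is illustrative only: `Tri`, `strip_to_tris`, and `fan_to_tris` are
hypothetical names, and the actual bit layout and winding rules of
`SPTriangleStrip` / `SPTriangleFan` are defined in `gbi.h` and may differ.

```c
#include <stdint.h>

typedef struct {
    uint8_t a, b, c;
} Tri;

/* Strip: triangle i uses indices i, i+1, i+2; odd triangles swap the first
 * two vertices so all triangles share the same winding. */
static int strip_to_tris(const uint8_t *idx, int n_idx, Tri *out) {
    int n = 0;
    for (int i = 0; i + 2 < n_idx; i++) {
        if (i % 2 == 0) {
            out[n] = (Tri){ idx[i], idx[i + 1], idx[i + 2] };
        } else {
            out[n] = (Tri){ idx[i + 1], idx[i], idx[i + 2] };
        }
        n++;
    }
    return n;
}

/* Fan: every triangle shares the first index. */
static int fan_to_tris(const uint8_t *idx, int n_idx, Tri *out) {
    int n = 0;
    for (int i = 1; i + 1 < n_idx; i++) {
        out[n++] = (Tri){ idx[0], idx[i], idx[i + 1] };
    }
    return n;
}
```
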
### Miscellaneous

- **Point lighting** has been redesigned. The appearance when a light is close
  to an object has been improved. Fixed a bug in F3DEX2/ZEX point lighting where
  a Z component was accidentally doubled in the point lighting calculations. The
  quadratic point light attenuation factor is now an E3M5 floating-point number
  (3-bit exponent, 5-bit mantissa). The performance penalty for point lighting
  has been reduced.
- Maximum number of directional / point **lights raised from 7 to 9**. Minimum
  number of directional / point lights lowered from 1 to 0 (F3DEX2 required at
  least one). Also supports loading all lights in one DMA transfer
  (`SPSetLights`), rather than one per light.
- New `SPLightToRDP` family of commands (e.g. `SPLightToPrimColor`) writes a
  selectable RDP command (e.g. `DPSetPrimColor`) with the RGB color of a
  selectable light (any light, including ambient). The alpha channel and any
  other parameters are encoded in the command. With some limitations, this
  allows the tint colors of cel shading to **match scene lighting** with no code
  intervention. Also useful for other lighting-dependent effects.

### Profiling

F3DEX3 introduces a suite of performance profiling capabilities. These take the
form of performance counters, which report cycle counts for various operations
or the number of items processed of a given type. There are a total of 21
performance counters across multiple microcode versions. See the Profiling
section below.

## Microcode Configuration

There are several selectable configuration settings when building F3DEX3, which
can be enabled in any combination. With a couple of minor exceptions, none of
these settings affect the GBI; in fact, you can swap between the microcode
versions on a per-frame basis if you build multiple versions into your romhack.

### No Occlusion Plane (NOC)

If you are not using the occlusion plane feature in your romhack, you can
use this configuration, which removes the computation of the occlusion plane
in the vertex processing pipeline, saving some RSP time.

If you care about performance, please do consider using the occlusion plane!
When it occludes even a small percentage of the total triangles drawn, not only
is RDP time saved (which is the point), but RSP time is also saved when those
tris are not drawn. This can offset the extra RSP time for computing the
occlusion plane for all vertices.

You can also build both the NOC and base microcodes into your ROM and switch
between them on a per-frame basis. If there is no occlusion plane active or the
best occlusion plane candidate would be very small on screen, you can use the
NOC microcode and save RSP time. If there is a significant occlusion plane, you
can use the base microcode and reduce the RDP time. You could also decide which
version to use based on the profiling results from the previous frame: if the
RSP is the bottleneck (e.g. the RDP `CLK - CMD` is high), use the NOC version,
and otherwise use the base version.

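
The per-frame selection described above might look roughly like the following
C sketch. Everything in it is hypothetical: `FrameProfile`, its fields,
`choose_ucode`, and the threshold are illustrative names, not part of F3DEX3 or
`cpu/counters.c`. It reads `CLK - CMD` as the number of RDP clock cycles in
which the RDP was not busy with commands, and treats a large value as the sign
of an RSP bottleneck, as the text suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the previous frame's RDP counters. */
typedef struct {
    uint32_t rdp_clk; /* RDP clock cycles elapsed during the frame */
    uint32_t rdp_cmd; /* RDP cycles spent busy with commands */
} FrameProfile;

typedef enum { UCODE_BASE, UCODE_NOC } UcodeChoice;

/* Without a worthwhile occlusion plane, the base version gains nothing, so
 * use NOC. With one, use NOC only if the RDP spent many cycles idle
 * (RSP-bound); otherwise use the base version to save RDP time. */
static UcodeChoice choose_ucode(const FrameProfile *prev,
                                bool significant_occlusion_plane,
                                uint32_t rsp_bound_threshold) {
    if (!significant_occlusion_plane) {
        return UCODE_NOC;
    }
    uint32_t clk_minus_cmd =
        prev->rdp_clk > prev->rdp_cmd ? prev->rdp_clk - prev->rdp_cmd : 0;
    return clk_minus_cmd > rsp_bound_threshold ? UCODE_NOC : UCODE_BASE;
}
```

A real implementation would read the counters as shown in `cpu/counters.c` and
might add some hysteresis so it does not flip between versions every frame.
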
### Legacy Vertex Pipeline (LVP)

The primary tradeoff for all the new lighting features in F3DEX3 is increased
RSP time for vertex processing. The base version of F3DEX3 takes about
**2-2.5x** more RSP time for vertex processing than F3DEX2 (see the Performance
Results section below), assuming no lighting or directional lights only.
However, under most circumstances, this does not affect the game's overall
framerate:
- This only applies to vertex processing, not triangle processing or other
  miscellaneous microcode tasks. So the total number of RSP cycles spent doing
  useful work during the frame is only modestly increased.
- The increase in time is only RSP cycles; there is no additional memory
  traffic, so the RDP time is not directly affected.
- In scenes which are complex enough to fill the RSP->RDP FIFO in DRAM, the RSP
  usually spends a significant fraction of time waiting for the FIFO to not be
  full (as revealed by the F3DEX3 performance counters, see below). In these
  cases, slower vertex processing simply means less time spent waiting, and
  little to no change in total RSP time.
- When the FIFO does not fill up, usually the RSP takes significantly less time
  during the frame compared to the RDP, so increased RSP time usually does not
  affect the overall framerate.

As a result, you should always start with the base version of F3DEX3 in your
romhack, and if the RSP never becomes the bottleneck, you can stick with that.

However, if you have done extreme optimizations in your game to reduce RDP time
(i.e. if you are Kaze Emanuar), it's possible for the RSP to sometimes become
the bottleneck with F3DEX3's advanced vertex processing. As a result, the Legacy
Vertex Pipeline (LVP) configuration has been introduced.

This configuration replaces F3DEX3's native vertex and lighting code with a
faster version based on the same algorithms as F3DEX2. This removes:
- Point lighting
- F3DEX3 lighting features: packed normals, ambient occlusion, light-to-alpha
  (cel shading), Fresnel, and specular lighting
- ST attribute offsets

However, it retains all other F3DEX3 features:
- 56 verts, 9 directional lights
- Occlusion plane (optional with NOC configuration)
- Z attribute offsets
- All features not related to vertex/lighting: auto-batched rendering, packed 5
  triangles commands, hints system, etc.

The performance of F3DEX3 vertex processing with both LVP and NOC is almost the
same as that of F3DEX2; see the Performance Results section below.

### Profiling

As mentioned above, F3DEX3 includes many performance counters. There are far too
many counters for a single microcode to maintain, so multiple configurations of
the microcode can be built, each containing a different set of performance
counters. These can be swapped while the game is running so the full set of
counters can be effectively accessed over multiple frames.

There are a total of 21 performance counters, including:
- Counts of vertices, triangles, rectangles, matrices, DL commands, etc.
- Time the microcode spent processing vertices, processing triangles, stalled
  because the RDP FIFO in DMEM was full, and stalled waiting for DMAs to finish
- A counter enabling a rough measurement of how long the RDP was stalled
  waiting for RDRAM for I/O to the framebuffer / Z buffer

The default configuration of F3DEX3 provides a few of the most basic counters.
The additional profiling configurations, called A, B, and C (for example
`F3DEX3_BrZ_PA`), provide additional counters, but have two default features
removed to make space for the extra profiling. These two features were selected
because their removal does not affect the RDP render time.
- The `SPLightToRDP` commands are removed (they become no-ops)
- Flat shading mode, i.e. `!G_SHADING_SMOOTH`, is removed (all tris are smooth)

### Branch Depth Instruction (`BrZ` / `BrW`)

Use `BrZ` if the microcode is replacing F3DEX2 or an earlier F3D version (e.g.
SM64), or `BrW` if the microcode is replacing F3DZEX (i.e. OoT or MM). This
controls whether `SPBranchLessZ*` uses the vertex's screen Z coordinate (`BrZ`)
or W coordinate (`BrW`).

### Debug Normals (`dbgN`)

To help debug lighting issues when integrating F3DEX3 into your romhack, this
feature causes the vertex colors of any material with lighting enabled to be set
to the transformed, normalized world space normals. The X, Y, and Z components
map to R, G, and B, with each dimension's conceptual (-1.0 ... 1.0) range mapped
to (0 ... 255). This is not compatible with LVP as world space normals do not
exist in that pipeline. This also breaks vertex alpha and texgen / lookat.

Some ways to use this for debugging are:
- If the normals have obvious problems (e.g. flickering, or not changing
  smoothly as the object rotates / animates), there is likely a problem with the
  model space normals or the M matrix. Conversely, if there is a problem with
  the standard lighting results (e.g. flickering) but the normals don't have
  this problem, the problem is likely in the lighting data.
- Check that the colors don't change based on the camera position, but DO change
  as the object rotates. In other words, whichever side of the object faces a
  given world space direction should always have the same color.
- Make a simple object like an octahedron or sphere, view it in game, and check
  that the normals are correct. A normal pointing along +X would be
  (1.0, 0.0, 0.0), meaning (255, 128, 128) or pink. A normal pointing along -X
  would be (-1.0, 0.0, 0.0), meaning (0, 128, 128) or dark cyan. Bright, fully
  saturated colors like green (0, 255, 0), yellow (255, 255, 0), or black should
  never appear, as these would correspond to impossibly long normals.
- Make the same object (octahedron is easiest in this case) with vertex colors
  which match what the normals should be, and compare them.

## Performance Results

Vertex pipeline cycles per **vertex pair** in steady state (lower is better).
These are hand-counted timings that take into account all pipeline stalls and
all dual-issue conditions except for instruction alignment.

| Microcode      | No Lighting | First Dir Lt | Total for 1 Dir Lt | Each Extra Dir Lt |
|----------------|-------------|--------------|--------------------|-------------------|
| F3DEX3         | 97          | 103          | 200                | 29                |
| F3DEX3_NOC     | 79          | 103          | 182                | 29                |
| F3DEX3_LVP     | 80          | 15           | 95                 | 7                 |
| F3DEX3_LVP_NOC | 62          | 15           | 77                 | 7                 |
| F3DEX2         | 54          | 19           | 73                 | 3 then 12         |

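
Reading the columns as additive (each row's "Total for 1 Dir Lt" is exactly
"No Lighting" plus "First Dir Lt"), the steady-state cost for n ≥ 1 directional
lights can be estimated as below. This is an extrapolation from the table, not a
separately measured result, and it does not apply to the F3DEX2 row, whose
"3 then 12" extra-light cost does not fit a single per-light constant.

```math
C(n) = C_\text{none} + C_\text{first} + (n - 1)\,C_\text{extra}, \qquad n \ge 1
```

For example, F3DEX3 with 3 directional lights gives 97 + 103 + 2 × 29 = 258
cycles per vertex pair, versus 80 + 15 + 2 × 7 = 109 for F3DEX3_LVP.
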
Vertex processing time as reported by the performance counter in the `PA`
configuration:
- Scene 1: Kakariko, adult day, from DMT entrance
- Scene 2: Custom empty scene with Suzanne monkey head with 1 dir light
- Scene 3: Same but Suzanne has vertex colors instead of lighting (Link is still
  on screen and has lighting)

| Microcode      | Scene 1 | Scene 2 | Scene 3 |
|----------------|---------|---------|---------|
| F3DEX3         | 7.64ms  | 3.13ms  | 2.37ms  |
| F3DEX3_NOC     | 7.07ms  | 2.89ms  | 2.14ms  |
| F3DEX3_LVP     | 4.57ms  | 1.77ms  | 1.67ms  |
| F3DEX3_LVP_NOC | 3.96ms  | 1.52ms  | 1.41ms  |
| F3DEX2         | N/A*    | N/A*    | N/A*    |
| Vertex count   | 3664    | 1608    | 1608    |

\*F3DEX2 does not contain performance counters, so the portion of the RSP time
taken for vertex processing cannot be measured.

## Porting Your Romhack Codebase to F3DEX3

For an OoT codebase, only a few minor changes are required to use F3DEX3.
However, more changes are recommended to increase performance and enable new
features.

How to modify the microcode in your HackerOoT-based romhack (steps may be
similar for other games):
- Replace `include/ultra64/gbi.h` in your romhack with `gbi.h` from this repo.
- Make the "Required Changes" listed below.
- Build this repo: install the latest version of `armips`, then
  `make F3DEX3_BrZ` or `make F3DEX3_BrW`.
- Copy the microcode binaries (`build/F3DEX3_X/F3DEX3_X.code` and
  `build/F3DEX3_X/F3DEX3_X.data`) to somewhere in your romhack repo, e.g. `data`.
- In `data/rsp.rodata.s`, change the line between `fifoTextStart` and
  `fifoTextEnd` to `.incbin "data/F3DEX3_X.code"` (or wherever you put the
  binary), and similarly change the line between `fifoDataStart` and
  `fifoDataEnd` to `.incbin "data/F3DEX3_X.data"`. After both the `fifoTextEnd`
  and `fifoDataEnd` labels, add a line `.balign 16`.
- If you ever plan to update the microcode binaries in the future, add the
  following to the Makefile of your romhack, after the section starting with
  `build/data/%.o` (i.e. two lines after that, with a blank line before and
  after): `build/data/rsp.rodata.o: data/F3DEX3_X.code data/F3DEX3_X.data`.
  This new line intentionally has no indented second line after it, just like
  the `message_data_static` lines below it. It tells `make` to rebuild
  `rsp.rodata.o`, which includes the microcode binaries, whenever they change.
- Clean and build your romhack (`make clean`, `make`).
- Test your romhack and confirm that everything works as intended.
- Make as many of the "Recommended Changes" listed below as possible.
- If you start using new features in F3DEX3, make the "Changes Required for New
  Features" listed below.

### Required Changes

Both OoT and SM64:

- Remove uses of internal GBI features which have been removed in F3DEX3 (see
  the "C GBI Compatibility" section below for the full list). In OoT, the only
  changes needed are:
  - In `src/code/ucode_disas.c`, remove the switch statement cases for
    `G_LINE3D`, `G_MW_CLIP`, `G_MV_MATRIX`, `G_MVO_LOOKATX`, `G_MVO_LOOKATY`,
    and `G_MW_PERSPNORM`.
  - In `src/libultra/gu/lookathil.c`, remove the lines which set the `col`,
    `colc`, and `pad` fields.
- Change your game engine lighting code to set the `type` (formerly `pad1`)
  field to 0 in the initialization of any directional light (`Light_t` and
  derived structs like `Light` or `Lightsn`). F3DEX3 ignores the state of the
  `G_LIGHTING_POSITIONAL` geometry mode bit in all display lists, meaning both
  directional and point lights are supported for all display lists (including
  vanilla). The light is identified as directional if `type` == 0 or point if
  `kc` > 0 (`kc` and `type` are the same byte). This change is required because
  otherwise garbage nonzero values may be put in the padding byte, leading
  directional lights to be misinterpreted as point lights.
  - The change needed in OoT is: in `src/code/z_lights.c`, in
    `Lights_BindPoint`, `Lights_BindDirectional`, and `Lights_NewAndDraw`, set
    `l.type` to 0 right before setting `l.col`.

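
To see why the byte must be cleared, here is a C sketch using a simplified
stand-in struct. `DirLightStandIn`, `light_byte_is_point`, and `init_dir_light`
are hypothetical and do not reproduce the real `Light_t` from F3DEX3's `gbi.h`;
the only assumption taken from the text above is that the byte after the three
color bytes (F3DEX2's `pad1`) is `type` for directional lights and `kc` for
point lights.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in: NOT the real F3DEX3 Light_t layout.
 * It only models the RGB color plus the byte formerly called pad1, which
 * F3DEX3 reads as `type` (directional) or `kc` (point). */
typedef struct {
    uint8_t col[3];
    uint8_t type;
} DirLightStandIn;

/* F3DEX3 treats the light as a point light when this byte is nonzero. */
static bool light_byte_is_point(const DirLightStandIn *l) {
    return l->type != 0;
}

/* Mirror of the required fix: clear `type` right before writing the color. */
static void init_dir_light(DirLightStandIn *l, uint8_t r, uint8_t g, uint8_t b) {
    l->type = 0;
    l->col[0] = r;
    l->col[1] = g;
    l->col[2] = b;
}
```
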
SM64 only:

- If you are using the vanilla lighting system where light directions are always
  fixed, the vanilla permanent light direction of `{0x28, 0x28, 0x28}` must be
  changed to `{0x49, 0x49, 0x49}`, or everything will be too dark. The former
  vector is not properly normalized, but F3D through F3DEX2 normalize light
  directions in the microcode, so it doesn't matter with those microcodes. In
  contrast, F3DEX3 normalizes vertex normals (after transforming them), but
  assumes light directions have already been normalized.
- Matrix stack fix (world space lighting / view matrix in VP instead of in M) is
  basically required. If you *really* want camera space lighting, use matrix
  stack fix, transform the fixed camera space light direction by V inverse each
  frame, and send that to the RSP. This will be faster than the alternative (not
  using matrix stack fix and enabling `G_NORMALS_MODE_AUTO` to correct the
  matrix).

### Recommended Changes (Non-Lighting)

- Clean up any code using the deprecated, hacky `SPLookAtX` and `SPLookAtY` to
  use `SPLookAt` instead (this only takes a few lines of changes). Also remove
  any code which writes `SPClipRatio` or `SPForceMatrix`; these are now no-ops,
  so you might as well not write them.
- Avoid using `G_MTX_MUL` in `SPMatrix`. That is, make sure your game engine
  computes a matrix stack on the CPU and sends the final matrix for each object
  / limb to the RSP, rather than multiplying matrices on the RSP. OoT already
  usually does the former for precision / accuracy reasons and only uses
  `G_MTX_MUL` in a couple of places (e.g. view * perspective matrix); it is okay
  to leave those. This change is recommended because the `G_MTX_MUL` mode of
  `SPMatrix` has been moved to Overlay 4 in F3DEX3 (see below), making it
  substantially slower than it was in F3DEX2. It still functions the same,
  though, so you can use it if it's really needed.
- Re-export as many display lists (scenes, objects, skeletons, etc.) as possible
  with fast64 set to F3DEX3 mode, to take advantage of the substantially larger
  vertex buffer, triangle packing commands, "hints" system, etc.
- `#define REQUIRE_SEMICOLONS_AFTER_GBI_COMMANDS` (at the top of, or before
  including, the GBI) for a more modern, OoT-style codebase where uses of GBI
  commands require semicolons after them. SM64 omits the semicolons sometimes,
  e.g. `gSPDisplayList(gfx++, foo) gSPEndDisplayList(gfx++);`. If you are using
  `-Wpedantic`, using this define is required.
- Once everything in your romhack is ported to F3DEX3 and everything is stable,
  `#define NO_SYNCS_IN_TEXTURE_LOADS` (at the top of, or before including, the
  GBI) and fix any crashes or graphical issues that arise. Display lists
  exported from fast64 already do not contain these syncs, but vanilla display
  lists or custom ones using the texture loading multi-command macros do.
  Disabling the syncs saves a few percent of RDP cycles for each material setup;
  what percentage this is of the total RDP time depends on how many triangles
  are typically drawn between each material change. For more information, see
  the GBI documentation near this define.

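
Composing the matrix stack on the CPU is an ordinary 4x4 multiply. The C
sketch below assumes float matrices in the row-vector convention used by
libultra (a point is transformed as `v * M`, with translation in the last row),
so applying a limb transform `L` and then its parent `P` is the product
`L * P`. The helper names are hypothetical. The result would still have to be
converted to the N64 fixed-point `Mtx` format and sent with `SPMatrix` using
`G_MTX_LOAD`, which avoids the slower `G_MTX_MUL` path.

```c
#include <string.h>

/* out = a * b for 4x4 float matrices (row-vector convention: v' = v * m).
 * Uses a temporary so `out` may alias `a` or `b`. */
static void mtxf_mul(float a[4][4], float b[4][4], float out[4][4]) {
    float tmp[4][4];
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++) {
                sum += a[i][k] * b[k][j];
            }
            tmp[i][j] = sum;
        }
    }
    memcpy(out, tmp, sizeof(tmp));
}

/* Hypothetical helper: a translation matrix in the same convention. */
static void mtxf_translate(float x, float y, float z, float out[4][4]) {
    memset(out, 0, sizeof(float) * 16);
    out[0][0] = out[1][1] = out[2][2] = out[3][3] = 1.0f;
    out[3][0] = x;
    out[3][1] = y;
    out[3][2] = z;
}
```
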
### Recommended Changes (Lighting)

- Change your game engine lighting code to load all lights in one DMA transfer
  with `SPSetLights`, instead of one-at-a-time with repeated `SPLight` commands.
  Note that if you are using a pointer (dynamically allocated) rather than a
  direct variable (statically allocated), you need to dereference it; see the
  docstring for this macro in the GBI.
- If you still need to use `SPLight` somewhere after this, use `SPLight` only
  for directional / point lights and use `SPAmbient` for ambient lights.
  Directional / point lights are 16 bytes and ambient lights are 8, and the
  first 8 bytes are the same for both types, so normally it's okay to use
  `SPLight` instead of `SPAmbient` to write ambient lights too. However, the
  memory space reserved for lights in the microcode is 16 * 9 + 8 = 152 bytes,
  so if you have 9 directional / point lights and then use `SPLight` to write
  the ambient light, it will overflow the buffer by 8 bytes and corrupt memory.
- Once you have made the above change for `SPAmbient`, increase the maximum
  number of lights in your engine from 7 to 9.
- Consider setting lights once before rendering a scene and all actors, rather
  than setting lights before rendering each actor. OoT does the latter to
  emulate point lights in a scene with a directional light recomputed per actor.
  You can now just send those to the RSP as real point lights, regardless of
  whether the display lists are vanilla or new.
- If you are porting a game which already had point lighting (e.g. Majora's
  Mask), note that the point light kc, kl, and kq factors have been changed, so
  you will need to redesign how game engine light parameters (e.g. "light
  radius") map to these parameters.

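
The overflow arithmetic from the `SPAmbient` item can be checked with a few
constants. This C sketch assumes, as the text implies, that the ambient light is
stored directly after the last directional / point light in a 152-byte buffer;
the names are hypothetical and not from the GBI.

```c
/* Sizes from the text; the names are hypothetical. */
enum {
    DIR_POINT_LIGHT_BYTES = 16,
    AMBIENT_LIGHT_BYTES = 8,
    MAX_DIR_POINT_LIGHTS = 9,
    LIGHT_BUFFER_BYTES =
        DIR_POINT_LIGHT_BYTES * MAX_DIR_POINT_LIGHTS + AMBIENT_LIGHT_BYTES /* 152 */
};

/* Bytes written past the end of the light buffer when the ambient light,
 * stored after `num_lights` directional / point lights, is written with a
 * command that transfers `write_bytes` bytes (16 for SPLight, 8 for SPAmbient). */
static int ambient_overflow_bytes(int num_lights, int write_bytes) {
    int end = num_lights * DIR_POINT_LIGHT_BYTES + write_bytes;
    return end > LIGHT_BUFFER_BYTES ? end - LIGHT_BUFFER_BYTES : 0;
}
```

With 7 lights (the F3DEX2 maximum), even a 16-byte `SPLight` write of the
ambient light ends well inside the buffer, which is why the problem only
appears after raising the limit to 9.
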
### Changes Required for New Features

Each of these changes is required if you want to use the respective new feature,
but is not necessary if you are not using it.

- For Fresnel and specular lighting: Whenever your code sends camera properties
  to the RSP (VP matrix, viewport, etc.), also send the camera world position to
  the RSP with `SPCameraWorld`. For OoT, this is not trivial because the game
  rendering creates and sets the view matrix in the main DL, then renders the
  game contents, then updates the camera, and finally retroactively modifies the
  view matrix at the beginning of the main DL. See the code in `cpu/camera.c`.
- For specular lighting: Set the `size` field of any `Light_t` and `PosLight_t`
  to an appropriate value based on the game engine parameters for that light.
- For the occlusion plane: Bring the code from `cpu/occlusionplane.c` into your
  game and follow the included instructions.
- For the performance counters: See `cpu/counters.c`.

## Backwards Compatibility with F3DEX2

### C GBI Compatibility

F3DEX3 is backwards compatible with F3DEX2 at the C GBI level for all features
and commands except:

- The `G_SPECIAL_*` command IDs have been removed. `G_SPECIAL_2` and
  `G_SPECIAL_3` were no-ops in F3DEX2, and `G_SPECIAL_1` was a trigger to
  recalculate the MVP matrix. Since there is no MVP matrix in F3DEX3, this is no
  longer needed.
- `G_LINE3D` (and `Gfx.line`) has been removed. This command did not actually
  work in F3DEX2 (it behaved as a no-op).
- `G_MW_CLIP` has been removed, and `SPClipRatio` has been converted into a
  no-op. Clipping is handled differently in F3DEX3 and the clip ratio cannot be
  changed from 2.
- `G_MV_MATRIX`, `G_MW_MATRIX`, and `G_MW_FORCEMTX` have been removed, and
  `SPForceMatrix` has been converted into a no-op. This is because there is no
  MVP matrix in F3DEX3.
- `G_MV_POINT` has been removed. This was not used in any command; it was likely
  intended for debugging, to copy vertices from DMEM to examine them. This does
  not affect `SPModifyVertex`, which is still supported.
- `G_MW_PERSPNORM` has been removed; `SPPerspNormalize` is still supported but
  is encoded differently, no longer using this define.
- `G_MVO_LOOKATX` and `G_MVO_LOOKATY` have been removed, and `SPLookAtX` and
  `SPLookAtY` are deprecated. `SPLookAtX` has been changed to set both
  directions and `SPLookAtY` has been converted to a no-op. To set the lookat
  directions, use `SPLookAt`. The lookat directions are now in one 8-byte DMA
  word, so they must always be set at the same time as each other. Most of the
  non-functional fields (e.g. color) of `LookAt` and its sub-types have been
  removed, so code which accesses these fields needs to change. Code which only
  accesses lookat directions should be compatible with no changes.
- As discussed above, the `pad1` field of `Light_t` is renamed to `type` and
  must be set to zero.
- If you do not raise the maximum number of lights from 7 to 9, the lighting GBI
  commands are backwards compatible. However, if you do raise the number of
  lights, you must use `SPAmbient` to write the ambient light, as discussed
  above. Note that you can now load all your lights with one command,
  `SPSetLights`, so it is not usually necessary to use `SPLight` and `SPAmbient`
  at all.

### Binary Display List Compatibility

F3DEX3 is generally binary backwards compatible with OoT-style display lists for
objects, scenes, etc. **It is not compatible at the binary level with SM64-style
display lists which encode object colors as light colors**, as all the command
encodings related to lighting have changed. Of course, if you recompile these
display lists with the new `gbi.h`, it can run them.

The deprecated commands mentioned above in the C GBI section have had their
encodings changed (the original encodings will misbehave or crash). In
addition, all lighting-related commands (e.g. `gdSPDefLights*`, `SPNumLights`,
`SPLight`, `SPLightColor`, `SPLookAt`) have had their encodings changed, making
them binary incompatible. The lighting data structures, e.g. `Light_t`,
`PosLight_t`, `LookAt_t`, `Lightsn`, `Lights*`, `PosLights*`, etc., have also
changed, generally only slightly, so most code is compatible with no changes.

`SPSegment` has been given a different command ID (`G_RELSEGMENT` vs.
`G_MOVEWORD`) to facilitate relative segmented address translation. The
original binary encoding is still valid, but does not support relative
translation like the new encoding. Recompiling with the C GBI always uses the
new encoding.

## What are the tradeoffs for all these new features?

### Vertex Processing RSP Time

See the Microcode Configuration and Performance Results sections above.

### Overlay 4
|
|
|
|
(Note that in the LVP configuration, Overlay 4 is absent; there is no M inverse
|
|
transpose matrix discussed below, and the other commands mentioned below are
|
|
directly in the microcode without an overlay, due to there being enough IMEM
|
|
space.)
|
|
|
|
F3DEX2 contains Overlay 2, which does lighting, and Overlay 3, which does
|
|
clipping (run on any large triangle which extends a large distance offscreen).
|
|
These overlays are more RSP assembly code which are loaded into the same space
|
|
in IMEM. If the wrong overlay is loaded when the other is needed, the proper
|
|
one is loaded and then code jumps to it. Display lists which do not use lighting
|
|
can stay on Overlay 3 at all times. Display lists for things that are typically
|
|
relatively small on screen, such as characters, can stay on Overlay 2 at all
|
|
times, because even when a triangle overlaps the edge of the screen, it
|
|
typically moves fully off the screen and is discarded before it reaches the
|
|
clipping bounds (2x the screen size).
|
|
|
|
In F3DEX2, the only case where the overlays are swapped frequently is for
|
|
scenes with lighting, because they have large triangles which often extend far
|
|
offscreen (Overlay 3) but also need lighting (Overlay 2). Worst case, the RSP
|
|
will load Overlay 2 once for every `SPVertex` command and then load Overlay 3
|
|
for every set of `SP*Triangle*` commands.
|
|
|
|
(If you're curious, Overlays 0 and 1 are not related to 2 and 3, and have to do
with starting and stopping RSP tasks. During normal display list execution,
Overlay 1 is always loaded.)

F3DEX3 introduces Overlay 4, which can occupy the same IMEM as Overlays 2 and 3.
This overlay contains handlers for:
- Computing the inverse transpose of the model matrix M (abbreviated as mIT),
  discussed below
- The codepath for `SPMatrix` with `G_MTX_MUL` set (base version only; in the
  NOC configuration this is moved out of the overlay into the main microcode,
  since that configuration has extra IMEM space available)
- `SPBranchLessZ*`
- `SPDma_io`

Whenever any of these features is needed, the RSP has to swap to Overlay 4. The
next time lighting or clipping is needed, the RSP must then swap back to
Overlay 2 or 3. The round trip of these two overlay loads takes about 5
microseconds of DRAM time including overheads. Fortunately, all the above
features other than the mIT matrix are rarely or never used.

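
The swapping behavior described above can be illustrated with a toy model: a
single shared IMEM slot and a sequence of requests for Overlay 2, 3, or 4, where
a load is counted only when the requested overlay is not already resident. This
is a simplified sketch for intuition, not the microcode's actual logic; the enum
and function names are hypothetical.

```c
#include <stddef.h>

/* Overlay numbers as used in the text; OVL_NONE means nothing loaded yet. */
enum { OVL_NONE = 0, OVL_LIGHTING = 2, OVL_CLIPPING = 3, OVL_MISC = 4 };

/* Toy model of one shared IMEM slot. Each request names the overlay that the
 * next piece of work needs; a DMA load is counted only when that overlay is
 * not already resident in the slot. */
static int count_overlay_loads(const int *requests, size_t count) {
    int resident = OVL_NONE;
    int loads = 0;
    for (size_t i = 0; i < count; i++) {
        if (requests[i] != resident) {
            resident = requests[i]; /* overwrite whatever was in the slot */
            loads++;
        }
    }
    return loads;
}
```

For example, the F3DEX2 worst case of alternating `{2, 3, 2, 3}` costs a load on
every request, while `{2, 4, 2}` adds two loads compared to staying on
Overlay 2; that pair of loads is the round trip estimated above at about 5
microseconds.
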
The mIT matrix is needed in F3DEX3 because normals are covectors: they scale
inversely to the object's scaling, so stretching an object along an axis
shrinks the normals' components along that axis, and vice versa. So while you
multiply a vertex by M to transform it from model space to world space, you
have to multiply a normal by M inverse transpose to go to world space. F3DEX2
solves this problem by instead transforming light directions into model space
with M transpose and computing the lighting in model space. However, this
requires extra DMEM to store the transformed lights, and adds a performance
penalty for point lighting which is absent in F3DEX3. Plus, having world space
normals in F3DEX3 enables Fresnel and specular lighting.

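
A small worked example shows why normals need M inverse transpose. The sketch
below assumes a purely diagonal (axis-aligned, nonuniform) scale, so that M
inverse transpose is simply the reciprocal scale. It transforms a surface
tangent by M and the perpendicular normal both ways, then checks whether the two
stay perpendicular. The types and function names are illustrative only.

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* M = diag(s.x, s.y, s.z): how positions and tangent vectors transform. */
static Vec3 mul_by_m(Vec3 v, Vec3 s) {
    return (Vec3){ v.x * s.x, v.y * s.y, v.z * s.z };
}

/* For a diagonal M, the inverse transpose is diag(1/s.x, 1/s.y, 1/s.z). */
static Vec3 mul_by_m_inv_transpose(Vec3 v, Vec3 s) {
    return (Vec3){ v.x / s.x, v.y / s.y, v.z / s.z };
}
```

Take the tangent t = (1, -1, 0) and normal n = (1, 1, 0), which are
perpendicular, and the scale (2, 1, 1). Then M*t = (2, -1, 0). Transforming n by
M gives (2, 1, 0), whose dot product with M*t is 3, so the "normal" is no longer
perpendicular to the surface. Transforming n by M inverse transpose gives
(0.5, 1, 0), and the dot product stays 0.
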
If an object's transformation matrix stack only includes translations,
rotations, and uniform scale (i.e. same scale in X, Y, and Z), then M inverse
transpose is just a rescaled version of M (considering only the 3x3 part of M
that normals use; translation does not affect normals), and the normals can be
transformed with M directly. It is only when the matrix includes nonuniform
scale or shear that M inverse transpose is no longer a rescaled version of M.
The difference gets larger as the scale or shear gets more extreme.

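
For a general 3x3 matrix, M inverse transpose can be computed as the cofactor
matrix divided by the determinant. The sketch below uses plain floats; the
`Mat3` type and function name are hypothetical. It can be used to check the
claim above: a rotation combined with a uniform scale s gives exactly M / s^2,
while a nonuniform scale gives a matrix that is not proportional to M. The
manual normals mode described below would require this kind of computation on
the CPU. The fixed-point format and command that F3DEX3 expects for uploading
the result are not shown here; consult the GBI for those.

```c
/* Plain 3x3 float matrix: only the rotation/scale/shear part of the model
 * matrix matters for transforming normals. */
typedef struct { float m[3][3]; } Mat3;

/* Computes the inverse transpose of a 3x3 matrix as its cofactor matrix
 * divided by its determinant. Returns 0 (leaving *out untouched) if the
 * matrix is singular, 1 otherwise. */
static int mat3_inverse_transpose(const Mat3 *a, Mat3 *out) {
    const float (*m)[3] = a->m;
    Mat3 cof;
    cof.m[0][0] =   m[1][1] * m[2][2] - m[1][2] * m[2][1];
    cof.m[0][1] = -(m[1][0] * m[2][2] - m[1][2] * m[2][0]);
    cof.m[0][2] =   m[1][0] * m[2][1] - m[1][1] * m[2][0];
    cof.m[1][0] = -(m[0][1] * m[2][2] - m[0][2] * m[2][1]);
    cof.m[1][1] =   m[0][0] * m[2][2] - m[0][2] * m[2][0];
    cof.m[1][2] = -(m[0][0] * m[2][1] - m[0][1] * m[2][0]);
    cof.m[2][0] =   m[0][1] * m[1][2] - m[0][2] * m[1][1];
    cof.m[2][1] = -(m[0][0] * m[1][2] - m[0][2] * m[1][0]);
    cof.m[2][2] =   m[0][0] * m[1][1] - m[0][1] * m[1][0];

    /* Laplace expansion of the determinant along the first row. */
    float det = m[0][0] * cof.m[0][0] + m[0][1] * cof.m[0][1]
              + m[0][2] * cof.m[0][2];
    if (det > -1e-12f && det < 1e-12f) {
        return 0;
    }
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            out->m[i][j] = cof.m[i][j] / det;
        }
    }
    return 1;
}
```

Consider a 90 degree rotation about Z combined with a uniform scale of 2. The
result is exactly M / 4. For a scale of (2, 1, 1), the result is
diag(0.5, 1, 1), which is not proportional to M.
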
F3DEX3 provides three options for handling this (see `SPNormalsMode`):
- `G_NORMALS_MODE_FAST`: Use M to transform normals. No performance penalty.
  Lighting will be somewhat distorted for objects with nonuniform scale or
  shear.
- `G_NORMALS_MODE_AUTO`: The RSP will automatically compute M inverse transpose
  whenever M changes. Costs about 3.5 microseconds of DRAM time per matrix, i.e.
  per object or skeleton limb which has lighting enabled. Lighting is correct
  for nonuniform scale or shear.
- `G_NORMALS_MODE_MANUAL`: You compute M inverse transpose on the CPU and
  manually upload it to the RSP every time M changes.

It is recommended to use `G_NORMALS_MODE_FAST` (the default) for most things,
and to use `G_NORMALS_MODE_AUTO` only for objects while they have a nonuniform
scale (e.g. Mario only while he is squashed).

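
One way to follow this recommendation is to test the object's current matrix on
the CPU and switch modes only when needed. The hypothetical helper below is not
part of the F3DEX3 GBI. It checks whether the 3x3 part of a matrix is a rotation
times a uniform scale by testing whether M^T M is a multiple of the identity, up
to a relative tolerance. If it is not, the matrix contains nonuniform scale or
shear. M^T M = s^2 I holds exactly when M M^T = s^2 I, so the test works whether
the matrix stores basis vectors in rows or columns.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of the F3DEX3 GBI). Returns true if the
 * 3x3 linear part of a transform contains nonuniform scale or shear, i.e.
 * if M^T M is not a multiple of the identity within a relative tolerance. */
static bool has_nonuniform_scale_or_shear(float m[3][3], float rel_tol) {
    float g[3][3]; /* Gram matrix M^T M: g[i][j] = (column i) . (column j) */
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            g[i][j] = 0.0f;
            for (int k = 0; k < 3; k++) {
                g[i][j] += m[k][i] * m[k][j];
            }
        }
    }
    /* For a rotation times a uniform scale s, g equals s^2 * I. */
    float s2 = (g[0][0] + g[1][1] + g[2][2]) / 3.0f;
    float limit = rel_tol * s2;
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            float diff = g[i][j] - (i == j ? s2 : 0.0f);
            if (diff > limit || diff < -limit) {
                return true;
            }
        }
    }
    return false;
}
```

When this returns true, a caller could pass `G_NORMALS_MODE_AUTO` to
`SPNormalsMode` for that object, and `G_NORMALS_MODE_FAST` otherwise. The matrix
to test is the full model matrix the object is drawn with, not just its local
transform. The tolerance value is an arbitrary choice.
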
### Optimizing for RSP code size

A number of optimizations in F3DEX2 which saved a few cycles but cost several
extra instructions have been removed to reduce code size. Outside of vertex
processing, these removals have a very small impact on overall RSP time and no
impact on RDP time.

### Far clipping removal

Far clipping is completely removed in F3DEX3. Vanilla SM64 and OoT levels do not
intentionally use far clipping for performance or aesthetic reasons, though it
can be seen in certain extreme cases. However, it is used on the SM64 title
screen for the zoom-in on Mario's face, so that will look slightly different.

The removal of far clipping saved a bunch of DMEM space, and enabled other
changes to the clipping implementation which saved even more DMEM space.

NoN (No Nearclipping) is also mandatory in F3DEX3, though this was already the
microcode option used in OoT. Note that tris are still clipped at the camera
plane; nearclipping would instead clip them at the near plane, which is a short
distance in front of the camera plane.

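
The distinction between the camera plane and the near plane can be sketched with
homogeneous clip coordinates. This sketch assumes an OpenGL-style perspective
projection. Under that assumption, a point is in front of the camera plane when
w > 0, and in front of the near plane when z >= -w. The function names are
illustrative, and F3DEX3's actual clipping code is not shown.

```c
#include <stdbool.h>

/* Clip-space z and w of a point; x and y do not matter for these tests. */
typedef struct { float z, w; } ClipZW;

/* Assumed OpenGL-style perspective projection with near distance n and far
 * distance f (the camera looks down -z in eye space). */
static ClipZW project_depth(float eye_z, float n, float f) {
    ClipZW c;
    c.z = -(f + n) / (f - n) * eye_z - 2.0f * f * n / (f - n);
    c.w = -eye_z;
    return c;
}

/* Camera plane: the point is in front of the camera at all. */
static bool in_front_of_camera_plane(ClipZW c) { return c.w > 0.0f; }

/* Near plane: the point is at least the near distance in front of the camera. */
static bool in_front_of_near_plane(ClipZW c) { return c.z >= -c.w; }
```

With n = 1 and f = 100, a point at eye-space z = -0.5 lies between the camera
plane and the near plane. It passes the camera-plane test but fails the
near-plane test, so it is not clipped under NoN, whereas nearclipping would have
clipped it.
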
### Removal of scaled vertex normals

A few clever romhackers figured out that you could shrink the normals on verts
in your mesh (so their length is less than 1) to make the lighting on those
verts dimmer and create a form of ambient occlusion. In the base vertex
pipeline, F3DEX3 normalizes vertex normals after transforming them, which is
required for most features of the lighting system including packed normals, so
this no longer works. However, F3DEX3 supports ambient occlusion via vertex
alpha, which accomplishes the same goal with some extra benefits:
- Much easier to create: just paint the vertex alpha in Blender / fast64. The
  scaled normals approach was not supported in fast64 and had to be done with
  scripts or by hand.
- The amount of ambient occlusion in F3DEX3 can be set at runtime based on
  variable scene lighting, whereas the scaled normals approach is baked into the
  mesh.
- F3DEX3 can have the vertex alpha affect ambient, directional, and point lights
  by different amounts, which is not possible with scaled normals. In fact,
  scaled normals never affect the ambient light, contrary to the concept of
  ambient occlusion.

Furthermore, for partial HLE compatibility, the same mesh can have the ambient
occlusion information encoded in both scaled normals and vertex alpha at the
same time. HLE will ignore the vertex alpha AO but use the scaled normals;
F3DEX3 will normalize the normals, discarding their scale, and then apply the
AO.

The only case where scaled normals work but F3DEX3 AO doesn't is for meshes
whose vertex alpha is actually used for transparency (and which therefore also
have no fog).

Note that in LVP mode, scaled normals are supported and work the same way as in
F3DEX2, while ambient occlusion is not supported.

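
Why scaled normals never affect the ambient light can be seen in a simplified
single-channel lighting model. The model is assumed here to be an ambient term
plus a clamped N dot L term for one directional light. The function and
parameter names are illustrative, and the exact lighting math in F3DEX2/F3DEX3
is not reproduced. Without renormalization, a half-length normal halves the
directional contribution, but the ambient term is untouched.

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Simplified one-channel vertex lighting (an assumption for illustration):
 * ambient plus one directional light scaled by the clamped dot product of
 * the normal and the light direction. The normal is used as given, with no
 * renormalization, which is what the scaled-normals trick relied on. */
static float shade_intensity(Vec3 normal, Vec3 light_dir,
                             float ambient, float light_color) {
    float n_dot_l = dot3(normal, light_dir);
    if (n_dot_l < 0.0f) {
        n_dot_l = 0.0f; /* surfaces facing away get no directional light */
    }
    return ambient + light_color * n_dot_l;
}
```

With ambient 0.25 and a light of 0.5, a full-length normal facing the light
gives 0.75 and a half-length one gives 0.5. A vertex facing away from the light
gets 0.25 whether its normal is scaled or not, because the ambient term is never
darkened. Renormalizing the normal, as F3DEX3's base pipeline does, also turns
the 0.5 result back into 0.75.
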
### RDP temporary buffers shrinking

In FIFO versions of F3DEX2, there are two DMEM buffers to hold RDP commands
generated by the microcode, which are swapped and copied to the FIFO in DRAM.
Each has the capacity of two-and-a-fraction full-size triangle commands (i.e.
triangles with shade, texture, and Z-buffer). For short commands (e.g. texture
loads, color combiner, etc.) there is a slight performance gain from having
longer buffers in DMEM which are swapped to DRAM less frequently. And, if a
substantial portion of triangles were rendered without shade or texture such
that three tris could fit per buffer, being able to fit the three tris would
also slightly improve performance. However, in practice, the vast majority of
the FIFO is occupied by full-size tris, so the buffers are effectively only two
tris in size because a third tri can't fit. So, in F3DEX3 their size has been
reduced to two tris, saving a substantial amount of DMEM.

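
The sizing argument can be checked with the sizes of the RDP triangle command
pieces. Every triangle has 32 bytes of edge coefficients, plus 64 bytes for
shade, 64 bytes for texture, and 16 bytes for Z-buffer coefficients, for 176
bytes per full-size triangle. The sketch below uses a buffer size of 368 bytes
purely as an illustrative stand-in for "two-and-a-fraction" full-size tris. That
size is an assumption, not the actual F3DEX2 buffer size.

```c
/* Sizes in bytes of the parts of an RDP triangle command. */
enum {
    RDP_TRI_EDGE_BYTES  = 32, /* edge coefficients (always present) */
    RDP_TRI_SHADE_BYTES = 64, /* shade coefficients */
    RDP_TRI_TEX_BYTES   = 64, /* texture coefficients */
    RDP_TRI_ZBUF_BYTES  = 16  /* Z-buffer coefficients */
};

/* Hypothetical buffer size: two full-size tris plus a fraction of a third. */
#define EXAMPLE_BUFFER_BYTES 368

/* Size of one triangle command with the given optional parts enabled. */
static int rdp_tri_bytes(int shade, int texture, int zbuffer) {
    return RDP_TRI_EDGE_BYTES
         + (shade ? RDP_TRI_SHADE_BYTES : 0)
         + (texture ? RDP_TRI_TEX_BYTES : 0)
         + (zbuffer ? RDP_TRI_ZBUF_BYTES : 0);
}

/* How many whole triangle commands fit in a buffer. If only such triangles
 * are written, the remainder of the buffer goes unused. */
static int tris_per_buffer(int buffer_bytes, int tri_bytes) {
    return buffer_bytes / tri_bytes;
}
```

With these numbers, a stream of full-size tris fills only 352 of the 368 bytes,
so the last 16 bytes are dead space. A 112-byte tri without shade (texture and Z
only) would fit three times. Trimming the buffer to exactly two full-size tris
loses nothing for full-size streams.
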
### Segment 0

Segment 0 is now reserved: ensure segment 0 is never set to anything but
0x00000000. In F3DEX2 and prior this was only a good idea (and SM64 and OoT
always follow it); in F3DEX3, segmented addresses are resolved relative to
other segments. That is, `gsSPSegment(0x08, 0x07001000)` sets segment 8 to the
base address of segment 7 with an additional offset of 0x1000. A direct-mapped
or physical address such as 0x80101000 is also treated as a segmented address,
with segment number 0 (bits 24-27). So for correct behavior, segment 0 must
always be 0x00000000 so that this address resolves to 0x101000 as expected.

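
The relative resolution rule can be modeled in a few lines of C. This is a
sketch, not the microcode. It assumes the usual segmented address layout
(segment number in bits 24-27, offset in bits 0-23, upper bits ignored), a
16-entry table, and hypothetical function names.

```c
#include <stdint.h>

/* Hypothetical model of the RSP's 16-entry segment table. */
static uint32_t segment_table[16];

/* Segment number in bits 24-27, offset in bits 0-23. Bits 28-31 (e.g. the
 * 0x8 of a direct-mapped 0x80xxxxxx address) are ignored. */
static uint32_t resolve_segmented(uint32_t addr) {
    uint32_t seg = (addr >> 24) & 0xFu;
    return segment_table[seg] + (addr & 0x00FFFFFFu);
}

/* Relative SPSegment: the new base is itself resolved through the table,
 * so 0x07001000 means "segment 7's base plus 0x1000". */
static void set_segment_relative(uint32_t seg, uint32_t base) {
    segment_table[seg & 0xFu] = resolve_segmented(base);
}
```

Suppose segment 7 is at 0x200000. Then `set_segment_relative(8, 0x07001000)`
stores 0x201000 for segment 8. The address 0x80101000 (segment field 0)
resolves to 0x101000 only while segment 0 holds 0. If segment 0 held 0x100000,
the same address would resolve to 0x201000 instead.
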
### Obscure semantic differences from F3DEX2 that should never matter in practice

- `SPLoadUcode*` corrupts the current M inverse transpose matrix state. If using
  `G_NORMALS_MODE_FAST`, this doesn't matter. If using `G_NORMALS_MODE_AUTO`,
  you must send the M matrix to the RSP again after returning to F3DEX3 from the
  other microcode. If using `G_NORMALS_MODE_MANUAL`, you must send the updated
  M inverse transpose matrix to the RSP after returning to F3DEX3 from the other
  microcode. In both cases, this would normally be done anyway when starting to
  draw the next object.
- Changing fog settings (i.e. enabling or disabling `G_FOG` in the geometry mode
  or executing `SPFogFactor` or `SPFogPosition`) between loading verts and
  drawing tris with those verts will lead to incorrect fog values for those
  tris. In F3DEX2, the fog settings at vertex load time would always be used,
  even if they were changed before drawing tris.

## Credits

F3DEX3 modifications from F3DEX2 are by Sauraen and are dedicated to the public
domain. `cpu/` C code is entirely by Sauraen and also dedicated to the public
domain.

If you use F3DEX3 in a romhack, please credit "F3DEX3 Microcode - Sauraen" in
your project's in-game Staff Roll or wherever other contributors to your project
are credited.

Other credits:
- Wiseguy: large chunk of F3DEX2 disassembly documentation and first version of
  build system
- Tharo: relative segment resolution feature, other feature discussions
- Kaze Emanuar: several feature suggestions, testing
- thecozies: Fresnel feature suggestion
- neoshaman: feature discussions