@page design-tradeoffs Design Tradeoffs

# What are the tradeoffs for all these new features?

In other words, when is F3DEX3 worse than F3DEX2?
## Vertex processing RSP time for occlusion plane

In the occlusion plane F3DEX3 configuration, vertex processing is slower than
in F3DEX2. If you use this configuration and there is no occlusion plane, or the
plane is occluding almost nothing, the RSP will be slower with no other benefit.

However, when the occlusion plane is occluding even a few percent of the
triangles in the scene, the situation changes. Culling those triangles saves RDP
time, and most games are RDP bound, so this trades off RSP time for RDP time and
makes the game faster overall. Plus, RSP time is also saved for the tris which
are not drawn, which can approximately cancel out the extra RSP time for
computing the occlusion plane for all vertices.
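The RSP side of this tradeoff can be written as a simple break-even calculation. The sketch below is only a model: the function name and the per-vertex and per-triangle cycle parameters are hypothetical, and no measured F3DEX3 costs are implied.

```c
#include <assert.h>

/* Net change in RSP cycles from enabling the occlusion plane.
   All four inputs are hypothetical model parameters, not measured values.
   Positive result: the RSP got slower. Negative result: the RSP got faster. */
long occlusion_net_rsp_cycles(long num_verts, long extra_cycles_per_vert,
                              long culled_tris, long rsp_cycles_per_tri) {
    assert(num_verts >= 0 && culled_tris >= 0);
    /* Every vertex pays the extra occlusion test... */
    long added = num_verts * extra_cycles_per_vert;
    /* ...but every culled triangle skips its RSP triangle setup. */
    long saved = culled_tris * rsp_cycles_per_tri;
    return added - saved;
}
```

A negative result means the RSP work avoided on culled tris outweighs the extra per-vertex work. The RDP time saved comes on top of that and is not modeled here.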
## Functionality in overlays

The following commands are moved to Overlay 2 or 3 in F3DEX3 to save IMEM space.
This means that their code has to be loaded from DRAM before they can run if a
different overlay happens to be loaded already.
- Push and multiply codepaths for `SPMatrix`
- `SPPopMatrix*`
- `SPDma*`
- `SPMemset`

However:
- Multiplying, pushing, and popping matrices are not recommended for performance
  or accuracy, and these are not used for most 3D objects in SM64 or OoT.
- `SPDma*` is rarely used except at startup for HLE detection.
- `SPMemset` is a new F3DEX3 command which can improve performance. Plus, it is
  typically run shortly after render start, when Overlay 3 (which contains it)
  is already in IMEM.

So there is not a significant practical performance impact from these changes.
## Far clipping removal

Far clipping is completely removed in F3DEX3. Far clipping is not intentionally
used for performance or aesthetic reasons in levels in vanilla SM64 or OoT,
though it can be seen in certain extreme cases. However, it is used on the SM64
title screen for the zoom-in on Mario's face, so this will look slightly
different.

Far clipping can be used to cull tris which are fully "fogged out" if the
background color (no skybox) is also the fog color, for performance benefits.
This effect has a bad reputation in '90s era games for being used as a cheap
trick to hide performance problems, though it's occasionally used in "spooky"
levels in romhacks. In F3DEX3, `SPAlphaCompareCull` can be used instead of far
clipping to cull these triangles which are fully in fog.

The removal of far clipping saved a bunch of DMEM space, and enabled other
changes to the clipping implementation which saved even more DMEM space.

NoN (No Nearclipping) is also mandatory in F3DEX3, though this was already the
microcode option used in OoT. Note that tris are still clipped at the camera
plane; nearclipping means they are clipped at the nearplane, which is a short
distance in front of the camera plane.
## Removal of scaled vertex normals

A few clever romhackers figured out that you could shrink the normals on verts
in your mesh (so their length is less than "1") to make the lighting on those
verts dimmer and create a version of ambient occlusion. In the "advanced"
lighting codepath, F3DEX3 normalizes vertex normals after transforming them,
which is required for point lights, specular, and Fresnel, so this no longer
works. However, F3DEX3 has support for ambient occlusion via vertex alpha, which
accomplishes the same goal with some extra benefits:
- Much easier to create: just paint the vertex alpha in Blender / fast64. The
  scaled normals approach was not supported in fast64 and had to be done with
  scripts or by hand.
- The amount of ambient occlusion in F3DEX3 can be set at runtime based on
  variable scene lighting, whereas the scaled normals approach is baked into the
  mesh.
- F3DEX3 can have the vertex alpha affect ambient, directional, and point lights
  by different amounts, which is not possible with scaled normals. In fact,
  scaled normals never affect the ambient light, contrary to the concept of
  ambient occlusion.

The only case where scaled normals work but F3DEX3 AO doesn't work is for meshes
with vertex alpha actually used for transparency (therefore also no fog).

Note that in the "basic" lighting codepath in F3DEX3, vertex normals are treated
the same way as in F3DEX2, so scaled normals are supported there. Ambient
occlusion is also supported there.
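The difference between the two approaches can be illustrated with a toy lighting model for one vertex and one directional light. This is a sketch under stated assumptions, not the microcode's arithmetic: the function names are hypothetical, and the linear blend in `ao_factor` is an assumed form chosen only so that vertex alpha 1.0 means "no occlusion".

```c
#include <assert.h>

/* Assumed AO blend (hypothetical): alpha 1.0 leaves the light unchanged,
   alpha 0.0 removes the fraction `amount` of it. */
float ao_factor(float vertex_alpha, float amount) {
    assert(amount >= 0.0f && amount <= 1.0f);
    return 1.0f - amount * (1.0f - vertex_alpha);
}

/* Scaled-normals trick: shrinking the normal dims only the directional
   term; the ambient term is never touched. */
float lit_scaled_normal(float ambient, float directional,
                        float n_dot_l, float normal_scale) {
    float diffuse = n_dot_l * normal_scale;
    if (diffuse < 0.0f) diffuse = 0.0f;
    return ambient + directional * diffuse;
}

/* Vertex-alpha AO: ambient and directional light are dimmed by separately
   configurable amounts, which can be changed at runtime. */
float lit_vertex_ao(float ambient, float directional, float n_dot_l,
                    float vertex_alpha, float ao_ambient, float ao_directional) {
    float diffuse = n_dot_l < 0.0f ? 0.0f : n_dot_l;
    return ambient * ao_factor(vertex_alpha, ao_ambient)
         + directional * diffuse * ao_factor(vertex_alpha, ao_directional);
}
```

In this model, `lit_scaled_normal` can never return less than `ambient`, even with a zero-length normal, whereas `lit_vertex_ao` can darken a fully occluded vertex all the way to zero.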
## RDP temporary buffers shrinking

In FIFO versions of F3DEX2, there are two DMEM buffers to hold RDP commands
generated by the microcode, which are swapped and copied to the FIFO in DRAM.
These each had the capacity of two-and-a-fraction full-size triangle commands
(i.e. triangles with shade, texture, and Z-buffer). For short commands (e.g.
texture loads, color combiner, etc.) there is a slight performance gain from
having longer buffers in DMEM which are swapped to DRAM less frequently. And, if
a substantial portion of triangles were rendered without shade or texture such
that three tris could fit per buffer, being able to fit the three tris would
also slightly improve performance. However, in practice, the vast majority of
the FIFO is occupied by full-size tris, so the buffers are effectively only two
tris in size because a third tri can't fit. So, their size has been reduced to
two tris, saving a substantial amount of DMEM.
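The sizing argument can be checked with a little arithmetic. In the RDP command format, a triangle command carries 32 bytes of edge coefficients plus, when enabled, 64 bytes of shade, 64 bytes of texture, and 16 bytes of Z-buffer coefficients. The helper names below are hypothetical, and the buffer capacity is left as a parameter because the exact F3DEX2 figure is not stated here.

```c
#include <assert.h>
#include <stddef.h>

enum {
    RDP_TRI_EDGE_BYTES  = 32, /* always present */
    RDP_TRI_SHADE_BYTES = 64, /* only if shading is enabled */
    RDP_TRI_TEX_BYTES   = 64, /* only if texturing is enabled */
    RDP_TRI_Z_BYTES     = 16  /* only if Z-buffering is enabled */
};

/* Size in bytes of one RDP triangle command with the given attributes. */
size_t rdp_tri_bytes(int shade, int tex, int zbuf) {
    return RDP_TRI_EDGE_BYTES
         + (shade ? RDP_TRI_SHADE_BYTES : 0)
         + (tex ? RDP_TRI_TEX_BYTES : 0)
         + (zbuf ? RDP_TRI_Z_BYTES : 0);
}

/* Whole triangle commands that fit in a buffer of `capacity` bytes. */
size_t tris_per_buffer(size_t capacity, size_t tri_bytes) {
    assert(tri_bytes > 0);
    return capacity / tri_bytes;
}
```

A full-size tri is 176 bytes, so any capacity from 352 to 527 bytes holds exactly two of them. A third tri only fits in that range when the tris are smaller, for example 112 bytes without texture, which matches the observation that the extra space was rarely useful.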
## Segment 0

Segment 0 is now reserved: ensure segment 0 is never set to anything but
0x00000000. In F3DEX2 and prior this was only a good idea (and SM64 and OoT
always follow this); in F3DEX3 segmented addresses are now resolved relative to
other segments. That is, `gsSPSegment(0x08, 0x07001000)` sets segment 8 to the
base address of segment 7 with an additional offset of 0x1000. So for correct
behavior when supplying a direct-mapped or physical address such as 0x80101000,
segment 0 must always be 0x00000000 so that this address resolves to 0x101000,
as expected in this example.
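The resolution rule can be modeled in a few lines of C. This is a minimal sketch, not the microcode itself: `segment_table`, `resolve`, and `set_segment` are hypothetical names, and the layout (bits 24 to 27 select the segment, the low 24 bits are the offset) is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SEGMENTS 16

/* Hypothetical model of the segment table; every entry starts at 0. */
uint32_t segment_table[NUM_SEGMENTS];

/* Resolve an address through the table (assumed bit layout, see above). */
uint32_t resolve(uint32_t addr) {
    uint32_t seg = (addr >> 24) & 0x0F;
    uint32_t offset = addr & 0x00FFFFFF;
    return (segment_table[seg] + offset) & 0x00FFFFFF;
}

/* F3DEX3-style segment set: the new base is itself resolved through the
   table, so it may be given relative to another segment. */
void set_segment(uint32_t seg, uint32_t value) {
    assert(seg < NUM_SEGMENTS);
    segment_table[seg] = resolve(value);
}
```

With segment 0 left at zero, `resolve(0x80101000)` yields 0x101000 in this model, because the upper nibble 0x8 is discarded and segment 0 contributes nothing. If segment 0 held any other value, that value would be added to every direct-mapped or physical address.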
## Non-textured tris

In F3DEX2, the RSP time for drawing non-textured tris was significantly lower
than for textured tris, by skipping a chunk of computation for the texture
coefficients if they were disabled. In F3DEX3, no computation is skipped when
textures are disabled. However, practically almost all materials use textures,
and F3DEX3 is faster at drawing textured tris than F3DEX2. Plus, F3DEX3 still
does not send the texture coefficients if they are disabled, saving DRAM access
time for RSP -> FIFO and FIFO -> RDP. RDP time savings from avoiding loading a
texture are unaffected, of course.
## Yield check timing

In F3DEX2, the microcode checks whether the CPU has requested that it yield (to
run the audio microcode) before running every display list command. F3DEX3 now
performs this check every time the input buffer is refilled, which is typically
once every 21 commands. The amount by which this delays the start of the audio
microcode is typically very small, and worst case during normal conditions would
be a few hundred microseconds. However, if the RDP FIFO is full during this
time, the microcode will have to wait for the RDP to make progress through its
workload to free up space for the outputs of the RSP commands. This will slow
down the RSP to the RDP's speed, and since triangles can be arbitrarily large
on screen, this can theoretically cause huge stalls. If you ever encounter this
in practice, please contact Sauraen.
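The effect of the coarser check interval can be modeled by asking at which command a pending yield request is first noticed. The function below is a hypothetical sketch that assumes checks happen at perfectly regular intervals (every command for F3DEX2, every 21 commands for F3DEX3); real refills are only "typically" that regular.

```c
#include <assert.h>

/* Checks happen before commands 0, interval, 2*interval, ...
   Given a yield request that becomes visible before command `request_at`,
   return the index of the command before which it is first noticed. */
unsigned yield_noticed_at(unsigned request_at, unsigned interval) {
    assert(interval > 0);
    /* Round request_at up to the next multiple of interval. */
    return ((request_at + interval - 1) / interval) * interval;
}
```

In this model `yield_noticed_at(22, 1)` is 22, while `yield_noticed_at(22, 21)` is 42, so up to 20 further commands run before the yield. How long those commands take is what matters: it is short when the RSP runs freely and unbounded when each one has to wait on a full RDP FIFO.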
## Obscure semantic differences from F3DEX2 that should never matter in practice

- Changing fog settings--i.e. enabling or disabling `G_FOG` in the geometry mode
  or executing `SPFogFactor` or `SPFogPosition`--between loading verts and
  drawing tris with those verts will lead to incorrect fog values for those
  tris. In F3DEX2, the fog settings at vertex load time would always be used,
  even if they were changed before drawing tris.
- Drawing tris overwrites the 4 bytes stored with `G_RDPHALF_1`, which is used
  to hold state during some display list macros which are actually two 8-byte
  commands. This change is not noticeable when using standard GBI commands, only
  if something highly custom has been set up.
- `SPTexture` and `SPFogFactor` state is corrupted when loading and returning
  from another microcode (S2DEX). In F3DEX2, it would be reinitialized to
  default values; in F3DEX3, it is left as garbage values.