@page design-tradeoffs Design Tradeoffs
# What are the tradeoffs for all these new features?
In other words, when is F3DEX3 worse than F3DEX2?
## Vertex processing RSP time for occlusion plane
In the occlusion plane F3DEX3 configuration, vertex processing is slower than
in F3DEX2. If you use this configuration and there is no occlusion plane, or
the plane is occluding almost nothing, the RSP will be slower with no
compensating benefit.

However, when the occlusion plane is occluding even a few percent of the
triangles in the scene, the situation changes. This saves RDP time, and most
games are RDP bound, so this trades off RSP time for RDP time and makes the game
faster overall. Plus, RSP time is also saved for the tris which are not drawn,
which can approximately cancel out the extra RSP time for computing the
occlusion plane for all vertices.
## Functionality in overlays
The following commands are moved to Overlay 2 or 3 in F3DEX3 to save IMEM space.
This means that code will have to be loaded from DRAM to run them if a different
overlay happens to be loaded already.

- Push and multiply codepaths for `SPMatrix`
- `SPPopMatrix*`
- `SPDma*`
- `SPMemset`

However:
- Multiplying, pushing, and popping matrices is not recommended for performance
or accuracy, and these are not used for most 3D objects in SM64 or OoT.
- `SPDma*` is rarely used except at startup for HLE detection.
- `SPMemset` is a new F3DEX3 command which can improve performance. Plus, it is
typically run shortly after render start, when Overlay 3 (which contains it)
is already in IMEM.

So there is not a significant practical performance impact from these changes.
## Far clipping removal
Far clipping is completely removed in F3DEX3. Far clipping is not intentionally
used for performance or aesthetic reasons in levels in vanilla SM64 or OoT,
though it can be seen in certain extreme cases. However, it is used on the SM64
title screen for the zoom-in on Mario's face, so this will look slightly
different.

Far clipping can be used to cull tris which are fully "fogged out" if the
background color (no skybox) is also the fog color, for performance benefits.
This effect has a bad reputation in '90s era games for being used as a cheap
trick to hide performance problems, though it's occasionally used in "spooky"
levels in romhacks. In F3DEX3, `SPAlphaCompareCull` can be used instead of far
clipping to cull these triangles which are fully in fog.

The removal of far clipping saved a bunch of DMEM space, and enabled other
changes to the clipping implementation which saved even more DMEM space.

NoN (No Nearclipping) is also mandatory in F3DEX3, though this was already the
microcode option used in OoT. Note that tris are still clipped at the camera
plane; nearclipping means they are clipped at the nearplane, which is a short
distance in front of the camera plane.
## Removal of scaled vertex normals
A few clever romhackers figured out that you could shrink the normals on verts
in your mesh (so their length is less than "1") to make the lighting on those
verts dimmer and create a version of ambient occlusion. In the "advanced"
lighting codepath, F3DEX3 normalizes vertex normals after transforming them,
which is required for point lights, specular, and Fresnel, so this no longer
works. However, F3DEX3 has support for ambient occlusion via vertex alpha, which
accomplishes the same goal with some extra benefits:

- Much easier to create: just paint the vertex alpha in Blender / fast64. The
scaled normals approach was not supported in fast64 and had to be done with
scripts or by hand.
- The amount of ambient occlusion in F3DEX3 can be set at runtime based on
variable scene lighting, whereas the scaled normals approach is baked into the
mesh.
- F3DEX3 can have the vertex alpha affect ambient, directional, and point lights
by different amounts, which is not possible with scaled normals. In fact,
scaled normals never affect the ambient light, contrary to the concept of
ambient occlusion.

The only case where scaled normals work but F3DEX3 AO doesn't work is for meshes
with vertex alpha actually used for transparency (therefore also no fog).

Note that in the "basic" lighting codepath in F3DEX3, vertex normals are treated
the same way as in F3DEX2, so scaled normals are supported there. Ambient
occlusion is also supported there.
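To make the scaled normals effect concrete, here is a minimal sketch in C of a simplified single-channel diffuse model with one ambient and one directional light. The names `Vec3`, `dot3`, `scale3`, and `lit_unnormalized` are hypothetical and are not part of F3DEX2, F3DEX3, or the GBI; real microcode lighting is fixed-point and more involved. The sketch only shows why shrinking an un-renormalized normal dims the directional term and leaves the ambient term alone.

```c
/* Hypothetical, simplified lighting model for illustration only. */
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 scale3(Vec3 v, float s)
{
    Vec3 r = { v.x * s, v.y * s, v.z * s };
    return r;
}

/* Brightness when the normal is used as-is, without renormalization.
 * The ambient term does not depend on the normal at all, so scaling the
 * normal can only dim the directional term. */
static float lit_unnormalized(Vec3 normal, Vec3 light_dir,
                              float ambient, float directional)
{
    float n_dot_l = dot3(normal, light_dir);
    if (n_dot_l < 0.0f) {
        n_dot_l = 0.0f; /* surfaces facing away receive no directional light */
    }
    return ambient + directional * n_dot_l;
}
```

With a unit normal facing the light, `lit_unnormalized` returns `ambient + directional`; with the same normal passed through `scale3(normal, 0.5f)`, the directional part is halved while the ambient part is unchanged. If the normal were renormalized before the dot product, as in the "advanced" codepath, the scale factor would have no effect.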
## RDP temporary buffers shrinking
In FIFO versions of F3DEX2, there are two DMEM buffers to hold RDP commands
generated by the microcode, which are swapped and copied to the FIFO in DRAM.
These each had the capacity of two-and-a-fraction full-size triangle commands
(i.e. triangles with shade, texture, and Z-buffer). For short commands (e.g.
texture loads, color combiner, etc.) there is a slight performance gain from
having longer buffers in DMEM which are swapped to DRAM less frequently. And, if
a substantial portion of triangles were rendered without shade or texture such
that three tris could fit per buffer, being able to fit the three tris would
also slightly improve performance. However, in practice, the vast majority of
the FIFO is occupied by full-size tris, so the buffers are effectively only two
tris in size because a third tri can't fit. So, their size has been reduced to
two tris, saving a substantial amount of DMEM.
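The arithmetic behind this can be sketched as follows. The per-part sizes are the standard RDP triangle command sizes (32 bytes of edge coefficients, 64 for shade, 64 for texture, 16 for Z-buffer). The helper names are hypothetical, and treating the buffer as holding `buffer_bytes / command_bytes` whole commands is a simplification of how the microcode actually flushes its buffers.

```c
#include <stddef.h>

/* Sizes in bytes of the parts of an RDP triangle command. */
enum {
    TRI_EDGE_BYTES    = 32, /* always present */
    TRI_SHADE_BYTES   = 64,
    TRI_TEXTURE_BYTES = 64,
    TRI_ZBUFFER_BYTES = 16
};

/* Size of one triangle command with the given optional parts enabled. */
static size_t tri_command_bytes(int shade, int texture, int zbuffer)
{
    return (size_t)TRI_EDGE_BYTES
         + (shade   ? (size_t)TRI_SHADE_BYTES   : 0)
         + (texture ? (size_t)TRI_TEXTURE_BYTES : 0)
         + (zbuffer ? (size_t)TRI_ZBUFFER_BYTES : 0);
}

/* Simplified model: how many whole commands of one size fit in a buffer. */
static size_t whole_commands_per_buffer(size_t buffer_bytes,
                                        size_t command_bytes)
{
    return buffer_bytes / command_bytes;
}
```

A full-size tri is `tri_command_bytes(1, 1, 1)`, which is 176 bytes. For an illustrative buffer of 400 bytes (a made-up figure, not the real F3DEX2 buffer size), two full-size tris fit, while three tris without texture (112 bytes each) would fit. Since almost all tris are full-size, the space beyond two tris is rarely used.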
## Segment 0
Segment 0 is now reserved: ensure segment 0 is never set to anything but
0x00000000. In F3DEX2 and prior this was only a good idea (and SM64 and OoT
always follow this); in F3DEX3 segmented addresses are now resolved relative to
other segments. That is, `gsSPSegment(0x08, 0x07001000)` sets segment 8 to the
base address of segment 7 with an additional offset of 0x1000. So for correct
behavior when supplying a direct-mapped or physical address such as 0x80101000,
segment 0 must always be 0x00000000 so that this address resolves to e.g.
0x101000 as expected in this example.
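The behavior described above can be modeled with a small sketch in C. This is a simplified model rather than the microcode's implementation: `segment_table`, `resolve_segmented`, and `set_segment_f3dex3` are hypothetical names, and the model assumes the segment number is taken from bits 24-27 of the address and the offset from the low 24 bits.

```c
#include <stdint.h>

#define NUM_SEGMENTS 16

/* Simplified model of the segment table; all entries start at 0. */
static uint32_t segment_table[NUM_SEGMENTS];

/* Resolve a segmented address: base of the segment named by bits 24-27,
 * plus the low 24 bits as an offset into that segment. */
static uint32_t resolve_segmented(uint32_t addr)
{
    uint32_t segment = (addr >> 24) & 0x0F;
    uint32_t offset  = addr & 0x00FFFFFF;
    return segment_table[segment] + offset;
}

/* F3DEX3-style segment set in this model: the new base address is itself
 * resolved through the segment table before being stored. */
static void set_segment_f3dex3(uint32_t segment, uint32_t addr)
{
    segment_table[segment & 0x0F] = resolve_segmented(addr);
}
```

In this model, if segment 7 is at 0x00200000, then `set_segment_f3dex3(8, 0x07001000)` stores 0x00201000 in segment 8. An address such as 0x80101000 selects segment 0, so it resolves to 0x00101000 only while segment 0 is 0x00000000; any other value in segment 0 would be added to it.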
## Non-textured tris
In F3DEX2, the RSP time for drawing non-textured tris was significantly lower
than for textured tris, by skipping a chunk of computation for the texture
coefficients if they were disabled. In F3DEX3, no computation is skipped when
textures are disabled. However, practically almost all materials use textures,
and F3DEX3 is faster at drawing textured tris than F3DEX2. Plus, F3DEX3 still
does not send the texture coefficients if they are disabled, saving DRAM access
time for RSP -> FIFO and FIFO -> RDP. RDP time savings from avoiding loading a
texture are unaffected, of course.
## Yield check timing
In F3DEX2, the microcode checks whether the CPU has requested that it yield (to
run the audio microcode) before running every display list command. F3DEX3 now
performs this check every time the input buffer is refilled, which is typically
once every 21 commands. The amount by which this delays the start of the audio
microcode is typically very small, and worst case during normal conditions would
be a few hundred microseconds. However, if the RDP FIFO is full during this
time, the microcode will have to wait for the RDP to make progress through its
workload to free up space for the outputs of the RSP commands. This will slow
down the RSP to the RDP's speed, and since triangles can be arbitrarily large
on screen, this can theoretically cause huge stalls. If you ever encounter this
in practice, please contact Sauraen.
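As a rough illustration of the difference in check frequency, the following C sketch counts how many further commands start before the next yield check, given the index of the command that was running when the yield request arrived. The function name and the fixed-interval model are assumptions for illustration; in the real microcode the number of commands per input buffer refill is only typically 21.

```c
/* Hypothetical model: a yield check happens before every command whose
 * index is a multiple of check_interval (check_interval must be >= 1).
 * Returns how many more commands start after the one at index
 * running_index before the next check is reached. */
static unsigned commands_before_next_check(unsigned running_index,
                                           unsigned check_interval)
{
    unsigned next_check = (running_index / check_interval + 1) * check_interval;
    return next_check - (running_index + 1);
}
```

With `check_interval` set to 1 (the F3DEX2 behavior in this model) the result is always 0. With an interval of 21, the result ranges from 0 to 20 depending on where in the buffer the request lands. How long those commands take depends on what they are and on whether the RDP FIFO is full.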
## Obscure semantic differences from F3DEX2 that should never matter in practice
- Changing fog settings--i.e. enabling or disabling `G_FOG` in the geometry mode
or executing `SPFogFactor` or `SPFogPosition`--between loading verts and
drawing tris with those verts will lead to incorrect fog values for those
tris. In F3DEX2, the fog settings at vertex load time would always be used,
even if they were changed before drawing tris.
- Drawing tris overwrites the 4 bytes stored with `G_RDPHALF_1`, which is used
to hold state during some display list macros which are actually two 8-byte
commands. This change is not noticeable when using standard GBI commands, only
if something highly custom has been set up.
- `SPTexture` and `SPFogFactor` state is corrupted when loading and returning
from another microcode (S2DEX). In F3DEX2, it would be reinitialized to
default values; in F3DEX3, it is left as garbage values.