More documentation, re-added debug normals config

Sauraen
2025-06-29 18:32:27 -07:00
parent 22fb8a71c9
commit 35f7faf653
5 changed files with 127 additions and 29 deletions


@@ -4,8 +4,8 @@ Modern graphics microcode for N64 romhacks. Will make you want to finally ditch
HLE. Heavily modified version of F3DEX2, with all vertex and lighting code
rewritten from scratch.
**F3DEX3 is in alpha. It is not guaranteed to be bug-free, and updates may bring
breaking changes.**
**F3DEX3 is in beta. The GBI should be relatively stable but may change if there
is a good reason.**
[View the documentation here](https://hackern64.github.io/F3DEX3/) (or just look
through the docs folder).
@@ -16,7 +16,7 @@ through the docs folder).
Compared to F3DEX2 or any other F3D family microcode, F3DEX3 is...
- faster on the RDP
- in `LVP_NOC` configuration ([see docs](https://hackern64.github.io/F3DEX3/configuration.html)), [also faster on the RSP](https://hackern64.github.io/F3DEX3/performance.html)
- in `NOC` configuration ([see docs](https://hackern64.github.io/F3DEX3/configuration.html)), [also faster on the RSP](https://hackern64.github.io/F3DEX3/performance.html)
- more accurate
- full of new visual features
- [measurable in performance](https://hackern64.github.io/F3DEX3/counters.html)
@@ -27,9 +27,10 @@ all at the same time!
- New geometry mode bit `G_PACKED_NORMALS` enables **simultaneous vertex colors
and normals/lighting on the same mesh**, by encoding the normals in the unused
2 bytes of each vertex using a variant of [octahedral encoding](https://knarkowicz.wordpress.com/2014/04/16/octahedron-normal-vector-encoding/).
The normals are effectively as precise as with the vanilla method of replacing
vertex RGB with normal XYZ.
2 bytes of each vertex using the 5-6-5 bit encoding by HailToDodongo from
[Tiny3D](https://github.com/HailToDodongo/tiny3d). Model-space precision of
the normals is reduced, but this is rarely noticeable and there is barely any
performance penalty compared to regular normals without vertex colors.
- New geometry mode bit `G_AMBOCCLUSION` enables **ambient occlusion** for
opaque materials. Paint the shadow map into the vertex alpha channel; separate
factors (set with `SPAmbOcclusion`) control how much this affects the ambient
@@ -106,6 +107,9 @@ all at the same time!
value. This can be used for clearing the Z buffer or filling the framebuffer
or the letterbox with a solid color **faster than the RDP can in fill mode**.
Practical performance may vary due to scheduling constraints.
- The key codepaths for triangle draw and vertex processing (assuming lighting
enabled and the occlusion plane disabled with the `NOC` configuration) are
**slightly faster than in F3DEX2**.
### Miscellaneous
@@ -132,14 +136,23 @@ all at the same time!
parameters are encoded in the command. With some limitations, this allows the
tint colors of cel shading to **match scene lighting** with no code
intervention. Also useful for other lighting-dependent effects.
- The microcode automatically switches between two lighting implementations
depending on which visual features are selected in the particular material.
The "basic lighting" codepath--which is roughly the same speed as F3DEX2--
supports all F3DEX2 features (directional lights, texgen), plus packed
normals, ambient occlusion, and light-to-alpha. The "advanced lighting"
codepath, which is slower, adds support for point lights, specular, and
Fresnel. You only pay the performance penalty for the features you use, and
only for the objects which use them.
### Profiling
F3DEX3 introduces a suite of performance profiling capabilities. These take the
form of performance counters, which report cycle counts for various operations
or the number of items processed of a given type. There are a total of 21
performance counters across multiple microcode versions. See the Profiling
section below.
performance counters across multiple microcode versions. See the Performance
Counters page in the docs.
## Credits
@@ -159,6 +172,7 @@ Other contributors:
- Kaze Emanuar: several feature suggestions, testing
- thecozies: Fresnel feature suggestion
- Rasky: memset feature suggestion
- HailToDodongo: packed normals encoding
- coco875: Doxygen / GitHub Pages setup
- ThePerfectLuigi64: CI build setup
- neoshaman: feature discussions


@@ -116,7 +116,7 @@ In variables.h with the ENABLE_SPEEDMETER section:
extern volatile F3DEX3YieldDataFooter gRSPProfilingResults;
```
In the true codepath of Sched_TaskComplete:
In the `true` codepath of Sched_TaskComplete:
```
#ifdef ENABLE_SPEEDMETER
/* Fetch number of primitives drawn from yield data */
@@ -139,7 +139,7 @@ volatile F3DEX3YieldDataFooter gRSPProfilingResults;
```
You can display them on screen however you wish. Here is an example, in
SpeedMeter_DrawTimeEntries
SpeedMeter_DrawTimeEntries:
```
GfxPrint printer;
Gfx* opaStart;


@@ -4,6 +4,60 @@
These features were present in earlier F3DEX3 versions, but have been removed.
## Legacy Vertex Pipeline (LVP) Configuration
Early versions of F3DEX3 were developed exclusively in an OoT context, where
scenes are almost always RDP bottlenecked. Thus, these versions focused on
reducing RDP time and adding new visual features at the cost of RSP time.
Later, Kaze Emanuar became interested in using F3DEX3 in Return to Yoshi's
Island due to the RDP performance improvements. However, due to the intense
optimization work he had done, his game was relatively balanced in RDP / RSP
time. Thus, when he tried F3DEX3, the decrease in RDP time and increase in RSP
time made the game slower overall, which was not acceptable.
As a result, the LVP configuration of F3DEX3 was developed, to bring
F3DEX2-style vertex processing in exchange for dropping some of the advanced
lighting features (which Kaze was not going to use anyway due to HLE
compatibility). This was implemented, and after much optimization across the
entire microcode, `F3DEX3_LVP_NOC` became slightly faster than F3DEX2 on both
RDP and RSP. This caused Kaze to immediately adopt this configuration of F3DEX3
for Return to Yoshi's Island.
Unfortunately, this meant that if developers wanted to use the advanced lighting
features of F3DEX3 in any part of their project, they were stuck with the much
slower non-LVP configuration of F3DEX3. The desire to have the microcode
automatically swap versions for each material, plus the invention of ways to
include some of the advanced lighting features in the LVP vertex processing
without any performance penalty when not using them, led to the reunion of the
versions. Now you get LVP-style performance when not using some of the advanced
features, and only pay the performance penalty while rendering objects which
use them.
A similar approach was also considered for the NOC configuration--to have the
microcode only compute the occlusion plane when it is enabled. This is
unfortunately infeasible. Register allocation / naming, as well as some
pipelined instructions leading into and out of lighting, are significantly
different between the occlusion plane and NOC versions of vertex processing.
This means the microcode would have to swap between four versions of lighting
code instead of just two, creating much more complexity with the overlay system
and IMEM size issues. Furthermore, the occlusion plane is typically not
enabled/disabled per object, but used when rendering as much of the game
contents as possible to maximize occluded objects. So it is reasonable to choose
the occlusion plane or NOC configuration on a per-frame or even per-scene basis.
## Octahedral Encoding for Packed Normals
Previous F3DEX3 versions encoded packed normals into the unused 2 bytes of each
vertex using a variant of [octahedral encoding](https://knarkowicz.wordpress.com/2014/04/16/octahedron-normal-vector-encoding/).
Using this method, the normals were effectively as precise as with the vanilla
method of replacing vertex RGB with normal XYZ. However, the decoding of this
format was inefficient, partly due to the requirement to also support vanilla
normals at vanilla performance. Once HailToDodongo showed that the community was
willing to accept the moderate precision loss of the much simpler 5-6-5 bit
encoding in [Tiny3D](https://github.com/HailToDodongo/tiny3d), this was adopted
in F3DEX3.
## Clipping minimal scanlines algorithm
Earlier F3DEX3 versions included a modified algorithm for triangulating the


@@ -143,6 +143,8 @@ COUNTER_C_FIFO_FULL equ 1
.endif
CFG_DEBUG_NORMALS equ 0 // Can manually enable here
// Only raise a warning in base modes; in profiling modes, addresses will be off
.macro warn_if_base, warntext
.if !ENABLE_PROFILING
@@ -1264,7 +1266,6 @@ G_MODIFYVTX_handler:
j do_moveword // Moveword adds cmd_w0 to $10 for final addr
lbu cmd_w0, (inputBufferEnd - 0x07)(inputBufferPos) // offset in vtx, bit 15 clear
TODO check vtx 1 behavior
G_TRIFAN_handler: // 17
li $1, 0x8000 // $ra negative = flag for G_TRIFAN
G_TRISTRIP_handler:
@@ -2597,8 +2598,9 @@ tris_end:
tri_fan_store:
lb $11, (inputBufferEnd - 7)(inputBufferPos) // Load vtx 1
sh cmd_w1_dram, 5(rdpCmdBufPtr) // Store vtx N+2 and N+3 as 1 and 2
j tri_main
sb $11, 5(rdpCmdBufPtr) // Store vtx 1
sb $11, 7(rdpCmdBufPtr) // Store vtx 1 as 3
// Converts the segmented address in cmd_w1_dram to the corresponding physical address
segmented_to_physical: // 7
@@ -3235,6 +3237,19 @@ ltbasic_start_standard:
vnop
luv lVCI[0], (tempVpRGBA)(rdpCmdBufEndP1) // Load vertex color input
ltbasic_after_start:
.if CFG_DEBUG_NORMALS
.warning "Debug normals visualization is enabled"
vmudh vpNrmlX, vOne, vpNrmlX[3h] // Move X to all elements
vne $v29, $v31, $v31[1h] // Set VCC to 10111011
vmrg vpNrmlX, vpNrmlX, vpNrmlY[3h] // X in 0, 4; Y to 1, 5
vne $v29, $v31, $v31[2h] // Set VCC to 11011101
vmrg vpNrmlX, vpNrmlX, vpNrmlZ[3h] // Z to 2, 6
vmudh $v29, vOne, $v31[5] // 0x4000; middle gray
j vtx_return_from_lighting
vmacf vpRGBA, vpNrmlX, $v31[5] // 0x4000; + 0.5 * normal
.else // CFG_DEBUG_NORMALS
vmulf $v29, vpNrmlX, vLTC[4] // Normals X elems 3, 7 * first light dir X
// lDIR <- (NOC: -, Occ: sOTM)
lpv lDIR[0], (ltBufOfs + 8 - 2*lightSize)(ambLight) // Xfrmed dir in elems 4-6; temp reg
@@ -3267,7 +3282,9 @@ ltbasic_post:
jr lbAfter
// vpRGBA <- lDIR
vmrg vpRGBA, vpLtTot, lVCI // RGB = light, A = vtx alpha
.endif // CFG_DEBUG_NORMALS
// lbAfter = ltbasic_ao if AO else
// lbPostAo = ltbasic_l2a if L2A else
// ltbasic_packed if packed else
@@ -3469,6 +3486,17 @@ ltadv_vtx_loop: // Even instruction
vmudn vpWrlF, vpWrlF, $v31[1] // -1; negate world pos so add light/cam pos to it
andi laSpecFres, vGeomMid, (G_LIGHTING_SPECULAR | G_FRESNEL_COLOR | G_FRESNEL_ALPHA) >> 8
vmadh vpWrlI, vpWrlI, $v31[1] // -1
.if CFG_DEBUG_NORMALS
vmudh $v29, vOne, $v31[5] // 0x4000; middle gray
li laTexgen, 0
vmacf vpRGBA, vpWNrm, $v31[5] // 0x4000; + 0.5 * normal
ltadv_finish_light:
ltadv_loop:
ltadv_normals_to_regs:
ltadv_specular:
.else
ltadv_normals_to_regs:
vmudh vpNrmlY, vOne, vpWNrm[1h] // Move normals to separate registers
bnez laSpecFres, ltadv_spec_fres_setup
@@ -3504,7 +3532,7 @@ ltadv_specular: // aDOT in/out, uses vpLtTot[3] and $11 as temps
jr $ra
vxor aDOT, aDOT, $v31[7] // = 0x7FFF - result
align_with_warning 8, "One instruction of padding before ltadv_post"
.align 8
ltadv_post:
// aClOut <- vpWrlF
// aAlOut <- vpWrlI
@@ -3525,7 +3553,7 @@ ltadv_post:
vcopy aClOut, vpLtTot // If no packed normals, base output is just light
@@skip_novtxcolor:
vmrg vpRGBA, aClOut, aAlOut // Merge base output and alpha output
beqz $11, @@skip_fresnel
beqz $11, ltadv_skip_fresnel
ldv vpMdl[8], (VTX_IN_OB + 0 * inputVtxSize)(laPtr) // Vtx 1 Model pos + PN
lsv aAOF[0], (vTRC_0100_addr - altBase)(altBaseReg) // Load constant 0x0100 to temp
vabs aOAFrs, aOAFrs, aOAFrs // Fresnel dot in aOAFrs[0h]; absolute value for underwater
@@ -3538,7 +3566,10 @@ ltadv_post:
@@skip:
vmrg vpRGBA, vpRGBA, aOAFrs[0h] // Replace color or alpha with fresnel
vge vpRGBA, vpRGBA, $v31[2] // Clamp to >= 0 for fresnel; doesn't affect others
@@skip_fresnel:
.endif // CFG_DEBUG_NORMALS
ltadv_skip_fresnel:
beqz laTexgen, ltadv_after_texgen
suv vpRGBA, (VTX_IN_TC - 2 * inputVtxSize)(laPtr) // Vtx 2:1 RGBA
// Texgen: aDOT still contains lookat 0 in elems 0-2, lookat 1 in elems 4-6
@@ -3659,13 +3690,6 @@ ltadv_normalize: // Normalize vector in aDPosI:vpWrlF i/f
// aDIR <- aDotSc
CFG_DEBUG_NORMALS equ 0 // Can manually enable here
.if CFG_DEBUG_NORMALS
.warning "Debug normals visualization is enabled"
vmudh $v29, vOne, $v31[5] // 0x4000; middle gray
j TODO
vmacf vpRGBA, vpWNrm, $v31[5] // 0x4000; + 0.5 * normal
.endif
ovl4_end:
.align 8
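
The `CFG_DEBUG_NORMALS` paths above compute "middle gray + 0.5 * normal" with a `vmudh` / `vmacf` pair. A minimal scalar sketch of that arithmetic for one lane follows. It assumes the normal component is a signed 16-bit fraction, that `vOne` holds 1 and `$v31[5]` holds 0x4000 (as the comments in the diff state), and that `vmacf` accumulates twice the product without rounding. The function name `debug_normal_channel` is hypothetical and does not appear in the microcode.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of the debug normals color (hypothetical helper).
 * Assumed instruction semantics:
 *   vmudh: acc  = (1 * 0x4000) << 16      -> 0x4000 in the high accumulator
 *   vmacf: acc += (n * 0x4000) * 2        -> adds n / 2 as a fraction
 *   result = acc >> 16
 * Clamping is omitted because the sum cannot overflow for 16-bit inputs. */
static int16_t debug_normal_channel(int16_t n) {
    int32_t acc = ((int32_t)0x4000 << 16) + (int32_t)n * 0x8000;
    assert(acc >= 0); /* minimum is 0, reached at n = -0x8000 */
    return (int16_t)(acc >> 16);
}
```

Under these assumptions a normal component of -1.0 maps to 0, 0.0 maps to 0x4000, and just under +1.0 maps to 0x7FFF, so each axis of the normal covers the full non-negative range of the color channel.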

gbi.h

@@ -2707,13 +2707,19 @@ _DW({ \
}
/**
* 5 Triangles in strip arrangement. Draws the following tris:
* v1-v2-v3, v3-v2-v4, v3-v4-v5, v5-v4-v6, v5-v6-v7
* v1-v2-v3, v2-v4-v3, v3-v4-v5, v4-v6-v5, v5-v6-v7
* If you want to draw fewer tris, set indices to -1 from the right.
* e.g. to draw 4 tris, set v7 to -1; to draw 3 tris, set v6 to -1
* Note that any set of 3 adjacent tris can be drawn with either SPTriStrip
* e.g. to draw 4 tris, set v7 to -1; to draw 3 tris, set v6 to -1.
*
* @note Any set of 3 adjacent tris can be drawn with either SPTriStrip
* or SPTriFan. For arbitrary sets of 4 adjacent tris, four out of five of them
* can be drawn with one of SPTriStrip or SPTriFan. The 4-triangle formation
* which can't be drawn with either command looks like the Triforce.
* which can't be drawn with either command looks like the Triforce--maybe
* F3DEX4 will support gsSPTriForce. :)
*
* @note The first index of each triangle drawn is different, so that in
* !G_SHADING_SMOOTH (flat shading) mode, the single color or single normal of
* each triangle can be set independently.
*/
#define gSPTriStrip(pkt, v1, v2, v3, v4, v5, v6, v7) \
_gSP5Triangles(pkt, G_TRISTRIP, v1, v2, v3, v4, v5, v6, v7)
@@ -2724,8 +2730,8 @@ _DW({ \
_gsSP5Triangles(G_TRISTRIP, v1, v2, v3, v4, v5, v6, v7)
/**
* 5 Triangles in fan arrangement. Draws the following tris:
* v1-v2-v3, v1-v3-v4, v1-v4-v5, v1-v5-v6, v1-v6-v7
* Otherwise works the same as SPTriStrip, see above.
* v2-v3-v1, v3-v4-v1, v4-v5-v1, v5-v6-v1, v6-v7-v1
* Otherwise works the same as @see SPTriStrip.
*/
#define gSPTriFan(pkt, v1, v2, v3, v4, v5, v6, v7) \
_gSP5Triangles(pkt, G_TRIFAN, v1, v2, v3, v4, v5, v6, v7)
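
The vertex orders documented above for `gSPTriStrip` and `gSPTriFan` can be checked with a small host-side sketch. It expands the seven indices into triangles using the updated orders from the comments and stops at the first negative index, which is one reading of "set indices to -1 from the right". The `Tri` type and `expand_5tris` function are hypothetical illustrations, not part of gbi.h, and the sketch assumes v1 and v2 are always valid.

```c
#include <assert.h>
#include <stddef.h>

/* One triangle as three vertex indices. The first index (a) is the vertex
 * whose color or normal applies to the whole triangle in flat shading mode. */
typedef struct {
    int a, b, c;
} Tri;

/* Expand v[0..6] (v1..v7 in the gbi.h comments) into up to 5 triangles.
 * fan == 0 uses the strip order, fan != 0 uses the fan order.
 * Returns the number of triangles written to out. */
static int expand_5tris(const int v[7], int fan, Tri out[5]) {
    int count = 0;
    assert(v != NULL && out != NULL);
    for (int k = 0; k < 5; k++) {
        /* Triangle k is the first to use v[k + 2]; a negative index there
         * ends the command early. */
        if (v[k + 2] < 0) {
            break;
        }
        if (fan) {
            /* v2-v3-v1, v3-v4-v1, ... : the shared hub vertex goes last */
            out[count] = (Tri){ v[k + 1], v[k + 2], v[0] };
        } else if (k % 2 == 0) {
            /* v1-v2-v3, v3-v4-v5, v5-v6-v7 */
            out[count] = (Tri){ v[k], v[k + 1], v[k + 2] };
        } else {
            /* v2-v4-v3, v4-v6-v5 : last two swapped to keep the winding */
            out[count] = (Tri){ v[k], v[k + 2], v[k + 1] };
        }
        count++;
    }
    return count;
}
```

Calling `expand_5tris` with indices `{0, 1, 2, 3, 4, 5, -1}` yields four triangles in either mode, and in both modes every triangle starts with a different vertex, which is the property the second `@note` relies on.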