From 675ef164e67ad154b402c3ff6c41f082dcf8f2be Mon Sep 17 00:00:00 2001 From: Sauraen Date: Thu, 11 Apr 2024 22:35:48 -0700 Subject: [PATCH] Updating readme --- README.md | 142 +++++++++++++++++++++++++++++++++--------------------- f3dex3.s | 20 +++++--- 2 files changed, 100 insertions(+), 62 deletions(-) diff --git a/README.md b/README.md index 15c12f7..67e5f00 100644 --- a/README.md +++ b/README.md @@ -124,15 +124,37 @@ section below. There are several selectable configuration settings when building F3DEX3, which can be enabled in any combination. With a couple minor exceptions, none of these settings affect the GBI--in fact, you can swap between the microcode versions on -a per-frame basis if you build multiple versions into your game. +a per-frame basis if you build multiple versions into your romhack. -### Legacy Vertex Pipeline +### No Occlusion Plane (NOC) + +If you are not using the occlusion plane feature in your romhack, you can +use this configuration, which removes the computation of the occlusion plane +in the vertex processing pipeline, saving some RSP time. + +If you care about performance, please do consider using the occlusion plane! +When it occludes even a small percentage of the total triangles drawn, not only +is RDP time saved (which is the point), but RSP time is also saved when those +tris are not drawn. This can offset the extra RSP time for computing the +occlusion plane for all vertices. + +You can also build both the NOC and base microcodes into your ROM and switch +between them on a per-frame basis. If there is no occlusion plane active or the +best occlusion plane candidate would be very small on screen, you can use the +NOC microcode and save RSP time. If there is a significant occlusion plane, you +can use the base microcode and reduce the RDP time. You could also determine +which version to use on the profiling results from the previous frame: if the +RSP is the bottleneck (e.g. the RDP `CLK - CMD` is high), use the NOC version, +and otherwise use the base version. + +### Legacy Vertex Pipeline (LVP) The primary tradeoff for all the new lighting features in F3DEX3 is increased -RSP time for vertex processing. For a basic configuration without lighting or -with one directional light, when F3DEX2 is at its fastest, vertex processing in -F3DEX3 takes roughly **2.5x** the RSP cycles as in F3DEX2. However, under most -circumstances, this is not an issue at all: +RSP time for vertex processing. The base version of F3DEX3 takes about +**2-2.5x** more RSP time for vertex processing than F3DEX2 (see Performance +Results section below), assuming no lighting or directional lights only. +However, under most circumstances, this does not affect the game's overall +framerate: - This only applies to vertex processing, not triangle processing or other miscellaneous microcode tasks. So the total RSP cycles spent doing useful work during the frame is only modestly increased. @@ -142,11 +164,11 @@ circumstances, this is not an issue at all: cases, slower vertex processing simply means less time spent waiting, and little to no change in total RSP time. - When the FIFO does not fill up, usually the RSP takes significantly less time - during the frame compared to the RDP, so increased RSP time does not affect - the overall framerate. + during the frame compared to the RDP, so increased RSP time usually does not + affect the overall framerate. -Thus, you should always start with the base version of F3DEX3 in your game, and -if the RSP never becomes the bottleneck, you can stick with that. +As a result, you should always start with the base version of F3DEX3 in your +romhack, and if the RSP never becomes the bottleneck, you can stick with that. However, if you have done extreme optimizations in your game to reduce RDP time (i.e. if you are Kaze Emanuar), it's possible for the RSP to sometimes become @@ -167,17 +189,16 @@ However, it retains all other F3DEX3 features: - All features not related to vertex/lighting: auto-batched rendering, packed 5 triangles commands, hints system, etc. -The performance of F3DEX3 vertex processing with LVP and NOC (no occlusion -plane, see below) is only about 16% slower than F3DEX2, instead of 250% slower. - +The performance of F3DEX3 vertex processing with LVP and NOC is almost the same +as that of F3DEX2; see the Performance Results section below. ### Profiling -There are far too many -counters for a single microcode to maintain, so multiple configurations of the -microcode can be built, each containing a different set of performance counters. -These can be swapped while the game is running so the full set of counters can -be effectively accessed over multiple frames. +As mentioned above, F3DEX3 includes many performance counters. There are far too +many counters for a single microcode to maintain, so multiple configurations of +the microcode can be built, each containing a different set of performance +counters. These can be swapped while the game is running so the full set of +counters can be effectively accessed over multiple frames. There are a total of 21 performance counters, including: - Counts of vertices, triangles, rectangles, matrices, DL commands, etc. @@ -189,11 +210,59 @@ There are a total of 21 performance counters, including: The default configuration of F3DEX3 provides a few of the most basic counters. The additional profiling configurations, called A, B, and C (for example `F3DEX3_BrZ_PA`), provide additional counters, but have two default features -removed to make space for the profiling. These two features were selected +removed to make space for the extra profiling. These two features were selected because their removal does not affect the RDP render time. - The `SPLightToRDP` commands are removed (they become no-ops) - Flat shading mode, i.e. `!G_SHADING_SMOOTH`, is removed (all tris are smooth) +### Branch Depth Instruction (`BrZ` / `BrW`) + +Use `BrZ` if the microcode is replacing F3DEX2 or an earlier F3D version (i.e. +SM64), or `BrW` if the microcode is replacing F3DZEX (i.e. OoT or MM). This +controls whether `SPBranchLessZ*` uses the vertex's W coordinate or screen Z +coordinate. + +### Debug Normals (`dbgN`) + +To help debug lighting issues when integrating F3DEX3 into your romhack, this +feature causes the vertex colors of any material with lighting enabled to be set +to the transformed, normalized world space normals. The X, Y, and Z components +map to R, G, and B, with each dimension's conceptual (-1.0 ... 1.0) range mapped +to (0 ... 255). This is not compatible with LVP as world space normals do not +exist in that pipeline. This also breaks vertex alpha and texgen / lookat. + +Some ways to use this for debugging are: +- If the normals have obvious problems (e.g. flickering, or not changing + smoothly as the object rotates / animates), there is likely a problem with the + model space normals or the M matrix. Conversely, if there is a problem with + the standard lighting results (e.g. flickering) but the normals don't have + this problem, the problem is likely in the lighting data. +- Check that the colors don't change based on the camera position, but DO change + as the object rotates, so that the same side of an object in world space is + always the same color. +- Make a simple object like an octahedron or sphere, view it in game, and check + that the normals are correct. A normal pointing along +X would be + (1.0, 0.0, 0.0), meaning (255, 128, 128) or pink. A normal pointing along -X + would be (-1.0, 0.0, 0.0), meaning (0, 128, 128) or dark cyan. Bright, fully + saturated colors like green (0, 255, 0), yellow (255, 255, 0), or black should + never appear as these would correspond to impossibly long normals. +- Make the same object (octahedron is easiest in this case) with vertex colors + which match what the normals should be, and compare them. + + +## Performance Results + +Vertex pipeline cycles per vertex pair in steady state. Hand-counted timings +taking into account all pipeline stalls, but not instruction alignment. + +| Microcode | No Lighting | First Dir Lt | Second Dir Lt | +|----------------|-------------|--------------|---------------| +| F3DEX3 | +| F3DEX3_NOC | +| F3DEX3_LVP | +| F3DEX3_LVP_NOC | +| F3DEX2 | 54 | 19 | 3 | + ## Porting Your Romhack Codebase to F3DEX3 @@ -201,12 +270,6 @@ For an OoT codebase, only a few minor changes are required to use F3DEX3. However, more changes are recommended to increase performance and enable new features. -Select the correct version of F3DEX3 for your game: use `make F3DEX3_BrW` if the -microcode is replacing F3DZEX (i.e. OoT or MM), otherwise `make F3DEX3_BrZ` if -the microcode is replacing F3DEX2 or an earlier F3D version (i.e. SM64). This -controls whether `SPBranchLessZ*` uses the vertex's W coordinate or screen Z -coordinate. - How to modify the microcode in your HackerOoT based romhack (steps may be similar for other games): - Replace `include/ultra64/gbi.h` in your romhack with `gbi.h` from this repo. @@ -587,35 +650,6 @@ segment 0 must always be 0x00000000 so that this address resolves to e.g. even if they were changed before drawing tris. -## Debugging - -To help debug lighting issues, add `CFG_DEBUG_NORMALS` to the `OPTIONS :=` line -of your selected microcode version in the Makefile (near the bottom), then -`make clean` and `make` again. This feature causes the vertex colors of any -material with lighting enabled to be set to the transformed, normalized world -space normals. The X, Y, and Z components map to R, G, and B, with each -dimension's conceptual (-1.0 ... 1.0) range mapped to (0 ... 255). This also -breaks vertex alpha and texgen / lookat. - -Some ways to use this for debugging are: -- If the normals have obvious problems (e.g. flickering, or not changing - smoothly as the object rotates / animates), there is likely a problem with the - model space normals or the M matrix. Conversely, if there is a problem with - the standard lighting results (e.g. flickering) but the normals don't have - this problem, the problem is likely in the lighting data. -- Check that the colors don't change based on the camera position, but DO change - as the object rotates, so that the same side of an object in world space is - always the same color. -- Make a simple object like an octahedron or sphere, view it in game, and check - that the normals are correct. A normal pointing along +X would be - (1.0, 0.0, 0.0), meaning (255, 128, 128) or pink. A normal pointing along -X - would be (-1.0, 0.0, 0.0), meaning (0, 128, 128) or dark cyan. Bright, fully - saturated colors like green (0, 255, 0), yellow (255, 255, 0), or black should - never appear as these would correspond to impossibly long normals. -- Make the same object (octahedron is easiest in this case) with vertex colors - which match what the normals should be, and compare them. - - ## Credits F3DEX3 modifications from F3DEX2 are by Sauraen and are dedicated to the public diff --git a/f3dex3.s b/f3dex3.s index 3abc299..431d79b 100644 --- a/f3dex3.s +++ b/f3dex3.s @@ -1358,15 +1358,19 @@ vtx_after_calc_mit: .endif andi $11, $5, G_LIGHTING >> 8 beqz $11, @@skip_lighting - li $16, vtx_return_from_lighting // This is clipFlags, but not modified - li $16, lt_vtx_pair // during vtx_store + li $16, vtx_return_from_lighting // This is clipFlags, but not modified + li $16, lt_vtx_pair // during vtx_store @@skip_lighting: andi $7, $5, G_FOG >> 8 // Nonzero if fog enabled - jal while_wait_dma_busy // Wait for vertex load to finish - li $19, clipTempVerts // Temp mem we can freely overwrite replaces outputVtxPos + jal while_wait_dma_busy // Wait for vertex load to finish + li $19, clipTempVerts // Temp mem we can freely overwrite replaces outputVtxPos j middle_of_vtx_store - move secondVtxPos, $19 // for first pre-loop, same for secondVtxPos + move secondVtxPos, $19 // for first pre-loop, same for secondVtxPos +.if CFG_LEGACY_VTX_PIPE +vtx_early_return_from_lighting: + vmrg vPairRGBA, vPairLt, vPairRGBA // RGB = light, A = vtx alpha +.endif vtx_return_from_lighting: li $ra, vertex_end .if CFG_LEGACY_VTX_PIPE @@ -2903,9 +2907,8 @@ lt_loop: bne curLight, altBaseReg, lt_loop vmacf vPairLt, vBBB, vCCC[0h] // + light color * dot product lt_post: - vne $v29, $v31, $v31[3h] // Set VCC to 11101110 - beqz $17, vtx_return_from_lighting - vmrg vPairRGBA, vPairLt, vPairRGBA // RGB = light, A = vtx alpha + beqz $17, vtx_early_return_from_lighting + vne $v29, $v31, $v31[3h] // Set VCC to 11101110 .endif // These definitions are shared by both versions vLookat1 equ vAAA @@ -2917,6 +2920,7 @@ vLookat0 equ vPairLt vmulf $v29, vPairNrml, $v18[4] // Normals X elems 0, 4 * lookat 1 X vmacf $v29, $v14, $v18[5] // Normals Y elems 0, 4 * lookat 1 Y vmacf vLookat1, $v15, $v18[6] // Normals Z elems 0, 4 * lookat 1 Z + vmrg vPairRGBA, vPairLt, vPairRGBA // RGB = light, A = vtx alpha // Continue to rest of texgen shared by both versions. .endif