Mirror of https://github.com/HackerN64/F3DEX3.git, synced 2026-01-21 10:37:45 -08:00
More documentation, re-added debug normals config
30
README.md
@@ -4,8 +4,8 @@ Modern graphics microcode for N64 romhacks. Will make you want to finally ditch
HLE. Heavily modified version of F3DEX2, with all vertex and lighting code
rewritten from scratch.

**F3DEX3 is in alpha. It is not guaranteed to be bug-free, and updates may bring
breaking changes.**
**F3DEX3 is in beta. The GBI should be relatively stable but may change if there
is a good reason.**

[View the documentation here](https://hackern64.github.io/F3DEX3/) (or just look
through the docs folder).
@@ -16,7 +16,7 @@ through the docs folder).

Compared to F3DEX2 or any other F3D family microcode, F3DEX3 is...
- faster on the RDP
- in `LVP_NOC` configuration ([see docs](https://hackern64.github.io/F3DEX3/configuration.html)), [also faster on the RSP](https://hackern64.github.io/F3DEX3/performance.html)
- in `NOC` configuration ([see docs](https://hackern64.github.io/F3DEX3/configuration.html)), [also faster on the RSP](https://hackern64.github.io/F3DEX3/performance.html)
- more accurate
- full of new visual features
- [measurable in performance](https://hackern64.github.io/F3DEX3/counters.html)
@@ -27,9 +27,10 @@ all at the same time!

- New geometry mode bit `G_PACKED_NORMALS` enables **simultaneous vertex colors
and normals/lighting on the same mesh**, by encoding the normals in the unused
2 bytes of each vertex using a variant of [octahedral encoding](https://knarkowicz.wordpress.com/2014/04/16/octahedron-normal-vector-encoding/).
The normals are effectively as precise as with the vanilla method of replacing
vertex RGB with normal XYZ.
2 bytes of each vertex using the 5-6-5 bit encoding by HailToDodongo from
[Tiny3D](https://github.com/HailToDodongo/tiny3d). Model-space precision of
the normals is reduced, but this is rarely noticeable and there is barely any
performance penalty compared to regular normals without vertex colors.
- New geometry mode bit `G_AMBOCCLUSION` enables **ambient occlusion** for
opaque materials. Paint the shadow map into the vertex alpha channel; separate
factors (set with `SPAmbOcclusion`) control how much this affects the ambient
@@ -106,6 +107,9 @@ all at the same time!
value. This can be used for clearing the Z buffer or filling the framebuffer
or the letterbox with a solid color **faster than the RDP can in fill mode**.
Practical performance may vary due to scheduling constraints.
- The key codepaths for triangle draw and vertex processing (assuming lighting
enabled and the occlusion plane disabled with the `NOC` configuration) are
**slightly faster than in F3DEX2**.

### Miscellaneous

@@ -132,14 +136,23 @@ all at the same time!
parameters are encoded in the command. With some limitations, this allows the
tint colors of cel shading to **match scene lighting** with no code
intervention. Also useful for other lighting-dependent effects.
- The microcode automatically switches between two lighting implementations
depending on which visual features are selected in the particular material.
The "basic lighting" codepath--which is roughly the same speed as F3DEX2--
supports all F3DEX2 features (directional lights, texgen), plus packed
normals, ambient occlusion, and light-to-alpha. The "advanced lighting"
codepath, which is slower, adds support for point lights, specular, and
Fresnel. You only pay the performance penalty for the features you use, and
only for the objects which use them.
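
The per-material switch described in this bullet can be modeled as a simple flag test. What follows is a minimal C sketch and is not F3DEX3 code. The `FEAT_*` bits are hypothetical stand-ins; the real geometry mode bits, such as `G_LIGHTING_SPECULAR`, `G_FRESNEL_COLOR` and `G_FRESNEL_ALPHA`, are defined in F3DEX3's `gbi.h`. The sketch also assumes that enabling any advanced-only feature selects the advanced codepath.

```c
#include <stdint.h>

/* Hypothetical feature bits for illustration only; NOT F3DEX3's G_* values. */
enum {
    FEAT_DIRECTIONAL    = 1u << 0,
    FEAT_TEXGEN         = 1u << 1,
    FEAT_PACKED_NORMALS = 1u << 2,
    FEAT_AMB_OCCLUSION  = 1u << 3,
    FEAT_LIGHT_TO_ALPHA = 1u << 4,
    FEAT_POINT_LIGHTS   = 1u << 5,
    FEAT_SPECULAR       = 1u << 6,
    FEAT_FRESNEL        = 1u << 7
};

typedef enum { LIGHTING_BASIC, LIGHTING_ADVANCED } LightingPath;

/* Features that only the slower "advanced lighting" codepath supports. */
#define ADVANCED_ONLY_FEATURES (FEAT_POINT_LIGHTS | FEAT_SPECULAR | FEAT_FRESNEL)

/* Picks a codepath per material: basic unless an advanced-only feature is enabled. */
static LightingPath choose_lighting_path(uint32_t material_features)
{
    return (material_features & ADVANCED_ONLY_FEATURES) ? LIGHTING_ADVANCED
                                                        : LIGHTING_BASIC;
}
```

The decision is made per material. An object that never enables point lights, specular, or Fresnel therefore stays on the basic path. This is the "only pay for what you use" behavior the bullet describes.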


### Profiling

F3DEX3 introduces a suite of performance profiling capabilities. These take the
form of performance counters, which report cycle counts for various operations
or the number of items processed of a given type. There are a total of 21
performance counters across multiple microcode versions. See the Profiling
section below.
performance counters across multiple microcode versions. See the Performance
Counters page in the docs.


## Credits
@@ -159,6 +172,7 @@ Other contributors:
- Kaze Emanuar: several feature suggestions, testing
- thecozies: Fresnel feature suggestion
- Rasky: memset feature suggestion
- HailToDodongo: packed normals encoding
- coco875: Doxygen / GitHub Pages setup
- ThePerfectLuigi64: CI build setup
- neoshaman: feature discussions

@@ -116,7 +116,7 @@ In variables.h with the ENABLE_SPEEDMETER section:
extern volatile F3DEX3YieldDataFooter gRSPProfilingResults;
```

In the true codepath of Sched_TaskComplete:
In the `true` codepath of Sched_TaskComplete:
```
#ifdef ENABLE_SPEEDMETER
/* Fetch number of primitives drawn from yield data */
@@ -139,7 +139,7 @@ volatile F3DEX3YieldDataFooter gRSPProfilingResults;
```

You can display them on screen however you wish. Here is an example, in
SpeedMeter_DrawTimeEntries
SpeedMeter_DrawTimeEntries:
```
GfxPrint printer;
Gfx* opaStart;

@@ -4,6 +4,60 @@

These features were present in earlier F3DEX3 versions, but have been removed.

## Legacy Vertex Pipeline (LVP) Configuration

Early versions of F3DEX3 were developed exclusively in an OoT context, where
scenes are almost always RDP bottlenecked. Thus, these versions focused on
reducing RDP time and adding new visual features at the cost of RSP time.

Later, Kaze Emanuar became interested in using F3DEX3 in Return to Yoshi's
Island due to the RDP performance improvements. However, due to the intense
optimization work he had done, his game was relatively balanced in RDP / RSP
time. Thus, when he tried F3DEX3, the decrease in RDP time and increase in RSP
time made the game slower overall, which was not acceptable.

As a result, the LVP configuration of F3DEX3 was developed, to bring
F3DEX2-style vertex processing in exchange for dropping some of the advanced
lighting features (which Kaze was not going to use anyway due to HLE
compatibility). This was implemented, and after much optimization across the
entire microcode, `F3DEX3_LVP_NOC` became slightly faster than F3DEX2 on both
RDP and RSP. This caused Kaze to immediately adopt this configuration of F3DEX3
for Return to Yoshi's Island.

Unfortunately, this meant that if developers wanted to use the advanced lighting
features of F3DEX3 in any part of their project, they were stuck with the much
slower non-LVP configuration of F3DEX3. The desire to have the microcode
automatically swap versions for each material, plus the invention of ways to
include some of the advanced lighting features in the LVP vertex processing
without any performance penalty when not using them, led to the reunion of the
versions. Now you get LVP-style performance when not using some of the advanced
features, and only pay the performance penalty while rendering objects which
use them.

A similar approach was also considered for the NOC configuration--to have the
microcode only compute the occlusion plane when it is enabled. This is
unfortunately infeasible. Register allocation / naming, as well as some
pipelined instructions leading into and out of lighting, are significantly
different between the occlusion plane and NOC versions of vertex processing.
This means the microcode would have to swap between four versions of lighting
code instead of just two, creating much more complexity with the overlay system
and IMEM size issues. Furthermore, the occlusion plane is typically not
enabled/disabled per object, but used when rendering as much of the game
contents as possible to maximize occluded objects. So it is reasonable to choose
the occlusion plane or NOC configuration on a per-frame or even per-scene basis.

## Octahedral Encoding for Packed Normals

Previous F3DEX3 versions encoded packed normals into the unused 2 bytes of each
vertex using a variant of [octahedral encoding](https://knarkowicz.wordpress.com/2014/04/16/octahedron-normal-vector-encoding/).
Using this method, the normals were effectively as precise as with the vanilla
method of replacing vertex RGB with normal XYZ. However, the decoding of this
format was inefficient, partly due to the requirement to also support vanilla
normals at vanilla performance. Once HailToDodongo showed that the community was
willing to accept the moderate precision loss of the much simpler 5-6-5 bit
encoding in [Tiny3D](https://github.com/HailToDodongo/tiny3d), this was adopted
in F3DEX3.
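
The C sketch below encodes the same normal both ways, to make the precision trade-off above concrete.

- **Octahedral half.** This follows the standard mapping from the linked article, stored as two signed bytes. It is not F3DEX3's former variant.
- **5-6-5 half.** This assumes two's-complement fields (X in bits 15-11, Y in bits 10-5, Z in bits 4-0) scaled by 15, 31 and 15. That layout is an assumption for illustration, not the documented F3DEX3 or Tiny3D format.
- **Comparison.** Decoded vectors are left unnormalized, so the sketch needs no `<math.h>`. The hypothetical helper `roundtrip_cos2` compares directions by squared cosine.

```c
#include <stdint.h>

/* Small helpers so the sketch needs no <math.h>. */
static float absf(float v) { return v < 0.0f ? -v : v; }
static float sign_pos0(float v) { return v >= 0.0f ? 1.0f : -1.0f; } /* sign with sign(0) = +1 */
static int round_to_int(float v) { return (int)(v + (v >= 0.0f ? 0.5f : -0.5f)); }
static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Sign-extends the low `bits` bits of v. */
static int sext(unsigned v, int bits)
{
    int m = 1 << (bits - 1);
    int x = (int)(v & ((1u << bits) - 1u));
    return (x ^ m) - m;
}

/* Octahedral: standard mapping from the linked article, two signed bytes (hi = x, lo = y). */
static uint16_t oct_encode(const float n[3])
{
    float l1 = absf(n[0]) + absf(n[1]) + absf(n[2]);
    float x = n[0] / l1, y = n[1] / l1;
    if (n[2] < 0.0f) { /* fold the lower hemisphere onto the outer triangles */
        float fx = (1.0f - absf(y)) * sign_pos0(x);
        float fy = (1.0f - absf(x)) * sign_pos0(y);
        x = fx;
        y = fy;
    }
    unsigned qx = (unsigned)round_to_int(x * 127.0f) & 0xFFu;
    unsigned qy = (unsigned)round_to_int(y * 127.0f) & 0xFFu;
    return (uint16_t)((qx << 8) | qy);
}

/* Returns an unnormalized direction; normalize before lighting with it. */
static void oct_decode(uint16_t p, float out[3])
{
    float x = sext(p >> 8, 8) / 127.0f;
    float y = sext(p, 8) / 127.0f;
    float z = 1.0f - absf(x) - absf(y);
    float t = z < 0.0f ? -z : 0.0f; /* saturate(-z) */
    out[0] = x + (x >= 0.0f ? -t : t);
    out[1] = y + (y >= 0.0f ? -t : t);
    out[2] = z;
}

/* 5-6-5: ASSUMED layout x = bits 15-11, y = bits 10-5, z = bits 4-0 (two's complement). */
static uint16_t pack565(const float n[3])
{
    unsigned x = (unsigned)clampi(round_to_int(n[0] * 15.0f), -15, 15) & 0x1Fu;
    unsigned y = (unsigned)clampi(round_to_int(n[1] * 31.0f), -31, 31) & 0x3Fu;
    unsigned z = (unsigned)clampi(round_to_int(n[2] * 15.0f), -15, 15) & 0x1Fu;
    return (uint16_t)((x << 11) | (y << 5) | z);
}

/* Returns an unnormalized direction. */
static void unpack565(uint16_t p, float out[3])
{
    out[0] = sext(p >> 11, 5) / 15.0f;
    out[1] = sext(p >> 5, 6) / 31.0f;
    out[2] = sext(p, 5) / 15.0f;
}

/* Squared cosine between n and its round trip through one encoding (1.0 = exact). */
static float roundtrip_cos2(const float n[3], int use_565)
{
    float d[3];
    if (use_565) unpack565(pack565(n), d);
    else oct_decode(oct_encode(n), d);
    float nd = n[0] * d[0] + n[1] * d[1] + n[2] * d[2];
    float nn = n[0] * n[0] + n[1] * n[1] + n[2] * n[2];
    float dd = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
    return nd > 0.0f ? (nd * nd) / (nn * dd) : 0.0f;
}
```

Try a direction such as (1, 2, 3)/sqrt(14) with both encodings. The octahedral round trip lands slightly closer to the input, which is consistent with the "moderate precision loss" described above.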

## Clipping minimal scanlines algorithm

Earlier F3DEX3 versions included a modified algorithm for triangulating the

50
f3dex3.s
@@ -143,6 +143,8 @@ COUNTER_C_FIFO_FULL equ 1

.endif

CFG_DEBUG_NORMALS equ 0 // Can manually enable here

// Only raise a warning in base modes; in profiling modes, addresses will be off
.macro warn_if_base, warntext
.if !ENABLE_PROFILING
@@ -1264,7 +1266,6 @@ G_MODIFYVTX_handler:
j do_moveword // Moveword adds cmd_w0 to $10 for final addr
lbu cmd_w0, (inputBufferEnd - 0x07)(inputBufferPos) // offset in vtx, bit 15 clear

TODO check vtx 1 behavior
G_TRIFAN_handler: // 17
li $1, 0x8000 // $ra negative = flag for G_TRIFAN
G_TRISTRIP_handler:
@@ -2597,8 +2598,9 @@ tris_end:

tri_fan_store:
lb $11, (inputBufferEnd - 7)(inputBufferPos) // Load vtx 1
sh cmd_w1_dram, 5(rdpCmdBufPtr) // Store vtx N+2 and N+3 as 1 and 2
j tri_main
sb $11, 5(rdpCmdBufPtr) // Store vtx 1
sb $11, 7(rdpCmdBufPtr) // Store vtx 1 as 3

// Converts the segmented address in cmd_w1_dram to the corresponding physical address
segmented_to_physical: // 7
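
For readers new to segmented addresses, here is what a conversion like `segmented_to_physical` conventionally does. It is a minimal C sketch assuming the classic F3D-family scheme:

- a 16-entry segment table, filled by `gSPSegment`
- the segment ID taken from bits 24-27
- a 24-bit offset in the low bits, with bits 28-31 ignored

The function name `segmented_to_physical_c` and its table parameter are hypothetical. This is an illustration of the scheme, not a transcription of f3dex3.s.

```c
#include <stdint.h>

/* Hypothetical model of the conversion. `table` stands in for the 16-entry
   segment table that gSPSegment fills; it is a parameter here only so the
   sketch has no global state. */
static uint32_t segmented_to_physical_c(const uint32_t table[16], uint32_t segmented)
{
    uint32_t segment = (segmented >> 24) & 0x0Fu; /* bits 24-27; bits 28-31 ignored */
    uint32_t offset = segmented & 0x00FFFFFFu;    /* low 24 bits */
    return table[segment] + offset;
}
```

With segment 6 based at `0x00200000`, the segmented address `0x06001234` resolves to `0x00201234`. Segment 0 conventionally stays at base 0, so its addresses pass through unchanged.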
@@ -3235,6 +3237,19 @@ ltbasic_start_standard:
vnop
luv lVCI[0], (tempVpRGBA)(rdpCmdBufEndP1) // Load vertex color input
ltbasic_after_start:

.if CFG_DEBUG_NORMALS
.warning "Debug normals visualization is enabled"
vmudh vpNrmlX, vOne, vpNrmlX[3h] // Move X to all elements
vne $v29, $v31, $v31[1h] // Set VCC to 10111011
vmrg vpNrmlX, vpNrmlX, vpNrmlY[3h] // X in 0, 4; Y to 1, 5
vne $v29, $v31, $v31[2h] // Set VCC to 11011101
vmrg vpNrmlX, vpNrmlX, vpNrmlZ[3h] // Z to 2, 6
vmudh $v29, vOne, $v31[5] // 0x4000; middle gray
j vtx_return_from_lighting
vmacf vpRGBA, vpNrmlX, $v31[5] // 0x4000; + 0.5 * normal
.else // CFG_DEBUG_NORMALS

vmulf $v29, vpNrmlX, vLTC[4] // Normals X elems 3, 7 * first light dir X
// lDIR <- (NOC: -, Occ: sOTM)
lpv lDIR[0], (ltBufOfs + 8 - 2*lightSize)(ambLight) // Xfrmed dir in elems 4-6; temp reg
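
With `CFG_DEBUG_NORMALS` enabled, the comments above describe the output color as middle gray plus half the normal (`0x4000` + 0.5 * normal). The normal's X, Y and Z are shuffled into the R, G and B elements. In floating point this is the familiar `0.5 + 0.5 * n` mapping from [-1, 1] to [0, 1].

The C sketch below applies that mapping to 8-bit channels. It is a conceptual model under that reading of the comments, not a bit-exact model of the RSP accumulator arithmetic. `debug_normal_channel` and `debug_normal_rgb` are hypothetical names.

```c
#include <stdint.h>

/* Maps one normal component in [-1, 1] to an 8-bit channel: -1 -> 0, 0 -> 128, +1 -> 255. */
static uint8_t debug_normal_channel(float n)
{
    float c = 127.5f + 127.5f * n; /* 0.5 + 0.5 * n, scaled to 0..255 */
    if (c < 0.0f) c = 0.0f;
    if (c > 255.0f) c = 255.0f;
    return (uint8_t)(c + 0.5f);    /* round to nearest */
}

/* Normal X, Y, Z become R, G, B, as in the element shuffle above. */
static void debug_normal_rgb(const float n[3], uint8_t rgb[3])
{
    for (int i = 0; i < 3; i++)
        rgb[i] = debug_normal_channel(n[i]);
}
```

A surface facing straight along +Z therefore shows up as (128, 128, 255). This is a quick visual check that normals are oriented and packed as intended.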
@@ -3267,7 +3282,9 @@ ltbasic_post:
jr lbAfter
// vpRGBA <- lDIR
vmrg vpRGBA, vpLtTot, lVCI // RGB = light, A = vtx alpha


.endif // CFG_DEBUG_NORMALS

// lbAfter = ltbasic_ao if AO else
// lbPostAo = ltbasic_l2a if L2A else
// ltbasic_packed if packed else
@@ -3469,6 +3486,17 @@ ltadv_vtx_loop: // Even instruction
vmudn vpWrlF, vpWrlF, $v31[1] // -1; negate world pos so add light/cam pos to it
andi laSpecFres, vGeomMid, (G_LIGHTING_SPECULAR | G_FRESNEL_COLOR | G_FRESNEL_ALPHA) >> 8
vmadh vpWrlI, vpWrlI, $v31[1] // -1

.if CFG_DEBUG_NORMALS
vmudh $v29, vOne, $v31[5] // 0x4000; middle gray
li laTexgen, 0
vmacf vpRGBA, vpWNrm, $v31[5] // 0x4000; + 0.5 * normal
ltadv_finish_light:
ltadv_loop:
ltadv_normals_to_regs:
ltadv_specular:
.else

ltadv_normals_to_regs:
vmudh vpNrmlY, vOne, vpWNrm[1h] // Move normals to separate registers
bnez laSpecFres, ltadv_spec_fres_setup
@@ -3504,7 +3532,7 @@ ltadv_specular: // aDOT in/out, uses vpLtTot[3] and $11 as temps
jr $ra
vxor aDOT, aDOT, $v31[7] // = 0x7FFF - result

align_with_warning 8, "One instruction of padding before ltadv_post"
.align 8
ltadv_post:
// aClOut <- vpWrlF
// aAlOut <- vpWrlI
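
The `vxor aDOT, aDOT, $v31[7]` line in `ltadv_specular` above relies on a bit identity. This assumes `$v31[7]` holds `0x7FFF`, as its comment implies. For any non-negative 16-bit lane `x`, XOR with `0x7FFF` equals `0x7FFF - x`. The reason is that `0x7FFF` has all 15 low bits set, so the subtraction never borrows and amounts to flipping those bits.

The scalar C check below verifies the identity exhaustively over that range. Both function names are hypothetical.

```c
#include <stdint.h>

/* One lane of the trick: flip the 15 low bits of a non-negative Q15 value. */
static int16_t one_minus_q15_xor(int16_t x)
{
    return (int16_t)(x ^ 0x7FFF);
}

/* Compares the XOR form with 0x7FFF - x for every x in 0..0x7FFF; returns 1 if all match. */
static int xor_matches_subtraction(void)
{
    for (int32_t x = 0; x <= 0x7FFF; x++) {
        if (one_minus_q15_xor((int16_t)x) != (int16_t)(0x7FFF - x))
            return 0;
    }
    return 1;
}
```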
@@ -3525,7 +3553,7 @@ ltadv_post:
vcopy aClOut, vpLtTot // If no packed normals, base output is just light
@@skip_novtxcolor:
vmrg vpRGBA, aClOut, aAlOut // Merge base output and alpha output
beqz $11, @@skip_fresnel
beqz $11, ltadv_skip_fresnel
ldv vpMdl[8], (VTX_IN_OB + 0 * inputVtxSize)(laPtr) // Vtx 1 Model pos + PN
lsv aAOF[0], (vTRC_0100_addr - altBase)(altBaseReg) // Load constant 0x0100 to temp
vabs aOAFrs, aOAFrs, aOAFrs // Fresnel dot in aOAFrs[0h]; absolute value for underwater
@@ -3538,7 +3566,10 @@ ltadv_post:
@@skip:
vmrg vpRGBA, vpRGBA, aOAFrs[0h] // Replace color or alpha with fresnel
vge vpRGBA, vpRGBA, $v31[2] // Clamp to >= 0 for fresnel; doesn't affect others
@@skip_fresnel:

.endif // CFG_DEBUG_NORMALS

ltadv_skip_fresnel:
beqz laTexgen, ltadv_after_texgen
suv vpRGBA, (VTX_IN_TC - 2 * inputVtxSize)(laPtr) // Vtx 2:1 RGBA
// Texgen: aDOT still contains lookat 0 in elems 0-2, lookat 1 in elems 4-6
@@ -3659,13 +3690,6 @@ ltadv_normalize: // Normalize vector in aDPosI:vpWrlF i/f
// aDIR <- aDotSc


CFG_DEBUG_NORMALS equ 0 // Can manually enable here
.if CFG_DEBUG_NORMALS
.warning "Debug normals visualization is enabled"
vmudh $v29, vOne, $v31[5] // 0x4000; middle gray
j TODO
vmacf vpRGBA, vpWNrm, $v31[5] // 0x4000; + 0.5 * normal
.endif

ovl4_end:
.align 8

18
gbi.h
@@ -2707,13 +2707,19 @@ _DW({ \
}
/**
* 5 Triangles in strip arrangement. Draws the following tris:
* v1-v2-v3, v3-v2-v4, v3-v4-v5, v5-v4-v6, v5-v6-v7
* v1-v2-v3, v2-v4-v3, v3-v4-v5, v4-v6-v5, v5-v6-v7
* If you want to draw fewer tris, set indices to -1 from the right.
* e.g. to draw 4 tris, set v7 to -1; to draw 3 tris, set v6 to -1
* Note that any set of 3 adjacent tris can be drawn with either SPTriStrip
* e.g. to draw 4 tris, set v7 to -1; to draw 3 tris, set v6 to -1.
*
* @note Any set of 3 adjacent tris can be drawn with either SPTriStrip
* or SPTriFan. For arbitrary sets of 4 adjacent tris, four out of five of them
* can be drawn with one of SPTriStrip or SPTriFan. The 4-triangle formation
* which can't be drawn with either command looks like the Triforce.
* which can't be drawn with either command looks like the Triforce--maybe
* F3DEX4 will support gsSPTriForce. :)
*
* @note The first index of each triangle drawn is different, so that in
* !G_SHADING_SMOOTH (flat shading) mode, the single color or single normal of
* each triangle can be set independently.
*/
#define gSPTriStrip(pkt, v1, v2, v3, v4, v5, v6, v7) \
_gSP5Triangles(pkt, G_TRISTRIP, v1, v2, v3, v4, v5, v6, v7)
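
The updated strip order in the comment above can be expanded mechanically. Triangle *i* (counting from 0) uses the *i*-th, (*i*+1)-th and (*i*+2)-th indices. On odd triangles the last two are swapped, which keeps the winding of adjacent triangles consistent. Expansion stops at the first -1.

`expand_tri_strip` is a hypothetical C helper, not part of gbi.h. It reproduces only the documented order and says nothing about how the microcode stores the indices.

```c
/* Hypothetical helper: expands the 7 indices of a tri strip command into
   triangles in the documented order
   v1-v2-v3, v2-v4-v3, v3-v4-v5, v4-v6-v5, v5-v6-v7.
   An index of -1 ends the strip early. Returns the triangle count (0-5). */
typedef struct { int a, b, c; } Tri;

static int expand_tri_strip(const int v[7], Tri out[5])
{
    int count = 0;
    for (int i = 0; i < 5; i++) {
        if (v[i + 2] < 0) break; /* indices set to -1 from the right: stop */
        if (i % 2 == 0) out[count] = (Tri){v[i], v[i + 1], v[i + 2]};
        else            out[count] = (Tri){v[i], v[i + 2], v[i + 1]};
        count++;
    }
    return count;
}
```

Each triangle's first index is a different vertex (v1 through v5). The @note about flat shading depends on this.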
@@ -2724,8 +2730,8 @@ _DW({ \
_gsSP5Triangles(G_TRISTRIP, v1, v2, v3, v4, v5, v6, v7)
/**
* 5 Triangles in fan arrangement. Draws the following tris:
* v1-v2-v3, v1-v3-v4, v1-v4-v5, v1-v5-v6, v1-v6-v7
* Otherwise works the same as SPTriStrip, see above.
* v2-v3-v1, v3-v4-v1, v4-v5-v1, v5-v6-v1, v6-v7-v1
* Otherwise works the same as @see SPTriStrip.
*/
#define gSPTriFan(pkt, v1, v2, v3, v4, v5, v6, v7) \
_gSP5Triangles(pkt, G_TRIFAN, v1, v2, v3, v4, v5, v6, v7)
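
In the new fan order, every triangle ends with the hub vertex v1 instead of starting with it. This appears to match the `tri_fan_store` change in f3dex3.s, which now stores vtx 1 as the third vertex. As a result, the first index of each fan triangle is a different vertex (v2 through v6), whereas in the old order all five began with v1. That is consistent with the @note on flat shading.

`expand_tri_fan` is a hypothetical C helper, not part of gbi.h. It expands the documented order.

```c
typedef struct { int a, b, c; } Tri;

/* Hypothetical helper: expands a tri fan command in the documented order
   v2-v3-v1, v3-v4-v1, v4-v5-v1, v5-v6-v1, v6-v7-v1.
   An index of -1 ends the fan early. Returns the triangle count (0-5). */
static int expand_tri_fan(const int v[7], Tri out[5])
{
    int count = 0;
    for (int i = 0; i < 5; i++) {
        if (v[i + 2] < 0) break;                        /* indices set to -1 from the right: stop */
        out[count++] = (Tri){v[i + 1], v[i + 2], v[0]};  /* hub vertex v1 goes last */
    }
    return count;
}
```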