Documenting snake

Sauraen
2025-07-26 22:34:27 -07:00
parent f8d9455750
commit 7fdca37991
12 changed files with 1189 additions and 69 deletions

@@ -83,11 +83,11 @@ all at the same time!
loads. If this system incorrectly culls supposedly repeated texture loads
which actually differ due to segment manipulation, you can locally disable it
using the new `SPDontSkipTexLoadsAcross` command.
- New `SPTriangleStrip` and `SPTriangleFan` commands **pack up to 5 tris** into
one 64-bit GBI command (up from 2 tris in F3DEX2). In any given object, most
tris can be drawn with these commands, with only a few at the end drawn with
`SP2Triangles` or `SP1Triangle`. So, this cuts the triangle portion of display
lists roughly in half, saving DRAM traffic and ROM space.
- New `SPTriSnake` command provides a flexible, generalized triangle strip
primitive, which can better leverage the vertex cache than a traditional
triangle strip. This packs up to 8 tris per display list command, for up to
4x less memory bandwidth for loading tris; typical meshes should see a **2-3x
memory bandwidth reduction** for this step.
- New `SPAlphaCompareCull` command enables culling of triangles whose computed
shade alpha values are all below or above a settable threshold. This
**substantially reduces the performance penalty of cel shading**--only tris

@@ -123,3 +123,12 @@ By setting this to -2 and drawing an opaque tri, the tri would appear like a
decal, but with no Z-fighting. This has been removed and replaced with the decal
fix, which is automatic and does not require any special setup in the display
list.
## `SPTriStrip` and `SPTriFan`
These commands are still supported in the GBI, but as special cases of
`SPTriSnake` with specific sets of directions. In addition to covering both of
these commands, the `SPTriSnake` command can draw the mirror-imaged 4-triangle
strip which `SPTriStrip` could not (without inefficiency), as well as
arbitrarily long triangle strips, fans, and other snake shapes via
`SPContinueSnake`.

@@ -0,0 +1,63 @@
@page snake Triangle Snake
![F3DEX3 triangle snake demo](snake_demo_ingame.png)
*A triangle snake, drawn with a single F3DEX3 `gsSPTriSnake` command (and
multiple `gsSPContinueSnake`s). Flat shading is used to emphasize that each
consecutive triangle in the snake has its Vertex 1 be a new index, not the same
as one of the indices of the previous triangle. Drawing this with a single snake
uses 3.7x less memory bandwidth for triangle display list commands compared to
drawing the same mesh with `gsSP2Triangles` commands like in F3DEX2.*
**Triangle Snake** is F3DEX3's new accelerated triangles command. It is capable
of drawing any shape which is expressible as a single, non-branching chain of
connected triangles. At each triangle, the command encodes whether the snake
turns left or right--in other words, whether this triangle is attached to one or
the other of the yet-unconnected edges of the previous triangle. A traditional
triangle strip is a special case of a triangle snake with alternating directions
(left-right-left-right-etc.), and similarly a traditional triangle fan is a
triangle snake with the same direction repeatedly (left-left-left-etc.).
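The left/right rule can be sketched in C by following the drawing algorithm described in this commit's `gbi.h` comment: keep three stored indices A-B-C, copy A to B on a right turn or A to C on a left turn, store the new index to A, and draw A-B-C. This is only an illustrative model, not the microcode. `SnakeTri` and `snake_expand` are hypothetical names, and the snake length is passed explicitly instead of being marked with `G_SNAKE_LAST`.

```c
#include <stddef.h>
#include <stdint.h>

/* Direction flag values as defined in this commit's gbi.h. */
#define G_SNAKE_RIGHT 0
#define G_SNAKE_LEFT  1

/* Hypothetical helper type: one triangle as three vertex indices. */
typedef struct { uint8_t a, b, c; } SnakeTri;

/*
 * Expand a snake into its triangles. i1..i3 start the snake; idx[k] and
 * dir[k] are the following indices and their direction flags.
 * Writes count + 1 triangles to out and returns that number.
 */
static size_t snake_expand(uint8_t i1, uint8_t i2, uint8_t i3,
                           const uint8_t *idx, const uint8_t *dir,
                           size_t count, SnakeTri *out)
{
    uint8_t a = i3, b = i1, c = i2;      /* first triangle is i3-i1-i2 */
    size_t n = 0;

    out[n++] = (SnakeTri){a, b, c};
    for (size_t k = 0; k < count; k++) {
        if (dir[k] == G_SNAKE_RIGHT) {
            b = a;                       /* turn right: copy A to B */
        } else {
            c = a;                       /* turn left: copy A to C */
        }
        a = idx[k];                      /* new index becomes vertex 1 */
        out[n++] = (SnakeTri){a, b, c};
    }
    return n;
}
```

With starting indices 1, 2, 3 and following indices 4, 5, 6, 7, passing four left turns yields the fan 3-1-2, 4-1-3, 5-1-4, 6-1-5, 7-1-6, while right-left-right-left yields the strip 3-1-2, 4-3-2, 5-3-4, 6-5-4, 7-5-6.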
![Slithering snake forming triangle strip](snake_slither.jpg)
*A snake can slither by moving in an alternating left and right pattern. This
represents a triangle strip. Original photo by Bui Van Dong, free-use licensed*
![Coiled snake forming triangle fan](snake_coil.jpg)
*If the snake repeatedly turns in the same direction, it coils up. This
corresponds to a triangle fan. Original photo by Gabriel Rondina, free-use
licensed*
![Snake with mixed shape](snake_mixed.jpg)
*The snake need not be constrained to either shape; it can turn left or right in
any combination. This can be thought of as concatenating triangle strips and
fans. Original photo by Al d'Vilas, free-use licensed*
A snake can be arbitrarily long. It starts with an `SPTriSnake` command, which
may be followed by one or more `SPContinueSnake` macros which encode continued
indices. The latter are not commands (there's no command byte)--they are just
more index data placed sequentially in the display list. In other words, the
display list input buffer is the storage for the index data. The microcode
correctly handles the case where the snake runs off the end of the input buffer
and the buffer needs to be refilled. The refilled data starts from the start of
the input buffer, as if it were regular commands; this matters for the hints
system.
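To illustrate what "just more index data" means, here is a minimal sketch of how one index byte and one 32-bit word of snake data are laid out, mirroring the `_gSPTriSnakeW1` macro in this commit's `gbi.h`: each byte holds the index times 2 with the direction flag in bit 0, and `G_SNAKE_LAST` (0x40, OR'd into the index) lands in bit 7 after the doubling. The helper names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Values as defined in this commit's gbi.h. */
#define G_SNAKE_RIGHT 0
#define G_SNAKE_LEFT  1
#define G_SNAKE_LAST  0x40

/* Hypothetical helper: one index byte, laid out like a field of _gSPTriSnakeW1. */
static uint8_t snake_byte(unsigned index, unsigned dir)
{
    return (uint8_t)((index * 2u) | dir);
}

/* Hypothetical helper: four index bytes packed big-endian into one word. */
static uint32_t snake_word(unsigned i0, unsigned d0, unsigned i1, unsigned d1,
                           unsigned i2, unsigned d2, unsigned i3, unsigned d3)
{
    return ((uint32_t)snake_byte(i0, d0) << 24) |
           ((uint32_t)snake_byte(i1, d1) << 16) |
           ((uint32_t)snake_byte(i2, d2) << 8)  |
            (uint32_t)snake_byte(i3, d3);
}

/* Decoding, following the masks used in the microcode hunk of this commit. */
static int      snake_byte_is_last(uint8_t b) { return (b & 0x80u) != 0; }
static unsigned snake_byte_dir(uint8_t b)     { return b & 1u; }
static unsigned snake_byte_index(uint8_t b)   { return (b & 0x7Eu) >> 1; }
```

An `SPContinueSnake` macro is simply two such words back to back, which is why it carries 8 indices in 8 bytes and has no room for a command byte.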
## Memory Bandwidth
The goal of any accelerated triangles system in a microcode is to reduce the
memory bandwidth used for loading triangle indices. The actual tris drawn are
the same regardless of how their indices are encoded in the display list, so we
do not consider the performance of actually drawing the tris, only loading their
indices.
An `SPTriSnake` command by itself contains 7 vertices and draws 5 triangles
(because the first triangle needs two extra vertices to start itself). An
`SPContinueSnake` macro contains 8 vertices and draws 8 tris, because each new
vertex extends the existing snake by one tri. The F3D family microcodes before
F3DEX3 only provided `SP1Triangle` and `SP2Triangles` commands, so any snake of
3 or more tris is more efficient than F3DEX2 and older microcodes. The
efficiency gain is up to 4x (2 tris -> 8 tris per 8 bytes of display list),
though in typical meshes the gain is expected to be 2-3x.
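The arithmetic above can be written out as a small sketch. It counts 8-byte display list slots for a single snake of a given length versus the same tris drawn with `SP2Triangles`. The function names are hypothetical, and the sketch assumes the whole mesh fits in one snake, which real meshes usually will not.

```c
#include <stddef.h>

/* 8-byte slots for one snake: SPTriSnake covers the first 5 tris,
 * each SPContinueSnake covers up to 8 more. */
static size_t snake_slots(size_t tris)
{
    if (tris == 0) return 0;
    if (tris <= 5) return 1;
    return 1 + (tris - 5 + 7) / 8;   /* round up the remaining tris */
}

/* 8-byte slots when drawing two tris per command, as in F3DEX2. */
static size_t sp2triangles_slots(size_t tris)
{
    return (tris + 1) / 2;
}
```

For example, 3 tris take 1 slot as a snake and 2 slots as `SP2Triangles`, which is where the snake starts to win; 13 tris take 2 slots versus 7. The ratio approaches 4x only as the snake gets long.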
## Vertex Cache Locality
The key advantage of a triangle snake over a traditional triangle strip is that
it better exploits the vertex cache.

@@ -5,3 +5,4 @@
- @subpage removed
- @subpage performance
- @subpage porting
- @subpage snake

@@ -1285,7 +1285,7 @@ tri_snake_loop_from_input_buffer:
li $ra, tri_snake_loop // For tri_main
bltz $3, tri_snake_end // Upper bit of real index b set = done
andi $11, $3, 1 // Get direction flag from index c
beqz inputBufferPos, tris_end // TODO tri_snake_over_input_buffer // == 0 at end of input buffer
beqz inputBufferPos, tri_snake_over_input_buffer // == 0 at end of input buffer
andi $3, $3, 0x7E // Mask out flags from index c
sb $3, rdpHalf1Val + 1 // Store index c as vertex 1
sb $2, (rdpHalf1Val + 2)($11) // Store old v1 as 2 if dir clear or 3 if set
@@ -2604,7 +2604,7 @@ tris_end:
tri_snake_end:
addi inputBufferPos, inputBufferPos, 7 // Round up to whole input command
addi $11, $zero, 0xFFF8 // Sign-extend; andi is zero-extend!
addi $11, $zero, 0xFFF8 // Sign-extend; andi is zero-extend!
j tris_end
and inputBufferPos, inputBufferPos, $11 // inputBufferPos has to be negative
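As a reading aid for the `tri_snake_end` hunk above, here is a small C model of the rounding it performs. The assembly comments state that `inputBufferPos` is negative and that `andi` zero-extends its immediate, which is why the mask is built with `addi` instead. The function names are hypothetical, and the model assumes 32-bit two's complement registers.

```c
#include <stdint.h>

/* Round pos up to the next multiple of 8 (one whole input command),
 * using a sign-extended mask as the addi/and pair does. */
static int32_t round_up_to_command(int32_t pos)
{
    return (int32_t)(((uint32_t)pos + 7u) & 0xFFFFFFF8u);
}

/* What a zero-extended 16-bit mask would produce instead: the upper
 * bits are cleared, so a negative position is no longer negative. */
static uint32_t round_up_zero_extended(int32_t pos)
{
    return ((uint32_t)pos + 7u) & 0x0000FFF8u;
}
```

For instance, a position of -13 rounds to -8 with the sign-extended mask, but becomes 0xFFF8 with the zero-extended one.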

gbi.h

@@ -2327,9 +2327,6 @@ _DW({ \
* gSPVertex(glistp++, v, 3, 2);
* ```
*
* @note
* Because the RSP geometry transformation engine uses a vertex list with triangle list architecture, it is quite powerful. A simple one-triangle macro retains least performance compared to @ref gSP2Triangles or the new 5 tris commands in EX3 (@ref gSPTriStrip, @ref gSPTriFan).
*
* @param v is the pointer to the vertex list (segment address)
* @param n is the number of vertices
* @param v0 is the load vertex by index vo(0~55) in vertex buffer
@@ -2681,19 +2678,19 @@ _DW({ \
}
/**
* Make the triangle snake turn left before drawing this triangle. In other
* Make the triangle snake turn right before drawing this triangle. In other
* words, build the new triangle off the newest and middle-age vertices of the
* last triangle.
* @see gSPTriSnake
*/
#define G_SNAKE_LEFT 0
#define G_SNAKE_RIGHT 0
/**
* Make the triangle snake turn right before drawing this triangle. In other
* Make the triangle snake turn left before drawing this triangle. In other
* words, build the new triangle off the newest and oldest vertices of the last
* triangle.
* @see gSPTriSnake
*/
#define G_SNAKE_RIGHT 1
#define G_SNAKE_LEFT 1
/**
* Logical-OR this into a triangle index to mark it as the last triangle of the
* snake. In other words, this gets OR'd into the last valid index, not the
@@ -2706,16 +2703,16 @@ _DW({ \
*/
#define G_SNAKE_LAST 0x40
#define _gSPTriSnakeW0(i1, i2, i3) \
(_SHIFTL(G_TRISNAKE, 24, 8) | \
_SHIFTL((i2)*2, 16, 8) | \
_SHIFTL((i1)*2, 8, 8) | \
_SHIFTL((i3)*2|G_SNAKE_RIGHT, 0, 8))
#define _gSPTriSnakeW0(i1, i2, i3) \
(_SHIFTL(G_TRISNAKE, 24, 8) | \
_SHIFTL((i2)*2, 16, 8) | \
_SHIFTL((i1)*2, 8, 8) | \
_SHIFTL((i3)*2|G_SNAKE_LEFT, 0, 8))
#define _gSPTriSnakeW1(i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
(_SHIFTL((i4)*2|(i4d), 24, 8) | \
_SHIFTL((i5)*2|(i5d), 16, 8) | \
_SHIFTL((i6)*2|(i6d), 8, 8) | \
_SHIFTL((i7)*2|(i7d), 0, 8))
(_SHIFTL((i4)*2|(i4d), 24, 8) | \
_SHIFTL((i5)*2|(i5d), 16, 8) | \
_SHIFTL((i6)*2|(i6d), 8, 8) | \
_SHIFTL((i7)*2|(i7d), 0, 8))
/**
* Triangle snake is F3DEX3's accelerated triangles command. It is a generalized
@@ -2728,31 +2725,31 @@ _DW({ \
* The drawing algorithm is:
* - Initialize 3 bytes of stored triangle indices, A-B-C, to i3-i1-i2, and draw
* this triangle. (This initialization and draw is actually implemented by
* storing i2-i1-i3 and then running the algorithm below with G_SNAKE_RIGHT,
* storing i2-i1-i3 and then running the algorithm below with G_SNAKE_LEFT,
* which ends up storing i2 to C and i3 to A, ultimately creating i3-i1-i2.)
* - Loop:
* - If the index in A has G_SNAKE_LAST or'd into it, exit.
* - Increment the input pointer, and read the next index and its direction
* flag (currently i4 and i4d).
* - If the direction flag is G_SNAKE_LEFT, copy A to B; else
* (G_SNAKE_RIGHT), copy A to C.
* - If the direction flag is G_SNAKE_RIGHT, copy A to B; else
* (G_SNAKE_LEFT), copy A to C.
* - Store the new index (currently i4) to A.
* - Draw the triangle A-B-C and repeat the loop.
*
* For example, after drawing the first triangle i3-i1-i2, if i4 is
* G_SNAKE_LEFT, the snake turns left and draws i4-i3-i2:
* 4 -->-- 3
* \' /'\ (winding order and
* \ / \ first vertex for flat
* \ / \ shading are marked)
* 2 --<-- 1
* Conversely, after the first triangle i3-i1-i2, if i4 is G_SNAKE_RIGHT, the
* snake turns right and draws i4-i1-i3:
* 3 -->-- 4
* /'\ '/
* / \ /
* / \ /
* 2 --<-- 1
* G_SNAKE_RIGHT, the snake turns right and draws i4-i3-i2:
* 3 --<-- 4
* /'\ '/ (winding order and
* / \ / first vertex for flat
* / \ / shading are marked)
* 1 -->-- 2
* Conversely, after the first triangle i3-i1-i2, if i4 is G_SNAKE_LEFT, the
* snake turns left and draws i4-i1-i3:
* 4 --<-- 3
* \' /'\
* \ / \
* \ / \
* 1 -->-- 2
* If the snake turns in the same direction repeatedly, it will coil up, forming
* a triangle fan. If it slithers left and right alternately, this will form a
* triangle strip. Any combination of these is also possible. In particular, a
@@ -2762,6 +2759,11 @@ _DW({ \
* snake, except for tris which have two unconnected edges which can only be the
* first or last tris of the snake.
*
* Logical-OR G_SNAKE_LAST into the last valid index of the snake. This index
* still needs a valid G_SNAKE_LEFT or G_SNAKE_RIGHT for its direction. However,
* for all indices after this, you can fill the index and direction parameters
* with 0s.
*
* @see gSPContinueSnake to extend the snake to more than 5 triangles.
*/
#define gSPTriSnake(pkt, i1, i2, i3, i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
@@ -2802,7 +2804,7 @@ _DW({ \
#define gsSPContinueSnake(i0, i0d, i1, i1d, i2, i2d, i3, i3d, \
i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
{ \
_gSPTriSnakeW1(i0, i0d, i1, i1d, i2, i2d, i3, i3d) \
_gSPTriSnakeW1(i0, i0d, i1, i1d, i2, i2d, i3, i3d), \
_gSPTriSnakeW1(i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
}
@@ -2823,25 +2825,25 @@ _DW({ \
* @note One of the two handednesses of a 4 tri strip cannot be drawn directly
* with gSPTriStrip, unless v1 and v2 are set to the same vertex to create a
* degenerate triangle, which costs a little performance. However, now this
* shape can be drawn with gSPTriSnake (directions right-left-right).
* shape can be drawn with gSPTriSnake (directions left-right-left).
*/
#define gSPTriStrip(pkt, v1, v2, v3, v4, v5, v6, v7) \
gSPTriSnake(pkt, v1, v2, \
v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
v7, G_SNAKE_RIGHT)
#define gSPTriStrip(pkt, v1, v2, v3, v4, v5, v6, v7) \
gSPTriSnake(pkt, v1, v2, \
(v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
(v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
(v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
(v7) | G_SNAKE_LAST, G_SNAKE_LEFT)
/**
* @copydetails gSPTriStrip
*/
#define gsSPTriStrip(v1, v2, v3, v4, v5, v6, v7) \
gsSPTriSnake(v1, v2, \
v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
v7, G_SNAKE_RIGHT)
#define gsSPTriStrip(v1, v2, v3, v4, v5, v6, v7) \
gsSPTriSnake(v1, v2, \
(v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
(v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
(v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
(v7) | G_SNAKE_LAST, G_SNAKE_LEFT)
/**
* 5 Triangles in fan arrangement. Draws the following tris:
* v3-v1-v2, v4-v1-v3, v5-v1-v4, v6-v1-v5, v7-v1-v6
@@ -2849,23 +2851,23 @@ _DW({ \
*
* @deprecated Use gSPTriSnake directly.
*/
#define gSPTriFan(pkt, v1, v2, v3, v4, v5, v6, v7) \
gSPTriSnake(pkt, v1, v2, \
v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v7, G_SNAKE_RIGHT)
#define gSPTriFan(pkt, v1, v2, v3, v4, v5, v6, v7) \
gSPTriSnake(pkt, v1, v2, \
(v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
(v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v7) | G_SNAKE_LAST, G_SNAKE_LEFT)
/**
* @copydetails gSPTriFan
*/
#define gsSPTriFan(v1, v2, v3, v4, v5, v6, v7) \
gsSPTriSnake(v1, v2, \
v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
v7, G_SNAKE_RIGHT)
#define gsSPTriFan(v1, v2, v3, v4, v5, v6, v7) \
gsSPTriSnake(v1, v2, \
(v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
(v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
(v7) | G_SNAKE_LAST, G_SNAKE_LEFT)
/*