diff --git a/README.md b/README.md
index 02ef076..68f60c1 100644
--- a/README.md
+++ b/README.md
@@ -83,11 +83,11 @@ all at the same time!
   loads. If this system incorrectly culls supposedly repeated texture loads
   which actually differ due to segment manipulation, you can locally disable it
   using the new `SPDontSkipTexLoadsAcross` command.
-- New `SPTriangleStrip` and `SPTriangleFan` commands **pack up to 5 tris** into
-  one 64-bit GBI command (up from 2 tris in F3DEX2). In any given object, most
-  tris can be drawn with these commands, with only a few at the end drawn with
-  `SP2Triangles` or `SP1Triangle`. So, this cuts the triangle portion of display
-  lists roughly in half, saving DRAM traffic and ROM space.
+- New `SPTriSnake` command provides a flexible, generalized triangle strip
+  primitive, which can better leverage the vertex cache than a traditional
+  triangle strip. It packs up to 5 tris into the first 64-bit command and up
+  to 8 into each `SPContinueSnake` after it, for up to 4x less memory bandwidth
+  for loading tris; typical meshes should see a **2-3x reduction** here.
 - New `SPAlphaCompareCull` command enables culling of triangles whose computed
   shade alpha values are all below or above a settable threshold. This
   **substantially reduces the performance penalty of cel shading**--only tris
diff --git a/docs/Documentation/5tris.png b/docs/Documentation/5tris.png
deleted file mode 100644
index c684990..0000000
Binary files a/docs/Documentation/5tris.png and /dev/null differ
diff --git a/docs/Documentation/Removed.md b/docs/Documentation/Removed.md
index 60cee62..20c881f 100644
--- a/docs/Documentation/Removed.md
+++ b/docs/Documentation/Removed.md
@@ -123,3 +123,12 @@ By setting this to -2 and drawing an opaque tri, the tri
 would appear like a decal, but with no Z-fighting. This has been removed and
 replaced with the decal fix, which is automatic and does not require any
 special setup in the display list.
+
+## `SPTriStrip` and `SPTriFan`
+
+These commands are still supported in the GBI, but only as special cases of
+`SPTriSnake` with fixed sets of directions. Beyond covering both of these
+commands, `SPTriSnake` can draw the mirror-image 4-triangle strip, which
+`SPTriStrip` could only draw by adding a degenerate triangle, as well as
+arbitrarily long triangle strips, fans, and other snake shapes via
+`SPContinueSnake`.
diff --git a/docs/Documentation/Triangle Snake.md b/docs/Documentation/Triangle Snake.md
new file mode 100644
index 0000000..d0b6851
--- /dev/null
+++ b/docs/Documentation/Triangle Snake.md
@@ -0,0 +1,63 @@
+@page snake Triangle Snake
+
+![F3DEX3 triangle snake demo](snake_demo_ingame.png)
+*A triangle snake, drawn with a single F3DEX3 `gsSPTriSnake` command (and
+multiple `gsSPContinueSnake`s). Flat shading is used to emphasize that each
+triangle's Vertex 1 (the vertex used for flat shading) is a new index, not one
+of the indices of the previous triangle. Drawing this with a single snake uses
+3.7x less memory bandwidth for triangle display list commands than drawing
+the same mesh with `gsSP2Triangles` commands as in F3DEX2.*
+
+**Triangle Snake** is F3DEX3's new accelerated triangles command. It is capable
+of drawing any shape which is expressible as a single, non-branching chain of
+connected triangles. At each triangle, the command encodes whether the snake
+turns left or right--in other words, whether this triangle is attached to one or
+the other of the yet-unconnected edges of the previous triangle. A traditional
+triangle strip is a special case of a triangle snake with alternating directions
+(left-right-left-right-etc.), and similarly a traditional triangle fan is a
+triangle snake with the same direction repeatedly (left-left-left-etc.).
+
+![Slithering snake forming triangle strip](snake_slither.jpg)
+*A snake can slither by moving in an alternating left and right pattern. This
+represents a triangle strip. Original photo by Bui Van Dong, free-use licensed*
+
+![Coiled snake forming triangle fan](snake_coil.jpg)
+*If the snake repeatedly turns in the same direction, it coils up. This
+corresponds to a triangle fan. Original photo by Gabriel Rondina, free-use
+licensed*
+
+![Snake with mixed shape](snake_mixed.jpg)
+*The snake need not be constrained to either shape; it can turn left or right in
+any combination. This can be thought of as concatenating triangle strips and
+fans. Original photo by Al d'Vilas, free-use licensed*
+
+A snake can be arbitrarily long. It starts with an `SPTriSnake` command, which
+may be followed by one or more `SPContinueSnake` macros that encode further
+indices. The latter are not commands (there's no command byte)--they are just
+more index data, stored sequentially in the display list. In other words, the
+display list input buffer is the storage for the index data. The microcode
+correctly handles the case where the snake runs off the end of the input buffer
+and the input buffer needs to be refilled. The refilled data is loaded at the
+start of the input buffer, just as regular commands would be; this matters for
+the hints system.
+
+## Memory Bandwidth
+
+The goal of any accelerated triangles system in a microcode is to reduce the
+memory bandwidth used for loading triangle indices. The actual tris drawn are
+the same regardless of how their indices are encoded in the display list, so we
+do not consider the performance of actually drawing the tris, only loading their
+indices.
+
+An `SPTriSnake` command by itself contains 7 vertex indices and draws 5
+triangles (the first triangle needs two extra indices to start itself). An
+`SPContinueSnake` macro contains 8 indices and draws 8 more tris, continuing
+the existing snake. The F3D family microcodes before F3DEX3 only provided
+`SP1Triangle` and `SP2Triangles` commands, so any snake of 3 or more tris is
+more efficient than with F3DEX2 and older microcodes. The efficiency gain is
+up to 4x (2 -> 8 tris per 8 bytes); typical meshes are expected to see 2-3x.
+
+## Vertex Cache Locality
+
+The key advantage of a triangle snake over a traditional triangle strip is that
+it better exploits the vertex cache.
\ No newline at end of file
diff --git a/docs/Documentation/snake_coil.jpg b/docs/Documentation/snake_coil.jpg
new file mode 100644
index 0000000..ab6b9e6
Binary files /dev/null and b/docs/Documentation/snake_coil.jpg differ
diff --git a/docs/Documentation/snake_demo.svg b/docs/Documentation/snake_demo.svg
new file mode 100644
index 0000000..02ab5d5
--- /dev/null
+++ b/docs/Documentation/snake_demo.svg
@@ -0,0 +1,1045 @@
[SVG markup not preserved; recoverable text: diagram of the snake demo mesh with vertex index labels 0-43, the note "*=26 black, 27 green", and axis tick labels -1000, -500, 500, 1000]
diff --git a/docs/Documentation/snake_demo_ingame.png b/docs/Documentation/snake_demo_ingame.png
new file mode 100644
index 0000000..cf5fd22
Binary files /dev/null and b/docs/Documentation/snake_demo_ingame.png differ
diff --git a/docs/Documentation/snake_mixed.jpg b/docs/Documentation/snake_mixed.jpg
new file mode 100644
index 0000000..c29612d
Binary files /dev/null and b/docs/Documentation/snake_mixed.jpg differ
diff --git a/docs/Documentation/snake_slither.jpg b/docs/Documentation/snake_slither.jpg
new file mode 100644
index 0000000..f0b0700
Binary files /dev/null and b/docs/Documentation/snake_slither.jpg differ
diff --git a/docs/documentation.md b/docs/documentation.md
index f96dda2..256d3eb 100644
--- a/docs/documentation.md
+++ b/docs/documentation.md
@@ -5,3 +5,4 @@
 - @subpage removed
 - @subpage performance
 - @subpage porting
+- @subpage snake
diff --git a/f3dex3.s b/f3dex3.s
index 3d7fdbf..f37c1f1 100644
--- a/f3dex3.s
+++ b/f3dex3.s
@@ -1285,7 +1285,7 @@ tri_snake_loop_from_input_buffer:
     li      $ra, tri_snake_loop          // For tri_main
     bltz    $3, tri_snake_end            // Upper bit of real index b set = done
     andi    $11, $3, 1                   // Get direction flag from index c
-    beqz    inputBufferPos, tris_end // TODO tri_snake_over_input_buffer // == 0 at end of input buffer
+    beqz    inputBufferPos, tri_snake_over_input_buffer // == 0 at end of input buffer
     andi    $3, $3, 0x7E                 // Mask out flags from index c
     sb      $3, rdpHalf1Val + 1          // Store index c as vertex 1
     sb      $2, (rdpHalf1Val + 2)($11)   // Store old v1 as 2 if dir clear or 3 if set
@@ -2604,7 +2604,7 @@ tris_end:
 tri_snake_end:
     addi    inputBufferPos, inputBufferPos, 7   // Round up to whole input command
-    addi    $11, $zero, 0xFFF8           // Sign-extend; andi is zero-extend!
+    addi    $11, $zero, 0xFFF8           // Sign-extend; andi is zero-extend!
     j       tris_end
     and     inputBufferPos, inputBufferPos, $11 // inputBufferPos has to be negative
diff --git a/gbi.h b/gbi.h
index e9fba0e..2b2601f 100644
--- a/gbi.h
+++ b/gbi.h
@@ -2327,9 +2327,6 @@ _DW({ \
  * gSPVertex(glistp++, v, 3, 2);
  * ```
  *
- * @note
- * Because the RSP geometry transformation engine uses a vertex list with triangle list architecture, it is quite powerful. A simple one-triangle macro retains least performance compared to @ref gSP2Triangles or the new 5 tris commands in EX3 (@ref gSPTriStrip, @ref gSPTriFan).
- *
  * @param v is the pointer to the vertex list (segment address)
  * @param n is the number of vertices
  * @param v0 is the load vertex by index vo(0~55) in vertex buffer
@@ -2681,19 +2678,19 @@ _DW({ \
 }

 /**
- * Make the triangle snake turn left before drawing this triangle. In other
+ * Make the triangle snake turn right before drawing this triangle. In other
  * words, build the new triangle off the newest and middle-age vertices of the
  * last triangle.
  * @see gSPTriSnake
  */
-#define G_SNAKE_LEFT 0
+#define G_SNAKE_RIGHT 0
 /**
- * Make the triangle snake turn right before drawing this triangle. In other
+ * Make the triangle snake turn left before drawing this triangle. In other
  * words, build the new triangle off the newest and oldest vertices of the last
  * triangle.
  * @see gSPTriSnake
  */
-#define G_SNAKE_RIGHT 1
+#define G_SNAKE_LEFT 1
 /**
  * Logical-OR this into a triangle index to mark it as the last triangle of the
  * snake. In other words, this gets OR'd into the last valid index, not the
@@ -2706,16 +2703,16 @@ _DW({ \
  */
 #define G_SNAKE_LAST 0x40

-#define _gSPTriSnakeW0(i1, i2, i3) \
-    (_SHIFTL(G_TRISNAKE, 24, 8) | \
-    _SHIFTL((i2)*2, 16, 8) | \
-    _SHIFTL((i1)*2, 8, 8) | \
-    _SHIFTL((i3)*2|G_SNAKE_RIGHT, 0, 8))
+#define _gSPTriSnakeW0(i1, i2, i3) \
+    (_SHIFTL(G_TRISNAKE, 24, 8) | \
+    _SHIFTL((i2)*2, 16, 8) | \
+    _SHIFTL((i1)*2, 8, 8) | \
+    _SHIFTL((i3)*2|G_SNAKE_LEFT, 0, 8))
 #define _gSPTriSnakeW1(i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
-    (_SHIFTL((i4)*2|(i4d), 24, 8) | \
-    _SHIFTL((i5)*2|(i5d), 16, 8) | \
-    _SHIFTL((i6)*2|(i6d), 8, 8) | \
-    _SHIFTL((i7)*2|(i7d), 0, 8))
+    (_SHIFTL((i4)*2|(i4d), 24, 8) | \
+    _SHIFTL((i5)*2|(i5d), 16, 8) | \
+    _SHIFTL((i6)*2|(i6d), 8, 8) | \
+    _SHIFTL((i7)*2|(i7d), 0, 8))

 /**
  * Triangle snake is F3DEX3's accelerated triangles command. It is a generalized
@@ -2728,31 +2725,31 @@ _DW({ \
  * The drawing algorithm is:
  * - Initialize 3 bytes of stored triangle indices, A-B-C, to i3-i1-i2, and draw
  *   this triangle. (This initialization and draw is actually implemented by
- *   storing i2-i1-i3 and then running the algorithm below with G_SNAKE_RIGHT,
+ *   storing i2-i1-i3 and then running the algorithm below with G_SNAKE_LEFT,
  *   which ends up storing i2 to C and i3 to A, ultimately creating i3-i1-i2.)
  * - Loop:
  *   - If the index in A has G_SNAKE_LAST or'd into it, exit.
  *   - Increment the input pointer, and read the next index and its direction
  *     flag (currently i4 and i4d).
- *   - If the direction flag is G_SNAKE_LEFT, copy A to B; else
- *     (G_SNAKE_RIGHT), copy A to C.
+ *   - If the direction flag is G_SNAKE_RIGHT, copy A to B; else
+ *     (G_SNAKE_LEFT), copy A to C.
  *   - Store the new index (currently i4) to A.
  *   - Draw the triangle A-B-C and repeat the loop.
  *
  * For example, after drawing the first triangle i3-i1-i2, if i4 is
- * G_SNAKE_LEFT, the snake turns left and draws i4-i3-i2:
- *  4 -->-- 3
- *   \'    /'\      (winding order and
- *    \   /   \     first vertex for flat
- *     \ /     \    shading are marked)
- *      2 --<-- 1
- * Conversely, after the first triangle i3-i1-i2, if i4 is G_SNAKE_RIGHT, the
- * snake turns right and draws i4-i1-i3:
- *      3 -->-- 4
- *     /'\    '/
- *    /   \   /
- *   /     \ /
- *  2 --<-- 1
+ * G_SNAKE_RIGHT, the snake turns right and draws i4-i3-i2:
+ *      3 --<-- 4
+ *     /'\    '/    (winding order and
+ *    /   \   /     first vertex for flat
+ *   /     \ /      shading are marked)
+ *  1 -->-- 2
+ * Conversely, after the first triangle i3-i1-i2, if i4 is G_SNAKE_LEFT, the
+ * snake turns left and draws i4-i1-i3:
+ *  4 --<-- 3
+ *   \'    /'\
+ *    \   /   \
+ *     \ /     \
+ *      1 -->-- 2
  * If the snake turns in the same direction repeatedly, it will coil up, forming
  * a triangle fan. If it slithers left and right alternately, this will form a
  * triangle strip. Any combination of these is also possible. In particular, a
@@ -2762,6 +2759,11 @@ _DW({ \
  * snake, except for tris which have two unconnected edges which can only be the
  * first or last tris of the snake.
  *
+ * Logical-OR G_SNAKE_LAST into the last valid index of the snake. This index
+ * still needs a valid G_SNAKE_LEFT or G_SNAKE_RIGHT for its direction. However,
+ * for all indices after this, you can fill the index and direction parameters
+ * with 0s.
+ *
  * @see gSPContinueSnake to extend the snake to more than 5 triangles.
  */
 #define gSPTriSnake(pkt, i1, i2, i3, i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
@@ -2802,7 +2804,7 @@ _DW({ \
 #define gsSPContinueSnake(i0, i0d, i1, i1d, i2, i2d, i3, i3d, \
                           i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
 { \
-    _gSPTriSnakeW1(i0, i0d, i1, i1d, i2, i2d, i3, i3d) \
+    _gSPTriSnakeW1(i0, i0d, i1, i1d, i2, i2d, i3, i3d), \
     _gSPTriSnakeW1(i4, i4d, i5, i5d, i6, i6d, i7, i7d) \
 }

@@ -2823,25 +2825,25 @@ _DW({ \
  * @note One of the two handednesses of a 4 tri strip cannot be drawn directly
  * with gSPTriStrip, unless v1 and v2 are set to the same vertex to create a
  * degenerate triangle, which costs a little performance. However, now this
- * shape can be drawn with gSPTriSnake (directions right-left-right).
+ * shape can be drawn with gSPTriSnake (directions left-right-left).
  */
-#define gSPTriStrip(pkt, v1, v2, v3, v4, v5, v6, v7) \
-    gSPTriSnake(pkt, v1, v2, \
-        v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
-        v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
-        v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
-        v7, G_SNAKE_RIGHT)
+#define gSPTriStrip(pkt, v1, v2, v3, v4, v5, v6, v7) \
+    gSPTriSnake(pkt, v1, v2, \
+        (v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
+        (v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
+        (v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
+        (v7) | G_SNAKE_LAST, G_SNAKE_LEFT)
 /**
  * @copydetails gSPTriStrip
  */
-#define gsSPTriStrip(v1, v2, v3, v4, v5, v6, v7) \
-    gsSPTriSnake(v1, v2, \
-        v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
-        v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
-        v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
-        v7, G_SNAKE_RIGHT)
+#define gsSPTriStrip(v1, v2, v3, v4, v5, v6, v7) \
+    gsSPTriSnake(v1, v2, \
+        (v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
+        (v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
+        (v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
+        (v7) | G_SNAKE_LAST, G_SNAKE_LEFT)
 /**
  * 5 Triangles in fan arrangement. Draws the following tris:
  * v3-v1-v2, v4-v1-v3, v5-v1-v4, v6-v1-v5, v7-v1-v6
@@ -2849,23 +2851,23 @@ _DW({ \
  *
  * @deprecated Use gSPTriSnake directly.
  */
-#define gSPTriFan(pkt, v1, v2, v3, v4, v5, v6, v7) \
-    gSPTriSnake(pkt, v1, v2, \
-        v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
-        v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v7, G_SNAKE_RIGHT)
+#define gSPTriFan(pkt, v1, v2, v3, v4, v5, v6, v7) \
+    gSPTriSnake(pkt, v1, v2, \
+        (v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
+        (v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v7) | G_SNAKE_LAST, G_SNAKE_LEFT)
 /**
  * @copydetails gSPTriFan
  */
-#define gsSPTriFan(v1, v2, v3, v4, v5, v6, v7) \
-    gsSPTriSnake(v1, v2, \
-        v3 | ((v4 & 0x80) ? G_SNAKE_LAST : 0), \
-        v4 | ((v5 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v5 | ((v6 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v6 | ((v7 & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_RIGHT, \
-        v7, G_SNAKE_RIGHT)
+#define gsSPTriFan(v1, v2, v3, v4, v5, v6, v7) \
+    gsSPTriSnake(v1, v2, \
+        (v3) | (((v4) & 0x80) ? G_SNAKE_LAST : 0), \
+        (v4) | (((v5) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v5) | (((v6) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v6) | (((v7) & 0x80) ? G_SNAKE_LAST : 0), G_SNAKE_LEFT, \
+        (v7) | G_SNAKE_LAST, G_SNAKE_LEFT)

 /*