diff --git a/design-tradeoffs.html b/design-tradeoffs.html
index 62945c7..c1821db 100644
--- a/design-tradeoffs.html
+++ b/design-tradeoffs.html
@@ -111,8 +111,8 @@ Vertex processing RSP time for occlusion plane</h2>
 <p>In the occlusion plane F3DEX3 configuration, vertex processing is slower than in F3DEX2. If using this configuration and there is no occlusion plane or it is occluding almost nothing, the RSP will be slower with no other benefit.</p>
 <p>However, when the occlusion plane is occluding even a few percent of the triangles in the scene, the situation changes. This saves RDP time, and most games are RDP bound, so this trades off RSP time for RDP time and makes the game faster overall. Plus, RSP time is also saved for the tris which are not drawn, which can approximately cancel out the extra RSP time for computing the occlusion plane for all vertices.</p>
 <h2><a class="anchor" id="autotoc_md30"></a>
-Functionality in Overlay 3</h2>
-<p>The following commands are moved to Overlay 3 in F3DEX3 to save IMEM space. This means that code will have to be loaded from DRAM to run them if Overlays 2 or 4 (for lighting) happen to be loaded already.</p><ul>
+Functionality in overlays</h2>
+<p>The following commands are moved to Overlay 2 or 3 in F3DEX3 to save IMEM space. This means that code will have to be loaded from DRAM to run them if a different overlay happens to be loaded already.</p><ul>
 <li>Push and multiply codepaths for <code>SPMatrix</code></li>
 <li><code>SPPopMatrix*</code></li>
 <li><code>SPDma*</code></li>
@@ -121,7 +121,7 @@ Functionality in Overlay 3</h2>
 <p>However:</p><ul>
 <li>Multiplying, pushing, and popping matrices is not recommended for performance or accuracy, and these are not used for most 3D objects in SM64 or OoT.</li>
 <li><code>SPDma*</code> is rarely used except at startup for HLE detection.</li>
-<li><code>SPMemset</code> is a new F3DEX3 command which can improve performance. Plus, it is typically run shortly after render start, when Overlay 3 is already in IMEM.</li>
+<li><code>SPMemset</code> is a new F3DEX3 command which can improve performance. Plus, it is typically run shortly after render start, when Overlay 3 (which contains it) is already in IMEM.</li>
 </ul>
 <p>So there is not a significant practical performance impact from these changes.</p>
 <h2><a class="anchor" id="autotoc_md31"></a>
@@ -147,12 +147,16 @@ Segment 0</h2>
 <p>Segment 0 is now reserved: ensure segment 0 is never set to anything but 0x00000000. In F3DEX2 and prior this was only a good idea (and SM64 and OoT always follow this); in F3DEX3 segmented addresses are now resolved relative to other segments. That is, <code>gsSPSegment(0x08, 0x07001000)</code> sets segment 8 to the base address of segment 7 with an additional offset of 0x1000. So for correct behavior when supplying a direct-mapped or physical address such as 0x80101000, segment 0 must always be 0x00000000 so that this address resolves to e.g. 0x101000 as expected in this example.</p>
 <h2><a class="anchor" id="autotoc_md35"></a>
 Non-textured tris</h2>
-<p>In F3DEX2, the RSP time for drawing non-textured tris was significantly lower than for textured tris, by skipping a chunk of computation for the texture coefficients if they were disabled. In F3DEX3, no computation is skipped when textures are disabled. However, almost all materials use textures, and F3DEX3 is a little faster at drawing textured tris than F3DEX2. Plus, F3DEX3 still does not send the texture cofficients if they are disabled, saving DRAM access time for RSP -&gt; FIFO and FIFO -&gt; RDP. RDP time savings from avoiding loading a texture are unaffected of course.</p>
+<p>In F3DEX2, the RSP time for drawing non-textured tris was significantly lower than for textured tris, by skipping a chunk of computation for the texture coefficients if they were disabled. In F3DEX3, no computation is skipped when textures are disabled. However, in practice almost all materials use textures, and F3DEX3 is faster at drawing textured tris than F3DEX2. Plus, F3DEX3 still does not send the texture coefficients if they are disabled, saving DRAM access time for RSP -&gt; FIFO and FIFO -&gt; RDP. RDP time savings from avoiding loading a texture are unaffected of course.</p>
 <h2><a class="anchor" id="autotoc_md36"></a>
+Yield check timing</h2>
+<p>In F3DEX2, the microcode checks whether the CPU has requested that it yield (to run the audio microcode) before running every display list command. F3DEX3 now performs this check every time the input buffer is refilled, which is typically once every 21 commands. The amount by which this delays the start of the audio microcode is typically very small, and in the worst case under normal conditions it would be a few hundred microseconds. However, if the RDP FIFO is full during this time, the microcode will have to wait for the RDP to make progress through its workload to free up space for the outputs of the RSP commands. This will slow down the RSP to the RDP's speed, and since triangles can be arbitrarily large on screen, this can theoretically cause huge stalls. If you ever encounter this in practice, please contact Sauraen.</p>
+<h2><a class="anchor" id="autotoc_md37"></a>
 Obscure semantic differences from F3DEX2 that should never matter in practice</h2>
 <ul>
 <li>Changing fog settings&ndash;i.e. enabling or disabling <code>G_FOG</code> in the geometry mode or executing <code>SPFogFactor</code> or <code>SPFogPosition</code>&ndash;between loading verts and drawing tris with those verts will lead to incorrect fog values for those tris. In F3DEX2, the fog settings at vertex load time would always be used, even if they were changed before drawing tris.</li>
-<li>Drawing tris overwrites the 4 bytes stored with <code>G_RDPHALF_1</code>, which is used to hold state during some display list macros which are actually two 8-byte commands. This change is not noticeable when using standard GBI commands, only if something highly custom has been set up. </li>
+<li>Drawing tris overwrites the 4 bytes stored with <code>G_RDPHALF_1</code>, which is used to hold state during some display list macros which are actually two 8-byte commands. This change is not noticeable when using standard GBI commands, only if something highly custom has been set up.</li>
+<li><code>SPTexture</code> and <code>SPFogFactor</code> state is corrupted when loading and returning from another microcode (S2DEX). In F3DEX2, it would be reinitialized to default values; in F3DEX3, it is left as garbage values. </li>
 </ul>
 </div></div><!-- contents -->
 </div><!-- PageDoc -->
diff --git a/doxygen_crawl.html b/doxygen_crawl.html
index fc71c6d..6dfb02a 100644
--- a/doxygen_crawl.html
+++ b/doxygen_crawl.html
@@ -111,6 +111,7 @@
 <a href="design-tradeoffs.html#autotoc_md34"/>
 <a href="design-tradeoffs.html#autotoc_md35"/>
 <a href="design-tradeoffs.html#autotoc_md36"/>
+<a href="design-tradeoffs.html#autotoc_md37"/>
 <a href="files.html"/>
 <a href="functions.html"/>
 <a href="functions_vars.html"/>
@@ -277,31 +278,27 @@
 <a href="md_docs_2code.html"/>
 <a href="md_docs_2documentation.html"/>
 <a href="performance.html"/>
-<a href="performance.html#autotoc_md37"/>
 <a href="performance.html#autotoc_md38"/>
 <a href="performance.html#autotoc_md39"/>
 <a href="performance.html#autotoc_md40"/>
-<a href="performance.html#autotoc_md41"/>
-<a href="performance.html#autotoc_md42"/>
-<a href="performance.html#autotoc_md43"/>
 <a href="porting.html"/>
+<a href="porting.html#autotoc_md41"/>
+<a href="porting.html#autotoc_md42"/>
+<a href="porting.html#autotoc_md43"/>
 <a href="porting.html#autotoc_md44"/>
 <a href="porting.html#autotoc_md45"/>
-<a href="porting.html#autotoc_md46"/>
-<a href="porting.html#autotoc_md47"/>
-<a href="porting.html#autotoc_md48"/>
 <a href="removed.html"/>
+<a href="removed.html#autotoc_md46"/>
+<a href="removed.html#autotoc_md47"/>
+<a href="removed.html#autotoc_md48"/>
 <a href="removed.html#autotoc_md49"/>
 <a href="removed.html#autotoc_md50"/>
 <a href="removed.html#autotoc_md51"/>
-<a href="removed.html#autotoc_md52"/>
-<a href="removed.html#autotoc_md53"/>
-<a href="removed.html#autotoc_md54"/>
 <a href="snake.html"/>
+<a href="snake.html#autotoc_md52"/>
+<a href="snake.html#autotoc_md53"/>
+<a href="snake.html#autotoc_md54"/>
 <a href="snake.html#autotoc_md55"/>
-<a href="snake.html#autotoc_md56"/>
-<a href="snake.html#autotoc_md57"/>
-<a href="snake.html#autotoc_md58"/>
 <a href="structAmbient__t.html"/>
 <a href="structAmbient__t.html#a2b51d26372a2d0e6f33f9195911d28dc"/>
 <a href="structAmbient__t.html#aed778175ebe6fb9ea936a6af23ee2303"/>
diff --git a/md_README.html b/md_README.html
index 5194010..e623ae0 100644
--- a/md_README.html
+++ b/md_README.html
@@ -145,7 +145,7 @@ Performance improvements</h2>
 <li>Segment addresses are now resolved relative to other segments (feature by Tharo). This enables a strategy for <b>skipping repeated material DLs</b>: call a segment to run the material, remap the segment in the material to a display list that immediately returns, and so if the material is called again it won't run.</li>
 <li>New <code>SPMemset</code> command fills a specified RDRAM region with a repeated 16-bit value. This can be used for clearing the Z buffer or filling the framebuffer or the letterbox with a solid color <b>faster than the RDP can in fill mode</b>. Practical performance may vary due to scheduling constraints.</li>
 <li>New <code>SPFlush</code> command can ensure that the RDP starts clearing the framebuffer as soon as possible during the frame, instead of waiting a short time for further RSP processing.</li>
-<li>The key codepaths for command dispatch, triangle draw, and vertex processing (assuming lighting enabled and the occlusion plane disabled with the <code>NOC</code> configuration) are <b>slightly faster than in F3DEX2</b>.</li>
+<li>The key codepaths for command dispatch, triangle draw, and vertex processing (assuming lighting enabled and the occlusion plane disabled with the <code>NOC</code> configuration) are <b>faster than in F3DEX2</b>, sometimes <a href="https://hackern64.github.io/F3DEX3/performance.html">much faster</a>.</li>
 </ul>
 <h2><a class="anchor" id="autotoc_md9"></a>
 Miscellaneous</h2>
diff --git a/md_docs_2documentation.js b/md_docs_2documentation.js
index da7aced..e685acc 100644
--- a/md_docs_2documentation.js
+++ b/md_docs_2documentation.js
@@ -24,41 +24,37 @@ var md_docs_2documentation =
     [ "Design Tradeoffs", "design-tradeoffs.html", [
       [ "What are the tradeoffs for all these new features?", "design-tradeoffs.html#autotoc_md28", [
         [ "Vertex processing RSP time for occlusion plane", "design-tradeoffs.html#autotoc_md29", null ],
-        [ "Functionality in Overlay 3", "design-tradeoffs.html#autotoc_md30", null ],
+        [ "Functionality in overlays", "design-tradeoffs.html#autotoc_md30", null ],
         [ "Far clipping removal", "design-tradeoffs.html#autotoc_md31", null ],
         [ "Removal of scaled vertex normals", "design-tradeoffs.html#autotoc_md32", null ],
         [ "RDP temporary buffers shrinking", "design-tradeoffs.html#autotoc_md33", null ],
         [ "Segment 0", "design-tradeoffs.html#autotoc_md34", null ],
         [ "Non-textured tris", "design-tradeoffs.html#autotoc_md35", null ],
-        [ "Obscure semantic differences from F3DEX2 that should never matter in practice", "design-tradeoffs.html#autotoc_md36", null ]
+        [ "Yield check timing", "design-tradeoffs.html#autotoc_md36", null ],
+        [ "Obscure semantic differences from F3DEX2 that should never matter in practice", "design-tradeoffs.html#autotoc_md37", null ]
       ] ]
     ] ],
     [ "Removed Features", "removed.html", [
-      [ "Removed Features", "removed.html#autotoc_md49", [
-        [ "Legacy Vertex Pipeline (LVP) Configuration", "removed.html#autotoc_md50", null ],
-        [ "Octahedral Encoding for Packed Normals", "removed.html#autotoc_md51", null ],
-        [ "Clipping minimal scanlines algorithm", "removed.html#autotoc_md52", null ],
-        [ "Z attribute offsets", "removed.html#autotoc_md53", null ],
-        [ "SPTriStrip and SPTriFan", "removed.html#autotoc_md54", null ]
+      [ "Removed Features", "removed.html#autotoc_md46", [
+        [ "Legacy Vertex Pipeline (LVP) Configuration", "removed.html#autotoc_md47", null ],
+        [ "Octahedral Encoding for Packed Normals", "removed.html#autotoc_md48", null ],
+        [ "Clipping minimal scanlines algorithm", "removed.html#autotoc_md49", null ],
+        [ "Z attribute offsets", "removed.html#autotoc_md50", null ],
+        [ "SPTriStrip and SPTriFan", "removed.html#autotoc_md51", null ]
       ] ]
     ] ],
     [ "Performance Results", "performance.html", [
-      [ "Performance Results", "performance.html#autotoc_md37", [
-        [ "Cycle Counts", "performance.html#autotoc_md38", null ],
-        [ "Triangle Snake Cycle Counts", "performance.html#autotoc_md39", [
-          [ "Very Long Snakes", "performance.html#autotoc_md40", null ],
-          [ "Starting a Snake", "performance.html#autotoc_md41", null ],
-          [ "Ending a Snake", "performance.html#autotoc_md42", null ],
-          [ "Example", "performance.html#autotoc_md43", null ]
-        ] ]
+      [ "Performance Results", "performance.html#autotoc_md38", [
+        [ "Cycle Counts", "performance.html#autotoc_md39", null ],
+        [ "Triangle Snake Cycle Counts", "performance.html#autotoc_md40", null ]
       ] ]
     ] ],
     [ "Porting Your Romhack Codebase to F3DEX3", "porting.html", [
-      [ "Porting Your Romhack Codebase to F3DEX3", "porting.html#autotoc_md44", [
-        [ "Required Changes", "porting.html#autotoc_md45", null ],
-        [ "Recommended Changes (Non-Lighting)", "porting.html#autotoc_md46", null ],
-        [ "Recommended Changes (Lighting)", "porting.html#autotoc_md47", null ],
-        [ "Changes Required for New Features", "porting.html#autotoc_md48", null ]
+      [ "Porting Your Romhack Codebase to F3DEX3", "porting.html#autotoc_md41", [
+        [ "Required Changes", "porting.html#autotoc_md42", null ],
+        [ "Recommended Changes (Non-Lighting)", "porting.html#autotoc_md43", null ],
+        [ "Recommended Changes (Lighting)", "porting.html#autotoc_md44", null ],
+        [ "Changes Required for New Features", "porting.html#autotoc_md45", null ]
       ] ]
     ] ],
     [ "Triangle Snake", "snake.html", null ]
diff --git a/navtreedata.js b/navtreedata.js
index 1bb5c66..194bec8 100644
--- a/navtreedata.js
+++ b/navtreedata.js
@@ -59,7 +59,7 @@ var NAVTREE =
 var NAVTREEINDEX =
 [
 "annotated.html",
-"structVtx__t.html#adb51f166e83cc0968c316d7e54fcf29c"
+"structVtx__tn.html#a2020b066452aa4c208b26c8b68c4d056"
 ];
 
 var SYNCONMSG = 'click to disable panel synchronisation';
diff --git a/navtreeindex0.js b/navtreeindex0.js
index 319c00a..b906b7a 100644
--- a/navtreeindex0.js
+++ b/navtreeindex0.js
@@ -31,6 +31,7 @@ var NAVTREEINDEX0 =
 "design-tradeoffs.html#autotoc_md34":[2,2,0,5],
 "design-tradeoffs.html#autotoc_md35":[2,2,0,6],
 "design-tradeoffs.html#autotoc_md36":[2,2,0,7],
+"design-tradeoffs.html#autotoc_md37":[2,2,0,8],
 "files.html":[5,0],
 "functions.html":[4,2,0],
 "functions_vars.html":[4,2,1],
@@ -200,26 +201,22 @@ var NAVTREEINDEX0 =
 "md_docs_2documentation.html":[2],
 "pages.html":[],
 "performance.html":[2,4],
-"performance.html#autotoc_md37":[2,4,0],
-"performance.html#autotoc_md38":[2,4,0,0],
-"performance.html#autotoc_md39":[2,4,0,1],
-"performance.html#autotoc_md40":[2,4,0,1,0],
-"performance.html#autotoc_md41":[2,4,0,1,1],
-"performance.html#autotoc_md42":[2,4,0,1,2],
-"performance.html#autotoc_md43":[2,4,0,1,3],
+"performance.html#autotoc_md38":[2,4,0],
+"performance.html#autotoc_md39":[2,4,0,0],
+"performance.html#autotoc_md40":[2,4,0,1],
 "porting.html":[2,5],
-"porting.html#autotoc_md44":[2,5,0],
-"porting.html#autotoc_md45":[2,5,0,0],
-"porting.html#autotoc_md46":[2,5,0,1],
-"porting.html#autotoc_md47":[2,5,0,2],
-"porting.html#autotoc_md48":[2,5,0,3],
+"porting.html#autotoc_md41":[2,5,0],
+"porting.html#autotoc_md42":[2,5,0,0],
+"porting.html#autotoc_md43":[2,5,0,1],
+"porting.html#autotoc_md44":[2,5,0,2],
+"porting.html#autotoc_md45":[2,5,0,3],
 "removed.html":[2,3],
-"removed.html#autotoc_md49":[2,3,0],
-"removed.html#autotoc_md50":[2,3,0,0],
-"removed.html#autotoc_md51":[2,3,0,1],
-"removed.html#autotoc_md52":[2,3,0,2],
-"removed.html#autotoc_md53":[2,3,0,3],
-"removed.html#autotoc_md54":[2,3,0,4],
+"removed.html#autotoc_md46":[2,3,0],
+"removed.html#autotoc_md47":[2,3,0,0],
+"removed.html#autotoc_md48":[2,3,0,1],
+"removed.html#autotoc_md49":[2,3,0,2],
+"removed.html#autotoc_md50":[2,3,0,3],
+"removed.html#autotoc_md51":[2,3,0,4],
 "snake.html":[2,6],
 "structAmbient__t.html":[4,0,0],
 "structAmbient__t.html#a2b51d26372a2d0e6f33f9195911d28dc":[4,0,0,0],
@@ -249,5 +246,8 @@ var NAVTREEINDEX0 =
 "structVp__t.html#a413b3bf379f48a1f039a9a7eb133a67b":[4,0,13,0],
 "structVtx__t.html":[4,0,15],
 "structVtx__t.html#a2020b066452aa4c208b26c8b68c4d056":[4,0,15,2],
-"structVtx__t.html#ac6d2aff53d0bd3fdfe97ac50bfff34e2":[4,0,15,0]
+"structVtx__t.html#ac6d2aff53d0bd3fdfe97ac50bfff34e2":[4,0,15,0],
+"structVtx__t.html#adb51f166e83cc0968c316d7e54fcf29c":[4,0,15,1],
+"structVtx__tn.html":[4,0,16],
+"structVtx__tn.html#a1c089003792d753ad1a83e55ba6a1c5d":[4,0,16,2]
 };
diff --git a/navtreeindex1.js b/navtreeindex1.js
index 46aa2c7..3b4ffb1 100644
--- a/navtreeindex1.js
+++ b/navtreeindex1.js
@@ -1,8 +1,5 @@
 var NAVTREEINDEX1 =
 {
-"structVtx__t.html#adb51f166e83cc0968c316d7e54fcf29c":[4,0,15,1],
-"structVtx__tn.html":[4,0,16],
-"structVtx__tn.html#a1c089003792d753ad1a83e55ba6a1c5d":[4,0,16,2],
 "structVtx__tn.html#a2020b066452aa4c208b26c8b68c4d056":[4,0,16,3],
 "structVtx__tn.html#a24420a9beaac7cee08b5e255a4c29db1":[4,0,16,0],
 "structVtx__tn.html#adb51f166e83cc0968c316d7e54fcf29c":[4,0,16,1],
diff --git a/performance.html b/performance.html
index 6a5d5fb..d2497be 100644
--- a/performance.html
+++ b/performance.html
@@ -103,47 +103,47 @@ $(function(){initNavTree('performance.html',''); initResizable(true); });
   <div class="headertitle"><div class="title">Performance Results</div></div>
 </div><!--header-->
 <div class="contents">
-<div class="textblock"><h1><a class="anchor" id="autotoc_md37"></a>
+<div class="textblock"><h1><a class="anchor" id="autotoc_md38"></a>
 Performance Results</h1>
 <p>F3DEX3_NOC matches or beats the RSP performance of F3DEX2 on <b>all</b> critical paths in the microcode, including command dispatch, vertex processing, and triangle processing. Then, the RDP and memory traffic performance improvements of F3DEX3&ndash;56 vertex buffer, auto-batched rendering, etc.&ndash;should further improve overall game performance from there.</p>
-<h2><a class="anchor" id="autotoc_md38"></a>
+<h2><a class="anchor" id="autotoc_md39"></a>
 Cycle Counts</h2>
-<p>These are cycle counts for many key paths in the microcode. Lower numbers are better. The timings are hand-counted taking into account all pipeline stalls and all dual-issue conditions. Instruction alignment after branches is usually taken into account, but in some cases it is assumed to be optimal.</p>
+<p>These are cycle counts for many key paths in the microcode. Lower numbers are better. The timings are hand-counted taking into account all pipeline stalls and all dual-issue conditions. Instruction alignment after branches is usually taken into account (especially in F3DEX3), but in some cases it is assumed to be optimal.</p>
 <p>All numbers assume default profiling configuration. <a class="el" href="structTri.html">Tri</a> numbers assume texture, shade, and Z, and not flushing the buffer. <a class="el" href="structTri.html">Tri</a> numbers are measured from the first cycle of the command handler inclusive, to the first cycle of whatever is after $ra exclusive; this is in order to capture an extra stall cycle in F3DEX2 when finishing a triangle and going to the next command.</p>
 <p>Vertex numbers assume no extra F3DEX3 features (packed normals, ambient occlusion, etc.). These features are listed below as the number of extra cycles the feature costs per vertex pair. ltbasic is the codepath when point lighting, specular, and Fresnel are disabled; ltadv is the codepath with any of these enabled. The reason timings are listed separately for each number of lights is because some implementations are pipelined for two lights, so going from an even to an odd number of lights adds a different time than vice versa.</p>
 <table class="markdownTable">
 <tr class="markdownTableHead">
 <th class="markdownTableHeadNone"></th><th class="markdownTableHeadNone">F3DEX2   </th><th class="markdownTableHeadNone">F3DEX3_NOC   </th><th class="markdownTableHeadNone">F3DEX3    </th></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone">Command dispatch   </td><td class="markdownTableBodyNone">12   </td><td class="markdownTableBodyNone">12   </td><td class="markdownTableBodyNone">12    </td></tr>
+<td class="markdownTableBodyNone">Command dispatch   </td><td class="markdownTableBodyNone">12   </td><td class="markdownTableBodyNone">10   </td><td class="markdownTableBodyNone">10    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyNone">Small RDP command   </td><td class="markdownTableBodyNone">14   </td><td class="markdownTableBodyNone">5   </td><td class="markdownTableBodyNone">5    </td></tr>
+<td class="markdownTableBodyNone">Small RDP command   </td><td class="markdownTableBodyNone">14   </td><td class="markdownTableBodyNone">4   </td><td class="markdownTableBodyNone">4    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone">Only/2nd tri to offscreen   </td><td class="markdownTableBodyNone">27   </td><td class="markdownTableBodyNone">25   </td><td class="markdownTableBodyNone">25    </td></tr>
+<td class="markdownTableBodyNone">Only/2nd tri to offscreen   </td><td class="markdownTableBodyNone">27   </td><td class="markdownTableBodyNone">20   </td><td class="markdownTableBodyNone">20    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyNone">1st tri to offscreen   </td><td class="markdownTableBodyNone">28   </td><td class="markdownTableBodyNone">26   </td><td class="markdownTableBodyNone">26    </td></tr>
+<td class="markdownTableBodyNone">1st tri to offscreen   </td><td class="markdownTableBodyNone">28   </td><td class="markdownTableBodyNone">21   </td><td class="markdownTableBodyNone">21    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone">Only/2nd tri to clip   </td><td class="markdownTableBodyNone">32   </td><td class="markdownTableBodyNone">30   </td><td class="markdownTableBodyNone">30    </td></tr>
+<td class="markdownTableBodyNone">Only/2nd tri to clip   </td><td class="markdownTableBodyNone">32   </td><td class="markdownTableBodyNone">25   </td><td class="markdownTableBodyNone">25    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyNone">1st tri to clip   </td><td class="markdownTableBodyNone">33   </td><td class="markdownTableBodyNone">31   </td><td class="markdownTableBodyNone">31    </td></tr>
+<td class="markdownTableBodyNone">1st tri to clip   </td><td class="markdownTableBodyNone">33   </td><td class="markdownTableBodyNone">26   </td><td class="markdownTableBodyNone">26    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone">Only/2nd tri to backface   </td><td class="markdownTableBodyNone">38   </td><td class="markdownTableBodyNone">36   </td><td class="markdownTableBodyNone">36    </td></tr>
+<td class="markdownTableBodyNone">Only/2nd tri to backface   </td><td class="markdownTableBodyNone">38   </td><td class="markdownTableBodyNone">31   </td><td class="markdownTableBodyNone">31    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyNone">1st tri to backface   </td><td class="markdownTableBodyNone">39   </td><td class="markdownTableBodyNone">37   </td><td class="markdownTableBodyNone">37    </td></tr>
+<td class="markdownTableBodyNone">1st tri to backface   </td><td class="markdownTableBodyNone">39   </td><td class="markdownTableBodyNone">32   </td><td class="markdownTableBodyNone">32    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone">Only/2nd tri to degenerate   </td><td class="markdownTableBodyNone">42   </td><td class="markdownTableBodyNone">38   </td><td class="markdownTableBodyNone">38    </td></tr>
+<td class="markdownTableBodyNone">Only/2nd tri to degenerate   </td><td class="markdownTableBodyNone">42   </td><td class="markdownTableBodyNone">33   </td><td class="markdownTableBodyNone">33    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyNone">1st tri to degenerate   </td><td class="markdownTableBodyNone">43   </td><td class="markdownTableBodyNone">39   </td><td class="markdownTableBodyNone">39    </td></tr>
+<td class="markdownTableBodyNone">1st tri to degenerate   </td><td class="markdownTableBodyNone">43   </td><td class="markdownTableBodyNone">34   </td><td class="markdownTableBodyNone">34    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone">Only/2nd tri to occluded   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">42    </td></tr>
+<td class="markdownTableBodyNone">Only/2nd tri to occluded   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">37    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyNone">1st tri to occluded   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">43    </td></tr>
+<td class="markdownTableBodyNone">1st tri to occluded   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">38    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone">Only/2nd tri to draw   </td><td class="markdownTableBodyNone">172   </td><td class="markdownTableBodyNone">156   </td><td class="markdownTableBodyNone">158    </td></tr>
+<td class="markdownTableBodyNone">Only/2nd tri to draw   </td><td class="markdownTableBodyNone">172   </td><td class="markdownTableBodyNone">149   </td><td class="markdownTableBodyNone">151    </td></tr>
 <tr class="markdownTableRowEven">
-<td class="markdownTableBodyNone">1st tri to draw   </td><td class="markdownTableBodyNone">173   </td><td class="markdownTableBodyNone">157   </td><td class="markdownTableBodyNone">159    </td></tr>
+<td class="markdownTableBodyNone">1st tri to draw   </td><td class="markdownTableBodyNone">173   </td><td class="markdownTableBodyNone">150   </td><td class="markdownTableBodyNone">152    </td></tr>
 <tr class="markdownTableRowOdd">
-<td class="markdownTableBodyNone"><a class="el" href="structTri.html">Tri</a> snake   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">*   </td><td class="markdownTableBodyNone">*    </td></tr>
+<td class="markdownTableBodyNone"><a class="el" href="structTri.html">Tri</a> snake   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">10/11*   </td><td class="markdownTableBodyNone">10/11*    </td></tr>
 <tr class="markdownTableRowEven">
 <td class="markdownTableBodyNone"><a class="el" href="unionVtx.html">Vtx</a> before DMA start   </td><td class="markdownTableBodyNone">16   </td><td class="markdownTableBodyNone">17   </td><td class="markdownTableBodyNone">17    </td></tr>
 <tr class="markdownTableRowOdd">
@@ -229,40 +229,12 @@ Cycle Counts</h2>
 <tr class="markdownTableRowOdd">
 <td class="markdownTableBodyNone">Light dir xfrm, 9 dir lts   </td><td class="markdownTableBodyNone">Can't   </td><td class="markdownTableBodyNone">196   </td><td class="markdownTableBodyNone">196   </td></tr>
 </table>
-<h2><a class="anchor" id="autotoc_md39"></a>
+<h2><a class="anchor" id="autotoc_md40"></a>
 Triangle Snake Cycle Counts</h2>
-<h3><a class="anchor" id="autotoc_md40"></a>
-Very Long Snakes</h3>
-<p>For this section, we assume almost all tris are contained in very long snakes, so the overhead of starting and ending snakes is negligible. This overhead is discussed in the next section.</p>
-<p>We are assuming that the same set of tris is being drawn with or without snakes. Thus, cycles from <code>tri_main_from_snake</code> through the instruction after the return exclusive are not counted here, as they are the same regardless of which method is being used.</p>
-<p>For a pair of tris drawn without snakes, i.e. with a single <code>SP2Triangles</code> command, the cycles are:</p><ul>
-<li>Command dispatch: 12</li>
-<li>First tri up to <code>tri_main_from_snake</code>: 5</li>
-<li>Second tri up to <code>tri_main_from_snake</code>: 4</li>
-<li>Total: 21</li>
-</ul>
-<p>For a pair of tris which are part of a long snake, the cycles are:</p><ul>
-<li>Each tri up to <code>tri_main_from_snake</code>: 11</li>
-<li>Total: 22</li>
-</ul>
-<p>However, there's also the memory bandwidth savings. The <code>SP2Triangles</code> command is 8 bytes and the two tris in a long snake are 2 bytes, so switching to snake saves 6 bytes of bandwidth. Testing has shown that RSP DMAs on average transfer about 2.2 bytes per cycle, though it depends on the length. So this is a savings of about 2.7 cycles of RDRAM / RDP time. Since the DMAs loading this data are input buffer loads, and the RSP stalls waiting for input buffer loads (it does not do useful work during this time), this is also 2.7 cycles of RSP time. This offsets the 1 extra cycle of processing the tri pair above.</p>
-<p>Therefore, switching to snake (assuming very long snakes) saves about 2.7 cycles of RDRAM / RDP time and 1.7 cycles of RSP time per two tris, or about 0.9 RSP cycles and 1.4 RDRAM cycles per tri.</p>
-<h3><a class="anchor" id="autotoc_md41"></a>
-Starting a Snake</h3>
-<p>Since a <code>SPTriSnake</code> command encodes 5 triangles, for comparison to <code>SP2Triangles</code> we will consider the overhead for 10 triangles total / two snake starts.</p>
-<p>For <code>SPTriSnake</code>, this is 2 x (12 cycles command dispatch + 4 cycles snake initialization + 5 tris x 11 cycles per tri as discussed above) = 142 RSP cycles. And it is 16 bytes of loads = 7.3 cycles of RDRAM / RDP time and stall RSP time. So the total cost is 149.3 RSP and 7.3 RDRAM cycles.</p>
-<p>For <code>SP2Triangles</code>, this is 5 x (21 cycles as discussed above) = 105 RSP cycles. And it is 40 bytes of loads = 18.2 cycles of RDRAM / RDP time and stall RSP time. So the total cost is 123.2 RSP and 18.2 RDRAM cycles.</p>
-<p>But drawing those 10 tris as part of very long snakes would have saved 13.5 RDRAM cycles and 8.5 RSP cycles. So the relative cost of drawing these tris as two start-of-snakes instead of in very long snakes is 34.6 RSP cycles and 2.6 RDRAM cycles. Thus the cost of each start-of-snake relative to long snakes is 17.3 RSP cycles and 1.3 RDRAM cycles.</p>
-<h3><a class="anchor" id="autotoc_md42"></a>
-Ending a Snake</h3>
-<p>Ending a snake costs 12 cycles of RSP time and has no direct impact on memory traffic. However, calculating the overall performance is more complicated: the snake can end after 1-8 bytes of the <code>SPContinueSnake</code> command, and the remaining bytes are "wasted" in that they do not contribute to drawing tris with memory bandwidth savings.</p>
-<p>From a mesh optimization standpoint, this is not an issue. If you have a snake which has filled 8 bytes of the previous <code>SPContinueSnake</code> command, and you have another triangle to draw, there are only two cases. If that tri can't be appended to the snake, you have to draw it with a <code>SP1Triangle</code> command either way, so there is no performance difference. If it can be appended to the snake, doing so will take 8 bytes of memory traffic&ndash;the same as the <code>SP1Triangle</code> command. The snake end penalty will have to be paid whether before or after this tri. And it's 11 RSP cycles to draw one more tri in an existing snake, whereas the command dispatch plus second tri code for <code>SP1Triangle</code> is 16 cycles. So it's better to continue a snake than to stop it early and use non-snake commands, even if this leads to a mostly empty <code>SPContinueSnake</code> command. Of course, if you can fill up even more tris in the command, the performance benefit increases.</p>
-<p>Assuming snake lengths are uniformly distributed, on average a snake will end after 4.5 bytes (the same number of triangles) of a <code>SPContinueSnake</code> command. In this case, the command will take 4.5 tris x 11 cycles per tri + 12 cycle end snake penalty = 61.5 RSP cycles, and 8 bytes of memory traffic = 3.6 RDRAM cycles. If these 4.5 tris were instead drawn with <code>SP2Triangles</code> commands, that would be 2.25 commands = 47.3 RSP cycles and 18 bytes = 8.2 RDRAM cycles. Thus on average, the snake end costs 14.2 RSP cycles and saves 4.6 RDRAM cycles compared to <code>SP2Triangles</code> commands. But drawing those 4.5 tris as part of very long snakes would have saved 3.9 RSP cycles and 6.1 RDRAM cycles. So the average cost of ending a snake relative to very long snakes is 18.1 RSP and 1.5 RDRAM cycles.</p>
-<h3><a class="anchor" id="autotoc_md43"></a>
-Example</h3>
-<p>Suppose there are 4000 tris on screen. Suppose that 90% of them have been encoded with snakes&ndash;the rest are disconnected single tris or tri pairs (quads). That 10% are then encoded with <code>SP2Triangles</code> commands, which is the same performance with or without snakes, so we ignore those tris, and there are 3600 "snakeable" tris in the scene.</p>
-<p>Suppose that the average snake length is 16, to account for some objects with more contiguous tris with the same material, and others with smaller disjoint parts. Thus, for 3600 tris, there are 225 snakes.</p>
-<p>Switching the 3600 tris from <code>SP2Triangles</code> commands to long snakes saves 4860 RDRAM cycles and 3060 RSP cycles. However, the 225 snake starts and ends cost 630 RDRAM and 7965 RSP cycles relative to this. So the total performance change of switching to snakes in this case is that the RDRAM / RDP goes faster by 4230 cycles = 68 us, but the RSP goes slower by 4905 cycles = 78 us. </p>
+<p>With the recent F3DEX3 updates bringing significant RSP time savings in command dispatch and triangle draw, triangle snakes are unfortunately no longer competitive in RSP time.</p>
+<p>Suppose we have two tris which are offscreen. If drawn with <code>SP2Triangles</code>, this is 10 cycles for command dispatch, 21 cycles to cull the first tri, and 20 cycles to cull the second, for a total of 51 cycles. If drawn as part of a long triangle snake, the triangle snake processing adds 10 or 11 cycles relative to the <code>SP2Triangles</code> first or second triangle respectively. So this is 31 cycles to cull each triangle, for a total of 61 cycles.</p>
+<p>It gets worse for snakes when counting the overhead of starting and ending a snake, which have also gotten worse with the recent changes bringing triangle performance improvements. I used to have a long discussion here computing estimated performance for switching to snakes, but the numbers have all changed and they were imprecise to begin with. The upshot is for a typical scene, switching everything from <code>SP2Triangles</code> to snakes might save about 70 us of RDRAM/RDP time but cost about 400 us of RSP time.</p>
+<p>However, note that in F3DEX2, <code>SP2Triangles</code> for two offscreen triangles takes 12+28+27 = 67 cycles. F3DEX3 is so much faster than F3DEX2 that even with the snake penalty, culling two triangles in a snake (61 cycles) is still faster than this. </p>
 </div></div><!-- contents -->
 </div><!-- PageDoc -->
 </div><!-- doc-content -->
diff --git a/porting.html b/porting.html
index b10ee19..9244dd0 100644
--- a/porting.html
+++ b/porting.html
@@ -103,7 +103,7 @@ $(function(){initNavTree('porting.html',''); initResizable(true); });
   <div class="headertitle"><div class="title">Porting Your Romhack Codebase to F3DEX3</div></div>
 </div><!--header-->
 <div class="contents">
-<div class="textblock"><h1><a class="anchor" id="autotoc_md44"></a>
+<div class="textblock"><h1><a class="anchor" id="autotoc_md41"></a>
 Porting Your Romhack Codebase to F3DEX3</h1>
 <p>For an OoT codebase, only a few minor changes are required to use F3DEX3. However, more changes are recommended to increase performance and enable new features.</p>
 <p>How to modify the microcode in your HackerOoT based romhack (note that this is already done in HackerOoT, so this is provided as a guide for other games):</p><ul>
@@ -119,7 +119,7 @@ Porting Your Romhack Codebase to F3DEX3</h1>
 <li>If you start using new features in F3DEX3, make the "Changes required for new
   features" listed below.</li>
 </ul>
-<h2><a class="anchor" id="autotoc_md45"></a>
+<h2><a class="anchor" id="autotoc_md42"></a>
 Required Changes</h2>
 <p>Both OoT and SM64:</p>
 <ul>
@@ -141,7 +141,7 @@ Required Changes</h2>
 <li>If you are using the vanilla lighting system where light directions are always fixed, the vanilla permanent light direction of <code>{0x28, 0x28, 0x28}</code> must be changed to <code>{0x49, 0x49, 0x49}</code>, or everything will be too dark. The former vector is not properly normalized, but F3D through F3DEX2 normalize light directions in the microcode, so it doesn't matter with those microcodes. The two lighting codepaths in F3DEX3 treat light directions and vertex normals differently: the fast one works like F3DEX2, but the slow one normalizes vertex normals after transforming them and does not modify light directions. Thus in this case, the light directions must already be normalized.</li>
 <li>Matrix stack fix (world space lighting / view matrix in VP instead of in M) is basically required. If you <em>really</em> want camera space lighting, use matrix stack fix, transform the fixed camera space light direction by V inverse each frame, and send that to the RSP.</li>
 </ul>
-<h2><a class="anchor" id="autotoc_md46"></a>
+<h2><a class="anchor" id="autotoc_md43"></a>
 Recommended Changes (Non-Lighting)</h2>
 <ul>
 <li>Clean up any code using the deprecated, hacky <code>SPLookAtX</code> and <code>SPLookAtY</code> to use <code>SPLookAt</code> instead (this is only a few lines change). Also remove any code which writes <code>SPClipRatio</code> or <code>SPForceMatrix</code>&ndash;these are now no-ops, so you might as well not write them.</li>
@@ -150,7 +150,7 @@ Recommended Changes (Non-Lighting)</h2>
 <li><code>#define REQUIRE_SEMICOLONS_AFTER_GBI_COMMANDS</code> (at the top of, or before including, the GBI) for a more modern, OoT-style codebase where uses of GBI commands require semicolons after them. SM64 omits the semicolons sometimes, e.g. <code>gSPDisplayList(gfx++, foo) gSPEndDisplayList(gfx++);</code>. If you are using <code>-Wpedantic</code>, using this define is required.</li>
 <li>Once everything in your romhack is ported to F3DEX3 and everything is stable, <code>#define NO_SYNCS_IN_TEXTURE_LOADS</code> (at the top of, or before including, the GBI) and fix any crashes or graphical issues that arise. Display lists exported from fast64 already do not contain these syncs, but vanilla display lists or custom ones using the texture loading multi-command macros do. Disabling the syncs saves a few percent of RDP cycles for each material setup; what percentage this is of the total RDP time depends on how many triangles are typically drawn between each material change. For more information, see the GBI documentation near this define.</li>
 </ul>
-<h2><a class="anchor" id="autotoc_md47"></a>
+<h2><a class="anchor" id="autotoc_md44"></a>
 Recommended Changes (Lighting)</h2>
 <ul>
 <li>Change your game engine lighting code to load all lights in one DMA transfer with <code>SPSetLights</code>, instead of one-at-a-time with repeated <code>SPLight</code> commands. Note that if you are using a pointer (dynamically allocated) rather than a direct variable (statically allocated), you need to dereference it; see the docstring for this macro in the GBI.</li>
@@ -159,7 +159,7 @@ Recommended Changes (Lighting)</h2>
 <li>Consider setting lights once before rendering a scene and all actors, rather than setting lights before rendering each actor. OoT does the latter to emulate point lights in a scene with a directional light recomputed per actor. You can now just send those to the RSP as real point lights, regardless of whether the display lists are vanilla or new.</li>
 <li>If your game already had point lighting, note that the point light kc, kl, and kq factors have been changed, so you will need to redesign how game engine light parameters (e.g. "light radius") map to these parameters.</li>
 </ul>
-<h2><a class="anchor" id="autotoc_md48"></a>
+<h2><a class="anchor" id="autotoc_md45"></a>
 Changes Required for New Features</h2>
 <p>Each of these changes is required if you want to use the respective new feature, but is not necessary if you are not using it.</p>
 <ul>
diff --git a/removed.html b/removed.html
index 4714bd7..bbe1290 100644
--- a/removed.html
+++ b/removed.html
@@ -103,20 +103,20 @@ $(function(){initNavTree('removed.html',''); initResizable(true); });
   <div class="headertitle"><div class="title">Removed Features</div></div>
 </div><!--header-->
 <div class="contents">
-<div class="textblock"><h1><a class="anchor" id="autotoc_md49"></a>
+<div class="textblock"><h1><a class="anchor" id="autotoc_md46"></a>
 Removed Features</h1>
 <p>These features were present in earlier F3DEX3 versions, but have been removed.</p>
-<h2><a class="anchor" id="autotoc_md50"></a>
+<h2><a class="anchor" id="autotoc_md47"></a>
 Legacy Vertex Pipeline (LVP) Configuration</h2>
 <p>Early versions of F3DEX3 were developed exclusively in an OoT context, where scenes are almost always RDP bottlenecked. Thus, these versions focused on reducing RDP time and adding new visual features at the cost of RSP time.</p>
 <p>Later, Kaze Emanuar became interested in using F3DEX3 in Return to Yoshi's Island due to the RDP performance improvements. However, due to the intense optimization work he had done, his game was relatively balanced in RDP / RSP time. Thus, when he tried F3DEX3, the decrease in RDP time and increase in RSP time made the game slower overall, which was not acceptable.</p>
 <p>As a result, the LVP configuration of F3DEX3 was developed, to bring F3DEX2-style vertex processing in exchange for dropping some of the advanced lighting features (which Kaze was not going to use anyway due to HLE compatibility). This was implemented, and after much optimization across the entire microcode, <code>F3DEX3_LVP_NOC</code> became slightly faster than F3DEX2 on both RDP and RSP. This caused Kaze to immediately adopt this configuration of F3DEX3 for Return to Yoshi's Island.</p>
 <p>Unfortunately, this meant that if developers wanted to use the advanced lighting features of F3DEX3 in any part of their project, they were stuck with the much slower non-LVP configuration of F3DEX3. The desire to have the microcode automatically swap versions for each material, plus the invention of ways to include some of the advanced lighting features in the LVP vertex processing without any performance penalty when not using them, led to the reunion of the versions. Now you get LVP-style performance when not using some of the advanced features, and only pay the performance penalty while rendering objects which use them.</p>
 <p>A similar approach was also considered for the NOC configuration&ndash;to have the microcode only compute the occlusion plane when it is enabled. This is unfortunately infeasible. Register allocation / naming, as well as some pipelined instructions leading into and out of lighting, are significantly different between the occlusion plane and NOC versions of vertex processing. This means the microcode would have to swap between four versions of lighting code instead of just two, creating much more complexity with the overlay system and IMEM size issues. Furthermore, the occlusion plane is typically not enabled/disabled per object, but used when rendering as much of the game contents as possible to maximize occluded objects. So it is reasonable to choose the occlusion plane or NOC configuration on a per-frame or even per-scene basis.</p>
-<h2><a class="anchor" id="autotoc_md51"></a>
+<h2><a class="anchor" id="autotoc_md48"></a>
 Octahedral Encoding for Packed Normals</h2>
 <p>Previous F3DEX3 versions encoded packed normals into the unused 2 bytes of each vertex using a variant of <a href="https://knarkowicz.wordpress.com/2014/04/16/octahedron-normal-vector-encoding/">octahedral encoding</a>. Using this method, the normals were effectively as precise as with the vanilla method of replacing vertex RGB with normal XYZ. However, the decoding of this format was inefficient, partly due to the requirement to also support vanilla normals at vanilla performance. Once HailToDodongo showed that the community was willing to accept the moderate precision loss of the much simpler 5-6-5 bit encoding in <a href="https://github.com/HailToDodongo/tiny3d">Tiny3D</a>, this was adopted in F3DEX3.</p>
-<h2><a class="anchor" id="autotoc_md52"></a>
+<h2><a class="anchor" id="autotoc_md49"></a>
 Clipping minimal scanlines algorithm</h2>
 <p>Earlier F3DEX3 versions included a modified algorithm for triangulating the polygon which was formed as the result of clipping. This algorithm broke up the polygon into triangles in such a way that the fewest scanlines were accessed multiple times, leading to maximum performance on the RDP. For example, if the polygon was a diamond shape, this algorithm would always cut it horizontally&ndash; leading to few or no scanlines being touched by both the top and bottom tris&ndash;as opposed to vertically, leading to all scanlines being touched by the left and right tris.</p>
 <p>In testing, this was able to save a few hundred microseconds at best in scenes with many large clipped tris. However, this feature has been removed, because it was found to cause undesirable visual artifacts. Other changes to clipping were experimented with in the past, and ultimately not included. These are not due to a bug or design issue with the microcode, but a fundamental limitation of the RDP: vertex colors are interpolated in screen space without perspective correction. In other words, the shade colors of ANY triangle not flat to the camera are slightly wrong, regardless of which microcode is in use. The same world space portion of the triangle will have a slightly different color depending on how the camera is rotated around it. The issues with clipping are a result of this.</p>
@@ -132,10 +132,10 @@ Color interpolation example</div></div>
 <p>Note that BOTH of these are wrong: the correct value for that pixel is 128, because all points on the horizontal midline of the original triangle are color 128. The N64 can't draw the correct triangle here&ndash;its colors would have to change nonlinearly along an edge.</p>
 <p>The problem with the clipping minimal scanlines algorithm is that it would switch between cases C and D here based on which diagonal had a larger Y component. In other words, if the camera moved slightly, the choice of triangulation might change, causing the middle of the polygon to visibly change color. This was visible on large scene triangles with lighting: as you walked around, the colors would have slight but abrupt changes, which look wrong/bad.</p>
 <p>The best we can do, which is what all previous F3D family microcodes did and F3DEX3 does now, is to triangulate in a consistent way, based on the winding of the input triangles. The results are still wrong, but they're wrong the same way every frame, so there are no abrupt changes visible.</p>
-<h2><a class="anchor" id="autotoc_md53"></a>
+<h2><a class="anchor" id="autotoc_md50"></a>
 Z attribute offsets</h2>
 <p>Earlier F3DEX3 versions included attribute offsets for vertex Z as well as ST. By setting this to -2 and drawing an opaque tri, the tri would appear like a decal, but with no Z-fighting. This has been removed and replaced with the decal fix, which is automatic and does not require any special setup in the display list.</p>
-<h2><a class="anchor" id="autotoc_md54"></a>
+<h2><a class="anchor" id="autotoc_md51"></a>
 <code>SPTriStrip</code> and <code>SPTriFan</code></h2>
 <p>These commands are still supported in the GBI, but as special cases of <code>SPTriSnake</code> with specific sets of directions. In addition to covering both of these commands, the <code>SPTriSnake</code> command can draw the mirror-imaged 4-triangle strip which <code>SPTriStrip</code> could not (without inefficiency), as well as arbitrarily long triangle strips, fans, and other snake shapes via <code>SPContinueSnake</code> . </p>
 </div></div><!-- contents -->
diff --git a/search/all_1.js b/search/all_1.js
index f24a742..16e64ad 100644
--- a/search/all_1.js
+++ b/search/all_1.js
@@ -1,5 +1,4 @@
 var searchData=
 [
-  ['3_0',['Functionality in Overlay 3',['../design-tradeoffs.html#autotoc_md30',1,'']]],
-  ['3d_20space_1',['3D Space',['../compatibility.html#autotoc_md19',1,'']]]
+  ['3d_20space_0',['3D Space',['../compatibility.html#autotoc_md19',1,'']]]
 ];
diff --git a/search/all_10.js b/search/all_10.js
index afb601d..3260ab7 100644
--- a/search/all_10.js
+++ b/search/all_10.js
@@ -2,15 +2,15 @@ var searchData=
 [
   ['rdp_20commands_0',['RDP Commands',['../compatibility.html#autotoc_md16',1,'']]],
   ['rdp_20temporary_20buffers_20shrinking_1',['RDP temporary buffers shrinking',['../design-tradeoffs.html#autotoc_md33',1,'']]],
-  ['recommended_20changes_20lighting_2',['Recommended Changes (Lighting)',['../porting.html#autotoc_md47',1,'']]],
-  ['recommended_20changes_20non_20lighting_3',['Recommended Changes (Non-Lighting)',['../porting.html#autotoc_md46',1,'']]],
+  ['recommended_20changes_20lighting_2',['Recommended Changes (Lighting)',['../porting.html#autotoc_md44',1,'']]],
+  ['recommended_20changes_20non_20lighting_3',['Recommended Changes (Non-Lighting)',['../porting.html#autotoc_md43',1,'']]],
   ['reference_4',['GBI Changes Reference',['../compatibility.html#autotoc_md15',1,'']]],
   ['removal_5',['Far clipping removal',['../design-tradeoffs.html#autotoc_md31',1,'']]],
   ['removal_20of_20scaled_20vertex_20normals_6',['Removal of scaled vertex normals',['../design-tradeoffs.html#autotoc_md32',1,'']]],
-  ['removed_20features_7',['Removed Features',['../removed.html',1,'Removed Features'],['../removed.html#autotoc_md49',1,'Removed Features']]],
-  ['required_20changes_8',['Required Changes',['../porting.html#autotoc_md45',1,'']]],
-  ['required_20for_20new_20features_9',['Changes Required for New Features',['../porting.html#autotoc_md48',1,'']]],
-  ['results_10',['Results',['../performance.html',1,'Performance Results'],['../performance.html#autotoc_md37',1,'Performance Results']]],
-  ['romhack_20codebase_20to_20f3dex3_11',['Romhack Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md44',1,'Porting Your Romhack Codebase to F3DEX3']]],
+  ['removed_20features_7',['Removed Features',['../removed.html',1,'Removed Features'],['../removed.html#autotoc_md46',1,'Removed Features']]],
+  ['required_20changes_8',['Required Changes',['../porting.html#autotoc_md42',1,'']]],
+  ['required_20for_20new_20features_9',['Changes Required for New Features',['../porting.html#autotoc_md45',1,'']]],
+  ['results_10',['Results',['../performance.html',1,'Performance Results'],['../performance.html#autotoc_md38',1,'Performance Results']]],
+  ['romhack_20codebase_20to_20f3dex3_11',['Romhack Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md41',1,'Porting Your Romhack Codebase to F3DEX3']]],
   ['rsp_20time_20for_20occlusion_20plane_12',['Vertex processing RSP time for occlusion plane',['../design-tradeoffs.html#autotoc_md29',1,'']]]
 ];
diff --git a/search/all_11.js b/search/all_11.js
index 79551c5..ea2aa9e 100644
--- a/search/all_11.js
+++ b/search/all_11.js
@@ -1,17 +1,15 @@
 var searchData=
 [
   ['scaled_20vertex_20normals_0',['Removal of scaled vertex normals',['../design-tradeoffs.html#autotoc_md32',1,'']]],
-  ['scanlines_20algorithm_1',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md52',1,'']]],
+  ['scanlines_20algorithm_1',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md49',1,'']]],
   ['segment_200_2',['Segment 0',['../design-tradeoffs.html#autotoc_md34',1,'']]],
-  ['semantic_20differences_20from_20f3dex2_20that_20should_20never_20matter_20in_20practice_3',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
-  ['should_20never_20matter_20in_20practice_4',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['semantic_20differences_20from_20f3dex2_20that_20should_20never_20matter_20in_20practice_3',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
+  ['should_20never_20matter_20in_20practice_4',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
   ['shrinking_5',['RDP temporary buffers shrinking',['../design-tradeoffs.html#autotoc_md33',1,'']]],
   ['size_6',['size',['../structPointLight__t.html#aac71ffe03c84523594a575b2062849c3',1,'PointLight_t']]],
-  ['snake_7',['Snake',['../performance.html#autotoc_md42',1,'Ending a Snake'],['../performance.html#autotoc_md41',1,'Starting a Snake'],['../snake.html',1,'Triangle Snake']]],
-  ['snake_20cycle_20counts_8',['Triangle Snake Cycle Counts',['../performance.html#autotoc_md39',1,'']]],
-  ['snakes_9',['Very Long Snakes',['../performance.html#autotoc_md40',1,'']]],
-  ['space_10',['3D Space',['../compatibility.html#autotoc_md19',1,'']]],
-  ['sptrifan_20tt_11',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md54',1,'']]],
-  ['sptristrip_20tt_20and_20tt_20sptrifan_20tt_12',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md54',1,'']]],
-  ['starting_20a_20snake_13',['Starting a Snake',['../performance.html#autotoc_md41',1,'']]]
+  ['snake_7',['Triangle Snake',['../snake.html',1,'md_docs_2documentation']]],
+  ['snake_20cycle_20counts_8',['Triangle Snake Cycle Counts',['../performance.html#autotoc_md40',1,'']]],
+  ['space_9',['3D Space',['../compatibility.html#autotoc_md19',1,'']]],
+  ['sptrifan_20tt_10',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md51',1,'']]],
+  ['sptristrip_20tt_20and_20tt_20sptrifan_20tt_11',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md51',1,'']]]
 ];
diff --git a/search/all_12.js b/search/all_12.js
index 3628fc2..b81a0b5 100644
--- a/search/all_12.js
+++ b/search/all_12.js
@@ -4,25 +4,26 @@ var searchData=
   ['temporary_20buffers_20shrinking_1',['RDP temporary buffers shrinking',['../design-tradeoffs.html#autotoc_md33',1,'']]],
   ['texrect_2',['TexRect',['../structTexRect.html',1,'']]],
   ['textured_20tris_3',['Non-textured tris',['../design-tradeoffs.html#autotoc_md35',1,'']]],
-  ['that_20should_20never_20matter_20in_20practice_4',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['that_20should_20never_20matter_20in_20practice_4',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
   ['the_20tradeoffs_20for_20all_20these_20new_20features_5',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
   ['these_20new_20features_6',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
   ['time_20for_20occlusion_20plane_7',['Vertex processing RSP time for occlusion plane',['../design-tradeoffs.html#autotoc_md29',1,'']]],
-  ['tiny3d_8',['Comparison with Tiny3D',['../snake.html#autotoc_md58',1,'']]],
-  ['to_20f3dex3_9',['to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md44',1,'Porting Your Romhack Codebase to F3DEX3']]],
-  ['tradeoffs_10',['Design Tradeoffs',['../design-tradeoffs.html',1,'md_docs_2documentation']]],
-  ['tradeoffs_20for_20all_20these_20new_20features_11',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
-  ['tri_12',['Tri',['../structTri.html',1,'']]],
-  ['triangle_20snake_13',['Triangle Snake',['../snake.html',1,'md_docs_2documentation']]],
-  ['triangle_20snake_20cycle_20counts_14',['Triangle Snake Cycle Counts',['../performance.html#autotoc_md39',1,'']]],
-  ['tris_15',['Non-textured tris',['../design-tradeoffs.html#autotoc_md35',1,'']]],
-  ['tt_16',['tt',['../removed.html#autotoc_md54',1,'&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;'],['../configuration.html#autotoc_md26',1,'Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)'],['../configuration.html#autotoc_md27',1,'Debug Normals (&lt;tt&gt;dbgN&lt;/tt&gt;)']]],
-  ['tt_20and_20tt_20sptrifan_20tt_17',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md54',1,'']]],
-  ['tt_20brw_20tt_18',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
-  ['tt_20brz_20tt_20tt_20brw_20tt_19',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
-  ['tt_20dbgn_20tt_20',['Debug Normals (&lt;tt&gt;dbgN&lt;/tt&gt;)',['../configuration.html#autotoc_md27',1,'']]],
-  ['tt_20sptrifan_20tt_21',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md54',1,'']]],
-  ['tt_20sptristrip_20tt_20and_20tt_20sptrifan_20tt_22',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md54',1,'']]],
-  ['tt_20tt_20brw_20tt_23',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
-  ['type_24',['type',['../structLight__t.html#aa5044999f3339d2ba3b1bf22fa6cfe95',1,'Light_t']]]
+  ['timing_8',['Yield check timing',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['tiny3d_9',['Comparison with Tiny3D',['../snake.html#autotoc_md55',1,'']]],
+  ['to_20f3dex3_10',['to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md41',1,'Porting Your Romhack Codebase to F3DEX3']]],
+  ['tradeoffs_11',['Design Tradeoffs',['../design-tradeoffs.html',1,'md_docs_2documentation']]],
+  ['tradeoffs_20for_20all_20these_20new_20features_12',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
+  ['tri_13',['Tri',['../structTri.html',1,'']]],
+  ['triangle_20snake_14',['Triangle Snake',['../snake.html',1,'md_docs_2documentation']]],
+  ['triangle_20snake_20cycle_20counts_15',['Triangle Snake Cycle Counts',['../performance.html#autotoc_md40',1,'']]],
+  ['tris_16',['Non-textured tris',['../design-tradeoffs.html#autotoc_md35',1,'']]],
+  ['tt_17',['tt',['../removed.html#autotoc_md51',1,'&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;'],['../configuration.html#autotoc_md26',1,'Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)'],['../configuration.html#autotoc_md27',1,'Debug Normals (&lt;tt&gt;dbgN&lt;/tt&gt;)']]],
+  ['tt_20and_20tt_20sptrifan_20tt_18',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md51',1,'']]],
+  ['tt_20brw_20tt_19',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
+  ['tt_20brz_20tt_20tt_20brw_20tt_20',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
+  ['tt_20dbgn_20tt_21',['Debug Normals (&lt;tt&gt;dbgN&lt;/tt&gt;)',['../configuration.html#autotoc_md27',1,'']]],
+  ['tt_20sptrifan_20tt_22',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md51',1,'']]],
+  ['tt_20sptristrip_20tt_20and_20tt_20sptrifan_20tt_23',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md51',1,'']]],
+  ['tt_20tt_20brw_20tt_24',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
+  ['type_25',['type',['../structLight__t.html#aa5044999f3339d2ba3b1bf22fa6cfe95',1,'Light_t']]]
 ];
diff --git a/search/all_13.js b/search/all_13.js
index ddd5451..48ddcfc 100644
--- a/search/all_13.js
+++ b/search/all_13.js
@@ -1,14 +1,13 @@
 var searchData=
 [
-  ['vertex_20cache_20locality_0',['Vertex Cache Locality',['../snake.html#autotoc_md56',1,'']]],
+  ['vertex_20cache_20locality_0',['Vertex Cache Locality',['../snake.html#autotoc_md53',1,'']]],
   ['vertex_20normals_1',['Removal of scaled vertex normals',['../design-tradeoffs.html#autotoc_md32',1,'']]],
-  ['vertex_20pipeline_20lvp_20configuration_2',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md50',1,'']]],
+  ['vertex_20pipeline_20lvp_20configuration_2',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md47',1,'']]],
   ['vertex_20processing_20rsp_20time_20for_20occlusion_20plane_3',['Vertex processing RSP time for occlusion plane',['../design-tradeoffs.html#autotoc_md29',1,'']]],
-  ['very_20long_20snakes_4',['Very Long Snakes',['../performance.html#autotoc_md40',1,'']]],
-  ['visual_20features_5',['New visual features',['../md_README.html#autotoc_md7',1,'']]],
-  ['vp_5ft_6',['Vp_t',['../structVp__t.html',1,'']]],
-  ['vtrans_7',['vtrans',['../structVp__t.html#a413b3bf379f48a1f039a9a7eb133a67b',1,'Vp_t']]],
-  ['vtx_8',['Vtx',['../unionVtx.html',1,'']]],
-  ['vtx_5ft_9',['Vtx_t',['../structVtx__t.html',1,'']]],
-  ['vtx_5ftn_10',['Vtx_tn',['../structVtx__tn.html',1,'']]]
+  ['visual_20features_4',['New visual features',['../md_README.html#autotoc_md7',1,'']]],
+  ['vp_5ft_5',['Vp_t',['../structVp__t.html',1,'']]],
+  ['vtrans_6',['vtrans',['../structVp__t.html#a413b3bf379f48a1f039a9a7eb133a67b',1,'Vp_t']]],
+  ['vtx_7',['Vtx',['../unionVtx.html',1,'']]],
+  ['vtx_5ft_8',['Vtx_t',['../structVtx__t.html',1,'']]],
+  ['vtx_5ftn_9',['Vtx_tn',['../structVtx__tn.html',1,'']]]
 ];
diff --git a/search/all_14.js b/search/all_14.js
index 8feb98d..436442a 100644
--- a/search/all_14.js
+++ b/search/all_14.js
@@ -1,7 +1,7 @@
 var searchData=
 [
-  ['what_20about_20yielding_0',['What about yielding?',['../snake.html#autotoc_md57',1,'']]],
+  ['what_20about_20yielding_0',['What about yielding?',['../snake.html#autotoc_md54',1,'']]],
   ['what_20are_20the_20tradeoffs_20for_20all_20these_20new_20features_1',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
   ['with_20f3dex2_2',['with F3DEX2',['../compatibility.html',1,'Backwards Compatibility with F3DEX2'],['../compatibility.html#autotoc_md14',1,'Backwards Compatibility with F3DEX2']]],
-  ['with_20tiny3d_3',['Comparison with Tiny3D',['../snake.html#autotoc_md58',1,'']]]
+  ['with_20tiny3d_3',['Comparison with Tiny3D',['../snake.html#autotoc_md55',1,'']]]
 ];
diff --git a/search/all_15.js b/search/all_15.js
index fb56be2..92e8c45 100644
--- a/search/all_15.js
+++ b/search/all_15.js
@@ -1,5 +1,6 @@
 var searchData=
 [
-  ['yielding_0',['What about yielding?',['../snake.html#autotoc_md57',1,'']]],
-  ['your_20romhack_20codebase_20to_20f3dex3_1',['Your Romhack Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md44',1,'Porting Your Romhack Codebase to F3DEX3']]]
+  ['yield_20check_20timing_0',['Yield check timing',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['yielding_1',['What about yielding?',['../snake.html#autotoc_md54',1,'']]],
+  ['your_20romhack_20codebase_20to_20f3dex3_2',['Your Romhack Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md41',1,'Porting Your Romhack Codebase to F3DEX3']]]
 ];
diff --git a/search/all_16.js b/search/all_16.js
index 5162e12..e786257 100644
--- a/search/all_16.js
+++ b/search/all_16.js
@@ -1,4 +1,4 @@
 var searchData=
 [
-  ['z_20attribute_20offsets_0',['Z attribute offsets',['../removed.html#autotoc_md53',1,'']]]
+  ['z_20attribute_20offsets_0',['Z attribute offsets',['../removed.html#autotoc_md50',1,'']]]
 ];
diff --git a/search/all_2.js b/search/all_2.js
index 68c70ce..4d05fb8 100644
--- a/search/all_2.js
+++ b/search/all_2.js
@@ -1,14 +1,13 @@
 var searchData=
 [
   ['a_0',['a',['../structVtx__tn.html#a24420a9beaac7cee08b5e255a4c29db1',1,'Vtx_tn']]],
-  ['a_20snake_1',['a Snake',['../performance.html#autotoc_md42',1,'Ending a Snake'],['../performance.html#autotoc_md41',1,'Starting a Snake']]],
-  ['about_20yielding_2',['What about yielding?',['../snake.html#autotoc_md57',1,'']]],
-  ['accuracy_3',['Accuracy',['../gbi_8h.html#autotoc_md1',1,'']]],
-  ['algorithm_4',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md52',1,'']]],
-  ['all_20these_20new_20features_5',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
-  ['ambient_5ft_6',['Ambient_t',['../structAmbient__t.html',1,'']]],
-  ['and_20new_20effect_20parameters_7',['Geometry Mode and New Effect Parameters',['../compatibility.html#autotoc_md21',1,'']]],
-  ['and_20tt_20sptrifan_20tt_8',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md54',1,'']]],
-  ['are_20the_20tradeoffs_20for_20all_20these_20new_20features_9',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
-  ['attribute_20offsets_10',['Z attribute offsets',['../removed.html#autotoc_md53',1,'']]]
+  ['about_20yielding_1',['What about yielding?',['../snake.html#autotoc_md54',1,'']]],
+  ['accuracy_2',['Accuracy',['../gbi_8h.html#autotoc_md1',1,'']]],
+  ['algorithm_3',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md49',1,'']]],
+  ['all_20these_20new_20features_4',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
+  ['ambient_5ft_5',['Ambient_t',['../structAmbient__t.html',1,'']]],
+  ['and_20new_20effect_20parameters_6',['Geometry Mode and New Effect Parameters',['../compatibility.html#autotoc_md21',1,'']]],
+  ['and_20tt_20sptrifan_20tt_7',['&lt;tt&gt;SPTriStrip&lt;/tt&gt; and &lt;tt&gt;SPTriFan&lt;/tt&gt;',['../removed.html#autotoc_md51',1,'']]],
+  ['are_20the_20tradeoffs_20for_20all_20these_20new_20features_8',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
+  ['attribute_20offsets_9',['Z attribute offsets',['../removed.html#autotoc_md50',1,'']]]
 ];
diff --git a/search/all_3.js b/search/all_3.js
index 7f0c550..92f317f 100644
--- a/search/all_3.js
+++ b/search/all_3.js
@@ -1,7 +1,7 @@
 var searchData=
 [
   ['backwards_20compatibility_20with_20f3dex2_0',['Backwards Compatibility with F3DEX2',['../compatibility.html',1,'Backwards Compatibility with F3DEX2'],['../compatibility.html#autotoc_md14',1,'Backwards Compatibility with F3DEX2']]],
-  ['bandwidth_1',['Memory Bandwidth',['../snake.html#autotoc_md55',1,'']]],
+  ['bandwidth_1',['Memory Bandwidth',['../snake.html#autotoc_md52',1,'']]],
   ['branch_20depth_20instruction_20tt_20brz_20tt_20tt_20brw_20tt_2',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
   ['brw_20tt_3',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
   ['brz_20tt_20tt_20brw_20tt_4',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
diff --git a/search/all_4.js b/search/all_4.js
index 1e4c551..4909c44 100644
--- a/search/all_4.js
+++ b/search/all_4.js
@@ -1,25 +1,26 @@
 var searchData=
 [
-  ['cache_20locality_0',['Vertex Cache Locality',['../snake.html#autotoc_md56',1,'']]],
+  ['cache_20locality_0',['Vertex Cache Locality',['../snake.html#autotoc_md53',1,'']]],
   ['camera_1',['Camera',['../camera.html',1,'md_docs_2code']]],
-  ['changes_2',['Required Changes',['../porting.html#autotoc_md45',1,'']]],
-  ['changes_20lighting_3',['Recommended Changes (Lighting)',['../porting.html#autotoc_md47',1,'']]],
-  ['changes_20non_20lighting_4',['Recommended Changes (Non-Lighting)',['../porting.html#autotoc_md46',1,'']]],
+  ['changes_2',['Required Changes',['../porting.html#autotoc_md42',1,'']]],
+  ['changes_20lighting_3',['Recommended Changes (Lighting)',['../porting.html#autotoc_md44',1,'']]],
+  ['changes_20non_20lighting_4',['Recommended Changes (Non-Lighting)',['../porting.html#autotoc_md43',1,'']]],
   ['changes_20reference_5',['GBI Changes Reference',['../compatibility.html#autotoc_md15',1,'']]],
-  ['changes_20required_20for_20new_20features_6',['Changes Required for New Features',['../porting.html#autotoc_md48',1,'']]],
-  ['clipping_20minimal_20scanlines_20algorithm_7',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md52',1,'']]],
-  ['clipping_20removal_8',['Far clipping removal',['../design-tradeoffs.html#autotoc_md31',1,'']]],
-  ['cn_9',['cn',['../structVtx__t.html#ac6d2aff53d0bd3fdfe97ac50bfff34e2',1,'Vtx_t']]],
-  ['code_10',['Code',['../md_docs_2code.html',1,'']]],
-  ['codebase_20to_20f3dex3_11',['Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md44',1,'Porting Your Romhack Codebase to F3DEX3']]],
-  ['colc_12',['colc',['../structLight__t.html#a777970fc9405076cf854e6695d89b4c4',1,'Light_t::colc'],['../structPointLight__t.html#a777970fc9405076cf854e6695d89b4c4',1,'PointLight_t::colc']]],
-  ['commands_13',['RDP Commands',['../compatibility.html#autotoc_md16',1,'']]],
-  ['comparison_20with_20tiny3d_14',['Comparison with Tiny3D',['../snake.html#autotoc_md58',1,'']]],
-  ['compatibility_20with_20f3dex2_15',['Compatibility with F3DEX2',['../compatibility.html',1,'Backwards Compatibility with F3DEX2'],['../compatibility.html#autotoc_md14',1,'Backwards Compatibility with F3DEX2']]],
-  ['configuration_16',['Configuration',['../removed.html#autotoc_md50',1,'Legacy Vertex Pipeline (LVP) Configuration'],['../configuration.html',1,'Microcode Configuration'],['../configuration.html#autotoc_md23',1,'Microcode Configuration']]],
-  ['control_20logic_17',['Control Logic',['../compatibility.html#autotoc_md18',1,'']]],
-  ['counters_18',['Performance Counters',['../counters.html',1,'md_docs_2code']]],
-  ['counts_19',['Counts',['../performance.html#autotoc_md38',1,'Cycle Counts'],['../performance.html#autotoc_md39',1,'Triangle Snake Cycle Counts']]],
-  ['credits_20',['Credits',['../md_README.html#autotoc_md11',1,'']]],
-  ['cycle_20counts_21',['Cycle Counts',['../performance.html#autotoc_md38',1,'Cycle Counts'],['../performance.html#autotoc_md39',1,'Triangle Snake Cycle Counts']]]
+  ['changes_20required_20for_20new_20features_6',['Changes Required for New Features',['../porting.html#autotoc_md45',1,'']]],
+  ['check_20timing_7',['Yield check timing',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['clipping_20minimal_20scanlines_20algorithm_8',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md49',1,'']]],
+  ['clipping_20removal_9',['Far clipping removal',['../design-tradeoffs.html#autotoc_md31',1,'']]],
+  ['cn_10',['cn',['../structVtx__t.html#ac6d2aff53d0bd3fdfe97ac50bfff34e2',1,'Vtx_t']]],
+  ['code_11',['Code',['../md_docs_2code.html',1,'']]],
+  ['codebase_20to_20f3dex3_12',['Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md41',1,'Porting Your Romhack Codebase to F3DEX3']]],
+  ['colc_13',['colc',['../structLight__t.html#a777970fc9405076cf854e6695d89b4c4',1,'Light_t::colc'],['../structPointLight__t.html#a777970fc9405076cf854e6695d89b4c4',1,'PointLight_t::colc']]],
+  ['commands_14',['RDP Commands',['../compatibility.html#autotoc_md16',1,'']]],
+  ['comparison_20with_20tiny3d_15',['Comparison with Tiny3D',['../snake.html#autotoc_md55',1,'']]],
+  ['compatibility_20with_20f3dex2_16',['Compatibility with F3DEX2',['../compatibility.html',1,'Backwards Compatibility with F3DEX2'],['../compatibility.html#autotoc_md14',1,'Backwards Compatibility with F3DEX2']]],
+  ['configuration_17',['Configuration',['../removed.html#autotoc_md47',1,'Legacy Vertex Pipeline (LVP) Configuration'],['../configuration.html',1,'Microcode Configuration'],['../configuration.html#autotoc_md23',1,'Microcode Configuration']]],
+  ['control_20logic_18',['Control Logic',['../compatibility.html#autotoc_md18',1,'']]],
+  ['counters_19',['Performance Counters',['../counters.html',1,'md_docs_2code']]],
+  ['counts_20',['Counts',['../performance.html#autotoc_md39',1,'Cycle Counts'],['../performance.html#autotoc_md40',1,'Triangle Snake Cycle Counts']]],
+  ['credits_21',['Credits',['../md_README.html#autotoc_md11',1,'']]],
+  ['cycle_20counts_22',['Cycle Counts',['../performance.html#autotoc_md39',1,'Cycle Counts'],['../performance.html#autotoc_md40',1,'Triangle Snake Cycle Counts']]]
 ];
diff --git a/search/all_5.js b/search/all_5.js
index dc29601..2cfb807 100644
--- a/search/all_5.js
+++ b/search/all_5.js
@@ -5,7 +5,7 @@ var searchData=
   ['deprecated_20list_2',['Deprecated List',['../deprecated.html',1,'']]],
   ['depth_20instruction_20tt_20brz_20tt_20tt_20brw_20tt_3',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]],
   ['design_20tradeoffs_4',['Design Tradeoffs',['../design-tradeoffs.html',1,'md_docs_2documentation']]],
-  ['differences_20from_20f3dex2_20that_20should_20never_20matter_20in_20practice_5',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['differences_20from_20f3dex2_20that_20should_20never_20matter_20in_20practice_5',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
   ['documentation_6',['Documentation',['../md_docs_2documentation.html',1,'']]],
   ['drawing_7',['Main Drawing',['../compatibility.html#autotoc_md17',1,'']]]
 ];
diff --git a/search/all_6.js b/search/all_6.js
index 1e2e64e..80aa427 100644
--- a/search/all_6.js
+++ b/search/all_6.js
@@ -2,7 +2,6 @@ var searchData=
 [
   ['effect_20parameters_0',['Geometry Mode and New Effect Parameters',['../compatibility.html#autotoc_md21',1,'']]],
   ['enable_5fpoint_5flights_1',['ENABLE_POINT_LIGHTS',['../gbi_8h.html#ad506bc9c419599f280569f63e7bc9d52',1,'gbi.h']]],
-  ['encoding_20for_20packed_20normals_2',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md51',1,'']]],
-  ['ending_20a_20snake_3',['Ending a Snake',['../performance.html#autotoc_md42',1,'']]],
-  ['example_4',['Example',['../gbi_8h.html#autotoc_md3',1,'Example'],['../gbi_8h.html#autotoc_md4',1,'Example'],['../performance.html#autotoc_md43',1,'Example']]]
+  ['encoding_20for_20packed_20normals_2',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md48',1,'']]],
+  ['example_3',['Example',['../gbi_8h.html#autotoc_md3',1,'Example'],['../gbi_8h.html#autotoc_md4',1,'Example']]]
 ];
diff --git a/search/all_7.js b/search/all_7.js
index 5d39c79..fa050d5 100644
--- a/search/all_7.js
+++ b/search/all_7.js
@@ -1,18 +1,18 @@
 var searchData=
 [
   ['f3dex2_0',['F3DEX2',['../compatibility.html',1,'Backwards Compatibility with F3DEX2'],['../compatibility.html#autotoc_md14',1,'Backwards Compatibility with F3DEX2']]],
-  ['f3dex2_20that_20should_20never_20matter_20in_20practice_1',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
-  ['f3dex3_2',['F3DEX3',['../md_README.html',1,'F3DEX3'],['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md44',1,'Porting Your Romhack Codebase to F3DEX3']]],
+  ['f3dex2_20that_20should_20never_20matter_20in_20practice_1',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
+  ['f3dex3_2',['F3DEX3',['../md_README.html',1,'F3DEX3'],['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md41',1,'Porting Your Romhack Codebase to F3DEX3']]],
   ['far_20clipping_20removal_3',['Far clipping removal',['../design-tradeoffs.html#autotoc_md31',1,'']]],
-  ['features_4',['Features',['../porting.html#autotoc_md48',1,'Changes Required for New Features'],['../md_README.html#autotoc_md6',1,'Features'],['../removed.html',1,'Removed Features'],['../removed.html#autotoc_md49',1,'Removed Features']]],
+  ['features_4',['Features',['../porting.html#autotoc_md45',1,'Changes Required for New Features'],['../md_README.html#autotoc_md6',1,'Features'],['../removed.html',1,'Removed Features'],['../removed.html#autotoc_md46',1,'Removed Features']]],
   ['features_5',['features',['../md_README.html#autotoc_md7',1,'New visual features'],['../design-tradeoffs.html#autotoc_md28',1,'What are the tradeoffs for all these new features?']]],
   ['flag_6',['flag',['../structVtx__t.html#adb51f166e83cc0968c316d7e54fcf29c',1,'Vtx_t::flag'],['../structVtx__tn.html#adb51f166e83cc0968c316d7e54fcf29c',1,'Vtx_tn::flag']]],
   ['for_20all_20these_20new_20features_7',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
-  ['for_20new_20features_8',['Changes Required for New Features',['../porting.html#autotoc_md48',1,'']]],
+  ['for_20new_20features_8',['Changes Required for New Features',['../porting.html#autotoc_md45',1,'']]],
   ['for_20occlusion_20plane_9',['Vertex processing RSP time for occlusion plane',['../design-tradeoffs.html#autotoc_md29',1,'']]],
-  ['for_20packed_20normals_10',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md51',1,'']]],
+  ['for_20packed_20normals_10',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md48',1,'']]],
   ['force_5fstructure_5falignment_11',['force_structure_alignment',['../unionVtx.html#a40555a2e9867f38bb8450b67f62b21b8',1,'Vtx']]],
   ['format_12',['Matrix Format',['../gbi_8h.html#autotoc_md0',1,'']]],
-  ['from_20f3dex2_20that_20should_20never_20matter_20in_20practice_13',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
-  ['functionality_20in_20overlay_203_14',['Functionality in Overlay 3',['../design-tradeoffs.html#autotoc_md30',1,'']]]
+  ['from_20f3dex2_20that_20should_20never_20matter_20in_20practice_13',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
+  ['functionality_20in_20overlays_14',['Functionality in overlays',['../design-tradeoffs.html#autotoc_md30',1,'']]]
 ];
diff --git a/search/all_9.js b/search/all_9.js
index fa1761c..8c89538 100644
--- a/search/all_9.js
+++ b/search/all_9.js
@@ -1,7 +1,7 @@
 var searchData=
 [
   ['improvements_0',['Performance improvements',['../md_README.html#autotoc_md8',1,'']]],
-  ['in_20overlay_203_1',['Functionality in Overlay 3',['../design-tradeoffs.html#autotoc_md30',1,'']]],
-  ['in_20practice_2',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['in_20overlays_1',['Functionality in overlays',['../design-tradeoffs.html#autotoc_md30',1,'']]],
+  ['in_20practice_2',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
   ['instruction_20tt_20brz_20tt_20tt_20brw_20tt_3',['Branch Depth Instruction (&lt;tt&gt;BrZ&lt;/tt&gt; / &lt;tt&gt;BrW&lt;/tt&gt;)',['../configuration.html#autotoc_md26',1,'']]]
 ];
diff --git a/search/all_b.js b/search/all_b.js
index 19d329c..249a583 100644
--- a/search/all_b.js
+++ b/search/all_b.js
@@ -1,14 +1,13 @@
 var searchData=
 [
-  ['legacy_20vertex_20pipeline_20lvp_20configuration_0',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md50',1,'']]],
+  ['legacy_20vertex_20pipeline_20lvp_20configuration_0',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md47',1,'']]],
   ['light_5f1_1',['LIGHT_1',['../gbi_8h.html#a9323099c8f34fdaa780ddf2445a963d0',1,'gbi.h']]],
   ['light_5ft_2',['Light_t',['../structLight__t.html',1,'']]],
   ['light_5ftype_5fdir_3',['LIGHT_TYPE_DIR',['../gbi_8h.html#a8229c1b02537492ab81de6efc3346dd7',1,'gbi.h']]],
   ['light_5ftype_5fpoint_4',['LIGHT_TYPE_POINT',['../gbi_8h.html#a2aecc3454d96b07179bd5840c8317466',1,'gbi.h']]],
-  ['lighting_5',['Lighting',['../compatibility.html#autotoc_md20',1,'Lighting'],['../porting.html#autotoc_md47',1,'Recommended Changes (Lighting)'],['../porting.html#autotoc_md46',1,'Recommended Changes (Non-Lighting)']]],
+  ['lighting_5',['Lighting',['../compatibility.html#autotoc_md20',1,'Lighting'],['../porting.html#autotoc_md44',1,'Recommended Changes (Lighting)'],['../porting.html#autotoc_md43',1,'Recommended Changes (Non-Lighting)']]],
   ['list_6',['Deprecated List',['../deprecated.html',1,'']]],
-  ['locality_7',['Vertex Cache Locality',['../snake.html#autotoc_md56',1,'']]],
+  ['locality_7',['Vertex Cache Locality',['../snake.html#autotoc_md53',1,'']]],
   ['logic_8',['Control Logic',['../compatibility.html#autotoc_md18',1,'']]],
-  ['long_20snakes_9',['Very Long Snakes',['../performance.html#autotoc_md40',1,'']]],
-  ['lvp_20configuration_10',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md50',1,'']]]
+  ['lvp_20configuration_9',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md47',1,'']]]
 ];
diff --git a/search/all_c.js b/search/all_c.js
index 7110c87..0ababce 100644
--- a/search/all_c.js
+++ b/search/all_c.js
@@ -2,10 +2,10 @@ var searchData=
 [
   ['main_20drawing_0',['Main Drawing',['../compatibility.html#autotoc_md17',1,'']]],
   ['matrix_20format_1',['Matrix Format',['../gbi_8h.html#autotoc_md0',1,'']]],
-  ['matter_20in_20practice_2',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
-  ['memory_20bandwidth_3',['Memory Bandwidth',['../snake.html#autotoc_md55',1,'']]],
+  ['matter_20in_20practice_2',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
+  ['memory_20bandwidth_3',['Memory Bandwidth',['../snake.html#autotoc_md52',1,'']]],
   ['microcode_20configuration_4',['Microcode Configuration',['../configuration.html',1,'Microcode Configuration'],['../configuration.html#autotoc_md23',1,'Microcode Configuration']]],
-  ['minimal_20scanlines_20algorithm_5',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md52',1,'']]],
+  ['minimal_20scanlines_20algorithm_5',['Clipping minimal scanlines algorithm',['../removed.html#autotoc_md49',1,'']]],
   ['miscellaneous_6',['Miscellaneous',['../md_README.html#autotoc_md9',1,'Miscellaneous'],['../compatibility.html#autotoc_md22',1,'Miscellaneous']]],
   ['mode_20and_20new_20effect_20parameters_7',['Geometry Mode and New Effect Parameters',['../compatibility.html#autotoc_md21',1,'']]],
   ['mtx_5ft_8',['Mtx_t',['../gbi_8h.html#a337d3271dded333fc80aa798bf36efe3',1,'gbi.h']]]
diff --git a/search/all_d.js b/search/all_d.js
index 8bf4c97..5b566ee 100644
--- a/search/all_d.js
+++ b/search/all_d.js
@@ -1,16 +1,16 @@
 var searchData=
 [
   ['n_0',['n',['../structVtx__tn.html#a1c089003792d753ad1a83e55ba6a1c5d',1,'Vtx_tn::n'],['../unionVtx.html#a9d853f123229b9565de7f15bd3a92879',1,'Vtx::n']]],
-  ['never_20matter_20in_20practice_1',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['never_20matter_20in_20practice_1',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
   ['new_20effect_20parameters_2',['Geometry Mode and New Effect Parameters',['../compatibility.html#autotoc_md21',1,'']]],
-  ['new_20features_3',['Changes Required for New Features',['../porting.html#autotoc_md48',1,'']]],
+  ['new_20features_3',['Changes Required for New Features',['../porting.html#autotoc_md45',1,'']]],
   ['new_20features_4',['What are the tradeoffs for all these new features?',['../design-tradeoffs.html#autotoc_md28',1,'']]],
   ['new_20visual_20features_5',['New visual features',['../md_README.html#autotoc_md7',1,'']]],
   ['no_20occlusion_20plane_20noc_6',['No Occlusion Plane (NOC)',['../configuration.html#autotoc_md24',1,'']]],
   ['noc_7',['No Occlusion Plane (NOC)',['../configuration.html#autotoc_md24',1,'']]],
-  ['non_20lighting_8',['Recommended Changes (Non-Lighting)',['../porting.html#autotoc_md46',1,'']]],
+  ['non_20lighting_8',['Recommended Changes (Non-Lighting)',['../porting.html#autotoc_md43',1,'']]],
   ['non_20textured_20tris_9',['Non-textured tris',['../design-tradeoffs.html#autotoc_md35',1,'']]],
-  ['normals_10',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md51',1,'']]],
+  ['normals_10',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md48',1,'']]],
   ['normals_11',['Removal of scaled vertex normals',['../design-tradeoffs.html#autotoc_md32',1,'']]],
   ['normals_20tt_20dbgn_20tt_12',['Debug Normals (&lt;tt&gt;dbgN&lt;/tt&gt;)',['../configuration.html#autotoc_md27',1,'']]],
   ['numlights_5f0_13',['NUMLIGHTS_0',['../gbi_8h.html#aea86692fc57f81ac13ad61f7f7f52b69',1,'gbi.h']]]
diff --git a/search/all_e.js b/search/all_e.js
index 212a1ae..eb17319 100644
--- a/search/all_e.js
+++ b/search/all_e.js
@@ -1,10 +1,10 @@
 var searchData=
 [
-  ['obscure_20semantic_20differences_20from_20f3dex2_20that_20should_20never_20matter_20in_20practice_0',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['obscure_20semantic_20differences_20from_20f3dex2_20that_20should_20never_20matter_20in_20practice_0',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
   ['occlusion_20plane_1',['Vertex processing RSP time for occlusion plane',['../design-tradeoffs.html#autotoc_md29',1,'']]],
   ['occlusion_20plane_20noc_2',['No Occlusion Plane (NOC)',['../configuration.html#autotoc_md24',1,'']]],
-  ['octahedral_20encoding_20for_20packed_20normals_3',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md51',1,'']]],
+  ['octahedral_20encoding_20for_20packed_20normals_3',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md48',1,'']]],
   ['of_20scaled_20vertex_20normals_4',['Removal of scaled vertex normals',['../design-tradeoffs.html#autotoc_md32',1,'']]],
-  ['offsets_5',['Z attribute offsets',['../removed.html#autotoc_md53',1,'']]],
-  ['overlay_203_6',['Functionality in Overlay 3',['../design-tradeoffs.html#autotoc_md30',1,'']]]
+  ['offsets_5',['Z attribute offsets',['../removed.html#autotoc_md50',1,'']]],
+  ['overlays_6',['Functionality in overlays',['../design-tradeoffs.html#autotoc_md30',1,'']]]
 ];
diff --git a/search/all_f.js b/search/all_f.js
index 89fc9dc..d59ab30 100644
--- a/search/all_f.js
+++ b/search/all_f.js
@@ -1,6 +1,6 @@
 var searchData=
 [
-  ['packed_20normals_0',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md51',1,'']]],
+  ['packed_20normals_0',['Octahedral Encoding for Packed Normals',['../removed.html#autotoc_md48',1,'']]],
   ['pad1_1',['pad1',['../structAmbient__t.html#a2b51d26372a2d0e6f33f9195911d28dc',1,'Ambient_t']]],
   ['pad2_2',['pad2',['../structLight__t.html#aed778175ebe6fb9ea936a6af23ee2303',1,'Light_t::pad2'],['../structAmbient__t.html#aed778175ebe6fb9ea936a6af23ee2303',1,'Ambient_t::pad2']]],
   ['pad3_3',['pad3',['../structLight__t.html#a17791fc0cdc239e506376e05fc8aa9ff',1,'Light_t']]],
@@ -8,14 +8,14 @@ var searchData=
   ['performance_5',['Performance',['../gbi_8h.html#autotoc_md2',1,'']]],
   ['performance_20counters_6',['Performance Counters',['../counters.html',1,'md_docs_2code']]],
   ['performance_20improvements_7',['Performance improvements',['../md_README.html#autotoc_md8',1,'']]],
-  ['performance_20results_8',['Performance Results',['../performance.html',1,'Performance Results'],['../performance.html#autotoc_md37',1,'Performance Results']]],
-  ['pipeline_20lvp_20configuration_9',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md50',1,'']]],
+  ['performance_20results_8',['Performance Results',['../performance.html',1,'Performance Results'],['../performance.html#autotoc_md38',1,'Performance Results']]],
+  ['pipeline_20lvp_20configuration_9',['Legacy Vertex Pipeline (LVP) Configuration',['../removed.html#autotoc_md47',1,'']]],
   ['plane_10',['Vertex processing RSP time for occlusion plane',['../design-tradeoffs.html#autotoc_md29',1,'']]],
   ['plane_20noc_11',['No Occlusion Plane (NOC)',['../configuration.html#autotoc_md24',1,'']]],
   ['pointlight_5ft_12',['PointLight_t',['../structPointLight__t.html',1,'']]],
-  ['porting_20your_20romhack_20codebase_20to_20f3dex3_13',['Porting Your Romhack Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md44',1,'Porting Your Romhack Codebase to F3DEX3']]],
+  ['porting_20your_20romhack_20codebase_20to_20f3dex3_13',['Porting Your Romhack Codebase to F3DEX3',['../porting.html',1,'Porting Your Romhack Codebase to F3DEX3'],['../porting.html#autotoc_md41',1,'Porting Your Romhack Codebase to F3DEX3']]],
   ['pos_14',['pos',['../structPointLight__t.html#af472b1e169974f9f4dbe2e54d6b18c1b',1,'PointLight_t']]],
-  ['practice_15',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md36',1,'']]],
+  ['practice_15',['Obscure semantic differences from F3DEX2 that should never matter in practice',['../design-tradeoffs.html#autotoc_md37',1,'']]],
   ['processing_20rsp_20time_20for_20occlusion_20plane_16',['Vertex processing RSP time for occlusion plane',['../design-tradeoffs.html#autotoc_md29',1,'']]],
   ['profiling_17',['Profiling',['../md_README.html#autotoc_md10',1,'Profiling'],['../configuration.html#autotoc_md25',1,'Profiling']]]
 ];
diff --git a/snake.html b/snake.html
index b2721d7..e62f329 100644
--- a/snake.html
+++ b/snake.html
@@ -125,11 +125,11 @@ Coiled snake forming triangle fan</div></div>
 Snake with mixed shape</div></div>
     <p><em>The snake need not be constrained to either shape; it can turn left or right in any combination. This can be thought of as concatenating triangle strips and fans. (Original photo by Al d'Vilas, free-use licensed)</em></p>
 <p>A snake can be arbitrarily long. It starts with a <code>SPTriSnake</code> command, which may be followed by one or more <code>SPContinueSnake</code> macros which encode continued indices. The latter are not commands (there's no command byte)&ndash;they are just more index data sequentially in the display list. In other words, the display list input buffer is the storage for the indices data. The microcode correctly handles the case when the snake runs off the end of the input buffer and the input buffer needs to be refilled. The refilled data starts from the start of the input buffer, as if it were regular commands; this matters for the hints system.</p>
-<h2><a class="anchor" id="autotoc_md55"></a>
+<h2><a class="anchor" id="autotoc_md52"></a>
 Memory Bandwidth</h2>
 <p>The goal of any accelerated triangles system in a microcode is to reduce the memory bandwidth used for loading triangle indices. The actual tris drawn are the same regardless of how their indices are encoded in the display list, so we do not consider the performance of actually drawing the tris, only loading their indices.</p>
 <p>An <code>SPTriSnake</code> command by itself contains 7 vertices and draws 5 triangles (because the first triangle needs two extra vertices to start itself). An <code>SPContinueSnake</code> macro contains 8 vertices and draws 8 tris, in each case continuing the existing snake. The F3D family microcodes before F3DEX3 only provided <code>SP1Triangle</code> and <code>SP2Triangle</code> commands, so any snake of 3 or more tris is more efficient than F3DEX2 and older microcodes. The efficiency gain is up to 4x (2 tris -&gt; 8 tris per 8-byte macro), though in typical meshes the gain is expected to be 2-3x.</p>
-<h2><a class="anchor" id="autotoc_md56"></a>
+<h2><a class="anchor" id="autotoc_md53"></a>
 Vertex Cache Locality</h2>
 <p>The key advantage of a triangle snake over a traditional triangle strip is that it better exploits the vertex cache.</p>
 <p>In any microcode, the vertex cache is of a fixed size, and any continuous subset of it can be reloaded. Loading a vertex costs 16 bytes of memory throughput plus some RSP time to perform transformation and lighting. So, the goal of vertex cache optimization is to reduce the number of vertices reloaded (loaded a second time when they had been loaded in the past but are no longer loaded). A secondary goal is to reduce the number of vertices kept in the cache between loads, as doing so increases the average load size, which decreases the relative overhead of the loads.</p>
@@ -145,7 +145,7 @@ Part of a mesh showing subdivision into 4 strips or 1 snake</div></div>
 <div class="caption">
 The same mesh but with a long, thin set of tris</div></div>
     <p><em>If vertex loads are optimized for rendering with triangle strips, long "1D" sections of meshes will be loaded, which does not exploit the "2D" spatial locality of the vertex cache. This is especially inefficient if the export tool always reloads vertices instead of keeping them in the cache: in the case pictured, the entire top row of selected vertices will be immediately reloaded when rendering the next strip up.</em></p>
-<h2><a class="anchor" id="autotoc_md57"></a>
+<h2><a class="anchor" id="autotoc_md54"></a>
 What about yielding?</h2>
 <p>Microcodes compatible with libultra&ndash;including the F3D family, S2DEX, JPEG decoder, etc.&ndash;are required to listen for a flag from the CPU, and if it is set, to save their state and stop executing. This allows the higher-priority audio microcode to be swapped in and run, which must occur soon after every VI. The audio microcode may take a few ms to run, so if it is delayed by more than a few ms, there is a risk of audio corruption.</p>
 <p>Any command which results in RDP commands being enqueued&ndash;triangle or rectangle draws, texture loads, CC setting changes, etc.&ndash;can cause the current temporary RDP command buffer in DMEM to be flushed to the FIFO in RDRAM. If the latter is full, the command will wait until space is available. In an extreme case, the RDP may have to clear the framebuffer and depth buffer before making progress and opening up space in the FIFO, which can take several ms. Thus, the processing of most display list commands could theoretically cause the RSP to wait&ndash;delaying the yield&ndash;by several ms.</p>
@@ -158,11 +158,11 @@ What about yielding?</h2>
 <p>Since each pair of tris drawn in the snake require a buffer flush, and the tris the RSP is enqueuing are the same data size as the tris the RDP is rendering, the RSP will have to wait after each pair of tris in the snake for capacity in the RDP buffer before it can continue with the snake. In other words, the snake speed is limited by the RDP drawing speed for tris much earlier in the frame. In this example, the RSP will not finish the snake and respond to the yield for 10 ms, delaying the audio microcode too long and causing audio corruption.</p>
 <p>This is still an unlikely case though:</p><ul>
 <li>If your game has 100 consecutive tris which take a total of 10 ms of RDP time, this is probably poorly optimized to begin with.</li>
-<li>When the FIFO fills up, this means wasted RSP time, even if this happens not to conflict with a yield. If the FIFO fills up often and RSP peformance is imporant in your game (either for audio or because the graphics are RSP bound), you should expand the FIFO.</li>
+<li>When the FIFO fills up, this means wasted RSP time, even if this happens not to conflict with a yield. If the FIFO fills up often and RSP performance is important in your game (either for audio or because the graphics are RSP bound), you should expand the FIFO.</li>
 <li>A snake this long is rare in typical low-poly N64 meshes. And, the export tool could limit the maximum snake length generated.</li>
 </ul>
-<p>A future version of F3DEX3 could allow the snake command to yield in the middle. This has not been implemented yet because it is very difficult to validate. Yields are rare relative to display list commands (typically 1-2 of the former and many thousands of the latter per frame). And, until we have a robust F3DEX3 mesh optimizer and a game where most things are drawn with snakes (i.e. few vanilla assets left), snakes will also be rare in the display list. So it will be hard to know whether the yield-during-snake codepath is even being run, let alone whether it is correct in all cases.</p>
-<h2><a class="anchor" id="autotoc_md58"></a>
+<p>F3DEX3 now checks for yields whenever the input buffer is refilled, rather than before every command as in F3DEX2. When a triangle snake extends across the boundary of the input buffer, a yield can occur, and F3DEX3 correctly suspends and resumes the triangle snake in this case. So, while triangle snakes can be unlimited in length, because the input buffer is 21 commands = 168 bytes, there is guaranteed to be a yield check at least once every 168 triangles. (Any snake of 8 tris or more can potentially cross the input buffer end and therefore be interrupted.) In practice this guarantee rarely matters, as snakes will not be longer than about 110 tris due to the vertex buffer size.</p>
+<h2><a class="anchor" id="autotoc_md55"></a>
 Comparison with Tiny3D</h2>
 <p><a href="https://github.com/HailToDodongo/tiny3d">Tiny3D</a>, the homebrew microcode, uses triangle strips as its accelerated triangles command. F3DEX3 triangle snakes have several advantages compared to Tiny3D's triangle strips:</p><ul>
 <li>Tiny3D uses 16-bit indices (raw DMEM addresses) in triangle strips, which saves a multiply-add / table lookup for each index, but doubles the required memory bandwidth compared to 8-bit indices in the F3D family (including F3DEX3).</li>
diff --git a/structAmbient__t.html b/structAmbient__t.html
index 26209ac..8412082 100644
--- a/structAmbient__t.html
+++ b/structAmbient__t.html
@@ -116,7 +116,9 @@ Data Fields</h2></td></tr>
 <tr class="separator:aed778175ebe6fb9ea936a6af23ee2303"><td class="memSeparator" colspan="2">&#160;</td></tr>
 </table>
 <a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2>
-<div class="textblock"></div><h2 class="groupheader">Field Documentation</h2>
+<div class="textblock"><p>Light structure.</p>
+<p>Note: the weird order is for the DMEM alignment benefit of the microcode.    </p>
+</div><h2 class="groupheader">Field Documentation</h2>
 <a id="a2b51d26372a2d0e6f33f9195911d28dc" name="a2b51d26372a2d0e6f33f9195911d28dc"></a>
 <h2 class="memtitle"><span class="permalink"><a href="#a2b51d26372a2d0e6f33f9195911d28dc">&#9670;&#160;</a></span>pad1</h2>
 
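The yield-check bound stated in the "What about yielding?" hunk above can be checked with a little arithmetic. The sketch below is only an illustration, not microcode: it takes the 21-command, 8-bytes-per-command input buffer from the text, and it assumes (hypothetically) that a snake consumes one input byte per triangle. The names `input_buffer_bytes` and `yield_checks_during` are made up for this example.

```c
#include <assert.h>

/* Figures from the text: the input buffer holds 21 commands of 8 bytes each. */
enum { CMD_SIZE_BYTES = 8, INPUT_BUFFER_CMDS = 21 };

/* Size of the input buffer in bytes (21 * 8 = 168). */
int input_buffer_bytes(void) {
    return CMD_SIZE_BYTES * INPUT_BUFFER_CMDS;
}

/*
 * Hypothetical model: count how many input buffer refills (and therefore
 * yield checks) happen while consuming `run_bytes` bytes that begin at
 * `start_offset` within the current buffer. A refill is needed each time
 * consumption runs past the end of the buffer.
 * Assumption: one snake triangle per input byte.
 */
int yield_checks_during(int start_offset, int run_bytes) {
    assert(start_offset >= 0 && start_offset < input_buffer_bytes());
    assert(run_bytes >= 0);
    if (run_bytes == 0) {
        return 0;
    }
    return (start_offset + run_bytes - 1) / input_buffer_bytes();
}
```

Under this model, a run that fits exactly in the remaining buffer space triggers no refill, while any run longer than 168 bytes crosses at least one refill regardless of where it starts, which matches the guarantee described above. A short run that begins near the end of the buffer also crosses a refill, which is why even short snakes can be interrupted.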

	F3DEX2	F3DEX3_NOC	F3DEX3
Command dispatch	12	12	12
Command dispatch	12	10	10
Small RDP command	14	5	5
Small RDP command	14	4	4
Only/2nd tri to offscreen	27	25	25
Only/2nd tri to offscreen	27	20	20
1st tri to offscreen	28	26	26
1st tri to offscreen	28	21	21
Only/2nd tri to clip	32	30	30
Only/2nd tri to clip	32	25	25
1st tri to clip	33	31	31
1st tri to clip	33	26	26
Only/2nd tri to backface	38	36	36
Only/2nd tri to backface	38	31	31
1st tri to backface	39	37	37
1st tri to backface	39	32	32
Only/2nd tri to degenerate	42	38	38
Only/2nd tri to degenerate	42	33	33
1st tri to degenerate	43	39	39
1st tri to degenerate	43	34	34
Only/2nd tri to occluded	Can't	Can't	42
Only/2nd tri to occluded	Can't	Can't	37
1st tri to occluded	Can't	Can't	43
1st tri to occluded	Can't	Can't	38
Only/2nd tri to draw	172	156	158
Only/2nd tri to draw	172	149	151
1st tri to draw	173	157	159
1st tri to draw	173	150	152
Tri snake	Can't	*	*
Tri snake	Can't	10/11*	10/11*
Vtx before DMA start	16	17	17
Light dir xfrm, 9 dir lts	Can't	196	196