109 Commits

Author SHA1 Message Date
dan elksnitis
a9037b25ed [shaders]
- add FShaderSource class which wraps source as populated by preprocessing and subsequently accessed by compilation and other debug features; this class automatically inserts zeroed padding such that 16-byte-wide SIMD string comparison operations do not require a non-SIMD tail to process any overhang.
- add typedefs for the string/view/character types and update preprocessing code to use these typedefs instead of the explicit types
- add explicit if constexprs in minifier code around char width to disable simd optimizations for char width != 2 (and subsequently skip the non-simd tail if char width == 2 since FShaderSource automatically adds the required padding)
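A minimal sketch of the padding idea. It assumes 8-bit characters (the commit targets a 2-byte char width) and uses SSE2 on x86 with a plain `memcmp` fallback elsewhere. `PaddedSource` and `PaddedEquals` are hypothetical names, not the `FShaderSource` API. Every buffer carries 16 zero bytes past its end, so the loop can always load a full 16-byte block and never needs a scalar tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Hypothetical stand-in for a padded source buffer (not the real FShaderSource API).
class PaddedSource {
public:
    static constexpr std::size_t kSimdWidth = 16;

    explicit PaddedSource(const std::string& text)
        : length_(text.size()), data_(text.size() + kSimdWidth, '\0') {
        // Copy the text; the trailing kSimdWidth bytes stay zeroed.
        std::memcpy(data_.data(), text.data(), text.size());
    }

    const char* data() const { return data_.data(); }
    std::size_t size() const { return length_; }

private:
    std::size_t length_;
    std::vector<char> data_;
};

// Compares two padded buffers 16 bytes at a time with no scalar tail loop.
bool PaddedEquals(const PaddedSource& a, const PaddedSource& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); i += PaddedSource::kSimdWidth) {
        // Safe even when fewer than 16 real characters remain: the padding is
        // zero in both buffers, so the overhang compares equal.
#if defined(__SSE2__) || defined(_M_X64)
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data() + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data() + i));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF) return false;
#else
        if (std::memcmp(a.data() + i, b.data() + i, PaddedSource::kSimdWidth) != 0) return false;
#endif
    }
    return true;
}
```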

#rb Jason.Nadro, Yuriy.ODonnell

[CL 30358137 by dan elksnitis in ue5-main branch]
2023-12-15 15:28:27 -05:00
jason hoerner
903ff2eea7 Shader preprocessor: Fix for ASAN failure -- need to add SSE padding to substitution text generated for TEXT macros.
#jira UE-202569
#rnx
#rb dan.elksnitis

[CL 30331769 by jason hoerner in ue5-main branch]
2023-12-14 16:35:09 -05:00
dan elksnitis
6ed653a189 [shaders] further preprocessing cleanup
- move sequence of preprocessing steps out of ShaderPreprocessor module and into UE::ShaderCompilerCommon::ExecuteShaderPreprocessingSteps; the former is now explicitly just the low-level preprocessor lib
- add an implementation of PreprocessShader in FBaseShaderFormat so backends which have no custom code to execute as part of preprocessing can just automatically inherit this implementation, and fix up such backends to eliminate now-unnecessary overrides

#rb christopher.waters, Laura.Hermanns

[CL 30178136 by dan elksnitis in ue5-main branch]
2023-12-07 08:55:41 -05:00
dan elksnitis
c1f33c7a82 [shaders] debug usf/direct compile cleanup
- never append the environment defines as commented code to the source used for further preprocessing/compilation; instead only append it to the debug USF
- strip comments after loading the debug usf in direct compile mode as some backends expect comments to have already been removed and the extra ones we add to the debug dump cause them to barf
- change all #if 0s in the debug usf to block comments instead so the above can strip them (said backends also don't like preprocessor directives left in the file)
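A simplified sketch of comment stripping that shows why `#if 0` blocks rewritten as block comments disappear in the same pass. It assumes HLSL-like input with no string literals containing comment markers. `StripComments` is a hypothetical helper, not the engine's `ShaderConvertAndStripComments`, and keeping the newlines inside block comments (so line numbers stay stable) is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes // line comments and /* */ block comments, keeping newlines so that
// line numbers in the stripped output match the input. String literals are not
// handled (a simplifying assumption for shader source).
std::string StripComments(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '/' && i + 1 < in.size() && in[i + 1] == '/') {
            // Skip to end of line; the newline itself is copied on the next iteration.
            while (i < in.size() && in[i] != '\n') ++i;
        } else if (in[i] == '/' && i + 1 < in.size() && in[i + 1] == '*') {
            i += 2;
            while (i < in.size() && !(in[i] == '*' && i + 1 < in.size() && in[i + 1] == '/')) {
                if (in[i] == '\n') out += '\n';  // preserve line structure
                ++i;
            }
            if (i < in.size()) i += 2;  // skip the closing */ if present
        } else {
            out += in[i++];
        }
    }
    return out;
}
```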

#rb Jason.Nadro, rob.krajcarski

[CL 30161438 by dan elksnitis in ue5-main branch]
2023-12-06 13:32:32 -05:00
dan elksnitis
cc7c2c54f4 [shaders] shader format preprocessing cleanup & refactoring
- move uniform buffer cleanup and dead stripping into ShaderPreprocessor module's PreprocessShader function
- add "required symbols" to compiler input struct to specify additional symbols to keep during minification aside from those specified by the entrypoint; modify API such that both an entry point string and additional symbols can be specified (to avoid each backend needing to manually parse the compound RT entry point string)
- make use of ModifyShaderCompilerInput in all backends to set additional defines and required symbols on input struct up front; only use the AdditionalDefines map in cases where it's actually necessary
- remove the various per-platform defines for enabling minifier, no longer required now that this has been rolled out for all backends
- fix SCW directcompile mode; this had rotted due to pieces of the FShaderCompilerEnvironment having been added that weren't explicitly serialized to either cmdline or in the shader source. this now serializes as a base64 string written inside the USF containing all portions of the environment required for compilation (using the same serialization function as is used to write/read the SCW input file)
- use a debug flag for indicating we're in "direct compile" mode and should load the debug USF off disk, rather than the poorly named "bSkipPreprocessedCache" (this name is both inaccurate and also confusing due to the addition of the preprocessed job cache)
- modify platform "force wave32" mechanism to use a pragma directive to set a compiler define, instead of doing string replacement in the preprocessed source
- add a view version of the RT entrypoint parsing to use in preprocessing; other paths still need to construct FStrings due to further manipulation, so the FString path is kept as well
- clean up backends manually checking the "directcompile" cmdline arg
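A view-based parse of the compound ray tracing entry point can be sketched as follows. The string is assumed here to look like `closesthit=MainCHS anyhit=MainAHS intersection=MainIS`; that format, the `RayTracingEntryPoints` struct, and `ParseRayTracingEntryPointView` are illustrative assumptions rather than the engine's API. Returning `std::string_view`s into the input avoids allocation, and the parsed names are the kind of values that could be handed over as required symbols so each backend does not re-parse the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type; the engine's real signature may differ.
struct RayTracingEntryPoints {
    std::string_view ClosestHit;
    std::string_view AnyHit;
    std::string_view Intersection;
};

// Splits the input on spaces and each token on '='. All views point into Input,
// so Input must outlive the result.
RayTracingEntryPoints ParseRayTracingEntryPointView(std::string_view Input) {
    RayTracingEntryPoints Result;
    while (!Input.empty()) {
        std::size_t Space = Input.find(' ');
        std::string_view Token = Input.substr(0, Space);
        Input = (Space == std::string_view::npos) ? std::string_view{} : Input.substr(Space + 1);

        std::size_t Eq = Token.find('=');
        if (Eq == std::string_view::npos) continue;  // skip empty or malformed tokens
        std::string_view Key = Token.substr(0, Eq);
        std::string_view Value = Token.substr(Eq + 1);
        if (Key == "closesthit") Result.ClosestHit = Value;
        else if (Key == "anyhit") Result.AnyHit = Value;
        else if (Key == "intersection") Result.Intersection = Value;
    }
    return Result;
}
```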

#rb christopher.waters, Yuriy.ODonnell
#rb Chris.Waters
#rb Laura.Hermanns

[CL 30023082 by dan elksnitis in ue5-main branch]
2023-11-30 15:56:34 -05:00
marc audy
399bcf9971 Disable PVS warning V758
Silence V570 false positives for bit field assignments
Silence various other PVS warnings
#rnx

[CL 29706746 by marc audy in ue5-main branch]
2023-11-14 00:29:43 -05:00
josh adams
92f54ebbbc - Worked around crash in shader preprocessor on mac
#rb Jason.Hoerner

[CL 29612445 by josh adams in ue5-main branch]
2023-11-09 17:28:23 -05:00
jason hoerner
92db7f3070 Shader Preprocessor: Fix preprocessing for shader asserts. Need to preprocess the contents of TEXT macros to handle __FILE__ macros, or other possible macro generated text.
#rnx
#rb dan.elksnitis

[CL 29538506 by jason hoerner in ue5-main branch]
2023-11-07 18:01:37 -05:00
jason hoerner
9852294945 Shader Preprocessor: ASAN bug fix for stack use after return for undef_map identifiers, by enabling arena allocation. Strip out whitespace on blank lines -- doesn't measurably affect perf, and could improve deduplication or avoid certain shader invalidations, and makes stripped source more readable. Fix bug where line numbers aren't incremented when exiting the scanning loop on an identifier.
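A stack-use-after-return of this kind appears when a map keeps pointers to identifier text that lived in a stack buffer. A minimal sketch of the arena remedy, using hypothetical names (`StringArena`, `UndefMap`) rather than the stb_preprocessor code: identifiers are copied into arena blocks that outlive the parsing function, so the map keys stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Bump allocator for strings; memory lives until the arena is destroyed.
class StringArena {
public:
    // Copies s into arena-owned storage and returns a view of the copy.
    std::string_view Intern(std::string_view s) {
        if (s.size() > remaining_) {
            // Start a new block (oversized strings get a block of their own size).
            std::size_t blockSize = s.size() > kBlockSize ? s.size() : kBlockSize;
            blocks_.push_back(std::make_unique<char[]>(blockSize));
            cursor_ = blocks_.back().get();
            remaining_ = blockSize;
        }
        char* dst = cursor_;
        if (!s.empty()) std::memcpy(dst, s.data(), s.size());
        cursor_ += s.size();
        remaining_ -= s.size();
        return {dst, s.size()};
    }

private:
    static constexpr std::size_t kBlockSize = 4096;
    std::vector<std::unique_ptr<char[]>> blocks_;
    char* cursor_ = nullptr;
    std::size_t remaining_ = 0;
};

// Hypothetical map of #undef'd identifiers whose keys point into the arena.
struct UndefMap {
    StringArena arena;
    std::unordered_map<std::string_view, int> lines;

    void Record(std::string_view identifier, int line) {
        auto it = lines.find(identifier);
        if (it != lines.end()) { it->second = line; return; }
        lines.emplace(arena.Intern(identifier), line);  // key no longer aliases the caller's buffer
    }
};
```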
#jira UE-198065
#rnx
#rb dan.elksnitis yuriy.odonnell

[CL 29100213 by jason hoerner in ue5-main branch]
2023-10-25 16:04:13 -04:00
jason hoerner
0e18ea7e81 Shader Preprocessor: Roll TEXT macro substitution into the C preprocessor step, via custom macro callback feature, making it effectively free. Saves 3% overall on preprocess, but also represents a simplification by reducing the number of unique preprocess passes.
#jira UE-198496
#rnx
#rb dan.elksnitis yuriy.odonnell jason.nadro

[CL 29100104 by jason hoerner in ue5-main branch]
2023-10-25 16:01:11 -04:00
jason hoerner
a518725896 Shader Preprocessor optimization: Support for loading include preprocess dependencies in bulk, and drastically reduced string processing, memory allocation, and map overhead. Roughly 7x faster, saving 12% in low level preprocessor, or 5% overall.
Flattened include dependencies are generated during include scanning at startup, basically for free (perf difference was well below noise).  Bulk dependencies reduce round trips to the shader cache (which require mutex locks), and are indexed by the ANSI text exactly as it appears in the include directive in the source files, allowing a faster case sensitive hash, and avoiding the need for expensive path string operations.  Anything found in a bulk dependency is stored in an array that parallels the dependency array, rather than a map.  Includes stored in IncludeVirtualPathToSharedContentsMap also use an array.

Noting that our string classes (FString) are already case insensitive by default, some unnecessary case conversions were removed.  The separate map of "seen" shaders was also removed, as we can just use the LoadedIncludesCache map for the same purpose.  Where possible, existing ANSI strings are referenced, avoiding dynamic allocation.
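A sketch of the lookup structure described above, using hypothetical names (`BulkDependencies`, `Find`): includes are keyed by the exact text of the `#include` directive with a case-sensitive hash, and loaded contents live in an array that parallels the dependency list instead of in a second map.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

struct BulkDependencies {
    // Include text exactly as written in the directive.
    std::vector<std::string> IncludeTexts;
    // Contents[i] belongs to IncludeTexts[i]; empty until loaded.
    std::vector<std::optional<std::string>> Contents;
    // Case-sensitive index. The views point into IncludeTexts, so IncludeTexts
    // must not be modified after Build(), and this struct should not be copied.
    std::unordered_map<std::string_view, std::size_t> Index;

    void Build() {
        Contents.assign(IncludeTexts.size(), std::nullopt);
        Index.clear();
        for (std::size_t i = 0; i < IncludeTexts.size(); ++i)
            Index.emplace(IncludeTexts[i], i);
    }

    // Returns the parallel-array slot for an include, or nullptr if it is not in the bulk set.
    std::optional<std::string>* Find(std::string_view includeText) {
        auto it = Index.find(includeText);
        return it == Index.end() ? nullptr : &Contents[it->second];
    }
};
```

Because the key is the raw directive text, no path normalization or case folding happens on the hot path; a miss simply falls back to the slower per-include route.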

#jira UE-197213
#rnx
#rb yuriy.odonnell jason.nadro

[CL 29095249 by jason hoerner in ue5-main branch]
2023-10-25 13:58:15 -04:00
jason hoerner
bae18a5cc0 Shader Preprocessor platform specific compile error / warning fixes. Unused branch label on non-SSE platforms, char array index, conversion of 255 to char in SSE vector initialization (used by unsigned compare intrinsic so code was correct), strcpy_s not supported on some platforms -- switched to strncpy.
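One caveat with swapping `strcpy_s` for `strncpy`: `strncpy` writes no terminator when the source is at least as long as the count, so a portable replacement usually terminates explicitly. A minimal sketch; `CopyTruncated` is a hypothetical helper, not the code from this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies src into dst (capacity dstSize, including the terminator), truncating
// if needed. Unlike bare strncpy, the result is always null-terminated.
void CopyTruncated(char* dst, std::size_t dstSize, const char* src) {
    if (dstSize == 0) return;
    std::strncpy(dst, src, dstSize - 1);
    dst[dstSize - 1] = '\0';
}
```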
#rnx

[CL 28835617 by jason hoerner in ue5-main branch]
2023-10-17 07:01:26 -04:00
jason hoerner
8074c3b4b1 Shader Preprocessor: Add padding for SSE reads as actual zeroes, rather than capacity padding. Capacity padding doesn't work for compiler workers some of the time.
[CL 28834349 by jason hoerner in ue5-main branch]
2023-10-17 05:19:24 -04:00
jason hoerner
2d5091c15e Shader Preprocessor: Early bloom filter and SSE optimizations. Overall 27.9% improvement to low level preprocessor, or 10.4% to ConditionalPreprocessShader as a whole.
* Moved identifier copy and the macro bloom filter test from maybe_expand_macro into copy_to_action_point / copy_to_action_point_macro_expansion.  13.1% of the improvement.
* SSE implementation of scan_to_directive, 10x faster, 5.2%
* SSE implementation of identifier copy, 3x faster, 4.5%
* SSE ShaderConvertAndStripComments, 4x faster, 3.6%
* Fast inline string equality comparison, 4x faster, 1.5%

To make SSE implementations "safe" without needing special cases near the end of a buffer, it's necessary to ensure padding is present in the relevant buffers, anything that goes through a preprocess_string call.  This includes the string arena allocator, temporary stbds arrays that hold strings, and file buffers passed in.  The latter all pass through ShaderConvertAndStripComments, where we can add padding.  (ShaderConvertAndStripComments itself has special cases for end of buffer).  Code related to original 1 and 2 character macro filter removed, since I can't see a reason to enable it over the bloom filter.

I also attempted SSE optimization of copy_to_action_point and copy_line_without_comments, but improvement wasn't big enough to be worth the complexity (around 2% for the former, but massive code complexity, 0.5% for the latter).  That's pretty much everything SSE friendly that's over 1% on a profile, although I think copy_argument can be made a lot faster, not primarily through SSE.
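A minimal sketch of an SSE2 scan in the spirit of `scan_to_directive`. It assumes the buffer is zero-terminated and followed by at least 15 readable padding bytes; `FindHashOrEnd` is a hypothetical name, and the real function also tracks line state that is omitted here. Each 16-byte block is tested for `#` or the terminator in one pass, so the padding is what removes the end-of-buffer special case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Index of the lowest set bit in a nonzero mask (portable stand-in for ctz).
static int LowestSetBit(unsigned mask) {
    int bit = 0;
    while ((mask & 1u) == 0) { mask >>= 1; ++bit; }
    return bit;
}

// Returns the offset of the first '#' or '\0' in text. Requires that text is
// zero-terminated and that at least 15 readable bytes follow the terminator.
std::size_t FindHashOrEnd(const char* text) {
    for (std::size_t i = 0;; i += 16) {
#if defined(__SSE2__) || defined(_M_X64)
        __m128i block = _mm_loadu_si128(reinterpret_cast<const __m128i*>(text + i));
        __m128i hits = _mm_or_si128(_mm_cmpeq_epi8(block, _mm_set1_epi8('#')),
                                    _mm_cmpeq_epi8(block, _mm_setzero_si128()));
        unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(hits));
        if (mask != 0) return i + LowestSetBit(mask);
#else
        for (std::size_t j = 0; j < 16; ++j)
            if (text[i + j] == '#' || text[i + j] == '\0') return i + j;
#endif
    }
}
```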

#jira UE-197212
#rnx
#rb yuriy.odonnell jason.nadro

[CL 28834324 by jason hoerner in ue5-main branch]
2023-10-17 05:18:57 -04:00
jason hoerner
312825f3d0 Build fix for Mac ARM64: inline functions in cond_expr.c produced link errors, not sure why, but force inlining them should fix the issue.
[CL 28538832 by jason hoerner in ue5-main branch]
2023-10-06 10:09:38 -04:00
jason hoerner
b5db1a507e Shader Preprocessor optimizations second pass: 10.7% overall improvement to ConditionalPreprocessShader, or 23.1% improvement to lower level PreprocessShader pass alone, factoring out other passes like dead code removal not affected by this CL.
* Low overhead bloom filter for identifier hash lookups in maybe_expand_macro  (3.1%)
* Optimized inline version of library calls (isspace, isalnum), and avoid strtoul for single digit numbers in evaluate_if   (2.6%)
* Array reserve function inlining  (1.7%)
* Fetch output size from preprocessor library instead of calling strlen  (1.2%)
* Avoid FName -> string -> FName round trip conversion of key in FShaderCompilerEnvironment::Merge   (0.3%)
* Use string concatenation instead of printf and inline storage in AddStbDefines   (0.3%)
* Other overhead reduction (remove unused fast_dest feature, smaller stbds_array_header struct, savings from inlining)    (1.5%)
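A low-overhead bloom filter can be as small as a couple of machine words. A minimal sketch, assuming a 128-bit filter with two bits per identifier taken from one 64-bit FNV-1a hash (the engine's actual hash and filter size are not stated here). A miss proves an identifier is not a defined macro, so the hash-map lookup in `maybe_expand_macro` can be skipped for most identifiers, while a hit still falls through to the real lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// 128-bit bloom filter over macro names. MayContain() never returns false for an
// inserted name; it may return true for names that were never inserted.
class MacroBloomFilter {
public:
    void Insert(std::string_view name) {
        std::uint64_t h = Hash(name);
        SetBit(h & 127);
        SetBit((h >> 7) & 127);
    }
    bool MayContain(std::string_view name) const {
        std::uint64_t h = Hash(name);
        return TestBit(h & 127) && TestBit((h >> 7) & 127);
    }

private:
    static std::uint64_t Hash(std::string_view s) {
        // 64-bit FNV-1a.
        std::uint64_t h = 14695981039346656037ull;
        for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
        return h;
    }
    void SetBit(std::uint64_t b) { bits_[b >> 6] |= std::uint64_t{1} << (b & 63); }
    bool TestBit(std::uint64_t b) const { return (bits_[b >> 6] >> (b & 63)) & 1u; }

    std::uint64_t bits_[2] = {0, 0};
};
```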

#rnx
#rb dan.elksnitis jason.nadro

[CL 28537574 by jason hoerner in ue5-main branch]
2023-10-06 09:14:35 -04:00
dan elksnitis
5518e3d9f3 [shaders] deprecating old PreprocessShader and IShaderFormat::CompileShader APIs now that all shader formats have been migrated
#rb Jason.Nadro

[CL 28452658 by dan elksnitis in ue5-main branch]
2023-10-04 09:22:36 -04:00
jason hoerner
57a469b8d9 Build fix -- preprocessor compile error due to assigning a variable twice in one statement (once in the macro, once in the code). Second instance in the code, doh!
#rnx

[CL 28221361 by jason hoerner in ue5-main branch]
2023-09-26 07:30:24 -04:00
jason hoerner
fb2a55e5b7 Build fix -- preprocessor compile error due to assigning a variable twice in one statement (once in the macro, once in the code).
#rnx

[CL 28221307 by jason hoerner in ue5-main branch]
2023-09-26 07:29:25 -04:00
jason hoerner
046ace59ff ShaderCompiler: Preprocessor optimizations, first pass. Saves 16.7% overall on ConditionalPreprocessShader.
* Inline array memory allocation added to low level preprocessor for output and various temporary buffers to reduce dynamic memory allocation and reallocation overhead.  Saved 4.6%.
* FShaderPreprocessOutput::StripCode optimized to write to FString as TCHAR array, rather than using AppendChar (over 4x speedup).  Saved 2.9%
* Shader source file cache now also stores stripped and ANSI converted source, to avoid need to convert and strip the source, plus allocating a copy is avoided.  Saved 4.3%
* Uniform buffer structure declarations stored as ANSI converted source, avoiding convert and copy.  Saved 4.9%
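Inline allocation keeps small buffers inside the owning object and only touches the heap once they outgrow it. A minimal sketch of the idea as a growable char buffer; `InlineCharBuffer` and its inline capacity are hypothetical, not the preprocessor's actual array type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Growable char buffer that stores up to N chars inline before moving to the heap.
template <std::size_t N>
class InlineCharBuffer {
    static_assert(N > 0, "inline capacity must be nonzero");

public:
    InlineCharBuffer() = default;
    // Copying (and therefore moving) is disabled: data_ may point at inline_.
    InlineCharBuffer(const InlineCharBuffer&) = delete;
    InlineCharBuffer& operator=(const InlineCharBuffer&) = delete;

    void PushBack(char c) {
        if (size_ == capacity_) Grow();
        data_[size_++] = c;
    }
    const char* Data() const { return data_; }
    std::size_t Size() const { return size_; }
    bool OnHeap() const { return heap_ != nullptr; }

private:
    void Grow() {
        std::size_t newCapacity = capacity_ * 2;
        std::unique_ptr<char[]> bigger(new char[newCapacity]);
        std::memcpy(bigger.get(), data_, size_);
        heap_ = std::move(bigger);  // frees any previous heap block after the copy
        data_ = heap_.get();
        capacity_ = newCapacity;
    }

    char inline_[N];
    std::unique_ptr<char[]> heap_;
    char* data_ = inline_;
    std::size_t size_ = 0;
    std::size_t capacity_ = N;
};
```

With a well-chosen N, typical inputs never leave the inline storage, so the allocate/reallocate cost disappears from the common path.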

#rnx
#rb dan.elksnitis jason.nadro

[CL 28219741 by jason hoerner in ue5-main branch]
2023-09-26 05:29:05 -04:00
dan elksnitis
6cc4950575 [shaders]
- modify versioning mechanisms to use UEMETADATA pragmas
- update commands which generate version.ush files to generate such pragmas instead of comments
- include the above pragma-driven version(s) in the input hash when preprocessed job cache is enabled (not needed for the disabled case since as before the entire contents of the version files are hashed)
- fix bug in stb_preprocessor that was causing it to stop expanding macros if any diagnostic was encountered (and rename the array field storing diagnostic messages to diagnostics, since it contains more than just errors)

#rb Yuriy.ODonnell

[CL 27515060 by dan elksnitis in ue5-main branch]
2023-08-31 04:49:18 -04:00
dan elksnitis
9a5a1b6f74 [shaders] minor include cleanup
#rb Jason.Hoerner
#rb Laura.Hermanns

[CL 26817718 by dan elksnitis in ue5-main branch]
2023-08-03 13:40:32 -04:00
jason hoerner
7f22080814 Shader Compiler: Made FShaderCompilerDefinitions private, to reduce code publicly visible in ShaderCore.h. For now, it's just marked deprecated for 5.4, but will be hidden more generally in 5.5.
#rnx
#rb yuriy.odonnell dan.elksnitis jason.nadro

[CL 26534998 by jason hoerner in ue5-main branch]
2023-07-22 06:35:10 -04:00
jason hoerner
e1fe70dbca Shader Compiler: GlobalBeginCompileShader define optimizations, plus a few other micro-optimizations, producing a 3.2x performance improvement.
* Major define optimization involves converting map of defines to use FName keys and variant values rather than strings -- this eliminates most of the cost of string hashing, allocation, and conversion.
* Lower overhead FHashTable used instead of TMap.
* An initial map of defines can optionally be provided globally.  Anything using an initial define can have its map index cached for optimized lookup when reading or writing.
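A minimal sketch of the define-map idea with hypothetical names. An integer-interned name stands in for `FName`, values are a `std::variant` instead of strings, and a caller that knows a define will be present (for example from an initial global set) can cache its slot index and skip the hash lookup on later reads and writes. `std::unordered_map` is used here in place of `FHashTable`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>
#include <vector>

// Interns strings to stable integer IDs (a stand-in for FName).
class NameTable {
public:
    std::uint32_t Intern(const std::string& s) {
        return ids_.try_emplace(s, static_cast<std::uint32_t>(ids_.size())).first->second;
    }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
};

using DefineValue = std::variant<std::int32_t, std::string>;

// Defines keyed by interned name; values live in a vector so indices can be cached.
class DefineMap {
public:
    // Returns the slot index for a name, adding a default (int 0) entry if needed.
    std::size_t FindOrAdd(std::uint32_t name) {
        auto it = index_.find(name);
        if (it != index_.end()) return it->second;
        index_.emplace(name, values_.size());
        values_.emplace_back();
        return values_.size() - 1;
    }
    void Set(std::uint32_t name, DefineValue v) { values_[FindOrAdd(name)] = std::move(v); }
    // Cached-index access: no hashing once the index is known.
    DefineValue& AtIndex(std::size_t index) { return values_[index]; }

private:
    std::unordered_map<std::uint32_t, std::size_t> index_;
    std::vector<DefineValue> values_;
};
```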

Other micro-optimizations:
* Added Reserve calls for uniform buffer related maps, to eliminate map resizing / rehashing.  Saved around 15% perf (after define optimizations).
* Added a map for UB lookup, instead of iterating through the linked list.  Saved around 10% perf.

Non-optimization:  Sort the order in which uniform buffer variable names are searched in BuildShaderFileToUniformBufferMap, to create determinism in ShaderDebug data for A/B testing (previously order was dependent on global constructor order for UB definitions, which could vary arbitrarily with unrelated changes).

#jira UE-187334
#rnx
#rb dan.elksnitis jason.nadro yuriy.odonnell

[CL 26142884 by jason hoerner in ue5-main branch]
2023-06-21 03:26:02 -04:00
henrik karlsson
f027bfa856 [Core]
* Fixed compile errors when compiling .h files in isolation

#rb none

[CL 25865668 by henrik karlsson in ue5-main branch]
2023-06-08 02:30:27 -04:00