Commit Graph

121 Commits

Author SHA1 Message Date
jason hoerner
2d5091c15e Shader Preprocessor: Early bloom filter and SSE optimizations. Overall 27.9% improvement to low level preprocessor, or 10.4% to ConditionalPreprocessShader as a whole.
* Moved identifier copy and macro bloom filter test from maybe_expand_macro into copy_to_action_point / copy_to_action_point_macro_expansion.  13.1% of improvement.
* SSE implementation of scan_to_directive, 10x faster, 5.2%
* SSE implementation of identifier copy, 3x faster, 4.5%
* SSE ShaderConvertAndStripComments, 4x faster, 3.6%
* Fast inline string equality comparison, 4x faster, 1.5%

To make SSE implementations "safe" without needing special cases near the end of a buffer, it's necessary to ensure padding is present in the relevant buffers, meaning anything that goes through a preprocess_string call.  This includes the string arena allocator, temporary stbds arrays that hold strings, and file buffers passed in.  The latter all pass through ShaderConvertAndStripComments, where we can add padding.  (ShaderConvertAndStripComments itself has special cases for end of buffer).  Code related to the original 1 and 2 character macro filter was removed, since I can't see a reason to enable it over the bloom filter.
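
The padding requirement above can be illustrated with a minimal sketch. It is not the engine's scan_to_directive; the names `kScanPadding`, `make_padded` and `find_hash_padded` are hypothetical, and the scan only looks for a `#` byte. The point is that a 16-byte unaligned load may read past the logical end of the text, which is safe only because the caller guarantees padding. A scalar fallback is included for targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical padding amount: one full SSE register.
constexpr std::size_t kScanPadding = 16;

// Copies text into a buffer followed by kScanPadding zero bytes, so a
// 16-byte load starting at any valid offset stays inside the allocation.
inline std::vector<char> make_padded(const std::string& text) {
    std::vector<char> buf(text.size() + kScanPadding, '\0');
    std::memcpy(buf.data(), text.data(), text.size());
    return buf;
}

// Returns the offset of the first '#' in [0, len), or len if there is none.
// Precondition: at least kScanPadding readable bytes follow data + len.
inline std::size_t find_hash_padded(const char* data, std::size_t len) {
    assert(data != nullptr);
#if SKETCH_HAS_SSE2
    const __m128i needle = _mm_set1_epi8('#');
    for (std::size_t i = 0; i < len; i += 16) {
        // This load may run past len, but never past the padding.
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        if (mask != 0) {
            std::size_t bit = 0;
            while (((mask >> bit) & 1) == 0) ++bit;  // lowest set bit
            const std::size_t pos = i + bit;
            return pos < len ? pos : len;  // ignore matches in the padding
        }
    }
    return len;
#else
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == '#') return i;
    }
    return len;
#endif
}
```

Without the padding, the final iteration would need a separate byte-by-byte tail loop, which is the kind of end-of-buffer special case the commit describes avoiding.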

I also attempted SSE optimization of copy_to_action_point and copy_line_without_comments, but improvement wasn't big enough to be worth the complexity (around 2% for the former, but massive code complexity, 0.5% for the latter).  That's pretty much everything SSE friendly that's over 1% on a profile, although I think copy_argument can be made a lot faster, not primarily through SSE.

#jira UE-197212
#rnx
#rb yuriy.odonnell jason.nadro

[CL 28834324 by jason hoerner in ue5-main branch]
2023-10-17 05:18:57 -04:00
jason hoerner
312825f3d0 Build fix for Mac ARM64: inline functions in cond_expr.c produced link errors, not sure why, but force inlining them should fix the issue.
[CL 28538832 by jason hoerner in ue5-main branch]
2023-10-06 10:09:38 -04:00
jason hoerner
b5db1a507e Shader Preprocessor optimizations second pass: 10.7% overall improvement to ConditionalPreprocessShader, or 23.1% improvement to lower level PreprocessShader pass alone, factoring out other passes like dead code removal not affected by this CL.
* Low overhead bloom filter for identifier hash lookups in maybe_expand_macro  (3.1%)
* Optimized inline version of library calls (isspace, isalnum), and avoid strtoul for single digit numbers in evaluate_if   (2.6%)
* Array reserve function inlining  (1.7%)
* Fetch output size from preprocessor library instead of calling strlen  (1.2%)
* Avoid FName -> string -> FName round trip conversion of key in FShaderCompilerEnvironment::Merge   (0.3%)
* Use string concatenation instead of printf and inline storage in AddStbDefines   (0.3%)
* Other overhead reduction (remove unused fast_dest feature, smaller stbds_array_header struct, savings from inlining)    (1.5%)
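
As a rough illustration of the bloom filter idea in the first bullet, here is a minimal sketch. It assumes a 256-bit filter, two bit positions taken from one FNV-1a hash, and a `MacroTable` class; all of these are hypothetical choices for the example and not the layout used in the preprocessor. Most identifiers in shader source are not macros, so a cheap bit test can reject them before the full hash table lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical macro table with a small bloom filter in front of the map.
class MacroTable {
public:
    void define(const std::string& name, const std::string& body) {
        assert(!name.empty());
        const std::uint32_t h = hash(name);
        set_bit(h & 255u);         // first bit position
        set_bit((h >> 8) & 255u);  // second bit position
        macros_[name] = body;
    }

    // Returns the macro body, or nullptr if name is not a macro.
    const std::string* find(const std::string& name) {
        const std::uint32_t h = hash(name);
        if (!test_bit(h & 255u) || !test_bit((h >> 8) & 255u)) {
            ++filtered_;  // rejected without touching the map
            return nullptr;
        }
        // The filter can give false positives, so the map has the final say.
        const auto it = macros_.find(name);
        return it == macros_.end() ? nullptr : &it->second;
    }

    std::size_t filtered() const { return filtered_; }

private:
    // 32-bit FNV-1a, chosen here only because it is short.
    static std::uint32_t hash(const std::string& s) {
        std::uint32_t h = 2166136261u;
        for (unsigned char c : s) {
            h ^= c;
            h *= 16777619u;
        }
        return h;
    }
    void set_bit(std::uint32_t i) {
        bits_[i >> 6] |= std::uint64_t{1} << (i & 63u);
    }
    bool test_bit(std::uint32_t i) const {
        return ((bits_[i >> 6] >> (i & 63u)) & 1u) != 0;
    }

    std::unordered_map<std::string, std::string> macros_;
    std::uint64_t bits_[4] = {0, 0, 0, 0};
    std::size_t filtered_ = 0;
};
```

A bloom filter never reports a defined name as absent, so the early return is safe; it only ever costs an extra map lookup on a false positive.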

#rnx
#rb dan.elksnitis jason.nadro

[CL 28537574 by jason hoerner in ue5-main branch]
2023-10-06 09:14:35 -04:00
dan elksnitis
5518e3d9f3 [shaders] deprecating old PreprocessShader and IShaderFormat::CompileShader APIs now that all shader formats have been migrated
#rb Jason.Nadro

[CL 28452658 by dan elksnitis in ue5-main branch]
2023-10-04 09:22:36 -04:00
jason hoerner
57a469b8d9 Build fix -- preprocessor compile error due to assigning a variable twice in one statement (once in the macro, once in the code). Second instance in the code, doh!
#rnx

[CL 28221361 by jason hoerner in ue5-main branch]
2023-09-26 07:30:24 -04:00
jason hoerner
fb2a55e5b7 Build fix -- preprocessor compile error due to assigning a variable twice in one statement (once in the macro, once in the code).
#rnx

[CL 28221307 by jason hoerner in ue5-main branch]
2023-09-26 07:29:25 -04:00
jason hoerner
046ace59ff ShaderCompiler: Preprocessor optimizations, first pass. Saves 16.7% overall on ConditionalPreprocessShader.
* Inline array memory allocation added to low level preprocessor for output and various temporary buffers to reduce dynamic memory allocation and reallocation overhead.  Saved 4.6%.
* FShaderPreprocessOutput::StripCode optimized to write to FString as TCHAR array, rather than using AppendChar (over 4x speedup).  Saved 2.9%
* Shader source file cache now also stores stripped and ANSI converted source, to avoid need to convert and strip the source, plus allocating a copy is avoided.  Saved 4.3%
* Uniform buffer structure declarations stored as ANSI converted source, avoiding convert and copy.  Saved 4.9%
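
The inline array allocation in the first bullet can be sketched roughly as follows. This is an illustrative container under assumed names (`InlineArray`, `uses_inline_storage`), not the stb-style array used by the low level preprocessor: the first N elements live inside the object itself, and the heap is only touched when that capacity is exceeded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical growable array whose first N elements need no heap allocation.
template <typename T, std::size_t N>
class InlineArray {
    static_assert(N > 0, "inline capacity must be non-zero");
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only handles trivially copyable types");

public:
    InlineArray() = default;
    InlineArray(const InlineArray&) = delete;
    InlineArray& operator=(const InlineArray&) = delete;
    ~InlineArray() {
        if (data_ != inline_) std::free(data_);
    }

    void push_back(const T& value) {
        if (size_ == capacity_) grow();
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }

    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    bool uses_inline_storage() const { return data_ == inline_; }

private:
    // Doubles the capacity, moving the contents to the heap if needed.
    void grow() {
        const std::size_t new_capacity = capacity_ * 2;
        T* fresh = static_cast<T*>(std::malloc(new_capacity * sizeof(T)));
        if (fresh == nullptr) std::abort();
        std::memcpy(fresh, data_, size_ * sizeof(T));
        if (data_ != inline_) std::free(data_);
        data_ = fresh;
        capacity_ = new_capacity;
    }

    T inline_[N];
    T* data_ = inline_;
    std::size_t size_ = 0;
    std::size_t capacity_ = N;
};
```

Choosing N to cover the common case means typical temporary buffers never allocate, while unusually large inputs still work through the heap path.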

#rnx
#rb dan.elksnitis jason.nadro

[CL 28219741 by jason hoerner in ue5-main branch]
2023-09-26 05:29:05 -04:00
dan elksnitis
6cc4950575 [shaders]
- modify versioning mechanisms to use UEMETADATA pragmas
- update commands which generate version.ush files to generate such pragmas instead of comments
- include the above pragma-driven version(s) in the input hash when preprocessed job cache is enabled (not needed for the disabled case since as before the entire contents of the version files are hashed)
- fix bug in stb_preprocessor that was causing it to stop expanding macros if any diagnostic was encountered (and rename the array field storing diagnostic messages to diagnostics, since it contains more than just errors)

#rb Yuriy.ODonnell

[CL 27515060 by dan elksnitis in ue5-main branch]
2023-08-31 04:49:18 -04:00
dan elksnitis
9a5a1b6f74 [shaders] minor include cleanup
#rb Jason.Hoerner
#rb Laura.Hermanns

[CL 26817718 by dan elksnitis in ue5-main branch]
2023-08-03 13:40:32 -04:00
jason hoerner
7f22080814 Shader Compiler: Made FShaderCompilerDefinitions private, to reduce code publicly visible in ShaderCore.h. For now, it's just marked deprecated for 5.4, but will be hidden more generally in 5.5.
#rnx
#rb yuriy.odonnell dan.elksnitis jason.nadro

[CL 26534998 by jason hoerner in ue5-main branch]
2023-07-22 06:35:10 -04:00
jason hoerner
e1fe70dbca Shader Compiler: GlobalBeginCompileShader define optimizations, plus a few other micro-optimizations, producing a 3.2x performance improvement.
* Major define optimization involves converting map of defines to use FName keys and variant values rather than strings -- this eliminates most of the cost of string hashing, allocation, and conversion.
* Lower overhead FHashTable used instead of TMap.
* An initial map of defines can optionally be provided globally.  Anything using an initial define can have its map index cached for optimized lookup when reading or writing.

Other micro-optimizations:
* Added Reserve calls for uniform buffer related maps, to eliminate map resizing / rehashing.  Saved around 15% perf (after define optimizations).
* Added a map for UB lookup, instead of iterating through the linked list.  Saved around 10% perf.

Non-optimization:  Sort the order in which uniform buffer variable names are searched in BuildShaderFileToUniformBufferMap, to create determinism in ShaderDebug data for A/B testing (previously order was dependent on global constructor order for UB definitions, which could vary arbitrarily with unrelated changes).

#jira UE-187334
#rnx
#rb dan.elksnitis jason.nadro yuriy.odonnell

[CL 26142884 by jason hoerner in ue5-main branch]
2023-06-21 03:26:02 -04:00
henrik karlsson
f027bfa856 [Core]
* Fixed compile errors when compiling .h files in isolation

#rb none

[CL 25865668 by henrik karlsson in ue5-main branch]
2023-06-08 02:30:27 -04:00
christopher waters
993a7ec191 Fixing shader error dumping:
- When checking if we should output debug info, the output won't always be successful
- When preprocessing the shader source, make sure the output buffer is cleared before we do anything to prevent doubling up the shader source on retries.

#rb dan.elksnitis

[CL 25857810 by christopher waters in ue5-main branch]
2023-06-07 16:52:24 -04:00
dan elksnitis
b083d56fb1 [shaders] partial backout of preprocessor optimizations. the reasons for the performance gain were incorrectly assessed and the additional cache is not providing any measurable gains, so removing it as unnecessary complexity.
#rb Jason.Nadro
#preflight 64779d234b0d5a1eb16000b3

[CL 25724429 by dan elksnitis in ue5-main branch]
2023-06-01 09:03:52 -04:00
dan elksnitis
459ef8ba65 [shader preprocessor] resubmit - optimizations
- keep a preprocessor-specific shared cache of loaded shader files in ANSI format similar to the one in ShaderCore but skipping the unnecessary additional load to widechar, along with the associated extra allocations and conversions, and stripping comments directly as part of the load
- for in-memory source contained in the environment, which can't use the above due to potential different contents of same-named includes across jobs, perform the widechar->ansichar conversion and comment strip in a single step rather than converting then stripping, to save an additional allocation of the full source

#rb Jason.Nadro
#rb Yuriy.ODonnell
#preflight 64775dca2e6c1a0737f6aaef

[CL 25702732 by dan elksnitis in ue5-main branch]
2023-05-31 11:08:31 -04:00
dan elksnitis
9b758ba87c [Backout] - CL25621499
#fyi dan.elksnitis
Original CL Desc
-----------------------------------------------------------------
[shader preprocessor] optimizations
- keep a preprocessor-specific shared cache of loaded shader files in ANSI format similar to the one in ShaderCore but skipping the unnecessary additional load to widechar, along with the associated extra allocations and conversions, and stripping comments directly as part of the load
- for in-memory source contained in the environment, which can't use the above due to potential different contents of same-named includes across jobs, perform the widechar->ansichar conversion and comment strip in a single step rather than converting then stripping, to save an additional allocation of the full source

#rb Jason.Nadro
#rb Yuriy.ODonnell
#preflight 646f74ac407983b99801c870

[CL 25626032 by dan elksnitis in ue5-main branch]
2023-05-25 15:00:40 -04:00
dan elksnitis
c4e34c8b63 [shader preprocessor] optimizations
- keep a preprocessor-specific shared cache of loaded shader files in ANSI format similar to the one in ShaderCore but skipping the unnecessary additional load to widechar, along with the associated extra allocations and conversions, and stripping comments directly as part of the load
- for in-memory source contained in the environment, which can't use the above due to potential different contents of same-named includes across jobs, perform the widechar->ansichar conversion and comment strip in a single step rather than converting then stripping, to save an additional allocation of the full source

#rb Jason.Nadro
#rb Yuriy.ODonnell
#preflight 646f74ac407983b99801c870

[CL 25621499 by dan elksnitis in ue5-main branch]
2023-05-25 11:00:28 -04:00
dan elksnitis
f645f93588 [shaders] remove legacy preprocessor option & mcpp library
#rb Jason.Nadro
#preflight 6467bf0c2c0a5da0dcd7aaf2
#fyi Yuriy.ODonnell

[CL 25549427 by dan elksnitis in ue5-main branch]
2023-05-19 14:50:25 -04:00
dan elksnitis
b52faed5d5 [shaders]
- add new IShaderFormat API for separate preprocessing and compilation; backends can implement one or the other depending on the return value of SupportsIndependentPreprocessing
- add support for executing preprocessing in the cook process prior to job submission and constructing job input hashes based on preprocessed source (and a subset of the environment used as compile inputs). controlled by a cvar for now and disabled by default
- add a BaseShaderFormat class in ShaderCompilerCommon which implements common behaviour for output of debug data - note this function is only called for formats which support independent preprocessing, so is expected to be used only by formats which have been converted to use this API
- add new cvars for output of some additional shader debug data - 1. a txt file containing the input hash a.k.a. job cache key 2. a text file containing all diagnostic messages (errors and warnings) for the job
- minor change to how input hashes are constructed for pipeline jobs - sum hashes as 256-bit ints instead of adding to a buffer and re-hashing. faster and simpler, and also more collision resistant (the sum of two well-distributed hashes is equally well distributed)
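
To make the last bullet concrete, here is a minimal sketch of summing hashes as 256-bit integers. The `Hash256` layout (four 64-bit limbs, least significant first) and the function names are assumptions for illustration; the engine's hash type is not shown in this log.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit value stored as four 64-bit limbs, lowest limb first.
using Hash256 = std::array<std::uint64_t, 4>;

// Adds two 256-bit values with carry propagation, wrapping modulo 2^256.
inline Hash256 add_hash256(const Hash256& a, const Hash256& b) {
    Hash256 out{};
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint64_t sum = a[i] + b[i];
        const std::uint64_t carry_from_sum = sum < a[i] ? 1u : 0u;
        const std::uint64_t total = sum + carry;
        const std::uint64_t carry_from_total = total < sum ? 1u : 0u;
        out[i] = total;
        // At most one of the two additions can overflow.
        carry = carry_from_sum | carry_from_total;
    }
    return out;
}

// Combines the per-stage hashes of a pipeline job into one key.
inline Hash256 combine_hashes(const Hash256* hashes, std::size_t count) {
    assert(hashes != nullptr || count == 0);
    Hash256 total{};
    for (std::size_t i = 0; i < count; ++i) {
        total = add_hash256(total, hashes[i]);
    }
    return total;
}
```

One property of this sketch is that the result does not depend on the order in which the stage hashes are added, since addition is commutative.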

#rb Jason.Nadro
#rb Yuriy.ODonnell
#preflight 64512c88c86798f650b953d3

[CL 25317218 by dan elksnitis in ue5-main branch]
2023-05-03 10:17:48 -04:00
dan elksnitis
a21eb84e66 [shader preprocessor] implement #error directives
#rb Jason.Nadro
#preflight 644931ee1c2846595c0887c6

[CL 25199050 by dan elksnitis in ue5-main branch]
2023-04-26 11:23:31 -04:00
dan elksnitis
f2c9000b60 [shaders] deprecate GetDefinitions accessor on FShaderCompilerEnvironment; encapsulate internal usage of definitions object in friend classes and remove some superfluous code calling the function unnecessarily
#preflight 6425b46391589478cdab0c35
#rb Laura.Hermanns
#rb Sameer.Mirza

[CL 24870640 by dan elksnitis in ue5-main branch]
2023-03-31 09:55:19 -04:00
Matt Peters
6be2b0412d ShaderPreprocessor: Fix access violation due to reading off the end of the input text when the text contains an unterminated /* comment.
#jira UE-178273
#rnx
#rb charles.derousiers
#preflight 63f79630ef1b24bf940b32e0
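
The class of bug described in this commit can be sketched as follows. This is not the actual fix; `skip_block_comment` is a hypothetical helper that shows a bounds-checked scan which stops at the end of the input when a `/*` comment is never closed, instead of reading past it.

```cpp
#include <cassert>

// Given p pointing at "/*", returns a pointer just past the closing "*/",
// or end if the comment is unterminated. Never reads at or beyond end.
inline const char* skip_block_comment(const char* p, const char* end) {
    assert(end - p >= 2 && p[0] == '/' && p[1] == '*');
    p += 2;
    // Only look at p[1] while at least two characters remain.
    while (end - p >= 2) {
        if (p[0] == '*' && p[1] == '/') {
            return p + 2;
        }
        ++p;
    }
    return end;
}
```

A loop that tests only for the `*/` terminator would keep scanning until it found one somewhere in memory; bounding every read by `end` is what prevents the access violation.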

[CL 24382270 by Matt Peters in ue5-main branch]
2023-02-23 11:44:33 -05:00
Joe Kirchoff
4d1c3ab0b8 Disable sndbs distribution for ShaderPreprocessor module
#rnx
#rb trivial
#preflight 63e41634ea7ad6869835614b

[CL 24083177 by Joe Kirchoff in ue5-main branch]
2023-02-08 16:42:12 -05:00
Jeremy Moore
3e7f174035 STB processor reemits line directives.
#preflight 63c838c902024f93d8dc9534

[CL 23760433 by Jeremy Moore in ue5-main branch]
2023-01-18 13:37:36 -05:00
dan elksnitis
e5d2976d39 [shaders] fix msvc static analysis warning - enforce that loaded includes are non-empty when loading via stb file load callback.
#rb Yuriy.ODonnell
#preflight 6388f525303395f6c93adb67

[CL 23357182 by dan elksnitis in ue5-main branch]
2022-12-01 14:02:18 -05:00