* Moved the identifier copy and macro bloom filter test from maybe_expand_macro into copy_to_action_point / copy_to_action_point_macro_expansion, 13.1% improvement.
* SSE implementation of scan_to_directive, 10x faster, 5.2%
* SSE implementation of identifier copy, 3x faster, 4.5%
* SSE ShaderConvertAndStripComments, 4x faster, 3.6%
* Fast inline string equality comparison, 4x faster, 1.5%
To make the SSE implementations "safe" without needing special cases near the end of a buffer, padding must be present in the relevant buffers, i.e. anything that goes through a preprocess_string call. This includes the string arena allocator, temporary stbds arrays that hold strings, and file buffers passed in. The file buffers all pass through ShaderConvertAndStripComments, where the padding can be added (ShaderConvertAndStripComments itself has special cases for the end of the buffer). Code related to the original 1- and 2-character macro filter was removed, since I can't see a reason to enable it over the bloom filter.
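The padding requirement can be illustrated with a minimal sketch. It assumes x86 with SSE2 and uses hypothetical names (`make_padded_copy`, `scan_to_hash_sse2`, `SCAN_PADDING`). It is a simplified stand-in for scan_to_directive that only searches for the next `#`, not the actual implementation. Every buffer carries 16 zero bytes past its end, so an unaligned 16-byte load that starts at or before the terminator never reads outside the allocation, and the loop needs no end-of-buffer special case.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdlib.h>
#include <string.h>

/* Hypothetical padding size: one full 16-byte SSE register. */
#define SCAN_PADDING 16

/* Copy len bytes into a new buffer followed by SCAN_PADDING zero bytes, so any
   16-byte load starting at an index in [0, len] stays inside the allocation. */
static char *make_padded_copy(const char *src, size_t len)
{
    char *buf = malloc(len + SCAN_PADDING);
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, len);
    memset(buf + len, 0, SCAN_PADDING);
    return buf;
}

/* Return a pointer to the first '#' or zero byte at or after p, checking 16
   bytes per iteration. Relies on the padding: the chunk containing the
   terminator may extend up to 15 bytes past it, which is still allocated. */
static const char *scan_to_hash_sse2(const char *p)
{
    const __m128i hash = _mm_set1_epi8('#');
    const __m128i zero = _mm_setzero_si128();
    for (;;) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)p);
        __m128i hits = _mm_or_si128(_mm_cmpeq_epi8(chunk, hash),
                                    _mm_cmpeq_epi8(chunk, zero));
        unsigned mask = (unsigned)_mm_movemask_epi8(hits);
        if (mask != 0) {
            /* Index of the lowest set bit = offset of the first match. */
            int i = 0;
            while ((mask & 1u) == 0) {
                mask >>= 1;
                ++i;
            }
            return p + i;
        }
        p += 16;
    }
}
```

In this sketch the zero-byte check doubles as the end-of-buffer check. That is only safe because every scanned buffer was produced by something like `make_padded_copy`, which mirrors why the padding has to be added wherever preprocess_string input originates.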
I also attempted SSE optimization of copy_to_action_point and copy_line_without_comments, but the improvement wasn't big enough to be worth the complexity (around 2% for the former, at the cost of massive code complexity, and 0.5% for the latter). That covers pretty much everything SSE-friendly that's over 1% on a profile, although I think copy_argument can be made a lot faster, just not primarily through SSE.
#jira UE-197212
#rnx
#rb yuriy.odonnell jason.nadro
[CL 28834324 by jason hoerner in ue5-main branch]
* Low overhead bloom filter for identifier hash lookups in maybe_expand_macro (3.1%)
* Optimized inline versions of library calls (isspace, isalnum), and avoid strtoul for single-digit numbers in evaluate_if (2.6%)
* Array reserve function inlining (1.7%)
* Fetch output size from preprocessor library instead of calling strlen (1.2%)
* Avoid FName -> string -> FName round trip conversion of key in FShaderCompilerEnvironment::Merge (0.3%)
* Use string concatenation instead of printf, and inline storage, in AddStbDefines (0.3%)
* Other overhead reduction (remove unused fast_dest feature, smaller stbds_array_header struct, savings from inlining) (1.5%)
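The bloom filter item can be sketched as follows. This is a minimal, hypothetical version and not the preprocessor's actual code: the 1024-bit size, the FNV-1a hash, and the names `macro_bloom`, `bloom_add` and `bloom_maybe_contains` are all assumptions. Each defined macro name sets two bits derived from one hash. An identifier whose bits are not both set was never defined, so the full macro hash table lookup in maybe_expand_macro can be skipped for the common case of an ordinary identifier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 1024-bit filter. Bits are only ever set, never cleared, so a
   stale bit after #undef costs a wasted lookup but never a missed macro. */
#define BLOOM_BITS 1024

typedef struct {
    uint64_t words[BLOOM_BITS / 64];
} macro_bloom;

/* FNV-1a over the identifier; any cheap hash already computed would do. */
static uint32_t ident_hash(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; ++i) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

static void bloom_set(macro_bloom *b, uint32_t bit)
{
    bit %= BLOOM_BITS;
    b->words[bit / 64] |= (uint64_t)1 << (bit % 64);
}

static int bloom_test(const macro_bloom *b, uint32_t bit)
{
    bit %= BLOOM_BITS;
    return (int)((b->words[bit / 64] >> (bit % 64)) & 1u);
}

/* Record a macro name: two probe positions from the low and high hash bits. */
static void bloom_add(macro_bloom *b, const char *s, size_t len)
{
    uint32_t h = ident_hash(s, len);
    bloom_set(b, h);
    bloom_set(b, h >> 16);
}

/* Returns 0 only if the identifier was definitely never added; a nonzero
   result means "maybe", and the caller must do the real table lookup. */
static int bloom_maybe_contains(const macro_bloom *b, const char *s, size_t len)
{
    uint32_t h = ident_hash(s, len);
    return bloom_test(b, h) && bloom_test(b, h >> 16);
}
```

Usage: zero the filter, call `bloom_add` for each `#define`, and call `bloom_maybe_contains` before the hash table probe. False positives only cost the lookup that would have happened anyway.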
#rnx
#rb dan.elksnitis jason.nadro
[CL 28537574 by jason hoerner in ue5-main branch]
* Inline array memory allocation added to the low-level preprocessor for output and various temporary buffers, to reduce dynamic memory allocation and reallocation overhead. Saved 4.6%.
* FShaderPreprocessOutput::StripCode optimized to write to FString as a TCHAR array, rather than using AppendChar (over 4x speedup). Saved 2.9%.
* Shader source file cache now also stores stripped and ANSI-converted source, avoiding the need to convert and strip the source as well as the allocation of a copy. Saved 4.3%.
* Uniform buffer structure declarations stored as ANSI-converted source, avoiding a convert and copy. Saved 4.9%.
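The inline array allocation item refers to a small-buffer pattern, sketched here under assumptions: the `inline_buffer` type, its functions and the 256-byte `INLINE_CAP` are hypothetical and not taken from the preprocessor. The buffer starts out pointing at storage embedded in the struct and only touches the heap once that storage is exhausted, so typical outputs and temporaries need no malloc/realloc at all.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical inline capacity, sized so typical outputs stay inline. */
#define INLINE_CAP 256

typedef struct {
    char  *data;                  /* points at inline_buf until it outgrows it */
    size_t len;
    size_t cap;
    char   inline_buf[INLINE_CAP];
} inline_buffer;

static void ib_init(inline_buffer *b)
{
    b->data = b->inline_buf;
    b->len = 0;
    b->cap = INLINE_CAP;
}

/* Append n bytes, spilling to the heap (with doubling growth) only when the
   inline storage is exhausted. Returns 0 on allocation failure. */
static int ib_append(inline_buffer *b, const char *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t new_cap = b->cap * 2;
        char *p;
        while (new_cap < b->len + n)
            new_cap *= 2;
        if (b->data == b->inline_buf) {
            /* First spill: move the inline contents to a heap block. */
            p = malloc(new_cap);
            if (p == NULL)
                return 0;
            memcpy(p, b->inline_buf, b->len);
        } else {
            p = realloc(b->data, new_cap);
            if (p == NULL)
                return 0;
        }
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}

/* Free any heap block and reset to the empty inline state. */
static void ib_free(inline_buffer *b)
{
    if (b->data != b->inline_buf)
        free(b->data);
    ib_init(b);
}
```

One design caveat of this sketch: because `data` may point into the struct itself, an `inline_buffer` must not be copied by value while it is still using inline storage.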
#rnx
#rb dan.elksnitis jason.nadro
[CL 28219741 by jason hoerner in ue5-main branch]
- modify versioning mechanisms to use UEMETADATA pragmas
- update commands which generate version.ush files to generate such pragmas instead of comments
- include the above pragma-driven version(s) in the input hash when the preprocessed job cache is enabled (not needed for the disabled case since, as before, the entire contents of the version files are hashed)
- fix bug in stb_preprocessor that was causing it to stop expanding macros if any diagnostic was encountered (and rename the array field storing diagnostic messages to diagnostics, since it contains more than just errors)
#rb Yuriy.ODonnell
[CL 27515060 by dan elksnitis in ue5-main branch]
- add handling of #pragma directives in macro definitions
- add handling of #pragma message to create diagnostic outputs
- add and update comments in preprocessor.h
- unify memory allocation mechanisms between stb_ds, stb_alloc and the preprocessor: all functions performing memory allocation/freeing can be configured via macros defined in a header whose include path is given by the STB_CONFIG macro (this header is included in stb_common.h if STB_CONFIG is defined)
- don't pass an stb_arena to the include resolution callback; this is unnecessary complexity (implementation can handle its own memory allocation as long as the resulting strings have the appropriate lifetime) and indeed the arena used was global (and so not threadsafe)
- expose a function allowing external modification of error modes, so we can disable the error for preprocessor directives not at the start of a line (and so handle the case where we #define a macro whose value is a #pragma, or any other string possibly containing a # that we want to pass unmodified to the output, i.e. the case of #define COMPILER_DEFINE #define, which we use to keep #defines in the preprocessed code for the platform compilers to handle)
- fix leak of pp diagnostic messages
- remove internal caches for include resolution; they are problematic since (a) they aren't threadsafe, and (b) the caching of resolved filenames based on the include path found in source is inherently problematic (can potentially cause issues with same-named files in different folders included via relative paths)
- strip all code related to explicit file loading and lists of include paths; this has no purpose for our use case. Note this makes the callbacks for file load/unload and include resolution mandatory; comments in preprocessor.h updated to reflect this.
- fix a few other memory leaks
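The STB_CONFIG allocation hook can be illustrated with the standard "overridable macro" pattern. In this sketch STB_CONFIG is the macro named above, but `STB_MALLOC`, `STB_REALLOC`, `STB_FREE` and `stb_strdup_sketch` are hypothetical names chosen for illustration. If STB_CONFIG is defined, the header it names is included first and may define any of the allocation macros. Anything it leaves undefined falls back to the C runtime.

```c
/* If STB_CONFIG names a header, include it first so it can override the
   allocation macros, e.g. compile with -DSTB_CONFIG='"my_stb_config.h"'. */
#ifdef STB_CONFIG
#include STB_CONFIG
#endif

#include <stdlib.h>
#include <string.h>

/* Defaults for any macro the config header leaves undefined.
   These macro names are hypothetical. */
#ifndef STB_MALLOC
#define STB_MALLOC(size) malloc(size)
#endif
#ifndef STB_REALLOC
#define STB_REALLOC(ptr, size) realloc(ptr, size)
#endif
#ifndef STB_FREE
#define STB_FREE(ptr) free(ptr)
#endif

/* Library code allocates and frees only through the macros, so a single
   config header redirects every allocation in the library at once. */
static char *stb_strdup_sketch(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = (char *)STB_MALLOC(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```

Defining each macro under its own `#ifndef` lets a config header override just one function (say, only `STB_FREE` for tracking) without having to restate the rest.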
#preflight 638611a9fa053c489a54b2ec
#rb Yuriy.ODonnell
[CL 23305324 by dan elksnitis in ue5-main branch]
- add explicit extern "C" on preprocessor methods to allow linking from C++ code
- add custom callbacks for include path resolution and file loading/unloading
- add padding space following macro expansion ending with a > to work around DXC bug (not handling >> correctly if it occurs at the end of a templated type)
- adding explicit handling for HLSL infinity constant (1.#INF); the # character outside of a directive was previously putting the preprocessor into an error state
- change line directive output to use the #line format rather than # (DXC is ok with the latter but the minifier doesn't currently handle it), and also always include the filename (so that when minifying we don't have to be careful not to strip the line directive with the filename in it); this also matches mcpp behaviour
- adding const-correctness and casts where needed to avoid const casts/compile errors in calling C++ code and satisfy Mac/Intellisense compilers
- bugfix: parsing of '.' was immediately putting the scanning state machine into "number" mode, resulting in skipped macro expansion in code of the form "var.MACRO.var2"/numbers; added extra state transitions and changed the default transition when encountering '.' to an explicit mode that can transition to parsing either identifiers or numbers depending on the following characters.
- bugfix: macros with an empty parameter list were incorrectly terminating said list on expansion (adding extra rparens)
- bugfix: set file load mode as "callback" when callback used for loading, and check the mode in the free callbacks
- various minor changes to satisfy Clang code analysis
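The '.' bugfix amounts to deciding what a '.' starts by looking at the character after it. The scanner below is a deliberately simplified, hypothetical illustration (`scan_identifiers` and `ident_fn` are invented names, and the real stb_preprocessor state machine has many more states). A '.' begins a number only when a digit follows. Otherwise it is plain punctuation, so the identifier after it is still reported and can be macro-expanded.

```c
#include <ctype.h>
#include <stddef.h>

/* Callback invoked for every identifier found outside numeric literals. */
typedef void (*ident_fn)(const char *start, size_t len, void *ctx);

static void scan_identifiers(const char *p, ident_fn on_ident, void *ctx)
{
    while (*p) {
        unsigned char c = (unsigned char)*p;
        if (isalpha(c) || c == '_') {
            /* Identifier: a candidate for macro expansion. */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_')
                ++p;
            on_ident(start, (size_t)(p - start), ctx);
        } else if (isdigit(c) || (c == '.' && isdigit((unsigned char)p[1]))) {
            /* Number such as "1.0e3" or ".5f": consume digits, dots and
               letters (suffixes, exponents) without reporting identifiers. */
            while (isalnum((unsigned char)*p) || *p == '.' || *p == '_')
                ++p;
        } else {
            /* Any other character, including '.' not followed by a digit,
               is punctuation; scanning resumes in the default state. */
            ++p;
        }
    }
}
```

With this lookahead, "var.MACRO.var2" yields the identifiers var, MACRO and var2, while "x = .5f + 1.0e3" yields only x. An unconditional switch to number mode on '.' would instead swallow ".MACRO" and skip its expansion, which is the behaviour the bugfix describes.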
#preflight 637e3391e30d438849ddf902
#rb Yuriy.ODonnell
[CL 23249332 by dan elksnitis in ue5-main branch]