The default threshold for loop unrolling was 32, so the compiler tried to make a dynamic loop when we had more iterations. The vector VM doesn't support that, so the codegen was never designed to understand dynamic loops, leading to an assertion failure. Increased the threshold to a large value, so we never hit this problem in practice.
#rnx
#jira none
#rb none
#ROBOMERGE-SOURCE: CL 12422302 in //UE4/Release-4.25/... via CL 12422324
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Release-Engine-Staging) (v671-12333473)
[CL 12426823 by mihnea balta in Release-Engine-Staging branch]
The default threshold for loop unrolling was 32, so the compiler tried to make a dynamic loop when we had more iterations. The vector VM doesn't support that, so the codegen was never designed to understand dynamic loops, leading to an assertion failure. Increased the threshold to a large value, so we never hit this problem in practice.
#rnx
#jira none
#rb none
[CL 12422302 by mihnea balta in 4.25 branch]
The branch flattening code assumed the function will always have a return value. Also, it never advanced the current parameter pointer when going through the function arguments, so it was trying to dereference the first parameter for all the arguments.
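A minimal sketch of the two fixes described above, not the actual branch flattening code. FunctionSig, BindArguments and HasReturnValue are hypothetical names, and plain strings stand in for the compiler's IR nodes.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function signature in the compiler IR.
struct FunctionSig
{
    std::vector<std::string> Params;
    std::optional<std::string> ReturnType; // empty for functions with no return value
};

// Pairs each call argument with its matching formal parameter.
std::vector<std::pair<std::string, std::string>> BindArguments(
    const FunctionSig& Sig, const std::vector<std::string>& Args)
{
    std::vector<std::pair<std::string, std::string>> Bound;
    auto ParamIt = Sig.Params.begin();
    for (const std::string& Arg : Args)
    {
        if (ParamIt == Sig.Params.end())
        {
            break; // more arguments than parameters; nothing left to bind
        }
        Bound.emplace_back(*ParamIt, Arg);
        ++ParamIt; // the step the bug skipped: move on to the next parameter
    }
    return Bound;
}

// Callers check this before touching the return value instead of assuming one exists.
bool HasReturnValue(const FunctionSig& Sig)
{
    return Sig.ReturnType.has_value();
}
```

Without the increment, every argument would be bound to the first parameter, which matches the symptom described in the message.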
#rnx
#jira none
#rb Simon.Tovey
#ROBOMERGE-SOURCE: CL 12387646 in //UE4/Release-4.25/... via CL 12387649
#ROBOMERGE-BOT: RELEASE (Release-4.25Plus -> Main) (v671-12333473)
[CL 12387651 by mihnea balta in Main branch]
The branch flattening code assumed the function will always have a return value. Also, it never advanced the current parameter pointer when going through the function arguments, so it was trying to dereference the first parameter for all the arguments.
#rnx
#jira none
#rb Simon.Tovey
[CL 12387646 by mihnea balta in 4.25 branch]
#RB Rob.Krajcarski, Stu.McKenna
#JIRA UE-84463
#ROBOMERGE-OWNER: arne.schober
#ROBOMERGE-AUTHOR: arne.schober
#ROBOMERGE-SOURCE: CL 12105113 via CL 12121713
#ROBOMERGE-BOT: (v657-12064184)
[CL 12121717 by arne schober in Main branch]
The opcode was only added to the argument list when WITH_EDITOR was defined, so all the positional arguments used by the string formatter were offset by -1.
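To illustrate the off-by-one, here is a small sketch of a positional formatter. FormatPositional is a hypothetical helper, not the engine's string formatter, and it only understands single-digit {N} placeholders.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces each "{N}" (single digit) in Format with Args[N].
std::string FormatPositional(const std::string& Format, const std::vector<std::string>& Args)
{
    std::string Out;
    for (std::size_t i = 0; i < Format.size(); ++i)
    {
        const bool bIsPlaceholder =
            Format[i] == '{' && i + 2 < Format.size() && Format[i + 2] == '}' &&
            Format[i + 1] >= '0' && Format[i + 1] <= '9';
        if (bIsPlaceholder)
        {
            const std::size_t Index = static_cast<std::size_t>(Format[i + 1] - '0');
            Out += Index < Args.size() ? Args[Index] : std::string("<missing>");
            i += 2; // skip the digit and the closing brace
        }
        else
        {
            Out += Format[i];
        }
    }
    return Out;
}
```

With the format "{0}: {1}, {2}" and arguments {"add", "r1", "r2"}, slot 0 is the opcode. If the opcode is left out of the list, "r1" lands in slot 0 and every later argument shifts down by one.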
#rb none
#ROBOMERGE-SOURCE: CL 11335700 via CL 11335730
#ROBOMERGE-BOT: (v653-11302973)
[CL 11335734 by mihnea balta in Main branch]
- Defines will still be there if we're dumping the shader code for debug.
- This makes the code less unique in the general case and also avoids unnecessary work that no one sees.
#rb Lukas.Hermanns
[FYI] Lukas.Hermanns, Rolando.Caloca
#ROBOMERGE-SOURCE: CL 11311362 via CL 11311438
#ROBOMERGE-BOT: (v653-11302973)
[CL 11311455 by arciel rekman in Main branch]
The comparison operators return boolean vectors, but we were assigning the result to scalar booleans. We need to use all() around comparisons, to make sure they work correctly. Since there's no all() opcode in the VM, I've added NiagaraAll() which emulates it on CPU.
Also added support for comparing boolean values (new operator nodes plus support in the VM codegen).
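A rough CPU-side sketch of what an all() emulation does, assuming a boolean vector is modelled as a std::array. BoolVec, Equal and All are hypothetical names for illustration; this is not the actual NiagaraAll() implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a shader boolean vector such as bool3.
template <std::size_t N>
using BoolVec = std::array<bool, N>;

// Component-wise comparison: returns one boolean per component, like a shader comparison operator.
template <std::size_t N>
BoolVec<N> Equal(const std::array<float, N>& A, const std::array<float, N>& B)
{
    BoolVec<N> Result{};
    for (std::size_t i = 0; i < N; ++i)
    {
        Result[i] = (A[i] == B[i]);
    }
    return Result;
}

// Reduces a boolean vector to a scalar: true only if every component is true.
template <std::size_t N>
bool All(const BoolVec<N>& V)
{
    for (bool b : V)
    {
        if (!b)
        {
            return false;
        }
    }
    return true;
}
```

Assigning the result of Equal() straight to a scalar bool would lose the per-component results; wrapping it in All() makes the reduction explicit.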
#rb none
#ROBOMERGE-SOURCE: CL 11282275 via CL 11282292
#ROBOMERGE-BOT: (v647-11244347)
[CL 11284441 by mihnea balta in Main branch]
#rnx
#rb none
#ROBOMERGE-SOURCE: CL 10869240 via CL 10869516 via CL 10869902
#ROBOMERGE-BOT: (v613-10869866)
[CL 10870584 by ryan durand in Main branch]
- Read ByteCode directly if unaligned loads are supported, to condense 3 ops into 1 when reading a uint16
- Pre-calculate instance loops for VECTOR_WIDTH_FLOATS to avoid each vector op having to round / divide instance counts
- Added runtime optimization of the VM script, this currently boils down to a function call per VM invoke + storing the data required
- Use vm.OptimizeVMByteCode to enable optimized code generation
- Use vm.UseOptimizedVMByteCode to enable running optimized code rather than the traditional byte code
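The first two items in the list above can be sketched as follows. The function names, the little-endian byte order and a vector width of 4 floats are assumptions made for illustration; the real VM code may differ.

```cpp
#include <cstdint>
#include <cstring>

// Assumed SIMD width in floats; the VM's VECTOR_WIDTH_FLOATS may have a different value.
constexpr int VectorWidthFloats = 4;

// Portable path: two byte loads plus a shift/or, assuming little-endian bytecode.
inline std::uint16_t ReadU16Bytewise(const std::uint8_t* Code)
{
    return static_cast<std::uint16_t>(Code[0] | (Code[1] << 8));
}

// Direct path: a single 16-bit read in host byte order. memcpy is the portable way to
// express an unaligned load; compilers typically turn it into one load instruction
// on targets that allow unaligned access.
inline std::uint16_t ReadU16Direct(const std::uint8_t* Code)
{
    std::uint16_t Value;
    std::memcpy(&Value, Code, sizeof(Value));
    return Value;
}

// Computed once per VM invocation so each vector op does not repeat the
// round-up division of the instance count.
inline int NumVectorLoops(int NumInstances)
{
    return (NumInstances + VectorWidthFloats - 1) / VectorWidthFloats;
}
```

On a little-endian host both read paths return the same value; the direct path just does it with one load.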
[FYI] simon.tovey,rob.krajcarski,shaun.kime
#rnx
#ROBOMERGE-SOURCE: CL 9962648 via CL 9964955
#ROBOMERGE-BOT: (v560-9963197)
[CL 9965540 by stu mckenna in Main branch]
- VM is now directly fed a set of pre generated register tables from the datasets.
- Split the monolithic register table in the VM up so there are explicit I/O and temp register tables the script indexes into directly.
- Avoids some recreation of expensive objects in favour of manual reset calls.
- Re-wrote Output kernel to be more explicit. Going via templated handler in this case didn't get us any code reuse and just obfuscated its workings.
Saves ~10-25us of overhead per VM invocation, which soon adds up.
Saves ~1-2us inside each VM exec itself.
#rb Stu.Mckenna
#ROBOMERGE-SOURCE: CL 9743796 via CL 9743798
#ROBOMERGE-BOT: (v542-9736015)
[CL 9745804 by simon tovey in Main branch]
#rb Fred.Kimberley
#jira
#rnx
[CODEREVIEW] Simon.Tovey
#ROBOMERGE-SOURCE: CL 9152880 via CL 9152882 via CL 9152886
#ROBOMERGE-BOT: (v443-9013191)
[CL 9153386 by marc audy in Main branch]
#rb Nicholas.Goldstein
[FYI] shuan.kime
#ROBOMERGE-OWNER: Simon.Tovey
#ROBOMERGE-AUTHOR: simon.tovey
#ROBOMERGE-SOURCE: CL 9133376 via CL 9134494 via CL 9134514
#ROBOMERGE-BOT: (v443-9013191)
[CL 9135322 by Simon Tovey in Main branch]
Also run another optimization pass after scalarization
Originally authored by Arne.Schober, integrated by Shaun.Kime
#rb Arne.Schober
[FYI] Simon.Tovey
#tests Engine tests passed
#ROBOMERGE-OWNER: shaun.kime
#ROBOMERGE-AUTHOR: shaun.kime
#ROBOMERGE-SOURCE: CL 8927626 via CL 8927643 via CL 8927657
#ROBOMERGE-BOT: (v429-8924992)
[CL 8928191 by shaun kime in Main branch]
- Optimizing temp register layout for better cache usage.
- Moving VM context from TLS to pool.
#rb Stu.Mckenna, Shaun.Kime
#ROBOMERGE-OWNER: simon.tovey
#ROBOMERGE-AUTHOR: simon.tovey
#ROBOMERGE-SOURCE: CL 8886792 via CL 8886955 via CL 8889429
#ROBOMERGE-BOT: (v427-8887818)
[CL 8890014 by simon tovey in Main branch]
#rb Shaun.Kime
#jira UE-75719
#ROBOMERGE-OWNER: simon.tovey
#ROBOMERGE-AUTHOR: simon.tovey
#ROBOMERGE-SOURCE: CL 7974077 via CL 7974683
#ROBOMERGE-BOT: (v396-7974030)
[CL 7975059 by simon tovey in Main branch]