axiomdl/llvm

mirror of https://github.com/AxioDL/llvm.git synced 2026-03-30 11:42:29 -07:00

Author	SHA1	Message	Date
Eugene Zelenko	68c521d030	[AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292623 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-20 17:52:16 +00:00
Stanislav Mekhanoshin	b8fa7c40ea	[AMDGPU] Add exec copy to LiveIntervals in SILowerControlFlow::emitElse This instruction is missing from LiveIntervals. I'm not aware of any problems because of this though. Differential Revision: https://reviews.llvm.org/D28879 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292521 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-19 21:26:22 +00:00
Diana Picus	8a47810cd6	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291891 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-13 09:58:52 +00:00
Stanislav Mekhanoshin	ab827bdc35	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp is implemented. Additional side effect of this is that we may consume less VGPRs at a cost of more SGPRs in case if holding of multiple conditions is needed, and that is a clear win in most cases. Differential Revision: https://reviews.llvm.org/D26114 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288053 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-28 18:58:49 +00:00
Stanislav Mekhanoshin	64620b1c31	[AMDGPU] Fix multiple vreg definitions in si-lower-control-flow Differential Revision: https://reviews.llvm.org/D26939 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287608 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-22 01:42:34 +00:00
Mehdi Amini	67f335d992	Use StringRef in Pass/PassManager APIs (NFC) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283004 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-01 02:56:57 +00:00
Matt Arsenault	0461ece2ce	AMDGPU: Partially fix control flow at -O0 Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282667 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-29 01:44:16 +00:00
Matt Arsenault	36a8c3e60f	AMDGPU: Remove register operand from si_mask_branch It isn't used for anything, and is also misleading since it could be spilled at the end of the block, so it can't be relied on. There ends up being a verifier error about using an undefined register since the spill kills the register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279899 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-27 00:42:21 +00:00
Matt Arsenault	7517ed227a	AMDGPU: Split SILowerControlFlow into two pieces Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out. One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32. This has a positive effect on SGPR usage in shader-db. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279464 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-22 19:33:16 +00:00
Matt Arsenault	ece2d8b253	AMDGPU: Remove unused tracking of flat instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278361 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-11 17:15:28 +00:00
Matt Arsenault	34c6b123f7	AMDGPU: Change insertion point of si_mask_branch Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278273 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-10 19:11:42 +00:00
Nicolai Haehnle	b18ca96c79	AMDGPU: add execfix flag to SI_ELSE Summary: SI_ELSE is lowered into two parts: s_or_saveexec_b64 dst, src (at the start of the basic block) s_xor_b64 exec, exec, dst (at the end of the basic block) The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction s_and_b64 exec, exec, s[...] which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be: s_or_saveexec_b64 dst, src s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode s_and_b64 dst, dst, exec <-- added by SILowerControlFlow s_xor_b64 exec, exec, dst Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode. Finally: It also occurred to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit s_or_saveexec_b64 dst, src ... s_and_b64 dst, dst, exec s_xor_b64 exec, exec, dst and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22846 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276972 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-28 11:39:24 +00:00
Reid Kleckner	12e910f70a	Remove MCAsmInfo.h include from TargetOptions.h TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276883 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-27 16:03:57 +00:00
Matt Arsenault	d506595769	AMDGPU: Make AMDGPUMachineFunction fields private ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276766 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-26 16:45:58 +00:00
Matt Arsenault	21e0aa8d55	AMDGPU: Make skip threshold an option git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276680 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-25 19:48:29 +00:00
Davide Italiano	f36cce1574	[AMDGPU] Remove spurious line (should've been removed in r276029). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276030 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-19 21:16:30 +00:00
Davide Italiano	5012465830	[AMDGPU] Remove dead code. LGTM'd by Matt Arsenault. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276029 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-19 21:10:49 +00:00
Matt Arsenault	4cead0b564	AMDGPU: Expand register indexing pseudos in custom inserter This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275934 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-19 00:35:03 +00:00
Matt Arsenault	beff7fe056	AMDGPU: Fix not expanding control flow after some kill blocks Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later. Fixes bug 28550 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275510 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-15 00:58:15 +00:00
Matt Arsenault	011dcf3d90	AMDGPU: Fix trying to skip from a block with no successors Found while reducing bug 28550 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275509 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-15 00:58:13 +00:00
Matt Arsenault	8a85be7236	AMDGPU: Follow up to r275203 I meant to squash this into it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275220 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-12 21:41:32 +00:00
Matt Arsenault	3f7a1b2f11	AMDGPU: Fix verifier error with kill intrinsic Don't create a terminator in the middle of the block. We should probably get rid of this intrinsic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275203 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-12 19:01:23 +00:00
Matt Arsenault	762cdd4ae8	Revert "AMDGPU: Remove unused control flow intrinsic" git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274978 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-09 17:18:39 +00:00
Matt Arsenault	c39550268e	AMDGPU: Improve offset folding for register indexing git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274954 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-09 01:13:56 +00:00
Matt Arsenault	5e2ec03cf4	AMDGPU: Remove unused control flow intrinsic git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274939 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-08 21:39:44 +00:00
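
The exec-mask sequence described in b18ca96c79 (SI_ELSE with the execfix flag) can be traced with plain bit operations. The following is a minimal sketch, not code from the LLVM tree: it models a 4-lane wave instead of 64 lanes, assumes that s_or_saveexec_b64 copies exec into dst and then ORs src into exec, and uses hypothetical names (`lower_else`, `exact_mask`, `execfix`) chosen only for illustration.

```python
def lower_else(exec_mask, src, exact_mask=None, execfix=False):
    """Trace the SI_ELSE sequence from the commit message on integer bitmasks.

    exec_mask  -- exec on entry (lanes that ran the IF path)
    src        -- saved mask of lanes that should run the ELSE path
    exact_mask -- hypothetical stand-in for the s[...] operand of the
                  s_and_b64 that SIWholeQuadMode inserts; None means no
                  WQM -> Exact switch happens in this block
    execfix    -- whether the extra "s_and_b64 dst, dst, exec" is emitted
    """
    # s_or_saveexec_b64 dst, src (assumed: dst = exec; exec = src | exec)
    dst = exec_mask
    exec_mask = src | exec_mask

    if exact_mask is not None:
        # s_and_b64 exec, exec, s[...]   <-- added by SIWholeQuadMode
        exec_mask &= exact_mask

    if execfix:
        # s_and_b64 dst, dst, exec       <-- added by SILowerControlFlow
        dst &= exec_mask

    # s_xor_b64 exec, exec, dst
    exec_mask ^= dst
    return exec_mask, dst


if_lanes, else_lanes, live = 0b0011, 0b1100, 0b0110

# No mode switch: exec ends up as exactly the ELSE lanes.
plain = lower_else(if_lanes, else_lanes)

# Mode switch without the fix: the XOR re-enables IF lane 0,
# which the s_and_b64 had masked out.
broken = lower_else(if_lanes, else_lanes, exact_mask=live)

# Mode switch with execfix: exec is the ELSE lanes restricted to live lanes.
fixed = lower_else(if_lanes, else_lanes, exact_mask=live, execfix=True)
```

With these inputs, `plain[0]` is `0b1100`, `broken[0]` is `0b0101`, and `fixed[0]` is `0b0100`, which matches the commit's reasoning for why the additional AND with exec is needed once exec has been modified inside the block.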


53 Commits