When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.
Differential revision: https://reviews.llvm.org/D27742
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291609 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true. The timeline is as
follows:
1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
`TailDup.shouldTailDuplicate` on its argument, which in turn calls
`analyzeBranch` with `AllowModify` set to true.
2. This `analyzeBranch` call edits the terminator sequence of the block
based on the physical layout of the machine function, turning an
unanalyzable non-fallthrough block to a unanalyzable fallthrough
block. Normally MBP bails out of rearranging such blocks, but this
block was unanalyzable non-fallthrough (and thus rearrangeable) the
first time MBP looked at it, and so it goes ahead and decides where
it should be placed in the function.
3. When placing this block MBP fails to analyze and thus update the
block in keeping with the new physical layout.
Concretely, before (1) we have something like:
```
LBL0:
< unknown terminator op that may branch to LBL1 >
jmp LBL1
LBL1:
... A
LBL2:
... B
```
In (2), analyze branch simplifies this to
```
LBL0:
< unknown terminator op that may branch to LBL2 >
;; jmp LBL1 <- redundant jump removed
LBL1:
... A
LBL2:
... B
```
In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.
```
LBL0:
< unknown terminator op that may branch to LBL2 >
;; jmp LBL1 <- redundant jump
LBL2:
... B
LBL1:
... A
```
and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.
There are several possible solutions, but I went with removing the teeth
off of the `analyzeBranch` calls in TailDuplicator. That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.
I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.
Reviewers: chandlerc, gberry, MatzeB, iteratee
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27783
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289764 91177308-0d34-0410-b5e6-96231b3b80d8
We fail to produce bit-to-bit matching stage2 and stage3 compiler in PGO
bootstrap build. The reason is because LoopBlockSet is of SmallPtrSet type
whose iterating order depends on the pointer value.
This patch fixes this issue by changing to use SmallSetVector.
Differential Revision: http://reviews.llvm.org/D26634
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287148 91177308-0d34-0410-b5e6-96231b3b80d8
There is a use after free bug in the existing code. Loop layout selects
a preferred exit block, and then lays out the loop. If this block is
removed during layout, it needs to be invalidated to prevent a use after
free.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285348 91177308-0d34-0410-b5e6-96231b3b80d8
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283934 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r283842.
test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283857 91177308-0d34-0410-b5e6-96231b3b80d8
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283842 91177308-0d34-0410-b5e6-96231b3b80d8
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Differential revision: https://reviews.llvm.org/D18226
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283619 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 062ace9764.
Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283292 91177308-0d34-0410-b5e6-96231b3b80d8
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well.
Differential revision: https://reviews.llvm.org/D18226
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283274 91177308-0d34-0410-b5e6-96231b3b80d8
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283164 91177308-0d34-0410-b5e6-96231b3b80d8
This is prep work for allowing the threshold to be different during layout,
and to enforce a single threshold between merging and duplicating during
layout. No observable change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279117 91177308-0d34-0410-b5e6-96231b3b80d8
If the result of the find is only used to compare against end(), just
use is_contained instead.
No functionality change is intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278433 91177308-0d34-0410-b5e6-96231b3b80d8
The following pattern was being layed out poorly:
A
/ \
B C
/ \ / \
D E ? (Doesn't matter)
Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)
The current algorithm gives:
A,B,C,E (D goes on worklist)
It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and layout E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:
A,B,E, (C and D are in a worklist)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277187 91177308-0d34-0410-b5e6-96231b3b80d8