Specify an allocation order with a register class. This is used by register
allocators with a greedy heuristic. This is usefull as it is sometimes
beneficial to color more constrained classes first.
Differential Revision: http://reviews.llvm.org/D8626
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233743 91177308-0d34-0410-b5e6-96231b3b80d8
When allocating live intervals in linear order and all of them are local
to a single basic block you get an optimal coloring. This is also true
if you reverse the order, but it is not true if you sort live ranges
beginnings in reverse order, change to sort live range endings in
reverse order. Take the following live ranges for example:
|---| |--------|
|----------| |-------|
They get colored suboptimally with 3 registers if you sort the live range
starting points in reverse order (but optimally with live range begins in order,
or live range ends in reverse order).
Apparently the previous strategy was intentional because of allocation
time considerations. I am having a hard time replicating these effects,
while I see substantial improvements in allocation quality with this
change.
No testcase as none of the (in tree) targets use reverse order mode.
Differential Revision: http://reviews.llvm.org/D8625
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233742 91177308-0d34-0410-b5e6-96231b3b80d8
MachineFunction argument so that it can look up the subtarget
rather than using a cached one in some Targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231888 91177308-0d34-0410-b5e6-96231b3b80d8
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.
** Context **
Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.
This usually shows up in the assembly with pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B
Whereas we could simply have done:
def B
...
def A
use A
...
use B
** Proposed Solution **
A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evict another variable. Indeed, if no eviction
happens this is pointless to look for recoloring opportunities as it means the
situation was the same as the initial allocation problem where we had to break
the hint.
Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple unlike the last change recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it, otherwise it gives up on that hint
even if a more complex coloring would have worked.
The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) with the target color. If a) is greater or
equal than b), then it is profitable and the recoloring happen.
** Example **
Consider the following example:
BB1:
a =
b =
BB2:
...
= b
= a
Let us assume b gets split:
BB1:
a =
b =
BB2:
c = b
...
d = c
= d
= a
Because of how the allocation work, b, c, and d may be assigned different
colors. Now, if a gets evicted to make room for c, assuming b and d were
assigned to something different than a.
We end up with:
BB1:
a =
st a, SpillSlot
b =
BB2:
c = b
...
d = c
= d
e = ld SpillSlot
= e
This is likely that we can assign the same register for b, c, and d,
getting rid of 2 copies.
** Performances **
Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are a few regressions too that
comes from the (in)accuracy of the block frequency estimate.
<rdar://problem/18312047>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225422 91177308-0d34-0410-b5e6-96231b3b80d8
Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222118 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves how the different costs (register, interference, spill
and coalescing) relates together. The assumption is now that:
- coalescing (or any other "side effect" of reg alloc) is negative, and
instead of being derived from a spill cost, they use the block
frequency info.
- spill costs are in the [MinSpillCost:+inf( range
- register or interference costs are in [0.0:MinSpillCost( or +inf
The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about
when settings costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than SpillMinCost
(unless this has some sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relatively
to each others, but could grow above MinSpillCost if the pipeline is
deep enough.
The current heuristic is tuned to depend rather on the number of uses of
a live interval rather than a density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
ineficient, so making the cost proportionnal to the number of use is a
clear win.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221292 91177308-0d34-0410-b5e6-96231b3b80d8
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.
See PR18996.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218981 91177308-0d34-0410-b5e6-96231b3b80d8
It's also possible to just write "= nullptr", but there's some question
of whether that's as readable, so I leave it up to authors to pick which
they prefer for now. If we want to discuss standardizing on one or the
other, we can do that at some point in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213438 91177308-0d34-0410-b5e6-96231b3b80d8
heuristic.
By default, no functionality change.
This is a follow-up of r212099.
This hook provides a finer grain to control the optimization.
<rdar://problem/17444599>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212204 91177308-0d34-0410-b5e6-96231b3b80d8
By default, no functionality change.
Before evicting a local variable, this heuristic tries to find another (set of)
local(s) that can be reassigned to a free color.
In some extreme cases (large basic blocks with tons of local variables), the
compilation time is dominated by the local interference checks that this
heuristic must perform, with no code gen gain.
E.g., the motivating example takes 4 minutes to compile with this heuristic, 12
seconds without.
Improving the situation will likely require to make drastic changes to the
register allocator and/or the interference check framework.
For now, provide this flag to better understand the impact of that heuristic.
<rdar://problem/17444599>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212099 91177308-0d34-0410-b5e6-96231b3b80d8
define below all header includes in the lib/CodeGen/... tree. While the
current modules implementation doesn't check for this kind of ODR
violation yet, it is likely to grow support for it in the future. It
also removes one layer of macro pollution across all the included
headers.
Other sub-trees will follow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206837 91177308-0d34-0410-b5e6-96231b3b80d8
fexhaustive-register-search => exhaustive-register-search
'f' is a Clang thing!
This is related to PR18747.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206075 91177308-0d34-0410-b5e6-96231b3b80d8
Until r197284, the entry frequency was constant -- i.e., set to 2^14.
Although current ToT still has a constant entry frequency, since r197284
that has been an implementation detail (which is soon going to change).
- r204690 made the wrong assumption for the CSRCost metric. Adjust
callee-saved register cost based on entry frequency.
- r185393 made the wrong assumption (although it was valid at the
time). Update SpillPlacement.cpp::Threshold to be relative to the
entry frequency.
Since ToT still has 2^14 entry frequency, this should have no observable
functionality change.
<rdar://problem/14292693>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205789 91177308-0d34-0410-b5e6-96231b3b80d8
When register allocator's stage is RS_Spill, we choose spill over using the CSR
for the first time, if the spill cost is lower than CSRCost.
When register allocator's stage is < RS_Split, we choose pre-splitting over
using the CSR for the first time, if the cost of splitting is lower than
CSRCost.
CSRCost is set with command-line option "regalloc-csr-first-time-cost". The
default value is 0 to generate the same codes as before this commit.
With a value of 15 (1 << 14 is the entry frequency), I measured performance
gain of 3% on 253.perlbmk and 1.7% on 197.parser, with instrumented PGO,
on an arm device.
rdar://16162005
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204690 91177308-0d34-0410-b5e6-96231b3b80d8