Commit Graph

64 Commits

Author SHA1 Message Date
Tom Stellard 107362cf5f RegAllocGreedy: Properly initialize this pass, so that -run-pass will work
Reviewers: qcolombet, MatzeB

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26572

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286895 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-14 21:50:13 +00:00
Matt Arsenault c205b62417 Move AArch64BranchRelaxation to generic code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283459 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-06 15:38:53 +00:00
Hal Finkel f73c9c11e7 Add a counter-function insertion pass
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.

Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.

Differential Revision: https://reviews.llvm.org/D22825

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280347 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-01 09:42:39 +00:00
Brendon Cahoon c1359c9fbb MachinePipeliner pass that implements Swing Modulo Scheduling
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.

This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.

This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.

Differential Review: http://reviews.llvm.org/D16829


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277169 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 16:44:44 +00:00
Dean Michael Berris cee9af9136 XRay: Add entry and exit sleds
Summary:
In this patch we implement the following parts of XRay:

- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.

There are some caveats here:

1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.

2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.

Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk

Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D19904

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275367 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-14 04:06:33 +00:00
Wei Mi 302c35d621 [PM] Port UnreachableBlockElim to the new Pass Manager
Differential Revision: http://reviews.llvm.org/D22124


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274824 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-08 03:32:49 +00:00
Michael Kuperstein f3f8fc6a20 [PM] Port PreISelIntrinsicLowering to the new PM
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273713 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 20:13:42 +00:00
Matthias Braun 1cd242fe11 CodeGen: Refactor renameDisconnectedComponents() as a pass
Refactor LiveIntervals::renameDisconnectedComponents() to be a pass.
Also change the name to "RenameIndependentSubregs":

- renameDisconnectedComponents() worked on a MachineFunction at a time
  so it is a natural candidate for a machine function pass.

- The algorithm is testable with a .mir test now.

- This also fixes a problem where the lazy renaming as part of the
  MachineScheduler introduced IMPLICIT_DEF instructions after the number
  of a nodes in a region were counted leading to a mismatch.

Differential Revision: http://reviews.llvm.org/D20507

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271345 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-31 22:38:06 +00:00
Matthew Simpson d06ea8a8f4 [ARM, AArch64] Properly initialize InterleavedAccessPass
InterleavedAccessPass is an IR-level pass, so this change will enable testing
it with opt. This is part of D20250.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270101 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-19 20:08:32 +00:00
Matthias Braun e5c4e28d9c CodeGen: Add DetectDeadLanes pass.
The DetectDeadLanes pass performs a dataflow analysis of used/defined
subregister lanes across COPY instructions and instructions that will
get lowered to copies. It detects dead definitions and uses reading
undefined values which are obscured by COPY and subregister usage.

These dead definitions cause trouble in the register coalescer which
cannot deal with definitions suddenly becoming dead after coalescing
COPY instructions.

For now the pass only adds dead and undef flags to machine operands. It
should be possible to extend it in the future to remove the dead
instructions and redo the analysis for the affected virtual
registers.

Differential Revision: http://reviews.llvm.org/D18427

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267851 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-28 03:07:16 +00:00
Peter Collingbourne 74eabdd998 Introduce llvm.load.relative intrinsic.
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.

LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.

Differential Revision: http://reviews.llvm.org/D18367

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267223 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-22 21:18:02 +00:00
Tom Stellard fbbc621bb4 CodeGen: Add a stand-alone hazard recognizer pass
Summary:
This new pass allows targets to use the hazard recognizer without having
to also run one of the schedulers.  This is useful when compiling with
optimizations disabled for targets that still need noop hazards
to be handled correctly.

Reviewers: hfinkel, atrick

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18594

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267156 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-22 14:43:50 +00:00
Sanjoy Das 151328540e Introduce a "patchable-function" function attribute
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.

Reviewers: joker.eph, rnk, echristo, dberris

Subscribers: joker.eph, echristo, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19046

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266715 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-19 05:24:47 +00:00
Benjamin Kramer b714d34a7e Move SafeStack to CodeGen.
It depends on the target machinery, that's not available for
instrumentation passes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258942 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-27 16:53:42 +00:00
Vikram TV b1415e7eba Recommit LiveDebugValues pass after fixing a couple of minor issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255759 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-16 11:09:48 +00:00
Mehdi Amini b310704b13 Revert "Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933"
This reverts commit r255096.

Break the bots: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/16378/

From: Mehdi Amini <mehdi.amini@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255101 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 08:17:42 +00:00
Vikram TV 2f351a5ca7 Implement a new pass - LiveDebugValues - to compute the set of live DEBUG_VALUEs at each basic block and insert them. Reviewed and accepted at: http://reviews.llvm.org/D11933
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255096 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 05:49:14 +00:00
David Majnemer 048d7e541d [WinEH] Add a funclet layout pass
Windows EH funclets need to be contiguous.  The FuncletLayout pass will
ensure that the funclets are together and begin with a funclet entry MBB.

Differential Revision: http://reviews.llvm.org/D12943

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247937 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-17 20:45:18 +00:00
Sanjoy Das 8d5b28507b [CodeGen] Add a pass to fold null checks into nearby memory operations.
Summary:
This change adds an "ImplicitNullChecks" target dependent pass.  This
pass folds null checks into memory operation using the FAULTING_LOAD
pseudo-op introduced in previous patches.

Depends on D10197
Depends on D10199
Depends on D10200

Reviewers: reames, rnk, pgavlin, JosephTremoulet, atrick

Reviewed By: atrick

Subscribers: ab, JosephTremoulet, llvm-commits

Differential Revision: http://reviews.llvm.org/D10201

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239743 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-15 18:44:27 +00:00
Quentin Colombet 2f7322b348 [ShrinkWrap] Add (a simplified version) of shrink-wrapping.
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.

As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.


** Context **

Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.


** Motivating example **

Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b)  {
 %tmp = alloca i32, align 4
 %tmp2 = icmp slt i32 %a, %b
 br i1 %tmp2, label %true, label %false

true:
 store i32 %a, i32* %tmp, align 4
 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
 br label %false

false:
 %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
 ret i32 %tmp.0
}

On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f:                                     ; @f
; BB#0:
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
LBB0_2:                                 ; %false
  mov  sp, x29
  ldp x29, x30, [sp], #16
  ret

With shrink-wrapping we could generate:
_f:                                     ; @f
; BB#0:
  cmp  w0, w1
  b.ge  LBB0_2
; BB#1:                                 ; %true
  stp x29, x30, [sp, #-16]!
  mov  x29, sp
  sub sp, sp, #16             ; =16
  stur  w0, [x29, #-4]
  sub x1, x29, #4             ; =4
  mov  w0, wzr
  bl  _doSomething
  add sp, x29, #16            ; =16
  ldp x29, x30, [sp], #16
LBB0_2:                                 ; %false
  ret

Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.


** Proposed Solution **

This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.

The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.


** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
  pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236507 91177308-0d34-0410-b5e6-96231b3b80d8
2015-05-05 17:38:16 +00:00
Reid Kleckner 4c27f8d49e Reland r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
Fix the double-deletion of AnalysisResolver when delegating through to
Dwarf EH preparation by creating one from scratch. Hopefully the new
pass manager simplifies this.

This reverts commit r229952.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231719 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-09 22:45:16 +00:00
Chandler Carruth 90b8e791ac Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including
the ones updated, fail with crashes and other explosions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229952 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 02:15:36 +00:00
Reid Kleckner 49ab3a626a EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.

Fixes PR20300.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7744

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229944 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 01:00:19 +00:00
Chandler Carruth a6a87b595d [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 03:43:40 +00:00
JF Bastien 7f0cbb5703 Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit:
http://reviews.llvm.org/D3392

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225948 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 05:24:33 +00:00