This will make the depedence graph more accurate if an alias analysis
is provided. If nullptr is specified in its place, the behavior will
remain as it is currently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255540 91177308-0d34-0410-b5e6-96231b3b80d8
This will allow custom handling of packet finalization. The current
definition of endPacket will still perform the default finalization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255537 91177308-0d34-0410-b5e6-96231b3b80d8
When FastISel fails to translate an instruction it hands off code
generation to SelectionDAG. Before it does so, it may have generated
local value instructions to feed phi nodes in successor blocks. These
instructions will then be generated again by SelectionDAG, causing
duplication and less efficient code, including extra spill
instructions.
Patch by Wolfgang Pieb!
Differential Revision: http://reviews.llvm.org/D11768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255520 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255455 91177308-0d34-0410-b5e6-96231b3b80d8
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255422 91177308-0d34-0410-b5e6-96231b3b80d8
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255387 91177308-0d34-0410-b5e6-96231b3b80d8
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.
This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).
Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.
This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033
Differential Revision: http://reviews.llvm.org/D15320
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255362 91177308-0d34-0410-b5e6-96231b3b80d8
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given calling convention and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15340
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255353 91177308-0d34-0410-b5e6-96231b3b80d8
Detecting additional dead-defs without a dead flag that are only visible
through liveness information should be part of the register operand
collection not intertwined with the register pressure update logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255192 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the code path that generate "synchronous" (only correct at call site) CFA.
We will probably want to re-introduce it once we are capable of emitting different
.eh_frame and .debug_frame sections.
Differential Revision: http://reviews.llvm.org/D14948
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254874 91177308-0d34-0410-b5e6-96231b3b80d8
The indexList's nodes are all allocated on a BumpPtrAllocator, so it's
more efficient to let them be freed when it goes away, rather than
deleting them directly. This is a follow up to r254794.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254808 91177308-0d34-0410-b5e6-96231b3b80d8
When a `SlotIndexes` is destroyed, `ileAllocator` will currently be
destructed before `IndexList`, but all of `IndexList`'s storage has
been allocated by `ileAllocator`. This means we'll call destructors on
garbage data, which is very bad. This can be avoided by putting the
BumpPtrAllocator earlier in the class than anything it allocates.
Unfortunately, I don't know how to test this. It depends very much on
memory layout, and the only evidence I have that this is actually
happening in practice are backtraces that might be explained by this.
By inspection though, the code is obviously dangerous/wrong, and this
is the right thing to do.
I'll follow up later with a patch that calls clearAndLeakNodesUnsafely
on the list, since there isn't much point in destructing them when
they're allocated in a BPA anyway, but I figured it makes sense to
commit the correctness fix separately from that optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254794 91177308-0d34-0410-b5e6-96231b3b80d8
Now that ScheduleDAGInstrs doesn't need it anymore we can move the field
down the class hierarcy to ScheduleDAGMI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254759 91177308-0d34-0410-b5e6-96231b3b80d8
Re-comitting with a change that avoids undefined uses getting put into
the VRegUses list.
The new algorithm remembers the uses encountered while walking backwards
until a matching def is found. Contrary to the previous version this:
- Works without LiveIntervals being available
- Allows to increase the precision to subregisters/lanemasks
(not used for now)
The changes in the AMDGPU tests are necessary because the R600 scheduler
is not stable with respect to the order of nodes in the ready queues.
Differential Revision: http://reviews.llvm.org/D9068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254683 91177308-0d34-0410-b5e6-96231b3b80d8
Currently in LLVM's cost model, a vectorized arithmetic instruction will have
high cost if its type is split into multiple registers. However, this
punishment is too heavy and unnecessary. The overhead of the split should not
be on arithmetic instructions but instructions that implement the split. Note
that during vectorization we have calculated the register pressure, and we
only choose proper interleaving factor (and also vectorization factor) so
that we don't use more registers than the maximum number.
Here is a very simple example: if a vadd has the cost 1, and if we double VF
so that we need two registers to perform it, then its cost will become 4 with
the current implementation, which will prevent us to use larger VF.
Differential revision: http://reviews.llvm.org/D15159
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254671 91177308-0d34-0410-b5e6-96231b3b80d8
This works mostly fine but breaks some stage 1 builders when compiling
compiler-rt on i386. Revert for further investigation as I can't see an
obvious cause/fix.
This reverts commit r254577.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254586 91177308-0d34-0410-b5e6-96231b3b80d8
The new algorithm remembers the uses encountered while walking backwards
until a matching def is found. Contrary to the previous version this:
- Works without LiveIntervals being available
- Allows to increase the precision to subregisters/lanemasks
(not used for now)
The changes in the AMDGPU tests are necessary because the R600 scheduler
is not stable with respect to the order of nodes in the ready queues.
Differential Revision: http://reviews.llvm.org/D9068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254577 91177308-0d34-0410-b5e6-96231b3b80d8
This call should in fact be made by RegScavenger::enterBasicBlock()
called below. The first call does nothing except for triggering UB,
indicated by UBSan (passing nullptr to memset()).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254548 91177308-0d34-0410-b5e6-96231b3b80d8
vector.resize() is significantly slower than memset in many STLs
and the cost of initializing these vectors is significant on targets
with many registers. Since we don't need the overhead of a vector,
use a simple unique_ptr instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254526 91177308-0d34-0410-b5e6-96231b3b80d8
The @llvm.get.dynamic.area.offset.* intrinsic family is used to get the offset
from native stack pointer to the address of the most recent dynamic alloca on
the caller's stack. These intrinsics are intendend for use in combination with
@llvm.stacksave and @llvm.restore to get a pointer to the most recent dynamic
alloca. This is useful, for example, for AddressSanitizer's stack unpoisoning
routines.
Patch by Max Ostapenko.
Differential Revision: http://reviews.llvm.org/D14983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254404 91177308-0d34-0410-b5e6-96231b3b80d8
(This is the second attempt to submit this patch. The first caused two assertion
failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.
All uses of weight-based interfaces are now updated to use probability-based
ones.
Differential revision: http://reviews.llvm.org/D14973
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254377 91177308-0d34-0410-b5e6-96231b3b80d8
Nobody was checking the returnvalue of recede()/advance() so we can
simply replace this code with asserts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254371 91177308-0d34-0410-b5e6-96231b3b80d8
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."
Asserts were firing in Chromium builds. See PR25687.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254366 91177308-0d34-0410-b5e6-96231b3b80d8
The patch in http://reviews.llvm.org/D13745 is broken into four parts:
1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.
This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.
All uses of weight-based interfaces are now updated to use probability-based
ones.
Differential revision: http://reviews.llvm.org/D14973
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254348 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements dynamic realignment of stack objects for targets
with a non-realigned stack pointer. Behaviour in FunctionLoweringInfo
is changed so that for a target that has StackRealignable set to
false, over-aligned static allocas are considered to be variable-sized
objects and are handled with DYNAMIC_STACKALLOC nodes.
It would be good to group aligned allocas into a single big alloca as
an optimization, but this is yet todo.
SystemZ benefits from this, due to its stack frame layout.
New tests SystemZ/alloca-03.ll for aligned allocas, and
SystemZ/alloca-04.ll for "no-realign-stack" attribute on functions.
Review and help from Ulrich Weigand and Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254227 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.
Reviewers: MatzeB, andreadb, tstellarAMD
Subscribers: arsenm, jyknight, llvm-commits
Differential Revision: http://reviews.llvm.org/D14945
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254085 91177308-0d34-0410-b5e6-96231b3b80d8
to a simple type when lowering a truncating store of a vector type. In this
case for an EVT we'll return Expand as we should in all of the cases anyhow.
The testcase triggered at the one in VectorLegalizer::LegalizeOp, inspection
found the rest.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254061 91177308-0d34-0410-b5e6-96231b3b80d8
Duplicate a few common definitions between DFAPacketizer.cpp and
DFAPacketizerEmitter.cpp to avoid including files from CodeGen
in TableGen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253820 91177308-0d34-0410-b5e6-96231b3b80d8
MachineInstrBuilder::addDisp can already add an immediate or global address MO with an adjusted offset, this patch adds support for constant pool indices as well.
All remaining MO types still assert - there are a number of other types that could support adjusted offsets but I have no test cases at this time.
Required to fix a regression in D13988 found by Mikael Holmén during stress testing (test case attached).
Differential Revision: http://reviews.llvm.org/D14867
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253795 91177308-0d34-0410-b5e6-96231b3b80d8
Extended DFA tablegen to:
- added "-debug-only dfa-emitter" support to llvm-tblgen
- defined CVI_PIPE* resources for the V60 vector coprocessor
- allow specification of multiple required resources
- supports ANDs of ORs
- e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
(SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)
- added support for combo resources
- allows specifying ORs of ANDs
- e.g. [CVI_XLSHF, CVI_MPY01] means:
(CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)
- increased DFA input size from 32-bit to 64-bit
- allows for a maximum of 4 AND'ed terms of 16 resources
- supported expressions now include:
expression => term [AND term] [AND term] [AND term]
term => resource [OR resource]*
resource => one_resource | combo_resource
combo_resource => (one_resource [AND one_resource]*)
Author: Dan Palermo <dpalermo@codeaurora.org>
kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.
Reapply the previous patch, this time without circular dependencies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253793 91177308-0d34-0410-b5e6-96231b3b80d8
Extended DFA tablegen to:
- added "-debug-only dfa-emitter" support to llvm-tblgen
- defined CVI_PIPE* resources for the V60 vector coprocessor
- allow specification of multiple required resources
- supports ANDs of ORs
- e.g. [SLOT2, SLOT3], [CVI_MPY0, CVI_MPY1] means:
(SLOT2 OR SLOT3) AND (CVI_MPY0 OR CVI_MPY1)
- added support for combo resources
- allows specifying ORs of ANDs
- e.g. [CVI_XLSHF, CVI_MPY01] means:
(CVI_XLANE AND CVI_SHIFT) OR (CVI_MPY0 AND CVI_MPY1)
- increased DFA input size from 32-bit to 64-bit
- allows for a maximum of 4 AND'ed terms of 16 resources
- supported expressions now include:
expression => term [AND term] [AND term] [AND term]
term => resource [OR resource]*
resource => one_resource | combo_resource
combo_resource => (one_resource [AND one_resource]*)
Author: Dan Palermo <dpalermo@codeaurora.org>
kparzysz: Verified AMDGPU codegen to be unchanged on all llc
tests, except those dealing with instruction encodings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253790 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a new API, LTOCodeGenerator::setFileType, to choose the output file
format for LTO CodeGen. A corresponding change to use this new API from
llvm-lto and a test case is coming in a separate commit.
Differential Revision: http://reviews.llvm.org/D14554
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253622 91177308-0d34-0410-b5e6-96231b3b80d8