no-alias with non-addr-taken globals: they cannot alias a captured
pointer.
If the non-global underlying object would have been a capture were it to
alias the global, we can firmly conclude no-alias. It isn't reasonable
for a transformation to introduce a capture in a way observable by an
alias analysis. Consider, even if it were to temporarily capture one
globals address into another global and then restore the other global
afterward, there would be no way for the load in the alias query to
observe that capture event correctly. If it observes it then the
temporary capturing would have changed the meaning of the program,
making it an invalid transformation. Even instrumentation passes or
a pass which is synthesizing stores to global variables to expose race
conditions in programs could not trigger this unless it queried the
alias analysis infrastructure mid-transform, in which case it seems
reasonable to return results from before the transform started.
See the comments in the change for a more detailed outlining of the
theory here.
This should address the primary performance regression found when the
non-conservatively-correct path of the alias query was disabled.
Differential Revision: http://reviews.llvm.org/D11410
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243405 91177308-0d34-0410-b5e6-96231b3b80d8
VPAND is a lot faster than VPSHUFB and VPBLENDVB - this patch ensures we attempt to lower to a basic bitmask before lowering to the slower byte shuffle/blend instructions.
Split off from D11518.
Differential Revision: http://reviews.llvm.org/D11541
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243395 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826https://llvm.org/bugs/show_bug.cgi?id=22371#c25
The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding
legally.
Differential Revision: http://reviews.llvm.org/D11477
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243361 91177308-0d34-0410-b5e6-96231b3b80d8
This is effectively an NFC but we can no longer print the index of the
pointer group so instead I print its address. This still lets us
cross-check the section that list the checks against the section that
list the groups (see how I modified the test).
E.g. before we printed this:
Run-time memory checks:
Check 0:
Comparing group 0:
%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
Against group 1:
%arrayidxA = getelementptr i16, i16* %a, i64 %ind
%arrayidxA1 = getelementptr i16, i16* %a, i64 %add
...
Grouped accesses:
Group 0:
(Low: %c High: (78 + %c))
Member: {%c,+,4}<%for.body>
Member: {(2 + %c),+,4}<%for.body>
Now we print this (changes are underlined):
Run-time memory checks:
Check 0:
Comparing group (0x7f9c6040c320):
~~~~~~~~~~~~~~
%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
Against group (0x7f9c6040c358):
~~~~~~~~~~~~~~
%arrayidxA1 = getelementptr i16, i16* %a, i64 %add
%arrayidxA = getelementptr i16, i16* %a, i64 %ind
...
Grouped accesses:
Group 0x7f9c6040c320:
~~~~~~~~~~~~~~
(Low: %c High: (78 + %c))
Member: {(2 + %c),+,4}<%for.body>
Member: {%c,+,4}<%for.body>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243354 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If a scale or a base register can be rewritten as "Zext({A,+,1})" then
LSR will now consider a formula of that form in its normal cost
computation.
Depends on D9180
Reviewers: qcolombet, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9181
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243348 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: WebAssemblySubtarget.cpp expects a default 'generic' CPU to exist, and this seems to be prevalent with other targets. It makes sense to have something between MVP and bleeding-edge, even though for now it's the same as MVP. This removes a warning that's currently generated.
Subscribers: jfb, llvm-commits, sunfish
Differential Revision: http://reviews.llvm.org/D11546
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243345 91177308-0d34-0410-b5e6-96231b3b80d8
This commit serializes the references from the machine basic blocks to the
unnamed basic blocks.
This commit adds a new attribute to the machine basic block's YAML mapping
called 'ir-block'. This attribute contains the actual reference to the
basic block.
Reviewers: Duncan P. N. Exon Smith
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243340 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"
This change re-implements D9784 with the two differences:
1. It does not use SCEVExpander and does not generate new
instructions. Instead, it does a quick local search for existing
`llvm::Value`s that it needs when modifying the `icmp`
instruction.
2. It is more general -- it deals with both increasing and decreasing
induction variables.
I've added all of the tests included with D9784, and two more.
As an example on what this change does (copied from D9784):
Given C code:
```
for (int i = M; i < N; i++) // i is known not to overflow
if (i < 0) break;
a[i] = 0;
}
```
This transformation produces:
```
for (int i = M; i < N; i++)
if (M < 0) break;
a[i] = 0;
}
```
Which can be unswitched into:
```
if (!(M < 0))
for (int i = M; i < N; i++)
a[i] = 0;
}
```
I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine. Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison. I'm open to putting it in its own helper routine if you
think that is better.
Reviewers: reames, nicholas, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11278
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243331 91177308-0d34-0410-b5e6-96231b3b80d8
be reserved.
The decision to reserve x18 is going to be made solely by the front-end,
so it isn't necessary to check if the OS is Darwin in the backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243308 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.
Differential Revision: http://reviews.llvm.org/D11503
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243303 91177308-0d34-0410-b5e6-96231b3b80d8
The pointer size of the addrspacecasted pointer might not have matched,
so this would have hit an assert in accumulateConstantOffset.
I think this was here to allow constant folding of a load of an
addrspacecasted constant. Accumulating the offset through the
addrspacecast doesn't make much sense, so something else is necessary
to allow folding the load through this cast.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243300 91177308-0d34-0410-b5e6-96231b3b80d8
Author: Dave Airlie <airlied@redhat.com>
In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.
this should be applicable for llvm 3.7 as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243294 91177308-0d34-0410-b5e6-96231b3b80d8
This commit zeroes out the virtual register references in the machine
function's liveins in the class 'MachineRegisterInfo' when the virtual
register definitions are cleared.
Reviewers: Matthias Braun
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243290 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r242295 with fixes in the implementation.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243271 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.
This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.
No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11524
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243270 91177308-0d34-0410-b5e6-96231b3b80d8
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243253 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.
I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.
Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.
This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.
Reviewers: chandlerc, hfinkel
Subscribers: aschwaighofer, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D9819
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243250 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch improves trivial loop unswitch.
The current trivial loop unswitch only checks if loop header's terminator contains a trivial unswitch condition. But if the loop header only has one reachable successor (due to intentionally or unintentionally missed code simplification), we should consider the successor as part of the loop header. Therefore, instead of stopping at loop header's terminator, we should keep traversing its successors within loop until reach a *real* conditional branch or switch (whose condition can not be constant folded). This change will enable a single -loop-unswitch pass to unswitch multiple trivial conditions (unswitch one trivial condition could open opportunity to unswitch another one in the same loop), while the old implementation can unswitch only one per pass.
Reviewers: reames, broune
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11481
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243203 91177308-0d34-0410-b5e6-96231b3b80d8
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.
This commit fixes this and adds the missing i32 truncate tests.
This fixes rdar://problem/21990703.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243198 91177308-0d34-0410-b5e6-96231b3b80d8
extension property we're requesting - zero or sign extended.
This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243192 91177308-0d34-0410-b5e6-96231b3b80d8
whether register x18 should be reserved.
This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.
Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11463
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243186 91177308-0d34-0410-b5e6-96231b3b80d8
Add a verifier check that `DILocalVariable`s of tag
`DW_TAG_arg_variable` always have a non-zero 'arg:' field, and those of
tag `DW_TAG_auto_variable` always have a zero 'arg:' field. These are
the only configurations that are properly understood by the backend.
(Also, fix the bad examples in LangRef and test/Assembler, and fix the
bug in Kaleidoscope Ch8.)
A large number of testcases seem to have bitrotted their way forward
from some ancient version of the debug info hierarchy that didn't have
`arg:` parameters. If you have out-of-tree testcases that start failing
in the verifier and you don't care enough to get the `arg:` right, you
may have some luck just calling:
sed -e 's/, arg: 0/, arg: 1/'
or some such, but I hand-updated the ones in tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243183 91177308-0d34-0410-b5e6-96231b3b80d8
This commit serializes the callee saved information from the class
'MachineFrameInfo'. This commit extends the YAML mappings for the fixed and
the ordinary stack objects and adds an optional 'callee-saved-register'
attribute. This attribute is used to serialize the callee save information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243173 91177308-0d34-0410-b5e6-96231b3b80d8
This patch extend LoopReroll pass to hand the loops which
is similar to the following:
while (len > 1) {
sum4 += buf[len];
sum4 += buf[len-1];
len -= 2;
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243171 91177308-0d34-0410-b5e6-96231b3b80d8
This commit serializes the virtual register allocations hints of type 0.
These hints specify the preferred physical registers for allocations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243156 91177308-0d34-0410-b5e6-96231b3b80d8
The names for instructions inserted were previous dependent on iteration order. By deriving the names from the original instructions, we can avoid instability in tests without resorting to ordered traversals. It also makes the IR mildly easier to read at large scale.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243140 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds the liveins and successors properties to machine basic blocks
in some of the MIR tests to ensure that the tests will pass when the MIR parser
will run the machine verifier after initializing a machine function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243124 91177308-0d34-0410-b5e6-96231b3b80d8
This commit moves and transforms the generic test
'CodeGen/MIR/successor-basic-blocks.mir' into an X86 specific test
'CodeGen/MIR/X86/successor-basic-blocks.mir'. This change is required in order
to enable the machine verifier for the MIR parser, as the machine verifier
verifies that the machine basic blocks contain instructions that actually
determine the machine basic block successors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243123 91177308-0d34-0410-b5e6-96231b3b80d8
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.
This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.
This patch fixes this by accommodating for both cases and adds regression tests.
Differential Revision: http://reviews.llvm.org/D11407
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243103 91177308-0d34-0410-b5e6-96231b3b80d8
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
Differential Revision: http://reviews.llvm.org/D11408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243100 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows llvm-dsymutil to read universal (aka fat) macho object
files and archives. The patch touches nearly everything in the BinaryHolder,
but it is fairly mechinical: the methods that returned MemoryBufferRefs or
ObjectFiles now return a vector of those, and the high-level access function
takes a triple argument to select the architecture.
There is no support yet for handling fat executables and thus no support for
writing fat object files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243096 91177308-0d34-0410-b5e6-96231b3b80d8
MachOObjectFile offers a method for detecting the correct triple, use
it instead of the previous approximation. This doesn't matter right
now, but it will become important for mach-o universal (fat) binaries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243095 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate.
This patch is intended to be applied after D10205 (though it could be applied independently).
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10206
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243084 91177308-0d34-0410-b5e6-96231b3b80d8
The test in PR24199 ( https://llvm.org/bugs/show_bug.cgi?id=24199 ) crashes because machine
trace metrics was not ignoring dbg_value instructions when calculating data dependencies.
The machine-combiner pass asks machine trace metrics to calculate an instruction trace,
does some reassociations, and calls MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
along with MachineTraceMetrics::invalidate(). The dbg_value instructions have their operands
invalidated, but the instructions are not expected to be deleted.
On a subsequent loop iteration of the machine-combiner pass, machine trace metrics would be
called again and die while accessing the invalid debug instructions.
Differential Revision: http://reviews.llvm.org/D11423
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243057 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Scalarizer has two data structures that hold information about changes
to the function, Gathered and Scattered. These are cleared in finish()
at the end of runOnFunction() if finish() detects any changes to the
function.
However, finish() was checking for changes by only checking if
Gathered was non-empty. The function visitStore() only modifies
Scattered without touching Gathered. As a result, Scattered could have
ended up having stale data if Scalarizer only scalarized store
instructions. Since the data in Scattered is used during the execution
of the pass, this introduced dangling pointer errors.
The fix is to check whether both Scattered and Gathered are empty
before deciding what to do in finish(). This also fixes a problem
where the Function can be modified although the pass returns false.
Reviewers: rnk
Subscribers: rnk, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D10459
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243040 91177308-0d34-0410-b5e6-96231b3b80d8
Adds pushes to the folding tables.
This also required a fix to the TD definition, since the memory forms of
the push instructions did not have the right mayLoad/mayStore flags.
Differential Revision: http://reviews.llvm.org/D11340
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243010 91177308-0d34-0410-b5e6-96231b3b80d8
We currently version `__asan_init` and when the ABI version doesn't match, the linker gives a `undefined reference to '__asan_init_v5'` message. From this, it might not be obvious that it's actually a version mismatch error. This patch makes the error message much clearer by changing the name of the undefined symbol to be `__asan_version_mismatch_check_xxx` (followed by the version string). We obviously don't want the initializer to be named like that, so it's a separate symbol that is used only for the purpose of version checking.
Reviewed at http://reviews.llvm.org/D11004
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243003 91177308-0d34-0410-b5e6-96231b3b80d8
The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal.
Added a check for the scalar type before creating this node.
Added a test that fails with assertion on the current version.
Differential Revision: http://reviews.llvm.org/D11413
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242994 91177308-0d34-0410-b5e6-96231b3b80d8
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.
It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242992 91177308-0d34-0410-b5e6-96231b3b80d8
The debug map contains the timestamp of the object files in references.
We do not check these in the general case, but it's really useful if
you have archives where different versions of an object file have been
appended. This allows llvm-dsymutil to find the right one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242965 91177308-0d34-0410-b5e6-96231b3b80d8
The MSVC ABI requires that we generate an alias for the vtable which
means looking through a GlobalAlias which cannot be overridden improves
our ability to devirtualize.
Found while investigating PR20801.
Patch by Andrew Zhogin!
Differential Revision: http://reviews.llvm.org/D11306
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242955 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Add a basic CodeGen bitcode test which (for now) only prints out the function name and nothing else. The current code merely implements the basic needed for the test run to not crash / assert. Getting to that point required:
- Basic InstPrinter.
- Basic AsmPrinter.
- DiagnosticInfoUnsupported (not strictly required, but nice to have, duplicated from AMDGPU/BPF's ISelLowering).
- Some SP and register setup in WebAssemblyTargetLowering.
- Basic LowerFormalArguments.
- GenInstrInfo.
- Placeholder LowerFormalArguments.
- Placeholder CanLowerReturn and LowerReturn.
- Basic DAGToDAGISel::Select, which requiresGenDAGISel.inc as well as GET_INSTRINFO_ENUM with GenInstrInfo.inc.
- Remove WebAssemblyFrameLowering::determineCalleeSaves and rely on default.
- Implement WebAssemblyFrameLowering::hasFP, same as AArch64's implementation.
Follow-up patches will implement a real AsmPrinter, which will require adding MI opcodes specific to WebAssembly.
Reviewers: sunfish
Subscribers: aemerson, jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D11369
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242939 91177308-0d34-0410-b5e6-96231b3b80d8
Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap.
Related to <rdar://problem/20821730>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242908 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, a load from an alloca that is used in as single block and is not preceded
by a store is replaced by undef. This is not always correct if the single block is
inside a loop.
Fix the logic so that:
1) If there are no stores in the block, replace the load with an undef, as before.
2) If there is a store (regardless of where it is in the block w.r.t the load), bail
out, and let the rest of mem2reg handle this alloca.
Patch by: gil.rapaport@intel.com
Differential Revision: http://reviews.llvm.org/D11355
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242884 91177308-0d34-0410-b5e6-96231b3b80d8
In r242510, non-instrumented allocas are now moved into the first basic block. This patch limits that to only move allocas that are present *after* the first instrumented one (i.e. only move allocas up). A testcase was updated to show behavior in these two cases. Without the patch, an alloca could be moved down, and could cause an invalid IR.
Differential Revision: http://reviews.llvm.org/D11339
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242883 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: The current code in LoopUnswtich::processCurrentLoop() mixes trivial loop unswitch and non-trivial loop unswitch together. It goes over all basic blocks in the loop and checks if a condition is trivial or non-trivial unswitch condition. However, trivial unswitch condition can only occur in the loop header basic block (where it controls whether or not the loop does something at all). This refactoring separate trivial loop unswitch and non-trivial loop unswitch. Before going over all basic blocks in the loop, it checks if the loop header contains a trivial unswitch condition. If so, unswitch it. Otherwise, go over all blocks like before but don't check trivial condition any more since they are not possible to be in the other blocks. This code has no functionality change.
Reviewers: meheff, reames, broune
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242873 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MCRegAliasIterator only works for physical registers. So, do not run it
on virtual registers.
With this issue fixed, we can resurrect the BranchFolding pass in NVPTX
backend.
Reviewers: jholewinski, bkramer
Subscribers: henryhu, meheff, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11174
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242871 91177308-0d34-0410-b5e6-96231b3b80d8
types and loads, loads or stores widened past the size of an alloca,
etc.
This started off with a bug report about big-endian behavior with
bitfields and loads and stores to a { i32, i24 } struct. An initial
attempt to fix this was sent for review in D10357, but that didn't
really get to the root of the problem.
The core issue was that canConvertValue and convertValue in SROA were
handling different bitwidth integers by doing a zext of the integer. It
wouldn't do a trunc though, only a zext! This would in turn lead SROA to
form an i24 load from an i24 alloca, zext it to i32, and then use it.
This would at least produce the wrong value for big-endian systems.
One of my many false starts here was to correct the computation for
big-endian systems by shifting. But this doesn't actually work because
the original code has a 64-bit store to the entire 8 bytes, and a 32-bit
load of the last 4 bytes, and because the alloc size is 8 bytes, we
can't lose that last (least significant if bigendian) byte! The real
problem here is that we're forming an i24 load in SROA which is actually
not sufficiently wide to load all of the necessary bits here. The source
has an i32 load, and SROA needs to form that as well.
The straightforward way to do this is to disable the zext logic in
canConvertValue and convertValue, forcing us to actually load all
32-bits. This seems like a really good change, but it in turn breaks
several other parts of SROA.
First in the chain of knock-on failures, we had places where we were
doing integer-widening promotion even though some of the integer loads
or stores extended *past the end* of the alloca's memory! There was even
a comment about preventing this, but it only prevented the case where
the type had a different bit size from its store size. So I added checks
to handle the cases where we actually have a widened load or store and
to avoid trying to special integer widening promotion in those cases.
Second, we actually rely on the ability to promote in the face of loads
past the end of an alloca! This is important so that we can (for
example) speculate loads around PHI nodes to do more promotion. The bits
loaded are garbage, but as long as they aren't used and the alignment is
suitable high (which it wasn't in the test case!) this is "fine". And we
can't stop promoting here, lots of things stop working well if we do. So
we need to add specific logic to handle the extension (and truncation)
case, but *only* where that extension or truncation are over bytes that
*are outside the alloca's allocated storage* and thus totally bogus to
load or store.
And of course, once we add back this correct handling of extension or
truncation, we need to correctly handle bigendian systems to avoid
re-introducing the exact bug that started us off on this chain of misery
in the first place, but this time even more subtle as it only happens
along speculated loads atop a PHI node.
I've ported an existing test for PHI speculation to the big-endian test
file and checked that we get that part correct, and I've added several
more interesting big-endian test cases that should help check that we're
getting this correct.
Fun times.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242869 91177308-0d34-0410-b5e6-96231b3b80d8
This optimization allows the DWARF linker to reuse definition of
types it has emitted in previous CUs rather than reemitting them
in each CU that references them. The size and link time gains are
huge. For example when linking the DWARF for a debug build of
clang, this generates a ~150M dwarf file instead of a ~700M one
(the numbers date back a bit and must not be totally accurate
these days).
As with all the other parts of the llvm-dsymutil codebase, the
goal is to keep bit-for-bit compatibility with dsymutil-classic.
The code is littered with a lot of FIXMEs that should be
addressed once we can get rid of the compatibilty goal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242847 91177308-0d34-0410-b5e6-96231b3b80d8
This commit begins serialization of the CFI index machine operands by
serializing one kind of CFI instruction - the .cfi_def_cfa_offset instruction.
Reviewers: Duncan P. N. Exon Smith
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242845 91177308-0d34-0410-b5e6-96231b3b80d8