So that it has access to getOrCreateGlobalVariableDIE. If we ever support
decsribing using directive in C++ classes (thus requiring support in type
units), it will certainly use another mechanism anyway.
Differential Revision: http://reviews.llvm.org/D5975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220594 91177308-0d34-0410-b5e6-96231b3b80d8
(part of refactoring to allow subprogram emission in both the skeleton
and main units to enable -gmlt-like data to be included in the skeleton
for live inlined backtracing purposes)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220578 91177308-0d34-0410-b5e6-96231b3b80d8
It was only being used as a flag to identify the lack of debug info from
within endModule - use the section labels for that instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220575 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for legalization of instructions of the form:
[fp_conv] <1 x i1> %op to <1 x double>
where fp_conv is one of fpto[us]i, [us]itofp. This used to assert
because they were simply missing from the vector operand scalarizer.
A similar problem arose in r190830, with trunc instead.
Fixes PR20778.
Differential Revision: http://reviews.llvm.org/D5810
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220533 91177308-0d34-0410-b5e6-96231b3b80d8
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.
The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.
Should fix PR20376.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220529 91177308-0d34-0410-b5e6-96231b3b80d8
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.
The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.
(though, let's see what the buildbots have to say about this - *crosses
fingers*)
There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220527 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug (introduced by fixing the IR emitted from Clang where
the definition of a static member would be scoped within the class,
rather than within its lexical decl context) where the definition of a
static variable would be placed inside a class.
It also improves source fidelity by scoping static class member
definitions inside the lexical decl context in which tehy are written
(eg: namespace n { class foo { static int i; } int foo::i; } - the
definition of 'i' will be within the namespace 'n' in the DWARF output
now).
Lastly, and the original goal, this reduces debug info size slightly
(and makes debug info easier to read, etc) by placing the definitions of
non-member global variables within their namespace, rather than using a
separate namespace-scoped declaration along with a definition at global
scope.
Based on patches and discussion with Frédéric.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220497 91177308-0d34-0410-b5e6-96231b3b80d8
Variable handling will be sunk into DwarfFile so that abstract variables
and the like can be shared across multiple CUs (to handle cross-CU
inlining, for example).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220453 91177308-0d34-0410-b5e6-96231b3b80d8
Use the DwarfDebug in one function that previously took it as a
parameter, and lay the foundation for use this for other operations
coming soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220452 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we're sure the only root (non-abstract) scope is the current
function scope, there's no need for isCurrentFunctionScope, the property
can be tested directly instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220451 91177308-0d34-0410-b5e6-96231b3b80d8
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220321 91177308-0d34-0410-b5e6-96231b3b80d8
As coalescing registers is a benefit, the cost should be improved (i.e. made smaller) when coalescing is possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220302 91177308-0d34-0410-b5e6-96231b3b80d8
Every target we support has support for assembly that looks like
a = b - c
.long a
What is special about MachO is that the above combination suppresses the
production of a relocation.
With this change we avoid producing the intermediary labels when they don't
add any value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220256 91177308-0d34-0410-b5e6-96231b3b80d8
Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types:
+ MD_nontemporal = 9, // "nontemporal"
+ MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
+ MD_nonnull = 11 // "nonnull"
I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220248 91177308-0d34-0410-b5e6-96231b3b80d8
loosely based on linear scan.
On x86-64 this is good for a ~2% drop in compile time on the nightly test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220143 91177308-0d34-0410-b5e6-96231b3b80d8
TL;DR: Indexing maps with [] creates missing entries.
The long version:
When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca.
On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots.
PEI would later trigger an assert because it expects the stack protector to not be dead.
This fix ensures that we only get frame indices for static allocas, ie, those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken.
rdar://problem/18672951
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220099 91177308-0d34-0410-b5e6-96231b3b80d8
v2: use dyn_cast
fixup comments
v3: use cast
Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220044 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
montonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
It happens to mostly work for the other targets because they are extremely
conservative, but Power for example had to switch to AtomicExpand to be
able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound ! This is noted
in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the litterature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
x.store(1);
Thread 1:
y.store(1);
Thread 2:
r1 = x.load();
r2 = y.load();
Thread 3:
r3 = y.load();
r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it..
This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.
Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more agressively on these
platforms, the long comment should make it clear why he should first override
emitLeading/TrailingFence.
Test Plan: make check-all, no functional change
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219957 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.
This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.
Thanks Alexey Volkov for reporting PR21115!
Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.
Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.
Updated an affected test in X86.
Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.
Reviewers: Jiangning, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D5633
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219773 91177308-0d34-0410-b5e6-96231b3b80d8
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.
Examples:
1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
to b.<invCC>
2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
to b.<CC>
rdar://problem/18506500
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219742 91177308-0d34-0410-b5e6-96231b3b80d8
Let me tell you a tale...
Originally committed in r211723 after discovering a nasty case of weird
scoping due to inlining, this was reverted in r211724 after it fired in
ASan/compiler-rt.
(minor diversion where I accidentally committed/reverted again in
r211871/r211873)
After further testing and fixing bugs in ArgumentPromotion (r211872) and
Inlining (r212065) it was recommitted in r212085. Reverted in r212089
after the sanitizer buildbots still showed problems.
Fixed another bug in ArgumentPromotion (r212128) found by this
assertion.
Recommitted in r212205, reverted in r212226 after it crashed some more
on sanitizer buildbots.
Fix clang some more in r212761.
Recommitted in r212776, reverted in r212793. ASan failures.
Recommitted in r213391, reverted in r213432, trying to reproduce flakey
ASan build failure.
Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952
(LiveDebugVariables strips dbg_value intrinsics in functions not
described by debug info).
Recommitted in r214761, reverted in r214999, flakey failure on Windows
buildbot.
Fixed DeadArgElimination + DebugInfo bug in r219210.
Recommitted in r219215, reverted in r219512, failure on ObjC++ atomic
properties in the test-suite on Darwin.
Fixed ObjC++ atomic properties issue in Clang in r219690.
[This commit is provided 'as is' with no hope that this is the last time
I commit this change either expressed or implied]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219702 91177308-0d34-0410-b5e6-96231b3b80d8
If we figure out why they should be here, let's add some testing of some
kind so we can better demonstrate why it's needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219694 91177308-0d34-0410-b5e6-96231b3b80d8
and TargetRegisterInfo in the peephole optimizer. This
makes it easier to grab subtarget dependent variables off
of the MachineFunction rather than the TargetMachine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219669 91177308-0d34-0410-b5e6-96231b3b80d8
scheduler or via the SelectionDAG if available. Otherwise
grab the subtarget off of the MachineFunction by going up
the parent chain.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219666 91177308-0d34-0410-b5e6-96231b3b80d8