Add an additional based relocation to the enumeration of based relocation names.
The lack of the enumerator value causes issues when inspecting WoA binaries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226314 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This change ended up being slightly more controversial than expected. Chandler has tentatively okayed this for the moment, but I may be revisiting this in the near future after we settle some high level questions.
Rather than have the GCStrategy object owned by the GCModuleInfo - which is an immutable analysis pass used mainly by gc.root - have it be owned by the LLVMContext. This simplifies the ownership logic (i.e. can you have two instances of the same strategy at once?), but more importantly, allows us to access the GCStrategy in the middle end optimizer. To this end, I add an accessor through Function which becomes the canonical way to get at a GCStrategy instance.
In the near future, this will allows me to move some of the checks from http://reviews.llvm.org/D6808 into the Verifier itself, and to introduce optimization legality predicates for some of the recent additions to InstCombine. (These will follow as separate changes.)
Differential Revision: http://reviews.llvm.org/D6811
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226311 91177308-0d34-0410-b5e6-96231b3b80d8
Searching all of the existing gc.root implementations I'm aware of (all three of them), there was exactly one use of this mechanism, and that was to implement a performance improvement that should have been applied to the default lowering.
Having this function is requiring a dependency on a CodeGen class (MachineFunction), in a class which is otherwise completely independent of CodeGen. I could solve this differently, but given that I see absolutely no value in preserving this mechanism, I going to just get rid of it.
Note: Tis is the first time I'm intentionally breaking previously supported gc.root functionality. Given 3.6 has branched, I believe this is a good time to do this.
Differential Revision: http://reviews.llvm.org/D7004
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226305 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to the unaligned cases.
Test was generated with update_llc_test_checks.py.
Part of <rdar://problem/17688758>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226296 91177308-0d34-0410-b5e6-96231b3b80d8
This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.
The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).
However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.
In the example below:
//
__m128 test(__m128 a, __m128 *b) {
__m128 c = _mm_insert_ps(a, *b, 1 << 6);
return c;
}
//
Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:
movss 4(%rdi), %xmm1
insertps $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].
With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:
movaps (%rdi), %xmm1
insertps $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3].
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226277 91177308-0d34-0410-b5e6-96231b3b80d8
The current 'big vectors' stack folded reload testing pattern is very bulky and makes it difficult to test all instructions as big vectors will tend to use only the ymm instruction implementations.
This patch changes the tests to use a nop call that lists explicit xmm registers as sideeffects, with this we can force a partial register spill of the relevant registers and then check that the reload is correctly folded. The asm generated only adds the forced spill, a nop instruction and a couple of extra labels (a fraction of the current approach).
More exhaustive tests will follow shortly, I've added some extra tests (the xmm versions of some of the existing folding tests) as a starting point.
Differential Revision: http://reviews.llvm.org/D6932
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226264 91177308-0d34-0410-b5e6-96231b3b80d8
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226247 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r226173, adding r226038 back.
No change in this commit, but clang was changed to also produce trivial comdats for
costructors, destructors and vtables when needed.
Original message:
Don't create new comdats in CodeGen.
This patch stops the implicit creation of comdats during codegen.
Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226242 91177308-0d34-0410-b5e6-96231b3b80d8
removing the macho-archive-headers.test added with r226228 that it is
failing on for now while I try to figure out what is going on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226241 91177308-0d34-0410-b5e6-96231b3b80d8
IRCE eliminates range checks of the form
0 <= A * I + B < Length
by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment. As an
example, IRCE will convert
len = < known positive >
for (i = 0; i < n; i++) {
if (0 <= i && i < len) {
do_something();
} else {
throw_out_of_bounds();
}
}
to
len = < known positive >
limit = smin(n, len)
// no first segment
for (i = 0; i < limit; i++) {
if (0 <= i && i < len) { // this check is fully redundant
do_something();
} else {
throw_out_of_bounds();
}
}
for (i = limit; i < n; i++) {
if (0 <= i && i < len) {
do_something();
} else {
throw_out_of_bounds();
}
}
IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).
Currently IRCE does not do any profitability analysis. That is a
TODO.
Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline. Having said that, I will love
to get feedback and general input from people interested in trying
this out.
This pass was originally r226201. It was reverted because it used C++
features not supported by MSVC 2012.
Differential Revision: http://reviews.llvm.org/D6693
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226238 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions with 1 operand can still use source modifiers,
so make sure we don't print an extra comma afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226226 91177308-0d34-0410-b5e6-96231b3b80d8
be exported from a dylib if their containing object file were linked into one.
No test case: No command line tools query this flag, and there are no Object
unit tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226217 91177308-0d34-0410-b5e6-96231b3b80d8
The change used C++11 features not supported by MSVC 2012. I will fix
the change to use things supported MSVC 2012 and recommit shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226216 91177308-0d34-0410-b5e6-96231b3b80d8
Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.
Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).
Consider this simple test:
$ cat call.c
typedef void (*fp)();
void bar(fp x) {
for (int i = 0; i < 1600000000; ++i)
x();
}
$ cat main.c
typedef void (*fp)();
void bar(fp x);
void foo() {}
int main() {
bar(foo);
}
On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.
The difference on the POWER7 is smaller. Compiling with:
gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]
clang -O3 -mcpu=native call.c main.c : ~5.3 seconds
clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
(looks like we'd benefit from additional loop unrolling here, as a first
guess, because this is faster with the extra loads)
The -mno-invariant-function-descriptors will be added to Clang shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226207 91177308-0d34-0410-b5e6-96231b3b80d8
IRCE eliminates range checks of the form
0 <= A * I + B < Length
by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment. As an
example, IRCE will convert
len = < known positive >
for (i = 0; i < n; i++) {
if (0 <= i && i < len) {
do_something();
} else {
throw_out_of_bounds();
}
}
to
len = < known positive >
limit = smin(n, len)
// no first segment
for (i = 0; i < limit; i++) {
if (0 <= i && i < len) { // this check is fully redundant
do_something();
} else {
throw_out_of_bounds();
}
}
for (i = limit; i < n; i++) {
if (0 <= i && i < len) {
do_something();
} else {
throw_out_of_bounds();
}
}
IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).
Currently IRCE does not do any profitability analysis. That is a
TODO.
Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline. Having said that, I will love
to get feedback and general input from people interested in trying
this out.
Differential Revision: http://reviews.llvm.org/D6693
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226201 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r226071 with fixes. Two fixes:
1. We need to manually remove the old and create the new 'deaf defs'
associated with physical register definitions when we move the definition of
the physical register from the copy point to the point of the original vreg def.
This problem was picked up by the machinstr verifier, and could trigger a
verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
turned on the verifier in the tests.
2. When moving the def point of the phys reg up, we need to make sure that it
is neither defined nor read in between the two instructions. We don't, however,
extend the live ranges of phys reg defs to cover uses, so just checking for
live-range overlap between the pair interval and the phys reg aliases won't
pick up reads. As a result, we manually iterate over the range and check for
reads.
A test soon to be committed to the PowerPC backend will test this change.
Original commit message:
[RegisterCoalescer] Remove copies to reserved registers
This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:
<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>
(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )
Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).
An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226200 91177308-0d34-0410-b5e6-96231b3b80d8
Use static functions for helpers rather than static member functions. a) this changes the linking (minor at best), and b) this makes it obvious no object state is involved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226198 91177308-0d34-0410-b5e6-96231b3b80d8