This already helps SSE2 x86 a lot because it lacks an efficient way to
represent a vector select. The long term goal is to enable the backend to match
a canonicalized pattern into a single instruction (e.g. vabs or pabs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180597 91177308-0d34-0410-b5e6-96231b3b80d8
Mips have delayslots for certain instructions
like jumps and branches. These are instructions
that follow the branch or jump and are executed
before the jump or branch is completed.
Early Mips compilers could not cope with delayslots
and left them up to the assembler. The assembler would
fill the delayslots with the appropriate instruction,
usually just a nop to allow correct runtime behavior.
The default behavior for this is set with .set reorder.
To tell the assembler that you don't want it to mess with
the delayslot one used .set noreorder.
For backwards compatibility we need to support
.set reorder and have it be the default behavior in the
assembler.
Our support for it is to insert a NOP directly after an
instruction with a delayslot when in .set reorder mode.
Contributer: Vladimir Medic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180584 91177308-0d34-0410-b5e6-96231b3b80d8
Pattern has source location by itself. After adding a trivial method to
retrieve it, it's unnecessary to pair a source location for CHECK-NOT patterns.
One thing revised after this is the diagnostic info is more accurate by
pointing to the start of the CHECK-NOT pattern instead of the end of the
CHECK-NOT pattern. E.g. diagnostic message previously looks like
<stdin>:1:1: error: CHECK-NOT: string occurred!
test
^
test.txt:1:16: note: CHECK-NOT: pattern specified here
CHECK-NOT: test
^
is changed to
<stdin>:1:1: error: CHECK-NOT: string occurred!
test
^
test.txt:1:12: note: CHECK-NOT: pattern specified here
CHECK-NOT: test
^
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180578 91177308-0d34-0410-b5e6-96231b3b80d8
latency for certain models of the Intel Atom family, by converting
instructions into their equivalent LEA instructions, when it is both
useful and possible to do so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180573 91177308-0d34-0410-b5e6-96231b3b80d8
getRelocationAddress is for dynamic libraries and executables,
getRelocationOffset for relocatable objects.
Mark the getRelocationAddress of COFF and MachO as not implemented yet. Add a
test of ELF's. llvm-readobj -r now prints the same values as readelf -r.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180259 91177308-0d34-0410-b5e6-96231b3b80d8
While here, don't report a dummy symbol for relocations that don't have symbols.
We used to says such relocations were for the first defined symbol, but now we
return end_symbols(). The llvm-readobj output change agrees with otool.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180214 91177308-0d34-0410-b5e6-96231b3b80d8
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on
x86_64. Because the load/store vectorization is performed by the bit casting to
a packed vector, which has incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent result for memory
instructions of those types.
This patch checks an equality of the AllocSize of a scalar type and allocated
size for each vector element, to ensure that there is no padding bytes and the
array can be read/written using vector operations.
Patch by Daisuke Takahashi!
Fixes PR15758.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180196 91177308-0d34-0410-b5e6-96231b3b80d8
For now, we just reschedule instructions that use the copied vregs and
let regalloc elliminate it. I would really like to eliminate the
copies on-the-fly during scheduling, but we need a complete
implementation of repairIntervalsInRange() first.
The general strategy is for the register coalescer to eliminate as
many global copies as possible and shrink live ranges to be
extended-basic-block local. The coalescer should not have to worry
about resolving local copies (e.g. it shouldn't attemp to reorder
instructions). The scheduler is a much better place to deal with local
interference. The coalescer side of this equation needs work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180193 91177308-0d34-0410-b5e6-96231b3b80d8
debug location. This solves a problem where range of an inlined
subroutine is emitted wrongly.
Patch by Manman Ren.
Fixes rdar://problem/12415623
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180140 91177308-0d34-0410-b5e6-96231b3b80d8
This makes llvm-dwarfdump and llvm-symbolizer understand
debug info sections compressed by ld.gold linker.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180088 91177308-0d34-0410-b5e6-96231b3b80d8
even if erroneously annotated with the parallel loop metadata.
Fixes Bug 15794:
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180081 91177308-0d34-0410-b5e6-96231b3b80d8
Also add a check for llvm.used in the verifier and simplify clients now that
they can assume they have a ConstantArray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180019 91177308-0d34-0410-b5e6-96231b3b80d8
-- C.4 and C.5 statements, when NSAA is not equal to SP.
-- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a
variadic procedure.
Before this patch "NSAA != 0" means "don't use GPRs anymore ". But there are
some exceptions in AAPCS.
1. For non VA function: allocate all VFP regs for CPRC. When all VFPs are allocated
CPRCs would be sent to stack, while non CPRCs may be still allocated in GRPs.
2. Check that for VA functions all params uses GPRs and then stack.
No exceptions, no CPRCs here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180011 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll
I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179996 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.
Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again. For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
%inlo = v4i32 extract_subvector %in, 0
%inhi = v4i32 extract_subvector %in, 4
%lo16 = v4i16 trunc v4i32 %inlo
%hi16 = v4i16 trunc v4i32 %inhi
%in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
%res = v8i8 trunc v8i16 %in16
This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.
Update the ARMTargetTransformInfo to take this improved legalization
into account.
Consider the simplified IR:
define <16 x i8> @test1(<16 x i32>* %ap) {
%a = load <16 x i32>* %ap
%tmp = trunc <16 x i32> %a to <16 x i8>
ret <16 x i8> %tmp
}
define <8 x i8> @test2(<8 x i32>* %ap) {
%a = load <8 x i32>* %ap
%tmp = trunc <8 x i32> %a to <8 x i8>
ret <8 x i8> %tmp
}
Previously, we would generate the truly hideous:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #20
bic sp, sp, #7
add r1, r0, #48
add r2, r0, #32
vld1.64 {d24, d25}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
vld1.64 {d18, d19}, [r2:128]
add r1, r0, #16
vmovn.i32 d22, q8
vld1.64 {d16, d17}, [r1:128]
vmovn.i32 d20, q9
vmovn.i32 d18, q12
vmov.u16 r0, d22[3]
strb r0, [sp, #15]
vmov.u16 r0, d22[2]
strb r0, [sp, #14]
vmov.u16 r0, d22[1]
strb r0, [sp, #13]
vmov.u16 r0, d22[0]
vmovn.i32 d16, q8
strb r0, [sp, #12]
vmov.u16 r0, d20[3]
strb r0, [sp, #11]
vmov.u16 r0, d20[2]
strb r0, [sp, #10]
vmov.u16 r0, d20[1]
strb r0, [sp, #9]
vmov.u16 r0, d20[0]
strb r0, [sp, #8]
vmov.u16 r0, d18[3]
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
vldmia sp, {d16, d17}
vmov r0, r1, d16
vmov r2, r3, d17
mov sp, r7
pop {r7}
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
push {r7}
mov r7, sp
sub sp, sp, #12
bic sp, sp, #7
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d20, d21}, [r0:128]
vmovn.i32 d18, q8
vmov.u16 r0, d18[3]
vmovn.i32 d16, q10
strb r0, [sp, #3]
vmov.u16 r0, d18[2]
strb r0, [sp, #2]
vmov.u16 r0, d18[1]
strb r0, [sp, #1]
vmov.u16 r0, d18[0]
strb r0, [sp]
vmov.u16 r0, d16[3]
strb r0, [sp, #7]
vmov.u16 r0, d16[2]
strb r0, [sp, #6]
vmov.u16 r0, d16[1]
strb r0, [sp, #5]
vmov.u16 r0, d16[0]
strb r0, [sp, #4]
ldm sp, {r0, r1}
mov sp, r7
pop {r7}
bx lr
Now, however, we generate the much more straightforward:
.syntax unified
.section __TEXT,__text,regular,pure_instructions
.globl _test1
.align 2
_test1: @ @test1
@ BB#0:
add r1, r0, #48
add r2, r0, #32
vld1.64 {d20, d21}, [r0:128]
vld1.64 {d16, d17}, [r1:128]
add r1, r0, #16
vld1.64 {d18, d19}, [r2:128]
vld1.64 {d22, d23}, [r1:128]
vmovn.i32 d17, q8
vmovn.i32 d16, q9
vmovn.i32 d18, q10
vmovn.i32 d19, q11
vmovn.i16 d17, q8
vmovn.i16 d16, q9
vmov r0, r1, d16
vmov r2, r3, d17
bx lr
.globl _test2
.align 2
_test2: @ @test2
@ BB#0:
vld1.64 {d16, d17}, [r0:128]
add r0, r0, #16
vld1.64 {d18, d19}, [r0:128]
vmovn.i32 d16, q8
vmovn.i32 d17, q9
vmovn.i16 d16, q8
vmov r0, r1, d16
bx lr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8