sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic-shift) operations for v2i64 types, we can handle v2i32 sign extensions
by converting to and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. Because the
system is big endian, the i32 portion must be in the high word of each i64
element.
For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.
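The lane placement can be illustrated with a small Python model. This is only a sketch, not the lowering code from the patch: it assumes a 128-bit register viewed as four 32-bit words in big-endian order, and the helper names (`sext_inreg`, `place_in_high_words`, `convert_high_words_to_f64`) are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(word):
    """Reinterpret a 32-bit pattern as a signed integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def sext_inreg(value, from_bits):
    """Shift-based sign extension inside a 32-bit lane:
    shift left to put the narrow sign bit at bit 31, then shift right arithmetically."""
    shift = 32 - from_bits
    shifted = to_signed32(value << shift)
    return shifted >> shift  # Python's >> on negative ints is arithmetic

def place_in_high_words(a, b):
    """Put two i32 values in words 0 and 2, i.e. the high word of each
    64-bit element when words are numbered in big-endian order."""
    return [a & MASK32, 0, b & MASK32, 0]

def convert_high_words_to_f64(words):
    """Assumed conversion step: reads the high word of each 64-bit element."""
    return [float(to_signed32(words[0])), float(to_signed32(words[2]))]

# v2i8 input: sign-extend within 32-bit lanes first, then convert.
lanes = [sext_inreg(0xFD, 8), sext_inreg(0x07, 8)]
result = convert_high_words_to_f64(place_in_high_words(*lanes))
```

If the values were left in the low words instead, the assumed conversion step would read the wrong half of each element, which is why the shift into the right lanes has to happen first.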
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205146 91177308-0d34-0410-b5e6-96231b3b80d8
v2i64 is a legal type under VSX; however, we don't have native vector
comparisons for it. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.
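One way to see why eq/ne survive the cast is the following Python sketch. It assumes the comparison is done on the same bits viewed as four 32-bit words, with the two word results for each 64-bit element combined afterwards; the function names are hypothetical and the actual lowering sequence may differ.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def split_words(v2i64):
    """View two 64-bit elements as four 32-bit words (high word first)."""
    words = []
    for elem in v2i64:
        elem &= MASK64
        words.append(elem >> 32)
        words.append(elem & MASK32)
    return words

def setcc_eq_v2i64(a, b):
    """eq on v2i64 built from a word-wise compare: a 64-bit element is equal
    only if both of its 32-bit halves are equal."""
    word_eq = [x == y for x, y in zip(split_words(a), split_words(b))]
    lanes = [word_eq[0] and word_eq[1], word_eq[2] and word_eq[3]]
    return [-1 if eq else 0 for eq in lanes]  # all-ones mask per true lane

def setcc_ne_v2i64(a, b):
    """ne is the bitwise complement of eq."""
    return [~mask for mask in setcc_eq_v2i64(a, b)]
```

Ordered comparisons such as lt or gt cannot be assembled this simply from independent word compares, which is consistent with expanding everything other than eq/ne.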
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205106 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.
Everything will be easier with the target in-tree though, hence this
commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
though we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205075 91177308-0d34-0410-b5e6-96231b3b80d8
Emit 32-bit register names instead of 64-bit register names if the target does
not have 64-bit general purpose registers.
<rdar://problem/14653996>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205067 91177308-0d34-0410-b5e6-96231b3b80d8
WinCOFF cannot form PC relative relocations to support absolute
MCValues. We should reenable this once WinCOFF supports emission of
IMAGE_REL_I386_REL32 relocations.
This fixes PR19272.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205058 91177308-0d34-0410-b5e6-96231b3b80d8
Not only did I invert the indices when I wrote the code, but I also did the
same thing when I wrote the regression test. Oops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205046 91177308-0d34-0410-b5e6-96231b3b80d8
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205041 91177308-0d34-0410-b5e6-96231b3b80d8
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
%VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)
This adds a small cleanup pass to remove these prior to post-RA scheduling.
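The idea behind such a cleanup can be sketched in Python as follows. This is a toy model, not the pass itself: instructions are plain tuples, and the `SUBREG_OF` map is a hypothetical stand-in for the target's sub-register information.

```python
# Hypothetical sub-register relation: scalar register -> containing VSX register.
SUBREG_OF = {"F2": "VSL2", "F3": "VSL3"}

def is_self_copy(inst):
    """A COPY is a nop when the source is the destination itself
    or a sub-register of the destination."""
    opcode, dest, src = inst
    return opcode == "COPY" and (dest == src or SUBREG_OF.get(src) == dest)

def remove_self_copies(insts):
    """Drop nop copies, keeping every other instruction in order."""
    return [inst for inst in insts if not is_self_copy(inst)]

block = [
    ("COPY", "VSL2", "F2"),   # nop: F2 lives inside VSL2
    ("COPY", "VSL3", "F2"),   # real copy: different register
    ("XSADDDP", "VSL2", "VSL3"),
]
cleaned = remove_self_copies(block)
```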
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204980 91177308-0d34-0410-b5e6-96231b3b80d8
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204971 91177308-0d34-0410-b5e6-96231b3b80d8
These operations need to be expanded during legalization so that isel does not
crash. In theory, we might be able to custom lower some of these. That,
however, would need to be follow-up work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204963 91177308-0d34-0410-b5e6-96231b3b80d8
This adds back r204781.
Original message:
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204934 91177308-0d34-0410-b5e6-96231b3b80d8
I've not yet updated PPCTTI because I'm not sure what the actual relative cost
is compared to the aligned uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204848 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions have access to the complete VSX register file. In addition,
they "swap" the order of the elements so that element 0 (the scalar part) comes
first in memory and element 1 follows at a higher address.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204838 91177308-0d34-0410-b5e6-96231b3b80d8
In some cases it is possible for CGP to attempt to reuse a base address from
another basic block. In those cases we have to be sure that all the address
math was either done at the same bit width, or that none of it overflowed
before it was extended.
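The hazard can be shown with plain integer arithmetic. The Python sketch below is an illustration only, not CGP code: it assumes a 32-bit add followed by a zero extension on one side and a 64-bit add of already-extended operands on the other, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def add32_then_zext(base, offset):
    """Address math done at 32 bits, then zero-extended to 64 bits."""
    return (base + offset) & MASK32

def zext_then_add64(base, offset):
    """Operands zero-extended first, address math done at 64 bits."""
    return ((base & MASK32) + (offset & MASK32)) & MASK64

# No overflow: both widths agree, so reusing the base address is safe.
same = add32_then_zext(0x1000, 0x20) == zext_then_add64(0x1000, 0x20)

# 32-bit overflow: the narrow add wraps, the wide add does not.
narrow = add32_then_zext(0xFFFFFFF0, 0x20)
wide = zext_then_add64(0xFFFFFFF0, 0x20)
```

When the two results differ, an address computed in one block cannot stand in for the address expected in the other.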
Patch by Louis Gerbarg <lgg@apple.com>
rdar://16307442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204833 91177308-0d34-0410-b5e6-96231b3b80d8
> For functions where esi is used as base pointer, we would previously fall back
> from lowering memcpy with "rep movs" because that clobbers esi.
>
> With this patch, we just store esi in another physical register, and restore
> it afterwards. This adds a little bit of register pressure, but the more
> efficient memcpy should be worth it.
>
> Differential Revision: http://llvm-reviews.chandlerc.com/D2968
This didn't work. I was ending up with code like this:
lea edi,[esi+38h]
mov ecx,0Fh
mov edx,esi
mov esi,ebx
rep movs dword ptr es:[edi],dword ptr [esi]
lea ecx,[esi+74h] <-- Ooops, we're now using esi before restoring it from edx.
add ebx,3Ch
mov esi,edx
I guess if we want to do this we need stronger glue or something, or we need to
do the expansion much later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204829 91177308-0d34-0410-b5e6-96231b3b80d8
v2i64 needs to be a legal VSX type because it is the SetCC result type from
v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.
This fixes the lowering for v2f64 VSELECT.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204828 91177308-0d34-0410-b5e6-96231b3b80d8
We've already got versions without the barriers, so this just adds IR-level
support for generating the new v8 ones.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204813 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing the LLVM part of the call to __builtin___clear_cache
which translates into an intrinsic @llvm.clear_cache and is lowered
by each target, either to a call to __clear_cache or nothing at all
in case the caches are unified.
Updating LangRef and adding some tests for the implemented architectures.
Other archs will have to implement the method if this builtin
has to be compiled for them, since the default behaviour is to bail
out as unimplemented.
A Clang patch is required for the builtin to be lowered into the
llvm intrinsic. This will be done next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204802 91177308-0d34-0410-b5e6-96231b3b80d8
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204801 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r204781.
I will follow up with the msan folks to see what they
were trying to do with aliases to weak aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204784 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions are essentially the same as their Altivec counterparts, but
have access to the larger VSX register file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204782 91177308-0d34-0410-b5e6-96231b3b80d8
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias2 are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
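The mismatch between the two views can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: a module is just a dict from alias name to aliasee, linking a strong definition is a dict update, and the helper names are hypothetical.

```python
def resolve_ir(aliases, name):
    """IR view: follow the alias chain until a non-alias symbol is reached."""
    while name in aliases:
        name = aliases[name]
    return name

def flatten_to_object(aliases):
    """Object-file view: every alias is just another name for a fixed position."""
    return {alias: resolve_ir(aliases, alias) for alias in aliases}

module = {"my_alias": "my_func", "my_alias2": "my_alias"}

# IR linking: the strong my_alias replaces the weak one, my_alias2 follows it.
linked_ir = dict(module, my_alias="other_func")
ir_target = resolve_ir(linked_ir, "my_alias2")

# Object linking: my_alias2 was already pinned to my_func's position.
linked_obj = dict(flatten_to_object(module), my_alias="other_func")
obj_target = linked_obj["my_alias2"]
```

Here `ir_target` and `obj_target` disagree, which is the miscompile that rejecting aliases to weak aliases turns into an error.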
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204781 91177308-0d34-0410-b5e6-96231b3b80d8
Adds the different broadcast instructions to the ReplaceableInstrsAVX2 table.
That way the ExeDepsFix pass can make better decisions when AVX2 broadcasts
cross domains (int <-> float).
In particular, prior to this patch we were generating:
vpbroadcastd LCPI1_0(%rip), %ymm2
vpand %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0 ## <- domain change penalty
Now, we generate the following nice sequence where everything is in the float
domain:
vbroadcastss LCPI1_0(%rip), %ymm2
vandps %ymm2, %ymm0, %ymm0
vmaxps %ymm1, %ymm0, %ymm0
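The role of the table can be sketched in Python. This is a simplified model, not the ExeDepsFix pass: the table holds only the two opcode pairs visible in the example above, the target domain is passed in rather than derived from the dependency chain, and the function names are hypothetical.

```python
# (float-domain opcode, integer-domain opcode); a tiny illustrative subset.
REPLACEABLE = [
    ("vbroadcastss", "vpbroadcastd"),
    ("vandps", "vpand"),
]

def to_domain(opcode, domain):
    """Return the equivalent opcode in the requested domain, if one is listed."""
    for fp_op, int_op in REPLACEABLE:
        if opcode in (fp_op, int_op):
            return fp_op if domain == "float" else int_op
    return opcode  # no equivalent: the instruction's domain is fixed

def fix_domains(opcodes, domain):
    """Rewrite every replaceable instruction in a chain into one domain."""
    return [to_domain(op, domain) for op in opcodes]

before = ["vpbroadcastd", "vpand", "vmaxps"]
after = fix_domains(before, "float")
```

Without an entry for the broadcast, the first instruction would stay in the integer domain and the chain would still pay a crossing penalty.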
<rdar://problem/16354675>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204770 91177308-0d34-0410-b5e6-96231b3b80d8
The VSX instruction set has two types of FMA instructions: A-type (where the
addend is taken from the output register) and M-type (where one of the product
operands is taken from the output register). This adds a small pass that runs
just after MI scheduling (and, thus, just before register allocation) that
mutates A-type instructions (that are created during isel) into M-type
instructions when:
1. This will eliminate an otherwise-necessary copy of the addend
2. One of the product operands is killed by the instruction
The "right" moment to make this decision is in between scheduling and register
allocation, because only there do we know whether or not one of the product
operands is killed by any particular instruction. Unfortunately, this also
makes the implementation somewhat complicated, because the MIs are not in SSA
form and we need to preserve the LiveIntervals analysis.
As a simple example, if we have:
%vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
...
%vreg9<def,tied1> = XSMADDADP %vreg9<tied0>, %vreg17, %vreg19,
%RM<imp-use>; VSLRC:%vreg9,%vreg17,%vreg19
...
We can eliminate the copy by changing from the A-type to the
M-type instruction. This means:
%vreg5<def,tied1> = XSMADDADP %vreg5<tied0>, %vreg17, %vreg16,
%RM<imp-use>; VSLRC:%vreg5,%vreg17,%vreg16
is replaced by:
%vreg16<def,tied1> = XSMADDMDP %vreg16<tied0>, %vreg17, %vreg9,
%RM<imp-use>; VSLRC:%vreg16,%vreg17,%vreg9
and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
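The equivalence behind this rewrite can be checked with a small register-file model in Python. It is a sketch only: registers are dict entries with hypothetical names, rounding-mode effects are ignored, and the two helpers encode the A-type and M-type operand roles described above.

```python
def xsmaddadp(regs, xt, xa, xb):
    """A-type: the addend comes from the output register."""
    regs[xt] = regs[xa] * regs[xb] + regs[xt]

def xsmaddmdp(regs, xt, xa, xb):
    """M-type: one product operand comes from the output register."""
    regs[xt] = regs[xa] * regs[xt] + regs[xb]

initial = {"addend": 2.0, "x": 5.0, "y": 3.0}

# Before: copy the addend, then use the A-type form on the copy.
a_regs = dict(initial)
a_regs["tmp"] = a_regs["addend"]           # the COPY to be eliminated
xsmaddadp(a_regs, "tmp", "x", "y")         # tmp = x * y + tmp

# After: M-type form overwrites the product operand y, no copy needed.
m_regs = dict(initial)
xsmaddmdp(m_regs, "y", "x", "addend")      # y = x * y + addend
```

The M-type form destroys the old value of `y`, which is why the rewrite is only valid when that product operand is killed by the instruction.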
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204768 91177308-0d34-0410-b5e6-96231b3b80d8
This used to resort to splitting the 256-bit operation into two 128-bit
shuffles and then recombining the results.
Fixes <rdar://problem/16167303>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204735 91177308-0d34-0410-b5e6-96231b3b80d8
This is supposed to have the same store size and alignment as <4 x i32>,
but currently is split into a 64-bit and 32-bit store.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204729 91177308-0d34-0410-b5e6-96231b3b80d8
This is a pretty straightforward translation for COFF: we just need to
stick the data in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.
N.B. We must be careful to avoid sticking entities with private linkage
in COMDAT groups. COFF is pretty hostile to the renaming of entities so
we must be careful to disallow GlobalVariables with unstable names.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204703 91177308-0d34-0410-b5e6-96231b3b80d8
When the register allocator's stage is RS_Spill, we choose spilling over using
the CSR for the first time if the spill cost is lower than CSRCost.
When the register allocator's stage is < RS_Split, we choose pre-splitting over
using the CSR for the first time if the cost of splitting is lower than
CSRCost.
CSRCost is set with the command-line option "regalloc-csr-first-time-cost". The
default value is 0, which generates the same code as before this commit.
With a value of 15 (1 << 14 is the entry frequency), I measured a performance
gain of 3% on 253.perlbmk and 1.7% on 197.parser, with instrumented PGO,
on an ARM device.
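The two rules can be written down as a small decision function. The Python below is a sketch under assumptions: the stage ordering in `Stage` is assumed rather than taken from this commit, costs are plain non-negative numbers, and `first_csr_use_action` is a hypothetical name.

```python
from enum import IntEnum

class Stage(IntEnum):
    # Assumed ordering of live-range stages, earliest first.
    RS_New = 0
    RS_Assign = 1
    RS_Split = 2
    RS_Split2 = 3
    RS_Spill = 4
    RS_Done = 5

def first_csr_use_action(stage, csr_cost, spill_cost=None, split_cost=None):
    """Decide what to do instead of touching a callee-saved register
    for the first time."""
    if stage == Stage.RS_Spill and spill_cost is not None and spill_cost < csr_cost:
        return "spill"
    if stage < Stage.RS_Split and split_cost is not None and split_cost < csr_cost:
        return "split"
    return "use_csr"
```

With `csr_cost` left at 0, no non-negative cost is lower, so the function always returns `"use_csr"`, matching the statement that the default keeps the old behaviour.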
rdar://16162005
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204690 91177308-0d34-0410-b5e6-96231b3b80d8
Try to match the scalar and first, like the other instructions.
Expand 64-bit ands to a pair of 32-bit ands since a 64-bit and is not
available on the VALU.
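The expansion relies on bitwise and acting independently on each bit, so the two halves can be processed separately. A minimal Python sketch, with a hypothetical function name and no claim about how the backend orders the halves:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def and64_as_two_and32(a, b):
    """Compute a 64-bit and using two 32-bit ands on the low and high halves."""
    lo = (a & MASK32) & (b & MASK32)
    hi = ((a >> 32) & MASK32) & ((b >> 32) & MASK32)
    return (hi << 32) | lo
```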
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204660 91177308-0d34-0410-b5e6-96231b3b80d8
Those patterns are used when the load cannot be folded into the related broadcast
during the select phase.
This happens when the load gets additional uses that were not anticipated during
the previous lowering phases (constant vector to constant load, then constant
load reused) or when the selection DAG is not able to prove that folding the
load will not create a cycle in the DAG.
<rdar://problem/16074331>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204631 91177308-0d34-0410-b5e6-96231b3b80d8