The DAGCombine logic that recognized a/sqrt(b) and transformed it into
a multiplication by the reciprocal sqrt did not handle cases where the
sqrt and the division were separated by an fpext or fptrunc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178801 91177308-0d34-0410-b5e6-96231b3b80d8
The Thumb2SizeReduction pass avoids false CPSR dependencies, except it
still aggressively creates tMOVi8 instructions because they are so
common.
Avoid creating false CPSR dependencies even for tMOVi8 instructions when
the the CPSR flags are known to have high latency. This allows integer
computation to overlap floating point computations.
Also process blocks in a reverse post-order and propagate high-latency
flags to successors.
<rdar://problem/13468102>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178773 91177308-0d34-0410-b5e6-96231b3b80d8
This requires v9 cmov instructions using the %xcc flags instead of the
%icc flags.
Still missing:
- Select floats on %xcc flags.
- Select i64 on %fcc flags.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178737 91177308-0d34-0410-b5e6-96231b3b80d8
For this we need to use a libcall. Previously LLVM didn't implement
libcall support for frem, so I've added it in the usual
straightforward manner. A test case from the bug report is included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178639 91177308-0d34-0410-b5e6-96231b3b80d8
The same compare instruction is used for 32-bit and 64-bit compares. It
sets two different sets of flags: icc and xcc.
This patch adds a conditional branch instruction using the xcc flags for
64-bit compares.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178621 91177308-0d34-0410-b5e6-96231b3b80d8
When unsafe FP math operations are enabled, we can use the fre[s] and
frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together
with some Newton iteration, in order to quickly generate floating-point
division and sqrt results. All of these instructions are separately optional,
and so each has its own feature flag (except for the Altivec instructions,
which are covered under the existing Altivec flag). Doing this is not only
faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these
computations to be pipelined with other computations in order to hide their
overall latency.
I've also added a couple of missing fnmsub patterns which turned out to be
missing (but are necessary for good code generation of the Newton iterations).
Altivec needs a similar fix, but that will probably be more complicated because
fneg is expanded for Altivec's v4f32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178617 91177308-0d34-0410-b5e6-96231b3b80d8
This patch initializes t9 to the handler address, but only if the relocation
model is pic. This handles the case where handler to which eh.return jumps
points to the start of the function.
Patch by Sasa Stankovic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178588 91177308-0d34-0410-b5e6-96231b3b80d8
When doing a partword atomic operation, a lwarx was being paired with
a stdcx. instead of a stwcx. when compiling for a 64-bit target. The
target has nothing to do with it in this case; we always need a stwcx.
Thanks to Kai Nacke for reporting the problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178559 91177308-0d34-0410-b5e6-96231b3b80d8
This is helps on architectures where i8,i16 are not legal but we have byte, and
short loads/stores. Allowing us to merge copies like the one below on ARM.
copy(char *a, char *b, int n) {
do {
int t0 = a[0];
int t1 = a[1];
b[0] = t0;
b[1] = t1;
radar://13536387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178546 91177308-0d34-0410-b5e6-96231b3b80d8
SPARC v9 extends all ALU instructions to 64 bits, so we simply need to
add patterns to use them for both i32 and i64 values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178527 91177308-0d34-0410-b5e6-96231b3b80d8
The last resort pattern produces 6 instructions, and there are still
opportunities for materializing some immediates in fewer instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178526 91177308-0d34-0410-b5e6-96231b3b80d8
SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right
instructions are still usable as zero and sign extensions.
This adds new F3_Sr and F3_Si instruction formats that probably should
be used for the 32-bit shifts as well. They don't really encode an
simm13 field.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178525 91177308-0d34-0410-b5e6-96231b3b80d8
This is far from complete, but it is enough to make it possible to write
test cases using i64 arguments.
Missing features:
- Floating point arguments.
- Receiving arguments on the stack.
- Calls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178523 91177308-0d34-0410-b5e6-96231b3b80d8
We would also like to merge sequences that involve a variable index like in the
example below.
int index = *idx++
int i0 = c[index+0];
int i1 = c[index+1];
b[0] = i0;
b[1] = i1;
By extending the parsing of the base pointer to handle dags that contain a
base, index, and offset we can handle examples like the one above.
The dag for the code above will look something like:
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i8 load %index))))
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
The code that parses the tree ignores the intermediate sign extensions. However,
if there is a sign extension it needs to be on all indexes.
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (add (i8 load %index)
(i8 1))))
vs
(load (i64 add (i64 copyfromreg %c)
(i64 signextend (i32 add (i32 signextend (i8 load %index))
(i32 1)))))
radar://13536387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178483 91177308-0d34-0410-b5e6-96231b3b80d8
The P7 and A2 have additional floating-point conversion instructions which
allow a direct two-instruction sequence (plus load/store) to convert from all
combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores,
only some combinations were directly available).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178480 91177308-0d34-0410-b5e6-96231b3b80d8
The popcntw instruction is available whenever the popcntd instruction is
available, and performs a separate popcnt on the lower and upper 32-bits.
Ignoring the high-order count, this can be used for the 32-bit input case
(saving on the explicit zero extension otherwise required to use popcntd).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178470 91177308-0d34-0410-b5e6-96231b3b80d8
This instruction is available on modern PPC64 CPUs, and is now used
to improve the SINT_TO_FP lowering (by eliminating the need for the
separate sign extension instruction and decreasing the amount of
needed stack space).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178446 91177308-0d34-0410-b5e6-96231b3b80d8
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode. Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.
This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178438 91177308-0d34-0410-b5e6-96231b3b80d8
'@SECREL' is what is used by the Microsoft assembler, but GNU as expects '@SECREL32'.
With the patch, the MC-generated code works fine in combination with a recent GNU as (2.23.51.20120920 here).
Patch by David Nadlinger!
Differential Revision: http://llvm-reviews.chandlerc.com/D429
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178427 91177308-0d34-0410-b5e6-96231b3b80d8
specific code paths.
This allows us to write code like:
if (__nvvm_reflect("FOO"))
// Do something
else
// Do something else
and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178416 91177308-0d34-0410-b5e6-96231b3b80d8
Check that instruction selection can select multiply-add/sub DSP instructions
from a pattern that doesn't have intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178406 91177308-0d34-0410-b5e6-96231b3b80d8
derived class MipsSETargetLowering.
We shouldn't be generating madd/msub nodes if target is Mips16, since Mips16
doesn't have support for multipy-add/sub instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178404 91177308-0d34-0410-b5e6-96231b3b80d8
Like nearbyint, rint can be implemented on PPC using the frin instruction. The
complication comes from the fact that rint needs to set the FE_INEXACT flag
when the result does not equal the input value (and frin does not do that). As
a result, we use a custom inserter which, after the rounding, compares the
rounded value with the original, and if they differ, explicitly sets the XX bit
in the FPSCR register (which corresponds to FE_INEXACT).
Once LLVM has better modeling of the floating-point environment we should be
able to (often) eliminate this extra complexity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178362 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178337 91177308-0d34-0410-b5e6-96231b3b80d8
- RDRAND always clears the destination value when a random value is not
available (i.e. CF == 0). This value is truncated or zero-extended as
the false boolean value to be returned. Boolean simplification needs
to skip this 'zext' or 'trunc' node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178312 91177308-0d34-0410-b5e6-96231b3b80d8
Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were
added for bswap with load/store. This is because these combines are really only
valid in 64-bit mode, regardless of the CPU (and this was not being checked).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178286 91177308-0d34-0410-b5e6-96231b3b80d8