On some cores, mixing VFP and NEON instructions is bad for performance,
and since these patterns are NEON anyway, the NEON load should be used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155630 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly to test the waters. I'd like to get results from LNT
build bots and other bots running on non-x86 platforms.
This feature has been pretty heavily tested over the last few months by
me, and it fixes several of the execution time regressions caused by the
inlining work by preventing inlining decisions from radically impacting
block layout.
I've seen very large improvements in yacr2 and ackermann benchmarks,
along with the expected noise across all of the benchmark suite whenever
code layout changes. I've analyzed all of the regressions and fixed
them, or found them to be impossible to fix. See my email to llvmdev for
more details.
I'd like for this to be in 3.1 as it complements the inliner changes,
but if any failures are showing up or anyone has concerns, it is just
a flag flip and so can be easily turned off.
I'm switching it on tonight to try and get at least one run through
various folks' performance suites in case SPEC or something else has
serious issues with it. I'll watch bots and revert if anything shows up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154816 91177308-0d34-0410-b5e6-96231b3b80d8
Don't elide the branch instruction if it's the only one in the block;
otherwise it's OK.
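A minimal standalone sketch of the new guard (hypothetical types, not
the actual MachineBasicBlock API):

#include <stdbool.h>

typedef struct {
    int num_instrs;        /* instructions in the block (assumed field) */
    bool ends_in_branch;   /* last instruction is a branch              */
} Block;

/* Only elide the branch when the block contains something besides it. */
static bool canElideBranch(const Block *bb) {
    return bb->ends_in_branch && bb->num_instrs > 1;
}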
PR9796 and rdar://11215207
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154417 91177308-0d34-0410-b5e6-96231b3b80d8
legalizer always uses the DAG entry node. This is wrong when the libcall is
emitted as a tail call, since it effectively folds the return node. If
the return node's input chain is not the entry node (i.e., a call, load, or
store), use that as the tail call's input chain.
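A rough sketch of the chain selection described above (hypothetical node
structure for illustration; this is not LLVM's SelectionDAG API):

typedef struct Node Node;
struct Node {
    Node *chain;   /* incoming chain operand           */
    int is_entry;  /* nonzero for the DAG's entry node */
};

/* Prefer the return node's incoming chain (a call, load, or store)
   over the entry node when wiring up the tail call's input chain. */
static Node *tailCallInputChain(Node *ret, Node *entry) {
    return (ret->chain && !ret->chain->is_entry) ? ret->chain : entry;
}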
PR12419
rdar://9770785
rdar://11195178
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154370 91177308-0d34-0410-b5e6-96231b3b80d8
in-register, such that we can use a single vector store rather than a
series of scalar stores.
For func_4_8, the generated code
vldr d16, LCPI0_0
vmov d17, r0, r1
vadd.i16 d16, d17, d16
vmov.u16 r0, d16[3]
strb r0, [r2, #3]
vmov.u16 r0, d16[2]
strb r0, [r2, #2]
vmov.u16 r0, d16[1]
strb r0, [r2, #1]
vmov.u16 r0, d16[0]
strb r0, [r2]
bx lr
becomes
vldr d16, LCPI0_0
vmov d17, r0, r1
vadd.i16 d16, d17, d16
vuzp.8 d16, d17
vst1.32 {d16[0]}, [r2, :32]
bx lr
I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to apply this combine more judiciously.
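For reference, a plausible C-level shape for func_4_8, reconstructed from
the assembly above using GCC/Clang vector extensions (the constant vector
values and parameter names are assumptions):

typedef short v4i16 __attribute__((vector_size(8)));

void func_4_8(v4i16 a, char *p) {
    v4i16 b = a + (v4i16){1, 2, 3, 4};  /* constant from LCPI0_0 (values assumed) */
    p[0] = (char)b[0];  /* previously four vmov.u16 + strb pairs;   */
    p[1] = (char)b[1];  /* now one vuzp.8 plus a single vst1.32     */
    p[2] = (char)b[2];
    p[3] = (char)b[3];
}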
This
ldrh r0, [r0, #4]
strh r0, [r1]
becomes
vldr d16, [r0]
vmov.u16 r0, d16[2]
vmov.32 d16[0], r0
vuzp.16 d16, d17
vst1.32 {d16[0]}, [r1, :32]
PR11158
rdar://10703339
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154340 91177308-0d34-0410-b5e6-96231b3b80d8
reciprocal if converting to the reciprocal is exact. Do it even if inexact
when -ffast-math is in effect. This substantially speeds up ac.f90 from the
Polyhedron benchmarks.
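For example, a divide whose reciprocal is exactly representable versus one
that isn't:

double half(double x)  { return x / 2.0; }  /* 1/2.0 == 0.5 exactly: always becomes x * 0.5      */
double third(double x) { return x / 3.0; }  /* 1/3.0 is inexact: multiply only under -ffast-math */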
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154265 91177308-0d34-0410-b5e6-96231b3b80d8
ARM and Thumb2 modes can use cmn instructions to compare against negative
immediates; Thumb1 mode can't.
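For example, a comparison against a negative immediate at the C level:

/* On ARM/Thumb2 this can compile to "cmn r0, #5", which computes the
   flags of r0 + 5, i.e. compares r0 against -5; Thumb1 must instead
   materialize -5 in a register.                                      */
int is_minus_five(int x) { return x == -5; }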
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154183 91177308-0d34-0410-b5e6-96231b3b80d8
LSR can fold three addressing modes into its ICmpZero node:
ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset
ICmpZero -1*ScaleReg + Offset => ICmp ScaleReg, Offset
ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg
The first two cases are only used if TLI->isLegalICmpImmediate() likes
the offset.
Make sure the right Offset sign is passed to this method in the second
case. The ARM version is not symmetric.
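A standalone sketch of the sign issue (hypothetical helper names; the
immediate ranges are purely illustrative, not ARM's real ones):

#include <stdbool.h>

/* Stand-in for TLI->isLegalICmpImmediate(): deliberately asymmetric in
   the sign of the immediate, like the ARM version (ranges assumed).  */
static bool isLegalICmpImmediate(long imm) {
    return imm >= 0 ? imm <= 4095 : -imm <= 255;
}

/* Fold 1: ICmpZero BaseReg + Offset     => query -Offset
   Fold 2: ICmpZero -1*ScaleReg + Offset => query  Offset
   The fix ensures the second case queries the unnegated offset; since
   the check is asymmetric, getting the sign wrong flips the answer.  */
static bool canFoldICmpZero(bool scaled, long offset) {
    return isLegalICmpImmediate(scaled ? offset : -offset);
}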
<rdar://problem/11184260>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154079 91177308-0d34-0410-b5e6-96231b3b80d8
A MOVCCr instruction can be commuted by inverting the condition. This
can help reduce register pressure and remove unnecessary copies in some
cases.
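At the source level, the identity behind the commute is simply the
following (illustrative C):

/* A conditional move commutes by inverting its condition. */
int select(int c, int a, int b) {
    return c ? a : b;   /* equivalent to: (!c) ? b : a */
}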
<rdar://problem/11182914>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154033 91177308-0d34-0410-b5e6-96231b3b80d8
This is just the fallback tie-breaker ordering; the main allocation
order is still descending size.
Patch by Shamil Kurmangaleev!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153904 91177308-0d34-0410-b5e6-96231b3b80d8
produces a 32-bit immediate which is consumed by the use. It tries to
fold the immediate by breaking it into two parts and folding them into the
immediate fields of two uses, e.g.
movw r2, #40885
movt r3, #46540
add r0, r0, r3
=>
add.w r0, r0, #3019898880
add.w r0, r0, #30146560
However, this transformation is incorrect if the user produces a flag, e.g.
movw r2, #40885
movt r3, #46540
adds r0, r0, r3
=>
add.w r0, r0, #3019898880
adds.w r0, r0, #30146560
Note that the adds.w may not set the carry flag even when the original
sequence would.
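A small C demonstration of why the carry can differ; the input value is
chosen (assumed) to expose the mismatch, and the immediates are the ones
from the example above:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t r0 = 0x4C000000u;                    /* assumed input     */
    uint32_t k  = 3019898880u + 30146560u;        /* the full constant */

    uint64_t once  = (uint64_t)r0 + k;            /* single adds       */
    uint64_t step1 = (uint64_t)r0 + 3019898880u;  /* first add.w       */
    uint64_t step2 = (uint32_t)step1 + (uint64_t)30146560u; /* adds.w  */

    printf("carry of single adds:  %d\n", (int)(once  >> 32)); /* 1 */
    printf("carry of final adds.w: %d\n", (int)(step2 >> 32)); /* 0 */
    return 0;
}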
rdar://11116189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153484 91177308-0d34-0410-b5e6-96231b3b80d8
* Removed test/lib/llvm.exp - it is no longer needed
* Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files
  left in the test suite, so this code is no longer required. test/lit.cfg is
  now much shorter and clearer.
* Removed a lot of duplicate code in lit.local.cfg files that need access to
the root configuration, by adding a "root" attribute to the TestingConfig
object. This attribute is dynamically computed to provide the same
information as was previously provided by the custom getRoot functions.
* Documented the config.root attribute in docs/CommandGuide/lit.pod
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153408 91177308-0d34-0410-b5e6-96231b3b80d8
execution-time regression for nsieve-bits on the ARMv7 -O0 -g nightly tester.
This may also improve compile-time on architectures that would otherwise
generate a libcall for urem (e.g., ARM) or fall back to the DAG selector.
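For context, a sketch of the kind of operation involved; __aeabi_uidivmod
is the AEABI runtime helper, and the function here is purely illustrative:

/* On ARM cores without hardware divide, an unsigned remainder lowers
   to a runtime call (__aeabi_uidivmod), so avoiding the urem also
   avoids the libcall.                                                */
unsigned bucket(unsigned x) { return x % 10u; }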
rdar://10810716
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153230 91177308-0d34-0410-b5e6-96231b3b80d8
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).
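The pattern at the source level, using GCC/Clang vector extensions
(illustrative only; names assumed):

typedef short v4i16 __attribute__((vector_size(8)));

/* Extracting one i16 lane of a loaded vector into an i32 use: the
   combine narrows the load to the lane, but the i32 consumer needs an
   extending load, not a plain i16 load.                              */
int lane2(const v4i16 *p) { return (*p)[2]; }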
rdar://11035895
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152675 91177308-0d34-0410-b5e6-96231b3b80d8
When an instruction only writes sub-registers, it is still necessary to
add an <imp-def> operand for the super-register. When reloading into a
virtual register, rewriting will add the operand, but when loading
directly into a virtual register, the <imp-def> operand is still
necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152095 91177308-0d34-0410-b5e6-96231b3b80d8
The fpscr register contains both flags (set by FP operations/comparisons) and
control bits. The control bits (FPSCR) should be reserved, since they're always
available and needn't be defined before use. The flag bits (FPSCR_NZCV) should
be left unreserved so they can be hoisted by MachineCSE. This fixes PR12165.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152076 91177308-0d34-0410-b5e6-96231b3b80d8
In this update:
- I assumed neon2 does not imply vfpv4, but neon and vfpv4 imply neon2.
- I kept setting the .fpu=neon-vfpv4 code attribute because that is what the
  assembler understands.
Patch by Ana Pazos <apazos@codeaurora.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152036 91177308-0d34-0410-b5e6-96231b3b80d8
MachineOperands that define part of a virtual register must have an
<undef> flag if they are not intended as read-modify-write operands.
The old trick of adding an <imp-def> operand doesn't work any longer.
Fixes PR12177.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152008 91177308-0d34-0410-b5e6-96231b3b80d8
Some BBs can become dead after codegen preparation. If we delete them here, it
could help enable tail-call optimizations later on.
<rdar://problem/10256573>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152002 91177308-0d34-0410-b5e6-96231b3b80d8
floating point equality comparisons into integer ones with -ffast-math. The
issue is that the optimization makes +0.0 compare unequal to -0.0.
Now the optimization is only done when one side is known to be 0.0. The other
side's sign bit is masked off for the comparison.
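A worked illustration of the fixed form in plain C (not the actual DAG
combine; the helper is hypothetical):

#include <stdint.h>
#include <string.h>

/* Naively comparing bit patterns is wrong because +0.0 and -0.0 are
   equal as floats but differ in bit 31. Masking off the sign bit of
   the non-constant side makes the 0.0 comparison correct.            */
int float_eq_zero(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7fffffffu) == 0;  /* true for both +0.0 and -0.0 */
}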
rdar://10964603
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151861 91177308-0d34-0410-b5e6-96231b3b80d8