307 Commits

Author SHA1 Message Date
Jakob Stoklund Olesen
369e673289 Fix Thumb and Thumb2 tests to be register allocator independent.
llvm-svn: 128690
2011-03-31 23:31:50 +00:00
Eric Christopher
b51c27cd9a Fix the bfi handling for or (and a mask) (and b mask). We need the two
masks to match inversely for the code as is to work. For the example given
we actually want:

bfi r0, r2, #1, #1

not #0, however, given the way the pattern is written it's not possible
at the moment.

Fixes rdar://9177502

llvm-svn: 128320
2011-03-26 01:21:03 +00:00
Cameron Zwarich
bf5c9cd119 Roll r127459 back in:
Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127498
2011-03-11 21:52:04 +00:00
Daniel Dunbar
a02706c889 Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get
created from the", it broke some GCC test suite tests.

llvm-svn: 127477
2011-03-11 19:30:30 +00:00
Cameron Zwarich
9ed726c151 Optimize trivial branches in CodeGenPrepare, which often get created from the
lowering of objectsize intrinsics. Unfortunately, a number of tests were relying
on llc not optimizing trivial branches, so I had to add an option to allow them
to continue to test what they originally tested.

This fixes <rdar://problem/8785296> and <rdar://problem/9112893>.

llvm-svn: 127459
2011-03-11 04:54:27 +00:00
Bob Wilson
d46874eb5d Move a test that ended up in the wrong place.
llvm-svn: 124933
2011-02-05 04:15:50 +00:00
Evan Cheng
0dfe28a9b5 Last round of fixes for movw + movt global address codegen.
1. Fixed ARM pc adjustment.
2. Fixed dynamic-no-pic codegen
3. CSE of pc-relative load of global addresses.

It's now enabled by default for Darwin.

llvm-svn: 123991
2011-01-21 18:55:51 +00:00
Andrew Trick
e0bccb5f87 Enable support for precise scheduling of the instruction selection
DAG. Disable using "-disable-sched-cycles".

For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.

Although the flag is not target-specific, it should have very little
affect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
  operations can have their latency covered. i.e. two independent 4
  cycle operations can produce results in 4 cycles, not 8 cycles.
- Two operations with equal register pressure impact and no
  latency-based stalls on their uses will be prioritized by depth before height
  (height is irrelevant if no stalls occur in the schedule below this point).

llvm-svn: 123971
2011-01-21 06:19:05 +00:00
Bob Wilson
22f18a7e94 Add ARM patterns to match EXTRACT_SUBVECTOR nodes.
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.

The test changes are needed to keep those spill-q tests from testing aligned
spills and restores.  If the only aligned stack objects are spill slots, we
no longer realign the stack frame.  Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.

llvm-svn: 122995
2011-01-07 04:59:04 +00:00
Bob Wilson
33e5e902b0 Remove the rest of the *_sfp Neon instruction patterns.
Use the same COPY_TO_REGCLASS approach as for the 2-register *_sfp instructions.
This change made a big difference in the code generated for the
CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing
a fine job, but some instructions that were previously moved outside the loop
are not moved now.  It's using fewer VFP registers now, which is generally
a good thing, so I think the estimates for register pressure changed and that
affected the LICM behavior.  Since that isn't obviously wrong, I've just
changed the test file.  This completes the work for Radar 8711675.

llvm-svn: 121730
2010-12-13 23:02:37 +00:00
Evan Cheng
b6773d7e1f (or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. rdar://8752056
llvm-svn: 121606
2010-12-11 04:11:38 +00:00
Jim Grosbach
8bc33cc6e5 ARM stm/ldm instructions require more than one register in the register list.
Otherwise, a plain str/ldr should be used instead. Make sure we account for
that in prologue/epilogue code generation.
rdar://8745460

llvm-svn: 121391
2010-12-09 18:31:13 +00:00
Bob Wilson
20c65a9d33 The Thumb tADDrSPi instruction is not valid when the destination is SP.
Check for that and try narrowing it to tADDspi instead.  Radar 8724703.

llvm-svn: 120892
2010-12-04 04:40:19 +00:00
Jim Grosbach
c69ad2176a When using the 'push' mnemonic for Thumb2 stmdb, be explicit when it's the
32-bit wide version by adding the .w suffix.

llvm-svn: 120838
2010-12-03 20:33:01 +00:00
Owen Anderson
8802c68592 Add correct encodings for STRD and LDRD, including fixup support. Additionally, update these to unified syntax.
llvm-svn: 120589
2010-12-01 19:18:46 +00:00
Evan Cheng
e6d55cd247 Fix epilogue codegen to avoid leaving the stack pointer in an invalid
state. Previously Thumb2 would restore sp from fp like this:
mov sp, r7
sub, sp, #4
If an interrupt is taken after the 'mov' but before the 'sub', callee-saved
registers might be clobbered by the interrupt handler. Instead, try
restoring directly from sp:
add sp, #4
Or, if necessary (with VLA, etc.) use a scratch register to compute sp and
then restore it:
sub.w r4, r7, #8
mov sp, r7
rdar://8465407

llvm-svn: 119977
2010-11-22 18:12:04 +00:00
Eric Christopher
bc6a51d63f Rewrite stack callee saved spills and restores to use push/pop instructions.
Remove movePastCSLoadStoreOps and associated code for simple pointer
increments. Update routines that depended upon other opcodes for save/restore.

Adjust all testcases accordingly.

llvm-svn: 119725
2010-11-18 19:40:05 +00:00
Dale Johannesen
a87210e350 These tests are looking for library function names that
appear to differ on Linux.  Try to make them pass on Linux.
Would be good for a Linux person to review this.

llvm-svn: 119572
2010-11-17 21:57:32 +00:00
Evan Cheng
ce610bd6b3 Remove ARM isel hacks that fold large immediates into a pair of add, sub, and,
and xor. The 32-bit move immediates can be hoisted out of loops by machine
LICM but the isel hacks were preventing them.

Instead, let peephole optimization pass recognize registers that are defined by
immediates and the ARM target hook will fold the immediates in.

Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ
instructions if there are multiple uses. This happens when the 'and' is live
out, machine sink would have sinked the computation and that ends up pessimizing
code. The peephole pass would recognize situations where the 'and' can be
toggled to define CPSR and eliminate the comparison anyway.

2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking
important optimizations.

rdar://8663787, rdar://8241368

llvm-svn: 119548
2010-11-17 20:13:28 +00:00
Evan Cheng
67db408634 Two sets of changes. Sorry they are intermingled.
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
   "optimize for latency". Call instructions don't have the right latency and
   this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
   not # of micro-ops since multi-latency instructions is completely executed
   even when the predicate is false. Also, some instruction will be "slower"
   when they are predicated due to the register def becoming implicit input.
   rdar://8598427

llvm-svn: 118135
2010-11-03 00:45:17 +00:00
Jim Grosbach
d6df785c6d Revert r114340 (improvements in Darwin function prologue/epilogue), as it broke
assumptions about stack layout. Specifically, LR must be saved next to FP.

llvm-svn: 118026
2010-11-02 17:35:25 +00:00
Bob Wilson
183c466006 Overhaul memory barriers in the ARM backend. Radar 8601999.
There were a number of issues to fix up here:
* The "device" argument of the llvm.memory.barrier intrinsic should be
used to distinguish the "Full System" domain from the "Inner Shareable"
domain.  It has nothing to do with using DMB vs. DSB instructions.
* The compiler should never need to emit DSB instructions.  Remove the
ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB.
* Merge the separate DMB/DSB instructions for options only used for the
disassembler with the default DMB/DSB instructions.  Add the default
"full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum.
* Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement
a data memory barrier using the MCR instruction.
* Fix up encodings for these instructions (except MCR).
I also updated the tests and added a few new ones to check for DMB options
that were not currently being exercised.

llvm-svn: 117756
2010-10-30 00:54:37 +00:00
Evan Cheng
392d2cbdcc Avoiding overly aggressive latency scheduling. If the two nodes share an
operand and one of them has a single use that is a live out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update. e.g.

BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB

=>

BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB

This fixed the recent 256.bzip2 regression.

llvm-svn: 117675
2010-10-29 18:09:28 +00:00
Evan Cheng
9a06e4c7c8 More accurate estimate / tracking of register pressure.
- Initial register pressure in the loop should be all the live defs into the
  loop. Not just those from loop preheader which is often empty.
- When an instruction is hoisted, update register pressure from loop preheader
  to the original BB.
- Treat only use of a virtual register as kill since the code is still SSA.

llvm-svn: 116956
2010-10-20 22:03:58 +00:00
Dale Johannesen
a324c8c6bd Fix crash introduced in 116852. 8573915.
llvm-svn: 116955
2010-10-20 22:03:37 +00:00
Dale Johannesen
ee87cbe4e9 Enable using vdup for vector constants which are splat of
integers by default, and remove the controlling flag, now
that LICM will hoist such vdup's.  8003375.

llvm-svn: 116852
2010-10-19 20:00:17 +00:00
Evan Cheng
1c8dafd12a Re-enable register pressure aware machine licm with fixes. Hoist() may have
erased the instruction during LICM so UpdateRegPressureAfter() should not
reference it afterwards.

llvm-svn: 116845
2010-10-19 18:58:51 +00:00
Daniel Dunbar
6ff550c84d Revert r116781 "- Add a hook for target to determine whether an instruction def
is", which breaks some nightly tests.

llvm-svn: 116816
2010-10-19 17:14:24 +00:00
Evan Cheng
9c3f6f486e - Add a hook for target to determine whether an instruction def is
"long latency" enough to hoist even if it may increase spilling. Reloading
  a value from spill slot is often cheaper than performing an expensive
  computation in the loop. For X86, that means machine LICM will hoist
  SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
  instructions.
- Enable register pressure aware machine LICM by default.

llvm-svn: 116781
2010-10-19 00:55:07 +00:00
Bob Wilson
8689a52c10 Change register allocation order for ARM VFP and NEON registers to put the
callee-saved registers at the end of the lists.  Also prefer to avoid using
the low registers that are in register subclasses required by certain
instructions, so that those registers will more likely be available when needed.
This change makes a huge improvement in spilling in some cases.  Thanks to
Jakob for helping me realize the problem.

Most of this patch is fixing the testsuite.  There are quite a few places
where we're checking for specific registers.  I changed those to wildcards
in places where that doesn't weaken the tests.  The spill-q.ll and
thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch
of live values to force spills on those tests.

llvm-svn: 116055
2010-10-08 06:15:13 +00:00
Owen Anderson
d9fd152c3a Enable target-specific mul-lowering on ARM, even at -Os. Remove a test that this makes
irrelevant, but add a new test for the new, improved functionality.

llvm-svn: 114494
2010-09-21 22:51:46 +00:00
Jim Grosbach
cf90f8beb1 Simplify ARM callee-saved register handling by removing the distinction
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.

For example, previously we would generate code like:
        push    {r4, r5, r6, r7, lr}
        add     r7, sp, #12
        stmdb   sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
        push    {r4, r5, r6, r7, r8, r10, r11, lr}
        add     r7, sp, #12

rdar://8445635

llvm-svn: 114340
2010-09-20 19:32:20 +00:00
Jim Grosbach
8ae5cfffdd Teach the (non-MC) instruction printer to use the cannonical names for push/pop,
and shift instructions on ARM. Update the tests to match.

llvm-svn: 114230
2010-09-17 22:36:38 +00:00
Jim Grosbach
b21c19d666 Move thumb2 tests to the thumb2 directory
llvm-svn: 114206
2010-09-17 20:34:09 +00:00
Evan Cheng
c9cb37516d Teach if-converter to be more careful with predicating instructions that would
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.

llvm-svn: 113570
2010-09-10 01:29:16 +00:00
Bob Wilson
524123343c Fix NEON VLD pseudo instruction itineraries that were incorrectly copied from
the VST pseudos.  The VLD/VST scheduling still needs work (see pr6722), but
at least we shouldn't confuse the loads with the stores.

llvm-svn: 113473
2010-09-09 05:40:26 +00:00
Jim Grosbach
c50df6cfad Re-apply r112883:
"For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs."

r112986 fixed a latent bug exposed by the above.

llvm-svn: 112989
2010-09-03 18:37:12 +00:00
Daniel Dunbar
3fa5ea53fa Revert "For ARM stack frames that utilize variable sized objects and have either", it is breaking oggenc with Clang for ARMv6.
This reverts commit 8d6e29cfda270be483abf638850311670829ee65.

llvm-svn: 112962
2010-09-03 15:26:42 +00:00
Jim Grosbach
fb89154d21 For ARM stack frames that utilize variable sized objects and have either
large local stack areas or require dynamic stack realignment, allocate a
base register via which to access the local frame. This allows efficient
access to frame indices not accessible via the FP (either due to being out
of range or due to dynamic realignment) or the SP (due to variable sized
object allocation). In particular, this greatly improves efficiency of access
to spill slots in Thumb functions which contain VLAs.

rdar://7352504
rdar://8374540
rdar://8355680

llvm-svn: 112883
2010-09-02 22:29:01 +00:00
Jim Grosbach
8f30718112 Now that register allocation properly considers reserved regs, simplify the
ARM register class allocation order functions to take advantage of that.

llvm-svn: 112841
2010-09-02 18:14:29 +00:00
Chris Lattner
b74759a9fa temporarily revert r112664, it is causing a decoding conflict, and
the testcases should be merged.

llvm-svn: 112711
2010-09-01 16:00:50 +00:00
Bill Wendling
bb6052cfd6 We have a chance for an optimization. Consider this code:
int x(int t) {
  if (t & 256)
    return -26;
  return 0;
}

We generate this:

     tst.w   r0, #256
     mvn     r0, #25
     it      eq
     moveq   r0, #0

while gcc generates this:

     ands    r0, r0, #256
     it      ne
     mvnne   r0, #25
     bx      lr

Scandalous really!

During ISel time, we can look for this particular pattern. One where we have a
"MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND
instruction to 0. Something like this (greatly simplified):

  %r0 = ISD::AND ...
  ARMISD::CMPZ %r0, 0         @ sets [CPSR]
  %r0 = ARMISD::MOVCC 0, -26  @ reads [CPSR]

All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR]
when it's zero. The zero value will all ready be in the %r0 register and we only
need to change it if the AND wasn't zero. Easy!

llvm-svn: 112664
2010-08-31 22:41:22 +00:00
Bob Wilson
c01101e76c Add alignment arguments to all the NEON load/store intrinsics.
Update all the tests using those intrinsics and add support for
auto-upgrading bitcode files with the old versions of the intrinsics.

llvm-svn: 112271
2010-08-27 17:13:24 +00:00
Daniel Dunbar
9b7c2ce591 ARM/Thumb2: Fix a misselect in getARMCmp, when attempting to adjust a signed
comparison that would overflow.
 - The other under/overflow cases can't actually happen because the immediates
   which would trigger them are legal (so we don't enter this code), but
   adjusted the style to make it clear the transform is always valid.

llvm-svn: 112053
2010-08-25 16:58:05 +00:00
Bob Wilson
e382fce916 Change ARM PKHTB and PKHBT instructions to use a shift_imm operand to avoid
printing "lsl #0".  This fixes the remaining parts of pr7792.  Make
corresponding changes for encoding/decoding these instructions.

llvm-svn: 111251
2010-08-17 17:23:19 +00:00
Bob Wilson
d662e8cd02 Generalize a pattern for PKHTB: an SRL of 16-31 bits will guarantee
that the high halfword is zero.  The shift need not be exactly 16 bits.

llvm-svn: 111196
2010-08-16 22:26:55 +00:00
Bob Wilson
b776a55df5 Convert test to FileCheck.
llvm-svn: 111195
2010-08-16 22:21:13 +00:00
Bob Wilson
ca672ee828 Temporarily disable tail calls on ARM to work around some linker problems.
llvm-svn: 111050
2010-08-13 22:43:33 +00:00
Jim Grosbach
401709255b fix silly typo
llvm-svn: 110831
2010-08-11 17:32:46 +00:00
Jim Grosbach
be1b6086b3 Add a target triple, as the runtime library invocation varies a bit by
platform. It's apparently "bl __muldf3" on linux, for example. Since that's
not what we're checking here, it's more robust to just force a triple. We
just wwant to check that the inline FP instructions are only generated
on cpus that have them."

llvm-svn: 110830
2010-08-11 17:31:12 +00:00