llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-04 11:17:31 +00:00

Author	SHA1	Message	Date
Jakob Stoklund Olesen	369e673289	Fix Thumb and Thumb2 tests to be register allocator independent. llvm-svn: 128690	2011-03-31 23:31:50 +00:00
Eric Christopher	b51c27cd9a	Fix the bfi handling for or (and a mask) (and b mask). We need the two masks to match inversely for the code as is to work. For the example given we actually want: bfi r0, r2, #1, #1 not #0, however, given the way the pattern is written it's not possible at the moment. Fixes rdar://9177502 llvm-svn: 128320	2011-03-26 01:21:03 +00:00
Cameron Zwarich	bf5c9cd119	Roll r127459 back in: Optimize trivial branches in CodeGenPrepare, which often get created from the lowering of objectsize intrinsics. Unfortunately, a number of tests were relying on llc not optimizing trivial branches, so I had to add an option to allow them to continue to test what they originally tested. This fixes <rdar://problem/8785296> and <rdar://problem/9112893>. llvm-svn: 127498	2011-03-11 21:52:04 +00:00
Daniel Dunbar	a02706c889	Revert r127459, "Optimize trivial branches in CodeGenPrepare, which often get created from the", it broke some GCC test suite tests. llvm-svn: 127477	2011-03-11 19:30:30 +00:00
Cameron Zwarich	9ed726c151	Optimize trivial branches in CodeGenPrepare, which often get created from the lowering of objectsize intrinsics. Unfortunately, a number of tests were relying on llc not optimizing trivial branches, so I had to add an option to allow them to continue to test what they originally tested. This fixes <rdar://problem/8785296> and <rdar://problem/9112893>. llvm-svn: 127459	2011-03-11 04:54:27 +00:00
Bob Wilson	d46874eb5d	Move a test that ended up in the wrong place. llvm-svn: 124933	2011-02-05 04:15:50 +00:00
Evan Cheng	0dfe28a9b5	Last round of fixes for movw + movt global address codegen. 1. Fixed ARM pc adjustment. 2. Fixed dynamic-no-pic codegen 3. CSE of pc-relative load of global addresses. It's now enabled by default for Darwin. llvm-svn: 123991	2011-01-21 18:55:51 +00:00
Andrew Trick	e0bccb5f87	Enable support for precise scheduling of the instruction selection DAG. Disable using "-disable-sched-cycles". For ARM, this enables a framework for modeling the cpu pipeline and counting stalls. It also activates several heuristics to drive scheduling based on the model. Scheduling is inherently imprecise at this stage, and until spilling is improved it may defeat attempts to schedule. However, this framework provides greater control over tuning codegen. Although the flag is not target-specific, it should have very little affect on the default scheduler used by x86. The only two changes that affect x86 are: - scheduling a high-latency operation bumps the current cycle so independent operations can have their latency covered. i.e. two independent 4 cycle operations can produce results in 4 cycles, not 8 cycles. - Two operations with equal register pressure impact and no latency-based stalls on their uses will be prioritized by depth before height (height is irrelevant if no stalls occur in the schedule below this point). llvm-svn: 123971	2011-01-21 06:19:05 +00:00
Bob Wilson	22f18a7e94	Add ARM patterns to match EXTRACT_SUBVECTOR nodes. Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle vectors from being translated to EXTRACT_SUBVECTOR. Patch by Tim Northover. The test changes are needed to keep those spill-q tests from testing aligned spills and restores. If the only aligned stack objects are spill slots, we no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR was legalized by loading from the stack, which created an aligned frame index. Now, however, there is nothing except the spill slot in the stack frame, so I added an aligned alloca. llvm-svn: 122995	2011-01-07 04:59:04 +00:00
Bob Wilson	33e5e902b0	Remove the rest of the _sfp Neon instruction patterns. Use the same COPY_TO_REGCLASS approach as for the 2-register _sfp instructions. This change made a big difference in the code generated for the CodeGen/Thumb2/cross-rc-coalescing-2.ll test: The coalescer is still doing a fine job, but some instructions that were previously moved outside the loop are not moved now. It's using fewer VFP registers now, which is generally a good thing, so I think the estimates for register pressure changed and that affected the LICM behavior. Since that isn't obviously wrong, I've just changed the test file. This completes the work for Radar 8711675. llvm-svn: 121730	2010-12-13 23:02:37 +00:00
Evan Cheng	b6773d7e1f	(or (and (shl A, #shamt), mask), B) => ARMbfi B, A, ~mask where lsb(mask) == #shamt. rdar://8752056 llvm-svn: 121606	2010-12-11 04:11:38 +00:00
Jim Grosbach	8bc33cc6e5	ARM stm/ldm instructions require more than one register in the register list. Otherwise, a plain str/ldr should be used instead. Make sure we account for that in prologue/epilogue code generation. rdar://8745460 llvm-svn: 121391	2010-12-09 18:31:13 +00:00
Bob Wilson	20c65a9d33	The Thumb tADDrSPi instruction is not valid when the destination is SP. Check for that and try narrowing it to tADDspi instead. Radar 8724703. llvm-svn: 120892	2010-12-04 04:40:19 +00:00
Jim Grosbach	c69ad2176a	When using the 'push' mnemonic for Thumb2 stmdb, be explicit when it's the 32-bit wide version by adding the .w suffix. llvm-svn: 120838	2010-12-03 20:33:01 +00:00
Owen Anderson	8802c68592	Add correct encodings for STRD and LDRD, including fixup support. Additionally, update these to unified syntax. llvm-svn: 120589	2010-12-01 19:18:46 +00:00
Evan Cheng	e6d55cd247	Fix epilogue codegen to avoid leaving the stack pointer in an invalid state. Previously Thumb2 would restore sp from fp like this: mov sp, r7 sub, sp, #4 If an interrupt is taken after the 'mov' but before the 'sub', callee-saved registers might be clobbered by the interrupt handler. Instead, try restoring directly from sp: add sp, #4 Or, if necessary (with VLA, etc.) use a scratch register to compute sp and then restore it: sub.w r4, r7, #8 mov sp, r7 rdar://8465407 llvm-svn: 119977	2010-11-22 18:12:04 +00:00
Eric Christopher	bc6a51d63f	Rewrite stack callee saved spills and restores to use push/pop instructions. Remove movePastCSLoadStoreOps and associated code for simple pointer increments. Update routines that depended upon other opcodes for save/restore. Adjust all testcases accordingly. llvm-svn: 119725	2010-11-18 19:40:05 +00:00
Dale Johannesen	a87210e350	These tests are looking for library function names that appear to differ on Linux. Try to make them pass on Linux. Would be good for a Linux person to review this. llvm-svn: 119572	2010-11-17 21:57:32 +00:00
Evan Cheng	ce610bd6b3	Remove ARM isel hacks that fold large immediates into a pair of add, sub, and, and xor. The 32-bit move immediates can be hoisted out of loops by machine LICM but the isel hacks were preventing them. Instead, let peephole optimization pass recognize registers that are defined by immediates and the ARM target hook will fold the immediates in. Other changes include 1) do not fold and / xor into cmp to isel TST / TEQ instructions if there are multiple uses. This happens when the 'and' is live out, machine sink would have sinked the computation and that ends up pessimizing code. The peephole pass would recognize situations where the 'and' can be toggled to define CPSR and eliminate the comparison anyway. 2) Move peephole pass to after machine LICM, sink, and CSE to avoid blocking important optimizations. rdar://8663787, rdar://8241368 llvm-svn: 119548	2010-11-17 20:13:28 +00:00
Evan Cheng	67db408634	Two sets of changes. Sorry they are intermingled. 1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to "optimize for latency". Call instructions don't have the right latency and this is more likely to use introduce spills. 2. Fix if-converter cost function. For ARM, it should use instruction latencies, not # of micro-ops since multi-latency instructions is completely executed even when the predicate is false. Also, some instruction will be "slower" when they are predicated due to the register def becoming implicit input. rdar://8598427 llvm-svn: 118135	2010-11-03 00:45:17 +00:00
Jim Grosbach	d6df785c6d	Revert r114340 (improvements in Darwin function prologue/epilogue), as it broke assumptions about stack layout. Specifically, LR must be saved next to FP. llvm-svn: 118026	2010-11-02 17:35:25 +00:00
Bob Wilson	183c466006	Overhaul memory barriers in the ARM backend. Radar 8601999. There were a number of issues to fix up here: * The "device" argument of the llvm.memory.barrier intrinsic should be used to distinguish the "Full System" domain from the "Inner Shareable" domain. It has nothing to do with using DMB vs. DSB instructions. * The compiler should never need to emit DSB instructions. Remove the ARMISD::SYNCBARRIER node and also remove the instruction patterns for DSB. * Merge the separate DMB/DSB instructions for options only used for the disassembler with the default DMB/DSB instructions. Add the default "full system" option ARM_MB::SY to the ARM_MB::MemBOpt enum. * Add a separate ARMISD::MEMBARRIER_MCR node for subtargets that implement a data memory barrier using the MCR instruction. * Fix up encodings for these instructions (except MCR). I also updated the tests and added a few new ones to check for DMB options that were not currently being exercised. llvm-svn: 117756	2010-10-30 00:54:37 +00:00
Evan Cheng	392d2cbdcc	Avoiding overly aggressive latency scheduling. If the two nodes share an operand and one of them has a single use that is a live out copy, favor the one that is live out. Otherwise it will be difficult to eliminate the copy if the instruction is a loop induction variable update. e.g. BB: sub r1, r3, #1 str r0, [r2, r3] mov r3, r1 cmp bne BB => BB: str r0, [r2, r3] sub r3, r3, #1 cmp bne BB This fixed the recent 256.bzip2 regression. llvm-svn: 117675	2010-10-29 18:09:28 +00:00
Evan Cheng	9a06e4c7c8	More accurate estimate / tracking of register pressure. - Initial register pressure in the loop should be all the live defs into the loop. Not just those from loop preheader which is often empty. - When an instruction is hoisted, update register pressure from loop preheader to the original BB. - Treat only use of a virtual register as kill since the code is still SSA. llvm-svn: 116956	2010-10-20 22:03:58 +00:00
Dale Johannesen	a324c8c6bd	Fix crash introduced in 116852. 8573915. llvm-svn: 116955	2010-10-20 22:03:37 +00:00
Dale Johannesen	ee87cbe4e9	Enable using vdup for vector constants which are splat of integers by default, and remove the controlling flag, now that LICM will hoist such vdup's. 8003375. llvm-svn: 116852	2010-10-19 20:00:17 +00:00
Evan Cheng	1c8dafd12a	Re-enable register pressure aware machine licm with fixes. Hoist() may have erased the instruction during LICM so UpdateRegPressureAfter() should not reference it afterwards. llvm-svn: 116845	2010-10-19 18:58:51 +00:00
Daniel Dunbar	6ff550c84d	Revert r116781 "- Add a hook for target to determine whether an instruction def is", which breaks some nightly tests. llvm-svn: 116816	2010-10-19 17:14:24 +00:00
Evan Cheng	9c3f6f486e	- Add a hook for target to determine whether an instruction def is "long latency" enough to hoist even if it may increase spilling. Reloading a value from spill slot is often cheaper than performing an expensive computation in the loop. For X86, that means machine LICM will hoist SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON instructions. - Enable register pressure aware machine LICM by default. llvm-svn: 116781	2010-10-19 00:55:07 +00:00
Bob Wilson	8689a52c10	Change register allocation order for ARM VFP and NEON registers to put the callee-saved registers at the end of the lists. Also prefer to avoid using the low registers that are in register subclasses required by certain instructions, so that those registers will more likely be available when needed. This change makes a huge improvement in spilling in some cases. Thanks to Jakob for helping me realize the problem. Most of this patch is fixing the testsuite. There are quite a few places where we're checking for specific registers. I changed those to wildcards in places where that doesn't weaken the tests. The spill-q.ll and thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch of live values to force spills on those tests. llvm-svn: 116055	2010-10-08 06:15:13 +00:00
Owen Anderson	d9fd152c3a	Enable target-specific mul-lowering on ARM, even at -Os. Remove a test that this makes irrelevant, but add a new test for the new, improved functionality. llvm-svn: 114494	2010-09-21 22:51:46 +00:00
Jim Grosbach	cf90f8beb1	Simplify ARM callee-saved register handling by removing the distinction between the high and low registers for prologue/epilogue code. This was a Darwin-only thing that wasn't providing a realistic benefit anymore. Combining the save areas simplifies the compiler code and results in better ARM/Thumb2 codegen. For example, previously we would generate code like: push {r4, r5, r6, r7, lr} add r7, sp, #12 stmdb sp!, {r8, r10, r11} With this change, we combine the register saves and generate: push {r4, r5, r6, r7, r8, r10, r11, lr} add r7, sp, #12 rdar://8445635 llvm-svn: 114340	2010-09-20 19:32:20 +00:00
Jim Grosbach	8ae5cfffdd	Teach the (non-MC) instruction printer to use the cannonical names for push/pop, and shift instructions on ARM. Update the tests to match. llvm-svn: 114230	2010-09-17 22:36:38 +00:00
Jim Grosbach	b21c19d666	Move thumb2 tests to the thumb2 directory llvm-svn: 114206	2010-09-17 20:34:09 +00:00
Evan Cheng	c9cb37516d	Teach if-converter to be more careful with predicating instructions that would take multiple cycles to decode. For the current if-converter clients (actually only ARM), the instructions that are predicated on false are not nops. They would still take machine cycles to decode. Micro-coded instructions such as LDM / STM can potentially take multiple cycles to decode. If-converter should take treat them as non-micro-coded simple instructions. llvm-svn: 113570	2010-09-10 01:29:16 +00:00
Bob Wilson	524123343c	Fix NEON VLD pseudo instruction itineraries that were incorrectly copied from the VST pseudos. The VLD/VST scheduling still needs work (see pr6722), but at least we shouldn't confuse the loads with the stores. llvm-svn: 113473	2010-09-09 05:40:26 +00:00
Jim Grosbach	c50df6cfad	Re-apply r112883: "For ARM stack frames that utilize variable sized objects and have either large local stack areas or require dynamic stack realignment, allocate a base register via which to access the local frame. This allows efficient access to frame indices not accessible via the FP (either due to being out of range or due to dynamic realignment) or the SP (due to variable sized object allocation). In particular, this greatly improves efficiency of access to spill slots in Thumb functions which contain VLAs." r112986 fixed a latent bug exposed by the above. llvm-svn: 112989	2010-09-03 18:37:12 +00:00
Daniel Dunbar	3fa5ea53fa	Revert "For ARM stack frames that utilize variable sized objects and have either", it is breaking oggenc with Clang for ARMv6. This reverts commit 8d6e29cfda270be483abf638850311670829ee65. llvm-svn: 112962	2010-09-03 15:26:42 +00:00
Jim Grosbach	fb89154d21	For ARM stack frames that utilize variable sized objects and have either large local stack areas or require dynamic stack realignment, allocate a base register via which to access the local frame. This allows efficient access to frame indices not accessible via the FP (either due to being out of range or due to dynamic realignment) or the SP (due to variable sized object allocation). In particular, this greatly improves efficiency of access to spill slots in Thumb functions which contain VLAs. rdar://7352504 rdar://8374540 rdar://8355680 llvm-svn: 112883	2010-09-02 22:29:01 +00:00
Jim Grosbach	8f30718112	Now that register allocation properly considers reserved regs, simplify the ARM register class allocation order functions to take advantage of that. llvm-svn: 112841	2010-09-02 18:14:29 +00:00
Chris Lattner	b74759a9fa	temporarily revert r112664, it is causing a decoding conflict, and the testcases should be merged. llvm-svn: 112711	2010-09-01 16:00:50 +00:00
Bill Wendling	bb6052cfd6	We have a chance for an optimization. Consider this code: int x(int t) { if (t & 256) return -26; return 0; } We generate this: tst.w r0, #256 mvn r0, #25 it eq moveq r0, #0 while gcc generates this: ands r0, r0, #256 it ne mvnne r0, #25 bx lr Scandalous really! During ISel time, we can look for this particular pattern. One where we have a "MOVCC" that uses the flag off of a CMPZ that itself is comparing an AND instruction to 0. Something like this (greatly simplified): %r0 = ISD::AND ... ARMISD::CMPZ %r0, 0 @ sets [CPSR] %r0 = ARMISD::MOVCC 0, -26 @ reads [CPSR] All we have to do is convert the "ISD::AND" into an "ARM::ANDS" that sets [CPSR] when it's zero. The zero value will all ready be in the %r0 register and we only need to change it if the AND wasn't zero. Easy! llvm-svn: 112664	2010-08-31 22:41:22 +00:00
Bob Wilson	c01101e76c	Add alignment arguments to all the NEON load/store intrinsics. Update all the tests using those intrinsics and add support for auto-upgrading bitcode files with the old versions of the intrinsics. llvm-svn: 112271	2010-08-27 17:13:24 +00:00
Daniel Dunbar	9b7c2ce591	ARM/Thumb2: Fix a misselect in getARMCmp, when attempting to adjust a signed comparison that would overflow. - The other under/overflow cases can't actually happen because the immediates which would trigger them are legal (so we don't enter this code), but adjusted the style to make it clear the transform is always valid. llvm-svn: 112053	2010-08-25 16:58:05 +00:00
Bob Wilson	e382fce916	Change ARM PKHTB and PKHBT instructions to use a shift_imm operand to avoid printing "lsl #0". This fixes the remaining parts of pr7792. Make corresponding changes for encoding/decoding these instructions. llvm-svn: 111251	2010-08-17 17:23:19 +00:00
Bob Wilson	d662e8cd02	Generalize a pattern for PKHTB: an SRL of 16-31 bits will guarantee that the high halfword is zero. The shift need not be exactly 16 bits. llvm-svn: 111196	2010-08-16 22:26:55 +00:00
Bob Wilson	b776a55df5	Convert test to FileCheck. llvm-svn: 111195	2010-08-16 22:21:13 +00:00
Bob Wilson	ca672ee828	Temporarily disable tail calls on ARM to work around some linker problems. llvm-svn: 111050	2010-08-13 22:43:33 +00:00
Jim Grosbach	401709255b	fix silly typo llvm-svn: 110831	2010-08-11 17:32:46 +00:00
Jim Grosbach	be1b6086b3	Add a target triple, as the runtime library invocation varies a bit by platform. It's apparently "bl __muldf3" on linux, for example. Since that's not what we're checking here, it's more robust to just force a triple. We just wwant to check that the inline FP instructions are only generated on cpus that have them." llvm-svn: 110830	2010-08-11 17:31:12 +00:00

1 2 3 4 5 ...

307 Commits