RPCS3/llvm - llvm - Gitea: Git with a cup of tea

RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-01-09 21:50:50 +00:00

Author	SHA1	Message	Date
Lang Hames	4c553e0367	Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision introduced. A test case for the bug was already committed in r225385. Patch by Rafael Espindola. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225534 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-09 18:55:42 +00:00
Hal Finkel	139bfee84c	[PowerPC] Enable late partial unrolling on the POWER7 The P7 benefits from not have really-small loops so that we either have multiple dispatch groups in the loop and/or the ability to form more-full dispatch groups during scheduling. Setting the partial unrolling threshold to 44 seems good, empirically, for the P7. Compared to using no late partial unrolling, this yields the following test-suite speedups: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -66.3253% +/- 24.1975% SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -44.0169% +/- 29.4881% SingleSource/Benchmarks/Misc/pi -27.8351% +/- 12.2712% SingleSource/Benchmarks/Stanford/Bubblesort -30.9898% +/- 22.4647% I've speculatively added a similar setting for the P8. Also, I've noticed that the unroller does not quite calculate the unrolling factor correctly for really tiny loops because it neglects to account for the fact that not every loop body replicant contains an ending branch and counter increment. I'll fix that later. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225522 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-09 15:51:16 +00:00
Hal Finkel	cc7ea4df1e	[PowerPC] Add a flag for experimenting with subreg liveness tracking This cannot yet be enabled by default, it causes ~50 miscompiles in the test suite. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225497 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-09 02:03:11 +00:00
Hal Finkel	4e98296890	[PowerPC] Fold [sz]ext with fp_to_int lowering where possible On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32 to i64 into the load necessary for an i64 -> fp conversion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225493 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-09 01:34:30 +00:00
Hal Finkel	b7c01bf403	[PowerPC] Mark all instructions as non-cheap for MachineLICM MachineLICM uses a callback named hasLowDefLatency to determine if an instruction def operand has a 'low' latency. If all relevant operands have a 'low' latency, the instruction is considered too cheap to hoist out of loops even in low-register-pressure situations. On PowerPC cores, both the embedded cores and the others, there is no reason to believe that this is a good choice: all instructions have a cost inside a loop, and hoisting them when not limited by register pressure is a reasonable default. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225471 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-08 22:11:49 +00:00
Justin Hibbits	77e85a150c	Add saving and restoring of r30 to the prologue and epilogue, respectively Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that. Test Plan: Tests updated. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6876 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225450 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-08 15:47:19 +00:00
Ahmed Bougacha	7fac1d945f	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225421 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-08 00:51:32 +00:00
Ahmed Bougacha	8065738154	[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225392 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-07 21:27:10 +00:00
Hal Finkel	6b7f3f4b20	[PowerPC] Transform a README.txt entry into a FIXME Remove the README.txt entry regarding register allocation of CR logical ops, and replace it with a FIXME in PPCInstrInfo.td. The text in the README.txt was not really accurate, and thanks goes to Pat Haugen (and Bill Schmidt) from IBM for clarifying what was intended and highlighting the relevant text in the ISA specification. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225325 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-07 00:15:29 +00:00
Lang Hames	84acf09f32	Revert r224935 "Refactor duplicated code. No intended functionality change." This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin. Reverting to get the bots green while I track down the source of the changed behavior. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225311 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-06 23:04:36 +00:00
Hal Finkel	8e9ba0e588	[PowerPC] Reuse a load operand in int->fp conversions int->fp conversions on PPC must be done through memory loads and stores. On a modern core, this process begins by storing the int value to memory, then loading it using a (sometimes special) FP load instruction. Unfortunately, we would do this even when the value to be converted was itself a load, and we can just use that same memory location instead of copying it to another first. There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs, because the fp_to_int operand has not been lowered when the int_to_fp is being lowered. We handle this specially by invoking fp_to_int's lowering logic (partially) and getting the necessary memory location (some trivial refactoring was done to make this possible). This is all somewhat ugly, and it would be nice if some later CodeGen stage could just clean this stuff up, but because doing so would involve modifying target-specific nodes (or instructions), it is not immediately clear how that would work. Also, remove a related entry from the README.txt for which we now generate reasonable code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225301 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-06 22:31:02 +00:00
Hal Finkel	49fe2a5a5c	[PowerPC] Remove old README.txt entry regarding struct passing Because of how Clang represents structs as arrays (at least on non-Darwin platforms), and what SROA does, etc. this is no longer a problem. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225251 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-06 07:23:13 +00:00
Hal Finkel	7a7ed3cec2	[PowerPC] Add some missing names in getTargetNodeName These are used for debugging output; NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225249 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-06 07:02:15 +00:00
Hal Finkel	10ae865847	[PowerPC] Improve int_to_fp(fp_to_int(x)) combining The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x)) without a load/store pair is updated here with support for unsigned integers, and to support single-precision values without a third rounding step, on newer cores with the appropriate instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225248 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-06 06:01:57 +00:00
Lang Hames	bce877c84c	Revert r225048: It broke ObjC on AArch64. I've filed http://llvm.org/PR22100 to track this issue. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225228 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-06 00:54:32 +00:00
Duncan P. N. Exon Smith	2f7982026d	Revert "Use the integrated assembler by default on 32-bit PowerPC and SPARC" This reverts commit r225213. It's failing on multiple buildbots [1][2]. [1]: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22032 [2]: http://lab.llvm.org:8080/green/view/Clang/job/clang-stage1-cmake-RA-incremental_check/2357/ git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225222 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 23:31:51 +00:00
Hal Finkel	a767eed6f1	[PowerPC] Remove old README.txt entry We no longer generate horrible code for the stated function: void f(signed char a, _Bool b, _Bool c) { signed char t = 0; if (b) t = a; if (c) *a = t; } for which we now generate: .L.f: andi. 5, 5, 1 cmpldi 1, 4, 0 li 5, 0 beq 1, .LBB0_2 lbz 5, 0(3) .LBB0_2: # %if.end bclr 4, 1, 0 stb 5, 0(3) blr so we don't need the README.txt entry. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225217 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 22:20:22 +00:00
Hal Finkel	fcfee17911	[PowerPC] Convert a README.txt entry into a better test We now produce the desired code as noted in the README.txt file (no spurious or). Remove the README entry and improve the regression test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225214 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 21:53:52 +00:00
Brad Smith	a45b58174f	Use the integrated assembler by default on 32-bit PowerPC and SPARC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225213 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 21:48:16 +00:00
Hal Finkel	92a87c67ee	[PowerPC] Remove README.txt entry This entry has been rendered irrelevant now that we have proper CR bit tracking. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225211 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 21:41:26 +00:00
Hal Finkel	1b84bf2554	[PowerPC] Add a test for truncating a shifted load We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225209 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 21:33:14 +00:00
Hal Finkel	e7d845b709	[PowerPC] Add another test for load/store with update We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225205 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 21:22:42 +00:00
Hal Finkel	ccc83e4a08	[PowerPC] Fold i1 extensions with other ops Consider this function from our README.txt file: int foo(int a, int b) { return (a < b) << 4; } We now explicitly track CR bits by default, so the comment in the README.txt about not really having a SETCC is no longer accurate, but we did generate this somewhat silly code: cmpw 0, 3, 4 li 3, 0 li 12, 1 isel 3, 12, 3, 0 sldi 3, 3, 4 blr which generates the zext as a select between 0 and 1, and then shifts the result by a constant amount. Here we preprocess the DAG in order to fold the results of operations on an extension of an i1 value into the SELECT_I[48] pseudo instruction when the resulting constant can be materialized using one instruction (just like the 0 and 1). This was not implemented as a DAGCombine because the resulting code would have been anti-canonical and depends on replacing chained user nodes, which does not fit well into the lowering paradigm. Now we generate: cmpw 0, 3, 4 li 3, 0 li 12, 16 isel 3, 12, 3, 0 blr which is less silly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225203 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 21:10:24 +00:00
Hal Finkel	3ab10c1918	[PowerPC] Remove zexts after i32 ctlz The 64-bit semantics of cntlzw are not special, the 32-bit population count is stored as a 64-bit value in the range [0,32]. As a result, it is always zero extended, and it can be added to the PPCISelDAGToDAG peephole optimization as a frontier instruction for the removal of unnecessary zero extensions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225192 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 18:52:29 +00:00
Hal Finkel	0ef99720c5	[PowerPC] Remove zexts after byte-swapping loads lhbrx and lwbrx not only load their data with byte swapping, but also clear the upper 32 bits (at least). As a result, they can be added to the PPCISelDAGToDAG peephole optimization as frontier instructions for the removal of unnecessary zero extensions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225189 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 18:09:06 +00:00
Hal Finkel	0ef8f3189e	[PowerPC] Enable speculation of cttz/ctlz PPC has an instruction for ctlz with defined zero behavior, and our lowering of cttz (provided by DAGCombine) is also efficient and branchless, so speculating these makes sense. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225150 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 05:24:42 +00:00
Hal Finkel	9cad6c8a24	[PowerPC] Materialize i64 constants using rotation with masking r225135 added the ability to materialize i64 constants using rotations in order to reduce the instruction count. Sometimes we can use a rotation only with some extra masking, so that we take advantage of the fact that generating a bunch of extra higher-order 1 bits is easy using li/lis. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225147 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 03:41:38 +00:00
Hal Finkel	2ac0826af3	[PowerPC] Materialize i64 constants using rotation Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing a rotated constant, and then applying the inverse rotation, requires fewer instructions than the direct method. If so, do that instead. In r225132, I added support for forming constants using bit inversion. In effect, this reverts that commit and replaces it with rotation support. The bit inversion is useful for turning constants that are mostly ones into ones that are mostly zeros (thus enabling a more-efficient shift-based materialization), but the same effect can be obtained by using negative constants and a rotate, and that is at least as efficient, if not more. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225135 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-04 15:43:55 +00:00
Hal Finkel	d138a7bb3f	[PowerPC] Materialize i64 constants using bit inversion Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing the bit-reversed constant, and then flipping the bits, requires fewer instructions than the direct method. If so, do that instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225132 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-04 12:35:03 +00:00
Hal Finkel	e05b232c20	[PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference The existing code provided for specifying a global loop alignment preference. However, the preferred loop alignment might depend on the loop itself. For recent POWER cores, loops between 5 and 8 instructions should have 32-byte alignment (while the others are better with 16-byte alignment) so that the entire loop will fit in one i-cache line. To support this, getPrefLoopAlignment has been made virtual, and can be provided with an optional MachineLoop* so the target can inspect the loop before answering the query. The default behavior, as before, is to return the value set with setPrefLoopAlignment. MachineBlockPlacement now queries the target for each loop instead of only once per function. There should be no functional change for other targets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225117 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-03 17:58:24 +00:00
Hal Finkel	a1d22cc789	[PowerPC] Use 16-byte alignment for modern cores for functions/loops Most modern PowerPC cores prefer that functions and loops start on 16-byte-aligned boundaries (), so instruct block placement, etc. to make this happen. The branch selector has also been adjusted so account for the extra nops that might now be inserted before loop headers. () Some cores actually prefer other alignments for small loops, but that will be addressed in a follow-up commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225115 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-03 14:58:25 +00:00
Craig Topper	00d70e98f0	Minor cleanup to all the switches after MatchInstructionImpl in all the AsmParsers. Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225114 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-03 08:16:34 +00:00
Hal Finkel	958b670c34	[PowerPC] Add support for the CMPB instruction Newer POWER cores, and the A2, support the cmpb instruction. This instruction compares its operands, treating each of the 8 bytes in the GPRs separately, returning a 'mask' result of 0 (for false) or -1 (for true) in each byte. Code generation support is added, in the form of a PPCISelDAGToDAG DAG-preprocessing routine, that recognizes patterns close to what the instruction computes (either exactly, or related by a constant masking operation), and generates the cmpb instruction (along with any necessary constant masking operation). This can be expanded if use cases arise. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225106 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-03 01:16:37 +00:00
Hal Finkel	32399786f9	[PowerPC] use UINT64_C instead of ul Attempting to fix PR22078 (building on 32-bit systems) by replacing my careless use of 1ul to be a uint64_t constant with UINT64_C(1). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225066 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-01 19:33:59 +00:00
Hal Finkel	84cd524ee9	[PowerPC] Improve instruction selection bit-permuting operations (64-bit) This is the second installment of improvements to instruction selection for "bit permutation" instruction sequences. r224318 added logic for instruction selection for 32-bit bit permutation sequences, and this adds lowering for 64-bit sequences. The 64-bit sequences are more complicated than the 32-bit ones because: a) the 64-bit versions of the 32-bit rotate-and-mask instructions work by replicating the lower 32-bits of the value-to-be-rotated into the upper 32 bits -- and integrating this into the cost modeling for the various bit group operations is non-trivial b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions cannot, in one instruction, specify the mask starting index, the mask ending index, and the rotation factor. Also, forming arbitrary 64-bit constants is more complicated than in 32-bit mode because the number of instructions necessary is value dependent. Plus, support for 'late masking' was added: it is sometimes more efficient to treat the overall value as if it had no mandatory zero bits when planning the bit-group insertions, and then mask them in at the very end. Unfortunately, as the structure of the bit groups is different in the two cases, the more feasible implementation technique was to generate both instruction sequences, and then pick the shorter one. And finally, we now generate reasonable code for i64 bswap: rldicl 5, 3, 16, 0 rldicl 4, 3, 8, 0 rldicl 6, 3, 24, 0 rldimi 4, 5, 8, 48 rldicl 5, 3, 32, 0 rldimi 4, 6, 16, 40 rldicl 6, 3, 48, 0 rldimi 4, 5, 24, 32 rldicl 5, 3, 56, 0 rldimi 4, 6, 40, 16 rldimi 4, 5, 48, 8 rldimi 4, 3, 56, 0 vs. what we used to produce: li 4, 255 rldicl 5, 3, 24, 40 rldicl 6, 3, 40, 24 rldicl 7, 3, 56, 8 sldi 8, 3, 8 sldi 10, 3, 24 sldi 12, 3, 40 rldicl 0, 3, 8, 56 sldi 9, 4, 32 sldi 11, 4, 40 sldi 4, 4, 48 andi. 5, 5, 65280 andis. 6, 6, 255 andis. 7, 7, 65280 sldi 3, 3, 56 and 8, 8, 9 and 4, 12, 4 and 9, 10, 11 or 6, 7, 6 or 5, 5, 0 or 3, 3, 4 or 7, 9, 8 or 4, 6, 5 or 3, 3, 7 or 3, 3, 4 which is 12 instructions, instead of 25, and seems optimal (at least in terms of code size). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225056 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-01 02:53:29 +00:00
Rafael Espindola	8093abb745	Add r224985 back with a fix. The issues was that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. Original message: Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225048 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-31 17:19:34 +00:00
Rafael Espindola	937e781f49	Revert "Remove doesSectionRequireSymbols." This reverts commit r224985. I am investigating why it made an Apple bot unhappy. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225044 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-31 16:06:48 +00:00
Rafael Espindola	65300b95e6	Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224985 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-30 13:13:27 +00:00
Rafael Espindola	2a1c1c9dea	Refactor duplicated code. No intended functionality change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224935 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-29 15:18:31 +00:00
David Majnemer	bd64447bf3	PowerPC: CTR shouldn't fire if a TLS call is in the loop Determining the address of a TLS variable results in a function call in certain TLS models. This means that a simple ICmpInst might actually result in invalidating the CTR register. In such cases, do not attempt to rely on the CTR register for loop optimization purposes. This fixes PR22034. Differential Revision: http://reviews.llvm.org/D6786 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224890 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-27 19:45:38 +00:00
Hal Finkel	d7b2788e51	[PowerPC] [FastISel] i1 constants must be zero extended When materializing constant i1 values, they must be zero extended. We represent i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code path was dead for i1 values prior to r216006 (which is why this did not manifest in miscompiles until recently). Fixes -O0 self-hosting on PPC64/Linux. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224842 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-25 23:08:25 +00:00
Hal Finkel	c9e5247ea7	[PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64 On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction sequence is interpreted by the unwinding code in libgcc. To make sure these occur as a pair, as with other pairings interpreted by the linker, fuse the two instructions into one instruction (for code generation only). In the future, we might wish to do this by emitting CFI directives instead, but this solution is simpler, and mirrors what GCC does. Additional discussion on this point is contained in the PR. Fixes PR22015. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224788 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-23 22:29:40 +00:00
Hal Finkel	2bea947207	[PowerPC] Don't mark the return-address slot as immutable It is tempting to mark the fixed stack slot used to store the return address as immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the function, it is not completely immutable: it is written during the function prologue. When using post-RA instruction scheduling, the prologue instructions are available for scheduling, and we're not free to interchange the order of a particular store in the prologue with loads from that stack location. Fixes PR21976. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224761 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-23 09:45:06 +00:00
Hal Finkel	775294d183	[PowerPC] Don't attempt a 64-bit pow2 division on PPC32 In r224033, in moving the signed power-of-2 division expansion into BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a 64-bit division on PPC32. This later asserts. Fixes PR21928. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224758 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-23 08:38:50 +00:00
Craig Topper	633948975c	[PowerPC] Use MCPhysReg for tables of registers. Const-correct the tables. Only put the anonymous namespace around classes. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224498 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-18 05:02:14 +00:00
Will Schmidt	0a91afabd5	Enable the P8Model entry This was missed last time around, for the P8 Instruction Scheduling changes (223257). This will hook the P8Model entry in so those changes will actually be used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224452 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-17 19:56:29 +00:00
Hal Finkel	edfeeb6d70	[PowerPC] Improve instruction selection bit-permuting operations (32-bit) The PowerPC backend, somewhat embarrassingly, did not generate an optimal-length sequence of instructions for a 32-bit bswap. While adding a pattern for the bswap intrinsic to fix this would not have been terribly difficult, doing so would not have addressed the real problem: we had been generating poor code for many bit-permuting operations (by which I mean things like byte-swap that permute the bits of one or more inputs around in various ways). Here are some initial steps toward solving this deficiency. Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL, SHL, SRL, AND and OR (mostly with constant second operands). Looking back through these operations, we can build up a description of the bits in the resulting value in terms of bits of one or more input values (and constant zeros). For each bit, we compute the rotation amount from the original value, and then group consecutive (value, rotation factor) bits into groups. Groups sharing these attributes are then collected and sorted, and we can then instruction select the entire permutation using a combination of masked rotations (rlwinm), imm ands (andi/andis), and masked rotation inserts (rlwimi). The result is that instead of lowering an i32 bswap as: rlwinm 5, 3, 24, 16, 23 rlwinm 4, 3, 24, 0, 7 rlwimi 4, 3, 8, 8, 15 rlwimi 5, 3, 8, 24, 31 rlwimi 4, 5, 0, 16, 31 we now produce: rlwinm 4, 3, 8, 0, 31 rlwimi 4, 3, 24, 16, 23 rlwimi 4, 3, 24, 0, 7 and for the 'test6' example in the PowerPC/README.txt file: unsigned test6(unsigned x) { return ((x & 0x00FF0000) >> 16) \| ((x & 0x000000FF) << 16); } we used to produce: lis 4, 255 rlwinm 3, 3, 16, 0, 31 ori 4, 4, 255 and 3, 3, 4 and now we produce: rlwinm 4, 3, 16, 24, 31 rlwimi 4, 3, 16, 8, 15 and, as a nice bonus, this fixes the FIXME in test/CodeGen/PowerPC/rlwimi-and.ll. This commit does not include instruction-selection for i64 operations, those will come later. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224318 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-16 05:51:41 +00:00
Hal Finkel	0b19b561e0	[PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted truncations and extensions when dealing with nodes of the form: zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...) There was a FIXME in the implementation (now removed) regarding the fact that the function would abort the transformations if any of the non-output operands of a SELECT or SELECT_CC node would need to be promoted (because they were also output operands, for example). As a result, we continued to generate unnecessary zero-extends for code such as this: unsigned foo(unsigned a, unsigned b) { return (a <= b) ? a : b; } which would produce: cmplw 0, 3, 4 isel 3, 4, 3, 1 rldicl 3, 3, 0, 32 blr and now we produce: cmplw 0, 3, 4 isel 3, 4, 3, 1 blr which is better in the obvious way. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224213 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-14 05:53:19 +00:00
Hal Finkel	4e703f82f2	[PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all of the usual places, but also from the ABI, which specifies that values passed are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined to do something to the higher-order bits, and for some instructions, that action clears those bits (thus providing a zero-extended result). This is especially common after rotate-and-mask instructions. Adding an additional instruction to zero-extend the results of these instructions is unnecessary. This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and looks back through their operands to see if all instructions will implicitly zero extend their results. If so, we convert these instructions to their 64-bit variants (which is an internal change only, the actual encoding of these instructions is the same as the original 32-bit ones) and remove the unnecessary zero-extension (changing where the INSERT_SUBREG instructions are to make everything internally consistent). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224169 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-12 23:59:36 +00:00
Hal Finkel	f329765d23	[PowerPC] Better lowering for add/or of a FrameIndex If we have an add (or an or that is really an add), where one operand is a FrameIndex and the other operand is a small constant, we can combine the lowering of the FrameIndex (which is lowered as an add of the FI and a zero offset) with the constant operand. Amusingly, this is an old potential improvement entry from lib/Target/PowerPC/README.txt which had never been resolved. In short, we used to lower: %X = alloca { i32, i32 } %Y = getelementptr {i32,i32}* %X, i32 0, i32 1 ret i32* %Y as: addi 3, 1, -8 ori 3, 3, 4 blr and now we produce: addi 3, 1, -4 blr which is much more sensible. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224071 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-11 22:51:06 +00:00

1 2 3 4 5 ...

4174 Commits