Backends don't support this yet. They would have to move the value into the
swifterror register before the tail call to make sure it is live-in to the call.
rdar://30495920
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294982 91177308-0d34-0410-b5e6-96231b3b80d8
I'd missed a creator of FCMP nodes: duplicateCmp().
Kindly and promptly reported by Gabor Ballabas via his CSiBE test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294968 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The attached test case fails with "fatal error: error in backend:
misaligned pc-relative fixup value" as the jump table is misaligned.
The EmitAlignment call already existed for ARM and Thumb-1 code, but was
missing for Thumb-2.
The test checks that the fatal error disappears when generating an obj
file, as well as checking that the align directive is present when producing
an asm file.
Reviewers: rengolin, grosbach, t.p.northover, jmolloy, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D29650
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294950 91177308-0d34-0410-b5e6-96231b3b80d8
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the side effect of setting the cumulative Invalid
bit in FPSCR if any of the operands is a QNaN.
It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:
The relational and equality operators support the usual mathematical
relationships between numeric values. For any ordered pair of numeric
values exactly one of the relationships (less, greater, equal) is true.
Relational operators may raise the "invalid" floating-point exception when argument
values are NaNs.
The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).
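For illustration only, a minimal, self-contained test of these semantics (the
program is ours, not part of the change; whether the exception is observable
depends on the target and on the FP-exception compiler settings):

  #include <cfenv>
  #include <cmath>
  #include <cstdio>

  int main() {
    volatile double qnan = std::nan("");   // quiet NaN

    std::feclearexcept(FE_INVALID);
    (void)(qnan == 1.0);                   // equality: should not raise Invalid
    std::printf("==: Invalid=%d\n", std::fetestexcept(FE_INVALID) != 0);

    std::feclearexcept(FE_INVALID);
    (void)(qnan < 1.0);                    // relational: may raise Invalid
    std::printf("< : Invalid=%d\n", std::fetestexcept(FE_INVALID) != 0);
    return 0;
  }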
Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294945 91177308-0d34-0410-b5e6-96231b3b80d8
In the encoding of system registers in the M-class MSR instruction, the mask bits
should be 2 for registers that don't take a _<bits> qualifier (the instruction
is unpredictable otherwise), and should also be 2 if the register takes a
_<bits> qualifier but it is not present, since omitting _<bits> is an alias for _nzcvq.
Differential Revision: https://reviews.llvm.org/D29828
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294762 91177308-0d34-0410-b5e6-96231b3b80d8
GCC supports the target armv7ve, which is armv7-a with the virtualization
extensions. This change adds support for it in LLVM for GCC
compatibility.
Also remove the redundant FeatureHWDiv and FeatureHWDivARM from a few models,
as these are implied automatically by FeatureVirtualization.
Patch by Manoj Gupta.
Differential Revision: https://reviews.llvm.org/D29472
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294661 91177308-0d34-0410-b5e6-96231b3b80d8
If some of the trailing or leading bytes of a load combine pattern are zeroes, we can combine the pattern into a load + zext and shift. Currently we don't support this, so the tests check the current codegen without load combine. This change will make the patch that adds support for this kind of combine a bit clearer.
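A hedged example of such a pattern (the function is ours; little-endian layout
assumed): only the low two bytes are populated, so the OR of narrow loads can
become a 16-bit load that is zero-extended (and shifted, in the leading-zero
variant):

  unsigned load_low16(const unsigned char *a) {
    return (unsigned)a[0] | ((unsigned)a[1] << 8);  // upper 16 bits are zero
  }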
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294591 91177308-0d34-0410-b5e6-96231b3b80d8
This applies to both aapcscc and aapcs_vfpcc. We currently filter out soft-float
targets because we don't support libcalls yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294584 91177308-0d34-0410-b5e6-96231b3b80d8
Functions that have a dynamic alloca require a base register which is defined to
be X19 on AArch64 and r6 on ARM. We have defined the swifterror register to be
the same register. Use a different callee-saved register for swifterror instead:
X21 on AArch64
R8 on ARM
rdar://30433803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294551 91177308-0d34-0410-b5e6-96231b3b80d8
We mark X0 as preserved by a call that passes the returned parameter.
x0 = ...
fun(x0) // no implicit def of x0
This is no longer valid if we pass the parameter in a different register than
the returned value, as is the case with a swiftself parameter (passed in x20).
x20 = ...
fun(x20) // there should be an implicit def of x0
rdar://30425845
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294527 91177308-0d34-0410-b5e6-96231b3b80d8
I forgot to remove the neonfp target feature from the test, which means we'd
have trouble selecting VADDS on targets that have neonfp enabled by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294451 91177308-0d34-0410-b5e6-96231b3b80d8
Add a register bank for floating point values and select simple instructions
using them (add, copies from GPR).
This assumes that the hardware can cope with a single precision add (VADDS)
instruction, so the legalizer will treat G_FADD as legal and the instruction
selector will refuse to select if the hardware doesn't support it. In the future
we'll want to be more careful about this, and legalize to libcalls if we have to
use soft float.
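A minimal source-level sketch (the function is ours, assuming hardware
single-precision FP): the add below becomes a G_FADD, which the new FPR bank
maps onto VADDS during instruction selection:

  float fadd(float a, float b) { return a + b; }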
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294442 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we don't support these nodes, so the tests check the current codegen without load combine. This change makes the review of the change that adds support for these nodes clearer.
Separated from https://reviews.llvm.org/D29591 review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294305 91177308-0d34-0410-b5e6-96231b3b80d8
When constructing global address literals while targeting the RWPI
relocation model, LLVM currently only uses literal pools. If MOVW/MOVT
instructions are available we can use these instead. Besides being more
efficient, this allows -arm-execute-only to work with
-relocation-model=RWPI as well.
When we generate MOVW/MOVT for global addresses when targeting the RWPI
relocation model, we need to use base relative relocations. This patch
does the needed plumbing in MC to generate these for MOVW/MOVT.
Differential Revision: https://reviews.llvm.org/D29487
Change-Id: I446786e43a6f5aa9b6a5bb2cd216d60d41c7755d
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294298 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know whether LR will be spilled or not. If LR is spilled
to the stack, then we cannot do a tail call optimisation, as that would
involve popping back into LR, which is not possible in Thumb1 code.
Reviewers: rengolin, jmolloy, rovka, olista01
Reviewed By: olista01
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D29020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294000 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
llc would hit a fatal error for errors in inline assembly. The
diagnostic message is now printed.
Reviewers: rengolin, MatzeB, javed.absar, anemet
Reviewed By: anemet
Subscribers: jyknight, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D29408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293999 91177308-0d34-0410-b5e6-96231b3b80d8
This is the second in a series of patches to make adding
machine sched models for ARM processors easier and more compact.
This patch focuses on integer instructions and adds missing
sched definitions.
Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293935 91177308-0d34-0410-b5e6-96231b3b80d8
Recommitting after fixing the X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.
When merging stores, we search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations which
require more expensive constant generation).
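As a hedged source-level sketch of the kind of pattern the improved merge
search can now combine (names and constants are ours; little-endian assumed):

  void store4(unsigned char *p) {
    p[0] = 0x01;
    p[1] = 0x02;
    p[2] = 0x03;
    p[3] = 0x04;  // candidates for one merged 32-bit store of 0x04030201
  }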
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes as these are captured by data dependence
6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293893 91177308-0d34-0410-b5e6-96231b3b80d8
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293889 91177308-0d34-0410-b5e6-96231b3b80d8
Make it legal to load pointer values. Also check that pointers are assigned
to the GPR reg bank by default.
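A minimal illustrative function (ours): the load of a pointer value below is
now legal, and the loaded pointer is assigned to the GPR bank:

  int *load_ptr(int **pp) { return *pp; }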
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293886 91177308-0d34-0410-b5e6-96231b3b80d8
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.
Committed on behalf of Sanne Wouda.
Reviewers: rengolin, olista01
Differential Revision: https://reviews.llvm.org/D29073
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293761 91177308-0d34-0410-b5e6-96231b3b80d8
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability of branching
in the other direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.
Differential Revision: https://reviews.llvm.org/D28583
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293716 91177308-0d34-0410-b5e6-96231b3b80d8
The Requires class overrides the target requirements of an instruction,
rather than adding to them, so all ARM instructions need to include the
IsARM predicate when they have overridden requirements.
This caused the swp and swpb instructions to be allowed in thumb mode
assembly, and the ARM encoding of CDP to be selected in codegen (which
is different for conditional instructions).
Differential Revision: https://reviews.llvm.org/D29283
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293634 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293499 91177308-0d34-0410-b5e6-96231b3b80d8
Support lowering AEABI TLS access (__aeabi_read_tp) with long calls.
This requires adjusting the call sequence to use an indirect call to get
full addressability.
Resolves PR31769!
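A hedged illustration (the variable and function are ours): a TLS access that
lowers to an __aeabi_read_tp call, which with long calls enabled is now
reached through an indirect call:

  __thread int tls_counter;
  int bump() { return ++tls_counter; }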
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293433 91177308-0d34-0410-b5e6-96231b3b80d8
The interleaved access pass is an IR-to-IR transformation that runs before code
generation. It matches interleaved memory operations to target-specific
intrinsics (that are later lowered to load and store multiple instructions on
ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under
test/Transforms. This patch moves the InterleavedAccessPass tests out of
test/CodeGen and into target-specific directories under
test/Transforms/InterleavedAccess.
Although the pass is an IR pass, many of the existing tests were llc tests
rather than opt tests. For example, the tests would check for ldN/stN instructions
generated by llc rather than the intrinsic calls the pass actually inserts.
Thus, this patch updates all tests to be opt tests that check for the inserted
intrinsics. We already have separate CodeGen tests that ensure we lower the
interleaved access intrinsics to their corresponding ldN/stN instructions. In
addition to migrating the tests to opt, this patch also performs some minor
clean-up (to ensure consistent naming, etc.).
Differential Revision: https://reviews.llvm.org/D29184
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293309 91177308-0d34-0410-b5e6-96231b3b80d8
The Windows on ARM target uses custom division for normal division as
the backend needs to insert division-by-zero checks. However, it is
designed to only handle non-vectorized division. ARM has custom
lowering for vectorized division, as that avoids loading registers
with the values and invoking a division routine for each one, preferring
to lower using NEON instructions. Fall back to the custom lowering for
the NEON instructions if we encounter a vectorized division.
Resolves PR31778!
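A hedged example using the GCC/Clang vector extension (types and names are
ours): a vectorized division that should take the NEON-oriented custom
lowering rather than the Windows-on-ARM divide-by-zero-checking runtime path:

  typedef int v4si __attribute__((vector_size(16)));
  v4si vdiv(v4si a, v4si b) { return a / b; }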
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293259 91177308-0d34-0410-b5e6-96231b3b80d8
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.
When merging stores, we search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations which
require more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes as these are captured by data dependence
6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as
in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293184 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.
When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.
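A hedged source-level example (the function is ours): under AAPCS the first
four arguments go in registers, so the last one below is loaded from the
stack, and it carries the zeroext flag when the frontend adds it:

  int take_byte(int a, int b, int c, int d, unsigned char e) { return e; }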
Differential Revision: https://reviews.llvm.org/D27803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293163 91177308-0d34-0410-b5e6-96231b3b80d8
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293088 91177308-0d34-0410-b5e6-96231b3b80d8
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html
This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows us to stop the traversal earlier if we know that the result won't be used.
From the original commit:
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the target supports it.
Assuming little endian target:
i8 *a = ...
i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
i32 val = *((i32)a)
i8 *a = ...
i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
i32 val = BSWAP(*((i32)a))
This optimization was discussed on llvm-dev some time ago in the "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in the presence of atomic loads, load widening is an irreversible transformation and it might hinder other optimizations.
Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)
Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.
The general scheme is to match OR expressions by recursively calculating the origin of the individual bytes which constitute the resulting OR value. If all the OR bytes come from memory, verify that they are adjacent and match the little- or big-endian encoding of a wider value. If so, and if the load of the wider type (and bswap if needed) is allowed by the target, generate a load and a bswap if needed.
Reviewed By: RKSimon, filcab, chandlerc
Differential Revision: https://reviews.llvm.org/D27861
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293036 91177308-0d34-0410-b5e6-96231b3b80d8
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.
Note that this does not include support for other extensions (i8 to i16); those
will be added later.
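A hedged example (the function is ours): Clang gives this return type a
signext i8 attribute in the IR, so the return lowering now emits the required
sign extension:

  signed char negate(signed char x) { return (signed char)(-x); }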
Differential Revision: https://reviews.llvm.org/D27705
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293034 91177308-0d34-0410-b5e6-96231b3b80d8
This is a series of patches to make adding machine sched
models for ARM processors easier and more compact. They define new
sched-readwrites for groups of ARM instructions. This has been
missing so far, and as a consequence, machine scheduler models
for individual sub-targets have tended to be larger than they
needed to be.
The current patch focuses on floating-point instructions.
Reviewers: Diana Picus (rovka), Renato Golin (rengolin)
Differential Revision: https://reviews.llvm.org/D28194
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292825 91177308-0d34-0410-b5e6-96231b3b80d8
We also want to optimise tests like this: return a*b == 0. The MULS
instruction is flag setting, so we don't need the CMP instruction but can
instead branch on the result of the MULS. The generated instruction sequence
for this example was: MULS, MOVS, MOVS, CMP. The MOVS instructions load the
boolean values resulting from the select instruction, but these MOVS
instructions are flag setting and were thus preventing this optimisation. Now
we first reorder and move the MULS to before the CMP and generate the sequence
MOVS, MOVS, MULS, CMP so that the optimisation can trigger. Reordering the
MULS and MOVS is safe because the subsequent MOVS instructions just set
the CPSR register and don't use it, i.e. the CPSR is dead.
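The source pattern from the message as a compilable function (the name is
ours): with the reordering, the flag-setting MULS makes the separate CMP
against zero unnecessary:

  bool isProductZero(int a, int b) { return a * b == 0; }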
Differential Revision: https://reviews.llvm.org/D27990
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292608 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Emission of the XRay table was accidentally disabled for Arm32, but this bug was not detected earlier because testing of XRay had (also by mistake) been disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292516 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r292210, as it broke the Thumb buildbot with:
clang-5.0: error: the clang compiler does not support '-fxray-instrument
on thumbv7-unknown-linux-gnueabihf'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292357 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Emission of the XRay table was accidentally disabled for Arm32, but this bug was not detected earlier because testing of XRay had (also by mistake) been disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292210 91177308-0d34-0410-b5e6-96231b3b80d8
We were relying on constant folding of the legalized instructions to do what constant folding we had previously.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292114 91177308-0d34-0410-b5e6-96231b3b80d8
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability of branching
in the other direction. For small blocks that can be duplicated we now skip that requirement
as well.
Differential Revision: https://reviews.llvm.org/D27742
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291609 91177308-0d34-0410-b5e6-96231b3b80d8