archived-llvm

mirror of https://github.com/RPCSX/llvm.git synced 2026-01-31 01:05:23 +01:00

Author	SHA1	Message	Date
Nirav Dave	5fc240a5b6	Recommitting Craig Topper's patch now that r296476 has been recommitted. When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297698 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-14 01:42:23 +00:00
Nirav Dave	3bbf394145	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297695 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-14 00:34:14 +00:00
Amaury Sechet	ff2afbf7d8	Use setBits in SelectionDAG Summary: As per title. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30836 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297559 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-11 11:24:03 +00:00
Simon Pilgrim	49e0bf23d7	[SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297492 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-10 18:36:46 +00:00
Amaury Sechet	5436744eb4	[SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC. Summary: Depends on D30379 This improves the state of things for the sub class of operation. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30436 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297482 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-10 17:26:44 +00:00
Amaury Sechet	a29d680671	[SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO. Summary: As per title. This is extracted from D29872 and I threw SADDO in. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30379 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297479 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-10 17:06:52 +00:00
Simon Pilgrim	943d3e07f8	[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297458 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-10 13:44:32 +00:00
Amaury Sechet	2cab1ec06e	[DAGCombiner] Do various combine on uaddo. Summary: This essentially does the same transform as for ADC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30417 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297416 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-09 22:47:00 +00:00
Amaury Sechet	f15775bffb	[DAGCombiner] Do various combine on usubo. Summary: This essentially does the same transform as for SUBC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30437 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297404 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-09 19:28:00 +00:00
Sanjay Patel	4f7ea50478	[DAG] recognize div/rem by 0 as undef before trying constant folding As discussed in the review thread for rL297026, this is actually 2 changes that would independently fix all of the test cases in the patch: 1. Return undef in FoldConstantArithmetic for div/rem by 0. 2. Move basic undef simplifications for div/rem (simplifyDivRem()) before foldBinopIntoSelect() as a matter of efficiency. I will handle the case of vectors with any zero element as a follow-up. That change is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic(). I'm deleting the test for PR30693 because it does not test for the actual bug any more (dangers of using bugpoint). Differential Revision: https://reviews.llvm.org/D30741 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297384 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-09 15:02:25 +00:00
Matt Arsenault	6d62c71357	DAG: Check no signed zeros instead of unsafe math attribute git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297354 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-09 01:36:39 +00:00
Eli Friedman	9ac562c933	[DAGCombine] Simplify ISD::AND in GetDemandedBits. This helps in cases involving bitfields where an AND is exposed by legalization. Differential Revision: https://reviews.llvm.org/D30472 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297249 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-08 00:56:35 +00:00
Sanjay Patel	5e80a82489	[DAG] refactor related div/rem folds; NFCI This is known incomplete and not called in the right order relative to other folds, but that's the current behavior. I'm just trying to clean this up before making actual functional changes to make the patch smaller. The logic here should mimic the IR equivalents that are in InstSimplify's simplifyDivRem(). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297086 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-06 22:32:40 +00:00
Sanjay Patel	58868f1c75	[DAGCombiner] simplify div/rem-by-0 Refactoring of duplicated code and more fixes to follow. This is motivated by the post-commit comments for r296699: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html Ie, we can crash if we're missing obvious simplifications like this that exist in the IR simplifier or if these occur later than expected. The x86 change for non-splat division shows a potential opportunity to improve vector codegen: we assumed that since only one lane had meaningful results, we should do the math in scalar. But that means moving back and forth from vector registers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297026 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-06 16:36:42 +00:00
Sanjay Patel	5caa12b8d3	[DAG] fix formatting; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297015 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-06 15:27:57 +00:00
Sanjay Patel	6318cbb1fd	[DAG] fix typo in comment; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297011 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-06 15:07:43 +00:00
Simon Pilgrim	08c7984cee	[SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions Found by fuzz testing after rL296985 landed git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296989 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-05 15:52:18 +00:00
Simon Pilgrim	ca6750e3d5	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296985 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-05 09:57:20 +00:00
Craig Topper	1ecf0813db	[DAGCombine] Use APInt::operator\|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator\| support rvalue references similar to add/sub. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296981 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-05 01:08:16 +00:00
Sanjay Patel	59071e49a9	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296977 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-04 19:18:09 +00:00
Florian Hahn	a4e8716fd4	[legalize-types] Remove stale entries from SoftenedFloats. Summary: When replacing a SDValue, we should remove the replaced value from SoftenedFloats (and possibly the other maps as well?). When we revisit a Node because it needs analyzing again, we have to remove all result values from SoftenedFloats (and possibly other maps?). This fixes the fp128 test failures with expensive checks for X86. I think we probably should also remove the values from the other maps (PromotedIntegers and so on), let me know what you think. Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon Reviewed By: chh Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D29265 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296964 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-04 12:00:35 +00:00
Simon Pilgrim	74b3a7ad26	Use APInt::setBits instead of OR'ing in a separate APInt::getBitsSet call git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296886 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-03 17:03:52 +00:00
Simon Pilgrim	b526446628	Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation Avoids all the unnecessary extra bitrange creation/shift stages. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296879 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-03 16:35:57 +00:00
Simon Pilgrim	d3f4ec4842	Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation Avoids all the unnecessary extra bitrange creation/shift stages. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296871 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-03 14:25:46 +00:00
Chandler Carruth	f970832c3b	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296862 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-03 10:02:25 +00:00
Taewook Oh	15497c13fd	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296825 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 21:58:35 +00:00
Sanjay Patel	0eec3b0c78	[DAG] early exit to improve readability and formatting of visitMemCmpCall(); NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296824 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 21:56:43 +00:00
Sanjay Patel	e5601be82e	[DAG] improve documentation comments; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296808 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 20:48:08 +00:00
Sanjay Patel	d541a8113c	[DAGCombiner] avoid assertion when folding binops with opaque constants This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296768 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 17:18:56 +00:00
Sanjay Patel	131b639126	fix typo in comment; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296760 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 16:37:24 +00:00
Amaury Sechet	a50f957a61	[DAGCombiner] mulhi + 1 never overflow. Summary: This can be used to optimize large multiplications after legalization. Depends on D29565 Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29587 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296711 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 23:44:17 +00:00
Sanjay Patel	7685de201c	[DAGCombiner] fold binops with constant into select-of-constants This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296699 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 22:51:31 +00:00
Benjamin Kramer	abd2baf207	[DAGCombiner] Remove non-ascii character and reflow comment. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296690 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 22:10:43 +00:00
Reid Kleckner	4c3428b604	Elide argument copies during instruction selection Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296683 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 21:42:00 +00:00
Nirav Dave	e4dc80fc4b	[DAG] Prevent Stale nodes from entering worklist Add check that deleted nodes do not get added to worklist. This can occur when a node's operand is simplified to an existing node. This fixes PR32108. Reviewers: jyknight, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30506 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296668 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 20:19:38 +00:00
Artur Pilipenko	a4e52312e8	[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in load combine Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a\|s}ext, {a\|z\|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296651 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 18:12:29 +00:00
Ahmed Bougacha	fb5e79a577	[CodeGen] Remove dead FastISel code after SDAG emitted a tailcall. When SDAGISel (top-down) selects a tail-call, it skips the remainder of the block. If, before that, FastISel (bottom-up) selected some of the (no-op) next few instructions, we can end up with dead instructions following the terminator (selected by SDAGISel). We need to erase them, as we know they aren't necessary (in addition to being incorrect). We already do this when FastISel falls back on the tail-call itself. Also remove the FastISel-emitted code if we fallback on the instructions between the tail-call and the return. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296552 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 00:43:42 +00:00
Sanjay Patel	0e267e802a	[DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296502 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-28 18:41:49 +00:00
Craig Topper	79e90c5570	[DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node. This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296486 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-28 16:52:05 +00:00
Nirav Dave	bfdb3f2a5a	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296476 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-28 14:24:15 +00:00
Eugene Zelenko	90d9920fc9	[CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296404 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 22:45:06 +00:00
Arnold Schwaighofer	e4e218c802	ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock Otherwise, it will insert instructions before it. rdar://30536186 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296395 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 22:12:06 +00:00
Matt Arsenault	68c622048b	Revert "DAG: Check if extract_vector_elt is legal or custom" This reverts r295782. This could potentially result in some legalization loops and I avoided the need for this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296393 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 21:59:07 +00:00
Simon Pilgrim	e9bce87a93	[X86][SSE] Attempt to extract vector elements through target shuffles DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it. Differential Revision: https://reviews.llvm.org/D30176 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296381 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 21:01:57 +00:00
Artur Pilipenko	85f508cd9d	[DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets This pattern is essentially a i16 load from p+1 address: %p1.i16 = bitcast i8* %p to i16* %p2.i8 = getelementptr i8, i8* %p, i64 2 %v1 = load i16, i16* %p1.i16 %v2.i8 = load i8, i8* %p2.i8 %v2 = zext i8 %v2.i8 to i16 %v1.shl = shl i16 %v1, 8 %res = or i16 %v1.shl, %v2 Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296336 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 13:04:23 +00:00
Artur Pilipenko	f85432589b	[DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296332 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 11:42:54 +00:00
Artur Pilipenko	0003ac947f	[DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the load node This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296331 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 11:40:14 +00:00
Nirav Dave	b89cc7e5e3	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296279 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-26 01:27:32 +00:00
Artyom Skrobov	cc3dbb4073	No need to copy the variable [NFC] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296259 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-25 17:18:09 +00:00
Nirav Dave	32147cef64	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296252 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-25 11:43:58 +00:00

1 2 3 4 5 ...

8114 Commits