archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Serguei Katkov	a01b42e49a	[CGP] Fix the rematerialization of gc.relocates If we want to substitute the relocation of derived pointer with gep of base then we must ensure that relocation of base dominates the relocation of derived pointer. Currently only check for basic block is present. However it is possible that both relocation are in the same basic block but relocation of derived pointer is defined earlier. The patch moves the relocation of base pointer right before relocation of derived pointer in this case. Reviewers: sanjoy,artagnon,igor-laevsky,reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36462 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311067 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-17 05:48:30 +00:00
Sanjay Patel	5250bac12f	[CGP] use narrower types in memcmp expansion when possible This only affects very small memcmp on x86 for now, but it will become more important if we allow vector-sized load and compares. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309711 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-01 17:24:54 +00:00
Sanjay Patel	8209d78723	[CGP] use subtract or subtract-of-cmps for result of memcmp expansion As noted in the code comment, transforming this in the other direction might require a separate transform here in CGP given the block-at-a-time DAG constraint. Besides that theoretical motivation, there are 2 practical motivations for the subtract-of-cmps form: 1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). There is discussion about canonicalizing IR to the select form ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), so we probably need to add DAG transforms for those patterns anyway, but this improves the memcmp output without waiting for that step. 2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided if we choose to enable that. Differential Revision: https://reviews.llvm.org/D34904 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309597 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-31 18:08:24 +00:00
Simon Pilgrim	902fe6e6a0	[X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914) D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914). Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os). This patch should be considered for the 5.0.0 release branch as well Differential Revision: https://reviews.llvm.org/D35830 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308986 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-25 17:04:37 +00:00
Serguei Katkov	8d9168d095	[CGP] Allow cycles during Phi traversal in OptimizeMemoryInst Allowing cycles in Phi traversal increases the scope of optimize memory instruction in case we are in a loop. The added test shows an example of enabling optimization inside a loop. Reviewers: loladiro, spatel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35294 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308419 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-19 04:49:17 +00:00
Simon Pilgrim	e029500a63	[x86, CGP] increase memcmp() expansion up to 4 load pairs It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads. Notes: Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz. There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare. We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329#c12 TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion. Committed on behalf of Sanjay Patel Differential Revision: https://reviews.llvm.org/D35067 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308322 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-18 15:55:30 +00:00
Eli Friedman	dd70def46c	[CodeGenPrepare] Don't create dead instructions in addrmode sinking When we fail to sink an instruction, we must make sure not to modify the function; otherwise, we end up in an infinite loop because CodeGenPrepare iterates until it doesn't make any changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 . git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307866 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-12 23:30:02 +00:00
George Burgess IV	7c497afb63	Add a test for r307754 As promised in D35003. Uses -codegenprepare instead of -instcombine since we hit the same buggy path anyway, and CGP lets us keep this test really simple (instcombine likes turning the alloca T, N into alloca [N x T], which hides the bug this is testing for). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307811 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-12 16:30:37 +00:00
Sanjay Patel	f65d8b9174	[CGP, x86] update test checks; NFC This was auto-generated using an older version of the script, and that version does not work with phis, so if we enable expansion it will go bad. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307267 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-06 15:31:38 +00:00
Keno Fischer	35d7a2e860	[CodeGenPrepare] Don't create inttoptr for ni ptrs Summary: Arguably non-integral pointers probably shouldn't show up here at all, but since the backend doesn't complain and this takes valid (according to the Verifier) IR and makes it invalid, make sure not to introduce any inttoptr instructions if we're dealing with non-integral pointers. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D33110 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306737 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-29 20:28:59 +00:00
Sanjay Patel	dbbccbae97	[CGP] add specialization for memcmp expansion with only one basic block git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306485 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-27 23:15:01 +00:00
Sanjay Patel	ca9df19568	[CGP] eliminate a sub instruction in memcmp expansion As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306471 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-27 21:46:34 +00:00
Sanjay Patel	2c60ba8943	[x86] set the datalayout to match the RUN line triple; NFC I don't think there's any visible difference from having the wrong layout for the 32-bit case at this point, but that could change in the future. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305931 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-21 17:06:24 +00:00
Sanjay Patel	aab686b3f7	[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305802 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-20 15:58:30 +00:00
Sanjay Patel	5c9336eb2b	[CGP, x86] add tests for potential memcmp expansion; NFC No IR tests were added with rL304313 ( https://reviews.llvm.org/D28637 ), so I want these for extra coverage if we enable memcmp expansion for x86. As shown, nothing is expanded for x86 in CGP yet. Also fundamentally, we're doing an IR transform, so we should have IR tests for just that part. If something goes wrong, we need to know if the bug is in CGP or later lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305011 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-08 20:40:39 +00:00
Teresa Johnson	7b29966a93	Restrict call metadata based hotness detection to Sample PGO mode Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302844 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-11 23:18:05 +00:00
Teresa Johnson	bb89593724	Fix code section prefix for proper layout Summary: r284533 added hot and cold section prefixes based on profile information, to enable grouping of hot/cold functions at link time. However, it used "cold" as the prefix for cold sections, but gold only recognizes "unlikely" (which is used by gcc for cold sections). Therefore, cold sections were not properly being grouped. Switch to using "unlikely" Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32983 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302502 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-09 01:43:24 +00:00
Brendon Cahoon	80269962da	[CodeGenPrepare] Fix crash due to an invalid CFG The splitIndirectCriticalEdges function generates an invalid CFG when the 'Target' basic block is a loop to itself. When this occurs, the code that updates the predecessor terminator needs to update the terminator in the split basic block. This occurs when there is an edge from block D back to D. Since D is split into D0 and D1, the code needs to update the terminator in D1. But D1 is not in the OtherPreds vector, so it was not getting updated. Differential Revision: https://reviews.llvm.org/D32126 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300480 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-17 19:11:04 +00:00
Matt Arsenault	bdbe8280f2	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299876 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-10 20:18:21 +00:00
Eli Friedman	f25acacbe6	Turn on -addr-sink-using-gep by default. The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here. Differential Revision: https://reviews.llvm.org/D30596 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299723 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-06 22:42:18 +00:00
Nikolai Bozhenov	650bf3e599	[BypassSlowDivision] Do not bypass division of hash-like values Disable bypassing if one of the operands looks like a hash value. Slow division often occurs in hashtable implementations and fast division is never taken there because a hash value is extremely unlikely to have enough upper bits set to zero. A value is considered to be hash-like if it is produced by 1) XOR operation 2) Multiplication by a constant wider than the shorter type 3) PHI node with all incoming values being hash-like Differential Revision: https://reviews.llvm.org/D28200 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299329 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-02 13:14:30 +00:00
Dehao Chen	261eb1f850	Use isFunctionHotInCallGraph to set the function section prefix. Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31225 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298656 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-23 23:14:11 +00:00
Matt Arsenault	d706d030af	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298444 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-21 21:39:51 +00:00
George Burgess IV	3479ed63a6	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298430 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-21 20:08:59 +00:00
Sanjay Patel	33896dc120	[SimplifyCFG] move tests for PR31028 from CGP Hopefully, this will make sense with a forthcoming patch. If not, we can move these back. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297660 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-13 19:59:14 +00:00
Sanjay Patel	498a13fdd1	[CGP] add tests for PR31028; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297629 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-13 15:45:37 +00:00
Nikolai Bozhenov	ba60b2bba2	[BypassSlowDivision] Use ValueTracking to simplify run-time checks ValueTracking is used for more thorough analysis of operands. Based on the analysis, either run-time checks can be simplified (e.g. check only one operand instead of two) or the transformation can be avoided. For example, it is quite often the case that a divisor is promoted from a shorter type and run-time checks for it are redundant. With additional compile-time analysis of values, two special cases naturally arise and are addressed by the patch: 1) Both operands are known to be short enough. Then, the long division can be simply replaced with a short one without CFG modification. 2) If a division is unsigned and the dividend is known to be short then the long division is not needed at all. Because if the divisor is too big for short division then the quotient is obviously zero (and the remainder is equal to the dividend). Actually, the division is not needed when (divisor > dividend). Differential Revision: https://reviews.llvm.org/D29897 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296832 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 22:12:15 +00:00
Michael Kuperstein	1872f69aec	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296416 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-28 00:11:34 +00:00
Daniel Jasper	e5e8f2aec1	Revert "[CGP] Split some critical edges coming out of indirect branches" This reverts commit r296149 as it leads to crashes when compiling for PPC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296295 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-26 11:09:12 +00:00
Eli Friedman	0ee8e563c9	[CodeGenPrepare] Make -addr-sink-using-gep work with address spaces. When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing mode. Differential Revision: https://reviews.llvm.org/D30114 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296167 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 20:51:36 +00:00
Michael Kuperstein	98ee128c8e	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296149 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 18:41:32 +00:00
Michael Kuperstein	969577f54d	Revert r296060 to pacify bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296064 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 01:22:19 +00:00
Michael Kuperstein	8981fc9888	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296060 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 00:56:21 +00:00
George Burgess IV	1ced44b92a	[Analysis] Centralize objectsize lowering logic. We're currently doing nearly the same thing for @llvm.objectsize in three different places: two of them are missing checks for overflow, and one of them could subtly break if InstCombine gets much smarter about removing alloc sites. Seems like a good idea to not do that. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290214 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-20 23:46:36 +00:00
Jun Bum Lim	4d9c93dc3f	[CodeGenPrep] Skip merging empty case blocks This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly: Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289988 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-16 20:38:39 +00:00
Sanjoy Das	83c8485899	Fix CodeGenPrepare::stripInvariantGroupMetadata `dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs. The only reason it worked at all is that `getMetadataID` returns something unrelated -- it returns the subclass ID of the receiver (which is used in `dyn_cast` etc.). That does not numerically match `LLVMContext::MD_invariant_group` and ends up dropping `invariant_group` along with every other metadata that does not numerically match `LLVMContext::MD_invariant_group`. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289973 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-16 18:52:33 +00:00
Jun Bum Lim	51900e49f9	Revert "[CodeGenPrep] Skip merging empty case blocks" This reverts commit r289951. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289960 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-16 17:06:14 +00:00
Jun Bum Lim	6db0eaf697	[CodeGenPrep] Skip merging empty case blocks This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block: Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289951 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-16 16:03:31 +00:00
Matt Arsenault	76a17e03e0	AMDGPU: Implement isCheapAddrSpaceCast git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288523 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-02 18:12:53 +00:00
Joerg Sonnenberger	46cc79217b	Revert r287553: [CodeGenPrep] Skip merging empty case blocks It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line 670 ("Expected irreducible CFG"). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288052 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-28 18:56:54 +00:00
Justin Lebar	09220c80d3	[CodeGenPrepare] Don't sink non-cheap addrspacecasts. Summary: Previously, CGP would unconditionally sink addrspacecast instructions, even going so far as to sink them into a loop. Now we check that the cast is "cheap", as defined by TLI. We introduce a new "is-cheap" function to TLI rather than using isNopAddrSpaceCast because some GPU platforms want the ability to ask for non-nop casts to be sunk. Reviewers: arsenm, tra Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26923 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287591 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-21 22:49:15 +00:00
Jun Bum Lim	b68036c70c	[CodeGenPrep] Skip merging empty case blocks Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287553 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-21 16:47:28 +00:00
Justin Lebar	eb77a4a537	[BypassSlowDivision] Handle division by constant numerators better. Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant: * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code). * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26699 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287062 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-16 00:44:47 +00:00
Justin Lebar	27d02ea698	Add missing lit.local.cfg to llvm/test/Transforms/CodeGenPrepare/NVPTX. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285464 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-28 21:56:07 +00:00
Justin Lebar	f644e7b00f	Don't leave unused divs/rems sitting around in BypassSlowDivision. Summary: This "pass" eagerly creates div and rem instructions even when only one is needed -- it relies on a later pass (machine DCE?) to clean them up. This is problematic not just from a cleanliness perspective (this pass is running during CodeGenPrepare, so should leave the IR in a better state), but it also creates a problem for instruction selection. If we always have a div+rem, isel will always select a divrem instruction (if possible), even when a single div or rem would do. Specifically, in NVPTX, we want to compute rem from the output of div, if available. But if a div is not available, we want to leave the rem alone. This transformation is overeager if div is always available. Because this code runs as part of CodeGenPrepare, it's nontrivial to write a test for this change. But this will effectively be tested by a later patch which adds the aforementioned change to NVPTX isel. Reviewers: tra Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26088 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285460 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-28 21:43:54 +00:00
Justin Lebar	9488f1f527	Don't claim the udiv created in BypassSlowDivision is exact. Summary: In BypassSlowDivision's short-dividend path, we would create e.g. udiv exact i32 %a, %b "exact" here means that we are asserting that %a is a multiple of %b. But we have no reason to believe this must be true -- this is just a bug, as far as I can tell. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D26097 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285459 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-28 21:43:51 +00:00
Dehao Chen	2e4381ef79	Update the section.ll to fix non-x86 failure. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284566 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 03:53:41 +00:00
Dehao Chen	625e9e7e61	Revert r284545 again as the regression in ppc still exists. There is a bug in MBPI exposed by the patch. Also update the section.ll to fix non-x86 failure. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284563 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 01:18:25 +00:00
Dehao Chen	8c8a9767af	Add target for test to fix regression introduced by r284533. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284538 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 21:13:31 +00:00
Dehao Chen	977fc82cac	Use profile info to set function section prefix to group hot/cold functions. Summary: The original implementation is in r261607, which was reverted in r269726 to accommodate the ProfileSummaryInfo analysis pass. The new implementation: 1. add a new metadata for function section prefix 2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function 3. output the section prefix set by CGP Reviewers: davidxl, eraman Subscribers: vsk, llvm-commits Differential Revision: https://reviews.llvm.org/D24989 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284533 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 20:42:47 +00:00
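
Several of the BypassSlowDivision commits listed above refer to a run-time check that ORs the two operands together to decide whether both fit within the bypass width. The following is only a rough Python model of that idea for the unsigned case, not the LLVM implementation: the function name `bypass_udivrem`, the default widths, and the returned path label are all hypothetical and exist purely for illustration.

```python
def bypass_udivrem(dividend, divisor, long_bits=64, short_bits=32):
    """Model an unsigned div/rem guarded by a run-time width check.

    Hypothetical helper: the name, defaults, and path label are
    assumptions made for this sketch, not part of LLVM.
    """
    long_mask = (1 << long_bits) - 1
    if not (0 <= dividend <= long_mask and 0 < divisor <= long_mask):
        raise ValueError("operands must be unsigned and the divisor non-zero")

    # Bits that must be zero for a value to fit in the short type.
    high_bits = long_mask & ~((1 << short_bits) - 1)

    # OR the operands so that one test covers both of them.
    if (dividend | divisor) & high_bits == 0:
        # Fast path: both operands fit, so a short division would suffice.
        return dividend // divisor, dividend % divisor, "short"

    # Slow path: at least one operand needs the full width.
    return dividend // divisor, dividend % divisor, "long"
```

In Python both paths compute the same quotient and remainder, so the label only records which path a real target would have taken. For instance, `bypass_udivrem(100, 7)` takes the short path, while any operand with a bit set at or above `short_bits` forces the long one.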


132 Commits