213 Commits

Sam Parker
a2dc88278d [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374763 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-14 10:00:21 +00:00
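
For the isLegalMasked[Load/Store] change above, a backend override might now look roughly like the sketch below. MyTargetTTIImpl and the alignment policy are hypothetical, and the exact parameter type (MaybeAlign here) is an assumption to be checked against TargetTransformInfo.h.

  // Hypothetical target override (sketch only, not from the patch):
  // reject masked loads whose alignment is below the element size.
  bool MyTargetTTIImpl::isLegalMaskedLoad(Type *DataType, MaybeAlign Alignment) {
    if (!DataType->isVectorTy())
      return false;
    unsigned EltBytes = DataType->getScalarSizeInBits() / 8;
    return Alignment && Alignment->value() >= EltBytes;
  }
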
Zi Xuan Wu
704914973a recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and vectorization factor depend on the number of target registers.
Currently, register pressure is not estimated separately for each register class (in particular, for scalar
types, float should not be lumped together with int), so the estimate is inaccurate. Specifically, this leads
to excessive interleaving/unrolling, which results in too many register spills in the loop body and hurts performance.

So we need to classify register classes at the IR level. Importantly, these are abstract register classes,
not the backend's target register classes provided in the .td files. They are used to establish a mapping
between the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types.

For example, the register counts on the POWER target are special when VSX is enabled: the number of int scalar
registers is 32 (GPR) and float scalar registers 64 (VSR), while int and float vector registers both use the 64 VSRs.
So there should be 2 kinds of register class when VSX is enabled, and 3 kinds when VSX is NOT enabled.

On the POWER target, this gives a large (~+30%) performance improvement in one specific SPEC2017 benchmark (503.bwaves_r), with no other obvious regressions.

Differential revision: https://reviews.llvm.org/D67148


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374634 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-12 02:53:04 +00:00
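
To make the abstract-register-class idea above concrete: as I understand the patch, it introduces per-class TTI hooks (getRegisterClassForType and getNumberOfRegisters(ClassID)), and a pressure check split per class could look roughly like the sketch below. The helper and the per-class live-value map are hypothetical.

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/Analysis/TargetTransformInfo.h"
  using namespace llvm;

  // Sketch: does the estimated number of simultaneously live values per
  // abstract register class fit in the registers the target reports?
  static bool fitsInRegisters(const TargetTransformInfo &TTI,
                              const DenseMap<unsigned, unsigned> &MaxLiveValues) {
    for (const auto &Entry : MaxLiveValues) {
      unsigned ClassID = Entry.first; // e.g. scalar int vs. scalar float vs. vector
      if (Entry.second > TTI.getNumberOfRegisters(ClassID))
        return false;                 // this class would spill: interleave less
    }
    return true;
  }
  // The class ID for a given IR type comes from the companion hook:
  //   unsigned ClassID = TTI.getRegisterClassForType(Ty->isVectorTy(), Ty);
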
David Greene
4f2b50c62d [System Model] [TTI] Move default cache/prefetch implementations
Move the default implementations of cache and prefetch queries to
TargetTransformInfoImplBase and delete them from NoTTIImpl.  This brings these
interfaces in line with how other TTI interfaces work.

Differential Revision: https://reviews.llvm.org/D68804

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374446 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-10 20:39:27 +00:00
David Greene
53a00b6727 [System Model] [TTI] Update cache and prefetch TTI interfaces
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon.  This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.

Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model.  Changes include:

- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
  implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
  MCSubtarget implementation

Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ.  AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation.  The custom TTI implementations
continue to exist for the other targets with this change.  They are not moved
over to subtarget-based implementations.

The end goal is to have the default subtarget implementation defer to the system
model defined by the target.  With this change, the default MCSubtargetInfo
implementation essentially returns the defaults that TargetTransformInfoImplBase
used to return.  Existing users of TTI defaults will now hit the defaults in
MCSubtargetInfo.  Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.

Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.

Differential Revision: https://reviews.llvm.org/D63614

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374205 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-09 19:51:48 +00:00
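
For orientation, the cache and prefetch queries reworked in the two entries above are consulted through TTI along these lines. A minimal sketch; the optional return type of getCacheSize is written from memory and should be double-checked against TargetTransformInfo.h.

  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // Sketch: print the cache/prefetch parameters a target reports.
  static void dumpCacheModel(const TargetTransformInfo &TTI) {
    unsigned LineSize = TTI.getCacheLineSize();    // 0 if unknown
    unsigned Distance = TTI.getPrefetchDistance(); // 0 if prefetching is off
    if (auto L1Size = TTI.getCacheSize(TargetTransformInfo::CacheLevel::L1D))
      errs() << "L1D " << *L1Size << " bytes, line " << LineSize
             << ", prefetch distance " << Distance << "\n";
  }
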
Jinsong Ji
cf65f7210c Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a.
This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04.

The patch is breaking the PowerPC internal build. Checked with the author;
reverting on his behalf for now due to timezone differences.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374091 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-08 17:32:56 +00:00
Zi Xuan Wu
ee8d82e802 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and vectorization factor depend on the number of target registers.
Currently, register pressure is not estimated separately for each register class (in particular, for scalar
types, float should not be lumped together with int), so the estimate is inaccurate. Specifically, this leads
to excessive interleaving/unrolling, which results in too many register spills in the loop body and hurts performance.

So we need to classify register classes at the IR level. Importantly, these are abstract register classes,
not the backend's target register classes provided in the .td files. They are used to establish a mapping
between the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types.

For example, the register counts on the POWER target are special when VSX is enabled: the number of int scalar
registers is 32 (GPR) and float scalar registers 64 (VSR), while int and float vector registers both use the 64 VSRs.
So there should be 2 kinds of register class when VSX is enabled, and 3 kinds when VSX is NOT enabled.

On the POWER target, this gives a large (~+30%) performance improvement in one specific SPEC2017 benchmark (503.bwaves_r), with no other obvious regressions.

Differential revision: https://reviews.llvm.org/D67148


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374017 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-08 03:28:33 +00:00
Martin Storsjo
d749f502c7 Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused an assertion failure ("Non-zero loop
cost expected") when building numerous projects; see PR43582 for details
and reproduction samples.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373882 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-07 08:21:37 +00:00
Sanjay Patel
8646f280f2 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load-combine opportunities requires that it
recognize patterns that could be combined later, but not do the optimization itself
(it's not a vector combine anyway, so it's probably out of scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction such as:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Rather than the (half-)vector monstrosity we currently produce using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373833 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-05 18:03:58 +00:00
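
For reference, the kind of source pattern at stake in the entry above looks like this in plain C++ (an illustrative sketch, not one of the actual tests): eight byte loads OR'ed into a 64-bit value, which SDAG load-combining can collapse into a single 64-bit load (plus a byte swap in the big-endian variant), and which SLP should therefore not turn into a vector reduction.

  #include <cstdint>

  // Little-endian byte gather; the backend can combine this into one load.
  uint64_t load64(const uint8_t *p) {
    return  (uint64_t)p[0]        | ((uint64_t)p[1] << 8)  |
           ((uint64_t)p[2] << 16) | ((uint64_t)p[3] << 24) |
           ((uint64_t)p[4] << 32) | ((uint64_t)p[5] << 40) |
           ((uint64_t)p[6] << 48) | ((uint64_t)p[7] << 56);
  }
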
Sam Parker
3760694e63 [NFC][HardwareLoops] Update some iterators
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373309 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-01 07:53:28 +00:00
Guillaume Chatelet
71864c0be5 [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373081 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-27 12:54:21 +00:00
Guillaume Chatelet
5932076b5e [NFC] remove unused functions
Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67616

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371994 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-16 14:48:58 +00:00
Guillaume Chatelet
3ad084e5dd [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371063 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-05 13:09:42 +00:00
Roman Lebedev
07c1e2bdfa [CostModel] Model all extractvalues as free.
Summary:
As discussed in https://reviews.llvm.org/D65148#1606412,
`extractvalue` doesn't actually generate any code,
so we should treat it as free.

Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen

Reviewed By: jmolloy

Subscribers: javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66098

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370339 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-29 11:50:30 +00:00
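
A rough C++ analogy for why extractvalue is free (illustrative only, not from the patch): reading a field of a small aggregate return value emits no instruction of its own, since the pieces are already in registers after the call.

  struct Pair { long First, Second; };
  Pair makePair(); // assumed external helper

  long firstOfPair() {
    return makePair().First; // the field extraction itself costs nothing
  }
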
Matt Arsenault
5a130c5b33 InferAddressSpaces: Move target intrinsic handling to TTI
I'm planning on handling intrinsics that will benefit from checking
the address space enums. Don't bother moving the address collection
for now, since those won't need the enums.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368895 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-14 18:13:00 +00:00
Daniil Fukalov
7768f31c04 [AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target gains no significant advantage from vectorization,
the threshold bonus for vector instructions should be optional.

The amdgpu-inline-arg-alloca-cost parameter default value and the target's
InliningThresholdMultiplier value are tuned accordingly.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64642

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366348 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-17 16:51:29 +00:00
Jinsong Ji
7e32df8cbe Revert "[HardwareLoops] NFC - move hardware loop checking code to isHardwareLoopProfitable()"
This reverts commit d95557306585404893d610784edb3e32f1bfce18.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365520 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-09 17:53:09 +00:00
Chen Zheng
1ec7dd26c7 [HardwareLoops] NFC - move hardware loop checking code to isHardwareLoopProfitable()
Differential Revision: https://reviews.llvm.org/D64197


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365497 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-09 14:56:17 +00:00
Chen Zheng
adf8ddb23a [PowerPC] exclude ICmpZero in LSR if icmp can be replaced in later hardware loop.
Differential Revision: https://reviews.llvm.org/D63477


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364993 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-03 01:49:03 +00:00
Chen Zheng
3a6c5d72b9 [HardwareLoops] NFC - move checking logic for loops with irreducible control flow to HardwareLoopInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364415 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 12:02:43 +00:00
Chen Zheng
5bd39d6f6e [HardwareLoops] NFC - move checking logic for loops with irreducible control flow to isHardwareLoopProfitable()
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364397 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 09:12:52 +00:00
Clement Courbet
6ef46a770d [ExpandMemCmp] Move all options to TargetTransformInfo.
Split off from D60318.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364281 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-25 08:04:13 +00:00
Chen Zheng
221779b03f [NFC] move some hardware loop checking code to a common place for other uses.
Differential Revision: https://reviews.llvm.org/D63478



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363758 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-19 01:26:31 +00:00
Amara Emerson
f765312a6f [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.

The second phase is a new intra-block localization phase which attempts to
re-sink the already localized instructions further right before one of the
multiple uses.

One additional change is to also localize G_GLOBAL_VALUE, as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced to try to prevent code size regressions when localizing these.

Overall, these changes improve CTMark code size on arm64 by 1.2%.

Full code size results:

Program                                         baseline       new       diff
------------------------------------------------------------------------------
 test-suite...-typeset/consumer-typeset.test    1249984      1217216     -2.6%
 test-suite...:: CTMark/ClamAV/clamscan.test    1264928      1232152     -2.6%
 test-suite :: CTMark/SPASS/SPASS.test          1394092      1361316     -2.4%
 test-suite...Mark/mafft/pairlocalalign.test    731320       714928      -2.2%
 test-suite :: CTMark/lencod/lencod.test        1340592      1324200     -1.2%
 test-suite :: CTMark/kimwitu++/kc.test         3853512      3820420     -0.9%
 test-suite :: CTMark/Bullet/bullet.test        3406036      3389652     -0.5%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    8017000      8016992     -0.0%
 test-suite...TMark/7zip/7zip-benchmark.test    2856588      2856588      0.0%
 test-suite...:: CTMark/sqlite3/sqlite3.test    765704       765704       0.0%
 Geomean difference                                                      -1.2%

Differential Revision: https://reviews.llvm.org/D63303

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363632 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 23:20:29 +00:00
Warren Ristow
31868b92df [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the alignment of the original scalar
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363581 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 17:20:08 +00:00
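
A minimal caller-side sketch of the two hooks listed above; the helper name is hypothetical, and note that D67223 (further up this log, chronologically later) converts the Alignment parameter to llvm::Align.

  #include "llvm/Analysis/TargetTransformInfo.h"
  using namespace llvm;

  // If the target cannot honor the !nontemporal hint at this alignment,
  // the vectorizer should bail out rather than silently drop the hint.
  static bool nontemporalAccessIsLegal(const TargetTransformInfo &TTI,
                                       Type *VecTy, unsigned Alignment,
                                       bool IsStore) {
    return IsStore ? TTI.isLegalNTStore(VecTy, Alignment)
                   : TTI.isLegalNTLoad(VecTy, Alignment);
  }
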
Sam Parker
c313a177b4 [CodeGen] Generic Hardware Loop Support
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
    
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
  Takes, as its single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
  Takes the maximum number of elements processed in an iteration of
  the loop body and subtracts this from the total count. Returns
  false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
  Takes the number of elements remaining to be processed as well as
  the maximum number of elements processed in an iteration of the loop
  body. Returns the updated number of elements remaining.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362774 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-07 07:35:30 +00:00
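
Targets opt in through the isHardwareLoopProfitable hook named above; the sketch below shows its rough shape. MyTargetTTIImpl, the "always profitable" decision, and the HardwareLoopInfo field names (CountType, LoopDecrement) are written from memory and should be treated as assumptions.

  // Sketch of a target hook opting counted loops into hardware loops.
  bool MyTargetTTIImpl::isHardwareLoopProfitable(Loop *L, ScalarEvolution &SE,
                                                 AssumptionCache &AC,
                                                 TargetLibraryInfo *LibInfo,
                                                 HardwareLoopInfo &HWLoopInfo) {
    // Count iterations in an i32 and decrement by one element per iteration.
    HWLoopInfo.CountType = Type::getInt32Ty(L->getHeader()->getContext());
    HWLoopInfo.LoopDecrement = ConstantInt::get(HWLoopInfo.CountType, 1);
    return true; // illustrative: claim every loop is profitable
  }
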
Craig Topper
315a86c2f2 [CostModel] Add really basic support for being able to query the cost of the FNeg instruction.
Summary:
This reuses the getArithmeticInstrCost, but passes dummy values of the second
operand flags.

The X86 costs are wrong and can be improved in a follow up. I just wanted to
stop it from reporting an unknown cost first.

Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62444

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361788 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-28 04:09:18 +00:00
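
In caller terms, the FNeg support above means the usual arithmetic-cost query now works for that opcode as well, with the default operand-flag arguments playing the role of the "dummy values" mentioned. A sketch; the helper name is hypothetical.

  #include "llvm/Analysis/TargetTransformInfo.h"
  #include "llvm/IR/Instruction.h"
  using namespace llvm;

  // Sketch: query the cost of a scalar or vector fneg of type Ty.
  static int fnegCost(const TargetTransformInfo &TTI, Type *Ty) {
    return TTI.getArithmeticInstrCost(Instruction::FNeg, Ty);
  }
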
Sjoerd Meijer
71dbcf0b11 [ARM] Implement TTI::getMemcpyCost
This implements the TargetTransformInfo method getMemcpyCost, which estimates
the number of instructions to which a memcpy instruction expands.

Differential Revision: https://reviews.llvm.org/D59787


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359547 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-30 10:28:50 +00:00
Craig Topper
c1b79b3c59 [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics.
This adds support for scalarizing these intrinsics, as well as the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.

I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway.

Fixes PR40994 and covers most of PR3666. We might want to implement constant masks to close that out.

Differential Revision: https://reviews.llvm.org/D59180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356687 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-21 17:38:52 +00:00
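
To show what the scalarized expandload computes, here are its semantics spelled out in plain C++ (a sketch of the semantics only, with a fixed 4-lane i32 flavour, not the pass's actual output): consecutive source elements land in the enabled lanes, and disabled lanes keep their previous pass-through values.

  // Scalarized @llvm.masked.expandload semantics, 4 x i32 flavour (sketch).
  static void expandLoad4(const int *Src, const bool Mask[4], int Out[4]) {
    int SrcIdx = 0;
    for (int Lane = 0; Lane < 4; ++Lane)
      if (Mask[Lane])
        Out[Lane] = Src[SrcIdx++]; // load consecutive elements, enabled lanes only
  }
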
Sjoerd Meijer
240db0070c [TTI] Enable analysis of clib functions in getIntrinsicCosts. NFCI.
This addresses the issue that we're not modeling the cost of clib functions
in TTI::getIntrinsicCosts, and thus essentially tackles this FIXME:
    
// FIXME: This is wrong for libc intrinsics.

To enable analysis of clib functions, we not only need an intrinsic ID and
formal arguments, but also the actual user of that function so that we can e.g.
look at the alignment and values of arguments. So, this is the initial plumbing to
pass the user of an intrinsic on to getCallCosts, which queries
getIntrinsicCosts.

Differential Revision: https://reviews.llvm.org/D59014


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355901 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-12 09:48:02 +00:00
Sam Parker
d0c143de76 [LSR] Generate cross iteration indexes
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve the performance of DSP-like loops.

Differential Revision: https://reviews.llvm.org/D55373



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353403 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-07 13:32:54 +00:00
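
The kind of loop targeted above, in C terms (an illustrative sketch): with a pre-indexed access such as ARM's "ldr r0, [r1, #4]!", the pointer update is folded into the load itself, so no separate add/sub is needed each iteration.

  // DSP-like reduction loop; a candidate for pre-indexed addressing.
  static int sumWords(const int *p, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
      s += p[i];
    return s;
  }
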
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Tom Stellard
356397dc91 Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments.  This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.

The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.

This is a very conservative fix for PR37358.  Ideally we would have a more
sophisticated check for ABI compatibility rather than checking if the
attributes are compatible for inlining.

Reviewers: echristo, chandlerc, eli.friedman, craig.topper

Reviewed By: echristo, chandlerc

Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits

Differential Revision: https://reviews.llvm.org/D53554

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351296 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-16 05:15:31 +00:00
Simon Pilgrim
48de5cc373 [TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346868 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-14 15:04:08 +00:00
Simon Pilgrim
20a1e3c6ac [TTI] Make TargetTransformInfo::getOperandInfo static. NFCI.
It has no member dependencies and this makes it easier to reuse in other cost analysis code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346755 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-13 13:45:10 +00:00
Simon Pilgrim
ec09d9119e [TTI] Flip vector types in getShuffleCost SK_ExtractSubvector call
For SK_ExtractSubvector, the default 'Ty' type is the source operand type and 'SubTy' is the destination subvector type.

I got this the wrong way around when I added rL346510.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346534 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-09 18:30:59 +00:00
Simon Pilgrim
3ccdf221e9 [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)
Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346510 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-09 16:28:19 +00:00
Dorit Nuzman
06bac6c858 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345705 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 09:57:56 +00:00
Dorit Nuzman
7d7250490b recommit 344472 after fixing build failure on ARM and PPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344475 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 08:50:06 +00:00
Dorit Nuzman
473da03560 revert 344472 due to failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344473 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 07:21:20 +00:00
Dorit Nuzman
a3ff03e8e2 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344472 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 07:06:16 +00:00
Jonas Paulsson
9a16b611a7 [LoopVectorizer] Use TTI.getOperandInfo()
Call getOperandInfo() instead of using (near) duplicated code in
LoopVectorizationCostModel::getInstructionCost().

This gets the OperandValueKind and OperandValueProperties values for a Value
passed as operand to an arithmetic instruction.

getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but
is now instead a public member.

Review: Florian Hahn
https://reviews.llvm.org/D52883

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343852 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 14:34:04 +00:00
Fangrui Song
af7b1832a0 Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338293 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 19:41:25 +00:00
Simon Pilgrim
7a7cfd8a89 [TargetTransformInfo] Add pow2 analysis for scalar constants
Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336827 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 17:51:27 +00:00
Sanjay Patel
42f462a392 [IR] move shuffle mask queries from TTI to ShuffleVectorInst
The optimizer is getting smarter (e.g., D47986) about differentiating shuffles
based on their mask values, so we should make queries on the mask constant
operand generally available to avoid code duplication.

We'll probably use this soon in the vectorizers and instcombine (D48023 and 
https://bugs.llvm.org/show_bug.cgi?id=37806).

We might clean up TTI a bit more once all of its current 'SK_*' options are 
covered.

Differential Revision: https://reviews.llvm.org/D48236


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335067 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 18:44:00 +00:00
Benjamin Kramer
c011f6948e Fix namespaces. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334890 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 13:37:52 +00:00
Simon Pilgrim
419887cd06 [CostModel] Cleanup isSingleSourceVectorMask to match other shuffle matchers. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334699 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 09:48:19 +00:00
Simon Pilgrim
d9dafe02fb [CostModel] Recognise REVERSE shuffle mask if the elements come from the second src
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334698 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 09:35:00 +00:00
Simon Pilgrim
31dfcf10a6 [CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334620 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 16:52:02 +00:00
Simon Pilgrim
21582f2af6 [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate, which requires shuffle masks to match only an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction, so this patch replaces it with SK_Select, permitting elements from either source as long as they stay in line (element i comes from lane i of one of the two sources):

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and the cost-model shuffle-mask analysis; later patches will update SLP to better utilise this - for now it still limits itself to SK_Alternate-style patterns.

Differential Revision: https://reviews.llvm.org/D47985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334513 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 16:12:29 +00:00
Simon Pilgrim
2ccbb4c82e Fix signed/unsigned warning. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334509 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 15:14:34 +00:00