archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Matt Arsenault	db6cc311a0	AMDGPU: Remove redundant combine This combine was already done in two places. The generic combiner already has done this since r217610, for adds (with a single use). This one was added in r303641, and added support for handling or as well. r313251 later added support to the generic combine for or. It also turns out the isOrEquivalentToAdd check is not necessary for this combine. Additionally, we already reproduce this combine in yet another place in the backend, although in that version multiple uses of the add are still folded if it will allow a fold into the addressing mode. That version needs to be improved to understand ors though, as well as the correct legal offsets for private. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317526 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 00:06:32 +00:00
Matt Arsenault	bd04b64cd1	AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317492 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-06 17:04:37 +00:00
Wei Ding	607acf30af	AMDGPU : Fix an error for the llvm.cttz implementation. Differential Revision: http://reviews.llvm.org/D39014 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316037 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-17 21:49:52 +00:00
Matt Arsenault	9a6875264b	AMDGPU: Implement isFPExtFoldable This helps match v_mad_mix* in some cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315744 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-13 20:18:59 +00:00
Wei Ding	29de9d738e	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. Differential Revision: http://reviews.llvm.org/D37348 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315610 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-12 19:37:14 +00:00
Stanislav Mekhanoshin	287608ccc3	[AMDGPU] New 64 bit div/rem expansion Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions. This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions. Passes OpenCL conformance test_integer_ops quick_[u]long_math Differential Revision: https://reviews.llvm.org/D38607 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315081 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-06 17:24:45 +00:00
Konstantin Zhuravlyov	9cb20ab95d	AMDGPU: Expand setcc for v2f32 and v4f32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314853 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov	b922e2ff02	AMDGPU: Expand setcc for v2i32 and v4i32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314852 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-03 21:31:24 +00:00
Tim Renouf	8ba98f908f	[AMDGPU] calling conventions for AMDPAL OS type Summary: This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37752 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314502 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-29 09:51:22 +00:00
Matt Arsenault	e4e1eed1d7	AMDGPU: Allow coldcc calls git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312936 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-11 18:54:20 +00:00
Stanislav Mekhanoshin	f3b5f2ad4a	[AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() Differential Revision: https://reviews.llvm.org/D37392 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312364 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-01 20:43:20 +00:00
Matt Arsenault	d213820974	AMDGPU: Turn int pack pattern into build_vector build_vector is a more useful canonical form when pattern matching packed operations, so turn shift into high element into a build_vector. Should show no change for now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312282 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-31 21:17:22 +00:00
Stanislav Mekhanoshin	f4dd1bdd9a	[AMDGPU] computeKnownBitsForTargetNode for 24 bit mul Differential Revision: https://reviews.llvm.org/D37168 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311896 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-28 16:35:37 +00:00
Matt Arsenault	45424dbebb	AMDGPU: Start adding tail call support Handle the sibling call cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310753 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-11 20:42:08 +00:00
Matt Arsenault	0856e7acd5	AMDGPU: Don't use report_fatal_error for unsupported call types git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310004 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 23:32:41 +00:00
Matt Arsenault	c60159767d	AMDGPU: Pass special input registers to functions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309998 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 23:00:29 +00:00
Matt Arsenault	43950949ad	AMDGPU: Initial implementation of calls Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309732 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-01 19:54:18 +00:00
Hiroshi Inoue	e3b8cd6b61	fix typos in comments; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308127 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-16 08:11:56 +00:00
Matt Arsenault	078c435803	AMDGPU: Return correct type during argument lowering The type needs to be casted back to the original argument type. Fixes an assert that for some reason is only run when using -debug. Includes an additional combine to avoid test regressions from having conversions mixed with multiple Assert[SZ]ext nodes. On subtargets where i16 is legal, this was producing an i32 register with an i16 AssertZExt, truncated to i16 with another i8 AssertZExt. t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0 t3: i16 = truncate t2 t5: i16 = AssertZext t3, ValueType:ch:i8 t6: i8 = truncate t5 t7: i32 = zero_extend t6 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308082 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-15 05:52:59 +00:00
Simon Pilgrim	2541a59ac3	Fix some more -Wimplicit-fallthrough warnings. NFCI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307411 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-07 16:40:06 +00:00
David Stuttard	dad6e61ce7	[AMDGPU] Add intrinsics for tbuffer load and store Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306031 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-22 16:29:22 +00:00
Matt Arsenault	b9cdbc013b	AMDGPU: Cleanup CreateLiveInRegister git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305748 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-19 21:52:45 +00:00
Chandler Carruth	e3e43d9d57	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304787 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-06 11:49:48 +00:00
Stanislav Mekhanoshin	6ff1a723f7	[AMDGPU] Combine and (srl) into shl (bfe) Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two, and BFE is generally the more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has to be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303681 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-23 19:54:48 +00:00
Stanislav Mekhanoshin	ddde657138	[AMDGPU] Convert shl (add) into add (shl) shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1) This allows to fold a constant into an address in some cases as well as to eliminate second shift if the expression is used as an address and second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303641 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-23 15:59:58 +00:00
Stanislav Mekhanoshin	69edad7913	[AMDGPU] Narrow lshl from 64 to 32 bit if possible Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303569 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-22 16:58:10 +00:00
Matt Arsenault	a0540d3468	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values as well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303308 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-17 21:56:25 +00:00
Craig Topper	d49344495d	[KnownBits] Add bit counting methods to KnownBits struct and use them where possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302925 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-12 17:20:30 +00:00
Matt Arsenault	2bbb56fd75	AMDGPU: Pull fneg out of extract_vector_elt This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302813 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-11 17:26:25 +00:00
Craig Topper	ace8b39f82	[KnownBits] Add wrapper methods for setting and clearing all bits in the underlying APInts in KnownBits. This adds routines for resetting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown. Differential Revision: https://reviews.llvm.org/D32637 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302262 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-05 17:36:09 +00:00
Marek Olsak	a2057043bd	AMDGPU: Add AMDGPU_HS calling convention Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301930 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-02 15:41:10 +00:00
Marek Olsak	007530e1a2	AMDGPU: Add new amdgcn.init.exec intrinsics v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301677 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-28 20:21:58 +00:00
Craig Topper	8b430f87e6	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301620 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-28 05:31:46 +00:00
Matt Arsenault	666020a37d	AMDGPU: Move trap lowering to DAG Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301206 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-24 17:49:13 +00:00
Akira Hatanaka	586c752a82	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301019 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-21 18:53:12 +00:00
Akira Hatanaka	1933132d0a	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300940 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-21 01:31:50 +00:00
Akira Hatanaka	63da689bdf	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300930 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-21 00:05:16 +00:00
Akira Hatanaka	01c014ca98	Revert "[AArch64] Improve code generation for logical instructions taking" This reverts r300913. This broke bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300916 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-20 23:03:30 +00:00
Akira Hatanaka	ac0ecde9f0	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300913 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-20 22:47:56 +00:00
Matt Arsenault	938bfaf893	AMDGPU: Refactor argument lowering Split into smaller functions and prepare for handling non-entry functions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299998 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-11 22:29:24 +00:00
Matt Arsenault	0719006e7e	AMDGPU: Stop using CCAssignToRegWithShadow This does not do what it is attempting to use it for and requires working around in LowerFormalArguments. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299667 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-06 17:37:27 +00:00
Matt Arsenault	513e714dfd	AMDGPU: Remove llvm.SI.vs.load.input git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299391 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-03 21:45:13 +00:00
Matt Arsenault	cd7c9c3178	AMDGPU: Remove legacy bfe intrinsics git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299372 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-03 18:08:08 +00:00
Matt Arsenault	c4de629ce2	AMDGPU: Remove unnecessary ands when f16 is legal Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299246 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-31 19:53:03 +00:00
Simon Pilgrim	9fc191fd45	[DAGCombiner] Add vector demanded elements support to ComputeNumSignBits Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299219 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-31 13:54:09 +00:00
Simon Pilgrim	07898901df	[DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299201 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-31 11:24:16 +00:00
Simon Pilgrim	70a5705cf0	[AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. Based on comment in D31249. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298991 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-29 12:09:25 +00:00
Yaxun Liu	ab3be33d40	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298846 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-27 14:04:01 +00:00
Matt Arsenault	d4f6485173	AMDGPU: Implement f16 fround git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298730 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-24 20:04:18 +00:00
Matt Arsenault	dc55587b7f	AMDGPU: Rename SI_RETURN This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298452 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-21 22:18:10 +00:00

236 Commits