archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Matt Arsenault	06b493f7f0	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" Reverts r337079 with fix for msan error. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337535 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-20 09:05:08 +00:00
Farhana Aleen	ac8c393bd5	[AMDGPU] Support a fdot2 pattern. Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337198 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-16 18:19:59 +00:00
Evgeniy Stepanov	1382a3a7e8	Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering" This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const, llvm::raw_ostream&, char const) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] 
Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337079 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-14 01:20:53 +00:00
Matt Arsenault	e61b6779e4	AMDGPU: Fix handling of alignment padding in DAG argument lowering This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337021 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-13 16:40:25 +00:00
Matt Arsenault	4835c04d29	AMDGPU: Fix assert in truncate combine with vectors The piece above probably has the same problem, but I need to try to come up with a test for it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336935 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-12 19:40:16 +00:00
Tom Stellard	1d6fd076a3	AMDGPU: Refactor Subtarget classes Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336851 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-11 20:59:01 +00:00
Tom Stellard	3f928c753c	AMDGPU: Fix UBSan error caused by r335942 Summary: Fixes PR38071. Reviewers: arsenm, dstenb Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48979 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336448 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-06 17:16:17 +00:00
Matt Arsenault	e5d3d15134	AMDGPU/GlobalISel: Implement custom kernel arg lowering Avoid using allocateKernArg / AssignFn. We do not want any of the type splitting properties of normal calling convention lowering. For now at least this exists alongside the IR argument lowering pass. This is necessary to handle struct padding correctly while some arguments are still skipped by the IR argument lowering pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336373 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-05 17:01:20 +00:00
Tom Stellard	cba2181e77	AMDGPU: Separate R600 and GCN TableGen files Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335942 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-28 23:47:12 +00:00
Matt Arsenault	90f8cc80db	AMDGPU: Remove MFI::ABIArgOffset We have too many mechanisms for tracking the various offsets used for kernel arguments, so remove one. There's still a lot of confusion with these because there are two different "implicit" argument areas located at the beginning and end of the kernarg segment. Additionally, the offset was determined based on the memory size of the split element types. This would break in a future commit where v3i32 is decomposed into separate i32 pieces. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335830 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-28 10:18:55 +00:00
Stanislav Mekhanoshin	bc547571e7	[AMDGPU] Convert rcp to rcp_iflag If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335742 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-27 15:33:33 +00:00
Nicolai Haehnle	4c3fa871b5	AMDGPU: Remove old-style image intrinsics Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335231 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:37:45 +00:00
Matt Arsenault	9e41f5314e	AMDGPU: Make v4i16/v4f16 legal Some image loads return these, and it's awkward working around them not being legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334835 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 15:15:46 +00:00
Stanislav Mekhanoshin	822ea1bfe8	[AMDGPU] Corrected computeKnownBits for V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48133 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334640 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 18:52:54 +00:00
Tom Stellard	98e05fe529	AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering Summary: The code that handles ISD::Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334607 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-13 15:06:37 +00:00
Stanislav Mekhanoshin	6c5eb4370b	[AMDGPU] DAG combine to produce V_PERM_B32 Differential Revision: https://reviews.llvm.org/D48099 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334559 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-12 23:50:37 +00:00
Matt Arsenault	0aedafefd1	AMDGPU: Error on LDS global address in functions These won't work as expected now, so error on them to avoid wasting time debugging this in the future. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334269 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-08 08:05:54 +00:00
Amaury Sechet	876db10e96	Set ADDE/ADDC/SUBE/SUBC to expand by default Summary: They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while. Targets that use these opcodes are changed in order to ensure their behavior doesn't change. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D47422 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333748 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-01 13:21:33 +00:00
Tom Stellard	e851590596	AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333153 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-24 05:28:34 +00:00
Tom Stellard	ccf5904d1c	AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering Summary: This is always false for R600. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47180 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333016 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-22 19:37:55 +00:00
Matt Arsenault	4525054673	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332953 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-22 06:32:10 +00:00
Tom Stellard	f02d6fd47c	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332930 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-22 02:03:23 +00:00
Matt Arsenault	f8b36841ee	AMDGPU: Custom lower v4i16/v4f16 vector operations Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332453 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-16 11:47:30 +00:00
Matt Arsenault	c31fde0824	AMDGPU: Ignore any_extend in mul24 combine If a multiply is truncated, SimplifyDemandedBits sometimes turns a zero_extend of the inputs into an any_extend, which makes the known bits computation unhelpful. Ignore these and compute known bits for the underlying value, since we insert the correct extend type after. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331919 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 21:11:35 +00:00
Matt Arsenault	5d4101dba4	AMDGPU: Handle partial shift reduction for variable shifts If the variable shift amount has known bits, we can still reduce the shift. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331917 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 20:52:54 +00:00
Matt Arsenault	c7ccc04824	AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit This is an extension of an existing combine to reduce wider shls if the result fits in the final result type. This introduces the same combine, but reduces the shift to a middle sized type to avoid the slow 64-bit shift. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331916 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 20:52:43 +00:00
Matt Arsenault	e2f4d3fdb7	AMDGPU: Add combine for trunc of bitcast from build_vector If the truncate is only accessing the first element of the vector, we can use the original source value. This helps with some combine ordering issues after operations are lowered to integer operations between bitcasts of build_vector. In particular it stops unnecessarily materializing the unused top half of a vector in some cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331909 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 18:37:39 +00:00
Adrian Prantl	26b584c691	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-01 15:54:18 +00:00
Matt Arsenault	ac9b3ef76a	AMDGPU: Add Vega12 and Vega20 Changes by Matt Arsenault Konstantin Zhuravlyov git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331215 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-30 19:08:16 +00:00
David Stuttard	793b714ddc	[AMDGPU] Fix issues for backend divergence tracking Summary: A change to use divergence analysis in the AMDGPU backend was getting formal arguments incorrect (not tagged as divergent) unless they were VGPR0, VGPR1 or VGPR2. For graphics shaders it is possible to have more than these passed in as VGPR. Modified the checking code to check for any VGPR registers passed in as formal arguments. Also, some intrinsics that are sources of divergence may have been lowered during instruction selection and are missed on subsequent calls to isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well. Finally, the FunctionLoweringInfo tracks virtual registers that are live across basic block boundaries. This is used to check for divergence of CopyFromRegister registers using the DivergenceAnalysis analysis. For multiple blocks the lazily evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45372 Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330257 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-18 13:53:31 +00:00
Alexander Timofeev	77d8d0a7e7	Pass Divergence Analysis data to Selection DAG to drive divergence dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326703 91177308-0d34-0410-b5e6-96231b3b80d8	2018-03-05 15:12:21 +00:00
Marek Olsak	45ce427076	AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323908 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-31 20:18:04 +00:00
Hiroshi Inoue	d1b456b6d1	[NFC] fix trivial typos in comments and documents "to to" -> "to" git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323628 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-29 05:17:03 +00:00
Changpeng Fang	1e426c2a34	AMDGPU/SI: Add d16 support for image intrinsics. Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322903 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-18 22:08:53 +00:00
Daniil Fukalov	a520636d37	[AMDGPU] add LDS f32 intrinsics added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics to allow generating ds_{add|min|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322656 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-17 14:05:05 +00:00
Changpeng Fang	d6fbd6ac45	AMDGPU/SI: Add d16 support for buffer intrinsics. Differential Revision: https://reviews.llvm.org/D38906 Reviewers: Matt and Brian. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322402 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-12 21:12:19 +00:00
Matthias Braun	d318139827	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320884 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-15 22:22:58 +00:00
Matt Arsenault	ff838de892	DAG: Add nuw when splitting loads and stores The object can't straddle the address space wrap around, so I think it's OK to assume any offsets added to the base object pointer can't overflow. Similar logic already appears to be applied in SelectionDAGBuilder when lowering aggregate returns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319272 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-29 01:25:12 +00:00
Vedran Miletic	14242fe7a3	[AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics AMDGPU backend errors with "unsupported call to function" upon encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch adds custom lowering to avoid that error on both R600 and SI. Reviewers: arsenm, jvesely Subscribers: tstellar Differential Revision: https://reviews.llvm.org/D29942 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319025 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-27 13:26:38 +00:00
Matt Arsenault	12d09b0ded	AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318100 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-13 22:55:05 +00:00
Jan Vesely	9ac18d6709	AMDGPU: Drop duplicate setOperationAction These are set with other scalar int ops a few lines up Differential Revision: https://reviews.llvm.org/D39928 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318051 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-13 16:46:07 +00:00
Marek Olsak	aa75d4aeb0	AMDGPU: Lower buffer store and atomic intrinsics manually Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317754 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-09 01:52:48 +00:00
Matt Arsenault	db6cc311a0	AMDGPU: Remove redundant combine This combine was already done in two places. The generic combiner already has done this since r217610, for adds (with a single use). This one was added in r303641, and added support for handling or as well. r313251 later added support to the generic combine for or. It also turns out the isOrEquivalentToAdd check is not necessary for this combine. Additionally, we already reproduce this combine in yet another place in the backend, although in that version multiple uses of the add are still folded if it will allow a fold into the addressing mode. That version needs to be improved to understand ors though, as well as the correct legal offsets for private. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317526 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-07 00:06:32 +00:00
Matt Arsenault	bd04b64cd1	AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317492 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-06 17:04:37 +00:00
Wei Ding	607acf30af	AMDGPU: Fix an error for the llvm.cttz implementation. Differential Revision: http://reviews.llvm.org/D39014 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316037 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-17 21:49:52 +00:00
Matt Arsenault	9a6875264b	AMDGPU: Implement isFPExtFoldable This helps match v_mad_mix* in some cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315744 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-13 20:18:59 +00:00
Wei Ding	29de9d738e	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. Differential Revision: http://reviews.llvm.org/D37348 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315610 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-12 19:37:14 +00:00
Stanislav Mekhanoshin	287608ccc3	[AMDGPU] New 64 bit div/rem expansion Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions. This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions. Passes OpenCL conformance test_integer_ops quick_[u]long_math Differential Revision: https://reviews.llvm.org/D38607 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315081 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-06 17:24:45 +00:00
Konstantin Zhuravlyov	9cb20ab95d	AMDGPU: Expand setcc for v2f32 and v4f32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314853 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov	b922e2ff02	AMDGPU: Expand setcc for v2i32 and v4i32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314852 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-03 21:31:24 +00:00

278 Commits