278 Commits

Author SHA1 Message Date
Matt Arsenault
06b493f7f0 Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
Reverts r337079 with fix for msan error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337535 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-20 09:05:08 +00:00
Farhana Aleen
ac8c393bd5 [AMDGPU] [AMDGPU] Support a fdot2 pattern.
Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)

Author: FarhanaAleen

Reviewed By: rampitec, b-sumner

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D49146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337198 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-16 18:19:59 +00:00
Evgeniy Stepanov
1382a3a7e8 Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"
This reverts commit r337021.

WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
    #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
    #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
    #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
    #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
    #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
    #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const*, llvm::raw_ostream&, char const*) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
    #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
    #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26

[...]

Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
    #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337079 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-14 01:20:53 +00:00
Matt Arsenault
e61b6779e4 AMDGPU: Fix handling of alignment padding in DAG argument lowering
This was completely broken if there was ever a struct argument, as
this information is thrown away during the argument analysis.

The offsets as passed in to LowerFormalArguments are not useful,
as they partially depend on the legalized result register type,
and they don't consider the alignment in the first place.

Ignore the Ins array, and instead figure out from the raw IR type
what we need to do. This seems to fix the padding computation
if the DAG lowering is forced (and stops breaking arguments
following padded arguments if the arguments were only partially
lowered in the IR)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337021 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-13 16:40:25 +00:00
Matt Arsenault
4835c04d29 AMDGPU: Fix assert in truncate combine with vectors
The piece above probably has the same problem, but I need
to try to come up with a test for it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336935 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-12 19:40:16 +00:00
Tom Stellard
1d6fd076a3 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336851 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 20:59:01 +00:00
Tom Stellard
3f928c753c AMDGPU: Fix UBSan error caused by r335942
Summary: Fixes PR38071.

Reviewers: arsenm, dstenb

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48979

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336448 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-06 17:16:17 +00:00
Matt Arsenault
e5d3d15134 AMDGPU/GlobalISel: Implement custom kernel arg lowering
Avoid using allocateKernArg / AssignFn. We do not want any
of the type splitting properties of normal calling convention
lowering.

For now at least this exists alongside the IR argument lowering
pass. This is necessary to handle struct padding correctly while
some arguments are still skipped by the IR argument lowering
pass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336373 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 17:01:20 +00:00
Tom Stellard
cba2181e77 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335942 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-28 23:47:12 +00:00
Matt Arsenault
90f8cc80db AMDGPU: Remove MFI::ABIArgOffset
We have too many mechanisms for tracking the various offsets
used for kernel arguments, so remove one. There's still a lot of
confusion with these because there are two different "implicit"
argument areas located at the beginning and end of the kernarg
segment.

Additionally, the offset was determined based on the memory
size of the split element types. This would break in a future
commit where v3i32 is decomposed into separate i32 pieces.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335830 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-28 10:18:55 +00:00
Stanislav Mekhanoshin
bc547571e7 [AMDGPU] Convert rcp to rcp_iflag
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335742 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 15:33:33 +00:00
Nicolai Haehnle
4c3fa871b5 AMDGPU: Remove old-style image intrinsics
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335231 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 13:37:45 +00:00
Matt Arsenault
9e41f5314e AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working
around them not being legal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334835 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 15:15:46 +00:00
Stanislav Mekhanoshin
822ea1bfe8 [AMDGPU] Corrected computeKnownBits for V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48133

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334640 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 18:52:54 +00:00
Tom Stellard
98e05fe529 AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering
Summary:
The code that handles ISD:Register and ISD::CopyFromReg assumes
the target is amdgcn, so this is broken on r600.  We don't
need this analysis on r600 anyway so we can safely move
it to SITargetLowering.

Reviewers: alex-t, arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46298

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334607 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 15:06:37 +00:00
Stanislav Mekhanoshin
6c5eb4370b [AMDGPU] DAG combine to produce V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48099

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334559 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 23:50:37 +00:00
Matt Arsenault
0aedafefd1 AMDGPU: Error on LDS global address in functions
These won't work as expected now, so error on them to avoid
wasting time debugging this in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334269 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 08:05:54 +00:00
Amaury Sechet
876db10e96 Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Target that uses these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333748 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-01 13:21:33 +00:00
Tom Stellard
e851590596 AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP
Summary:
We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp
is gone.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47181

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333153 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-24 05:28:34 +00:00
Tom Stellard
ccf5904d1c AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering
Summary: This is always false for R600.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333016 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 19:37:55 +00:00
Matt Arsenault
4525054673 AMDGPU: Make v2i16/v2f16 legal on VI
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.

Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332953 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 06:32:10 +00:00
Tom Stellard
f02d6fd47c AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332930 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 02:03:23 +00:00
Matt Arsenault
f8b36841ee AMDGPU: Custom lower v4i16/v4f16 vector operations
Avoids stack access.

Also handle extract hi elt pattern from truncate + shift
to avoid a couple test regressions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332453 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 11:47:30 +00:00
Matt Arsenault
c31fde0824 AMDGPU: Ignore any_extend in mul24 combine
If a multiply is truncated, SimplifyDemandedBits
sometimes turns a zero_extend of the inputs into an
any_extend, which makes the known bits computation unhelpful.
Ignore these and compute known bits for the underlying value,
since we insert the correct extend type after.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331919 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 21:11:35 +00:00
Matt Arsenault
5d4101dba4 AMDGPU: Handle partial shift reduction for variable shifts
If the variable shift amount has known bits, we can still reduce
the shift.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331917 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 20:52:54 +00:00
Matt Arsenault
c7ccc04824 AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit
This is an extension of an existing combine to reduce wider
shls if the result fits in the final result type. This
introduces the same combine, but reduces the shift to a middle
sized type to avoid the slow 64-bit shift.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331916 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 20:52:43 +00:00
Matt Arsenault
e2f4d3fdb7 AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector,
we can use the original source value.

This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331909 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 18:37:39 +00:00
Adrian Prantl
26b584c691 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:54:18 +00:00
Matt Arsenault
ac9b3ef76a AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331215 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-30 19:08:16 +00:00
David Stuttard
793b714ddc [AMDGPU] Fix issues for backend divergence tracking
Summary:
A change to use divergence analysis in the AMDGPU backend was getting formal
arguments incorrect (not tagged as divergent) unless they were VGPR0, VGPR1 or
VGPR2

For graphics shaders it is possible to have more than these passed in as VGPR

Modified the checking code to check for any VGPR registers passed in as formal
arguments.

Also, some intrinsics that are sources of divergence may have been lowered
during instruction selection and are missed on subsequent calls to
isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well.

Finally, the FunctionLoweringInfo tracks virtual registers that are live across
basic block boundaries. This is used to check for divergence of CopyFromRegister
registers using the DivergenceAnalysis analysis. For multiple blocks the lazily
evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45372

Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330257 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-18 13:53:31 +00:00
Alexander Timofeev
77d8d0a7e7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326703 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-05 15:12:21 +00:00
Marek Olsak
45ce427076 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323908 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-31 20:18:04 +00:00
Hiroshi Inoue
d1b456b6d1 [NFC] fix trivial typos in comments and documents
"to to" -> "to"



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323628 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-29 05:17:03 +00:00
Changpeng Fang
1e426c2a34 AMDGPU/SI: Add d16 support for image intrinsics.
Summary:
  This patch implements d16 support for image load, image store and image sample intrinsics.

Reviewers:
  Matt, Brian.

Differential Revision:
  https://reviews.llvm.org/D3991

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322903 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-18 22:08:53 +00:00
Daniil Fukalov
a520636d37 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322656 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17 14:05:05 +00:00
Changpeng Fang
d6fbd6ac45 AMDGPU/SI: Add d16 support for buffer intrinsics.
Differential Revision:
  https://reviews.llvm.org/D38906

Reviewers:
  Matt and Brian.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322402 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-12 21:12:19 +00:00
Matthias Braun
d318139827 MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320884 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-15 22:22:58 +00:00
Matt Arsenault
ff838de892 DAG: Add nuw when splitting loads and stores
The object can't straddle the address space
wrap around, so I think it's OK to assume any
offsets added to the base object pointer can't
overflow. Similar logic already appears to be
applied in SelectionDAGBuilder when lowering
aggregate returns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319272 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-29 01:25:12 +00:00
Vedran Miletic
14242fe7a3 [AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsics
AMDGPU backend errors with "unsupported call to function" upon
encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch
adds custom lowering to avoid that error on both R600 and SI.

Reviewers: arsenm, jvesely

Subscribers: tstellar

Differential Revision: https://reviews.llvm.org/D29942


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319025 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-27 13:26:38 +00:00
Matt Arsenault
12d09b0ded AMDGPU: Implement computeKnownBitsForTargetNode for mbcnt
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318100 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-13 22:55:05 +00:00
Jan Vesely
9ac18d6709 AMDGPU: Drop duplicate setOperationAction
These are set with other scalar int ops few lines up

Differential Revision: https://reviews.llvm.org/D39928

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318051 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-13 16:46:07 +00:00
Marek Olsak
aa75d4aeb0 AMDGPU: Lower buffer store and atomic intrinsics manually
Summary:
Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every
buffer store and atomic instruction.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39060

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317754 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-09 01:52:48 +00:00
Matt Arsenault
db6cc311a0 AMDGPU: Remove redundant combine
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).

This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.

Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317526 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 00:06:32 +00:00
Matt Arsenault
bd04b64cd1 AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317492 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-06 17:04:37 +00:00
Wei Ding
607acf30af AMDGPU : Fix an error for the llvm.cttz implementation.
Differential Revision: http://reviews.llvm.org/D39014

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316037 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-17 21:49:52 +00:00
Matt Arsenault
9a6875264b AMDGPU: Implement isFPExtFoldable
This helps match v_mad_mix* in some cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315744 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-13 20:18:59 +00:00
Wei Ding
29de9d738e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315610 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-12 19:37:14 +00:00
Stanislav Mekhanoshin
287608ccc3 [AMDGPU] New 64 bit div/rem expansion
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.

Passes OpenCL conformance test_integer_ops quick_[u]long_math

Differential Revision: https://reviews.llvm.org/D38607

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-06 17:24:45 +00:00
Konstantin Zhuravlyov
9cb20ab95d AMDGPU: Expand setcc for v2f32 and v4f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314853 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov
b922e2ff02 AMDGPU: Expand setcc for v2i32 and v4i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314852 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03 21:31:24 +00:00