65 Commits

Author SHA1 Message Date
Matt Arsenault
7e212e4168 AMDGPU: Remove remnants of old address space mapping
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341165 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 05:49:54 +00:00
Tom Stellard
1d6fd076a3 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336851 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 20:59:01 +00:00
Piotr Padlewski
c2f24d9ea8 Implement strip.invariant.group
Summary:
This patch introduces a new intrinsic,
strip.invariant.group, which was described in the
RFC: Devirtualization v2.

Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar

Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47103

Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336073 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 04:49:30 +00:00
Tom Stellard
cba2181e77 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335942 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-28 23:47:12 +00:00
Changpeng Fang
54a7d61d8c AMDGPU/SI: Don't promote alloca to vector for atomic load/store
Summary:
  Don't promote alloca to vector for atomic load/store

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D46085

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332673 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-17 21:49:44 +00:00
Nicola Zaghen
0818e789cb Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manual change to DOCS, as the regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
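
The transition alias described above can be sketched roughly as follows. This is a simplified illustration, not the contents of LLVM's debug header: `DebugEnabled` and `countDebugCalls` are hypothetical names, and the real macro also filters on a debug type, which is left out here.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime debug switch.
static bool DebugEnabled = true;

// New, project-prefixed macro: run X only when debugging is enabled.
#define LLVM_DEBUG(X)                                                          \
  do {                                                                         \
    if (DebugEnabled) {                                                        \
      X;                                                                       \
    }                                                                          \
  } while (false)

// Transition alias: old call sites keep compiling and behave identically.
#define DEBUG(X) LLVM_DEBUG(X)

// Counts how many of the two debug statements actually execute.
int countDebugCalls(bool Enabled) {
  DebugEnabled = Enabled;
  int Calls = 0;
  DEBUG(++Calls);      // old spelling, forwarded to LLVM_DEBUG
  LLVM_DEBUG(++Calls); // new spelling
  return Calls;
}
```

Because the alias simply forwards to the new macro, both spellings are enabled and disabled together during the transition period.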

Differential Revision: https://reviews.llvm.org/D43624



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332240 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-14 12:53:11 +00:00
Changpeng Fang
52d265c4ad AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction.
Summary:
  We have no logic to promote alloca to vector for an AddrSpaceCast instruction.

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D45993

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332147 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-11 22:17:57 +00:00
Piotr Padlewski
9648b46325 Rename invariant.group.barrier to launder.invariant.group
Summary:
This is one of the initial commits of the "RFC: Devirtualization v2" proposal:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing

Reviewers: rsmith, amharc, kuhar, sanjoy

Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45111

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331448 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-03 11:03:01 +00:00
Changpeng Fang
88d6664b97 AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements
Summary:
  This patch extends the promotion of alloca to vector to arrays of up to 16 elements. We also introduce
an option, -disable-promote-alloca-to-vector, to switch promotion to vector off if needed.

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D33559

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325372 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-16 19:14:17 +00:00
Daniel Neilson
3a406d4ae7 [AMDGPUPromoteAlloca] Replace deprecated memory intrinsic APIs (NFCI)
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AMDGPUPromoteAlloca pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324774 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-09 21:56:15 +00:00
Matt Arsenault
4d43fa8b05 AMDGPU: Fix assert on alloca of array of struct
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313282 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-14 18:02:29 +00:00
David Stuttard
00d555d436 [AMDGPU] Fix for issue in alloca to vector promotion pass
Summary:
The alloca promotion pass was not dealing with non-canonical input.

Added some additional checks so the pass simply backs off from forms it can't deal with (non-canonical).

Also added some test cases in non-canonical form to check that it no longer crashes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31710

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305079 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-09 14:16:22 +00:00
Chandler Carruth
e3e43d9d57 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304787 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 11:49:48 +00:00
Changpeng Fang
6b7bd0e1f9 AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca
Summary:
  Promoting Alloca to Vector and Promoting Alloca to LDS are two independent ways of handling an Alloca and should not affect each other.
As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage
related checking and places it after the calling convention checking.

Reviewer:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33139

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303684 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 20:25:41 +00:00
Francis Visoiu Mistrih
ae1c853358 [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303360 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-18 17:21:13 +00:00
Changpeng Fang
ed4c8077b0 AMDGPU/SI: Don't promote to vector if the load/store is volatile.
Summary:
  We should not change volatile loads/stores in promoting alloca to vector.

Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33107

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302943 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 20:31:12 +00:00
Matt Arsenault
5c95b810cb AMDGPU: Don't promote alloca to LDS for leaf functions
LDS use in leaf functions not currently handled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301958 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-02 18:33:18 +00:00
Stanislav Mekhanoshin
bb9002fbb2 [AMDGPU] Generate range metadata for workitem id
If the workgroup size is known, inform LLVM about the range returned by
local id and local size queries.

Differential Revision: https://reviews.llvm.org/D31804

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300102 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-12 20:48:56 +00:00
Yaxun Liu
ab3be33d40 [AMDGPU] Get address space mapping by target triple environment
As we introduced the target triple environments amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the values by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which do not depend on the target triple, use static
const members, so that they don't occupy extra memory space and are equivalent
to compile time constants.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.
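
A minimal sketch of the layout described above, assuming a hypothetical struct `AddrSpaces` and helper `getAddrSpaces` rather than the real AMDGPUAS definition. All numeric values are placeholders chosen for illustration, and the environment is passed as a plain string instead of a parsed triple.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the struct described above; every numeric
// value here is a placeholder, not a real AMDGPU address space number.
struct AddrSpaces {
  // Independent of the triple: compile-time constants, no per-object storage.
  static constexpr unsigned Global = 1;
  static constexpr unsigned Local = 3;

  // Dependent on the triple environment: ordinary members set on creation.
  unsigned Private;
  unsigned Flat;
};

// Only the two triple-dependent members contribute to the object size.
static_assert(sizeof(AddrSpaces) == 2 * sizeof(unsigned),
              "static members must not occupy per-object storage");

// Cheap enough to build on the fly at the point of use.
AddrSpaces getAddrSpaces(const std::string &Environment) {
  const bool IsGenericZero =
      Environment == "amdgiz" || Environment == "amdgizcl";
  AddrSpaces AS;
  AS.Private = IsGenericZero ? 5 : 0; // placeholder values
  AS.Flat = IsGenericZero ? 0 : 4;    // placeholder values
  return AS;
}
```

A pass could either call such a helper wherever it needs a value or store the result once as a member at the start of its run function, as the message suggests.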

Differential Revision: https://reviews.llvm.org/D31284


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298846 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 14:04:01 +00:00
George Burgess IV
3479ed63a6 Let llvm.objectsize be conservative with null pointers
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.

This fixes PR23277.

Differential Revision: https://reviews.llvm.org/D28494


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298430 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 20:08:59 +00:00
Reid Kleckner
6707770d48 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding to the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298393 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 16:57:19 +00:00
Stanislav Mekhanoshin
a1d4ee75a4 [AMDGPU] Account workgroup size in LDS occupancy limits
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers have to be adjusted for bigger workgroups.
For example, a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems are still limited by the same number, but 4 subgroups of 64
workitems each can afford 4 times more LDS to get the same occupancy.

In addition, this change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.
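
The scaling described in the first paragraph can be written out as a small worked example. The helper names below are hypothetical, the wavefront size of 64 is taken from the message, and any clamping to a hardware maximum is deliberately omitted.

```cpp
#include <cassert>

// Wavefront size assumed by the commit message: 64 work-items per wave.
constexpr unsigned WavefrontSize = 64;

// Number of waves a single workgroup occupies, rounding up.
constexpr unsigned wavesPerWorkGroup(unsigned WorkGroupSize) {
  return (WorkGroupSize + WavefrontSize - 1) / WavefrontSize;
}

// Hypothetical helper: scale an occupancy figure that was computed for a
// 64-work-item workgroup to the actual workgroup size.
constexpr unsigned scaleOccupancy(unsigned OccupancyFor64,
                                  unsigned WorkGroupSize) {
  return OccupancyFor64 * wavesPerWorkGroup(WorkGroupSize);
}
```

With a workgroup of 256 work-items, `wavesPerWorkGroup` gives 4, so an occupancy of 2 computed for a 64-work-item workgroup becomes 8.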

Differential Revision: https://reviews.llvm.org/D29423

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293837 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-01 22:59:50 +00:00
Matthias Braun
88d207542b Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them; this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293359 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-28 02:02:38 +00:00
Changpeng Fang
6562a033a3 AMDGPU/SI: Give up in promote alloca when a pointer may be captured.
Differential Revision:
  http://reviews.llvm.org/D28970

Reviewer:
  Matt

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292966 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-24 19:06:28 +00:00
Eugene Zelenko
68c521d030 [AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292623 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 17:52:16 +00:00
Matt Arsenault
f4fb506ab9 AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts
The users of the addrspacecast were having their types incorrectly
changed, producing invalid bitcasts between address spaces.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289307 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-10 00:52:50 +00:00
Mehdi Amini
67f335d992 Use StringRef in Pass/PassManager APIs (NFC)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283004 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-01 02:56:57 +00:00
Konstantin Zhuravlyov
1f99c41083 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280747 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 20:22:28 +00:00
David Majnemer
975248e4fb Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.
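
As an illustration of the pattern, here is a standalone sketch using only the standard library. `findInRange` and `isContained` are hypothetical analogues written for this example, not the LLVM helpers themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Range variant of find: takes the whole range instead of a begin/end pair.
template <typename Range, typename T>
auto findInRange(Range &&R, const T &Value) -> decltype(std::begin(R)) {
  return std::find(std::begin(R), std::end(R), Value);
}

// When the result is only compared against end(), a boolean helper reads
// better than spelling out the comparison at every call site.
template <typename Range, typename T>
bool isContained(Range &&R, const T &Value) {
  return findInRange(R, Value) != std::end(R);
}

// Compares the old and new spellings on a small fixed vector.
bool demoContains(int X) {
  const std::vector<int> Values = {1, 2, 3};
  // Old style: unpack begin/end by hand and compare against end().
  const bool Old = std::find(Values.begin(), Values.end(), X) != Values.end();
  // New style: pass the range itself.
  const bool New = isContained(Values, X);
  assert(Old == New);
  return New;
}
```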

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278433 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-11 22:21:41 +00:00
Matt Arsenault
1b96f3c048 AMDGPU: Remove pointless dyn_cast_or_null
This is already cast above, so it is non-null.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275881 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 19:00:07 +00:00
Matt Arsenault
865e2fa1dc AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275869 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 18:34:53 +00:00
Matt Arsenault
797b9ee060 AMDGPU: Remove dead code and redundant check
Non intrinsic calls aren't really handled, and this
IntrinsicInst dyn_cast checks for the function for us.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275868 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 18:34:48 +00:00
Nicolai Haehnle
0c05ce4746 AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions.
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up, most
shader stages have some extra restrictions on the amount
of available LDS.

Reviewers: tstellarAMD, arsenm

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D20728

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275779 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 09:02:47 +00:00
Matt Arsenault
dca409d5ad AMDGPU: Move subtarget feature checks into passes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273937 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 20:32:13 +00:00
Peter Collingbourne
63b34cdf34 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)
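
The three conditions above can be condensed into a single predicate. This is only a sketch: `GlobalInfo` and `canOmitFromSymbolTable` are hypothetical names, and a real linker would derive these flags during symbol resolution.

```cpp
#include <cassert>

// Hypothetical summary of a global as a linker might see it.
struct GlobalInfo {
  bool LocalUnnamedAddrOnEveryInstance; // attribute present on every instance
  bool IsLinkOnceODR;                   // linkonce_odr linkage
  bool IsConstantOrFunction;            // the program cannot write to it
};

// The global may be left out of the symbol table only if all three hold.
bool canOmitFromSymbolTable(const GlobalInfo &G) {
  return G.LocalUnnamedAddrOnEveryInstance && G.IsLinkOnceODR &&
         G.IsConstantOrFunction;
}
```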

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272709 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-14 21:01:22 +00:00
Matt Arsenault
44aaff08ed AMDGPU: Fix promote alloca for pointer loads
If the load has a pointer type, we don't want to change
its type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270000 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-18 23:20:24 +00:00
Matt Arsenault
41cf920df5 AMDGPU: Handle alloca promoting with null operands
If the second pointer in a multi-pointer instruction is
a constant, we can replace the type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269945 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-18 15:57:21 +00:00
Matt Arsenault
7985e4be56 AMDGPU: Fix promote alloca pass creating huge arrays
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.

By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.

Based on the existing LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269708 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-16 21:19:59 +00:00
Matt Arsenault
8adf4ebcf3 AMDGPU: Fix breaking IR on instructions with multiple pointer operands
The promote alloca pass would attempt to promote an alloca with
a select, icmp, or phi user, even though the other operand was
from a non-promotable source, producing a select on two different
pointer types.

Only do this if we know that both operands derive from the same
alloca. In the future we should be able to relax this to an alloca
which will also be promoted.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269265 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-12 01:58:58 +00:00
Matt Arsenault
3ba7927b46 AMDGPU: Fix mishandling array allocations when promoting alloca
The canonical form for allocas is a single allocation of the array type.
In case we see a non-canonical array alloca, make sure we aren't
replacing this with an array N times smaller.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267916 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-28 18:38:48 +00:00
Matt Arsenault
38099e5394 AMDGPU: Account for globals in AMDGPUPromoteAlloca pass
Patch by Bas Nieuwenhuizen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267791 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-27 21:05:08 +00:00
Andrew Kaylor
c7ca1302cf Add optimization bisect opt-in calls for AMDGPU passes
Differential Revision: http://reviews.llvm.org/D19450



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267485 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-25 22:23:44 +00:00
Tom Stellard
510a2b9622 AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266337 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 16:27:07 +00:00
Matt Arsenault
fc80e900d8 AMDGPU: Promote alloca should skip volatiles
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264214 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-23 23:17:29 +00:00
Matt Arsenault
6da276af59 AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca
Frontend authors are strongly encouraged to keep allocas
in the entry block, so don't bother visiting every instruction
in the other blocks of the function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263206 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-11 08:20:50 +00:00
Matt Arsenault
420f9c1154 AMDGPU: Remove a fixme for ptrtoint handling
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262854 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-07 21:12:46 +00:00
Matt Arsenault
d1d0a1a39d AMDGPU: Preserve alignments on newly created globals
Also switch to internal linkage, and include the name of the function in
the name.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259911 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-05 19:47:23 +00:00
Matt Arsenault
f1f2dd4ca2 AMDGPU: Do not promote allocas with non-inbounds GEPs
If we can't assume the pointer value is within the bounds
of the object, it seems risky to try to replace the pointer
calculations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259573 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-02 21:16:12 +00:00
Matt Arsenault
551787639e AMDGPU: Handle promoting memmove
Also add missing tests for the others.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259558 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-02 20:28:10 +00:00
Matt Arsenault
ec856e4504 AMDGPU: Skip promote alloca with no optimizations
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259551 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-02 19:32:42 +00:00