1112 Commits

Author SHA1 Message Date
Hiroshi Inoue
e3b8cd6b61 fix typos in comments; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308127 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-16 08:11:56 +00:00
Matt Arsenault
078c435803 AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308082 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-15 05:52:59 +00:00
Alfred Huang
1356a150af [AMDGPU] Do not insert an instruction into worklist twice in movetovalu
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.

Differential Revision: https://reviews.llvm.org/D34726

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308039 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-14 17:56:55 +00:00
Matt Arsenault
f9915c27c2 AMDGPU: Detect kernarg segment pointer
This is necessary to pass the kernarg segment pointer
to callee functions. Also don't unconditionally enable
for kernels.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307978 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-14 00:11:13 +00:00
Stanislav Mekhanoshin
9fc15af9b2 [AMDGPU] fcaninicalize optimization for GFX9+
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.

Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.

Differential Revision: https://reviews.llvm.org/D35335

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307976 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-13 23:59:15 +00:00
Matt Arsenault
a20c1d0cec AMDGPU: Annotate call graph with used features
Previously this wouldn't detect used features indirectly
used in callee functions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307967 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-13 21:43:42 +00:00
Matt Arsenault
ffac88a158 AMDGPU: Fix converting unanalyzable global loads to SMRD
Not all memory dependence queries succeed, so this needs to
be conservative if it fails.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307861 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12 23:06:18 +00:00
Stanislav Mekhanoshin
16be511cb4 [AMDGPU] fcanonicalize elimination optimization
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.

Differential Revision: https://reviews.llvm.org/D35218

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307848 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12 21:20:28 +00:00
Konstantin Zhuravlyov
8f85685860 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.

Differential Revision: https://reviews.llvm.org/D21723



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307722 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 22:23:00 +00:00
Matt Arsenault
d380c14b7a AMDGPU: Allow SIShrinkInstructions to fold FrameIndexes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307576 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 20:04:35 +00:00
Matt Arsenault
a038a8340c AMDGPU: Allow SIShrinkInstructions to work in non-SSA
Immediates can be folded as long as the immediate is a vreg.

Also undo commuting instructions if it didn't fold an immediate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307575 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 19:53:57 +00:00
Sean Fertile
471398ffea Extend memcpy expansion in Transform/Utils to handle wider operand types.
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.

Differential revision: https://reviews.llvm.org/D32536

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307346 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 02:00:06 +00:00
Matt Arsenault
8763b3ac42 AMDGPU: Add macro fusion schedule DAG mutation
Try to increase opportunities to shrink vcc uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307313 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 20:57:05 +00:00
Matt Arsenault
0f915c6a85 AMDGPU: Remove unnecessary IR from MIR tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307311 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 20:56:57 +00:00
Stanislav Mekhanoshin
71b4fe4228 [AMDGPU] Always use rcp + mul with fast math
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.

An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:

%div = fdiv fast float %v, %0, !fpmath !12
!12 = !{float 2.500000e+00}

Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.

This change allows an "fdiv fast" to be always lowered as rcp + mul.

Differential Revision: https://reviews.llvm.org/D34844

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307308 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 20:34:21 +00:00
David Stuttard
3b312dd635 [RegisterCoalescer] Fix for SubRange join unreachable
Summary:
During remat, some subranges might end up having invalid segments which caused problems for later
coalescing.

Added in a check to remove segments that are invalidated as part of the remat.

See http://llvm.org/PR33524

Subscribers: MatzeB, qcolombet

Differential Revision: https://reviews.llvm.org/D34391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307247 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 10:07:57 +00:00
Alexander Timofeev
f9e9586c80 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307097 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-04 17:32:00 +00:00
NAKAMURA Takumi
0a256123a4 Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307054 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-04 02:14:18 +00:00
Alexander Timofeev
0f9ec97238 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307026 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-03 14:54:11 +00:00
Richard Smith
638ba5afb5 Fix ODR violations due to abuse of LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
This is a short-term fix for PR33650 aimed to get the modules build bots green again.

Remove all the places where we use the LLVM_YAML_IS_(FLOW_)?SEQUENCE_VECTOR
macros to try to locally specialize a global template for a global type. That's
not how C++ works.

Instead, we now centrally define how to format vectors of fundamental types and
of string (std::string and StringRef). We use flow formatting for the former
cases, since that's the obvious right thing to do; in the latter case, it's
less clear what the right choice is, but flow formatting is really bad for some
cases (due to very long strings), so we pick block formatting. (Many of the
cases that were using flow formatting for strings are improved by this change.)

Other than the flow -> block formatting change for some vectors of strings,
this should result in no functionality change.

Differential Revision: https://reviews.llvm.org/D34907

Corresponding updates to clang, clang-tools-extra, and lld to follow.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306878 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-30 20:56:57 +00:00
Matt Arsenault
c278dccfd0 AMDGPU: Remove SITypeRewriter
This was an old workaround for using v16i8 in some old intrinsics
for resource descriptors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306603 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 21:38:50 +00:00
Stanislav Mekhanoshin
a143b4a4f3 Fold fneg and fabs like multiplications
Given no NaNs and no signed zeroes it folds:

(fmul X, (select (fcmp X > 0.0), -1.0, 1.0)) -> (fneg (fabs X))
(fmul X, (select (fcmp X > 0.0), 1.0, -1.0)) -> (fabs X)

Differential Revision: https://reviews.llvm.org/D34579

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306592 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 20:25:50 +00:00
Stanislav Mekhanoshin
8b38a13919 [AMDGPU] Add pattern for v_alignbit_b32 with immediate
If immediate in shift is less than 32 we can use alignbit too.

Differential Revision: https://reviews.llvm.org/D34729

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306500 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 02:52:39 +00:00
Stanislav Mekhanoshin
a5e3faf5db Allow to truncate left shift with non-constant shift amount
That is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.

Differential Revision: https://reviews.llvm.org/D34723

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306499 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 02:37:11 +00:00
Stanislav Mekhanoshin
040f338ab8 [AMDGPU] Add 2 new alignbit patterns
Differential Revision: https://reviews.llvm.org/D34655

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306449 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 19:10:47 +00:00
Stanislav Mekhanoshin
e764e24028 [AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.

Differential Revision: https://reviews.llvm.org/D34545

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306446 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 18:53:03 +00:00
Matt Arsenault
d841eae40b RenameIndependentSubregs: Fix infinite loop
Apparently this replacement can really be substituting the
same as the original register. Avoid restarting the loop
when there's been no change in the register uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306441 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 18:28:10 +00:00
Stanislav Mekhanoshin
e2d935510c [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.

Differential Revision: https://reviews.llvm.org/D34500

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306439 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 18:25:26 +00:00
Sam Kolton
06ed4a14fd [AMDGPU] SDWA: several fixes for V_CVT and VOPC instructions
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.

Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye

Differential Revision: https://reviews.llvm.org/D34626

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306413 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 15:02:23 +00:00
Nicolai Haehnle
7ca35760c5 AMDGPU: M0 operands to spill/restore opcodes are dead
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.

This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).

Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33319

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306375 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 08:04:13 +00:00
Matthias Braun
ea254cbf8f ScheduleDAGInstrs: Fix fixupKills() adding too many kill flags.
Remove invalid shortcut in fixupKills(): A register needs to be marked
live even when we are not adding a kill flag. This is because a
partially live register must not get a kill flags, but it still needs to
be fully marked live when walking backwards.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306352 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 00:58:48 +00:00
Matt Arsenault
ab5d97fb87 RenameIndependentSubregs: Fix iterator problem
Fixes bug 33597.

Use of substituteRegister in the tied operand case messes
up the register use iterator, causing some uses to be left
unprocessed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306333 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 21:33:36 +00:00
Matt Arsenault
8e828b87b2 AMDGPU: Setup SP/FP in callee function prolog/epilog
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306312 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 17:53:59 +00:00
Tom Stellard
8d3ca7cfeb AMDGPU/GlobalISel: Mark 32-bit G_SHL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34589

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306298 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 15:56:52 +00:00
Matt Arsenault
ec6175c524 AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306264 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-26 03:01:31 +00:00
Rafael Espindola
c88c3632e7 Add missing %s to RUN line.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306199 cdac9f57-aa62-4fd3-8940-286f4534e8a0
2017-06-24 04:41:39 +00:00
Rafael Espindola
82693db150 Test the object file creation too.
This should *really* be a llvm-mc test, but the parser is broken.
See PR33579 for the parser bug.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306198 cdac9f57-aa62-4fd3-8940-286f4534e8a0
2017-06-24 04:31:45 +00:00
Tom Stellard
111d1b387d AMDGPU/GlobalISel: Mark 32-bit G_AND as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34349

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306112 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-23 15:17:17 +00:00
David Stuttard
dad6e61ce7 [AMDGPU] Add intrinsics for tbuffer load and store
Intrinsic already existed for llvm.SI.tbuffer.store

Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*

Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr

Differential Revision: https://reviews.llvm.org/D30687

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306031 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 16:29:22 +00:00
Sam Kolton
1f2bcd710f [AMDGPU] SDWA: remove support for VOP2 instructions that have only 64-bit encoding
Summary:
Despite that this instructions are listed in VOP2, they are treated as VOP3 in specs. They should not support SDWA.
There are no real instructions for them, but there are pseudo instructions.

Reviewers: arsenm, vpykhtin, cfang

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34403

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305999 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 12:42:14 +00:00
Sam Kolton
e88fc4046f [AMDGPU] SDWA: add support for GFX9 in peephole pass
Summary:
Added support based on merged SDWA pseudo instructions. Now peephole allow one scalar operand, omod and clamp modifiers.
Added several subtarget features for GFX9 SDWA.
This diff also contains changes from D34026.
Depends D34026

Reviewers: vpykhtin, rampitec, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34241

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305986 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 06:26:41 +00:00
Stanislav Mekhanoshin
91cd127b89 [AMDGPU] Add FP_CLASS to the add/setcc combine
This is one of the nodes which also compile as v_cmp_*.

Differential Revision: https://reviews.llvm.org/D34485

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305970 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 23:46:22 +00:00
Stanislav Mekhanoshin
6189e64739 [AMDGPU] Combine add and adde, sub and sube
If one of the arguments of adde/sube is zero we can fold another
add/sub into it.

Differential Revision: https://reviews.llvm.org/D34374

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305964 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 22:30:01 +00:00
Stanislav Mekhanoshin
1a1f544263 [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc
This simplification allows to avoid generating v_cndmask_b32
to serialize condition code between compare and use.

Differential Revision: https://reviews.llvm.org/D34300

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305962 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 22:05:06 +00:00
Stanislav Mekhanoshin
64373efcab [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.

Differential Revison: https://reviews.llvm.org/D34291

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305840 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 20:33:44 +00:00
Matt Arsenault
5516bd1387 AMDGPU: Do operand folding in program order
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305821 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:56:32 +00:00
Matthias Braun
0cc137e051 RegisterScavenging: Followup to r305625
This does some improvements/cleanup to the recently introduced
scavengeRegisterBackwards() functionality:

- Rewrite findSurvivorBackwards algorithm to use the existing
  LiveRegUnit::accumulateBackward() code. This also avoids the Available
  and Candidates bitset and just need 1 LiveRegUnit instance
  (= 1 bitset).
- Pick registers in allocation order instead of register number order.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305817 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:43:14 +00:00
Matt Arsenault
73854fd751 AMDGPU: Preserve undef when folding register operands
If the source was a copy of an undef register, this would
produce a read of an undefined register which is a verifier
error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305816 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin
423a449bd5 [AMDGPU] Eliminate SGPR to VGPR copy when possible
SGPRs are generally cheaper, so try to use them over VGPRs.

Differential Revision: https://reviews.llvm.org/D34130

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305815 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:32:42 +00:00
Matt Arsenault
4cabef582f AMDGPU: Fix crash with undef vreg input operand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305814 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:28:02 +00:00