Commit Graph

26908 Commits

Author SHA1 Message Date
Neil Henning
b461f4de29 [AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding atomicrmw add/sub or uses of
  the atomic buffer intrinsics for add/sub.
- If all arguments except the value to be added/subtracted are uniform,
  record the value to be optimized.
- Run through the atomic operations we can optimize and, depending on
  whether the value is uniform/divergent use wavefront wide operations
  (DPP in the divergent case) to calculate the total amount to be
  atomically added/subtracted.
- Then let only a single lane of each wavefront perform the atomic
  operation, reducing the total number of atomic operations in flight.
- Lastly we recombine the result from the single lane to each lane of
  the wavefront, and calculate our individual lanes offset into the
  final result.
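
The steps above can be modeled on the host with a small simulation. This is a minimal sketch and not the pass itself: `wavefront_atomic_add` and its arguments are hypothetical names, a Python list stands in for the lanes of one wavefront, and a plain loop stands in for the wavefront-wide (DPP) scan.

```python
def wavefront_atomic_add(memory, values):
    """Model one wavefront performing an atomic add with per-lane values.

    memory: one-element list standing in for the shared location.
    values: per-lane addends (the divergent case).
    Returns the 'old' value each lane would have observed.
    """
    # Wavefront-wide scan: each lane's offset is the sum of the lanes before it.
    offsets = []
    running = 0
    for v in values:
        offsets.append(running)
        running += v
    total = running  # total amount to be added atomically

    # Only a single lane performs the atomic operation.
    old = memory[0]
    memory[0] += total

    # Recombine: broadcast the old value and add each lane's own offset.
    return [old + off for off in offsets]
```

With four lanes adding 1, 2, 3 and 4 to a location holding 10, the lanes observe 10, 11, 13 and 16, which matches what four separate atomics issued in lane order would return, while only one atomic reaches memory.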

Differential Revision: https://reviews.llvm.org/D51969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343973 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 15:49:19 +00:00
Oliver Stannard
8a4fa4a92d [AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled
When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.

Differential revision: https://reviews.llvm.org/D52869



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343969 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 14:12:08 +00:00
Oliver Stannard
764fdc0b3e [AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI
When branch target identification is enabled, all indirectly-callable
functions start with a BTI C instruction. This instruction can only be
the target of certain indirect branches (direct branches and
fall-through are not affected):
- A BLR instruction, in either a protected or unprotected page.
- A BR instruction in a protected page, using x16 or x17.
- A BR instruction in an unprotected page, using any register.

Without BTI, we can use any non call-preserved register to hold the
address for an indirect tail call. However, when BTI is enabled, then
the code being compiled might be loaded into a BTI-protected page, where
only x16 and x17 can be used for indirect tail calls.

Legacy code without this restriction can still indirectly tail-call
BTI-protected functions, because they will be loaded into an unprotected
page, so any register is allowed.

Differential revision: https://reviews.llvm.org/D52868



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343968 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 14:09:15 +00:00
Oliver Stannard
4bc81028d4 [AArch64][v8.5A] Branch Target Identification code-generation pass
The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds of branches:
- BTI C can be targeted by call instructions, and is intended to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts as both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.
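
The three variants can be summarized as a small predicate. This is a sketch under stated assumptions: `landing_pad_accepts` is a hypothetical helper, instructions and registers are plain strings, and the branch is assumed to come from a BTI-protected page.

```python
def landing_pad_accepts(bti, opcode, register):
    """Return True if an indirect branch may land on the given BTI variant.

    bti:      "BTI C", "BTI J" or "BTI JC"
    opcode:   "BLR" or "BR"
    register: register holding the branch target, e.g. "x16"
    Assumes the branching instruction sits in a protected page.
    """
    if opcode not in ("BLR", "BR"):
        raise ValueError("only indirect branches are modeled")
    # Call-like branches: BLR, or BR through x16/x17 (PLT entries, tail calls).
    call_like = opcode == "BLR" or register in ("x16", "x17")
    if bti == "BTI C":
        return call_like
    if bti == "BTI J":
        return opcode == "BR"
    if bti == "BTI JC":
        return True
    raise ValueError("unknown BTI variant")
```

For example, `landing_pad_accepts("BTI C", "BR", "x3")` is false, which is why indirect tail calls in BTI-enabled code have to go through x16 or x17.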

Note that RET instructions are not restricted by branch target
identification, because return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. using the GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.
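
The choice of landing pad per basic block can be sketched as follows. The function and parameter names are hypothetical and the two booleans stand in for whatever analysis the pass performs; this illustrates the rules described above and is not the pass's code.

```python
def landing_pad_for_block(is_indirect_call_target, is_indirect_jump_target):
    """Pick the BTI variant for the start of a basic block, or None.

    is_indirect_call_target: block is the entry of a function that is
        address-taken or externally visible.
    is_indirect_jump_target: block is reached via a jump table or has
        its address taken.
    """
    if is_indirect_call_target and is_indirect_jump_target:
        return "BTI JC"  # rare: address-taken entry block of a callable function
    if is_indirect_call_target:
        return "BTI C"
    if is_indirect_jump_target:
        return "BTI J"
    return None  # block is only reached directly, no landing pad needed
```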

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343967 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 14:04:24 +00:00
Alexander Ivchenko
c190984a44 [GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM
Support G_UDIV/G_UREM/G_SREM. The instruction selection
code is taken from FastISel with only minor tweaks to adapt
for GlobalISel.

Differential Revision: https://reviews.llvm.org/D49781


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343966 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 13:40:34 +00:00
Sanjay Patel
6a25e9b7c6 [x86] add 16 missed hadd patterns (PR39195); NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343965 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 12:54:33 +00:00
Peter Smith
944e6c56f7 [ARM] Account for implicit IT when calculating inline asm size
When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.
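
A worked example of the arithmetic, as a sketch: the 4-byte maximum comes from the text above, the 2-byte size is that of a 16-bit Thumb IT encoding, and the function names are hypothetical.

```python
MAX_INSN_BYTES = 4  # worst-case instruction size used by the generic estimate
IT_BYTES = 2        # a Thumb IT instruction is a 16-bit encoding

def generic_estimate(num_instructions):
    """Estimate that ignores implicit IT instructions."""
    return num_instructions * MAX_INSN_BYTES

def conservative_estimate(num_instructions):
    """Estimate assuming every instruction may carry an implicit IT."""
    return num_instructions * (MAX_INSN_BYTES + IT_BYTES)
```

For a block of three 4-byte instructions, one of which is conditional, the real size is 14 bytes: the generic estimate of 12 is too small, while the conservative estimate of 18 is a safe upper bound.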

Fixes pr31805

Differential Revision: https://reviews.llvm.org/D52834



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343960 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 09:38:28 +00:00
Oliver Stannard
ca6a9e3669 [AArch64] Fix verifier error when outlining indirect calls
The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.

The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indirect tail-calls from using callee-saved registers. The
indirect calls transformed by the outliner could use callee-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.

Differential revision: https://reviews.llvm.org/D52829



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343959 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 09:18:48 +00:00
Alex Bradbury
7d468f2a87 [RISCV] Update alu8.ll and alu16.ll test cases
The srli test in alu8.ll was a no-op, as it shifted by 8 bits. Fix this, and
also change the immediate in alu16.ll, as shifting by something other than a
power of 8 is more interesting.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343958 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 09:08:51 +00:00
Sanjay Patel
a2434c2657 [DAGCombiner] allow undef elts in vector fadd matching
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343945 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 16:30:42 +00:00
Sanjay Patel
c708db84ee [x86] add vector fadd with undef elts test; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343944 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 16:27:50 +00:00
Sanjay Patel
049dc00383 [x86] remove redundant tests; NFC
The equivalent tests were added to the file with related folds in rL343941.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343943 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 16:13:38 +00:00
Sanjay Patel
199609a85f [DAGCombiner] allow undefs when matching vector splats for fmul folds
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343942 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 16:05:37 +00:00
Sanjay Patel
9cc55166d5 [x86] add vector fmul with undef elts tests; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343941 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 16:00:55 +00:00
Sanjay Patel
0620db4950 [DAGCombiner] allow undef elts in vector fabs/fneg matching
This change is proposed as a part of D44548, but we
need this independently to avoid regressions from improved
undef propagation in SimplifyDemandedVectorElts().


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343940 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 15:32:06 +00:00
Sanjay Patel
5edfe6ab5d [x86] add tests for FP logic folding for vectors with undefs; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343938 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 15:05:39 +00:00
Simon Pilgrim
b1a2f9aee3 [SelectionDAG] Respect multiple uses in SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
rL343913 was using SimplifyDemandedBits's original demanded mask instead of the adjusted 'NewMask' that accounts for multiple uses of the op (those variable names really need improving....).

Annoyingly many of the test changes (back to pre-rL343913 state) are actually safe - but only because their multiple uses are all by PMULDQ/PMULUDQ.

Thanks to Jan Vesely (@jvesely) for bisecting the bug.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343935 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 11:45:46 +00:00
Simon Pilgrim
fd84d483ad [AARCH64][X86] Remove _nonsplat from test names
As discussed on D50222 

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343934 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 11:24:04 +00:00
Alex Bradbury
bb3be36c55 [RISCV] Introduce alu8.ll and alu16.ll tests
These track the quality of generated code for simple arithmetic operations
that were legalised from non-native types.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343930 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-07 06:53:46 +00:00
Simon Pilgrim
e52b757e5d [X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343926 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-06 22:13:44 +00:00
Simon Pilgrim
e4c0278ac6 [X86][SSE] Add SSE41 vector int2fp tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343925 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-06 20:24:27 +00:00
Simon Pilgrim
9c587960b3 [X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand
Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343922 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-06 14:51:14 +00:00
Simon Pilgrim
942f89c803 [X86] Regenerate LSR loop iteration test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343921 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-06 14:26:38 +00:00
Sanjay Patel
29dbca16bf [x86] add test for masked store with extra shift op; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343920 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-06 14:11:05 +00:00
Simon Pilgrim
8020028957 [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks
Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343919 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-06 13:49:31 +00:00
Simon Pilgrim
0d0e510068 [SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector.
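
One way to picture the mapping from demanded bits to demanded elements is the sketch below. It assumes a little-endian layout where element 0 occupies the lowest bits, and it treats an element as demanded if any of its bits is demanded; `demanded_source_elts` is a hypothetical name, and the real code operates on APInt masks rather than Python integers.

```python
def demanded_source_elts(demanded_bits, elt_bits, num_elts):
    """Map a demanded-bits mask on a bitcast result to source vector elements.

    demanded_bits: integer mask over the full (elt_bits * num_elts)-bit value.
    Returns one boolean per source element, element 0 in the lowest bits.
    """
    elt_mask = (1 << elt_bits) - 1
    return [
        ((demanded_bits >> (i * elt_bits)) & elt_mask) != 0
        for i in range(num_elts)
    ]
```

For a v4i16 source bitcast to i64 with only the low 16 bits demanded, only element 0 is reported as demanded, so the other three elements are candidates for simplification.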

There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify.

As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.).

Fixes PR39178

Differential Revision: https://reviews.llvm.org/D52935

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343913 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-06 10:20:04 +00:00
Jessica Paquette
5fccc463e0 [GlobalIsel] Add llvm.invariant.start and llvm.invariant.end
Port over the implementation in SelectionDAGBuilder.cpp into the IRTranslator
and update the arm64-irtranslator test.

These were causing fallbacks in CTMark/Bullet (-Rpass-missed=gisel-select),
and this patch fixes that.

https://reviews.llvm.org/D52945

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343885 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 21:02:46 +00:00
Sanjay Patel
b1bffa1a5b [x86] make blend tests resistant to demanded elements improvements; NFC
Similar to rL343858 - we don't want these tests to lose value with D52912.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343882 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 20:26:54 +00:00
Alex Bradbury
628b7f5885 [RISCV] Regenerate several tests now enableMultipleCopyHints is enabled by default
r343851 caused codegen changes in several tests. This patch regenerates them.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343873 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 18:25:55 +00:00
Craig Topper
6bab3515eb [X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits.
The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8-bits this isn't an issue. This avoids creating a zero extend that probably won't go away.

The movmskb related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8-bits into a 32-bit immediate so it was still bad.
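
The -1 case can be checked with a little arithmetic. This is an illustrative sketch with hypothetical helper names; it relies only on the fact that x86 compare instructions accept a sign-extended 8-bit immediate.

```python
def as_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def fits_in_imm8(value, bits):
    """True if the constant can be encoded as a sign-extended 8-bit immediate."""
    return -128 <= as_signed(value, bits) <= 127
```

As an i16 constant, 0xFFFF is -1 and fits in 8 bits. After a zero extend to i32 the same bit pattern is 65535, which no longer fits and needs a full 32-bit immediate.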

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343871 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 18:13:36 +00:00
Sanjay Patel
9d809925c6 [SelectionDAG] allow undefs when matching splat constants
And use that to transform fsub with zero constant operands.
The integer part isn't used yet, but it is proposed for use in
D44548, so adding both enhancements here makes that 
patch simpler.
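
The idea of matching a splat while tolerating undef elements can be sketched as below. `UNDEF` and `splat_constant` are hypothetical stand-ins for illustration; they do not reflect the actual SelectionDAG helper or its signature.

```python
UNDEF = object()  # sentinel standing in for an undef vector element

def splat_constant(elts, allow_undefs=False):
    """Return the splatted value of a constant vector, or None if not a splat.

    With allow_undefs=True, undef elements are ignored when comparing.
    """
    defined = [e for e in elts if e is not UNDEF]
    if not defined:
        return None  # all-undef vector: no value to report
    if len(defined) != len(elts) and not allow_undefs:
        return None  # undef present but not permitted
    first = defined[0]
    return first if all(e == first for e in defined) else None
```

With this, a vector such as `[0.0, UNDEF, 0.0, 0.0]` is recognized as a zero splat only when undefs are allowed, which is the kind of match an fsub-with-zero fold needs.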


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343865 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 17:42:19 +00:00
Sanjay Patel
eb13633ec7 [x86] add test for (X - 0.0) vector with undef elts; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343863 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 17:36:51 +00:00
Simon Pilgrim
22609030b4 [X86][SSE] Try to make MOVLPS/MOVHPS(+PD) instructions SimplifyDemandedElts proof
Fix for D52912 which was simplifying MOVLPS/MOVHPS(+PD) instructions as the tests were only touching one of the vector halves

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343858 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 15:50:18 +00:00
Sanjay Patel
ece1a2dc30 [x86] regenerate full checks; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343855 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 14:56:14 +00:00
Sanjay Patel
99f3c46caf [x86] add test for fneg matching failure; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343854 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 14:49:20 +00:00
Simon Pilgrim
d327e07926 [X86][AVX] getFauxShuffleMask - add support for INSERT_SUBVECTOR subvector shuffles
Decode subvector shuffles from INSERT_SUBVECTOR(SRC0, SHUFFLE(EXTRACT_SUBVECTOR(SRC1)))

This was found necessary while investigating PR39161

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343853 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 14:41:00 +00:00
Tom Stellard
69f718971f AMDGPU/GlobalISel: Add support for G_INTTOPTR
Summary: This is a no-op.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52916

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343839 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 04:34:09 +00:00
Thomas Lively
70285a0359 [WebAssembly] Saturating arithmetic intrinsics
Summary: Depends on D52805.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52813

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343833 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-05 00:45:20 +00:00
Daniel Sanders
48a6c8baac [globalisel][combine] When placing truncates, handle the case when the BB is empty
GlobalISel uses MIR with implicit fallthrough on each basic block. As a result,
getFirstNonPhi() can return end().
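
The edge case can be illustrated with a toy model in which a basic block is a list of opcode strings and the list length plays the role of end(). The names are hypothetical and this is not the combiner's code.

```python
def first_non_phi(block):
    """Index of the first non-PHI instruction, or len(block) as 'end()'."""
    for index, opcode in enumerate(block):
        if opcode != "PHI":
            return index
    return len(block)

def place_truncate(block):
    """Insert a truncate at the first non-PHI position without reading it."""
    position = first_non_phi(block)
    # position may equal len(block); inserting there is valid, but
    # inspecting block[position] would not be.
    block.insert(position, "G_TRUNC")
    return position
```

An empty block, or one holding only PHIs, yields the end position, so any code that inspects the instruction at the returned position has to check for that first.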


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343829 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 23:47:37 +00:00
Yury Delendik
9c4d2a20d7 [WebAssembly] Ignore DBG_VALUE in WebAssemblyCFGStackify pass when looking for block start
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=39158 and regression caused by
D49034. Though it is possible the problem existed before and was exposed by
additional DBG_VALUEs.

Reviewers: sunfish, dschuff, aheejin

Reviewed By: aheejin

Subscribers: sbc100, aheejin, llvm-commits, alexcrichton, jgravelle-google

Differential Revision: https://reviews.llvm.org/D52837

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343827 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 23:31:00 +00:00
Daniel Sanders
ae0852b7ac [globalisel][combine] Fix a rare crash when encountering an instruction whose op0 isn't a reg
The simplest instance of this is an intrinsic with no results which will have the
intrinsic ID as operand 0.

Also fix some benign incorrectness when op0 is a reg but isn't a def that was
guarded against by checking for the extension opcodes.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343821 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 21:44:32 +00:00
Konstantin Zhuravlyov
b57394b3c2 AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa
The isAmdCodeObjectV2 is a misleading name which actually checks whether the os
is amdhsa or mesa.

Also add a test to make sure we do not generate old kernel header for code
object v3.

Differential Revision: https://reviews.llvm.org/D52897


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343813 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 21:02:16 +00:00
Daniel Sanders
c07d6f2d53 [globalisel][combine] Improve the truncate placement for the extending-loads combine
This brings the extending loads patch back to the original intent but minus the
PHI bug and with another small improvement to de-dupe truncates that are
inserted into the same block.

The truncates are sunk to their uses unless this would require inserting before a
phi in which case it sinks to the _beginning_ of the predecessor block for that
path (but no earlier than the def).

The reason for choosing the beginning of the predecessor is that it makes de-duping
multiple truncates in the same block simple, and optimized code is going to run a
scheduler at some point which will likely change the position anyway.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343804 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 18:44:58 +00:00
Sanjay Patel
ef9b7604b6 [x86] add test for SSE sqrtss register dep (PR22206)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343803 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 17:59:30 +00:00
Matthias Braun
5fffc11996 AArch64: Fix XSeqPairs/WSeqPairs problems
- Fix spill/reloads of XSeqPairs failing with vregs (only physregs
  worked correctly)
- Add missing spill/reload code for WSeqPairs class

Differential Revision: https://reviews.llvm.org/D52761

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343799 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 17:02:53 +00:00
Farhana Aleen
b3e54123c5 [AMDGPU] Match signed dot4/8 pattern.
Summary: This patch matches signed dot4 and dot8 pattern.

Author: FarhanaAleen

Reviewed By: msearles

Differential Revision: https://reviews.llvm.org/D52520

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343798 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 16:57:37 +00:00
Simon Pilgrim
4959586891 [X86][AVX] Add PR39161 test case for v4f64 zzww shuffle
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343786 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 15:06:09 +00:00
Alex Bradbury
c9b5c59206 [RISCV][NFC] Remove dead CHECK lines from vararg.ll test
The RISCV32 check prefix is no longer used so these lines are dead.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343757 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 07:35:52 +00:00
Alex Bradbury
af8412e6ae [RISCV] Bugfix for floats passed on the stack with the ILP32 ABI on RV32F
f32 values passed on the stack would previously cause an assertion in
unpackFromMemLoc. This would only trigger in the presence of the F extension
making f32 a legal type. Otherwise the f32 would be legalized.

This patch fixes that by keeping LocVT=f32 when a float is passed on the 
stack. It also adds test coverage for this case, and tests that also 
demonstrate lw/sw/flw/fsw will be selected when most profitable, i.e. there is
no unnecessary i32<->f32 conversion in registers.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343756 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 07:28:49 +00:00
Thomas Lively
6f31a46f4a [WebAssembly] Bitselect intrinsic and instruction
Summary: Depends on D52755.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52805

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343739 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-03 23:02:23 +00:00