Commit Graph

191211 Commits

Jessica Paquette
53b445b98d [AArch64][GlobalISel] Walk through G_AND in TB(N)Z bit calculation
Given

```
tb(n)z (and x, m), b
```

where the `b`-th bit of `m` is 1,

```
tb(n)z (and x, m), b == tb(n)z x, b
```

So, we can walk past a `G_AND` in this case.
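
A minimal C++ sketch of the look-through step, under the assumption of a helper shaped roughly like the selector's bit-test emission code (the function name and signature here are illustrative, not the in-tree API):

```
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"
#include "llvm/IR/Constants.h"
using namespace llvm;

// If Reg is defined by 'G_AND %x, <constant mask>' and bit 'Bit' of the mask
// is 1, the AND cannot change the tested bit, so TB(N)Z can test %x directly.
static Register lookThroughAndForBitTest(Register Reg, unsigned Bit,
                                         MachineRegisterInfo &MRI) {
  MachineInstr *Def = MRI.getVRegDef(Reg);
  if (!Def || Def->getOpcode() != TargetOpcode::G_AND)
    return Reg;
  MachineInstr *MaskDef = MRI.getVRegDef(Def->getOperand(2).getReg());
  if (!MaskDef || MaskDef->getOpcode() != TargetOpcode::G_CONSTANT)
    return Reg;
  const APInt &Mask = MaskDef->getOperand(1).getCImm()->getValue();
  if (Bit < Mask.getBitWidth() && Mask[Bit])
    return Def->getOperand(1).getReg(); // walk past the G_AND
  return Reg;
}
```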

Also add test/CodeGen/AArch64/GlobalISel/opt-fold-and-tbz-tbnz.mir to test this.

Differential Revision: https://reviews.llvm.org/D73790
2020-02-03 11:53:47 -08:00
Amara Emerson
c040f97bf9 [AArch64][GlobalISel] Don't reconvert to p0 in convertPtrAddToAdd().
convertPtrAddToAdd improved overall code size and quality by a significant amount,
but at -O0 we generated some cross-class copies because we emitted G_PTRTOINT
and G_INTTOPTR around the G_ADD. Unfortunately, at -O0 we don't run any register
coalescing, so these cross-class copies escape as moves, and we ended up
regressing 3 benchmarks on CTMark (though still a winner overall).

This patch changes the lowering to instead emit the G_ADD directly into the
destination register, and then force-change the destination LLT from p0 to s64.
This should be OK, as all uses of the register should now be selected, so the
LLT no longer matters to the users. It does, however, matter for the importer
patterns, which will fail to select a G_ADD if there's a p0 LLT.

I'm not able to get rid of the G_PTRTOINT on the source yet, however. We can't
use the same trick of breaking the type system there, since that could break the
selection of the defining instruction. Thus, at -O0 we still end up with a
cross-class copy on the source.
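
A hedged sketch of the shape of the new lowering (not the actual convertPtrAddToAdd implementation; the helper name and details are assumptions):

```
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Lower 'Dst:p0 = G_PTR_ADD Base:p0, Off:s64' by writing the G_ADD straight
// into Dst and retyping Dst as s64, so no G_INTTOPTR/copy is needed on the
// destination side. The source still goes through G_PTRTOINT, as noted above.
static void lowerPtrAddToAdd(MachineInstr &PtrAdd, MachineIRBuilder &MIB,
                             MachineRegisterInfo &MRI) {
  Register Dst = PtrAdd.getOperand(0).getReg();
  Register Base = PtrAdd.getOperand(1).getReg();
  Register Off = PtrAdd.getOperand(2).getReg();
  MIB.setInstr(PtrAdd);
  auto BaseInt = MIB.buildPtrToInt(LLT::scalar(64), Base);
  MIB.buildAdd(Dst, BaseInt.getReg(0), Off);
  // All users of Dst are already selected, so only the importer patterns see
  // this LLT; force it to s64 so a G_ADD pattern can match.
  MRI.setType(Dst, LLT::scalar(64));
  PtrAdd.eraseFromParent();
}
```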

Code size improvements on -O0:
Program                                         baseline      new         diff
 test-suite :: CTMark/Bullet/bullet.test        965520       949164      -1.7%
 test-suite...TMark/7zip/7zip-benchmark.test    1069456      1052600     -1.6%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    1213692      1199804     -1.1%
 test-suite...:: CTMark/sqlite3/sqlite3.test    421680       419736      -0.5%
 test-suite...-typeset/consumer-typeset.test    837076       833380      -0.4%
 test-suite :: CTMark/lencod/lencod.test        799712       796976      -0.3%
 test-suite...:: CTMark/ClamAV/clamscan.test    688264       686132      -0.3%
 test-suite :: CTMark/kimwitu++/kc.test         1002344      999648      -0.3%
 test-suite...Mark/mafft/pairlocalalign.test    422296       421768      -0.1%
 test-suite :: CTMark/SPASS/SPASS.test          656792       656532      -0.0%
 Geomean difference                                                      -0.6%

Differential Revision: https://reviews.llvm.org/D73910
2020-02-03 11:50:22 -08:00
Matt Arsenault
b1ca9e2a43 GlobalISel: Implement fewerElementsVector for G_SEXT_INREG
Start using a new strategy based on a combination of merges and unmerges.

This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
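
A rough C++ illustration of the merge/unmerge idea (an assumed shape; the real LegalizerHelper::fewerElementsVector code handles more cases):

```
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Scalarize e.g. '<2 x s128> = G_SEXT_INREG <2 x s128>, N' into two s128
// G_SEXT_INREGs and rebuild the vector, so later lowering never has to expand
// shifts on the illegal wide vector type.
static void scalarizeSExtInReg(MachineInstr &MI, MachineIRBuilder &MIB,
                               MachineRegisterInfo &MRI) {
  Register Dst = MI.getOperand(0).getReg();
  Register Src = MI.getOperand(1).getReg();
  int64_t SizeInBits = MI.getOperand(2).getImm();
  LLT EltTy = MRI.getType(Src).getElementType();

  MIB.setInstr(MI);
  auto Unmerge = MIB.buildUnmerge(EltTy, Src);
  SmallVector<Register, 8> Parts;
  for (unsigned I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
    Parts.push_back(
        MIB.buildSExtInReg(EltTy, Unmerge.getReg(I), SizeInBits).getReg(0));
  MIB.buildBuildVector(Dst, Parts); // merge the pieces back into the vector
  MI.eraseFromParent();
}
```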
2020-02-03 11:47:33 -08:00
Quentin Colombet
cbcbad6447 [TargetRegisterInfo] Make the heuristic to skip region split overridable by the target
RegAllocGreedy uses a fairly compile-time-intensive splitting heuristic called
region splitting. That heuristic is disabled via another heuristic when it is
likely not to be worth the compile time. The only way to control this second
heuristic was via a command line option (huge-size-for-split).

This commit gives more control over this heuristic by making it overridable by
the target using a new TargetRegisterInfo hook called
shouldRegionSplitForVirtReg.

The default implementation of this hook keeps the heuristic as it was
before this patch.
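
For example, a target could override the hook roughly like this (XYZRegisterInfo and its policy are hypothetical; only the hook name comes from this patch, and the exact signature may differ):

```
#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;

// Hypothetical XYZRegisterInfo.cpp fragment.
bool XYZRegisterInfo::shouldRegionSplitForVirtReg(
    const MachineFunction &MF, const LiveInterval &VirtReg) const {
  // Skip the compile-time-intensive region-split search for huge live
  // intervals; otherwise defer to the default decision.
  if (VirtReg.size() > 5000)
    return false;
  return TargetRegisterInfo::shouldRegionSplitForVirtReg(MF, VirtReg);
}
```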
2020-02-03 11:30:35 -08:00
Reid Kleckner
9369556310 Add PassManagerImpl.h to hide implementation details
ClangBuildAnalyzer results show that a lot of time is spent
instantiating AnalysisManager::getResultImpl across the code base:

**** Templates that took longest to instantiate:
 50445 ms: llvm::AnalysisManager<llvm::Function>::getResultImpl (412 times, avg 122 ms)
 47797 ms: llvm::AnalysisManager<llvm::Function>::getResult<llvm::TargetLibraryAnalysis> (389 times, avg 122 ms)
 46894 ms: std::tie<const unsigned long long, const bool> (2452 times, avg 19 ms)
 43851 ms: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096, 4096>::Allocate (3228 times, avg 13 ms)
 33911 ms: std::tie<const unsigned int, const unsigned int, const unsigned int, const unsigned int> (897 times, avg 37 ms)
 33854 ms: std::tie<const unsigned long long, const unsigned long long> (1897 times, avg 17 ms)
 27886 ms: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (11156 times, avg 2 ms)

I mentioned this result to @chandlerc, and he suggested this direction.

AnalysisManager is already explicitly instantiated, and getResultImpl
doesn't need to be inlined. Move the definition to an Impl header, and
include that header in files that explicitly instantiate
AnalysisManager. There are only four (real) IR units:
- function
- module
- loop
- cgscc
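
The underlying C++ technique, illustrated with invented names (this is the general pattern of pairing an *Impl.h header with explicit instantiation, not the LLVM code itself):

```
// Manager.h -- public header: declare the template member, don't define it.
template <typename IRUnitT> class Manager {
public:
  void *getResultImpl(int PassID, IRUnitT &IR);
};

// ManagerImpl.h -- implementation header: the expensive definition lives here
// and is only parsed by the handful of files that need it.
template <typename IRUnitT>
void *Manager<IRUnitT>::getResultImpl(int PassID, IRUnitT &IR) {
  // ... heavy lookup/insertion logic elided ...
  return nullptr;
}

// FunctionUnit.cpp -- one of the few files that explicitly instantiates the
// template; it includes ManagerImpl.h so the definition is visible here, while
// every other user only parses the cheap declaration in Manager.h.
struct Function {};
template class Manager<Function>;
```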

Looking at a specific transform (ArgumentPromotion.cpp), here are three
compilations before & after this change:

BEFORE:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 258.15MB
real: 0m6.297s
peak memory: 257.54MB
real: 0m5.906s
peak memory: 257.47MB
real: 0m6.219s

AFTER:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 235.35MB
real: 0m5.454s
peak memory: 234.72MB
real: 0m5.235s
peak memory: 234.39MB
real: 0m5.469s

The 20MB of memory saved seems real, and the time improvement, while less
dramatic, also appears to be real.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D73817
2020-02-03 11:15:55 -08:00
Reid Kleckner
a1c473cd39 Revert "[SVE] Fix bug in simplification of scalable vector instructions"
This reverts commit 31574d38ac5fa4646cf01dd252a23e682402134f.

The newly added shufflevector test does not pass locally on either of my
workstations.
2020-02-03 11:12:09 -08:00
Michael Trent
cd0c8f255c [llvm-objdump] Suppress spurious warnings when parsing Mach-O binaries.
Summary:
llvm-objdump started warning when asked to disassemble a section that
isn't present in the input files, in Yuanfang Chen's change:
d16c162c9453db855503134fe29ae4a3c0bec936. The problem is that the
logic was restricted only to the generic llvm-objdump parser, not to the
Mach-O-specific parser used for Apple toolchain compatibility. The
solution is to log section names from the Mach-O parser.

The macho-cstring-dump.test has been updated to fail if it encounters
this new warning in the future.

Reviewers: pete, ab, lhames, jhenderson, grimar, MaskRay, ychen

Reviewed By: jhenderson, grimar

Subscribers: rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73586
2020-02-03 10:59:36 -08:00
Alina Sbirlea
0528d7fce2 [LoopUtils] Make duplicate method a utility. [NFCI]
Summary:
The appendLoopsToWorklist method is duplicated in LoopUnroll and in the
LoopPassManager as an internal method. Make it a utility.

Reviewers: dmgreen, chandlerc, fedor.sergeev, yamauchi

Subscribers: mehdi_amini, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73569
2020-02-03 10:24:18 -08:00
Christopher Tetreault
56276c94bb [SVE] Fix bug in simplification of scalable vector instructions
Summary:
* Most of the simplifications in SimplifyShuffleVectorInst depend on the
concrete value of the mask vector, or on its length. For scalable vectors,
neither can be known at compile time.
** For these cases, detect whether the vector is scalable before attempting
the transformation (a guard of this shape is sketched below).
* The functions ShuffleVectorInst::getMaskValue and
ShuffleVectorInst::getShuffleMask access the value of the constant mask.
However, since the length of the mask is unknown at compile time, these
functions do not work for scalable vectors. Add asserts to ensure that
the input mask is not scalable.
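
A minimal sketch of the guard from the first bullet, assuming today's isa<ScalableVectorType> check as the way to detect scalability (the in-tree change may spell it differently):

```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Length/value-dependent shuffle folds only apply when the number of mask
// elements is a compile-time constant, i.e. the vector is not scalable.
static bool canSimplifyShuffleMask(const ShuffleVectorInst &Shuf) {
  return !isa<ScalableVectorType>(Shuf.getType());
}
// The patch additionally asserts in getMaskValue/getShuffleMask that the
// incoming mask is not scalable, since those accessors read concrete values.
```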

Reviewers: efriedma, sdesmalen, apazos, chrisj, huihuiz

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73555
2020-02-03 10:15:56 -08:00
Nikita Popov
092e2dc033 [SimplifyLibCalls] Remove unused IRBuilder argument; NFC
isLocallyOpenedFile() does not use IRBuilder.
2020-02-03 19:12:57 +01:00
Nikita Popov
64d4134188 [IRBuilder] Add missing NoFolder::CreatePointerBitCastOrAddrSpaceCast(); NFC
Split out from D73835. This method was added to ConstantFolder and
TargetFolder, but not NoFolder.
2020-02-03 19:11:27 +01:00
Nikita Popov
41a030740c [IRBuilder] Remove unnecessary NoFolder methods; NFCI
Split out from D73835: These methods are not part of the
ConstantFolder API and as such don't serve a purpose.
2020-02-03 19:08:41 +01:00
Simon Pilgrim
780ca3dfa3 [X86] getTargetShuffleMask - use getConstantOperandVal helper. NFCI. 2020-02-03 18:06:47 +00:00
Nikita Popov
df272a915f [InstCombine] Add replaceOperand() helper
Adds a replaceOperand() helper, which is like Instruction::setOperand() but
also adds the old operand to the worklist. This reduces the amount of missing
or incorrect worklist management.

This only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.
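
A sketch of the helper's shape inside InstCombiner, assuming the Worklist.add() naming introduced by the next commit below (the in-tree body may differ in detail):

```
// Replace operand OpNum of I with V, queueing the old operand so InstCombine
// revisits it, then return &I to signal that I was changed in place.
Instruction *replaceOperand(Instruction &I, unsigned OpNum, Value *V) {
  if (auto *OldOp = dyn_cast<Instruction>(I.getOperand(OpNum)))
    Worklist.add(OldOp); // old operand may now be simplifiable or dead
  I.setOperand(OpNum, V);
  return &I;
}
```

Call sites matching the pattern above then become `return replaceOperand(I, N, V);`.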

Differential Revision: https://reviews.llvm.org/D73803
2020-02-03 19:00:17 +01:00
Nikita Popov
ed97e37dc0 [InstCombine] Rename worklist methods; NFC
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.

As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.

Differential Revision: https://reviews.llvm.org/D73745
2020-02-03 18:56:51 +01:00
Nikita Popov
8495c2d57a [ARM] Expand vector reduction intrinsics on soft float
Followup to D73135. If the target doesn't have hard float (default
for ARM), then we assert when trying to soften the result of vector
reduction intrinsics. This patch marks these for expansion as well.
(A bit odd to use vectors on a target without hard float ... but
that's where you end up if you expose target-independent vector types.)
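
Marking an operation for expansion is a one-liner per opcode and type in the target's TargetLowering constructor; a hedged fragment of the idea (the exact set of opcodes and value types touched by the patch is not reproduced here):

```
// Illustrative fragment from a TargetLowering constructor: when there is no
// hardware FP, let the legalizer expand FP vector reductions to scalar code
// instead of asserting while softening their results.
for (MVT VT : MVT::fixedlen_vector_valuetypes()) {
  setOperationAction(ISD::VECREDUCE_FADD, VT, Expand);
  setOperationAction(ISD::VECREDUCE_FMUL, VT, Expand);
  setOperationAction(ISD::VECREDUCE_FMIN, VT, Expand);
  setOperationAction(ISD::VECREDUCE_FMAX, VT, Expand);
}
```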

Differential Revision: https://reviews.llvm.org/D73854
2020-02-03 18:49:12 +01:00
Nikita Popov
8f742c150c [Examples] Link BitReader in ThinLtoJIT example
D72486 broke the shared library build.
2020-02-03 18:47:38 +01:00
Nikita Popov
08f738d15c [InstCombine] Fix unused variable warning; NFC 2020-02-03 18:47:38 +01:00
Teresa Johnson
919474967d [ThinLTO] More efficient export computation (NFC)
Summary:
A recent change to enable more importing of global variables with
references exposed some efficiency issues with export computation.
See D73724 for more information and detailed analysis.

The first was specific to variable importing. The code was marking every
copy of a referenced value (from possibly thousands of files in the case
of linkonce_odr) as exported, and we only need to mark the copy in the
module containing the variable def being imported as exported. The
reason is that this is tracking what values are newly exported as a
result of importing. Anything that was defined in another module and
simply used in the exporting module is already exported, and would have
been identified by the caller (e.g. the LTO API implementations).

The second issue is that the code was re-adding previously exported
values (along with all references). It is easy to identify when a
variable was already imported into the same module (via the
import list insert call return value), and we already did this for
function importing. However, what we weren't doing for either function
or variable importing was avoiding a re-insertion when it was previously
exported into a different importing module. The reason we couldn't do this is
that there was no way of telling from the export list whether a value was
inserted there because its own definition was exported (in which case we
already marked all its references as exported) or because it was referenced by
another exported value (in which case we haven't yet inserted its own
references).

To address this we can restructure the way the export list is
constructed. This patch only adds the actual imported definitions
(variable or function) to the export list for its module during the
import computation. After import computation is complete, where we were
already post-processing the export list we go ahead and add all
references made by those exported values to the export list.
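
An abstract, standalone C++ illustration of the restructured flow (plain data structures with invented names, not the actual FunctionImporter code):

```
#include <map>
#include <set>
#include <string>
#include <vector>

using GUID = unsigned long long;

// Per defining module: which of its values become exported.
std::map<std::string, std::set<GUID>> ExportLists;
// Precomputed reference edges: value -> values it references.
std::map<GUID, std::vector<GUID>> References;

// Step 1 (during import computation): record only the imported definition,
// and only in the module that actually defines it.
void noteImport(const std::string &DefiningModule, GUID ImportedDef) {
  ExportLists[DefiningModule].insert(ImportedDef);
}

// Step 2 (after import computation): a single pass adds the references of
// every exported definition, instead of re-walking them each time the same
// value is exported to yet another importing module.
void addExportedReferences() {
  for (auto &ModuleExports : ExportLists) {
    std::set<GUID> Refs;
    for (GUID Def : ModuleExports.second)
      for (GUID Ref : References[Def])
        Refs.insert(Ref);
    ModuleExports.second.insert(Refs.begin(), Refs.end());
  }
}
```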

These changes speed up the thin link not only with constant variable
importing enabled, but also without (due to the efficiency improvement
in function importing).

Some thin link user time measurements for one large application, average
of 5 runs:

With constant variable importing enabled:
- without this patch: 479.5s
- with this patch: 74.6s

Without constant variable importing enabled:
- without this patch: 80.6s
- with this patch: 70.3s

Note I have not re-enabled constant variable importing here, as I would
like to do additional compile time measurements with these fixes first.

Reviewers: evgeny777

Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73851
2020-02-03 09:15:33 -08:00
Jay Foad
ccd7445730 [AMDGPU] getMemOperandsWithOffset: add resource operand for BUF instructions
Summary:
This prevents unwanted clustering of BUF instructions with the same
vaddr but different resource descriptors.

Reviewers: rampitec, arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73867
2020-02-03 17:06:09 +00:00
Sanjay Patel
4398f57ddf [InstCombine] add tests for casted phi; NFC 2020-02-03 11:54:47 -05:00
Simon Pilgrim
539b0b67e9 HexagonOptAddrMode::changeStore - fix null dereference warning (PR43463)
As detailed in PR43463, this fixes a static analyzer null-dereference warning by sinking Changed = true into the if() blocks where the MIB is actually created.

I did a quick check which suggested that one of those if() blocks is always guaranteed to be hit (so we could change it to if-else), but this seems like a safer approach.

Differential Revision: https://reviews.llvm.org/D73883
2020-02-03 16:50:04 +00:00
Simon Pilgrim
9c81e3d7cc [TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling
We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.
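
A hedged sketch of the KnownBits update (the surrounding SimplifyDemandedBits plumbing, and the guard against folding back into a constant pool load, are omitted):

```
// If Op is a zero-extending load, every bit above the memory width is known
// zero, and that can be reported without touching the load itself.
if (auto *Load = dyn_cast<LoadSDNode>(Op)) {
  if (Load->getExtensionType() == ISD::ZEXTLOAD) {
    unsigned MemBits = Load->getMemoryVT().getScalarSizeInBits();
    Known.Zero.setBitsFrom(MemBits);
  }
}
```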
2020-02-03 16:50:04 +00:00
Simon Pilgrim
5c93d9fa0c [X86] BEXTR SimplifyDemandedBitsForTargetNode - length == 0 -> result = 0 2020-02-03 16:50:03 +00:00
Hans Wennborg
494b532127 Actually, don't try to use __builtin_strlen in StringRef.h before VS 2019
The fix in b3d7d1061dc375bb5ea725e6597382fcd37f41d6 compiled nicely, but didn't
link because at least the VS 2017 version I use doesn't have the builtin yet.
Instead, only make use of the builtin with MSVC when building with VS 2019 or
later.
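
A sketch of this kind of version gate (the function name is illustrative and the exact preprocessor layout in StringRef.h is not reproduced; the relevant fact is that _MSC_VER >= 1920 corresponds to VS 2019):

```
#include <cstring>

// Only rely on __builtin_strlen with MSVC when the compiler is VS 2019
// (_MSC_VER >= 1920) or newer; VS 2017 accepts the declaration but fails to
// link it. GCC and Clang always provide the builtin.
#if defined(__GNUC__) || (defined(_MSC_VER) && _MSC_VER >= 1920)
constexpr std::size_t constStrLen(const char *S) { return __builtin_strlen(S); }
#else
inline std::size_t constStrLen(const char *S) { return std::strlen(S); }
#endif
```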
2020-02-03 17:49:29 +01:00
Guillaume Chatelet
1f9dcd30dc [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Hans Wennborg
41105ccc60 Declare __builtin_strlen in StringRef.h as constexpr
Otherwise Visual Studio 2017 will complain about
llvm::StringRef::strLen not being constexpr:

  StringRef.h(80): error C3615: constexpr function 'llvm::StringRef::strLen' cannot result in a constant expression
  StringRef.h(84): note: failure was caused by call of undefined function or one not declared 'constexpr'
2020-02-03 16:58:01 +01:00
Kazushi (Jam) Marukawa
3f0180f097 [VE] (fp)trunc+store & load+(fp)ext isel
Summary: load+sext/zext/fpext and (fp)trunc+store isel legalization and tests

Reviewers: arsenm, craig.topper, rengolin, k-ishizaka

Reviewed By: arsenm

Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits

Tags: #ve, #llvm

Differential Revision: https://reviews.llvm.org/D73774
2020-02-03 16:55:44 +01:00
Simon Pilgrim
cebe3b26a7 [X86] computeKnownBitsForTargetNode - add BEXTR support (PR39153)
Add a KnownBits::extractBits helper
2020-02-03 15:43:59 +00:00
Hans Wennborg
238f3b0462 build_llvm_package.bat: Use a short form of the git revision 2020-02-03 16:40:10 +01:00
Craig Topper
cf7fa877a2 [X86] FUCOMI/FCOMI instructions should Def FPSW not FPCW.
These instructions can set exception flags in FPSW, but I don't
think they can change FPCW. So this looks like a typo.

Differential Revision: https://reviews.llvm.org/D73864
2020-02-03 07:39:00 -08:00
Sanjay Patel
cb8bd29a62 [InstCombine] regenerate complete test checks; NFC 2020-02-03 10:30:26 -05:00
Kazushi (Jam) Marukawa
b2d7ee731b [VE] vaarg functions callers and callees
Summary: Isel patterns and tests for vaarg functions as callers and callees.

Reviewers: arsenm, rengolin, k-ishizaka

Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits

Tags: #ve, #llvm

Differential Revision: https://reviews.llvm.org/D73710
2020-02-03 16:26:44 +01:00
Simon Pilgrim
8301fd0d00 [X86] Add some initial BEXTR combine tests 2020-02-03 15:16:40 +00:00
Simon Pilgrim
1f9f866ff9 [X86] Move BEXTR DemandedBits handling inside SimplifyDemandedBitsForTargetNode
Some prep work for PR39153.
2020-02-03 15:16:40 +00:00
Matt Arsenault
80bf477ac5 AMDGPU: Fix extra type mangling on llvm.amdgcn.if.break
These have to be the same mask type.
2020-02-03 07:02:05 -08:00
Johannes Doerfert
b1f217520e Revert "[OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder."
This reverts commit 1ca740387b9bbdc142ac81c8bdd6370a8813e328.

The bots break [0], investigation is needed.

[0] http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22899
2020-02-03 08:59:14 -06:00
Fady Ghanim
0e8b45d86c [OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder.
Add support for the Master and Critical directives in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions, called `emitInlinedRegion`, which was added in this patch as well.

Also, this patch modifies clang to use the new directives when the `-fopenmp-enable-irbuilder` command line option is passed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72304
2020-02-03 08:44:23 -06:00
John Brawn
882073b1e0 [FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC
These get lowered to function calls, like the non-strict versions.

Differential Revision: https://reviews.llvm.org/D73784
2020-02-03 14:39:16 +00:00
Krzysztof Parzyszek
fde0721a0a [Hexagon] Rename FeatureHasPreV65 to FeaturePreV65 2020-02-03 08:20:59 -06:00
Sanjay Patel
244e402b5c [InstCombine] reassociate splatted vector ops
bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp

This patch depends on the splat analysis enhancement in D73549.
See the test with comment:
; Negative test - mismatched splat elements
...as the motivation for that first patch.

The motivating case for reassociating splatted ops is shown in PR42174:
https://bugs.llvm.org/show_bug.cgi?id=42174

In that example, a slight change in order-of-associative math results
in a big difference in IR and codegen. This patch gets all of the
unnecessary shuffles out of the way, but doesn't address the potential
scalarization (see D50992 or D73480 for that).

Differential Revision: https://reviews.llvm.org/D73703
2020-02-03 09:08:36 -05:00
Matt Arsenault
7ff7b7c59d AMDGPU/GlobalISel: Reduce indentation 2020-02-03 05:41:14 -08:00
Matt Arsenault
729362c237 AMDGPU/GlobalISel: Fix mem size in test
This wasn't intended to test an extload.
2020-02-03 05:41:14 -08:00
Simon Moll
99c2d7bdcd [NFC][VE] format VEInstrInfo 2020-02-03 14:25:49 +01:00
Simon Moll
a84333d938 [NFC] unsigned->Register in storeRegTo/loadRegFromStack
Summary:
This patch makes progress on the 'unsigned -> Register' rewrite for
`TargetInstrInfo::loadRegFromStack` and `TII::storeRegToStack`.

Reviewers: arsenm, craig.topper, uweigand, jpienaar, atanasyan, venkatra, robertlytton, dylanmckay, t.p.northover, kparzysz, tstellar, k-ishizaka

Reviewed By: arsenm

Subscribers: wuzish, merge_guards_bot, jyknight, sdardis, nemanjai, jvesely, wdng, nhaehnle, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73870
2020-02-03 14:22:16 +01:00
Guillaume Chatelet
8cbfdb9b6f [Alignment][NFC] Use Align for code creating MemOp
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73874
2020-02-03 14:10:30 +01:00
John Brawn
cdc62345ba [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.

The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.

Differential Revision: https://reviews.llvm.org/D73194
2020-02-03 12:59:12 +00:00
James Henderson
46d04a31b6 [DebugInfo][test] Adjust line table unit length to account for contents
Previously, if a debug line Prologue was created via
createBasicPrologue, its TotalLength field did not account for any
contents in the table itself. This change fixes this issue.

Reviewed by: probinson

Differential Revision: https://reviews.llvm.org/D73772
2020-02-03 12:16:36 +00:00
Simon Tatham
6709634733 [ARM,MVE] Fix vreinterpretq in big-endian mode.
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.

So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?

The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.

To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.

In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.

For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73786
2020-02-03 11:20:06 +00:00
Simon Tatham
e3f9be3c6f [ARM,MVE] Add intrinsics for v[id]dupq and v[id]wdupq.
Summary:
These instructions generate a vector of consecutive elements starting
from a given base value and incrementing by 1, 2, 4 or 8. The `wdup`
versions also wrap the values back to zero when they reach a given
limit value. The instruction updates the scalar base register so that
another use of the same instruction will continue the sequence from
where the previous one left off.

At the IR level, I've represented these instructions as a family of
target-specific intrinsics with two return values (the constructed
vector and the updated base). The user-facing ACLE API provides a set
of intrinsics that throw away the written-back base and another set
that receive it as a pointer so they can update it, plus the usual
predicated versions.

Because the intrinsics return two values (as do the underlying
instructions), the isel has to be done in C++.

This is the first family of MVE intrinsics that use the `imm_1248`
immediate type in the clang Tablegen framework, so naturally, I found
I'd given it the wrong C integer type. Also added some tests of the
check that the immediate has a legal value, because this is the first
time those particular checks have been exercised.

Finally, I also had to fix a bug in MveEmitter which failed an
assertion when I nested two `seq` nodes (the inner one used to extract
the two values from the pair returned by the IR intrinsic, and the
outer one put on by the predication multiclass).

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73357
2020-02-03 11:20:06 +00:00