When choosing whether a pair of loads can be combined into a single
wide load, we check that the load only has a sext user and that sext
also only has one user. But this can prevent the transformation in
the cases when parallel macs use the same loaded data multiple times.
To enable this, we need to fix up any other uses after creating the
wide load: generating a trunc and a shift + trunc pair to recreate
the narrow values. We also need to keep a record of which loads have
already been widened.
Differential Revision: https://reviews.llvm.org/D59215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356132 91177308-0d34-0410-b5e6-96231b3b80d8
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
Both zext and sext are currently allowed during the search for narrow
sequences and sexts operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.
Differential Revision: https://reviews.llvm.org/D54790
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347542 91177308-0d34-0410-b5e6-96231b3b80d8
A few code movement things:
- AreSymmetrical is now a method of BinOpChain.
- Created a lambda in CreateParallelMACPairs to reduce loop nesting.
- A Reduction object now gets pasted in a couple of places instead,
including CreateParallelMACPairs so it doesn't need to return a
value.
I've also added RecordSequentialLoads, which is run before the
transformation begins, and caches the interesting loads. This can then
be queried later instead of cross checking many load values.
Differential Revision: https://reviews.llvm.org/D54254
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346479 91177308-0d34-0410-b5e6-96231b3b80d8
Previously reverted in rL343082.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344693 91177308-0d34-0410-b5e6-96231b3b80d8
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).
This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344186 91177308-0d34-0410-b5e6-96231b3b80d8
This broke Chromium's Android build (https://crbug.com/889390) and the
polly-aosp buildbot
(http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).
> Originally committed in rL342210 but was reverted in rL342260 because
> it was causing issues in vectorized code, because I had forgotten to
> ensure that we're operating on scalar values.
>
> Original commit message:
>
> On failing to find sequences that can be converted into dual macs,
> try to find sequential 16-bit loads that are used by muls which we
> can then use smultb, smulbt, smultt with a wide load.
>
> Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343082 91177308-0d34-0410-b5e6-96231b3b80d8
Originally committed in rL342210 but was reverted in rL342260 because
it was causing issues in vectorized code, because I had forgotten to
ensure that we're operating on scalar values.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342870 91177308-0d34-0410-b5e6-96231b3b80d8
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342210 91177308-0d34-0410-b5e6-96231b3b80d8
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.
AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.
Differential Revision: https://reviews.llvm.org/D51424
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342033 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
OpChain has subclasses, so add a virtual destructor.
This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701.
Reviewers: javed.absar
Subscribers: llvm-commits, SjoerdMeijer, samparker
Differential Revision: https://reviews.llvm.org/D49681
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337713 91177308-0d34-0410-b5e6-96231b3b80d8
In preparing to allow ARMParallelDSP pass to parallelise more than
smlads, I've restructed some elements:
- The ParallelMAC struct has been renamed to BinOpChain.
- The BinOpChain struct holds two value lists: LHS and RHS, as well
as inheriting from the OpChain base class.
- The OpChain struct holds all the values of the represented chain
and has had the memory locations functionality inserted into it.
- ParallelMACList becomes OpChainList and it now holds pointers
instead of objects.
Differential Revision: https://reviews.llvm.org/D49020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337701 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.
Differential revision: https://reviews.llvm.org/D49125
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336795 91177308-0d34-0410-b5e6-96231b3b80d8
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.
Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).
Patch by: Sam Parker and Sjoerd Meijer
Differential Revision: https://reviews.llvm.org/D48128
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335850 91177308-0d34-0410-b5e6-96231b3b80d8