The incoming accumulator value can be discovered through a sext, in
which case there will be a mismatch between the input and the result.
So sign extend the accumulator input if we're performing a 64-bit mac.
Differential Revision: https://reviews.llvm.org/D67220
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371370 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371284 91177308-0d34-0410-b5e6-96231b3b80d8
rL369567 reverted a couple of recent changes made to ARMParallelDSP
because of a miscompilation error: PR43073.
The issue stemmed from an underlying bug that was caused by adding
muls into a reduction before it was proved that they could be executed
in parallel with another mul.
Most of the changes here are from the previously reverted commits.
The additional changes have been made area:
1) The Search function now doesn't insert any muls into the Reduction
object. That now happens once the search has successfully finished.
2) For any muls added into the reduction but that weren't paired, we
accumulate their values as an input into the smlad.
Differential Revision: https://reviews.llvm.org/D66660
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370171 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369013 91177308-0d34-0410-b5e6-96231b3b80d8
As loads are combined and widened, we replaced their sext users
operands whereas we should have been replacing the uses of the sext.
I've added a load of tests, with only a few of them originally
causing assertion failures, the rest improve pattern coverage.
Differential Revision: https://reviews.llvm.org/D65740
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368404 91177308-0d34-0410-b5e6-96231b3b80d8
Add a couple of getters for Reduction and do some renaming of
variables around CreateSMLAD for clarity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367522 91177308-0d34-0410-b5e6-96231b3b80d8
- Remove some unused typedefs.
- Rename BinOpChain struct to MulCandidate.
- Remove the size method of MulCandidate.
- Store only the first input of the ValueList provided to
MulCandidate, as it's the only value we care about. This means we
don't have to perform any ugly (and unnecessary) iterations of the
list later on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367208 91177308-0d34-0410-b5e6-96231b3b80d8
We explicitly search for a parallel mac and we only care about its
inputs, checking for symmetry doesn't add anything here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367205 91177308-0d34-0410-b5e6-96231b3b80d8
We no longer have to check what loads are used, all this
is performed at the start of the transform, so it's not
doing anything now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367204 91177308-0d34-0410-b5e6-96231b3b80d8
While combining two loads into a single load, we often need to
reorder the pointer operands for the new load. This reordering was
broken in the cases where there was a chain of values that built up
the pointer.
Differential Revision: https://reviews.llvm.org/D65193
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366881 91177308-0d34-0410-b5e6-96231b3b80d8
Two functional changes have been made here:
- Now search up from any add instruction to find the chains of
operations that we may turn into a smlad. This allows the
generation of a smlad which doesn't accumulate into a phi.
- The search function has been corrected to stop it falsely searching
up through an invalid path.
The bulk of the changes have been making the Reduction struct a class
and making it more C++y with getters and setters.
Differential Revision: https://reviews.llvm.org/D61780
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365740 91177308-0d34-0410-b5e6-96231b3b80d8
Most of the code used for finding a 'narrow' sequence is not used,
so I've removed it and simplified the calls from the smlad matcher.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362104 91177308-0d34-0410-b5e6-96231b3b80d8
When deciding the safety of generating smlad, we checked for any
writes within the block that may alias with any of the loads that
need to be widened. This is overly conservative because it only
matters when there's a potential aliasing write to a location
accessed by a pair of loads.
Now we check for aliasing writes only once, during setup. If two
loads are found to have an aliasing write between them, we don't add
these loads to LoadPairs. This means that later during the transform,
we can safely widened a pair without worrying about aliasing.
However, to maintain correctness, we also need to change the way that
wide loads are inserted because the order is now important.
The MatchSMLAD method has also been changed, absorbing
MatchReductions and AddMACCandidate to hopefully improve readability.
Differential Revision: https://reviews.llvm.org/D6102
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360567 91177308-0d34-0410-b5e6-96231b3b80d8
When choosing whether a pair of loads can be combined into a single
wide load, we check that the load only has a sext user and that sext
also only has one user. But this can prevent the transformation in
the cases when parallel macs use the same loaded data multiple times.
To enable this, we need to fix up any other uses after creating the
wide load: generating a trunc and a shift + trunc pair to recreate
the narrow values. We also need to keep a record of which loads have
already been widened.
Differential Revision: https://reviews.llvm.org/D59215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356132 91177308-0d34-0410-b5e6-96231b3b80d8
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
Both zext and sext are currently allowed during the search for narrow
sequences and sexts operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.
Differential Revision: https://reviews.llvm.org/D54790
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347542 91177308-0d34-0410-b5e6-96231b3b80d8
A few code movement things:
- AreSymmetrical is now a method of BinOpChain.
- Created a lambda in CreateParallelMACPairs to reduce loop nesting.
- A Reduction object now gets pasted in a couple of places instead,
including CreateParallelMACPairs so it doesn't need to return a
value.
I've also added RecordSequentialLoads, which is run before the
transformation begins, and caches the interesting loads. This can then
be queried later instead of cross checking many load values.
Differential Revision: https://reviews.llvm.org/D54254
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346479 91177308-0d34-0410-b5e6-96231b3b80d8
Previously reverted in rL343082.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344693 91177308-0d34-0410-b5e6-96231b3b80d8
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).
This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344186 91177308-0d34-0410-b5e6-96231b3b80d8
This broke Chromium's Android build (https://crbug.com/889390) and the
polly-aosp buildbot
(http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).
> Originally committed in rL342210 but was reverted in rL342260 because
> it was causing issues in vectorized code, because I had forgotten to
> ensure that we're operating on scalar values.
>
> Original commit message:
>
> On failing to find sequences that can be converted into dual macs,
> try to find sequential 16-bit loads that are used by muls which we
> can then use smultb, smulbt, smultt with a wide load.
>
> Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343082 91177308-0d34-0410-b5e6-96231b3b80d8
Originally committed in rL342210 but was reverted in rL342260 because
it was causing issues in vectorized code, because I had forgotten to
ensure that we're operating on scalar values.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342870 91177308-0d34-0410-b5e6-96231b3b80d8
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342210 91177308-0d34-0410-b5e6-96231b3b80d8
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.
AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.
Differential Revision: https://reviews.llvm.org/D51424
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342033 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
OpChain has subclasses, so add a virtual destructor.
This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701.
Reviewers: javed.absar
Subscribers: llvm-commits, SjoerdMeijer, samparker
Differential Revision: https://reviews.llvm.org/D49681
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337713 91177308-0d34-0410-b5e6-96231b3b80d8
In preparing to allow ARMParallelDSP pass to parallelise more than
smlads, I've restructed some elements:
- The ParallelMAC struct has been renamed to BinOpChain.
- The BinOpChain struct holds two value lists: LHS and RHS, as well
as inheriting from the OpChain base class.
- The OpChain struct holds all the values of the represented chain
and has had the memory locations functionality inserted into it.
- ParallelMACList becomes OpChainList and it now holds pointers
instead of objects.
Differential Revision: https://reviews.llvm.org/D49020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337701 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.
Differential revision: https://reviews.llvm.org/D49125
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336795 91177308-0d34-0410-b5e6-96231b3b80d8
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.
Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).
Patch by: Sam Parker and Sjoerd Meijer
Differential Revision: https://reviews.llvm.org/D48128
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335850 91177308-0d34-0410-b5e6-96231b3b80d8