Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Adding a default "no information" subtarget implementation
Only a handful of targets use these interfaces currently: AArch64,
Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
implementation, so its custom TTI implementation is migrated to use
the new facilities in BasicTTIImpl to invoke its custom subtarget
implementation. The custom TTI implementations continue to exist for
the other targets with this change. They are not moved over to
subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to
the system model defined by the target. With this change, the default
subtarget implementation essentially returns "no information" for
these interfaces. None of the existing users of TTI will hit that
implementation because they define their own custom TTI
implementations and won't use the BasicTTIImpl implementations.
Once system models are in place for the targets that use these
interfaces, their custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365676 91177308-0d34-0410-b5e6-96231b3b80d8
Without this gcc 7.4.0 complains with
../include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h:457:54: error: suggest parentheses around '&&' within '||' [-Werror=parentheses]
isArtifactCast(TmpDef->getOpcode()) &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
"Expecting copy or artifact cast here");
~
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365597 91177308-0d34-0410-b5e6-96231b3b80d8
In SelectionDAG AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. Theoretically
the bitpattern should eventually match to bfi, so don't bother trying
to get the patterns to import.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365583 91177308-0d34-0410-b5e6-96231b3b80d8
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.
This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
Differential Revision: https://reviews.llvm.org/D64295
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365549 91177308-0d34-0410-b5e6-96231b3b80d8
If we have an icmp->brcond->br sequence where the brcond just branches to the
next block jumping over the br, while the br takes the false edge, then we can
modify the conditional branch to jump to the br's target while inverting the
condition of the incoming icmp. This means we can eliminate the br as an
unconditional branch to the fallthrough block.
Differential Revision: https://reviews.llvm.org/D64354
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365510 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we've dropped VS2015 support (D64326) we can enable the constexpr variables on MSVC builds as VS2017+ correctly handles them
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365477 91177308-0d34-0410-b5e6-96231b3b80d8
Dump the DWARF information about call sites and call site parameters into
debug info sections.
The patch also provides an interface for the interpretation of instructions
that could load values of a call site parameters in order to generate DWARF
about the call site parameters.
([13/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60716
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365467 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes AMDGPU patterns for 32-bit address spaces always failing. Tests
will be included in future patches when additional issues are solved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365319 91177308-0d34-0410-b5e6-96231b3b80d8
Some out of tree backend require larger vector type. Since maintaining the changes out of tree is difficult due to the many manual changes needed when adding a new type we are adding it even if no backend currently use it.
Differential Revision: https://reviews.llvm.org/D64141
Patch by Thomas Raoux!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365274 91177308-0d34-0410-b5e6-96231b3b80d8
Revision r365061 changed a skip of debug instructions for a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365202 91177308-0d34-0410-b5e6-96231b3b80d8
Revision r365061 changed a skip of debug instructions for a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365198 91177308-0d34-0410-b5e6-96231b3b80d8
When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.
Differential revision: https://reviews.llvm.org/D64131
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365043 91177308-0d34-0410-b5e6-96231b3b80d8
For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow
encoding. However, current allocation order is like:
R0-R3, R12, LR, R4-R11
As a result, a lot of instructs that use R12/LR will be wide instrs.
This patch changes the allocation order to:
R0-R7, R12, LR, R8-R11
for thumb2 and -Osize.
In most cases, there is no extra push/pop instrs as they will be folded
into existing ones. There might be slight performance impact due to more
stack usage, so we only enable it when opt for min size.
https://reviews.llvm.org/D30324
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365014 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.
Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365010 91177308-0d34-0410-b5e6-96231b3b80d8
There are two main issues preventing us from generating immediate form shifts:
1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift
immediate forms, but they currently don't work because the amount type is
expected to be an s64 constant, but we only legalize them to have homogenous
types.
To deal with this, first we introduce a custom legalizer to *only* custom legalize
s32 shifts which have a constant operand into a s64.
There is also an additional artifact combiner to fold zexts(g_constant) to a
larger G_CONSTANT if it's legal, a counterpart to the anyext version committed
in an earlier patch.
2) For G_SHL the importer can't cope with the pattern. For this I introduced an
early selection phase in the arm64 selector to select these forms manually
before the tablegen selector pessimizes it to a register-register variant.
Differential Revision: https://reviews.llvm.org/D63910
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364994 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Delete the begin-end form because the standard std::partition_point
can be easily used as a replacement.
The ranges-style llvm::bsearch will be renamed to llvm::partition_point
in the next clean-up patch.
The name "bsearch" doesn't meet people's expectation because in C:
> If two or more members compare equal, which member is returned is unspecified.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D63718
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364719 91177308-0d34-0410-b5e6-96231b3b80d8
The new switch lowering code that tries to generate jump tables and range checks
were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to
town on trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled because the CFG
looks even stranger after all of this is done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364613 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).
Original patch D50222 by @hermord (Dmytro Shynkevych)
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364600 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
Original patch author: @hermord (Dmytro Shynkevych)!
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364563 91177308-0d34-0410-b5e6-96231b3b80d8
Handle call instruction replacements and deletions in order to preserve
valid state of the call site info of the MachineFunction.
NOTE: If the call site info is enabled for a new target, the assertion from
the MachineFunction::DeleteMachineInstr() should help to locate places
where the updateCallSiteInfo() should be called in order to preserve valid
state of the call site info.
([10/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D61062
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364536 91177308-0d34-0410-b5e6-96231b3b80d8
While lowering calls, collect info about registers that forward arguments
into following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.
([9/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60715
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364516 91177308-0d34-0410-b5e6-96231b3b80d8
Remove the last use of packRegs from IRTranslator and delete
pack/unpackRegs. This introduces a fallback to DAGISel for intrinsics
with aggregate arguments, since we don't have a testcase for them so
it's hard to tell how we'd want to handle them.
Discussed in https://reviews.llvm.org/D63551
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364514 91177308-0d34-0410-b5e6-96231b3b80d8
Change the interface of CallLowering::lowerCall to accept several
virtual registers for each argument, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63551
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364512 91177308-0d34-0410-b5e6-96231b3b80d8
Change the interface of CallLowering::lowerCall to accept several
virtual registers for the call result, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63550
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364511 91177308-0d34-0410-b5e6-96231b3b80d8
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364510 91177308-0d34-0410-b5e6-96231b3b80d8
Allow CallLowering::ArgInfo to contain more than one virtual register.
This is useful when passes split aggregates into several virtual
registers, but need to also provide information about the original type
to the call lowering. Used in follow-up patches.
Differential Revision: https://reviews.llvm.org/D63548
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364509 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how do deal with the last pattern c.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62793
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364418 91177308-0d34-0410-b5e6-96231b3b80d8
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364412 91177308-0d34-0410-b5e6-96231b3b80d8
Peephole opt has a one use limitation which appears to be accidental. The function being used was incorrectly documented as returning whether the def had one *user*, but instead returned true only when there was one *use*. Add a corresponding hasOneNonDbgUser helper, and adjust peephole-opt to use the appropriate one.
All of the actual folding code handles multiple uses within a single instruction. That codepath is well exercised through instruction selection.
Differential Revision: https://reviews.llvm.org/D63656
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364336 91177308-0d34-0410-b5e6-96231b3b80d8