Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.
amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.
This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in
.gcc_except_table
Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.
llvm-svn: 366329
AMDGPU needs to allocate special argument registers separately from
the user function argument list, so needs direct control over the
CCState.
The ArgLocs argument is only really necessary because CCState doesn't
allow access to it.
llvm-svn: 366279
Summary:
We agreed to rename `except_ref` to `exnref` for consistency with other
reference types in
https://github.com/WebAssembly/exception-handling/issues/79. This also
renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order
to use the file for other reference types in future.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64703
llvm-svn: 366145
Currently AMDGPU uses a CodePatPred to check address spaces from the
MachineMemOperand. Introduce a new first class property so that the
existing patterns can be easily modified to uses the new generated
predicate, which will also be handled for GlobalISel.
I would prefer these to match against the pointer type of the
instruction, but that would be difficult to get working with
SelectionDAG compatbility. This is much easier for now and will avoid
a painful tablegen rewrite for all the loads and stores.
I'm also not sure if there's a better way to encode multiple address
spaces in the table, rather than putting the number to expect.
llvm-svn: 366128
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Adding a default "no information" subtarget implementation
Only a handful of targets use these interfaces currently: AArch64,
Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
implementation, so its custom TTI implementation is migrated to use
the new facilities in BasicTTIImpl to invoke its custom subtarget
implementation. The custom TTI implementations continue to exist for
the other targets with this change. They are not moved over to
subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to
the system model defined by the target. With this change, the default
subtarget implementation essentially returns "no information" for
these interfaces. None of the existing users of TTI will hit that
implementation because they define their own custom TTI
implementations and won't use the BasicTTIImpl implementations.
Once system models are in place for the targets that use these
interfaces, their custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 365676
Without this gcc 7.4.0 complains with
../include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h:457:54: error: suggest parentheses around '&&' within '||' [-Werror=parentheses]
isArtifactCast(TmpDef->getOpcode()) &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
"Expecting copy or artifact cast here");
~
llvm-svn: 365597
In SelectionDAG AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. Theoretically
the bitpattern should eventually match to bfi, so don't bother trying
to get the patterns to import.
llvm-svn: 365583
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.
This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
Differential Revision: https://reviews.llvm.org/D64295
llvm-svn: 365549
If we have an icmp->brcond->br sequence where the brcond just branches to the
next block jumping over the br, while the br takes the false edge, then we can
modify the conditional branch to jump to the br's target while inverting the
condition of the incoming icmp. This means we can eliminate the br as an
unconditional branch to the fallthrough block.
Differential Revision: https://reviews.llvm.org/D64354
llvm-svn: 365510
Dump the DWARF information about call sites and call site parameters into
debug info sections.
The patch also provides an interface for the interpretation of instructions
that could load values of a call site parameters in order to generate DWARF
about the call site parameters.
([13/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60716
llvm-svn: 365467
Emit replacements for clobbered parameters location if the parameter
has unmodified value throughout the funciton. This is basic scenario
where we can use the debug entry values.
([12/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D58042
llvm-svn: 365444
Fixes AMDGPU patterns for 32-bit address spaces always failing. Tests
will be included in future patches when additional issues are solved.
llvm-svn: 365319
Some out of tree backend require larger vector type. Since maintaining the changes out of tree is difficult due to the many manual changes needed when adding a new type we are adding it even if no backend currently use it.
Differential Revision: https://reviews.llvm.org/D64141
Patch by Thomas Raoux!
llvm-svn: 365274
Revision r365061 changed a skip of debug instructions for a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.
llvm-svn: 365202
Revision r365061 changed a skip of debug instructions for a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.
llvm-svn: 365198
This patch generalizes the fix in D61680 to ignore all meta instructions,
not just debug info.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D62605
llvm-svn: 365061
When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.
Differential revision: https://reviews.llvm.org/D64131
llvm-svn: 365043
For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow
encoding. However, current allocation order is like:
R0-R3, R12, LR, R4-R11
As a result, a lot of instructs that use R12/LR will be wide instrs.
This patch changes the allocation order to:
R0-R7, R12, LR, R8-R11
for thumb2 and -Osize.
In most cases, there is no extra push/pop instrs as they will be folded
into existing ones. There might be slight performance impact due to more
stack usage, so we only enable it when opt for min size.
https://reviews.llvm.org/D30324
llvm-svn: 365014
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.
Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
llvm-svn: 365010
There are two main issues preventing us from generating immediate form shifts:
1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift
immediate forms, but they currently don't work because the amount type is
expected to be an s64 constant, but we only legalize them to have homogenous
types.
To deal with this, first we introduce a custom legalizer to *only* custom legalize
s32 shifts which have a constant operand into a s64.
There is also an additional artifact combiner to fold zexts(g_constant) to a
larger G_CONSTANT if it's legal, a counterpart to the anyext version committed
in an earlier patch.
2) For G_SHL the importer can't cope with the pattern. For this I introduced an
early selection phase in the arm64 selector to select these forms manually
before the tablegen selector pessimizes it to a register-register variant.
Differential Revision: https://reviews.llvm.org/D63910
llvm-svn: 364994
https://reviews.llvm.org/D31359
Add a hook "legalizeInstrinsic" to allow backends to override this
and custom lower/legalize intrinsics.
llvm-svn: 364821
Summary:
Delete the begin-end form because the standard std::partition_point
can be easily used as a replacement.
The ranges-style llvm::bsearch will be renamed to llvm::partition_point
in the next clean-up patch.
The name "bsearch" doesn't meet people's expectation because in C:
> If two or more members compare equal, which member is returned is unspecified.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D63718
llvm-svn: 364719
The new switch lowering code that tries to generate jump tables and range checks
were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to
town on trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled because the CFG
looks even stranger after all of this is done.
llvm-svn: 364613
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).
Original patch D50222 by @hermord (Dmytro Shynkevych)
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364600
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
Original patch author: @hermord (Dmytro Shynkevych)!
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364563
Handle call instruction replacements and deletions in order to preserve
valid state of the call site info of the MachineFunction.
NOTE: If the call site info is enabled for a new target, the assertion from
the MachineFunction::DeleteMachineInstr() should help to locate places
where the updateCallSiteInfo() should be called in order to preserve valid
state of the call site info.
([10/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D61062
llvm-svn: 364536
While lowering calls, collect info about registers that forward arguments
into following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.
([9/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60715
llvm-svn: 364516
Remove the last use of packRegs from IRTranslator and delete
pack/unpackRegs. This introduces a fallback to DAGISel for intrinsics
with aggregate arguments, since we don't have a testcase for them so
it's hard to tell how we'd want to handle them.
Discussed in https://reviews.llvm.org/D63551
llvm-svn: 364514