This header doesn't seem to be parsable on its own and breaks the module build therefore with
the following error:
While building module 'LLVM_Backend' imported from llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:14:
In file included from <module-includes>:62:
llvm-project/llvm/include/llvm/CodeGen/MachinePipeliner.h:91:20: error: declaration of 'AAResultsWrapperPass' must be imported from module 'LLVM_Analysis.AliasAnalysis' before it is required
AU.addRequired<AAResultsWrapperPass>();
^
llvm-project/llvm/include/llvm/Analysis/AliasAnalysis.h:1157:7: note: previous declaration is here
class AAResultsWrapperPass : public FunctionPass {
^
llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:14:10: fatal error: could not build module 'LLVM_Backend'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375433 91177308-0d34-0410-b5e6-96231b3b80d8
Commit message from D66935:
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*`
Changes after D66935:
Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return
the uninitialized CalleeSavedStackSize when running `llc` on a specific
pass where the MIR code has already been expected to have gone through PEI.
Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try
to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug
mode, the compiler will assert the recalculated size equals the cached
size as calculated through a call to determineCalleeSaves.
This fixes two tests:
test/DebugInfo/AArch64/asan-stack-vars.mir
test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir
that otherwise fail when compiled using msan.
Reviewed By: omjavaid, efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68783
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375425 91177308-0d34-0410-b5e6-96231b3b80d8
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.
Noticed with -ftime-trace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375311 91177308-0d34-0410-b5e6-96231b3b80d8
The default implementation of isIncomingArgumentHandler could lead
to generating incorrect code.
Make it a pure virtual method, so that targets know they have to
override it to produce correct code.
NFC
Differential Revision: https://reviews.llvm.org/D69187
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375277 91177308-0d34-0410-b5e6-96231b3b80d8
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy
Reviewed By: jmolloy
Differential Revision: https://reviews.llvm.org/D47775
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375222 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a NFC change that removes the NFA->DFA construction and emission logic from DFAPacketizerEmitter and instead uses the generic DFAEmitter logic. This allows DFAPacketizer to use the Automaton class from Support and remove a bunch of logic there too.
After this patch, DFAPacketizer is mostly logic for grepping Itineraries and collecting functional units, with no state machine logic. This will allow us to modernize by removing the 16-functional-unit limit and supporting non-itinerary functional units. This is all for followup patches.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68992
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375086 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68993
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375084 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375057 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Internally in LLVM's metadata we use DW_OP_entry_value operations with
the same semantics as DWARF; that is, its operand specifies the number
of bytes that the entry value covers.
At the time of emitting entry values we don't know the emitted size of
the DWARF expression that the entry value will cover. Currently the size
is hardcoded to 1 in DIExpression, and other values causes the verifier
to fail. As the size is 1, that effectively means that we can only have
valid entry values for registers that can be encoded in one byte, which
are the registers with DWARF numbers 0 to 31 (as they can be encoded as
single-byte DW_OP_reg0..DW_OP_reg31 rather than a multi-byte
DW_OP_regx). It is a bit confusing, but it seems like llvm-dwarfdump
will print an operation "correctly", even if the byte size is less than
that, which may make it seem that we emit correct DWARF for registers
with DWARF numbers > 31. If you instead use readelf for such cases, it
will interpret the number of specified bytes as a DWARF expression. This
seems like a limitation in llvm-dwarfdump.
As suggested in D66746, a way forward would be to add an internal
variant of DW_OP_entry_value, DW_OP_LLVM_entry_value, whose operand
instead specifies the number of operations that the entry value covers,
and we then translate that into the byte size at the time of emission.
In this patch that internal operation is added. This patch keeps the
limitation that a entry value can only be applied to simple register
locations, but it will fix the issue with the size operand being
incorrect for DWARF numbers > 31.
Reviewers: aprantl, vsk, djtodoro, NikolaPrica
Reviewed By: aprantl
Subscribers: jyknight, fedor.sergeev, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67492
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374881 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
DWARF's DW_OP_entry_value operation has two operands; the first is a
ULEB128 operand that specifies the size of the second operand, which is
a DWARF block. This means that we need to be able to pre-calculate and
emit the size of DWARF expressions before emitting them. There is
currently no interface for doing this in DwarfExpression, so this patch
introduces that.
When implementing this I initially thought about running through
DwarfExpression's emission two times; first with a temporary buffer to
emit the expression, in order to being able to calculate the size of
that emitted data. However, DwarfExpression is a quite complex state
machine, so I decided against that, as it seemed like the two runs could
get out of sync, resulting in incorrect size operands. Therefore I have
implemented this in a way that we only have to run DwarfExpression once.
The idea is to emit DWARF to a temporary buffer, for which it is
possible to query the size. The data in the temporary buffer can then be
emitted to DwarfExpression's main output.
In the case of DIEDwarfExpression, a temporary DIE is used. The values
are all allocated using the same BumpPtrAllocator as for all other DIEs,
and the values are then transferred to the real value list. In the case
of DebugLocDwarfExpression, the temporary buffer is implemented using a
BufferByteStreamer which emits to a buffer in the DwarfExpression
object.
Reviewers: aprantl, vsk, NikolaPrica, djtodoro
Reviewed By: aprantl
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D67768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374879 91177308-0d34-0410-b5e6-96231b3b80d8
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.
So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.
For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.
It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.
Differential revision: https://reviews.llvm.org/D67148
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374634 91177308-0d34-0410-b5e6-96231b3b80d8
In GISel we have both G_CONSTANT and G_FCONSTANT, but because
in GISel we don't really have a concept of Float vs Int value
the only difference between the two is where the data originates
from.
What both G_CONSTANT and G_FCONSTANT return is just a bag of bits
with the constant representation in it.
By making getConstantVRegVal() return G_FCONSTANTs bit representation
as well we allow ConstantFold and other things to operate with
G_FCONSTANT.
Adding tests that show ConstantFolding to work on mixed G_CONSTANT
and G_FCONSTANT sources.
Differential Revision: https://reviews.llvm.org/D68739
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374458 91177308-0d34-0410-b5e6-96231b3b80d8
Move the default implementations of cache and prefetch queries to
TargetTransformInfoImplBase and delete them from NoTIIImpl. This brings these
interfaces in line with how other TTI interfaces work.
Differential Revision: https://reviews.llvm.org/D68804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374446 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, the heuristics the if-conversion pass uses for diamond if-conversion
are based on execution time, with no consideration for code size. This adds a
new set of heuristics to be used when optimising for code size.
This is mostly target-independent, because the if-conversion pass can
see the code size of the instructions which it is removing. For thumb,
there are a few passes (insertion of IT instructions, selection of
narrow branches, and selection of CBZ instructions) which are run after
if conversion and affect these heuristics, so I've added target hooks to
better predict the code-size effect of a proposed if-conversion.
Differential revision: https://reviews.llvm.org/D67350
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374301 91177308-0d34-0410-b5e6-96231b3b80d8
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
MCSubtarget implementation
Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ. AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation. The custom TTI implementations
continue to exist for the other targets with this change. They are not moved
over to subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to the system
model defined by the target. With this change, the default MCSubtargetInfo
implementation essentially returns the defaults TargetTransformInfoImplBase used
to return. Existing users of TTI defaults will hit the defaults now in
MCSubtargetInfo. Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.
Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374205 91177308-0d34-0410-b5e6-96231b3b80d8
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"
This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a.
This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04.
The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374091 91177308-0d34-0410-b5e6-96231b3b80d8
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.
Reviewers: aprantl, vsk, efriedma
Reviewed By: vsk, efriedma
Differential Revision: https://reviews.llvm.org/D66955
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374068 91177308-0d34-0410-b5e6-96231b3b80d8
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:
> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
> %mul = fmul fast <8 x half> %c, %b
> %sub = fsub fast <8 x half> %mul, %a
> ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.
This patch generates an fmla and negates the value of the operand2 of the fsub.
Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.
Tested on aarch64-linux with make check-all.
Differential Revision: https://reviews.llvm.org/D67990
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374044 91177308-0d34-0410-b5e6-96231b3b80d8
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.
So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.
For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.
It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.
Differential revision: https://reviews.llvm.org/D67148
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374017 91177308-0d34-0410-b5e6-96231b3b80d8
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373937 91177308-0d34-0410-b5e6-96231b3b80d8
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373900 91177308-0d34-0410-b5e6-96231b3b80d8
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.
This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.
Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.
Fixes PR43217
Differential Revision: https://reviews.llvm.org/D68544
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373871 91177308-0d34-0410-b5e6-96231b3b80d8
Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects.
The focus of this patch is purely to allow the stack frame to
allocate/deallocate space for scalable SVE objects. More dynamic
allocation (at compile-time, i.e. determining placement of SVE objects
on the stack), or resolving frame-index references that include
scalable-sized offsets, are left for subsequent patches.
SVE objects are allocated in the stack frame as a separate region below
the callee-save area, and above the alignment gap. This is done so that
the SVE objects can be accessed directly from the FP at (runtime)
VL-based offsets to benefit from using the VL-scaled addressing modes.
The layout looks as follows:
+-------------+
| stack arg |
+-------------+
| Callee Saves|
| X29, X30 | (if available)
|-------------| <- FP (if available)
| : |
| SVE area |
| : |
+-------------+
|/////////////| alignment gap.
| : |
| Stack objs |
| : |
+-------------+ <- SP after call and frame-setup
SVE and non-SVE stack objects are distinguished using different
StackIDs. The offsets for objects with TargetStackID::SVEVector should be
interpreted as purely scalable offsets within their respective SVE region.
Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61437
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373585 91177308-0d34-0410-b5e6-96231b3b80d8
As noted on PR41772, the static analyzer reports that the MachineMemOperand::print partial wrappers set a number of args to null pointers that were then dereferenced in the actual implementation.
It turns out that these wrappers are not being used at all (hence why we're not seeing any crashes), so I'd like to propose we just get rid of them.
Differential Revision: https://reviews.llvm.org/D68208
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373484 91177308-0d34-0410-b5e6-96231b3b80d8
This was reverted in r373454 due to breaking the expensive-checks bot.
This version addresses that by omitting the addSuccessorWithProb() call
when omitting the range check.
> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373477 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This extends the PeelingModuloScheduleExpander to generate prolog and epilog code,
and correctly stitch uses through the prolog, kernel, epilog DAG.
The key concept in this patch is to ensure that all transforms are *local*; only a
function of a block and its immediate predecessor and successor. By defining the problem in this way
we can inductively rewrite the entire DAG using only local knowledge that is easy to
reason about.
For example, we assume that all prologs and epilogs are near-perfect clones of the
steady-state kernel. This means that if a block has an instruction that is predicated out,
we can redirect all users of that instruction to that equivalent instruction in our
immediate predecessor. As all blocks are clones, every instruction must have an equivalent in
every other block.
Similarly we can make the assumption by construction that if a value defined in a block is used
outside that block, the only possible user is its immediate successors. We maintain this
even for values that are used outside the loop by creating a limited form of LCSSA.
This code isn't small, but it isn't complex.
Enabled a bunch of testing from Hexagon. There are a couple of tests not enabled yet;
I'm about 80% sure there isn't buggy codegen but the tests are checking for patterns
that we don't produce. Those still need a bit more investigation. In the meantime we
(Google) are happy with the code produced by this on our downstream SMS implementation,
and believe it generates correct code.
Subscribers: mgorny, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68205
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373462 91177308-0d34-0410-b5e6-96231b3b80d8