Motivated by D88183, this seeks to clarify the current loop nomenclature with added illustrations, examples for possibly unexpected situations (infinite loops not part of the "parent" loop, logical loops sharing the same header, ...), and clarification on what other sources may consider a loop. The current document also has multiple errors that are fixed here.
Some selected errors:
* Loops a defined as strongly-connected components. A component a partition of all nodes, i.e. a subloop can never be a component. That is, the document as it currently is only covers top-level loops, even it also uses the term SCC for subloops.
* "a block can be the header of two separate loops at the same time" (it is considered a single loop by LoopInfo)
* "execute before some interesting event happens" (some interesting event is not well-defined)
Reviewed By: baziotis, Whitney
Differential Revision: https://reviews.llvm.org/D88408
Summary:
This patch adds an error to Clang that detects if OpenMP offloading is
used between two architectures with incompatible pointer sizes. This
ensures that the data mapping can be done correctly and solves an issue
in code generation generating the wrong size pointer. This patch adds a
new lit substitution, %omp_powerpc_triple that, if the system is 32-bit or
64-bit, sets the powerpc triple accordingly. This was required to fix
some OpenMP tests that automatically populated the target architecture.
Reviewers: jdoerfert
Subscribers: cfe-commits guansong sstefan1 yaxunl delcypher
Tags: OpenMP clang LLVM
Differential Revision: https://reviews.llvm.org/D88594
Support VRI, VRR, VRS, VRV, VRX, VSI instruction formats with the .insn
directive.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D88357
This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV.
Differential Revision: https://reviews.llvm.org/D87836
Update the code responsible for deleting VPBBs and recipes to properly
update users and release operands.
This is another preparation for D84680 & following patches towards
enabling modeling def-use chains in VPlan.
This change implements generation of a function which may be used by a backend to check if a given instruction is supported for a specific subtarget.
Reviewers: sdesmalen
Differential Revision: https://reviews.llvm.org/D88214
Use tablegen generic tables to get the index of image intrinsic
arguments.
Before, the computation of which image intrinsic argument is at which
index was scattered in a few places, tablegen, the SDag instruction
selection and GlobalISel. This patch changes that, so only tablegen
contains code to compute indices and the ImageDimIntrinsicInfo table
provides these information.
Differential Revision: https://reviews.llvm.org/D86270
Support register and frame-index pair correctly as operands of
generic load/store instrucitons, e.g. LD1BZXrri, STLrri, and etc.
Add regression tests also.
Differential Revision: https://reviews.llvm.org/D88779
This tends to increase code size but more importantly it reduces vgpr
usage, and could avoid costly readfirstlanes if the result needs to be
in an sgpr.
Differential Revision: https://reviews.llvm.org/D88580
When nesting INSERT_SUBREG and EXTRACT_SUBREG, GlobalISelEmitter would
fail to find the register class of the nested node. This patch fixes
that for registers with subregs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D88487
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo). Before transformation compute SCEV for each
@llvm.dbg.value in the loop body and store it (along side its current
DIExpression). After transformation update those @llvm.dbg.value now
referencing undef by comparing its stored SCEV to the SCEV of the
current loop-header PHI-nodes. Allow match with offset by inserting
compensation code in the DIExpression.
Fixes : PR38815
Differential Revision: https://reviews.llvm.org/D87494
Rename the DwarfFile class in DWARFLinker to DWARFFile. This is
consistent with the other DWARF classes and avoids a ODR violation with
the DwarfFile class in AsmPrinter.
RBX was copied to a virtual register before this instruction
was created. And the EBX input for the final MWAITX is still
in a virtual register. So EBX isn't read by this pseudo.
It's not possible to do this in complete generality - a CU using a
sec_offset DW_AT_ranges has no way of knowing where its rnglists
contribution starts, so should not attempt to parse any full rnglist
table/header to do so. And even using FORM_rnglistx there's no need to
parse the header - the offset can be computed using the CU's DWARF
format (32 or 64) to compute offset entry sizes, and then the list
parsed at that offset without ever trying to find a rnglist contribution
header immediately prior to the rnglists_base.
ebx/rbx only needs to be saved when 64-bit registers are supported
anyway. It should be fine to save/restore the whole rbx register
even in gnux32 where the base is technically just ebx.
This matches what we do for cmpxchg16b where rbx is saved/restored
regardless of gnux32.
This is one of the reason for extra invalidations in D84959. In
practice, I don't think we have use cases needing this. This simplifies
the pipeline a bit and prune corner cases when considering
invalidations.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85676
i.e. they cannot be unreachable from the entry (which usually indicate usage errors).
This change allows the removal of some nullptr checks.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D88758
MWAITX doesn't touch EFLAGS so no pseudos should def EFLAGS.
The SAVE_EBX/RBX pseudos only needs to def the EBX register that
the expansion overwrites. The EAX and ECX registers are only read.
The pseudo emitted during isel that is used by the custom inserter
shouldn't have any implicit defs or uses since everything is in
vregs.
This reverts commit e9b87f43bde8b5f0d8a79c5884fdce639b12e0ca.
There are issues with macros generating macros without an obvious simple fix
so I'm going to revert this and try something different.
Apparently querying dereferenceability of array allocations is
being intentionally penalized (https://reviews.llvm.org/D41398),
so avoid using them in tests.
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.
The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.
Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
This adds a helper to convert a VPRecipeBase pointer to a VPUser, for
recipes that inherit from VPUser. Once VPRecipeBase directly inherits
from VPUser this helper can be removed.
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
In the motivating case from https://llvm.org/PR47517
we create a node that does not get constant folded
before getNegatedExpression is attempted from some
other node, and we crash.
By moving the fold into SelectionDAG::simplifyFPBinop(),
we get the constant fold sooner and avoid the problem.
The case of a destination read between call and memcpy was not
covered anywhere (but is handled correctly).
However, a potentially throwing call between the call and the
memcpy appears to be miscompiled.
Preliminary patch for the next stage of PR45974 - we don't want to be creating 'padded' vectors on-the-fly at all in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain.
This means that the inputs to combineX86ShuffleChain might soon be smaller than the final root value type, so we should ensure that isTargetShuffleEquivalent only matches with the inputs if they are the correct size.