This encapsulates the section-specific code inside the
corresponding writeSectionContent methods, making the code
a bit more consistent.
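The resulting shape, as a hedged sketch (the type and helper names here are illustrative, not the actual code):

```cpp
// One writeSectionContent overload per section kind keeps each section's
// serialization logic in one place. All names below are illustrative.
void writeSectionContent(raw_ostream &OS, const RawContentSection &Sec) {
  OS << Sec.Content;                      // hypothetical field
}

void writeSectionContent(raw_ostream &OS, const RelocationSection &Sec) {
  for (const auto &R : Sec.Relocations)   // hypothetical field
    writeRelocation(OS, R);               // hypothetical helper
}
```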
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359297 91177308-0d34-0410-b5e6-96231b3b80d8
Create a matchBitOpReduction helper that checks for the pattern with any opcode.
First step towards reusing this code to recognize other scalar reduction patterns.
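For reference, the scalar shape of a bit-op reduction, shown here with OR (a self-contained illustration; the actual helper matches DAG nodes and takes the opcode as a parameter):

```cpp
#include <cstdint>

// Fold the high half onto the low half repeatedly; swapping |= for &= or
// ^= gives the AND/XOR reductions the same matcher could recognize.
uint8_t orReduceBytes(uint32_t X) {
  X |= X >> 16;  // low 16 bits now hold the OR of both halves
  X |= X >> 8;   // low 8 bits now hold the OR of all four bytes
  return static_cast<uint8_t>(X);
}
```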
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359296 91177308-0d34-0410-b5e6-96231b3b80d8
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr pair with a shift+and pair, as that requires an additional mask (with the extra constant-pool, load, and register-pressure costs).
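Concretely, per 16-bit lane the two forms below compute the same result, but the AND form needs a mask constant loaded from the constant pool (scalar illustration only):

```cpp
#include <cstdint>

uint16_t viaShifts(uint16_t X) {  // shl + lshr: no constant needed
  return static_cast<uint16_t>(static_cast<uint16_t>(X << 8) >> 8);
}

uint16_t viaMask(uint16_t X) {    // and-with-mask: needs the constant
  return X & 0x00FFu;
}
```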
Differential Revision: https://reviews.llvm.org/D61068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359293 91177308-0d34-0410-b5e6-96231b3b80d8
A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine them into a unary shuffle of that source instead. Fixes one of the test cases from PR34380.
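A self-contained sketch of the index bookkeeping involved (names are illustrative; the real combine operates on DAG shuffle masks):

```cpp
#include <vector>

// Mask indices 0..N-1 refer to operand 0, N..2N-1 to operand 1; -1 is undef.
// If operands 0/1 were extracted from one wide vector at offsets Off0/Off1,
// rewrite the mask as a unary shuffle of that wide source.
std::vector<int> remapToWideSource(const std::vector<int> &Mask, int N,
                                   int Off0, int Off1) {
  std::vector<int> Wide;
  Wide.reserve(Mask.size());
  for (int Idx : Mask) {
    if (Idx < 0) {
      Wide.push_back(-1);               // keep undef lanes undef
      continue;
    }
    int Base = Idx < N ? Off0 : Off1;   // which extracted operand?
    Wide.push_back(Base + (Idx % N));   // lane within the wide source
  }
  return Wide;
}
```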
Differential Revision: https://reviews.llvm.org/D60512
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359292 91177308-0d34-0410-b5e6-96231b3b80d8
This enables the pass to be used in the absence of
TargetTransformInfo. When the argument isn't passed, the factory
defaults to UninitializedAddressSpace, and the flat address space is
obtained from the TargetTransformInfo as it was before this change.
Existing users won't have to change.
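As a sketch, the factory signature described above looks something like this (written from the description, not copied from the patch):

```cpp
// ~0u is a common LLVM sentinel; here it means "ask TargetTransformInfo
// for the flat address space later".
constexpr unsigned UninitializedAddressSpace = ~0u;

llvm::FunctionPass *createInferAddressSpacesPass(
    unsigned FlatAddressSpace = UninitializedAddressSpace);
```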
Patch by Kevin Petit.
Differential Revision: https://reviews.llvm.org/D60602
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359290 91177308-0d34-0410-b5e6-96231b3b80d8
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.
Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)
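The underlying distinction is easy to see in plain C++: a pointer's alignment says nothing about its pointee's alignment.

```cpp
#include <cstdio>

struct alignas(16) V16 {          // stand-in for a <16 x i8> constant
  unsigned char Bytes[16];
};

int main() {
  std::printf("alignof(V16)   = %zu\n", alignof(V16));    // 16 - what we need
  std::printf("alignof(V16 *) = %zu\n", alignof(V16 *));  // 8 on x86-64
}
```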
Differential revision: https://reviews.llvm.org/D61124
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359287 91177308-0d34-0410-b5e6-96231b3b80d8
When constrainRegClass is called, if the constraining happens on a use, the
COPY needs to be inserted before the instruction that contains the
MachineOperand; but if we are constraining a definition, it actually needs to
be added after the instruction. In addition, the COPY needs to have its
operands flipped: in the use case we are copying from the old unconstrained
register to the new constrained register, while in the definition case we are
copying from the new constrained register that the instruction defines to the
old unconstrained register.
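A sketch of the two placements, assuming an LLVM MachineFunction context where MI, DL, and TII are in scope (not the literal patch):

```cpp
if (IsUse) {
  // Before the user: copy the old, unconstrained vreg into the new,
  // constrained one that the use will now read.
  BuildMI(*MI.getParent(), MI.getIterator(), DL,
          TII->get(TargetOpcode::COPY), NewReg)
      .addReg(OldReg);
} else {
  // After the def: copy the new, constrained vreg the instruction now
  // defines back into the old one, so later users are unaffected.
  BuildMI(*MI.getParent(), std::next(MI.getIterator()), DL,
          TII->get(TargetOpcode::COPY), OldReg)
      .addReg(NewReg);
}
```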
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359282 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
llvm-{objcopy,strip} (and many other LLVM binary utilities) accept
cl::opt style -long-option as well as many short options (e.g. -p -S
-x). People who use them as replacements for GNU binutils often use the
grouped option syntax (POSIX Utility Conventions), e.g. -Sx => -S -x,
-Wd => -W -d, -sj.text => -s -j.text.
There is ambiguity if a long option starts with the character used by a
short option. Drop support for -long-option to resolve the ambiguity.
This divergence from other LLVM utilities is accepted (they continue
supporting -long-option).
https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html
Reviewers: alexshap, jakehehrlich, jhenderson, rupprecht, espindola
Reviewed By: jakehehrlich, jhenderson, rupprecht
Subscribers: grimar, emaste, arichardson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60439
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359265 91177308-0d34-0410-b5e6-96231b3b80d8
isValidCandidateForColdCC is much more expensive than
TTI.useColdCCForColdCall, which by default just returns false. Avoid
doing this work if we're not going to look at the answer anyway.
This change is NFC, but I see significant compile-time improvements on
some code with pathologically many functions.
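The reordering relies on short-circuit evaluation; a hedged fragment using the names above (the actual signatures in the pass differ):

```cpp
// TTI.useColdCCForColdCall defaults to returning false, so the expensive
// scan inside isValidCandidateForColdCC rarely runs at all.
if (TTI.useColdCCForColdCall(F) && isValidCandidateForColdCC(F))
  changeToColdCC(F);  // illustrative follow-up action
```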
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359253 91177308-0d34-0410-b5e6-96231b3b80d8
When failing materialization of a symbol X, remove X from the dependants list
of each of X's dependencies. This ensures that when X's dependencies are
emitted (or fail themselves) they do not try to access the no-longer-existing
MaterializationInfo for X.
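A self-contained sketch of the unlinking (standard containers and names are illustrative; ORC's real bookkeeping lives in its MaterializationInfo structures):

```cpp
#include <map>
#include <set>
#include <string>

using EdgeMap = std::map<std::string, std::set<std::string>>;

// On failure of X, make sure no dependency still points back at X.
void failMaterialization(const std::string &X, EdgeMap &Dependencies,
                         EdgeMap &Dependants) {
  for (const std::string &Dep : Dependencies[X])
    Dependants[Dep].erase(X);  // Dep's emit/failure won't touch X's state
  Dependencies.erase(X);       // X's own record is going away
}
```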
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359252 91177308-0d34-0410-b5e6-96231b3b80d8
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.
Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g. used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)
Differential Revision: https://reviews.llvm.org/D60279
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359248 91177308-0d34-0410-b5e6-96231b3b80d8
All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so all irregularities have to be implemented in one place only.
The test generation script wmma.py has been refactored in a similar way.
Differential Revision: https://reviews.llvm.org/D60015
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359247 91177308-0d34-0410-b5e6-96231b3b80d8
PTX 6.3 requires using ".aligned" in the MMA instruction names.
In order to generate the correct name, we now pass the current
PTX version to each instruction as an extra constant operand,
and InstPrinter adjusts its output accordingly.
Differential Revision: https://reviews.llvm.org/D59393
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359246 91177308-0d34-0410-b5e6-96231b3b80d8
Generalized construction of the 'fragments' of MMA operations to provide
common primitives for constructing the ops. This will make it easier
to add new variants of the instructions that operate on integer types.
Use nested foreach loops, which make it possible to better control
the naming of the intrinsics.
This patch does not affect LLVM's output, so there are no test changes.
Differential Revision: https://reviews.llvm.org/D59389
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359245 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This value is derived from the host triple, which on the machine
I'm currently using is `ppc64le-linux-redhat`. This change makes
LLVM compile.
Reviewers: nemanjai
Differential Revision: https://reviews.llvm.org/D57118
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359242 91177308-0d34-0410-b5e6-96231b3b80d8
I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.
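A minimal example of the pattern the diagnostic flags (illustrative, not taken from the reported call sites):

```cpp
struct Widget { /* non-trivial copy constructor */ };

Widget reset(const Widget &V) {
  Widget W;
  return W = V;  // the returned expression is `W = V`, not `W`: W is copied
  // Writing `W = V; return W;` instead allows NRVO to elide the copy.
}
```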
P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these.
(Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)
Reviewed By: dblaikie
Author: Arthur O'Dwyer
Differential Revision: https://reviews.llvm.org/D54885
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359236 91177308-0d34-0410-b5e6-96231b3b80d8
This case was missing before, so we couldn't legalize it.
Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359231 91177308-0d34-0410-b5e6-96231b3b80d8
Bail out of the ARC optimizer if the number of pointers it keeps track of becomes too large
ARC optimizer does a top-down and a bottom-up traversal of the whole
function to pair up retain and release instructions and remove them.
This can be expensive if the number of instructions in the function and
the number of pointer states it tracks are large, since it has to look at
each pointer state and determine whether the instruction being visited can
potentially use the pointer.
This patch adds a command line option that sets a limit on the number of
pointers it tracks.
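Such a cap is typically exposed as a cl::opt; a hedged sketch (the flag name and default here are illustrative, not necessarily what the patch uses):

```cpp
#include "llvm/Support/CommandLine.h"

static llvm::cl::opt<unsigned> MaxPtrStates(
    "arc-opt-max-ptr-states", llvm::cl::Hidden, llvm::cl::init(4000),
    llvm::cl::desc("Bail out of ARC optimization when the number of "
                   "tracked pointer states exceeds this limit"));
```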
rdar://problem/49477063
Differential Revision: https://reviews.llvm.org/D61100
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359226 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).
Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.
Differential Revision: https://reviews.llvm.org/D60889
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359222 91177308-0d34-0410-b5e6-96231b3b80d8
We had special-case handling here, but it used a scalar any_extend for the
promotion and then bitcast to the final type. This won't split up the input
data into multiple promoted elements like we need.
This patch falls back to doing the conversion through memory.
Fixes PR41594, which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll fix a previously
unknown miscompile from this issue.
Differential Revision: https://reviews.llvm.org/D61114
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359219 91177308-0d34-0410-b5e6-96231b3b80d8
When evaluating a store through a bitcast, the evaluator tries to move the
bitcast from the pointer onto the stored value. If the cast is invalid, it
tries to "introspect" the type to get a valid cast by obtaining a pointer to
the initial element (if the type is nested, this may require walking several
initial elements).
In some situations it is possible to get a bitcast on a load (e.g. with
unions, where the bitcast may not be the same type as the store). However,
the equivalent logic to introspect the type is missing on the load side.
This patch adds that logic.
Note: while developing the patch I was unhappy with adding similar logic
directly to the load case, as the two copies could get out of step. Instead,
I have abstracted the "introspection" into a helper function, with the
specifics handled by a passed-in lambda function.
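A sketch of such an abstracted helper (names are illustrative; it uses LLVM's Type hierarchy from llvm/IR/DerivedTypes.h):

```cpp
#include "llvm/IR/Constant.h"
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Walk into the first element of nested aggregates until the caller's
// lambda produces a valid cast, or we run out of levels to introspect.
template <typename TryCastFn>
Constant *introspectType(Type *Ty, TryCastFn TryCast) {
  while (Ty) {
    if (Constant *C = TryCast(Ty))  // caller attempts the cast at this level
      return C;
    if (auto *ST = dyn_cast<StructType>(Ty))
      Ty = ST->getNumElements() ? ST->getElementType(0) : nullptr;
    else if (auto *AT = dyn_cast<ArrayType>(Ty))
      Ty = AT->getElementType();
    else
      break;                        // scalar: nothing left to walk into
  }
  return nullptr;
}
```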
Differential Revision: https://reviews.llvm.org/D60793
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359205 91177308-0d34-0410-b5e6-96231b3b80d8
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.
Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359204 91177308-0d34-0410-b5e6-96231b3b80d8
If this is set, %INCLUDE% must contain ".../DIA SDK/include"
and %LIB% must contain ".../DIA SDK/lib/amd64" (assuming you're doing a
64-bit build).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359195 91177308-0d34-0410-b5e6-96231b3b80d8
This has no effect on constant folding but will be useful when we expand non-saturating PACKSS/PACKUS intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359191 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
There's still a little bit of constant factor that could be trimmed (e.g.
more overloads to avoid round-tripping primitives through json::Value).
But this solves the memory-scaling problem, greatly improves the constant
factor, and the API should leave room for optimization if needed.
Adapt TimeProfiler to use it, eliminating almost all the performance regression
from r358476.
Performance test on my machine:
perf stat -r 5 ~/llvmbuild-opt/bin/clang++ -w -S -ftime-trace -mllvm -time-trace-granularity=0 spirit.cpp
Handcrafted JSON (HEAD=r358532 with r358476 reverted): 2480 ms
json::Value (HEAD): 2757 ms (+11%)
After this patch: 2520 ms (+1.6%)
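The streaming style looks roughly like this (calls written from memory against llvm::json::OStream; the Events list is illustrative):

```cpp
llvm::json::OStream J(llvm::outs());
J.array([&] {
  for (const auto &E : Events)      // hypothetical event records
    J.object([&] {
      J.attribute("name", E.Name);  // emitted directly, no json::Value tree
      J.attribute("dur", E.DurationUs);
    });
});
```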
Reviewers: anton-afanasyev, lebedev.ri
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359186 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Concurrent (e.g. nested) llvm::parallel::for_each() may lead to
deadlocks. See PR35788 (fixed by rLLD322041) and PR41508 (fixed by D60757).
When parallel_for_each() is about to return, in ~Latch() called by
~TaskGroup(), a thread (in the default executor) may block in
Latch::sync() waiting for Count to become zero. If all threads in the
default executor are blocked, that is a deadlock.
To fix this, force serial execution if the current TaskGroup is not the
first one. For a nested llvm::parallel::for_each(), this parallelizes
the outermost loop and serializes inner loops.
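A self-contained sketch of the mitigation (names and the thread-per-task spawning are illustrative; LLVM's actual TaskGroup uses its executor):

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

static std::atomic<int> TaskGroupInstances{0};

struct TaskGroup {
  // Only the first (outermost) group runs tasks concurrently; nested
  // groups run inline, so pool threads can never all block on children.
  bool Parallel = TaskGroupInstances.fetch_add(1) == 0;
  std::vector<std::thread> Threads;

  void spawn(std::function<void()> F) {
    if (Parallel)
      Threads.emplace_back(std::move(F));
    else
      F();  // nested: serialize to avoid the deadlock
  }

  ~TaskGroup() {
    for (std::thread &T : Threads)
      T.join();
    TaskGroupInstances.fetch_sub(1);
  }
};
```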
Differential Revision: https://reviews.llvm.org/D61115
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359182 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Annotations allow writing nice-looking unit test code when one needs
access to locations from the source code, e.g. running code completion
at particular offsets in a file. See comments in Annotations.cpp for
more details on the API.
Also got rid of duplicate annotation-parsing code in clang's code
completion tests.
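Usage looks roughly like this (API calls written from memory; see Annotations.h for the authoritative interface):

```cpp
llvm::Annotations Test("int [[x]] = ^42;");
llvm::StringRef Code = Test.code();          // source with annotations removed
size_t CompleteAt = Test.point();            // offset where '^' was
llvm::Annotations::Range Var = Test.range(); // begin/end offsets of [[x]]
```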
Reviewers: gribozavr, sammccall
Reviewed By: gribozavr
Subscribers: mgorny, hiraditya, ioeric, MaskRay, jkorous, arphaman, kadircet, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D59814
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359179 91177308-0d34-0410-b5e6-96231b3b80d8