Summary:
llvm-{objcopy,strip} (and many other LLVM binary utilities) accept
cl::opt style -long-option as well as many short options (e.g. -p -S
-x). People who use them as replacement of GNU binutils often use the
grouped option syntax (POSIX Utility Conventions), e.g. -Sx => -S -x,
-Wd => -W -d, -sj.text => -s -j.text
There is ambiguity if a long option starts with the character used by a
short option. Drop the support for -long-option to resolve the ambiguity.
This divergence from other utilities is accepted (other utilities
continue supporting -long-option).
https://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html
Reviewers: alexshap, jakehehrlich, jhenderson, rupprecht, espindola
Reviewed By: jakehehrlich, jhenderson, rupprecht
Subscribers: grimar, emaste, arichardson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60439
llvm-svn: 359265
isValidCandidateForColdCC is much more expensive than
TTI.useColdCCForColdCall, which by default just returns false. Avoid
doing this work if we're not going to look at the answer anyway.
This change is NFC, but I see significant compile time improvements on
some code with pathologically many functions.
llvm-svn: 359253
When failing materialization of a symbol X, remove X from the dependants list
of any of X's dependencies. This ensures that when X's dependencies are
emitted (or fail themselves) they do not try to access the no-longer-existing
MaterializationInfo for X.
llvm-svn: 359252
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.
Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)
Differential Revision: https://reviews.llvm.org/D60279
llvm-svn: 359248
All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so all irregularities have to be implemented in one place only.
The test generation script wmma.py has been refactored in a similar way.
Differential Revision: https://reviews.llvm.org/D60015
llvm-svn: 359247
PTX 6.3 requires using ".aligned" in the MMA instruction names.
In order to generate correct name, now we pass current
PTX version to each instruction as an extra constant operand
and InstPrinter adjusts its output accordingly.
Differential Revision: https://reviews.llvm.org/D59393
llvm-svn: 359246
Generalized constructions of 'fragments' of MMA operations to provide
common primitives for construction of the ops. This will make it easier
to add new variants of the instructions that operate on integer types.
Use nested foreach loops which makes it possible to better control
naming of the intrinsics.
This patch does not affect LLVM's output, so there are no test changes.
Differential Revision: https://reviews.llvm.org/D59389
llvm-svn: 359245
Adds a representation of the section header table to XCOFFObjectFile,
and implements enough to dump the section headers with llvm-obdump.
Differential Revision: https://reviews.llvm.org/D60784
llvm-svn: 359244
Summary:
This value is derived from the host triple, which on the machine
I'm currently using is `ppc64le-linux-redhat`. This change makes
LLVM compile.
Reviewers: nemanjai
Differential Revision: https://reviews.llvm.org/D57118
llvm-svn: 359242
I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.
P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these.
(Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)
Reviewed By: dblaikie
Author: Arthur O'Dwyer
Differential Revision: https://reviews.llvm.org/D54885
llvm-svn: 359236
This case was missing before, so we couldn't legalize it.
Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.
llvm-svn: 359231
it keeps track of becomes too large
ARC optimizer does a top-down and a bottom-up traversal of the whole
function to pair up retain and release instructions and remove them.
This can be expensive if the number of instructions in the function and
pointer states it tracks are large since it has to look at each pointer
state and determine whether the instruction being visited can
potentially use the pointer.
This patch adds a command line option that sets a limit to the number of
pointers it tracks.
rdar://problem/49477063
Differential Revision: https://reviews.llvm.org/D61100
llvm-svn: 359226
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).
Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.
Differential Revision: https://reviews.llvm.org/D60889
llvm-svn: 359222
We had special case handling here, but it uses a scalar any_extend for the
promotion then bitcasts to the final type. This won't split up the input data
into multiple promoted elements like we need.
This patch falls back to doing the conversion through memory.
Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.
Differential Revision: https://reviews.llvm.org/D61114
llvm-svn: 359219
When evaluating a store through a bitcast, the evaluator tries to move the
bitcast from the pointer onto the stored value. If the cast is invalid, it
tries to "introspect" the type to get a valid cast by obtaining a pointer to
the initial element (if the type is nested, this may require walking several
initial elements).
In some situations it is possible to get a bitcast on a load (e.g. with
unions, where the bitcast may not be the same type as the store). However,
equivalent logic to the store to introspect the type is missing. This patch
add this logic.
Note, when developing the patch I was unhappy with adding similar logic
directly to the load case as it could get out of step. Instead, I have
abstracted the "introspection" into a helper function, with the specifics
being handled by a passed-in lambda function.
Differential Revision: https://reviews.llvm.org/D60793
llvm-svn: 359205
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.
Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.
llvm-svn: 359204
Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update
arm64-irtranslator.ll.
Differential Revision: https://reviews.llvm.org/D60922
llvm-svn: 359203
For eventually selecting llvm.nearbyint. Equivalent to the SelectionDAG
nearbyint node.
Update legalizer-info-validation.mir.
Differential Revision: https://reviews.llvm.org/D60921
llvm-svn: 359201
If this is set, %INCLUDE% must contain ".../DIA SDK/include"
and %LIB% must contain ".../DIA SKD/lib/amd64" (assuming you're doing a
64-bit build).
llvm-svn: 359195
Summary:
There's still a little bit of constant factor that could be trimmed (e.g.
more overloads to avoid round-tripping primitives through json::Value).
But this solves the memory scaling problem, and greatly improves the performance
constant factor, and the API should leave room for optimization if needed.
Adapt TimeProfiler to use it, eliminating almost all the performance regression
from r358476.
Performance test on my machine:
perf stat -r 5 ~/llvmbuild-opt/bin/clang++ -w -S -ftime-trace -mllvm -time-trace-granularity=0 spirit.cpp
Handcrafted JSON (HEAD=r358532 with r358476 reverted): 2480ms
json::Value (HEAD): 2757ms (+11%)
After this patch: 2520 ms (+1.6%)
Reviewers: anton-afanasyev, lebedev.ri
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60804
llvm-svn: 359186
Summary:
Concurrent (e.g. nested) llvm::parallel::for_each() may lead to dead
locks. See PR35788 (fixed by rLLD322041) and PR41508 (fixed by D60757).
When parallel_for_each() is about to return, in ~Latch() called by
~TaskGroup(), a thread (in the default executor) may block in
Latch::sync() waiting for Count to become zero. If all threads in the
default executor are blocked, it is a dead lock.
To fix this, force serial execution if the current TaskGroup is not the
first one. For a nested llvm::parallel::for_each(), this parallelizes
the outermost loop and serializes inner loops.
Differential Revision: https://reviews.llvm.org/D61115
llvm-svn: 359182
Summary:
Annotations allow writing nice-looking unit test code when one needs
access to locations from the source code, e.g. running code completion
at particular offsets in a file. See comments in Annotations.cpp for
more details on the API.
Also got rid of a duplicate annotations parsing code in clang's code
complete tests.
Reviewers: gribozavr, sammccall
Reviewed By: gribozavr
Subscribers: mgorny, hiraditya, ioeric, MaskRay, jkorous, arphaman, kadircet, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D59814
llvm-svn: 359179
yaml2obj might crash on invalid input when unable to parse the YAML.
Recently a crash with a very similar nature was fixed for an empty files.
This patch revisits the fix and does it in yaml::Input instead.
It seems to be more correct way to handle such situation.
With that crash for invalid inputs is also fixed now.
Differential revision: https://reviews.llvm.org/D61059
llvm-svn: 359178
Truncate the movmsk scalar integer result to the equivalent scalar integer width as before but then bitcast to the requested type.
We still have the issue identified in PR41594 but D61114 should handle this.
llvm-svn: 359176
On Mips32r2 bitcast can be expanded to two sw instructions and an ldc1
when using bitcast i64 to double or an sdc1 and two lw instructions when
using bitcast double to i64. By introducing custom lowering that uses
mtc1/mthc1 we can avoid excessive instructions.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D61069
llvm-svn: 359171
The IndexReg will always be non-null at this point. Earlier in the function, if
IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it
non-null.
llvm-svn: 359170
This should fix the MachO/x86-64 eh-frame regression test by ensuring that
the symbols __ZTIi and ___gxx_personality_v0 are defined on all platforms.
llvm-svn: 359169
The --args option can now be used to pass arguments to code linked with
llvm-rtdyld. E.g.
$ llvm-rtdyld file1.o file2.o --args a b c
is equivalent to:
$ ld -o program file1.o file2.o
$ ./program a b c
This is the rtdyld counterpart to the jitlink change in r359115, and makes
benchmarking and comparison between the tools easier.
llvm-svn: 359168
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.
Reviewers: chandlerc, jgorbe
Subscribers: jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61091
llvm-svn: 359167