We can work around a shortcoming of FileCheck by using {{\[}} to match a square
bracket before a [[ sequence.
Thanks to Eli Friedman for the heads up!
llvm-svn: 283422
If we don't truncate, LLVM asserts when the label difference doesn't fit
in a 16 bit field. This patch truncates two kinds of data: trailing null
terminated names in symbol records, and inline line tables. The inline
line table test that I have is too large (many MB), so I'm not checking
it in.
Hopefully fixes PR28264.
llvm-svn: 283403
How code is optimized sometimes, perhaps often, depends on the context into
which it was inlined. This change allows llvm-opt-report to track the
differences between the optimizations performed, or not, in different contexts,
and when these differ, display those differences.
For example, this code:
$ cat /tmp/q.cpp
void bar();
void foo(int n) {
for (int i = 0; i < n; ++i)
bar();
}
void quack() {
foo(4);
}
void quack2() {
foo(4);
}
will now produce this report:
< /home/hfinkel/src/llvm/test/tools/llvm-opt-report/Inputs/q.cpp
2 | void bar();
3 | void foo(int n) {
[[
> foo(int):
4 | for (int i = 0; i < n; ++i)
> quack(), quack2():
4 U4 | for (int i = 0; i < n; ++i)
]]
5 | bar();
6 | }
7 |
8 | void quack() {
9 I | foo(4);
10 | }
11 |
12 | void quack2() {
13 I | foo(4);
14 | }
15 |
Note that the tool has demangled the function names, and grouped the reports
associated with line 4. This shows that the loop on line 4 was unrolled by a
factor of 4 when inlined into the functions quack() and quack2(), but not in
the function foo(int) itself.
llvm-svn: 283402
This came out of a discussion in https://reviews.llvm.org/D25285.
There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.
This also removes an entirely obsolete and bitrotted testcase for PR7662.
Reapplies 283390 with a forgotten testcase.
llvm-svn: 283400
LLVM now has the ability to record information from optimization remarks in a
machine-consumable YAML file for later analysis. This can be enabled in opt
(see r282539), and D25225 adds a Clang flag to do the same. This patch adds
llvm-opt-report, a tool to generate basic optimization "listing" files
(annotated sources with information about what optimizations were performed)
from one of these YAML inputs.
D19678 proposed to add this capability directly to Clang, but this more-general
YAML-based infrastructure was the direction we decided upon in that review
thread.
For this optimization report, I focused on making the output as succinct as
possible while providing information on inlining and loop transformations. The
goal here is that the source code should still be easily readable in the
report. My primary inspiration here is the reports generated by Cray's tools
(http://docs.cray.com/books/S-2496-4101/html-S-2496-4101/z1112823641oswald.html).
These reports are highly regarded within the HPC community. Intel's compiler,
for example, also has an optimization-report capability
(https://software.intel.com/sites/default/files/managed/55/b1/new-compiler-optimization-reports.pdf).
$ cat /tmp/v.c
void bar();
void foo() { bar(); }
void Test(int *res, int *c, int *d, int *p, int n) {
int i;
#pragma clang loop vectorize(assume_safety)
for (i = 0; i < 1600; i++) {
res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}
for (i = 0; i < 16; i++) {
res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}
foo();
foo(); bar(); foo();
}
D25225 adds -fsave-optimization-record (and
-fsave-optimization-record=filename), and this would be used as follows:
$ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record
$ llvm-opt-report /tmp/v.yaml > /tmp/v.lst
$ cat /tmp/v.lst
< /tmp/v.c
2 | void bar();
3 | void foo() { bar(); }
4 |
5 | void Test(int *res, int *c, int *d, int *p, int n) {
6 | int i;
7 |
8 | #pragma clang loop vectorize(assume_safety)
9 V4,2 | for (i = 0; i < 1600; i++) {
10 | res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
11 | }
12 |
13 U16 | for (i = 0; i < 16; i++) {
14 | res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
15 | }
16 |
17 I | foo();
18 |
19 | foo(); bar(); foo();
I | ^
I | ^
20 | }
Each source line gets a prefix giving the line number, and a few columns for
important optimizations: inlining, loop unrolling and loop vectorization. An
'I' is printed next to a line where a function was inlined, a 'U' next to an
unrolled loop, and 'V' next to a vectorized loop. These are printed on the
relevant code line when that seems unambiguous, or on subsequent lines when
multiple potential options exist (messages, both positive and negative, from
the same optimization with different column numbers are taken to indicate
potential ambiguity). When on subsequent lines, a '^' is output in the relevant
column.
Annotated source for all relevant input files are put into the listing file
(each starting with '<' and then the file name).
You can disable having the unrolling/vectorization factors appear by using the
-s flag.
Differential Revision: https://reviews.llvm.org/D25262
llvm-svn: 283398
Summary: This makes a change to the state used to maintain visited information for depth first iterator. We know assume a method "completed(...)" which is called after all children of a node have been visited. In all existing cases, this method does nothing so this patch has no functional changes. It will however allow a client to distinguish back from cross edges in a DFS tree.
Reviewers: nadav, mehdi_amini, dberlin
Subscribers: MatzeB, mzolotukhin, twoh, freik, llvm-commits
Differential Revision: https://reviews.llvm.org/D25191
llvm-svn: 283391
This came out of a discussion in https://reviews.llvm.org/D25285.
There used to be various other llvm.dbg.* nodes, but we don't support
upgrading them and we want to reserve the namespace for future uses.
This also removes an entirely obsolete and bitrotted testcase for PR7662.
llvm-svn: 283390
This allows LLVM to describe locations of aggregate variables that have
been split by SROA.
Fixes PR29141
Reviewers: amccarth, majnemer
Differential Revision: https://reviews.llvm.org/D25253
llvm-svn: 283388
To be default constructible, Archive::child_iterator needs to be able to
construct an Archive::Child with a null parent, however Archive::Child's
constructor always dereferenced its Parent argument to compute the remaining
archive size. This commit fixes Archive::Child's constructor to only do the
size calculation when the parent is non-null.
llvm-svn: 283387
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D24076
llvm-svn: 283383
This patch is related to r274263 or Phabricator/D21818.
This patch aims to improve the test case added in the previous commit to verify
specifically that the stack protector pass is adding the debug line info as
intended. Before, the test only verified that the verifier pass does not crash.
The current approach is to generate the assembly output and then look for the
.loc directive.
Differential Revision: https://reviews.llvm.org/D25290
llvm-svn: 283374
The vectorizer already holds a pointer to one cost model artifact in a member
variable (i.e., MinBWs). As we add more, it will be easier to communicate these
artifacts to the vectorizer if we simply pass a pointer to the cost model
instead.
llvm-svn: 283373
The register scavenging code does not support multiple definitions of
the same vreg.
Differential Revision: https://reviews.llvm.org/D25220
llvm-svn: 283369
The vectorizer already holds a pointer to the legality analysis in a member
variable, so it makes sense that we would pass it in the constructor.
llvm-svn: 283368
This patch refactors the cost estimation of scalarized loads and stores to
reuse getScalarizationOverhead for the cost of the extractelement and
insertelement instructions we might create. The existing code accounted for
this cost, but it was functionally equivalent to the helper function.
llvm-svn: 283364
Previously we would give up when we saw the bitpiece DWARF expression
and print "[complex expression]" when actually we handled bitpiece
expressions outside the loop.
llvm-svn: 283355
The cost model has to estimate the probability of executing predicated blocks.
However, we currently always assume predicated blocks have a 50% chance of
executing (this value is hardcoded in several places throughout the code).
Since we always use the same value, this patch adds a helper function for
getting this uniform probability. The function simplifies some comments and
makes our assumptions more clear. In the future, we may want to extend this
with actual block probability information if it's available.
llvm-svn: 283354
The integrated assembler evaluates the expressions such as ~0x80000000 to
0xffffffff7fffffff early in the parsing process. This patch adds compatibility
with gas so that li loads the expected value (0x7fffffff) in those cases. This
only occurs iff all the upper 32bits are set and maintains existing checks by
not truncating the result down to 32 bits if any of the the upper bits are not
set.
Reviewers: dsanders, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D23399
llvm-svn: 283353
This patch adds a single helper function for checking if an instruction will be
scalarized with predication. Such instructions include conditional stores and
instructions that may divide by zero. Existing checks have been updated to use
the new function.
llvm-svn: 283350
Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple
look-through of EXTRACT_VECTOR_ELT. It will compute the result based
on the known bits (or known sign bits) for the vector that the element
is extracted from.
Reviewers: bogner, tstellarAMD, mkuper
Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D25007
llvm-svn: 283347
This allows you to enumerate over a range using a range-based
for while the return type contains the index of the enumeration.
Differential revision: https://reviews.llvm.org/D25124
llvm-svn: 283337
Add rsqrt.[ds], recip.[ds] for MIPS. Correct the microMIPS definitions for
architecture support and register usage.
Reviewers: vkalintiris, zoran.jovanoic
Differential Review: https://reviews.llvm.org/D24499
llvm-svn: 283334
This is not a valid encoding - these instructions cannot do PC-relative addressing.
The underlying problem here is of whitelist in ARMISelDAGToDAG that unwraps ARMISD::Wrappers during addressing-mode selection. This didn't realise TargetConstantPool was actually possible, so didn't handle it.
llvm-svn: 283323
Summary:
This adds the AVR machine code backend (`AVRAsmBackend.cpp`). This will
allow us to generate machine code from assembled AVR instructions.
Reviewers: arsenm, kparzysz
Subscribers: modocache, japaric, wdng, beanz, mgorny
Differential Revision: https://reviews.llvm.org/D25029
llvm-svn: 283297
This should allow users of the library to get a range to iterate through
all the subcommands that are registered to the global parser. This
allows users to define subcommands in libraries that self-register to
have dispatch done at a different stage (like main). It allows for
writing code like the following:
for (auto *S : cl::getRegisteredSubcommands()) {
if (*S) {
// Dispatch on S->getName().
}
}
This change also contains tests that show this usage pattern.
Reviewers: zturner, dblaikie, echristo
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24489
llvm-svn: 283296
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.
Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.
llvm-svn: 283292
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283274
Summary:
These are analog to the existing LLVMConstExactSDiv and LLVMBuildExactSDiv
functions.
Reviewers: deadalnix, majnemer
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D25259
llvm-svn: 283269
Summary:
Attempting to fix PR30384.
Take the same approach as in compiler_rt and add a simplified version of __get_cpuid_max.
Including cpuid.h is no longer needed.
Reviewers: echristo, joerg
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24597
llvm-svn: 283265
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.
Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.
The ingredients of this patch are:
Remove the reciprocal estimate command-line debug option.
Add TargetRecip to TargetLowering.
Remove TargetRecip from TargetOptions.
Clean up the TargetRecip implementation to work with this new scheme.
Set the default reciprocal settings in TargetLoweringBase (everything is off).
Update the PowerPC defaults, users, and tests.
Update the x86 defaults, users, and tests.
Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.
Differential Revision: https://reviews.llvm.org/D24816
llvm-svn: 283252
load commands that uses the MachO::encryption_info_command and
MachO::encryption_info_command types but not used in llvm libObject
code but used in llvm tool code.
This includes just LC_ENCRYPTION_INFO and
LC_ENCRYPTION_INFO_64 load commands.
llvm-svn: 283250
Make LIT_COMMAND configurable, use source tree only when actually
available and extend the default search to other common executable names
'lit.py' and 'lit', in order to increase uniformity between all LLVM
projects and support using installed lit.
Changing the conditional used to determine whether in-tree or external
lit is being used covers the case when LLVM_MAIN_SRC_DIR is defined but
does not exist (anymore). In this case, the functions falls back to
looking for installed lit rather than attempting to use a non-existing
path. The same conditional is used in clang already.
Making LIT_COMMAND a cache variable in case the source tree variant is
used serves two purposes. Firstly, it increases uniformity between
the two branches since find_program() implicitly makes LIT_COMMAND
a cache variable. Secondly, it allows overriding the lit executable used
to run the tests when the LLVM source tree is provided. Gentoo is
planning to use this to use installed (and byte-compiled) lit instead of
re-compiling it in every LLVM project.
Extending default search is meant to increase uniformity between
different LLVM projects. The 'lit.py' name is already used by a few of
them, and 'lit' is the name used by utils/lit/setup.py when installing.
Differential Revision: https://reviews.llvm.org/D25076
llvm-svn: 283247
This adds support for CaseLower, CasesLower, StartsWithLower, and
EndsWithLower.
Differential revision: https://reviews.llvm.org/D24686
llvm-svn: 283244
AArch64InstrInfo::shouldScheduleAdjacent() determines whether two
instruction can benefit from macroop fusion on apple CPUs. The list
turned out to be incomplete:
- the "rr" variants of the instructions were missing
- even the "rs" variants can have shift value == 0 and behave like the
"rr" variants
This also splits the MacropFusion target feature into
ArithmeticBccFusion and ArithmeticCbzFusion.
Differential Revision: https://reviews.llvm.org/D25142
llvm-svn: 283243
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.
Differential Revision: https://reviews.llvm.org/D25224
llvm-svn: 283236
CMake requires that all targets expressed as dependencies exist, so we can't have intrinsics_gen in LLVM_COMMON_DEPENDS when it is written out, otherwise projects building out of tree will have CMake errors.
llvm-svn: 283234
IntegerType::MAX_INT_BITS is apparently not in sync with Type::SubclassData
size. This patch fixes this.
Differential Revision: https://reviews.llvm.org/D24814
llvm-svn: 283215
This patch corresponds to review:
The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.
llvm-svn: 283212
Treat soft-float as unsupported for fast-isel. Additionally, ensure we check
that lowering f32 arguments also considers the case of soft-float mode.
Reviewers: ehostunreach, vkalintiris, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D24505
llvm-svn: 283209
Previously code would access invalid memory and may crash,
patch fixes the issue.
Differential revision: https://reviews.llvm.org/D25187
llvm-svn: 283204
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, an __mul?i3 intrinsic is used. Since the type is
not legal, the legalizer cannot directly call the intrinsic with
the wide arguments; instead, it "pre-lowers" them by splitting them
in halves.
The "pre-lowering" code in essence made assumptions about
the calling convention, specifically that i(N*2) values will be
split into two iN values and passed in consecutive registers in
little-endian order. This, naturally, breaks on a big-endian system,
such as our OR1K out-of-tree backend.
Thanks to James Miller <james@aatch.net> for help in debugging.
Differential Revision: https://reviews.llvm.org/D25223
llvm-svn: 283203
When using broken input object found using AFL,
getExtendedSymbolTableIndex() crashed because ShndxTable
was empty as object does not contain SHT_SYMTAB_SHNDX section.
Differential revision: https://reviews.llvm.org/D25189
llvm-svn: 283196
This fixes the inconsistency of the fp denormal option names: in LLVM this was
DenormalType, but in Clang this is DenormalMode which seems better.
Differential Revision: https://reviews.llvm.org/D24906
llvm-svn: 283192
This patch corresponds to review:
https://reviews.llvm.org/D23155
This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:
Int to Fp conversions of 1 or 2-byte values loaded from memory
Building vectors of 1 or 2-byte integers with values loaded from memory
Storing individual 1 or 2-byte elements from integer vectors
This patch implements all of those uses.
llvm-svn: 283190
Reintroduce versioning of shared libraries via SOVERSION, addressing
the issues with the previous design, since Gentoo is relying
on shared-split install of LLVM. The SOVERSIONs were originally
introduced in r229720 for all libraries, and removed in r252093 in favor
of custom SONAME. As far as I understand, the major concern with the old
versioning was that the used versions were incompatible with ldconfig.
Having considered that, this commit introduce SOVERSIONS with the
following considerations:
1. SOVERSIONs are formed of major & minor version concatenated -- i.e.
for 4.0 its .so.40. This matches the common practice where the first
version number indicates ABI breakage, and therefore fixes the issues
with ldconfig. Additionally, VERSION with the remaining verion
components appended is used, however this is not strictly necessary.
2. The versioning is only applied to libraries with no explicit SONAME
specified -- i.e. it won't apply to libLLVM but only to the split
libraries. It will also apply to libraries installed by the subprojects.
3. The versioning is only done on *nix systems, Darwin excluded. This
matches the current use of SONAME.
Differential Revision: https://reviews.llvm.org/D24757
llvm-svn: 283189
Use separate doctrees between different Sphinx builders in order to
prevent race condition issues due to multiple Sphinx instances accessing
the same doctree cache in parallel.
Bug: https://llvm.org/bugs/show_bug.cgi?id=23781
Differential Revision: https://reviews.llvm.org/D23755
llvm-svn: 283188
Summary:
The minimum version of Python necessary to run the LLVM test suite is
2.7. Code to work around Python 2.5 and lower isn't necessary.
Reviewers: ddunbar, echristo, delcypher, beanz
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25209
llvm-svn: 283169
Slightly improves the precision of GlobalsAA in certain situations, and
makes the behavior of optimization passes more predictable.
Differential Revision: https://reviews.llvm.org/D24104
llvm-svn: 283165
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
llvm-svn: 283164
We now build MemorySSA in its ctor, instead of waiting until the user
calls MemorySSA::getWalker. This silently changed our unittests, since
we add BasicAA to AAResults *after* constructing MemorySSA (...but
before calling MemorySSA::getWalker).
None of them broke because we do most of our "did this get optimized
correctly?" tests in .ll files.
llvm-svn: 283158
WebAssembly has officially switched from being an AST to being a stack
machine. Update various bits of terminology and README.md entries
accordingly.
llvm-svn: 283154
Summary:
optparse is deprecated in Python 2.7, which is the minimum version of
Python required to run the LLVM test suite. Replace its usage in lit
with argparse, optparse's 2.7 replacement module.
argparse has several benefits over optparse, but this commit does not
make use of those benefits yet. Instead, it simply uses the new API,
and attempts to keep the number of changes to a minimum.
Confirmed that lit's test suite, as well as LLVM's regression test suite,
still pass with these changes.
Patch By Brian Gesiak!
Reviewers: ddunbar, echristo, beanz, delcypher
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25173
llvm-svn: 283152
This is to avoid problems with win32 + ELF which surprisingly happens a
lot in practice: If a user just specifies -march on the commandline the
object format changes along with the architecture to ELF in many
instances while the OS stays with the default/host OS.
llvm-svn: 283151
This avoids llc using the hosts OS/vendor as defaults and triggering
unwanted behaviour in the tests. This should deal with the buildbot
breakages on windows after r283140.
llvm-svn: 283149
Refactor the code so that the same function can be used for all
instructions with all the same operands for up to 3 operands.
This is going to be useful for cast instructions.
NFC.
llvm-svn: 283144
Each shadow only represents data flow that is restricted to its reaching
def. Propagating more than that could lead to spurious register liveness,
resulting in extra (incorrectly) block live-ins.
llvm-svn: 283143
Windows has no GOT relocations the way elf/darwin has. Some people use
x86_64-pc-win32-macho to build EFI firmware; Do not produce GOT
relocations for this target.
Differential Revision: https://reviews.llvm.org/D24627
llvm-svn: 283140
Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).
Reviewers: davidxl, danielcdh, hfinkel, chandlerc
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D24168
llvm-svn: 283134
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.
Differential Revision: https://reviews.llvm.org/D24680
llvm-svn: 283129
It should forward to deregisterEHFramesInProcess by default, not
registerEHFramesInProcess.
No test case: I haven't come up with a good way to unit test EH frame
registration yet.
llvm-svn: 283123