Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.
This can be more efficient than using pred_size(), which is a linear
time operation.
We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).
With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.
See llvm.org/PR39702 for a harder but more general way of achieving
similar results.
Differential Revision: https://reviews.llvm.org/D54686
llvm-svn: 347256
This will hold flags specific to subprograms. In the future
we could potentially free up scarce bits in DIFlags by moving
subprogram-specific flags from there to the new flags word.
This patch does not change IR/bitcode formats, that will be
done in a follow-up.
Differential Revision: https://reviews.llvm.org/D54597
llvm-svn: 347239
The #if check around the statistics computation gave an error about
the statistic being an unused variable. Instead, guard with
AreStatisticsEnabled().
llvm-svn: 347146
Summary:
Follow up to D49362 ([ThinLTO] Internalize read only globals). Add a
statistic on the number of read only variables (only counting live
variables since dead variables will be dropped anyway).
Reviewers: evgeny777
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54642
llvm-svn: 347145
Summary:
StructRet attribute is not allowed in vararg calls. The statepoint
intrinsic is vararg, but the wrapped function may be not. Allow
calls of statepoint with StructRet arg, as long as the wrapped
function is not vararg.
Reviewers: thanm, anna
Reviewed By: anna
Subscribers: anna, llvm-commits
Differential Revision: https://reviews.llvm.org/D53602
llvm-svn: 347050
An attempt to recommit r346584 after failure on OSX build bot.
Fixed cache key computation in ThinLTOCodeGenerator and added
test case
llvm-svn: 347033
Before this commit, `llc -print-after-all` would print something like:
*** IR Dump After Pre-ISel Intrinsic Lowering ***; ModuleID = ...
Emit a newline such that ModuleID appears on a line by its own.
llvm-svn: 346844
Summary:
Ranges base address specifiers can save a lot of object size in
relocation records especially in optimized builds.
For an optimized self-host build of Clang with split DWARF and debug
info compression in object files, but uncompressed debug info in the
executable, this change produces about 18% smaller object files and 6%
larger executable.
While it would've been nice to turn this on by default, gold's 32 bit
gdb-index support crashes on this input & I don't think there's any
perfect heuristic to implement solely in LLVM that would suffice - so
we'll need a flag one way or another (also possible people might want to
aggressively optimized for executable size that contains debug info
(even with compression this would still come at some cost to executable
size)) - so let's plumb it through.
Differential Revision: https://reviews.llvm.org/D54242
llvm-svn: 346788
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.
Differential Revision: https://reviews.llvm.org/D53877
llvm-svn: 346774
This patch allows internalising globals if all accesses to them
(from live functions) are from non-volatile load instructions
Differential revision: https://reviews.llvm.org/D49362
llvm-svn: 346584
In SimplifyCFG when given a conditional branch that goes to BB1 and BB2, the hoisted common terminator instruction in the two blocks, caused debug line records associated with subsequent select instructions to become ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D53390
llvm-svn: 346481
As shown, this is used to eliminate redundant code in InstCombine,
and there are more cases where we should be using this pattern, but
we're currently unintentionally dropping flags.
llvm-svn: 346282
Summary:
The NotEligibleToImport flag on the GlobalValueSummary was set if it
isn't legal to import (e.g. because it references unpromotable locals)
and when it can't be inlined (in which case importing is pointless).
I split out the inlinable piece into a separate flag on the
FunctionSummary (doesn't make sense for aliases or global variables),
because in the future we may want to import for reasons other than
inlining.
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53345
llvm-svn: 346261
Summary:
Improve the intrinsic bindings with operations for
- Retrieving and automatically inserting the declaration of an intrinsic by ID
- Retrieving the name of a non-overloaded intrinsic by ID
- Retrieving the name of an overloaded intrinsic by ID and overloaded parameter types
Improve the echo test to copy non-overloaded intrinsics by ID.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53626
llvm-svn: 346195
ModuleSummaryIndex::exportToDot crashes when linking the Linux kernel
under ThinLTO using LLVMgold.so. This is due to the exportToDot
function trying to get the GUID of an empty ValueInfo. The root cause
related to the fact that we attempt to get the GUID of an aliasee
via its OriginalGUID recorded in the aliasee summary, and that is not
always possible. Specifically, we cannot do this mapping when the value
is internal linkage and there were other internal linkage symbols with
the same name.
There are 2 fixes for the problem included here.
1) In all cases where we can currently print the dot file from the
command line (which is only via save-temps), we have a valid AliaseeGUID
in the AliasSummary. Use that when it is available, so that we can get
the correct aliasee GUID whenever possible.
2) However, if we were to invoke exportToDot from the debugger right
after it is built during the initial analysis step (i.e. the per-module
summary), we won't have the AliaseeGUID field populated. In that case,
we have a fallback fix that will simply print "@"+GUID when we aren't
able to get the GUID from the OriginalGUID. It simply checks if the VI
is valid or not before attempting to get the name. Additionally, since
getAliaseeGUID will assert that the AliaseeGUID is non-zero, guard the
earlier fix#1 by a new function hasAliaseeGUID().
Reviewers: pcc, tmroeder
Subscribers: evgeny777, mehdi_amini, inglorion, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53986
llvm-svn: 346055
We want to remove this fneg API because it would silently fail
if we add an actual fneg instruction to IR (as proposed in
D53877 ).
We have a newer 'match' API that makes checking for
these patterns simpler. It also works with vectors
that may include undef elements in constants.
If any out-of-tree users need updating, they can model
their code changes on this commit:
https://reviews.llvm.org/rL345295
llvm-svn: 345904
I think this is the actual important property; the previous visibility
check was an approximation.
Differential Revision: https://reviews.llvm.org/D53852
llvm-svn: 345790
Add an intrinsic that takes 2 integers and perform saturation subtraction on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53783
llvm-svn: 345512
Summary:
This function was performing two hash lookups when a new struct type was requested: first checking if it exists and second to insert it. This patch updates the function to perform a single hash lookup in this case by updating the value in the hash table in-place in case the struct type was not there before.
Similar to r345151.
Reviewers: bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53689
llvm-svn: 345264
Summary: This function was performing two hash lookups when a new function type was requested: first checking if it exists and second to insert it. This patch updates the function to perform a single hash lookup in this case by updating the value in the hash table in-place in case the function type was not there before.
Reviewers: bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53471
llvm-svn: 345151
The initial motivation is that we want to remove the
fneg API because that would silently fail if we add
an actual fneg instruction to IR. The same would be
true for the integer ops, so we might as well get rid
of these too.
We have a newer 'match' API that makes checking for
these patterns simpler. It also works with vectors
that may include undef elements in constants.
If any out-of-tree users need updating, they can model
their code changes on these commits:
rL345050
rL345043
rL345042
rL345041
rL345036
rL345030
llvm-svn: 345052
Add an intrinsic that takes 2 integers and perform unsigned saturation
addition on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53340
llvm-svn: 344971
This updates the C API for the removal of `TerminatorInst`. It converts
the type query to a predicate query and moves the generic methods to
work on `Instruction` instances that satisfy this predicate rather than
requiring a specific type. It also clarifies that the C API wrapping
`BasicBlock::getTerminator` just returns an `Instruction`. Because this
was always wrapped opaquely as a value and the functions consuming these
values will work on `Instruction` objects, this shouldn't break any
clients.
This is a completely compatible change to the C API.
Differential Revision: https://reviews.llvm.org/D52968
llvm-svn: 344764
Add an intrinsic that takes 2 integers and perform saturation addition on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53053
llvm-svn: 344629
This removes the primary remaining API producing `TerminatorInst` which
will reduce the rate at which code is introduced trying to use it and
generally make it much easier to remove the remaining APIs across the
codebase.
Also clean up some of the stragglers that the previous mechanical update
of variables missed.
Users of LLVM and out-of-tree code generally will need to update any
explicit variable types to handle this. Replacing `TerminatorInst` with
`Instruction` (or `auto`) almost always works. Most of these edits were
made in prior commits using the perl one-liner:
```
perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g'
```
This also my break some rare use cases where people overload for both
`Instruction` and `TerminatorInst`, but these should be easily fixed by
removing the `TerminatorInst` overload.
llvm-svn: 344504
are terminators without relying on the specific `TerminatorInst` type.
This required cleaning up two users of `InstVisitor`s usage of
`TerminatorInst` as well.
llvm-svn: 344503
by `getTerminator()` calls instead be declared as `Instruction`.
This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).
llvm-svn: 344502
This commit modifies an existing IR verifier check that
assumes all functions will be located in the default address
space 0.
Rather than using the default paramater value getPointerTo(AddrSpace=0),
explicitly specify the program memory address space from the data layout.
This only affects targets that specify a nonzero address space
in their data layouts. The only in-tree target that does this
is AVR.
llvm-svn: 344243
Add thin shims to C interface to provide access to DebugLoc info for
Instructions, GlobalVariables and Functions. Patch by Josh Berdine!
llvm-svn: 344202
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:
- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).
This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).
Differential Revision: https://reviews.llvm.org/D52087
llvm-svn: 343962
Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.
Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
%t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
%t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Output:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space.
```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```
I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.
Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52294
llvm-svn: 343956
Enable time-passes functionality through PassInstrumentation callbacks
for passes and analyses.
TimePassesHandler class keeps all the callbacks, the timing data as it
is being collected as well as the stack of currently active timers.
Parts of the fix that might be somewhat unobvious:
- mapping of passes into Timer (TimingData) can not be done per-instance.
PassID name provided into the callback is common for all the pass invocations.
Thus the only way to get a timing with reasonable granularity is to collect
timing data per pass invocation, getting a new timer for each BeforePass.
Hence the key for TimingData uses a pair of <StringRef/unsigned count> to
uniquely identify a pass invocation.
- consequently, this new-pass-manager implementation performs no aggregation
of timing data, reporting timings for each pass invocation separately.
In that it differs from legacy-pass-manager time-passes implementation that
reports timing data aggregated per pass instance.
- pass managers and adaptors are not tracked, similar to how pass managers are
not tracked in legacy time-passes.
- TimerStack tracks timers that are active, each BeforePass pushes the new timer
on stack, each AfterPass pops active timer from stack and stops it.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D51276
llvm-svn: 343898
DWARF v5 introduces DW_AT_call_all_calls, a subprogram attribute which
indicates that all calls (both regular and tail) within the subprogram
have call site entries. The information within these call site entries
can be used by a debugger to populate backtraces with synthetic tail
call frames.
Tail calling frames go missing in backtraces because the frame of the
caller is reused by the callee. Call site entries allow a debugger to
reconstruct a sequence of (tail) calls which led from one function to
another. This improves backtrace quality. There are limitations: tail
recursion isn't handled, variables within synthetic frames may not
survive to be inspected, etc. This approach is not novel, see:
https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jelinek.pdf
This patch adds an IR-level flag (DIFlagAllCallsDescribed) which lowers
to DW_AT_call_all_calls. It adds the minimal amount of DWARF generation
support needed to emit standards-compliant call site entries. For easier
deployment, when the debugger tuning is LLDB, the DWARF requirement is
adjusted to v4.
Testing: Apart from check-{llvm, clang}, I built a stage2 RelWithDebInfo
clang binary. Its dSYM passed verification and grew by 1.4% compared to
the baseline. 151,879 call site entries were added.
rdar://42001377
Differential Revision: https://reviews.llvm.org/D49887
llvm-svn: 343883
Replacing Timer* with unique_ptr<Timer> in a pass-to-timer map.
That allows to get rid of unpretty raw deletes in PassTimingInfo destructor.
Strictly cleanup, not intended to change any visible behavior.
llvm-svn: 343772