This reverts commit r302461.
It appears to be causing failures compiling gtest with debug info on the
Linux sanitizer bot. I was unable to reproduce the failure locally,
however.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302504 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
r284533 added hot and cold section prefixes based on profile
information, to enable grouping of hot/cold functions at link time.
However, it used "cold" as the prefix for cold sections, but gold only
recognizes "unlikely" (which is used by gcc for cold sections).
Therefore, cold sections were not properly being grouped. Switch to
using "unlikely"
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302502 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For inalloca functions, this is a very common code pattern:
%argpack = type <{ i32, i32, i32 }>
define void @f(%argpack* inalloca %args) {
entry:
%a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0
%b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1
%c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2
tail call void @llvm.dbg.declare(metadata i32* %a, ... "a")
tail call void @llvm.dbg.declare(metadata i32* %c, ... "b")
tail call void @llvm.dbg.declare(metadata i32* %b, ... "c")
Even though these GEPs can be simplified to a constant offset from EBP
or RSP, we don't do that at -O0, and each GEP is computed into a
register. Registers used to compute argument addresses are typically
spilled and clobbered very quickly after the initial computation, so
live debug variable tracking loses information very quickly if we use
DBG_VALUE instructions.
This change moves processing of dbg.declare between argument lowering
and basic block isel, so that we can ask if an argument has a frame
index or not. If the argument lives in a register as is the case for
byval arguments on some targets, then we don't put it in the side table
and during ISel we emit DBG_VALUE instructions.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32980
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302483 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
An llvm.dbg.declare of a static alloca is always added to the
MachineFunction dbg variable map, so these values are entirely
redundant. They survive all the way through codegen to be ignored by
DWARF emission.
Effectively revert r113967
Two bugpoint-reduced test cases from 2012 broke as a result of this
change. Despite my best efforts, I haven't been able to rewrite the test
case using dbg.value. I'm not too concerned about the lost coverage
because these were reduced from the test-suite, which we still run.
Reviewers: aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32920
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302461 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces an LLVM intrinsic and a target opcode for custom event
logging in XRay. Initially, its use case will be to allow users of XRay to log
some type of string ("poor man's printf"). The target opcode compiles to a noop
sled large enough to enable calling through to a runtime-determined relative
function call. At runtime, when X-Ray is enabled, the sled is replaced by
compiler-rt with a trampoline to the logic for creating the custom log entries.
Future patches will implement the compiler-rt parts and clang-side support for
emitting the IR corresponding to this intrinsic.
Reviewers: timshen, dberris
Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D27503
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302405 91177308-0d34-0410-b5e6-96231b3b80d8
Remove an extra canonicalization step if ISD::ABS is going to be used anyway.
Updated x86 abs combine to check that we are lowering from both canonicalizations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302337 91177308-0d34-0410-b5e6-96231b3b80d8
This is a step toward having statically allocated instruciton mapping.
We are going to tablegen them eventually, so let us reflect that in
the API.
NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302316 91177308-0d34-0410-b5e6-96231b3b80d8
This exposes a method in MachineFrameInfo that calculates
MaxCallFrameSize and calls it after instruction selection in the ARM
target.
This avoids
ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame()
giving different answers in early/late phases of codegen.
The testcase shows a particular nasty example result of that where we
would fail to properly align an alloca.
Differential Revision: https://reviews.llvm.org/D32622
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302303 91177308-0d34-0410-b5e6-96231b3b80d8
Most of the time we know exactly how many type records we
have in a list, and we want to use the visitor to deserialize
them into actual records in a database. Previously we were
just using push_back() every time without reserving the space
up front in the vector. This is obviously terrible from a
performance standpoint, and it's not uncommon to have PDB
files with half a million type records, where the performance
degredation was quite noticeable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302302 91177308-0d34-0410-b5e6-96231b3b80d8
- MIParser: If the successor list is not specified successors will be
added based on basic block operands in the block and possible
fallthrough.
- MIRPrinter: Adds a new `simplify-mir` option, with that option set:
Skip printing of block successor lists in cases where the
parser is guaranteed to reconstruct it. This means we still print the
list if some successor cannot be determined (happens for example for
jump tables), if the successor order changes or branch probabilities
being unequal.
Differential Revision: https://reviews.llvm.org/D31262
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302289 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change other than improving dbgs logging accuracy on
constant dbg values. Previously we would add things like "i32 42" as
debug values, and then log that we were dropping the debug info, which
is silly.
Delete some dead code that was checking for static allocas. This
remained after r207165, but served no purpose. Currently, static alloca
dbg.values are always sent through the DanglingDebugInfoMap, and are
usually made valid the first time the alloca is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302267 91177308-0d34-0410-b5e6-96231b3b80d8
Hoisting common code can cause registers that live-in in the successor
blocks to no longer be live-in. The live-in information needs to be
updated to reflect this, or otherwise incorrect code can be generated
later on.
Differential Revision: https://reviews.llvm.org/D32661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302228 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it simpler for the runtime to consistently handle the entries
in the function sled index in both 32 and 64 bit platforms where the
XRay runtime works.
Follow-up on D32693.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302111 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change adds a new section to the xray-instrumented binary that
stores an index into ranges of the instrumentation map, where sleds
associated with the same function can be accessed as an array. At
runtime, we can get access to this index by function ID offset allowing
for selective patching and unpatching by function ID.
Each entry in this new section (xray_fn_idx) will include two pointers
indicating the start and one past the end of the sleds associated with
the same function. These entries will be 16 bytes long on x86 and
aarch64. On arm, we align to 16 bytes anyway so the runtime has to take
that into consideration.
__{start,stop}_xray_fn_idx will be the symbols that the runtime will
look for when we implement the selective patching/unpatching by function
id APIs. Because XRay synthesizes the function id's in a monotonically
increasing manner at runtime now, implementations (and users) can use
this table to look up the sleds associated with a specific function.
This is useful in implementations that want to do things like:
- Implement coverage mode for functions by patching everything
pre-main, then as functions are encountered, the installed handler
can unpatch the function that's been encountered after recording
that it's been called.
- Do "learning mode", so that the implementation can figure out some
statistical information about function calls by function id for a
time being, and then determine which functions are worth
uninstrumenting at runtime.
- Do "selective instrumentation" where an implementation can
specifically instrument only certain function id's at runtime
(either based on some external data, or through some other
heuristics) instead of patching all the instrumented functions at
runtime.
Reviewers: dblaikie, echristo, chandlerc, javed.absar
Subscribers: pelikan, aemerson, kpw, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D32693
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302109 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is an implementation of the loop detection logic that XRay needs to
determine whether a function might take time at runtime. Without this
heuristic, XRay will tend to not instrument short functions that have
loops that might have runtime dependent on inputs or external values.
While this implementation doesn't do any further analysis than just
figuring out whether there is a loop in the MachineFunction being
code-gen'ed, we're paving the way for being able to perform more
sophisticated analysis of the function in the future (for example to
determine whether the trip count for the loop might be constant, and
make a decision on that instead). This enables us to cover more
functions with the default heuristics, and potentially identify ones
that have variable runtime latency just by looking for the presence of
loops.
Reviewers: chandlerc, rnk, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32274
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302103 91177308-0d34-0410-b5e6-96231b3b80d8
This is the DAG equivalent of https://reviews.llvm.org/D32255 ,
which will hopefully be committed again. The functionality
(preferring a 'not' op) is already here in the DAG, so this is
just intended to be a clean-up and performance improvement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302087 91177308-0d34-0410-b5e6-96231b3b80d8
Compiler emitted synthetic types may not have an associated DIFile
(translation unit). In such a case, when generating CodeView debug type
information, we would attempt to compute an absolute filepath which
would result in a segfault due to a NULL DIFile*. If there is no source
file associated with the type, elide the type index entry for the type
and record the type information. This actually results in higher
fidelity debug information than clang/C2 as of this writing.
Resolves PR32668!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302085 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302060 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Do the transform when the carry isn't used. It's a pattern exposed when legalizing large integers.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32755
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302047 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is the corresponding llvm change to D28037 to ensure no performance
regression.
Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28329
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301990 91177308-0d34-0410-b5e6-96231b3b80d8
On AMDGPU if an SGPR is spilled to a VGPR, the frame index
is deleted. If there were any CSR SGPRs, this woudl
assert when setting the offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301961 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we wrote line information and file checksum
information, but we did not write information about inlinee
lines and functions. This patch adds support for that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301936 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This is a common pattern that arise when legalizing large integers operations. Only do it when Y + 1 cannot overflow as this would change the carry behavior of uaddo .
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32687
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301922 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Common pattern when legalizing large integers operations. Similar to D32687, when the carry isn't used.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Differential Revision: https://reviews.llvm.org/D32738
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301919 91177308-0d34-0410-b5e6-96231b3b80d8
PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats.
This patch adds support for extension/truncation for both integer and float types.
Differential Revision: https://reviews.llvm.org/D32391
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301910 91177308-0d34-0410-b5e6-96231b3b80d8
The existing code only looks at half of the tree when matching bswap + rol patterns ending in an OR tree (as opposed to a cascade).
Patch originally introduced by Jim Lewis.
Submitted on the behalf of Dinar Temirbulatov.
Differential Revision: https://reviews.llvm.org/D32039
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301907 91177308-0d34-0410-b5e6-96231b3b80d8
This tracks whether MaxCallFrameSize is computed yet. Ideally we would
assert and fail when the value is queried before it is computed, however
this fails various targets that need to be fixed first.
Differential Revision: https://reviews.llvm.org/D32570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301851 91177308-0d34-0410-b5e6-96231b3b80d8
This is the SelectionDAG version of D32521. If know where at least one 1 is located in the input to these intrinsics we can place an upper bound on the number of bits needed to represent the count and thus increase the number of known zeros in the output.
I think we can also refine this further for CTTZ_UNDEF/CTLZ_UNDEF by assuming that the answer will never be BitWidth. I've left this out for now because it caused other test failures across multiple targets. Usually because of turning ADD into OR based on this new information.
I'll fix CTPOP in a future patch.
Differential Revision: https://reviews.llvm.org/D32692
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301806 91177308-0d34-0410-b5e6-96231b3b80d8
We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that
patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check
legality of the select that we want to produce.
A few things to note:
1. We can't wait until after legalization and do this generically because (at least in the x86
tests from PR14657), we'll have PACKSS and bitcasts in the pattern.
2. This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but
that requires a closer look to make sure we don't end up worse.
3. There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases.
That should be fixed next.
4. I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple
legal vector sizes, but if there are other targets like that, we should add more tests.
5. There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests):
despite IR that is terrible for the target, this patch allows us to generate the optimal loop
code because something post-ISEL is hoisting the splat extends above the vector loops.
Differential Revision: https://reviews.llvm.org/D32620
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301781 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29872
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301775 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().
Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.
Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32491
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301750 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch adds isNegative, isNonNegative for querying whether the sign bit is known. It also adds makeNegative and makeNonNegative for controlling the sign bit.
Reviewers: RKSimon, spatel, davide
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32651
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301747 91177308-0d34-0410-b5e6-96231b3b80d8
This broke the Clang build. (Clang-side patch missing?)
Original commit message:
> [IR] Make add/remove Attributes use AttrBuilder instead of
> AttributeList
>
> This change cleans up call sites and avoids creating temporary
> AttributeList objects.
>
> NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301712 91177308-0d34-0410-b5e6-96231b3b80d8