Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.
This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282667 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: AArch64 LLVM assembler emits add instruction without shift bit to calculate the higher 12-bit address of TLS variables in local exec model. This generates wrong code sequence to access TLS variables with thread offset larger than 0x1000.
Reviewers: t.p.northover, peter.smith, rovka
Subscribers: salim.nasser, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D24702
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282661 91177308-0d34-0410-b5e6-96231b3b80d8
load command that uses the Mach::rpath_command type
but not used in llvm libObject code but used in llvm tool code.
This includes just the LC_RPATH load command.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282649 91177308-0d34-0410-b5e6-96231b3b80d8
This is a step toward statically allocate InstructionMapping. Like the
previous few commits, the goal is to move toward a TableGen'ed like
structure with no dynamic allocation at all.
This should already improve compile time by getting rid of a bunch of
memmove of SmallVectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282643 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Answering any meaningful questions about .sancov files requires
accessing symbol information from the corresponding binary.
This change introduces a separate intermediate data structure and
format: symbolized coverage. It contains all symbol information that
is required to answer common queries:
- merging
- coverd/uncovered files and functions
- line status.
Also removing the html report functionality from sancov: generated
HTML files are too huge, and a different approach is required.
Maintaining this half-working approach in the C++ is painful.
Differential Revision: https://reviews.llvm.org/D24947
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282639 91177308-0d34-0410-b5e6-96231b3b80d8
LiveDebugVariables doesn't propagate DBG_VALUEs accross basic block
boundaries any more; this functionality was split into LiveDebugValues.
We can thus drop the now dead references to LexicalScopes from LiveDebugVariables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282638 91177308-0d34-0410-b5e6-96231b3b80d8
Coverage reports for gigabyte-sized binaries are huge. There's no
practical reason to generate them statically.
Implementing an experiment http coverage report server. The server
loads .symcov file and serves interactive coverage pages.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282637 91177308-0d34-0410-b5e6-96231b3b80d8
other load commands that use the Mach::version_min_command type
but not used in llvm libObject code but used in llvm tool code.
This includes LC_VERSION_MIN_MACOSX, LC_VERSION_MIN_IPHONEOS,
LC_VERSION_MIN_TVOS and LC_VERSION_MIN_WATCHOS load commands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282635 91177308-0d34-0410-b5e6-96231b3b80d8
Normally, if conversion would add implicit uses for redefined registers,
e.g. R0<def> = add_if ..., R0<imp-use>. However, if only subregisters of
R0 are known to be live but not R0 itself, such implicit uses will not be
added, causing prior definitions of such subregisters and R0 itself to
become dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282626 91177308-0d34-0410-b5e6-96231b3b80d8
Also, remove unnecessary function attributes, parameters, and comments.
It looks like at least some of these tests are not minimal though...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282620 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When using llc with -compile-twice, module is generated twice, but getAnalysis<ProfileSummaryInfoWrapperPass>().getPSI will still get the old PSI with the original (invalidated) Module. This patch checks if the module has changed when calling getPSI, if yes, update the module and invalidate the Summary.
The bug does not show up in the current llc because PSI is not used in CodeGen yet. But with https://reviews.llvm.org/D24989, the bug will be exposed by test/CodeGen/PowerPC/pr26378.ll
Reviewers: eraman, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24993
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282616 91177308-0d34-0410-b5e6-96231b3b80d8
Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value.
This is similar to D13008.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D24729
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282612 91177308-0d34-0410-b5e6-96231b3b80d8
This addresses PR26055 LiveDebugValues is very slow.
Contrary to the old LiveDebugVariables pass LiveDebugValues currently
doesn't look at the lexical scopes before inserting a DBG_VALUE
intrinsic. This means that we often propagate DBG_VALUEs much further
down than necessary. This is especially noticeable in large C++
functions with many inlined method calls that all use the same
"this"-pointer.
For example, in the following code it makes no sense to propagate the
inlined variable a from the first inlined call to f() into any of the
subsequent basic blocks, because the variable will always be out of
scope:
void sink(int a);
void __attribute((always_inline)) f(int a) { sink(a); }
void foo(int i) {
f(i);
if (i)
f(i);
f(i);
}
This patch reuses the LexicalScopes infrastructure we have for
LiveDebugVariables to take this into account.
The effect on compile time and memory consumption is quite noticeable:
I tested a benchmark that is a large C++ source with an enormous
amount of inlined "this"-pointers that would previously eat >24GiB
(most of them for DBG_VALUE intrinsics) and whose compile time was
dominated by LiveDebugValues. With this patch applied the memory
consumption is 1GiB and 1.7% of the time is spent in LiveDebugValues.
https://reviews.llvm.org/D24994
Thanks to Daniel Berlin and Keith Walker for reviewing!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282611 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Instead of creating and destroying SCEVUnionPredicate instances (which
internally creates and destroys a DenseMap), use temporary SmallPtrSet
instances of remember the set of predicates that will get reified into a
SCEVUnionPredicate.
Reviewers: silviu.baranga, sbaranga
Subscribers: sanjoy, mcrosier, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25000
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282606 91177308-0d34-0410-b5e6-96231b3b80d8
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282600 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds the AVRMCTargetDesc file in tree. It allows creation of the
core classes used in the backend.
Reviewers: arsenm, kparzysz
Subscribers: wdng, beanz, mgorny
Differential Revision: https://reviews.llvm.org/D25023
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282597 91177308-0d34-0410-b5e6-96231b3b80d8
We very recently landed the code. This commit enables the parser.
It also adds a missing include to AVRAsmParser.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282593 91177308-0d34-0410-b5e6-96231b3b80d8
The previous data layout caused issues when dealing with atomics.
Foe example, it is illegal to load a 16-bit value with less than 16-bits
of alignment.
This changes the data layout so that all types are aligned by at least
their own width.
Interestingly, this also _slightly_ decreased register pressure in some
cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282587 91177308-0d34-0410-b5e6-96231b3b80d8
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
but, turns out that the opposite case of KORTEST using a GPR is now happening
The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.
Differential Revision: https://reviews.llvm.org/D24953
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282580 91177308-0d34-0410-b5e6-96231b3b80d8
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.
It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).
Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282570 91177308-0d34-0410-b5e6-96231b3b80d8
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.
Differential Revision: https://reviews.llvm.org/D24625
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282567 91177308-0d34-0410-b5e6-96231b3b80d8
Ever since LAA was split out into an analysis on its own, this function
stopped emitting the report directly. Instead it stores it to be
retrieved by the client which can then emit it as its own report
(e.g. -Rpass-analysis=loop-vectorize).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282561 91177308-0d34-0410-b5e6-96231b3b80d8