Infrastructure designed for padding code with nop instructions in key places such that preformance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
This patch by itself in a NFC. Future patches will make use of this infrastructure to implement required policies for code padding.
Reviewers:
aaboud
zvi
craig.topper
gadi.haber
Differential revision: https://reviews.llvm.org/D34393
Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316413 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables the import of stores. Unfortunately, doing so by itself,
loses an optimization where storing 0 to memory makes use of WZR/XZR.
To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to only apply to the stores.
Applying it to GPR32/GPR64 register classes in general will be done after
review see (https://reviews.llvm.org/D39150).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316360 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload
Such sequences are created in 2 scenarios:
Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills
As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316295 91177308-0d34-0410-b5e6-96231b3b80d8
ComplexRendererFn -> ComplexRendererFns
Corrected a couple lingering references to tied operands that were missed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316237 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Make sure the map is cleared before processing a new module. Similar to what is done on `StackMaps`.
This issue is similar to D38588, though this time for FaultMaps (on x86) rather than ARM/AArch64. Other than possible mixing of information between modules, the crash is caused by the pointers values in the map that was allocated by the bump pointer allocator that is unwinded when emitting the next file. This issue has been around since 3.8.
This issue is likely much harder to write a test for since AFAICT it requires emitting something much more compilcated (and possibly real code) instead of just some random bytes.
Reviewers: skatkov, sanjoy
Reviewed By: skatkov, sanjoy
Subscribers: sanjoy, aemerson, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38924
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315990 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.
The previous commit failed on MSVC due to a failure to convert an
initializer_list to a std::vector. Hopefully, MSVC will accept this version.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315887 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplests G_LOAD rules and select load
instructions using them. Further patches will support for the predicates to
enable additional loads as well as the stores.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315885 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This includes some context-sensitivity in the MVT to LLT conversion so that
pointer types are tested correctly.
FIXME: I'm not happy with the way this is done since everything is a
special-case. I've yet to find a reasonable way to implement it.
select-load.mir fails because <1 x s64> loads in tablegen get priority over s64
loads. This is fixed in the next patch and as such they should be committed
together, I've posted them separately to help with the review.
Depends on D37456
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37457
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315884 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Hopefully fixed the ambiguous constructor that a large number of bots reported.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315869 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315863 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
There is an important mismatch between ISD::LOAD and G_LOAD (and likewise for
ISD::STORE and G_STORE). In SelectionDAG, ISD::LOAD is a non-atomic load
and atomic loads are handled by a separate node. However, this is not true of
GlobalISel's G_LOAD. For G_LOAD, the MachineMemOperand indicates the atomicity
of the operation. As a result, this mapping must also add a predicate that
checks for non-atomic MachineMemOperands.
This is NFC since these nodes always have predicates in practice and are
therefore always rejected at the moment.
Depends on D37443
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37445
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315843 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Operand variable lookups are now performed by the RuleMatcher rather than
searching the whole matcher hierarchy for a match. This revealed a wrong-code
bug that currently affects ARM and X86 where patterns that use a variable more
than once in the match pattern will be imported but won't check that the
operands are identical. This can cause the tablegen-erated matcher to
accept matches that should be rejected.
Depends on D36569
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Subscribers: aemerson, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D36618
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315780 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
There's only a tablegen testcase for IntImmLeaf and not a CodeGen one
because the relevant rules are rejected for other reasons at the moment.
On AArch64, it's because there's an SDNodeXForm attached to the operand.
On X86, it's because the rule either emits multiple instructions or has
another predicate using PatFrag which cannot easily be supported at the
same time.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D36569
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315761 91177308-0d34-0410-b5e6-96231b3b80d8
TargetRegisterInfo::getMinimalPhysRegClass is actually pretty expensive
because it has to iterate over all the register classes.
Cache this information as we need and get it so that we limit its usage.
Right now, we heavily rely on it, because this is how we get the mapping
for vregs defined by copies from physreg (i.e., the one that are ABI
related).
Improve compile time by up to 10% for that pass.
NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315759 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The purpose of this patch is to expose more information about ImmLeaf-like
PatLeaf's so that GlobalISel can learn to import them. Previously, ImmLeaf
could only be used to test int64_t's produced by sign-extending an APInt.
Other tests on immediates had to use the generic PatLeaf and extract the
constant using C++.
With this patch, tablegen will know how to generate predicates for APInt,
and APFloat. This will allow it to 'do the right thing' for both SelectionDAG
and GlobalISel which require different methods of extracting the immediate
from the IR.
This is NFC for SelectionDAG since the new code is equivalent to the
previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1
for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new
subclasses will require a significant re-factor of FastISel.
For GlobalISel, it's currently NFC because the relevant code to import the
affected rules is not yet present. This will be added in a later patch.
Depends on D36086
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D36534
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315747 91177308-0d34-0410-b5e6-96231b3b80d8
Significantly reduces performancei (~30%) of gipfeli
(https://github.com/google/gipfeli)
I have not yet managed to reproduce this regression with the open-source
version of the benchmark on github, but will work with others to get a
reproducer to you later today.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315680 91177308-0d34-0410-b5e6-96231b3b80d8
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.
This reverts commit r315633.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315637 91177308-0d34-0410-b5e6-96231b3b80d8
Merge LLVMTargetMachine into TargetMachine.
- There is no in-tree target anymore that just implements TargetMachine
but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
interface.
Differential Revision: https://reviews.llvm.org/D38489
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315633 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with
LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP.
Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods.
Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so
it'll be picked up by public headers.
Differential Revision: https://reviews.llvm.org/D38406
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315590 91177308-0d34-0410-b5e6-96231b3b80d8
Since r315388 we have a shorter way to say this, so we'll replace
MI->getParent()->getParent() with MI->getMF() in a few places.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315390 91177308-0d34-0410-b5e6-96231b3b80d8
Similarly to how Instruction has getFunction, this adds a less verbose
way to write MI->getParent()->getParent(). I'll follow up shortly with
a change that changes a bunch of the uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315388 91177308-0d34-0410-b5e6-96231b3b80d8
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!
To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.
Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.
Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda
Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic
Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb
Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315136 91177308-0d34-0410-b5e6-96231b3b80d8
In some cases an instruction is deleted from the block during combining,
however it can still exist in the legalizer worklist.
This change modifies the combiner routines to add the given MI to the
dead instruction list rather than trying to remove it from the block
themselves. The responsibility is then on the caller to delete the instruction
from the block and any worklists.
Differential Revision: https://reviews.llvm.org/D38622
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315092 91177308-0d34-0410-b5e6-96231b3b80d8
Recommitting r314517 with the fix for handling ConstantExpr.
Original commit message:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
mode in the target. However, since it doesn't check its actual users, it will
return FREE even in cases where the GEP cannot be folded away as a part of
actual addressing mode. For example, if an user of the GEP is a call
instruction taking the GEP as a parameter, then the GEP may not be folded in
isel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314923 91177308-0d34-0410-b5e6-96231b3b80d8
It broke the Chromium / SQLite build; see PR34830.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314919 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
Reviewed By: lsaba
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314886 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r314729.
Another bug has been encountered in an out-of-tree target reported by Quentin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314814 91177308-0d34-0410-b5e6-96231b3b80d8
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314729 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314517 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314375 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
According to https://gcc.gnu.org/wiki/SplitStacks, the linker expects a zero-sized .note.GNU-split-stack section if split-stack is used (and also .note.GNU-no-split-stack section if it also contains non-split-stack functions), so it can handle the cases where a split-stack function calls non-split-stack function.
This change adds the sections if needed.
Fixes PR #34670.
Reviewers: thanm, rnk, luqmana
Reviewed By: rnk
Subscribers: llvm-commits
Patch by Cherry Zhang <cherryyz@google.com>
Differential Revision: https://reviews.llvm.org/D38051
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314335 91177308-0d34-0410-b5e6-96231b3b80d8