We were handling some vectors in foldSelectIntoOp, but not if the operand of the bin op was any kind of vector constant. This patch fixes it to treat vector splats the same as scalars.
Differential Revision: https://reviews.llvm.org/D37232
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311940 91177308-0d34-0410-b5e6-96231b3b80d8
This extends the current llvm-rc parser by ICON and HTML resources.
Moreover, some tests have been slightly rewritten.
Thanks for Nico Weber for his original work in this area.
Differential Revision: https://reviews.llvm.org/D36891
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311939 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
STRQro* instructions are slower than the alternative ADD/STRQui expanded
instructions on Falkor, so avoid generating them unless we're optimizing
for code size.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D37020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311931 91177308-0d34-0410-b5e6-96231b3b80d8
When peeling kicks in, it updates the loop preheader.
Later, a successful full unroll of the loop needs to update a PHI
which i-th argument comes from the loop preheader, so it'd better look
at the correct block. Fixes PR33437.
Differential Revision: https://reviews.llvm.org/D37153
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311922 91177308-0d34-0410-b5e6-96231b3b80d8
ARMv4 doesn't support the "BX" instruction, which has been introduced
with ARMv4t. Adjust the call lowering and tail call implementation
accordingly.
Further changes are necessary to ensure that presence of the v4t feature
is correctly set. Most importantly, the "generic" CPU for thumb-*
triples should include ARMv4t, since thumb mode without thumb support
would naturally be pointless.
Add a couple of asserts to ensure thumb instructions are not emitted
without CPU support.
Differential Revision: https://reviews.llvm.org/D37030
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311921 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes 2 problems in subregister hierarchies with multiple levels
and tuples:
1) For bigger tuples computing secondary subregs would miss 2nd order
effects. In the test case a register like `S10_S11_S12_S13_S14` with D5
= S10_S11, D6 = S12_S13 we would correctly compute sub0 = D5, sub1 = D6
but would miss the fact that we could now form ssub0_ssub1_ssub2_ssub3
(aka sub0_sub1) = D5_D6. This is fixed by changing
computeSecondarySubRegs() to compute a fixpoint.
2) Fixing 1) exposed a problem where TableGen would create multiple
names for effectively the same subregister index. In the test case
the subregister index sub0 is composed from ssub0 and ssub1, and sub1 is
composed from ssub2 and ssub3. TableGen should not create both sub0_sub1
and ssub0_ssub1_ssub2_ssub3 as infered subregister indexes. This changes
the code to build a transitive closure of the subregister components
before forming new concatenated subregister indexes.
This fix was developed for an out of tree target. For the in-tree
targets the only change is in the register information computed for ARM.
There is a slight chance this fixed/improved some register coalescing
around the QQQQ/QQ register classes there but I couldn't see/provoke any
code generation differences.
Differential Revision: https://reviews.llvm.org/D36913
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311914 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
ARMLoadStoreOpt::FixInvalidRegPairOp() was only checking if one of the
load destination registers to be split overlapped with the base register
if the base register was marked as killed. Since kill flags may not
always be present, this can lead to incorrect code.
This bug was exposed by my MachineCopyPropagation change D30751 breaking
the sanitizer-x86_64-linux-android buildbot.
Also clean up some dead code and add an assert that a register offset is
never encountered by this code, since it does not handle them correctly.
Reviewers: MatzeB, qcolombet, t.p.northover
Subscribers: aemerson, javed.absar, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37164
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311907 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently, a phi node is created in the normal destination to unify the return values from promoted calls and the original indirect call. This patch makes this phi node to be created only when the return value has uses.
This patch is necessary to generate valid code, as compiler crashes with the attached test case without this patch. Without this patch, an illegal phi node that has no incoming value from `entry`/`catch` is created in `cleanup` block.
I think existing implementation is good as far as there is at least one use of the original indirect call. `insertCallRetPHI` creates a new phi node in the normal destination block only when the original indirect call dominates its use and the normal destination block. Otherwise, `fixupPHINodeForNormalDest` will handle the unification of return values naturally without creating a new phi node. However, if there's no use, `insertCallRetPHI` still creates a new phi node even when the original indirect call does not dominate the normal destination block, because `getCallRetPHINode` returns false.
Reviewers: xur, davidxl, danielcdh
Reviewed By: xur
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37176
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311906 91177308-0d34-0410-b5e6-96231b3b80d8
S_UDT symbols are the debugger's "index" for all the structs,
typedefs, classes, and enums in a program. If any of those
structs/classes don't have a complete declaration, or if there
is a typedef to something that doesn't have a complete definition,
then emitting the S_UDT is unhelpful because it doesn't give
the debugger enough information to do anything useful. On the
other hand, it results in a huge size blow-up in the resulting
PDB, which is exacerbated by an order of magnitude when linking
with /DEBUG:FASTLINK.
With this patch, we drop S_UDT records for types that refer either
directly or indirectly (e.g. through a typedef, pointer, etc) to
a class/struct/union/enum without a complete definition. This
brings us about 50% of the way towards parity with /DEBUG:FASTLINK
PDBs generated from cl-compiled object files.
Differential Revision: https://reviews.llvm.org/D37162
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311904 91177308-0d34-0410-b5e6-96231b3b80d8
Under -cl-fast-relaxed-math we could use native_sqrt, but f64 was
allowed to produce HSAIL's nsqrt instruction. HSAIL is not here
and we stick with non-existing native_sqrt(double) as a result.
Add check for f64 to not return native functions and also remove
handling of f64 case for fold_sqrt.
Differential Revision: https://reviews.llvm.org/D37223
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311900 91177308-0d34-0410-b5e6-96231b3b80d8
Differential Revision: https://reviews.llvm.org/D36788
M lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp
M lib/Target/X86/Disassembler/X86DisassemblerDecoder.h
A test/MC/Disassembler/X86/prefixes-i386.s
A test/MC/Disassembler/X86/prefixes-x86_64.s
M test/MC/Disassembler/X86/prefixes.txt
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311882 91177308-0d34-0410-b5e6-96231b3b80d8
This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling.
The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
Information includes latency, number of micro-Ops and used ports by each HSW instruction.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud
Differential Revision: https://reviews.llvm.org/D36663
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311879 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables generation of NMADD and NMSUB instructions when fneg node
is present. These instructions are currently only generated if fsub node is
present.
Patch by Stanislav Ocovaj.
Differential Revision: https://reviews.llvm.org/D34507
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311862 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.
This recommit fixes PR34248 by restricting the packing of predicated scalars
into vectors only when vectorizing, avoiding doing so when unrolling w/o
vectorizing. Added a test derived from the reproducer of PR34248.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311849 91177308-0d34-0410-b5e6-96231b3b80d8
We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvector from the same node where one had a bitcast and the other didn't.
Add some more test cases to ensure we've also got most of the zero masking covered too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311837 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If all the operands of a BUILD_VECTOR extract elements from same vector then split the
vector efficiently based on the maximum vector access index.
This will also fix PR 33784
Reviewers: zvi, delena, RKSimon, thakis
Reviewed By: RKSimon
Subscribers: chandlerc, eladcohen, llvm-commits
Differential Revision: https://reviews.llvm.org/D35788
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311833 91177308-0d34-0410-b5e6-96231b3b80d8
This only supports 32 and 64 bit element sizes for now. But we could probably do 16 and 8-bit elements with BWI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311821 91177308-0d34-0410-b5e6-96231b3b80d8
We can probably add patterns to fix some of them. But the ones that use 'and' as their root node emit a X86ISD::CMP node in front of the 'and' and then pattern matching that to 'test' instruction. We can't use a tablegen pattern to fix that because we can't remap the cmp result to the flag output of a TBM instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311819 91177308-0d34-0410-b5e6-96231b3b80d8
to instructions.
These can't be reasonably matched in tablegen due to the handling of
flags, so we have to do this in C++ code. We only did it for `inc` and
`dec` historically, this starts fleshing that out to more interesting
instructions. Notably, this handles transfering operands to `add` and
`sub`.
Currently this forces them into a register. The next patch will add
support for keeping immediate operands as immediates. Then I'll extend
this beyond just `add` and `sub`.
I'm not super thrilled by the repeated switches in the code but
everything else I tried was really ugly or problematic.
Many thanks to Craig Topper for the suggestions about where to even
begin here and how to make this stuff work.
Differential Revision: https://reviews.llvm.org/D37130
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311806 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
SimplifyIndVar may introduce zext instructions to widen arguments of the
loop exit check. They should not prevent us from splitting the loop at
the induction variable, but maybe the check should be more conservative,
e.g. making sure it only extends arguments used by a comparison?
Reviewers: karthikthecool, mcrosier, mzolotukhin
Reviewed By: mcrosier
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34879
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311783 91177308-0d34-0410-b5e6-96231b3b80d8
Fix a test that is failing on a downstream ARM/AArch64
bootstrap. We just need add an elf_x86_64 parameter to
gold.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311780 91177308-0d34-0410-b5e6-96231b3b80d8
There are cases where AShr have better chance to be optimized than LShr, especially when the demanded bits are not known to be Zero, and also known to be similar to the sign bit.
Differential Revision: https://reviews.llvm.org/D36936
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311773 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Add musttail to any resume instructions that is immediately followed by a
suspend (i.e. ret). We do this even in -O0 to support guaranteed tail call
for symmetrical coroutine control transfer (C++ Coroutines TS extension).
This transformation is done only in the resume part of the coroutine that has
identical signature and calling convention as the coro.resume call.
Reviewers: GorNishanov
Reviewed By: GorNishanov
Subscribers: EricWF, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D37125
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311751 91177308-0d34-0410-b5e6-96231b3b80d8
FeatureSlowUAMem32.
The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.
The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.
The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/
It would be nice to have the target attriute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311740 91177308-0d34-0410-b5e6-96231b3b80d8
The comment for this code indicated that it should work similar to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.
Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to have been historical where we had different logic here. =/
Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
The only thing close to a regression I see is using:
notl %r
testl %r, %r
instead of:
xorl -1, %r
But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.
I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.
Many thanks to Craig for help figuring out this nasty stuff.
Differential Revision: https://reviews.llvm.org/D37096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311737 91177308-0d34-0410-b5e6-96231b3b80d8
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.
Steps in this patch:
1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
enable this transform. Other targets will probably want that anyway to distinguish scalars
from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.
2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
vector ext is often free.
Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
Differential Revision: https://reviews.llvm.org/D36840
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311731 91177308-0d34-0410-b5e6-96231b3b80d8
There are 3 small independent changes here:
1. Account for multiple uses in the pattern matching: avoid the transform if it increases the instruction count.
2. Add a missing fold for the case where the numerator is the constant: http://rise4fun.com/Alive/E2p
3. Enable all folds for vector types.
There's still one more potential change - use "shouldChangeType()" to keep from transforming to an illegal integer type.
Differential Revision: https://reviews.llvm.org/D36988
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311726 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: We need to have accurate-sample-profile in function attribute so that it works with LTO.
Reviewers: davidxl, rsmith
Reviewed By: davidxl
Subscribers: sanjoy, mehdi_amini, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37113
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311706 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Currently FastISel lowers constexpr calls as indirect calls.
We'd like those to direct calls, and falling back to SelectionDAGISel
handles that.
Reviewers: dschuff, sunfish
Subscribers: jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D37073
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311693 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a warning when there are zero defined predicates and also fixes an
unnoticed bug where the first predicate in the table was unusable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311684 91177308-0d34-0410-b5e6-96231b3b80d8
The check (assuming positive stride) for validity of memmove should be
(a) the destination is at a lower address than the source, or
(b) the distance between the source and destination is greater than or
equal the number of bytes copied.
For the second part it is sufficient to assume that the destination
is at a higher address, since the opposite case is covered by (a).
The distance calculation was previously done by subtracting the
pointers in the wrong order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311650 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds support for predicates on imm nodes but only for ImmLeaf and not
for PatLeaf or PatFrag and only where the value does not need to be transformed
before being rendered into the instruction.
The limitation on PatLeaf/PatFrag/SDNodeXForm is due to differences in the
necessary target-supplied C++ for GlobalISel.
Depends on D36085
The previous commit was reverted for breaking the build but this appears to have
been the recurring problem on the Windows bots with tablegen not being re-run
when llvm-tblgen is changed but the .td's aren't. If it re-occurs then forcing a
build with clean=True should fix it but this string should do this in advance:
Requires a clean build.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36086
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311645 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is intended to enable the use of basic double letter constraints used in GCC extended inline asm {Yi Y2 Yz Y0 Ym Yt}.
Supersedes D35204
Clang counterpart: D36371
Differential Revision: https://reviews.llvm.org/D36369
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311644 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When reassociating an expression, do not drop the instruction's
original debug location in case the replacement location is
missing.
The debug location must at least not be dropped for inlinable
callsites of debug-info-bearing functions in debug-info-bearing
functions. Failing to do so would result in an "inlinable function "
"call in a function with debug info must have a !dbg location"
error in the verifier.
As preserving the original debug location is not expected
to result in overly jumpy debug line information, it is
preserved for all other cases too.
This fixes PR34231:
https://bugs.llvm.org/show_bug.cgi?id=34231
Original patch by David Stenberg
Reviewers: davide, craig.topper, mcrosier, dblaikie, aprantl
Reviewed By: davide, aprantl
Subscribers: aprantl
Differential Revision: https://reviews.llvm.org/D36865
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311642 91177308-0d34-0410-b5e6-96231b3b80d8
Some refactoring to X86AsmParser, mostly regarding the way rewrites are conducted.
Mainly, we try to concentrate all the rewrite effort under one hood, so it'll hopefully be less of a mess and easier to maintain and understand.
naturally, some frontend tests were affected: D36794
Differential Revision: https://reviews.llvm.org/D36793
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311639 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes not finding the called global for AMDGPU
call pseudoinstructions, which prevented IPRA
from doing much.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311637 91177308-0d34-0410-b5e6-96231b3b80d8
Mostly this involved giving unnamed values names and running the IR
through `opt` to re-format it but merging in any important comments in
the original. I then deleted pointless comments and inlined the function
attributes for ease of reading and editting.
All of this is to make it much easier to see the instructions being
generated here and evaluate any updates to the tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311634 91177308-0d34-0410-b5e6-96231b3b80d8
When one operand is a user of another in a promoted binary operation
we may replace and delete the returned value before returning
triggering an assertion. Reorder node replacements to prevent this.
Fixes PR34137.
Landing on behalf of Nirav.
Differential Revision: https://reviews.llvm.org/D36581
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311623 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch adds test to cover the logic guarded by "accurate-sample-profile" flag.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: sanjoy, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D37084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311618 91177308-0d34-0410-b5e6-96231b3b80d8
Switching to external relocations for ARM-mode branches (to allow Thumb
interworking when the offset is unencodable) causes calls to temporary symbols
to be miscompiled and instead go to the parent externally visible symbol.
Calling a temporary never happens in compiled code, but can occasionally in
hand-written assembly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311611 91177308-0d34-0410-b5e6-96231b3b80d8
Current PGO only annotates the edge weight for branch and switch instructions
with profile counts. We should also annotate the indirectbr instruction as
all the information is there. This patch enables the annotating for indirectbr
instructions. Also uses this annotation in branch probability analysis.
Differential Revision: https://reviews.llvm.org/D37074
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311604 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.
This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.
See also PR22780, for making DIExpression not be an MDNode.
Reviewers: aprantl, dexonsmith, dblaikie
Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37075
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311594 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.
Differential Revision : https: // reviews.llvm.org/D32776
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311588 91177308-0d34-0410-b5e6-96231b3b80d8
There are no 512-bit blend instructions so we shouldn't create SHRUNKBLEND for them.
On a side note, it looks like there may be a missed opportunity for constant folding TESTM when LHS and RHS are equal.
This fixes PR34139.
Differential Revision: https://reviews.llvm.org/D36992
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311572 91177308-0d34-0410-b5e6-96231b3b80d8
The lowering isn't really an optimization, so optnone shouldn't make a
difference. ARM relies on the pass running when using "-mthread-model
single", because in that mode, it doesn't run AtomicExpand. See bug for
more details.
Differential Revision: https://reviews.llvm.org/D37040
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311565 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If a coroutine outer calls another coroutine inner and the inner coroutine body is inlined into the outer, coro.begin from the inner coroutine should be considered for spilling if accessed across suspends.
Prior to this change, coroutine frame building code was not considering any coro.begins for spilling.
With this change, we only ignore coro.begin for the current coroutine, but, any coro.begins that were inlined into the current coroutine are eligible for spills.
Fixes PR34267
Reviewers: GorNishanov
Subscribers: qcolombet, llvm-commits, EricWF
Differential Revision: https://reviews.llvm.org/D37062
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311556 91177308-0d34-0410-b5e6-96231b3b80d8
..if the resulting subtract will be broken up later. This can cause us to get
into an infinite loop.
x + (-5.0 * y) -> x - (5.0 * y) ; Canonicalize neg const
x - (5.0 * y) -> x + (0 - (5.0 * y)) ; Break up subtract
x + (0 - (5.0 * y)) -> x + (-5.0 * y) ; Replace 0-X with X*-1.
PR34078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311554 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds support for predicates on imm nodes but only for ImmLeaf and not for PatLeaf or PatFrag and only where the value does not need to be transformed before being rendered into the instruction.
The limitation on PatLeaf/PatFrag/SDNodeXForm is due to differences in the necessary target-supplied C++ for GlobalISel.
Depends on D36085
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36086
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311546 91177308-0d34-0410-b5e6-96231b3b80d8
Currently this test causes test failures on some machines, due to isel not being registered. Update the test to run all passes and check emitted assembly instructions for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311545 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: In some cases, shufflevector instruction can be transformed involving insert_subvector instructions. The ARM backend was missing some insert_subvector patterns, causing a failure during instruction selection. AArch64 has similar patterns.
Reviewers: t.p.northover, olista01, javed.absar, rengolin
Reviewed By: javed.absar
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D36796
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311543 91177308-0d34-0410-b5e6-96231b3b80d8
lld was broken in this regard (PR33097). The gold plugin gets this
right so, no changes needed, but better adding a test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311541 91177308-0d34-0410-b5e6-96231b3b80d8
- recommitting after fixing a test failure on MacOS
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).
This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.
e.g. (x | 0xFFFFFFFF) should be
ori 3, 3, 65535
oris 3, 3, 65535
but LLVM generates without this patch
li 4, 0
oris 4, 4, 65535
ori 4, 4, 65535
or 3, 3, 4
Differential Revision: https://reviews.llvm.org/D34757
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311538 91177308-0d34-0410-b5e6-96231b3b80d8
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).
This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.
e.g. (x | 0xFFFFFFFF) should be
ori 3, 3, 65535
oris 3, 3, 65535
but LLVM generates without this patch
li 4, 0
oris 4, 4, 65535
ori 4, 4, 65535
or 3, 3, 4
Differential Revision: https://reviews.llvm.org/D34757
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311526 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change achieves two things:
- Redefine the Custom Event handling instrumentation points emitted by
the compiler to not require dynamic relocation of references to the
__xray_CustomEvent trampoline.
- Remove the synthetic reference we emit at the end of a function that
we used to keep auxiliary sections alive in favour of SHF_LINK_ORDER
associated with the section where the function is defined.
To achieve the custom event handling change, we've had to introduce the
concept of sled versioning -- this will need to be supported by the
runtime to allow us to understand how to turn on/off the new version of
the custom event handling sleds. That change has to land first before we
change the way we write the sleds.
To remove the synthetic reference, we rely on a relatively new linker
feature that preserves the sections that are associated with each other.
This allows us to limit the effects on the .text section of ELF
binaries.
Because we're still using absolute references that are resolved at
runtime for the instrumentation map (and function index) maps, we mark
these sections write-able. In the future we can re-define the entries in
the map to use relative relocations instead that can be statically
determined by the linker. That change will be a bit more invasive so we
defer this for later.
Depends on D36816.
Reviewers: dblaikie, echristo, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36615
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311525 91177308-0d34-0410-b5e6-96231b3b80d8
-mcpu=# will support:
. generic: the default insn set
. v1: insn set version 1, the same as generic
. v2: insn set version 2, version 1 + additional jmp insns
. probe: the compiler will probe the underlying kernel to
decide proper version of insn set.
We did not not use -mcpu=native since llc/llvm will interpret -mcpu=native
as the underlying hardware architecture regardless of -march value.
Currently, only x86_64 supports -mcpu=probe. Other architecture will
silently revert to "generic".
Also added -mcpu=help to print available cpu parameters.
llvm will print out the information only if there are at least one
cpu and at least one feature. Add an unused dummy feature to
enable the printout.
Examples for usage:
$ llc -march=bpf -mcpu=v1 -filetype=asm t.ll
$ llc -march=bpf -mcpu=v2 -filetype=asm t.ll
$ llc -march=bpf -mcpu=generic -filetype=asm t.ll
$ llc -march=bpf -mcpu=probe -filetype=asm t.ll
$ llc -march=bpf -mcpu=v3 -filetype=asm t.ll
'v3' is not a recognized processor for this target (ignoring processor)
...
$ llc -march=bpf -mcpu=help -filetype=asm t.ll
Available CPUs for this target:
generic - Select the generic processor.
probe - Select the probe processor.
v1 - Select the v1 processor.
v2 - Select the v2 processor.
Available features for this target:
dummy - unused feature.
Use +feature to enable a feature, or -feature to disable it.
For example, llc -mcpu=mycpu -mattr=+feature1,-feature2
...
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311522 91177308-0d34-0410-b5e6-96231b3b80d8
The output of this test changed after the fix in r311520 to have
-run-pass=block-placement behave like it does in a normal pipeline.
Adjust the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311521 91177308-0d34-0410-b5e6-96231b3b80d8
This also changes the TailDuplicator to be configured explicitely
pre/post regalloc rather than relying on the isSSA() flag. This was
necessary to have `llc -run-pass` work reliably.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311520 91177308-0d34-0410-b5e6-96231b3b80d8
Looks like for 'and' and 'or' we end up performing at least some of the transformations this is bocking in a round about way anyway.
For 'and sext(cmp1), sext(cmp2) we end up later turning it into 'select cmp1, sext(cmp2), 0'. Then we optimize that back to sext (and cmp1, cmp2). This is the same result we would have gotten if shouldOptimizeCast hadn't blocked it. We do something analogous for 'or'.
With this patch we allow that transformation to happen directly in foldCastedBitwiseLogic. And we now support the same thing for 'xor'. This is definitely opening up many other cases, but since we already went around it for some cases hopefully it's ok.
Differential Revision: https://reviews.llvm.org/D36213
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311508 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds printing for DW_AT_type DIEs like it's currently already
the case for DW_AT_specification DIEs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311492 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Use the initialexec TLS type and eliminate calls to the TLS
wrapper. Fixes the sanitizer-x86_64-linux-fuzzer bot failure.
Reviewers: vitalybuka, kcc
Reviewed By: kcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37026
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311490 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
This is reapplies the original patch r311057 that was reverted in r311381.
The previous version wasn't using the batch update api for updating dominators,
which in vary rare cases caused assertion failures.
This also fixes PR34258.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311467 91177308-0d34-0410-b5e6-96231b3b80d8
I've replaced the two OS-specific runs with a generic run because
there's no functional difference in the resulting output that
we're checking. Also, the script still doesn't work with a Win
target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311463 91177308-0d34-0410-b5e6-96231b3b80d8
Armv8.3-A adds instructions that convert a double-precision floating
point number to a signed 32-bit integer with round towards zero,
designed for improving Javascript performance.
Differential Revision: https://reviews.llvm.org/D36785
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311448 91177308-0d34-0410-b5e6-96231b3b80d8
When expanding a BRCOND into a BR_CC, do not create an AND 1
if one already exists.
Review: D36705
Patch by Joel Galenson <jgalenson@google.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311447 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM backend should call setBooleanContents so that it can
use known bits to make some optimizations.
Review: D35821
Patch by Joel Galenson <jgalenson@google.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311446 91177308-0d34-0410-b5e6-96231b3b80d8
This is PR33245.
Case I am fixing is next:
Imagine we have 2 BC files, one defines and uses personality routine,
second has only declaration and also uses it.
Previously algorithm computing dead symbols (llvm::computeDeadSymbols) did
not know about personality routines and leaved them dead even if function that
has routine was live.
As a result thinLTOInternalizeAndPromoteGUID() method changed binding for
such symbol to local. Later when LLD tried to link these objects it failed
because one object had undefined global symbol for routine and second
object contained local definition instead of global.
Patch set the live root flag on the corresponding FunctionSummary
for personality routines when we build the per-module summaries
during the compile step.
Differential revision: https://reviews.llvm.org/D36834
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311432 91177308-0d34-0410-b5e6-96231b3b80d8
ISD::isConstantSplatVector can shrink to the smallest splat width. But we don't check the size of the resulting APInt at all. This can cause us to misinterpret the results.
This patch just adds a flag to prevent the APInt from changing width.
Fixes PR34271.
Differential Revision: https://reviews.llvm.org/D36996
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311429 91177308-0d34-0410-b5e6-96231b3b80d8
If a struct would end up half in GPRs and half on SP the ABI says it should
actually go entirely on the stack. We were getting this wrong in GlobalISel
before, causing compatibility issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311388 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
From r303590, ModuleFlagBehavior for PIC and PIE level is changed from
Error to Max. This will cause bitcode compatibility issue when linking
against a bitcode static archive built with old compiler.
Add an auto-ugprade path to upgrade the the ModuleFlagBehavior in the
old bitcode to match the new one so IRLinker can link them.
Reviewers: tejohnson, mehdi_amini, dexonsmith
Reviewed By: dexonsmith
Subscribers: hans, llvm-commits
Differential Revision: https://reviews.llvm.org/D36556
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311387 91177308-0d34-0410-b5e6-96231b3b80d8
The 1st try was reverted because it could inf-loop by creating a dead instruction.
Fixed that to not happen and added a test case to verify.
Original commit message:
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
Differential Revision: https://reviews.llvm.org/D36922
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311366 91177308-0d34-0410-b5e6-96231b3b80d8
This is similar to what was already done in foldSelectICmpAndOr. Ultimately I'd like to see if we can call foldSelectICmpAnd from foldSelectIntoOp if we detect a power of 2 constant. This would allow us to remove foldSelectICmpAndOr entirely.
Differential Revision: https://reviews.llvm.org/D36498
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311362 91177308-0d34-0410-b5e6-96231b3b80d8
This is the baseline (current) version of the tests that would
have been added with the transform in r311333 (reverted at
r311340 due to inf-looping).
Adding these now to aid in testing and minimize the patch if/when
it is reinstated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311350 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the bitsToClear from the LHS of an 'and' comes back non-zero, but all of those bits are known zero on the RHS, we can reset bitsToClear.
Without this, the 'or' in the modified test case blocks the transform because it has non-zero bits in its RHS in those bits.
Reviewers: spatel, majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36944
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311343 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: With masked operations, its possible for the operation node like fadd, fsub, etc. to be used by multiple different vselects. Since the pattern matching will start at the vselect, we need to make sure the operation node itself is only used once before we can fold a load. Otherwise we'll end up folding the same load into multiple instructions.
Reviewers: RKSimon, spatel, zvi, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36938
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311342 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for dumping a summary of module symbols
and CodeView debug chunks. This option prints a table for
each module of all of the symbols that occurred in the module
and the number of times it occurred and total byte size. Then
at the end it prints the totals for the entire file.
Additionally, this patch adds the -jmc (just my code) option,
which suppresses modules which are from external libraries or
linker imports, so that you can focus only on the object files
and libraries that originate from your own source code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311338 91177308-0d34-0410-b5e6-96231b3b80d8
Try to fold:
memcmp(X, C, ConstantLength) == 0 --> load X == *C
Without this change, we're unnecessarily checking the alignment of the constant data,
so we miss the transform in the first 2 tests in the patch.
I noted this shortcoming of LibCallSimpifier in one of the recent CGP memcmp expansion
patches. This doesn't help the example in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13
...directly, but it's worth short-circuiting more of these simple cases since we're
already trying to do that.
The benefit of transforming to load+cmp is that existing IR analysis/transforms may
further simplify that code. For example, if the load of the variable is common to
multiple memcmp calls, CSE can remove the duplicate instructions.
Differential Revision: https://reviews.llvm.org/D36922
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311333 91177308-0d34-0410-b5e6-96231b3b80d8
Preparations to use the per-increment are sometimes done in the target
independent pass Loop Strength Reduction. We try to detect them in the PowerPC
specific pass so that they are not done twice and so that we do not add PHIs
that are not required.
Differential Revision: https://reviews.llvm.org/D36736
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311332 91177308-0d34-0410-b5e6-96231b3b80d8
Re-committing after r311325 fixed an unintentional use of '#' comments in
clang.
The '#' token is not a comment for all targets (on ARM and AArch64 it marks an
immediate operand), so we shouldn't treat it as such.
Comments are already converted to AsmToken::EndOfStatement by
AsmLexer::LexLineComment, so this check was unnecessary.
Differential Revision: https://reviews.llvm.org/D36405
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311326 91177308-0d34-0410-b5e6-96231b3b80d8
widely used processors.
This occured to me when I saw that we were generating 'inc' and 'dec'
when for Haswell and newer we shouldn't. However, there were a few "X is
slow" things that we should probably just set.
I've avoided any of the "X is fast" features because most of those would
be pretty serious regressions on processors where X isn't actually fast.
The slow things are likely to be negligible costs on processors where
these aren't slow and a significant win when they are slow.
In retrospect this seems somewhat obvious. Not sure why we didn't do
this a long time ago.
Differential Revision: https://reviews.llvm.org/D36947
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311318 91177308-0d34-0410-b5e6-96231b3b80d8
rather than doing a separate comparison.
This both saves an explicit comparision and avoids the use of `xadd`
which introduces register constraints and other challenges to the
generated code.
The motivating case is from atomic reference counts where `1` is the
sentinel rather than `0` for whatever reason. This can and should be
lowered efficiently on x86 by just using a different flag, however the
x86 code only handled the `0` case.
There remains some further opportunities here that are currently hidden
due to canonicalization. I've included test cases that show these and
FIXMEs. However, I don't at the moment have any production use cases and
they seem substantially harder to address.
Differential Revision: https://reviews.llvm.org/D36945
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311317 91177308-0d34-0410-b5e6-96231b3b80d8
There's no functional difference between the AVX512DQ instructions if we're not masking.
This change unifies test checks and removes extra isel entries. Similar was done for subvector insert and extracts recently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311308 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When extracting the instrumentation map from a binary, we should be able
to recognize the new kinds of instrumentation sleds we've been emitting
with the compiler using -fxray-instrument. This change adds a test for
all the kinds of sleds we currently support (sans the tail-call sled,
which is a bit harder to force in a simple prebuilt input).
Reviewers: kpw, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36819
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311305 91177308-0d34-0410-b5e6-96231b3b80d8
We're missing some single use checks in the sse_load_f32/f64 handling that cause us to replicate the load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311300 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Support call ABI. For now only Linux C and X86_64_SysV calling conventions supported. Variadic function not supported.
Reviewers: zvi, guyblank, oren_ben_simhon
Reviewed By: oren_ben_simhon
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34602
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311279 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds.
So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33951
Reviewers: anemet, chandlerc
Reviewed By: anemet, chandlerc
Subscribers: javed.absar, chandlerc, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D36906
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311271 91177308-0d34-0410-b5e6-96231b3b80d8
cmov self-refrencing.
Pointed out by Amjad Aboud in code review, test case minorly simplified
from the one he posted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311267 91177308-0d34-0410-b5e6-96231b3b80d8
This is the exact same fix as in SVN r247254. In that commit, the fix was
applied only for isVTRNMask and isVTRN_v_undef_Mask, but the same issue
is present for VZIP/VUZP as well.
This fixes PR33921.
Differential Revision: https://reviews.llvm.org/D36899
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311258 91177308-0d34-0410-b5e6-96231b3b80d8
The tests added in r311254 require a target triple since they are
running through code generation. Fix bot failures by requiring
an x86 target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311257 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If all the operands of a BUILD_VECTOR extract elements from same vector then split the
vector efficiently based on the maximum vector access index.
Reviewers: zvi, delena, RKSimon, thakis
Reviewed By: RKSimon
Subscribers: chandlerc, eladcohen, llvm-commits
Differential Revision: https://reviews.llvm.org/D35788
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311255 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Follow up to fix in r311023, which fixed the case where the combined
index is written to disk. The same samplePGO logic exists for the
in-memory index when computing imports, so we need to filter out
GlobalVariable summaries there too.
Reviewers: davidxl
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D36919
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311254 91177308-0d34-0410-b5e6-96231b3b80d8
a function into itself.
We tried to fix this before in r306495 but that got reverted as the
assert was actually hit.
This fixes the original bug (which we seem to have lost track of with
the revert) by blocking a second remapping when the function being
inlined is also the caller and the remapping could succeed but
erroneously.
The included test case would actually load from an inlined copy of the
alloca before this change, failing to load the stored value and
miscompiling.
Many thanks to Richard Smith for diagnosing a user miscompile to this
bug, and to Kyle for the first attempt and initial analysis and David Li
for remembering the issue and how to fix it and suggesting the patch.
I'm just stitching it together and landing it. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311229 91177308-0d34-0410-b5e6-96231b3b80d8
mt.exe performs a tree merge where certain element nodes are combined
into one. This introduces the possibility of xml namespaces conflicting
with each other. The original mt.exe has a hierarchy whereby certain
namespace names can override others, and nodes that would then end up in
ambigious namespaces have their namespaces explicitly defined. This
namespace handles this merging process.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311215 91177308-0d34-0410-b5e6-96231b3b80d8
Clamp function was too optimistic when choosing signed or unsigned min/max function for calculations.
In fact, `!IsSignedPredicate` guarantees us that `Smallest` and `Greatest` can be compared safely using unsigned
predicates, but we did not check this for `S` which can in theory be negative.
This patch makes Clamp use signed min/max for cases when it fails to prove `S` being non-negative,
and it adds a test where such situation may lead to incorrect conditions calculation.
Differential Revision: https://reviews.llvm.org/D36873
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311205 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Memcpy intrinsics have size argument of any integer type, like i32 or i64.
Fixed size type along with its value when cloning the intrinsic.
Reviewers: davidxl, xur
Reviewed By: davidxl
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D36844
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311188 91177308-0d34-0410-b5e6-96231b3b80d8
The internal (__text-relative) relocation risks the offset not being encodable
if the destination is Thumb.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311187 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Augment SanitizerCoverage to insert maximum stack depth tracing for
use by libFuzzer. The new instrumentation is enabled by the flag
-fsanitize-coverage=stack-depth and is compatible with the existing
trace-pc-guard coverage. The user must also declare the following
global variable in their code:
thread_local uintptr_t __sancov_lowest_stack
https://bugs.llvm.org/show_bug.cgi?id=33857
Reviewers: vitalybuka, kcc
Reviewed By: vitalybuka
Subscribers: kubamracek, hiraditya, cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D36839
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311186 91177308-0d34-0410-b5e6-96231b3b80d8
As for now, the parser supports a limited set of statements and
resources. This will be extended in the following patches.
Thanks to Nico Weber (thakis) for his original work in this area.
This patch was originally submitted as r311175 and got reverted
in r311177 because of the problems with compilation under gcc.
Differential Revision: https://reviews.llvm.org/D36340
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311184 91177308-0d34-0410-b5e6-96231b3b80d8
As for now, the parser supports a limited set of statements and
resources. This will be extended in the following patches.
Thanks to Nico Weber (thakis) for his original work in this area.
Differential Revision: https://reviews.llvm.org/D36340
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311175 91177308-0d34-0410-b5e6-96231b3b80d8
Armv8.2-A adds FP16 support, i.e. f16 is not only a storage-only type, but it
also supports performing data processing on 16-bit floating-point quantities.
All the necessary (tablegen) groundwork of adding the ARMv8.2-A FP16 (scalar)
instructions was done in D15014. To take advantage of this, this patch avoids
promotion of f16 to f32 types when the subtarget supports FullFP16, which
enables instruction selection of these FP16 instructions.
Differential Revision: https://reviews.llvm.org/D36396
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311154 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit e8fd209647 in an
attempt to appease the GlobalISel buildbot, which fails in the
test-suite with errors like
fpcmp: files differ without tolerance allowance
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311151 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r311135.
sanitizer-x86_64-linux-android buildbot is timing out with just this
patch applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311142 91177308-0d34-0410-b5e6-96231b3b80d8
There's really no reason to do this we should just let isel pick the integer version and let the execution dependency fixing pass take care of moving to FP if necessary.
It's not very reliable to look for bitcasts at the edges of patterns. If for some reason one input was bitcasted and the other wasn't, or if one was a v4f32 bitcast and one was a v2f64 bitcast, we would have fallen back to the integer pattern anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311138 91177308-0d34-0410-b5e6-96231b3b80d8
If a struct would end up half in GPRs and half on SP the ABI says it should
actually go entirely on the stack. We were getting this wrong in GlobalISel
before, causing compatibility issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311137 91177308-0d34-0410-b5e6-96231b3b80d8
Two issues identified by buildbots were addressed:
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311135 91177308-0d34-0410-b5e6-96231b3b80d8
We've discussed canonicalizing to this form in IR, so the backend
should be prepared to lower these in ways better than what we see
here in most cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311103 91177308-0d34-0410-b5e6-96231b3b80d8
We've discussed canonicalizing to this form in IR, so the backend
should be prepared to lower these in ways better than what we see
here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311099 91177308-0d34-0410-b5e6-96231b3b80d8
The SelectionDAGBuilder translates various conditional branches into
CaseBlocks which are then translated into SDNodes. If a conditional
branch results in multiple CaseBlocks only the first CaseBlock is
translated into SDNodes immediately, the rest of the CaseBlocks are
put in a queue and processed when all LLVM IR instructions in the
basic block have been processed.
When a CaseBlock is transformed into SDNodes the SelectionDAGBuilder
is queried for the current LLVM IR instruction and the resulting
SDNodes are annotated with the debug info of the current
instruction (if it exists and has debug metadata).
When the deferred CaseBlocks are processed, the SelectionDAGBuilder
does not have a current LLVM IR instruction, and the resulting SDNodes
will not have any debuginfo. As DwarfDebug::beginInstruction() outputs
a .loc directive for the first instruction in a labeled
block (typically the case for something coming from a CaseBlock) this
tends to produce a line-0 directive.
This patch changes the handling of CaseBlocks to store the current
instruction's debug info into the CaseBlock when it is created (and the
SelectionDAGBuilder knows the current instruction) and to always use
the stored debug info when translating a CaseBlock to SDNodes.
Patch by Frej Drejhammar!
Differential Revision: https://reviews.llvm.org/D36671
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311097 91177308-0d34-0410-b5e6-96231b3b80d8
There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences.
There is however value in enabling the masked version of the instructions with DQI.
This required introducing some new multiclasses to enabling this splitting.
Differential Revision: https://reviews.llvm.org/D36661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311091 91177308-0d34-0410-b5e6-96231b3b80d8
Mark this unsupported for now as it causes tests hangs on buildbot.
Will place it back when the problem is debugged.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311089 91177308-0d34-0410-b5e6-96231b3b80d8
In the case where dfsan provides a custom wrapper for a function,
shadow parameters are added for each parameter of the function.
These parameters are i16s. For targets which do not consider this
a legal type, the lack of sign extension information would cause
LLVM to generate anyexts around their usage with phi variables
and calling convention logic.
Address this by introducing zero exts for each shadow parameter.
Reviewers: pcc, slthakur
Differential Revision: https://reviews.llvm.org/D33349
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311087 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Generate the type table from the types used by a target rather than hard-coding
the union of types used by all targets.
Depends on D36084
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36085
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311084 91177308-0d34-0410-b5e6-96231b3b80d8
VPlan is an ongoing effort to refactor and extend the Loop Vectorizer. This
patch introduces the VPlan model into LV and uses it to represent the vectorized
code and drive the generation of vectorized IR.
In this patch VPlan models the vectorized loop body: the vectorized control-flow
is represented using VPlan's Hierarchical CFG, with predication refactored from
being a post-vectorization-step into a vectorization planning step modeling
if-then VPRegionBlocks, and generating code inline with non-predicated code. The
vectorized code within each VPBasicBlock is represented as a sequence of
Recipes, each responsible for modelling and generating a sequence of IR
instructions. To keep the size of this commit manageable the Recipes in this
patch are coarse-grained and capture large chunks of LV's code-generation logic.
The constructed VPlans are dumped in dot format under -debug.
This commit retains current vectorizer output, except for minor instruction
reorderings; see associated modifications to lit tests.
For further details on the VPlan model see docs/Proposals/VectorizationPlan.rst
and its references.
Authors: Gil Rapaport and Ayal Zaks
Differential Revision: https://reviews.llvm.org/D32871
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311077 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Support the case where an operand of a pattern is also the whole of the
result pattern. In this case the original result and all its uses must be
replaced by the operand. However, register class restrictions can require
a COPY. This patch handles both cases by always emitting the copy and
leaving it for the register allocator to optimize.
The previous commit failed on Windows machines due to a flaw in the sort
predicate which allowed both A < B < C and B == C to be satisfied
simultaneously. The cause of this was some sloppiness in the priority order of
G_CONSTANT instructions compared to other instructions. These had equal priority
because it makes no difference, however there were operands had higher priority
than G_CONSTANT but lower priority than any other instruction. As a result, a
priority order between G_CONSTANT and other instructions must be enforced to
ensure the predicate defines a strict weak order.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311076 91177308-0d34-0410-b5e6-96231b3b80d8
The idea of this patch is to continue the scheduler state over an MBB boundary
in the case where the successor block has only one predecessor. This means
that the scheduler will continue in the successor block (after emitting any
branch instructions) with e.g. maintained processor resource counters.
Benchmarks have been confirmed to benefit from this.
The algorithm in MachineScheduler.cpp that extracts scheduling regions of an
MBB has been extended so that the strategy may optionally reverse the order
of processing the regions themselves. This is controlled by a new method
doMBBSchedRegionsTopDown(), which defaults to false.
Handling the top-most region of an MBB first also means that a top-down
scheduler can continue the scheduler state across any scheduling boundary
between to regions inside MBB.
Review: Ulrich Weigand, Matthias Braun, Andy Trick.
https://reviews.llvm.org/D35053
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311072 91177308-0d34-0410-b5e6-96231b3b80d8
When v1i1 is legal (e.g. AVX512) the legalizer can reach
a case where a v1i1 SETCC with an illgeal vector type operand
wasn't scalarized (since v1i1 is legal) but its operands does
have to be scalarized. This used to assert because SETCC was
missing from the vector operand scalarizer.
This patch attemps to teach the legalizer to handle these cases
by scalazring the operands, converting the node into a scalar
SETCC node.
Differential revision: https://reviews.llvm.org/D36651
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311071 91177308-0d34-0410-b5e6-96231b3b80d8
If we want to substitute the relocation of derived pointer with gep of base then
we must ensure that relocation of base dominates the relocation of derived pointer.
Currently only check for basic block is present. However it is possible that both
relocation are in the same basic block but relocation of derived pointer is defined
earlier.
The patch moves the relocation of base pointer right before relocation of derived
pointer in this case.
Reviewers: sanjoy,artagnon,igor-laevsky,reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36462
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311067 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r311038.
Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change. Reverting while
I investigate further.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311062 91177308-0d34-0410-b5e6-96231b3b80d8
When lowering a VLA, we emit a __chstk call. However, this call can
internally clobber CPSR. We did not mark this register as an ImpDef,
which could potentially allow a comparison to be hoisted above the call
to `__chkstk`. In such a case, the CPSR could be clobbered, and the
check invalidated. When the support was initially added, it seemed that
the call would take care of preventing CPSR from being clobbered, but
this is not the case. Mark the register as clobbered to fix a possible
state corruption.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311061 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
I didn't notice any performance impact when bootstrapping clang with this patch.
The patch was originally committed in r311039 and reverted in r311049.
This revision fixes the problem with not adding a dependency on the
DominatorTreeWrapperPass for the LegacyPassManager.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311057 91177308-0d34-0410-b5e6-96231b3b80d8
This way we can see what the current codegen looks like.
I've also explicitly added/removed the cmov attribute from the RUN lines,
so we know exactly what we're checking in the runs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311052 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch teaches ADCE to preserve both DominatorTrees and PostDominatorTrees.
I didn't notice any performance impact when bootstrapping clang with this patch.
Reviewers: dberlin, chandlerc, sanjoy, davide, grosser, brzycki
Reviewed By: davide
Subscribers: grandinj, zhendongsu, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D35869
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311039 91177308-0d34-0410-b5e6-96231b3b80d8
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311038 91177308-0d34-0410-b5e6-96231b3b80d8
Debug information for TLS variables on MIPS might have R_MIPS_TLS_DTPREL32
or R_MIPS_TLS_DTPREL64 relocations. This patch adds a support for such
relocations in the `RelocVisitor`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311031 91177308-0d34-0410-b5e6-96231b3b80d8
To clear assumptions that are potentially invalid after trivialization, we need
to walk the use/def chain. Normally, the only way to reach an instruction with
an unsized type is via an instruction that has side effects (or otherwise will
demand its input bits). That would stop the walk. However, if we have a
readnone function that returns an unsized type (e.g., void), we must avoid
asking for the demanded bits of the function call's return value. A
void-returning readnone function is always dead (and so we can stop walking the
use/def chain here), but the check is necessary to avoid asserting.
Fixes PR34211.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311014 91177308-0d34-0410-b5e6-96231b3b80d8
I'll make this a verifier check to catch other violations. This
commit fixes the tests already in tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311004 91177308-0d34-0410-b5e6-96231b3b80d8
If a variable has an explicit section such as .sdata or .sbss, it is placed
in that section and accessed in a gp relative manner. This overrides the global
-G setting.
Otherwise if a variable has a explicit section attached to it, such as '.rodata'
or '.mysection', it is not placed in the small data section. This also overrides
the global -G setting.
Reviewers: atanasyan, nitesh.jain
Differential Revision: https://reviews.llvm.org/D36616
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311001 91177308-0d34-0410-b5e6-96231b3b80d8
- Set the default runtime unroll count to 4 and use the newly added
UnrollRemainder option.
- Create loop cost and force unroll for a cost less than 12.
- Disable unrolling on Thumb1 only targets.
Differential Revision: https://reviews.llvm.org/D36134
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310997 91177308-0d34-0410-b5e6-96231b3b80d8
Hook up the -k option (that in the original GNU dlltool removes the
@n suffix from the symbol that the final executable ends up linked to).
In llvm-dlltool, make sure that functions end up with the undecorate
name type if this option is set and they are decorated. In mingw, when
creating import libraries from def files instead of creating an import
library as a side effect of linking a DLL, the symbol names in the def
contain the stdcall/fastcall decoration (but no leading underscore).
By setting the undecorate name type, a linker linking to the import
library will omit the decoration from the DLL import entry.
With this in place, mingw-w64 for i386 built with llvm-dlltool/clang
produces import libraries that actually work.
Differential Revision: https://reviews.llvm.org/D36548
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310990 91177308-0d34-0410-b5e6-96231b3b80d8