This reverts commit r299287 plus clean-ups.
The localizer pass is a helper pass that could be run at O0 in the GISel
pipeline to work around the deficiency of the fast register allocator.
It basically shortens the live-ranges of the constants so that the
allocator does not spill all over the place.
Long term fix would be to make the greedy allocator fast.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304051 91177308-0d34-0410-b5e6-96231b3b80d8
One case in BranchRelaxation did not compute liveins after creating a
new block. This is catched by existing tests with an upcoming commit
that will improve MachineVerifier checking of livein lists.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304049 91177308-0d34-0410-b5e6-96231b3b80d8
Consistent with GCC and addresses a shortcoming with ThinLTO where many
imported CUs may end up being empty (because the functions imported from
them either ended up not being used (and were then discarded, since
they're imported as available_externally) or optimized away entirely).
Test cases previously testing empty CUs (either intentionally, or
because they didn't need anything more complicated) had a trivial 'int'
or similar basic type added to their retained types list.
This is a first order approximation - a deeper implementation could do
things like:
1) Be more lazy about construction of the CU - for example if two CUs
containing a single identical retained type are linked together, with
this change one of the two CUs will be produced but empty (since a
duplicate type won't be produced).
2) Go further and invert all the CU links the same way the subprogram
link is inverted - keep named CU lists of retained types, macros, etc,
and have those link back to the CU. Then if they're emitted, the CU is
emitted, but never otherwise - this would allow the metadata itself to
be dropped earlier too, though it seems unlikely that's an important
optimization as there shouldn't be many CUs relative to the number of
other entities.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304020 91177308-0d34-0410-b5e6-96231b3b80d8
This produced 'strange' DWARF anyway - the CU would have no ranges (or
at least not a range including the inlined code) nor any subprogram or
inlined_subroutine - yet the line table would have entries for these
instructions.
(this actually becomes more relevant with changes coming after this,
where a CU without any contents will be omitted entirely - so there
would be no line table to put this on anyway)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304004 91177308-0d34-0410-b5e6-96231b3b80d8
Re-commit r303938 and r303954 with a fix for addLiveIns(): the internal
addPristines() function must be called on an empty set or it may
accidentally reset saved registers.
- addLiveOutsNoPristines() needs to add callee saved registers that are
actually saved and restored somewhere to the set (they are not
pristine).
- Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines().
This fixes the problem from D32156.
Differential Revision: https://reviews.llvm.org/D32464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304001 91177308-0d34-0410-b5e6-96231b3b80d8
In the best case:
extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN
...we kill all of the extract/concat and just have narrow binops remaining.
If only one of the binop operands is amenable, this transform is still
worthwhile because we kill some of the extract/concat.
Optional bitcasting makes the code more complicated, but there doesn't
seem to be a way to avoid that.
The TODO about extending to more than bitwise logic is there because we really
will regress several x86 tests including madd, psad, and even a plain
integer-multiply-by-2 or shift-left-by-1. I don't think there's anything
fundamentally wrong with this patch that would cause those regressions; those
folds are just missing or brittle.
If we extend to more binops, I found that this patch will fire on at least one
non-x86 regression test. There's an ARM NEON test in
test/CodeGen/ARM/coalesce-subregs.ll with a pattern like:
t5: v2f32 = vector_shuffle<0,3> t2, t4
t6: v1i64 = bitcast t5
t8: v1i64 = BUILD_VECTOR Constant:i64<0>
t9: v2i64 = concat_vectors t6, t8
t10: v4f32 = bitcast t9
t12: v4f32 = fmul t11, t10
t13: v2i64 = bitcast t12
t16: v1i64 = extract_subvector t13, Constant:i32<0>
There was no functional change in the codegen from this transform from what I
could see though.
For the x86 test changes:
1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case,
but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops,
there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op.
SSE/AVX2/AXV512 are not affected which is expected because only AVX1 has the extract/concat
ops to match the pattern.
2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract.
Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026
3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT.
4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction
count by one in each case by eliminating two insert/extract while adding one narrower logic op.
https://bugs.llvm.org/show_bug.cgi?id=32790
Differential Revision: https://reviews.llvm.org/D33137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303997 91177308-0d34-0410-b5e6-96231b3b80d8
Currently getOptimalMemOpType returns i32 for large enough sizes without
checking for alignment, leading to poor code generation when misaligned accesses
aren't permitted as we generate a word store then later split it up into byte
stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for
memset we splat the memset value into a word then immediately split it up
again.
Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type
to use, but also fix a bug there where it wasn't correctly checking if
misaligned memory accesses are allowed.
Differential Revision: https://reviews.llvm.org/D33442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303990 91177308-0d34-0410-b5e6-96231b3b80d8
Re-commit r303937 + r303949 as they were not the cause for the build
failures.
We do not track liveness of reserved registers so adding them to the
liveins list in computeLiveIns() was completely unnecessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303970 91177308-0d34-0410-b5e6-96231b3b80d8
Tentatively revert this to see if it fixes the buildbot stage2
breakages.
This reverts commit r303938.
This reverts commit r303954.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303960 91177308-0d34-0410-b5e6-96231b3b80d8
Tentatively revert, suspecting that it caused breakage in stage2
buildbots.
This reverts commit r303949.
This reverts commit r303937.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303955 91177308-0d34-0410-b5e6-96231b3b80d8
We may have situations in which a superregister is reserved and not
added to liveins, so we have to add the subregisters.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303949 91177308-0d34-0410-b5e6-96231b3b80d8
- addLiveOutsNoPristines() needs to add callee saved registers that are
actually saved and restored somewhere to the set (they are not
pristine).
- Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines().
This fixes the problem from D32156.
Differential Revision: https://reviews.llvm.org/D32464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303938 91177308-0d34-0410-b5e6-96231b3b80d8
We do not track liveness of reserved registers so adding them to the
liveins list in computeLiveIns() was completely unnecessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303937 91177308-0d34-0410-b5e6-96231b3b80d8
Previously this code was defensive to the situation in which the debug
info scopes would lead to a different subprogram from the subprogram in
the CU's subprogram list (this could've happened with linkonce
functions, etc as per the comment being removed). Since the CU<>SP link
reversal this is no longer possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303933 91177308-0d34-0410-b5e6-96231b3b80d8
Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303921 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, every time we wanted to serialize a field list record, we
would create a new copy of FieldListRecordBuilder, which would in turn
create a temporary instance of TypeSerializer, which itself had a
std::vector<> that was about 128K in size. So this 128K allocation was
happening every time. We can re-use the same instance over and over, we
just have to clear its internal hash table and seen records list between
each run. This saves us from the constant re-allocations.
This is worth an ~18.5% speed increase (3.75s -> 3.05s) in my tests.
Differential Revision: https://reviews.llvm.org/D33506
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303919 91177308-0d34-0410-b5e6-96231b3b80d8
Turns out gold doesn't use the DW_AT_GNU_pubnames to decide whether to
parse the rest of the DIEs when building gdb-index. This causes gold to
trip over LLVM's output when there are DW_FORM_ref_addr present.
Gold does use the presence of a debug_gnu_pub{names,types} entry for the
CU to skip parsing the debug_info portion, so make sure that's included
even when empty (technically, when empty there couldn't be any ref_addr
anyway - it only came up when gmlt didn't produce any (even non-empty)
pubnames - but given what that reveals about gold's implementation, this
seems like a good thing to do for consistency).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303894 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a fix for PR32538. MachineCSE first looks at MO.isDead(), but
if it is not marked dead, MachineCSE still wants to do its own check
to see if it is trivially dead. This check for the trivial case
assumed that physical registers cannot be live out of a block.
Patch by Mattias Eriksson.
Reviewers: qcolombet, jbhateja
Reviewed By: qcolombet, jbhateja
Subscribers: jbhateja, llvm-commits
Differential Revision: https://reviews.llvm.org/D33408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303731 91177308-0d34-0410-b5e6-96231b3b80d8
C++14 added user-defined literal support for complex numbers so that you can
write something like "complex<double> val = 2i". However, there is an existing
GNU extension supporting this syntax and interpreting the result as a _Complex
type.
This changes parsing so that such literals are interpreted in terms of C++14's
operators if an overload is present but otherwise falls back to the original
GNU extension.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303694 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch makes instruction fusion more aggressive by
* adding artificial edges between the successors of FirstSU and
SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
* updating PostGenericScheduler::tryCandidate to keep clusters together,
similar to GenericScheduler::tryCandidate.
This change increases the number of AES instruction pairs generated on
Cortex-A57 and Cortex-A72. This doesn't change code at all in
most benchmarks or general code, but we've seen improvement on kernels
using AESE/AESMC and AESD/AESIMC.
Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB
Reviewed By: evandro
Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33230
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303618 91177308-0d34-0410-b5e6-96231b3b80d8
MachineInstructions that don't generate any code (such as
IMPLICIT_DEFs) should not generate any debug info either.
Fixes PR33107.
https://bugs.llvm.org/show_bug.cgi?id=33107
This reapplies r303566 without any modifications. The stage2 build
failures persisted even after reverting this patch, and looking back
through history, it looks like these tests are flaky.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303575 91177308-0d34-0410-b5e6-96231b3b80d8
Refactor the strlen optimization code to work for both strlen and wcslen.
This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.
This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.
Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.
The refactoring "accidentally" fixes http://llvm.org/PR32124
Differential Revision: https://reviews.llvm.org/D32839
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303461 91177308-0d34-0410-b5e6-96231b3b80d8
This was originally reverted because it was a breaking a bunch
of bots and the breakage was not surfacing on Windows. After much
head-scratching this was ultimately traced back to a bug in the
lit test runner related to its pipe handling. Now that the bug
in lit is fixed, Windows correctly reports these test failures,
and as such I have finally (hopefully) fixed all of them in this
patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303446 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
While this makes some case better and some case worse - so it's unclear if it is a worthy combine just by itself - this is a useful canonicalisation.
As per discussion in D32756 .
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32916
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303441 91177308-0d34-0410-b5e6-96231b3b80d8
This is a squash of ~5 reverts of, well, pretty much everything
I did today. Something is seriously broken with lit on Windows
right now, and as a result assertions that fire in tests are
triggering failures. I've been breaking non-Windows bots all
day which has seriously confused me because all my tests have
been passing, and after running lit with -a to view the output
even on successful runs, I find out that the tool is crashing
and yet lit is still reporting it as a success!
At this point I don't even know where to start, so rather than
leave the tree broken for who knows how long, I will get this
back to green, and then once lit is fixed on Windows, hopefully
hopefully fix the remaining set of problems for real.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303409 91177308-0d34-0410-b5e6-96231b3b80d8
pruneSubRegValues() needs to remove subregister ranges starting at
instructions that later get removed by eraseInstrs(). It missed to check
one case in which eraseInstrs() would remove an instruction.
Fixes http://llvm.org/PR32688
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303396 91177308-0d34-0410-b5e6-96231b3b80d8
Right now we have multiple notions of things that represent collections of
types. Most commonly used are TypeDatabase, which is supposed to keep
mappings from TypeIndex to type name when reading a type stream, which
happens when reading PDBs. And also TypeTableBuilder, which is used to
build up a collection of types dynamically which we will later serialize
(i.e. when writing PDBs).
But often you just want to do some operation on a collection of types, and
you may want to do the same operation on any kind of collection. For
example, you might want to merge two TypeTableBuilders or you might want
to merge two type streams that you loaded from various files.
This dichotomy between reading and writing is responsible for a lot of the
existing code duplication and overlapping responsibilities in the existing
CodeView library classes. For example, after building up a
TypeTableBuilder with a bunch of type records, if we want to dump it we
have to re-invent a bunch of extra glue because our dumper takes a
TypeDatabase or a CVTypeArray, which are both incompatible with
TypeTableBuilder.
This patch introduces an abstract base class called TypeCollection which
is shared between the various type collection like things. Wherever we
previously stored a TypeDatabase& in some common class, we now store a
TypeCollection&.
The advantage of this is that all the details of how the collection are
implemented, such as lazy deserialization of partial type streams, is
completely transparent and you can just treat any collection of types the
same regardless of where it came from.
Differential Revision: https://reviews.llvm.org/D33293
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303388 91177308-0d34-0410-b5e6-96231b3b80d8
This also reverts follow-ups r303292 and r303298.
It broke some Chromium tests under MSan, and apparently also internal
tests at Google.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303369 91177308-0d34-0410-b5e6-96231b3b80d8