Convert a vector load intrinsic into an llvm load instruction.
This is beneficial when the underlying object being addressed
comes from a constant, since we get constant-folding for free.
Differential Revision: https://reviews.llvm.org/D46273
llvm-svn: 333643
Summary:
- I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
- For ZnVer1 and Atom, values were transferred form `InstRW`s.
- For SLM and BtVer2, values are from Agner.
This is split off from https://reviews.llvm.org/D47377
Reviewers: RKSimon, andreadb
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D47523
llvm-svn: 333642
This improves splat rotations (rotation by an uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426).
llvm-svn: 333641
Summary:
Both (Apple and DWARF5) implementations of the iterators had bugs which
resulted in crashes if one attempted to iterate through the accelerator
tables all the way.
For the Apple tables, the issue was that we did not clear the DataOffset
field when we reached the end, which made our iterator compare unequal
to the "end" iterator. For the Dwarf5 tables, the problem was that we
incremented the CurrentIndex pointer and then used the incremented
(possibly invalid) pointer to check whether we have reached the end of
the index list.
The reason these bugs went undetected is because their only user
(dwarfdump) only ever searched for the first match. Besides allowing us
to test this fix, changing llvm-dwarfdump --find to display all matches
seems like a good improvement (it makes the behavior consistent with the
--name option), so I change llvm-dwarfdump to do that.
The existing tests would be sufficient to test this fix with the new
llvm-dwarfdump behavior, but I add a special test that demonstrates that
the tool indeed displays multiple results. The find.test test needed to
be tweaked a bit as the tool now does not print the ".debug_info
contents" header (also consistent with how --name works).
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D47543
llvm-svn: 333635
Summary:
I'm slowly looking into a new X86 scheduler model,
for AMD Bulldozer CPU, model 2 (bdver2, Piledriver).
And naturally, i have hit that assert :)
I happened to know what it meant, and how to fix it,
but that is not too common knowledge.
Reviewers: courbet, RKSimon
Reviewed By: courbet
Subscribers: tschuett, llvm-commits, craig.topper
Differential Revision: https://reviews.llvm.org/D47572
llvm-svn: 333632
In post-commit review, Eric Christopher notes that many
new MSan warnings are being observed with this patch.
The probable reason is: if 'y' is undef here and we could
evaluate it twice and get different results.
We can't increase the number of uses of a value.
llvm-svn: 333631
Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missed code to
query and maintain occupancy info. Record it in the MFI and
query in a single call. Interfaces:
- getOccupancy() - returns current recorded achieved occupancy.
- getMinAllowedOccupancy() - returns lesser of the achieved occupancy
and the lowest occupancy we are ready to tolerate. For example if
a kernel is memory bound we are ready to tolerate 4 waves.
- limitOccupancy() - record occupancy level if we have to lower it.
- increaseOccupancy() - record occupancy if scheduler managed to
increase the occupancy.
MFI takes care of integrating different checks affecting occupancy,
including LDS use and waves-per-eu attribute. Note that scheduler
starts with not yet known register pressure, so has to record either
limit or increase in occupancy after it is done. Later passes can
just query a resulting value.
New interface is used in the active scheduler and NFC wrt its work.
Changes are also made to experimental schedulers to use it and record
an occupancy after they are done. Before the change waves-per-eu was
ignored by experimental schedulers and tolerance window for memory
bound kernels was not used.
Differential Revision: https://reviews.llvm.org/D47509
llvm-svn: 333629
v2: use "ensureAlignment"
make functions cache line aligned
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"
Differential Revision: https://reviews.llvm.org/D47516
llvm-svn: 333622
This fixes projects/compiler-rt/test/fuzzer/sigusr.test, which was
broken by r333614. The trouble was that "&&" changes the command for
which "$!" gives the pid.
llvm-svn: 333620
(Relands r333584, reverted in 333592.)
When debugging test failures with -vv (or -v in the case of the
internal shell), this makes it easier to locate the RUN line that
failed. For example, clang's test/Driver/linux-ld.c has 892 total RUN
lines, and clang's test/Driver/arm-cortex-cpus.c has 424 RUN lines
after concatenation for line continuations.
When reading the generated shell script, this also makes it easier to
locate the RUN line that produced each command.
To support reporting RUN line numbers in the case of the internal
shell, this patch extends the internal shell to support the null
command, ":", except pipelines are not supported.
To support reporting RUN line numbers in the case of windows cmd.exe
as the external shell, this patch extends -vv to set "echo on" instead
of "echo off" in bat files. (Support for windows cmd.exe as a lit
external shell will likely be dropped later, but I found out too
late.)
Reviewed By: delcypher, asmith, stella.stamenova, jmorse, lebedev.ri, rnk
Differential Revision: https://reviews.llvm.org/D44598
llvm-svn: 333614
Don't always:
cast (select (cmp x, y), z, C) --> select (cmp x, y), (cast z), C'
This is something that came up as far back as D26556, and I lost track of it.
I suspect that this transform is part of the underlying problem that is
inspiring some of the recent proposals that seek to match larger patterns
that include a cast op. Even if that's not true, this transform causes
problems for codegen (particularly with vector types).
A transform to actively match the size of cmp and select operand sizes should
follow. This patch just removes the harmful canonicalization in the other
direction.
Differential Revision: https://reviews.llvm.org/D47163
llvm-svn: 333611
Summary:
Fix PR37625. It's possible for an extern_weak declaration to be emitted
to the merged module when a definition exists in the ThinLTO portion of
the build; discard the linkage on the declaration in that case.
(otherwise we copy the linkage to the alias to the jumptable and fail)
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D47494
llvm-svn: 333604
Looks like we intended to compare this->Members with Other->Members
here, but ended up comparing this->Members with this->Members. Oops. :)
Since CongruenceClass::Members is a SmallPtrSet anyway, we can probably
skip building std::sets if we're willing to write a bit more code.
This appears to be no functional change (for sufficiently lax values of
"no"): this equality check was only being called inside of an assert.
So, worst case, we'll catch more bugs in the form of assertion failures.
Thanks to d0k for noting this!
llvm-svn: 333601
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch and test its output
Reviewers: aemerson, qcolombet
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46338
llvm-svn: 333597
By using std::shared_ptr for TreePatternNode, we can avoid leaking them.
Reviewers: craig.topper, dsanders, stoklund, tstellar, zturner
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D47463
llvm-svn: 333591
Summary:
Creating the IRBuilder methods:
CreateElementUnorderedAtomicMemSet
CreateElementUnorderedAtomicMemMove
These mirror the methods that create calls to the regular (non-atomic) memmove and
memset intrinsics.
llvm-svn: 333588
(Relands r330755 (reverted in r330848) with fix for PR37239.)
When debugging test failures with -vv (or -v in the case of the
internal shell), this makes it easier to locate the RUN line that
failed. For example, clang's test/Driver/linux-ld.c has 892 total RUN
lines, and clang's test/Driver/arm-cortex-cpus.c has 424 RUN lines
after concatenation for line continuations.
When reading the generated shell script, this also makes it easier to
locate the RUN line that produced each command.
To support reporting RUN line numbers in the case of the internal
shell, this patch extends the internal shell to support the null
command, ":", except pipelines are not supported.
To support reporting RUN line numbers in the case of windows cmd.exe
as the external shell, this patch extends -vv to set "echo on" instead
of "echo off" in bat files. (Support for windows cmd.exe as a lit
external shell will likely be dropped later, but I found out too
late.)
Reviewed By: delcypher, asmith, stella.stamenova, jmorse, lebedev.ri, rnk
Differential Revision: https://reviews.llvm.org/D44598
llvm-svn: 333584
The set properties are never used, so a vector is enough. No
functionality change intended.
While there add some std::moves to SparseSolver.
llvm-svn: 333582
Per discussion on the generic-abi mailing list:
https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4
An object file manipulation tool must either write out a symbol
table with the same number of entries as the original symbol table
and in the same order, or if this is impossible, refuse to operate
on the object file if it has unrecognized sections that are linked
to the symtab section. However, existing tools (namely GNU strip,
GNU objcopy and ld.{bfd,gold,lld} -r) do not comply with this at
present: they change symbol table indexes and set sh_link to 0 on
the unrecognized symtab-linked sections.
We intend to use the latter as a (temporary) signal that a tool has
operated on a proposed new symtab-linked section and invalidated the
symbol table indexes. However, llvm-objcopy currently keeps sh_link
pointing to the new symtab section. This patch changes llvm-objcopy
to set sh_link to 0 to match the behaviour of the other tools.
Differential Revision: https://reviews.llvm.org/D47404
llvm-svn: 333581
Created the IsSplatValue helper from the splat detection code in LowerScalarVariableShift as a first NFC step towards improving support for splat rotations, which is an extension of PR37426.
llvm-svn: 333580
This commit adds a simple verifier that tracks type indices being
touched by legalization rules' builders.
Every target will now have an opportunity to call
LegalizerInfo::verify(...) at the end of its derived LegalizerInfo's
constructor and check there are no obvious mistakes like checking only
first type for an opcode that has more than one type index and therefore
implicitly declaring any type for the second (and higher) type index
legal.
The check is only ran in assert builds and should have very minor
performance impact in assert builds and none in release builds.
This commit does not add LegalizerInfo::verify(...) calls to
target-specific legalizers, look for separate commits for that.
This commit also doesn't make the verification errors fatal, only
produces an error message, look for a later commit that does.
Reviewers: aemerson, qcolombet
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46338
llvm-svn: 333576
When printing string in the Plist, we weren't escaping the characters
which lead to invalid XML. This patch adds the escape logic to
StringExtras.
rdar://39785334
llvm-svn: 333565
Making LegalizeRuleSet's implementation a little more dumb and
straightforward to make it easier to read and change, in particular in
order to add the initial version of LegalizerInfo verifier
Reviewers: aemerson, qcolombet
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D46338
llvm-svn: 333562
This was just emitting loads with the ABI alignment
for the raw type. The true alignment is often better,
especially when an illegal vector type was scalarized.
The better alignment allows using a scalar load
more often.
llvm-svn: 333558
Summary:
The isKnownNonZero() function have checks that abort the recursion when
it reaches the specified max depth. However one of the recursive calls
was placed before the max depth check was done, resulting in a endless
recursion that eventually triggered a segmentation fault.
Fixed the problem by moving the max depth check above the first
recursive call.
Reviewers: Prazek, nlopes, spatel, craig.topper, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, bjope, llvm-commits
Differential Revision: https://reviews.llvm.org/D47531
llvm-svn: 333557
In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence
for a loop. Previously, that kicked if greater than 2 passes over a loop, which
doesn't account for loop with many bottom blocks. So, increase the threshold to
(n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block, to the overall
loop, before the forced convergence potentially kicks in.
Differential Revision: https://reviews.llvm.org/D47488
llvm-svn: 333556
Support for Clang lowering of fused intrinsics. This patch:
1. Removes bindings to clang fma intrinsics.
2. Introduces new LLVM unmasked intrinsics with rounding mode:
int_x86_avx512_vfmadd_pd_512
int_x86_avx512_vfmadd_ps_512
int_x86_avx512_vfmaddsub_pd_512
int_x86_avx512_vfmaddsub_ps_512
supported with a new intrinsic type (INTR_TYPE_3OP_RM).
3. Introduces new x86 fmaddsub/fmsubadd folding.
4. Introduces new tests for code emitted by sequentions introduced in Clang part.
Patch by tkrupa
Reviewers: craig.topper, sroland, spatel, RKSimon
Reviewed By: craig.topper, RKSimon
Differential Revision: https://reviews.llvm.org/D47443
llvm-svn: 333554
Summary:
The atomic variants of the memcpy/memmove/memset intrinsics can be treated
the same was as the regular forms, with respect to aliasing. Update the
AliasSetTracker to treat the atomic forms the same was as the regular forms.
llvm-svn: 333551
Turning a table lookup intrinsic into a shuffle vector instruction
can be beneficial. If the mask used for the lookup is the constant
vector {7,6,5,4,3,2,1,0}, then the back-end generates byte reverse
instructions instead.
Differential Revision: https://reviews.llvm.org/D46133
llvm-svn: 333550
It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes.
This adds vzeroupper at the end of some tests as we lose the 'FeatureFastPartialYMMorZMMWrite' feature from KNL, since Skylake+ don't support this its probably better.
llvm-svn: 333549
Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY .
Reviewers: rogfer01, efriedma, rengolin, javed.absar
Subscribers: kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D47413
llvm-svn: 333544