Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.
1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
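As a scalar illustration of the idea (not the actual X86 lowering code), a masked load can be modeled as a full load followed by a select against the pass-through value. The names Vec4, Mask4, maskedLoad and fullLoadAndSelect are hypothetical, and the sketch assumes every lane at the address is readable, which is exactly the legality question raised above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane model; these names are illustrative, not LLVM APIs.
using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// Masked load: only lanes with a set mask bit touch memory.
Vec4 maskedLoad(const int *Ptr, const Mask4 &Mask, const Vec4 &PassThru) {
  Vec4 Result = PassThru;
  for (std::size_t I = 0; I < 4; ++I)
    if (Mask[I])
      Result[I] = Ptr[I];
  return Result;
}

// Full load plus select: reads all four lanes, then blends with PassThru.
// Assumption: all four lanes at Ptr are readable.
Vec4 fullLoadAndSelect(const int *Ptr, const Mask4 &Mask, const Vec4 &PassThru) {
  Vec4 Loaded;
  for (std::size_t I = 0; I < 4; ++I)
    Loaded[I] = Ptr[I];
  Vec4 Result;
  for (std::size_t I = 0; I < 4; ++I)
    Result[I] = Mask[I] ? Loaded[I] : PassThru[I];
  return Result;
}
```

Both functions produce the same vector for any mask; they differ only in which bytes of memory are read.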
Differential Revision: http://reviews.llvm.org/D18094
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263446 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.
This patch works by generating compact branches for MIPS32R6 when the delay
slot filler cannot fill a delay slot, then inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting nops to clear the hazard.
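The hazard-clearing step can be sketched roughly as follows. This is a simplified model, not the code from the patch: the Kind enum and both functions are hypothetical, and every branch is treated as a CTI.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds, for illustration only.
enum class Kind { Other, Nop, DelayBranch, CompactBranch };

// In this model, any branch counts as a control transfer instruction (CTI).
bool isCTI(Kind K) {
  return K == Kind::DelayBranch || K == Kind::CompactBranch;
}

// Insert a nop after every compact branch that is directly followed by a CTI.
std::vector<Kind> clearForbiddenSlots(const std::vector<Kind> &In) {
  std::vector<Kind> Out;
  for (std::size_t I = 0; I < In.size(); ++I) {
    Out.push_back(In[I]);
    if (In[I] == Kind::CompactBranch && I + 1 < In.size() && isCTI(In[I + 1]))
      Out.push_back(Kind::Nop);
  }
  return Out;
}
```

A compact branch followed by an ordinary instruction is left untouched; only the adjacent-CTI case grows by one nop.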
Patch by Simon Dardis.
Reviewers: vkalintiris, dsanders
Subscribers: MatzeB, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16353
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263444 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.
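A small host-side sketch of the observation, using std::atomic rather than any AMDGPU code: two identical atomic adds on the same location return different values. The function name is hypothetical, and the two sequential calls merely stand in for two threads.

```cpp
#include <atomic>
#include <utility>

// Two identical atomic adds on the same location. Each call stands in for
// one thread; this is a sequential model, not a real multi-threaded run.
std::pair<int, int> twoIdenticalAtomicAdds(std::atomic<int> &Counter) {
  int First = Counter.fetch_add(1);  // returns the value before this add
  int Second = Counter.fetch_add(1); // same arguments, different return value
  return {First, Second};
}
```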
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18101
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263440 91177308-0d34-0410-b5e6-96231b3b80d8
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).
Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE. However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.
This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263431 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
With the addition of checks to ensure that operands have a strict ordering
it has become tricky to manage the order in the way I originally intended.
This patch linearizes the ordering which simplifies the implementation but
requires an order that is arbitrary in places. Here are some examples:
* uimm4 < uimm5 < uimm6
* simm4 < uimm4 < simm5 < uimm5
* uimm5 < uimm5_plus1 (1..32) < uimm5_plus32 (32..63) < uimm6
The term 'superset' starts to break down here since the *_plus* classes
are not true supersets of uimm5 (but they are still subsets of uimm6).
* uimm5 < uimm5_64, and uimm5 < vsplat_uimm5
This is entirely arbitrary. We need an ordering and what we pick is
unimportant since only one is possible for a given mnemonic.
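One way to picture a linearized ordering is an enum whose enumerator values act as the rank. The sketch below is illustrative only: the enumerator names are hypothetical spellings of the classes listed above, and it does not reflect how the patch or TableGen actually stores the order.

```cpp
// Hypothetical operand classes, declared in the linear order given by the
// examples above. The enumerator value serves as the rank.
enum OperandClass {
  SImm4,
  UImm4,
  SImm5,
  UImm5,
  UImm5Plus1,  // 1..32
  UImm5Plus32, // 32..63
  UImm6
};

// With a linear order, "A is ordered before B" is one integer comparison.
bool isOrderedBefore(OperandClass A, OperandClass B) { return A < B; }
```

The appeal of this shape is that no pairwise subset relation has to be maintained; the cost is that some adjacent ranks are arbitrary.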
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D17723
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263423 91177308-0d34-0410-b5e6-96231b3b80d8
s_bitset0_b64 and s_bitset1_b64 have a 32-bit src0, not a 64-bit one.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)
Differential Revision: http://reviews.llvm.org/D18040
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263420 91177308-0d34-0410-b5e6-96231b3b80d8
It's failing to build on VS2015 with:
C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\lib\Target\WebAssembly\WebAssemblyRegStackify.cpp(520):
error C2668: 'llvm::make_reverse_iterator': ambiguous call to overloaded function
C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\include\llvm/ADT/STLExtras.h(217):
note: could be 'std::reverse_iterator<llvm::MachineBasicBlock::iterator>
llvm::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(IteratorTy)'
with
[
IteratorTy=llvm::MachineInstrBundleIterator<llvm::MachineInstr>
]
C:\b\depot_tools\win_toolchain\vs_files\391bbf1220d3edcd3cc3fccdb56224181e3b13a7\win_sdk\bin\..\..\VC\include\xutility(1217):
note: or 'std::reverse_iterator<llvm::MachineBasicBlock::iterator>
std::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(_RanIt)' [found using argument-dependent lookup]
with
[
_RanIt=llvm::MachineInstrBundleIterator<llvm::MachineInstr>
]
I don't have VS2015 locally at the moment, but hopefully this will help.
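For context, this kind of ambiguity arises when an unqualified call finds a library's own make_reverse_iterator by ordinary lookup and std::make_reverse_iterator (C++14) by argument-dependent lookup. A common way to avoid it is to qualify the call; whether that is the exact change made here is not stated above. In the sketch, mylib is a hypothetical stand-in for the llvm namespace.

```cpp
#include <iterator>
#include <vector>

namespace mylib { // hypothetical stand-in for the llvm namespace

template <typename IteratorTy>
std::reverse_iterator<IteratorTy> make_reverse_iterator(IteratorTy It) {
  return std::reverse_iterator<IteratorTy>(It);
}

// Returns the last element of a non-empty vector via a reverse iterator.
inline int lastElement(std::vector<int> &V) {
  // Qualified call: argument-dependent lookup is suppressed, so
  // std::make_reverse_iterator is never considered as a candidate.
  auto RIt = mylib::make_reverse_iterator(V.end());
  return *RIt;
}

} // namespace mylib
```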
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263418 91177308-0d34-0410-b5e6-96231b3b80d8
The motivating example is this
for (j = n; j > 1; j = i) {
i = j / 2;
}
The signed division can safely be changed to an unsigned division (j is known
to be larger than 1 from the loop guard) and later turned into a single shift
without considering the sign bit.
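A quick way to convince oneself of the equivalence, written as plain C++ rather than as the actual transform: for any j > 1 the sign bit is clear, so signed division by 2, unsigned division by 2, and a logical shift right by 1 all agree. The function names are hypothetical.

```cpp
#include <cstdint>

// The three forms of "j / 2". They agree whenever J is non-negative.
inline int32_t signedHalf(int32_t J) { return J / 2; }

inline int32_t unsignedHalf(int32_t J) {
  return static_cast<int32_t>(static_cast<uint32_t>(J) / 2u);
}

inline int32_t shiftHalf(int32_t J) {
  return static_cast<int32_t>(static_cast<uint32_t>(J) >> 1);
}

// Counts iterations of the motivating loop, using the shift form.
// The guard J > 1 is what makes the shift form safe here.
inline int countIterations(int32_t N) {
  int Count = 0;
  for (int32_t J = N; J > 1; J = shiftHalf(J))
    ++Count;
  return Count;
}
```

For negative inputs the forms diverge (signed division rounds toward zero, while the unsigned forms reinterpret the bit pattern), which is why the loop guard matters.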
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263406 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r263258, which was reverted in r263321 because
of issues on the Clang side.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263393 91177308-0d34-0410-b5e6-96231b3b80d8
When truncating the result of integer vector arithmetic, it may be better to pre-truncate the input operands. There is no code to support this yet (the scalar case is handled by SimplifyDemandedBits, but adding vector support could be a lot of work), but these tests capture the current codegen.
Example bugs: PR14666, PR22703
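The underlying identity, shown on scalars for brevity: truncating after an add gives the same low bits as truncating the operands first and adding in the narrow type. The function names are hypothetical, and the sketch covers addition only.

```cpp
#include <cstdint>

// Truncate after a wide add.
inline uint16_t addThenTruncate(uint32_t A, uint32_t B) {
  return static_cast<uint16_t>(A + B);
}

// Truncate the operands first, then add and wrap to the narrow type.
inline uint16_t truncateThenAdd(uint32_t A, uint32_t B) {
  uint16_t NarrowA = static_cast<uint16_t>(A);
  uint16_t NarrowB = static_cast<uint16_t>(B);
  return static_cast<uint16_t>(NarrowA + NarrowB);
}
```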
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263384 91177308-0d34-0410-b5e6-96231b3b80d8
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if the shift amount is constant we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263383 91177308-0d34-0410-b5e6-96231b3b80d8
Fundamentally, the length of a variable or function name is bound by the
maximum size of a record: 0xffff. However, the name doesn't live in a
vacuum; other data is associated with the name, lowering the bound
further.
We would naively attempt to emit the name, causing us to assert because
the record would no longer fit in 16 bits. Instead, truncate the name
but preserve as much as we can.
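The truncation can be sketched as below. This is an assumption-laden model rather than the committed code: truncateNameToFit and OtherBytes are hypothetical, and reserving one byte for a null terminator is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keep as much of Name as fits in a record once the
// record's other data (OtherBytes) and an assumed null terminator are
// accounted for.
inline std::string truncateNameToFit(const std::string &Name,
                                     std::size_t OtherBytes) {
  const std::size_t MaxRecordSize = 0xffff; // record length is 16 bits wide
  if (OtherBytes + 1 >= MaxRecordSize)
    return std::string(); // no room left for any part of the name
  std::size_t MaxNameLen = MaxRecordSize - OtherBytes - 1;
  return Name.substr(0, MaxNameLen); // substr clamps to the string's length
}
```

Short names pass through unchanged; only names that would overflow the record are cut.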
While I have tested this locally, I've decided not to commit the test due
to its size.
N.B. While this behavior is undesirable, it is better than MSVC's
behavior. They seem to truncate to ~4000 characters.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263378 91177308-0d34-0410-b5e6-96231b3b80d8
It had a weird artificial limitation on the write side: the comdat name
couldn't be bigger than 2**16. However, the reader had no such
limitation. Make the reader and the writer agree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263377 91177308-0d34-0410-b5e6-96231b3b80d8
Check to see if all operands are constant before calling simplify on them
so that we don't perform wasted simplifications.
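The shape of the check, in a standalone sketch: test that every operand is constant first, and only then pay for the simplification. Operand, simplifyOperands and trySimplify are hypothetical and do not correspond to LLVM's Value or Constant classes; the summing "simplification" is a placeholder.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical operand model; not LLVM's Value or Constant classes.
struct Operand {
  bool IsConstant;
  int Value;
};

// Placeholder for an expensive simplification; here it just sums the values.
inline int simplifyOperands(const std::vector<Operand> &Ops) {
  int Sum = 0;
  for (const Operand &Op : Ops)
    Sum += Op.Value;
  return Sum;
}

// Runs the simplification only if every operand is constant.
// Returns false and leaves Result untouched otherwise.
inline bool trySimplify(const std::vector<Operand> &Ops, int &Result) {
  bool AllConstant = std::all_of(Ops.begin(), Ops.end(),
                                 [](const Operand &Op) { return Op.IsConstant; });
  if (!AllConstant)
    return false; // skip the wasted work
  Result = simplifyOperands(Ops);
  return true;
}
```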
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263374 91177308-0d34-0410-b5e6-96231b3b80d8
This follows up on the related AVX instruction transforms, but this
one is too strange to do anything more with. Intel's behavioral
description of this instruction in its Software Developer's Manual
is tragi-comic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263340 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D17712
We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).
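The failure mode can be reproduced in miniature. The class below is hypothetical and models only the stale-state problem, not PPCAsmPrinter itself: without the clear() in doInitialization, a second run over the same module would hit the duplicate-definition path.

```cpp
#include <map>
#include <string>

// Hypothetical printer that models only the state-reset issue.
class TocPrinter {
  std::map<std::string, int> TOC;

public:
  // Called at the start of each run; stale entries must be dropped first.
  void doInitialization() { TOC.clear(); }

  // Returns false if Symbol is already defined (the duplicate-definition case).
  bool defineEntry(const std::string &Symbol) {
    return TOC.emplace(Symbol, static_cast<int>(TOC.size())).second;
  }
};
```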
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263338 91177308-0d34-0410-b5e6-96231b3b80d8