- Rename the ptx.read.* intrinsics to nvvm.read.ptx.sreg.* - some but
not all of these registers were already accessible via the nvvm
name.
- Rename ptx.bar.sync nvvm.bar.sync, to match nvvm.bar0.
There's a fair amount of code motion here, but it's all very
mechanical.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274769 91177308-0d34-0410-b5e6-96231b3b80d8
xorl + setcc is generally the preferred sequence due to the partial register
stall setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.
Differential Revision: http://reviews.llvm.org/D21774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274692 91177308-0d34-0410-b5e6-96231b3b80d8
On CPUs with the zero cycle zeroing feature enabled "movi v.2d" should
be used to zero a vector register. This was previously done at
instruction selection time, however the register coalescer sometimes
widened multiple vregs to the Q width because of that leading to extra
spills. This patch leaves the decision on how to zero a register to the
AsmPrinter phase where it doesn't affect register allocation anymore.
This patch also sets isAsCheapAsAMove=1 on FMOVS0, FMOVD0.
This fixes http://llvm.org/PR27454, rdar://25866262
Differential Revision: http://reviews.llvm.org/D21826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274686 91177308-0d34-0410-b5e6-96231b3b80d8
Everywhere where cuda.syncthreads or __syncthreads is used, use the
properly namespaced nvvm.barrier0 instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274664 91177308-0d34-0410-b5e6-96231b3b80d8
On SystemZ, shift and rotate instructions only use the bottom 6 bits of the shift/rotate amount.
Therefore, if the amount is ANDed with an immediate mask that has all of the bottom 6 bits set, we
can remove the AND operation entirely.
Differential Revision: http://reviews.llvm.org/D21854
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274650 91177308-0d34-0410-b5e6-96231b3b80d8
There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.
The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.
This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).
Test case based on the original problem reported.
Phabricator Review: http://reviews.llvm.org/D21802
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274645 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed.
This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase.
Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner
Subscribers: mehdi_amini, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D21692
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274644 91177308-0d34-0410-b5e6-96231b3b80d8
The prev commit failed on compilation.
A minor change in one pattern in lib/Target/X86/X86InstrAVX512.td fixes the failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274626 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-up for r273544.
The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
This commit also removes two command-line flags that weren't used in any of the
tests: widen-vmovs and swift-partial-update-clearance. The former may be easily
replaced with the mattr mechanism, but the latter may not (as it is a subtarget
property, and not a proper feature).
Differential Revision: http://reviews.llvm.org/D21797
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274620 91177308-0d34-0410-b5e6-96231b3b80d8
The patch removes redundant kmov instructions (not all, we still have a lot of work here) and redundant "and" instructions after "setcc".
I use "AssertZero" marker between X86ISD::SETCC node and "truncate" to eliminate extra "and $1" instruction.
I also changed zext, aext and trunc patterns in the .td file. It allows to remove extra "kmov" instruictions.
This patch fixes https://llvm.org/bugs/show_bug.cgi?id=28173.
Fast ISEL mode is not supported correctly for AVX-512. ICMP/FCMP scalar instruction should return result in k-reg. It will be fixed in one of the next patches. I redirected handling of "cmp" to the DAG builder mode. (The code looks worse in one specific test case, but without this fix the new patch fails).
Differential revision: http://reviews.llvm.org/D21956
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274613 91177308-0d34-0410-b5e6-96231b3b80d8
The way the named arguments for various system instructions are handled at the
moment has a few problems:
- Large-scale duplication between AArch64BaseInfo.h and AArch64BaseInfo.cpp
- That weird Mapping class that I have no idea what I was on when I thought
it was a good idea.
- Searches are performed linearly through the entire list.
- We print absolutely all registers in upper-case, even though some are
canonically mixed case (SPSel for example).
- The ARM ARM specifies sysregs in terms of 5 fields, but those are relegated
to comments in our implementation, with a slightly opaque hex value
indicating the canonical encoding LLVM will use.
This adds a new TableGen backend to produce efficiently searchable tables, and
switches AArch64 over to using that infrastructure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274576 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r259387 because it inserts illegal code after legalization
in some backends where i64 OR type is illegal for example.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274573 91177308-0d34-0410-b5e6-96231b3b80d8
Corrected element mask masking to extract the bottom index bits (now matches the perm2 implementation but for unary inputs).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274571 91177308-0d34-0410-b5e6-96231b3b80d8
Not all code-paths set the relocation model to static for Windows. This
currently breaks on Windows ARM with `-mlong-calls` when built with clang.
Loosen the assertion to what it was previously. We would ideally ensure that
all the configuration sets Windows to static relocation model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274570 91177308-0d34-0410-b5e6-96231b3b80d8
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274569 91177308-0d34-0410-b5e6-96231b3b80d8
The other use really does only care about the SDNode (it checks the
opcode against a whitelist), but bitFieldPlacement can be misled if
the node produces multiple results.
Patch by Ismail Badawi.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274567 91177308-0d34-0410-b5e6-96231b3b80d8
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.
r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274564 91177308-0d34-0410-b5e6-96231b3b80d8
The important thing I was missing was ensuring newly added constants were kept in topological order. Repositioning the node is correct if the constant is newly added (so it has no topological ordering) but wrong if it already existed - positioning it next in the worklist would break the topological ordering.
Original commit message:
[Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;
int i(int a) {
return a & 0xfffffeec;
}
Used to produce:
ldr r1, [CONSTPOOL]
ands r0, r1
CONSTPOOL: 0xfffffeec
And now produces:
movs r1, #255
adds r1, #20 ; Less costly immediate generation
bics r0, r1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274543 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D20443
It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274535 91177308-0d34-0410-b5e6-96231b3b80d8
We were using DAG->getConstant instead of DAG->getTargetConstant. This meant that we could inadvertently increase the use count of a constant if stars aligned, which it did in this testcase. Increasing the use count of the constant could cause ISel to fall over (because DAGToDAG lowering assumed the constant had only one use!)
Original commit message:
[Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;
int i(int a) {
return a & 0xfffffeec;
}
Used to produce:
ldr r1, [CONSTPOOL]
ands r0, r1
CONSTPOOL: 0xfffffeec
And now produces:
movs r1, #255
adds r1, #20 ; Less costly immediate generation
bics r0, r1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274510 91177308-0d34-0410-b5e6-96231b3b80d8
We can now handle concatenation of each source multiple times. The previous code just checked for each source to appear once in either order.
This also now handles an entire source vector sized piece having undef indices correctly. We now concat with UNDEF instead of using one of the sources. This is responsible for the test case change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274483 91177308-0d34-0410-b5e6-96231b3b80d8
After the block placement, if a block ends with a conditional branch, but the
next block is not its successor. The conditional branch should be changed to
unconditional branch. This patch fixes PR28307, PR28297, PR28402.
Differential Revision: http://reviews.llvm.org/D21811
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274470 91177308-0d34-0410-b5e6-96231b3b80d8