Force RIP-relative jump tables and global values
Force RIP-relative all zeros / all ones constants
These things were causing crashes due to use of absolute addressing
FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object.
So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff.
Differential Revision: https://reviews.llvm.org/D48194
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335064 91177308-0d34-0410-b5e6-96231b3b80d8
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.
Differential Revision: https://reviews.llvm.org/D45203
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335037 91177308-0d34-0410-b5e6-96231b3b80d8
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information.
Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.
Finally, remove the manual table from the emitter.
This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335018 91177308-0d34-0410-b5e6-96231b3b80d8
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.
The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.
This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.
This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335017 91177308-0d34-0410-b5e6-96231b3b80d8
The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335015 91177308-0d34-0410-b5e6-96231b3b80d8
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334994 91177308-0d34-0410-b5e6-96231b3b80d8
the individual stub creation to increase readability a bit in the
non-object file format specific function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334989 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334971 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.
Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.
This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.
This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334925 91177308-0d34-0410-b5e6-96231b3b80d8
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.
Also remove the vpextrw.s EVEX alias. That's not something gas implements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334922 91177308-0d34-0410-b5e6-96231b3b80d8
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.
Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334920 91177308-0d34-0410-b5e6-96231b3b80d8
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334915 91177308-0d34-0410-b5e6-96231b3b80d8
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.
This fixed at least fma-scalar-memfold.ll to succed without the peephole pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334908 91177308-0d34-0410-b5e6-96231b3b80d8
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334897 91177308-0d34-0410-b5e6-96231b3b80d8
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.
VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334896 91177308-0d34-0410-b5e6-96231b3b80d8
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334867 91177308-0d34-0410-b5e6-96231b3b80d8
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334848 91177308-0d34-0410-b5e6-96231b3b80d8
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334802 91177308-0d34-0410-b5e6-96231b3b80d8
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.
There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334800 91177308-0d34-0410-b5e6-96231b3b80d8
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().
We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly
are not worthwhile for AVX1.
I think in most cases we are able to recover by converting
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334759 91177308-0d34-0410-b5e6-96231b3b80d8
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334728 91177308-0d34-0410-b5e6-96231b3b80d8
Some of these instructions are already in the manual folding table so we should have them in the auto table too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334725 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.
This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
Reviewers: RKSimon, gbedwell, spatel
Reviewed By: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D47993
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334685 91177308-0d34-0410-b5e6-96231b3b80d8
This shortcoming was noted in D47330, and the test diffs show we already
had other examples where we failed to fold to a SHRUNKBLEND:
/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.
This patch implements an idea from D48043 and would obsolete that patch
because it catches more cases (notable the AVX1 case that was missed there).
All we're doing is allowing the existing transform to fire more often by
removing the post-legalize constraint. All of the relevant feature checks
and other predicates are left as-is.
Differential Revision: https://reviews.llvm.org/D48078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334592 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334563 91177308-0d34-0410-b5e6-96231b3b80d8