Commit Graph

17288 Commits

Author SHA1 Message Date
Nekotekina
95e7c217de X86: disable K-masks for AVX512BW+VL
Their usage often generates code that is inefficient on SKX.
Use a conservative approach for xmm/ymm byte/word vectors.
2018-06-19 22:54:24 +03:00
Nekotekina
bf766f3aaf X86: optimize VSELECT for v16i8 with shl + sign bit test 2018-06-19 22:34:57 +03:00
Nekotekina
e114ebcddb X86: change v64i8 sar by 7
Use ADDUS (add with unsigned saturation)
addus(0, 0) = 0
addus(0x80, 0x80) = 0xff
2018-06-19 22:18:37 +03:00
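
The two identities quoted in the commit above can be checked with a small scalar model. This is only a sketch in Python, not the LLVM lowering: the helper names are hypothetical, and the step that isolates the sign bit before the saturating add is an assumption made so that the inputs are always 0 or 0x80.

```python
def addus8(a: int, b: int) -> int:
    """Unsigned saturating add of two 8-bit lanes (scalar model of ADDUS)."""
    return min(a + b, 0xFF)


def sar7_reference(x: int) -> int:
    """Arithmetic shift right by 7 of a byte, returned as a value in 0..255."""
    signed = x - 0x100 if x & 0x80 else x
    return (signed >> 7) & 0xFF


def sar7_via_addus(x: int) -> int:
    """Assumed scheme: keep only the sign bit, then add it to itself with saturation."""
    sign = x & 0x80
    return addus8(sign, sign)


if __name__ == "__main__":
    # Exhaustive check over all byte values.
    assert all(sar7_via_addus(x) == sar7_reference(x) for x in range(256))
    print("sar-by-7 via ADDUS matches the reference for all bytes")
```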
Nekotekina
6cac8565a9 X86: combine AND+OR to VPTERNLOG 2018-06-19 22:15:33 +03:00
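
As an illustration of why an AND followed by an OR fits in one VPTERNLOG, the sketch below models the instruction's 8-bit truth-table immediate in Python. The function names are hypothetical and the code is a bit-level model, not the DAG combine from the commit.

```python
# Canonical truth-table columns for the three inputs: bit index = (a << 2) | (b << 1) | c.
A, B, C = 0xF0, 0xCC, 0xAA


def ternlog_imm(f) -> int:
    """Derive the 8-bit immediate for a three-input bitwise function f(a, b, c)."""
    return f(A, B, C) & 0xFF


def ternlog(imm: int, a: int, b: int, c: int, width: int = 32) -> int:
    """Bit-by-bit model of a ternary logic operation driven by an 8-bit immediate."""
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm >> idx) & 1) << i
    return result


if __name__ == "__main__":
    imm = ternlog_imm(lambda a, b, c: (a & b) | c)
    x, y, z = 0x12345678, 0x0F0F0F0F, 0xFF00FF00
    assert ternlog(imm, x, y, z) == (x & y) | z
    print(hex(imm))
```

Any other three-input bitwise expression gets its immediate the same way, by evaluating it on the three column constants.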
Nekotekina
339a277566 X86: detect AVG (alternative pattern)
Pattern doesn't use zero/sign extensions.
Also handle signed and signed-unsigned cases.
2018-06-19 22:15:33 +03:00
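
The commit does not spell out which extension-free pattern it matches. One well-known identity that computes the rounding average without widening is shown below as a Python sketch. It is an assumption for illustration and covers only the unsigned byte case.

```python
def avg_reference(a: int, b: int) -> int:
    """Rounding unsigned average computed in wider arithmetic: (a + b + 1) >> 1."""
    return (a + b + 1) >> 1


def avg_no_extension(a: int, b: int) -> int:
    """Same result using only 8-bit-wide operations: (a | b) - ((a ^ b) >> 1)."""
    return ((a | b) - ((a ^ b) >> 1)) & 0xFF


if __name__ == "__main__":
    # Exhaustive check over all pairs of bytes.
    for a in range(256):
        for b in range(256):
            assert avg_no_extension(a, b) == avg_reference(a, b)
    print("extension-free average matches the reference")
```

The identity holds because a + b equals 2 * (a | b) - (a ^ b), so no intermediate value ever needs a ninth bit.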
Nekotekina
edf822875b X86: combine inversion of VPTERNLOG 2018-06-19 22:15:33 +03:00
Nekotekina
09e968fc41 X86: detect patterns for saturation arithmetic
Includes ADDUS, ADDS, SUBUS, SUBS
Patterns use carry/overflow calculation in sign bit
Also combine some related logic into VPTERNLOG
2018-06-19 22:15:28 +03:00
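
A minimal sketch of what "carry/overflow calculation in sign bit" can look like, written in Python for 8-bit lanes. The formulas are the textbook carry, borrow and overflow expressions. Whether the commit matches exactly these forms is an assumption, and all function names are hypothetical.

```python
MASK, SIGN = 0xFF, 0x80


def addus8(a: int, b: int) -> int:
    s = (a + b) & MASK
    carry = ((a & b) | ((a | b) & ~s)) & SIGN    # carry out of the top bit
    return MASK if carry else s


def subus8(a: int, b: int) -> int:
    d = (a - b) & MASK
    borrow = ((~a & b) | (~(a ^ b) & d)) & SIGN  # borrow into the top bit
    return 0 if borrow else d


def adds8(a: int, b: int) -> int:
    s = (a + b) & MASK
    overflow = (~(a ^ b) & (a ^ s)) & SIGN       # same input signs, different result sign
    if not overflow:
        return s
    return 0x80 if a & SIGN else 0x7F


def subs8(a: int, b: int) -> int:
    d = (a - b) & MASK
    overflow = ((a ^ b) & (a ^ d)) & SIGN        # different input signs, result sign flipped
    if not overflow:
        return d
    return 0x80 if a & SIGN else 0x7F


def to_signed(x: int) -> int:
    """Interpret an 8-bit pattern as a two's complement value."""
    return x - 0x100 if x & SIGN else x


if __name__ == "__main__":
    clamp = lambda v: max(-128, min(127, v)) & MASK
    for a in range(256):
        for b in range(256):
            assert addus8(a, b) == min(a + b, 255)
            assert subus8(a, b) == max(a - b, 0)
            assert adds8(a, b) == clamp(to_signed(a) + to_signed(b))
            assert subs8(a, b) == clamp(to_signed(a) - to_signed(b))
    print("all four saturating operations match their references")
```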
Nekotekina
b76dd412cb X86: LowerShift: new algorithm for vector-vector shifts
Emit pair of shifts of double size if possible
2018-06-19 22:11:46 +03:00
Nekotekina
866da2bdb7 X86: Fix/workaround Small Code Model for JIT
Force RIP-relative jump tables and global values
Force RIP-relative all zeros / all ones constants
These things were causing crashes due to use of absolute addressing
2018-06-19 22:11:46 +03:00
Craig Topper
fbe156db3c [X86] Initialize FMA3Info directly in its constructor instead of relying on std::call_once
FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction process is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object.

So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff.

Differential Revision: https://reviews.llvm.org/D48194

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335064 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 18:06:52 +00:00
Craig Topper
54edf4671d [X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc.
Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335062 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 17:51:42 +00:00
Mikhail Dvoretckii
be59eb33a0 [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.

Differential Revision: https://reviews.llvm.org/D45203


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335037 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 10:37:52 +00:00
Mikhail Dvoretckii
f356e3e089 Test commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335026 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 07:55:10 +00:00
Craig Topper
e5b799ba09 [X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter.
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add a manual entry in the EVEX->VEX tables that doesn't check the encoding information.

Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.

Finally, remove the manual table from the emitter.

This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335018 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 04:24:44 +00:00
Craig Topper
023b407c11 [X86] Add a new VEX_WPrefix encoding to tag EVEX instructions that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0.
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.

The EVEX->VEX table generator can collapse the two versions if the VEX version is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.

This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.

This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335017 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 04:24:42 +00:00
Craig Topper
fe7e53ea94 [X86] Simplify the TSFlags checking code in EvexToVexInstPass. NFCI
The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead it's better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335015 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 03:17:46 +00:00
Craig Topper
8187a5bc6d [X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass.
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334994 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 23:20:57 +00:00
Eric Christopher
6368868d2e Tidy comment language and explanation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334990 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 22:21:19 +00:00
Eric Christopher
2bb865e009 Pull non-lazy stub table emission into a separate function alongside
the individual stub creation to increase readability a bit in the
non-object file format specific function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334989 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 22:21:18 +00:00
Eric Christopher
ba9ac3034c Add return statements to make it clear that all of these are mutually exclusive conditions.
else if would have worked just as well, but this keeps the original readability a bit more clear.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334988 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 22:21:13 +00:00
Craig Topper
2bcbecf852 [X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source.
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334971 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 18:47:07 +00:00
Simon Pilgrim
a5ac3f909c [X86][BtVer2] Flag AVX2+ scheduler classes as unsupported
Jaguar only supports up to AVX1

Differential Revision: https://reviews.llvm.org/D48274

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334947 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 14:31:14 +00:00
Clement Courbet
637504b217 [X86] Fix NOOP sched overrides on BDW/HSW/SKL.
Summary: Noop certainly does not use resources.

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits, gchatelet

Differential Revision: https://reviews.llvm.org/D48028

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334927 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 06:48:22 +00:00
Craig Topper
c6145b53bb [X86] Create X86InstrFMA3Group objects fully in a static table instead of on the heap. NFCI
Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.

Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.

This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.

This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334925 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 06:32:22 +00:00
Craig Topper
3f83344257 [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX encoded instructions, but not for the GPR, MMX, SSE, and VEX versions.

Also remove the vpextrw.s EVEX alias. That's not something gas implements.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334922 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 05:00:50 +00:00
Craig Topper
3ac0bd3303 [X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases.
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.

Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to a state where only EVEX was supported.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334920 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 01:28:05 +00:00
Craig Topper
77b50e463e [X86] Add all the FMA instructions directly to the load folding table instead of proxying through X86InstrFMA3Info.
This increases the size of the static tables, but is closer to what we would get if we used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334915 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-17 18:00:16 +00:00
Craig Topper
07beb491a5 [X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling.
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.

This fixed at least fma-scalar-memfold.ll to succeed without the peephole pass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334908 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-17 16:29:46 +00:00
Craig Topper
73b1acb59c [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334898 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 23:25:50 +00:00
Craig Topper
86126e4b16 [X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser.
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td file. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334897 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 23:25:48 +00:00
Craig Topper
ea330bd086 [X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions.
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.

VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334896 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 23:25:47 +00:00
Craig Topper
9dda3f52a1 [X86] Add more instructions to the hasUndefRegUpdate list.
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we won't be surprised in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334867 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 22:25:04 +00:00
Tomasz Krupa
a36133dda7 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334849 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 18:05:24 +00:00
Craig Topper
2665835577 [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate.
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes an Undef flag on the input register so we need to check for that case too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334848 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 17:56:17 +00:00
Craig Topper
f045c54f1a Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update."
There's a typo causing the build to fail.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334803 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 06:15:26 +00:00
Craig Topper
f26b24c487 [X86] Prevent folding stack reloads with instructions that have an undefined register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334802 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 06:11:36 +00:00
Craig Topper
055e27a2ce [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334800 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 05:49:19 +00:00
Craig Topper
61c89e393d [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334785 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 04:42:54 +00:00
Sanjay Patel
a641736633 [x86] be more selective about converting 'and' to shuffle (PR37749)
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().

We've grown to accommodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly 
are not worthwhile for AVX1. 

I think in most cases we are able to recover by converting 
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334759 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 19:55:02 +00:00
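
The 'and'-to-shuffle equivalence that this commit restricts can be modelled on plain lists. The Python sketch below is a simplified illustration with hypothetical names, not the logic of DAGCombiner::XformToShuffleWithZero(). It assumes 32-bit lanes and a constant mask whose lanes are each all ones or all zeros.

```python
ALL_ONES = 0xFFFFFFFF  # assumed 32-bit lanes


def and_lanes(x, mask):
    """Lane-wise AND of a vector with a constant mask."""
    return [xi & mi for xi, mi in zip(x, mask)]


def shuffle_with_zero(x, mask):
    """Equivalent shuffle: lane i reads x[i] when the mask lane is all ones, else a zero lane."""
    n = len(x)
    if any(m not in (0, ALL_ONES) for m in mask):
        raise ValueError("mask lanes must be all ones or all zeros")
    combined = list(x) + [0] * n  # indices n..2n-1 address the zero vector
    indices = [i if m == ALL_ONES else n + i for i, m in enumerate(mask)]
    return [combined[j] for j in indices]


if __name__ == "__main__":
    x = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
    mask = [ALL_ONES, 0, ALL_ONES, 0]
    assert and_lanes(x, mask) == shuffle_with_zero(x, mask)
    print(shuffle_with_zero(x, mask))
```

The two forms always agree on the result, so the choice between them is purely a cost question, which is why the commit can exclude specific types without affecting correctness.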
Craig Topper
b261e10390 [X86] Fix stale comment in folding tables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334758 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 19:28:31 +00:00
Craig Topper
b179f7438d [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide.
The test change is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334728 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 15:40:31 +00:00
Craig Topper
a56239d6e3 [X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334727 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 15:40:30 +00:00
Craig Topper
8a9fc632eb [X86] Disable load unfolding for a bunch of instructions where unfolding would increase the size of the load.
Found by an audit of the manual table vs the autogenerated table.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334726 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 15:40:29 +00:00
Craig Topper
474c12cb48 [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions.
Some of these instructions are already in the manual folding table so we should have them in the auto table too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334725 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 15:40:27 +00:00
Craig Topper
3a81c6c697 [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll

Reviewers: RKSimon, gbedwell, spatel

Reviewed By: spatel

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D47993

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334685 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 03:16:58 +00:00
Craig Topper
eadd795fa3 [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334648 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 20:03:42 +00:00
Sanjay Patel
81eac77ab1 [x86] eliminate even more sign-bit tests with vector select
This shortcoming was noted in D47330, and the test diffs show we already 
had other examples where we failed to fold to a SHRUNKBLEND:

/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.

This patch implements an idea from D48043 and would obsolete that patch 
because it catches more cases (notably the AVX1 case that was missed there).
All we're doing is allowing the existing transform to fire more often by 
removing the post-legalize constraint. All of the relevant feature checks 
and other predicates are left as-is.

Differential Revision: https://reviews.llvm.org/D48078


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334592 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 12:28:32 +00:00
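
The quoted SHRUNKBLEND comment says that only the sign bit of each condition element is read. The Python sketch below, with hypothetical function names, contrasts that behaviour with a bitwise select that expects all-ones or all-zeros masks. It is a scalar model for byte lanes only, not the X86 lowering code.

```python
def blend_on_sign_bit(a, b, cond):
    """Per-byte blend that looks only at bit 7 of each condition byte."""
    return [bi if ci & 0x80 else ai for ai, bi, ci in zip(a, b, cond)]


def bitwise_select(a, b, cond):
    """Generic mask select; only correct when every condition byte is 0x00 or 0xFF."""
    return [((bi & ci) | (ai & ~ci)) & 0xFF for ai, bi, ci in zip(a, b, cond)]


if __name__ == "__main__":
    a, b = [0x11, 0x11], [0xEE, 0xEE]

    # With canonical masks both forms agree.
    canonical = [0xFF, 0x00]
    assert blend_on_sign_bit(a, b, canonical) == bitwise_select(a, b, canonical)

    # With a sign-bit-only condition the blend still selects b, the bitwise form does not.
    shrunk = [0x80, 0x00]
    assert blend_on_sign_bit(a, b, shrunk) == [0xEE, 0x11]
    assert bitwise_select(a, b, shrunk) != blend_on_sign_bit(a, b, shrunk)
    print("sign-bit blend and bitwise select differ on non-canonical masks")
```

This difference is the reason such a condition has to be kept away from optimizations that assume a canonical mask.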
Craig Topper
c2287211da [X86] Remove masking from avx512vbmi2 concat and shift by immediate intrinsics. Use select in IR instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334576 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 07:19:21 +00:00
Craig Topper
dae4c99b4e [X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator.
Previously we were whitelisting instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334563 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 00:04:08 +00:00
Craig Topper
04bd25fad1 [X86] Remove VPCOMPRESSB/W from the autogenerated load folding table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334562 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 00:04:04 +00:00