Commit Graph

12073 Commits

Author SHA1 Message Date
Derek Schuff
c76507d03c x32. Fixes a bug in i8mem_NOREX declaration.
The old implementation assumed LP64 which is broken for x32.  Specifically, the
MOV8rm_NOREX and MOV8mr_NOREX instructions, when selected, would cause a 'Cannot emit
physreg copy instruction' error message to be reported.

This patch also enables the h-register*ll tests for x32.

Differential Revision: http://reviews.llvm.org/D12336

Patch by João Porto

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247058 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-08 19:47:15 +00:00
Andrew Kaylor
b25ffb37c6 Fix for bz24500: Avoid non-deterministic code generation triggered by the x86 call frame optimization
Patch by Dave Kreitzer

Differential Revision: http://reviews.llvm.org/D12620



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247042 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-08 18:18:46 +00:00
Igor Breger
b23094366e AVX512: kunpck encoding implementation
Added tests for encoding.

Differential Revision: http://reviews.llvm.org/D12061

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247010 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-08 13:10:00 +00:00
Elena Demikhovsky
1c82e5f791 Removed an old comment, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247006 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-08 12:22:22 +00:00
Elena Demikhovsky
27828d7a5e compilation issue, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246983 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-08 07:34:06 +00:00
Elena Demikhovsky
758b9df87a fixed compilation issue, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246982 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-08 07:10:08 +00:00
Elena Demikhovsky
1e00496f88 AVX-512: Lowering for 512-bit vector shuffles.
Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer.

Differential Revision: http://reviews.llvm.org/D10683



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246981 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-08 06:38:21 +00:00
Reid Kleckner
412db355b3 Sink COFF.h MC include into .cpp files
This prevents MC clients from getting COFF.h, which conflicts with
winnt.h macros. Also a minor IWYU cleanup. Now the only public headers
including COFF.h are in Object, and they actually need it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246784 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-03 16:41:50 +00:00
Sanjay Patel
eb8298cfe1 [x86] enable machine combiner reassociations for scalar 'xor' insts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246781 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-03 16:36:16 +00:00
Igor Breger
d951d3c8df AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11931

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246750 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-03 09:05:31 +00:00
Ahmed Bougacha
1522e8d8f8 [X86] Require 32-byte alignment for 32-byte VMOVNTs.
We used to accept (and even test, and generate) 16-byte alignment
for 32-byte nontemporal stores, but they require 32-byte alignment,
per SDM. Found by inspection.

Instead of hardcoding 16 in the patfrag, check for natural alignment.
Also fix the autoupgrade and the various tests.

Also, use explicit -mattr instead of -mcpu: I stared at the output
several minutes wondering why I get 2x movntps for the unaligned
case (which is the ideal output, but needs some work: see FIXME),
until I remembered corei7-avx implies +slow-unaligned-mem-32.
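
The difference between the hardcoded check and the natural-alignment check described above can be illustrated with a small model. This is only a sketch: the function names are hypothetical and it is plain Python, not the TableGen patfrag from the patch.

```python
def old_check(store_size_bytes: int, alignment_bytes: int) -> bool:
    """Hardcoded rule: any store aligned to at least 16 bytes qualifies."""
    return alignment_bytes >= 16


def natural_alignment_check(store_size_bytes: int, alignment_bytes: int) -> bool:
    """Natural alignment: the alignment must be at least the store size."""
    return alignment_bytes >= store_size_bytes


# Compare both rules for 16-byte and 32-byte nontemporal stores.
for size, align in [(16, 16), (32, 16), (32, 32)]:
    print(size, align, old_check(size, align), natural_alignment_check(size, align))
```

Under these assumptions, only the (32, 16) case differs: the hardcoded rule accepts it, while the natural-alignment rule rejects it.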


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246733 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 23:25:39 +00:00
Ahmed Bougacha
074165218d [X86] Cleanup nontemporal fragments. NFCI.
We can chain other fragments to avoid repeating conditions.
This also fixes a potential bug (that realistically can't happen),
where we would match indexed nontemporal stores for i32/i64.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246719 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 22:27:38 +00:00
Sanjay Patel
ec44710063 [x86] fix allowsMisalignedMemoryAccesses() for 8-byte and smaller accesses
This is a continuation of the fix from:
http://reviews.llvm.org/D10662

and discussion in:
http://reviews.llvm.org/D12154

Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned
scalar (64-bit and under) accesses. Other lowering (e.g., getOptimalMemOpType)
assumes that unaligned scalar accesses are always ok, so this changes
allowsMisalignedMemoryAccesses() to match that behavior.
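
As a rough illustration of the distinction drawn above, the following toy model answers only the "is this misaligned access fast?" question. It is a sketch, not the LLVM implementation: the function name and the `slow_unaligned_mem_16` parameter are hypothetical stand-ins for the subtarget attribute, and widths above 128 bits are deliberately left unmodelled.

```python
def is_misaligned_access_fast(size_in_bits: int, slow_unaligned_mem_16: bool) -> bool:
    """Toy model: report whether a misaligned access of this width is fast."""
    if size_in_bits <= 64:
        # Scalar accesses (64-bit and under) are assumed to be fast,
        # matching what other lowering code already assumes.
        return True
    if size_in_bits == 128:
        # SSE-width accesses depend on the 'slow' attribute.
        return not slow_unaligned_mem_16
    raise ValueError("wider accesses are not modelled in this sketch")


print(is_misaligned_access_fast(64, slow_unaligned_mem_16=True))
print(is_misaligned_access_fast(128, slow_unaligned_mem_16=True))
```

The point of the sketch is that the scalar answer no longer depends on the attribute at all.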

Differential Revision: http://reviews.llvm.org/D12543


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246658 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 15:42:49 +00:00
Asaf Badouh
05859c7cbb [X86][AVX512VLBW] add support in byte shift and SAD
add byte shift left/right
add SAD - compute sum of absolute differences

Differential Revision: http://reviews.llvm.org/D12479

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246654 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 14:21:54 +00:00
Igor Breger
1b50f7132b AVX512: Implemented encoding and intrinsics for VGETMANTPD/S, VGETMANTSD/S instructions
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11593

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246642 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 11:18:55 +00:00
Igor Breger
191108c6b8 AVX512: Implemented encoding and intrinsics for vshufps/d.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11709

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246640 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:50:58 +00:00
Elena Demikhovsky
e1bb461f27 AVX-512: store <4 x i1> and <2 x i1> values in memory
Enabled DAG pattern lowering for SKX with DQI predicate.

Differential Revision: http://reviews.llvm.org/D12550



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246625 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 09:20:58 +00:00
Vedant Kumar
ec0cd29de8 [CodeGen] Fix FREM on 32-bit MSVC on x86
Patch by Dylan McKay!

Differential Revision: http://reviews.llvm.org/D12099

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246615 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 01:31:58 +00:00
Sanjay Patel
ac515c4087 rename "slow-unaligned-mem-under-32" to "slow-unaligned-mem-16" (NFCI)
This is a follow-on suggested by:
http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 )
http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 )

This makes the attribute name match most of the existing lowering logic
and regression test expectations.

But the current use of this attribute is inconsistent; see the FIXME
comment for "allowsMisalignedMemoryAccesses()". That change will
result in functional changes and should be coming soon.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246585 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-01 20:51:51 +00:00
Igor Breger
c02bfc6060 AVX512: Implemented intrinsics for valign.
Differential Revision: http://reviews.llvm.org/D12526

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246551 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-01 15:27:18 +00:00
Sanjay Patel
63384be23d [x86] enable machine combiner reassociations for scalar 'or' insts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246481 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-31 20:27:03 +00:00
Matthias Braun
023a6e3548 X86: Fix FastISel SSESelect register class
X86FastISel has been using the wrong register class for VBLENDVPS which
produces a VR128 and needs an extra copy to the target register. The
problem was already hit by the existing test cases when using
> llvm-lit -Dllc="llc -verify-machineinstrs"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246461 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-31 18:25:11 +00:00
Igor Breger
046f79fbb0 AVX512: ktest implementation
Added tests for encoding.

Differential Revision: http://reviews.llvm.org/D11979

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246439 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-31 13:30:19 +00:00
Igor Breger
c7aaf020ab AVX512: Implemented encoding and intrinsics for vdbpsadbw
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12491

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246436 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-31 13:09:30 +00:00
Igor Breger
c21a0f3132 AVX512: kadd implementation
Added tests for encoding.

Differential Revision: http://reviews.llvm.org/D11973

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246432 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-31 11:50:23 +00:00
Igor Breger
66973634a5 AVX512: Implemented encoding and intrinsics for vpalignr
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12270

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246428 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-31 11:14:02 +00:00
Hal Finkel
16c92083ab [MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags
Make the arrays 'static const' instead of just 'static'. Post-commit review
comment from Roman Divacky on IRC. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246376 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-30 08:07:29 +00:00
Vedant Kumar
21f084aa72 [X86] NFC: Clean up and clang-format a few lines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246340 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-28 21:59:00 +00:00
Sanjay Patel
4b1821fa36 [x86] enable machine combiner reassociations for scalar 'and' insts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246300 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-28 14:09:48 +00:00
Reid Kleckner
c0e64ada5c [WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies.  The next step
is to change PEI so that we get funclet prologues and epilogues.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246235 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-27 23:27:47 +00:00
Reid Kleckner
9c5b8a0117 [ms-inline-asm] Relax assertion around funky identifiers slightly
A corresponding clang change will make it so that clang can consume part
of an assembler token. The assembler treats '.' as an identifier
character while clang does not, so its view of the token stream is a
little different.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246089 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-26 21:57:25 +00:00
Andrew Kaylor
ab3081c118 Expose hasLiveCondCodeDef as a member function of the X86InstrInfo class. NFC
This takes the existing static function hasLiveCondCodeDef and makes it a member function of the X86InstrInfo class. This is a useful utility function that an upcoming change would like to use. NFC.

Patch by: Kevin B. Smith
Differential Revision: http://reviews.llvm.org/D12371




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246073 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-26 20:36:52 +00:00
Vedant Kumar
0c3c0acf23 [llvm-mc] Ignore opcode size prefix in 64-bit CALL disassembly
This is a fix for disassembling unusual instruction sequences in 64-bit
mode w.r.t. the CALL rel16 instruction. It might be desirable to move the
check somewhere else, but it essentially mimics the special case
handling with JCXZ in 16-bit mode.

The current behavior accepts the opcode size prefix and causes the
call's immediate to stop disassembling after 2 bytes. When debugging
sequences of instructions with this pattern, the disassembler output
becomes extremely unreliable and essentially useless (if you jump midway
into what lldb thinks is a unified instruction, you'll lose %rip). So we
ignore the prefix and consume all 4 bytes when disassembling a 64-bit
mode binary.

Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S.
is defined as:

    Indicates an instruction syntax that requires an address override
    prefix in 64-bit mode and is not supported. Using an address
    override prefix in 64-bit mode may result in model-specific
    execution behavior. (Vol. 2A 3-7)

Since 0x66 is an operand override prefix we should be OK (although we
may want to warn about 0x67 prefixes to 0xe8). On the CPUs I tested
with, they all ignore the 0x66 prefix in 64-bit mode.
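
The immediate-size decision described above can be sketched with a tiny decoder. This is a simplified model under stated assumptions, not the llvm-mc disassembler: `decode_call_rel` is a hypothetical helper that handles only an optional 0x66 prefix followed by opcode 0xE8, and only 32-bit and 64-bit modes.

```python
OPERAND_SIZE_PREFIX = 0x66
CALL_REL_OPCODE = 0xE8


def decode_call_rel(code: bytes, mode_bits: int):
    """Decode a near relative CALL; return (instruction_length, displacement)."""
    pos = 0
    has_prefix = False
    if code[pos] == OPERAND_SIZE_PREFIX:
        has_prefix = True
        pos += 1
    if code[pos] != CALL_REL_OPCODE:
        raise ValueError("not a CALL rel instruction")
    pos += 1

    if mode_bits == 64:
        # 64-bit mode: ignore the prefix and always consume 4 bytes.
        imm_size = 4
    elif mode_bits == 32:
        # 32-bit mode: the prefix shrinks the immediate to 2 bytes.
        imm_size = 2 if has_prefix else 4
    else:
        raise ValueError("only 32-bit and 64-bit modes are modelled")

    imm = code[pos:pos + imm_size]
    if len(imm) != imm_size:
        raise ValueError("truncated instruction")
    displacement = int.from_bytes(imm, "little", signed=True)
    return pos + imm_size, displacement


code = bytes([0x66, 0xE8, 0x10, 0x00, 0x00, 0x00])
print(decode_call_rel(code, 64))  # whole 6-byte sequence is one instruction
print(decode_call_rel(code, 32))  # stops after the 2-byte immediate
```

Stopping after 2 bytes in 64-bit mode would leave the remaining two immediate bytes to be decoded as the start of the next instruction, which is the unreliable output the message describes.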

Patch by Matthew Barney!

Differential Revision: http://reviews.llvm.org/D9573

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246038 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-26 16:20:29 +00:00
Matthias Braun
ec01af9135 FastISel: Factor out common code; NFC intended
This should be no functional change but for the record: For three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch are added to the successor/predecessor
lists.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245997 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-26 01:38:00 +00:00
Charles Davis
7e96f0f6ff Make variable argument intrinsics behave correctly in a Win64 CC function.
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.

Reviewers: nadav, asl, eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1622

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245990 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-25 23:27:41 +00:00
Sanjay Patel
af7c00f213 make fast unaligned memory accesses implicit with SSE4.2 or SSE4a
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.

A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.

Differential Revision: http://reviews.llvm.org/D12288



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245950 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-25 16:29:21 +00:00
Michael Kuperstein
89024e86ad [X86] Remove references to _ftol2
As of r245924, _ftol2 is no longer used for fptoui on MS platforms.
Remove the dead code associated with it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245925 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-25 07:58:33 +00:00
Michael Kuperstein
f48b1beeec [X86] Fix fptoui conversions
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of ftol2 for fptoui, which was
wrong to begin with, as ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.
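
To see why a signed conversion gives wrong results for values >= 2^63, and what an inline unsigned conversion can look like, here is a small model. It is a sketch of one common technique (subtract 2^63, convert as signed, then set the top bit) and is not claimed to be the exact sequence the patch emits. The helper names are hypothetical, and the out-of-range result of the signed conversion models the x86 "integer indefinite" value.

```python
TWO_63 = 2.0 ** 63
MASK_64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def fptosi_i64(x: float) -> int:
    """Model a truncating f64 -> signed i64 conversion, returned as a 64-bit pattern."""
    if -TWO_63 <= x < TWO_63:
        return int(x) & MASK_64
    # Out of range: model the x86 'integer indefinite' result.
    return SIGN_BIT


def fptoui_i64(x: float) -> int:
    """Model an f64 -> unsigned i64 conversion built on the signed one."""
    if x < TWO_63:
        return fptosi_i64(x)
    # Bring the value into signed range, convert, then restore the top bit.
    return fptosi_i64(x - TWO_63) ^ SIGN_BIT


big = TWO_63 + 2048.0  # exactly representable as a double
print(hex(fptosi_i64(big)))  # signed path loses the low bits
print(hex(fptoui_i64(big)))  # unsigned path keeps them
```

Negative inputs and values >= 2^64 are outside what this sketch is meant to show.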

Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245924 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-25 07:42:09 +00:00
Steve King
46ff6da860 Pass function attributes instead of boolean in isIntDivCheap().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245921 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-25 02:31:21 +00:00
Matthias Braun
56dd2d0886 MachineBasicBlock: Add liveins() method returning an iterator_range
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245895 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-24 22:59:52 +00:00
Michael Zuckerman
59dfeede45 [X86] Add support for mmword memory operand size for Intel-syntax x86 assembly
Differential Revision: http://reviews.llvm.org/D12151


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245835 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-24 10:26:54 +00:00
Michael Zuckerman
7b854fda4a first commit to llvm
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245825 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-24 07:48:50 +00:00
Sanjay Patel
bf83737adb [x86] enable machine combiner reassociations for 256-bit vector min/max
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245735 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21 21:04:21 +00:00
Sanjay Patel
6113df3d73 remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245733 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21 20:39:17 +00:00
Sanjay Patel
2071d7abd9 [x86] invert logic for attribute 'FeatureFastUAMem'
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.

Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if a
misaligned memory access of any size under 32 bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.

Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.

Differential Revision: http://reviews.llvm.org/D12154



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245729 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21 20:17:26 +00:00
Sanjay Patel
b730bdf4e9 [x86] enable machine combiner reassociations for 128-bit vector min/max
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245715 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21 18:06:49 +00:00
Eric Christopher
32ce343fe6 Fix typo - symetric -> symmetric.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245705 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21 16:23:39 +00:00
Ahmed Bougacha
14fea5bf0d [X86] Look for scalar through one bitcast when lowering to VBROADCAST.
Fixes PR23464: one way to use the broadcast intrinsics is:

  _mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));

We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.

Differential Revision: http://reviews.llvm.org/D10557


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245613 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-20 21:02:39 +00:00
Ahmed Bougacha
ad0ddd8e01 [X86] Replace avx2 broadcast intrinsics with native IR.
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and cleanup the definitions.

Differential Revision: http://reviews.llvm.org/D10555


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245606 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-20 20:36:19 +00:00
Marina Yatsina
4ca59ab261 [X86] Fix FBLD and FBSTP
FBLD and FBSTP should receive TBYTE because they are defined as
FBLD m80
FBSTP m80

Differential Revision: http://reviews.llvm.org/D11748



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245553 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-20 11:51:24 +00:00