Commit Graph

7878 Commits

Author SHA1 Message Date
Simon Pilgrim
8f579ce1a6 [X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272319 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 22:03:15 +00:00
Simon Pilgrim
3ddec70a78 [X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics
Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ 

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272308 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 21:09:03 +00:00
Simon Pilgrim
f921bac68f [X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272307 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 20:53:12 +00:00
Simon Pilgrim
9ceba992f6 [X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts
512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272300 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 20:13:58 +00:00
Davide Italiano
a72ade5c07 Also fix a typo. Need more coffee today.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272278 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 17:06:01 +00:00
Davide Italiano
c4c43eaa95 Improve r272262, check that __stack_chk_guard is used.
Thanks to Rafael for the suggestion.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272277 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 17:04:38 +00:00
Davide Italiano
cbf7512550 Move stackguard test to X86/ directory as it's not generic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272264 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 15:16:58 +00:00
Igor Breger
de21197e48 [AVX512] Remove masked_move/blendm intrinsic from back-end.
This is complement patch to D21060.

Differential Revision: http://reviews.llvm.org/D21174

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272257 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 11:46:55 +00:00
Craig Topper
b867bf3ea9 [AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272252 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 07:49:08 +00:00
Craig Topper
cadff981d8 [X86] Fix a test I failed to re-generate in r272249.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272250 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 07:10:34 +00:00
Craig Topper
1b683873d6 [X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272249 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 07:06:38 +00:00
Dehao Chen
8153f63460 Revive http://reviews.llvm.org/D12778 to handle forward-hot-prob and backward-hot-prob consistently.
Summary:
Consider the following diamond CFG:

 A
/ \
B C
 \/
 D

Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.

Reviewers: djasper, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20989

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272203 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-08 21:30:12 +00:00
Simon Pilgrim
3aa4772f49 [X86][SSE4A] Regenerated SSE4A intrinsics tests
There are no VEX encoded versions of SSE4A instructions, make sure that AVX targets give the same output

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272060 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 21:15:45 +00:00
Etienne Bergeron
70cf01c276 [stack-protection] Add support for MSVC buffer security check
Summary:
This patch is adding support for the MSVC buffer security check implementation

The buffer security check is turned on with the '/GS' compiler switch.
  * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
  * To be added to clang here: http://reviews.llvm.org/D20347

Some overview of buffer security check feature and implementation:
  * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
  * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
  * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html


For the following example:
```
int example(int offset, int index) {
  char buffer[10];
  memset(buffer, 0xCC, index);
  return buffer[index];
}
```

The MSVC compiler is adding these instructions to perform stack integrity check:
```
        push        ebp  
        mov         ebp,esp  
        sub         esp,50h  
  [1]   mov         eax,dword ptr [__security_cookie (01068024h)]  
  [2]   xor         eax,ebp  
  [3]   mov         dword ptr [ebp-4],eax  
        push        ebx  
        push        esi  
        push        edi  
        mov         eax,dword ptr [index]  
        push        eax  
        push        0CCh  
        lea         ecx,[buffer]  
        push        ecx  
        call        _memset (010610B9h)  
        add         esp,0Ch  
        mov         eax,dword ptr [index]  
        movsx       eax,byte ptr buffer[eax]  
        pop         edi  
        pop         esi  
        pop         ebx  
  [4]   mov         ecx,dword ptr [ebp-4]  
  [5]   xor         ecx,ebp  
  [6]   call        @__security_check_cookie@4 (01061276h)  
        mov         esp,ebp  
        pop         ebp  
        ret  
```

The instrumentation above is:
  * [1] is loading the global security canary,
  * [3] is storing the local computed ([2]) canary to the guard slot,
  * [4] is loading the guard slot and ([5]) re-compute the global canary,
  * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling.

Overview of the current stack-protection implementation:
  * lib/CodeGen/StackProtector.cpp
    * There is a default stack-protection implementation applied on intermediate representation.
    * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
    * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
    * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
    * Guard manipulation and comparison are added directly to the intermediate representation.

  * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
  * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
    * There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls).
      * see long comment above 'class StackProtectorDescriptor' declaration.
    * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
      * 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
    * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.

  * include/llvm/Target/TargetLowering.h
    * Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'.

  * lib/Target/X86/X86ISelLowering.cpp
    * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.

Function-based Instrumentation:
  * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions.
  * To support function-based instrumentation, this patch is
    * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
      * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
    * modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation,
    * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
    * if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp).

Modifications
  * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
  * adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)

Results

  * IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```

```
*** Final LLVM Code input to ISel ***

; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
  %StackGuardSlot = alloca i8*                                                  <<<-- Allocated guard slot
  %0 = call i8* @llvm.stackguard()                                              <<<-- Loading Stack Guard value
  call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)                  <<<-- Prologue intrinsic call (store to Guard slot)
  %index.addr = alloca i32, align 4
  %offset.addr = alloca i32, align 4
  %buffer = alloca [10 x i8], align 1
  store i32 %index, i32* %index.addr, align 4
  store i32 %offset, i32* %offset.addr, align 4
  %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
  %1 = load i32, i32* %index.addr, align 4
  call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
  %2 = load i32, i32* %index.addr, align 4
  %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
  %3 = load i8, i8* %arrayidx, align 1
  %conv = sext i8 %3 to i32
  %4 = load volatile i8*, i8** %StackGuardSlot                                  <<<-- Loading Guard slot
  call void @__security_check_cookie(i8* %4)                                    <<<-- Epilogue function-based check
  ret i32 %conv
}
```

  * SelectionDAG generated instrumentation:

```
clang-cl /GS test.cc /O1 /c /FA
```

```
"?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
# BB#0:                                 # %entry
        pushl   %esi
        subl    $16, %esp
        movl    ___security_cookie, %eax                                        <<<-- Loading Stack Guard value
        movl    28(%esp), %esi
        movl    %eax, 12(%esp)                                                  <<<-- Store to Guard slot
        leal    2(%esp), %eax
        pushl   %esi
        pushl   $204
        pushl   %eax
        calll   _memset
        addl    $12, %esp
        movsbl  2(%esp,%esi), %esi
        movl    12(%esp), %ecx                                                  <<<-- Loading Guard slot
        calll   @__security_check_cookie@4                                      <<<-- Epilogue function-based check
        movl    %esi, %eax
        addl    $16, %esp
        popl    %esi
        retl
```

Reviewers: kcc, pcc, eugenis, rnk

Subscribers: majnemer, llvm-commits, hans, thakis, rnk

Differential Revision: http://reviews.llvm.org/D20346

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272053 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 20:15:35 +00:00
Simon Pilgrim
c9a83046c3 [X86][AVX512] Added 512-bit integer vector non-temporal load tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272016 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 15:12:47 +00:00
Simon Pilgrim
6ad3e358e9 [X86][SSE] Add general lowering of nontemporal vector loads
Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.

This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.

We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction.

There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.

Differential Review: http://reviews.llvm.org/D20965

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272010 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 13:34:24 +00:00
Igor Breger
7e0019d8f7 [AVX512] Fix load opcode for fast isel.
Differential Revision: http://reviews.llvm.org/D21067

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272006 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 13:08:45 +00:00
Simon Pilgrim
c1e27cc453 [X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly
We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened.

This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272003 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 12:20:14 +00:00
Igor Breger
5238dbe213 [KNL] Fix UMULO lowering.
Differential Revision: http://reviews.llvm.org/D21013

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271891 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-06 12:24:52 +00:00
Craig Topper
6dbfac925f [AVX512] Remove masked palignr intrinsics and auto-upgrade them to native IR of vector shuffle and select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271872 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-06 06:12:54 +00:00
Craig Topper
856b53e006 [AVX512] Add PALIGNR shuffle lowering for v32i16 and v16i32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271870 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-06 05:39:10 +00:00
Craig Topper
5cc4ee2898 [AVX512] Update tests to show shuffle decoding for vpshuflw/vpshufhw.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271869 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-06 05:39:07 +00:00
Simon Pilgrim
7705b591df [X86][XOP] Added VPERMIL2PD/VPERMIL2PS raw mask decoding for target shuffle combines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271834 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-05 15:21:30 +00:00
Simon Pilgrim
2d63358b82 [X86][XOP] Added VPERMIL2PD/VPERMIL2PS as a target shuffle type
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271831 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-05 15:01:45 +00:00
Craig Topper
3789c0fe24 [AVX512] Add support for lowering PALIGNR for v64i8.
Could do this for other types to, but this is what's needed to replace the instrinsic with native IR in clang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271828 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-05 06:29:12 +00:00
Craig Topper
4b35a62824 [AVX512] Split command lines and regenerate a test to prepare for a future commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271827 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-05 06:29:08 +00:00
Craig Topper
dd17bc5daa [AVX512] Fix PANDN combining for v4i32/v8i32 when VLX is enabled.
v4i32/v8i32 ANDs aren't promoted to v2i64/v4i64 when VLX is enabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271826 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-05 05:35:11 +00:00
Simon Pilgrim
b6dac61a73 [X86][XOP] Added VPERMIL2PD/VPERMIL2PS shuffle mask comment decoding
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271809 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-04 21:44:28 +00:00
Saleem Abdulrasool
9195dbacc0 X86: enable TLS on Windows itanium
Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for
C++).  Enable the TLS support for the target similar to the MSVC model.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271797 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-04 18:27:22 +00:00
Simon Pilgrim
ebbbdf51f2 [X86][AVX2] Fix v16i16 SHL lowering (PR27730)
The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result.

The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271796 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-04 16:45:33 +00:00
Simon Pilgrim
644a7d6a0c [X86][AVX512] Fixed 512-bit vector nontemporal load alignment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271673 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 14:12:43 +00:00
Simon Pilgrim
181a1d131a [X86][AVX512] Added 512-bit vector nontemporal load tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271668 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 13:42:49 +00:00
Simon Pilgrim
209a26795d [X86][SSE] Added nontemporal load tests
These currently all lower to regular loads, generic nontemporal load support will be added in a future patch

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271659 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 11:00:55 +00:00
Simon Pilgrim
502de5f49d [X86] Added nontemporal scalar store tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271656 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 10:30:54 +00:00
Simon Pilgrim
2306053194 [X86][SSE] Regenerated nontemporal vector store tests and added extra target types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271654 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 10:24:24 +00:00
Simon Pilgrim
c5409a6b1c [X86] Regenerated nontemporal store tests and added tests for all 128-bit vector types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271651 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 10:15:36 +00:00
Simon Pilgrim
693ef963a3 [X86][AVX2] Relaxed alignment on nontemporal store tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271646 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 10:06:59 +00:00
Simon Pilgrim
d396013727 [X86][AVX2] Regenerated nontemporal store tests and added tests for all 256-bit vector types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271645 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 09:56:24 +00:00
Simon Pilgrim
f940424957 [X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions
This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage.

The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls.

Mask decoding/target shuffle support will be added in future patches.

Differential Revision: http://reviews.llvm.org/D20049

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271633 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 08:06:03 +00:00
Craig Topper
e94dfe2269 [AVX512] Ensure EVEX vpshufd, vpshuflw, and vpshufhw have isel priority over the VEX encoded ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271629 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 05:31:04 +00:00
Craig Topper
1b4de08d95 [AVX512] Fix shuffle comment printing for EVEX encoded PSHUFD, PSHUFHW, and PSHUFLW.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271628 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 05:31:00 +00:00
Simon Pilgrim
848232c1e9 [X86][SSE] Added SSE41/AVX2 non-temporal tests
Useful for when we add MOVNTDQA support

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271552 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 18:01:21 +00:00
Dimitry Andric
f9f8cdc1e4 Only attempt to detect AVG if SSE2 is available
Summary:
In PR29973 Sanjay Patel reported an assertion failure when a certain
loop was optimized, for a target without SSE2 support.  It turned out
this was because of the AVG pattern detection introduced in rL253952.

Prevent the assertion failure by bailing out early in
`detectAVGPattern()`, if the target does not support SSE2.

Also add a minimized test case.

Reviewers: congh, eli.friedman, spatel

Subscribers: emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D20905


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271548 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 17:30:49 +00:00
Sanjay Patel
ec6796196a [DAG] use getBitcast() to reduce code
Although this was intended to be NFC, the test case wiggle shows a change in
code scheduling/RA caused by a difference in the SDLoc() generation.

Depending on how you look at it, this is the (dis)advantage of exact checking
in regression tests.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271526 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 16:01:15 +00:00
Simon Pilgrim
c83e4f9067 [X86][SSE] Added non-temporal load tests for vector types
These currently lower to regular loads instead of MOVNTDQA

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271516 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 13:51:50 +00:00
Simon Pilgrim
22d789345f [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (llvm)
This patch removes the llvm intrinsics (V)CVTTPS2DQ and VCVTTPD2DQ truncation (round to zero) conversions and auto-upgrades to FP_TO_SINT calls instead.

Note: I looked at updating CVTTPD2DQ as well but this still requires a lot more work to correctly lower.

Differential Revision: http://reviews.llvm.org/D20860

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271510 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 10:55:21 +00:00
Craig Topper
f543143a73 [X86] Add AVX 256-bit load and stores to fast isel.
I'm not sure why this was missing for so long.

This also exposed that we were picking floating point 256-bit VMOVNTPS for some integer types in normal isel for AVX1 even though VMOVNTDQ is available. In practice it doesn't matter due to the execution dependency fix pass, but it required extra isel patterns. Fixing that in a follow up commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271481 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 04:19:45 +00:00
Craig Topper
11a12c2c07 [AVX512] Remove masked load intrinsics. Clang now emits generic masked load intrinsics instead.
The intrinsics will be autoupgraded to the same generic masked loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271478 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 04:19:36 +00:00
Sanjay Patel
7cff23cad5 [x86, AVX2] regenerate checks
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271434 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-01 21:32:56 +00:00
Michael Kuperstein
d99a2e3ae2 [DAG] Improve legalization of INSERT_SUBVECTOR
When the index is known to be constant 0, insert directly into the the low half,
instead of spilling, performing the insert in-memory, and reloading.

Differential Revision: http://reviews.llvm.org/D20763


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271428 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-01 20:49:35 +00:00