Commit Graph

24874 Commits

Author SHA1 Message Date
Zvi Rackover
32d2ff0d0f X86 Tests: Add a case for combining sdiv by a splatted pow2 negative. NFC.
Noticed test was missing while working on D42479.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329356 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 21:57:20 +00:00
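For context on the kind of fold this test case exercises, here is a minimal Python sketch. It assumes the usual identity that signed division by a negated power of two equals the negation of signed division by that power of two, computed with a bias and an arithmetic shift. The helper names are illustrative, and this is not necessarily the exact sequence the X86 backend emits.

```python
def trunc_div(x: int, d: int) -> int:
    """Reference signed division that rounds toward zero, like LLVM's sdiv."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q


def sdiv_neg_pow2(x: int, k: int, bits: int = 32) -> int:
    """Divide x by -(2**k) using only shift, mask, add and negate.

    Assumes x fits in a signed `bits`-wide integer and 0 <= k < bits - 1.
    """
    sign_mask = x >> (bits - 1)        # -1 if x is negative, else 0 (arithmetic shift)
    bias = sign_mask & ((1 << k) - 1)  # 2**k - 1 for negative x, else 0
    quotient = (x + bias) >> k         # x / 2**k, rounded toward zero
    return -quotient                   # the divisor is negative, so negate
```

The bias is what turns the floor behaviour of an arithmetic shift into round-toward-zero for negative dividends; in a splatted vector case the same steps would apply per lane.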
Craig Topper
efb0b0966d [X86] Separate CDQ and CDQE in the scheduler model.
According to Agner's data, CDQE is closer to CWDE.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329354 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 21:56:19 +00:00
Craig Topper
d5783cbc20 [X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329351 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 21:40:32 +00:00
Craig Topper
f828e9316d [X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge.
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.

The Sandy Bridge version was missing a load port use.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329347 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 21:16:26 +00:00
Simon Pilgrim
62cc26416d [X86][SSE] Add floating point add/mul fast-math vector.reduce tests
Strict versions aren't working at all (PR36732) and the accumulators aren't supported (PR36734)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329344 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 21:01:21 +00:00
Simon Pilgrim
23bf3d86ba [X86][SSE] Add floating point min/max vector.reduce tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329343 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 20:54:55 +00:00
Konstantin Zhuravlyov
ae3b2037b4 AMDGPU/Metadata: Always report a fixed number of hidden arguments
Currently it is 6. If the "feature" was not used, report a dummy
hidden argument. Otherwise it does not match the kernarg size
reported in the kernel header.

Differential Revision: https://reviews.llvm.org/D45129


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329341 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 20:46:04 +00:00
Craig Topper
b49f64bf05 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329339 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 20:04:06 +00:00
Craig Topper
8f85f224b9 [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
Mostly vector load, store, and move instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329330 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 18:38:45 +00:00
Simon Pilgrim
f614f36c69 [X86][SSE] Add integer add/mul vector.reduce tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329321 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 17:37:35 +00:00
Simon Pilgrim
f096c30c1e [X86][SSE] Add integer and/or/xor vector.reduce tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329320 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 17:29:51 +00:00
Simon Pilgrim
34841016c3 [X86][SSE] Add integer min/max vector.reduce tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329319 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 17:25:40 +00:00
Sam Clegg
13c09234d6 [WebAssembly] Allow for the creation of user-defined custom sections
This patch adds a way for users to create their own custom sections to
be added to wasm files. At the LLVM IR layer, they are defined through
the "wasm.custom_sections" named metadata. The expected use case for
this is bindings generators such as wasm-bindgen.

Patch by Dan Gohman

Differential Revision: https://reviews.llvm.org/D45297

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329315 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 17:01:39 +00:00
Tim Northover
1444c19a6b ARM: Do not spill CSR to stack on entry to noreturn functions
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.

Should fix PR9970.

Patch by myeisha (pmb).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329287 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 14:26:06 +00:00
Sam Parker
b326c6e97a [DAGCombine] Revert r329160
Again, broke the big endian stage 2 builders.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329283 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 13:46:17 +00:00
Simon Dardis
60da41d8b5 [mips] Regenerate test before posting patch for constant multiplication (NFC)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329268 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 10:30:17 +00:00
Craig Topper
19636dfbd8 [X86] Revert r329251-329254
It's failing on the bots and I'm not sure why.

This reverts:

[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329256 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 05:19:36 +00:00
Craig Topper
67c388f5a1 [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
Mostly vector load, store, and move instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329254 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 04:42:03 +00:00
Craig Topper
d4b8b33a93 [X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329252 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 04:42:01 +00:00
Craig Topper
224b702752 [X86] Auto-generate complete checks. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329251 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 04:41:59 +00:00
Puyan Lotfi
a7f9b6aaad [MIR-Canon] Improving performance by switching to named vregs.
No more skipping thousands of vregs. Much faster running time.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329246 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 00:27:15 +00:00
Puyan Lotfi
29bc6472de [MIR-Canon] Adding support for multi-def -> user distance reduction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329243 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 00:08:15 +00:00
Peter Collingbourne
70bca66be3 AArch64: Implement support for the shadowcallstack attribute.
The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.

Differential Revision: https://reviews.llvm.org/D45239

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329236 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 21:55:44 +00:00
Craig Topper
84038c4e88 [X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models.
The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existent BSWAP16r.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329211 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 17:54:19 +00:00
Lei Huang
200eeca319 [Power9]Legalize and emit code for quad-precision fma instructions
Legalize and emit code for the following quad-precision fma:

  * xsmaddqp
  * xsnmaddqp
  * xsmsubqp
  * xsnmsubqp

Differential Revision: https://reviews.llvm.org/D44843

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329206 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 16:43:50 +00:00
Nicolai Haehnle
83bfebdaca AMDGPU: Dimension-aware image intrinsics
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.

This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.

Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.

v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI

Change-Id: I099f309e0a394082a5901ea196c3967afb867f04

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44939

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329166 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 10:58:54 +00:00
Nicolai Haehnle
126cd7e831 AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.

There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".

Fixes a bug encountered in Nier: Automata.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329164 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 10:57:58 +00:00
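The problem described in the commit above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the wave is modelled as a list of lanes, inactive lanes are assumed to contribute 0 to the mask written in an iteration, and the function name and the "iteration is even" condition are made up for the example.

```python
def run_divergent_loop(trip_counts):
    """Simulate a wave where lane i runs trip_counts[i] iterations (each >= 1).

    Inside the loop every active lane computes cond = (iteration is even).
    Returns (last_iter_mask, merged_mask) as lists of booleans.
    """
    lanes = range(len(trip_counts))
    active = [True] * len(trip_counts)
    last_iter_mask = [False] * len(trip_counts)
    merged_mask = [False] * len(trip_counts)
    iteration = 0
    while any(active):
        # Mask written in this iteration: only active lanes contribute a bit.
        last_iter_mask = [active[i] and iteration % 2 == 0 for i in lanes]
        for i in lanes:
            if active[i]:
                # Keep each lane's own most recent value.
                merged_mask[i] = last_iter_mask[i]
                if iteration + 1 == trip_counts[i]:
                    active[i] = False  # this lane leaves the loop here
        iteration += 1
    return last_iter_mask, merged_mask
```

With `trip_counts = [1, 2]`, lane 0 leaves after the first iteration holding a true value, but the mask from the final iteration no longer carries that bit, so a use after the loop needs the merged per-lane value rather than the last mask.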
John Brawn
107f6100d7 [AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y)
Differential Revision: https://reviews.llvm.org/D44573


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329163 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 10:12:53 +00:00
Sam Parker
9b4235c03c [DAGCombine] Improve ReduceLoadWidth for SRL
Recommitting rL321259. Previously this caused an issue with PPCBE but
I didn't receive a reproducer and didn't have the time to follow up.
If the issue appears again, please provide a reproducer so I can fix
it.

Original commit message:

If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.

Differential Revision: https://reviews.llvm.org/D41350


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329160 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 09:26:56 +00:00
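The idea in the original commit message above, narrowing a load when its shifted value is only used under a mask, can be sketched in Python. This is an illustration under assumptions, not the DAGCombine code: shifts and masks are restricted to whole bytes, and the function names are hypothetical. The byte offset of the narrow load depends on endianness, which the sketch models explicitly.

```python
def wide_load_then_mask(mem: bytes, shift_bytes: int, width_bytes: int, byteorder: str) -> int:
    """Load all of `mem` as one integer, shift right, then AND with a mask."""
    value = int.from_bytes(mem, byteorder)
    mask = (1 << (8 * width_bytes)) - 1
    return (value >> (8 * shift_bytes)) & mask


def narrow_load(mem: bytes, shift_bytes: int, width_bytes: int, byteorder: str) -> int:
    """Load only the bytes that survive the shift and mask (zero-extended)."""
    if byteorder == "little":
        offset = shift_bytes
    else:
        # On big-endian, low-order bytes sit at the high addresses.
        offset = len(mem) - shift_bytes - width_bytes
    return int.from_bytes(mem[offset:offset + width_bytes], byteorder)
```

For the four bytes `11 22 33 44`, shifting by one byte and masking to one byte selects the byte at offset 1 on little-endian and the byte at offset 2 on big-endian.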
Vlad Tsyrklevich
484fd96051 Add the ShadowCallStack pass
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.

Reviewers: pcc, vitalybuka

Reviewed By: pcc

Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc

Differential Revision: https://reviews.llvm.org/D44802

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329139 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 01:21:16 +00:00
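The prolog and epilog behaviour described above can be modelled with a toy simulation. This is a sketch only: the `gs` segment is represented as a Python list whose slot 0 holds the current offset, offsets count slots rather than bytes, and the class and exception names are invented for the example.

```python
class ShadowStackMismatch(Exception):
    """Raised when the stack's return address differs from the shadow copy."""


class ShadowCallStack:
    def __init__(self, slots: int = 64):
        # gs[0] holds the offset of the most recent entry; 0 means empty.
        self.gs = [0] * slots

    def prolog(self, return_address: int) -> None:
        """Save the return address at [gs:offset] and update [gs:0]."""
        offset = self.gs[0] + 1
        self.gs[0] = offset
        self.gs[offset] = return_address

    def epilog(self, return_address_on_stack: int) -> int:
        """Pop the saved address and check it against the one on the stack."""
        offset = self.gs[0]
        saved = self.gs[offset]
        self.gs[0] = offset - 1
        if saved != return_address_on_stack:
            raise ShadowStackMismatch(hex(return_address_on_stack))
        return saved
```

A function whose on-stack return address was overwritten would fail the comparison in `epilog` before returning, which is the check the commit message describes.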
Jessica Paquette
ddbccbd6f6 [MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.

Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.

This also adds a new test for the new redzone behaviour.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329134 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 23:32:41 +00:00
Farhana Aleen
a59291c1f6 [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D45219

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329131 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 23:00:30 +00:00
Jessica Paquette
5d2f4dba66 [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.

This removes the requirement to pass -mno-red-zone when outlining for AArch64.

https://reviews.llvm.org/D45189



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329120 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 21:56:10 +00:00
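The three-valued result described above (true, false, or no value) maps naturally onto an optional boolean. The sketch below is an assumption about how a caller might consume it, treating "unknown" conservatively as "may use a red zone"; the function name is hypothetical and this is not the outliner's actual code.

```python
from typing import Optional


def can_outline_from(has_red_zone: Optional[bool]) -> bool:
    """Decide whether outlining is safe given a tri-state red zone answer.

    True  -> the function is known to use a red zone: not safe.
    False -> the function is known not to use a red zone: safe.
    None  -> unknown: assumed unsafe here, as a conservative choice.
    """
    return has_red_zone is False
```

Comparing with `is False` rather than using `not has_red_zone` keeps `None` from being mistaken for a known "no".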
Farhana Aleen
d82ffe5dae Revert "MSG"
This reverts commit 9a0ce889d1.

This was committed by mistake.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329119 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 21:51:45 +00:00
Jessica Paquette
4831763daa [MachineOutliner][NFC] Make outlined functions have internal linkage
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.

This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329116 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 21:36:00 +00:00
Farhana Aleen
9a0ce889d1 MSG
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329114 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 21:20:39 +00:00
Sanjay Patel
8c6709f8ef [x86] add tests for convert-FP-to-integer with constants; NFC
We don't constant fold any of these, but we could...but if we
do, we must produce the right answer.

Unlike the IR fptosi instruction or its DAG node counterpart 
ISD::FP_TO_SINT, these are not undef for an out-of-range input.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329100 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 18:34:56 +00:00
Krzysztof Parzyszek
22950265a5 [Hexagon] Remove unneeded attributes from lit test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329078 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 16:05:20 +00:00
Chandler Carruth
7e78daafdd [x86] Fix a pretty obvious think-o with my asm scrubbing. You have to in
fact use regular expression syntax to use regular expressions.

Should restore the bots. Sorry for the noise on this test.

Thanks to Philip for spotting the bug!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329057 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 10:28:56 +00:00
Chandler Carruth
18ceb931fd [x86] Clean up and enhance a test around eflags copying.
This adds the basic test cases from all the EFLAGS bugs in more direct
forms. It also switches to generated check lines, and includes both
32-bit and 64-bit variations.

No functionality changing here, just setting things up to have a nice
clean asm diff in my EFLAGS patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329056 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 10:04:37 +00:00
Chandler Carruth
6f46178ea5 [x86] Extend my goofy SP offset scrubbing for llc test cases to actually
do explicit scrubbing of the offsets of stack spills and reloads.

You can always turn this off in order to test specific stack slot usage.
We were already hiding most of this, but the new logic hides it more
generically. Notably, we should effectively hide stack slot churn in
functions that have a frame pointer now, and should also hide it when
changing a function from stack pointer to frame pointer. That transition
already changes enough to be clearly noticed in the test case diff,
showing *every* spill and reload is really noisy without benefit. See
the test case I ran this on as a classic example.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329055 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 09:57:05 +00:00
Yonghong Song
86ab905c1d bpf: fix incorrect SELECT_CC lowering
Commit 37962a331c ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
  (1). In code, just swap is not enough, ConditionalCode CC
       should also be swapped, otherwise incorrect code will
       be generated.
  (2). The ConditionalCode swap should be subject to
       getHasJmpExt(). If getHasJmpExt() is False, certain
       conditional codes will not be supported and swap
       may generate incorrect code.

The original goal for this patch is to optimize jmp operations
which do not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329043 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 03:56:37 +00:00
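Point (1) in the commit above, that swapping the operands of a comparison also requires swapping the condition code, can be checked with a short Python sketch. The condition code names and helper functions here are illustrative and are not the BPF backend's identifiers.

```python
import operator

# Illustrative condition codes mapped to their comparison functions.
PREDICATES = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}

# Condition code to use after the two compared operands are exchanged.
SWAPPED = {"eq": "eq", "ne": "ne", "lt": "gt", "le": "ge", "gt": "lt", "ge": "le"}


def select_cc(lhs, rhs, cc, true_val, false_val):
    """Return true_val if `lhs cc rhs` holds, otherwise false_val."""
    return true_val if PREDICATES[cc](lhs, rhs) else false_val


def select_cc_swapped(lhs, rhs, cc, true_val, false_val):
    """Same result as select_cc, but with operands and condition code swapped."""
    return select_cc(rhs, lhs, SWAPPED[cc], true_val, false_val)
```

Swapping only the operands gives a different answer: `select_cc(1, 2, "lt", "T", "F")` selects `"T"`, while `select_cc(2, 1, "lt", "T", "F")` selects `"F"`.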
Chandler Carruth
bab9d7badc [x86] Tidy up test case, generate check lines with script. NFC.
Just adds basic block labels and tidies up where comments go in the test
case and then generates fresh CHECK lines with the script. This way, the
check lines are much easier to maintain. They were already close to this
but not quite there.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329040 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 02:19:05 +00:00
Rafael Espindola
aa2d256782 Align stubs for external and common global variables to pointer size.
This patch fixes PR36885: clang++ generates unaligned stub symbol
holding a pointer.

Patch by Rahul Chaudhry!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329030 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 23:20:30 +00:00
Lama Saba
3c03a2ac26 [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs when a store cannot be forwarded to the load. The most typical case of a store forward block on the Intel Core microarchitecture is that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.

This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.

The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies.
Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.

Differential revision: https://reviews.llvm.org/D41330

Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328973 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 13:48:28 +00:00
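The "break the memcpy into smaller copies" idea from the commit above can be sketched as a range-splitting helper. This is a simplification under assumptions: the real pass works on machine instructions and legal load/store sizes, while this hypothetical function only splits a copied byte range around one earlier, smaller store so that the stored bytes are reloaded by a piece of matching size.

```python
def split_copy(copy_start: int, copy_size: int, store_start: int, store_size: int):
    """Split the copy [copy_start, copy_start + copy_size) around a prior store.

    Returns a list of (offset, size) pieces that together cover the copy.
    The piece overlapping the store matches the overlap exactly, so a load
    of that piece could be forwarded from the store.
    """
    copy_end = copy_start + copy_size
    lo = max(copy_start, store_start)
    hi = min(copy_end, store_start + store_size)
    if lo >= hi:
        return [(copy_start, copy_size)]  # no overlap, keep the copy whole
    pieces = []
    if copy_start < lo:
        pieces.append((copy_start, lo - copy_start))  # bytes before the store
    pieces.append((lo, hi - lo))                      # bytes written by the store
    if hi < copy_end:
        pieces.append((hi, copy_end - hi))            # bytes after the store
    return pieces
```

For a 32-byte copy preceded by a 4-byte store at offset 8, the helper yields three pieces covering bytes 0 to 8, 8 to 12, and 12 to 32.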
Craig Topper
1ad5730bb9 [X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model.
Data taken from Table 16-17 in the Intel Optimization Manual.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328962 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 06:34:16 +00:00
Craig Topper
1936d3b892 [X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide.
Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328961 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 05:54:34 +00:00
Craig Topper
d9e6efc2fb [X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models.
Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328960 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 05:33:28 +00:00
Craig Topper
0036663aec [X86] Fix the SchedRW for AVX512 shift instructions.
It was being inadvertently defaulted to an FADD scheduler class.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328959 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 03:15:02 +00:00
Craig Topper
8b17c46f75 [X86] Add an itinerary to BTR64rr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328956 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 01:12:34 +00:00