40420 Commits

Author SHA1 Message Date
Artem Tamazov
3398735496 [AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed.
Fixes Bug 28215. Lit tests updated.

Differential Revision: https://reviews.llvm.org/D25837

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284825 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-21 14:49:22 +00:00
Simon Pilgrim
06bac82d8f [X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284823 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-21 13:00:47 +00:00
Simon Pilgrim
01109e8b73 [X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284821 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-21 12:14:24 +00:00
Bjorn Pettersson
46ab1d2bd2 [AArch64] Corrected spill size for DDD register class. NFCI
Summary:
The spill size was incorrectly set to 196 bits,
which isn't a multiple of 8. This problem was detected when
experimenting with asserts that the spill size should be a
multiple of the byte size.

New corrected value for the spill size is set to 192 bits.

Note that tablegen (RegisterInfoEmitter) will divide the
size set in the RegisterClass definition by 8. So this
change should not have any impact on the tablegen output
(trunc(192/8) == trunc(196/8) == 24 bytes).

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, rengolin

Differential Revision: https://reviews.llvm.org/D25818

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284814 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-21 09:53:42 +00:00
Michael Kuperstein
fe040323c6 [X86] Enable interleaved memory access by default
This lets the loop vectorizer generate interleaved memory accesses on x86.

Differential Revision: https://reviews.llvm.org/D25350


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284779 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 21:04:31 +00:00
Konstantin Zhuravlyov
37962abba1 [AMDGPU] Make note record name a static const member of target streamer
Differential Revision: https://reviews.llvm.org/D25746


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284760 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov
e0e811c4ce [AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only)
Differential Revision: https://reviews.llvm.org/D25693


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284759 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 18:12:38 +00:00
Simon Pilgrim
e1ac64bc87 [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors
We weren't checking for uniform const costs before the general cost, resulting in very high estimates.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284755 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 18:00:35 +00:00
Sanjay Patel
928f047b68 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284746 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 16:55:45 +00:00
Simon Pilgrim
99edc4fc3c [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.

We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284744 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 16:39:11 +00:00
Valery Pykhtin
446cd5eefe [AMDGPU] add fcopysign(f64, f32) pattern
Differential revision: https://reviews.llvm.org/D25827

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284743 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 16:17:54 +00:00
Benjamin Kramer
06d5a1641d Do a sweep over move ctors and remove those that are identical to the default.
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.

No functionality change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284721 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 12:20:28 +00:00
Jonas Paulsson
a007516937 [SystemZ] Post-RA scheduler implementation
Post-RA sched strategy and scheduling instruction annotations for z196, zEC12
and z13.

This scheduler optimizes decoder grouping and balances processor resources
(including side steering the FPd unit instructions).

The SystemZHazardRecognizer keeps track of the scheduling state, which can
be dumped with -debug-only=misched.

Reviers: Ulrich Weigand, Andrew Trick.
https://reviews.llvm.org/D17260

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284704 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 08:27:16 +00:00
Peter Collingbourne
f43c9c6d35 X86: Allow expressions to appear as u8imm operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284688 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 01:58:34 +00:00
Peter Collingbourne
bae0043048 X86: Deduplicate some lowering code. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284686 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 01:21:26 +00:00
Reid Kleckner
3a3c5c8815 Use __func__ directly now that all supported compilers support it
Remove the portability macro now that it is unused.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284681 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 00:22:23 +00:00
Wei Ding
cc8ca50286 AMDGPU : Add a function to enable and disable IEEEBit for SC and shader
respectively.

Differential Revision: http://reviews.llvm.org/D25789

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284655 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 22:34:49 +00:00
Krzysztof Parzyszek
735fbf86f3 [AMDGPU] Stop using MCRegisterClass::getSize()
Differential Review: https://reviews.llvm.org/D24675


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284619 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 17:40:36 +00:00
Krzysztof Parzyszek
cac70281f9 [RDF] Switch RefMap in liveness calculation to use lane masks
This required reengineering of some of the part of liveness calculation,
including fixing some issues caused by the limitations of the previous
approach. The current code is not necessarily the fastest, but it should
be functionally correct (at least more so than before). The compile-time
performance will be addressed in the future.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284609 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 16:30:56 +00:00
Chris Dewhurst
4a9c407929 [Sparc][LEON] Detects an erratum on UT699 LEON 3 processors involving rounding mode changes and issues an appropriate user error message.
Differential Revision: https://reviews.llvm.org/D24665

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284591 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 14:01:06 +00:00
Sjoerd Meijer
556bf4b535 Reapply r284571 (with the new tests fixed).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284588 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 13:43:02 +00:00
Ulrich Weigand
bc50ab0dfa [SystemZ] Add missing vector instructions for the assembler
Most z13 vector instructions have a base form where the data type of
the operation (whether to consider the vector to be 16 bytes, 8
halfwords, 4 words, or 2 doublewords) is encoded into a mask field,
and then a set of extended mnemonics where the mask field is not
present but the data type is encoded into the mnemonic name.

Currently, LLVM only supports the type-specific forms (since those
are really the ones needed for code generation), but not the base
type-generic forms.

To complete the assembler support and make it fully compatible with
the GNU assembler, this commit adds assembler aliases for all the
base forms of the various vector instructions.

It also adds two more alias forms that are documented in the PoP:
VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc.
VNOT -- special variant of VNO



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284586 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 13:03:18 +00:00
Ulrich Weigand
a91221aaa6 [SystemZ] Add optional argument to some vector string instructions
The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are
documented in the Principles of Operation to have an optional last
operand to encode arbitrary values in a mask field.

This commit adds support for those optional operands, and cleans up
the patterns to generate vector string instruction as bit.  No change
to code generation intended.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284585 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 12:57:46 +00:00
James Molloy
ab4e0362c7 [Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.

It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.

TBB example:
Before: lsls r0, r0, #2    After: add  r0, pc
        adr  r1, .LJTI0_0         ldrb r0, [r0, #6]
        ldr  r0, [r0, r1]         lsls r0, r0, #1
        mov  pc, r0               add  pc, r0
  => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.

The only case that can increase dynamic instruction count is the TBH case:

Before: lsls r0, r4, #2    After: lsls r4, r4, #1
        adr  r1, .LJTI0_0         add  r4, pc
        ldr  r0, [r0, r1]         ldrh r4, [r4, #6]
        mov  pc, r0               lsls r4, r4, #1
                                  add  pc, r4
  => 1 more instruction in prologue. Jump table shrunk by a factor of 2.

So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284580 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 12:06:49 +00:00
Sjoerd Meijer
c9cee26cf7 Revert of r284571 because of failing tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284572 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 07:45:48 +00:00
Sjoerd Meijer
9a54c88709 Checking FP function attribute values and adding more build attribute tests.
This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).

Differential Revision: https://reviews.llvm.org/D25625


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284571 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 07:25:06 +00:00
Craig Topper
cdb220aad5 [AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast.
Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.

New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.

There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert.

The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.

Reviewers: RKSimon, delena, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25651

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284567 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 04:44:17 +00:00
Eli Friedman
ed57153864 Improve ARM lowering for "icmp <2 x i64> eq".
The custom lowering is pretty straightforward: basically, just AND
together the two halves of a <4 x i32> compare.

Differential Revision: https://reviews.llvm.org/D25713



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284536 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 21:03:40 +00:00
Evandro Menezes
d9b0063780 [AArch64] Avoid materializing 0.0 when generating FP SELECT
Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0`
to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not
have to be materialized beforehand for FCMP, as it has a form that has 0.0
as an implicit operand.

Differential Revision: https://reviews.llvm.org/D24808

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284531 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 20:37:35 +00:00
Tim Northover
10519cde2d GlobalISel: select small binary operations on AArch64.
AArch64 actually supports many 8-bit operations under the definition used by
GlobalISel: the designated information-carrying bits of a GPR32 get the right
value if you just use the normal 32-bit instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284526 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 20:03:48 +00:00
Tim Northover
1fd8e0ac25 GlobalISel: support floating-point constants on AArch64.
Patch from Ahmed Bougacha.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284523 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 19:47:57 +00:00
Krzysztof Parzyszek
57144cfc1a [Hexagon] Handle block live-ins with lane masks in HexagonBlockRanges
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284522 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 19:47:20 +00:00
Benjamin Kramer
6e3c4da116 Reduce global namespace pollution. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284521 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 19:39:31 +00:00
Sanjay Patel
bbcb21daf0 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284513 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 18:36:49 +00:00
Sanjay Patel
5800d6e9a7 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284495 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 17:05:05 +00:00
Simon Pilgrim
cad4756e00 [X86][AVX512] Add mask/maskz writemask support to constant pool shuffle decode commentx
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284488 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 15:45:37 +00:00
Simon Dardis
4ec176ab25 [mips][ias] Handle more complicated expressions for memory operands
This patch teaches ias for mips to handle expressions such as
(8*4)+(8*31)($sp). Such expression typically occur from the expansion
of multiple macro definitions.

This partially resolves PR/30383.

Thanks to Sean Bruno for reporting the issue!

Reviewers: zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D24667


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284485 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 15:17:17 +00:00
Simon Dardis
8e2c689fe9 [mips] Fix sync instruction definition
The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands.
MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit
immediate.

This patch correct the definition of sync so that it is accepted with an
operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned
immediate for MIPS32 and later revisions.

Additionally a clear error is given when the MIPS32 version of sync is
used when targeting pre MIPS32.

This partially resolves PR/30714.

Thanks to Daniel Sanders for reporting this issue!

Reveiwers: vkalintiris

Differential Revision: https://reviews.llvm.org/D25672


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284483 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 14:42:13 +00:00
Simon Dardis
05fe9f3914 [mips] Macro expansion for ld, sd for O32
ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads
or stores using the specified source or destination register and the next
register.

This patch does not add support for the cases where the offset is greater than
a 16 bit signed immediate as that would lead to a wrong/misleading error
message as the assembler would report "instruction requires a CPU feature
not currently enabled" for ld & sd for MIPS64 when their offset is not a signed
16 bit number.

This fixes PR/29159.

Thanks to Sean Bruno for reporting this issue!

Reviewers: vkalintiris, seanbruno, zoran.jovanovic

Differential Review: https://reviews.llvm.org/D24556


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284481 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 14:28:00 +00:00
Michael Zuckerman
d5f902eb11 [x86][inline-asm][avx512] allow swapping of '{k<num>}' & '{z}' marks
Committing on behalf of Coby Tayree: After check-all and LGTM

Desc:

AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}')
 Currently, the following forms are allowed:
 {k<num>}
 {k<num>}{z}

This patch allows the following forms:
 {z}{k<num>}

and ignores the next form:
 {z}

Justification would be quite simple - GCC

Differential Revision: http://reviews.llvm.org/D25013



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284479 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 13:52:39 +00:00
Vasileios Kalintiris
289f83a7d4 [mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel.
Summary:
Instead of instantiating the MipsFastISel class and checking if the
target is supported in the overriden methods, we should perform that
check before creating the class. This allows us to enable FastISel *only*
for targets that truly support it, ie. MIPS32 to MIPS32R5.

Reviewers: sdardis

Subscribers: ehostunreach, llvm-commits

Differential Revision: https://reviews.llvm.org/D24824

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284475 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 13:05:42 +00:00
Javed Absar
729751583a [ARM] Assign cost of scaling for Cortex-R52
This patch assigns cost of the scaling used in addressing for Cortex-R52.

On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.

Differential Revision: http://reviews.llvm.org/D25670

Reviewer: jmolloy



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284460 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 09:08:54 +00:00
Simon Pilgrim
e615ec6e15 [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284459 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 07:42:15 +00:00
Dean Michael Berris
dfab4815c7 [XRay] Support for for tail calls for ARM no-Thumb
This patch adds simplified support for tail calls on ARM with XRay instrumentation.

Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14  -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program

    #include <cstdio>
    #include <cassert>
    #include <xray/xray_interface.h>

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
      std::printf("In fC()\n");
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
      std::printf("In fB()\n");
      fC();
    }

    [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
      std::printf("In fA()\n");
      fB();
    }

    // Avoid infinite recursion in case the logging function is instrumented (so calls logging
    //   function again).
    [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
    {
      printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
    }

    int main(int argc, char* argv[]) {
      __xray_set_handler(simplyPrint);

      printf("Patching...\n");
      __xray_patch();
      fA();

      printf("Unpatching...\n");
      __xray_unpatch();
      fA();

      return 0;
    }

gives the following output:

    Patching...
    XRay: functionId=3 type=0.
    In fA()
    XRay: functionId=3 type=1.
    XRay: functionId=2 type=0.
    In fB()
    XRay: functionId=2 type=1.
    XRay: functionId=1 type=0.
    XRay: functionId=1 type=1.
    In fC()
    Unpatching...
    In fA()
    In fB()
    In fC()

So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().

Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).

Differential Revision: https://reviews.llvm.org/D25030

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284456 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 05:54:15 +00:00
Craig Topper
63ae3007f1 [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284453 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 04:48:33 +00:00
Craig Topper
a97a64ce70 [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25666

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284451 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 04:00:32 +00:00
Craig Topper
d02b25359b [AVX-512] Add support for decoding shuffle mask from constant pool for masked VPERMILPS/PD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284450 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 03:36:52 +00:00
Konstantin Zhuravlyov
18560f1ee0 [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it
Differential Revision: https://reviews.llvm.org/D25694


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284435 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-17 22:40:15 +00:00
Tim Northover
087ac0bf03 GlobalISel: support wider range of load/store sizes in AArch64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284406 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-17 18:36:53 +00:00
Tom Stellard
0d8c32f076 AMDGPU/SI: LowerParameter() should be computing align based on memory type
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25203

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284398 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-17 16:56:19 +00:00