RPCS3/llvm - llvm - Gitea: Git with a cup of tea

RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-04-13 03:21:07 +00:00

Author	SHA1	Message	Date
Artem Tamazov	3398735496	[AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed. Fixes Bug 28215. Lit tests updated. Differential Revision: https://reviews.llvm.org/D25837 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284825 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-21 14:49:22 +00:00
Simon Pilgrim	06bac82d8f	[X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284823 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-21 13:00:47 +00:00
Simon Pilgrim	01109e8b73	[X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284821 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-21 12:14:24 +00:00
Bjorn Pettersson	46ab1d2bd2	[AArch64] Corrected spill size for DDD register class. NFCI Summary: The spill size was incorrectly set to 196 bits, which isn't a multiple of 8. This problem was detected when experimenting with asserts that the spill size should be a multiple of the byte size. New corrected value for the spill size is set to 192 bits. Note that tablegen (RegisterInfoEmitter) will divide the size set in the RegisterClass definition by 8. So this change should not have any impact on the tablegen output (trunc(192/8) == trunc(196/8) == 24 bytes). Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D25818 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284814 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-21 09:53:42 +00:00
Michael Kuperstein	fe040323c6	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284779 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 21:04:31 +00:00
Konstantin Zhuravlyov	37962abba1	[AMDGPU] Make note record name a static const member of target streamer Differential Revision: https://reviews.llvm.org/D25746 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284760 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov	e0e811c4ce	[AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only) Differential Revision: https://reviews.llvm.org/D25693 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284759 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 18:12:38 +00:00
Simon Pilgrim	e1ac64bc87	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284755 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 18:00:35 +00:00
Sanjay Patel	928f047b68	[Target] remove TargetRecip class; 2nd try This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284746 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 16:55:45 +00:00
Simon Pilgrim	99edc4fc3c	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284744 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 16:39:11 +00:00
Valery Pykhtin	446cd5eefe	[AMDGPU] add fcopysign(f64, f32) pattern Differential revision: https://reviews.llvm.org/D25827 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284743 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 16:17:54 +00:00
Benjamin Kramer	06d5a1641d	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284721 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 12:20:28 +00:00
Jonas Paulsson	a007516937	[SystemZ] Post-RA scheduler implementation Post-RA sched strategy and scheduling instruction annotations for z196, zEC12 and z13. This scheduler optimizes decoder grouping and balances processor resources (including side steering the FPd unit instructions). The SystemZHazardRecognizer keeps track of the scheduling state, which can be dumped with -debug-only=misched. Reviers: Ulrich Weigand, Andrew Trick. https://reviews.llvm.org/D17260 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284704 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 08:27:16 +00:00
Peter Collingbourne	f43c9c6d35	X86: Allow expressions to appear as u8imm operands. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284688 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 01:58:34 +00:00
Peter Collingbourne	bae0043048	X86: Deduplicate some lowering code. NFCI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284686 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 01:21:26 +00:00
Reid Kleckner	3a3c5c8815	Use __func__ directly now that all supported compilers support it Remove the portability macro now that it is unused. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284681 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-20 00:22:23 +00:00
Wei Ding	cc8ca50286	AMDGPU : Add a function to enable and disable IEEEBit for SC and shader respectively. Differential Revision: http://reviews.llvm.org/D25789 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284655 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 22:34:49 +00:00
Krzysztof Parzyszek	735fbf86f3	[AMDGPU] Stop using MCRegisterClass::getSize() Differential Review: https://reviews.llvm.org/D24675 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284619 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 17:40:36 +00:00
Krzysztof Parzyszek	cac70281f9	[RDF] Switch RefMap in liveness calculation to use lane masks This required reengineering of some of the part of liveness calculation, including fixing some issues caused by the limitations of the previous approach. The current code is not necessarily the fastest, but it should be functionally correct (at least more so than before). The compile-time performance will be addressed in the future. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284609 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 16:30:56 +00:00
Chris Dewhurst	4a9c407929	[Sparc][LEON] Detects an erratum on UT699 LEON 3 processors involving rounding mode changes and issues an appropriate user error message. Differential Revision: https://reviews.llvm.org/D24665 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284591 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 14:01:06 +00:00
Sjoerd Meijer	556bf4b535	Reapply r284571 (with the new tests fixed). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284588 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 13:43:02 +00:00
Ulrich Weigand	bc50ab0dfa	[SystemZ] Add missing vector instructions for the assembler Most z13 vector instructions have a base form where the data type of the operation (whether to consider the vector to be 16 bytes, 8 halfwords, 4 words, or 2 doublewords) is encoded into a mask field, and then a set of extended mnemonics where the mask field is not present but the data type is encoded into the mnemonic name. Currently, LLVM only supports the type-specific forms (since those are really the ones needed for code generation), but not the base type-generic forms. To complete the assembler support and make it fully compatible with the GNU assembler, this commit adds assembler aliases for all the base forms of the various vector instructions. It also adds two more alias forms that are documented in the PoP: VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc. VNOT -- special variant of VNO git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284586 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 13:03:18 +00:00
Ulrich Weigand	a91221aaa6	[SystemZ] Add optional argument to some vector string instructions The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are documented in the Principles of Operation to have an optional last operand to encode arbitrary values in a mask field. This commit adds support for those optional operands, and cleans up the patterns to generate vector string instruction as bit. No change to code generation intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284585 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 12:57:46 +00:00
James Molloy	ab4e0362c7	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284580 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 12:06:49 +00:00
Sjoerd Meijer	c9cee26cf7	Revert of r284571 because of failing tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284572 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 07:45:48 +00:00
Sjoerd Meijer	9a54c88709	Checking FP function attribute values and adding more build attribute tests. This renames the function for checking FP function attribute values and also adds more build attribute tests (which are in separate files because build attributes are set per file). Differential Revision: https://reviews.llvm.org/D25625 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284571 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 07:25:06 +00:00
Craig Topper	cdb220aad5	[AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast. Summary: This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors. New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions. There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert. The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns. Reviewers: RKSimon, delena, igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25651 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284567 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-19 04:44:17 +00:00
Eli Friedman	ed57153864	Improve ARM lowering for "icmp <2 x i64> eq". The custom lowering is pretty straightforward: basically, just AND together the two halves of a <4 x i32> compare. Differential Revision: https://reviews.llvm.org/D25713 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284536 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 21:03:40 +00:00
Evandro Menezes	d9b0063780	[AArch64] Avoid materializing 0.0 when generating FP SELECT Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0` to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not have to be materialized beforehand for FCMP, as it has a form that has 0.0 as an implicit operand. Differential Revision: https://reviews.llvm.org/D24808 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284531 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 20:37:35 +00:00
Tim Northover	10519cde2d	GlobalISel: select small binary operations on AArch64. AArch64 actually supports many 8-bit operations under the definition used by GlobalISel: the designated information-carrying bits of a GPR32 get the right value if you just use the normal 32-bit instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284526 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 20:03:48 +00:00
Tim Northover	1fd8e0ac25	GlobalISel: support floating-point constants on AArch64. Patch from Ahmed Bougacha. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284523 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 19:47:57 +00:00
Krzysztof Parzyszek	57144cfc1a	[Hexagon] Handle block live-ins with lane masks in HexagonBlockRanges git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284522 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 19:47:20 +00:00
Benjamin Kramer	6e3c4da116	Reduce global namespace pollution. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284521 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 19:39:31 +00:00
Sanjay Patel	bbcb21daf0	revert r284495: [Target] remove TargetRecip class There's something wrong with the StringRef usage while parsing the attribute string. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284513 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 18:36:49 +00:00
Sanjay Patel	5800d6e9a7	[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284495 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 17:05:05 +00:00
Simon Pilgrim	cad4756e00	[X86][AVX512] Add mask/maskz writemask support to constant pool shuffle decode commentx git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284488 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 15:45:37 +00:00
Simon Dardis	4ec176ab25	[mips][ias] Handle more complicated expressions for memory operands This patch teaches ias for mips to handle expressions such as (84)+(831)($sp). Such expression typically occur from the expansion of multiple macro definitions. This partially resolves PR/30383. Thanks to Sean Bruno for reporting the issue! Reviewers: zoran.jovanovic, vkalintiris Differential Revision: https://reviews.llvm.org/D24667 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284485 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 15:17:17 +00:00
Simon Dardis	8e2c689fe9	[mips] Fix sync instruction definition The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands. MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit immediate. This patch correct the definition of sync so that it is accepted with an operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned immediate for MIPS32 and later revisions. Additionally a clear error is given when the MIPS32 version of sync is used when targeting pre MIPS32. This partially resolves PR/30714. Thanks to Daniel Sanders for reporting this issue! Reveiwers: vkalintiris Differential Revision: https://reviews.llvm.org/D25672 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284483 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 14:42:13 +00:00
Simon Dardis	05fe9f3914	[mips] Macro expansion for ld, sd for O32 ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads or stores using the specified source or destination register and the next register. This patch does not add support for the cases where the offset is greater than a 16 bit signed immediate as that would lead to a wrong/misleading error message as the assembler would report "instruction requires a CPU feature not currently enabled" for ld & sd for MIPS64 when their offset is not a signed 16 bit number. This fixes PR/29159. Thanks to Sean Bruno for reporting this issue! Reviewers: vkalintiris, seanbruno, zoran.jovanovic Differential Review: https://reviews.llvm.org/D24556 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284481 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 14:28:00 +00:00
Michael Zuckerman	d5f902eb11	[x86][inline-asm][avx512] allow swapping of '{k<num>}' & '{z}' marks Committing on behalf of Coby Tayree: After check-all and LGTM Desc: AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}') Currently, the following forms are allowed: {k<num>} {k<num>}{z} This patch allows the following forms: {z}{k<num>} and ignores the next form: {z} Justification would be quite simple - GCC Differential Revision: http://reviews.llvm.org/D25013 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284479 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 13:52:39 +00:00
Vasileios Kalintiris	289f83a7d4	[mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel. Summary: Instead of instantiating the MipsFastISel class and checking if the target is supported in the overriden methods, we should perform that check before creating the class. This allows us to enable FastISel only for targets that truly support it, ie. MIPS32 to MIPS32R5. Reviewers: sdardis Subscribers: ehostunreach, llvm-commits Differential Revision: https://reviews.llvm.org/D24824 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284475 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 13:05:42 +00:00
Javed Absar	729751583a	[ARM] Assign cost of scaling for Cortex-R52 This patch assigns cost of the scaling used in addressing for Cortex-R52. On Cortex-R52 a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. Differential Revision: http://reviews.llvm.org/D25670 Reviewer: jmolloy git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284460 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 09:08:54 +00:00
Simon Pilgrim	e615ec6e15	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284459 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 07:42:15 +00:00
Dean Michael Berris	dfab4815c7	[XRay] Support for for tail calls for ARM no-Thumb This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284456 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 05:54:15 +00:00
Craig Topper	63ae3007f1	[X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself. This is especially important for 32-bit targets with 64-bit shuffle elements. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284453 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 04:48:33 +00:00
Craig Topper	a97a64ce70	[AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself. Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25666 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284451 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 04:00:32 +00:00
Craig Topper	d02b25359b	[AVX-512] Add support for decoding shuffle mask from constant pool for masked VPERMILPS/PD. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284450 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-18 03:36:52 +00:00
Konstantin Zhuravlyov	18560f1ee0	[AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it Differential Revision: https://reviews.llvm.org/D25694 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284435 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-17 22:40:15 +00:00
Tim Northover	087ac0bf03	GlobalISel: support wider range of load/store sizes in AArch64. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284406 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-17 18:36:53 +00:00
Tom Stellard	0d8c32f076	AMDGPU/SI: LowerParameter() should be computing align based on memory type Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25203 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284398 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-17 16:56:19 +00:00

1 2 3 4 5 ...

40420 Commits