llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-04 11:17:31 +00:00

Author	SHA1	Message	Date
whitequark	e28ae58ff4	[LLVM-C] Add LLVMGetHostCPU{Name,Features}. Without these functions it's hard to create a TargetMachine for Orc JIT that creates efficient native code. It's not sufficient to just expose LLVMGetHostCPUName(), because for some CPUs there's fewer features actually available than the CPU name indicates (e.g. AVX might be missing on some CPUs identified as Skylake). Differential Revision: https://reviews.llvm.org/D44861 llvm-svn: 329856	2018-04-11 22:40:42 +00:00
Nemanja Ivanovic	81e041e541	[PowerPC] Fix condition for 64-bit rotate when replacing r+r instr with r+i This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039 The condition only covers one of the two 64-bit rotate instructions. This just adds the second (RLDICLo). Patch by Josh Stone. llvm-svn: 329852	2018-04-11 21:25:44 +00:00
Yonghong Song	830a6fb07d	bpf: signal error instead of silent drop for certain invalid asm insn Currently, an invalid asm insn, either in an asm file or in an inline asm format, might be silently dropped. This patch fixed two places where this may happen by signaling the error so user knows what goes wrong. The following is an example to demonstrate error messages: -bash-4.2$ cat t.c int test(void ctx) { #if defined(NO_ERROR) asm volatile("r0 = (u16 )skb[%0]" : : "i"(2)); #elif defined(ERROR_1) asm volatile("r20 = (u16 )skb[%0]" : : "i"(2)); #elif defined(ERROR_2) asm volatile("r0 = (u16 )(r1 + ?)" : :); #endif return 0; } -bash-4.2$ cat run.sh for macro in NO_ERROR ERROR_1 ERROR_2; do echo "===== compile for macro" $macro clang -D${macro} -O2 -target bpf -emit-llvm -S t.c echo "==llc==" llc -march=bpf -filetype=obj t.ll done -bash-4.2$ ./run.sh ===== compile for macro NO_ERROR ==llc== ===== compile for macro ERROR_1 ==llc== <inline asm>:1:2: error: invalid register/token name r20 = (u16 )skb[2] ^ note: !srcloc = 135 ===== compile for macro ERROR_2 ==llc== <inline asm>:1:21: error: unexpected token r0 = (u16 *)(r1 + ?) ^ note: !srcloc = 210 -bash-4.2$ Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 329849	2018-04-11 20:24:52 +00:00
Gabor Buella	d11176227c	[X86] Describe wbnoinvd instruction Similar to the wbinvd instruction, except this one does not invalidate caches. Ring 0 only. The encoding matches a wbinvd instruction with an F3 prefix. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43816 llvm-svn: 329847	2018-04-11 20:01:57 +00:00
Simon Pilgrim	c09c832a8e	[X86][Atom] Convert Atom scheduler model to SchedRW (PR32431) Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550. I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here, There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers. There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical. NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329837	2018-04-11 18:23:01 +00:00
Simon Pilgrim	ac46827cba	[X86] Generalize X86PadShortFunction to work with TargetSchedModel Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call. Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329834	2018-04-11 18:05:17 +00:00
Artem Belevich	3561cb9d71	[NVPTX] Removed 'satom' feature which is no longer used. Differential Revision: https://reviews.llvm.org/D45061 llvm-svn: 329830	2018-04-11 17:51:33 +00:00
Artem Belevich	3cfdfea288	[NVPTX, CUDA] Improved feature constraints on NVPTX target builtins. When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature, consider those features available if we're compiling for GPU >= sm_XX or have enabled PTX version >= ptxYY. Differential Revision: https://reviews.llvm.org/D45061 llvm-svn: 329829	2018-04-11 17:51:19 +00:00
Tim Renouf	e85f3a4fd1	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Re-landed after noticing that the buildbot failure from 329808 seemed to be unrelated. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329826	2018-04-11 17:18:36 +00:00
Petar Jovanovic	45805522db	[MIPS GlobalISel] Select add i32, i32 Add the minimal support necessary to lower a function that returns the sum of two i32 values. Support argument/return lowering of i32 values through registers only. Add tablegen for regbankselect and instructionselect. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D44304 llvm-svn: 329819	2018-04-11 15:12:32 +00:00
Yaxun Liu	c4af07c19e	[AMDGPU] Fix lowering enqueue_kernel Two issues were fixed: runtime has difficulty to allocate memory for an external symbol of a kernel and set the address of the external symbol, therefore make the runtime handle of an enqueued kernel an ordinary global variable. Runtime only needs to store the address of the loaded kernel to the handle and has verified that this approach works. handle the situation where __enqueue_kernel* gets inlined therefore the enqueued kernel may be used through a constant expr instead of an instruction. Differential Revision: https://reviews.llvm.org/D45187 llvm-svn: 329815	2018-04-11 14:46:15 +00:00
Tim Renouf	268bd738df	Revert "[AMDGPU] Ensure there are enough registers for wave dispatch" This reverts 329808. That change caused a report of a failure in test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect it is an expensive-check-only error. Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0 llvm-svn: 329811	2018-04-11 14:27:41 +00:00
Sander de Smalen	73b5f3945d	[AArch64][AsmParser] Split index parsing from vector list. Summary: Place parsing of a vector index into a separate function to reduce duplication, since the code is duplicated in both the parsing of a Neon vector register operand and a Neon vector list. This is patch [2/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45428 llvm-svn: 329809	2018-04-11 14:10:37 +00:00
Tim Renouf	ea9df3592f	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329808	2018-04-11 14:02:41 +00:00
Simon Pilgrim	e955d824eb	[X86] Add variable shuffle schedule classes Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806	2018-04-11 13:49:19 +00:00
Dmitry Preobrazhensky	28dc2311b5	[AMDGPU][MC][GFX9] Added v_screen_partition_4se_b32 See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845 Differential Revision: https://reviews.llvm.org/D45443 Reviewers: artem.tamazov, arsenm, timcorringham llvm-svn: 329801	2018-04-11 13:13:30 +00:00
Francis Visoiu Mistrih	5194d11e51	[AArch64] Fix regression after r329691 In r329691, we would choose FP even if the offset wouldn't fit, just because the offset is smaller than the one from BP. This made many accesses through FP need to scavenge a register, which resulted in slower and bigger code for no good reason. This patch now always picks the offset that fits first, even if FP is preferred. llvm-svn: 329797	2018-04-11 12:36:55 +00:00
Sjoerd Meijer	10e7e8538f	[ARM] FP16 VSEL codegen This is a follow up of rL327695 to instruction select more variants of VSELGT and VSELGE, for which it is necessary to custom lower SELECT. More work is required in this area, which will be addressed soon: - more variants need to be regression tested, but this depends on the next point. - first LowerConstantFP need to be adjusted for fp16 values. Differential Revision: https://reviews.llvm.org/D45205 llvm-svn: 329788	2018-04-11 09:28:04 +00:00
Sander de Smalen	8787081f3e	[AArch64][AsmParser] Unify code for parsing Neon/SVE vectors. Summary: Merged 'tryMatchVectorRegister' (specific to Neon) and 'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and created a generic 'parseVectorKind()' function that returns the #Elements and ElementWidth of a vector suffix. This reduces the duplication of this functionality between two the vector implementations. This is patch [1/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: fhahn Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45427 llvm-svn: 329782	2018-04-11 07:36:10 +00:00
Craig Topper	debdfcda33	[X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select. The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency. llvm-svn: 329774	2018-04-11 04:55:04 +00:00
Craig Topper	ece8d06b3f	[X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction. Previously the code only knew how to handle setcc to a register. This should fix a crash in the chromium build. llvm-svn: 329771	2018-04-11 01:09:10 +00:00
Sriraman Tallam	2db3b802af	GOTPCREL references must always use RIP. With -fno-plt, global value references can use GOTPCREL and RIP must be used. Differential Revision: https://reviews.llvm.org/D45460 llvm-svn: 329765	2018-04-10 22:50:05 +00:00
Marek Olsak	3f71d5838c	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764	2018-04-10 22:48:23 +00:00
Geoff Berry	eef4029a4d	[AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass. Summary: When inserting MOVs to avoid Falkor HWPF collisions, the non-base register operand of load instructions (e.g. a register offset) was not being considered live, so it could potentially have been used as a scratch register, clobbering the actual offset value. Reviewers: mcrosier Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45502 llvm-svn: 329761	2018-04-10 21:43:03 +00:00
Jessica Paquette	9222709676	Recommit r329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation" This commit fixes the bot failures that were coming up before with r329716. The fix was to move the check for "isInSection()" inside of the if condition and emit the error there instead of waiting to get past the unreachable statement. This should work in debug and release builds now. llvm-svn: 329746	2018-04-10 19:46:43 +00:00
Amara Emerson	803368e4c4	[AArch64] Fix isel failure when BUILD_PAIR nodes are left over. rdar://39175175 llvm-svn: 329743	2018-04-10 19:01:58 +00:00
Gabor Buella	bb61717e4e	[X86] Split up -march=icelake to -client & -server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45055 llvm-svn: 329742	2018-04-10 18:59:13 +00:00
Craig Topper	3f0b94953a	[X86] Change the name string for the newly add DF flag register to 'dirflag' to match the clobber name supported by clang for MS inline assembly. This should fix the failure found by Chromium reported here https://bugs.chromium.org/p/chromium/issues/detail?id=831158 The test case will be added in clang. llvm-svn: 329734	2018-04-10 18:21:04 +00:00
Jessica Paquette	215350cd45	Revert 329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation" This broke a bunch of bots so I'm reverting while I figure it out. llvm-svn: 329728	2018-04-10 17:53:41 +00:00
Peter Collingbourne	78fa2dfe59	Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF." Caused a build failure in check-tsan. llvm-svn: 329718	2018-04-10 16:19:30 +00:00
Jessica Paquette	a6ca6a3778	Add missing nullptr check to AArch64MachObjectWriter::recordRelocation There was missing nullptr check before a call to getSection() in recordRelocation. This would result in a segfault in code like the attached test. This adds the missing check and a test which makes sure we get the expected error output. llvm-svn: 329716	2018-04-10 15:53:28 +00:00
Nicolai Haehnle	e00914eac3	AMDGPU/MC: Allow disassembling without symbol info Summary: We would like the UMR debugging tool[0] to be able to provide disassembly for currently live waves based on plain memory dumps, and we want to leverage the LLVM disassembler for this. This mostly works, except that UMR clearly can't provide real symbol info, so it wants to set DisInfo == nullptr. [0] https://cgit.freedesktop.org/amd/umr/ Reviewers: arsenm, rampitec, artem.tamazov, dp Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45477 Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6 llvm-svn: 329715	2018-04-10 15:46:43 +00:00
Chad Rosier	5537decd98	Fix spelling. NFC. llvm-svn: 329709	2018-04-10 14:57:13 +00:00
Simon Pilgrim	e9a5e7677c	Fix whitespace indentation. NFCI. llvm-svn: 329704	2018-04-10 14:21:33 +00:00
Gabor Buella	6ebc5794c8	[X86] Disable SGX for Skylake Server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45057 llvm-svn: 329700	2018-04-10 13:58:57 +00:00
Francis Visoiu Mistrih	4152237804	[AArch64] Use FP to access the emergency spill slot In the presence of variable-sized stack objects, we always picked the base pointer when resolving frame indices if it was available. This makes us hit an assert where we can't reach the emergency spill slot if it's too far away from the base pointer. Since on AArch64 we decide to place the emergency spill slot at the top of the frame, it makes more sense to use FP to access it. The changes here don't affect only emergency spill slots but all the frame indices. The goal here is to try to choose between FP, BP and SP so that we minimize the offset and avoid scavenging, or worse, asserting when trying to access a slot allocated by the scavenger. Previously discussed here: https://reviews.llvm.org/D40876. Differential Revision: https://reviews.llvm.org/D45358 llvm-svn: 329691	2018-04-10 11:29:40 +00:00
Tim Renouf	f6f8f3bacb	[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading scratch descriptor from GIT. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 329690	2018-04-10 11:25:15 +00:00
Tim Northover	298660ed4c	AArch64: diagnose unpredictable store-exclusive instructions Much like any written register in load/store instructions, the status register is not allowed to overlap with any others. So diagnose it like we already do with the other cases. llvm-svn: 329687	2018-04-10 11:04:29 +00:00
Andrea Di Biagio	07ca1a95da	[X86][Broadwell] HWPort5 should not be added to BroadwellModelProcResources. The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell resource, and not a Broadwell processor resource. That entry was added to the Broadwell model because variable blends were consuming it. This was clearly a typo (the resource name should have been BWPort5), which unfortunately was never caught before. It was not reported as an error because HWPort5 is a resource defined by the Haswell model. It has been found when testing some code with llvm-mca: the list of resources in the resource pressure view was odd. This patch fixes the issue; now variable blend instructions consume 2 cycles on BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious) entry in the BroadWellModelProcResources table. llvm-svn: 329686	2018-04-10 10:49:41 +00:00
Sander de Smalen	32aba74592	[AArch64][SVE] Asm: Add support for unpredicated LSL/LSR (shift by immediate) instructions. Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45371 llvm-svn: 329681	2018-04-10 10:03:13 +00:00
Clement Courbet	221e6ccd99	[MC][TableGen] Add optional libpfm counter names for ProcResUnits. Summary: Subtargets can define the libpfm counter names that can be used to measure cycles and uops issued on ProcResUnits. This allows making llvm-exegesis available on more targets. Fixes PR36984. Reviewers: gchatelet, RKSimon, andreadb, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45360 llvm-svn: 329675	2018-04-10 08:16:37 +00:00
Sander de Smalen	c9308fd1bc	[AArch64][SVE] Asm: Add support for SVE INDEX instructions. Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45370 llvm-svn: 329674	2018-04-10 07:01:53 +00:00
Chandler Carruth	39ece4183c	[x86] Model the direction flag (DF) separately from the rest of EFLAGS. This cleans up a number of operations that only claimed te use EFLAGS due to using DF. But no instructions which we think of us setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673	2018-04-10 06:40:51 +00:00
Craig Topper	124ae55576	[X86] Prevent folding loads with 64-bit ANDs with immediates that fit in 32-bits. Prefer to use the 32-bit AND with immediate instead. Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold. Fixes PR37063. llvm-svn: 329667	2018-04-10 03:44:15 +00:00
Chandler Carruth	dad2c9ed8c	[x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch. However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all. There are a large number of operations that techinaclly observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns. This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without require stack adjustments. Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this. Differential Revision: https://reviews.llvm.org/D45146 llvm-svn: 329657	2018-04-10 01:41:17 +00:00
Vlad Tsyrklevich	6b62d1890b	ShadowCallStack/x86_64: Ignore pseudo-machine instructions llvm-svn: 329656	2018-04-10 01:31:01 +00:00
Daniel Sanders	e813b4c7f0	[globalisel][legalizerinfo] Add support for the Lower action in getActionDefinitionsBuilder() and use it in AArch64. Lower is slightly odd. It often doesn't change the type but the lowerings do use the new type to decide what code to create. Treat it like a mutation but provide convenience functions that re-use the existing type. Re-uses the existing tests: test/CodeGen/AArch64/GlobalISel/legalize-rem.mir test/CodeGen/AArch64/GlobalISel//legalize-mul.mir test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir llvm-svn: 329623	2018-04-09 21:10:09 +00:00
Konstantin Zhuravlyov	97125d62df	AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel header 1. Remove max_scratch_backing_memory_byte_size from kernel header 2. Make it a reserved field 3. Ignore it while parsing assembly for backwards compatibility 4. Bump up minor version of kernel header Differential Revision: https://reviews.llvm.org/D45452 llvm-svn: 329620	2018-04-09 20:47:22 +00:00
Craig Topper	97482cfcad	[X86] Don't use Lower512IntUnary to split bitcasts with v32i16/v64i8 types on targets without AVX512BW. LowerIntUnary as its name says has an assert for integer types. But for the bitcast case one side might be an FP type. Rather than making sure the function really works for fp types and renaming it. Just do really basic splitting directly. The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round. Revert some change to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases. This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that. llvm-svn: 329616	2018-04-09 20:37:14 +00:00
Peter Collingbourne	8fce7e82b9	AArch64: Allow offsets to be folded into addresses with ELF. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access). Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329611	2018-04-09 19:59:57 +00:00

1 2 3 4 5 ...

46987 Commits