archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Petar Avramovic	29902ee1b4	AMDGPU/GlobalISel: Fix negative offset folding for buffer_load Buffer_load does unsigned offset calculations. Don't fold operands of 32-bit add that are likely to cause unsigned add overflow (common case is when one of the operands is negative). Differential Revision: https://reviews.llvm.org/D91336	2021-04-27 14:45:22 +02:00
Petar Avramovic	b0f45068ce	AMDGPU/GlobalISel: Add test for buffer_load with negative offset Pre-commit test for D91336.	2021-04-27 14:45:21 +02:00
Zarko Todorovski	04b8e84be8	[AIX] Allow safe for 32bit P9 VSX extract and insert pattern matches In https://reviews.llvm.org/D92789 PPC64 checks were added that disallowed most VSX pattern matching. We enable some safe ones for 32bit in this patch. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97503	2021-04-27 07:27:43 -04:00
Petar Avramovic	b83185c286	AMDGPU/GlobalISel: Remove redundant G_FCANONICALIZE Add basic version of isCanonicalized for global-isel. Copied from sdag. Add post legalizer combine that deletes G_FCANONICALIZE when its input is already Canonicalized. Differential Revision: https://reviews.llvm.org/D96605	2021-04-27 12:26:37 +02:00
Petar Avramovic	495d2a275a	AMDGPU/GlobalISel: Add integer med3 combines Add signed and unsigned integer version of med3 combine. Source pattern is min(max(Val, K0), K1) or max(min(Val, K1), K0) where K0 and K1 are constants and K0 <= K1. Destination is med3 that corresponds to signedness of min/max in source. Differential Revision: https://reviews.llvm.org/D90050	2021-04-27 11:52:23 +02:00
David Sherwood	1d6ba88234	[NFC][SVE] Add tests for inserting subvectors into illegal scalable vectors A previous commit fixed some issues with inserting subvectors into illegal scalable vectors: 0035decae7ab9ab1c988fdcede46598540afd1a0 I've created a patch that simply adds some of those same tests for SVE. Differential Revision: https://reviews.llvm.org/D100641	2021-04-27 09:02:43 +01:00
Chen Zheng	611f6b8043	[XCOFF] make .file directive have directory info The .file directive is changed to only have basename in D36018 for ELF. But on AIX, we require the .file directive to also contain the directory info. This aligns with other AIX compiler like XLC and is required by some AIX tool like DBX. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D99785	2021-04-27 00:15:23 -04:00
Wang, Pengfei	0711512527	Reapply "[X86][AMX] Try to hoist AMX shapes' def" We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from not only users don't follow AMX API model, but also optimizations. This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB, we can improve it for cases across BBs in future. Currently, it only hoists shapes of which all sources' def above the first AMX instruction. We can improve for the case that only source that moves an immediate value to a register below AMX instruction. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D101067	2021-04-27 10:27:59 +08:00
Yonghong Song	1c5a0c4acc	BPF: generate BTF info for LD_imm64 loaded function pointer For an example like below, extern int do_work(int); long bpf_helper(void *callback_fn); long prog() { return bpf_helper(&do_work); } The final generated codes look like: r1 = do_work ll call bpf_helper exit where we have debuginfo for do_work() extern function: !17 = !DISubprogram(name: "do_work", ...) This patch implemented to add additional checking in processing LD_imm64 operands for possible function pointers so BTF for bpf function do_work() can be properly generated. The original llvm function name processReloc() is renamed to processGlobalValue() to better reflect what the function is doing. Differential Revision: https://reviews.llvm.org/D100568	2021-04-26 17:23:36 -07:00
William S. Moses	e7084f2810	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 20:12:12 -04:00
William S. Moses	5b35b95712	Revert "[NVPTX] Enable lowering of atomics on local memory" This reverts commit fede99d386ec9e7bab2762aa16cb10c0513ae464.	2021-04-26 19:33:01 -04:00
William S. Moses	9ac62ee58d	[NVPTX] Enable lowering of atomics on local memory LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store. Differential Revision: https://reviews.llvm.org/D98650	2021-04-26 19:27:27 -04:00
Craig Topper	8ceb7af3d5	[RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32. Reduces the amount of vector ALU operations and reduces vector register pressure.	2021-04-26 15:43:02 -07:00
Baptiste Saleil	c4a674e21e	[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pass and its associated test cases from the tree. Differential Revision: https://reviews.llvm.org/D101313 Change-Id: I0599169a7609c19a887f8d847a71e664030cc141	2021-04-26 17:21:49 -04:00
Craig Topper	ca9962891d	[RISCV] Match splatted load to scalar load + splat. Form strided load during isel. This modifies my previous patch to push the strided load formation to isel. This gives us opportunity to fold the splat into a .vx operation first. Using a scalar register and a .vx operation reduces vector register pressure which can be important for larger LMULs. If we can't fold the splat into a .vx operation, then it can make sense to use a strided load to free up the vector arithmetic ALU to do actual arithmetic rather than tying it up with vmv.v.x. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101138	2021-04-26 13:32:03 -07:00
Sebastian Neubauer	6b1be183c5	[AMDGPU] Fix autogenerated wwm-reserved-spill.ll Due to a bug in update_llc_test_checks.py, the test is wrongly coalesced between run lines. Remove common check prefix to fix that. NFC.	2021-04-26 19:09:09 +02:00
Sebastian Neubauer	dec086fb0d	[AMDGPU] Use MapVector for WWMReservedRegs Use MapVector instead of SmallDenseMap because it has a deterministic iteration order. Differential Revision: https://reviews.llvm.org/D101299	2021-04-26 17:43:00 +02:00
Tim Northover	d8308abd8d	AArch64: support atomics in GISel	2021-04-26 14:38:06 +01:00
Bradley Smith	385ac4b46b	[AArch64][SVE] Add missing patterns for scalar versions of SQSHL/UQSHL Differential Revision: https://reviews.llvm.org/D101058	2021-04-26 13:07:12 +01:00
David Green	746a7315fd	[ARM] Expand VMOVRRD simplification pattern This expands the VMOVRRD(extract(..(build_vector(a, b, c, d)))) pattern, to also handle insert_vectors. Providing we can find the correct insert, this helps further simplify patterns by removing the redundant VMOVRRD. Differential Revision: https://reviews.llvm.org/D100245	2021-04-26 12:27:38 +01:00
David Green	bd921d06d1	[ARM] Additional soft float BE test. NFC	2021-04-26 11:44:10 +01:00
David Green	df2cb039cf	[ARM] Ensure loop invariant active.lane.mask operands CGP can move instructions like a ptrtoint into a loop, but the MVETailPredication when converting them will currently assume invariant trip counts. This tries to ensure the operands are loop invariant, and bails if not. Differential Revision: https://reviews.llvm.org/D100550	2021-04-26 10:04:33 +01:00
Ben Shi	d474a426e5	[RISCV] Optimize addition with immediate Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D101244	2021-04-26 13:26:17 +08:00
Craig Topper	88feba99f6	[RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs. This teaches DAG combine that shift amount operands for grev, gorc shfl, unshfl only read a few bits. This also teaches DAG combine that grevw, gorcw, shflw, unshflw, bcompressw, bdecompressw only consume the lower 32 bits of their inputs. In the future we can teach SimplifyDemandedBits to also propagate demanded bits of the output to the inputs in some cases.	2021-04-25 21:54:06 -07:00
Levy Hsu	b3953bd33d	[RISCV] [1/2] Add IR intrinsic for Zbe extension RV32/64: bcompress bdecompress RV64 ONLY: bcompressw bdecompressw Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101143	2021-04-25 19:14:34 -07:00
Roman Lebedev	35f6937f2b	[NFC][X86][AVX2] Add baseline CodeGen/CostModel tests for interleaved loads/stores of i16 w/ strides 2/3/4 `X86TTIImpl::getInterleavedMemoryOpCostAVX2()` currently contains data only for a handful of tuples. For now, at least add tests for a few more. I'm guessing that we care how well the patterns codegen since we use their presumed cost for vectorization decisions, so i've added codegen tests too. There's one really easy caveat for these codegen tests: for interleaved load tests, we really have to ensure that the deinterleaved vectors are escaped separately. Similarly for stores.	2021-04-26 01:13:07 +03:00
Simon Pilgrim	622199a58f	Revert rG2149aa73f640c96 "[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)" This might be the cause of some msan build failures - I don't have access to a msan build right now, so this is a speculative revert.	2021-04-25 12:45:07 +01:00
Simon Pilgrim	b2f9c3dec2	[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841) XADD has the same EFLAGS behaviour as ADD	2021-04-25 12:02:33 +01:00
Simon Pilgrim	b90d4aa510	[X86] Add PR20841 test cases showing failure to reuse ZF from XADD ops	2021-04-25 11:50:18 +01:00
Simon Pilgrim	b5d73cafc6	[X86] Regenerate atomic-flags.ll test file	2021-04-25 11:50:18 +01:00
Xiang1 Zhang	cf25c4dbf6	[X86] Refine AMX fast register allocation	2021-04-25 14:20:53 +08:00
Xiang1 Zhang	6da00a5d84	[X86] Support AMX fast register allocation Differential Revision: https://reviews.llvm.org/D100026	2021-04-25 09:45:41 +08:00
Dávid Bolvanský	7c4c0f3460	[Analysis] Attribute alignment should not prevent tail call optimization Fixes tail folding issue mentioned in D100879. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D101230	2021-04-24 19:57:42 +02:00
David Green	1503461e4f	[AArch64] Enable UseAA globally in the AArch64 backend This is similar to D69796 from the ARM backend. We remove the UseAA feature, enabling it globally in the AArch64 backend. This should in general be an improvement allowing the backend to reorder more instructions in scheduling and codegen, and enabling it by default helps to improve the testing of the feature, not making it cpu-specific. A debugging option is added instead for testing. Differential Revision: https://reviews.llvm.org/D98781	2021-04-24 17:51:50 +01:00
Michael Kitzan	a3f088c51a	[MachineCSE] Prevent CSE of non-local convergent instrs At the moment, MachineCSE allows CSE-ing convergent instrs which are non-local to each other. This can cause illegal codegen as convergent instrs are control flow dependent. The patch prevents non-local CSE of convergent instrs by adding a check in isProfitableToCSE and rejecting CSE-ing if we're considering CSE-ing non-local convergent instrs. We can still CSE convergent instrs which are in the same control flow scope, so the patch purposely does not make all convergent instrs non-CSE candidates in isCSECandidate. https://reviews.llvm.org/D101187	2021-04-23 16:44:48 -07:00
Mitch Phillips	6c51544efa	Revert "[X86][AMX] Try to hoist AMX shapes' def" This reverts commit 90118563ad0f133c696e070ad72761fa0daa4517. Reason: Broke the MSan buildbots. https://lab.llvm.org/buildbot/#/builders/5/builds/6967/steps/9/logs/stdio More details can be found in the original phabricator review: https://reviews.llvm.org/D101067	2021-04-23 10:42:26 -07:00
Craig Topper	d426ff687e	[RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions. Theses instructions are allowed to write v0 when they are masked. We'll still never use v0 because of the earlyclobber constraint so this doesn't really help anything. It just makes the definitions correct. While I was there remove an unused multiclass I noticed. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101118	2021-04-23 09:33:29 -07:00
Sebastian Neubauer	1f0605d432	[AMDGPU] Save WWM registers in functions The values of registers in inactive lanes needs to be saved during function calls. Save all registers used for whole wave mode, similar to how it is done for VGPRs that are used for SGPR spilling. Differential Revision: https://reviews.llvm.org/D99429 Reapply with fixed tests on window.	2021-04-23 18:09:24 +02:00
Nemanja Ivanovic	572869bea6	[PowerPC] Add vec_ctsl and vec_ctul to altivec.h These are added for compatibility with XLC. They are similar to vec_cts and vec_ctu except that the result is a doubleword vector regardless of the parameter type.	2021-04-23 11:03:38 -05:00
Joe Ellis	0afb9c183a	[AArch64][SVE] Fix bug in lowering of fixed-length integer vector divides The function AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE previously assumed the operands were full vectors, but this is not always true. This function would produce bogus if the division operands are not full vectors, resulting in miscompiles when dividing 8-bit or 16-bit vectors. The fix is to perform an extend + div + truncate for non-full vectors, instead of the usual unpacking and unzipping logic. This is an additive change which reduces the non-full integer vector divisions to a pattern recognised by the existing lowering logic. For future reference, an example of code that would miscompile before this patch is below: 1 int8_t foo(unsigned N, int8_t a, int8_t b, int8_t *c) { 2 int8_t result = 0; 3 for (int i = 0; i < N; ++i) { 4 result += (a[i] / b[i]) / c[i]; 5 } 6 return result; 7 } Differential Revision: https://reviews.llvm.org/D100370	2021-04-23 14:55:10 +00:00
Jay Foad	982c01a843	[AMDGPU] Fix typo in implicit operand lists Several tests had a typo where they mentioned sgpr17 twice instead of sgpr17 and sgpr27. This had a significant effect on the "scavenge_sgpr_pei_no_sgprs" tests because there was actually an sgpr available, namely sgpr27. Differential Revision: https://reviews.llvm.org/D100960	2021-04-23 15:44:17 +01:00
Sebastian Neubauer	8b5cb86ad7	Revert "[AMDGPU] Save WWM registers in functions" This reverts commit 91464c30bfcf731ccb7f9d6ef6d26e8c1657a6e6. Seems to break tests on windows.	2021-04-23 16:38:50 +02:00
Piotr Sobczak	099aac7b88	[AMDGPU][NFC] Update auto-gen test Most likely the "glc" was not added to the test when the volatile loads started generating those bits.	2021-04-23 16:33:16 +02:00
Sebastian Neubauer	ec74f9a23a	[AMDGPU] Save WWM registers in functions The values of registers in inactive lanes needs to be saved during function calls. Save all registers used for whole wave mode, similar to how it is done for VGPRs that are used for SGPR spilling. Differential Revision: https://reviews.llvm.org/D99429	2021-04-23 16:09:31 +02:00
Simon Pilgrim	a7d1c869bb	[X86] Add Win32/64 mulo test coverage Part of an investigation to solve the windows regressions caused by rG13ec913bdf50	2021-04-23 14:51:42 +01:00
Matt Arsenault	182934f750	AMDGPU: Fix assert on inline asm on gfx90a This was assuming all mayLoad instructions have one def.	2021-04-23 09:00:25 -04:00
Fraser Cormack	23b998863a	[RISCV] Custom lower vector F(MIN\|MAX)NUM to vf(min\|max) This patch adds support for both scalable- and fixed-length vector code lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent RVV instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101035	2021-04-23 12:22:15 +01:00
Daniel Kiss	30b326d46e	[AArch64] Fix for BTI landing pad insertion with PAC-RET+bkey. EMITBKEY is emitted for PAC-RET+bkey, which is a non machine instructions. PR: 49957 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D100996	2021-04-23 10:07:25 +02:00
Wang, Pengfei	d73d62d45f	[X86][AMX] Try to hoist AMX shapes' def We request no intersections between AMX instructions and their shapes' def when we insert ldtilecfg. However, this is not always ture resulting from not only users don't follow AMX API model, but also optimizations. This patch adds a mechanism that tries to hoist AMX shapes' def as well. It only hoists shapes inside a BB, we can improve it for cases across BBs in future. Currently, it only hoists shapes of which all sources' def above the first AMX instruction. We can improve for the case that only source that moves an immediate value to a register below AMX instruction. Differential Revision: https://reviews.llvm.org/D101067	2021-04-23 12:17:00 +08:00
Wang, Pengfei	d7776e3283	[X86] Enable compilation of user interrupt handlers. Add __uintr_frame structure and use UIRET instruction for functions with x86 interrupt calling convention when UINTR is present. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D99708	2021-04-23 11:43:57 +08:00

1 2 3 4 5 ...

38645 Commits