llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:50:30 +00:00

Author	SHA1	Message	Date
Jay Foad	804e78dd0d	[AMDGPU] getMemOperandsWithOffset: support BUF non-stack-access instructions with resource but no vaddr Summary: This enables clustering for many more BUF instructions. Reviewers: rampitec, arsenm, nhaehnle Subscribers: jvesely, wdng, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73868	2020-02-03 22:49:30 +00:00
Max Moroz	641e31f422	[libFuzzer] Minor documentation fixes.	2020-02-03 14:41:06 -08:00
Jessica Paquette	fdb6466319	[AArch64][GlobalISel] Fold G_SHL into TB(N)Z bit calculation This implements the following optimization: ``` (tbz (shl x, c), b) -> (tbz x, b-c) ``` Which appears in `getTestBitOperand` in AArch64ISelLowering.cpp. If we test bit `b` of `shl x, c`, we can fold away the `shl` by looking `c` bits to the right of `b` in `x` when this fits in the type. So, we can just test the `b-c`th bit. Differential Revision: https://reviews.llvm.org/D73924	2020-02-03 14:27:08 -08:00
Matt Arsenault	6eaea3de63	AMDGPU: Add flag to control mem intrinsic expansion GlobalISel doesn't implement the expansion for these yet, so add a flag to force expanding these so it's possible to avoid these for a while.	2020-02-03 14:26:01 -08:00
Reid Kleckner	5e5b298df1	Fix modules build after PassManagerImpl.h addition This new header needs to be in the LLVM_intrinsics_gen module.	2020-02-03 14:25:43 -08:00
Reid Kleckner	77ca9729e6	Fix LLVM_ENABLE_MODULES build after TypeSize.h change	2020-02-03 14:21:44 -08:00
David Green	7f16958e61	[ARM] MVE vector reduction fadd and fmul tests. NFC	2020-02-03 22:03:56 +00:00
Michael Trent	7d03550470	Omit "Contents of" headers when -no-leading-headers is specified. Summary: llvm-objdump -macho will no longer print "Contents of" headers when disassembling section contents when -no-leading-headers is specified. For historical reasons, this flag is independent of -no-leading-addr. Reviewers: ab, pete, jhenderson Reviewed By: jhenderson Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73574	2020-02-03 13:33:50 -08:00
Tyker	b01d58912b	[NFC] Factor out function to detect if an attribute has an argument.	2020-02-03 22:27:24 +01:00
Matt Arsenault	8c7054551b	AMDGPU: Analyze divergence of inline asm	2020-02-03 12:42:16 -08:00
Matt Arsenault	e2c73b193c	AMDGPU/GlobalISel: Allow selecting s128 load/stores	2020-02-03 12:28:08 -08:00
Matt Arsenault	b87aa70bb5	AMDGPU: Fix splitting wide f32 s.buffer.load intrinsics This would switch f32 to i32, and produce an invalid concat_vectors from i32 pieces to an f32 vector.	2020-02-03 12:28:08 -08:00
David Tenty	68d4581cd4	[AIX] Don't use a zero fill with a second parameter Summary: The AIX assembler .space directive can't take a second non-zero argument to fill with. But LLVM emitFill currently assumes it can. We add a flag to the AsmInfo to check if non-zero fill is supported, and if we can't zerofill non-zero values we just splat the .byte directives. Reviewers: stevewan, sfertile, DiggerLin, jasonliu, Xiangling_L Reviewed By: jasonliu Subscribers: Xiangling_L, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73554	2020-02-03 15:16:08 -05:00
Jessica Paquette	53b445b98d	[AArch64][GlobalISel] Walk through G_AND in TB(N)Z bit calculation Given ``` tb(n)z (and x, m), b ``` Where the `b`-th bit of `m` is 1, ``` tb(n)z (and x, m), b == tb(n)z x, b ``` So, we can walk past a `G_AND` in this case. Also add test/CodeGen/AArch64/GlobalISel/opt-fold-and-tbz-tbnz.mir to test this. Differential Revision: https://reviews.llvm.org/D73790	2020-02-03 11:53:47 -08:00
Amara Emerson	c040f97bf9	[AArch64][GlobalISel] Don't reconvert to p0 in convertPtrAddToAdd(). convertPtrAddToAdd improved overall code size and quality by a significant amount, but on -O0 we generate some cross-class copies due to the fact that we emitted G_PTRTOINT and G_INTTOPTR around the G_ADD. Unfortunately at -O0 we don't run any register coalescing, so these cross class copies end up escaping as moves, and we ended up regressing 3 benchmarks on CTMark (though still a winner overall). This patch changes the lowering to instead directly emit the G_ADD into the destination register, and then force changes the dest LLT to s64 from p0. This should be ok, as all uses of the register should now be selected and therefore the LLT doesn't matter for the users. It does however matter for the importer patterns, which will fail to select a G_ADD if there's a p0 LLT. I'm not able to get rid of the G_PTRTOINT on the source yet however. We can't use the same trick of breaking the type system since that could break the selection of the defining instruction. Thus with -O0 we still end up with a cross class copy on source. Code size improvements on -O0: Program baseline new diff test-suite :: CTMark/Bullet/bullet.test 965520 949164 -1.7% test-suite...TMark/7zip/7zip-benchmark.test 1069456 1052600 -1.6% test-suite...ark/tramp3d-v4/tramp3d-v4.test 1213692 1199804 -1.1% test-suite...:: CTMark/sqlite3/sqlite3.test 421680 419736 -0.5% test-suite...-typeset/consumer-typeset.test 837076 833380 -0.4% test-suite :: CTMark/lencod/lencod.test 799712 796976 -0.3% test-suite...:: CTMark/ClamAV/clamscan.test 688264 686132 -0.3% test-suite :: CTMark/kimwitu++/kc.test 1002344 999648 -0.3% test-suite...Mark/mafft/pairlocalalign.test 422296 421768 -0.1% test-suite :: CTMark/SPASS/SPASS.test 656792 656532 -0.0% Geomean difference -0.6% Differential Revision: https://reviews.llvm.org/D73910	2020-02-03 11:50:22 -08:00
Matt Arsenault	b1ca9e2a43	GlobalISel: Implement fewerElementsVector for G_SEXT_INREG Start using a new strategy with a combination of merge and unmerges. This allows scalarizing before lowering, which in cases like <2 x s128> avoids producing giant illegal shifts.	2020-02-03 11:47:33 -08:00
Quentin Colombet	cbcbad6447	[TargetRegisterInfo] Make the heuristic to skip region split overridable by the target RegAllocGreedy uses a fairly compile time intensive splitting heuristic called region splitting. This heuristic was disabled via another heuristic when it is likely that it won't be worth the compile time. The only way to control this other heuristic was via a command line option (huge-size-for-split). This commit gives more control on this heuristic by making it overridable by the target using a target hook in TargetRegisterInfo called shouldRegionSplitForVirtReg. The default implementation of this hook keeps the heuristic as it was before this patch.	2020-02-03 11:30:35 -08:00
Reid Kleckner	9369556310	Add PassManagerImpl.h to hide implementation details ClangBuildAnalyzer results show that a lot of time is spent instantiating AnalysisManager::getResultImpl across the code base: **** Templates that took longest to instantiate: 50445 ms: llvm::AnalysisManager<llvm::Function>::getResultImpl (412 times, avg 122 ms) 47797 ms: llvm::AnalysisManager<llvm::Function>::getResult<llvm::TargetLibraryAnalysis> (389 times, avg 122 ms) 46894 ms: std::tie<const unsigned long long, const bool> (2452 times, avg 19 ms) 43851 ms: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096, 4096>::Allocate (3228 times, avg 13 ms) 33911 ms: std::tie<const unsigned int, const unsigned int, const unsigned int, const unsigned int> (897 times, avg 37 ms) 33854 ms: std::tie<const unsigned long long, const unsigned long long> (1897 times, avg 17 ms) 27886 ms: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (11156 times, avg 2 ms) I mentioned this result to @chandlerc, and he suggested this direction. AnalysisManager is already explicitly instantiated, and getResultImpl doesn't need to be inlined. Move the definition to an Impl header, and include that header in files that explicitly instantiate AnalysisManager. There are only four (real) IR units: - function - module - loop - cgscc Looking at a specific transform (ArgumentPromotion.cpp), here are three compilations before & after this change: BEFORE: $ for i in $(seq 3) ; do ./ccit.bat ; done peak memory: 258.15MB real: 0m6.297s peak memory: 257.54MB real: 0m5.906s peak memory: 257.47MB real: 0m6.219s AFTER: $ for i in $(seq 3) ; do ./ccit.bat ; done peak memory: 235.35MB real: 0m5.454s peak memory: 234.72MB real: 0m5.235s peak memory: 234.39MB real: 0m5.469s The 20MB of memory saved seems real, and the time improvement seems like it is there. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D73817	2020-02-03 11:15:55 -08:00
Reid Kleckner	a1c473cd39	Revert "[SVE] Fix bug in simplification of scalable vector instructions" This reverts commit 31574d38ac5fa4646cf01dd252a23e682402134f. The newly added shufflevector test does not pass locally on either of my workstations.	2020-02-03 11:12:09 -08:00
Michael Trent	cd0c8f255c	[llvm-objdump] Suppress spurious warnings when parsing Mach-O binaries. Summary: llvm-objdump started warning when asked to disassemble a section that isn't present in the input files, in Yuanfang Chen's change: d16c162c9453db855503134fe29ae4a3c0bec936. The problem is that the logic was restricted only to the generic llvm-objdump parser, not to the Mach-O-specific parser used for Apple toolchain compatibility. The solution is to log section names from the Mach-O parser. The macho-cstring-dump.test has been updated to fail if it encounters this new warning in the future. Reviewers: pete, ab, lhames, jhenderson, grimar, MaskRay, ychen Reviewed By: jhenderson, grimar Subscribers: rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73586	2020-02-03 10:59:36 -08:00
Alina Sbirlea	0528d7fce2	[LoopUtils] Make duplicate method a utility. [NFCI] Summary: Method appendLoopsToWorklist is duplicated in LoopUnroll and in the LoopPassManager as an internal method. Make it a utility. Reviewers: dmgreen, chandlerc, fedor.sergeev, yamauchi Subscribers: mehdi_amini, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73569	2020-02-03 10:24:18 -08:00
Christopher Tetreault	56276c94bb	[SVE] Fix bug in simplification of scalable vector instructions Summary: * Most of the simplifications in SimplifyShuffleVectorInst depend on the concrete value of, or the length of, the mask vector. For scalable vectors, this cannot be known at compile time. ** for these tests, detect if the vector is scalable before attempting the transformation * The functions ShuffleVectorInst::getMaskValue and ShuffleVectorInst::getShuffleMask access the value of the constant mask. However, since the length of the mask is unknown at compile time, these functions do not work for scalable vectors. Add asserts to ensure that the input mask is not scalable Reviewers: efriedma, sdesmalen, apazos, chrisj, huihuiz Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73555	2020-02-03 10:15:56 -08:00
Nikita Popov	092e2dc033	[SimplifyLibCalls] Remove unused IRBuilder argument; NFC isLocallyOpenedFile() does not use IRBuilder.	2020-02-03 19:12:57 +01:00
Nikita Popov	64d4134188	[IRBuilder] Add missing NoFolder::CreatePointerBitCastOrAddrSpaceCast(); NFC Split out from D73835. This method was added to ConstantFolder and TargetFolder, but not NoFolder.	2020-02-03 19:11:27 +01:00
Nikita Popov	41a030740c	[IRBuilder] Remove unnecessary NoFolder methods; NFCI Split out from D73835: These methods are not part of the ConstantFolder API and as such don't serve a purpose.	2020-02-03 19:08:41 +01:00
Simon Pilgrim	780ca3dfa3	[X86] getTargetShuffleMask - use getConstantOperandVal helper. NFCI.	2020-02-03 18:06:47 +00:00
Nikita Popov	df272a915f	[InstCombine] Add replaceOperand() helper Adds a replaceOperand() helper, which is like Instruction.setOperand() but adds the old operand to the worklist. This reduces the amount of missing or incorrect worklist management. This only applies the helper to a relatively small subset of setOperand() calls in InstCombine, namely those of the pattern `I.setOperand(); return &I;`, where it is most obviously applicable. Differential Revision: https://reviews.llvm.org/D73803	2020-02-03 19:00:17 +01:00
Nikita Popov	ed97e37dc0	[InstCombine] Rename worklist methods; NFC This renames Worklist.AddDeferred() to Worklist.add() and Worklist.Add() to Worklist.push(). The intention here is that Worklist.add() should be the go-to method for explicit worklist management, while the raw Worklist.push() is mostly for InstCombine internals. I will then migrate uses of Worklist.push() to Worklist.add() in followup changes. As suggested by spatel on D73411 I'm also changing the remaining method names to lowercase first character, in line with current coding standards. Differential Revision: https://reviews.llvm.org/D73745	2020-02-03 18:56:51 +01:00
Nikita Popov	8495c2d57a	[ARM] Expand vector reduction intrinsics on soft float Followup to D73135. If the target doesn't have hard float (default for ARM), then we assert when trying to soften the result of vector reduction intrinsics. This patch marks these for expansion as well. (A bit odd to use vectors on a target without hard float ... but that's where you end up if you expose target-independent vector types.) Differential Revision: https://reviews.llvm.org/D73854	2020-02-03 18:49:12 +01:00
Nikita Popov	8f742c150c	[Examples] Link BitReader in ThinLtoJIT example D72486 broke the shared library build.	2020-02-03 18:47:38 +01:00
Nikita Popov	08f738d15c	[InstCombine] Fix unused variable warning; NFC	2020-02-03 18:47:38 +01:00
Teresa Johnson	919474967d	[ThinLTO] More efficient export computation (NFC) Summary: A recent change to enable more importing of global variables with references exposed some efficiency issues with export computation. See D73724 for more information and detailed analysis. The first was specific to variable importing. The code was marking every copy of a referenced value (from possibly thousands of files in the case of linkonce_odr) as exported, and we only need to mark the copy in the module containing the variable def being imported as exported. The reason is that this is tracking what values are newly exported as a result of importing. Anything that was defined in another module and simply used in the exporting module is already exported, and would have been identified by the caller (e.g. the LTO API implementations). The second issue is that the code was re-adding previously exported values (along with all references). It is easy to identify when a variable was already imported into the same module (via the import list insert call return value), and we already did this for function importing. However, what we weren't doing for either function or variable importing was avoiding a re-insertion when it was previously exported into a different importing module. The reason we couldn't do this is there was no way of telling from the export list whether it was previously inserted there because its definition was exported (in which case we already marked all its references as exported) from when it was inserted there because it was referenced by another exported value (in which case we haven't yet inserted its own references). To address this we can restructure the way the export list is constructed. This patch only adds the actual imported definitions (variable or function) to the export list for its module during the import computation. 
After import computation is complete, where we were already post-processing the export list we go ahead and add all references made by those exported values to the export list. These changes speed up the thin link not only with constant variable importing enabled, but also without (due to the efficiency improvement in function importing). Some thin link user time measurements for one large application, average of 5 runs: With constant variable importing enabled: - without this patch: 479.5s - with this patch: 74.6s Without constant variable importing enabled: - without this patch: 80.6s - with this patch: 70.3s Note I have not re-enabled constant variable importing here, as I would like to do additional compile time measurements with these fixes first. Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73851	2020-02-03 09:15:33 -08:00
Jay Foad	ccd7445730	[AMDGPU] getMemOperandsWithOffset: add resource operand for BUF instructions Summary: This prevents unwanted clustering of BUF instructions with the same vaddr but different resource descriptors. Reviewers: rampitec, arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73867	2020-02-03 17:06:09 +00:00
Sanjay Patel	4398f57ddf	[InstCombine] add tests for casted phi; NFC	2020-02-03 11:54:47 -05:00
Simon Pilgrim	539b0b67e9	HexagonOptAddrMode::changeStore - fix null dereference warning (PR43463) As detailed on PR43463, this fixes a static analyzer null dereference warning by sinking Changed = true into the if() blocks where the MIB is actually created. I did a quick check that suggested that one of those if() blocks is always guaranteed to be hit (so we could change it to if-else), but this seems like a safer approach Differential Revision: https://reviews.llvm.org/D73883	2020-02-03 16:50:04 +00:00
Simon Pilgrim	9c81e3d7cc	[TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.	2020-02-03 16:50:04 +00:00
Simon Pilgrim	5c93d9fa0c	[X86] BEXTR SimplifyDemandedBitsForTargetNode - length == 0 -> result = 0	2020-02-03 16:50:03 +00:00
Hans Wennborg	494b532127	Actually, don't try to use __builtin_strlen in StringRef.h before VS 2019 The fix in b3d7d1061dc375bb5ea725e6597382fcd37f41d6 compiled nicely, but didn't link because at least the VS 2017 version I use doesn't have the builtin yet. Instead, make use of the builtin with MSVC conditional on VS 2019 or later.	2020-02-03 17:49:29 +01:00
Guillaume Chatelet	1f9dcd30dc	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
Hans Wennborg	41105ccc60	Declare __builtin_strlen in StringRef.h as constexpr Otherwise Visual Studio 2017 will complain about llvm::StringRef::strlen not being constexpr: StringRef.h(80): error C3615: constexpr function 'llvm::StringRef::strLen' cannot result in a constant expression StringRef.h(84): note: failure was caused by call of undefined function or one not declared 'constexpr'	2020-02-03 16:58:01 +01:00
Kazushi (Jam) Marukawa	3f0180f097	[VE] (fp)trunc+store & load+(fp)ext isel Summary: load+sext/zext/fpext and (fp)trunc+store isel legalization and tests Reviewers: arsenm, craig.topper, rengolin, k-ishizaka Reviewed By: arsenm Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits Tags: #ve, #llvm Differential Revision: https://reviews.llvm.org/D73774	2020-02-03 16:55:44 +01:00
Simon Pilgrim	cebe3b26a7	[X86] computeKnownBitsForTargetNode - add BEXTR support (PR39153) Add a KnownBits::extractBits helper	2020-02-03 15:43:59 +00:00
Hans Wennborg	238f3b0462	build_llvm_package.bat: Use a short form of the git revision	2020-02-03 16:40:10 +01:00
Craig Topper	cf7fa877a2	[X86] FUCOMI/FCOMI instructions should Def FPSW not FPCW. These instructions can set the exception in FPSW. But I don't think they can change FPCW. So this looks like a typo. Differential Revision: https://reviews.llvm.org/D73864	2020-02-03 07:39:00 -08:00
Sanjay Patel	cb8bd29a62	[InstCombine] regenerate complete test checks; NFC	2020-02-03 10:30:26 -05:00
Kazushi (Jam) Marukawa	b2d7ee731b	[VE] vaarg functions callers and callees Summary: Isel patterns and tests for vaarg functions as callers and callees. Reviewers: arsenm, rengolin, k-ishizaka Subscribers: merge_guards_bot, wdng, hiraditya, llvm-commits Tags: #ve, #llvm Differential Revision: https://reviews.llvm.org/D73710	2020-02-03 16:26:44 +01:00
Simon Pilgrim	8301fd0d00	[X86] Add some initial BEXTR combine tests	2020-02-03 15:16:40 +00:00
Simon Pilgrim	1f9f866ff9	[X86] Move BEXTR DemandedBits handling inside SimplifyDemandedBitsForTargetNode Some prep work for PR39153.	2020-02-03 15:16:40 +00:00
Matt Arsenault	80bf477ac5	AMDGPU: Fix extra type mangling on llvm.amdgcn.if.break These have to be the same mask type.	2020-02-03 07:02:05 -08:00
Johannes Doerfert	b1f217520e	Revert "[OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder." This reverts commit 1ca740387b9bbdc142ac81c8bdd6370a8813e328. The bots break [0], investigation is needed. [0] http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22899	2020-02-03 08:59:14 -06:00
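
Note on the two TB(N)Z commits above (fdb6466319 and 53b445b98d): the bit identities they describe can be checked with a small model. The following is only a sketch in Python, not the AArch64 GlobalISel code; the names `fold_shl`, `can_skip_and` and `check_folds` are hypothetical, and the 8-bit `WIDTH` is an assumption chosen so the check can be exhaustive. It reads "when this fits in the type" as requiring 0 <= c <= b < WIDTH.

```python
WIDTH = 8                  # assumed small register width, for an exhaustive check
MASK = (1 << WIDTH) - 1    # keeps shifted values inside the modelled type


def bit(value, b):
    """Return bit b of value (0 or 1)."""
    return (value >> b) & 1


def fold_shl(c, b):
    """Bit of x that equals bit b of (x << c), or None when no fold applies.

    Hypothetical helper: models (tbz (shl x, c), b) -> (tbz x, b - c).
    """
    if 0 <= c <= b < WIDTH:
        return b - c
    return None


def can_skip_and(m, b):
    """True when bit b of mask m is 1, so (x & m) and x agree on bit b."""
    return bit(m, b) == 1


def check_folds():
    """Exhaustively confirm both identities for every WIDTH-bit input."""
    for x in range(1 << WIDTH):
        for b in range(WIDTH):
            for c in range(WIDTH):
                folded = fold_shl(c, b)
                if folded is not None and bit((x << c) & MASK, b) != bit(x, folded):
                    return False
            for m in range(1 << WIDTH):
                if can_skip_and(m, b) and bit(x & m, b) != bit(x, b):
                    return False
    return True
```

Calling `check_folds()` walks every 8-bit `x`, shift amount, bit index and mask. When `c` exceeds `b`, the tested bit of the shifted value is always zero, so this model reports no fold for that case rather than a negative bit index.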


191224 Commits