archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Jonas Paulsson	bf5edcf100	[SystemZ] Improve emitSelect() Merge more Select pseudo instructions in emitSelect() by allowing other instructions between them as long as they do not clobber CC. Debug value instructions are now moved down to below the new PHIs instead of erasing them. Review: Ulrich Weigand https://reviews.llvm.org/D67619 llvm-svn: 372873	2019-09-25 14:00:33 +00:00
David Green	b9ecdc3f91	[ARM] Ensure we do not attempt to create lsll #0 During legalisation we can end up with some pretty strange nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid assembly instructions. A long shift with a zero immediate actually encodes a shift by 32. Differential Revision: https://reviews.llvm.org/D67664 llvm-svn: 372839	2019-09-25 10:16:48 +00:00
Florian Hahn	1b707b394c	[AArch64] Convert neon_ushl and neon_sshl with positive constants to VSHL. I think we should be able to use shl instead of sshl and ushl for positive constant shift values, unless I am missing something. We already have the machinery in place to ensure we only replace nodes, if the shift value is positive and <= the element width. This is a generalization of an earlier patch rL372565. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D67955 llvm-svn: 372824	2019-09-25 08:22:05 +00:00
Amara Emerson	b2c6e9bf99	[AArch64][GlobalISel] Tweak legalization rule for G_BSWAP to handle widening s16. llvm-svn: 372812	2019-09-25 04:52:42 +00:00
Yonghong Song	3cc43e7b0b	[BPF] Generate array dimension size properly for zero-size elements Currently, if an array element type size is 0, the number of array elements will be set to 0, regardless of what the user specified. This implementation is done in the beginning where BTF is mostly used to calculate the member offset. For example, struct s {}; struct s1 { int b; struct s a[2]; }; struct s1 s1; The BTF will have struct "s1" member "a" with element count 0. Now BTF types are used for compile-once and run-everywhere relocations and we need more precise type representation for type comparison. Andrii reported the issue as there are differences between original structure and BTF-generated structure. This patch made the change to correctly assign "2" as the number of elements of member "a". Some dead code related to ElemSize computation is also removed. Differential Revision: https://reviews.llvm.org/D67979 llvm-svn: 372785	2019-09-24 22:38:43 +00:00
Sean Fertile	9371777eb7	Extends the expansion of the LWZtoc pseudo op for AIX. Differential Revision: https://reviews.llvm.org/D67853 llvm-svn: 372772	2019-09-24 18:04:51 +00:00
Simon Pilgrim	76cdef6b0c	[X86] Add MMX MOVD/MOVQ stores to folding tables to support stack folding llvm-svn: 372770	2019-09-24 16:15:32 +00:00
Simon Pilgrim	5314933e77	[X86] Add tests showing failure to stack fold MMX MOVD/MOVQ stores llvm-svn: 372766	2019-09-24 15:40:09 +00:00
Ilya Biryukov	66d4b83781	Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756	2019-09-24 13:48:02 +00:00
David Green	fdb12ed317	[ARM] Split large widening MVE loads Similar to rL372717, we can force the splitting of extends of vector loads in MVE, in order to use the better widening loads as opposed to going through expensive extends. This adds a combine to early-on detect extends of loads and split the load in two, from where normal legalisation will kick in and we get a series of widening loads. Differential Revision: https://reviews.llvm.org/D67909 llvm-svn: 372721	2019-09-24 10:53:09 +00:00
David Green	4bb16bdf09	[ARM] MVE sext and widen/narrow tests from larger types. NFC llvm-svn: 372719	2019-09-24 10:39:58 +00:00
David Green	483eaed27f	[ARM] Split large truncating MVE stores MVE does not have a simple sign extend instruction that can move elements across lanes. We currently often end up moving each lane into and out of a GPR, in order to get elements into the correct places. When we have a store of a trunc (or an extend of a load), we can instead just split the store/load in two, using the narrowing/widening load/store instructions from each half of the vector. This does that for stores. It happens very early in a store combine, so as to easily detect the truncates. (It would be possible to do this later, but that would involve looking through a buildvector of extract elements. Not impossible but this way seemed simpler). By enabling store combines we also get a vmovdrr combine for free, helping some other tests. Differential Revision: https://reviews.llvm.org/D67828 llvm-svn: 372717	2019-09-24 10:10:41 +00:00
Amara Emerson	756cf7ac87	[GlobalISel][IRTranslator] Fix switch table lowering to use signed LE not unsigned. We were miscompiling switch value comparisons with the wrong signedness, which shows up when we have things like switch case values with i1 types, which end up being legalized incorrectly. Fixes PR43383 llvm-svn: 372675	2019-09-24 00:09:23 +00:00
Craig Topper	c794eefd56	[X86] Reduce the number of unique check prefixes in memset-nonzero.ll. NFC The avx512 with prefer-256-bit generates the same code as AVX2 so just reuse that prefix. llvm-svn: 372661	2019-09-23 21:29:28 +00:00
Thomas Lively	da6e144269	[WebAssembly] vNxM.load_splat instructions Summary: Adds the new load_splat instructions as specified at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#load-and-splat. DAGISel does not allow matching multiple copies of the same load in a single pattern, so we use a new node in WebAssemblyISD to wrap loads that should be splatted. Depends on D67783. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67784 llvm-svn: 372655	2019-09-23 20:42:12 +00:00
Sanjay Patel	73d6fefa33	[BreakFalseDeps] ignore function with minsize attribute This came up in the x86-specific: https://bugs.llvm.org/show_bug.cgi?id=43239 ...but it is a general problem for the BreakFalseDeps pass. Dependencies may be broken by adding some other instruction, so that should be avoided if the overall goal is to minimize size. Differential Revision: https://reviews.llvm.org/D67363 llvm-svn: 372628	2019-09-23 17:01:01 +00:00
Krzysztof Parzyszek	433b346065	[Hexagon] Bitcast v4i16 to v8i8, unify no-op casts between scalar and HVX llvm-svn: 372616	2019-09-23 14:33:27 +00:00
Sanjay Patel	10cca4c938	[x86] fix assert with horizontal math + broadcast of vector (PR43402) https://bugs.llvm.org/show_bug.cgi?id=43402 llvm-svn: 372606	2019-09-23 13:30:23 +00:00
Sam Parker	b6bb844bc8	[ARM][MVE] Remove old tail predicates Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also look into the exit block to see if there's any duplicates of the predicates that we've replaced and clone the vctp to be used there instead. Differential Revision: https://reviews.llvm.org/D67709 llvm-svn: 372567	2019-09-23 09:48:25 +00:00
Florian Hahn	b156528748	[AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine. Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl, if their first operand is extended and the second operand is a constant Also adds a few tests marked with FIXME, where we can further increase codegen. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D62308 llvm-svn: 372565	2019-09-23 09:38:53 +00:00
Sam Parker	d93fa5ac8a	[ARM][LowOverheadLoops] Use subs during revert. Check whether there are any uses or defs between the LoopDec and LoopEnd. If there's not, then we can use a subs to set the cpsr and skip generating a cmp. Differential Revision: https://reviews.llvm.org/D67801 llvm-svn: 372560	2019-09-23 08:57:50 +00:00
Sam Parker	e90d55e782	[ARM][LowOverheadLoops] Use tBcc when reverting Check the branch target ranges and use a tBcc instead of t2Bcc when we can. Differential Revision: https://reviews.llvm.org/D67796 llvm-svn: 372557	2019-09-23 08:35:31 +00:00
Petar Avramovic	5555a652a9	[MIPS GlobalISel] VarArg argument lowering, select G_VASTART and vacopy CC_Mips doesn't accept vararg functions for O32, so we have to explicitly use CC_Mips_FixedArg. For lowerCall we now properly figure out whether callee function is vararg or not, this has no effect for O32 since we always use CC_Mips_FixedArg. For lower formal arguments we need to copy arguments in register to stack and save pointer to start for argument list into MipsMachineFunction object so that G_VASTART could use it during instruction select. For vacopy we need to copy content from one vreg to another, load and store are used for that purpose. Differential Revision: https://reviews.llvm.org/D67756 llvm-svn: 372555	2019-09-23 08:11:41 +00:00
Craig Topper	9c5bf404fb	[X86] Canonicalize all zeroes vector to RHS in X86DAGToDAGISel::tryVPTESTM. llvm-svn: 372544	2019-09-23 05:35:23 +00:00
Craig Topper	ad887e047c	[X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop. The attached test case would previously infinite loop after r365711. I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc to match VPTEST in 32-bit mode in a follow up commit. llvm-svn: 372543	2019-09-23 05:35:20 +00:00
Craig Topper	9476553035	[X86] Add 32-bit command line to avx512f-vec-test-testn.ll llvm-svn: 372542	2019-09-23 05:35:15 +00:00
David Zarzycki	5b0bffb0f5	Prefer AVX512 memcpy when applicable When AVX512 is available and the preferred vector width is 512-bits or more, we should prefer AVX512 for memcpy(). https://bugs.llvm.org/show_bug.cgi?id=43240 https://reviews.llvm.org/D67874 llvm-svn: 372540	2019-09-23 05:00:59 +00:00
Craig Topper	08d311e69e	[X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend. These intrinsics should be shift by immediate, but gcc allows any i32 scalar and clang needs to match that. So we try to detect the non-constant case and move the data from an integer register to an MMX register. Previously this was done by creating a v2i32 build_vector and bitcast in SelectionDAGBuilder. This had to be done early since v2i32 isn't a legal type. The bitcast+build_vector would be DAG combined to X86ISD::MMX_MOVW2D which isel will turn into a GPR->MMX MOVD. This commit just moves the whole thing to lowering and emits the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The test changes just seem to be due to nodes being linearized in a different order. llvm-svn: 372535	2019-09-23 01:05:33 +00:00
Roman Lebedev	b430c04585	[X86] X86DAGToDAGISel::matchBEXTRFromAndImm(): if can't use BEXTR, fallback to BZHI is profitable (PR43381) Summary: PR43381 notes that while we are good at matching `(X >> C1) & C2` as BEXTR/BEXTRI, we only do that if we either have BEXTRI (TBM), or if BEXTR is marked as being fast (`-mattr=+fast-bextr`). In all other cases we don't match. But that is mainly only true for AMD CPU's. However, for all the CPU's for which we have sched models, the BZHI is always fast (or the sched models are all bad.) So if we decide that it's unprofitable to emit BEXTR/BEXTRI, we should consider falling-back to BZHI if it is available, and follow-up with the shift. While it's really tempting to do something because it's cool it is wise to first think whether it actually makes sense to do. We shouldn't just use BZHI because we can, but only if it is beneficial. In particular, it isn't really worth it if the input is a register, mask is small, or we can fold a load. But it is worth it if the mask does not fit into 32-bits. (careful, i don't know much about intel cpu's, my choice of `-mcpu` may be bad here) Thus we manage to fold a load: https://godbolt.org/z/Er0OQz Or if we'd end up using BZHI anyways because the mask is large: https://godbolt.org/z/dBJ_5h But this isn't actually profitable in the general case, e.g. here we'd increase microop count (the register renaming is free, mca does not model that there it seems) https://godbolt.org/z/k6wFoz Likewise, not worth it if we just get load folding: https://godbolt.org/z/1M1deG https://bugs.llvm.org/show_bug.cgi?id=43381 Reviewers: RKSimon, craig.topper, davezarzycki, spatel Reviewed By: craig.topper, davezarzycki Subscribers: andreadb, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67875 llvm-svn: 372532	2019-09-22 22:04:29 +00:00
Roman Lebedev	a8217101c5	[NFC][X86] Add BEXTR test with load and 33-bit mask (PR43381 / D67875) llvm-svn: 372524	2019-09-22 19:36:38 +00:00
Craig Topper	44e3baa609	[X86] Update commutable EVEX vcmp patterns to use timm instead of imm. We need to match TargetConstant, not Constant. This was broken in r372338, but we lacked test coverage. llvm-svn: 372523	2019-09-22 19:06:13 +00:00
Craig Topper	d16a5042d5	[X86] Add more tests for commuting evex vcmp instructions during isel to fold a load. Some of the isel patterns were not updated to check for TargetConstant instead of Constant in r372338. llvm-svn: 372522	2019-09-22 19:06:08 +00:00
Craig Topper	5d13fbb57e	[X86] Add test memset and memcpy testcases for D67874. NFC llvm-svn: 372494	2019-09-22 06:52:25 +00:00
Roman Lebedev	c1a31b1e8e	[NFC][X86] Adjust check prefixes in bmi.ll (PR43381) llvm-svn: 372468	2019-09-21 11:12:55 +00:00
Amara Emerson	32510b606a	[AArch64][GlobalISel] Implement selection for G_SHL of <2 x i64> Simple continuation of existing selection support. llvm-svn: 372467	2019-09-21 09:21:16 +00:00
Amara Emerson	5f6b7279f3	[AArch64][GlobalISel] Selection support for G_ASHR of <2 x s64> Just add an extra case to the existing selection logic. llvm-svn: 372466	2019-09-21 09:21:13 +00:00
Amara Emerson	a9369d64ec	[AArch64][GlobalISel] Make <4 x s32> G_ASHR and G_LSHR legal. llvm-svn: 372465	2019-09-21 09:21:10 +00:00
James Molloy	a040a966a9	[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount Recommit: fix asan errors. The way MachinePipeliner uses these target hooks is stateful - we reduce trip count by one per call to reduceLoopCount. It's a little overfit for hardware loops, where we don't have to worry about stitching a loop induction variable across prologs and epilogs (the induction variable is implicit). This patch introduces a new API: /// Analyze loop L, which must be a single-basic-block loop, and if the /// conditions can be understood enough produce a PipelinerLoopInfo object. virtual std::unique_ptr<PipelinerLoopInfo> analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const; The return value is expected to be an implementation of the abstract class: /// Object returned by analyzeLoopForPipelining. Allows software pipelining /// implementations to query attributes of the loop being pipelined. class PipelinerLoopInfo { public: virtual ~PipelinerLoopInfo(); /// Return true if the given instruction should not be pipelined and should /// be ignored. An example could be a loop comparison, or induction variable /// update with no users being pipelined. virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0; /// Create a condition to determine if the trip count of the loop is greater /// than TC. /// /// If the trip count is statically known to be greater than TC, return /// true. If the trip count is statically known to be not greater than TC, /// return false. Otherwise return nullopt and fill out Cond with the test /// condition. virtual Optional<bool> createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB, SmallVectorImpl<MachineOperand> &Cond) = 0; /// Modify the loop such that the trip count is /// OriginalTC + TripCountAdjust. virtual void adjustTripCount(int TripCountAdjust) = 0; /// Called when the loop's preheader has been modified to NewPreheader. virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0; /// Called when the loop is being removed. virtual void disposed() = 0; }; The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while allowing the target to hold its own state across all calls. This API, in particular the disjunction of creating a trip count check condition and adjusting the loop, improves the code quality in ModuloSchedule.cpp. llvm-svn: 372463	2019-09-21 08:19:41 +00:00
Craig Topper	ef72bd9cc3	[X86] Use sse_load_f32/f64 and timm in patterns for memory form of vgetmantss/sd. Previously we only matched scalar_to_vector and scalar load, but we should be able to narrow a vector load or match vzload. Also need to match TargetConstant instead of Constant. The register patterns were previously updated, but not the memory patterns. llvm-svn: 372458	2019-09-21 06:44:29 +00:00
Craig Topper	717f1d9695	[X86] Add test case to show failure to fold load with getmantss due to isel pattern looking for Constant instead of TargetConstant The intrinsic has an immarg so it gets created with a TargetConstant instead of a Constant after r372338. The isel pattern was only updated for the register form, but not the memory form. llvm-svn: 372457	2019-09-21 06:44:24 +00:00
Matt Arsenault	c078733972	AMDGPU/GlobalISel: Allow selection of scalar min/max I believe all of the uniform/divergent pattern predicates are redundant and can be removed. The uniformity bit already influences the register class, and nothing has broken when I've removed this and others. llvm-svn: 372450	2019-09-21 02:37:33 +00:00
Amara Emerson	860145b41c	[GlobalISel] Defer setting HasCalls on MachineFrameInfo to selection time. We currently always set the HasCalls on MFI during translation and legalization if we're handling a call or legalizing to a libcall. However, if that call is later optimized to a tail call then we don't need the flag. The flag being set to true causes frame lowering to always save and restore FP/LR, which adds unnecessary code. This change does the same thing as SelectionDAG and ports over some code that scans instructions after selection, using TargetInstrInfo to determine if target opcodes are known calls. Code size geomean improvements on CTMark: -O0 : 0.1% -Os : 0.3% Differential Revision: https://reviews.llvm.org/D67868 llvm-svn: 372443	2019-09-20 23:52:07 +00:00
Ulrich Weigand	8f9591eb21	[SystemZ] Support z15 processor name The recently announced IBM z15 processor implements the architecture already supported as "arch13" in LLVM. This patch adds support for "z15" as an alternate architecture name for arch13. The patch also uses z15 in a number of places where we used arch13 as long as the official name was not yet announced. llvm-svn: 372435	2019-09-20 23:04:45 +00:00
Sterling Augustine	fd7a9c638e	Fix missed case of switching getConstant to getTargetConstant. Try 2. Summary: This fixes a crasher introduced by r372338. Reviewers: echristo, arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67850 llvm-svn: 372434	2019-09-20 22:26:55 +00:00
Jinsong Ji	b2118cf9ea	[NFC][PowerPC] Consolidate testing of common linkage symbols Add a new file to test the code gen for common linkage symbol. Remove common linkage in some other testcases to avoid distraction. llvm-svn: 372426	2019-09-20 20:31:37 +00:00
Mitch Phillips	1a7a7c7655	Revert "[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount" This commit broke the ASan buildbot. See comments in rL372376 for more information. This reverts commit 15e27b0b6d9d51362fad85dbe95ac5b3fadf0a06. llvm-svn: 372425	2019-09-20 20:25:16 +00:00
Evgeniy Stepanov	00275eb9d2	[MTE] Handle MTE instructions in AArch64LoadStoreOptimizer. Summary: Generate pre- and post-indexed forms of ST*G and STGP when possible. Reviewers: ostannard, vitalybuka Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67741 llvm-svn: 372412	2019-09-20 17:36:27 +00:00
Sebastian Pop	4c4c86df5d	[aarch64] add def-pats for dot product This patch adds the patterns to select the dot product instructions. Tested on aarch64-linux with make check-all. Differential Revision: https://reviews.llvm.org/D67645 llvm-svn: 372408	2019-09-20 16:33:33 +00:00
Stanislav Mekhanoshin	8776a8f98e	Remove assert from MachineLoop::getLoopPredecessor() According to the documentation method returns predecessor if the given loop's header has exactly one unique predecessor outside the loop. Otherwise return null. In reality it asserts if there is no predecessor outside of the loop. The testcase has the loop where predecessors outside of the loop were not identified as analyzeBranch() was unable to process the mask branch and returned true. That is also not correct to assert for the truly dead loops. Differential Revision: https://reviews.llvm.org/D67634 llvm-svn: 372405	2019-09-20 15:26:10 +00:00
Krzysztof Parzyszek	cfc0a1e836	[MVT] Add v256i1 to MachineValueType This type can show up when lowering some HVX vector code on Hexagon. llvm-svn: 372403	2019-09-20 15:19:20 +00:00
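
The tri-state return value described in commit a040a966a9 (true, false, or nullopt for "needs a runtime check") can be illustrated with a small stand-alone sketch. Nothing below is LLVM code: `LoopInfoSketch`, `ToyLoop`, `Decision` and `planPipelining` are hypothetical names invented for this illustration, `std::optional` stands in for LLVM's `Optional`, and the `MachineBasicBlock`/`MachineOperand` parameters of the real interface are dropped so that the example compiles on its own (C++17 assumed).

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified stand-in for the interface quoted in the commit
// message. The real class takes MachineBasicBlock/MachineOperand arguments;
// they are omitted here to keep the sketch self-contained.
class LoopInfoSketch {
public:
  virtual ~LoopInfoSketch() = default;
  // true/false when the answer is statically known,
  // std::nullopt when a runtime test would be required.
  virtual std::optional<bool> tripCountGreaterThan(int TC) const = 0;
  // Make the trip count OriginalTC + TripCountAdjust.
  virtual void adjustTripCount(int TripCountAdjust) = 0;
};

// Toy "target" implementation: the trip count is either a known constant
// or unknown (std::nullopt). The state lives in the object, not the caller.
class ToyLoop : public LoopInfoSketch {
  std::optional<int> TripCount;

public:
  explicit ToyLoop(std::optional<int> TC) : TripCount(TC) {}

  std::optional<bool> tripCountGreaterThan(int TC) const override {
    if (!TripCount)
      return std::nullopt;
    return *TripCount > TC;
  }

  void adjustTripCount(int TripCountAdjust) override {
    if (TripCount) {
      *TripCount += TripCountAdjust;
      assert(*TripCount >= 0 && "trip count must stay non-negative");
    }
  }

  std::optional<int> tripCount() const { return TripCount; }
};

enum class Decision { Pipelined, Skipped, NeedsRuntimeCheck };

// Hypothetical caller: peel NumStages - 1 iterations only when the loop is
// statically known to run more often than that.
Decision planPipelining(LoopInfoSketch &Loop, int NumStages) {
  const int Peeled = NumStages - 1;
  std::optional<bool> Enough = Loop.tripCountGreaterThan(Peeled);
  if (!Enough)
    return Decision::NeedsRuntimeCheck; // caller would have to emit a test
  if (!*Enough)
    return Decision::Skipped;           // too few iterations to pipeline
  Loop.adjustTripCount(-Peeled);        // the loop object updates its own state
  return Decision::Pipelined;
}
```

The query (`tripCountGreaterThan`) and the mutation (`adjustTripCount`) are separate calls here, mirroring the separation the commit message credits for the cleaner code in ModuloSchedule.cpp. The decision logic itself is an assumption made for illustration and is not taken from the LLVM sources.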

30777 Commits