Commit Graph

203111 Commits

Author SHA1 Message Date
Florian Hahn
15e96e9e04 [DSE,MemorySSA] Increase walker limit a bit.
This slightly bumps the walker limit so that it covers more cases while
not increasing compile-time too much:
http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions
2020-09-08 14:55:46 +01:00
Sam Parker
551a2d7822 [NFC][ARM] Precommit test 2020-09-08 14:44:28 +01:00
Sanjay Patel
e7dfd86249 [InstCombine] add bitwise logic fold tests for D86395; NFC 2020-09-08 09:17:42 -04:00
Raul Tambre
ccd618fea7 [CMake] Remove dead FindPythonInterp code
LLVM has bumped the minimum required CMake version to 3.13.4, so this has become dead code.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D87189
2020-09-08 15:23:23 +03:00
Simon Pilgrim
9fa62857a3 X86CallLowering.cpp - improve auto const/pointer/reference qualifiers. NFCI.
Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.
2020-09-08 13:01:23 +01:00
Simon Pilgrim
6acd1b7baf X86DomainReassignment.cpp - improve auto const/pointer/reference qualifiers. NFCI.
Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.
2020-09-08 13:01:23 +01:00
Xing GUO
51d61a7d47 [DWARFYAML] Make the debug_ranges section optional.
This patch makes the debug_ranges section optional. When we specify an
empty debug_ranges section, yaml2obj only emits the section header.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D87263
2020-09-08 19:55:47 +08:00
Sam Tebbs
6a1491418e [ARM][LowOverheadLoops] Remove modifications to the correct element
count register

After my patch at D86087, code that now uses the mov operand rather than
the vctp operand will no longer remove modifications to the vctp operand
as they should. This patch fixes that by explicitly removing
modifications to the vctp operand rather than the register used as the
element count.
2020-09-08 10:30:05 +01:00
Qiu Chaofan
ff9f37b205 Revert "[PowerPC] Implement instruction clustering for stores"
This reverts commit 3c0b3250230b3847a2a47dfeacfdb794c2285f02, (along
with ea795304 and bb39eb9e) since it breaks test with UB sanitizer.
2020-09-08 17:24:08 +08:00
Serge Guelton
602a8c1ce6 Provide anchor for compiler extensions
This patch is cherry-picked from 04b0a4e22e3b4549f9d241f8a9f37eebecb62a31, and
amended to prevent an undefined reference to `llvm::EnableABIBreakingChecks'
2020-09-08 10:33:38 +02:00
Xing GUO
063c6c1b3c [obj2yaml] Stop parsing the debug_str section when it encounters a string without the null terminator.
When obj2yaml encounters a string without the null terminator, it should
stop parsing the debug_str section. This patch addresses comments in
[D86867](https://reviews.llvm.org/D86867#inline-803291).

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D87261
2020-09-08 16:09:36 +08:00
Max Kazantsev
19662073e2 [Test] More tests where IndVars fails to eliminate a range check 2020-09-08 14:43:29 +07:00
Qiu Chaofan
d230f91a15 [PowerPC] Fix getMemOperandWithOffsetWidth
Commit 3c0b3250 introduced memory cluster under pwr10 target, but a
check for operands was unexpectedly removed. This adds it back to avoid
regression.
2020-09-08 15:35:25 +08:00
Simon Wallis
cdb9b6e903 [AARCH64][RegisterCoalescer] clang miscompiles zero-extension to long long
Implement AArch64 variant of shouldCoalesce() to detect a known failing case
and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load.

Do not coalesce in the following case:
COPY where source is bottom 32 bits of a 64-register,
and destination is a 32-bit subregister of a 64-bit register,
ie it causes the rest of the register to be implicitly set to zero.

A mir test has been added.

In the test case, the 32-bit copy implements a 32 to 64 bit zero extension
and relies on the upper 32 bits being zeroed.

Coalescing to the result of the 64-bit load meant overwriting
the upper 32 bits incorrectly when the loaded byte was negative.

Reviewed By: john.brawn

Differential Revision: https://reviews.llvm.org/D85956
2020-09-08 08:04:52 +01:00
Mikael Holmen
5af75a16f7 [PowerPC] Add parentheses to silence gcc warning
Without gcc 7.4 warns with

../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
          BaseOp1.isFI() &&
          ~~~~~~~~~~~~~~~^~
              "Only base registers and frame indices are supported.");
              ~
2020-09-08 08:39:57 +02:00
Andrew Wei
4e26a597de [LSR] Canonicalize a formula before insert it into the list
In GenerateConstantOffsetsImpl, we may generate non canonical Formula
if BaseRegs of that Formula is updated and includes a recurrent expr reg
related with current loop while its ScaledReg is not.

Patched by: mdchen
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86939
2020-09-08 13:14:53 +08:00
Johannes Doerfert
56cc4d49d5 [Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions
The CloneFunctionInto has implicit requirements with regards to the
linkage and visibility of the function. We now update these after we did
the CloneFunctionInto on the copy with the same linkage and visibility
as the original.
2020-09-07 23:38:09 -05:00
Johannes Doerfert
7745d5467f [Attributor][NFC] Cleanup internalize test case
One run line was different and probably introduced for the manually
added function attribute & name checks. We can do this with the script
and a check prefix used for the other run lines as well.
2020-09-07 23:38:09 -05:00
Johannes Doerfert
314c55ed96 [Attributor][NFC] Change variable spelling 2020-09-07 23:38:09 -05:00
Johannes Doerfert
926bc1dd25 [Attributor][NFC] Clang tidy: no else after continue 2020-09-07 23:38:08 -05:00
Johannes Doerfert
ba69ec7fb4 [Attributor][NFC] Expand auto types (clang-fix-it) 2020-09-07 23:38:08 -05:00
Johannes Doerfert
ec05659ebd [Attributor][FIX] Properly return changed if the IR was modified
Deleting or replacing anything is certainly a modification. This caused
a later assertion in IPSCCP when compiling 400.perlbench with the new PM.
I'm not sure how to test this.
2020-09-07 23:38:08 -05:00
Max Kazantsev
b6ff4a7671 [Test] Auto-generated checks for some IndVarSimplify tests 2020-09-08 11:15:40 +07:00
Qiu Chaofan
2bb8ef68b6 [PowerPC] Implement instruction clustering for stores
On Power10, it's profitable to schedule some stores with adjacent target
address together. This patch implements this feature.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D86754
2020-09-08 11:03:09 +08:00
Alexander Shaposhnikov
60f196bf71 [llvm-objcopy] Consolidate and unify version tests
In this diff the tests which verify version printing functionality are refactored.
Since they are not specific to a particular format we move them into tool-version.test
and slightly unify (similarly to tool-name.test and tool-help-message.test).

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D87211
2020-09-07 18:44:32 -07:00
Florian Hahn
afd0a35752 [DSE,MemorySSA] Add an early check for read clobbers to traversal.
Depending on the benchmark, this early exit can save a substantial
amount of compile-time:

http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions
2020-09-07 23:22:10 +01:00
Roman Lebedev
8406429eae Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec2183d466dad64b763bab4e15fd8804239
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec2183d466dad64b763bab4e15fd8804239.
2020-09-08 00:24:03 +03:00
Nikita Popov
608aa74b19 [KnownBits] Avoid some copies (NFC)
These lambdas don't need copies, use const reference.
2020-09-07 22:19:29 +02:00
Nikita Popov
6a9be09c82 [SCCP] Compute ranges for supported intrinsics
For intrinsics supported by ConstantRange, compute the result range
based on the argument ranges. We do this independently of whether
some or all of the input ranges are full, as we can often still
constrain the result in some way.

Differential Revision: https://reviews.llvm.org/D87183
2020-09-07 22:16:06 +02:00
Craig Topper
558281ef53 [SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported.
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.

It's also the same as the Custom sequence uses for i64 on 32-bit
and i128 on 64-bit. So we can remove the X86 customization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87215
2020-09-07 13:15:26 -07:00
Sanjay Patel
c81c9fc73f [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Sanjay Patel
f8a11985c3 [InstCombine] add ptr difference tests; NFC 2020-09-07 15:54:32 -04:00
Craig Topper
6f4d8f61ba [X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets.
Differential Revision: https://reviews.llvm.org/D87214
2020-09-07 11:14:05 -07:00
Craig Topper
05d9a7585d [X86] Pre-commit new test case for D87214. NFC 2020-09-07 11:14:05 -07:00
Sanjay Patel
367b6584ea [DAGCombiner] allow more store merging for non-i8 truncated ops
This is a follow-up suggested in D86420 - if we have a pair of stores
in inverted order for the target endian, we can rotate the source
bits into place.
The "be_i64_to_i16_order" test shows a limitation of the current
function (which might be avoided if we integrate this function with
the other cases in mergeConsecutiveStores). In the earlier
"be_i64_to_i16" test, we skip the first 2 stores because we do not
match the full set as consecutive or rotate-able, but then we reach
the last 2 stores and see that they are an inverted pair of 16-bit
stores. The "be_i64_to_i16_order" test alters the program order of
the stores, so we miss matching the sub-pattern.

Differential Revision: https://reviews.llvm.org/D87112
2020-09-07 14:12:36 -04:00
Eric Astor
ffbe9ff668 [ms] [llvm-ml] Allow use of locally-defined variables in expressions
MASM allows variables defined by equate statements to be used in expressions.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86946
2020-09-07 14:00:14 -04:00
Eric Astor
7ae8eb3e82 [ms] [llvm-ml] Fix STRUCT field alignment
MASM aligns fields to the _minimum_ of the STRUCT alignment value and the size of the next field.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86945
2020-09-07 13:58:59 -04:00
Eric Astor
93e5d34daa [ms] [llvm-ml] Add support for bitwise named operators (AND, NOT, OR) in MASM
Add support for expressions of the form '1 or 2', etc.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86944
2020-09-07 13:57:54 -04:00
Simon Pilgrim
b19ede2649 VPlan.h - remove unnecessary forward declarations. NFCI.
Already defined in includes.
2020-09-07 18:35:06 +01:00
Simon Pilgrim
2a0e71d0ce MipsISelLowering.h - remove CCState/CCValAssign forward declarations. NFCI.
These are already defined in the CallingConvLower.h include.
2020-09-07 18:15:26 +01:00
Simon Pilgrim
54e2cf9073 BTFDebug.h - reduce MachineInstr.h include to forward declaration. NFCI. 2020-09-07 17:51:13 +01:00
Simon Pilgrim
38c8707d93 LeonPasses.h - remove unnecessary includes. NFCI.
Reduce to forward declarations and move includes to LeonPasses.cpp where necessary.
2020-09-07 17:51:12 +01:00
Simon Pilgrim
6c92862767 LeonPasses.h - remove orphan function declarations. NFCI.
The implementations no longer exist.
2020-09-07 17:51:12 +01:00
Sanjay Patel
c688e87b99 [InstCombine] improve folds for icmp with multiply operands (PR47432)
Check for no overflow along with an odd constant before
we lose information by converting to bitwise logic.

https://rise4fun.com/Alive/2Xl

  Pre: C1 != 0
  %mx = mul nsw i8 %x, C1
  %my = mul nsw i8 %y, C1
  %r = icmp eq i8 %mx, %my
  =>
  %r = icmp eq i8 %x, %y

  Name: nuw ne
  Pre: C1 != 0
  %mx = mul nuw i8 %x, C1
  %my = mul nuw i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y

  Name: odd ne
  Pre: C1 % 2 != 0
  %mx = mul i8 %x, C1
  %my = mul i8 %y, C1
  %r = icmp ne i8 %mx, %my
  =>
  %r = icmp ne i8 %x, %y
2020-09-07 12:40:37 -04:00
Sanjay Patel
1fd12b30bd [InstCombine] move/add tests for icmp with mul operands; NFC 2020-09-07 12:40:37 -04:00
alex-t
c025a25bff [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block
optimizeEndCF removes EXEC restoring instruction case this instruction is the only one except the branch to the single successor and that successor contains EXEC mask restoring instruction that was lowered from END_CF belonging to IF_ELSE.
As a result of such optimization we get the basic block with the only one instruction that is a branch to the single successor.
In case the control flow can reach such an empty block from S_CBRANCH_EXEZ/EXECNZ it might happen that spill/reload instructions that were inserted later by register allocator are placed under exec == 0 condition and never execute.
Removing empty block solves the problem.

This change require further work to re-implement LIS updates. Recently, LIS is always nullptr in this pass. To enable it we need another patch to fix many places across the codegen.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D86634
2020-09-07 19:37:27 +03:00
Momchil Velikov
8e2bba4fdc Reduce the number of memory allocations when displaying
a warning about clobbering reserved registers (NFC).

Also address some minor inefficiencies and style issues.

Differential Revision: https://reviews.llvm.org/D86088
2020-09-07 17:04:00 +01:00
Simon Pilgrim
8ac9bb3910 AntiDepBreaker.h - remove unnecessary ScheduleDAG.h include. NFCI. 2020-09-07 16:39:42 +01:00
Simon Pilgrim
358c267440 [Sparc] Add reduced funnel shift test case for PR47303 2020-09-07 16:17:31 +01:00
Simon Pilgrim
9b5af79762 [X86][SSE] Don't use LowerVSETCCWithSUBUS for unsigned compare with +ve operands (PR47448)
We already simplify the unsigned comparisons if we've found the operands are non-negative, but we were still calling LowerVSETCCWithSUBUS which resulted in the PR47448 regressions.
2020-09-07 16:11:40 +01:00