Commit Graph

180670 Commits

Author SHA1 Message Date
Julian Lettner
3d2837886d [ASan] Use dynamic shadow on 32-bit iOS and simulators
The VM layout on iOS is not stable between releases. On 64-bit iOS and
its derivatives we use a dynamic shadow offset that enables ASan to
search for a valid location for the shadow heap on process launch rather
than hardcode it.

This commit extends that approach for 32-bit iOS plus derivatives and
their simulators.

rdar://50645192
rdar://51200372
rdar://51767702

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D63586

llvm-svn: 364105
2019-06-21 21:01:39 +00:00
Craig Topper
157019fdeb [X86] Add test cases for incorrect shrinking of volatile vector loads from 128-bits to 32 or 64 bits. NFC
This is caused by isel patterns that look for vzmovl+load and
treat it the same as vzload.

llvm-svn: 364101
2019-06-21 20:16:26 +00:00
Matt Arsenault
47811338cc AMDGPU: Fix not using s33 for scratch wave offset in kernels
Fixes missing piece from r363990.

llvm-svn: 364099
2019-06-21 20:04:02 +00:00
Craig Topper
4f5a568045 [X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg.

This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns.

Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining.

I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.

Differential Revision: https://reviews.llvm.org/D63512

llvm-svn: 364095
2019-06-21 19:10:21 +00:00
Craig Topper
5aa82d65e9 [X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types.
We don't have any Custom handling during type legalization. Only
operation legalization.

Fixes PR42355

llvm-svn: 364093
2019-06-21 18:50:00 +00:00
Craig Topper
21c023850f [X86] Add avx512bw command lines to avx512-select.ll
Prep for fixing PR42355 and ensuring we have coverage of
ISD::SELECT for v64i8/v32i16 on KNL and SKX configs.

llvm-svn: 364092
2019-06-21 18:49:42 +00:00
Craig Topper
aa14b9b4c2 [X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults.
This should be unreachable, but bugs can make it reachable. This
adds a debug print so we can see the bad node in the output when
the llvm_unreachable triggers.

llvm-svn: 364091
2019-06-21 18:49:21 +00:00
Simon Pilgrim
1d92268266 [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle
Subvector shuffling often ends up as insert/extract subvector.

llvm-svn: 364090
2019-06-21 18:35:04 +00:00
Amara Emerson
98ded02108 [AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops.
With this we can now fully code generate jump tables, which is important for code size.

Differential Revision: https://reviews.llvm.org/D63223

llvm-svn: 364086
2019-06-21 18:10:41 +00:00
Amara Emerson
d9d36b00c1 [GlobalISel][IRTranslator] Change switch table translation to generate jump tables and range checks.
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG to in order to generate jump tables and range checks where appropriate.

Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.

For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.

There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.

Actual arm64 support for selection of jump tables is coming in a later patch.

Differential Revision: https://reviews.llvm.org/D63169

llvm-svn: 364085
2019-06-21 18:10:38 +00:00
Simon Pilgrim
ea836bbe93 [SLP] Look-ahead operand reordering heuristic.
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364084
2019-06-21 17:57:01 +00:00
David Bolvansky
b94fe185ae [NFC] Update shl-sub tests
llvm-svn: 364083
2019-06-21 17:51:18 +00:00
Sanjay Patel
ad03d88cb6 [InstCombine] add tests for ctpop folds; NFC
llvm-svn: 364082
2019-06-21 17:44:09 +00:00
Craig Topper
5cca3be196 [X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl.
We already use vmovq for v2i64/v2f64 vzmovl. But we were using a
blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or
movsd+xorpd under optsize.

I think the blend with 0 or movss/d is only needed for
vXi32 where we don't have an instruction that can move 32
bits from one xmm to another while zeroing upper bits.

movq is no worse than blendpd on any known CPUs.

llvm-svn: 364079
2019-06-21 17:24:21 +00:00
Simon Pilgrim
3e570f8cd0 [DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI.
llvm-svn: 364076
2019-06-21 16:44:51 +00:00
Amara Emerson
8eb83e55cc [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal.
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).

On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.

There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.

Differential Revision: https://reviews.llvm.org/D63587

llvm-svn: 364075
2019-06-21 16:43:50 +00:00
Stanislav Mekhanoshin
7c12bcccbf [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode
This requires 3 wait states unless there is a wait or VALU in
between.

Differential Revision: https://reviews.llvm.org/D63619

llvm-svn: 364074
2019-06-21 16:30:14 +00:00
David Bolvansky
e82aca84f0 [InstCombine] (1 << (C - x)) -> ((1 << C) >> x) if C is bitwidth - 1
Summary:
```
%a = sub i32 31, %x
%r = shl i32 1, %a
  =>
%d = shl i32 1, 31
%r = lshr i32 %d, %x

Done: 1
Optimization is correct!
```

https://rise4fun.com/Alive/btZm

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63652

llvm-svn: 364073
2019-06-21 16:25:32 +00:00
Simon Pilgrim
b8b304222c [X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI.
TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate.

llvm-svn: 364072
2019-06-21 16:23:28 +00:00
David Bolvansky
aad51b646f [NFC] Added more tests for D63652
llvm-svn: 364069
2019-06-21 16:14:13 +00:00
Simon Pilgrim
b6bee639c8 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
llvm-svn: 364068
2019-06-21 16:11:18 +00:00
David Bolvansky
f28fa06507 [InstCombine] cttz(abs(x)) -> cttz(x)
Summary: Signedness does not change number of trailing zeros.

Reviewers: spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D63546

llvm-svn: 364064
2019-06-21 15:26:22 +00:00
Sanjay Patel
a81820b258 [GVNSink] prevent crashing on mismatched instructions (PR42346)
Patch based on suggestion by James Molloy (@jmolloy) in:
https://bugs.llvm.org/show_bug.cgi?id=42346

llvm-svn: 364062
2019-06-21 15:17:24 +00:00
David Bolvansky
55c9ce9744 [NFC] Added tests for (1 << (C - x)) -> ((1 << C) >> x)
llvm-svn: 364060
2019-06-21 15:00:31 +00:00
Simon Pilgrim
3b26b2fac1 [DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code.
Move the "extract from insert detection code" into a lambda helper function.

llvm-svn: 364059
2019-06-21 14:46:21 +00:00
James Henderson
13188b5a1b [docs][llvm-objdump] Fix bad merge of docs
llvm-svn: 364056
2019-06-21 14:41:36 +00:00
George Rimar
1675d50a91 [llvm-objcopy] - Get rid of dynrel.elf precompiled binary from inputs.
We do not have to spread using the precompiled binaries in the tests,
when we can use YAML. This patch removes the dynrel.elf binary and adds
a few comments to the test cases.

Differential revision: https://reviews.llvm.org/D63641

llvm-svn: 364052
2019-06-21 14:15:15 +00:00
Jay Foad
55a0dcdf87 [Scalarizer] Propagate IR flags
Summary:
The motivation for this was to propagate fast-math flags like nnan and
ninf on vector floating point operations to the corresponding scalar
operations to take advantage of follow-on optimizations. But I think
the same argument applies to all of our IR flags: if they apply to the
vector operation then they also apply to all the individual scalar
operations, and they might enable follow-on optimizations.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63593

llvm-svn: 364051
2019-06-21 14:10:18 +00:00
George Rimar
5920f219f2 [llvm-readobj] - Inline a few yaml inputs into test cases.
There are some test that are splitted into main part + input yaml for no visible reason.
This patch inines the yaml part for the 3 test cases I found.

Differential revision: https://reviews.llvm.org/D63644

llvm-svn: 364049
2019-06-21 14:07:35 +00:00
Andrea Di Biagio
2d1bc962de Set an explicit x86 triple for test bottleneck-analysis.s added by my r364045. NFC
This should unbreak the ppc64 buildbots.

llvm-svn: 364048
2019-06-21 14:05:58 +00:00
Sam Elliott
75a022ea2b [RISCV] Add RISCV-specific TargetTransformInfo
Summary:
LLVM Allows Targets to provide information that guides optimisations
made to LLVM IR. This is done with callbacks on a TargetTransformInfo object.

This patch adds a TargetTransformInfo class for RISC-V. This will allow us to
implement RISC-V specific callbacks as they become necessary.

This commit also adds the getIntImmCost callbacks, and tests them with a simple
constant hoisting test. Our immediate costs are on the conservative side, for
the moment, but we prevent hoisting in most circumstances anyway.

Previous review was on D63007

Reviewers: asb, luismarques

Reviewed By: asb

Subscribers: ributzka, MaskRay, llvm-commits, Jim, benna, psnobl, jocewei, PkmX, rkruppe, the_o, brucehoult, MartinMosbeck, rogfer01, edward-jones, zzheng, jrtc27, shiva0217, kito-cheng, niosHD, sabuasal, apazos, simoncook, johnrusso, rbar, hiraditya, mgorny

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63433

llvm-svn: 364046
2019-06-21 13:36:09 +00:00
Andrea Di Biagio
88551367a3 [MCA][Bottleneck Analysis] Teach how to compute a critical sequence of instructions based on the simulation.
This patch teaches the bottleneck analysis how to identify and print the most
expensive sequence of instructions according to the simulation. Fixes PR37494.

The goal is to help users identify the sequence of instruction which is most
critical for performance.

A dependency graph is internally used by the bottleneck analysis to describe
data dependencies and processor resource interferences between instructions.

There is one node in the graph for every instruction in the input assembly
sequence. The number of nodes in the graph is independent from the number of
iterations simulated by the tool. It means that a single node of the graph
represents all the possible instances of a same instruction contributed by the
simulated iterations.

Edges are dynamically "discovered" by the bottleneck analysis by observing
instruction state transitions and "backend pressure increase" events generated
by the Execute stage. Information from the events is used to identify critical
dependencies, and materialize edges in the graph. A dependency edge is uniquely
identified by a pair of node identifiers plus an instance of struct
DependencyEdge::Dependency (which provides more details about the actual
dependency kind).

The bottleneck analysis internally ranks dependency edges based on their impact
on the runtime (see field DependencyEdge::Dependency::Cost). To this end, each
edge of the graph has an associated cost. By default, the cost of an edge is a
function of its latency (in cycles). In practice, the cost of an edge is also a
function of the number of cycles where the dependency has been seen as
'contributing to backend pressure increases'. The idea is that the higher the
cost of an edge, the higher is the impact of the dependency on performance. To
put it in another way, the cost of an edge is a measure of criticality for
performance.

Note how a same edge may be found in multiple iteration of the simulated loop.
The logic that adds new edges to the graph checks if an equivalent dependency
already exists (duplicate edges are not allowed). If an equivalent dependency
edge is found, field DependencyEdge::Frequency of that edge is incremented by
one, and the new cost is cumulatively added to the existing edge cost.

At the end of simulation, costs are propagated to nodes through the edges of the
graph. The goal is to identify a critical sequence from a node of the root-set
(composed by node of the graph with no predecessors) to a 'sink node' with no
successors.  Note that the graph is intentionally kept acyclic to minimize the
complexity of the critical sequence computation algorithm (complexity is
currently linear in the number of nodes in the graph).

The critical path is finally computed as a sequence of dependency edges. For
edges describing processor resource interferences, the view also prints a
so-called "interference probability" value (by dividing field
DependencyEdge::Frequency by the total number of iterations).

Examples of critical sequence computations can be found in tests added/modified
by this patch.

On output streams that support colored output, instructions from the critical
sequence are rendered with a different color.

Strictly speaking the analysis conducted by the bottleneck analysis view is not
a critical path analysis. The cost of an edge doesn't only depend on the
dependency latency. More importantly, the cost of a same edge may be computed
differently by different iterations.

The number of dependencies is discovered dynamically based on the events
generated by the simulator. However, their number is not fixed. This is
especially true for edges that model processor resource interferences; an
interference may not occur in every iteration. For that reason, it makes sense
to also print out a "probability of interference".

By construction, the accuracy of this analysis (as always) is strongly dependent
on the simulation (and therefore the quality of the information available in the
scheduling model).

That being said, the critical sequence effectively identifies a performance
criticality. Instructions from that sequence are expected to have a very big
impact on performance. So, users can take advantage of this information to focus
their attention on specific interactions between instructions.
In my experience, it works quite well in practice, and produces useful
output (in a reasonable amount time).

Differential Revision: https://reviews.llvm.org/D63543

llvm-svn: 364045
2019-06-21 13:32:54 +00:00
Simon Tatham
3b314c392b [ARM] Add MVE 64-bit GPR <-> vector move instructions.
These instructions let you load half a vector register at once from
two general-purpose registers, or vice versa.

The assembly syntax for these instructions mentions the vector
register name twice. For the move _into_ a vector register, the MC
operand list also has to mention the register name twice (once as the
output, and once as an input to represent where the unchanged half of
the output register comes from). So we can conveniently assign one of
the two asm operands to be the output $Qd, and the other $QdSrc, which
avoids confusing the auto-generated AsmMatcher too much. For the move
_from_ a vector register, there's no way to get round the fact that
both instances of that register name have to be inputs, so we need a
custom AsmMatchConverter to avoid generating two separate output MC
operands. (And even that wouldn't have worked if it hadn't been for
D60695.)

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62679

llvm-svn: 364041
2019-06-21 13:17:23 +00:00
Simon Tatham
e16469b097 [ARM] Add MVE vector instructions that take a scalar input.
This adds the `MVE_qDest_rSrc` superclass and all its instances, plus
a few other instructions that also take a scalar input register or two.

I've also belatedly added custom diagnostic messages to the operand
classes for odd- and even-numbered GPRs, which required matching
changes in two of the existing MVE assembly test files.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62678

llvm-svn: 364040
2019-06-21 13:17:08 +00:00
Paul Robinson
39d709618f Fix a crash with assembler source and -g.
llvm-mc or clang with -g normally produces debug info describing the
assembler source itself; however, if that source already contains some
.file/.loc directives, we should instead emit the debug info described
by those directives.  For certain assembler sources seen in the wild
(particularly in the Chrome build) this was causing a crash due to
incorrect assumptions about legal sequences of assembler source text.

Fixes PR38994.

Differential Revision: https://reviews.llvm.org/D63573

llvm-svn: 364039
2019-06-21 13:10:19 +00:00
Simon Pilgrim
d3e7305789 [X86] X86ISD::ANDNP is a (non-commutative) binop
The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but its getting better.

llvm-svn: 364038
2019-06-21 12:42:39 +00:00
Simon Tatham
f48c4c76b3 [ARM] Add a batch of similarly encoded MVE instructions.
Summary:
This adds the `MVE_qDest_qSrc` superclass and all instructions that
inherit from it. It's not the complete class of _everything_ with a
q-register as both destination and source; it's a subset of them that
all have similar encodings (but it would have been hopelessly unwieldy
to call it anything like MVE_111x11100).

This category includes add/sub with carry; long multiplies; halving
multiplies; multiply and accumulate, and some more complex
instructions.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62677

llvm-svn: 364037
2019-06-21 12:13:59 +00:00
James Henderson
33503da181 [binutils] Add response file option to help and docs
Many LLVM-based tools already support response files (i.e. files
containing a list of options, specified with '@'). This change simply
updates the documentation and help text for some of these tools to
include it. I haven't attempted to fix all tools, just a selection that
I am interested in.

I've taken the opportunity to add some tests for --help behaviour, where
they were missing. We could expand these tests, but I don't think that's
within scope of this patch.

This fixes https://bugs.llvm.org/show_bug.cgi?id=42233 and
https://bugs.llvm.org/show_bug.cgi?id=42236.

Reviewed by: grimar, MaskRay, jkorous

Differential Revision: https://reviews.llvm.org/D63597

llvm-svn: 364036
2019-06-21 11:49:20 +00:00
Simon Pilgrim
63f0bd16c5 [X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI.
llvm-svn: 364030
2019-06-21 11:25:06 +00:00
James Henderson
b55867a0ff [llvm-dwarfdump] Remove unnecessary explicit -h behaviour
--help and -h are automatically supported by the command-line parser,
unless overridden by the tool. The behaviour of the PrintHelpMessage
being used for -h prior to this patch is subtly different to that
provided by --help automatically (it omits certain elements of help text
and options, such as --help-list), so overriding the default is not
desirable, without good reason. This patch removes the explicit
specification of -h and its behaviour, so that the default behaviour is
used.

Reviewed by: hintonda

Differential Revision: https://reviews.llvm.org/D63565

llvm-svn: 364029
2019-06-21 11:22:20 +00:00
Fangrui Song
4cfa8e915b [ARM] Fix -Wimplicit-fallthrough after D62675
llvm-svn: 364028
2019-06-21 11:19:11 +00:00
Simon Tatham
84e2268946 [ARM] Add MVE vector compare instructions.
Summary:
These take a pair of vector register to compare, and a comparison type
(written in the form of an Arm condition suffix); they output a vector
of booleans in the VPR register, where predication can conveniently
use them.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62676

llvm-svn: 364027
2019-06-21 11:14:51 +00:00
Simon Pilgrim
a2428a85c5 [X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1)
llvm-svn: 364026
2019-06-21 11:13:15 +00:00
Fangrui Song
ff211fcdea [Symbolize] Avoid lifetime extension and simplify std::map find/insert. NFC
llvm-svn: 364025
2019-06-21 11:05:26 +00:00
Simon Pilgrim
c40e3f2037 [X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI.
Use the isConstOrConstSplat helper instead of inspecting the build vector manually.

llvm-svn: 364024
2019-06-21 10:54:30 +00:00
Simon Pilgrim
d91d3ace18 [X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern
llvm-svn: 364022
2019-06-21 10:44:15 +00:00
James Henderson
b0ce5253bb [docs][llvm-objdump] Improve llvm-objdump documentation
The llvm-objdump document was missing many options, and there were also
some style issues with it. This patches fixes all but the first issue
listed in https://bugs.llvm.org/show_bug.cgi?id=42249 by:

    1. Adding missing options and commands.
    2. Standardising on double dashes for long-options throughout.
    3. Moving Mach-O specific options to a separate section.
    4. Removing options that don't exist or aren't relevant to
       llvm-objdump.

Reviewed by: MaskRay, mtrent, alexshap

Differential Revision: https://reviews.llvm.org/D63606

llvm-svn: 364019
2019-06-21 10:12:53 +00:00
Vitaly Buka
dd00882728 [GN] Fix check-clang by disabling plugins
We can't link Analysis/plugins without -fPIC

llvm-svn: 364016
2019-06-21 09:54:32 +00:00
Vitaly Buka
74d422f345 [GN] Put libcxx include into the same place as cmake to fix Driver/print-file-name.c test
llvm-svn: 364015
2019-06-21 09:54:31 +00:00
Simon Tatham
30a95c7cbf [ARM] Add a batch of MVE floating-point instructions.
Summary:
This includes floating-point basic arithmetic (add/sub/multiply),
complex add/multiply, unary negation and absolute value, rounding to
integer value, and conversion to/from integer formats.

Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62675

llvm-svn: 364013
2019-06-21 09:35:07 +00:00