This reverts commit cca9b5985c0c7e3c34da7f2db7cc8e7e707b0e2e.
Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir:
*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function: indexed_2s
- basic block: %bb.0 entry (0x640fee8)
Virtual register %7 is used after the block.
*** Bad machine code: Virtual register defs don't dominate all uses. ***
- function: indexed_2s
- v. register: %7
LLVM ERROR: Found 2 machine code errors.
This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner.
FMUL_indexed is normally selected during instruction selection, but it
does not work in cases when VDUP and VMUL are in different basic
blocks.
Differential Revision: https://reviews.llvm.org/D99662
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
This patch tries to reassociate two patterns related to FMA to expose
more ILP on PowerPC.
// Pattern 1:
// A = FADD X, Y (Leaf)
// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)
// -->
// A = FMA X, M21, M22
// B = FMA Y, M31, M32
// C = FADD A, B
// Pattern 2:
// A = FMA X, M11, M12 (Leaf)
// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)
// -->
// A = FMUL M11, M12
// B = FMA X, M21, M22
// D = FMA A, M31, M32
// C = FADD B, D
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D80175
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:
> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
> %mul = fmul fast <8 x half> %c, %b
> %sub = fsub fast <8 x half> %mul, %a
> ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.
This patch generates an fmla and negates the value of the operand2 of the fsub.
Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.
Tested on aarch64-linux with make check-all.
Differential Revision: https://reviews.llvm.org/D67990
llvm-svn: 374044
This patch enables generation of fused multiply add/sub for instructions operating on fp16.
Tested on aarch64-linux.
Differential Revision: https://reviews.llvm.org/D67297
llvm-svn: 371321
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This patch adds MachineCombiner patterns for transforming
(fsub (fmul x y) z) into (fma x y (fneg z)). This has a lower
latency on micro architectures where fneg is cheap.
Patch based on work by George Steed.
Reviewers: rengolin, joelkevinjones, joel_k_jones, evandro, efriedma
Reviewed By: evandro
Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40306
llvm-svn: 319980
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267328
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267098
Also, remove an enum hack where enum values were used as indexes into an array.
We may want to make this a real class to allow pattern-based queries/customization (D13417).
llvm-svn: 252196
sequence - target independent framework
When the DAGcombiner selects instruction sequences
it could increase the critical path or resource len.
For example, on arm64 there are multiply-accumulate instructions (madd,
msub). If e.g. the equivalent multiply-add sequence is not on the
crictial path it makes sense to select it instead of the combined,
single accumulate instruction (madd/msub). The reason is that the
conversion from add+mul to the madd could lengthen the critical path
by the latency of the multiply.
But the DAGCombiner would always combine and select the madd/msub
instruction.
This patch uses machine trace metrics to estimate critical path length
and resource length of an original instruction sequence vs a combined
instruction sequence and picks the faster code based on its estimates.
This patch only commits the target independent framework that evaluates
and selects code sequences. The machine instruction combiner is turned
off for all targets and expected to evolve over time by gradually
handling DAGCombiner pattern in the target specific code.
This framework lays the groundwork for fixing
rdar://16319955
llvm-svn: 214666