This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.
Differential Revision: https://reviews.llvm.org/D31412
llvm-svn: 298948
This is incorrect to record region boundaries before scheduling,
it may change after scheduling. As a result second pass may see less
instructions to schedule than it should.
Differential Revision: https://reviews.llvm.org/D31434
llvm-svn: 298945
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).
This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.
Differential Revision: https://reviews.llvm.org/D30868
llvm-svn: 298943
-ffp-contract=fast does not currently work with LTO because it's passed as a
TargetOption to the backend rather than in the IR. This adds it to
FastMathFlags.
This is toward fixing PR25721
Differential Revision: https://reviews.llvm.org/D31164
llvm-svn: 298939
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.
Differential Revision: https://reviews.llvm.org/D31429
llvm-svn: 298935
Deal with case that initial node is deleted during dag-combine leading
to an assertional failure in promoteIntShiftOp.
Fixes PR32420.
Reviewers: spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31403
llvm-svn: 298931
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.
Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.
This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.
Differential Revision: https://reviews.llvm.org/D30968
llvm-svn: 298928
We want to check each test on each target, so we need another prefix
when SSE and AVX diverge (as they will if we handle 32-byte and higher).
llvm-svn: 298926
Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from
being used.
Fixes PR32340 and PR32345.
Reviewers: hfinkel, dbabokin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31148
llvm-svn: 298923
Summary:
Similar to the ARM target in https://reviews.llvm.org/rL298380, this
patch adds identical infrastructure for disabling negative immediate
conversions, and converts the existing aliases to the new infrastucture.
Reviewers: rengolin, javed.absar, olista01, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D31243
llvm-svn: 298908
Summary:
G_LOAD/G_STORE, add alternative RegisterBank mapping.
For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank ) for the G_GLOAD + G_FADD , can't get rid of cross register bank copy GprRegBank->VecRegBank.
Reviewers: zvi, rovka, qcolombet, ab
Reviewed By: zvi
Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank
Differential Revision: https://reviews.llvm.org/D30979
llvm-svn: 298907
This patch enables schedulers to specify instructions that
cannot be issued with any other instructions.
It also fixes BeginGroup/EndGroup.
Reviewed by: Andrew Trick
Differential Revision: https://reviews.llvm.org/D30744
llvm-svn: 298885
rebase entry errors and test cases for each of the error checks.
Also verified with Nick Kledzik that a BIND_OPCODE_SET_ADDEND_SLEB
opcode is legal in a lazy bind table, so code that had that as an error
check was removed.
With MachORebaseEntry and MachOBindEntry classes now returning
an llvm::Error in all cases for malformed input the variables Malformed
and logic to set use them is no longer needed and has been removed
from those classes.
Also in a few places, removed the redundant Done assignment to true
when also calling moveToEnd() as it does that assignment.
This only leaves the dyld compact export entries left to have
error handling yet to be added for the dyld compact info.
llvm-svn: 298883