We've had several bugs recently (PR32256, PR32241) that resulted from uses of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create a COPY_TO_REGCLASS with VK1 and GR8. Then, in CopyToFromAsymmetricReg in X86InstrInfo, we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests demonstrate, it's possible for the GR8 register to be a high register, in which case we end up doing an accidental extract from, or insert into, bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead, we should allow only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion between GR32 and GR16/GR8.
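A sketch of the failure mode and the intended shape of the fix (the registers and instructions here are illustrative, not taken from the actual test cases):

  # A copy of %ah (bits 15:8 of %eax) to a mask register was widened to the
  # 32-bit super register, grabbing the wrong bits:
  kmovw   %eax, %k0       # bit 0 of %k0 comes from %al, not %ah
  # Going through GR32 with an explicit subregister extract keeps the
  # intended bits:
  movzbl  %ah, %ecx       # extract bits 15:8 into a full register first
  kmovw   %ecx, %k0       # bit 0 of %k0 is now the intended bit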
Unfortunately, this complicates fastisel a bit, since it now has to create the subreg extracts where it used to create GR8 copies. We can probably add a helper function to bring down the repetition.
This does result in KMOVD being used for copies when BWI is available, because we don't know the original mask register size. This caused a lot of test deltas, because we have to split the checks for KMOVD vs. KMOVW based on BWI.
Differential Revision: https://reviews.llvm.org/D30968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298928 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.
Tail call support allows the outliner to take returns and terminators into consideration while finding candidates to outline. It also allows the outliner to save more instructions. For example, in the X86-64 outliner, a tail-called outlined function saves one instruction, since no return has to be inserted.
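For example (hypothetical output; the function names and instructions are illustrative), two functions that end in the same sequence can now share one outlined body that is reached with a jump:

  foo:
          jmp     OUTLINED_FUNCTION_0   # tail call: no call/return pair needed
  bar:
          jmp     OUTLINED_FUNCTION_0
  OUTLINED_FUNCTION_0:
          movl    $1, %eax
          addl    %edi, %eax
          retq                          # the original return, outlined with the body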
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297653 91177308-0d34-0410-b5e6-96231b3b80d8
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.
I've added the test case from that bug, plus asserts that will fail if we ever try to copy between high registers and mask registers again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297574 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed the asan bot failure that led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp, in the SuffixTree's constructor: LeafVector is no longer initialized using reserve, but with a standard constructor instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297081 91177308-0d34-0410-b5e6-96231b3b80d8
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html
The outliner is a code-size reduction pass that works by finding
repeated sequences of instructions in a program and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.
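As a sketch of the transformation (the code and names here are illustrative only):

  # Before outlining, the same sequence appears in several functions:
  #         movl $7, %eax ; imull %edi, %eax ; addl %esi, %eax
  # After outlining, each occurrence is replaced by
  #         callq OUTLINED_FUNCTION_0
  # and the sequence is emitted once:
  OUTLINED_FUNCTION_0:
          movl    $7, %eax
          imull   %edi, %eax
          addl    %esi, %eax
          retq                          # return inserted by the outliner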
This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.
The outliner is run like so:
clang -mno-red-zone -mllvm -enable-machine-outliner file.c
Patch by Jessica Paquette <jpaquette@apple.com>!
rdar://29166825
Differential Revision: https://reviews.llvm.org/D26872
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296418 91177308-0d34-0410-b5e6-96231b3b80d8
The instructions are marked commutable, but without special handling we don't get the immediate correct when the operands are commuted.
While here, also remove the masked memory forms that aren't commutable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295602 91177308-0d34-0410-b5e6-96231b3b80d8
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295579 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r295564. I missed that clang was still using the intrinsics despite our half-implemented autoupgrade support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295565 91177308-0d34-0410-b5e6-96231b3b80d8
It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here, I also simplified the tablegen multiclasses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295564 91177308-0d34-0410-b5e6-96231b3b80d8
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295357 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now-predicated tail call. This is
similar to how IfConversion predicates instructions.
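For illustration (a hypothetical example, not taken from the patch), a conditional tail call folds a branch around a jump into a single predicated jump, and that predicated jump now carries the registers live across it as implicit operands:

  # before:
          je      .LBB0_1
          jmp     callee
  .LBB0_1:
  # after:
          jne     callee        # conditional tail call
  .LBB0_1: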
Differential Revision: https://reviews.llvm.org/D29856
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295262 91177308-0d34-0410-b5e6-96231b3b80d8
It seems the execution dependency pass likes to use FP instructions for the consuming code, even when most of it is integer, if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.
This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported, so that they are ignored by domain fixing. If AVX2 is supported, we'll report the correct domain and allow them to switch between integer and FP.
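As an illustration (hypothetical code, not from the modified tests): without AVX2 there is no integer form of the 128-bit lane extract, so the extract exists only in the FP domain and used to pull its switchable integer consumers along with it:

  vextractf128  $1, %ymm0, %xmm1
  xorps         %xmm2, %xmm1    # was pxor; domain fixing moved it to FP
  # With this change, the extract reports GenericDomain when AVX2 is
  # unavailable, so the consumer keeps its integer form:
  vextractf128  $1, %ymm0, %xmm1
  pxor          %xmm2, %xmm1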
Overall I think this produces better results in the modified test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294824 91177308-0d34-0410-b5e6-96231b3b80d8
This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.
Note that there were already instructions where the second operand was neither a
register nor an immediate, namely X86::SUB*rm, so we also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.
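For example (illustrative), in the memory forms the second operand of the subtract is a load, so there is no register or immediate to report:

  subl    (%rdi), %eax     # X86::SUB32rm: the second operand is neither a
                           # register nor an immediate, so CmpMask is zero
  subl    $16, %eax        # immediate form: CmpMask/CmpValue describe $16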
Differential Revision: https://reviews.llvm.org/D28621
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294634 91177308-0d34-0410-b5e6-96231b3b80d8
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).
Reverting until we figure out the proper solution.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294348 91177308-0d34-0410-b5e6-96231b3b80d8
This includes the unmasked forms of variable shifts and of shifts by the lower element of a register.
Still to do are shifts by immediate, which were not foldable prior to AVX512, and all the masked forms.
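For example (illustrative), these can now fold their source operand from memory:

  vpsllvd (%rdi), %zmm0, %zmm0    # variable shift with a folded load
  vpslld  (%rsi), %zmm1, %zmm1    # shift by the low element of a 128-bit
                                  # memory operand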
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294285 91177308-0d34-0410-b5e6-96231b3b80d8
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.
In AArch64, it also expands fusion to all instruction pairs in a
scheduling block, beyond just the predecessors of the branch at the
end.
Differential Revision: https://reviews.llvm.org/D28489
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293737 91177308-0d34-0410-b5e6-96231b3b80d8
This causes stack spill slots to be oversized sometimes, but the same should already be happening with AVX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293464 91177308-0d34-0410-b5e6-96231b3b80d8
Isel now selects masked move instructions for vselect instead of blendm. But sometimes it is beneficial for register allocation to remove the tied register constraint by using blendm instructions instead.
This also picks up cases where the masked move was created due to a masked load intrinsic.
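For example (illustrative), the masked move ties its destination to the register being merged over, while blendm takes it as an ordinary input:

  vmovdqa32   %zmm1, %zmm0 {%k1}         # merge into %zmm0: dest is tied
  vpblendmd   %zmm1, %zmm0, %zmm2 {%k1}  # same merge, but the result can
                                         # live in any register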
Differential Revision: https://reviews.llvm.org/D28454
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292005 91177308-0d34-0410-b5e6-96231b3b80d8
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. If it's not, but register allocation has selected a non-extended register, we will use VEX VXORPS. And if it's an extended register without VLX, we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.
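Illustrative examples of the three cases:

  vpxord  %xmm16, %xmm16, %xmm16   # VLX: EVEX XOR can zero any XMM register
  vxorps  %xmm0, %xmm0, %xmm0      # no VLX, non-extended register: VEX XOR
  vpxord  %zmm16, %zmm16, %zmm16   # no VLX, extended register: a 512-bit
                                   # XOR zeroes the whole register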
This makes it possible for the register allocator to have all 32 registers available to work with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292004 91177308-0d34-0410-b5e6-96231b3b80d8