Normal (sext (setcc ...)) sequences are optimised into
(select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND.
However, this is deliberately not done for vectors, and after
vector type legalization we have (sext_inreg (setcc ...)) instead.
I wondered about trying to extend DAGCombiner to handle this case too,
but it seemed to be a loss on some other targets I tried, even those for
which SETCC isn't "legal" and SELECT_CC is.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186149 91177308-0d34-0410-b5e6-96231b3b80d8
GPR and FPR constraints like "{r2}" and "{f2}" weren't handled correctly
because the name-to-regno mapping depends on the value type and
(because of that) the internal names in RegStrings are not the
same as the AsmName.
CC constraints like "{cc}" didn't work either because there was no
associated register class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186148 91177308-0d34-0410-b5e6-96231b3b80d8
If the source of these instructions is spilled we should load the destination.
If the destination is spilled we should store the source.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186147 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds explicit calling convention types for the Win64 and
System V/x86-64 ABIs. This allows code to override the default, and use
the Win64 convention on a target that wants to use SysV (and
vice-versa). This is needed to implement the `ms_abi` and `sysv_abi` GNU
attributes.
Reviewers:
CC:
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186144 91177308-0d34-0410-b5e6-96231b3b80d8
We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for
v8i16 (which occurs in the test case) or v16i8. The same was true for
V_SETALLONES (so I added the associated patterns for those as well).
Another bug found by llvm-stress.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186108 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI
with an out-of-range operand. Most uses of the isRunOfOnes function are guarded
by a condition that the value is not zero. This was not true in two places, and
in both places a zero input would result in an out-of-rage MB value (= 32).
To fix this, isRunOfOnes returns false on a zero input (and I've remove one
now-redundant guard).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186101 91177308-0d34-0410-b5e6-96231b3b80d8
RISBG can handle some ANDs for which no AND IMMEDIATE exists.
It also acts as a three-operand AND for some cases where an
AND IMMEDIATE could be used instead.
It might be worth adding a pass to replace RISBG with AND IMMEDIATE
in cases where the register operands end up being the same and where
AND IMMEDIATE is smaller.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186072 91177308-0d34-0410-b5e6-96231b3b80d8
RISBG has three 8-bit operands (I3, I4 and I5). I'd originally
restricted all three to 6 bits, since that's the only range we intended
to use at the time. However, the top bit of I4 acts as a "zero" flag for
RISBG, while the top bit of I3 acts as a "test" flag for RNSBG & co.
This patch therefore allows them to have the full 8-bit range.
I've left the fifth operand as a 6-bit value for now since the
upper 2 bits have no defined meaning.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186070 91177308-0d34-0410-b5e6-96231b3b80d8
Enough for the radeonsi driver to use it for calculating derivatives.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186012 91177308-0d34-0410-b5e6-96231b3b80d8
In discussing this change with Bill Schmidt, it was decided that the original
comment about negative FIs was incorrect. We'll still exclude them for now, but
now with a more-accurate explanation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186005 91177308-0d34-0410-b5e6-96231b3b80d8
Currently ARM is the only backend that supports FMA instructions (for at least some subtargets) but does not implement this virtual, so FMAs are never generated except from explicit fma intrinsic calls. Apparently this is due to the fact that it supports both fused (one rounding step) and unfused (two rounding step) multiply + add instructions. This patch clarifies that this the case without changing behavior by implementing the virtual function to simply return false, as the default TargetLoweringBase version does.
It is possible that some cpus perform the fused version faster than the unfused version and vice-versa, so the function implementation should be revisited if hard data is found.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185994 91177308-0d34-0410-b5e6-96231b3b80d8
Propagate the fix from r185712 to Thumb2 codegen as well. Original
commit message applies here as well:
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and
packs them in the bottom half of "x". An arithmetic and logic shift are
only equivalent in this context if the shift amount is 16. We would be
shifting in ones into the bottom 16bits instead of zeros if "y" is
negative.
rdar://14338767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185982 91177308-0d34-0410-b5e6-96231b3b80d8
A more complete example of the bug in PR16556 was recently provided,
showing that the previous fix was not sufficient. The previous fix is
reverted herein.
The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as
custom lowering for FP_TO_SINT during type legalization, without
checking whether the input type is handled by that routine.
LowerFP_TO_INT requires the input to be f32 or f64, so we fail when
the input is ppcf128.
I'm leaving the test case from the initial fix (r185821) in place, and
adding the new test as another crash-only check.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185959 91177308-0d34-0410-b5e6-96231b3b80d8
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:
1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.
2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.
3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.
The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185956 91177308-0d34-0410-b5e6-96231b3b80d8
In the commit message to r185476 I wrote:
>The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
>correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
>This causes some confusion with the asm parser, since VK_PPC_TLSGD
>is output as @tlsgd, which is then read back in as VK_TLSGD.
>
>To avoid this confusion, this patch removes the PowerPC-specific
>modifiers and uses the generic modifiers throughout. (The only
>drawback is that the generic modifiers are printed in upper case
>while the usual convention on PowerPC is to use lower-case modifiers.
>But this is just a cosmetic issue.)
This was unfortunately incorrect, there is is fact another,
serious drawback to using the default VK_TLSLD/VK_TLSGD
variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT
to return true, which in turn causes the ELFObjectWriter to emit
an undefined reference to _GLOBAL_OFFSET_TABLE_.
This is a problem on powerpc64, because it uses the TOC instead
of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_,
so the symbol remains undefined. This means shared libraries
using TLS built with the integrated assembler are currently
broken.
While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation
probably ought to be properly fixed at some point, for now I'm
simply reverting the r185476 commit. Now this in turn exposes
the breakage of handling @tlsgd/@tlsld in the asm parser that
this check-in was originally intended to fix.
To avoid this regression, I'm also adding a different fix for
this problem: while common code now parses @tlsgd as VK_TLSGD,
a special hack in the asm parser translates this code to the
platform-specific VK_PPC_TLSGD that the back-end now expects.
While this is not really pretty, it's self-contained and
shouldn't hurt anything else for now. One the underlying
problem is fixed, this hack can be reverted again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185945 91177308-0d34-0410-b5e6-96231b3b80d8
Test is not included as it is several 1000 lines long.
To test this functionnality, a test case must generate at least 2 ALU clauses,
where an ALU clause is ~110 instructions long.
NOTE: This is a candidate for the stable branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185943 91177308-0d34-0410-b5e6-96231b3b80d8
The PowerPC assembler is supposed to provide a directive .machine
that allows switching the supported CPU instruction set on the fly.
Since we do not yet check CPU feature sets at all and always accept
any available instruction, this is not really useful at this point.
However, it makes sense to accept (and ignore) ".machine any" to
avoid spuriously rejecting existing assembler files that use this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185924 91177308-0d34-0410-b5e6-96231b3b80d8
Look for patterns of the form (store (load ...), ...) in which the two
locations are known not to partially overlap. (Identical locations are OK.)
These sequences are better implemented by MVC unless either the load or
the store could use RELATIVE LONG instructions.
The testcase showed that we weren't using LHRL and LGHRL for extload16,
only sextloadi16. The patch fixes that too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185919 91177308-0d34-0410-b5e6-96231b3b80d8
Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet
small enough for a single MVC. As with memcpy, I'm leaving longer cases
till later.
The number of tests might seem excessive, but f33 & f34 from memset-04.ll
failed the first cut because I'd not added the "?:" on the calculation
of Size1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185918 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for the .llong PowerPC-specifc assembler directive.
In doing so, I notices that .word is currently incorrect: it is
supposed to define a 2-byte data element, not a 4-byte one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185911 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes another bug found by llvm-stress!
If we happen to be doing an i64 load or store into a stack slot that has less
than a 4-byte alignment, then the frame-index elimination may need to use an
indexed load or store instruction (because the offset may not be a multiple of
4, a requirement of the STD/LD instructions). The extra register needed to hold
the offset comes from the register scavenger, and it is possible that the
scavenger will need to use an emergency spill slot. As a result, we need to
make sure that a spill slot is allocated when doing an i64 load/store into a
less-than-4-byte-aligned stack slot.
Because test cases for things like this tend to be fairly fragile, I've
concatenated a few small bugpoint-reduced test cases together to form the
regression test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185907 91177308-0d34-0410-b5e6-96231b3b80d8
Explicit references to %AH for an i8 remainder instruction can lead to
references to %AH in a REX prefixed instruction, which causes things to
blow up. Do the same thing in FastISel as we do for DAG isel and instead
shift %AX right by 8 bits and then extract the 8-bit subreg from that
result.
rdar://14203849
http://llvm.org/bugs/show_bug.cgi?id=16105
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185899 91177308-0d34-0410-b5e6-96231b3b80d8
A setting in MCAsmInfo defines the "assembler dialect" to use. This is used
by common code to choose between alternatives in a multi-alternative GNU
inline asm statement like the following:
__asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));
The meaning of these dialects is platform specific, and GCC defines those
for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for
new-style (PowerPC) mnemonics, like in the example above.
To be compatible with inline asm used with GCC, LLVM ought to do the same.
Specifically, this means we should always use assembler dialect 1 since
old-style mnemonics really aren't supported on any current platform.
However, the current LLVM back-end uses:
AssemblerDialect = 1; // New-Style mnemonics.
in PPCMCAsmInfoDarwin, and
AssemblerDialect = 0; // Old-Style mnemonics.
in PPCLinuxMCAsmInfo.
The Linux setting really isn't correct, we should be using new-style
mnemonics everywhere. This is changed by this commit.
Unfortunately, the setting of this variable is overloaded in the back-end
to decide whether or not we are on a Darwin target. This is done in
PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo
AssemblerDialect setting), and also in PPCMCExpr. Setting AssemblerDialect
to 1 for both Darwin and Linux no longer allows us to make this distinction.
Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter
to distinguish Darwin targets, and ignores the SyntaxVariant parameter.
As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs
to be passed in by the caller when creating a target MCExpr. (To do so
this patch implicitly also reverts commit 184441.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185858 91177308-0d34-0410-b5e6-96231b3b80d8
Another bug found by llvm-stress! This fixes hitting
llvm_unreachable("Invalid integer vector compare condition");
at the end of getVCmpInst in PPCISelDAGToDAG.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185855 91177308-0d34-0410-b5e6-96231b3b80d8
Another bug found by llvm-stress! This fixes crashing with:
LLVM ERROR: Cannot select: v4f32 = frem ...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185840 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for the old-style time base instructions;
while new programs are supposed to use mfspr, the mftb instructions
are still supported and in use by existing assembler files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185829 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for the basic mnemoics (with the L operand) for the
fixed-point compare instructions. These are defined as aliases for the
already existing CMPW/CMPD patterns, depending on the value of L.
This requires use of InstAlias patterns with immediate literal operands.
To make this work, we need two further changes:
- define a RegisterPrefix, because otherwise literals 0 and 1 would
be parsed as literal register names
- provide a PPCAsmParser::validateTargetOperandClass routine to
recognize immediate literals (like ARM does)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185826 91177308-0d34-0410-b5e6-96231b3b80d8
PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be
either an f32 or f64, but this is not checked. A long double
(ppcf128) operand will normally be custom-lowered to a conversion to
f64 in this context. However, this isn't the case for an UNDEF node.
This patch recognizes a ppcf128 as a legal source operand for
FP_TO_INT only if it's an undef, in which case it creates an undef of
the target type.
At some point we might want to do a wholesale custom lowering of
ISD::UNDEF when the type is ppcf128, but it's not really clear that's
a great idea, and probably more work than it's worth for a situation
that only arises in the case of a programming error. At this point I
think simple is best.
The test case comes from PR16556, and is a crash-test only.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185821 91177308-0d34-0410-b5e6-96231b3b80d8