There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial number of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.
There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.
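For illustration, a minimal sketch (not the actual lowering code, and the
name is made up) of the single-input test that makes pshufd preferable for
v4i32:

  #include <array>

  // Mask indices 0-3 pick lanes from the first operand, 4-7 from the
  // second, and -1 is undef. A shuffle that never touches the second
  // operand can use pshufd instead of tying both operands with palignr.
  static bool isSingleInputMask(const std::array<int, 4> &Mask) {
    for (int M : Mask)
      if (M >= 4)
        return false; // references the second operand
    return true;
  }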
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218038 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The N32/N64 ABIs return f128 values in $f0 and $f2 for hard-float, and in $v0
and $a0 for soft-float. The registers used in the soft-float case differ from
the usual $v0 and $v1 specified for return values.
Both cases were previously handled by duplicating the CCState::AnalyzeReturn()
and CCState::AnalyzeCallReturn() functions and modifying them to delegate to
a different assignment function for f128 and, in the hard-float case, to
further replace the register type. There is a simpler way to do both of these.
We now use the common functions and select an initial assignment function based
on whether the original type is f128 or not. We then handle the hard-float case
using CCBitConvertToType<>.
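A minimal sketch of the selection, with stand-in names (the real code uses
the TableGen-generated calling-convention functions, and CCBitConvertToType<>
takes care of the hard-float register type):

  // Stand-in for CCState's assignment-function type.
  using CCAssignFn = bool (*)(unsigned ValNo);

  static bool retCC_F128(unsigned) { return true; } // f128-specific assignment
  static bool retCC_Mips(unsigned) { return true; } // usual $v0/$v1 assignment

  // Select the initial assignment function once, based on the original
  // (pre-legalization) type, instead of duplicating AnalyzeReturn().
  static CCAssignFn *selectReturnFn(bool OrigTypeIsF128) {
    return OrigTypeIsF128 ? retCC_F128 : retCC_Mips;
  }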
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5269
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218036 91177308-0d34-0410-b5e6-96231b3b80d8
When folding the intrinsic flag into the branch or select, we also have to
consider whether the intrinsic got simplified, because that changes the
flag we have to check for.
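A hedged sketch of the idea, with illustrative names and condition codes
only: the flag has to be derived from the operation that was actually
emitted, not from the original intrinsic.

  enum CondCode { VS /* signed overflow */, HS /* carry set */ };
  enum EmittedOp { SignedAddWithOverflow, UnsignedAddWithOverflow };

  // If simplification replaced the signed intrinsic with an unsigned
  // operation, the flag that signals overflow changes with it.
  static CondCode flagToCheck(EmittedOp Op) {
    return Op == SignedAddWithOverflow ? VS : HS;
  }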
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218034 91177308-0d34-0410-b5e6-96231b3b80d8
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.
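A self-contained sketch of the decision; the encoding checks model the
AArch64 unsigned 12-bit load/store offset and the add immediate (12 bits,
optionally shifted by 12), but treat the exact ranges as illustrative since
they vary per instruction:

  #include <cstdint>

  // True if the offset fits the load/store encoding directly.
  static bool fitsLoadStore(int64_t Off) { return Off >= 0 && Off < 4096; }

  // The add immediate covers a wider range: imm12, optionally LSL #12.
  static bool fitsAddImm(int64_t Off) {
    return (Off >= 0 && Off < 4096) ||
           (Off >= 0 && (Off & 0xFFF) == 0 && Off < (int64_t(4096) << 12));
  }

  enum Strategy { UseOffset, FoldIntoAdd, MaterializeInReg };

  // Prefer folding the offset into an add before spending a longer
  // sequence on materializing it in a register.
  static Strategy simplifyAddressOffset(int64_t Off) {
    if (fitsLoadStore(Off))
      return UseOffset;
    if (fitsAddImm(Off))
      return FoldIntoAdd;
    return MaterializeInReg;
  }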
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218031 91177308-0d34-0410-b5e6-96231b3b80d8
The 'AND' instruction can be used to mask a register down to its lower 32
bits. If this is done inside an address computation we might be able to fold
the instruction into the memory instruction itself.

  and  x1, x1, #0xffffffff   --->   ldrb x0, [x0, w1, uxtw]
  ldrb x0, [x0, x1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218030 91177308-0d34-0410-b5e6-96231b3b80d8
Certain directives are unsupported on Windows (some of which could/should be
supported). We would not diagnose the use but rather crash during emission
when we try to access the target streamer. Add an assertion to prevent
creating a null reference (which C++ does not permit) as well as a test to
ensure that we can diagnose the disabled directives.
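A minimal sketch of the guard, with a stub standing in for the real target
streamer type:

  #include <cassert>

  struct TargetStreamer {}; // stub for the real MC target streamer

  // Null here models the unsupported-object-format case.
  static TargetStreamer *getTargetStreamerOrNull() { return nullptr; }

  void handleDirective() {
    TargetStreamer *TS = getTargetStreamerOrNull();
    // Binding a reference through a null pointer is undefined behaviour in
    // C++, so assert before forming the reference; the directive itself
    // should be diagnosed as unsupported rather than crashing.
    assert(TS && "do not have a target streamer");
    TargetStreamer &ATS = *TS;
    (void)ATS;
  }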
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218014 91177308-0d34-0410-b5e6-96231b3b80d8
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.
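A sketch of the single-input case of that detection (the real code also
handles blend masks that index the second input; this is illustrative only):

  // Returns the rotation amount R if every defined lane i of Mask selects
  // element (i + R) % Size of the input, or -1 if no single R works.
  // Undef lanes (-1) are allowed to match any rotation.
  static int matchShuffleAsRotate(const int *Mask, int Size) {
    int Rotation = -1;
    for (int i = 0; i < Size; ++i) {
      if (Mask[i] < 0)
        continue; // undef lane constrains nothing
      int R = (Mask[i] - i + Size) % Size;
      if (Rotation < 0)
        Rotation = R;
      else if (R != Rotation)
        return -1; // lanes imply different rotations
    }
    return Rotation; // stays -1 for an all-undef mask
  }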
I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.
I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218013 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than relying on support for a specific directive to determine if we are
targeting MachO, explicitly check the output format.
As an additional bonus, clean up the caret diagnostic for the non-MachO case
avoid the spurious error caused by not discarding the statement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218012 91177308-0d34-0410-b5e6-96231b3b80d8
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.
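A hedged sketch of the shim pattern; the types here are stand-ins, not the
actual LLVM interfaces:

  struct Function {};  // stand-in for llvm::Function
  struct Subtarget {}; // stand-in for the target's subtarget info

  struct TargetMachine {
    Subtarget ST;
    // Existing interface: one subtarget for the whole module.
    const Subtarget *getSubtargetImpl() const { return &ST; }
    // New shim: accept the Function so callers can migrate now; a later
    // change can make the result genuinely per-function.
    const Subtarget *getSubtargetImpl(const Function &) const {
      return getSubtargetImpl();
    }
  };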
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218004 91177308-0d34-0410-b5e6-96231b3b80d8
For PPC targets, FastISel does not take sign-extension information into
account when selecting return instructions whose operands are constants. A
consequence of this is that boolean values are not returned correctly. This
patch fixes the problem by evaluating the sign-extension information for
constants as well, forwarding it to PPCMaterializeInt, which uses it to drive
the sign extension during materialization.
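A self-contained model of the extension step (names are illustrative): sign-
versus zero-extending an i1 'true' produces different 64-bit immediates,
which is exactly where boolean returns went wrong.

  #include <cstdint>

  // Extend a BitWidth-bit immediate to 64 bits according to the sign
  // extension expected for the return value.
  static int64_t extendImm(uint64_t Imm, unsigned BitWidth, bool IsSigned) {
    uint64_t Mask = BitWidth == 64 ? ~0ULL : ((1ULL << BitWidth) - 1);
    uint64_t V = Imm & Mask;
    if (IsSigned && BitWidth < 64 && (V >> (BitWidth - 1)) != 0)
      V |= ~Mask; // replicate the sign bit
    return static_cast<int64_t>(V);
  }
  // extendImm(1, 1, /*IsSigned=*/true) == -1; extendImm(1, 1, false) == 1.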
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217993 91177308-0d34-0410-b5e6-96231b3b80d8
Emit an optimized instruction sequence for sdiv by power-of-2 depending on the
exact flag.
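A C++ model of the two sequences (arithmetic right shift of negative values
is assumed, as on the targets in question; illustrative only):

  #include <cstdint>

  // Divide X by 2^Log2. With the 'exact' flag there is no remainder, so a
  // plain arithmetic shift suffices; otherwise negative dividends need a
  // bias of 2^Log2 - 1 so the result rounds toward zero.
  static int64_t sdivPow2(int64_t X, unsigned Log2, bool IsExact) {
    if (IsExact)
      return X >> Log2;
    int64_t Bias = (X >> 63) & ((int64_t(1) << Log2) - 1);
    return (X + Bias) >> Log2;
  }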
This fixes rdar://problem/18224511.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217986 91177308-0d34-0410-b5e6-96231b3b80d8
Try to fold the multiply into the add/sub or logical operations (when
possible).
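For example, compiled for AArch64, the multiply in a function like this can
fold into a single multiply-add rather than a separate mul and add
(illustrative; the actual selection depends on the surrounding code):

  // With the fold:    madd w0, w0, w1, w2
  // Without the fold: mul w8, w0, w1; add w0, w8, w2
  int mulAdd(int A, int B, int C) { return A * B + C; }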
This is related to rdar://problem/18369687.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217978 91177308-0d34-0410-b5e6-96231b3b80d8
Teach 'computeAddress' to also fold multiplies into the address computation
(when possible).
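For example, the implicit multiply by the element size in an indexed load can
become the scaled-register addressing mode (illustrative):

  // The index is scaled by 8 (the element size); with the fold this becomes
  // a single 'ldr x0, [x0, x1, lsl #3]' instead of a separate multiply
  // feeding the address.
  long loadElement(long *A, long I) { return A[I]; }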
This fixes rdar://problem/18369443.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217977 91177308-0d34-0410-b5e6-96231b3b80d8
It breaks the build on the buildbots but works fine on my machine. I am
reverting while I try to understand what is happening (it appears to depend
on the compiler used for the build; I probably used a C++11 feature that is
not fully supported by some of the buildbots).
This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217973 91177308-0d34-0410-b5e6-96231b3b80d8
This takes advantage of the CBZ and CBNZ instructions to further optimize the
common null check pattern into a single instruction.
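For example (illustrative):

  extern void onNull();

  void check(int *P) {
    if (P == nullptr) // compare + branch folds to: cbz x0, <null-path>
      onNull();
  }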
This is related to rdar://problem/18358882.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217972 91177308-0d34-0410-b5e6-96231b3b80d8
Since read2 / write2 are emitted for 4-byte aligned 8-byte
accesses, these are seen by the scheduler.
The DAG scheduler is semi-deprecated, so just
ignore these for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217969 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additional branch
instruction is required.
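A sketch of the mapping; the pairs shown are the condition codes commonly
used for these predicates on AArch64, but treat them as illustrative:

  #include <utility>

  enum CondCode { EQ, VS, MI, GT };

  // Both predicates need two condition codes, hence two branches:
  // unordered-or-equal is taken on EQ or VS, ordered-and-not-equal on
  // MI or GT.
  static std::pair<CondCode, CondCode> branchCCFor(bool IsUnorderedEqual) {
    return IsUnorderedEqual ? std::make_pair(EQ, VS) : std::make_pair(MI, GT);
  }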
This also adds unit tests to check all the different condition codes.
This is related to rdar://problem/18358882.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217966 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy on a Cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.
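A sketch of the selection; the feature flags are stand-ins for the real
subtarget queries:

  struct ARMFeatures {
    bool HasDMBISH; // ARMv7/ARMv8
    bool HasDMB;    // e.g. a Cortex-M with support for dmb
    bool IsARMv6;
  };

  // Pick the strongest barrier the processor supports.
  static const char *memoryBarrier(const ARMFeatures &F) {
    if (F.HasDMBISH)
      return "dmb ish";
    if (F.HasDMB)
      return "dmb sy";
    if (F.IsARMv6)
      return "mcr p15, #0, r0, c7, c10, #5"; // DMB-equivalent on ARMv6
    return nullptr; // no barrier available
  }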
Thanks to luqmana for having noticed this bug.
Test Plan: Added more cases to atomic-load-store.ll + make check-all
Reviewers: jfb, t.p.northover, luqmana
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5304
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217965 91177308-0d34-0410-b5e6-96231b3b80d8
Only 1 decimal place should be printed for inline immediates.
Other constants should be hex constants.
Does not include f64 tests because folding those inline
immediates currently does not work.
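A sketch of the intended rule, assuming the usual inline-immediate set (0.0,
+/-0.5, +/-1.0, +/-2.0, +/-4.0); the set and the formatting are illustrative:

  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  static bool isInlineImmediate(float V) {
    return V == 0.0f || V == 0.5f || V == -0.5f || V == 1.0f || V == -1.0f ||
           V == 2.0f || V == -2.0f || V == 4.0f || V == -4.0f;
  }

  static void printFPImm(float V) {
    if (isInlineImmediate(V)) {
      std::printf("%.1f", V); // one decimal place for inline immediates
    } else {
      uint32_t Bits;
      std::memcpy(&Bits, &V, sizeof(Bits));
      std::printf("0x%08x", static_cast<unsigned>(Bits)); // hex otherwise
    }
  }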
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217964 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions are now generally selected in their e64 forms initially,
and shrunk down later. Rename foldOperands to legalizeOperands,
since that's really most of what it tries to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217959 91177308-0d34-0410-b5e6-96231b3b80d8
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).
Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.
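A sketch of the dispatch on the new hook; the expansion helpers are
stand-ins:

  struct TargetLoweringInfo {
    // The new hook: true for LL/SC targets (ARM, AArch64, later Power),
    // false for targets that expand to a compare-and-swap loop (X86).
    bool HasLoadLinkedStoreConditional;
  };

  static void expandToLLSC() { /* emit a ll/sc retry loop */ }
  static void expandToCmpXchg() { /* emit a cmpxchg loop */ }

  static void expandAtomicRMW(const TargetLoweringInfo &TLI) {
    if (TLI.HasLoadLinkedStoreConditional)
      expandToLLSC();
    else
      expandToCmpXchg();
  }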
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217928 91177308-0d34-0410-b5e6-96231b3b80d8
ADDS/SUBS unless it's safe to clobber the condition flags.
If the merged instructions are in a range where the CPSR is live,
e.g. between a CMP -> Bcc, we can't safely materialize a new base
register.
This problem is quite rare; I couldn't come up with a test case, and I've
never actually seen it happen in the tests I'm running. There is a potential
trigger for this in LNT/oggenc (spills being inserted between a CMP and a
Bcc), but at the moment this isn't being merged. I'll try to reduce that into
a small test case once I've committed my upcoming patch to make merging less
conservative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217881 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Changed error messages to be more informative and to resemble other clang/llvm error messages (first letter is lower case, no ending punctuation) and updated corresponding tests.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D5065
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217873 91177308-0d34-0410-b5e6-96231b3b80d8
objects. There were a few FIXMEs in ARMAsmBackend.cpp suggesting the class
definitions should be in a separate file. Starting with ARMAsmBackend, the
class definition has been put in a header file, and #includes reduced. Each
sub-type of ARMAsmBackend is now in its own header file.
Derived types have been painted with a different color of bike-shed:
s/DarwinARMAsmBackend/ARMAsmBackendDarwin/g
s/ARMWinCOFFAsmBackend/ARMAsmBackendWinCOFF/g
s/ELFARMAsmBackend/ARMAsmBackendELF/g
Finally, clang-format has been run across ARMAsmBackend.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217866 91177308-0d34-0410-b5e6-96231b3b80d8
the blend that is matched by this are "used" in any sense, and so any
build_vector or other nodes feeding these will already drop other lanes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217855 91177308-0d34-0410-b5e6-96231b3b80d8
matching. This design just fundamentally didn't work because ADDSUB is
available prior to any legal lowerings of BLENDI nodes. Instead, we have
a dedicated ADDSUB synthetic ISD node which is pattern matched trivially
into the instructions. These nodes are then recognized by both the
existing and a trivial new lowering combine in the backend. Removing
these patterns required adding 2 missing shuffle masks to the DAG
combine, without which tests would have failed. Added the masks and
a helpful assert as well to catch if anything ever goes wrong here.
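For reference, a sketch of the blend shape involved (illustrative only):
ADDSUB subtracts in even lanes and adds in odd lanes, so the combine looks
for a shuffle taking even lanes from the FSUB result and odd lanes from the
FADD result.

  // Indices >= Size refer to the second input. Returns true if Mask takes
  // even lanes from the first input (the FSUB) and odd lanes from the
  // second (the FADD), which is the pattern the ADDSUB combine matches.
  static bool isAddSubMask(const int *Mask, int Size) {
    for (int i = 0; i < Size; ++i)
      if (Mask[i] != (i % 2 == 0 ? i : i + Size))
        return false;
    return true;
  }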
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217851 91177308-0d34-0410-b5e6-96231b3b80d8
that we don't use VSELECT and directly emit an addsub synthetic node.
Also remove a stale comment referencing VSELECT.
The test case is updated to use 'core2' which only has SSE3, not SSE4.1,
and it still passes. Previously it would not because we lacked
sufficient blend support to legalize the VSELECT.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217849 91177308-0d34-0410-b5e6-96231b3b80d8