These are super simple. They even take precedence over crazy
instructions like INSERTPS because they have very high throughput on
modern x86 chips.
I still have to teach the integer shuffle variants about this to avoid
so many domain crossings. However, due to the particular instructions
available, that's a touch more complex and so a separate patch.
Also, the backend doesn't seem to realize it can commute blend
instructions by negating the mask. That would help remove a number of
copies here. Suggestions on how to do this welcome, it's an area I'm
less familiar with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217744 91177308-0d34-0410-b5e6-96231b3b80d8
variants where significant.
This will make it more obvious what is happening when we start using
blends in SSE41.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217740 91177308-0d34-0410-b5e6-96231b3b80d8
support transforming the forms from the new vector shuffle lowering to
use 'movddup' when appropriate.
A bunch of the cases where we actually form 'movddup' don't actually
show up in the test results because something even later than DAG
legalization maps them back to 'unpcklpd'. If this shows back up as
a performance problem, I'll probably chase it down, but it is at least
an encoded size loss. =/
To make this work, also always do this canonicalizing step for floating
point vectors where the baseline shuffle instructions don't provide any
free copies of their inputs. This also causes us to canonicalize
unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space
win.
There is one test which is "regressed" by this: extractelement-load.
There, the test case where the optimization it is testing *fails*, the
exact instruction pattern which results is slightly different. This
should probably be fixed by having the appropriate extract formed
earlier in the DAG, but that would defeat the purpose of the test.... If
this test case is critically important for anyone, please let me know
and I'll try to work on it. The prior behavior was actually contrary to
the comment in the test case and seems likely to have been an accident.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217738 91177308-0d34-0410-b5e6-96231b3b80d8
pointer to a dead function. To make sure it's valid, doFinalization nullptrs
RewindFunction just like the constructor and so it will be found on next run.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217737 91177308-0d34-0410-b5e6-96231b3b80d8
... Just make sure we check uses first so we see the kill first. It
turns out ignoring defs gives some pretty nasty runtime failures.
I'm certain this is the fix but I'm still reducing a testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217735 91177308-0d34-0410-b5e6-96231b3b80d8
Extend the logical ops selection to also support non-native types such as i1,
i8, and i16.
Fixes rdar://problem/18330589.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217732 91177308-0d34-0410-b5e6-96231b3b80d8
Check that the post RA scheduler is being skipped, regardless of
whether it's the top-down list latency scheduler or the post-RA
MI scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217725 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to my previous -exports-trie option, the -rebase option dumps info from
the LC_DYLD_INFO load command. The rebasing info is a list of the the locations
that dyld needs to adjust if a mach-o image is not loaded at its preferred
address. Since ASLR is now the default, images almost never load at their
preferred address, and thus need to be rebased by dyld.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217709 91177308-0d34-0410-b5e6-96231b3b80d8
The raw profiles that are generated in compiler-rt always add padding
so that each profile is aligned, so we can simply treat files that
don't have this property as malformed.
Caught by Alexey's new ubsan bot. Thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217708 91177308-0d34-0410-b5e6-96231b3b80d8
enabled. A good chunk of the MIsNeedChainEdge() is logic that is valid and should be applied even for targets
that are not using for alias analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217706 91177308-0d34-0410-b5e6-96231b3b80d8
Vector MUL/MLAs have tied operands, which gives us extra constraints
that we currently can't handle. Instead of silently doing the wrong
thing, remove support to be readded later properly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217690 91177308-0d34-0410-b5e6-96231b3b80d8
Defs are seen before uses, so a def without the kill flag doesn't necessarily
mean that the register is not killed on that instruction. It may be killed
in a later use operand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217689 91177308-0d34-0410-b5e6-96231b3b80d8
As far as I can tell UTF-8 has been supported since the beginning of Python's
codec support, and it's the de facto standard for text these days, at least
for primarily-English text. This allows us to put Unicode into lit RUN lines.
rdar://problem/18311663
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217688 91177308-0d34-0410-b5e6-96231b3b80d8
Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217674 91177308-0d34-0410-b5e6-96231b3b80d8
vectors.
e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup
the result by 32 bits, not 64. PR20917.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217671 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a call to sys::fs::equivalent that should've been to
CodeCoverageTool::equivalentFiles, which lets us restore the test of
r217476 that was removed in r217478.
This reverts r217478, but the test works this time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217646 91177308-0d34-0410-b5e6-96231b3b80d8
A "stub found found" diagnostic is emitted when RuntimeDyldChecker's stub lookup
logic fails to find the requested stub. The obvious reason for the failure is
that no such stub has been created, but it can also fail for internal symbols if
the symbol offset is not computed correctly (E.g. due to a mangled relocation
addend). This patch adds a comment about the latter case so that it's not
overlooked.
Inspired by confusion experienced during test case construction for r217635.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217643 91177308-0d34-0410-b5e6-96231b3b80d8