When we have a covered lookup table, make sure we don't delete PHINodes that
are cached in PHIs.
rdar://17887153
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214642 91177308-0d34-0410-b5e6-96231b3b80d8
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214636 91177308-0d34-0410-b5e6-96231b3b80d8
The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214634 91177308-0d34-0410-b5e6-96231b3b80d8
Darwin x86 asm comment prefix designed to work around GAS on that
platform. That makes the comment-matching of the test much more stable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214629 91177308-0d34-0410-b5e6-96231b3b80d8
lowering with a small addition to it and adding PSHUFB combining.
There is one obvious place in the new vector shuffle lowering where we
should form PSHUFBs directly: when without them we will unpack a vector
of i8s across two different registers and do a potentially 4-way blend
as i16s only to re-pack them into i8s afterward. This is the crazy
expensive fallback path for i8 shuffles and we can just directly use
pshufb here as it will always be cheaper (the unpack and pack are
two instructions so even a single shuffle between them hits our
three instruction limit for forming PSHUFB).
However, this doesn't generate very good code in many cases, and it
leaves a bunch of common patterns not using PSHUFB. So this patch also
adds support for extracting a shuffle mask from PSHUFB in the X86
lowering code, and uses it to handle PSHUFBs in the recursive shuffle
combining. This allows us to combine through them, combine multiple ones
together, and generally produce sufficiently high quality code.
Extracting the PSHUFB mask is annoyingly complex because it could be
either pre-legalization or post-legalization. At least this doesn't have
to deal with re-materialized constants. =] I've added decode routines to
handle the different patterns that show up at this level and we dispatch
through them as appropriate.
The two primary test cases are updated. For the v16 test case there is
still a lot of room for improvement. Since I was going through it
systematically I left behind a bunch of FIXME lines that I'm hoping to
turn into ALL lines by the end of this.
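
For readers less familiar with the instruction, the byte-shuffle semantics
that the mask decoding has to model can be sketched roughly as below. This is
a small Python model for illustration only, not the LLVM code; the names
pshufb, decode_pshufb_mask, combine_pshufb and the -1 "zeroed lane" sentinel
are assumptions made for the example.

```python
ZERO = -1  # hypothetical sentinel: this output lane is forced to zero


def pshufb(src, ctrl):
    """Model a 128-bit PSHUFB on two lists of 16 byte values."""
    assert len(src) == 16 and len(ctrl) == 16
    out = []
    for c in ctrl:
        if c & 0x80:
            out.append(0)              # high bit set: lane is zeroed
        else:
            out.append(src[c & 0x0F])  # low four bits select a source byte
    return out


def decode_pshufb_mask(ctrl):
    """Turn PSHUFB control bytes into a generic shuffle mask."""
    return [ZERO if c & 0x80 else c & 0x0F for c in ctrl]


def combine_pshufb(first_ctrl, second_ctrl):
    """Control bytes equivalent to applying first_ctrl, then second_ctrl."""
    first = decode_pshufb_mask(first_ctrl)
    combined = []
    for m in decode_pshufb_mask(second_ctrl):
        lane = ZERO if m == ZERO else first[m]
        combined.append(0x80 if lane == ZERO else lane)
    return combined
```

Composing two decoded masks this way is the sense in which the combiner can
"combine multiple ones together": pshufb(pshufb(x, a), b) and
pshufb(x, combine_pshufb(a, b)) give the same bytes in this model.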
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214628 91177308-0d34-0410-b5e6-96231b3b80d8
of normally binary shuffle instructions like PUNPCKL and MOVLHPS.
This detects cases where a single register is used for both operands
making the shuffle behave in a unary way. We detect this and adjust the
mask to use the unary form which allows the existing DAG combine for
shuffle instructions to actually work at all.
As a consequence, this uncovered a number of obvious bugs in the
existing DAG combine which are fixed. It also now canonicalizes several
shuffles even with the existing lowering. These typically are trying to
match the shuffle to the domain of the input where before we only really
modeled them with the floating point variants. All of the cases which
change to an integer shuffle here have something in the integer domain, so
there are no more or fewer domain crosses here AFAICT. Technically, it
might be better to go from a GPR directly to the floating point domain,
but detecting floating point *outputs* despite integer inputs is a lot
more code and seems unlikely to be worthwhile in practice. If folks are
seeing domain-crossing regressions here though, let me know and I can
hack something up to fix it.
Also as a consequence, a bunch of missed opportunities to form pshufb
now can be formed. Notably, splats of i8s now form pshufb.
Interestingly, this improves the existing splat lowering too. We go from
3 instructions to 1. Yes, we may tie up a register, but it seems very
likely to be worth it, especially if splatting the 0th byte (the
common case) as then we can use a zeroed register as the mask.
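
The mask adjustment described above can be illustrated with a small Python
model (a sketch, not the LLVM implementation). It assumes the usual
convention that mask indices 0..N-1 select from the first operand, N..2N-1
from the second, and -1 means undef; the function names are made up for the
example.

```python
def shuffle(a, b, mask):
    """Generic two-input shuffle; -1 lanes are undef (modeled as None)."""
    both = list(a) + list(b)
    return [None if m < 0 else both[m] for m in mask]


def canonicalize_unary_mask(mask, num_elts, same_operand):
    """Rewrite a binary mask into unary form when both inputs are one value."""
    if not same_operand:
        return list(mask)
    # Indices that referred to the second operand are folded onto the first.
    return [m - num_elts if m >= num_elts else m for m in mask]


# A MOVLHPS-style mask on four elements, with one register used twice.
v = [10, 11, 12, 13]
unary = canonicalize_unary_mask([0, 1, 4, 5], 4, same_operand=True)
assert shuffle(v, v, [0, 1, 4, 5]) == shuffle(v, v, unary)
```

Once the mask only mentions the first operand, a combine that matches unary
shuffles has a chance to fire.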
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214625 91177308-0d34-0410-b5e6-96231b3b80d8
expanding pseudo LOAD_STACK_GUARD using instructions that are normally used
in pic mode. This patch fixes the bug.
<rdar://problem/17886592>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214614 91177308-0d34-0410-b5e6-96231b3b80d8
makes a mess of the lit output when they ultimately fail.
The 2012-10-02-DAGCycle test is really frustrating because the *only*
explanation for what it is testing is a rdar link. I would really rather
that rdar links (which are not public or part of the open source
project) were not committed to the source code. Regardless, the actual
problem *must* be described as the rdar link is completely opaque. The
fact that this test didn't check for any particular output further
exacerbates the inability of any other developer to debug failures.
The mem-promote-integers test has nice comments and *seems* to be
a great test for our lowering... except that we don't actually check
that any of the generated code is correct or matches some pattern. We
just avoid crashing. It would be great to go back and populate this test
with the actual expectations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214605 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of creating global variables for source locations and global names,
just create metadata nodes and strings. They will be transformed into actual
globals in the instrumentation pass (if necessary). This approach is more
flexible:
1) we don't have to ensure that our custom globals survive all the optimizations
2) if globals are discarded for some reason, we will simply ignore metadata for them
and won't have to erase corresponding globals
3) metadata for source locations can be reused for other purposes: e.g. we may
attach source location metadata to alloca instructions and provide better descriptions
for stack variables in ASan error reports.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214604 91177308-0d34-0410-b5e6-96231b3b80d8
When the cost model determines vectorization is not possible/profitable, these remarks print an analysis of that decision.
Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive.
Reviewed by Arnold Schwaighofer
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214599 91177308-0d34-0410-b5e6-96231b3b80d8
This updates the instrumentation based profiling format so that when
we have multiple functions with the same name (but different function
hashes) we keep all of them instead of rejecting the later ones.
There are a number of scenarios where this can come up where it's more
useful to keep multiple function profiles:
* Name collisions in unrelated libraries that are profiled together.
* Multiple "main" functions from multiple tools built against a common
library.
* Combining profiles from different build configurations (i.e., asserts
and no-asserts)
The profile format now stores the number of counters between the hash
and the counts themselves, so that multiple sets of counts can be
stored. Since this is backwards incompatible, I've bumped the format
version and added some trivial logic to skip this when reading the old
format.
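
To illustrate why storing the counter count makes multiple records possible,
here is a rough Python sketch. It models the data for a single function name
as a flat list of integers laid out as hash, number of counters, counters.
This is an assumed simplification for illustration and not the actual on-disk
encoding; all names are hypothetical.

```python
def encode_records(records):
    """records: list of (func_hash, counts) sharing one function name."""
    flat = []
    for func_hash, counts in records:
        flat.append(func_hash)
        flat.append(len(counts))  # the count sits between hash and counters
        flat.extend(counts)
    return flat


def decode_records(flat):
    """Walk the flat list, using each stored count to find the next record."""
    records = []
    i = 0
    while i < len(flat):
        func_hash, num = flat[i], flat[i + 1]
        records.append((func_hash, flat[i + 2:i + 2 + num]))
        i += 2 + num
    return records


def lookup(flat, func_hash):
    """Return the counters for the record matching func_hash, or None."""
    for h, counts in decode_records(flat):
        if h == func_hash:
            return counts
    return None
```

Without the stored count a reader could not tell where one set of counters
ends and the next hash begins.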
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214585 91177308-0d34-0410-b5e6-96231b3b80d8
`parseBitcodeFile()` uses the generic `getLazyBitcodeFile()` function as
a helper. Since `parseBitcodeFile()` isn't actually lazy -- it calls
`MaterializeAllPermanently()` -- bypass the unnecessary call to
`materializeForwardReferencedFunctions()` by extracting out a common
helper function. This removes the last of the use-list churn caused by
blockaddresses.
This highlights that we can't reproduce use-list order of globals and
constants when parsing lazily -- but that's necessarily out of scope.
When we're parsing lazily, we never have all the functions in memory, so
the use-lists of globals (and constants that reference globals) are
always incomplete.
This is part of PR5680.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214581 91177308-0d34-0410-b5e6-96231b3b80d8
Stop using ST registers for function returns and inline-asm instructions and use
FP registers instead. This allows removing a large amount of code in the
stackifier pass that was needed to track register liveness and handle copies
between ST and FP registers and function calls returning floating point values.
It also fixes a bug which manifests when an ST register defined by an
inline-asm instruction is live across another inline-asm instruction, as shown
in the following sequence of machine instructions:
1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
2. INLINEASM <es:fldcw $0>
3. %FP0<def> = COPY %ST0
<rdar://problem/16952634>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214580 91177308-0d34-0410-b5e6-96231b3b80d8
variables (for example, by-value struct arguments passed in registers, or
large integer values split across several smaller registers).
On the IR level, this adds a new type of complex address operation OpPiece
to DIVariable that describes size and offset of a variable fragment.
On the DWARF emitter level, all pieces describing the same variable are
collected, sorted and emitted as DWARF expressions using the DW_OP_piece
and DW_OP_bit_piece operators.
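
As a rough illustration of the collect/sort/emit step, the following Python
sketch turns a set of byte-sized pieces into a list of expression operations.
It is not the DwarfDebug code: the tuple representation, the function name
and the assumption that pieces are byte-aligned and non-overlapping are all
made up for the example. A gap is described by a DW_OP_piece with no
preceding location, which DWARF treats as an unavailable piece.

```python
def emit_pieces(pieces):
    """pieces: iterable of (offset_bytes, size_bytes, location_op_name)."""
    ops = []
    pos = 0
    # Unique and sort the pieces by offset before emitting them.
    for offset, size, loc in sorted(set(pieces)):
        if offset > pos:
            # Empty location followed by DW_OP_piece: this range is unknown.
            ops.append(("DW_OP_piece", offset - pos))
        ops.append((loc,))
        ops.append(("DW_OP_piece", size))
        pos = offset + size
    return ops


# A 16-byte struct passed in two registers, listed out of order.
expr = emit_pieces([(8, 8, "DW_OP_reg1"), (0, 8, "DW_OP_reg0")])
```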
http://reviews.llvm.org/D3373
rdar://problem/15928306
What this patch doesn't do / Future work:
- This patch only adds the backend machinery to make this work; patches
that change SROA and SelectionDAG's type legalizer to actually create
such debug info will follow. (http://reviews.llvm.org/D2680)
- Making the DIVariable complex expressions into an argument of dbg.value
will reduce the memory footprint of the debug metadata.
- The sorting/uniquing of pieces should be moved into DebugLocEntry,
to facilitate the merging of multi-piece entries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214576 91177308-0d34-0410-b5e6-96231b3b80d8
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214566 91177308-0d34-0410-b5e6-96231b3b80d8
This is consistent with how we parse them in a standalone .s file, and
inline assembly shouldn't differ.
This fixes errors about requiring more registers than available in
cases like this:
  void f();
  void __declspec(naked) g() {
    __asm pusha
    __asm call f
    __asm popa
    __asm ret
  }
There are no registers available to pass the address of 'f' into the asm
blob. The asm should now directly call 'f'.
Tests will land in Clang shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214550 91177308-0d34-0410-b5e6-96231b3b80d8
Add branch weights to branch instructions, so that the following passes can
optimize based on them (e.g. basic block ordering).
Fixes <rdar://problem/17887137>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214537 91177308-0d34-0410-b5e6-96231b3b80d8
This change adds code to explicitly mark a function which requires runtime stack realignment as not having a fixed frame size in the StackMap section. As it happens, this is not actually a functional change. The size that would be reported without the check is also "-1", but as far as I can tell, that's an accident. The code change makes this explicit.
Note: There's a separate bug in handling of stackmaps and patchpoints in functions which need dynamic frame realignment. The current code assumes that offsets can be calculated from RBP, but realigned frames must use RSP. (There's a variable gap between RBP and the spill slots.) This change set does not address that issue.
Reviewers: atrick, ributzka
Differential Revision: http://reviews.llvm.org/D4572
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214534 91177308-0d34-0410-b5e6-96231b3b80d8
This is a followup patch for r214366, which added the same behavior to the
AArch64 and X86 FastISel code. This fix reproduces the already existing
behavior of SelectionDAG in FastISel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214531 91177308-0d34-0410-b5e6-96231b3b80d8
Note: The current code in DecodeMSRMask() rejects the unpredictable A/R MSR mask '0000' with Fail. The code in the patch follows this style and rejects unpredictable M-class MSR masks also with Fail (instead of SoftFail). If SoftFail is preferred in this case then additional changes to ARMInstPrinter (to print non-symbolic masks) and ARMAsmParser (to parse non-symbolic masks) will be needed.
Patch by Petr Pavlu!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214505 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM ARM prohibits LDRB/LDRSB instructions with writeback into the destination register. With this commit this constraint is now enforced and we stop assembling LDRB/LDRSB instructions with unpredictable behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214500 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM ARM prohibits LDRH/LDRSH instructions with writeback into the destination register. With this commit this constraint is now enforced and we stop assembling LDRH/LDRSH instructions with unpredictable behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214499 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM ARM prohibits LDR instructions with writeback into the destination register. With this commit this constraint is now enforced and we stop assembling LDR instructions with unpredictable behavior.
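
The constraint amounts to rejecting pre/post-indexed loads whose base register
is also the register being loaded. A minimal Python sketch of that check
follows; the function name and the register-number representation are
assumptions for illustration, not the ARMAsmParser code.

```python
def writeback_load_is_unpredictable(rt, rn, writeback):
    """rt: destination register number, rn: base register number."""
    # With writeback the base register is updated, so it must not also be
    # the register that receives the loaded value.
    return writeback and rt == rn


# Something like "ldr r0, [r0], #4" would be rejected; "ldr r0, [r1], #4" is fine.
rejected = writeback_load_is_unpredictable(rt=0, rn=0, writeback=True)
accepted = not writeback_load_is_unpredictable(rt=0, rn=1, writeback=True)
```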
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214498 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Big-endian mode was not correctly adjusting the offset for types smaller
than an ABI slot.
Fixes PR19612
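
For illustration, the kind of adjustment involved can be modeled in Python as
below. This sketch assumes that a value narrower than its slot occupies the
highest-addressed bytes of the slot on big-endian targets; the function name
is hypothetical and this is not the code from the patch.

```python
def offset_in_slot(slot_size, type_size, big_endian):
    """Byte offset of a type_size-byte value inside a slot_size-byte slot."""
    if big_endian and type_size < slot_size:
        return slot_size - type_size  # skip the leading padding bytes
    return 0


# A 16-bit value widened to an 8-byte slot, in both byte orders.
be_slot = (0x1234).to_bytes(8, "big")
le_slot = (0x1234).to_bytes(8, "little")
be_off = offset_in_slot(8, 2, big_endian=True)
le_off = offset_in_slot(8, 2, big_endian=False)
be_val = int.from_bytes(be_slot[be_off:be_off + 2], "big")
le_val = int.from_bytes(le_slot[le_off:le_off + 2], "little")
```

Reading at offset 0 in the big-endian case would pick up the padding bytes
instead of the value.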
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: sstankovic, llvm-commits
Differential Revision: http://reviews.llvm.org/D4556
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214493 91177308-0d34-0410-b5e6-96231b3b80d8
ADDS and SUBS cannot encode negative immediates or immediates larger than 12 bits.
This fix checks if the immediate version can be used under these constraints and
if we can convert ADDS to SUBS or vice versa to support negative immediates.
Also update the test cases to test the immediate versions.
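
The selection logic can be sketched in Python roughly as follows. This is an
illustrative model, not the FastISel code: the function name is made up, it
only considers the plain 12-bit unsigned immediate (ignoring any shifted
form), and returning None stands for "fall back to the register form".

```python
def select_flag_setting_op(is_add, imm):
    """Return (mnemonic, immediate) if an immediate form fits, else None."""
    if 0 <= imm < 4096:
        return ("ADDS" if is_add else "SUBS", imm)
    if -4096 < imm < 0:
        # Adding a negative value is subtracting its magnitude, and vice versa.
        return ("SUBS" if is_add else "ADDS", -imm)
    return None
```

For example, an add of -5 would be selected as SUBS with immediate 5 in this
model, while an add of 4096 would not use the immediate form at all.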
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214470 91177308-0d34-0410-b5e6-96231b3b80d8
When generating unaligned vector loads, we need to search for other loads or
stores nearby offset by one vector width. If we find one, then we know that we
can safely generate another aligned load at that address. Otherwise, we must
generate the next load using an offset of the vector width minus one byte (so
we don't read off the end of the allocation if the base unaligned address
happened to be aligned at runtime). We had previously done this using only
other vector loads and stores, but did not consider the PowerPC-specific vector
load/store intrinsics. Now we'll also consider vector intrinsics. By itself,
this change is a feature enhancement, but is a necessary step toward fixing the
underlying problem behind PR19991.
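
The reason for the "vector width minus one" offset can be seen with a small
Python model of the address arithmetic. It assumes 16-byte vectors and an
aligned load that ignores the low four address bits; the function names are
made up for the example.

```python
VEC_BYTES = 16


def aligned_load_address(addr):
    """Address actually read by an aligned vector load: low bits are dropped."""
    return addr & ~(VEC_BYTES - 1)


def second_load_address(base, neighbor_access_known):
    """Address read by the second load of an unaligned-load sequence."""
    offset = VEC_BYTES if neighbor_access_known else VEC_BYTES - 1
    return aligned_load_address(base + offset)


# Base happens to be aligned at runtime and no nearby access is known:
# the second load re-reads the same 16 bytes instead of the next block.
same_block = second_load_address(0x1000, neighbor_access_known=False)
# Base is misaligned: the second load reads the following block as needed.
next_block = second_load_address(0x1003, neighbor_access_known=False)
```

With an offset of the full vector width, the aligned-base case would touch
0x1010, which is only safe if another access there is already known to exist.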
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214469 91177308-0d34-0410-b5e6-96231b3b80d8
Abs/neg folding has moved out of foldOperands and into the instruction
selection phase using complex patterns. As a consequence of this
change, we now prefer to select the 64-bit encoding for most
instructions and the modifier operands have been dropped from
integer VOP3 instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214467 91177308-0d34-0410-b5e6-96231b3b80d8