Summary:
We didn't have entries in the commuting table for the 32-bit
instructions. I don't think we hit this problem now, but we
will once uniform branching is enabled. Tests will come in
a later commit.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16600
llvm-svn: 258936
MCJIT emits zero-length CIE at the end of the _eh_frame section. This change
ensures that parser inside DebugInfo will not crash and correctly record such cases.
We are now recording DW_EH_PE_omit as a default value for FDE and LSDA encodings.
Also Offset != EndAugmentationOffset assertion check will only happen if augmentation
string had 'z' letter in it.
Differential Revision: http://reviews.llvm.org/D16588
llvm-svn: 258931
This patch is the second attempt to reapply commit r258404. There was bug in
the initial patch and subsequent fix (mentioned below).
The initial patch caused an assertion because we were computing smaller type
sizes for instructions that cannot be demoted. The fix first determines the
instructions that will be demoted, and then applies the smaller type size to
only those instructions.
This should fix PR26239 and PR26307.
llvm-svn: 258929
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.
Reviewers: tstellarAMD
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16485
llvm-svn: 258922
Summary:
This is a revised version of D13974, and the following quoted summary are from D13974
"This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch."
D13974 was committed but failed one lnt test. The bug was that we only checked the condition from loop exit's incoming block was a loop invariant. But there could be another condition from loop header to that incoming block not being a loop invariant. This would produce miscompiled code.
This patch fixes the issue by checking if the incoming block is loop header, and if not, don't perform the rewrite. The could be further improved by recursively checking all conditions leading to loop exit block, but I'd like to check in this simple version first and improve it with future patches.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16570
llvm-svn: 258912
Most of the time we only hit the small case, so it is beneficial to pull
it out of the insert_imp() implementation. This improves compile time
at least for non-LTO builds.
Differential Revision: http://reviews.llvm.org/D16619
llvm-svn: 258908
SimplifyCFG tries to turn complex branch conditions into a switch.
Some of it's logic attempts to reason about bitwise arithmetic produced
by InstCombine. InstCombine can turn things like (X == 2) || (X == 3)
into (X & 1) == 2 and so SimplifyCFG tries to detect when this occurs so
that it can produce a switch instruction.
However, the legality checking was not sufficient to determine whether
or not this had occured. Correctly check this case by requiring that
the right-hand side of the comparison be a power of two.
This fixes PR26323.
llvm-svn: 258904
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.
Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.
Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.
Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.
llvm-svn: 258901
This brings the compile time of Function.cpp from ~40s down to ~4s for
me locally. It also shaves off about 400KB of object file size in a
release+asserts build.
I also realized that the AMDGPU backend does not have any GCC builtin
names to match, so the extra lookup was a no-op. I removed it to silence
a zero-length string table array warning. There should be no functional
change here.
This change really ends the story of PR11951.
llvm-svn: 258897
Summary:
NVVM doesn't have a standard library, as currently implemented, so this
just isn't going to work. I'd like to revisit this, since it's hiding
opportunities for optimization, but correctness comes first.
Thank you to hfinkel for pointing me in the right direction here.
Reviewers: tra
Subscribers: echristo, jhen, llvm-commits, hfinkel
Differential Revision: http://reviews.llvm.org/D16604
llvm-svn: 258884
at least as big as the mach header to be identified as a Mach-O file and
make sure smaller files are not identified as a Mach-O files but as
unknown files. Also fix identify_magic() so it looks at all 4 bytes of
the filetype field when determining the type of the Mach-O file.
Then fix the macho-invalid-header test case to check that it is an
unknown file and make sure it does not get the error for
object_error::parse_failed. And also update the unit tests.
llvm-svn: 258883
AvailableValue is the part that represents the potential rematerialization. AvailableValueInBlock is simply a pair of an AvailableValue and a BB which we might materialize it in.
This is motivated by http://reviews.llvm.org/D16608. The intent is that we'll have a single function which handles the local case which both local and non-local will use to identify available values. Once that's done, the local case can rematerialize at the use site and the non-local case can do the SSA construction as it does currently.
llvm-svn: 258882
The AMDGPU backend was the last user of the old StringMatcher
recognition code. Move it over to the new lookupLLVMIntrinsicName
funciton, which is now improved to handle all of the interesting edge
cases exposed by AMDGPU intrinsic names.
llvm-svn: 258875
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
r258781 optimized memcpy/memmove/memcpy so the intrinsic call can return its first argument, but missed the frame index case. Teach it to ignore that case so C code doesn't assert out in these cases.
llvm-svn: 258851
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:
jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...
AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:
jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...
However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.
Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.
In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.
Differential Revision: http://reviews.llvm.org/D11393
llvm-svn: 258847
Previously the RedoInsts was processed at the end of the block.
However it was possible that it left behind some instructions that
were not canonicalized.
This should guarantee that any previous instruction in the basic
block is canonicalized before we process a new instruction.
llvm-svn: 258830
This is a step towards solving PR25892:
https://llvm.org/bugs/show_bug.cgi?id=25892
It won't handle the reported case. As noted by the 'TODO' comments in the patch,
we need to relax the hasOneUse() constraint and also match patterns that include
memset_chk() and the llvm.memset() intrinsic in addition to memset().
Differential Revision: http://reviews.llvm.org/D16337
llvm-svn: 258816
This commit exposes a crash in computeKnownBits on the Chromium buildbots.
Reverting to investigate.
Reference: https://llvm.org/bugs/show_bug.cgi?id=26307
llvm-svn: 258812
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).
It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.
After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.
Differential Revision: http://reviews.llvm.org/D16217
llvm-svn: 258798
Their opcodes are used as part of the VEX prefix in 64-bit mode. Clearly the disassembler implicitly decoded them as AVX instructions in 64-bit mode, but I think the AsmParser would have encoded them.
llvm-svn: 258793