The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.
The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015
http://llvm.org/viewvc/llvm-project?view=revision&revision=211339
The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.
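For illustration, a minimal IR sketch of the intersection rule (hypothetical, not taken from the new test file): both scalar adds carry nsw, but only the first carries nuw, so a vectorized add may keep nsw but must drop nuw.
define <2 x i32> @flag_intersection(i32 %a, i32 %b, i32 %c, i32 %d) {
; if the SLP vectorizer combined these two adds, the resulting vector add
; could legally be 'add nsw <2 x i32>' but not 'add nuw nsw <2 x i32>'
%add0 = add nuw nsw i32 %a, %b
%add1 = add nsw i32 %c, %d
%v0 = insertelement <2 x i32> undef, i32 %add0, i32 0
%v1 = insertelement <2 x i32> %v0, i32 %add1, i32 1
ret <2 x i32> %v1
}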
Differential Revision: http://reviews.llvm.org/D5172
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217051 91177308-0d34-0410-b5e6-96231b3b80d8
This CL replaces the constant DarwinX86AsmBackend.PushInstrSize with a method
that lets the backend account for the different sizes of "push %reg"
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217020 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r216805 with a fix for a copy-paste error, which resulted in an
incorrect register class.
Original commit message:
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.
This fixes rdar://problem/18183707.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217019 91177308-0d34-0410-b5e6-96231b3b80d8
There is already target-dependent instruction selection support for Adds/Subs,
used for compares and the overflow intrinsics. This takes advantage of the
existing infrastructure to also support plain Add/Sub, which allows the folding
of immediates, sign-/zero-extends, and shifts.
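As a rough sketch of the kind of folding this enables (hypothetical example; assuming an AArch64-style target with extended-register add forms such as 'add x0, x0, w1, sxtw'):
define i64 @add_sext(i64 %a, i32 %b) {
; the sign-extend can be folded into the add itself instead of being
; selected as a separate instruction
%ext = sext i32 %b to i64
%sum = add i64 %a, %ext
ret i64 %sum
}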
This fixes rdar://problem/18207316.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217007 91177308-0d34-0410-b5e6-96231b3b80d8
This uses the target-dependent selection code for shifts first, which allows us
to create better code for shifts with immediates and sign-/zero-extend folding.
Vector types are not handled yet and the code falls back to target-independent
instruction selection for these cases.
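A sketch of a case this covers (hypothetical example; on a target such as AArch64 the zero-extend and the immediate shift can be selected together, e.g. as a single bitfield move):
define i64 @shl_zext(i32 %a) {
; the zext and the shift by an immediate can be folded into one instruction
%zext = zext i32 %a to i64
%shift = shl i64 %zext, 4
ret i64 %shift
}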
This fixes rdar://problem/17907920.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216985 91177308-0d34-0410-b5e6-96231b3b80d8
The code is buggy and barely tested. It is also mostly boilerplate.
(This includes MCObjectDisassembler, which is the interface to that
functionality.)
Following an IRC discussion with Jim Grosbach, it seems sensible to just
nuke the whole lot of functionality, and dig it up from VCS if
necessary (I hope not!).
All of this stuff appears to have been added in a huge patch dump (look
at the timeframe surrounding e.g. r182628) where almost every patch
seemed to be untested and not reviewed before being committed.
Post-review responses to the patches were never addressed. I don't think
any of it would have passed pre-commit review.
I doubt anyone is depending on this, since this code appears to be
extremely buggy. In limited testing that Michael Spencer and I did, we
couldn't find a single real-world object file that wouldn't crash the
CFG reconstruction stuff. The symbolizer stuff has O(n^2) behavior and
so is not much use to anyone anyway. It seemed simpler to remove them as
a whole. Most of this code is boilerplate, which is the only way it was able to
scrape by with 60% coverage.
HEADSUP: Modules folks, some files I nuked were referenced from
include/llvm/module.modulemap; I just deleted the references. Hopefully
that is the right fix (one was a FIXME though!).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216983 91177308-0d34-0410-b5e6-96231b3b80d8
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reinstates commits r215111, 215115, 215116, 215117, 215136.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216982 91177308-0d34-0410-b5e6-96231b3b80d8
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.
Similarly, adding/and-ing/or-ing/xor-ing an immediate into an atomic location
(expressed through an atomic_store/atomic_load pair, not a fetch_whatever
intrinsic) can now use an 'add $imm, x(%rip)' directly instead of going
through a register. The same applies to inc/dec.
This second point matches the first issue identified in
http://llvm.org/bugs/show_bug.cgi?id=17281
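A minimal sketch of the first point above (hypothetical IR, not from the commit): an atomic store of an immediate no longer needs to go through a register.
define void @store_imm(i64* %p) {
; with this change, x86 can emit a single 'movq $42, (%rdi)' for this store
; instead of first materializing 42 in a register
store atomic i64 42, i64* %p release, align 8
ret void
}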
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216980 91177308-0d34-0410-b5e6-96231b3b80d8
This provides an implementation of CFL alias analysis (including some
supporting data structures). Currently, we don't have any extremely fancy
features, aside from some interprocedural analysis (i.e. no field sensitivity, etc.),
and we do best sitting behind BasicAA + TBAA. In such a configuration, we take
~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries
TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on
other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA
answered with NoAlias by this algorithm.
Patch by George Burgess IV (with minor modifications by me -- mostly adapting
some BasicAA tests), thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216970 91177308-0d34-0410-b5e6-96231b3b80d8
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216932 91177308-0d34-0410-b5e6-96231b3b80d8
Also fix a small copy-paste bug in X86ISelLowering where Chain should
have been used in place of DAG.getEntryToken().
Fixes PR20828.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216929 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts revision 216913; the new test added at revision 216913
caused regression failures on a couple of buildbots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216914 91177308-0d34-0410-b5e6-96231b3b80d8
When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.
Example:
define double @test() {
%1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
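; expected result: 7.0 * 8.0 + 0.0 = 56.0 (the simplifier previously returned 7.0)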
ret double %1
}
Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).
Added test fold-builtin-fma.ll with the reproducer from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.
This fixes PR20832.
Differential Revision: http://reviews.llvm.org/D5152
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216913 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
BBs might contain non-LCSSA'd values after the LCSSA pass is run if they
are unreachable from the entry block.
Normally, the users of such an instruction would be PHIs, but users in
unreachable BBs are ordinary instructions; rewrite those uses to be undef
values.
An alternative fix could involve fixing this in LCSSA itself, but that would
require the invariant to hold after subsequent transforms as well; any
transform that created an unreachable block would violate it.
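A sketch of the situation (hypothetical IR, not the test from the patch): %x is defined inside a loop and used in a block that is unreachable from the entry block, so LCSSA never inserts a PHI for it.
define void @lcssa_sketch(i32 %n) {
entry:
br label %loop
loop:
%x = add i32 %n, 1
br label %loop
never:
; this block has no predecessors, so the use of %x below is never rewritten
; through an LCSSA PHI; with this patch the use becomes undef instead
%y = add i32 %x, 1
ret void
}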
This fixes PR19798.
Differential Revision: http://reviews.llvm.org/D5146
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216911 91177308-0d34-0410-b5e6-96231b3b80d8
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216908 91177308-0d34-0410-b5e6-96231b3b80d8
We have been using .init_array for most systems for quite some time,
but tools like llc are still defaulting to .ctors because the old
option was never changed.
This patch makes llc default to .init_array and changes the option to
-use-ctors.
Clang is not affected by this. It has its own fancier logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216905 91177308-0d34-0410-b5e6-96231b3b80d8
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.
This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.
Unfortunately, I don't yet have a small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing
things).
Original commit message (by Adam Nemet):
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part). See the testcase.
In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off. This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.
Fixes <rdar://problem/16031651>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216898 91177308-0d34-0410-b5e6-96231b3b80d8
SROA may decide that it needs to insert a bitcast and would set its
insertion point before a PHI. This will create an invalid module
right quick.
Instead, choose the first insertion point in the basic block that holds
our PHI.
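For context, a sketch of why that is invalid (hypothetical IR): PHI nodes must be the first instructions in a basic block, so nothing may be inserted ahead of them.
define i32 @phi_block(i1 %c, i8* %p) {
entry:
br i1 %c, label %merge, label %other
other:
br label %merge
merge:
; inserting a bitcast of %p here, ahead of the PHI, would violate the rule
; that PHIs come first; the fix uses the block's first insertion point
; (after the PHIs) instead
%phi = phi i32 [ 0, %entry ], [ 1, %other ]
ret i32 %phi
}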
This fixes PR20822.
Differential Revision: http://reviews.llvm.org/D5141
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216891 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r216698 which reverted r216523 and r216598.
We would attempt to perform the transformation even if the match()
failed because, as a side effect, it would set V. This would trick us
into believing that we had correctly found a place to apply the
transform.
An additional test case was added to getelementptr.ll so that we might
not regress in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216890 91177308-0d34-0410-b5e6-96231b3b80d8
The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions.
This patch adds a convenience method to make that operation easier because we need to do this
in the loop vectorizer, SLP vectorizer, and possibly other places.
Although this is a 'no functional change' patch, I've added a testcase to verify that the exact
flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked
in existing testcases.
Differential Revision: http://reviews.llvm.org/D5138
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216886 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements a few changes related to the Thumb2 M-class MSR instruction:
* better handling of unpredictable encodings,
* recognition of the _g and _nzcvqg variants by the asm parser only if the DSP
extension is available,
* preferred output of MSR APSR moves with the _<bits> suffix for v7-M.
Patch by Petr Pavlu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216874 91177308-0d34-0410-b5e6-96231b3b80d8
chain became completely broken here as *all* intrinsic users ended up
being skipped, and the ones that seemed to be singled out were actually
the exact wrong set.
This is a great example of why long else-if chains can easily become
confusing. Switch the entire code to use early exits and early continues
to have simpler (and more importantly, correct) logic here, as well as
fixing the reversed logic for detecting and continuing on lifetime
intrinsics.
I've also significantly cleaned up the test case and added another test
case demonstrating an example where the optimization is not (trivially)
safe to perform.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216871 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the hint mechanism relied on cleanup passes to remove redundant
metadata, which still showed up when running opt at low optimization levels.
That also showed that multiple nodes of the same type, but with different
values, could coexist (even if only temporarily) and cause confusion if the
next pass picked up the wrong value.
This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also restructures the algorithm so that coping with more metadata types in
the future only requires adding a new type, not a lot of new code.
Re-applying again now that MSVC 2013 is the minimum requirement, since this
patch uses C++11 features that MSVC 2012 didn't support.
Fixes PR20655.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216870 91177308-0d34-0410-b5e6-96231b3b80d8
This feeds AA through the IFI structure into the inliner so that
AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
functions only access their arguments (instead of just hard-coding some
knowledge of memory intrinsics). Most of the information is only available from
BasicAA; this is important for preserving alias scoping information for
target-specific intrinsics when doing the noalias parameter attribute to
metadata conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216866 91177308-0d34-0410-b5e6-96231b3b80d8
I thought that I had fixed this problem in r216818, but I did not do a very
good job. The underlying issue is that when we add alias.scope metadata we are
asserting that this metadata completely describes the aliasing relationships
within the current aliasing scope domain, and so in the context of translating
noalias argument attributes, the pointers must all be based on noalias
arguments (as underlying objects) and have no other kind of underlying object.
In r216818, excluding the appropriate accesses from getting alias.scope
metadata was done by looking for underlying objects that are not identified
function-local objects -- but that's wrong because allocas, etc. are also
function-local objects, and we need to explicitly check that all underlying
objects are the noalias arguments for which we're adding metadata aliasing
scopes.
This fixes the underlying-object check for adding alias.scope metadata, and
does some refactoring of the related capture-checking eligibility logic (and
adds more comments; hopefully making everything a bit clearer).
Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the
feature is still disabled by default).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216863 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, which could increase
register pressure.
Test Plan:
Added an NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.
Reviewers: eliben, meheff, Jiangning
Reviewed By: Jiangning
Subscribers: jholewinski, aemerson, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D4814
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216862 91177308-0d34-0410-b5e6-96231b3b80d8
DW_TAG_lexical_block DIEs inform debuggers about the instruction range for
which a given variable (or imported declaration/module/etc.) is valid. If
the scope doesn't itself contain any such entities, it's a waste of
space and should be omitted.
We were correctly doing this for entirely empty leaves, but not for
intermediate nodes.
Reduces total (not just debug sections) .o file size for a bootstrap
-gmlt LLVM by 22% and bootstrap -gmlt clang executable by 13%. The wins
for a full -g build will be less as a % (and in absolute terms), but
should still be substantial - with some of that win being fewer
relocations, thus reducing link times more substantially than fewer bytes
alone would have.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216861 91177308-0d34-0410-b5e6-96231b3b80d8