This fixes another miscompile introduced by r235232: when there was a
dependency on the memcpy destination other than the memset, we would
ignore it, because we only looked at the source dependency.
It was a mistake to use SrcDepInfo. Instead, just use DepInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237066 91177308-0d34-0410-b5e6-96231b3b80d8
The TargetRegistry is just a namespace-like class, instantiated in one
place to use a range-based for loop. Instead, expose access to the
registry via a range-based 'targets()' function instead. This makes most
uses a bit awkward/more verbose - but eventually we should just add a
range-based find_if function which will streamline these functions. I'm
happy to mkae them a bit awkward in the interim as encouragement to
improve the algorithms in time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237059 91177308-0d34-0410-b5e6-96231b3b80d8
cleanups.
Also, change code in tablegen which printed a message and then called
"exit(1)" to use PrintFatalError, instead.
This fixes instances where an empty output file was left behind after
a failed tablegen invocation, which would confuse subsequent ninja
runs into not attempting to rebuild.
Differential Revision: http://reviews.llvm.org/D9608
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237058 91177308-0d34-0410-b5e6-96231b3b80d8
This is a less ambitious version of:
http://reviews.llvm.org/rL236546
because that was reverted in:
http://reviews.llvm.org/rL236600
because it caused memory corruption that wasn't related to FMF
but was actually due to making nodes with 2 operands derive from a
plain SDNode rather than a BinarySDNode.
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237046 91177308-0d34-0410-b5e6-96231b3b80d8
runOnCountable() allowed the caller to call on a loop without a
predictable backedge-taken count. Change the code so that only loops
with computable backdge-count can call this function, in order to catch
abuses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237044 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.
This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.
Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.
Patch by Chen Li!
Reviewers: reames, AndyAyers, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9592
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237009 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
r235215 adds support for f16 to be considered as a load/store type and
promote f16 operations to f32.
This patch has miscellaneous fixes for the X86 backend so all f16
operations are handled:
1. Set loadextaction for f16 vectors to expand.
2. Handle FP_EXTEND in a switch statement when handling v2f32
3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or
(store (SINT_TO_FP )) to a FILD.
Tests included.
Reviewers: ab, srhines, delena
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9092
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237004 91177308-0d34-0410-b5e6-96231b3b80d8
ValueTracking.
This matching functionality is useful in more than just InstCombine, so
make it available in ValueTracking.
NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236998 91177308-0d34-0410-b5e6-96231b3b80d8
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236973 91177308-0d34-0410-b5e6-96231b3b80d8
Second attempt; instead of using a named local variable, passing
arguments directly to `createSanitizerCtorAndInitFunctions` worked
on Windows.
Reviewers: kcc, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8780
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236951 91177308-0d34-0410-b5e6-96231b3b80d8
warning: enumeral and non-enumeral type in conditional expression
Cast the 0 to the appropriate type. NFC. Identified by GCC 4.9.2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236942 91177308-0d34-0410-b5e6-96231b3b80d8
The bug showed up as a compile-time assertion failure:
Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed
when building msan tests on x86-64.
Prior to r236850, this bug was masked due to a bogus alignment check,
which also accidentally rejected non-byte-sized accesses. Afterwards,
an invalid ElementSizeBytes == 0 got further into the function, and
triggered the assertion failure.
It would probably be a good idea to allow it to handle merging stores
of unusual widths as well, but for now, to un-break it, I'm just
making the minimal fix.
Differential Revision: http://reviews.llvm.org/D9626
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236927 91177308-0d34-0410-b5e6-96231b3b80d8
When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to
mark the vreg containing 1000 as killed. Given that we go bottom up in fast-isel, a later
use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill.
However, rematerialised constant expressions aren't generated bottom up. The local value save area
grows downwards. This means that if you remat 2 constant expressions which both use 1000 then the
first will kill it, then the second, which is *lower* in the BB will read a killed register.
This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset.
Note that this commit only makes kill flag generation conservative. There's nothing else obviously wrong with
the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions.
However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236922 91177308-0d34-0410-b5e6-96231b3b80d8
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236916 91177308-0d34-0410-b5e6-96231b3b80d8
This new class in a global context contain arch-specific knowledge in order
to provide LLVM libraries, tools and projects with the ability to understand
the architectures. For now, only FPU, ARCH and ARCH extensions on ARM are
supported.
Current behaviour it to parse from free-text to enum values and back, so that
all users can share the same parser and codes. This simplifies a lot both the
ASM/Obj streamers in the back-end (where this came from), and the front-end
parsers for command line arguments (where this is going to be used next).
The previous implementation, using .def/.h includes is deprecated due to its
inflexibility to be built without the backend support and for being too
cumbersome. As more architectures join this scheme, and as more features of
such architectures are added (such as hardware features, type sizes, etc) into
a full blown TargetDescription class, having a set of classes is the most
sane implementation.
The ultimate goal of this refactor both LLVM's and Clang's target description
classes into one unique interface, so that we can de-duplicate and standardise
the descriptions, as well as make it available for other front-ends, tools,
etc.
The FPU parsing for command line options in Clang has been converted to use
this new library and a number of aliases were added for compatibility:
* A bogus neon-vfpv3 alias (neon defaults to vfp3)
* armv5/v6
* {fp4/fp5}-{sp/dp}-d16
Next steps:
* Port Clang's ARCH/EXT parsing to use this library.
* Create a TableGen back-end to generate this information.
* Run this TableGen process regardless of which back-ends are built.
* Expose more information and rename it to TargetDescription.
* Continue re-factoring Clang to use as much of it as possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236900 91177308-0d34-0410-b5e6-96231b3b80d8
When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register. This is done via updateValueMap,.
However, its possible that the source register we are rewriting *to* to also have uses. If those uses are after a kill of the value we are rewriting *from* then we have uses after a kill and the verifier fails.
This code checks for the case where the to register is also used, and if so it clears all kill on the from register. This is conservative, but better that always clearing kills on the from register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236897 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
There are several unhandled edge cases in BasicAA's GetLinearExpression
method. This changes fixes outstanding issues, including zext / sext of
a constant with the sign bit set, and the refusal to decompose zexts or
sexts of wrapping arithmetic.
Test Plan: Unit tests added in //q.ext.ll//.
Patch by Nick White.
Reviewers: hfinkel, sanjoy
Reviewed By: hfinkel, sanjoy
Subscribers: sanjoy, llvm-commits, hfinkel
Differential Revision: http://reviews.llvm.org/D6682
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236894 91177308-0d34-0410-b5e6-96231b3b80d8
A trunc from i32 to i1 on x86_64 generates an instruction such as
%vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9
However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one.
Otherwise, we are killing the source of the truncate which could be used later in the program.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236890 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the shape of the statepoint intrinsic from:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)
to:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)
This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.
In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.
Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.
Differential Revision: http://reviews.llvm.org/D9501
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236888 91177308-0d34-0410-b5e6-96231b3b80d8
The test here was sinking the AND here to a lower BB:
%vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8
TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8
which meant that vreg8 was read after it was killed.
This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236886 91177308-0d34-0410-b5e6-96231b3b80d8
Improved the AnalyzeBranch, InsertBranch, and RemoveBranch
functions in order to handle more of our branch instructions.
This requires changes to analyzeCompare and PredicateInstructions.
Specifically, we've added support for new value compare jumps,
improved handling of endloop, added more compare instructions,
and improved support for predicate instructions.
Differential Revision: http://reviews.llvm.org/D9559
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236876 91177308-0d34-0410-b5e6-96231b3b80d8
The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes
where the mask node is a load from constant pool, and the constant pool node
is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching
it how to also look through X86ISD::WrapperRIP.
This helps function combineX86ShufflesRecusively to combine more shuffle
sequences containing PSHUFB nodes if we are in RIPRel PIC mode.
Before this change, llc (with -relocation-model=pic -march=x86-64) was unable
to decode a pshufb where the mask was loaded from a constant pool. For example,
the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its
operand, so instead of generating a single 'movaps' the backend always
generated a sub-optimal 'movdqa + pshufb' sequence.
Added test x86-fold-pshufb.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236863 91177308-0d34-0410-b5e6-96231b3b80d8
1) check whether the alignment of the memory is sufficient for the
*merged* store or load to be efficient.
Not doing so can result in some ridiculously poor code generation, if
merging creates a vector operation which must be aligned but isn't.
2) DON'T check that the alignment of each load/store is equal. If
you're merging 2 4-byte stores, the first *might* have 8-byte
alignment, but the second certainly will have 4-byte alignment. We do
want to allow those to be merged.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236850 91177308-0d34-0410-b5e6-96231b3b80d8
Restructure Triple::getARMCPUForArch so that invalid values will
return nullptr, while retaining the behaviour that an argument
specifying no particular architecture version will give a default
CPU. This will be used by clang to give an error on invalid -march
values.
Also restructure the extraction of the architecture version from
the MArch string a little to hopefully make what it's doing clearer.
Differential Revision: http://reviews.llvm.org/D9599
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236845 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In microMIPS, labels need to know whether they are on code or data. This is
indicated with STO_MIPS_MICROMIPS and can be inferred by being followed
by instructions. For empty basic blocks, we can ensure this by emitting the
.insn directive after the label.
Also, this fixes some failures in our out-of-tree microMIPS buildbots, for the
exception handling regression tests under: SingleSource/Regression/C++/EH
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9530
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236815 91177308-0d34-0410-b5e6-96231b3b80d8
If we duplicate an instruction then we must also clear kill flags on any uses we rewrite.
Otherwise we might be killing a register which was used in other BBs.
For example, here the entry BB ended up with these instructions, the ADD having been tail duplicated.
%vreg24<def> = t2ADDri %vreg10<kill>, 1, pred:14, pred:%noreg, opt:%noreg; GPRnopc:%vreg24 rGPR:%vreg10
%vreg22<def> = COPY %vreg10; GPR:%vreg22 rGPR:%vreg10
The copy here is inserted after the add and so needs vreg10 to be live.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236782 91177308-0d34-0410-b5e6-96231b3b80d8
We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there.
Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236764 91177308-0d34-0410-b5e6-96231b3b80d8
This patch corresponds to review:
http://reviews.llvm.org/D9440
It adds a new register class to the PPC back end to contain single precision
values in VSX registers. Additionally, it adds scalar loads and stores for
VSX registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236755 91177308-0d34-0410-b5e6-96231b3b80d8
This commit enables the tests located in test/YAMLParser directory.
Those tests were never actually enabled, as llvm-lit didn't pick up the
files with the 'data' extension. The commit renames those test files to files
with the 'test' extension so that llvm-lit would find them.
This commit also modifies yaml-bench so that it returns an error status
if an error occurred during parsing. It also adds the '-use-color'
command line option to yaml-bench (to make sure that file check matches
the error messages in the output stream).
This commit modifies some of the renamed tests so that they wouldn't
fail. It gets rid of XFAILs and uses the 'not' command instead for
some of the tests that have to fail during parsing. This commit
also adds some 'FIXME' comments to a couple of tests that are
supposed to fail but currently pass because of various bugs
in the implementation of the yaml parser.
Reviewers: Justin Bogner
Differential Revision: http://reviews.llvm.org/D9448
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236754 91177308-0d34-0410-b5e6-96231b3b80d8
Clang regressions were caused by more stringent assertion checking
introduced by this change. Small fix needed to clang has been committed
in r236751.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236752 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This addresses PR 22718. When branch weights are too large, they were
being clamped to the range [1, MaxWeightForBB]. But this clamping is
only applied to edges that go outside the range, so it distorts the
relative branch probabilities.
This patch changes the weight calculation to scale every branch so the
relative probabilities are preserved. The scaling is done differently
now. First, all the branch weights are added up, and if the sum exceeds
32 bits, it computes an integer scale to bring all the weights within
the range.
The patch fixes an existing test that had slightly wrong branch
probabilities due to the previous clamping. It now gets branch weights
scaled accordingly.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236750 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-on to r236740 where I took Andrea's advice
in D9504 to remove a redundant pattern...except that I removed
the wrong pattern!
AFAICT, there is no change in the final code produced because
subsequent passes would clean up the extra instructions created
by the more complicated pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236743 91177308-0d34-0410-b5e6-96231b3b80d8
Fix two other variables that might cause the same hang fixed in r235914.
The hang is caused by constructing ManagedStatic in signalhandler. In
this case, if FileToRemove or CallBacksToRun is not contructed, it means
there is no work to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236741 91177308-0d34-0410-b5e6-96231b3b80d8
Finish the job that was abandoned in D6958 following the refactoring in
http://reviews.llvm.org/rL230221:
1. Uncomment the intrinsic def for the AVX r_Int instruction.
2. Add missing r_Int entries to the load folding tables; there are already
tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I
haven't added any more in this patch.
3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ).
So instead of this:
movaps %xmm0, %xmm1
rcpss %xmm1, %xmm1
movss %xmm1, %xmm0
We should now get:
rcpss %xmm0, %xmm0
And instead of this:
vsqrtss %xmm0, %xmm0, %xmm1
vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3]
We should now get:
vsqrtss %xmm0, %xmm0, %xmm0
Differential Revision: http://reviews.llvm.org/D9504
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236740 91177308-0d34-0410-b5e6-96231b3b80d8
After r236617, branch probabilities are no longer guaranteed to be >= 1. This
patch makes the swich lowering code handle that correctly, without bumping the
branch weights by 1 which might cause overflow and skews the probabilities.
Covered by @zero_weight_tree in test/CodeGen/X86/switch.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236739 91177308-0d34-0410-b5e6-96231b3b80d8
This change adds support for the SHT_MIPS_ABIFLAGS section
reading/writing to the obj2yaml and yaml2obj tools.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236738 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This will enable the IAS to reject floating point instructions if soft-float is enabled.
Reviewers: dsanders, echristo
Reviewed By: dsanders
Subscribers: jfb, llvm-commits, mpf
Differential Revision: http://reviews.llvm.org/D9053
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236713 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
One step further getting aggregate loads and store being optimized
properly. This will only handle struct with one element at this point.
Test Plan: Added unit tests for the new supported cases.
Reviewers: chandlerc, joker-eph, joker.eph, majnemer
Reviewed By: majnemer
Subscribers: pete, llvm-commits
Differential Revision: http://reviews.llvm.org/D8339
Patch by Amaury Sechet.
From: Amaury Sechet <amaury@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236695 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This gives frontend more precise control over collected coverage
information. User can still override these options by passing
-mllvm flags.
No functionality change.
Test Plan: regression test suite.
Reviewers: kcc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9539
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236687 91177308-0d34-0410-b5e6-96231b3b80d8
If we have recognized that a conditional is constant at a particular location in the code (while trying to decide if we can simplify a conditional branch), we can eagerly replace that condition with a constant if it's definition is post dominated by the branch in question.
In practice, this ends up being a compile time savings at most. JumpThreading would have visited each using branch anyways. CVP would have visited the cmp itself again. Unless LVI gives up early, we shouldn't gain any addition power by doing this transformation early. What we do gain is simplicity and compile time.
Differential Revision: http://reviews.llvm.org/D9312
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236684 91177308-0d34-0410-b5e6-96231b3b80d8
Renames the original CreateGCStatepoint to CreateGCStatepointCall, and
moves invoke creating functionality from PlaceSafepoints.cpp to
IRBuilder.cpp.
This changes the labels generated for PlaceSafepoints/invokes.ll so use
a regex there to make the basic block labels more resilient.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236672 91177308-0d34-0410-b5e6-96231b3b80d8
This makes use of the new API which can remove attributes from a set given a builder.
This is much faster than creating a temporary set and reduces llc time by about 0.3% which was all spent creating temporary attributes sets on the context.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236668 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this change we would have to construct a temporary AttributeSet (which isn't temporary at all given that its allocated on the context), just to contain the attributes in the builder, then call remove on that.
Now we can just remove any attributes from the (lightweight and really temporary) builder itself.
Will be used in a future commit to remove some temporary attributes sets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236666 91177308-0d34-0410-b5e6-96231b3b80d8
Since the coverage mapping reader and the instrprof reader were
emitting a shared set of error codes, the error messages you'd get
back from llvm-cov were ambiguous about what was actually wrong. Add
another error category to fix this.
I've also improved the wording on a couple of the instrprof errors,
for consistency.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236665 91177308-0d34-0410-b5e6-96231b3b80d8
This commit extracts the code that skips over a YAML comment from
the 'scanToNextToken' method into a separate 'skipComment' method.
This refactoring is motivated by a patch that implements parsing
of YAML block scalars (http://reviews.llvm.org/D9503), as the
method that parses a block scalar reuses the 'skipComment' method.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236663 91177308-0d34-0410-b5e6-96231b3b80d8
Somehow I dropped this in r233585, and we haven't had `DEBUG_LOC_AGAIN`
records since. Add it back. Also tests that the output assembly looks
okay.
Fixes PR23436.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236661 91177308-0d34-0410-b5e6-96231b3b80d8
We had code such as this:
r2 = ...
t2Bcc
label1:
ldr ... r2
label2;
return r2<dead, def>
The if converter was transforming this to
r2<def> = ...
return [pred] r2<dead,def>
ldr <r2, kill>
return
which fails the machine verifier because the ldr now reads from a dead def.
The fix here detects dead defs in stepForward and passes them back to the caller in the clobbers list. The caller then clears the dead flag from the def is the value is live.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236660 91177308-0d34-0410-b5e6-96231b3b80d8
demanded by the machine verifier.
After shrinking a live-range to its uses, it is possible to create several
smaller live-ranges. When this happens, shrinkToUses returns true and we need to
split the different components into their own live-ranges.
The problem does not reproduce on any in-tree target but Jonas Paulsson
<jonas.paulsson@ericsson.com>, who reported the problem, checked that this patch
fixes the issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236658 91177308-0d34-0410-b5e6-96231b3b80d8
ole32 is considered a default library with MSVC, but apparently
not with MinGW. Since we use CoInitialize, we need to explicitly
link against it in LLVMSupport for a MinGW build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236654 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically, this patch correctly respects the -demangle option,
and additionally adds a hidden --relative-address option allows
input addresses to be relative to the module load address instead
of absolute addresses into the image.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236653 91177308-0d34-0410-b5e6-96231b3b80d8
If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use.
Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236650 91177308-0d34-0410-b5e6-96231b3b80d8
When folding a load in to another instruction, we need to fix the class of the index register
Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236644 91177308-0d34-0410-b5e6-96231b3b80d8
Don't create names for temporary symbols when using an object streamer.
The names never make it to the output anyway. From the starting point
of r236629, my heap profile says this drops peak memory usage from 1100
MB to 1058 MB for CodeGen of `verify-uselistorder`, a savings of almost
4% on peak memory, and removes `StringMap<bool, BumpPtrAllocator...>`
from the profile entirely.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236642 91177308-0d34-0410-b5e6-96231b3b80d8
It's quite possible to encounter an insertvalue instruction that's more deeply
nested than the value we're looking for, but when that happens we really
mustn't compare beyond the end of the index array.
Since I couldn't see any guarantees about what comparisons std::equal makes, we
probably need to directly check the size beforehand. In practice, I suspect
most std::equal implementations would probably bail early, which would be OK.
But just in case...
rdar://20834485
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236635 91177308-0d34-0410-b5e6-96231b3b80d8
Emit the number of bytes in a `.debug_loc` entry directly. The old code
created temp labels (expensive), emitted the difference between them,
and then emitted one on each side of the relevant bytes.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`
(the optimized version of ld64's `-save-temps` when linking the
`verify-uselistorder` executable in an LTO bootstrap). I've hacked
`MCContext::Allocate()` to just call `malloc()` instead of using the
`BumpPtrAllocator` so that the heap profile is easier to read. As far
as peak memory is concerned, `MCContext::Allocate()` is equivalent to a
leak, since it only gets freed at process teardown.
In my heap profile, this patch drops memory usage of
`DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB
(2.7%) at peak memory. Some of that must be noise from `SmallVector`
(or other) allocations -- peak memory only dropped from 1160 MB down to
1100 MB -- but this nevertheless shaves 5% off the top.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236629 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This helper function creates a ctor function, which calls sanitizer's
init function with given arguments. This constructor is then expected
to be added to module's ctors. The patch helps unifying how sanitizer
constructor functions are created, and how init functions are called
across all sanitizers.
Reviewers: kcc, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8777
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236627 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When computing branch weights in BPI, we used to disallow branches with
weight 0. This is a minor nuisance, because a branch with weight 0 is
different to "don't have information". In the context of
instrumentation, it may mean "never executed", in the context of
sampling, it means "never or seldom executed".
In allowing 0 weight branches, I ran into issues with the switch
expansion code in selection DAG. It is currently hardwired to not handle
branches with weight 0. To maintain the current behaviour, I changed it
to use 1 when it finds 0, but perhaps the algorithm needs changes to
tolerate branches with weight zero.
Reviewers: hansw
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9533
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236617 91177308-0d34-0410-b5e6-96231b3b80d8
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.
Differential Revision: http://reviews.llvm.org/D9515
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236613 91177308-0d34-0410-b5e6-96231b3b80d8
With neon enabled, we reach SelectBinaryFPOp and are able to get registers for a <2 x double> add.
However, we shouldn't actually attempt arithmetic on it as ARMIselLowering says "v2f64 is legal so that QR subregs can be extracted as f64 elements, but neither Neon nor VFP support any arithmetic operations on it."
This commit disables SelectBinaryFPOp for any vector types. There's already a FIXME to try handle neon. Doing so would require fixing this conditional which isn't safe for vectors 'VT == MVT::f64 || VT == MVT::i64'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236609 91177308-0d34-0410-b5e6-96231b3b80d8
The initial code drop for VSX swap optimization permitted the
optimization only when all operations in a web of related computation
are lane-insensitive. For some lane-sensitive operations, we can
still permit the optimization provided that we make adjustments to
those operations. This patch adds special handling for vector splats
so that their presence doesn't kill the optimization.
Vector splats are lane-sensitive since they identify by number a
vector element to be used as the source of a splat. When swap
optimizations take place, the desired vector element will move to the
opposite doubleword of the quadword vector. We thus replace the index
I by (I + N/2) % N, where N is the number of elements in the vector.
A new test case is added to test that swap optimization succeeds when
vector splats are present, and that the proper input element is used
as the source of the splat.
An ancillary change removes SH_BUILDVEC as one of the kinds of special
handling that may be required by VSX swap optimization. From
experience with GCC, I had expected to need some modifications for
vector build operations, but I did not find that to be the case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236606 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size.
Test Plan:
CodeGen for X86 test included.
Also one incorrect regression test fixed.
Reviewers: qcolombet, chandlerc, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D9250
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236584 91177308-0d34-0410-b5e6-96231b3b80d8
I folded the check for the flag -verify-dom-info into the only caller
where I think it is supposed to be checked: verifyAnalysis. (The idea
of the flag is to enable this expensive verification in
verifyPreservedAnalysis.)
I'm assuming that when manually scheduling the verification pass
with -passes=verify<domtree>, we do want to perform the verification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236575 91177308-0d34-0410-b5e6-96231b3b80d8
Since r234249, i1 are sext instead of zext; because of that, doing
"CMP rN, #0; IT EQ/NE" isn't correct anymore.
"TST #1" is the conservatively correct alternative - the tradeoff being
that it doesn't have a 16-bit encoding -, so use that instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236569 91177308-0d34-0410-b5e6-96231b3b80d8
For accessors in the `Statepoint` class, use symbolic constants for
offsets into the argument vector instead of literals. This makes the
code intent clearer and simpler to change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236566 91177308-0d34-0410-b5e6-96231b3b80d8
For consumers of coverage data, any filename prefixes we store in the
profile data are just noise. Strip this prefix if it exists.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236558 91177308-0d34-0410-b5e6-96231b3b80d8
The index reg on instructions with complex address modes is a GPR64_NOSP. Constrain it to appease the machine verifier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236557 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We default the value argument to nullptr. The only use of the value is
in diagnosePossiblyInvalidConstraint and that seems to be resilient to
it being nullptr.
Reviewers: atrick, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9479
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236555 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The exported class will be used in later change, in
StatepointLowering.cpp. It is still internal to SelectionDAG (not
exported via include/).
Reviewers: reames, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9478
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236554 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently this does not change anything, but change will be used in a
later change to StatepointLowering.cpp
Reviewers: reames, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9477
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236553 91177308-0d34-0410-b5e6-96231b3b80d8
Note, this is a recommit of r236515 after fixing an error in r236514. The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error. r236515 itself ran 'make check' without errors.
Original commit message follows:
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236550 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
There are 2 structural changes here:
1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.
2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes.
Differential Revision: http://reviews.llvm.org/D8900
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236546 91177308-0d34-0410-b5e6-96231b3b80d8
COMDAT groups which have become rendered unused because of inline are
discardable if we can prove that we've made the group empty.
This fixes PR22285.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236539 91177308-0d34-0410-b5e6-96231b3b80d8
Note, this is a reapplication of r236515 with a fix to not assert on non-register operands, but instead only handle them until the subsequent commit. Original commit message follows.
The code was basically the same here already. Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
Will be used in the next commit to also handle regmasks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236538 91177308-0d34-0410-b5e6-96231b3b80d8
The register set for LDMIA begins at offset 3, not 4. We were previously
missing the short encoding of this instruction in the case where the base
register was the first register in the register set.
Also clean up some dead code:
- The isARMLowRegister check is redundant with what VerifyLowRegs does;
replace with an assert.
- Remove handling of LDMDB instruction, which has no short encoding (and
does not appear in ReduceTable).
Differential Revision: http://reviews.llvm.org/D9485
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236535 91177308-0d34-0410-b5e6-96231b3b80d8
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it. This is needed on z, where element numbers are i32
but pointers are i64.
Original patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236530 91177308-0d34-0410-b5e6-96231b3b80d8
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt). For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt). The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)
Original patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236529 91177308-0d34-0410-b5e6-96231b3b80d8
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal.
E.g. it would load a v4i8 as an i32 if i32 was legal.
This patch extends that behavior to promoted integers as well as legal ones.
If the integer type for the full vector width is TypePromoteInteger,
the element type is going to be TypePromoteInteger too, and it's still
better to use a single promoting load or truncating store rather than N
individual promoting loads or truncating stores. E.g. if you have a v2i8
on a target where i16 is promoted to i32, it's better to load the v2i8 as
an i16 rather than load both i8s individually.
Original patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236528 91177308-0d34-0410-b5e6-96231b3b80d8
This adds intrinsics to allow access to all of the z13 vector instructions.
Note that instructions whose semantics can be described by standard LLVM IR
do not get any intrinsics.
For each instructions whose semantics *cannot* (fully) be described, we
define an LLVM IR target-specific intrinsic that directly maps to this
instruction.
For instructions that also set the condition code, the LLVM IR intrinsic
returns the post-instruction CC value as a second result. Instruction
selection will attempt to detect code that compares that CC value against
constants and use the condition code directly instead.
Based on a patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236527 91177308-0d34-0410-b5e6-96231b3b80d8
The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be
passed in vector registers. We do not yet support those types, and
some infrastructure is missing before we can do so.
In order to prevent accidentally generating code violating the ABI,
this patch adds checks to detect those types and error out if user
code attempts to use them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236526 91177308-0d34-0410-b5e6-96231b3b80d8
The ABI allows sub-128 vectors to be passed and returned in registers,
with the vector occupying the upper part of a register. We therefore
want to legalize those types by widening the vector rather than promoting
the elements.
The patch includes some simple tests for sub-128 vectors and also tests
that we can recognize various pack sequences, some of which use sub-128
vectors as temporary results. One of these forms is based on the pack
sequences generated by llvmpipe when no intrinsics are used.
Signed unpacks are recognized as BUILD_VECTORs whose elements are
individually sign-extended. Unsigned unpacks can have the equivalent
form with zero extension, but they also occur as shuffles in which some
elements are zero.
Based on a patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236525 91177308-0d34-0410-b5e6-96231b3b80d8
The z13 vector facility includes some instructions that operate only on the
high f64 in a v2f64, effectively extending the FP register set from 16
to 32 registers. It's still better to use the old instructions if the
operands happen to fit though, since the older instructions have a shorter
encoding.
Based on a patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236524 91177308-0d34-0410-b5e6-96231b3b80d8
The architecture doesn't really have any native v4f32 operations except
v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32
elements being used. Even so, using vector registers for <4 x float>
and scalarising individual operations is much better than generating
completely scalar code, since there's much less register pressure.
It's also more efficient to do v4f32 comparisons by extending to 2
v2f64s, comparing those, then packing the result.
This particularly helps with llvmpipe.
Based on a patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236523 91177308-0d34-0410-b5e6-96231b3b80d8
This adds ABI and CodeGen support for the v2f64 type, which is natively
supported by z13 instructions.
Based on a patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236522 91177308-0d34-0410-b5e6-96231b3b80d8
This the first of a series of patches to add CodeGen support exploiting
the instructions of the z13 vector facility. This patch adds support
for the native integer vector types (v16i8, v8i16, v4i32, v2i64).
When the vector facility is present, we default to the new vector ABI.
This is characterized by two major differences:
- Vector types are passed/returned in vector registers
(except for unnamed arguments of a variable-argument list function).
- Vector types are at most 8-byte aligned.
The reason for the choice of 8-byte vector alignment is that the hardware
is able to efficiently load vectors at 8-byte alignment, and the ABI only
guarantees 8-byte alignment of the stack pointer, so requiring any higher
alignment for vectors would require dynamic stack re-alignment code.
However, for compatibility with old code that may use vector types, when
*not* using the vector facility, the old alignment rules (vector types
are naturally aligned) remain in use.
These alignment rules are not only implemented at the C language level
(implemented in clang), but also at the LLVM IR level. This is done
by selecting a different DataLayout string depending on whether the
vector ABI is in effect or not.
Based on a patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236521 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for the z13 processor type and its vector facility,
and adds MC support for all new instructions provided by that facilily.
Apart from defining the new instructions, the main changes are:
- Adding VR128, VR64 and VR32 register classes.
- Making FP64 a subclass of VR64 and FP32 a subclass of VR32.
- Adding a D(V,B) addressing mode for scatter/gather operations
- Adding 1-, 2-, and 3-bit immediate operands for some 4-bit fields.
Until now all immediate operands have been the same width as the
underlying field (hence the assert->return change in decode[SU]ImmOperand).
In addition, sys::getHostCPUName is extended to detect running natively
on a z13 machine.
Based on a patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236520 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 963cdbccf6e5578822836fd9b2ebece0ba9a60b7 (ie r236514)
This is to get the bots green while i investigate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236518 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).
This is to get the bots green while i investigate the failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236517 91177308-0d34-0410-b5e6-96231b3b80d8
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236515 91177308-0d34-0410-b5e6-96231b3b80d8
The code was basically the same here already. Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
Will be used in the next commit to also handle regmasks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236514 91177308-0d34-0410-b5e6-96231b3b80d8
It got this in some cases (if one of them was an identified object), but not in all cases.
This caused stores to undef to block load-forwarding in some cases, etc.
Added test to Transforms/GVN to verify optimization occurs as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236511 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r236360.
This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236508 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236507 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When using the N64 ABI, element-indices use the i64 type instead of i32.
In many cases, we can use iPTR to account for this but additional patterns
and pseudo's are also required.
This fixes most (but not quite all) failures in the test-suite when using
N64 and MSA together.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9342
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236494 91177308-0d34-0410-b5e6-96231b3b80d8
When forming an IT block from the first MOV here:
%R2<def> = t2MOVr %R0, pred:1, pred:%CPSR, opt:%noreg
%R3<def> = tMOVr %R0<kill>, pred:14, pred:%noreg
the move in to R3 is moved out of the IT block so that later instructions on the same predicate can be inside this block, and we can share the IT instruction.
However, when moving the R3 copy out of the IT block, we need to clear its kill flags for anything in use at this point in time, ie, R0 here.
This appeases the machine verifier which thought that R0 wasn't defined when used.
I have a test case, but its extremely register allocator specific. It would be too fragile to commit a test which depends on the register allocator here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236468 91177308-0d34-0410-b5e6-96231b3b80d8
and avoid cloning unused decls into every partition.
Module partitioning showed up as a source of significant overhead when I
profiled some trivial test cases. Avoiding the overhead of partitionging
for uncalled functions helps to mitigate this.
This change also means that it is no longer necessary to have a
LazyEmittingLayer underneath the CompileOnDemand layer, since the
CompileOnDemandLayer will not extract or emit function bodies until they are
called.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236465 91177308-0d34-0410-b5e6-96231b3b80d8
When deciding whether a value comes from the aggregate or inserted value of an
insertvalue instruction, we compare the indices against those of the location
we're interested in. One of the lists needs reversing because the input data is
backwards (so that modifications take place at the end of the SmallVector), but
we were reversing both before leading to incorrect results.
Should fix PR23408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236457 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The object format can be set to something other than MachO, e.g.
to use ELF-on-Darwin for MCJIT. This already works on Windows, so
there's no reason it shouldn't on Darwin.
Reviewers: lhames, grosbach
Subscribers: rafael, grosbach, t.p.northover, llvm-commits
Differential Revision: http://reviews.llvm.org/D6185
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236455 91177308-0d34-0410-b5e6-96231b3b80d8
A joined option always needs to have an argument, even if it's an empty one.
Clang would previously assert when trying to use --extra-warnings, which is
a flag alias for -W, which is a joined option.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236434 91177308-0d34-0410-b5e6-96231b3b80d8
At the moment, all subregs defined by the SystemZ target can be modified
independently of the wider register. E.g. writing to a GR32 does not
change the upper 32 bits of the GR64. Writing to an FP32 does not change
the lower 32 bits of the FP64.
Hoewver, the upcoming support for the vector extension redefines FP64 as
one half of a V128. Floating-point operations leave the other half of
a V128 in an unpredictable state, so it's no longer the case that writing
to an FP32 leaves the bits of the underlying register (the V128) alone.
I'd prefer to have separate subreg_ names for this situation, so that
it's obvious at a glance whether we're talking about a subreg that leaves
the other parts of the register alone.
No behavioral change intended.
Patch originally by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236433 91177308-0d34-0410-b5e6-96231b3b80d8
We know what MemoryKind an operand has at the time we construct it,
so we might as well just record it in an unused part of the structure.
This makes it easier to add scatter/gather addresses later.
No behavioral change intended.
Patch originally by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236432 91177308-0d34-0410-b5e6-96231b3b80d8
It seems SystemZTargetLowering::getTargetNodeName got out of sync with
some recent changes to the SystemZISD opcode list. Add back all the
missing opcodes (and re-sort to the same order as SystemISelLowering.h).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236430 91177308-0d34-0410-b5e6-96231b3b80d8
ScheduleDAGInstrs wasn't setting or clearing the kill flags on instructions inside bundles. This led to code such as this
%R3<def> = t2ANDrr %R0
BUNDLE %ITSTATE<imp-def,dead>, %R0<imp-use,kill>
t2IT 1, 24, %ITSTATE<imp-def>
R6<def,tied6> = t2ORRrr %R0<kill>, ...
being transformed to
BUNDLE %ITSTATE<imp-def,dead>, %R0<imp-use>
t2IT 1, 24, %ITSTATE<imp-def>
R6<def,tied6> = t2ORRrr %R0<kill>, ...
%R3<def> = t2ANDrr %R0<kill>
where the kill flag was removed from the BUNDLE instruction, but not the t2ORRrr inside it. The verifier then thought that
R0 was undefined when read by the AND.
This change make the toggleKillFlags method also check for bundles and toggle flags on bundled instructions.
Setting the kill flag is special cased as we only want to set the kill flag on the last instruction in the bundle.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236428 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236420 91177308-0d34-0410-b5e6-96231b3b80d8
After r210687, windows_error does nothing but call mapWindowsError.
Other Windows/*.inc files directly call mapWindowsError. This patch
updates Path.inc and Process.inc to do the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236409 91177308-0d34-0410-b5e6-96231b3b80d8
Removed code that was replicating v8i16 'shift + mask' implementation that is done more nicely by making use of LowerScalarImmediateShift
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236388 91177308-0d34-0410-b5e6-96231b3b80d8
Seems we were setting the base address on the wrong DwarfCompileUnit
object so it wasn't being used when generating the ranges.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236377 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds the --load-address command line option to
llvm-pdbdump, which dumps all addresses assuming the module has
loaded at the specified address.
Additionally, this patch adds an option to llvm-pdbdump to support
dumping of public symbols (i.e. symbols with external linkage).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236342 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r235060 and 235070, which were reverted because of test failures
in LLDB. The failure was caused because at moment RuntimeDyld is processing
relocations for all sections, irrespective of whether we actually load them
into memory or not, but RuntimeDyld was not actually remembering where in memory
the unrelocated section is. This commit includes a fix for that issue by
remembering that pointer, though the longer term fix should be to stop processing
unneeded sections.
Original Summary:
This allows us to get rid of the original unrelocated object file after
we're done processing relocations (but before applying them).
MachO and COFF already do not require this (currently we have temporary hacks
to prevent ownership from being released, but those are brittle and should be
removed soon).
The placeholder mechanism allowed the relocation resolver to look at original
object file to obtain more information that are required to apply the
relocations. This is usually necessary in two cases:
- For relocations targetting sub-word memory locations, there may be pieces
of the instruction at the target address which we should not override.
- Some relocations on some platforms allow an extra addend to be encoded in
their immediate fields.
The problem is that in the second case the information cannot be recovered
after the relocations have been applied once because they will have been
overridden. In the first case we also need to be careful to not use any bits
that aren't fixed and may have been overriden by applying a first relocation.
In the past both have been fixed by just looking at original object file. This
patch attempts to recover the information from the first by looking at the
relocated object file, while the extra addend in the second case is read
upon relocation processing and addend to the regular addend.
I have tested this on X86. Other platforms represent my best understanding
of how those relocations should work, but I may have missed something because
I do not have access to those platforms.
We will keep the ugly workarounds in place for a couple of days, so this commit
can be reverted if it breaks the bots.
Differential Revision: http://reviews.llvm.org/D9028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236341 91177308-0d34-0410-b5e6-96231b3b80d8
This pass is responsible for constructing the EH registration object
that gets linked into fs:00, which is all it does in this change. In the
future, it will also insert stores to update the EH state number.
I considered keeping this functionality in WinEHPrepare, but it's pretty
separable and X86 specific. It has conceptually very little to do with
the task of WinEHPrepare, which is currently outlining. WinEHPrepare is
also in theory useful on ARM, but this logic is pretty x86 specific.
Reviewers: andrew.w.kaylor, majnemer
Differential Revision: http://reviews.llvm.org/D9422
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236339 91177308-0d34-0410-b5e6-96231b3b80d8
Converting from t2LDRs to tLDRr caused the shift argument to drop the internal flag. This would then throw machine verifier errors.
Unfortunately i'm having trouble reducing a test case. I'm going to keep trying, but so far its a scary combination of machine sinking, an 'and i1', loads feeding loads, and a bunch of code which shouldn't change IT block formation, but does. Its not useful to commit a test in that state as we have no way of knowing if it even hits this code reliably in future.
rdar://problem/20752113
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236333 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes a bug where the YAML Output class emitted
a sequence of flow sequences without the '-' characters.
Before:
seq:
[ a, b ]
[ c, d ]
After:
seq:
- [ a, b ]
- [ c, d ]
Reviewers: Justin Bogner
Differential Revision: http://reviews.llvm.org/D9206
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236329 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
LI should never accept immediates larger than 32 bits.
The additional Is32BitImm boolean also paves the way for unifying the functionality that LA and LI have in common.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9289
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236313 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Generate one DSLL32 of 0 instead of two consecutive DSLL of 16.
In order to do this I had to change createLShiftOri's template argument from a bool to an unsigned.
This also gave me the opportunity to rewrite the mips64-expansions.s test, as it was testing the same cases multiple times and skipping over other cases.
It was also somewhat unreadable, as the CHECK lines were grouped in a huge block of text at the beginning of the file.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8974
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236311 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund.
The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector.
I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch.
Differential Revision: http://reviews.llvm.org/D9282
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236308 91177308-0d34-0410-b5e6-96231b3b80d8
This pass was generating 'Instruction does not dominate all uses!'
errors for programs which had loops with a condition variable that
depended on the result of a phi instruction from outside of the loop.
The pass was inserting new phi nodes outside of the loop which used values
defined inside the loop.
http://bugs.freedesktop.org/show_bug.cgi?id=90056
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236306 91177308-0d34-0410-b5e6-96231b3b80d8
If we move an instruction from one block down to a MOVC and predicate it,
then the original instruction could be moved in to a loop. In this case,
its invalid for any kill flags to remain on there.
Fails with -verfy-machineinstrs.
rdar://problem/20752113
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236290 91177308-0d34-0410-b5e6-96231b3b80d8
This change is the second of 3 patches to add support for specifying
the profile output from the command line via -fprofile-instr-generate=<path>,
where the specified output path/file will be overridden by the
LLVM_PROFILE_FILE environment variable.
This patch adds the necessary support to the llvm instrumenter, specifically
a new member of GCOVOptions for clang to save the specified filename, and
support for calling the new compiler-rt interface from __llvm_profile_init.
Patch by Teresa Johnson. Thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236288 91177308-0d34-0410-b5e6-96231b3b80d8
When commuting a thumb instruction in the size reduction pass, thumb
instructions are represented as a bundle and so some operands may be marked
as internal. The internal flag has to move with the operand when commuting.
This test is sensitive to register allocation so can't specifically check that
this error was happening, but so long as it continues to pass with -verify then
hopefully its still ok.
rdar://problem/20752113
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236282 91177308-0d34-0410-b5e6-96231b3b80d8
The expansion for t2ABS was always setting the kill flag on the rsb instruction.
It should instead only be set on rsb if it was set on the original ABS instruction.
rdar://problem/20752113
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236272 91177308-0d34-0410-b5e6-96231b3b80d8
This helps reduce the frequency of stack realignment prologues in 32-bit
X86 Windows code. Before this change and the corresponding clang change,
we would take the max of the type preferred alignment and the explicit
alignment on the alloca.
If you don't override aggregate alignment in datalayout, you get a
default of 8. This dates back to 2007 / r34356, and changing it seems
prohibitively difficult at this point.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236270 91177308-0d34-0410-b5e6-96231b3b80d8
When optimizing demanded bits of the operands of an Add we have to
remove the nsw/nuw flags as we have no guarantee anymore that we don't
wrap. This is legal here because the top bit is not demanded. In fact
this operaion was already performed but missed in the case of an Add
with a constant on the right side. To fix this this patch refactors the
code to unify the code paths in SimplifyDemandedUseBits() handling of
Add/Sub:
- The transformation of Add->Or is removed from the simplify demand
code because the equivalent transformation exists in
InstCombiner::visitAdd()
- KnownOnes/KnownZero are not adjusted for Add x, C anymore as
computeKnownBits() already performs these computations.
- The simplification of the operands is unified. In this new version
constant on the right side of a Sub are shrunk now as I could not find
a reason why not to do so.
- The special case for clearing nsw/nuw in ShrinkDemandedConstant() is
not necessary anymore as the caller does that already.
Differential Revision: http://reviews.llvm.org/D9415
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236269 91177308-0d34-0410-b5e6-96231b3b80d8
The rule that turns a sub to xor if the LHS is 2^n-1 and the remaining bits
are known zero, does not use the demanded bits at all: Move it to the
normal InstCombine code path.
Differential Revision: http://reviews.llvm.org/D9417
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236268 91177308-0d34-0410-b5e6-96231b3b80d8
This is actually fairly simple in the current code layout: Check if we should
compress just before writing out and everything else just works.
This removes the last case in which the object writer was creating a
fragment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236267 91177308-0d34-0410-b5e6-96231b3b80d8
Revision 220239 exposed a latent bug in method
'TargetInstrInfo::commuteInstruction'. When commuting the operands of a machine
instruction, method 'commuteInstruction' didn't correctly propagate the
'IsUndef' flag to the register operands of the new (commuted) instruction.
Before this patch, the following instruction:
%vreg4<def> = VADDSDrr %vreg14, %vreg5<undef>; FR64:%vreg4,%vreg14,%vreg5
was wrongly converted by method 'commuteInstruction' into:
%vreg4<def> = VADDSDrr %vreg5, %vreg14<undef>; FR64:%vreg4,%vreg5,%vreg14
The correct instruction should have been:
%vreg4<def> = VADDSDrr %vreg5<undef>, %vreg14; FR64:%vreg4,%vreg5,%vreg14
This patch fixes the problem in method 'TargetInstrInfo::commuteInstruction'.
When swapping the operands of a machine instruction, we now make sure that
'IsUndef' flags are correctly set.
Added test case 'pr23103.ll'.
Differential Revision: http://reviews.llvm.org/D9406
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236258 91177308-0d34-0410-b5e6-96231b3b80d8