This will be useful once we start adding the ability to dump type
records and symbol records, since it will allow us to generate
mergeable information instead of information that specifies an
entire file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275109 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.
The second part is that we may now require the SegmentWaveByteOffset
for stack slots that are allocated later, even when there are initially
no stack objects and VGPR spilling isn't enabled. This means that some
bits become effectively dead and can be cleaned up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21551
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275108 91177308-0d34-0410-b5e6-96231b3b80d8
Blocks to be tail-merged may share more than one successor. Correct the
comment to state that they share a specific successor, SuccBB, rather
than a single successor, which is not true.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275104 91177308-0d34-0410-b5e6-96231b3b80d8
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call sequence to be generated in two passes instead of one.
Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:
popl %edi
popl %edi
pushl %eax
jmp tailcallee # TAILCALL
To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.
Patch by Magnus Lång <margnus1@gmail.com>
Differential Revision: http://reviews.llvm.org/D21325
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275103 91177308-0d34-0410-b5e6-96231b3b80d8
It is an optimization pass and should not run at -O0, especially since
Fast RA will not do the required register coalescing anyway; running it
at -O0 is a loss even from the optimization standpoint.
This also works around (but doesn't quite fix) PR28489.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275099 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Add support for the z13 instructions LOCHI and LOCGHI, which
conditionally load immediate values. Add target instruction info hooks so
that if-conversion will allow predication of LHI/LGHI.
Author: RolandF
Reviewers: uweigand
Subscribers: zhanjunl
Committing on behalf of Roland.
Differential Revision: http://reviews.llvm.org/D22117
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275086 91177308-0d34-0410-b5e6-96231b3b80d8
There's a little bit of churn in this patch because the initialization
mechanism is now shared between the old and the new PM. Other than
that, it's just a pretty mechanical translation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275082 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: http://reviews.llvm.org/D22118 uses metadata to store the call count, which makes it possible for the branch weights metadata to have only one element. Also fix an assertion failure in the inliner by extending the instruction type check to include "invoke" instructions.
Reviewers: mkuper, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22228
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275079 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For sample-based PGO, using BFI to calculate the callsite count is sometimes inaccurate. This is because, with a sampling-based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be calculated inaccurately due to the lack of samples in the cold branches.
E.g.
if (A1 && A2 && A3 && ..... && A10) {
  for (i = 0; i < 100000000; i++) {
    callsite();
  }
}
Assume that A1 to A10 are all 100% taken, and that the callsite has 1000 samples and is thus considered hot. Because the loop's trip count is huge, it is normal that all branches outside the loop have no samples at all. As a result, we can only use static branch probability to derive the frequency of the loop header. Assuming the static heuristic considers each branch 50% taken, the count calculated from BFI will be 1/(2^10) of the actual value.
In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness.
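E.g., a hypothetical IR sketch (function names and the weight value are invented for illustration) of what such an annotated callsite might look like:

  define void @caller() {
  entry:
    ; The callsite count is attached directly as single-element
    ; branch_weights metadata on the call instruction.
    call void @callsite(), !prof !0
    ret void
  }

  declare void @callsite()

  !0 = !{!"branch_weights", i32 1000}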
Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR.
Reviewers: davidxl, eraman, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22118
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275073 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust the edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by an incorrect basic block profile, as shown in the updated unittest.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D22180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275072 91177308-0d34-0410-b5e6-96231b3b80d8
This subtle change to getModRefInfo(Instruction, ImmutableCallSite) ensures
that the semantics are equal to those of getModRefInfo(CS1, CS2) when the
Instruction is a call-site.
This is now more in line with getModRefInfo generally: it returns Mod when
I modifies a memory location that is accessed (read or written) by CS and
Ref when I reads a memory location that is written by CS.
From a grep of the code, the only uses of this particular getModRefInfo
overload are in MemorySSA and MemCpyOptimizer, and they only care about
whether the result is MR_NoModRef or not. Therefore, this change should
have no visible effect.
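E.g., a hypothetical IR sketch (names are invented for illustration) of the Mod case:

  define void @example(i32* %p) {
    store i32 1, i32* %p          ; I: writes *%p
    call void @reader(i32* %p)    ; CS: reads *%p
    ret void
  }

  declare void @reader(i32*) readonly

Here getModRefInfo(I, CS) returns Mod, since I modifies a memory location
that CS accesses.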
Separated out from D17279 upon request.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275065 91177308-0d34-0410-b5e6-96231b3b80d8
At present, the only shuffle with a variable mask that we recognise is PSHUFB, which influences whether it's worth the cost of mask creation/loading for a combined target shuffle with a variable mask. This change sets up the infrastructure to support other shuffles in the future but has no effect yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275059 91177308-0d34-0410-b5e6-96231b3b80d8
Preserve assembly comments from the input in the output assembly, and add
flags to toggle this behavior. It is on by default for inline assembly and
off in llvm-mc.
Parsed comments are emitted immediately before an EOL, which generally
places them on the expected line.
Reviewers: rtrieu, dwmw2, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275058 91177308-0d34-0410-b5e6-96231b3b80d8
DAG lowering was missing for the scalar FMINC and FMAXC nodes.
These nodes are generated only in "unsafe-fp-math" mode.
Added tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275048 91177308-0d34-0410-b5e6-96231b3b80d8
Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look
through returned-argument functions when answering queries. This is essential
so that we don't lose all other AA information when supplementing with
llvm.noalias.
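E.g., a hypothetical IR sketch (names are invented for illustration):

  ; @identity always returns its argument ('returned').
  declare i8* @identity(i8* returned)

  define i8* @f(i8* %a) {
    ; BasicAA can now look through the call: %b must be the same
    ; pointer as %a, so alias information about %a applies to %b.
    %b = call i8* @identity(i8* %a)
    ret i8* %b
  }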
Differential Revision: http://reviews.llvm.org/D9383
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275035 91177308-0d34-0410-b5e6-96231b3b80d8
In order to make the optimizer smarter about using the 'returned' argument
attribute (generally, but motivated by my llvm.noalias intrinsic work), add a
utility function to Call/InvokeInst, and CallSite, to make it easy to get the
returned call argument (when one exists).
P.S. There is already an unfortunate amount of code duplication between
CallInst and InvokeInst, and this adds to it. We should probably clean that up
separately.
Differential Revision: http://reviews.llvm.org/D22204
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275031 91177308-0d34-0410-b5e6-96231b3b80d8
Calls to matchVectorShuffleAsInsertPS only need to ensure the inputs are 128-bit vectors. Only lowerVectorShuffleAsInsertPS needs to ensure that they are v4f32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275028 91177308-0d34-0410-b5e6-96231b3b80d8
A function can have one argument with the 'returned' attribute, indicating that
the associated argument is always the return value of the function. Add
FuncAttrs inference logic.
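E.g., a hypothetical IR sketch (names are invented for illustration):

  define i8* @passthrough(i8* %p) {
    ret i8* %p
  }

Since %p is always the return value, FuncAttrs can now infer the attribute,
giving:

  define i8* @passthrough(i8* returned %p) {
    ret i8* %p
  }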
Differential Revision: http://reviews.llvm.org/D22202
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275027 91177308-0d34-0410-b5e6-96231b3b80d8
The description of the 'returned' attribute says that it is only used when
code-generating the caller. I'd like to make the optimizer smarter about
looking through functions with returned arguments (generally, but motivated by
my llvm.noalias work). As David pointed out in the review of D22202, the
LangRef should be updated to make its expanded uses clearer.
Differential Revision: http://reviews.llvm.org/D22205
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275026 91177308-0d34-0410-b5e6-96231b3b80d8