I've taken the approach from the LoopInfo test:
* Rather than running in the pass manager just build the analyses manually
* Split out the common parts (makeLLVMModule, runWithDomTree) into helpers
Differential Revision: https://reviews.llvm.org/D33617
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304061 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes introduction of an incorrect inttoptr/ptrtoint pair in
the included test case which makes use of non-integral pointers. I
suspect there are more cases like this left, but this takes care of
the one I was seeing at the moment.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33129
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304058 91177308-0d34-0410-b5e6-96231b3b80d8
Rewrite fixupKills() to use the LivePhysRegs class. Simplifies the code
and fixes a bug where the CSR registers in return blocks where missed
leading to invalid kill flags. Also remove the unnecessary rule that we
wouldn't set kill flags on tied operands.
No tests as I have an upcoming commit improving MachineVerifier checks
to catch these cases in multiple existing lit tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304055 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r299287 plus clean-ups.
The localizer pass is a helper pass that could be run at O0 in the GISel
pipeline to work around the deficiency of the fast register allocator.
It basically shortens the live-ranges of the constants so that the
allocator does not spill all over the place.
Long term fix would be to make the greedy allocator fast.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304051 91177308-0d34-0410-b5e6-96231b3b80d8
The recommit is to fix a bug about ExtractValue and InsertValue ops. For those
ops, some varargs inside GVN::Expression are not value numbers but raw index
numbers. It is wrong to do phi-translate for raw index numbers, and the fix is
to stop doing that.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304050 91177308-0d34-0410-b5e6-96231b3b80d8
One case in BranchRelaxation did not compute liveins after creating a
new block. This is catched by existing tests with an upcoming commit
that will improve MachineVerifier checking of livein lists.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304049 91177308-0d34-0410-b5e6-96231b3b80d8
- Rewrite livein calculation to use the computeLiveIns() helper
function. This is slightly less efficient but easier to reason about
and doesn't unnecessarily add pristine and reserved registers[1]
- Zero the status register at the beginning of the loop to make sure it
has a defined value.
- Remove kill flags of values that need to stay alive throughout the loop.
[1] An upcoming commit of mine will tighten the MachineVerifier to catch
these.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304048 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, we called simplifyPossiblyCastedAndOrOfICmps twice with the operands commuted, but the call to simplifyAndOrOfICmpsWithConstants further down already handles commuting and doesn't need to be called both ways.
This patch pushes double calls further down to just the individual routines that need to be called twice.
Differential Revision: https://reviews.llvm.org/D33603
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304044 91177308-0d34-0410-b5e6-96231b3b80d8
Wrong assembly code is generated for a simple program with
clang. If clang only produces IR and llc is used
for IR lowering and optimization, correct assembly
code is generated.
The main reason is that clang feeds default Reloc::Static
to llvm and llc feeds no RelocMode to llvm, where
for llc case, BPF backend picks up Reloc::PIC_ mode.
This leads different IR lowering behavior and clang
permits global_addr+off folding while llc doesn't.
This patch introduces isOffsetFoldingLegal function into
BPF backend and the function always return false.
This will make clang and llc behave the same for
the lowering.
Bug https://bugs.llvm.org//show_bug.cgi?id=33183
has more detailed explanation.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304043 91177308-0d34-0410-b5e6-96231b3b80d8
It's a workaround because the test was flakey passing to begin with, but
it looks like (going off commit history) it really did want to test in
the presence of debug info, so keep that behavior (by adding something
to the CU so it's not dropped) & restore the flakey pass in the process.
(added a FIXME in case someone else decides to look at it later)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304042 91177308-0d34-0410-b5e6-96231b3b80d8
[AMDGPU] add intrinsic for s_getpc
Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.
Patch by Tim Corringham
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304031 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
r295737 included a fix for leaking libraries loaded via. DynamicLibrary::addPermanentLibrary.
This created a problem where static constructors in a library could insert llvm::ManagedStatic objects before DynamicLibrary would register it's own ManagedStatic, meaning a crash could occur at shutdown.
r301562 exasperated this problem by cleaning up the DynamicLibrary ManagedStatic during llvm_shutdown.
Reviewers: v.g.vassilev, lhames, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33581
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304027 91177308-0d34-0410-b5e6-96231b3b80d8
The tests here are have operands commuted to provide more coverage. I also commuted one of the instructions in the scalar tests so the 4 tests cover the 4 commuted variations
Differential Revision: https://reviews.llvm.org/D33599
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304021 91177308-0d34-0410-b5e6-96231b3b80d8
Consistent with GCC and addresses a shortcoming with ThinLTO where many
imported CUs may end up being empty (because the functions imported from
them either ended up not being used (and were then discarded, since
they're imported as available_externally) or optimized away entirely).
Test cases previously testing empty CUs (either intentionally, or
because they didn't need anything more complicated) had a trivial 'int'
or similar basic type added to their retained types list.
This is a first order approximation - a deeper implementation could do
things like:
1) Be more lazy about construction of the CU - for example if two CUs
containing a single identical retained type are linked together, with
this change one of the two CUs will be produced but empty (since a
duplicate type won't be produced).
2) Go further and invert all the CU links the same way the subprogram
link is inverted - keep named CU lists of retained types, macros, etc,
and have those link back to the CU. Then if they're emitted, the CU is
emitted, but never otherwise - this would allow the metadata itself to
be dropped earlier too, though it seems unlikely that's an important
optimization as there shouldn't be many CUs relative to the number of
other entities.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304020 91177308-0d34-0410-b5e6-96231b3b80d8
The whole-program-devirt pass needs to run at -O0 because only it
knows about the llvm.type.checked.load intrinsic: it needs to both
lower the intrinsic itself and handle it in the summary.
Differential Revision: https://reviews.llvm.org/D33571
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304019 91177308-0d34-0410-b5e6-96231b3b80d8
This produced 'strange' DWARF anyway - the CU would have no ranges (or
at least not a range including the inlined code) nor any subprogram or
inlined_subroutine - yet the line table would have entries for these
instructions.
(this actually becomes more relevant with changes coming after this,
where a CU without any contents will be omitted entirely - so there
would be no line table to put this on anyway)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304004 91177308-0d34-0410-b5e6-96231b3b80d8
This change is intended to use for LLD in D33183.
Problem we have in LLD when building .gdb_index is that we need to know section which address range belongs to.
Previously it was solved on LLD side by providing fake section addresses with use of llvm::LoadedObjectInfo
interface. We assigned file offsets as addressed. Then after obtaining ranges lists, for each range we had to find section ID's.
That not only was slow, but also complicated implementation and was the reason of incorrect behavior when
sections share the same offsets, like D33176 shows.
This patch makes DWARF parsers to return section index as well. That solves problem mentioned above.
Differential revision: https://reviews.llvm.org/D33184
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304002 91177308-0d34-0410-b5e6-96231b3b80d8
Re-commit r303938 and r303954 with a fix for addLiveIns(): the internal
addPristines() function must be called on an empty set or it may
accidentally reset saved registers.
- addLiveOutsNoPristines() needs to add callee saved registers that are
actually saved and restored somewhere to the set (they are not
pristine).
- Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines().
This fixes the problem from D32156.
Differential Revision: https://reviews.llvm.org/D32464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304001 91177308-0d34-0410-b5e6-96231b3b80d8
In the best case:
extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN
...we kill all of the extract/concat and just have narrow binops remaining.
If only one of the binop operands is amenable, this transform is still
worthwhile because we kill some of the extract/concat.
Optional bitcasting makes the code more complicated, but there doesn't
seem to be a way to avoid that.
The TODO about extending to more than bitwise logic is there because we really
will regress several x86 tests including madd, psad, and even a plain
integer-multiply-by-2 or shift-left-by-1. I don't think there's anything
fundamentally wrong with this patch that would cause those regressions; those
folds are just missing or brittle.
If we extend to more binops, I found that this patch will fire on at least one
non-x86 regression test. There's an ARM NEON test in
test/CodeGen/ARM/coalesce-subregs.ll with a pattern like:
t5: v2f32 = vector_shuffle<0,3> t2, t4
t6: v1i64 = bitcast t5
t8: v1i64 = BUILD_VECTOR Constant:i64<0>
t9: v2i64 = concat_vectors t6, t8
t10: v4f32 = bitcast t9
t12: v4f32 = fmul t11, t10
t13: v2i64 = bitcast t12
t16: v1i64 = extract_subvector t13, Constant:i32<0>
There was no functional change in the codegen from this transform from what I
could see though.
For the x86 test changes:
1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case,
but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops,
there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op.
SSE/AVX2/AXV512 are not affected which is expected because only AVX1 has the extract/concat
ops to match the pattern.
2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract.
Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026
3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT.
4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction
count by one in each case by eliminating two insert/extract while adding one narrower logic op.
https://bugs.llvm.org/show_bug.cgi?id=32790
Differential Revision: https://reviews.llvm.org/D33137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303997 91177308-0d34-0410-b5e6-96231b3b80d8
Currently getOptimalMemOpType returns i32 for large enough sizes without
checking for alignment, leading to poor code generation when misaligned accesses
aren't permitted as we generate a word store then later split it up into byte
stores. This means we inadvertantly go over the MaxStoresPerMemcpy limit and for
memset we splat the memset value into a word then immediately split it up
again.
Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type
to use, but also fix a bug there where it wasn't correctly checking if
misaligned memory accesses are allowed.
Differential Revision: https://reviews.llvm.org/D33442
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303990 91177308-0d34-0410-b5e6-96231b3b80d8