Summary:
Problem exposed in PowerPC functional testing.
We did not consider Anti dependence for nodes in same cycle,
so we may end up generating bad machine code.
eg: the reduced test won't verify.
*** Bad machine code: Using an undefined physical register ***
- function: lame_encode_buffer_interleaved
- basic block: %bb.4 (0x4bde4e12928)
- instruction: %29:gprc = ADDZE %27:gprc, implicit-def dead $carry, implicit $carry
- operand 3: implicit $carry
Reviewers: bcahoon, kparzysz, hfinkel
Subscribers: MaskRay, wuzish, nemanjai, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64192
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365859 91177308-0d34-0410-b5e6-96231b3b80d8
Stubs out a number of the classes needed to produce a new object file format
(XCOFF) for the powerpc-aix target. For testing input is an empty module which
produces an object file with just a file header.
Differential Revision: https://reviews.llvm.org/D61694
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365541 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
`extsw` and `sldi` are supposed to be combined if they are in the same
BB in instruction selection phase. This patch handles the case where
extsw and sldi are not in the same BB.
Differential Revision: https://reviews.llvm.org/D63806
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365430 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is exposed by functional testing on PowerPC.
In some pipelined loops, Phi refer to phi did not get value defined by
the Phi, hence getting wrong value later.
As the comment mentioned, we should "use the value defined by the Phi,
unless we're generating the firstepilog and the Phi refers to a Phi
in a different stage.", so Phi refering to same stage Phi should use
the value defined by the Phi here.
Reviewers: bcahoon, hfinkel
Reviewed By: hfinkel
Subscribers: MaskRay, wuzish, nemanjai, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64035
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365428 91177308-0d34-0410-b5e6-96231b3b80d8
The indirect call sequence on PPC requires that the TOC base register be saved
prior to the indirect call and restored after the call since the indirect call
may branch to a global entry point in another DSO which will update the TOC
base. Over the last couple of years, we have improved this to:
- be able to hoist TOC saves from loops (with changes to MachineLICM)
- avoid multiple saves when one dominates the other[s]
However, it is still possible to have multiple TOC saves dynamically in the
execution path if there is no dominance relationship between them.
This patch moves the TOC save to the prologue when one of the TOC saves is in a
block that post-dominates entry (i.e. it cannot be avoided) or if it is in a
block that is hotter than entry.
Differential revision: https://reviews.llvm.org/D63803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365232 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
"ww" and "ws" are both constraint codes for VSX vector registers that
hold scalar double data. "ww" is preferred for float while "ws" is
preferred for double.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D64119
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365106 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.
Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365010 91177308-0d34-0410-b5e6-96231b3b80d8
I initially committed it with --check-prefix instead of --check-prefixes
(again, shame on me, and utils/update_*.py not complaining!)
and did not have a moment to understand the failure,
so i reverted it initially in rL64939.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364945 91177308-0d34-0410-b5e6-96231b3b80d8
After implemented this hook, we will model the memory dependency in the scheduling dependency graph more precise,
and will have more opportunity to reorder the load/stores, as they didn't have the dependency at some condition
Differential Revision: https://reviews.llvm.org/D63804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364886 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
SCRUB_LOOP_COMMENT_RE was introduced in https://reviews.llvm.org/D31285
This works for some loops.
However, we may generate lines with loop comments only.
And since we don't scrub leading white spaces, this will leave an empty
line there, and FileCheck will complain it.
eg: llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll:27:15:
error: found empty check string with prefix 'CHECK:'
; CHECK-NEXT:
This prevented us from using the `update_llc_test_checks.py` for quite some cases.
We should still keep the comment token there, so that we can safely
scrub the loop comment without breaking FileCheck.
Reviewers: timshen, hfinkel, lebedev.ri, RKSimon
Subscribers: nemanjai, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63957
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364775 91177308-0d34-0410-b5e6-96231b3b80d8
This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751
llvm-mc aborted when disassembling tabortdc.
This patch try to clean up TM related DAGs.
* Fixes the problem by remove explicit output of cr0, and put it as implicit def.
* Update int_ppc_tbegin pattern to accommodate the implicit def of cr0.
* Update the TCHECK operand and int_ppc_tcheck accordingly.
* Add some builtin test and disassembly tests.
* Remove unused CRRC0/crrc0
Differential Revision: https://reviews.llvm.org/D61935
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364544 91177308-0d34-0410-b5e6-96231b3b80d8
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364412 91177308-0d34-0410-b5e6-96231b3b80d8
For some reason, the update_llc_checks.py script produces checks for
empty lines which cause failures. Corrected that to check for actual
text produced by llc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364377 91177308-0d34-0410-b5e6-96231b3b80d8
An upcoming patch will modify the behaviour with respect to saving the TOC
in functions with indirect calls.
Adding a test case so the patch will show the difference in codegen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364375 91177308-0d34-0410-b5e6-96231b3b80d8
This was just an omission in the back end. We have had the instructions for both
single and double precision for a few HW generations, but never got around to
legalizing these.
Differential revision: https://reviews.llvm.org/D63634
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364373 91177308-0d34-0410-b5e6-96231b3b80d8
When we calculate MII, we use two loops, one with iterator R++ to
check whether we can reserve the resource, then --R to move back
the iterator to do reservation.
This is risky, as R++, --R may not point to the same element at all.
The can cause wrong MII.
Differential Revision: https://reviews.llvm.org/D63536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364353 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In Secure PLT ABI, -fpic is similar to -fPIC. The differences are that:
* -fpic stores the address of _GLOBAL_OFFSET_TABLE_ in r30, while -fPIC stores .got2+0x8000.
* -fpic uses an addend of 0 for R_PPC_PLTREL24, while -fPIC uses 0x8000.
Reviewers: hfinkel, jhibbits, joerg, nemanjai, spetrovic
Reviewed By: jhibbits
Subscribers: adalava, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63563
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364324 91177308-0d34-0410-b5e6-96231b3b80d8
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363757 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363533 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363526 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.
The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D61228
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363495 91177308-0d34-0410-b5e6-96231b3b80d8
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363471 91177308-0d34-0410-b5e6-96231b3b80d8
This was exposed by PowerPC target enablement.
In ScheduleDAG, if we haven't seen any uses in this scheduling region,
we will create a dependence edge to ExitSU to model the live-out latency.
This is required for vreg defs with no in-region use, and prefetches with
no vreg def.
When we build NodeOrder in Scheduler, we ignore these boundary nodes.
However, when we check Succs in checkValidNodeOrder, we did not skip
them, so we still assume all the nodes have been sorted and in order in
Indices array. So when we call lower_bound() for ExitSU, it will return
Indices.end(), causing memory issues in following Node access.
Differential Revision: https://reviews.llvm.org/D63282
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363329 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472
The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed.
Taking an exception can cause the stack to be corrupted.
As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access
the stack.
Reviewers: dmgreen, qcolombet
Reviewed By: qcolombet
Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63152
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363265 91177308-0d34-0410-b5e6-96231b3b80d8
Looks like a MachinePipeliner algorithm problem found by
sanitizer-x86_64-linux-fast.
I will backout this test first while investigating the problem to
unblock buildbot.
==49637==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x614000002e08 at pc 0x000004364350 bp 0x7ffe228a3bd0 sp 0x7ffe228a3bc8
READ of size 4 at 0x614000002e08 thread T0
#0 0x436434f in
llvm::SwingSchedulerDAG::checkValidNodeOrder(llvm::SmallVector<llvm::NodeSet,
8u> const&) const
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:3736:11
#1 0x4342cd0 in llvm::SwingSchedulerDAG::schedule()
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:486:3
#2 0x434042d in
llvm::MachinePipeliner::swingModuloScheduler(llvm::MachineLoop&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:385:7
#3 0x433eb90 in
llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:207:5
#4 0x428b7ea in
llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13
#5 0x4d1a913 in llvm::FPPassManager::runOnFunction(llvm::Function&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1648:27
#6 0x4d1b192 in llvm::FPPassManager::runOnModule(llvm::Module&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1685:16
#7 0x4d1c06d in runOnModule
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1752:27
#8 0x4d1c06d in llvm::legacy::PassManagerImpl::run(llvm::Module&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1865
#9 0xa48ca3 in compileModule(char**, llvm::LLVMContext&)
/b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:611:8
#10 0xa4270f in main
/b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:365:22
#11 0x7fec902572e0 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
#12 0x971b69 in _start
(/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/llc+0x971b69)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363105 91177308-0d34-0410-b5e6-96231b3b80d8
This was found during HTM cleanup.
Adding a test for builtin_ttest would expose following issue.
*** Bad machine code: Illegal physical register for instruction ***
- function: test10
- basic block: %bb.0 entry (0xf0e57497b58)
- instruction: %5:crrc0 = TABORTWCI 0, $zero, 0
- operand 2: $zero
$zero is not a GPRC register.
LLVM ERROR: Found 1 machine code errors.
Differential Revision: https://reviews.llvm.org/D63079
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362974 91177308-0d34-0410-b5e6-96231b3b80d8
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362921 91177308-0d34-0410-b5e6-96231b3b80d8