This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751
llvm-mc aborted when disassembling tabortdc.
This patch try to clean up TM related DAGs.
* Fixes the problem by remove explicit output of cr0, and put it as implicit def.
* Update int_ppc_tbegin pattern to accommodate the implicit def of cr0.
* Update the TCHECK operand and int_ppc_tcheck accordingly.
* Add some builtin test and disassembly tests.
* Remove unused CRRC0/crrc0
Differential Revision: https://reviews.llvm.org/D61935
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364544 91177308-0d34-0410-b5e6-96231b3b80d8
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364412 91177308-0d34-0410-b5e6-96231b3b80d8
For some reason, the update_llc_checks.py script produces checks for
empty lines which cause failures. Corrected that to check for actual
text produced by llc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364377 91177308-0d34-0410-b5e6-96231b3b80d8
An upcoming patch will modify the behaviour with respect to saving the TOC
in functions with indirect calls.
Adding a test case so the patch will show the difference in codegen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364375 91177308-0d34-0410-b5e6-96231b3b80d8
This was just an omission in the back end. We have had the instructions for both
single and double precision for a few HW generations, but never got around to
legalizing these.
Differential revision: https://reviews.llvm.org/D63634
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364373 91177308-0d34-0410-b5e6-96231b3b80d8
When we calculate MII, we use two loops, one with iterator R++ to
check whether we can reserve the resource, then --R to move back
the iterator to do reservation.
This is risky, as R++, --R may not point to the same element at all.
The can cause wrong MII.
Differential Revision: https://reviews.llvm.org/D63536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364353 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In Secure PLT ABI, -fpic is similar to -fPIC. The differences are that:
* -fpic stores the address of _GLOBAL_OFFSET_TABLE_ in r30, while -fPIC stores .got2+0x8000.
* -fpic uses an addend of 0 for R_PPC_PLTREL24, while -fPIC uses 0x8000.
Reviewers: hfinkel, jhibbits, joerg, nemanjai, spetrovic
Reviewed By: jhibbits
Subscribers: adalava, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63563
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364324 91177308-0d34-0410-b5e6-96231b3b80d8
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.
Patch by Matthias Braun
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363757 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363533 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363526 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.
The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D61228
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363495 91177308-0d34-0410-b5e6-96231b3b80d8
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363471 91177308-0d34-0410-b5e6-96231b3b80d8
This was exposed by PowerPC target enablement.
In ScheduleDAG, if we haven't seen any uses in this scheduling region,
we will create a dependence edge to ExitSU to model the live-out latency.
This is required for vreg defs with no in-region use, and prefetches with
no vreg def.
When we build NodeOrder in Scheduler, we ignore these boundary nodes.
However, when we check Succs in checkValidNodeOrder, we did not skip
them, so we still assume all the nodes have been sorted and in order in
Indices array. So when we call lower_bound() for ExitSU, it will return
Indices.end(), causing memory issues in following Node access.
Differential Revision: https://reviews.llvm.org/D63282
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363329 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472
The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed.
Taking an exception can cause the stack to be corrupted.
As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access
the stack.
Reviewers: dmgreen, qcolombet
Reviewed By: qcolombet
Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63152
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363265 91177308-0d34-0410-b5e6-96231b3b80d8
Looks like a MachinePipeliner algorithm problem found by
sanitizer-x86_64-linux-fast.
I will backout this test first while investigating the problem to
unblock buildbot.
==49637==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x614000002e08 at pc 0x000004364350 bp 0x7ffe228a3bd0 sp 0x7ffe228a3bc8
READ of size 4 at 0x614000002e08 thread T0
#0 0x436434f in
llvm::SwingSchedulerDAG::checkValidNodeOrder(llvm::SmallVector<llvm::NodeSet,
8u> const&) const
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:3736:11
#1 0x4342cd0 in llvm::SwingSchedulerDAG::schedule()
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:486:3
#2 0x434042d in
llvm::MachinePipeliner::swingModuloScheduler(llvm::MachineLoop&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:385:7
#3 0x433eb90 in
llvm::MachinePipeliner::runOnMachineFunction(llvm::MachineFunction&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachinePipeliner.cpp:207:5
#4 0x428b7ea in
llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13
#5 0x4d1a913 in llvm::FPPassManager::runOnFunction(llvm::Function&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1648:27
#6 0x4d1b192 in llvm::FPPassManager::runOnModule(llvm::Module&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1685:16
#7 0x4d1c06d in runOnModule
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1752:27
#8 0x4d1c06d in llvm::legacy::PassManagerImpl::run(llvm::Module&)
/b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/LegacyPassManager.cpp:1865
#9 0xa48ca3 in compileModule(char**, llvm::LLVMContext&)
/b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:611:8
#10 0xa4270f in main
/b/sanitizer-x86_64-linux-fast/build/llvm/tools/llc/llc.cpp:365:22
#11 0x7fec902572e0 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
#12 0x971b69 in _start
(/b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/llc+0x971b69)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363105 91177308-0d34-0410-b5e6-96231b3b80d8
This was found during HTM cleanup.
Adding a test for builtin_ttest would expose following issue.
*** Bad machine code: Illegal physical register for instruction ***
- function: test10
- basic block: %bb.0 entry (0xf0e57497b58)
- instruction: %5:crrc0 = TABORTWCI 0, $zero, 0
- operand 2: $zero
$zero is not a GPRC register.
LLVM ERROR: Found 1 machine code errors.
Differential Revision: https://reviews.llvm.org/D63079
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362974 91177308-0d34-0410-b5e6-96231b3b80d8
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D62897
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362921 91177308-0d34-0410-b5e6-96231b3b80d8
When we call checkResourceLimit in bumpCycle or bumpNode, and we
know the resource count has just reached the limit (the equations
are equal). We should return true to mark that we are resource
limited for next schedule, or else we might continue to schedule
in favor of latency for 1 more schedule and create a schedule that
actually overbook the resource.
When we call checkResourceLimit to estimate the resource limite before
scheduling, we don't need to return true even if the equations are
equal, as it shouldn't limit the schedule for it .
Differential Revision: https://reviews.llvm.org/D62345
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362805 91177308-0d34-0410-b5e6-96231b3b80d8
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362774 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362735 91177308-0d34-0410-b5e6-96231b3b80d8
Generally speaking, we lower to an optimal rotate sequence for nodes visible in
the SDAG. However, there are instances where the two rotates are not visible at
ISEL time - most notably those in a very common sequence when lowering switch
statements to jump tables.
A common situation is a switch on a 32-bit integer. This value has to have the
upper 32 bits cleared and because jump table offsets are word offsets, the value
needs to be shifted left by 2 bits. We currently emit the clear and the left
shift as two separate instructions, but this is not needed as we can lower it to
a single RLDIC.
This patch just cleans that up.
Differential revision: https://reviews.llvm.org/D60402
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362576 91177308-0d34-0410-b5e6-96231b3b80d8
This is to address some of the problems in existing P9 resource modeling,
especially about the dispatching rules.
Instead of using a hypothetical DISPATCHER , we try to use the number of
actual dispatch slots, and define SchedWriteRes to model dispatch rules,
then update instruction classes according to dispatch rules.
All the dispatch rules and instruction classes update are made according
to POWER9 User Manual.
Differential Revision: https://reviews.llvm.org/D61873
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362509 91177308-0d34-0410-b5e6-96231b3b80d8
This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c
static void store64(u64 x, unsigned char* y)
{
for(int i = 0; i != 8; ++i)
y[i] = (x >> ((7-i) * 8)) & 255;
}
static u64 load64(const unsigned char* y)
{
u64 res = 0;
for(int i = 0; i != 8; ++i)
res |= (u64)(y[i]) << ((7-i) * 8);
return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.
Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the targets
supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
>
*((i32)p) = val;
i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;
>
*((i32)p) = BSWAP(val);
Differential Revision: https://reviews.llvm.org/D61843
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362472 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.
Reviewers: qcolombet, spatel
Reviewed By: qcolombet
Subscribers: nemanjai, jsji
Differential Revision: https://reviews.llvm.org/D62552
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362439 91177308-0d34-0410-b5e6-96231b3b80d8
We currently miss the opportunities for optmizing comparisons in the peephole
optimizer if the input is the result of a COPY since we look for record-form
versions of the producing instruction.
This patch simply lets the optimization peek through copies.
Differential revision: https://reviews.llvm.org/D59633
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362438 91177308-0d34-0410-b5e6-96231b3b80d8
In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%.
condc = conda || condb
br condc, label %target, label %fallthrough
It can be transformed to following,
br conda, label %target, label %newbb
newbb:
br condb, label %target, label %fallthrough
Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%.
This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged.
Differential Revision: https://reviews.llvm.org/D62430
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362237 91177308-0d34-0410-b5e6-96231b3b80d8
The powerpc64-"nonle" tests are removed. They fail because of a bug that
Drew is currently working on that affects multiple targets.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Hal Finkel, Kevin P. Neal
Approved by: Hal Finkel
Differential Revision: http://reviews.llvm.org/D62388
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361985 91177308-0d34-0410-b5e6-96231b3b80d8
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D62017
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361875 91177308-0d34-0410-b5e6-96231b3b80d8