This utility allows more efficient start of pattern match.
Often MachineInstr(MI) is available and instead of using
mi_match(MI.getOperand(0).getReg(), MRI, ...) followed by
MRI.getVRegDef(Reg) that gives back MI we now use
mi_match(MI, MRI, ...).
Differential Revision: https://reviews.llvm.org/D99735
The .file directive is changed to only have basename in D36018 for
ELF.
But on AIX, we require the .file directive to also contain the
directory info. This aligns with other AIX compiler like XLC and is
required by some AIX tool like DBX.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D99785
This reapplies 8740360093b, which was reverted in bbddadd46e4 due to buildbot
errors.
This version checks that a JIT instance can be safely constructed, skipping
tests if it can not be. To enable this it introduces new C API to retrieve and
set the target triple for a JITTargetMachineBuilder.
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.
Differential Revision: https://reviews.llvm.org/D98650
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 mostly unused pointers. GlobalOpt considers that the pointers can
potentially retain allocated objects, so GlobalOpt cannot optimize out the
`NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage
2 clang.
This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238
# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
FILE SIZE VM SIZE
-------------- --------------
-0.0% -32 -0.0% -24 .eh_frame
-0.0% -336 [ = ] 0 .symtab
-0.0% -360 [ = ] 0 .strtab
[ = ] 0 -0.2% -880 .bss
-0.0% -2.11Ki -0.0% -2.11Ki .rodata
-0.0% -2.89Ki -0.0% -2.89Ki .text
-0.0% -5.71Ki -0.0% -5.88Ki TOTAL
```
Note: LoopFuse is a disabled pass. For now this patch adds
`#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in
LLVM_ENABLE_STATS==0 builds. If these `OptimizationRemarkMissed` are useful in
LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with
`llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings.
Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which
calls `getName`/`getDesc`/`getValue`.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D101211
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.
Differential Revision: https://reviews.llvm.org/D98650
This reverts commit 0ce723cb228bc1d1a0f5718f3862fb836145a333.
D76519 was not quite NFC. If we see a CFISection::Debug function before a
CFISection::EH one (-fexceptions -fno-asynchronous-unwind-tables), we may
incorrectly pick CFISection::Debug and emit a `.cfi_sections .debug_frame`.
We should use .eh_frame instead.
This scenario is untested.
Adds support for creating custom MaterializationUnits in the C API with the new
LLVMOrcCreateCustomMaterializationUnit function.
Modifies ownership rules for LLVMOrcAbsoluteSymbols to make it consistent with
LLVMOrcCreateCustomMaterializationUnit. This is an ABI breaking change for any
clients of the LLVMOrcAbsoluteSymbols API.
Adds LLVMOrcLLJITGetObjLinkingLayer and LLVMOrcObjectLayerEmit functions to
allow clients to get a reference to an LLJIT instance's linking layer, then
emit an object file using it. This can be used to support construction of
custom materialization units in the common case where those units will
generate an object file that needs to be emitted to complete the
materialization.
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 unused pointers. GlobalOpt considers that the pointers can potentially
retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic`
variables (see D69428 for more context), wasting 23KiB for stage 2 clang.
This patch makes `NoopStatistic` empty and thus reclaims the wasted space. The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238
# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
FILE SIZE VM SIZE
-------------- --------------
-0.0% -32 -0.0% -24 .eh_frame
-0.0% -336 [ = ] 0 .symtab
-0.0% -360 [ = ] 0 .strtab
[ = ] 0 -0.2% -880 .bss
-0.0% -2.11Ki -0.0% -2.11Ki .rodata
-0.0% -2.89Ki -0.0% -2.89Ki .text
-0.0% -5.71Ki -0.0% -5.88Ki TOTAL
```
Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so
`OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds. If these
`OptimizationRemarkMissed` are useful and not noisy, we can replace
`llvm::Statistic` with `llvm::TrackingStatistic` in the future.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D101211
1. Add an accessor function to MCSymbolizer to retrieve addresses
referenced by a symbolizable operand, but not resolved to a symbol.
That way, the caller can synthesize labels at those addresses and
then retry disassembling the section.
2. Implement that in AMDGPU -- a failed symbol lookup results in the
address being added to a vector returned by the new function.
3. Use that in llvm-objdump when using MCSymbolizer (which only happens
on AMDGPU) and SymbolizeOperands is on.
Differential Revision: https://reviews.llvm.org/D101145
Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e
When vectorising for AArch64 targets if you specify the SVE attribute
we automatically then treat masked loads and stores as legal. Also,
since we have no cost model for masked memory ops we believe it's
cheap to use the masked load/store intrinsics even for fixed width
vectors. This can lead to poor code quality as the intrinsics will
currently be scalarised in the backend. This patch adds a basic
cost model that marks fixed-width masked memory ops as significantly
more expensive than for scalable vectors.
Tests for the cost model are added here:
Transforms/LoopVectorize/AArch64/masked-op-cost.ll
Differential Revision: https://reviews.llvm.org/D100745
The StringView::substr now accepts a substring starting position and its
length instead of previous non-standard `from` & `to` positions.
All uses of two argument StringView::substr are in MicrosoftDemangler
and have 0 as a starting position, so no changes are necessary.
This also fixes a bug where attempting to extract a suffix with substr
(a `to` position equal to size) would return a substring without the
last character.
Fixing the issue should not introduce observable changes in the
demangler, since as currently used, a second argument to
StringView::substr is either: 1) a result of a successful call to
StringView::find and so necessarily smaller than size., or 2) in the
case of Demangler::demangleCharLiteral potentially equal to size, but
with demangler expecting more data to follow later on and failing either
way.
Reviewed By: #libc_abi, ldionne, erik.pilkington
Differential Revision: https://reviews.llvm.org/D100246
m_Deferred() has nothing to do with commutative matchers, it needs
to be used whenever the value to match is determinde as part of
the same match expression.
In terms of readability, the `enum CFIMoveType` didn't better document what it
intends to convey i.e. the type of CFI section that gets emitted.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D76519
There is already code in InlineCost.cpp to identify and ignore ephemeral
values (llvm.assume intrinsics and other side-effect free instructions
only feeding the assumes). However, because llvm.type.test intrinsics
were not marked speculatable, they and any instructions specifically
feeding the type test (typically a bitcast) were being counted towards
the instruction cost when inlining. This was causing profile matching
issues in some cases when enabling -fwhole-program-vtables for whole
program devirtualization.
According to the language reference, the speculatable attribute means:
"the function does not have any effects besides calculating its result
and does not have undefined behavior". I see no reason why type tests
cannot be marked with this attribute.
There are 2 test changes:
llvm/test/Transforms/Inline/ephemeral.ll: I added a type test intrinsic
here to verify the fix. Also, I found the test was not actually testing
what it originally intended. Many of the existing instructions were
optimized away by -Oz, and the cost of inlining was negative due to the
benefit of removing the call. So I changed the test to simply invoke the
inline pass and check the number of instructions computed by InlineCost.
I also fixed an instruction that was not actually used anywhere.
llvm/test/Transforms/SimplifyCFG/no-md-sink.ll needed to be made more
robust to code changes that reordered the metadata.
Differential Revision: https://reviews.llvm.org/D101180
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
In cases when ScalarizationCostPassed has no value, UINT_MAX is actually used
for cost estimation in `return ScalarCalls * ScalarCost + ScalarizationCost`.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D101099
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.
This reverts commit dad5caa59e6b2bde8d6cf5b64a972c393c526c82.
ConstantFoldingMIRBuilder was an experiment which is not used for
anything. The constant folding functionality is now part of
CSEMIRBuilder.
Differential Revision: https://reviews.llvm.org/D101050
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
These instructions don't really exist, but we have ways we can
emulate them.
.vv will swap operands and use vmsle().vv. .vi will adjust the
immediate and use .vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.
For unmasked vmsge(u).vx we use:
vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
For cases where mask and maskedoff are the same value then we have
vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that
requires a temporary so we use:
vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt
For other masked cases we use this sequence:
vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.
Differential Revision: https://reviews.llvm.org/D100925
This partially reverts commit 77ac823fd285973cfb3517932c09d82e6a32f46d.
Halide uses le32/le64 (https://github.com/halide/Halide/pull/5934).
Temporarily brings back the code part to give them some time for migration.
Intrinsics for the following instructions are added. The intrinsic
name is "int_hexagon_<inst>[_128B]", e.g.
int_hexagon_V6_vL32b_pred_ai for 64-byte version
int_hexagon_V6_vL32b_pred_ai_128B for 128-byte version
V6_vL32b_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_nt_pred_ai if (Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_pred_pi if (Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_pred_ppu if (Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vL32b_nt_npred_ai if (!Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_npred_pi if (!Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_npred_ppu if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vS32b_pred_ai if (Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_pred_pi if (Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_pred_ppu if (Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32b_npred_ai if (!Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_npred_pi if (!Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_npred_ppu if (!Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32Ub_pred_ai if (Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_pred_pi if (Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_pred_ppu if (Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32Ub_npred_ai if (!Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_npred_pi if (!Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_npred_ppu if (!Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32b_nt_pred_ai if (Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_pred_pi if (Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_pred_ppu if (Pv4) vmem(Rx32++Mu2):nt = Vs32
V6_vS32b_nt_npred_ai if (!Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_npred_pi if (!Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_npred_ppu if (!Pv4) vmem(Rx32++Mu2):nt = Vs32
This patch adds semantic checks for the General Restrictions of the
Allocate Directive.
Since the requires directive is not yet implemented in Flang, the
restriction:
```
allocate directives that appear in a target region must
specify an allocator clause unless a requires directive with the
dynamic_allocators clause is present in the same compilation unit
```
will need to be updated at a later time.
A different patch will be made with the Fortran specific restrictions of
this directive.
I have used the code from https://reviews.llvm.org/D89395 for the
CheckObjectListStructure function.
Co-authored-by: Isaac Perry <isaac.perry@arm.com>
Reviewed By: clementval, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D91159
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X, step_vector(-C))
Differential Revision: https://reviews.llvm.org/D100812
The change adds support for triming and merging cold context when mergine CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful.
Differential Revision: https://reviews.llvm.org/D100528
This patch allows PRE of the following type of loads:
```
preheader:
br label %loop
loop:
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
br label %merge
merge:
...
br i1 ..., label %loop, label %exit
```
Into
```
preheader:
%x0 = load %p
br label %loop
loop:
%x.pre = phi(x0, x2)
br i1 ..., label %merge, label %clobber
clobber:
call foo() // Clobbers %p
%x1 = load %p
br label %merge
merge:
x2 = phi(x.pre, x1)
...
br i1 ..., label %loop, label %exit
```
So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.
The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.
There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
we don't know if their sum is colder than the header.
Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames