Commit Graph

44737 Commits

Author SHA1 Message Date
Petar Avramovic
3c5ea6a039 [MIPatternMatch]: Add matchers for binary instructions
Add matchers that support commutative and non-commutative binary opcodes.

Differential Revision: https://reviews.llvm.org/D99736
2021-04-27 11:37:42 +02:00
Petar Avramovic
692456b891 [MIPatternMatch]: Add mi_match for MachineInstr
This utility allows more efficient start of pattern match.
Often MachineInstr(MI) is available and instead of using
mi_match(MI.getOperand(0).getReg(), MRI, ...) followed by
MRI.getVRegDef(Reg) that gives back MI we now use
mi_match(MI, MRI, ...).

Differential Revision: https://reviews.llvm.org/D99735
2021-04-27 11:08:16 +02:00
Petar Avramovic
00669818e5 [MIPatternMatch]: Add ICstRegMatch
Matches G_CONSTANT and returns its def register.

Differential Revision: https://reviews.llvm.org/D99734
2021-04-27 10:53:17 +02:00
Petar Avramovic
94c4ae8b43 [GlobalISel]: Add a getConstantIntVRegVal utility
Returns ConstantInt from G_CONSTANT instruction given its def register.

Differential Revision: https://reviews.llvm.org/D99733
2021-04-27 10:52:07 +02:00
dfukalov
584872cffa [TTI] NFC: Change getScalarizationOverhead and getOperandsScalarizationOverhead to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101283
2021-04-27 08:51:48 +03:00
Lang Hames
53ce1eab3b [ORC] Make LLVMOrcLLJITBuilderSetJITTargetMachineBuilder consume as advertised.
This should fix some of the memory leaks seen in the ORC C API test case.
2021-04-26 22:26:38 -07:00
Chen Zheng
611f6b8043 [XCOFF] make .file directive have directory info
The .file directive is changed to only have basename in D36018 for
ELF.

But on AIX, we require the .file directive to also contain the
directory info. This aligns with other AIX compiler like XLC and is
required by some AIX tool like DBX.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D99785
2021-04-27 00:15:23 -04:00
Lang Hames
895d41ab8e Reapply "[ORC] Add unit tests for parts of the ..." with fixes and improvements.
This reapplies 8740360093b, which was reverted in bbddadd46e4 due to buildbot
errors.

This version checks that a JIT instance can be safely constructed, skipping
tests if it can not be. To enable this it introduces new C API to retrieve and
set the target triple for a JITTargetMachineBuilder.
2021-04-26 20:44:40 -07:00
William S. Moses
e7084f2810 [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 20:12:12 -04:00
Fangrui Song
d0dc7f2344 [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 mostly unused pointers. GlobalOpt considers that the pointers can
potentially retain allocated objects, so GlobalOpt cannot optimize out the
`NoopStatistic` variables (see D69428 for more context), wasting 23KiB for stage
2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. For now this patch adds
`#if LLVM_ENABLE_STATS` so `OptimizationRemarkMissed` is skipped in
LLVM_ENABLE_STATS==0 builds.  If these `OptimizationRemarkMissed` are useful in
LLVM_ENABLE_STATS==0 builds, we can replace `llvm::Statistic` with
`llvm::TrackingStatistic`, or use a different abstraction to keep track of the strings.

Similarly, skip the code in `mlir/lib/Pass/PassStatistics.cpp` which
calls `getName`/`getDesc`/`getValue`.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 16:47:32 -07:00
William S. Moses
5b35b95712 Revert "[NVPTX] Enable lowering of atomics on local memory"
This reverts commit fede99d386ec9e7bab2762aa16cb10c0513ae464.
2021-04-26 19:33:01 -04:00
William S. Moses
9ac62ee58d [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 19:27:27 -04:00
Lei Zhang
6f2c652025 Revert "[ADT] Remove StatisticBase and make NoopStatistic empty"
This reverts commit b5403117814a7c39b944839e10492493f2ceb4ac
because it breaks MLIR build:

https://buildkite.com/mlir/mlir-core/builds/13299#ad0f8901-dfa4-43cf-81b8-7940e2c6c15b
2021-04-26 18:31:04 -04:00
Fangrui Song
4b2b6e81e1 Revert D76519 "[NFC] Refactor how CFI section types are represented in AsmPrinter"
This reverts commit 0ce723cb228bc1d1a0f5718f3862fb836145a333.

D76519 was not quite NFC. If we see a CFISection::Debug function before a
CFISection::EH one (-fexceptions -fno-asynchronous-unwind-tables), we may
incorrectly pick CFISection::Debug and emit a `.cfi_sections .debug_frame`.
We should use .eh_frame instead.

This scenario is untested.
2021-04-26 15:17:28 -07:00
Lang Hames
0a240e5b31 [ORC] C API updates.
Adds support for creating custom MaterializationUnits in the C API with the new
LLVMOrcCreateCustomMaterializationUnit function.

Modifies ownership rules for LLVMOrcAbsoluteSymbols to make it consistent with
LLVMOrcCreateCustomMaterializationUnit. This is an ABI breaking change for any
clients of the LLVMOrcAbsoluteSymbols API.

Adds LLVMOrcLLJITGetObjLinkingLayer and LLVMOrcObjectLayerEmit functions to
allow clients to get a reference to an LLJIT instance's linking layer, then
emit an object file using it. This can be used to support construction of
custom materialization units in the common case where those units will
generate an object file that needs to be emitted to complete the
materialization.
2021-04-26 13:58:37 -07:00
Lang Hames
20ebd8978c [ORC] Fix type name.
Rename JITTargetSymbolFlags to JITSymbolTargetFlags. This matches the convention
used for JITSymbolGenericFlags.
2021-04-26 13:58:37 -07:00
Fangrui Song
7fa56f879f [ADT] Remove StatisticBase and make NoopStatistic empty
In LLVM_ENABLE_STATS=0 builds, `llvm::Statistic` maps to `llvm::NoopStatistic`
but has 3 unused pointers. GlobalOpt considers that the pointers can potentially
retain allocated objects, so GlobalOpt cannot optimize out the `NoopStatistic`
variables (see D69428 for more context), wasting 23KiB for stage 2 clang.

This patch makes `NoopStatistic` empty and thus reclaims the wasted space.  The
clang size is even smaller than applying D69428 (slightly smaller in both .bss and
.text).
```
# This means the D69428 optimization on clang is mostly nullified by this patch.
HEAD+D69428: size(.bss) = 0x0725a8
HEAD+D101211: size(.bss) = 0x072238

# bloaty - HEAD+D69428 vs HEAD+D101211
# With D101211, we also save a lot of string table space (.rodata).
    FILE SIZE        VM SIZE
 --------------  --------------
  -0.0%     -32  -0.0%     -24    .eh_frame
  -0.0%    -336  [ = ]       0    .symtab
  -0.0%    -360  [ = ]       0    .strtab
  [ = ]       0  -0.2%    -880    .bss
  -0.0% -2.11Ki  -0.0% -2.11Ki    .rodata
  -0.0% -2.89Ki  -0.0% -2.89Ki    .text
  -0.0% -5.71Ki  -0.0% -5.88Ki    TOTAL
```

Note: LoopFuse is a disabled pass. This patch adds `#if LLVM_ENABLE_STATS` so
`OptimizationRemarkMissed` is skipped in LLVM_ENABLE_STATS==0 builds.  If these
`OptimizationRemarkMissed` are useful and not noisy, we can replace
`llvm::Statistic` with `llvm::TrackingStatistic` in the future.

Reviewed By: lattner

Differential Revision: https://reviews.llvm.org/D101211
2021-04-26 13:39:35 -07:00
William S. Moses
98f63bbad6 [Lexer] Allow LLLexer to be used as an API
Explose LLVM Lexer for usage externally as an API

Differential Revision: https://reviews.llvm.org/D100920
2021-04-26 12:43:14 -04:00
Paul C. Anagnostopoulos
6409c51e4c [TableGen] Change assertion information from a tuple to a struct [NFC]
Differential Revision: https://reviews.llvm.org/D100854
2021-04-26 09:57:16 -04:00
Tim Renouf
79e5f97ca7 [MC][AMDGPU][llvm-objdump] Synthesized local labels in disassembly
1. Add an accessor function to MCSymbolizer to retrieve addresses
   referenced by a symbolizable operand, but not resolved to a symbol.
   That way, the caller can synthesize labels at those addresses and
   then retry disassembling the section.

2. Implement that in AMDGPU -- a failed symbol lookup results in the
   address being added to a vector returned by the new function.

3. Use that in llvm-objdump when using MCSymbolizer (which only happens
   on AMDGPU) and SymbolizeOperands is on.

Differential Revision: https://reviews.llvm.org/D101145

Change-Id: I19087c3bbfece64bad5a56ee88bcc9110d83989e
2021-04-26 13:56:36 +01:00
David Sherwood
6475ab5a00 [AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function
When vectorising for AArch64 targets if you specify the SVE attribute
we automatically then treat masked loads and stores as legal. Also,
since we have no cost model for masked memory ops we believe it's
cheap to use the masked load/store intrinsics even for fixed width
vectors. This can lead to poor code quality as the intrinsics will
currently be scalarised in the backend. This patch adds a basic
cost model that marks fixed-width masked memory ops as significantly
more expensive than for scalable vectors.

Tests for the cost model are added here:

  Transforms/LoopVectorize/AArch64/masked-op-cost.ll

Differential Revision: https://reviews.llvm.org/D100745
2021-04-26 11:00:03 +01:00
Levy Hsu
b3953bd33d [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Tomasz Miąsko
d94bcd2889 [demangler] Use standard semantics for StringView::substr
The StringView::substr now accepts a substring starting position and its
length instead of previous non-standard `from` & `to` positions.

All uses of two argument StringView::substr are in MicrosoftDemangler
and have 0 as a starting position, so no changes are necessary.

This also fixes a bug where attempting to extract a suffix with substr
(a `to` position equal to size) would return a substring without the
last character.

Fixing the issue should not introduce observable changes in the
demangler, since as currently used, a second argument to
StringView::substr is either: 1) a result of a successful call to
StringView::find and so necessarily smaller than size., or 2) in the
case of Demangler::demangleCharLiteral potentially equal to size, but
with demangler expecting more data to follow later on and failing either
way.

Reviewed By: #libc_abi, ldionne, erik.pilkington

Differential Revision: https://reviews.llvm.org/D100246
2021-04-25 13:56:41 +02:00
Xiang1 Zhang
6da00a5d84 [X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
2021-04-25 09:45:41 +08:00
Lang Hames
8daeb57ed9 [ORC][C-bindings] Fix missing ')' in comments. 2021-04-24 18:04:57 -07:00
Nikita Popov
da6fbe8578 [PatternMatch] Improve m_Deferred() documentation (NFC)
m_Deferred() has nothing to do with commutative matchers, it needs
to be used whenever the value to match is determinde as part of
the same match expression.
2021-04-24 21:00:24 +02:00
RamNalamothu
065e9bee9e [NFC] Refactor how CFI section types are represented in AsmPrinter
In terms of readability, the `enum CFIMoveType` didn't better document what it
intends to convey i.e. the type of CFI section that gets emitted.

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D76519
2021-04-24 23:29:42 +05:30
dfukalov
4398288fa7 [GVN] Clobber partially aliased loads.
Use offsets stored in `AliasResult` implemented in D98718.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D95543
2021-04-24 14:14:20 +03:00
Teresa Johnson
0f0622320f Mark type test intrinsics as speculatable to fix inline cost
There is already code in InlineCost.cpp to identify and ignore ephemeral
values (llvm.assume intrinsics and other side-effect free instructions
only feeding the assumes). However, because llvm.type.test intrinsics
were not marked speculatable, they and any instructions specifically
feeding the type test (typically a bitcast) were being counted towards
the instruction cost when inlining. This was causing profile matching
issues in some cases when enabling -fwhole-program-vtables for whole
program devirtualization.

According to the language reference, the speculatable attribute means:
"the function does not have any effects besides calculating its result
and does not have undefined behavior". I see no reason why type tests
cannot be marked with this attribute.

There are 2 test changes:

llvm/test/Transforms/Inline/ephemeral.ll: I added a type test intrinsic
here to verify the fix. Also, I found the test was not actually testing
what it originally intended. Many of the existing instructions were
optimized away by -Oz, and the cost of inlining was negative due to the
benefit of removing the call. So I changed the test to simply invoke the
inline pass and check the number of instructions computed by InlineCost.
I also fixed an instruction that was not actually used anywhere.

llvm/test/Transforms/SimplifyCFG/no-md-sink.ll needed to be made more
robust to code changes that reordered the metadata.

Differential Revision: https://reviews.llvm.org/D101180
2021-04-23 10:02:31 -07:00
Nemanja Ivanovic
572869bea6 [PowerPC] Add vec_ctsl and vec_ctul to altivec.h
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
2021-04-23 11:03:38 -05:00
Sander de Smalen
b71b6a828f [TTI] NFC: Change getIntImmCost[Inst|Intrin] to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100565
2021-04-23 16:06:36 +01:00
Sander de Smalen
b447679db3 [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
Sander de Smalen
a6cde052ae [TTI] NFC: Change getMemcpyCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100563
2021-04-23 16:06:35 +01:00
Sander de Smalen
9eda3a0c2e [TTI] NFC: Change getGEPCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100562
2021-04-23 16:06:35 +01:00
Sander de Smalen
9390d5a48f [TTI] NFC: Change getAddressComputationCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100561
2021-04-23 16:06:35 +01:00
dfukalov
9779f666df [TTI] NFC: Use InstructionCost to store ScalarizationCost in IntrinsicCostAttributes.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D101151
2021-04-23 18:02:00 +03:00
Daniil Fukalov
3d4fc1f0f6 [TTI] Fix ScalarizationCost initialization.
In cases when ScalarizationCostPassed has no value, UINT_MAX is actually used
for cost estimation in `return ScalarCalls * ScalarCost + ScalarizationCost`.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101099
2021-04-23 17:59:59 +03:00
Stephen Tozer
8b8275b1fc Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
Previous build failures were caused by an error in bitcode reading and
writing for DIArgList metadata, which has been fixed in e5d844b587.
There were also some unnecessary asserts that were being triggered on
certain builds, which have been removed.

This reverts commit dad5caa59e6b2bde8d6cf5b64a972c393c526c82.
2021-04-23 10:54:01 +01:00
Jay Foad
899f1c90ad [GlobalISel] Remove ConstantFoldingMIRBuilder
ConstantFoldingMIRBuilder was an experiment which is not used for
anything. The constant folding functionality is now part of
CSEMIRBuilder.

Differential Revision: https://reviews.llvm.org/D101050
2021-04-23 09:13:27 +01:00
Fangrui Song
c83fe04e08 [IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all}
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.

Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)

Also document the function attribute "frame-pointer" which is long overdue.

Differential Revision: https://reviews.llvm.org/D101016
2021-04-22 18:07:30 -07:00
Levy Hsu
04656c7e3e [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Craig Topper
ae70485f74 [RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi.
These instructions don't really exist, but we have ways we can
emulate them.

.vv will swap operands and use vmsle().vv. .vi will adjust the
immediate and use .vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.

For unmasked vmsge(u).vx we use:
  vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

For cases where mask and maskedoff are the same value then we have
vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that
requires a temporary so we use:
  vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt

For other masked cases we use this sequence:
  vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.

Differential Revision: https://reviews.llvm.org/D100925
2021-04-22 10:44:38 -07:00
Fangrui Song
dcdc354ce1 Temporarily revert the code part of D100981 "Delete le32/le64 targets"
This partially reverts commit 77ac823fd285973cfb3517932c09d82e6a32f46d.

Halide uses le32/le64 (https://github.com/halide/Halide/pull/5934).
Temporarily brings back the code part to give them some time for migration.
2021-04-22 10:18:44 -07:00
Krzysztof Parzyszek
e644e52d90 [Hexagon] Add HVX intrinsics for conditional vector loads/stores
Intrinsics for the following instructions are added. The intrinsic
name is "int_hexagon_<inst>[_128B]", e.g.
  int_hexagon_V6_vL32b_pred_ai        for 64-byte version
  int_hexagon_V6_vL32b_pred_ai_128B   for 128-byte version

V6_vL32b_pred_ai        if (Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_pred_pi        if (Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_pred_ppu       if (Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_npred_ai       if (!Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_npred_pi       if (!Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_npred_ppu      if (!Pv4) Vd32 = vmem(Rx32++Mu2)

V6_vL32b_nt_pred_ai     if (Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_pred_pi     if (Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_pred_ppu    if (Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vL32b_nt_npred_ai    if (!Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_npred_pi    if (!Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_npred_ppu   if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt

V6_vS32b_pred_ai        if (Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_pred_pi        if (Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_pred_ppu       if (Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32b_npred_ai       if (!Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_npred_pi       if (!Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_npred_ppu      if (!Pv4) vmem(Rx32++Mu2) = Vs32

V6_vS32Ub_pred_ai       if (Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_pred_pi       if (Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_pred_ppu      if (Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32Ub_npred_ai      if (!Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_npred_pi      if (!Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_npred_ppu     if (!Pv4) vmemu(Rx32++Mu2) = Vs32

V6_vS32b_nt_pred_ai     if (Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_pred_pi     if (Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_pred_ppu    if (Pv4) vmem(Rx32++Mu2):nt = Vs32
V6_vS32b_nt_npred_ai    if (!Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_npred_pi    if (!Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_npred_ppu   if (!Pv4) vmem(Rx32++Mu2):nt = Vs32
2021-04-22 11:49:29 -05:00
Irina Dobrescu
53c3a79b92 [flang][openmp] Add General Semantic Checks for Allocate Directive
This patch adds semantic checks for the General Restrictions of the
Allocate Directive.

Since the requires directive is not yet implemented in Flang, the
restriction:
```
allocate directives that appear in a target region must
specify an allocator clause unless a requires directive with the
dynamic_allocators clause is present in the same compilation unit
```
will need to be updated at a later time.

A different patch will be made with the Fortran specific restrictions of
this directive.

I have used the code from https://reviews.llvm.org/D89395 for the
CheckObjectListStructure function.

Co-authored-by: Isaac Perry <isaac.perry@arm.com>

Reviewed By: clementval, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D91159
2021-04-22 16:15:06 +00:00
Simon Pilgrim
6870b3d5f3 [LTO] Caching.h - remove unused <string> include. NFCI. 2021-04-22 14:07:12 +01:00
Jun Ma
684cf0c2d5 [DAGCombiner] Allow operand of step_vector to be negative.
It is proper to relax non-negative limitation of step_vector.
Also this patch adds more combines for step_vector:
(sub X, step_vector(C)) -> (add X,  step_vector(-C))

Differential Revision: https://reviews.llvm.org/D100812
2021-04-22 20:58:03 +08:00
Jay Foad
f7148011b7 Fix typo "beneficiates" in comments 2021-04-22 12:30:16 +01:00
Wenlei He
e734b4c21b [CSSPGO][llvm-profdata] Support trimming cold context when merging profiles
The change adds support for triming and merging cold context when mergine CSSPGO profiles using llvm-profdata. This is similar to the context profile trimming in llvm-profgen, however the flexibility to trim cold context after profile is generated can be useful.

Differential Revision: https://reviews.llvm.org/D100528
2021-04-22 00:42:37 -07:00
Max Kazantsev
9fbf6639f4 [GVN] Introduce loop load PRE
This patch allows PRE of the following type of loads:

```
preheader:
  br label %loop

loop:
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  br label %merge

merge:
  ...
  br i1 ..., label %loop, label %exit

```

Into
```
preheader:
  %x0 = load %p
  br label %loop

loop:
  %x.pre = phi(x0, x2)
  br i1 ..., label %merge, label %clobber

clobber:
  call foo() // Clobbers %p
  %x1 = load %p
  br label %merge

merge:
  x2 = phi(x.pre, x1)
  ...
  br i1 ..., label %loop, label %exit

```

So instead of loading from %p on every iteration, we load only when the actual clobber happens.
The typical pattern which it is trying to address is: hot loop, with all code inlined and
provably having no side effects, and some side-effecting calls on cold path.

The worst overhead from it is, if we always take clobber block, we make 1 more load
overall (in preheader). It only matters if loop has very few iteration. If clobber block is not taken
at least once, the transform is neutral or profitable.

There are several improvements prospect open up:
- We can sometimes be smarter in loop-exiting blocks via split of critical edges;
- If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that
  we don't know if their sum is colder than the header.

Differential Revision: https://reviews.llvm.org/D99926
Reviewed By: reames
2021-04-22 12:50:38 +07:00