To go with D149267 and D149967, this adds predicated mla/mls patterns, selected
from select(mask, add(a, mul(b, c)), a) -> mla(a, mask, b, c). The existing
patterns are eventually removed by D149967.
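
As a lane-wise illustration of the select pattern above, here is a minimal scalar sketch. It is not LLVM code: `predicatedMla` is a hypothetical name, and the vectors are modelled as plain arrays. Active lanes compute a + b * c; inactive lanes pass `a` through, which is what the select's false operand expresses.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Reference semantics for select(mask, add(a, mul(b, c)), a), one lane at a
// time. Hypothetical helper for illustration only.
template <std::size_t N>
std::array<int32_t, N> predicatedMla(const std::array<bool, N> &mask,
                                     const std::array<int32_t, N> &a,
                                     const std::array<int32_t, N> &b,
                                     const std::array<int32_t, N> &c) {
  std::array<int32_t, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    // Inactive lanes keep the accumulator value unchanged.
    result[i] = mask[i] ? a[i] + b[i] * c[i] : a[i];
  }
  return result;
}
```

The mls case would presumably look the same with the add replaced by a subtract.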
Differential Revision: https://reviews.llvm.org/D149969
This patch splits the GlobalISelEmitter.cpp file, which imports DAG ISel patterns for GISel, into separate "GISelMatchTable.h/cpp" files.
The main motive is readability & maintainability. GlobalISelEmitter.cpp was about 6400 lines of mixed code, some bits implementing the match table codegen, some others dedicated to importing DAG patterns.
Now it's down to about 2700 lines, plus a 2150-line header and a 2000-line implementation file.
That's slightly more lines overall, but that's to be expected: moving
inline definitions out of line, adding comments in the .cpp, etc. all take additional space, but I think the tradeoff is worth it.
I made as few unrelated code changes as possible. I would say the biggest change is the introduction of the `gi` namespace, used to prevent name conflicts/ODR violations with common type names such as `Matcher`.
It was previously not an issue because all of the code was in an anonymous namespace.
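
A minimal sketch of why a named namespace is needed once the declarations live in a header. The two `Matcher` types below are hypothetical stand-ins, not the real classes: an anonymous namespace only works while everything stays in one .cpp file, whereas a named namespace keeps two unrelated types with the same common name distinct across translation units.

```cpp
#include <string>

// Hypothetical stand-in for a match table type that now lives in a header.
namespace gi {
struct Matcher {
  std::string describe() const { return "match table matcher"; }
};
} // namespace gi

// Hypothetical unrelated type elsewhere that happens to use the same name.
namespace other {
struct Matcher {
  std::string describe() const { return "unrelated matcher"; }
};
} // namespace other
```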
This moves all of the "match table" code out of the file, so predicates,
rules, and actions are all separated now. I believe this helps separate concerns: `GlobalISelEmitter.cpp` is now more focused on importing DAG patterns into GI, instead of also containing the whole match table internals.
Note: the new files have a "GISel" prefix to make them distinct from the other "GI" files in the same folder, which are for the combiner.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D151432
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read the floating-point environment, set it, or reset it to
some default state. They perform the same actions as the C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.
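
For reference, a small sketch of the C library behaviour the intrinsics mirror, using the standard `<cfenv>` functions. The helper names are hypothetical, and modelling the reset as `fesetenv(FE_DFL_ENV)` is an assumption about what the "default state" means.

```cpp
#include <cfenv>

// Reads the environment, perturbs it, then restores the saved copy.
// Returns the rounding mode observed after the restore.
int roundTripFPEnv() {
  std::fenv_t saved;
  std::fegetenv(&saved);          // read the FP environment
  std::fesetround(FE_TOWARDZERO); // change part of it
  std::fesetenv(&saved);          // set it back to the saved value
  return std::fegetround();
}

// Perturbs the environment, then resets it to the default environment.
// Assumption: "reset" corresponds to fesetenv(FE_DFL_ENV).
int resetFPEnv() {
  std::fesetround(FE_UPWARD);
  std::fesetenv(FE_DFL_ENV);
  return std::fegetround();
}
```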
The new intrinsics specify the FP environment as a value of integer type, which
is convenient for most targets, where the FP state is the content of some
register. Some targets, however, use long representations. On X86 the size
of the FP environment is 256 bits, and even half of this size is not a legal
integer type. To facilitate legalization in such cases, two sets of DAG
nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM consider the FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when the
target has a long representation for the floating-point state.
Differential Revision: https://reviews.llvm.org/D71742
The build failure should be fixed by de681d53. Follow-up refactor will
be done in future patches.
This reverts commit e7c5ced0b9f0551ea17e1d2b48be86f03a772c59.
Fix a bug in detecting unknown ids as mods of known ids that was
preventing certain fusions.
While at it, fix the signature of the `detectAsMod` function to
have the output as the last argument.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D152055
Before serializing, LLVM optimizations were only run on the path to
hsaco, and not cubin. Define opt-level for the `gpu-to-cubin` pass as well,
and move the call to optimize LLVM IR to a common place.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D151554
Correctly account for the fact that certain targets do not use the generic address space for the implicit VTT argument. This entails adjusting `ItaniumCXXABI::buildStructorSignature`, `ItaniumCXXABI::addImplicitStructorParams` and `ItaniumCXXABI::getImplicitConstructorArgs` to use the target's global variable address space. The associated test is temporarily marked `XFAIL` as additional fixes are needed.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D150746
Each COFFSection binds an MCSection when created. No need to iterate
through MCAssembler in writeSection.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D151793
Change to the system assembler to compile assembly files even if
-fintegrated-as is specified. For now, we don't have a good Clang
assembler for assembly files on AIX.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D148490
Previously the SignedVarInt was incorrectly defined. Follow-up work is
needed to improve Array printing/parsing, but this corrects the
definitions for now.
The moved comment refers to the --output-asm-syntax flag rather
than the --print-imm-hex flag, but seems to have mistakenly been put
under the definition of the latter due to some misplaced line numbers on
Phabricator.
If the value was already known to not be uniform for the previous
(smaller) VF, it cannot be uniform for the larger VF.
This slightly reduces compile time once uniformity checks become
a bit more expensive due to using SCEV rewriting (D148841).
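
The idea can be sketched as follows. This is a hypothetical helper, not the vectorizer's actual code: VFs are assumed to be visited in increasing order, and `expensiveCheck` stands in for the real uniformity analysis. Once a smaller VF is found non-uniform, the check is skipped for every larger VF.

```cpp
#include <functional>
#include <vector>

// Returns one uniformity flag per VF. VFs must be sorted in increasing order.
// expensiveCheck is a placeholder for the real (costly) uniformity query.
std::vector<bool>
uniformPerVF(const std::vector<unsigned> &VFs,
             const std::function<bool(unsigned)> &expensiveCheck) {
  std::vector<bool> result;
  bool knownNonUniform = false;
  for (unsigned VF : VFs) {
    // Only run the expensive check while no smaller VF has failed it.
    if (!knownNonUniform && !expensiveCheck(VF))
      knownNonUniform = true;
    result.push_back(!knownNonUniform);
  }
  return result;
}
```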
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D151658
This patch creates a skeleton implementation for the DWARFLinkerParallel.
It also integrates DWARFLinkerParallel into dsymutil and llvm-dwarfutil,
so that an empty DWARFLinker::link() can be called. To do this, a new command
line option "--linker apple/llvm" is added. Additionally, it changes
existing DWARFLinker interfaces/implementations to be compatible:
use Error for error reporting in the DWARFStreamer, make DWARFFile the
owner of referenced resources, and other small refactorings.
Differential Revision: https://reviews.llvm.org/D147952
This patch uses castAs instead of getAs in ConvertQualTypeToKind(clang::ASTContext const &, clang::QualType); castAs will assert if the type doesn't match.
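
To illustrate the difference, here is a self-contained sketch with hypothetical stand-in types. These are not the Clang classes, and the real `getAs`/`castAs` are member templates with different machinery. The getAs-style lookup returns nullptr on a mismatch, so an unchecked dereference is a latent bug; the castAs-style lookup states that the type is known to match and asserts it.

```cpp
#include <cassert>

// Hypothetical stand-ins for a small type hierarchy.
struct Type {
  virtual ~Type() = default;
};
struct FunctionProtoType : Type {
  int numParams = 2;
};
struct BuiltinType : Type {};

// getAs-style: returns nullptr on mismatch, so every caller must check.
template <typename T> const T *getAs(const Type &Ty) {
  return dynamic_cast<const T *>(&Ty);
}

// castAs-style: the caller claims the type matches; a mismatch asserts.
template <typename T> const T *castAs(const Type &Ty) {
  const T *Result = dynamic_cast<const T *>(&Ty);
  assert(Result && "castAs<> used on a type of the wrong kind");
  return Result;
}
```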
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D151928
This patch uses castAs instead of getAs, which will assert if the type doesn't match, to resolve a dereference issue with a nullptr FPT when calling getThisType() in clang::CodeGen::CGDebugInfo::CreateType(clang::MemberPointerType const *, llvm::DIFile *).
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D151947
I've kept the legalForCartesianProduct call, but this requires us to maintain 32-bit/64-bit integer lists - we might want to just use legalIf and perform the type pair set matching manually.
Make `qualifyWindowsLibrary` and `addStackProbeTargetAttributes`
protected members of `TargetCodeGenInfo`.
These are helper functions used by `getDependentLibraryOption` and
`setTargetAttributes` methods when targeting Windows. The change will
allow these functions to be reused after splitting `TargetInfo.cpp`.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D150178
The lists contain differences between register numbers, not the register
numbers themselves. Since a difference can also be negative, this also
changes its type to signed.
Changing the type to signed exposed a "bug". For AMDGPU, which has many
registers, the first element of a sequence could be as big as ~45k.
The value does not fit into int16_t, but fits into uint16_t. The bug
didn't show up because of unsigned wrapping and truncation of the Val
field in the advance() method.
To fix the issue, I changed the way regunit difflists are encoded. The
4-bit 'scale' field of MCRegisterDesc::RegUnit was replaced by a 12-bit
number of the first regunit, and the first element of each of the lists
was removed. The upper 20 bits of the RegUnit field contain the initial
offset into the DiffLists array.
AMDGPU has 1'409 regunits (2^12 = 4'096), and the biggest offset is
80'041 (2^20 = 1'048'576). That is, there is enough room.
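
A minimal sketch of the new layout, assuming the 12-bit first regunit sits in the low bits (the text above only says the offset occupies the upper 20 bits). The helper names are hypothetical, and the 0-terminated list is an assumption made for the example, not something this message states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned FirstUnitBits = 12;
constexpr uint32_t FirstUnitMask = (1u << FirstUnitBits) - 1;

// Packs the first regunit (low 12 bits) and the DiffLists offset (upper 20).
constexpr uint32_t packRegUnits(uint32_t FirstUnit, uint32_t Offset) {
  return (Offset << FirstUnitBits) | (FirstUnit & FirstUnitMask);
}
constexpr uint32_t firstUnit(uint32_t Packed) { return Packed & FirstUnitMask; }
constexpr uint32_t diffListOffset(uint32_t Packed) {
  return Packed >> FirstUnitBits;
}

// The AMDGPU figures quoted above fit into the 12 + 20 bit split.
static_assert(firstUnit(packRegUnits(1409, 80041)) == 1409, "unit fits");
static_assert(diffListOffset(packRegUnits(1409, 80041)) == 80041, "offset fits");

// Decodes a regunit list: start at the first regunit, then apply signed
// differences until a 0 terminator (assumed) or the end of the array.
std::vector<uint32_t> decodeRegUnits(uint32_t Packed,
                                     const std::vector<int16_t> &DiffLists) {
  std::vector<uint32_t> Units;
  int32_t Val = static_cast<int32_t>(firstUnit(Packed));
  Units.push_back(static_cast<uint32_t>(Val));
  for (std::size_t I = diffListOffset(Packed);
       I < DiffLists.size() && DiffLists[I] != 0; ++I) {
    Val += DiffLists[I]; // differences may be negative
    Units.push_back(static_cast<uint32_t>(Val));
  }
  return Units;
}
```

Because the first regunit no longer lives in the difflist, a large starting value such as ~45k never has to fit into an int16_t element.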
Changing the encoding method also resulted in a smaller array size; the
numbers are below (I omitted targets with fewer than 100 elements).
```
Target  | Old size | New size | Change
AMDGPU  |    80052 |    78741 |  -1.6%
RISCV   |     6498 |     6297 |  -3.1%
ARM     |     4181 |     3966 |  -5.1%
AArch64 |     2770 |     2592 |  -6.4%
PPC     |     1578 |     1441 |  -8.7%
Hexagon |      994 |      740 | -25.6%
R600    |      508 |      398 | -21.7%
VE      |      471 |      459 |  -2.5%
Sparc   |      381 |      363 |  -4.7%
X86     |      326 |      208 | -36.2%
Mips    |      253 |      200 | -20.9%
SystemZ |      186 |      162 | -12.9%
```
Reviewed By: foad, arsenm
Differential Revision: https://reviews.llvm.org/D151036