When IRBuilder is given an insertion position and there is debug-info, it
sets the DebugLoc of newly inserted instructions to the DebugLoc of the
insertion position. Unfortunately, that means if you insert in front of a
debug intrinsic, your "real" instructions get a potentially-misleading
source location from the debug intrinsic. Worse, if you compile with -gmlt to
get source locations but no variable locations, you'll get different source
locations than in a normal -g build, which is silly.
Rectify this with the getStableDebugLoc method, which skips over any debug
intrinsics to find the next "real" instruction. This is the source location
that you would get if you compile with -gmlt, and it remains stable in the
presence of debug intrinsics. The changed tests show a few locations where
this has been happening, for example selecting line-zero locations for
instrumentation on a perfectly valid call site.
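For illustration, the skipping behaviour is roughly the following (a free-standing sketch, not the actual implementation):

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Sketch: starting from an insertion position, skip any debug intrinsics and
// return the DebugLoc of the next "real" instruction (or an empty location if
// we fall off the end of the block).
static DebugLoc getStableDebugLocSketch(BasicBlock &BB,
                                        BasicBlock::iterator It) {
  while (It != BB.end() && isa<DbgInfoIntrinsic>(&*It))
    ++It;
  return It != BB.end() ? It->getDebugLoc() : DebugLoc();
}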
Differential Revision: https://reviews.llvm.org/D159485
This re-applies 75c487602a ([ORC] Add a MachOBuilder utility, use it to build
MachO debug objects), which was reverted in 99e70cc3a5 due to build
failures. The MachOBuilder class has been refactored to fix the errors.
Fix #64600: the current implementation is minimal (see
https://reviews.llvm.org/D83758), and an assignment like
`__TEXT_REGION_ORIGIN__ = DEFINED(__TEXT_REGION_ORIGIN__) ? __TEXT_REGION_ORIGIN__ : 0;`
(used by avr-ld [1]) leads to a value of zero (the default value in `declareSymbol`),
which is unexpected.
Assign orders to symbol assignments and references so that,
for a script-defined symbol, the `DEFINED` results match users'
expectations. I am unclear about GNU ld's exact behavior, but this hopefully
matches it in the majority of cases.
[1]: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/scripttempl/avr.sc
This is the extract side of D159332. The goal is to avoid non-linear costing on patterns where an entire vector is split back into scalars. This is an idiomatic pattern for SLP.
Each vslide operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique extracts, each with a cost linear in LMUL, the overall cost is O(LMUL^2) * VLEN/ETYPE. To avoid the degenerate case, fall back to the stack if we're beyond LMUL2.
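For a rough sense of the scaling (illustrative numbers, not measurements): with VLEN=128, e64 elements, and an m8 source, VL = 8*128/64 = 16, so fully scalarizing via slides does 16 extracts each costing on the order of LMUL=8, roughly 128 units of slide work in total, while the stack lowering pays for one vector store plus 16 scalar loads.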
There's a subtlety here. For this to work, we're *relying* on an optimization in LegalizeDAG which tries to reuse the stack slot from a previous extract. In practice, this appears to trigger for patterns within a block, but if we ended up with an explode idiom split across multiple blocks, we'd still be in quadratic territory. I don't think that variant is fixable within SDAG.
It's tempting to think we can do better than going through the stack, but, well, I haven't found it yet if it exists. Here are the results for sifive-x280 on all the variants I wrote (all 16 x i64 with V):
output/sifive-x280/linear_decomp_with_slidedown.mca:Total Cycles: 20703
output/sifive-x280/linear_decomp_with_vrgather.mca:Total Cycles: 23903
output/sifive-x280/naive_linear_with_slidedown.mca:Total Cycles: 21604
output/sifive-x280/naive_linear_with_vrgather.mca:Total Cycles: 22804
output/sifive-x280/recursive_decomp_with_slidedown.mca:Total Cycles: 15204
output/sifive-x280/recursive_decomp_with_vrgather.mca:Total Cycles: 18404
output/sifive-x280/stack_by_vreg.mca:Total Cycles: 12104
output/sifive-x280/stack_element_by_element.mca:Total Cycles: 4304
I am deliberately excluding scalable vectors. It functionally works, but frankly, the code quality for an idiomatic explode loop is so terrible either way that it felt better to leave that for future work.
Differential Revision: https://reviews.llvm.org/D159375
This adds code to the loop rotation transformation to ensure that the
computed block execution counts for the loop bodies are the same before
and after the transformation. This isn't always true in practice, but I
believe this is because of numeric inaccuracies in the BlockFrequency
computation.
The invariants this is modeled on and the heuristic choice of the 0-trip
loop amount are explained in a lengthy comment in the new
`updateBranchWeights()` function.
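As an illustrative example of the invariant (made-up numbers, not the actual heuristic): if BlockFrequencyInfo says the loop body runs 3 times per entry from the preheader, then after rotation the weights on the new guard branch and on the latch's backedge must be chosen so the rotated body still has relative frequency 3; e.g. a guard taken with probability 1 and a backedge taken with probability 2/3 gives 1 / (1 - 2/3) = 3.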
Differential Revision: https://reviews.llvm.org/D157462
This potentially has a slightly positive performance impact, as
std::visit can be implemented as a `switch`-like jump rather than
a series of `if`s.
More importantly, the reader can be confident there is no overlap between the
cases.
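For illustration, the shape of the difference (generic code, not the code touched by this patch):

#include <variant>

struct Load {}; struct Store {}; struct Fence {};
using MemOp = std::variant<Load, Store, Fence>;

// Build an overload set out of lambdas (the usual C++17 idiom).
template <class... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

int classify(const MemOp &Op) {
  // One handler per alternative; the call fails to compile if a case is
  // missing, and two handlers cannot silently match the same alternative the
  // way a chain of ifs can overlap.
  return std::visit(Overloaded{
                        [](const Load &) { return 0; },
                        [](const Store &) { return 1; },
                        [](const Fence &) { return 2; },
                    },
                    Op);
}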
Differential Revision: https://reviews.llvm.org/D158678
Only a subset of the fields of DbgVariable are meaningful at any time,
and some fields are re-used for multiple purposes (for example
FrameIndexExprs is used with a throw-away frame-index of 0 to hold a
single DIExpression without needing to add another member). The exact
invariants must be reverse-engineered by inspecting the actual use of
the class, its imprecise/outdated doc-comment, and some asserts.
Refactor DbgVariable into a sum type by inheriting from std::variant.
This makes the active fields for any given state explicit and removes
the need to re-use fields in disparate contexts. As a bonus, it seems to
reduce the size on my x86_64 linux box from 144 bytes to 96 bytes.
There is some potential cost to `std::get`, as it must check the active
alternative even when context or an assert obviates it. To help
ensure the compiler can optimize out the checks, the patch also adds a
helper `get` method which uses the noexcept `std::get_if`.
Some of the extra cost would also be avoided more cleanly with a
refactor that exposes the alternative types in the public interface,
which will come in another patch.
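A minimal sketch of the pattern, with made-up alternative names rather than DbgVariable's actual members:

#include <cassert>
#include <variant>

struct ValueLoc {};      // e.g. a location described by machine values
struct FrameIndexLoc {}; // e.g. frame-index + DIExpression pairs
struct EntryValueLoc {}; // e.g. an entry-value location

class VariableSketch
    : public std::variant<std::monostate, ValueLoc, FrameIndexLoc,
                          EntryValueLoc> {
  using Base =
      std::variant<std::monostate, ValueLoc, FrameIndexLoc, EntryValueLoc>;

public:
  using Base::Base; // inherit the variant constructors

  // noexcept accessor built on std::get_if: when the caller already knows (or
  // asserts) which alternative is active, this avoids std::get's checked path
  // that may throw std::bad_variant_access.
  template <typename T> T &get() noexcept {
    T *Ptr = std::get_if<T>(static_cast<Base *>(this));
    assert(Ptr && "requested alternative is not active");
    return *Ptr;
  }
};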
Differential Revision: https://reviews.llvm.org/D158675
* Assert no completions for tests that should not find completions.
* Remove regex mode from complete_from_to, which was unused.
This exposed bugs in two of the tests: target stop-hook and
process unload. These were fixed in previous commits but
couldn't be tested properly until this patch.
As per the stack of patches this is attached to, allow users of
BasicBlock::splitBasicBlock to provide an iterator for a position, instead
of just an instruction pointer. This is to fit with my proposal for how to
get rid of debug intrinsics [0]. There are other call-sites that would need
to change, but this is sufficient for a stage2 clang self-host and some
other C++ projects to build identical binaries, in the context of the whole
remove-DIs project.
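At a call-site the new spelling looks roughly like this (a hedged fragment, not code from this patch):

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

static BasicBlock *splitAt(BasicBlock *BB, Instruction *SplitPt) {
  // Previously spelled BB->splitBasicBlock(SplitPt, "tail"); passing the
  // iterator instead lets the iterator type carry extra positioning
  // information once debug intrinsics are gone.
  return BB->splitBasicBlock(SplitPt->getIterator(), "tail");
}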
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152545
This is a follow-on to D158753, and allows the lowering of a
transfer read/write of n-D vectors with a single trailing scalable dimension
to primitive vector ops.
The final conversion to LLVM depends on D158517 and D158752; without
these patches, type conversion will fail (or an assert is hit in the LLVM
backend) if the final IR contains an array of scalable vectors.
This patch adds `transform.apply_patterns.vector.lower_create_mask`
which allows the lowering of vector.create_mask/constant_mask to be
tested independently of --convert-vector-to-llvm.
Reviewed By: c-rhodes, awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D159482
It's important that the arch directory be included first so that
its header files, which interpose on the default include dir,
are included instead of the default ones. The clang driver [1] does
this when not building with -nostdinc; the libcxx build should
do the same.
We found this after https://reviews.llvm.org/D154282 when cross-compiling
from non-Linux to Linux. If the host machine was not Linux,
_LIBCPP_HAS_NO_TIME_ZONE_DATABASE would be defined in the default include
dir's __config_site, while it was undefined in the arch-specific one,
causing build failures.
I would like to steal one of these bits to denote whether a kind may be
spilled by the register allocator or not, but I'm afraid to touch any of
this code using bitwise operations.
Make flags a first-class type using bitfields, rather than laundering data
around via `unsigned`.
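A generic sketch of the intent (field names are illustrative, not the actual flags):

#include <cstdint>

// Before: flags laundered around as a bare `unsigned`, manipulated with
// hand-rolled masks and shifts at each use site.
//
// After: a first-class flags type where each bit has a name, and where a new
// bit (say, "may be spilled by the register allocator") has an obvious home.
struct KindFlags {
  uint8_t IsImmutable : 1;
  uint8_t IsRenamable : 1;
  uint8_t MaySpill : 1; // hypothetical new bit this refactor makes room for
  uint8_t Unused : 5;
};

static_assert(sizeof(KindFlags) == 1, "still packs into one byte");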
As noted in
https://github.com/llvm/llvm-project/pull/65392#discussion_r1316259471,
when lowering an extract of a fixed-length vector from another vector,
we don't need to perform the vslidedown on the full vector type. Instead,
we can extract the smallest subregister that contains the subvector to
be extracted and perform the vslidedown with a smaller LMUL. E.g., with
+Zvl128b:
v2i64 = extract_subvector nxv4i64, 2
is currently lowered as
vsetivli zero, 2, e64, m4, ta, ma
vslidedown.vi v8, v8, 2
This patch shrinks the vslidedown to LMUL=2:
vsetivli zero, 2, e64, m2, ta, ma
vslidedown.vi v8, v8, 2
This works because we know that there are at least 128*2=256 bits in v8 at
LMUL=2, and we only need the first 256 bits to extract a v2i64 at index 2.
lowerEXTRACT_VECTOR_ELT already has this logic, so this patch extracts it out
and reuses it.
I've split this out into a separate PR rather than include it in #65392,
with the hope that we'll be able to generalize it later.
This patch refactors extract_subvector lowering to lower to
extract_subreg directly, and to shortcut whenever the index is 0 when
extracting a scalable vector. This doesn't change any of the existing
behaviour, but makes an upcoming patch that extends the scalable path
slightly easier to read.
Extend SPIR-V target serialization and deserialization to handle coop
matrix types. Add a roundtrip test. In addition to `FileCheck` checks,
the resulting spirv binary also passes `spirv-val` (external tool).
Also fix a type attribute bug surfaced by the `CooperativeMatrixLength`
op.
To keep the scope of this patch small, multiple matrix operand attributes
will be handled in a future patch.
Some functions in Process were using LLDB_INVALID_ADDRESS instead of
LLDB_INVALID_TOKEN.
The only visible effect of this appears to be that "process unload
<tab>" would complete to 0 even after the image was unloaded, since the
command checks for LLDB_INVALID_TOKEN.
Everything else worked somehow. I've added a check to the existing
load/unload tests anyway.
The tab completion cannot be checked as-is, but it will be tested when I
make the completion tests stricter in a later patch.
After commit 610ec954e1 ("[clang] allow const structs/unions/arrays to
be constant expressions for C"), attempts to evaluate
structs/unions/arrays as constants are also performed for C++98 and
C++03.
An assertion was being tripped because the potentially-partially
evaluated value was not being reset for those two language modes. Make
sure to reset it now for all C++ modes.
Fixes: #65784
This is partly a precommit for an upcoming patch, and partly to remove
the fixed-length LMUL restriction, similarly to what was done in
https://reviews.llvm.org/D158270, since it's no longer that relevant.
The upstream commit https://reviews.llvm.org/D151590
added a new flag to mark target-specific compiler options.
A side effect of this was that when -### or -v is used without any
input file, clang started emitting an error.
This happened because no compilation actions are created
that could consume/verify these target-specific options.
This patch changes that error to a warning about an unused option when
there are no actions, and still generates an error when there are actions.
Fix for https://github.com/llvm/llvm-project/issues/64958
Differential Revision: https://reviews.llvm.org/D159361
As per my proposal for how to eliminate debug intrinsics [0], for various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. The call-sites where I've changed the
spelling are those necessary to build a stage2 clang that produces an
identical binary in the coming no-debug-intrinsics mode.
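The change of spelling at a call-site is mechanical; a hedged, free-standing example (not actual InstCombine code):

#include "llvm/IR/IRBuilder.h"
using namespace llvm;

static void setInsertionPoint(IRBuilder<> &Builder, Instruction *I) {
  // Old spelling: Builder.SetInsertPoint(I);
  // New spelling: pass the block plus an iterator, so the iterator can later
  // encode debug-info positioning once debug intrinsics are removed.
  Builder.SetInsertPoint(I->getParent(), I->getIterator());
}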
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
Re-land commit 3787fd942f
This patch adds support for storing OpenMP REQUIRES information in the
semantics symbols for programs/subprograms and modules/submodules, and
populates them during directive resolution. A pass is added to name resolution
that makes sure this information is also propagated across top-level programs,
functions and subprograms.
Storing REQUIRES information inside of semantics symbols will also allow
supporting the propagation of this information across Fortran modules. This
will come as a separate patch.
The `bool DirectiveAttributeVisitor::Pre(const parser::SpecificationPart &x)`
method is removed since it resulted in specification parts being visited twice.
This is patch 3/5 of a series splitting D149337 to simplify review.
Differential Revision: https://reviews.llvm.org/D157983
Querying the debug_str_offsets section requires parsing the top level DIE of the
CU (as well as the section itself); the current getter, however, assumes this is
done elsewhere. This patch changes the getter behavior to match what is done in
other getter methods (e.g. `getCompilationDir` or `getVariableForAddress`), in
other words, `extractDIEsIfNeeded` is now called prior to returning the
debug_str_offsets contributions for the Unit.
One way in which this bug manifested is when `dwarfdump --debug-str-offsets` is
invoked: because the DIEs are never parsed, we incorrectly print an empty
section (with no warnings or errors).
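Roughly, the fixed getter now has this shape (a sketch only; apart from extractDIEsIfNeeded, the names are approximations, not necessarily the real API):

#include "llvm/DebugInfo/DWARF/DWARFUnit.h"
using namespace llvm;

// Sketch: force DIE extraction before reading the unit's debug_str_offsets
// contribution, instead of assuming some earlier caller already parsed it.
// (The accessor name below is an approximation of the real getter.)
static auto getStrOffsetsContribution(DWARFUnit &U) {
  U.extractDIEsIfNeeded(/*CUDieOnly=*/true);
  return U.getStringOffsetsTableContribution();
}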
Differential Revision: https://reviews.llvm.org/D159484
Currently no attribute has PrintOnLeft set, which results in an empty
switch statement. When compiling this, MSVC issues a warning that the
switch is empty. Fix this by using a macro and checking whether that
macro is defined.
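A generic sketch of the pattern (the macro and enum names are hypothetical, not the ones the generated code uses):

#include <cstdio>

enum class AttrKind { None, Alpha, Beta }; // illustrative

void printOnLeft(AttrKind Kind) {
  // If no attribute has PrintOnLeft set, the generated switch would contain
  // no cases at all, which MSVC warns about. Only emitting the switch when a
  // macro says cases exist sidesteps the warning.
#ifdef HYPOTHETICAL_HAVE_PRINT_ON_LEFT
  switch (Kind) {
  case AttrKind::Alpha:
    std::printf("alpha ");
    break;
  default:
    break;
  }
#else
  (void)Kind; // no PrintOnLeft attributes: nothing to do
#endif
}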
Links to D157394