This adds a register "svg" which mirrors SVE's "vg" register.
This reports the streaming vector length at all times, read
from the ZA ptrace header.
This register is needed first to implement ZA resizing as
the streaming vector length changes. Like vg, svg will be
expedited to the client so it can reconfigure its register
definitions.
The other use is for users to be able to know the streaming
vector length without resorting to counting the (many, many)
bytes in ZA, or temporarily entering streaming mode (which
would be destructive).
Some refactoring has been done so we don't have to recalculate the
register offsets twice.
Testing for this will come in a later patch.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D159503
Note: This requires later commits for ZA to function properly,
it is split for ease of review. Testing is also in a later patch.
The "Matrix" part of the Scalable Matrix Extension is a new register
"ZA". You can think of this as a square matrix made up of scalable rows,
where each row is one scalable vector long. However it is not made
of the existing scalable vector registers, it is its own register.
Meaning that the size of ZA is the vector length in bytes * the vector
length in bytes.
https://developer.arm.com/documentation/ddi0616/latest/
It uses the streaming vector length, even when streaming mode itself
is not active. For this reason, it's register data header always
includes the streaming vector length.
Due to it's size I've changed kMaxRegisterByteSize to the maximum
possible ZA size and kTypicalRegisterByteSize will be the maximum
possible scalable vector size. Therefore ZA transactions will cause heap
allocations, and non ZA registers will perform exactly as before.
ZA can be enabled and disabled independently of streaming mode. The way
this works in ptrace is different to SVE versus streaming SVE. Writing
NT_ARM_ZA without register data disables ZA, writing NT_ARM_ZA with
register data enables ZA (LLDB will only support the latter, and only
because it's convenient for us to do so).
https://kernel.org/doc/html/v6.2/arm64/sme.html
LLDB does not handle registers that can appear and dissappear at
runtime. Rather than add complexity to implement that, LLDB will
show a block of 0s when ZA is disabled.
The alternative is not only updating the vector lengths every stop,
but every register definition. It's possible but I'm not sure it's worth
pursuing.
Users should refer to the SVCR register (added in later patches)
for the final word on whether ZA is active or not.
Writing to "VG" during streaming mode will change the size of the
streaming sve registers and ZA. LLDB will not attempt to preserve
register values in this case, we'll just read back the undefined
content the kernel shows. This is in line with, as stated, the
kernel ABIs and the prospective software ABIs look like.
ZA is defined as a vector of size SVL*SVL, so the display in lldb
is very basic. A giant block of values. This is no worse than SVE,
just larger. There is scope to improve this but that can wait
until we see some use cases.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D159502
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.
This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
If there are copy instructions between uaddlv with v4i16/v8i16 and dup
for transfer from gpr to fpr, try to remove them with duplane. It is a
follow-up patch of https://reviews.llvm.org/D159267
Running:
$ clang-format -i $(find -regex "\./lldb/.*\.c") $(find -regex
"\./lldb/.*\.cpp") $(find -regex "\./lldb/.*\.h")
Resulted in:
1602 files changed, 25090 insertions(+), 25849 deletions(-)
(note: this includes tests which we wouldn't format, just using this as
an example)
The vast majority of which were whitespace changes. So as far as
formatting we're not deviating from llvm for any reason other than not
churning old code.
Formatting aside, the major features of lldb (single line if, early
return) are all reflected in llvm's style. We differ mainly on variable
naming (proposed to change in
https://llvm.org/docs/Proposals/VariableNames.html anyway) and use of
asserts. Which was already documented.
In line with #66515, change `MutableArrayRange::begin`/`end` to
enumerate `OpOperand &` instead of `Value`. Also remove
`ForOp::getIterOpOperands`/`setIterArg`, which are now redundant.
Note: `MutableOperandRange` cannot be made a derived class of
`indexed_accessor_range_base` (like `OperandRange`), because
`MutableOperandRange::assign` can change the number of operands in the
range.
This removes `CreateMalloc` from `CallInst` and adds it to the `IRBuilderBase`
class.
We no longer needed the `Instruction *InsertBefore` and
`BasicBlock *InsertAtEnd` arguments of the `createMalloc` helper
function because we're using `IRBuilder` now. That's why I we also don't
need 4 `CreateMalloc` functions, but only two.
Differential Revision: https://reviews.llvm.org/D158861
There are some issues in `llvm-libgcc` before this patch:
Commit c5a20b518203613497fa864867fc232648006068 ([llvm-libgcc] initial
commit)
uses `$<TARGET_OBJECTS:unwind_static>` to get libunwind objects, which
is empty.
The built library is actually a shared version of libclang_rt.builtins.
When configuring with `llvm/CMakeLists.txt`, target `llvm-libgcc`
requires a
corresponding target in `llvm-libgcc/CMakeLists.txt`.
Per target installation is not handled by `llvm-libgcc`, which is not
consistent
with `libunwind`.
This patch fixes those issues by:
Reusing target `unwind_shared` in `libunwind`, linking
`compiler-rt.builtins`
objects into it, and applying version script.
Adding target `llvm-libgcc`, creating symlinks, and utilizing cmake's
dependency
and component mechanism to ensure link targets will be built and
installed along
with symlinks.
Mimicking `libunwind` to handle per target installation.
It is quite hard to set necessary options without further modifying the
order of
runtime projects in `runtimes/CMakeLists.txt`. So though this patch
reveals the
possibility of co-existence of `llvm-libgcc` and
`compiler-rt`/`libunwind` in
`LLVM_ENABLE_RUNTIMES`, we still inhibit it to minimize influence on
other
projects, considering that `llvm-libgcc` is only intended for toolchain
vendors,
and not for casual use.
The CHECK2 test in code_placement_ext_tsp_large.ll now has the same
result as
the CHECK test: when chain(0,2,3,4,1) is merged with chain(8), the
result is now
chain(0,2,3,4,8,1).
Ideally we should have test coverage for
-ext-tsp-chain-split-threshold=1, but
it seems challenging to craft one. Perhaps the default value of
-ext-tsp-chain-split-threshold can be decreased as the
-ext-tsp-enable-chain-split-along-jumps heuristic is now more powerful.
* Moves several orphaned methods from Operation/OpView -> _OperationBase
so that both hierarchies share them (whether unknown or known to ODS).
* Adds typing information for missing `MLIRError` exception.
* Adds `DiagnosticInfo` typing.
* Adds `DenseResourceElementsAttr` typing that was missing.
When doing a clean build from vscode, it makes the subdirectories in the
source tree rather than in the build folder. Elsehwere in LLVM, they
prefix the MAKE_DIRECTORY calls, so this appears to be the correct
approach.
After commit cedf2ea, `RISCVMergeBaseOffset` can handle `BlockAddress`
currently. But we didn't handle it in `PrintAsmMemoryOperand` so we
get `invalid operand in inline asm` error.
This patch fixes the error.
D159460 regressed the bugfix in D156644. Fix that and emit a warning.
Add a test case.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159529
Enable usage where capturing AsmState is good (e.g., avoiding creating AsmState over and over again when walking IR and printing).
This also only changes one C API to verify plumbing. But using the AsmState makes the cost more explicit than the flags interface (which hides the traversals and construction here) and also enables a more efficient usage C side.
When inserting prolgue/epilogue, we use the spimm of CM.PUSH/POP to
reduce the following offset for sp. Previously, we tried to use the free
space of the push stack to minimize the following sp-offset. But it's
useless, since the free space must be less than 16 and required stack should
be aligned to 16 before/after the adjustment.
There is an issue: https://github.com/llvm/llvm-project/issues/64612
This issue happens because in RISCVMCCodeEmitter::getImmOpValue it only handles MCExpr kind Target and SymbolRef.
When code with format like .+ comes in, it comes with MCExpr kind Binary, the fixupkind remains fixup_riscv_invalid and reports error.
This patch make MCExpr kind Binary handled with the same way as MCExpr kind SymbolRef,
so code with binary expression can get correct fixupkind and be used to generate more complex relocation.
Differential Revision: https://reviews.llvm.org/D157694
Watchpoints in lldb can be either 'read', 'write', or 'read/write'. This
is exposing the actual behavior of hardware watchpoints. gdb has a
different behavior: a "write" type watchpoint only stops when the
watched memory region *changes*.
A user is using a watchpoint for one of three reasons:
1. Want to find what is changing/corrupting this memory.
2. Want to find what is writing to this memory.
3. Want to find what is reading from this memory.
I believe (1) is the most common use case for watchpoints, and it
currently can't be done in lldb -- the user needs to continue every time
the same value is written to the watched-memory manually. I think gdb's
behavior is the correct one. There are some use cases where a developer
wants to find every function that writes/reads to/from a memory region,
regardless of value, I want to still allow that functionality.
This is also a bit of groundwork for my large watchpoint support
proposal
https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116
where I will be adding support for AArch64 MASK watchpoints which watch
power-of-2 memory regions. A user might ask to watch 24 bytes, and a
MASK watchpoint stub can do this with a 32-byte MASK watchpoint if it is
properly aligned. And we need to ignore writes to the final 8 bytes of
that watched region, and not show those hits to the user.
This patch adds a new 'modify' watchpoint type and it is the default.
rdar://108234227
clang was crashing on a lambda attribute with a statement expression
that contained a `return`.
It attempted to access the lambda type which was unknown at that point.
Fixes https://github.com/llvm/llvm-project/issues/48527
Given a function like the following: https://godbolt.org/z/T9c99fr88
```c
1161_noReadWrite(int *Preds) {
for (int i = 0; i < LEN_1D-1; ++i) {
if (Preds[i] != 0)
b[i] = c[i] + 1;
else
a[i] = i * i;
}
}
```
LLVM will optimize the IR to a single store by a phi instruction:
```llvm
%1 = load ptr, ptr @a, align 64
%2 = load ptr, ptr @b, align 64
...
for.inc:
%.sink = phi ptr [ %1, %if.then ], [ %2, %if.else ]
%add.sink = phi double [ %add, %if.then ], [ %conv8, %if.else ]
%arrayidx7 = getelementptr inbounds double, ptr %.sink, i64 %indvars.iv
store double %add.sink, ptr %arrayidx7, align 8
```
LAA is currently unable to analyze such IR, since ScalarEvolution will
return a SCEVUnknown for the forked pointer operand of the store.
This patch adds initial optional support for analyzing both
possibilities for the pointer and allowing LAA to generate runtime
checks for the bounds if required, refers to D108699, but here address
the phi node.
Fixes https://github.com/llvm/llvm-project/issues/64888
Reviewed By: huntergr-arm, fhahn
Differential Revision: https://reviews.llvm.org/D158965
Bufferization of tensor.reshape generates a memref.reshape operation.
memref.reshape requires the source memref to have an identity layout.
The bufferization process may result in the source memref having a
non-identity layout, resulting in a verification failure.
This change causes the bufferization interface for tensor.reshape to
copy the source memref to a new buffer when the source has a
non-identity layout.