Without this modification, when 'ContinueOnCuIndexOverflow' is enabled, the
forcibly generated '.dwp' is not recognized correctly by the debugger
(gdb 10.1).
<img width="657" alt="image"
src="https://github.com/llvm/llvm-project/assets/150100070/31732775-2548-453a-a47a-fa04c6d05fe9">
It looks like there is a problem with processing the DWARF header, which
makes debugging completely impossible (unless the consumer walks the
debug_info section to rebuild that column; that only helps if debug_info is
the only section that overflowed - if it's another section, there's no
recovery).
**About the patch:**
When llvm-dwp is run with '--continue-on-cu-index-overflow=soft-stop'
and detects the overflow problem, it stops packing and generates
the '.dwp' file immediately, discarding any DWO files that would not fit
within the 32-bit/4 GB limits of the format.
<img width="625" alt="image"
src="https://github.com/llvm/llvm-project/assets/150100070/77d6be24-262b-4f4c-afc0-9a6c49e133c7">
Migrated from: https://reviews.llvm.org/D155879, with some of the
suggestions applied.
PR Description copied from above:
Currently asan simply exports each overridden new/delete function from
the DLL. This works fine normally, but fails if the user is overriding
some, but not all, of these functions. In this case the non-overridden
functions still come from the asan DLL, but they can't correctly call
the user-provided override (for example, sized op delete should fall back
to scalar op delete if a scalar op delete is provided). Things were
also broken in the static build because all the asan overrides were
exported from the same TU, and so if you overrode one but not all of
them you'd get ODR violations. This PR should fix both of these
cases, but the static case isn't really tested (and indeed one such test
does fail) because linking asan statically basically doesn't work on
Windows right now with LLVM's version of asan. In fact, while we did fix
this in our fork, it was a huge mess and we've now made the dynamic
version work in all situations (/MD, /MT, /MDd, /MTd, etc.) instead.
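To make the required fallback concrete, here is a small, self-contained C++ sketch (illustrative only, not part of the patch): a user translation unit overrides just the scalar operator new/delete, and a sized deallocation must still end up in that override rather than in an ASan-provided definition.

```cpp
// Hedged illustration, not the patch itself: the user overrides only the
// scalar operator new/delete; a sized delete call must fall back to the
// user-provided scalar override.
#include <cstdio>
#include <cstdlib>
#include <new>

void *operator new(std::size_t n) {        // user override
  if (void *p = std::malloc(n ? n : 1))
    return p;
  throw std::bad_alloc();
}

void operator delete(void *p) noexcept {   // user override (scalar form only)
  std::puts("user scalar operator delete");
  std::free(p);
}

int main() {
  int *q = new int(42);
  delete q; // even if lowered to sized delete, must reach the scalar override
}
```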
The following is the description from the internal PR that implemented
most of this feature.
> Previously, operator new/delete were provided as DLL exports when
linking dynamically and wholearchived when linked statically. Both
scenarios were broken. When linking statically, the user could not
define their own op new/delete, because they were already brought into
the link by ASAN. When dynamically linking, if the user provided some
but not all of the overloads, new and delete would be partially hooked.
For example, if the user defined scalar op delete, but the program then
called sized op delete, the sized op delete would still be the version
provided by ASAN instead of falling back to the user-defined scalar op
delete, like the standard requires.
> The change <internal PR number>: ASAN operator new/delete fallbacks in
the ASAN libraries fixes this by moving all operator new/delete definitions
to be statically linked. However, this still won't work if
/InferAsanLibs whole-archives everything, since then all the op
new/deletes would always be provided by ASAN, which is why these changes
are necessary.
> With these changes, we will no longer wholearchive all of ASAN and
will leave the C++ parts (the op new/delete definitions) to be included
as a default library. However, it is also necessary to ensure that the
asan library with op new/delete will be searched before the
corresponding CRT library with the same op new/delete definitions. To
accomplish this, we make sure to add the asan library to the beginning
of the default lib list, or move it explicitly to the front if it's
already in the list. If the C runtime library is explicitly provided, we
make sure to warn the user if the current linker line will result in
operator new/delete not being provided by ASAN.
Note that the rearrangement of defaultlibs is not in this diff.
---------
Co-authored-by: Charlie Barto <Charles.Barto@microsoft.com>
Summary:
This pointer has been causing issues. Allocating and reading from coarse
memory on the CPU is not guaranteed and varies depending on the kernel
version and support. Previously we attempted to pin the memory, but this
caused unexpected failures. This should be a legal operation and should work
around the problem, as fine-grained memory should always be legal for both
sides to write to.
A large amount of complexity when it comes to shuffling DPValue objects
around is pushed into BasicBlock::spliceDebugInfo, and it gets
comprehensive testing there via the unit tests. It turns out that there's a
corner case though: splicing instructions and debug-info to the end()
iterator requires blocks of DPValues to be concatenated, but the DPValues
don't behave normally as they're dangling at the end of a block. While this
splicing-to-an-empty-block case is rare, and it's even rarer for it to
contain debug-info, it does happen occasionally.
Fix this by wrapping spliceDebugInfo with an outer layer that removes any
dangling DPValues in the destination block -- that way the main splicing
function (renamed to spliceDebugInfoImpl) doesn't need to worry about that
scenario. See the diagram in the added function for more info.
A zero-length StringRef can have a null data pointer. If such a StringRef is
passed to the llvm_regex functions that take a pointer and a length and then
convert it into a [begin, end) pointer pair, a nullptr+0 expression can be
evaluated, which is UB. Avoid that by ensuring the data pointer is always
non-null, even in the zero-length case.
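A minimal sketch of the defensive pattern (illustrative only, using std::string_view as a stand-in for StringRef rather than the actual llvm_regex change):

```cpp
// Illustrative only: avoid forming nullptr + 0 when converting a possibly
// empty view (whose data() may be null) into a [begin, end) pointer pair.
#include <string_view>

static void useRange(std::string_view s) {
  // A default-constructed view carries a null data pointer; substitute a
  // valid empty string so the pointer arithmetic below is always defined.
  const char *data = s.data() ? s.data() : "";
  const char *begin = data;
  const char *end = data + s.size(); // size() is 0 whenever data() was null
  (void)begin;
  (void)end;
}

int main() {
  useRange(std::string_view{}); // zero-length view with null data()
}
```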
Currently we emit gathers for scalars being vectorized in the tree as
pairs of extractelement/insertelement instructions. Instead, we can try
to find all required vectors and emit shufflevector instructions
directly, improving the code and reducing compile time.
Part of non-power-of-2 vectorization.
Differential Revision: https://reviews.llvm.org/D110978
Summary:
This caused the bots to begin failing. Revert for now to get the bot
green.
This reverts commit 8bea804923a1b028e86b177caccb3258708ca01c.
This reverts commit e1395c7bdbe74b632ba7fbd90e2be2b4d82ee09e.
This fixes a mis-link when mixing compressed and non-compressed input to
LLD. When relaxing calls, we must respect the source file that the
section came from when deciding whether it's legal to use compressed
instructions. If the call in question comes from a non-rvc source, then it will not
expect 2-byte alignments and cascading failures may result.
This fixes https://github.com/llvm/llvm-project/issues/63964. The symptom
seen there is that a later RISCV_ALIGN can't be satisfied and we either
fail an assert or produce a totally bogus link result. (It can be easily
reproduced by putting .p2align 5 right before the nop in the reduced
test case and running check-lld on an assertions-enabled build.) However,
it's important to note this is just one possible symptom of the problem.
If the resulting binary has a runtime switch between rvc and non-rvc
routines (via e.g. ifuncs), then even if we manage to link we may execute invalid
instructions on a machine which doesn't implement compressed instructions.
When we originally added unaligned-scalar-mem and
unaligned-vector-mem, they were separated into two parts under the
theory that some processor might implement one, but not the other. At
the moment, we don't have evidence of such a processor. The C/C++-level
interface and the clang driver command lines have settled on a single
unaligned flag which indicates that both scalar and vector unaligned
accesses are supported.
Given that, let's remove the test matrix complexity for a set of
configurations which don't appear useful.
Given these are internal feature names, I don't think we need to provide
any forward compatibility. Anyone disagree?
Note: The immediate trigger for this patch was finding another case
where unaligned-vector-mem wasn't being properly serialized to IR
from clang, which resulted in problems reproducing assembly from clang's
-emit-llvm feature. Instead of fixing this, I decided getting rid of the
complexity was the better approach.
This basically moves code around again, but this time to provide cleaner
interfaces and remove duplication. PluginAdaptorManagerTy is almost all
gone after this.
Summary:
It may be problematic to pin a stack pointer. Allocate it via the OS
allocator instead, as the documentation suggests.
For some reason, if you attempt to free this pointer after the memory
region has been unlocked, the call reports an invalid pointer.
Following from https://github.com/llvm/llvm-project/pull/73372:
Fuchsia targets currently don't support `float128`. Add detection for
`LIBC_TARGET_OS_IS_FUCHSIA`, and exclude this OS from setting
`LIBC_COMPILER_HAS_FLOAT128_EXTENSION`.
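A minimal sketch of the guard being described, assuming detection is keyed off the compiler's __SIZEOF_FLOAT128__ predefine; the exact condition in the libc headers may differ:

```cpp
// Hedged sketch, not the actual libc header: only claim the float128
// extension when the compiler advertises it and the target OS is not
// Fuchsia, which currently doesn't support float128.
#if defined(__SIZEOF_FLOAT128__) && !defined(LIBC_TARGET_OS_IS_FUCHSIA)
#define LIBC_COMPILER_HAS_FLOAT128_EXTENSION
#endif
```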
At the moment the logic to tile and vectorize `linalg.matmul` is
duplicated in multiple test files:
* matmul.mlir
* matmul_mixed_ty.mlir
Instead, this patch uses `transform.foreach` to apply the same sequence
to multiple functions within the same test file (e.g. `matmul_f32` and
`matmul_mixed_ty` as defined in the original files). This allows us to
merge relevant test files.
This reverts commit 4bf8a688956a759b7b6b8d94f42d25c13c7af130.
This commit seems to be breaking the semantics of the
ObjectFile::isSectionText method, which breaks numba/llvmlite bindings.
Follow up on 9468de4 (TargetInstrInfo: make getOperandLatency return
optional (NFC)) to squelch a signedness warning on MSVC, reported by
Simon Pilgrim.
As part of https://reviews.llvm.org/D154130 the logic of
LocationFileChecker changed slightly to try to get the absolute
external file path instead of the name as requested when the file was
opened, which would be before VFS mappings in our usage. Ensure that we
only check against the name as requested instead of trying to generate
the external canonical file path.
rdar://115195433
When we lower calls, the sequence of argument copy-to-reg nodes is
glued to the smstart. In the InstrEmitter, these glued copies are turned
into implicit defs, since the actual call instruction uses those
physregs, resulting in the register allocator adding unnecessary copies
of regs that are preserved anyway.
The lowering of tosa.conv2d produces an illegal tensor.empty operation
where the number of inputs does not match the number of dynamic dimensions
in the output type.
The fix is to base the generation of tensor.dim operations off the
result type of the conv2d operation, rather than the input type. The
problem and fix are very similar to
https://github.com/llvm/llvm-project/pull/72724, but for convolution.
The current lowering of tosa.fully_connected produces a linalg.matmul
followed by a linalg.generic to add the bias. The IR looks like the
following:
%init = tensor.empty()
%zero = linalg.fill ins(0 : f32) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%zero)
// Add the bias
%initB = tensor.empty()
%result = linalg.generic ins(%prod, %bias) outs(%initB) {
// add bias and product
}
This has two downsides:
1. The tensor.empty operations typically result in additional
allocations after bufferization
2. There is a redundant traversal of the data to add the bias to the
matrix product.
This extra work can be avoided by leveraging the out-param of
linalg.matmul. The new IR sequence is:
%init = tensor.empty()
%broadcast = linalg.broadcast ins(%bias) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%broadcast)
In my experiments, this eliminates one loop and one allocation (post
bufferization) from the generated code.
My initial patch contained a typo, resulting in the wrong value
being checked for non-negativity.
-----
If the lshr operand is non-negative, we can treat it the same
way as an ashr. Ideally we would represent this as "lshr nneg",
but for now just perform the necessary ValueTracking query.
Proof: https://alive2.llvm.org/ce/z/Ahg4ri
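As a small worked illustration of why this is sound (not part of the patch): a non-negative value has its sign bit clear, so the bits an ashr shifts in are all zeros, which is exactly what lshr produces.

```cpp
// Illustrative check only: for a non-negative value, arithmetic and logical
// right shifts agree because the sign bit being shifted in is zero.
#include <cassert>
#include <cstdint>

int main() {
  int32_t x = 0x12345678;                // non-negative: sign bit clear
  uint32_t u = static_cast<uint32_t>(x);
  for (int n = 0; n < 32; ++n)
    assert((x >> n) == static_cast<int32_t>(u >> n)); // ashr == lshr
  return 0;
}
```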
TOSA operators consumed by non-TOSA ops generally do not have their
types inferred, as that would alter the types expected by their
consumers. This prevents type refinement on many TOSA operators when the
IR contains a mix of dialects.
This change modifies tosa-infer-shapes to update the types of all TOSA
operators during inference. When a consumer of a TOSA op is not safe
to update, a tensor.cast back to the original type is inserted. This
behavior is similar to how TOSA ops consumed by func.return are handled.
This allows for more type refinement of TOSA ops, and the additional
tensor.cast operators may be removed by later canonicalizations.
If the lshr operand is non-negative, we can treat it the same
way as an ashr. Ideally we would represent this as "lshr nneg",
but for now just perform the necessary ValueTracking query.
Proof: https://alive2.llvm.org/ce/z/Ahg4ri
…ation
The previous code was technically incorrect in that the type indicated
that the memref only has 1 dimension, while the code below was happily
dereferencing the size array out of bounds. Now, if the compiler doesn't
get too smart about optimizations, this code *might even work*. But if
the compiler realizes that the array has 1 element, it might start doing
silly things. This generates a specialization for each supported rank,
making sure we don't invoke any UB.
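A hedged, self-contained sketch of the hazard and of the shape of the fix (the struct layout and helpers are illustrative, not the actual runtime code): once the descriptor type declares rank 1, indexing its size array past element 0 is undefined behavior, while a rank-specialized descriptor keeps every access in bounds.

```cpp
// Illustrative only: a rank-1 descriptor whose sizes array is indexed with a
// runtime rank. For rank > 1 the loop reads out of bounds, which is UB and
// free for the optimizer to exploit.
#include <cstdint>

struct MemRef1D {            // hypothetical rank-1 descriptor
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[1];          // the type says: exactly one dimension
  int64_t strides[1];
};

int64_t numElements(const MemRef1D &m, int rank) {
  int64_t n = 1;
  for (int i = 0; i < rank; ++i)
    n *= m.sizes[i];         // out of bounds whenever rank > 1
  return n;
}

// Rank-specialized variant: the array bound matches the rank, so there is no
// out-of-bounds access for the optimizer to reason about.
template <int Rank> struct MemRefND {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[Rank];
  int64_t strides[Rank];
};

template <int Rank> int64_t numElements(const MemRefND<Rank> &m) {
  int64_t n = 1;
  for (int i = 0; i < Rank; ++i)
    n *= m.sizes[i];
  return n;
}
```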