The corresponding function definition was removed by:
commit 773d663e4729f55d23cb04f78a9d003643f2cb37
Author: Arthur Eubanks <aeubanks@google.com>
Date: Mon Feb 27 19:00:37 2023 -0800
If a value is already the last element of the worklist, I think we don't have to add it again: there is no need to process it repeatedly.
For some long Triton-generated LLVM IR, this can give a ~100x speedup.
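A minimal sketch of the idea, assuming a simple vector-backed worklist. `Worklist` and `pushIfNotLast` are hypothetical names, not the actual LLVM pass code. Only the back of the worklist is compared, so the check stays O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-backed worklist: a push is skipped when the value is
// already the last element, since processing it twice in a row is redundant.
template <typename T>
class Worklist {
public:
  // Returns true if Value was appended, false if it was already at the back.
  bool pushIfNotLast(const T &Value) {
    if (!Items.empty() && Items.back() == Value)
      return false; // O(1): only the most recent entry is compared
    Items.push_back(Value);
    return true;
  }

  T pop() {
    assert(!Items.empty() && "pop from empty worklist");
    T Value = Items.back();
    Items.pop_back();
    return Value;
  }

  bool empty() const { return Items.empty(); }
  std::size_t size() const { return Items.size(); }

private:
  std::vector<T> Items;
};
```

Because only the tail is checked, a value that sits deeper in the worklist can still be queued again; a full duplicate check would need a set alongside the vector.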
Differential Revision: https://reviews.llvm.org/D153561
Wrapping a warning into a silenceable failure will result in the warning
being interpreted as an error, which it is not.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D153546
When exiting the scope of a region attached to a transform op, clean up
the handle invalidation checks associated with handles defined in this
region. Otherwise, these checks may trigger on the next entry to the
region while there is no incorrect usage.
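A rough model of this bookkeeping, for illustration only: handles are plain integers, each handle may carry deferred checks, and each region remembers the handles it defines so that exiting the region can drop their checks. `InvalidationTracker`, `HandleId`, `enterRegion`, and `exitRegion` are hypothetical names, not the transform dialect's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: a handle is an integer; a check returns true if a
// later use of the handle is valid.
using HandleId = int;
using Check = std::function<bool()>;

class InvalidationTracker {
public:
  void enterRegion() { RegionStack.emplace_back(); }

  // Record a handle defined in the innermost region.
  void defineHandle(HandleId H) {
    assert(!RegionStack.empty() && "must be inside a region");
    RegionStack.back().push_back(H);
  }

  void addCheck(HandleId H, Check C) { Checks[H].push_back(std::move(C)); }

  // On region exit, drop the checks attached to handles defined in the
  // region, so they cannot fire spuriously on the next entry.
  void exitRegion() {
    assert(!RegionStack.empty() && "unbalanced exitRegion");
    for (HandleId H : RegionStack.back())
      Checks.erase(H);
    RegionStack.pop_back();
  }

  std::size_t numChecks(HandleId H) const {
    auto It = Checks.find(H);
    return It == Checks.end() ? 0 : It->second.size();
  }

private:
  std::vector<std::vector<HandleId>> RegionStack;
  std::map<HandleId, std::vector<Check>> Checks;
};
```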
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D153545
When the sign of either of the operands is known, it is possible to
determine what the saturating value will be without having to compute it
using the sign bits.
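To illustrate the reasoning for signed saturating addition (a plain C++ sketch on `int32_t`, not the actual LLVM lowering): a signed add can only overflow when both operands share a sign, so if operand A is known non-negative any overflow is upward and saturates to the maximum, and if A is known negative it saturates to the minimum. `KnownSign` is a hypothetical stand-in for known-bits information.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for what known-bits analysis says about A's sign.
enum class KnownSign { Unknown, NonNegative, Negative };

// Wrapping 32-bit add done in unsigned arithmetic to avoid signed-overflow UB.
int32_t wrapAdd(int32_t A, int32_t B) {
  return static_cast<int32_t>(static_cast<uint32_t>(A) +
                              static_cast<uint32_t>(B));
}

int32_t saddSat(int32_t A, int32_t B, KnownSign SignOfA) {
  constexpr int32_t Max = std::numeric_limits<int32_t>::max();
  constexpr int32_t Min = std::numeric_limits<int32_t>::min();
  assert(SignOfA == KnownSign::Unknown ||
         (SignOfA == KnownSign::Negative) == (A < 0));
  int32_t Sum = wrapAdd(A, B);
  // Overflow: both operands share a sign and the result's sign differs.
  bool Overflow = ((A < 0) == (B < 0)) && ((Sum < 0) != (A < 0));
  if (!Overflow)
    return Sum;
  switch (SignOfA) {
  case KnownSign::NonNegative:
    return Max; // with A >= 0, overflow can only be upward
  case KnownSign::Negative:
    return Min; // with A < 0, overflow can only be downward
  case KnownSign::Unknown:
    break;
  }
  // Generic path: positive overflow wraps to a negative sum, so the
  // saturation value is chosen from the sign bit of the wrapped result.
  return Sum < 0 ? Max : Min;
}
```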
Differential Revision: https://reviews.llvm.org/D153575
This makes the bytecode reader/writer work on big-endian platforms.
The only problem was related to encoding of multi-byte integers,
where both reader and writer code make implicit assumptions about
endianness of the host platform.
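The general way to remove such an assumption is to serialize multi-byte integers with explicit shifts into a fixed byte order instead of copying the host's in-memory representation. A minimal sketch assuming a little-endian on-disk order; `writeLE64`, `readLE64`, and `writeHostOrder64` are hypothetical helpers and do not reproduce the actual MLIR bytecode integer encoding.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Host-independent: byte I always holds bits [8*I, 8*I + 8) of the value.
std::array<uint8_t, 8> writeLE64(uint64_t Value) {
  std::array<uint8_t, 8> Bytes{};
  for (int I = 0; I < 8; ++I)
    Bytes[I] = static_cast<uint8_t>(Value >> (8 * I));
  return Bytes;
}

uint64_t readLE64(const std::array<uint8_t, 8> &Bytes) {
  uint64_t Value = 0;
  for (int I = 0; I < 8; ++I)
    Value |= static_cast<uint64_t>(Bytes[I]) << (8 * I);
  return Value;
}

// Host-dependent (the kind of implicit assumption being fixed): this only
// produces little-endian output when the host itself is little-endian.
std::array<uint8_t, 8> writeHostOrder64(uint64_t Value) {
  std::array<uint8_t, 8> Bytes{};
  std::memcpy(Bytes.data(), &Value, sizeof(Value));
  return Bytes;
}
```

On a big-endian host such as s390x, `writeHostOrder64` and `writeLE64` disagree, while `writeLE64`/`readLE64` round-trip identically on every host.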
This fixes the current test failures on s390x, and in addition allows
removing the UNSUPPORTED markers from all other bytecode-related
test cases: they now also all pass on s390x.
Also adding a GTEST_SKIP to the MultiModuleWithResource unit test,
as this still fails due to an unrelated endian bug regarding
decoding of external resources.
Differential Revision: https://reviews.llvm.org/D153567
Reviewed By: mehdi_amini, jpienaar, rriddle
Use hlfir::loadTrivialScalars to dereference pointers and allocatables, and to
load numerical and logical scalars.
This has a small fallout on tests:
- the load is done on the HLFIR entity (#0 of hlfir.declare) and not the FIR one (#1). This makes no difference at the FIR level (#1 and #0 only differ to account for assumed and explicit shape lower bounds).
- loadTrivialScalars gets rid of the allocatable fir.box for monomorphic scalars
(it is not needed). This exposed a bug in the lowering of MERGE with
a polymorphic and a monomorphic argument: when the monomorphic argument is not
a fir.box, the polymorphic fir.class should not be reboxed; instead, its
address should be read.
Reviewed By: tblah
Differential Revision: https://reviews.llvm.org/D153252
- The AST of the function we're currently analyzing
- The CFG
- The CFG element we're currently processing
Reviewed By: ymandel
Differential Revision: https://reviews.llvm.org/D153549
The declaration was added without a corresponding class definition by:
commit 13bb8f491a1cb429226768cfd4ca6bcea3b938dd
Author: Stella Laurenzo <laurenzo@google.com>
Date: Wed Apr 3 11:16:32 2019 -0700
The declaration was added without a corresponding class definition by:
commit a84064bcda1a737658d33e96ca58516d01af70a6
Author: Florian Hahn <flo@fhahn.com>
Date: Wed Dec 21 22:02:31 2022 +0000
It is most likely a misspelling of PredicatedScalarEvolution.
The declaration was added without a corresponding function by:
commit cc3bb85580189d4a004cfd9bd2d6286cd1c1169f
Author: James Nagurne <j-nagurne@ti.com>
Date: Fri Oct 22 17:08:16 2021 -0500
Instead of dumping all sources into the RTXray object library with a weird
special case for x86, handle multiarch builds better: build a separate
object library for each arch with its arch-specific sources, then link
in all those libraries.
This fixes the build on platforms that produce fat binaries, such as newer
macOS, which expects both x86_64 and aarch64 objects in the same library
since Apple Silicon is a thing.
This only enables building XRay support for Apple Silicon. It does not
actually work yet on macOS, neither on Intel nor on Apple Silicon CPUs.
Thus the tests are still disabled.
Reviewed By: MaskRay, phosek
Differential Revision: https://reviews.llvm.org/D153221
CMake plumbing cargo culted from other tests.
Minor changes to Process to allow statically allocating a buffer.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D153594
`TargetGlobalTLSAddress` is not considered and handled correctly when matching addressing mode, which leads to an incorrect result of instruction selection.
Fixes #63162.
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D153103
The Objective-C runtime and the shared cache have changed slightly.
Given a class_ro_t, the baseMethods ivar is now a pointer union and may
either be a method_list_t pointer or a pointer to a relative list of
lists. The entries of this relative list of lists are indexes that refer
to a specific image in the shared cache in addition to a pointer offset
to find the accompanying method_list_t. We have to go over each of these
entries, parse it, and then if the relevant image is loaded in the
process, we add those methods to the relevant clang Decl.
To determine whether a particular image is loaded, we use a symbol that
the Objective-C runtime exposes for that purpose.
We maintain a data structure SharedCacheImageHeaders to keep track of
that information.
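To make the shape of that walk concrete, here is a sketch under loudly stated assumptions: each entry is taken to pack a 16-bit image index in its low bits and a signed 48-bit byte offset, relative to the entry's own address, in the remaining bits. That layout and the names `RelativeListEntry`, `decodeEntry`, and `collectLoadedLists` are hypothetical illustrations, not the actual objc4 or LLDB definitions; the set of loaded images stands in for what SharedCacheImageHeaders tracks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical decoded entry of a "relative list of lists".
struct RelativeListEntry {
  uint16_t ImageIndex; // shared-cache image that owns the method list
  int64_t ListOffset;  // signed byte offset from the entry's address
};

// ASSUMED layout: low 16 bits = image index, high 48 bits = signed offset.
RelativeListEntry decodeEntry(uint64_t Raw) {
  RelativeListEntry E;
  E.ImageIndex = static_cast<uint16_t>(Raw & 0xFFFF);
  uint64_t Field = Raw >> 16; // 48-bit offset field
  if (Field & (1ULL << 47))   // sign-extend from bit 47
    Field |= ~((1ULL << 48) - 1);
  E.ListOffset = static_cast<int64_t>(Field);
  return E;
}

// Return the addresses of method lists whose image is loaded; lists that
// belong to images not loaded in the process are skipped.
std::vector<uint64_t> collectLoadedLists(uint64_t FirstEntryAddr,
                                         const std::vector<uint64_t> &RawEntries,
                                         const std::set<uint16_t> &LoadedImages,
                                         uint64_t EntrySize = 8) {
  std::vector<uint64_t> Result;
  for (std::size_t I = 0; I < RawEntries.size(); ++I) {
    RelativeListEntry E = decodeEntry(RawEntries[I]);
    if (LoadedImages.count(E.ImageIndex) == 0)
      continue;
    uint64_t EntryAddr = FirstEntryAddr + I * EntrySize;
    // Unsigned wraparound makes adding a negative offset work as expected.
    Result.push_back(EntryAddr + static_cast<uint64_t>(E.ListOffset));
  }
  return Result;
}
```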
There is a known issue where if an image is loaded after we create a
Decl for a class, the Decl will not have the relevant methods from that
image (i.e. for Categories).
rdar://107957209
Differential Revision: https://reviews.llvm.org/D153597
On AArch64, object files may be greater than 2^32 bytes. If an
offset is greater than the max value of a 32-bit unsigned integer,
LLVM silently truncates the offset. Instead, make it return an
error.
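The pattern is a checked narrowing: compare the 64-bit offset against the 32-bit limit before converting, and report failure instead of letting the cast wrap. A minimal sketch; `truncatingOffset32` and `checkedOffset32` are hypothetical helpers, and `std::optional` stands in for LLVM's own error-reporting types.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// The bug pattern: offsets of 2^32 or more silently wrap.
uint32_t truncatingOffset32(uint64_t Offset) {
  return static_cast<uint32_t>(Offset);
}

// Checked version: refuse offsets that do not fit in 32 bits, so the caller
// can turn the failure into an error instead of using a wrong offset.
std::optional<uint32_t> checkedOffset32(uint64_t Offset) {
  if (Offset > std::numeric_limits<uint32_t>::max())
    return std::nullopt;
  return static_cast<uint32_t>(Offset);
}
```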
Differential Revision: https://reviews.llvm.org/D153494
I tried to give a rough overview of our current pseudo structure. I'm mostly focused on the policy handling bits - since that's what I'm in the process of changing - but touched on the other dimensions in the process of framing it.
Differential Revision: https://reviews.llvm.org/D152937
Order code sections with names in the form of ".text.cold.i" based on the value of i
[Context] SplitFunctions.cpp implements splitting strategies that can potentially split each function into at most N fragments, where N > 2.
When such N-way splitting happens, new code sections named ".text.cold.1", ..., ".text.cold.i", ..., ".text.cold.N-2" will be created.
A section named ".text.cold.i" contains the (i+2)th fragment of each function.
As an example, if each function is split into N=3 fragments (hot, warm, and cold), then the code sections will include
- a section with name ".text" containing hot fragments
- a section with name ".text.cold" containing warm fragments
- a section with name ".text.cold.1" containing cold fragments
The order of these new sections in the output binary currently depends on the order in which they are encountered by the emitter.
For example, under 3-way splitting, if the first function is split 2-way into hot and cold and the second function is split 3-way into hot, warm, and cold,
then the cold fragment is encountered first, resulting in the following final section order:
.text (hot), .text.cold.1 (cold), .text.cold (warm)
The above is suboptimal because the distance of jumps/calls between the hot and the warm sections will be much bigger than when ordering the sections as follows
.text (hot), .text.cold (warm), .text.cold.1 (cold)
This diff orders the sections with names in the form of ".text.cold" or ".text.cold.i" based on the value of i (assuming the i-value of ".text.cold" is 0).
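A sketch of the ordering key described above: ".text.cold" maps to 0, ".text.cold.i" maps to i, and sections are stably sorted by that key. `coldSectionIndex` and `orderColdSections` are hypothetical names, not BOLT's actual functions; names that do not match get key -1, so ".text" sorts first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// ".text.cold" -> 0, ".text.cold.i" -> i, anything else -> -1.
long coldSectionIndex(const std::string &Name) {
  const std::string Base = ".text.cold";
  if (Name == Base)
    return 0;
  const std::string Prefix = Base + ".";
  if (Name.size() <= Prefix.size() ||
      Name.compare(0, Prefix.size(), Prefix) != 0)
    return -1;
  const std::string Digits = Name.substr(Prefix.size());
  if (!std::all_of(Digits.begin(), Digits.end(),
                   [](char C) { return C >= '0' && C <= '9'; }))
    return -1;
  return std::stol(Digits);
}

// Order sections by numeric i-value, independent of the order in which the
// emitter first encountered them.
void orderColdSections(std::vector<std::string> &Names) {
  std::stable_sort(Names.begin(), Names.end(),
                   [](const std::string &A, const std::string &B) {
                     return coldSectionIndex(A) < coldSectionIndex(B);
                   });
}
```

Comparing the parsed integer rather than the raw string keeps ".text.cold.2" ahead of ".text.cold.10", which a lexicographic sort would get wrong.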
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D152941
When optimization passes do not change anything, skip their diagnostics
output. NFC otherwise.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D153386
I noticed that in some cases _tolower shows up as uninstrumented, so I've added it as "functional" in the done_abilist.txt file.
Reviewed by: browneee
Differential Revision: https://reviews.llvm.org/D153410
This workaround appears to apply with gold<2.34 -O2/-O3 (linker -O2, not
compiler driver -O2). This used to be more visible as we used -Wl,-O3 in
CMake, but the option is generally not recommended and has been removed
by d63016a86548e8231002a760bbe9eb817cd1eb00 (Dec 2021).
This finishes a workaround removal work started by D64327 (2019).
Link: https://github.com/llvm/llvm-project/issues/45269
In getNVPTXLaneID(CodeGenFunction &), LaneIDBits can be 4294967295, because the call llvm::Log2_32(CGF.getTarget().getGridValue().GV_Warp_Size) returns 4294967295 (-1 as unsigned) when the warp size is 0:
```
unsigned LaneIDBits =
    llvm::Log2_32(CGF.getTarget().getGridValue().GV_Warp_Size);
unsigned LaneIDMask = ~0u >> (32u - LaneIDBits);
```
The shift amount (32u - LaneIDBits) might then be 33 (the unsigned subtraction wraps around), and right-shifting a 32-bit value by more than 31 bits is undefined behavior.
This patch adds an assert guarding against this LaneIDBits overflow in the computation of LaneIDMask.
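The arithmetic can be reproduced in isolation. The sketch below uses a hypothetical `log2Floor32` that mirrors the documented behavior of `llvm::Log2_32` (floor of log2, and -1, i.e. 4294967295 as unsigned, for an input of 0), plus a hypothetical `laneIDMask` whose guard plays the role of the assert; it also rejects LaneIDBits == 0, since a shift by exactly 32 is undefined as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::Log2_32: floor(log2(V)), or ~0u when V == 0.
unsigned log2Floor32(uint32_t V) {
  unsigned Result = ~0u;
  while (V != 0) {
    ++Result; // the first iteration wraps ~0u around to 0
    V >>= 1;
  }
  return Result;
}

// Mask covering the low LaneIDBits bits of a lane ID.
unsigned laneIDMask(uint32_t WarpSize) {
  unsigned LaneIDBits = log2Floor32(WarpSize);
  // Without this guard, WarpSize == 0 yields LaneIDBits == 4294967295 and a
  // shift amount of 32u - 4294967295u == 33, which is undefined behavior.
  assert(LaneIDBits >= 1 && LaneIDBits <= 31 && "invalid warp size");
  return ~0u >> (32u - LaneIDBits);
}
```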
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D151606
Lower multi-dimensional array reductions for the add and mul operators.
Depends on D153448
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D153455
Lower 1D array reductions for the add and mul operators. Multi-dimensional arrays and
other operators will follow.
Reviewed By: razvanlupusoru
Differential Revision: https://reviews.llvm.org/D153448
This patch uses castAs instead of getAs, which asserts if the type doesn't match, and adds a nullptr check where needed.
It also improves the code by passing I.getData() to dumpVarDefinitionName() instead of doing a lookup,
since we're already iterating over the same map in LocalVariableMap::dumpContext().
Reviewed By: aaron.ballman, aaronpuchert
Differential Revision: https://reviews.llvm.org/D153033
This patch adds the missing copy assignment operator to a class that has a user-defined copy constructor.
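The underlying guideline is the "rule of three": a class that needs a user-defined copy constructor usually needs a matching copy assignment operator, because the implicitly generated one performs a memberwise copy that bypasses the custom logic. A minimal sketch with a hypothetical `Buffer` class that owns heap memory (not the class touched by this patch):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class owning a heap array.
class Buffer {
public:
  explicit Buffer(std::size_t N) : Size(N), Data(new int[N]()) {}

  // User-defined copy constructor: deep-copies the array.
  Buffer(const Buffer &Other) : Size(Other.Size), Data(new int[Other.Size]) {
    std::copy(Other.Data, Other.Data + Size, Data);
  }

  // Matching copy assignment (copy-and-swap). The implicit version would copy
  // the pointer, leaving two objects that delete[] the same array.
  Buffer &operator=(const Buffer &Other) {
    if (this != &Other) {
      Buffer Tmp(Other); // copy first; *this is untouched if this throws
      std::swap(Size, Tmp.Size);
      std::swap(Data, Tmp.Data); // Tmp's destructor frees the old array
    }
    return *this;
  }

  ~Buffer() { delete[] Data; }

  int &operator[](std::size_t I) { return Data[I]; }
  std::size_t size() const { return Size; }

private:
  std::size_t Size;
  int *Data;
};
```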
Reviewed By: tahonermann, aaronpuchert
Differential Revision: https://reviews.llvm.org/D150931
Sanitizers allocate shadow and user memory as MAP_NORESERVE.
User memory can stay this way and does not increase RSS as long as we
don't store to it.
Unpoisoning the shadow can also avoid an RSS increase for zeroed pages.
However, as soon as we poison the shadow, we need the page in RSS.
To avoid an unnecessary RSS increase, we should not poison memory just before
unpoisoning it.
Depends on D153497.
Reviewed By: thurston
Differential Revision: https://reviews.llvm.org/D153500
When the MCAssembler is non-null and the MCAsmLayout is null, we can fold A-B
when
* A and B are in the same fragment, or
* A's fragment succeeds B's fragment, and they are not separated by non-data fragments (D69411)
This patch allows folding when A's fragment precedes B's fragment so
that `9997b - . == 0` below can be evaluated as true:
```
nop
.arch_extension sec
9997:nop
// old behavior: error: expected absolute expression
.if 9997b - . == 0
.endif
```
Add a case to llvm/test/MC/ARM/directive-if-subtraction.s.
Note: for MCAsmStreamer, we cannot evaluate `.if . - 9997b == 0` at parse
time due to MCAsmStreamer::getAssemblerPtr returning nullptr (D45164).
Some Darwin tests check that this folding does not work. Add `.p2align 2` to
block some label difference folding or adjust the tests.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D153096