This patch moves the MBB Profile Dump to ./llvm/test/CodeGen/Generic
from ./llvm/test/CodeGen/MlRegAlloc as the profile dump doesn't have
anything to do with the ML guided register allocation heuristic.
This patch adds initial lowering for DeclareTargetAttr on
GlobalOp's utilising registerTargetGlobalVariable
and getAddrOfDeclareTargetVar from the
OMPIRBuilder.
It also adds initial processing of declare target map
operands, populating the combinedInfo that the
OMPIRBuilder requires to generate kernels and
it's kernel argument structure.
The combination of these additions allows simple mapping
of declare target globals to Target regions, as such a simple
runtime test showcasing this and testing it has been added.
The patch currently does not factor in filtering
based on device_type clauses (e.g. no emission of
globals for device if host specified), this will come in
a future iteration. And for the moment it's only been
tested with 1-D arrays and basic fortran data types,
more complex types (such as user defined derived
types from Fortran, allocatables or Fortran pointers)
may need further work.
reviewers: kiranchandramohan, skatrak
Differential Revision: https://reviews.llvm.org/D149368
Do so by extending `matchUnaryPredicate` to also work for
`ConstantFPSDNode` types then encapsulate the constant checks in a
lambda and pass it to `matchUnaryPredicate`.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D154868
Note: This is moving D154678 which previously implemented this in
InstCombine. Concerns where brought up that this was de-canonicalizing
and really targeting a codegen improvement, so placing in DAGCombiner.
This implements:
```
(fmul C, (uitofp Pow2))
-> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
(fdiv C, (uitofp Pow2))
-> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
```
The motivation is mostly fdiv where 2^(-p) is a fairly common
expression.
The patch is intentionally conservative about the transform, only
doing so if we:
1) have IEEE floats
2) C is normal
3) add/sub of max(Log2(Pow2)) stays in the min/max exponent
bounds.
Alive2 can't realistically prove this, but did test float16/float32
cases (within the bounds of the above rules) exhaustively.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D154805
We are currently rejecting an _Atomic qualified integer in a switch statment.
This fixes the issue by doing an Lvalue conversion before trying to match on the type.
Fixes#65557
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D159522
Summary:
This was mistakenly using the opcode for `ferror` which wasn't noticed
because tests using this weren't yet activated. This patch fixes this
mistake.
instruction.
Need to check if the operand scalars are vectorized in the a different
vector node, if the main instruction is already gets vectorized in other
vector node.
The Linalg named ops specification went out of sync with the OpDSL
description, presumably due to direct manual modifications of the yaml
file.
Additionally, the unsigned division has been generating the signed
scalar instruction, which is now fixed.
Add a straightforward sequentialization transform from `scf.forall` to a
nest of `scf.for` in absence of results and expose it as a transform op.
This is helpful in combination with other transform ops, particularly
fusion, that work best on parallel-by-construction `scf.forall` but
later need to target sequential `for` loops.
This change is symmetric with the one reviewed in
<https://reviews.llvm.org/D157452> and handles the exception handling
specific intrinsic, which slipped through the cracks, in the same way,
by inserting an address-space cast iff RTTI is in a non-default AS.
The size of .ARM.exidx may shrink across `assignAddress` calls. It is
possible
that the initial iteration has a larger location counter, causing
`__code_size =
__code_end - .; osec : { . += __code_size; }` to report an error, while
the error would
have been suppressed for subsequent `assignAddress` iterations.
Other sections like .relr.dyn may change sizes across `assignAddress`
calls as
well. However, their initial size is zero, so it is difficiult to
trigger a
similar error.
Similar to https://reviews.llvm.org/D152170, postpone the error
reporting.
Fix#66836. While here, add more information to the error message.
This avoids the need for a mutable member to implement deferred sorting,
and some bespoke code to maintain a SmallVector as a set.
The performance impact seems to be negligible in some small tests, and
so seems acceptable to greatly simplify the code.
An old FIXME and accompanying workaround are dropped. It is ostensibly
dead-code within the current codebase.
Print OpConstant floats as formatted decimal floating points, with
special case exceptions to print infinity and NaN as hexfloats.
This change follows from the fixes in
https://github.com/llvm/llvm-project/pull/66686 to correct how
constant values are printed generally.
Differential Revision: https://reviews.llvm.org/D159376
sync_source_lists_from_cmake.py checks that every unittest in CMake
also exists in the GN build. 4f330b7f75 added VETests, but the GN
build doesn't include the VE target. So add a dummy target for this
to placate the check.
This fixes 46b961f36b.
Prior to the SME changes the steps were:
* Invalidate all registers.
* Update value of VG and use that to reconfigure the registers.
* Invalidate all registers again.
With the changes for SME I removed the initial invalidate thinking
that it didn't make sense to do if we were going to invalidate them
all anyway after reconfiguring.
Well the reason it made sense was that it forced us to get the
latest value of vg which we needed to reconfigure properly.
Not doing so caused a test failure on our Graviton bot which has SVE
(https://lab.llvm.org/buildbot/#/builders/96/builds/45722). It was
flaky and looping it locally would always fail within a few minutes.
Presumably it was using an invalid value of vg, which caused some offsets
to be calculated incorrectly.
To fix this I've invalided vg in AArch64Reconfigure just before we read
it. This is the same as the fix I have in review for SME's svg register.
Pushing this directly to fix the ongoing test failure.
This revision pipes the fastmath attribute support through the
vector.reduction op. This seemingly simple first step already requires
quite some genuflexions, file and builder reorganization. In the
process, retire the boolean reassoc flag deep in the LLVM dialect
builders and just use the fastmath attribute.
During conversions, templated builders for predicated intrinsics are
partially cleaned up. In the future, to finalize the cleanups, one
should consider adding fastmath to the VPIntrinsic ops.
Auto summaries were only being used when non-pointer/reference variables
didn't have values nor summaries. Greg pointed out that it should be
better to simply use auto summaries when the variable doesn't have a
summary of its own, regardless of other conditions.
This led to code simplification and correct visualization of auto
summaries for pointer/reference types, as seen in this screenshot.
<img width="310" alt="Screenshot 2023-09-19 at 7 04 55 PM"
src="https://github.com/llvm/llvm-project/assets/1613874/d356d579-13f2-487b-ae3a-f3443dce778f">
This patch warns when an align directive with a non-zero fill value is
used in a virtual section. The fill value is also set to zero,
preventing an assertion in `MCAssembler::writeSectionData` for the case
of `MCFragment::FT_Align` from tripping.
This extends `vector.constant_mask` so that mask dim sizes that
correspond to a scalable dimension are treated as if they're implicitly
multiplied by vscale. Currently this is limited to mask dim sizes of 0
or the size of the dim/vscale. This allows constant masks to represent
all true and all false scalable masks (and some variations):
```
// All true scalable mask
%mask = vector.constant_mask [8] : vector<[8]xi1>
// All false scalable mask
%mask = vector.constant_mask [0] : vector<[8]xi1>
// First two scalable rows
%mask = vector.constant_mask [2,4] : vector<4x[4]xi1>
```
This change makes callees with the __arm_preserves_za
type attribute comply with the dormant state requirements
when it's caller has the __arm_shared_za type attribute.
Several external SME functions also do not need to lazy
save.
5e67092434/aapcs64/aapcs64.rst (L1381)
Differential Revision: https://reviews.llvm.org/D159186
Previously, the SPIR-V instruction printer was always printing the first
operand of an `OpConstant`'s literal value as one of the fixed operands.
This is incorrect for 64-bit values, where the first operand is actually
the value's lower-order word and should be combined with the following
higher-order word before printing.
This change fixes that issue by waiting to print the last fixed operand
of `OpConstant` instructions until the variadic operands are ready to be
printed, then using `NumFixedOps - 1` as the starting operand index for
the literal value operands.
Depends on D156049
Matrix has been templated to Matrix (for MPInt and Fraction) with
explicit instantiation for both these types.
IntMatrix, inheriting from Matrix<MPInt>, has been created to allow for
integer-only methods.
makeMatrix has been duplicated to makeIntMatrix and makeFracMatrix.
This was already landed previously but was reverted in
98c994c8e2 due to build failure. This
fixes the failure.
Previously, post-visit state changes were indistinguishable from
ordinary
iterations, which could give a confusing picture of how many iterations
a block
needs to converge.
Now, post-visit state changes are marked with "post-visit" instead of an
iteration number:
![screenshot](https://github.com/llvm/llvm-project/assets/29098113/5e9553d6-dfaa-45d3-8ea4-e623a14ee4c5))