command.
We were reading the command output right after sending the interrupt, but
sometimes that wasn't long enough for the command result text to have been emitted.
I added a poll for the state change to eStateExited, and then added a bit more sleep
to give the command a chance to complete.
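As a rough sketch of the poll-then-settle idea in generic C++ (purely illustrative; the real fix is in the Python test, and the helper name here is made up):
```
#include <chrono>
#include <functional>
#include <thread>

// Poll until the caller-supplied check reports the process has exited, then
// wait a little longer so the command result text has a chance to be emitted.
bool waitForExit(const std::function<bool()> &hasExited,
                 std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (hasExited()) {
      std::this_thread::sleep_for(std::chrono::milliseconds(250));
      return true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return false;
}
```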
scheduling, is previously vectorized.
If the main node was already vectorized but does not require
scheduling, we can still try to vectorize it in this new node instead of
gathering.
This implements proposals from:
- https://github.com/itanium-cxx-abi/cxx-abi/issues/24: mangling for
constraints, requires-clauses, requires-expressions.
- https://github.com/itanium-cxx-abi/cxx-abi/issues/31: requires-clauses and
template parameters in a lambda expression are mangled into the <lambda-sig>.
- https://github.com/itanium-cxx-abi/cxx-abi/issues/47 (STEP 3): mangling for
template argument is prefixed by mangling of template parameter declaration
if it's not "obvious", for example because the template parameter is
constrained (we already implemented STEP 1 and STEP 2).
This changes the manglings for a few cases:
- Functions and function templates with constraints.
- Function templates with template parameters with deduced types:
`template<auto N> void f();`
- Function templates with template template parameters where the argument has a
different template-head:
`template<template<typename ...T> typename U> void f(); f<std::vector>();`
In each case where a mangling changed, the change fixes a mangling collision.
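As a rough illustration (my own example, not taken from the patch), two function templates that differ only in a requires-clause could previously yield the same mangled name for a given specialization:
```
template <typename T>
concept Small = sizeof(T) <= 4;

// Overloads differing only in the trailing requires-clause; both are valid
// C++20. Under the old scheme the specialization g<int> of either template
// mangled identically, since the requires-clause was not encoded.
template <typename T> void g(T) requires Small<T> {}
template <typename T> void g(T) {}

// g(0) picks the constrained overload; with the new rules its mangled name
// records the constraint, avoiding the collision.
void use() { g(0); }
```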
Note that only function templates are affected, not class templates or variable
templates, and only new constructs (template parameters with deduced types,
constrained templates) and esoteric constructs (templates with template
template parameters with non-matching template template arguments, most of
which Clang still does not accept by default due to
`-frelaxed-template-template-args` not being enabled by default), so the risk
to ABI stability from this change is relatively low. Nonetheless,
`-fclang-abi-compat=17` can be used to restore the old manglings for cases
which we could successfully but incorrectly mangle before.
Fixes #48216, #49884, #61273
Reviewed By: erichkeane, #libc_abi
Differential Revision: https://reviews.llvm.org/D147655
Typedef renames were not handled properly when the typedef was used behind a
pointer or as part of a function type. Additionally, because the entire
function was passed as a source range, when the function started with a
macro, such a change was not marked for a fix.
Removed the workaround and used the proper TypedefTypeLoc instead.
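For instance (an illustrative snippet of my own, not from the patch), renaming `my_int` should also update uses like these:
```
typedef int my_int;

my_int *lookup(my_int key);          // typedef used behind a pointer
typedef my_int (*callback)(my_int);  // typedef as part of a function type
```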
Fixes #55156, #54699
This patch moves the MBB Profile Dump to ./llvm/test/CodeGen/Generic
from ./llvm/test/CodeGen/MlRegAlloc as the profile dump doesn't have
anything to do with the ML guided register allocation heuristic.
This patch adds initial lowering for DeclareTargetAttr on
GlobalOps, utilising registerTargetGlobalVariable
and getAddrOfDeclareTargetVar from the OMPIRBuilder.
It also adds initial processing of declare target map
operands, populating the combinedInfo that the
OMPIRBuilder requires to generate kernels and
their kernel argument structure.
The combination of these additions allows simple mapping
of declare target globals to target regions, so a simple
runtime test showcasing and exercising this has been added.
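As a rough C++ analogue of what such a runtime test exercises (the actual test is Fortran; this snippet is only illustrative):
```
#include <cstdio>

// A declare target global referenced from a target region.
#pragma omp declare target
int values[10];
#pragma omp end declare target

int main() {
  // Write the global on the device.
  #pragma omp target
  {
    for (int i = 0; i < 10; ++i)
      values[i] = i * 2;
  }
  // Copy the global back to the host before reading it.
  #pragma omp target update from(values)
  std::printf("%d\n", values[9]);
  return 0;
}
```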
The patch currently does not factor in filtering
based on device_type clauses (e.g. no emission of
globals for the device if host is specified); this will come in
a future iteration. For the moment it has only been
tested with 1-D arrays and basic Fortran data types;
more complex types (such as user-defined derived
types from Fortran, allocatables, or Fortran pointers)
may need further work.
Reviewers: kiranchandramohan, skatrak
Differential Revision: https://reviews.llvm.org/D149368
Do so by extending `matchUnaryPredicate` to also work for
`ConstantFPSDNode` types, then encapsulating the constant checks in a
lambda and passing it to `matchUnaryPredicate`.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D154868
Note: This moves D154678, which previously implemented this in
InstCombine. Concerns were brought up that this was de-canonicalizing
and really targeting a codegen improvement, so it is placed in DAGCombiner instead.
This implements:
```
(fmul C, (uitofp Pow2))
-> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
(fdiv C, (uitofp Pow2))
-> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
```
The motivation is mostly fdiv where 2^(-p) is a fairly common
expression.
The patch is intentionally conservative about the transform, only
doing so if:
1) we have IEEE floats,
2) C is normal,
3) the add/sub of max(Log2(Pow2)) stays within the min/max exponent bounds.
Alive2 can't realistically prove this in general, but the float16/float32
cases (within the bounds of the above rules) were tested exhaustively.
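For intuition, here is a small standalone C++20 sketch of the underlying bit trick (my own demonstration, not the DAGCombiner code): under the rules above, multiplying a normal IEEE float by 2^k amounts to adding k to its exponent field.
```
#include <bit>
#include <cassert>
#include <cstdint>

// fmul C, 2^k  ==  bitcast_to_FP(bitcast_to_INT(C) + (k << 23)) for normal
// IEEE-754 floats while the exponent stays in range; 23 is float's mantissa
// width. fdiv by 2^k uses a subtraction instead.
float mulByPow2(float c, unsigned k) {
  std::uint32_t bits = std::bit_cast<std::uint32_t>(c);
  bits += k << 23;
  return std::bit_cast<float>(bits);
}

int main() {
  assert(mulByPow2(3.5f, 4) == 3.5f * 16.0f);
  assert(mulByPow2(0.75f, 1) == 1.5f);
  return 0;
}
```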
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D154805
We are currently rejecting an _Atomic-qualified integer in a switch statement.
This fixes the issue by performing an lvalue conversion before trying to match on the type.
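The kind of code involved looks roughly like this (illustrative only; `_Atomic` is a C11 qualifier, shown here as Clang also accepts it as an extension outside C):
```
_Atomic int state;

int classify(void) {
  switch (state) { // previously rejected because of the _Atomic qualifier;
                   // after lvalue conversion the controlling expression is
                   // a plain int, so the switch is accepted
  case 0:
    return 0;
  default:
    return 1;
  }
}
```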
Fixes #65557
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D159522
Summary:
This was mistakenly using the opcode for `ferror` which wasn't noticed
because tests using this weren't yet activated. This patch fixes this
mistake.
instruction.
Need to check if the operand scalars are vectorized in a different
vector node if the main instruction already gets vectorized in another
vector node.
The Linalg named ops specification went out of sync with the OpDSL
description, presumably due to direct manual modifications of the yaml
file.
Additionally, the unsigned division has been generating the signed
scalar instruction, which is now fixed.
Add a straightforward sequentialization transform from `scf.forall` to a
nest of `scf.for` in the absence of results and expose it as a transform op.
This is helpful in combination with other transform ops, particularly
fusion, that work best on parallel-by-construction `scf.forall` but
later need to target sequential `for` loops.
This change is symmetric with the one reviewed in
<https://reviews.llvm.org/D157452> and handles the exception handling
specific intrinsic, which slipped through the cracks, in the same way,
by inserting an address-space cast iff RTTI is in a non-default AS.
The size of .ARM.exidx may shrink across `assignAddresses` calls. It is
possible that the initial iteration has a larger location counter, causing
`__code_size = __code_end - .; osec : { . += __code_size; }` to report an
error, while the error would have been suppressed for subsequent
`assignAddresses` iterations.
Other sections like .relr.dyn may change sizes across `assignAddresses`
calls as well. However, their initial size is zero, so it is difficult to
trigger a similar error.
Similar to https://reviews.llvm.org/D152170, postpone the error reporting.
Fix #66836. While here, add more information to the error message.
This avoids the need for a mutable member to implement deferred sorting,
and some bespoke code to maintain a SmallVector as a set.
The performance impact seems to be negligible in some small tests, and
so seems an acceptable cost for greatly simplifying the code.
An old FIXME and accompanying workaround are dropped; they are ostensibly
dead code within the current codebase.
Print OpConstant floats as formatted decimal floating-point values, with
special-case exceptions to print infinity and NaN as hexfloats.
This change follows from the fixes in
https://github.com/llvm/llvm-project/pull/66686 to correct how
constant values are printed generally.
Differential Revision: https://reviews.llvm.org/D159376
sync_source_lists_from_cmake.py checks that every unittest in CMake
also exists in the GN build. 4f330b7f75 added VETests, but the GN
build doesn't include the VE target. So add a dummy target for this
to placate the check.
This fixes 46b961f36b.
Prior to the SME changes the steps were:
* Invalidate all registers.
* Update value of VG and use that to reconfigure the registers.
* Invalidate all registers again.
With the changes for SME I removed the initial invalidate thinking
that it didn't make sense to do if we were going to invalidate them
all anyway after reconfiguring.
Well, the reason it made sense was that it forced us to get the
latest value of vg, which we needed to reconfigure properly.
Not doing so caused a test failure on our Graviton bot which has SVE
(https://lab.llvm.org/buildbot/#/builders/96/builds/45722). It was
flaky and looping it locally would always fail within a few minutes.
Presumably it was using an invalid value of vg, which caused some offsets
to be calculated incorrectly.
To fix this I've invalidated vg in AArch64Reconfigure just before we read
it. This is the same as the fix I have in review for SME's svg register.
Pushing this directly to fix the ongoing test failure.
This revision pipes the fastmath attribute support through the
vector.reduction op. This seemingly simple first step already requires
quite some genuflexions, file and builder reorganization. In the
process, retire the boolean reassoc flag deep in the LLVM dialect
builders and just use the fastmath attribute.
During conversions, templated builders for predicated intrinsics are
partially cleaned up. In the future, to finalize the cleanups, one
should consider adding fastmath to the VPIntrinsic ops.
Auto summaries were only being used when non-pointer/reference variables
didn't have values or summaries. Greg pointed out that it would be
better to simply use auto summaries when the variable doesn't have a
summary of its own, regardless of other conditions.
This led to code simplification and correct visualization of auto
summaries for pointer/reference types, as seen in the screenshot attached
to the pull request.
<img width="310" alt="Screenshot 2023-09-19 at 7 04 55 PM"
src="https://github.com/llvm/llvm-project/assets/1613874/d356d579-13f2-487b-ae3a-f3443dce778f">
This patch warns when an align directive with a non-zero fill value is
used in a virtual section. The fill value is also set to zero,
preventing an assertion in `MCAssembler::writeSectionData` for the case
of `MCFragment::FT_Align` from tripping.
This extends `vector.constant_mask` so that mask dim sizes that
correspond to a scalable dimension are treated as if they're implicitly
multiplied by vscale. Currently this is limited to mask dim sizes of 0
or the size of the dim/vscale. This allows constant masks to represent
all true and all false scalable masks (and some variations):
```
// All true scalable mask
%mask = vector.constant_mask [8] : vector<[8]xi1>
// All false scalable mask
%mask = vector.constant_mask [0] : vector<[8]xi1>
// First two scalable rows
%mask = vector.constant_mask [2,4] : vector<4x[4]xi1>
```