LoopVectorizer is aware when a target can replace a scalable frem
instruction with a vector library call for a given VF and it returns the
relevant cost. Otherwise, it returns an invalid cost (as previously).
Add test that check costs on AArch64, when there is no vector library
available and when there is (with and without tail-folding).
NOTE: Invoking CostModel directly (not through LV) would still return
invalid costs.
Operators that are overloadable may be parsed as `CXXOperatorCallExpr`
or as `UnaryOperator` (or `BinaryOperator`). This depends on the context
and can be different if a similar construct is imported into an existing
AST. The two "forms" of the operator call AST nodes should be detected
as equivalent to allow AST import of these cases.
This fix has probably other consequences because if a structure is
imported that has `CXXOperatorCallExpr` into an AST with an existing
similar structure that has `UnaryOperator` (or binary), the additional
data in the `CXXOperatorCallExpr` node is lost at the import (because
the existing node will be used). I am not sure if this can cause
problems.
The CFG doesn't contain a CFGElement for the
`CXXDefaultInitExpr::getInit()`, so
it makes sense to consider the `CXXDefaultInitExpr` to be the expression
that
originally constructs the object.
LDA DMA loads increase VMCNT and a load from the LDS stored must wait on
this counter to only read memory after it is written. Wait count
insertion pass does not track memory dependencies, it tracks register
dependencies. To model the LDS dependency a pseudo register is used in
the scoreboard, acting like if LDS DMA writes it and LDS load reads it.
This patch adds 8 more pseudo registers to use for independent LDS
locations if we can prove they are disjoint using alias analysis.
Fixes: SWDEV-433427
Without this gcc warned
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3585:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
3584 | ((&Current == &AccelDebugNames) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3585 | (Unit.getUnitDie().getTag() != dwarf::DW_TAG_type_unit)) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
3586 | "Kind is CU but TU is being processed.");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3589:70: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
3588 | ((&Current == &AccelTypeUnitsDebugNames) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3589 | (Unit.getUnitDie().getTag() == dwarf::DW_TAG_type_unit)) &&
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
3590 | "Kind is TU but CU is being processed.");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This decouples the Arm type attributes from other bits, which means
the data will only be allocated when a function uses these Arm
attributes.
The first patch adds the bit `HasArmTypeAttributes` to
`FunctionTypeBitfields`, which grows from 62 bits to 63 bits.
In the second patch, I've moved this bit (`HasArmTypeAttributes`) to
`FunctionTypeExtraBitfields`, because it looks like the bits in
`FunctionTypeBitfields` are precious and we really don't want that
struct
to grow beyond 64 bits.
I've split this out into two patches to explain the rationale, but those
can be squashed before merging.
This fixes a crash where `path::parent_path` causes an invalid access on
a string upon receiving a path that consists of a single colon.
On Windows machine, with runtime checks enabled build, upon `clang -I:
test.cc` produces:
```
Assertion failed: Index < Length && "Invalid index!", file llvm\include\llvm/ADT/StringRef.h, line 232
...
#6 0x00007ff7816201eb `anonymous namespace'::parent_path_end llvm\lib\Support\Path.cpp:144:0
#7 0x00007ff781620135 llvm::sys::path::parent_path(class llvm::StringRef, enum llvm::sys::path::Style) llvm\lib\Support\Path.cpp:470:0
```
Ideally, we can look for the last colon starting from the last
character, but we can instead start from second to last, and handle
empty paths by abusing `0 - 1 == npos`.
SourceLocExpr that may produce a function name are marked dependent so that the non-instantiated
name of a function does not get evaluated.
In GH78128, the name('s size) is used as
template argument to a `DeclRef` that is not otherwise dependent, and therefore cached and not transformed when the function is
instantiated, leading to 2 different values existing at the same time for the same function.
Fixes#78128
This commit turns on ASan annotations in `std::basic_string` for short
stings (SSO case).
Originally suggested here: https://reviews.llvm.org/D147680
String annotations added here:
https://github.com/llvm/llvm-project/pull/72677
Requires to pass CI without fails:
- https://github.com/llvm/llvm-project/pull/75845
- https://github.com/llvm/llvm-project/pull/75858
Annotating `std::basic_string` with default allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677 but annotations for
short strings (SSO - Short String Optimization) are turned off there.
This commit turns them on. This also removes
`_LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED`, because we do not plan to
support turning on and off short string annotations.
Support in ASan API exists since
dd1b7b797a.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec7a2.
This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.
The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.
If you have any questions, please email:
advenam.tacet@trailofbits.comdisconnect3d@trailofbits.com
Previously there were two ways to override the verbose abort function
which gets called when a hardening assertion is triggered:
- compile-time: define the `_LIBCPP_VERBOSE_ABORT` macro;
- link-time: provide a definition of `__libcpp_verbose_abort` function.
This patch adds a new configure-time approach: the vendor can provide
a path to a custom header file which will get copied into the build by
CMake and included by the library. The header must provide a definition
of the
`_LIBCPP_ASSERTION_HANDLER` macro which is what will get called should
a hardening assertion fail. As of this patch, overriding
`_LIBCPP_VERBOSE_ABORT` will still work, but the previous mechanisms
will be effectively removed in a follow-up patch, making the
configure-time mechanism the sole way of overriding the default handler.
Note that `_LIBCPP_ASSERTION_HANDLER` only gets invoked when a hardening
assertion fails. It does not affect other cases where
`_LIBCPP_VERBOSE_ABORT` is currently used (e.g. when an exception is
thrown in the `-fno-exceptions` mode).
The library provides a default version of the custom header file that
will get used if it's not overridden by the vendor. That allows us to
always test the override mechanism and reduces the difference in
configuration between the pristine version of the library and
a platform-specific version.
This patch was originally introduced in PR #72340, but was reverted due
to a bug on invalid extension combine.
Specifically, we resolve the case in the
https://github.com/llvm/llvm-project/pull/72340#issuecomment-1874810998
```
define <vscale x 1 x i32> @foo(<vscale x 1 x i1> %x, <vscale x 1 x i2> %y) {
%a = zext <vscale x 1 x i1> %x to <vscale x 1 x i32>
%b = zext <vscale x 1 x i1> %y to <vscale x 1 x i32>
%c = add <vscale x 1 x i32> %a, %b
ret <vscale x 1 x i32> %c
}
```
The previous patch didn't check if the semantic of `ISD::ZERO_EXTEND`
and `ISD::ZERO_EXTEND` is equivalent to the `vsext.vf2` or `vzext.vf2`
(not ensuring the SEW condition on widening Vector Arithmetic
Instructions).
Thanks for @topperc pointing out this bug.
## The original description
This PR mainly aims at resolving the below missed-optimization case,
while it could also be considered as an extension of the previous patch
https://reviews.llvm.org/D133739?id=
### Missed-Optimization Case
Compiler Explorer: https://godbolt.org/z/GzWzP7Pfh
### Source Code:
```
define <vscale x 2 x i16> @multiple_users(ptr %x, ptr %y, ptr %z) {
%a = load <vscale x 2 x i8>, ptr %x
%b = load <vscale x 2 x i8>, ptr %y
%b2 = load <vscale x 2 x i8>, ptr %z
%c = sext <vscale x 2 x i8> %a to <vscale x 2 x i16>
%d = sext <vscale x 2 x i8> %b to <vscale x 2 x i16>
%d2 = sext <vscale x 2 x i8> %b2 to <vscale x 2 x i16>
%e = mul <vscale x 2 x i16> %c, %d
%f = add <vscale x 2 x i16> %c, %d2
%g = sub <vscale x 2 x i16> %c, %d2
%h = or <vscale x 2 x i16> %e, %f
%i = or <vscale x 2 x i16> %h, %g
ret <vscale x 2 x i16> %i
}
```
### Before This Patch
```
# %bb.0:
vsetvli a3, zero, e16, mf2, ta, ma
vle8.v v8, (a0)
vle8.v v9, (a1)
vle8.v v10, (a2)
svf2 v11, v8
vsext.vf2 v8, v9
vsext.vf2 v9, v10
vmul.vv v8, v11, v8
vadd.vv v10, v11, v9
vsub.vv v9, v11, v9
vor.vv v8, v8, v10
vor.vv v8, v8, v9
ret
```
### After This Patch
```
# %bb.0:
vsetvli a3, zero, e8, mf4, ta, ma
vle8.v v8, (a0)
vle8.v v9, (a1)
vle8.v v10, (a2)
vwmul.vv v11, v8, v9
vwadd.vv v9, v8, v10
vwsub.vv v12, v8, v10
vsetvli zero, zero, e16, mf2, ta, ma
vor.vv v8, v11, v9
vor.vv v8, v8, v12
ret
```
We can see Add/Sub/Mul are combined with the Sign Extension.
### Relation to the Patch D133739
The patch D133739 introduced an optimization for folding `ADD_VL`/
`SUB_VL` / `MUL_V` with `VSEXT_VL` / `VZEXT_VL`. However, the patch did
not consider the case of non-fixed length vector case, thus this PR
could also be considered as an extension for the D133739.
We currently force users to use a negative contant in the intrinsic
call. Changing it zext would break existing programs, so just sign
extend an argument.
* Split out the lit release job and the documentation build job into
their own workflow files. This makes it possible to manually run these
jobs via workflow_dispatch.
* Improve tag/user validation and ensure it gets run for each release
task.
snprintf interceptors call `format_buffer` with `size==~0ul`, which
may eventually lead to `snprintf(s, n, "Hello world!")` where `s+n`
wraps around. Since glibc 2.37 (https://sourceware.org/PR30441), the
snprintf call does not write the last char. musl snprintf returns -1
with EOVERFLOW when `n > INT_MAX`.
Change `size` to INT_MAX to work with glibc 2.37+ and musl.
snprintf interceptors are not changed. It's user responsibility to not
cause a compatibility issue with libc implementations.
Fix#60678
Ensure intrinsics and auto-upgrades support i16, i32, and i64 for for
`nvvm.{min,max,mulhi,sad}`
- `nvvm.min` and `nvvm.max`: These are auto-upgraded to `select`
instructions but it is still nice to support the 16 bit variants just in
case any generators of IR are still trying to use these intrinsics.
- `nvvm.sad` added both the 16 and 64 bit variants, also marked this
instruction as speculateble. These directly correspond to the PTX
`sad.{u16,s16,u64,s64}` instructions.
- `nvvm.mulhi` added the 16 bit variants. These directly correspond to
the PTX `mul.hi.{s,u}16` instructions.
llvm-project/llvm/lib/TargetParser/AArch64TargetParser.cpp:157:1:
error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type]
}
^
1 error generated.
Makes it clearer what the issue is when hand-written assembly doesn't
follow medium code model assumptions in a medium code model build.
Alternative to #71248 by only hinting on an overflow.
If multiple globals are placed in an explicit section, there's a chance
that the large data threshold will cause the different globals to be
inconsistent in whether they're large or small. Mixing sections with
mismatched large section flags can cause undesirable issues like
increased relocation pressure because there may be 32-bit references to
the section in some TUs, but the section is considered large since input
section flags are unioned and other TUs added the large section flag.
An explicit code model on the global still overrides the decision. We
can do this for globals without any references to them, like what we did
with asan_globals in #74514. If we have some precompiled small code
model files where asan_globals is not considered large mixed with
medium/large code model files, that's ok because the section is
considered large and placed farther. However, overriding the code model
for globals in some TUs but not others and having references to them
from code will still result in the above undesired behavior.
This mitigates a whole class of mismatched large section flag issues
like what #77986 was trying to fix.
This ends up not adding the SHF_X86_64_LARGE section flag on explicit
sections in the medium/large code model. This is ok for the large code
model since all references from large text must use 64-bit relocations
anyway.
It seems we were forgetting to call `checkArgsForPlaceholders` on the
placement arguments of new-expressions in Sema. I don't think that was
intended—at least doing so doesn't seem to break anything—so this pr
adds that.
This also fixes#65053
---------
Co-authored-by: Erich Keane <ekeane@nvidia.com>
Use the template pattern in determining whether to synthesize the
aggregate deduction guide, and update
DeclareImplicitDeductionGuideFromInitList to substitute outer template
arguments.
Fixes#77599
-fandroid-pad-segment is an Android-specific opt-in option that
links in crt_pad_segment.o (beside other crt*.o relocatable files).
crt_pad_segment.o contains a note section, which will be included in the
linker-created PT_NOTE segment. This PT_NOTE tell Bionic that: when
create a map for a PT_LOAD segment, extend the end to cover the gap so
that we will have fewer kernel 'struct vm_area_struct' objects when
page_size < MAXPAGESIZE.
See also https://sourceware.org/bugzilla/show_bug.cgi?id=31076
Link: https://r.android.com/2902180
This adds Hurd toolchain support to Clang's driver in addition to
handling
translating the triple from GCC toolchain-compatible form (x86_64-gnu)
to
the actual triple registered in LLVM (x86_64-pc-hurd-gnu).
If we have a RISCV_ALIGN relocation which can't be satisfied with the
available space provided, report an error rather than silently
continuing with a corrupt state.
For context, https://github.com/llvm/llvm-project/pull/73977 fixes an
LLD bug which can cause this effect, but that's not the only source of
such cases.
Another is our hard-to-fix set of LTO problems. We can have a single
function which was compiled without C in an otherwise entirely C module.
Until we have all of the mapping symbols and related mechanisms
implemented, this case can continue to arise.
I think it's very important from a user interface perspective to have
non-assertion builds report an error in this case. If we don't report an
error here, we can crash the linker (due to the fatal error at the
bottom of the function), or if we're less lucky silently produce a
malformed binary.
There's a couple of known defects with this patch.
First, there's no test case. I don't know how to write a stable test
case for this that doesn't involve hex editing an object file, or
abusing the LTO bug that we hope to fix.
Second, this will report an error on each relax iteration. I explored
trying to report an error only once after relaxation, but ended up
deciding I didn't have the context to implement it safely.
I would be thrilled if someone more knowledgeable of this code wants to
write a better version of this patch, but in the meantime, I believe we
should land this to address the user experience problem described above.
'bind' takes either a string literal, or an 'identifier' representing
the device-side function that this routine is intended to 'bind' to
(that is, to call). However, it seems that the intent is to permit the
'identifier' to reference any function, thus we're implementing this as
an ID expression.
Additionally, while working on this I discovered that the 'routine' ID
expression parsing for C++ wouldn't allow non-static member functions to
be referenced since it expected us to call it, this was fixed as a part
of this patch as the 'bind' support needed it too. A test was added for
routine.