This removes another massive source of redundancy, and instead has the Merger.{h,cpp} reuse the SparseTensorEnums library.
Depends On D136005
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D136123
The `SimplifyExtractStridedMetadata` pass features a pattern that forward
statically known information (offset, sizes, strides) to their respective
users.
This patch moves this pattern from this pass to the
`extract_strided_metadata` folding patterns.
Differential Revision: https://reviews.llvm.org/D135797
Move the SparseTensorEnums library out of the ExecutionEngine directory and into Dialect/SparseTensor/IR.
Depends On D136002
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D136005
Currently, we parse lines inside of a compiler `#pragma` the same way we
parse any other line. This is fine for some cases, like separating
expressions and adding proper spacing, but in others it causes some poor
results from miscategorizing some tokens.
For example, the OpenMP offloading uses certain clauses that contain
special characters like `map(tofrom : A[0:N])`. This will be formatted
poorly as it will be split between lines on the first colon.
Additionally the subscript notation will lead to poor spacing. This can
be seen in the OpenMP tests as the automatic clang formatting with
inevitably ruin the formatting.
For example, the following contrived example will be formatted poorly.
```
#pragma omp target teams distribute collapse(2) map(to: A[0 : M * K]) \
map(to: B[0:K * N]) map(tofrom:C[0:M*N]) firstprivate(Alpha) \
firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y) \
firstprivate(E) firstprivate(Z) firstprivate(F)
```
This results in this when formatted, which is far from ideal.
```
#pragma omp target teams distribute collapse(2) map(to \
: A [0:M * K]) \
map(to \
: B [0:K * N]) map(tofrom \
: C [0:M * N]) firstprivate(Alpha) \
firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y) \
firstprivate(E) firstprivate(Z) firstprivate(F)
```
This patch seeks to improve this by adding extra logic where the parsing goes
awry. This is primarily caused by the colon being parsed as an inline-asm
directive and the brackes an objective-C expressions. Also the line gets
indented every single time the line is dropped.
This doesn't implement true parsing handling for OpenMP statements.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D136100
This patch changes the kernels generated by OpenMP to have protected
visibility. This is unlikely to change anything functionally. However,
protected visibility better matches the behaviour of these GPU kernels.
We do not expect any pending shared library load to preempt these
kernels so we can specify a more restrictive visibility.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136198
This matches ld64's behavior.
I also extended the icf-stabs.s test to demonstrate that even though
folded symbols have size zero, we cannot use the size-zero property in
lieu of `wasIdenticalCodeFolded`, because size zero symbols should still
get STABS entries.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D136001
Symbols occur at non-zero offsets in a subsection if they are
`.alt_entry` symbols, or if `.subsections_via_symbols` is omitted.
It doesn't seem like ld64 supports folding those subsections either.
Moreover, supporting this it makes `foldIdentical` a lot more
complicated to implement. The existing implementation has some
questionable behavior around STABS omission -- if a section with an
non-zero offset symbol was folded into one without, we would omit the
STABS entry for the non-zero offset symbol.
I will be following up with a diff that makes `foldIdentical` zero out
the symbol sizes for folded symbols. Again, this is much easier to
implement if we don't have to worry about non-zero offsets.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D136000
This differential splits the SparseTensorEnums library out from the SparseTensorRuntime library. The actual moving of files will be handled in the next differential.
Depends On D135996
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D136002
builds SSA cycle for compress insertion loop
adds casting on index mismatch during push_back
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D136186
The JSON dumper is very minimalistic. It pretty much only shows the
delimiting instruction IDs of every segment, so that further queries to
the SBCursor can be used to make sense of the data. It's main purpose is
to be serialized somewhat cheaply.
I also renamed untracedSegment to untracedPrefixSegment, in case in the
future we add an untracedSuffixSegment. In any case, this new name is
more explicit, which I like.
Differential Revision: https://reviews.llvm.org/D136034
This diff implements the reconstruction algorithm for the call tree and
add tests.
See TraceDumper.h for documentation and explanations.
One important detail is that the tree objects are in TraceDumper, even
though Trace.h is a better home. I'm leaving that as future work.
Another detail is that this code is as slow as dumping the entire
symolicated trace, which is not that bad tbh. The reason is that we use
symbols throughout the algorithm and we are not being careful about
memory and speed. This is also another area for future improvement.
Lastly, I made sure that incomplete traces work, i.e. you start tracing
very deep in the stack or failures randomly appear in the trace.
Differential Revision: https://reviews.llvm.org/D135917
The command is thread trace dump function-calls and as minimum will
require printing to a file in json and non-json format
I added a test
Differential Revision: https://reviews.llvm.org/D135521
Always use non-symbolizing disassembler for instruction encoding
validation as symbols will be treated as undefined/zeros be the encoder
and causing byte sequence mismatches.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D136118
When CurrentLoadedOffset is less than TotalSize, current code will
trigger unsigned overflow and will not return an "allocation failed"
indicator.
Google ref: b/248613299
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D135192
This differential replaces all uses of SparseTensorEncodingAttr::DimLevelType with DimLevelType. The next differential will break out a separate library for the DimLevelType enum, so that the Dialect code doesn't need to depend on the rest of the runtime
Depends On D135995
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D135996
This patch is to fix issue related to __stage2_float_loop function, float point value comparison is not working on EBCDIC mode because the mask is hard-coded and assumes character is ASCII, fix is to use toupper function when do the comparison.
Differential Revision: https://reviews.llvm.org/D118930
Prior to this patch we were wrongly applying the sub-strides to the
computation of the final offset of the subview.
Put differently, we were computing the offset as:
```
offset = baseOffset + sum(subOffset#i * baseStrides#i * subSizes#i)
```
Whereas we should be doing:
```
offset = baseOffset + sum(subOffset#i * baseStrides#i)
```
I.e., drop the subSizes#i term from the sum.
Differential Revision: https://reviews.llvm.org/D136107
This change is to make way for reusing the DimLevelType enum in lieu of the SparseTensorEncodingAttr::DimLevelType enum, but broken out to make it quick and easy to review
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D135995
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
cost model for extractelement instructions.
cpu2017
511.povray_r 0.57
520.omnetpp_r -0.98
521.wrf_r -0.01
525.x264_r 3.59 <+
526.blender_r -0.12
531.deepsjeng_r -0.07
538.imagick_r -1.42
Geometric mean: 0.21
Differential Revision: https://reviews.llvm.org/D115757
Representing this as 12 separate operations is a bit ugly, but
trying to represent the different modes using a bitfield seemed worse.
Differential Revision: https://reviews.llvm.org/D135417
Partially implements:
- P1361 Integration of chrono with text formatting
- P2372 Fixing locale handling in chrono formatters
- LWG3270 Parsing and formatting %j with durations
Completes:
- P1650R0 std::chrono::days with 'd' suffix
- LWG3262 Formatting of negative durations is not specified
- LWG3314 Is stream insertion behavior locale dependent when Period::type is micro?
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D134742
Type descriptor and its binding table are defined as fir.global in FIR.
Their names are derived from the derived-type name. This patch adds a new
function `getTypeDescriptorBindingTableName` in the NameUniquer and
refactor the `GetTypeDescriptorName` function to reuse the same code.
This will be used in the fir.dispatch code generation.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D136167
The test currently sets `USE_LIBSTDCPP = 0`, which is curious given the
behavior of `and` and `or` in Makefiles (the contents of the variables
are not important). In particular, this causes the tests to not use the
standard libraries appropriately.
To capture the actual intent of the test, we're changing this to
`USE_LIBCXX=1`.
Differential Revision: https://reviews.llvm.org/D136171
This test requires compiling its input program without debug
information. To do so, it uses certain Makefile variables that are never
populated with custom libcxx paths (if present). Doing so would not
necessarily be correct: we cannot guarantee that said standard library
has no debug symbols.
As such, we keep using the system libraries but disable the tests in
clang versions that are too old to work with more modern system
libraries, as in the case of the lldb-matrix bot.
Differential Revision: https://reviews.llvm.org/D136178
Add the atomic subroutine, atomic_cas, to the list of intrinsic
subroutines and check one of its arguments for a coindexed object.
Create a new function, CheckAtomicKind, that will be used for the
atomic subroutines that have arguments that can be either of type
int and of kind atomic_int_kind or of type logical and of kind
atomic_logical_kind.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D135835
This change is intended to resolve build issues reported in D134503.
A compiler supporting __float128 must define either __FLOAT128__ or
__SIZEOF_FLOAT128__ (or both). Additional check for _LIBCPP_VERSION
was added to disable __float128 for builds with libc++, because
__float128 support is incomplete there.
Differential Revision: https://reviews.llvm.org/D136121
DXContainer files have a handful of sections that need to be written.
This adds a pass to write the section data into IR globals, and writes
the shader flag data into a global.
The test cases here verify that the shader flags are correctly written
from the IR into the global and emitted to the DXContainer.
This change also fixes a bug in the MCDXContainerWriter, where the size
of the dxbc::ProgramHeader was not being included in the part offset
calcuations. This is verified to be working by the new testcases where
obj2yaml can properly dump part data for parts after the DXIL part.
Resolves issue #57742 (https://github.com/llvm/llvm-project/issues/57742)
Reviewed By: python3kgae
Differential Revision: https://reviews.llvm.org/D135793