399585 Commits

Author SHA1 Message Date
Nico Weber
c828b93fb3 [gn build] (manually) port f8b1cc365786 2021-09-22 08:20:12 -04:00
Florian Hahn
a7c6471a85 [Passes] Run vector-combine early with -fenable-matrix.
IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.

This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
Clang, like

    using matrix_t = double __attribute__((matrix_type(15, 15)));

    void foo(unsigned i, matrix_t &A, matrix_t &B) {
      for (unsigned j = 0; j < 4; ++j)
        for (unsigned k = 0; k < i; k++)
          B[k][j] -= A[k][j] * B[i][j];
    }

https://clang.godbolt.org/z/6dKxK1Ed7

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D102496
2021-09-22 12:48:32 +01:00
Sanjay Patel
c6013f71a4 Revert "[InstCombine] fold cast of right-shift if high bits are not demanded"
This reverts commit 2f6b07316f560a1f6d225919019dff2e5d6346e5.

This caused several bots to hit an infinite loop at stage 2,
so it needs to be reverted while figuring out how to fix that.
2021-09-22 07:45:21 -04:00
Sanjay Patel
1ee851c585 Revert "[CodeGen] regenerate test checks; NFC"
This reverts commit 52832cd917af00e2b9c6a9d1476ba79754dcabff.
The motivating commit 2f6b07316f5 caused several bots to hit
an infinite loop at stage 2, so that needs to be reverted too
while figuring out how to fix that.
2021-09-22 07:45:21 -04:00
Florian Hahn
ea21d688dc [Matrix] Emit assumption that matrix indices are valid.
The matrix extension requires the indices of matrix subscript
expressions to be valid; invalid indices are UB.

extract/insertelement produce poison if the index is invalid, which
prevents the optimizer from scalarizing load/extract pairs, for
example. This causes very suboptimal code to be generated when using
matrix subscript expressions with variable indices for large matrices.

This patch updates IRGen to emit assumes for index expressions to
convey the information that the index must be valid.

This also adjusts the order in which operations are emitted slightly, so
indices & assumes are added before the load of the matrix value.
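
A rough source-level analogy of the idea, not the IR that Clang's IRGen actually emits: each index is declared valid before the flattened element access, so the optimizer need not preserve poison semantics for out-of-range values. The names `ASSUME` and `load_elem`, the 4x4 size, and the column-major flattening are all assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability macro: Clang provides __builtin_assume; on other
// compilers the hint is simply dropped.
#if defined(__clang__)
#define ASSUME(cond) __builtin_assume(cond)
#else
#define ASSUME(cond) ((void)0)
#endif

constexpr std::size_t Rows = 4;
constexpr std::size_t Cols = 4;

// Loads element (row, col) from a matrix stored as a flat column-major array.
// Out-of-range indices are the caller's undefined behavior, so the function
// states that they are in range before computing the flat offset.
inline double load_elem(const double (&m)[Rows * Cols], std::size_t row,
                        std::size_t col) {
  assert(row < Rows && col < Cols); // debug-build check only
  ASSUME(row < Rows);
  ASSUME(col < Cols);
  return m[col * Rows + row];
}
```

Placing the assumptions before the load mirrors the ordering change described above, where indices and assumes are emitted ahead of the load of the matrix value.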

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D102478
2021-09-22 12:27:37 +01:00
Martin Storsjö
9f34f75ff8 [lldb] [Windows] Fix continuing from breakpoints and singlestepping on ARM/AArch64
Based on suggestions by Eric Youngdale.

This fixes https://llvm.org/PR51673.

Differential Revision: https://reviews.llvm.org/D109777
2021-09-22 14:11:41 +03:00
David Green
02cd8a6b91 [ARM] Allow smaller VMOVL in tail predicated loops
This allows VMOVL in tail predicated loops so long as the vector
size the VMOVL is extending into is less than or equal to the size of
the VCTP in the tail predicated loop. These cases represent a
sign-extend-inreg (or zero-extend-inreg), which needn't block tail
predication as in https://godbolt.org/z/hdTsEbx8Y.

For this a vecsize has been added to the TSFlag bits of MVE
instructions, which stores the size of the elements that the MVE
instruction operates on. In the case of multiple sizes (such as a
MVE_VMOVLs8bh that extends from i8 to i16), the largest size is
chosen. The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 =
i64, which often (but not always) comes from the instruction encoding
directly. A unit test was added, and although only a subset of the
vecsizes are currently used, the rest should be useful for other cases.
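
The two-bit encoding and the size comparison can be sketched as follows. This is an illustration only: the bit position `kVecSizeShift`, the helper names, and the 64-bit flags word are hypothetical and are not taken from the ARM backend; only the 00/01/10/11 mapping comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Two-bit element-size codes: 00 = i8, 01 = i16, 10 = i32, 11 = i64.
enum class VecSize : std::uint8_t { I8 = 0, I16 = 1, I32 = 2, I64 = 3 };

// Hypothetical position of the two-bit field inside a 64-bit flags word.
constexpr unsigned kVecSizeShift = 8;
constexpr std::uint64_t kVecSizeMask = std::uint64_t{3} << kVecSizeShift;

// Returns flags with the vecsize field replaced by vs.
inline std::uint64_t setVecSize(std::uint64_t flags, VecSize vs) {
  const std::uint64_t code = static_cast<std::uint64_t>(vs);
  assert(code <= 3);
  return (flags & ~kVecSizeMask) | (code << kVecSizeShift);
}

// Extracts the vecsize field from flags.
inline VecSize getVecSize(std::uint64_t flags) {
  return static_cast<VecSize>((flags & kVecSizeMask) >> kVecSizeShift);
}

// Element width in bits: 8, 16, 32 or 64.
constexpr unsigned elementBits(VecSize vs) {
  return 8u << static_cast<unsigned>(vs);
}

// The condition from the description: the size the VMOVL extends into must
// be less than or equal to the size of the VCTP.
constexpr bool vmovlAllowed(VecSize vmovlDst, VecSize vctp) {
  return elementBits(vmovlDst) <= elementBits(vctp);
}
```

For an instruction with several sizes, such as the i8 to i16 extend mentioned above, the stored code would be the larger one (`VecSize::I16` in this sketch).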

Differential Revision: https://reviews.llvm.org/D109706
2021-09-22 12:07:52 +01:00
Raphael Isemann
a5e1c746b8 Unbreak module builds by making InstructionWorklist.h non-modular
This regressed in D110181 and apparently the header intentionally requires
DEBUG_TYPE to be defined by the including file. Just exclude the header from
the module to unbreak the build.
2021-09-22 12:17:13 +02:00
Yi Kong
d0746f2e9b Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar
The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C,
Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar.

  SELECT VecC, ScalarIdx, 0

We could splat Idx to a vector but it defeats the purpose of
optimisation. Don't apply the folding rule in this case.
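
The legality check can be pictured with a toy type model. This is a sketch under stated assumptions, not InstCombine code: `Ty`, `isWellFormedSelect` and `canFoldSelectOfGep` are hypothetical names, and the model only captures the rule that a vector condition needs vector value operands with the same number of lanes.

```cpp
#include <cassert>

// Toy stand-in for an IR type: a scalar, or a vector with `lanes` elements.
struct Ty {
  bool isVector;
  unsigned lanes; // 1 for scalars
};

// A select is well formed when its condition is scalar, or when condition
// and value operands are vectors with the same lane count.
inline bool isWellFormedSelect(Ty cond, Ty value) {
  assert(cond.lanes >= 1 && value.lanes >= 1);
  if (!cond.isVector)
    return true;
  return value.isVector && value.lanes == cond.lanes;
}

// Guard for (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)).
// The rewritten select chooses between Idx and 0, so the fold is only
// applied when select(C, Idx, 0) would itself be well formed.
inline bool canFoldSelectOfGep(Ty cond, Ty idx) {
  return isWellFormedSelect(cond, idx);
}
```

With a vector condition and a scalar index the guard returns false, which corresponds to the `SELECT VecC, ScalarIdx, 0` case that the patch stops producing.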

This fixes a regression from commit d561b6fbdbe6d1da05fd92003a4ac1e37bf4b8bc.
2021-09-22 18:11:33 +08:00
Florian Mayer
36daf074d9 [hwasan] also omit safe mem[cpy|mov|set].
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D109816
2021-09-22 11:08:27 +01:00
Sander de Smalen
4ca1fbe361 [SelectionDAG] Make WidenVecRes_Convert work for scalable vectors.
Most of the code wasn't yet scalable safe, although most of the
code conceptually just works for scalable vectors. This change
makes the algorithm work on ElementCount, where appropriate,
and leaves the fixed-width only code to use `getFixedNumElements`.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110058
2021-09-22 10:58:38 +01:00
Simon Pilgrim
41492d77ba [LoopVectorize][X86] Add operands to make it more obvious what line the CHECK concerns
As we're checking the cost debug analysis these should match the original IR line - so we shouldn't have any variable naming issues.

I'm investigating v4i32 mul -> PMADDWD costs handling (for PR47437) and these CHECK lines were proving tricky to keep track of.
2021-09-22 10:08:32 +01:00
Florian Hahn
300870a95c [VectorCombine] Switch to using a worklist.
This patch updates VectorCombine to use a worklist to allow iterative
simplifications where a combine enables other combines.

Suggested in D100302.

The main use case at the moment is foldSingleElementStore and
scalarizeLoadExtract working together to improve scalarization.

Note that we now also do not run SimplifyInstructionsInBlock on the
whole function if there have been changes. This means we fail to
remove/simplify instructions not related to any of the vector combines.
IMO this is fine, as simplifying the whole function seems more like a
workaround for not tracking the changed instructions.
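
The general worklist pattern, where one successful combine re-queues the users it may have unblocked, can be sketched with a toy constant folder. This does not use LLVM's worklist class or any VectorCombine code; `Node`, `Op` and `foldWithWorklist` are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

enum class Op { Const, Add, Mul };

struct Node {
  Op op;
  int value; // meaningful only for Op::Const
  int lhs;   // operand index for Add/Mul, -1 for Const
  int rhs;   // operand index for Add/Mul, -1 for Const
};

// Folds Add/Mul nodes whose operands are both constants. Whenever a node is
// folded, its users are pushed back onto the worklist so that they get
// another chance. Returns the number of folds performed.
inline int foldWithWorklist(std::vector<Node> &nodes) {
  std::deque<int> worklist;
  for (std::size_t i = 0; i < nodes.size(); ++i)
    worklist.push_back(static_cast<int>(i));

  int folded = 0;
  while (!worklist.empty()) {
    const int id = worklist.front();
    worklist.pop_front();
    Node &n = nodes[id];
    if (n.op == Op::Const)
      continue;
    assert(n.lhs >= 0 && n.rhs >= 0);
    const Node &l = nodes[n.lhs];
    const Node &r = nodes[n.rhs];
    if (l.op != Op::Const || r.op != Op::Const)
      continue; // not foldable yet; a later fold may re-queue this node
    const int v = (n.op == Op::Add) ? l.value + r.value : l.value * r.value;
    n = Node{Op::Const, v, -1, -1};
    ++folded;
    // Only the users of the changed node are revisited, not everything.
    for (std::size_t u = 0; u < nodes.size(); ++u) {
      const Node &user = nodes[u];
      if (user.op != Op::Const && (user.lhs == id || user.rhs == id))
        worklist.push_back(static_cast<int>(u));
    }
  }
  return folded;
}
```

If a user is visited before its operand has been folded, the first visit does nothing and the second visit, triggered by the re-queue, succeeds. Tracking changed nodes this way is what makes a whole-function cleanup pass unnecessary in the toy model.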

Compile-time impact looks neutral:
NewPM-O3: +0.02%
NewPM-ReleaseThinLTO: -0.00%
NewPM-ReleaseLTO-g: -0.02%

http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D110171
2021-09-22 09:54:58 +01:00
Sander de Smalen
ab3607c0ed [AArch64][SVE] Add missing load/store patterns for unpacked bfloat vectors.
Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D110063
2021-09-22 09:45:33 +01:00
Jay Foad
0205806d0f [AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers
Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
instruction, so we might as well convert it to the more flexible VOP3-
only mad/fma form.

With this change, the only way we should emit VOP3-encoded mac/fmac is
if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs
for both src0 and src1. In all other cases the mac/fmac should either be
converted to mad/fma or shrunk to VOP2 encoding.

Differential Revision: https://reviews.llvm.org/D110156
2021-09-22 09:36:34 +01:00
Jay Foad
3828ea6181 [AMDGPU] Divergence-driven instruction selection for mul i32
Differential Revision: https://reviews.llvm.org/D109881
2021-09-22 09:36:34 +01:00
David Green
636fc0ef86 [ARM] Add additional tests for VMOVL in tail predicated loops. 2021-09-22 09:33:36 +01:00
Dmitry Vyukov
0ee77d6db3 tsan: write uptime in mem profile
Write uptime in real time seconds for every mem profile record.
Uptime is useful to make more sense out of the profile,
compare random lines, etc.

Depends on D110153.

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D110154
2021-09-22 10:19:58 +02:00
Dmitry Vyukov
ae6d57ca5a tsan: remove stale comment
We do query it every 100ms now.
(GetRSS was fixed to not be dead slow IIRC)

Depends on D110152.

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D110153
2021-09-22 10:18:58 +02:00
Dmitry Vyukov
e8101f2149 tsan: move mem profile initialization into separate function
BackgroundThread function is quite large,
move mem profile initialization into a separate function.

Depends on D110151.

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D110152
2021-09-22 10:18:08 +02:00
Dmitry Vyukov
b8aa9b0c37 tsan: include internal allocator info in mem profile
We allocate things from the internal allocator,
it's useful to know how much it consumes.

Depends on D110150.

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D110151
2021-09-22 10:17:01 +02:00
Dmitry Vyukov
58a157cd3b tsan: make mem profile data more consistent
We currently query number of threads before reading /proc/self/smaps.
But reading /proc/self/smaps can take lots of time for huge processes
and it retries several times with different buffer sizes.
Overall it can take tens of seconds. This can make number of threads
significantly inconsistent with the rest of the stats.
So query it after reading /proc/self/smaps.

Depends on D110149.

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D110150
2021-09-22 10:16:15 +02:00
Dmitry Vyukov
eefef56ece tsan: include MBlock/SyncObj stats into mem profile
Include info about MBlock/SyncObj memory consumption in the memory profile.

Depends on D110148.

Reviewed By: melver, vitalybuka

Differential Revision: https://reviews.llvm.org/D110149
2021-09-22 10:14:33 +02:00
Dmitry Vyukov
608ffc98c3 tsan: account for mid app range in mem profile
We account low and high ranges, but forgot about the mid range.
Account mid range as well.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110148
2021-09-22 10:13:31 +02:00
Sebastian Neubauer
ecd5145c27 [Utils] Replace llc with cat for tests
Make the update_llc_test_checks script test independent of llc behavior
by using cat with static files to simulate llc output.

This allows changing llc without breaking the script test case.

The update script is executed in a temporary directory, so the
llc-generated assembly files are copied there. %T is deprecated, but it
allows copying a file with a predictable filename.

Differential Revision: https://reviews.llvm.org/D110143
2021-09-22 10:10:35 +02:00
Balázs Kéri
7ce638538b [clang][ASTImporter] Generic attribute import handling (first step).
Import of Attr objects was incomplete in ASTImporter.
This change introduces support for a generic way of importing an attribute.
As a usage example, import of the AssertCapability attribute is
added to ASTImporter.
Updating the old attribute import code and adding new attributes or extending
the generic functions (if needed) is future work.

Reviewed By: steakhal, martong

Differential Revision: https://reviews.llvm.org/D109608
2021-09-22 10:14:03 +02:00
Florian Hahn
e08a5dc86f [InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC).
InstCombine's worklist can be re-used by other passes like
VectorCombine. Move it to llvm/Transforms/Utils and rename it to
InstructionWorklist.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D110181
2021-09-22 08:47:21 +01:00
Diana Picus
abbb0f901a [flang] Change complex type define in runtime for clang-cl
When compiling the runtime with a version of clang-cl newer than 12, we
define CMPLXF as __builtin_complex, which returns a float _Complex type.
This errors out in contexts where the result of CMPLXF is expected to be
a float_Complex_t. This is defined as _Fcomplex whenever _MSC_VER is
defined (and as float _Complex otherwise).

This patch defines float_Complex_t & friends as _Fcomplex only when
we're using "true" MSVC, and not just clang-pretending-to-be-MSVC. This
should only affect clang-cl >= 12.
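
The compiler check can be sketched like this. It is a minimal illustration, not the flang runtime header: `SKETCH_TRUE_MSVC`, `sketch_float_complex_t` and `complexIsTwoFloats` are hypothetical names. The sketch relies on clang-cl defining both `_MSC_VER` and `__clang__`, and on the compiler accepting `float _Complex` in C++ as an extension, which Clang and GCC do.

```cpp
#include <cassert>

// _MSC_VER alone does not identify the Microsoft compiler, because clang-cl
// defines it too. Ruling out __clang__ leaves the "true" MSVC case.
#if defined(_MSC_VER) && !defined(__clang__)
#define SKETCH_TRUE_MSVC 1
#else
#define SKETCH_TRUE_MSVC 0
#endif

#if SKETCH_TRUE_MSVC
#include <complex.h>
typedef _Fcomplex sketch_float_complex_t; // MSVC's struct-based complex type
#else
typedef float _Complex sketch_float_complex_t; // C99-style complex type
#endif

// Either way the type is expected to hold two floats (real and imaginary).
inline bool complexIsTwoFloats() {
  const bool ok = sizeof(sketch_float_complex_t) == 2 * sizeof(float);
  assert(ok);
  return ok;
}
```

Under clang-cl this selects the `float _Complex` branch, which matches the type that `__builtin_complex` returns in the scenario described above.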

Differential Revision: https://reviews.llvm.org/D110139
2021-09-22 06:54:33 +00:00
Jonas Devlieghere
47f79c6057 [lldb] Add --stack option to target symbols add command
Currently you can ask the target symbols add command to locate the debug
symbols for the current frame. This patch adds an option to do that for
the whole call stack.

Differential revision: https://reviews.llvm.org/D110011
2021-09-21 23:08:14 -07:00
Dmitry Vyukov
4986959eb2 tsan: prepare for trace mapping removal
Don't test for presence of the trace mapping,
it will be removed soon.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D110194
2021-09-22 07:26:37 +02:00
Dmitry Vyukov
82e593cf90 tsan: uninline Enable/DisableIgnores
ScopedInterceptor::Enable/DisableIgnores is only used for some special cases.
Uninline them from the common interceptor handling.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D110157
2021-09-22 07:25:14 +02:00
Dmitry Vyukov
db2f870fe3 tsan: reset destination range in Java heap move
Switch Java heap move to the new scheme required for the new tsan runtime.
Instead of copying the shadow we reset the destination range.
The new v3 trace contains addresses of accesses, so we cannot simply copy the shadow.
This can lead to false negatives, but cannot lead to false positives.

Depends on D110159.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D110190
2021-09-22 07:23:21 +02:00
Michael Kruse
ced20c6672 [Polly] Add -polly-reschedule and -polly-postopts options.
These command-line options allow turning off parts of the schedule tree optimization pipeline.
2021-09-22 00:18:19 -05:00
Dmitry Vyukov
41f8ef3e31 tsan: enable sse4.2 in tests
Pass -msse4.2 flag to the tests the same way we do for the runtime.
Layout of some structs in the runtime headers depends on the flag
(TSAN_VECTORIZE), so we need it to be consistent across the runtime
and tests.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D110192
2021-09-22 07:13:47 +02:00
Dmitry Vyukov
cf93f7677d tsan: move errno spoiling reporting into a separate function (NFC)
CallUserSignalHandler function is quite large and complex.
Move errno spoiling reporting into a separate function.
No logical changes.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D110159
2021-09-22 07:12:53 +02:00
Dmitry Vyukov
20ee72d4cc tsan: don't call dlsym during exit
dlsym calls into dynamic linker which calls malloc and other things.
It's problematic to do it during the actual exit, because
it can happen from a signal handler or from within the runtime
after we reported the first bug, etc.
See https://github.com/google/sanitizers/issues/1440 for an example
(captured in the added test).
Initialize the callbacks during startup instead.

Depends on D110159.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D110166
2021-09-22 07:11:59 +02:00
Chen Zheng
957514eb9e [PowerPC] add testcase for chain commoning; nfc 2021-09-22 05:08:00 +00:00
Aart Bik
128a9e1cb4 [mlir][sparse] cleanup ABI issues in C interface with memrefs
This change adds automatic wrapper functions with emit_c_interface
to all methods in the sparse support library that deal with MEMREFs.
The wrappers will take care of passing MEMREFs by value internally
and by pointer externally, thereby avoiding ABI issues across platforms.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D110219
2021-09-21 21:58:12 -07:00
Shao-Ce SUN
1d8bbafed2 [RISCV][NFC] Fix clang test for vloxei/vluxei 2021-09-22 11:27:41 +08:00
Louis Dionne
84d07f4dfe [libc++] Add some missing _LIBCPP_HIDE_FROM_ABI markup
Also, as a fly-by fix, use `inline` directly to define inline variables
(all compilers support it).

Differential Revision: https://reviews.llvm.org/D110208
2021-09-21 23:11:23 -04:00
David Blaikie
2ff049b12e DebugInfo: Don't use preferred template names in debug info
Using the preferred name creates a mismatch between the textual name of
a type and the DWARF tags describing the parameters as well as possible
inconsistency between DWARF producers (like Clang and GCC, or
older/newer Clang versions, etc).
2021-09-21 20:08:16 -07:00
Shao-Ce SUN
e247fed23b [RISCV] add Half-precision test for clang
and deleted useless lines.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109799
2021-09-22 11:06:57 +08:00
Matt Arsenault
ec55dcedce AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query
Add an overload to pass the flat workgroup range in separately. This
will allow the attributor to use the assumed value for
amdgpu-flat-work-group-size when inferring amdgpu-waves-per-eu.
2021-09-21 22:57:17 -04:00
Chen Zheng
ffa9fa9ed2 [PowerPC] prepare for update form with non-const increment.
This is a follow-up of D105872. Now we are able to prepare for update
form with non-const increment.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106032
2021-09-22 02:54:28 +00:00
Joe Loser
bc4a23811b [libc++][test] Fix iterator assertion in span.cons/deduct.pass.cpp
Two tests in span.cons/deduct.pass.cpp accidentally check whether the
iterator range from member begin and member end are equivalent to the
ones from free begin and free end. This is obviously true and not
intended. Correct the intent by comparing the size/data from the span
with the source input.

While in the neighborhood, add test for const int arr[N], remove extraneous
type aliases, unused <type_traits> header, and the
disable_missing_braces_warning.h include.

Reviewed By: Quuxplusone, ldionne, #libc

Differential Revision: https://reviews.llvm.org/D109668
2021-09-21 22:46:08 -04:00
Matt Arsenault
4c2ee57148 AMDGPU: Fix test relying on incompatible attributes
This combination of amdgpu-waves-per-eu and
amdgpu-flat-work-group-size cannot be satisfied at the same time, so
this was using the default.
2021-09-21 22:44:35 -04:00
David Blaikie
db6f1e8a88 DebugInfo: Don't suppress inline namespaces when printing template template parameter names 2021-09-21 19:30:13 -07:00
David Blaikie
d31dfc3011 DebugInfo: Unify some printing policy adjustments 2021-09-21 19:30:12 -07:00
Shao-Ce SUN
d9aff62560 [NFC] Fix typo. 2021-09-22 10:27:11 +08:00
Shao-Ce SUN
a83eda591c [RISCV][NFC] Deleted useless lines in clang tests. 2021-09-22 10:25:57 +08:00