The condition expression (a >> n) & 1 is converted to a "bt a, n"
instruction. This works on all Intel targets.
But on AVX-512 it was broken, because there the expression is rewritten
as (truncate (a >> n) to i1).
I added the (truncate (a >> n) to i1) sequence to the BT pattern.
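As a hypothetical illustration (example code not from the patch), a bit
test of this shape should now lower to a single bt on AVX-512 targets
again, rather than to a shift-and-mask sequence:

    // Hypothetical example: the condition below has the (a >> n) & 1
    // shape and is expected to match the BT pattern, on AVX-512 as well
    // as on older Intel targets.
    bool bitIsSet(unsigned long A, unsigned N) {
      if ((A >> N) & 1)  // becomes (truncate (a >> n) to i1) on AVX-512
        return true;
      return false;
    }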
Differential Revision: https://reviews.llvm.org/D22354
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275950 91177308-0d34-0410-b5e6-96231b3b80d8
This patch updates MemorySSA's use-optimizing walker to be more
accurate and, in some cases, faster.
Essentially, this changed our core walking algorithm from a
cache-as-you-go DFS to an iteratively expanded DFS, with all of the
caching happening at the end. Said expansion happens when we hit a Phi,
P; we'll try to do the smallest amount of work possible to see if
optimizing above that Phi is legal in the first place. If so, we'll
expand the search to see if we can optimize to the next Phi, and so on.
An iteratively expanded DFS lets us potentially quit earlier (because we
don't assume that we can optimize above all phis) than our old walker.
Additionally, because we don't cache as we go, we can now optimize above
loops.
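A grossly simplified sketch of that search's shape (hypothetical types
and helpers, not MemorySSA's actual code): walk up to the nearest Phi or
clobber, cheaply check whether stepping above the Phi is legal, and only
then expand the search.

    #include <vector>

    // Hypothetical stand-in for a MemoryAccess; Phis have several
    // incoming definitions, everything else has at most one.
    struct Access {
      bool IsPhi = false;
      bool Clobbers = false;           // clobbers the queried location?
      std::vector<Access *> Incoming;  // defining accesses above this one
    };

    // Walk upward until a clobber or a Phi blocks the path.
    Access *walkToPhiOrClobber(Access *A) {
      while (!A->IsPhi && !A->Clobbers && !A->Incoming.empty())
        A = A->Incoming.front();       // non-Phis have a single definition
      return A;
    }

    // Iteratively expanded DFS: pay the minimum cost to decide whether a
    // Phi may be stepped over before searching any higher. (The real
    // walker tracks every incoming path; one suffices for this sketch.)
    Access *findClobber(Access *Start, bool (*canStepAbove)(Access *)) {
      Access *Cur = walkToPhiOrClobber(Start);
      while (Cur->IsPhi && canStepAbove(Cur))
        Cur = walkToPhiOrClobber(Cur->Incoming.front());
      return Cur;                      // a real clobber, or a blocking Phi
    }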
As an added bonus, this patch adds a ton of verification (enabled with
EXPENSIVE_CHECKS), so finding bugs is easier.
Differential Revision: https://reviews.llvm.org/D21777
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275940 91177308-0d34-0410-b5e6-96231b3b80d8
Add a "-j" option to llvm-profdata to control the number of threads used.
Auto-detect NumThreads when it isn't specified, and avoid spawning threads when
they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Changes since the initial commit:
- When handling odd-length inputs, call ThreadPool::wait() before merging
the last profile. This should fix a race/off-by-one (see r275937).
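For illustration, here is a hedged sketch of one round of pairwise
merging (Profile and mergeInto are stand-ins, not llvm-profdata's actual
types), including the wait-before-the-last-merge ordering from the fix
above:

    #include <thread>
    #include <vector>

    // Hypothetical sketch: merge adjacent pairs of profiles on worker
    // threads; with an odd number of inputs, the leftover is merged only
    // after every in-flight task has finished.
    struct Profile { long Counts = 0; };

    void mergeInto(Profile &Dst, const Profile &Src) {
      Dst.Counts += Src.Counts;
    }

    void parallelMergeRound(std::vector<Profile> &In, unsigned NumThreads) {
      if (NumThreads <= 1 || In.size() < 2) {  // threads wouldn't help
        for (size_t I = 1; I < In.size(); ++I)
          mergeInto(In[0], In[I]);
        return;
      }
      std::vector<std::thread> Workers;
      for (size_t I = 0; I + 1 < In.size(); I += 2)
        Workers.emplace_back([&In, I] { mergeInto(In[I], In[I + 1]); });
      for (std::thread &T : Workers)
        T.join();                              // wait() before the last merge
      if (In.size() % 2)                       // odd-length input
        mergeInto(In[0], In.back());
    }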
Differential Revision: https://reviews.llvm.org/D22438
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275938 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions in the uniform set will not have vector versions, so add
them to VecValuesToIgnore.
For induction variables, those used only in uniform instructions or in
consecutive-pointer instructions have already been added to
VecValuesToIgnore above. For induction variables used only in uniform
instructions or in non-consecutive/non-gather-scatter pointer
instructions, the related phi and its update are also added to the
VecValuesToIgnore set.
This change makes the vector RegUsage estimation less conservative.
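As a hypothetical illustration (not from the patch): in the loop below
the induction variable feeds only a consecutive pointer access, so after
vectorization its scalar phi and update consume no vector registers and
can safely be ignored by the register-usage estimate.

    // Hypothetical example: I is used only to form consecutive pointer
    // accesses, so the vectorized loop keeps I scalar; its phi/update
    // belong in VecValuesToIgnore for vector register-usage purposes.
    void scale(float *A, int N) {
      for (int I = 0; I < N; ++I)
        A[I] *= 2.0f;
    }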
Differential Revision: https://reviews.llvm.org/D20474
This recommit fixes the testcase global_alias.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275936 91177308-0d34-0410-b5e6-96231b3b80d8
Without this fix, releaseSuccessors could, when InOrOutBlock is false,
release SUs outside the BasicBlock being scheduled.
Patch by Axel Davy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275935 91177308-0d34-0410-b5e6-96231b3b80d8
This is to help move SILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after it for operations pending at the point
of the indexing, but that should be fixed by future improvements to
cross-block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275934 91177308-0d34-0410-b5e6-96231b3b80d8
Add an overview of stubs and compile callbacks before the discussion of the
source changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275933 91177308-0d34-0410-b5e6-96231b3b80d8
This is for a situation where the encoding for a register may be
different depending on the specific operand. For some instructions,
we want to apply additional restrictions beyond the encoding's
constraints.
In AMDGPU some operands are VSrc_32, using the VS_32 pseudo register
class, which accepts VGPRs, SGPRs, or immediates in the encoding.
Some specific instructions with the same encoding operand do not want to
allow immediates or SGPRs, but in this case the encoding format differs
from that of a regular VGPR_32 operand.
This allows specifying that the encoding should be treated the same
without introducing yet another dummy register class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275929 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of extracting raw coverage mappings into an artifact directory,
actually generate useful HTML reports for a given list of binaries with
symbol demangling turned on.
No tests, but this is actively being used to drive the (still nascent)
coverage bot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275927 91177308-0d34-0410-b5e6-96231b3b80d8
Add a "-j" option to llvm-profdata to control the number of threads
used. Auto-detect NumThreads when it isn't specified, and avoid spawning
threads when they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Differential Revision: https://reviews.llvm.org/D22438
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275921 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions in the uniform set will not have vector versions, so add
them to VecValuesToIgnore.
For induction variables, those used only in uniform instructions or in
consecutive-pointer instructions have already been added to
VecValuesToIgnore above. For induction variables used only in uniform
instructions or in non-consecutive/non-gather-scatter pointer
instructions, the related phi and its update are also added to the
VecValuesToIgnore set.
This change makes the vector RegUsage estimation less conservative.
Differential Revision: https://reviews.llvm.org/D20474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275912 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Per D22441, MSVC warns on our old implementation of isUInt<64>. It sees
uint64_t(1) << 64 and doesn't realize that it's not going to be
executed. Writing it as a template specialization is ugly, but it
prevents the warning.
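A hedged sketch of that shape (simplified, not LLVM's exact
MathExtras.h code): the generic template shifts by N, and a full
specialization for N == 64 means no instantiation ever contains the
out-of-range shift MSVC complains about.

    #include <cstdint>

    // Generic case: check X against 2^N. The assertion keeps the shift
    // below in range for every instantiation that actually compiles.
    template <unsigned N> inline bool isUInt(uint64_t X) {
      static_assert(N > 0 && N < 64, "N out of range");
      return X < (uint64_t(1) << N);
    }

    // Specialization: every uint64_t fits in 64 bits, no shift needed,
    // so uint64_t(1) << 64 never appears in any instantiation.
    template <> inline bool isUInt<64>(uint64_t) { return true; }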
Reviewers: RKSimon
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22472
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275909 91177308-0d34-0410-b5e6-96231b3b80d8
This doesn't seem to work with Bash:
$ /work/llvm/utils/release/merge.sh --proj llvm --rev r275870
/work/llvm/utils/release/merge.sh: line 34: ${$1#r}: bad substitution
I get the same error with and without a leading 'r'. (${$1#r} is not
valid Bash; the intended strip of a leading 'r' is presumably ${1#r}.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275898 91177308-0d34-0410-b5e6-96231b3b80d8
Taking the address of a byval variable in PTX is legal, but it currently
runs into a miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.
The change is a no-op at the SASS level for sm_3x, where ptxas already
aligns the local copy by at least 4.
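For illustration, a hypothetical device function showing the pattern
(written as plain C++ here; in CUDA source it would carry __device__):
the aggregate parameter is passed byval, and the address of its local
copy is taken.

    // Hypothetical example of the problematic pattern: P is passed
    // byval, and we take the address of a field of its local copy.
    // Enforcing a minimum alignment on the byval argument works around
    // the ptxas bug for accesses through such pointers.
    struct Payload { int A, B; };

    int readField(Payload P) {
      const int *Ptr = &P.B;  // address into the byval local copy
      return *Ptr;            // the access ptxas on sm_50+ could miscompile
    }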
Differential Revision: https://reviews.llvm.org/D22428
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275893 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Usually LCSSA survives this transformation, but in some cases (see the
attached test) it doesn't: after separating, values from the original
loop might be used in the outer loop. Before the transformation it was
all one loop, so LCSSA phis were not required.
This fixes PR28272.
Reviewers: sanjoy, hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21665
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275891 91177308-0d34-0410-b5e6-96231b3b80d8
replaceUsesOfWith will, on average, consider fewer values when trying
to do the replacement.
No functional change is intended.
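A hedged illustration of the difference (minimal standalone setup using
modern LLVM API spellings): User::replaceUsesOfWith scans only the one
user's operand list, whereas Value::replaceAllUsesWith has to walk every
use of the old value.

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    using namespace llvm;

    int main() {
      LLVMContext Ctx;
      Module M("demo", Ctx);
      Type *I32 = Type::getInt32Ty(Ctx);
      Function *F = Function::Create(FunctionType::get(I32, {I32}, false),
                                     Function::ExternalLinkage, "f", &M);
      IRBuilder<> B(BasicBlock::Create(Ctx, "entry", F));
      Value *Arg = F->getArg(0);
      Value *Sum = B.CreateAdd(Arg, Arg, "sum");
      Instruction *Ret = B.CreateRet(Sum);

      // Only Ret's own operand list is scanned for Sum; no walk over all
      // of Sum's uses is needed when the user is already known.
      Ret->replaceUsesOfWith(Sum, Arg);
      return 0;
    }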
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275884 91177308-0d34-0410-b5e6-96231b3b80d8
This is currently only called with GEP users. A direct alloca would
only happen with current typed pointers for arrays, which is a perverse
case.
Also fix crashes on 0 x and 1 x arrays.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275869 91177308-0d34-0410-b5e6-96231b3b80d8
Non-intrinsic calls aren't really handled, and the IntrinsicInst
dyn_cast checks the called function for us.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275868 91177308-0d34-0410-b5e6-96231b3b80d8
Elsewhere (particularly computeKnownBits) we assume that a global will be
aligned to the value returned by Value::getPointerAlignment. This is used to
boost the alignment on memcpy/memset, so any target-specific request can only
increase that value.
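Put differently (hypothetical helper, not the patch's actual code): any
target-requested alignment is clamped from below by what the IR already
guarantees.

    #include <algorithm>
    #include <cstdint>

    // Hypothetical helper: the alignment used for memcpy/memset may only
    // go up from what Value::getPointerAlignment guarantees, since
    // computeKnownBits and friends already rely on that lower bound.
    uint64_t effectiveAlign(uint64_t GuaranteedAlign, uint64_t TargetRequest) {
      return std::max(GuaranteedAlign, TargetRequest);
    }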
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275866 91177308-0d34-0410-b5e6-96231b3b80d8