AggressiveAntiDepBreaker was renaming registers specified by the user
for inline assembly. While this renaming works for compiler-chosen
registers, it is not valid for user-specified registers, and at the point
in the pipeline where this pass runs, I don't currently see a way to
distinguish the two.
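A minimal illustration (not the original test case; the function and register
choice are made up) of what a user-specified register means for inline
assembly: the register name is part of the asm text, so renaming it behind the
user's back would change the program's meaning, unlike a register the compiler
picked on its own.

  #include <cstdint>

  uint64_t probe() {
    uint64_t Result;
    // The user names x5 explicitly in the template and clobber list; the
    // anti-dependence breaker must not rename it to break a dependence chain.
    asm volatile("mov x5, #42\n\t"
                 "mov %0, x5"
                 : "=r"(Result)
                 :
                 : "x5");
    return Result;
  }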
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254532 91177308-0d34-0410-b5e6-96231b3b80d8
vector.resize() is significantly slower than memset in many STL
implementations, and the cost of initializing these vectors is noticeable
on targets with many registers. Since we don't need the overhead of a
vector, use a simple unique_ptr instead.
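A minimal sketch of the pattern (the struct and member names are illustrative,
not the actual anti-dep breaker code): an array sized by the number of
registers is allocated once and cleared with memset, instead of being a
std::vector that gets resized and value-initialized on every use.

  #include <cstring>
  #include <memory>

  struct RegisterIndices {
    explicit RegisterIndices(unsigned NumRegs)
        : NumRegs(NumRegs), KillIndices(new unsigned[NumRegs]) {
      clear();
    }

    // memset over a raw array avoids the per-element initialization that
    // vector::resize/assign would perform.
    void clear() {
      std::memset(KillIndices.get(), 0, NumRegs * sizeof(unsigned));
    }

    unsigned NumRegs;
    std::unique_ptr<unsigned[]> KillIndices;
  };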
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254526 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This changes overflow handling during instrumentation profile merge. Rather than throwing away records that would result in counter overflow, merged counts are instead clamped to the maximum representable value. A warning about counter overflow is still surfaced to the user as before.
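A small sketch of the clamping behaviour described above (names and types are
illustrative, not the exact profile-runtime code): the merged count saturates
at the maximum representable value, and the overflow condition is reported so
the existing warning can still be emitted.

  #include <cstdint>
  #include <limits>

  // Returns true if the merge overflowed; the count itself is clamped rather
  // than the record being discarded.
  static bool mergeCounts(uint64_t &Dst, uint64_t Src) {
    const uint64_t Max = std::numeric_limits<uint64_t>::max();
    bool Overflowed = Dst > Max - Src;
    Dst = Overflowed ? Max : Dst + Src;
    return Overflowed;
  }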
Reviewers: dnovillo, davidxl, silvas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14893
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254525 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM ARM is clear that 128-bit loads are only guaranteed to have been atomic
if there has been a corresponding successful stxp. It's less clear for AArch32, so
I'm leaving that alone for now.
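A minimal sketch, assuming GCC/Clang extended inline asm on AArch64 (the
helper name is made up), of the consequence: the loaded pair only counts as
atomic once the matching stxp reports success, so a lone ldxp is not enough
and the load is expanded as a full exclusive loop.

  #include <cstdint>

  static inline __uint128_t atomic_load_128(__uint128_t *Addr) {
    uint64_t Lo, Hi;
    uint32_t Failed;
    do {
      asm volatile("ldxp %0, %1, %3\n\t"    // exclusive 128-bit load
                   "stxp %w2, %0, %1, %3"   // write the same value back
                   : "=&r"(Lo), "=&r"(Hi), "=&r"(Failed)
                   : "Q"(*Addr)
                   : "memory");
    } while (Failed);                        // retry until the stxp succeeds
    return ((__uint128_t)Hi << 64) | Lo;
  }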
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254524 91177308-0d34-0410-b5e6-96231b3b80d8
|Opcode  |Instruction   |64-Bit Mode|Compat/Leg Mode|Description|
|9B DD /7|FSTSW m2byte  |Valid      |Valid          |Store FPU status word at m2byte after checking for pending unmasked floating-point exceptions.|
|9B DF E0|FSTSW AX      |Valid      |Valid          |Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.|
|DD /7   |FNSTSW *m2byte|Valid      |Valid          |Store FPU status word at m2byte without checking for pending unmasked floating-point exceptions.|
|DF E0   |FNSTSW *AX    |Valid      |Valid          |Store FPU status word in AX register without checking for pending unmasked floating-point exceptions.|
m2byte is a 16-bit (word) operand, and therefore the instruction's memory operand needs to be changed from f32mem to i16mem.
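For reference, a small illustration (GNU inline asm, AT&T syntax; the helper
name is made up) that the status word really is a 16-bit quantity, which is
what the i16mem operand class expresses:

  #include <cstdint>

  static inline uint16_t read_fpu_status_word() {
    uint16_t StatusWord;
    asm volatile("fnstsw %0" : "=a"(StatusWord)); // the DF E0 (FNSTSW AX) form
    return StatusWord;
  }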
Differential Revision: http://reviews.llvm.org/D14953
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254512 91177308-0d34-0410-b5e6-96231b3b80d8
I checked and updated the costs of the AVX-512 conversion operations and added costs for conversion operations in DQ mode.
Conversions of illegal types that require a vector split are not calculated yet (the same holds for other X86 targets).
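As a rough, self-contained sketch of what is being recorded (the entry layout
and the values below are illustrative, not the real LLVM data structures or
the exact numbers added by this patch), each conversion cost is a lookup entry
keyed by the operation and its source and destination vector types:

  // Illustrative only: mirrors the shape of a conversion cost table.
  enum class ConvOp { SIToFP, UIToFP, FPToSI, FPToUI };

  struct ConvCostEntry {
    ConvOp Op;          // conversion opcode
    const char *DstTy;  // destination vector type, e.g. "v8f64"
    const char *SrcTy;  // source vector type, e.g. "v8i64"
    unsigned Cost;      // estimated cost in simple-instruction units
  };

  static const ConvCostEntry AVX512DQConvCostSketch[] = {
      {ConvOp::SIToFP, "v8f64", "v8i64", 1},
      {ConvOp::UIToFP, "v8f64", "v8i64", 1},
      {ConvOp::FPToSI, "v8i64", "v8f64", 1},
  };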
Differential Revision: http://reviews.llvm.org/D15074
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254494 91177308-0d34-0410-b5e6-96231b3b80d8
Add an overloaded version of AttributeSet::addAttribute that adds an attribute to multiple slots at a time.
The new overloaded function is used when an attribute is added to a
large number of slots of an AttributeSet (for example, to function
parameters). This is much faster than calling AttributeSet::addAttribute
once per slot, because AttributeSet::getImpl (which calls
FoldingSet::FindNodeOrInsertPos) is called only once per function
instead of once per slot.
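A sketch of the intended call pattern (the exact overload signature is an
assumption here, and the helper is made up): batching the slots into a single
call means the FoldingSet uniquing lookup happens once instead of once per
parameter.

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Attributes.h"
  using namespace llvm;

  AttributeSet addToAllParams(LLVMContext &Ctx, AttributeSet AS,
                              unsigned NumParams, Attribute A) {
    SmallVector<unsigned, 8> Indices;
    for (unsigned I = 1; I <= NumParams; ++I) // slots 1..N are the parameters
      Indices.push_back(I);
    // One uniquing lookup for all slots, instead of one per addAttribute call.
    return AS.addAttribute(Ctx, Indices, A);
  }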
With this commit, a file that used to take over 22 minutes to compile now
compiles in just 13 seconds.
rdar://problem/23581000
Differential Revision: http://reviews.llvm.org/D15085
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254491 91177308-0d34-0410-b5e6-96231b3b80d8
This is very rudimentary support for debug_cu_index, but it is enough to
allow llvm-dwarfdump to find the offsets for contributions and
correctly dump debug_info.
It will need to actually find the real signature of the unit and build
the real hash table with the right number of buckets, as per the DWP
specification.
It will also need to be expanded to cover the tu_index.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254489 91177308-0d34-0410-b5e6-96231b3b80d8
When linking a static archive, there are no individual module files to
load. Instead, the archive can be mmap'ed and modules can be initialized
from a buffer directly. The callback provides the flexibility to override
the scheme for loading a module from the summary.
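A sketch of the idea (the callback type and all names here are assumptions,
not the exact interface added by this patch): the importer asks the client for
a Module given its identifier, so a linker holding an mmap'ed archive can
materialize the module straight from an in-memory buffer rather than opening a
file per module.

  #include "llvm/ADT/StringRef.h"
  #include "llvm/IR/Module.h"
  #include <functional>
  #include <memory>

  // The client decides how a module is produced from its identifier.
  using ModuleLoader =
      std::function<std::unique_ptr<llvm::Module>(llvm::StringRef Identifier)>;

  // A static-archive client might capture its buffer map and parse lazily:
  //   ModuleLoader Loader = [&](llvm::StringRef Id) {
  //     return parseFromBuffer(bufferForMember(Id)); // hypothetical helpers
  //   };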
Differential Revision: http://reviews.llvm.org/D15101
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254479 91177308-0d34-0410-b5e6-96231b3b80d8
We mustn't introduce a shift of exactly 64 bits for any input, since that
produces an UNDEF value (and worse, it's not what you want from the natural
AArch64 implementation).
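A sketch of how the offending shift sneaks in (illustrative, restricted to
amounts below 64; not the actual DAG expansion): the naive split of a 128-bit
left shift computes the high half as (Hi << Amt) | (Lo >> (64 - Amt)), and
when Amt is 0 that second term is a shift by exactly 64.

  #include <cstdint>

  // Naive 128-bit left shift split into two 64-bit halves. When Amt == 0 the
  // term Lo >> (64 - Amt) becomes Lo >> 64: undefined in C++ and in LLVM IR,
  // and AArch64 variable shifts only use the low 6 bits of the amount, so a
  // "shift by 64" silently behaves like a shift by 0.
  static inline void shl128(uint64_t &Hi, uint64_t &Lo, unsigned Amt) {
    if (Amt == 0)
      return;                               // guard against the Lo >> 64 case
    Hi = (Hi << Amt) | (Lo >> (64 - Amt));
    Lo <<= Amt;
  }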
The generated code is pretty horrific, but I couldn't come up with an obviously
better alternative (if the amount is constant EXTR could help). Turns out
128-bit shifts are just nasty.
rdar://22491037
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254475 91177308-0d34-0410-b5e6-96231b3b80d8
For structs with trailing objects, define a member operator delete.
Without it, the program will fail when the -fsized-deallocation option
is used, because the wrong size will be passed to the global operator
delete.
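A minimal sketch, with made-up names, of the failure mode and the fix: a type
allocated with extra trailing storage hands out a pointer whose true
allocation size is larger than sizeof(T), so under -fsized-deallocation the
implicitly selected sized ::operator delete(void*, size_t) would receive the
wrong size; declaring an unsized member operator delete sidesteps that.

  #include <new>

  struct Node {
    unsigned NumElts;
    // ... NumElts trailing unsigned values are allocated after the object ...

    // Member operator delete: the unsized global form is always used, so the
    // (too small) sizeof(Node) is never passed to the deallocation function.
    void operator delete(void *Ptr) { ::operator delete(Ptr); }
  };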
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254471 91177308-0d34-0410-b5e6-96231b3b80d8
The bug was introduced in r254377, which failed some tests on ARM: a new
probability is assigned to a successor, but the provided BB may not
actually be a successor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254463 91177308-0d34-0410-b5e6-96231b3b80d8
The values in this field are compared against getAvailableFeatures(),
which returns a uint64_t. This was causing problems in an internal
branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254462 91177308-0d34-0410-b5e6-96231b3b80d8
ConstantDataArray::getImpl and ConstantDataVector::getImpl had a lot
of copy pasta in how they handled sequences of constants. Break that
out into a couple of simple functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254456 91177308-0d34-0410-b5e6-96231b3b80d8
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and only a few VOP2
instructions implicitly read vcc.
This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254452 91177308-0d34-0410-b5e6-96231b3b80d8
The linker never takes ownership of a module or changes which module it
is referring to, making it natural to use references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254449 91177308-0d34-0410-b5e6-96231b3b80d8