Use a helper function to find all the direct call sites in a function.
Also split the code into a separate file, as it will be used by the
indirect-call-promotion transformation.
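A minimal sketch of what such a helper could look like (the free function
findDirectCallSites and its signature are illustrative, not the actual
interface added here):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Collect every call instruction in F whose callee is statically known.
  static void findDirectCallSites(Function &F,
                                  SmallVectorImpl<CallInst *> &DirectCalls) {
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (auto *CI = dyn_cast<CallInst>(&I))
          if (CI->getCalledFunction()) // null for indirect calls
            DirectCalls.push_back(CI);
  }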
Differential Revision: http://reviews.llvm.org/D18704
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265199 91177308-0d34-0410-b5e6-96231b3b80d8
These functions can be dropped by the compiler if they are no longer
referenced in the current module. However, there is a chance that
another module still references them because of the import.
Multiple solutions can be used:
- Always import LinkOnce when a caller is imported. This ensures that
  every module with a call to a LinkOnce has the definition and will
  be able to emit it if it emits the call.
- Turn the LinkOnce into Weak, so that it is always emitted.
- Turn all LinkOnce into available_externally and come back after all
modules are codegen'ed to emit only one copy of the linkonce, when
there is still a reference to it.
This patch implements the second option, with an optimization: only
*one* module will turn the LinkOnce into Weak, while the others will
turn it into available_externally, so that there is exactly one copy
emitted for the whole compilation.
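A minimal sketch of the idea; the IsPrevailingModule flag is a hypothetical
stand-in for however the thin-link decides which module keeps the definition:

  #include "llvm/IR/GlobalValue.h"

  using namespace llvm;

  // Decide the post-import linkage for a linkonce symbol: one module keeps a
  // weak definition, every other module treats it as available_externally.
  static GlobalValue::LinkageTypes
  resolveLinkOnce(const GlobalValue &GV, bool IsPrevailingModule) {
    if (!GV.hasLinkOnceLinkage())
      return GV.getLinkage();
    if (!IsPrevailingModule)
      return GlobalValue::AvailableExternallyLinkage; // never emitted here
    return GV.hasLinkOnceODRLinkage() ? GlobalValue::WeakODRLinkage
                                      : GlobalValue::WeakAnyLinkage;
  }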
http://reviews.llvm.org/D18346
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265190 91177308-0d34-0410-b5e6-96231b3b80d8
A ``swifterror`` attribute can be applied to a function parameter or an
AllocaInst.
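For illustration, a sketch of tagging both forms through the C++ API (this
assumes the AllocaInst::setSwiftError and Function::addParamAttr entry points;
the exact API surface may differ from what this patch adds):

  #include "llvm/IR/Attributes.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/Type.h"

  using namespace llvm;

  // Mark the first parameter of F as swifterror.
  static void markSwiftErrorParam(Function &F) {
    F.addParamAttr(0, Attribute::SwiftError);
  }

  // Create an alloca flagged as swifterror at the top of F's entry block.
  static AllocaInst *createSwiftErrorAlloca(Function &F, Type *ErrorPtrTy) {
    IRBuilder<> B(&F.getEntryBlock(), F.getEntryBlock().begin());
    AllocaInst *AI = B.CreateAlloca(ErrorPtrTy, nullptr, "swifterror");
    AI->setSwiftError(true);
    return AI;
  }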
This commit does not include any target-specific change. The target-specific
optimization will come as a follow-up patch.
Differential Revision: http://reviews.llvm.org/D18092
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265189 91177308-0d34-0410-b5e6-96231b3b80d8
ThreadModel::Single is already handled by ARMPassConfig, which adds
LowerAtomicPass to the pass list; that pass lowers all atomics to
non-atomic ops and deletes fences.
So by the time we get to ISel, there are no atomic fences left, and they
don't need special handling.
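The relevant setup, paraphrased (a fragment; includes and the surrounding
ARMPassConfig member context are omitted):

  // With -mthread-model single there is no other thread to synchronize with,
  // so atomics are lowered to plain ops and fences are deleted before ISel
  // ever sees them.
  if (TM->Options.ThreadModel == ThreadModel::Single)
    addPass(createLowerAtomicPass());
  else
    addPass(createAtomicExpandPass(TM));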
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265178 91177308-0d34-0410-b5e6-96231b3b80d8
Was there really no other way to splat a byte in SSE2?
punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265172 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.
32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.
Patch by: Vedran Miletić
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: jvesely, scchan, kanarayan, arsenm
Differential Revision: http://reviews.llvm.org/D17280
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265170 91177308-0d34-0410-b5e6-96231b3b80d8
Note however that this is identical to the existing SSE2 run.
What we really want is yet another run for an SSE2 machine that
also has fast unaligned 16-byte accesses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265167 91177308-0d34-0410-b5e6-96231b3b80d8
Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676,
where we noticed that an intermediate splat was being generated for memsets of
non-zero chars.
That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.
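For reference, a small C++ reproducer of the pattern involved (the function
name and size are illustrative):

  #include <cstring>

  // A memset with a non-zero fill byte; the repeated-byte constant was being
  // built with an integer multiply and an intermediate splat before the
  // vector stores were emitted.
  void fill(char *p) {
    std::memset(p, 42, 128);
  }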
The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
Note that the SSE1 path is not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265161 91177308-0d34-0410-b5e6-96231b3b80d8
A catchswitch cannot be preceded by another instruction in the same
basic block (other than a PHI node).
Instead, insert the extract element right after the materialization of
the vectorized value. This isn't optimal but is a reasonable compromise
given the constraints of WinEH.
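A minimal sketch of that placement (Vec and Idx are illustrative names, and
Vec is assumed not to be a terminator):

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Instructions.h"
  #include <cstdint>

  using namespace llvm;

  // Build the extractelement immediately after the instruction defining the
  // vector, rather than in front of its user, which may be a catchswitch
  // that has to stay first in its block.
  static Value *extractAfterDef(Instruction *Vec, uint64_t Idx) {
    IRBuilder<> Builder(Vec->getNextNode()); // insert point: right after Vec
    return Builder.CreateExtractElement(Vec, Builder.getInt64(Idx));
  }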
This fixes PR27163.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265157 91177308-0d34-0410-b5e6-96231b3b80d8
Follow-up to D18566, where we noticed that an intermediate splat was being
generated for memsets of non-zero chars.
That was because we told getMemsetStores() to use a 32-bit vector element type,
and it happily obliged by producing that constant using an integer multiply.
The tests that were added in the last patch are now equivalent for AVX1 and AVX2
(no splats, just a vector load), but we have PR27141 to track that splat difference.
In the new tests, the splat via shuffling looks ok to me, but there might be some
room for improvement depending on uarch there.
Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up.
This patch should resolve PR27100.
Differential Revision: http://reviews.llvm.org/D18676
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265148 91177308-0d34-0410-b5e6-96231b3b80d8
This avoids undefined behavior when casting pointers to it. Also make
sure that we don't cast to a derived StringMapEntry before checking for
tombstone, as that may have different alignment requirements.
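Schematically, the pattern is the one below; the types are stand-ins, not the
actual StringMap internals:

  #include <cstddef>

  struct EntryBase { unsigned KeyLength; };
  struct Entry : EntryBase { alignas(16) char Payload[16]; }; // stricter alignment

  // Compare the raw base pointer against the sentinels first; only a live
  // bucket is cast to the derived type with its stricter alignment.
  static Entry *getLiveEntry(EntryBase **Table, size_t I, EntryBase *Tombstone) {
    EntryBase *Bucket = Table[I];
    if (!Bucket || Bucket == Tombstone)
      return nullptr; // empty or dead slot: never treat it as a real entry
    return static_cast<Entry *>(Bucket);
  }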
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265145 91177308-0d34-0410-b5e6-96231b3b80d8
Since revision 235394, we no longer perform target-specific combines on
build_vector nodes. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265138 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: The assembler was picking the wrong JR variant because the pre-R6 one was still enabled at R6.
Author: nitesh.jain
Reviewers: vkalintiris, dsanders
Subscribers: dsanders, llvm-commits, mohit.bhakkad, sagar, bhushan, jaydeep
Differential: D18387
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265134 91177308-0d34-0410-b5e6-96231b3b80d8
Some ARM instructions encode 32-bit immediates as an 8-bit integer (0-255)
and a 4-bit even rotation (0-30) in the instruction's least significant 12
bits. The original fixup, FK_Data_4, patches the instruction with the value
bit-for-bit, regardless of that encoding. For example, assuming the labels
L1 and L2 are at 0x0 and 0x104 respectively, the following instruction:
add r0, r0, #(L2 - L1) ; expects 0x104, i.e., 260
would be assembled to the following, which adds 1 to r0 instead of 260:
e2800104 add r0, r0, #4, 2 ; equivalently 1
The new fixup kind fixup_arm_mod_imm takes care of the encoding:
e2800f41 add r0, r0, #260
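For reference, the decode of that 12-bit field works out as follows (a
standalone sketch, not LLVM code):

  #include <cstdint>

  // Decode an ARM "modified immediate": an 8-bit value rotated right by
  // twice the 4-bit rotation field.
  static uint32_t decodeARMModImm(uint32_t Imm12) {
    uint32_t Val = Imm12 & 0xFF;
    uint32_t Amt = 2 * ((Imm12 >> 8) & 0xF);
    return Amt ? ((Val >> Amt) | (Val << (32 - Amt))) : Val;
  }

  // decodeARMModImm(0x104) == 1   (what the bit-for-bit FK_Data_4 patch produced)
  // decodeARMModImm(0xF41) == 260 (what fixup_arm_mod_imm now emits)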
Patch by Ting-Yuan Huang!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265122 91177308-0d34-0410-b5e6-96231b3b80d8
The llvm_string_of_message function, called by llvm_raise, calls
LLVMDisposeMessage, which expects the message to be dynamically
allocated; it fails to free the message otherwise. So always allocate
the message dynamically with LLVMCreateMessage.
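A minimal illustration of the contract, written in C++ against the C API (the
message text is just a placeholder):

  #include "llvm-c/Core.h"

  int main() {
    // LLVMCreateMessage copies the string into memory the C API owns, so the
    // matching LLVMDisposeMessage is always safe, even for literals.
    char *Msg = LLVMCreateMessage("something went wrong");
    // ... hand Msg to the code that reports the error ...
    LLVMDisposeMessage(Msg);
    return 0;
  }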
Differential Revision: http://reviews.llvm.org/D18675
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265116 91177308-0d34-0410-b5e6-96231b3b80d8
Expose LLVMCreateTargetMachineData as data_layout.
As r263530 did for Go. From that commit: "LLVMGetTargetDataLayout was
removed from the C API, and then TargetMachine.TargetData was removed.
Later, LLVMCreateTargetMachineData was added to the C API"
Differential Revision: http://reviews.llvm.org/D18677
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265115 91177308-0d34-0410-b5e6-96231b3b80d8
This allows the linker to instruct ThinLTO to perform only the
optimization part or only the codegen part of the process.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265113 91177308-0d34-0410-b5e6-96231b3b80d8