When trying to upgrade @llvm.x86.sse2.psrl.dq while parsing a module,
BitcodeReader adds the function to its worklist twice, resulting in a
crash when accessing it the second time.
This patch replaces the worklist vector by a map.
Patch by Philip Pfaffe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241281 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes linkage with dbghlp.dll for clang from static (at load time)
to on demand (at the first use of required functions). Clang uses dbghlp.dll
only in minor use-cases. First of all in case of crash and in case of plugin load.
The dbghlp.dll library can be absent on system. In this case clang will fail
to load. With lazy load of dbghlp.dll clang can work even if dbghlp.dll
is not available.
Differential Revision: http://reviews.llvm.org/D10737
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241271 91177308-0d34-0410-b5e6-96231b3b80d8
The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.
It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241254 91177308-0d34-0410-b5e6-96231b3b80d8
By default, the GraphWriter code assumes that the generic file open
program (`open` on Apple, `xdg-open` on other systems) can wait on the
forked proces to complete. When the fork ends, the code would delete
the temporary dot files created, and return.
On GNU/Linux, the xdg-open program does not have a "wait for your fork
to complete before dying" option. So the behaviour was that xdg-open
would launch a process, quickly die itself, and then the GraphWriter
code would think its OK to quickly delete all the temporary files.
Once the temporary files were deleted, the dot viewers would get very
upset, and often give you weird errors.
This change only waits on the generic open program on Apple platforms.
Elsewhere, we don't wait on the process, and hence we don't try and
clean up the temporary files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241250 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This allows the "if (Statepoint SP = Statepoint(I))" idiom.
(I don't think this change needs review, this was uploaded to
phabricator to provide context for later dependent changes.)
Differential Revision: http://reviews.llvm.org/D10629
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241232 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Introduce a simple accessor to get the gc_result hanging off of a
statepoint.
(I don't think this change needs review, this was uploaded to
phabricator to provide context for later dependent changes.)
Differential Revision: http://reviews.llvm.org/D10627
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241231 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
r240039 adds a test case to check that CallGraph does the right thing
with respect to non-leaf intrinsics like statepoint and patchpoint.
This ports the same test case to LazyCallGraph. LazyCallGraph already
does the right thing with respect to escaping function pointers so there
is no need to change any code.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10582
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241226 91177308-0d34-0410-b5e6-96231b3b80d8
This checks subtarget feature compatibility for inlining by verifying
that the callee is a strict subset of the caller's features. This includes
the cpu as part of the subtarget we can get via the incoming functions as
the backend takes CPUs as feature sets.
This allows us to inline things like:
int foo() { return baz(); }
int __attribute__((target("sse4.2"))) bar() {
return foo();
}
so that generic code can be inlined into specialized functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241221 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
* Add 64-bit address space feature.
* Rename SIMD feature to SIMD128.
* Handle single-thread model with an IR pass (same way ARM does).
* Rename generic processor to MVP, to follow design's lead.
* Add bleeding-edge processors, with all features included.
* Fix a few DEBUG_TYPE to match other backends.
Test Plan: ninja check
Reviewers: sunfish
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D10880
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241211 91177308-0d34-0410-b5e6-96231b3b80d8
TwoAddressInstructionPass stops after a successful commuting but 3 Addr
conversion might be good for some cases.
Consider:
int foo(int a, int b) {
return a + b;
}
Before this commit, we emit:
addl %esi, %edi
movl %edi, %eax
ret
After this commit, we try 3 Addr conversion:
leal (%rsi,%rdi), %eax
ret
Patch by Volkan Keles <vkeles@apple.com>!
Differential Revision: http://reviews.llvm.org/D10851
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241206 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch changes the way APInt is compared with a value of type uint64_t.
Before the uint64_t value was truncated to the size of APInt before comparison.
Now the comparison takes into account full 64-bit precision.
Test Plan: Unit tests added. No regressions. Self-hosted check-all done as well.
Reviewers: chandlerc, dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10655
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241204 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly an NFC, which increases code readability (instead of
saving old terminator, generating new one in front of old, and deleting
old, we just call a function). However, it would additionaly copy
the debug location from old instruction to replacement, which
would help PR23837.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241197 91177308-0d34-0410-b5e6-96231b3b80d8
All file formats only needed 16-bits right now which is enough to fit
in to the padding with other fields.
This reduces the size of MCSymbol to 24-bytes on a 64-bit system. The
layout is now
0 | class llvm::MCSymbol
0 | class llvm::PointerIntPair SectionOrFragmentAndHasName
0 | intptr_t Value
| [sizeof=8, dsize=8, align=8
| nvsize=8, nvalign=8]
8 | unsigned int IsTemporary
8 | unsigned int IsRedefinable
8 | unsigned int IsUsed
8 | _Bool IsRegistered
8 | unsigned int IsExternal
8 | unsigned int IsPrivateExtern
8 | unsigned int Kind
9 | unsigned int IsUsedInReloc
9 | unsigned int SymbolContents
9 | unsigned int CommonAlignLog2
10 | uint32_t Flags
12 | uint32_t Index
16 | union
16 | uint64_t Offset
16 | uint64_t CommonSize
16 | const class llvm::MCExpr * Value
| [sizeof=8, dsize=8, align=8
| nvsize=8, nvalign=8]
| [sizeof=24, dsize=24, align=8
| nvsize=24, nvalign=8]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241196 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
According to PTX ISA:
For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.
So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.
As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]
This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.
Patched by Gang Hu.
Test Plan: Test case attached.
Reviewers: jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D10876
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241191 91177308-0d34-0410-b5e6-96231b3b80d8
Given that alignments are always powers of 2, just encode it this way.
This matches how we encode alignment on IR GlobalValue's for example.
This compresses the CommonAlign member down to 5 bits which allows it
to pack better with the surrounding fields.
Reviewed by Duncan Exon Smith.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241189 91177308-0d34-0410-b5e6-96231b3b80d8
Don't pattern match for frontend outlined finally calls on non-x64
platforms. The 32-bit runtime uses a different funclet prototype. Now,
the frontend is pre-outlining the finally bodies so that it ends up
doing most of the heavy lifting for variable capturing. We're just
outlining the callsite, and adapting the frameaddress(0) call to line up
the frame pointer recovery.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241186 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.
Patched by Xuetian Weng.
Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10853
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241185 91177308-0d34-0410-b5e6-96231b3b80d8
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction. This patch moves the
offending logic so lxvdsx is once again generated.
This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant. Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe). This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.
There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization. However, vsx.ll was only
being tested for POWER7 with big-endian code generation. I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241183 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is not intended to change existing codegen behavior for any target.
It just exposes the JumpIsExpensive setting on the command-line to allow for
easier testing and emergency overrides.
Also, change the existing regression test to use FileCheck, explicitly specify
the jump-is-expensive option, and use more precise checks.
Differential Revision: http://reviews.llvm.org/D10846
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241179 91177308-0d34-0410-b5e6-96231b3b80d8
The EH code might have been deleted as unreachable and the personality
pruned while the filter is still present. Currently I'm hitting this at
-O0 due to the clang bug PR24009.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241170 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the AsmParser to accept add/adds/sub/subs/cmp/cmn
with a negative immediate operand and convert them as shown:
add Rd, Rn, -imm -> sub Rd, Rn, imm
sub Rd, Rn, -imm -> add Rd, Rn, imm
adds Rd, Rn, -imm -> subs Rd, Rn, imm
subs Rd, Rn, -imm -> adds Rd, Rn, imm
cmp Rn, -imm -> cmn Rn, imm
cmn Rn, -imm -> cmp Rn, imm
Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers (gas). They are documented in the "ARMv8 Instruction Set
Overview", in the "Arithmetic (immediate)" section. This makes llvm-mc
a programmer-friendly assembler !
This also fixes PR20978: "Assembly handling of adding negative numbers
not as smart as gas".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241166 91177308-0d34-0410-b5e6-96231b3b80d8