The address space of the pointer must be global (1) for these intrinsics. There must also be alignment metadata attached to the intrinsic calls, e.g.
%val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0!0 = metadata !{i32 4}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211939 91177308-0d34-0410-b5e6-96231b3b80d8
This also introduces DAGCombiner patterns for mul.wide to multiply two smaller integers and produce a larger integer
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211935 91177308-0d34-0410-b5e6-96231b3b80d8
NVPTX is a bit special in the optimizations it requires, so this gives
us better control over the backend optimization pipeline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211927 91177308-0d34-0410-b5e6-96231b3b80d8
a bootstrap.
I managed to mis-remember how PACKUS worked on x86, and was using undef
for the high bytes instead of zero. The fix is fairly obvious.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211922 91177308-0d34-0410-b5e6-96231b3b80d8
This new IR facility allows us to represent the object-file semantic of
a COMDAT group.
COMDATs allow us to tie together sections and make the inclusion of one
dependent on another. This is required to implement features like MS
ABI VFTables and optimizing away certain kinds of initialization in C++.
This functionality is only representable in COFF and ELF, Mach-O has no
similar mechanism.
Differential Revision: http://reviews.llvm.org/D4178
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211920 91177308-0d34-0410-b5e6-96231b3b80d8
By default, CMake will set NDEBUG in Rel* builds and leave it off in
debug builds, so we shouldn't need to do anything ourselves.
Before this change, it was possible to a Debug build without assertions
(aka Debug-Asserts in the autoconf system) by configuring with
-DLLVM_ENABLE_ASSERTIONS=OFF, but this configuration isn't very useful.
You can still get the same effect by explicitly adding -DNDEBUG to
CFLAGS.
Differential Revision: http://reviews.llvm.org/D4257
Patch by Janusz Sobczak!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211919 91177308-0d34-0410-b5e6-96231b3b80d8
where there is no timeout. In the case where there is a timeout though, the
code is still wrong since it doesn't check that the alarm really went off.
Without this patch, I cannot debug a program that forks itself using
sys::ExecuteAndWait with lldb.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211918 91177308-0d34-0410-b5e6-96231b3b80d8
COFF sections in MC were represented by a tuple of section-name and
COMDAT-name. This is not sufficient to represent a .text section
associated with another .text section; we need a way to distinguish
between the key section and the one marked associative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211913 91177308-0d34-0410-b5e6-96231b3b80d8
I'll fix the problems in libclang and other projects in ways that don't
require <mutex> until we sort out the cygwin situation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211900 91177308-0d34-0410-b5e6-96231b3b80d8
I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:
ld 0, 20936(1) # 8-byte Folded Reload
stw 12, 10348(0)
stw 12, 10344(0)
The underlying vreg had been introduced as base register by the
Local Stack Slot Allocation pass. That register was constrained
to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match
the ADDI instruction used to set it, but it was *not* constrained
to G8RC_NOX0 to fit the *use* of the register in an address.
That should have happened in PPCRegisterInfo::resolveFrameIndex.
This patch adds an appropriate constrainRegClass call.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211897 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This allows it to fold pshufd instructions across intervening
half-shuffles and other noise. This pattern actually shows up in the
generic lowering tests, but I've also added direct tests using
intrinsics to make sure that the specific desired functionality is
working even if the lowering stuff changes in the future.
Differential Revision: http://reviews.llvm.org/D4292
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211892 91177308-0d34-0410-b5e6-96231b3b80d8
half-shuffles, even looking through intervening instructions in a chain.
Summary:
This doesn't happen to show up with any test cases I've found for the current
shuffle lowering, but previous attempts would benefit from this and it seems
generally useful. I've tested it directly using intrinsics, which also shows
that it will work with hand vectorized code as well.
Note that even though pshufd isn't directly used in these tests, it gets
exercised because we combine some of the half shuffles into a pshufd
first, and then merge them.
Differential Revision: http://reviews.llvm.org/D4291
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211890 91177308-0d34-0410-b5e6-96231b3b80d8
trivially redundant.
This fixes several cases in the new vector shuffle lowering algorithm
which would generate redundant shuffle instructions for the sake of
simplicity.
I'm also deleting a testcase which was somewhat ridiculous. It was
checking for a bug in 2007 about incorrectly transforming shuffles by
looking for the string "-86" in the output of a pretty substantial
function. This test case doesn't seem to have any value at this point.
Differential Revision: http://reviews.llvm.org/D4240
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211889 91177308-0d34-0410-b5e6-96231b3b80d8
x86 backend.
This sketches out a new code path for vector lowering, hidden behind an
off-by-default flag while it is under development. The fundamental idea
behind the new code path is to aggressively break down the problem space
in ways that ease selecting the odd set of instructions available on
x86, and carefully avoid scalarizing code even when forced to use older
ISAs. Notably, this starts off restricting itself to SSE2 and implements
the complete vector shuffle and blend space for 128-bit vectors in SSE2
without scalarizing. The plan is to layer on top of this ISA extensions
where we can bail out of the complex SSE2 lowering and opt for
a cheaper, specialized instruction (or set of instructions). It also
needs to be generalized to AVX and AVX512 vector widths.
Currently, this does a decent but not perfect job for SSE2. There are
some specific shortcomings that I plan to address:
- We need a peephole combine to fold together shuffles where possible.
There are cases where a previous shuffle could be modified slightly to
arrange for elements to be in the correct position and a later shuffle
eliminated. Doing this eagerly added quite a bit of complexity, and
so my plan is to combine away these redundancies afterward.
- There are a lot more clever ways to use unpck and pack that need to be
added. This is essential for real world shuffles as it turns out...
Once SSE2 is polished a bit I should be able to get interesting numbers
on performance improvements on benchmarks conducive to vectorization.
All of this will be off by default until it is functionally equivalent
of course.
Differential Revision: http://reviews.llvm.org/D4225
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211888 91177308-0d34-0410-b5e6-96231b3b80d8
Current PPC64 RuntimeDyld code to handle TOC relocations has two
problems:
- With recent linkers, in addition to the relocations that implicitly
refer to the TOC base (R_PPC64_TOC*), you can now also use the .TOC.
magic symbol with any other relocation to refer to the TOC base
explicitly. This isn't currently used much in ELFv1 code (although
it could be), but it is essential in ELFv2 code.
- In a complex JIT environment with multiple modules, each module may
have its own .toc section, and TOC relocations in one module must
refer to *its own* TOC section. The current findPPC64TOC implementation
does not correctly implement this; in fact, it will always return the
address of the first TOC section it finds anywhere. (Note that at the
time findPPC64TOC is called, we don't even *know* which module the
relocation originally resided in, so it is not even possible to fix
this routine as-is.)
This commit fixes both problems by handling TOC relocations earlier, in
processRelocationRef. To do this, I've removed the findPPC64TOC routine
and replaced it by a new routine findPPC64TOCSection, which works
analogously to findOPDEntrySection in scanning the sections of the
ObjImage provided by its caller, processRelocationRef. This solves the
issue of finding the correct TOC section associated with the current
module.
This makes it straightforward to implement both R_PPC64_TOC relocations,
and relocations explicitly refering to the .TOC. symbol, directly in
processRelocationRef. There is now a new problem in implementing the
R_PPC64_TOC16* relocations, because those can now in theory involve
*three* different sections: the relocation may be applied in section A,
refer explicitly to a symbol in section B, and refer implicitly to the
TOC section C. The final processing of the relocation thus may only
happen after all three of these sections have been assigned final
addresses. There is currently no obvious means to implement this in
its general form with the common-code RuntimeDyld infrastructure.
Fortunately, ppc64 code usually makes no use of this most general form;
in fact, TOC16 relocations are only ever generated by LLVM for symbols
residing themselves in the TOC, which means "section B" == "section C"
in the above terminology. This special case can easily be handled with
the current infrastructure, and that is what this patch does.
[ Unhandled cases result in an explicit error, unlike the current code
which silently returns the wrong TOC base address ... ]
This patch makes the JIT work on both BE and LE (ELFv2 requires
additional patches, of course), and allowed me to successfully run
complex JIT scenarios (via mesa/llvmpipe).
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211885 91177308-0d34-0410-b5e6-96231b3b80d8