Summary:
The 64-bit version of the variable shift instructions uses the
shift_rotate_reg class which uses a GPR32Opnd to specify the variable
shift amount. With this patch we avoid the generation of a redundant
SLL instruction for the variable shift instructions in 64-bit targets.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7413
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235376 91177308-0d34-0410-b5e6-96231b3b80d8
This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since.
I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.
Differential Revision: http://reviews.llvm.org/D9095
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235372 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.
Test case spotted by Patrik Hagglund
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235371 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439.
This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type.
Patch by Keno Fischer!
Test Plan: regression test from http://reviews.llvm.org/D7752
Reviewers: chfast, resistor
Reviewed By: chfast, resistor
Subscribers: sanjoy, resistor, chfast, llvm-commits
Differential Revision: http://reviews.llvm.org/D4978
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235370 91177308-0d34-0410-b5e6-96231b3b80d8
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.
Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296
Differential Revision: http://reviews.llvm.org/D9120
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235367 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Bundle aligment requires that the functions always start at an aligned address.
Usually this is ensured by the compiler, but assembly code does not always
begin with a .align directive.
This change ensures that sections get the correct alignment if they contain
any instructions and bundling is enabled. (It also makes LLVM match the
behavior of GNU as).
Differential Revision: http://reviews.llvm.org/D9131
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235365 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In the f16-promote test, make the checks for native conversion instructions
similar to the libcall checks:
- Remove hard coded register names
- Do not check exact instruction sequences.
This fixes test flakiness due to non-determinism in instruction
scheduling and register allocation. I also fixed a few minor things in
the CHECK-LIBCALL checks.
I'll try to find a way to check that unnecessary loads, stores, or
conversions don't happen.
Reviewers: mzolotukhin, srhines, ab
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9112
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235363 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds two flags to `bugpoint`: "-replace-funcs-with-null" and "-disable-pass-list-reduction".
When "-replace-funcs-with-null" is specified, bugpoint will, instead of simply deleting function bodies, replace all uses of functions and then will delete functions completely from the test module, correctly handling aliasing and @llvm.used && @llvm.compiler.used. This part was conceived while trying to debug the PNaCl IR simplification passes, which don't allow undefined functions (ie no declarations).
With "-disable-pass-list-reduction", bugpoint won't try to reduce the set of passes causing the "crash". This is needed in cases where one is trying to debug an issue inside the PNaCl IR simplification passes which is causing an PNaCl ABI verification error, for example.
Reviewers: jfb
Reviewed By: jfb
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D8555
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235362 91177308-0d34-0410-b5e6-96231b3b80d8
Also, replace win and linux runs with a generic run because that
makes no difference in what this test is checking.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235361 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Set operation action for FP16 conversion opcodes, so the Op legalizer
can choose the gnu_* libcalls for Mips.
Set LoadExtAction and TruncStoreAction for f16 scalars and vectors to
prevent (fpext (load )) and (store (fptrunc)) from getting combined into
unsupported operations.
Added test cases to test that these operations are handled correctly
for f16 scalars and vectors. This patch depends on
http://reviews.llvm.org/D8755.
Reviewers: srhines
Subscribers: llvm-commits, ab
Differential Revision: http://reviews.llvm.org/D8804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235341 91177308-0d34-0410-b5e6-96231b3b80d8
This commit fixes the code which adds lifetime markers in InlineFunction to skip
zero-sized allocas instead of asserting on them.
rdar://problem/20531155
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235312 91177308-0d34-0410-b5e6-96231b3b80d8
n/1 generates a quotient equal to n and a remainder of 0.
If this case is not recognized, then the SCEV divide() function
can return a remainder that is greater than or equal to the
denominator, which means the delinearized subscripts for the
test case will be incorrect.
Differential Revision: http://reviews.llvm.org/D9003
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235311 91177308-0d34-0410-b5e6-96231b3b80d8
We have to avoid converting a reference to a global into a reference to a local,
but it is fine to look past a local.
Patch by Vasileios Kalintiris.
I just moved the comment and added thet test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235300 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.
Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.
So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue replacing the invalid assertion with an early exit.
Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235295 91177308-0d34-0410-b5e6-96231b3b80d8
When an inline asm call has an output register marked as early-clobber, but
that same register is also an input operand, what should we do? GCC accepts
this, and is documented to accept this for read/write operands saying,
"Furthermore, if the earlyclobber operand is also a read/write operand, then
that operand is written only after it's used." For write-only operands, the
situation seems less clear, but I have at least one existing codebase that
assumes this will work, in part because it has syscall macros like this:
({ \
register uint64_t r0 __asm__ ("r0") = (__NR_ ## name); \
register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0)); \
register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1)); \
register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2)); \
__asm__ __volatile__ \
("sc" \
: "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5) \
: "0"(r0), "1"(r3), "2"(r4), "3"(r5) \
: "r6","r7","r8","r9","r10","r11","r12","cr0","memory"); \
r3; \
})
Furthermore, with register aliases and subregister relationships that only the
backend knows about, rejecting this in the frontend seems like a difficult
proposition (if we wanted to do so). However, keeping the early-clobber flag on
the INLINEASM MI does not work for us, because it will cause the register's
live interval to end to soon (so it will not appear defined to be used as an
input).
Fortunately, fixing this does not seem hard: When forming the INLINEASM MI,
check to see if any of the early-clobber outputs are also inputs, and if so,
remove the early-clobber flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235283 91177308-0d34-0410-b5e6-96231b3b80d8
The fix ensures that scalar sources inserted into a vector are the correct bit size.
Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235281 91177308-0d34-0410-b5e6-96231b3b80d8
Harden r235258 to support any integer bitwidth. The quick glance at
the reference made me think only i32 and i64 were valid types, but
they're not special, so any overload is legal.
Thanks to David Majnemer for noticing!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235261 91177308-0d34-0410-b5e6-96231b3b80d8
Followup to r235232, which caused PR23278.
We can't assume the memset and memcpy sizes have the same type, as
nothing in the language reference prevents that.
Instead, zext both to i64 if they disagree.
While there, robustify tests by using i8 %c rather than i8 0 for the
memset character.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235258 91177308-0d34-0410-b5e6-96231b3b80d8
Multiplying INT_MIN by 1 doesn't trigger nsw. However, shifting 1 into
the sign bit *does* trigger nsw.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235250 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of merging everything together, look at the users of
GlobalVariables, and try to group them by function, to create
sets of globals used "together".
Using that information, a less-aggressive alternative is to keep merging
everything together *except* globals that are only ever used alone, that
is, those for which it's clearly non-profitable to merge with others.
In my testing, grouping by Function is too aggressive, but grouping by
BasicBlock is too conservative. Anything in-between isn't trivially
available, so stick with Function grouping for now.
cl::opts are added for testing; both enabled by default.
A few of the testcases aren't testing the merging proper, but just
various edge cases when merging does occur. Update them to use the
previous grouping behavior. Also, one of the tests is unrelated to
GlobalMerge; change it accordingly.
While there, switch to r234666' flags rather than the brutal -O3.
Differential Revision: http://reviews.llvm.org/D8070
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235249 91177308-0d34-0410-b5e6-96231b3b80d8
The result is either an Untyped reg sequence, on ldN with N > 1, or
just the type of the input vector, on ld1. Don't force Untyped.
Instead, just use the type of the reg sequence.
This mirrors the behavior of createTuple, which feeds the LD1*_POST.
The narrow code path wasn't actually covered by tests, because V64
insert_vector_elt are widened to V128 before the LD1LANEpost combine
has the chance to run, usually.
The only case where it does run on V64 vectors is if the vector ops
legalizer ran. So, tickle the code with a ctpop.
Fixes PR23265.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235243 91177308-0d34-0410-b5e6-96231b3b80d8
A common idiom in some code is to do the following:
memset(dst, 0, dst_size);
memcpy(dst, src, src_size);
Some of the memset is redundant; instead, we can do:
memcpy(dst, src, src_size);
memset(dst + src_size, 0,
dst_size <= src_size ? 0 : dst_size - src_size);
Original patch by: Joel Jones
Differential Revision: http://reviews.llvm.org/D498
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235232 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to r235222, but for the weak symbol case.
In an "ideal" assembler/object format an expression would always refer to the
final value and A-B would only be computed from a section in the same
comdat as A and B with A and B strong.
Unfortunately that is not the case with debug info on ELF, so we need an
heuristic. Since we need an heuristic, we may as well use the same one as
gas:
* call weak_sym : produces a relocation, even if in the same section.
* A - weak_sym and weak_sym -A: don't produce a relocation if we can
compute it.
This fixes pr23272 and changes the fix of pr22815 to match what gas does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235227 91177308-0d34-0410-b5e6-96231b3b80d8
They would break the SelectionDAG.
Note that the opposite load->vector dependency is already obvious in:
(LD1*post vec, ..)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235224 91177308-0d34-0410-b5e6-96231b3b80d8
Part of pr23272.
A small annoyance with the assembly syntax we implement is that given an
expression there is no way to know if what is desired is the value of that
expression for the symbols in this file or for the final values of those
symbols in a link.
The first case is useful for use in sections that get discarded or ignored
if the section they are describing is discarded.
For axample, consider A-B where A and B are in the same comdat section.
We can compute the value of the difference in the section that is present in
the current .o and if that section survives to the final DSO the value will
still will be correct.
But the section is in a comdat. Another section from another object file
might be used istead. We know that that section will define A and B, but
we have no idea what the value of A-B might be.
In practice we have to assume that the intention is to compute the value
in the current section since otherwise the is no way to create something like
the debug aranges section.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235222 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds legalization support to operate on FP16 as a load/store type
and do operations on it as floats.
Tests for ARM are added to test/CodeGen/ARM/fp16-promote.ll
Reviewers: srhines, t.p.northover
Differential Revision: http://reviews.llvm.org/D8755
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235215 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Implement the method FastMaterializeAlloca in Mips fast-isel
Based on a patch by Reed Kotler.
Test Plan:
Passes test-suite at O0/O2 for mips32 r1/r2
fastalloca.ll
Reviewers: dsanders, rkotler
Subscribers: rfuhler, llvm-commits
Differential Revision: http://reviews.llvm.org/D6742
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235213 91177308-0d34-0410-b5e6-96231b3b80d8
The v1i128 type is needed for the quadword add/substract instructions introduced
in POWER8. Futhermore, the PowerPC ABI specifies that parameters of type v1i128
are to be passed in a single vector register, while parameters of type i128 are
passed in pairs of GPRs. Thus, it is necessary to be able to differentiate
between v1i128 and i128 in LLVM.
http://reviews.llvm.org/D8564
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235198 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Add shift operators implementation to fast-isel for Mips. These are shift ops
for non legal forms, i.e. i8 and i16.
Based on a patch by Reed Kotler.
Test Plan:
Reviewers: dsanders
Subscribers: echristo, rfuhler, llvm-commits
Differential Revision: http://reviews.llvm.org/D6726
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235194 91177308-0d34-0410-b5e6-96231b3b80d8