The variable is private, so the name should not be relied on. Also, the
linker uses the sections, so asan should too when trying to avoid causing
the linker problems.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221480 91177308-0d34-0410-b5e6-96231b3b80d8
This test case was never actually testing the trivial spiller: the -spiller
option has not been hooked up for a while now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221475 91177308-0d34-0410-b5e6-96231b3b80d8
add the code and test cases for 32-bit ARM symbolizer.
Also fixed the printing of data in code as it was not using the table correctly
and needed to fix one of the test cases too.
This will break lld’s test/mach-o/arm-interworking-movw.yaml till the tweak
for that is made. Which I’ll be committing immediately after this commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221470 91177308-0d34-0410-b5e6-96231b3b80d8
instructions. Inlining might cause such cases and it's not valid to
reassociate floating-point instructions without the unsafe algebra flag.
Patch by Mehdi Amini <mehdi_amini@apple.com>!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221462 91177308-0d34-0410-b5e6-96231b3b80d8
On 32 bit windows we use label differences and .set does not suppress
rolocations, a combination that was not used before r220256.
This fixes PR21497.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221456 91177308-0d34-0410-b5e6-96231b3b80d8
Example:
define <4 x i32> @test(<4 x i32> %a, <4 x i32> %b) {
%shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 5, i32 6, i32 3>
ret <4 x i32> %shuffle
}
Before llc (-mattr=+sse4.1), produced the following assembly instruction:
pblendw $4294967103, %xmm1, %xmm0
After
pblendw $63, %xmm1, %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221455 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently, we give an error if %z is used with non-immediates, instead of continuing as if the %z isn't there.
For example, you use the %z operand modifier along with the "Jr" constraints ("r" makes the operand a register, and "J" makes it an immediate, but only if its value is 0).
In this case, you want the compiler to print "$0" if the inline asm input operand turns out to be an immediate zero and you want it to print the register containing the operand, if it's not.
We give an error in the latter case, and we shouldn't (GCC also doesn't).
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6023
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221453 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Improved warning message when using .cpload inside a reorder section and added an error message for using .cpload with Mips16 enabled.
Modified the tests to fit with the changes mentioned above, added a test-case for the N32 ABI in cpload.s and did some reformatting to make the tests easier to read.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5465
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221447 91177308-0d34-0410-b5e6-96231b3b80d8
Use the position of the subsequent symbol in the object file to infer
the size of it's predecessor. I hope to eventually remove whatever COFF
specific details from this little algorithm so that we can unify this
logic with what Mach-O does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221444 91177308-0d34-0410-b5e6-96231b3b80d8
When generating gcov compatible profiling, we sometimes skip emitting
data for functions for one reason or another. However, this was
emitting different function IDs in the .gcno and .gcda files, because
the .gcno case was using the loop index before skipping functions and
the .gcda the array index after. This resulted in completely invalid
gcov data.
This fixes the problem by making the .gcno loop track the ID
separately from the loop index.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221441 91177308-0d34-0410-b5e6-96231b3b80d8
If a section cannot be dead stripped, it is safe to use L symbols, since
the linker will keep all of it in the end.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221431 91177308-0d34-0410-b5e6-96231b3b80d8
condition to match a blend.
This prevents optimizations that work on VSELECT to perform invalid
transformations. Indeed, the optimized condition does not match the vector
boolean content that is expected and bad things may happen.
This patch yields the exact same code on the whole test-suite + specs (-O3 and
-O3 -march=core-avx2), it improves one test case (vector-blend.ll) and fixes a
bug reduced in vselect-avx.ll.
<rdar://problem/18819506>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221429 91177308-0d34-0410-b5e6-96231b3b80d8
Added missing memory folding for the (V)CVTDQ2PS instructions - we can safely fold these (but not the (V)CVTDQ2PD versions which have a register/memory size discrepancy in the source operand). I've added a test case demonstrating that stack folding now works.
Differential Revision: http://reviews.llvm.org/D5981
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221407 91177308-0d34-0410-b5e6-96231b3b80d8
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.
If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :
%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8
The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.
Fixes PR21465.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221377 91177308-0d34-0410-b5e6-96231b3b80d8
We currently have no infrastructure to support these correctly.
This is accomplished by generating a call to a runtime library function that
aborts at runtime in place of the regular wrapper for such functions. Direct
calls are rewritten in the usual way during traversal of the caller's IR.
We also remove the "split-stack" attribute from such wrappers, as the code
generator cannot currently handle split-stack vararg functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221360 91177308-0d34-0410-b5e6-96231b3b80d8
This matches the format produced by the AMD proprietary driver.
//==================================================================//
// Shell script for converting .ll test cases: (Pass the .ll files
you want to convert to this script as arguments).
//==================================================================//
; This was necessary on my system so that A-Z in sed would match only
; upper case. I'm not sure why.
export LC_ALL='C'
TEST_FILES="$*"
MATCHES=`grep -v Patterns SIInstructions.td | grep -o '"[A-Z0-9_]\+["e]' | grep -o '[A-Z0-9_]\+' | sort -r`
for f in $TEST_FILES; do
# Check that there are SI tests:
grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f
if [ $? -eq 0 ]; then
for match in $MATCHES; do
sed -i -e "s/\([ :]$match\)/\L\1/" $f
done
# Try to get check lines with partial instruction names
sed -i 's/\(;[ ]*SI[A-Z\\-]*: \)\([A-Z_0-9]\+\)/\1\L\2/' $f
fi
done
sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll
sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.*32.ll ../../../test/CodeGen/R600/sext-in-reg.ll
sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll
sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll
sed -i 's/\(; CHECK[-NOT]*: \)\([A-Z_0-9]\+\)/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll
//==================================================================//
// Shell script for converting .td files (run this last)
//==================================================================//
export LC_ALL='C'
sed -i -e '/Patterns/!s/\("[A-Z0-9_]\+[ "e]\)/\L\1/g' SIInstructions.td
sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221350 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.
This allows for example to simplify the following code:
define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
%1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
%2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
%3 = or <4 x i32> %1, %2
ret <4 x i32> %3
}
Before this patch llc (-mcpu=corei7) generated:
andps LCPI1_0(%rip), %xmm0, %xmm0
andps LCPI1_1(%rip), %xmm1, %xmm1
orps %xmm1, %xmm0, %xmm0
retq
With this patch we generate a single 'vpblendw'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221343 91177308-0d34-0410-b5e6-96231b3b80d8
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not.
I fix this in the assmebler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221341 91177308-0d34-0410-b5e6-96231b3b80d8
Exact shifts may not shift out any non-zero bits. Use computeKnownBits
to determine when this occurs and just return the left hand side.
This fixes PR21477.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221325 91177308-0d34-0410-b5e6-96231b3b80d8
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221321 91177308-0d34-0410-b5e6-96231b3b80d8
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.
This had a couple of potential issues:
+ The .cfi directives we emit misplaced dN because they were based on
PrologEpilogInserter's calculation.
+ Unaligned stores can be less efficient.
+ Unaligned stores can actually fault (likely only an issue in niche cases,
but possible).
This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221320 91177308-0d34-0410-b5e6-96231b3b80d8
Divides and remainder operations do not behave like other operations
when they are given poison: they turn into undefined behavior.
It's really hard to know if the operands going into a div are or are not
poison. Because of this, we should only choose to speculate if there
are constant operands which we can easily reason about.
This fixes PR21412.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221318 91177308-0d34-0410-b5e6-96231b3b80d8
Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask.
This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests).
Differential Revision: http://reviews.llvm.org/D6015
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221313 91177308-0d34-0410-b5e6-96231b3b80d8
While fixing up the register classes in the machine combiner in a previous
commit I missed one.
This fixes the last one and adds a test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221308 91177308-0d34-0410-b5e6-96231b3b80d8
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.
A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.
The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.
But for now, these sizes seem small enough to go ahead with this.
Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221306 91177308-0d34-0410-b5e6-96231b3b80d8
We were producing a relocation for
----------------
.section foo,bar
La:
Lb:
.long La-Lb
--------------
but not for
---------------------
.section foo,bar
zed:
La:
Lb:
.long La-Lb
----------------
This patch handles the case where both fragments are part of the first atom
in a section and there is no corresponding symbol to that atom.
This fixes pr21328.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221304 91177308-0d34-0410-b5e6-96231b3b80d8
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.
Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221293 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted).
These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled).
Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5462
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221277 91177308-0d34-0410-b5e6-96231b3b80d8
test/MC/ARM/directive-eabi_attribute.s was missing several tests of object file
encodings relative to the existing tests for assembly file encodings. This
commit adds the missing tests.
Change-Id: Ie110ca02b65e8f4d4c77f437bd09d03607fa5c0d
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221250 91177308-0d34-0410-b5e6-96231b3b80d8
This is experimental, just barely enough to get things to not
immediately combust.
A note for those who are curious:
Only lld can successfully link the object files, other linkers truncate
the section names making the debug sections illegible to debuggers.
Even with this in mind, we believe we are having trouble with SECREL
relocations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221245 91177308-0d34-0410-b5e6-96231b3b80d8
the tombstone or empty keys of a DenseMap<int64_t, T>. This patch
fixes the issue (and adds a tests case).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221214 91177308-0d34-0410-b5e6-96231b3b80d8
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.
This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221203 91177308-0d34-0410-b5e6-96231b3b80d8
Currently they are passed to tests of llvm itself, but not, for example, lld.
With this patch the options are visible in every test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221198 91177308-0d34-0410-b5e6-96231b3b80d8
FoldOpIntoPhi could create an infinite loop if the PHI could potentially
reach a BB it was considering inserting instructions into. The
instructions it would insert would eventually lead to other combines
firing which would, again, lead to FoldOpIntoPhi firing.
The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to
insert instructions that the PHI might reach.
This fixes PR21377.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221187 91177308-0d34-0410-b5e6-96231b3b80d8
register class tGPRRegClass if the target is thumb1.
This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
rdar://problem/18740489
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221178 91177308-0d34-0410-b5e6-96231b3b80d8
For 8-bit divrems where the remainder is used, we used to generate:
divb %sil
shrw $8, %ax
movzbl %al, %eax
That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.
This patch optimizes that to:
divb %sil
movzbl %ah, %eax
To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register. The extension is done using the NOREX variants of
MOVZX. To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).
Differential Revision: http://reviews.llvm.org/D6064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221176 91177308-0d34-0410-b5e6-96231b3b80d8
EarlyCSE uses a simple generation scheme for handling memory-based
dependencies, and calls to @llvm.assume (which are marked as writing to memory
to ensure the preservation of control dependencies) disturb that scheme
unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative
(adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that).
Fixes PR21448.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221175 91177308-0d34-0410-b5e6-96231b3b80d8
Function calls aren't supported yet.
This was reverted due to build breakages, which should be fixed now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221173 91177308-0d34-0410-b5e6-96231b3b80d8
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.
Commit includes the test that found this, plus another one that got
missed in the original optnone work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221168 91177308-0d34-0410-b5e6-96231b3b80d8
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.
LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.
This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.
Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221166 91177308-0d34-0410-b5e6-96231b3b80d8
Some literals in the AArch64 backend had 15 'f's rather than 16, causing
comparisons with a constant 0xffffffffffffffff to be miscompiled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221157 91177308-0d34-0410-b5e6-96231b3b80d8
test/MC/ARM/directive-eabi_attribute.s had gotten out-of-sync with
test/MC/ARM/directive-eabi_attribute-2.s. The former tests the encoding of
build attributes in object files, and the latter the encoding in assembly
files. Since both these tests need to be updated at the same time, it makes
sense to combine them into a single test. The object file encodings are being
checked against the ouput of -arm-attributes rather than by direct byte
comparisons which makes for easier reading.
Change-Id: I0075de506ae5626fb2fa235383fe5ce6a65a15a9
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221155 91177308-0d34-0410-b5e6-96231b3b80d8
The MRI scripts have to work with CRLF, and in general it is probably
a good idea to support this in a core utility like LineIterator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221153 91177308-0d34-0410-b5e6-96231b3b80d8
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):
A symbol table entry with STB_LOCAL binding that is defined relative to one
of a group's sections, and that is contained in a symbol table section that is
not part of the group, must be discarded if the group members are discarded.
References to this symbol table entry from outside the group are not allowed.
This means that we need to use the function symbol for the relocation, not a
temporary symbol.
There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221150 91177308-0d34-0410-b5e6-96231b3b80d8
Bindings built out-of-tree, e.g. via OPAM, should append
a line to META.llvm like the following:
linkopts = "-cclib -L$libdir -cclib -Wl,-rpath,$libdir"
where $libdir is the lib/ directory where LLVM libraries are
installed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221139 91177308-0d34-0410-b5e6-96231b3b80d8
ocamlc and ocamlopt expose a distinct set of buildsystem bugs, e.g.
only ocamlc would detect -custom or -dllib-related bugs, and as all
buildbots will have ocamlopt, these bugs will stay hidden.
This change should add no more than 30 seconds of testing time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221137 91177308-0d34-0410-b5e6-96231b3b80d8
The issue was that linkAppendingVarProto does the full linking job, including
deleting the old dst variable. The fix is just to call it and return early
if we have a GV with appending linkage.
original message:
Refactor duplicated code in liking GlobalValues.
There is quiet a bit of logic that is common to any GlobalValue but was
duplicated for Functions, GlobalVariables and GlobalAliases.
While at it, merge visibility even when comdats are used, fixing pr21415.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221098 91177308-0d34-0410-b5e6-96231b3b80d8
This commit introduces heap-use-after-free detected by ASan. Here is the output
for one of several tests that detect it:
******************** TEST 'LLVM :: Linker/AppendingLinkage.ll' FAILED ********************
Command Output (stderr):
--
=================================================================
==2122==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c00000b9c8 at pc 0x0000005d05d1 bp 0x7fff64ed27c0 sp 0x7fff64ed27b8
READ of size 4 at 0x60c00000b9c8 thread T0
#0 0x5d05d0 in llvm::GlobalValue::setUnnamedAddr(bool) /usr/local/google/home/chandlerc/src/llvm/build/../include/llvm/IR/GlobalValue.h:115:35
#1 0x69fff1 in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1041:5
#2 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
#3 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
#4 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
#5 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
#6 0x41eb71 in _start (/usr/local/google/home/chandlerc/src/llvm/build/bin/llvm-link+0x41eb71)
0x60c00000b9c8 is located 72 bytes inside of 128-byte region [0x60c00000b980,0x60c00000ba00)
freed by thread T0 here:
#0 0x4a1e6b in operator delete(void*) /usr/local/google/home/chandlerc/src/llvm/opt-build/../projects/compiler-rt/lib/asan/asan_new_delete.cc:94:3
#1 0x5d1a7a in llvm::iplist<llvm::GlobalVariable, llvm::ilist_traits<llvm::GlobalVariable> >::erase(llvm::ilist_iterator<llvm::GlobalVariable>) /usr/local/google/home/chandlerc/src/llvm/build/../inclu
de/llvm/ADT/ilist.h:466:5
#2 0x5d1980 in llvm::GlobalVariable::eraseFromParent() /usr/local/google/home/chandlerc/src/llvm/build/../lib/IR/Globals.cpp:204:3
#3 0x6a8a4d in (anonymous namespace)::ModuleLinker::linkAppendingVarProto(llvm::GlobalVariable*, llvm::GlobalVariable const*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.
cpp:980:3
#4 0x6a7403 in (anonymous namespace)::ModuleLinker::linkGlobalVariableProto(llvm::GlobalVariable const*, llvm::GlobalValue*, bool) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkMod
ules.cpp:1074:11
#5 0x69ff4e in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1028:13
#6 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
#7 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
#8 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
#9 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
previously allocated by thread T0 here:
#0 0x4a192b in operator new(unsigned long) /usr/local/google/home/chandlerc/src/llvm/opt-build/../projects/compiler-rt/lib/asan/asan_new_delete.cc:62:35
#1 0x61d85c in llvm::User::operator new(unsigned long, unsigned int) /usr/local/google/home/chandlerc/src/llvm/build/../lib/IR/User.cpp:57:19
#2 0x6a7525 in (anonymous namespace)::ModuleLinker::linkGlobalVariableProto(llvm::GlobalVariable const*, llvm::GlobalValue*, bool) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkMod
ules.cpp:1100:3
#3 0x69ff4e in (anonymous namespace)::ModuleLinker::linkGlobalValueProto(llvm::GlobalValue*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1028:13
#4 0x697229 in (anonymous namespace)::ModuleLinker::run() /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1485:9
#5 0x696542 in llvm::Linker::linkInModule(llvm::Module*) /usr/local/google/home/chandlerc/src/llvm/build/../lib/Linker/LinkModules.cpp:1621:10
#6 0x4a2db7 in main /usr/local/google/home/chandlerc/src/llvm/build/../tools/llvm-link/llvm-link.cpp:116:9
#7 0x7f4ae61e5ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
SUMMARY: AddressSanitizer: heap-use-after-free /usr/local/google/home/chandlerc/src/llvm/build/../include/llvm/IR/GlobalValue.h:115 llvm::GlobalValue::setUnnamedAddr(bool)
Shadow bytes around the buggy address:
0x0c187fff96e0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x0c187fff96f0: 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa fa
0x0c187fff9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
0x0c187fff9710: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x0c187fff9720: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
=>0x0c187fff9730: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd
0x0c187fff9740: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c187fff9750: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
0x0c187fff9760: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c187fff9770: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
0x0c187fff9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
ASan internal: fe
==2122==ABORTING
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221096 91177308-0d34-0410-b5e6-96231b3b80d8
m_ZExt might bind against a ConstantExpr instead of an Instruction.
Assuming this, using cast<Instruction>, results in InstCombine crashing.
Instead, introduce ZExtOperator to bridge both Instruction and
ConstantExpr ZExts.
This fixes PR21445.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221069 91177308-0d34-0410-b5e6-96231b3b80d8
Will try to find a portable way to test this (or a fixed-target test I
can add such coverage to) shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221068 91177308-0d34-0410-b5e6-96231b3b80d8
This was a compile-unit specific label (unused in type units) and seems
unnecessary anyway when we can more easily directly compute the size of
the compile unit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221067 91177308-0d34-0410-b5e6-96231b3b80d8
This can happen pretty often in code that looks like:
int foo = bar - 1;
if (foo < 0)
do stuff
In this case, bar < 1 is an equivalent condition.
This transform requires that the add instruction be annotated with nsw.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221045 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r221028. Later commits depend on this and
reverting just this one causes even more bots to fail.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221041 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch extends the 'show' and 'merge' commands in llvm-profdata to handle
sample PGO formats. Using the 'merge' command it is now possible to convert
one sample PGO format to another.
The only format that is currently not working is 'gcc'. I still need to
implement support for it in lib/ProfileData.
The changes in the sample profile support classes are needed for the
merge operation.
Reviewers: bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6065
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221032 91177308-0d34-0410-b5e6-96231b3b80d8
"[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221028 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r220996.
It introduced layering violations causing link errors in many
configurations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221020 91177308-0d34-0410-b5e6-96231b3b80d8
There is quiet a bit of logic that is common to any GlobalValue but was
duplicated for Functions, GlobalVariables and GlobalAliases.
While at it, merge visibility even when comdats are used, fixing pr21415.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221014 91177308-0d34-0410-b5e6-96231b3b80d8
We need to figure out how to track ptrtoint values all the
way until result is converted back to a pointer in order
to correctly rewrite the pointer type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220997 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have initial support for VSX, we can begin adding
intrinsics for programmer access to VSX instructions. This patch adds
basic support for VSX intrinsics in general, and tests it by
implementing intrinsics for minimum and maximum for the vector double
data type.
The LLVM portion of this is quite straightforward. There is a
companion patch for Clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220988 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.
** Context **
Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
%in1 = load <2 x i32>* %addr1, align 8
%extract = extractelement <2 x i32> %in1, i32 1
%out = or i32 %extract, 1
store i32 %out, i32* %dest, align 4
ret void
}
As it is, this IR generates the following assembly on armv7:
vldr d16, [r0] @vector load
vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles
orr r0, r0, #1 @ scalar bitwise or
str r0, [r1] @ scalar store
bx lr
Whereas we could generate much faster code:
vldr d16, [r0] @ vector load
vorr.i32 d16, #0x1 @ vector bitwise or
vst1.32 {d16[1]}, [r1:32] @ vector extract + store
bx lr
Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220978 91177308-0d34-0410-b5e6-96231b3b80d8
In a case where we have a no {un,}signed wrap flag on the increment, if
RHS - Start is constant then we can avoid inserting a max operation bewteen
the two, since we can statically determine which is greater.
This allows us to unroll loops such as:
void testcase3(int v) {
for (int i=v; i<=v+1; ++i)
f(i);
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220960 91177308-0d34-0410-b5e6-96231b3b80d8
Since block address values can be larger than 2GB in 64-bit code, they
cannot be loaded simply using an @l / @ha pair, but instead must be
loaded from the TOC, just like GlobalAddress, ConstantPool, and
JumpTable values are.
The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
temporary labels could not be used as TOC values, since code would
attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
same name as the temporary label.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220959 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically:
* Directories match module names.
* Test names match module names.
* The language is called "OCaml", not "Ocaml".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220958 91177308-0d34-0410-b5e6-96231b3b80d8
Since JIT->MCJIT migration, most of the ExecutionEngine interface
became deprecated and/or broken. This especially affected the OCaml
bindings, as runFunction is no longer available, and unlike in C,
it is not possible to coerce a pointer to a function and call it
in OCaml.
In practice, LLVM 3.5 shipped completely unusable
Llvm_executionengine.
The GenericValue interface and runFunction were essentially
a poor man's FFI. As such, this interface was removed and instead
a dependency on ctypes >=0.3 added, which handled platform-specific
aspects of accessing data and calling functions.
The new interface does not expose JIT (which is a shim around MCJIT),
as well as the interpreter (which can't handle a lot of valid IR).
Llvm_executionengine.add_global_mapping is currently unusable
due to PR20656.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220957 91177308-0d34-0410-b5e6-96231b3b80d8
Do a better job classifying symbols. This increases the consistency
between the COFF handling code and the ELF side of things.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220952 91177308-0d34-0410-b5e6-96231b3b80d8
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands. On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.
It did that by choosing to widen v1 types whenever possible. However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.
This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.
This is similar to r205625.
Fixes PR20777.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220937 91177308-0d34-0410-b5e6-96231b3b80d8
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.
Tests included.
rdar://18816719
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220933 91177308-0d34-0410-b5e6-96231b3b80d8
The langref says:
LLVM explicitly allows declarations of global variables to be marked
constant, even if the final definition of the global is not. This
capability can be used to enable slightly better optimization of the
program, but requires the language definition to guarantee that
optimizations based on the ‘constantness’ are valid for the
translation units that do not include the definition.
Given that definition, when merging two declarations, we have to drop
constantness if of of them is not marked contant, since the Module
without the constant marker might not have the necessary guarantees.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220927 91177308-0d34-0410-b5e6-96231b3b80d8
If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes. This helps to remove redundant checks and canonicalize checks for other optimization passes. This particular patch checks whether a value is known to be non-zero from the range metadata.
Currently, these tests are against InstCombine. In theory, all of these should be InstSimplify since we're not inserting any new instructions. Moving the code may follow in a separate change.
Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5947
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220925 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch finishes up support for handling sampling profiles in both
text and binary formats. The new binary format uses uleb128 encoding to
represent numeric values. This makes profiles files about 25% smaller.
The profile writer class can write profiles in the existing text and the
new binary format. In subsequent patches, I will add the capability to
read (and perhaps write) profiles in the gcov format used by GCC.
Additionally, I will be adding support in llvm-profdata to manipulate
sampling profiles.
There was a bit of refactoring needed to separate some code that was in
the reader files, but is actually common to both the reader and writer.
The new test checks that reading the same profile encoded as text or
raw, produces the same results.
Reviewers: bogner, dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6000
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220915 91177308-0d34-0410-b5e6-96231b3b80d8
It should be on for every target that supports unaligned accesses (e.g. not
v6m).
Patch by Charlie Turner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220912 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The previous calling convention prevented custom functions from being able
to access argument labels unless it knew how many variadic arguments there
were, and of which type. This restriction made it impossible to correctly
model functions in the printf family, as it is legal to pass more arguments
than required to those functions. We now pass arguments in the following order:
non-vararg arguments
labels for non-vararg arguments
[if vararg function, pointer to array of labels for vararg arguments]
[if non-void function, pointer to label for return value]
vararg arguments
Differential Revision: http://reviews.llvm.org/D6028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220906 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this commit, the Llvm_target tests (ab)used
the Llvm_executionengine as a mechanism to initialize at least some
target. This needlessly restricted tests to builds which can emit
code for their host architecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220901 91177308-0d34-0410-b5e6-96231b3b80d8
This commit updates the OCaml bindings and tests to use ocamlfind.
The bindings are migrated in order to use ctypes, which are now
required for MCJIT-backed Llvm_executionengine.
The tests are migrated in order to use OUnit and to verify that
the distributed META.llvm allows to build working executables.
Every OCaml toolchain invocation is now chained through ocamlfind,
which (in theory) allows to cross-compile the OCaml bindings.
The configure script now checks for ctypes (>= 0.2.3) and
OUnit (>= 2). The code depending on these libraries will be added
later. The configure script does not check the package versions
in order to keep changes less invasive.
Additionally, OCaml bindings will now be automatically enabled
if ocamlfind is detected on the system, rather than ocamlc, as it
was before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220899 91177308-0d34-0410-b5e6-96231b3b80d8
The also-emit-llvm option only supported getting the IR before optimizations.
This patch replaces it with a more generic save-temps option that saves the IR
both before and after optimizations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220885 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This helps llvm-objdump -r to print out the symbol name along
with the relocation type on x86. Adjust existing tests from checking
for "Unknown" to check for the symbol now.
Test Plan: Adjusted test/Object tests.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5987
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220866 91177308-0d34-0410-b5e6-96231b3b80d8
In practice this means:
* Always using -g flag.
* Embedding -cclib -lstdc++ into the corresponding cma/cmxa file.
This also moves -lstdc++ in a single place.
* Using caml_named_value instead of a homegrown mechanism.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220843 91177308-0d34-0410-b5e6-96231b3b80d8
For example, MS PSDK is not expected to have <cxxabi.h>.
You should introduce the new feature in lit.cfg corresponding to HAVE_CXXABI_H if you would like to test demangler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220840 91177308-0d34-0410-b5e6-96231b3b80d8
Remove pointless checks for storage of uninteresting values. Ensure that we
perform basic alias analysis to make the test more correct. Finally, apply a
stylistic change to the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220839 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, tests hardcoded ocamlopt and cmxa, which broke builds on
machines without ocamlopt. Instead, they now fall back to ocamlc.
As a side effect this fixes PR14727, which was caused by a crude hack
that replaced gcc with g++ everywhere in the ocamlopt native compiler
path and passes it back using -cc. Now the tests use the same
technique as META, i.e. -cclib -lstdc++. It might be more fragile
than using g++ explicitly, but it will break when the installed
package will also break, which is good.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220828 91177308-0d34-0410-b5e6-96231b3b80d8
This restores the commit from SVN r219899 with an additional change to ensure
that the CodeGen is correct for the case that was identified as being incorrect
(originally PR7272).
In the case that during inlining we need to synthesize a value on the stack
(i.e. for passing a value byval), then any function involving that alloca must
be stripped of its tailness as the restriction that it does not access the
parent's stack no longer holds. Unfortunately, a single alloca can cause a
rippling effect through out the inlining as the value may be aliased or may be
mutated through an escaped external call. As such, we simply track if an alloca
has been introduced in the frame during inlining, and strip any tail calls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220811 91177308-0d34-0410-b5e6-96231b3b80d8
This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
Added tests for vselect of <X x i1> values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220777 91177308-0d34-0410-b5e6-96231b3b80d8
Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code.
Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll):
define <16 x i32> @_inreg16xi32(i32 %a) {
; CHECK-LABEL: _inreg16xi32:
; CHECK: ## BB#0:
-; CHECK-NEXT: vpbroadcastd %edi, %zmm0
+; CHECK-NEXT: vmovd %edi, %xmm0
+; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0
+; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
; CHECK-NEXT: retq
%b = insertelement <16 x i32> undef, i32 %a, i32 0
%c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
ret <16 x i32> %c
}
Here, 256-bit broadcast was generated instead of 512-bit one.
In this patch
1) I added vector-shuffle lowering through broadcasts
2) Removed asserts and branches likes because this is incorrect
- assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
3) Fixed lowering tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220774 91177308-0d34-0410-b5e6-96231b3b80d8
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.
Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.
On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D5943
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220745 91177308-0d34-0410-b5e6-96231b3b80d8
Benchmarks have shown that it's harmless to the performance there, and having a
unified set of passes between the two cores where possible helps big.LITTLE
deployment.
Patch by Z. Zheng.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220744 91177308-0d34-0410-b5e6-96231b3b80d8
This is implemented via a multiclass that derives from the vperm imm
multiclass.
Fixes <rdar://problem/18426089>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220737 91177308-0d34-0410-b5e6-96231b3b80d8
For a call to not return in to the stackmap shadow, the shadow must end with the call.
To do this, we must insert any required nops *before* the call, and not after it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220728 91177308-0d34-0410-b5e6-96231b3b80d8
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.
This fixes rdar://problem/18785125.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220713 91177308-0d34-0410-b5e6-96231b3b80d8
Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and'
instruction.
This fixes rdar://problem/18784953.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220712 91177308-0d34-0410-b5e6-96231b3b80d8
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.
A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220710 91177308-0d34-0410-b5e6-96231b3b80d8
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.
This fixes rdar://problem/18784732.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220709 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.
This fixes rdar://problem/18784013.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220704 91177308-0d34-0410-b5e6-96231b3b80d8
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.
This fix extends the possible addressing modes for frame indices.
This fixes rdar://problem/18783298.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220700 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
(a < b) ? a : b
(a > b) ? a : b
but not these expressions:
(a > b) ? b : a
(a < b) ? b : a
This patch allows all of these expressions to be matched.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220671 91177308-0d34-0410-b5e6-96231b3b80d8
An icmp may have pointer arguments, it isn't limited to integers or
vectors of integers.
This fixes PR21388.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220664 91177308-0d34-0410-b5e6-96231b3b80d8
First, return true on success, as it is the OCaml convention.
Second, also initialize the native assembly printer, which is,
despite the name, required for MCJIT operation.
Since this function did not initialize the assembly printer earlier
and no function to initialize native assembly printer was available
elsewhere, it is safe to break its interface: it means that it
simply could not be used successfully before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220620 91177308-0d34-0410-b5e6-96231b3b80d8
The dividend in "signed % unsigned" is treated as unsigned instead of signed,
causing unexpected behavior such as -64 % (uint64_t)24 == 0.
Added a regression test in split-gep.ll
Patched by Hao Liu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220618 91177308-0d34-0410-b5e6-96231b3b80d8
The two operands of the new OR expression should be NextInChain and TheOther
instead of the two original operands.
Added a regression test in split-gep.ll.
Hao Liu reported this bug, and provded the test case and an initial patch.
Thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220615 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.
Test Plan: test/Codegen/NVPTX/vector-return.ll
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D5612
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220607 91177308-0d34-0410-b5e6-96231b3b80d8
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section. But when either symbol is undefined it
is an error.
The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.
rdar://18678402
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220599 91177308-0d34-0410-b5e6-96231b3b80d8
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.
An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.
Differential Revision: http://reviews.llvm.org/D5917
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220592 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Most structs were fixed by r218451 but those of between >32-bits and
<64-bits remained broken since they were not marked with [ASZ]ExtUpper.
This patch fixes the remaining cases by using
CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5963
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220556 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a miscompilation in the AArch64 fast-isel which was
triggered when a branch is based on an icmp with condition eq or ne,
and type i1, i8 or i16. The cbz instruction compares the whole 32-bit
register, so values with the bottom 1, 8 or 16 bits clear would cause
the wrong branch to be taken.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220553 91177308-0d34-0410-b5e6-96231b3b80d8
This is asm/diasm-only support, similar to AVX.
For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.
For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant. The addition operand (op3) in both cases can come from
memory. Again the ony difference is which operand is destructed.
There could be a post-RA pass that would convert a 213 or 132 into a 231.
Part of <rdar://problem/17082571>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220540 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for legalization of instructions of the form:
[fp_conv] <1 x i1> %op to <1 x double>
where fp_conv is one of fpto[us]i, [us]itofp. This used to assert
because they were simply missing from the vector operand scalarizer.
A similar problem arose in r190830, with trunc instead.
Fixes PR20778.
Differential Revision: http://reviews.llvm.org/D5810
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220533 91177308-0d34-0410-b5e6-96231b3b80d8
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.
The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.
Should fix PR20376.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220529 91177308-0d34-0410-b5e6-96231b3b80d8
While refactoring this code I was confused by both the name I had
introduced (addNonArgumentVariable... but it has all this logic to
handle argument numbering and keep things in order?) and by the
redundancy. Seems when I fixed the misordered inlined argument handling,
I didn't realize it was mostly redundant with the argument ordering code
(which I may've also written, I'm not sure). So let's just rely on the
more general case.
The only oddity in output this produces is that it means when we emit
all the variables for the current function, we don't track when we've
finished the argument variables and are about to start the local
variables and insert DW_AT_unspecified_parameters (for varargs
functions) there. Instead it ends up after the local variables, scopes,
etc. But this isn't invalid and doesn't cause DWARF consumers problems
that I know of... so we'll just go with that because it makes the code
nice & simple.
(though, let's see what the buildbots have to say about this - *crosses
fingers*)
There will be some cleanup commits to follow to remove the now trivial
wrappers, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220527 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where
3 are really needed.
This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to
MUL8/IMUL8 + SETO.
i8 is a special case because there is no two/three operand variants of
(I)MUL8, so the first operand and return value need to go in AL/AX.
Also, we can't write patterns for these instructions: TableGen refuses
patterns where output operands don't match SDNode results. In this case,
instructions where the output operand is an implicitly defined register.
A related special case (and FIXME) exists for MUL8 (X86InstrArith.td):
// FIXME: Used for 8-bit mul, ignore result upper 8 bits.
// This probably ought to be moved to a def : Pat<> if the
// syntax can be accepted.
[(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)]
Ideally, these go away with UMUL8, but we still need to improve TableGen
support of implicit operands in patterns.
Before this change:
movsbl %sil, %eax
movsbl %dil, %ecx
imull %eax, %ecx
movb %cl, %al
sarb $7, %al
movzbl %al, %eax
movzbl %ch, %esi
cmpl %eax, %esi
setne %al
After:
movb %dil, %al
imulb %sil
seto %al
Also, remove a made-redundant testcase for PR19858, and enable more FastISel
ALU-overflow tests for SelectionDAG too.
Differential Revision: http://reviews.llvm.org/D5809
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220516 91177308-0d34-0410-b5e6-96231b3b80d8