This reverts commit r248963.
Seems there's some standard libraries (and libcxxabi implementations)
that aren't -Wdeprecated clean... hrm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248972 91177308-0d34-0410-b5e6-96231b3b80d8
The custom code produces incorrect results if later reassociated.
Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:
movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
pand %xmm0, %xmm1
por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
psrld $16, %xmm0
por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
addps %xmm1, %xmm0
Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).
In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).
However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.
When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.
Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.
One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.
Fixes PR24512.
...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248965 91177308-0d34-0410-b5e6-96231b3b80d8
This particularly helps enforce the C++ Rule of 5 (for new move ops this
is already an error, but for a type only using C++98 features (copy
ctor/assign, dtor) it is only deprecated, not invalid)
Applying the flag for any GCC compatible compiler - GCC doesn't warn on
the Rule of 5 cases that C++11 deprecates, but it doesn't have other
false positives so far as I could see (compiling with GCC 4.8 didn't
produce any -Wdeprecated warnings I could spot).
Reviewers: aaron.ballman
Differential Revision: http://reviews.llvm.org/D13314
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248963 91177308-0d34-0410-b5e6-96231b3b80d8
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.
Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call. Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.
The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248959 91177308-0d34-0410-b5e6-96231b3b80d8
glibc's PowerPC /usr/include/asm/sigcontext.h, has this:
#ifdef __powerpc64__
#include <asm/elf.h>
#endif
and that contains defines of all of the relocation symbols, like this:
#define R_PPC_NONE 0
and if that file is included prior to including
include/llvm/Support/ELFRelocs/PowerPC*.def, which we cannot in general
prevent, the result will fail.
As it turns out, this happens when compiling
lld/unittests/DriverTests/GnuLdDriverTest.cpp under PPC64/Linux, because:
lld/include/lld/ReaderWriter/ELFLinkingContext.h includes
lld/unittests/DriverTests/DriverTest.h which includes
utils/unittest/googletest/include/gtest/gtest.h which includes
utils/unittest/googletest/include/gtest/internal/gtest-internal.h which includes
/usr/include/sys/wait.h which includes
/usr/include/signal.h which includes
/usr/include/bits/sigcontext.h which includes
/usr/include/asm/sigcontext.h which includes
/usr/include/asm/elf.h
the test could be fixed to include ReaderWriter/ELFLinkingContext.h before
including unittests/DriverTests/DriverTest.h, but dealing with this in the
*.def files is a more-general solution that localizes the fix to the headers
instead of requiring changes to an unbounded number of other source files (both
in-tree and external).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248957 91177308-0d34-0410-b5e6-96231b3b80d8
Unscaled load/store combining has been enabled since the initial ARM64 port. No
need for a redundance run. Also, add CHECK-LABEL directives.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248945 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.
PS: Initial patch was proposed by Arnold.
Reviewers: aschwaighofer, nadav, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248943 91177308-0d34-0410-b5e6-96231b3b80d8
The test requires X86 target support, and checks the actual debug
info contents, including register numbers which would be different on
other platforms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248938 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the index was constrained to the size of the memory operation for
no apparent reason. This change removes that constraint so that we can form
pre-index instructions with any valid offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248931 91177308-0d34-0410-b5e6-96231b3b80d8
Same strategy as simplifyInstructionsInBlock. ~1/3 less time
on my test suite. This pass doesn't have many in-tree users,
but getting rid of an O(N^2) worst case and making it cleaner
should at least make it a viable alternative to ADCE, since
it's now consistently somewhat faster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248927 91177308-0d34-0410-b5e6-96231b3b80d8
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248924 91177308-0d34-0410-b5e6-96231b3b80d8
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.
In particular, there were no tests for TrustZone being supported in these
architectures.
The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.
Differential Revision: http://reviews.llvm.org/D13236
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248921 91177308-0d34-0410-b5e6-96231b3b80d8
Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit for the accumulate scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and external test suite).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248917 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.
Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.
We may end up with byte shuffles with constant masks as the result of inlining.
Differential Revision: http://reviews.llvm.org/D13252
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248913 91177308-0d34-0410-b5e6-96231b3b80d8
When building a plugin against an installed LLVM toolchain using
add_llvm_loadable_module (in the documented manner) doesn't work as nothing sets
the *_OUTPUT_INTDIR variables causing an error when set_output_directory is
called. Making those arguments optional (causing the default output directory
to be used) fixes this.
Differential Revision: http://reviews.llvm.org/D13215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248911 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
As per Duncan's review for D12536, I extracted the sub-byte bit aligned
reading and writing code into lib/Support, and generalized it. Added calls from
BackpatchWord. Also added unittests.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248897 91177308-0d34-0410-b5e6-96231b3b80d8
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,
<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)
to
<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)
This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.
Differential Revision: http://reviews.llvm.org/D12985
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248887 91177308-0d34-0410-b5e6-96231b3b80d8
When using LLVMConfig.cmake from an installed toolchain in order to build a
loadable pass using add_llvm_loadable_module LLVM_ENABLE_PLUGINS and
LLVM_PLUGIN_EXT must be set. Also make LLVM_DEFINITIONS be set to what it
actually is.
Differential Revision: http://reviews.llvm.org/D13214
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248884 91177308-0d34-0410-b5e6-96231b3b80d8
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.
Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.
Differential Revision: http://reviews.llvm.org/D8690
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248878 91177308-0d34-0410-b5e6-96231b3b80d8
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.
This is a candidate for stable llvm 3.7.
Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248858 91177308-0d34-0410-b5e6-96231b3b80d8
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248857 91177308-0d34-0410-b5e6-96231b3b80d8
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow. However, we'd insert code to
initialize the return address after stack adjustments. Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248839 91177308-0d34-0410-b5e6-96231b3b80d8
Add support to the indexed instrprof reader and writer for the format
that will be used for value profiling.
Patch by Betul Buyukkurt, with minor modifications.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248833 91177308-0d34-0410-b5e6-96231b3b80d8
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.
In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.
When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.
One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)
Differential Revision: http://reviews.llvm.org/D12681
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248832 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Funclets have been turned into functions by the time they hit the object
file. Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.
Differential Revision: http://reviews.llvm.org/D13261
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248824 91177308-0d34-0410-b5e6-96231b3b80d8
PDB files have a lot of noise in them, with hundreds (or thousands)
of symbols from system libraries and compiler generated types. If
you're only looking for a specific type, this can be problematic.
This CL allows you to display *only* types, variables, or compilands
matching a particular pattern. These filters can even be combined
with exclude filters. Include-only filters are given priority, so
that first the set of items to display is limited only to those that
match the include filters, and then the set of exclude filters is
applied to those. If there are no include filters specified, then
it means "display everything".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248822 91177308-0d34-0410-b5e6-96231b3b80d8