No functionality change. No change in X86.td.expanded except that we only set
the CD8 attributes for the memory variants. (This shouldn't be used unless we
have a memory operand.)
llvm-svn: 220736
1) i512mem -> f512mem (this is the packed FP input being permuted)
2) element size is 64 bits in EVEX_CD8 for PD.
(A good illustration why X86VectorVTInfo is useful)
llvm-svn: 220734
For a call to not return in to the stackmap shadow, the shadow must end with the call.
To do this, we must insert any required nops *before* the call, and not after it.
llvm-svn: 220728
This is a minor change to use the immediate version when the operand is a null
value. This should get rid of an unnecessary 'mov' instruction in debug
builds and align the code more with the one generated by SelectionDAG.
This fixes rdar://problem/18785125.
llvm-svn: 220713
To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets.
A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return.
llvm-svn: 220710
The pattern matching for a 'ConstantInt' value was too restrictive. Checking for
a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction.
This fixes rdar://problem/18784732.
llvm-svn: 220709
This fixes a bug where the input register was not defined for the 'tbz/tbnz'
instruction. This happened, because we folded the 'and' instruction from a
different basic block.
This fixes rdar://problem/18784013.
llvm-svn: 220704
At higher optimization levels the LLVM IR may contain more complex patterns for
loads/stores from/to frame indices. The 'computeAddress' function wasn't able to
handle this and triggered an assertion.
This fix extends the possible addressing modes for frame indices.
This fixes rdar://problem/18783298.
llvm-svn: 220700
sets as keys into a cache of interference matrice values in the Interference
constraint adder.
Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.
llvm-svn: 220688
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
(a < b) ? a : b
(a > b) ? a : b
but not these expressions:
(a > b) ? b : a
(a < b) ? b : a
This patch allows all of these expressions to be matched.
llvm-svn: 220671
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.
r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.
We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.
llvm-svn: 220652
This is a simple fix that brings the compilation time from 5min to 5s
on a specific real-world example. It's a large chain of computation in
a crypto routine (always a problem for SCEV). A unit test is not
feasible and there would be no way to check it. The fix is just basic
good practice for dealing with SCEVs, there's no risk of regression.
Patch by Daniel Reynaud!
llvm-svn: 220622
The dividend in "signed % unsigned" is treated as unsigned instead of signed,
causing unexpected behavior such as -64 % (uint64_t)24 == 0.
Added a regression test in split-gep.ll
Patched by Hao Liu.
llvm-svn: 220618
The two operands of the new OR expression should be NextInChain and TheOther
instead of the two original operands.
Added a regression test in split-gep.ll.
Hao Liu reported this bug, and provded the test case and an initial patch.
Thanks!
llvm-svn: 220615
Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1).
Minor patch agreed with qcolombet.
llvm-svn: 220613
These asserts can trigger if the worklist iteration order is
sufficiently unlucky. Instead of adding special case logic to handle
these edge conditions, just bail out on trying to transform them:
InstSimplify will get them when it reaches them on the worklist.
This fixes PR21378.
N.B. No test case is included because any test would rely on the
fragile worklist iteration order.
llvm-svn: 220612
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.
Test Plan: test/Codegen/NVPTX/vector-return.ll
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D5612
llvm-svn: 220607
In a Mach-O object file a relocatable expression of the form
SymbolA - SymbolB + constant is allowed when both symbols are
defined in a section. But when either symbol is undefined it
is an error.
The code was crashing when it had an undefined symbol in this case.
And should have printed a error message using the location information
in the relocation entry.
rdar://18678402
llvm-svn: 220599
So that it has access to getOrCreateGlobalVariableDIE. If we ever support
decsribing using directive in C++ classes (thus requiring support in type
units), it will certainly use another mechanism anyway.
Differential Revision: http://reviews.llvm.org/D5975
llvm-svn: 220594
Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element.
An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle.
Differential Revision: http://reviews.llvm.org/D5917
llvm-svn: 220592
Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst.
Adding encoding bits for add opcode.
Adding llvm-mc tests.
Removing unit tests.
http://reviews.llvm.org/D5624
llvm-svn: 220584
To do this, change the representation of lazy loaded functions.
The previous representation cannot differentiate between a function whose body
has been removed and one whose body hasn't been read from the .bc file. That
means that in order to drop a function, the entire body had to be read.
llvm-svn: 220580
(part of refactoring to allow subprogram emission in both the skeleton
and main units to enable -gmlt-like data to be included in the skeleton
for live inlined backtracing purposes)
llvm-svn: 220578
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
llvm-svn: 220570
Summary:
No functional change yet, it's just an object replacement for an enum.
It will allow us to gather ABI information in a single place so that we can
start testing for properties of the ABI's instead of the ABI itself.
For example we will eventually be able to use:
ABI.MinStackAlignmentInBytes()
instead of:
(isABI_N32() || isABI_N64()) ? 16 : 8
which is clearer and more maintainable.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3341
llvm-svn: 220568
Summary:
i32 is always promoted to i64 so it no longer makes sense to assign i32 to
registers.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5964
llvm-svn: 220561
Summary:
Most structs were fixed by r218451 but those of between >32-bits and
<64-bits remained broken since they were not marked with [ASZ]ExtUpper.
This patch fixes the remaining cases by using
CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5963
llvm-svn: 220556