Commit Graph

129526 Commits

Author SHA1 Message Date
Nemanja Ivanovic
5a73f4bec5 [PowerPC] Basic support for P9 atomic loads and stores
This patch corresponds to review:
http://reviews.llvm.org/D18032

This patch provides asm implementation for the following instructions:
lwat, ldat, stwat, stdat, ldmx, mcrxrx


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265022 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 15:26:37 +00:00
Jun Bum Lim
7a8b700c13 [AArch64] Handle missing store pair opportunity
Summary:
This change will handle missing store pair opportunity where the first store
instruction stores zero followed by the non-zero store. For example, this change
will convert :

  str wzr, [x8]
  str w1, [x8, #4]
into:
  stp wzr, w1, [x8]

Reviewers: jmolloy, t.p.northover, mcrosier

Subscribers: flyingforyou, aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18570

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265021 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 14:47:24 +00:00
Ulrich Weigand
ca50bf5da1 [PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel
The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
from a F4RC to a F8RC register class during conversion of a
floating-point number to integer. There is actually no support in the
common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
PowerPC back-end has special code there to simply ignore
COPY_TO_REGCLASS.

This is correct *if and only if* the source and destination registers of
COPY_TO_REGCLASS are the same (except for the different register class).
But nothing guarantees this to be the case, and if the register
allocator does end up allocating source and destination to different
registers after all, the back-end simply generates incorrect code. I've
included a test case that shows such incorrect code generation.

However, it seems that COPY_TO_REGCLASS is actually not intended to be
used at the MI layer at all. It is used during SelectionDAG, but always
lowered to a plain COPY before emitting MI. Other back-end's fast isel
passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
for the PowerPC back-end to emit it here.

This patch changes the PowerPC back-end to directly emit COPY instead of
COPY_TO_REGCLASS and removes the special handling in the instruction
printers.

Differential Revision: http://reviews.llvm.org/D18605


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265020 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 14:44:50 +00:00
Daniel Sanders
2fcdb70af7 [mips] Range check simm16
Summary:
There are too many instructions to exhaustively test so addiu and lwc2 are
used as representative examples.

It should be noted that many memory instructions that should have simm16
range checking do not because it is also necessary to support the macro
of the same name which accepts simm32. The range checks for these occur in
the macro expansion.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18437


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265019 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 14:34:00 +00:00
Daniel Sanders
15d116ada7 [mips] Range check simm11 and mem_simm11.
Summary:
ldc2/sdc2 now emit slightly worse diagnostics for MIPS-I. The problem
is that they don't trigger the custom parser because all the candidates
are disabled by feature bits. On all other subtargets, the diagnostics are
accurate but are subject to the usual issues of needing to report multiple
ways to correct the code (e.g. smaller offset, enable a CPU feature) but
only being able to report one error.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18436


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265018 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 14:23:20 +00:00
Dmitry Polukhin
dc022bad27 [IFUNC] Introduce GlobalIndirectSymbol as a base class for alias and ifunc
This patch is a part of http://reviews.llvm.org/D15525

GlobalIndirectSymbol class contains common implementation for both
aliases and ifuncs. This patch should be NFC change that just prepare
common code for ifunc support.

Differential Revision: http://reviews.llvm.org/D18433

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265016 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 14:16:21 +00:00
Sam Kolton
8fef4bc756 [AMDGPU] Disassembler: support for DPP
Review: http://reviews.llvm.org/D18642

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265015 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 14:15:04 +00:00
Daniel Sanders
2693fe3c51 [mips] Split mem_msa into range checked mem_simm10 and mem_simm10_lsl[123]
Summary:
Also, made test_mi10.s formatting consistent with the majority of the
MC tests.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18435


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265014 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 14:12:01 +00:00
Nirav Dave
9daabaed60 Prevent X86ISelLowering from merging volatile loads
Change isConsecutiveLoads to check that loads are non-volatile as this
is a requirement for any load merges. Propagate change to two callers.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18546

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265013 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 13:40:55 +00:00
Daniel Sanders
8cd0a25bf4 [mips] Range check simm9 and fix a bug this revealed.
Summary:
The bug was that microMIPS's [ls]w[lr]e instructions claimed to support a
12-bit offset when it is only 9-bit.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18434


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265010 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 13:15:23 +00:00
Benjamin Kramer
638cd0356d [TTI] Let the cost model estimate ctpop costs based on legality
PPC has a vector popcount, this lets the vectorizer use the correct cost
for it. Tweak X86 test to use an intrinsic that's actually scalarized (we
have a somewhat efficient lowering for vector popcount using SSE, the
cost model finds that now).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265005 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 10:42:40 +00:00
Zlatko Buljan
4e6485e747 [mips][microMIPS] Implement MFC*, MFHC* and DMFC* instructions
Differential Revision: http://reviews.llvm.org/D17334


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265002 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 08:51:24 +00:00
Jeroen Ketema
12c0e3a87f Silence warnings in OCaml bindings
* LLVMDisposeMessage lives in llvm-c/Core.h, include this file where necessary
* LLVMAddTargetData has been removed, follow suit in the bindings

Differential Revision: http://reviews.llvm.org/D18633


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265001 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 08:39:42 +00:00
Jonas Paulsson
0bfce3885d Indentation fix in SystemZInstrInfo.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265000 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 08:00:14 +00:00
Sanjoy Das
0d31c7f6c6 [InstCombine] Fix incorrect rule from rL236202
The rule for SMIN introduced in rL236202 doesn't work as advertised: the
check for Pred == ICmpInst::ICMP_SGT was missing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264996 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 05:14:34 +00:00
Sanjoy Das
2b00aae5c0 Delete trailing whitespace
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264995 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 05:14:29 +00:00
Sanjoy Das
e840b37dc0 [SCEV] Track NoWrap properties using MatchBinaryOp, NFC
This way once we teach MatchBinaryOp to map more things into arithmetic,
the non-wrapping add recurrence construction would understand it too.
Right now MatchBinaryOp still only understands arithmetic, so this is
solely a code-reorganization change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264994 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 05:14:26 +00:00
Sanjoy Das
bf8c7c9a0d [SCEV] NFC code motion to simplify later change
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264993 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 05:14:22 +00:00
Craig Topper
6c25a70663 [X86] Use MVT instead of EVT in code called after legalization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264992 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 04:37:41 +00:00
Davide Italiano
a528f3f7b7 [DebugInfo] Subprograms should belong to a CU.
Start fixing tests accordingly. There are still
about 35 failures before we can enable this check
in the IR verifier.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264990 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 03:40:07 +00:00
Hal Finkel
dc6860fc82 [PowerPC] Load two floats directly instead of using one 64-bit integer load
When dealing with complex<float>, and similar structures with two
single-precision floating-point numbers, especially when such things are being
passed around by value, we'll sometimes end up loading both float values by
extracting them from one 64-bit integer load. It looks like this:

  t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
      t16: i64 = srl t13, Constant:i32<32>
    t17: i32 = truncate t16
  t18: f32 = bitcast t17
    t19: i32 = truncate t13
  t20: f32 = bitcast t19

The problem, especially before the P8 where those bitcasts aren't legal (and
get expanded via the stack), is that it would have been better to use two
floating-point loads directly. Here we add a target-specific DAGCombine to do
just that. In short, we turn:

	ld 3, 0(5)
	stw 3, -8(1)
	rldicl 3, 3, 32, 32
	stw 3, -4(1)
	lfs 3, -4(1)
	lfs 0, -8(1)

into:

        lfs 3, 4(5)
        lfs 0, 0(5)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264988 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 02:56:05 +00:00
Sean Silva
0fb6150931 Fix case confusion.
The test case was defining and using a function 'notExported()', but
the FileCheck checks were checking for the name 'not_exported'. This
changes the test to use 'notExported' across the board. Also, the test
defined a function 'not_defined()', but doesn't have any checks related
to it. For consistency, this name is changed to 'notDefined'. A later
commit will add checks for 'notDefined'.

Patch by Warren Ristow!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264984 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 01:47:33 +00:00
Sanjoy Das
de765686f8 Introduce a @llvm.experimental.guard intrinsic
Summary:
As discussed on llvm-dev[1].

This change adds the basic boilerplate code around having this intrinsic
in LLVM:

 - Changes in Intrinsics.td, and the IR Verifier
 - A lowering pass to lower @llvm.experimental.guard to normal
   control flow
 - Inliner support

[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html

Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18527

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264976 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-31 00:18:46 +00:00
Hans Wennborg
d3f401b2e9 Add some more triples after r264966
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264972 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 23:55:22 +00:00
Hans Wennborg
b667d698ac [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.

Differential Revision: http://reviews.llvm.org/D18573

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264966 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 23:38:01 +00:00
Matthias Braun
137f4d9ca1 CodeGen: Factor out code for tail call result compatibility check; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264959 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:46:04 +00:00
Matthias Braun
8c3b4a5a2f Avoid unnecessary #include; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264958 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:45:58 +00:00
Paul Robinson
cc01455dbf Update copyright year to 2016.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264954 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:41:06 +00:00
Matt Arsenault
e8cd894c28 AMDGPU: Add frexp_exp intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264944 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:28:52 +00:00
Matt Arsenault
f965621c5f AMDGPU: Constant folding for frexp_mant
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264943 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:28:26 +00:00
Teresa Johnson
d47ed92bc6 Use existing PrintEscapedString in AssemblyWriter
r264884 introduced a helper to escape the backslashes in the source file
path, but I since discovered an existing mechanism to escape strings.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264936 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:17:28 +00:00
Peter Collingbourne
af289e0441 Cloning: Reduce complexity of debug info cloning and fix correctness issue.
Commit r260791 contained an error in that it would introduce a cross-module
reference in the old module. It also introduced O(N^2) complexity in the
module cloner by requiring the entire module to be visited for each function.
Fix both of these problems by avoiding use of the CloneDebugInfoMetadata
function (which is only designed to do intra-module cloning) and cloning
function-attached metadata in the same way that we clone all other metadata.

Differential Revision: http://reviews.llvm.org/D18583

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264935 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:05:13 +00:00
Sanjay Patel
c3260cad54 fix typos
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264933 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:38:20 +00:00
Aaron Ballman
59a288b05f Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264929 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:30:00 +00:00
Matt Arsenault
af539c40a3 LegalizeDAG: Don't replace vector store with integer if not legal
For the same reason as the corresponding load change.

Note that ExpandStore is completely broken for non-byte sized element
vector stores, but preserve the current broken behavior which has tests
for it. The behavior should be the same, but now introduces a new typed
store that is incorrectly split later rather than doing it directly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264928 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:15:18 +00:00
Matt Arsenault
532e88a080 LegalizeDAG: Don't replace vector load with integer unless legal
On AMDGPU we want to be able to promote i64/f64 loads to v2i32.
If the access is unaligned, this would conclude that since i64 is legal,
it would convert it back to i64 and there is an endless legalization
loop.

Extract the logic for scalarizing the load into a new TargetLowering
function, where this can also replace the custom function AMDGPU
has for this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264927 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:15:10 +00:00
David Majnemer
82ef54d4b0 [IndVarSimplify] Don't insert after a catchswitch
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.

This fixes PR27133.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264926 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 21:12:06 +00:00
Justin Lebar
5fa9211ba4 Add #include <functional> to PassManagerBuilder, now that it uses std::function. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264923 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:52:40 +00:00
Simon Pilgrim
8f62cbea76 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264922 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:52:24 +00:00
Justin Lebar
6feb1718ed [NVPTX] Make NVVMReflect a function pass.
Summary:
Currently it's a module pass.  Make it a function pass so that we can
move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
which only accepts function passes.

Reviewers: rnk

Subscribers: tra, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18615

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264919 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:40:11 +00:00
Justin Lebar
3daeed4a53 [PassManager] Make PassManagerBuilder::addExtension take an std::function, rather than a function pointer.
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style.  (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)

No functional change, and the new API is backwards-compatible.

Reviewers: chandlerc

Subscribers: joker.eph, cfe-commits

Differential Revision: http://reviews.llvm.org/D18613

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264918 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:39:29 +00:00
Justin Bogner
18af5051ba test: Remove a test for a transform that hasn't existed in 5 years.
The TailDup transform was removed in r138841 in 2011, along with most
of the tests for it. This test, however, was missed. Probably because
it had already been XFAIL'd for 3 years at that point (since r52243!)
and continued to fail when the opt flag for -tailduplicate stopped
being valid.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264916 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 20:36:07 +00:00
Hal Finkel
f5526fc013 Add a copy constructor to StringMap
There is code under review that requires StringMap to have a copy constructor,
and this makes StringMap more consistent with our other containers (like
DenseMap) that have copy constructors.

Differential Revision: http://reviews.llvm.org/D18506

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264906 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 19:54:56 +00:00
Hal Finkel
dfdada0adb [LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:

  LV: The Smallest and Widest types: 32 / 32 bits.
  LV: The Widest register is: 256 bits.
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Scalar loop costs: 3.
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 2 costs: 2.
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 4 costs: 1.
  ...
  LV: Selecting VF: 8.
  LV: The target has 32 registers
  LV(REG): Calculating max register usage:
  LV(REG): At #0 Interval # 0
  LV(REG): At #1 Interval # 1
  LV(REG): At #2 Interval # 2
  LV(REG): At #4 Interval # 1
  LV(REG): At #5 Interval # 1
  LV(REG): VF = 8

The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their cost (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.

The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There's aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.

While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.

This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.

Differential Revision: http://reviews.llvm.org/D18537

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264904 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 19:37:08 +00:00
Rong Xu
37444017a2 [PGO] PGOFuncName in LTO optimizations
PGOFuncNames are used as the key to retrieve the Function definition from the
MD5 stored in the profile. For internal linkage function, we prefix the source
file name to the PGOFuncNames. LTO's internalization privatizes many global linkage
symbols. This happens after value profile annotation, but those internal
linkage functions should not have a source prefix. To differentiate compiler
generated internal symbols from original ones, PGOFuncName meta data are
created and attached to the original internal symbols in the value profile
annotation step. If a symbol does not have the meta data, its original linkage
must be non-internal.

Also add a new map that maps PGOFuncName's MD5 value to the function definition.

Differential Revision: http://reviews.llvm.org/D17895





git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264902 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:37:52 +00:00
Reid Kleckner
564787f18e [cmake] Instead of testing char16_t for MSVC compat, directly ask cl.exe its version
Credit to Aaron Ballman for thinking of this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264886 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:19:39 +00:00
Teresa Johnson
2dec5fc2c0 Restore "[ThinLTO] Serialize the Module SourceFileName to/from LLVM assembly"
This restores commit 264869, with a fix for windows bots to properly
escape '\' in the path when serializing out. Added test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264884 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:15:08 +00:00
Chad Rosier
21db7b79c9 [AArch64] Fix warnings pointed out by Hal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264882 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 18:08:51 +00:00
Reid Kleckner
db91435d44 [cmake] Add -fms-compatibility-version=19 when clang-cl gives errors about char16_t
What we are really trying to do here is to figure out if we are using
the 2015 STL. Unfortunately, so far as I know the MSVC STL does not
define a version macro that we can check directly. Instead I wrote a
check to see if char16_t works.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264881 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 17:30:26 +00:00
Reid Kleckner
f3ba4d1f7a [cmake] Allow EH usage with clang-cl
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264880 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 17:28:21 +00:00