Commit Graph

124916 Commits

Author SHA1 Message Date
Xinliang David Li
9a2e877d9a [PGO] Stop using invalid char in instr variable names.
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).

This is the second try. The fix in this patch is very localized.
Only profile symbol names of profile symbols with internal 
linkage are fixed up while initializer of name syms are not 
changes. This means there is no format change nor version bump.

llvm-svn: 255434
2015-12-12 17:28:03 +00:00
Sanjay Patel
3f624d4650 [InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261

...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232

while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.

Note that the existing check for "isVectorTy()" wasn't strong enough in general
and specifically because: x86_mmx. It's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before 
proceeding with the transform.

llvm-svn: 255433
2015-12-12 16:44:48 +00:00
Simon Pilgrim
0023d32200 [X86][AVX] Tests tidyup
Cleanup/regenerate some tests for some upcoming patches.

llvm-svn: 255432
2015-12-12 12:52:52 +00:00
David Majnemer
8a1dde1744 Try to appease sphinx
llvm-svn: 255429
2015-12-12 06:56:02 +00:00
David Majnemer
2e1367257e Move catchpad-phi-cast.ll to the X86 specific subdirectory
It is X86 specific and will not be properly exercised unless LLVM is
built with the X86 target.

llvm-svn: 255426
2015-12-12 06:21:08 +00:00
David Majnemer
520173cadf Try to appease a buildbot
The builder complains thusly:
error C2027: use of undefined type 'llvm::raw_ostream'

Try to make it happy by including raw_ostream.h

llvm-svn: 255425
2015-12-12 05:53:20 +00:00
David Majnemer
bf189bdcd7 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Hal Finkel
a39d5a985d [PowerPC] OutStreamer cleanup in PPCAsmPrinter
We don't need to pass OutStreamer as a parameter to LowerSTACKMAP and
LowerPATCHPOINT. It is a member variable of PPCAsmPrinter, and thus, is already
available. NFC.

llvm-svn: 255418
2015-12-12 01:47:08 +00:00
Chen Li
7b8ddaa293 [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255415
2015-12-12 01:04:15 +00:00
Hal Finkel
6f341281a9 Fix test/CodeGen/PowerPC/ppc-shrink-wrapping.ll after r255398
llvm-svn: 255414
2015-12-12 00:42:05 +00:00
Sanjay Patel
ceecde00d5 [InstCombine] allow any pair of bitcasts to be combined
This change is discussed in D15392 and should allow us to effectively
revert:
http://llvm.org/viewvc/llvm-project?view=revision&revision=255261
if we canonicalize bitcasts ahead of extracts.

It should be safe to convert any pair of bitcasts into a single bitcast, 
however, it was mentioned here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html
that we're not allowed to bitcast from an x86_mmx to some other types, but I'm 
not seeing any failures from that, and we have regression tests in CodeGen/X86
that appear to cover all of those cases. 

Some day we'll get to remove that MMX wart from LLVM IR completely?

Differential Revision: http://reviews.llvm.org/D15468

llvm-svn: 255399
2015-12-12 00:33:36 +00:00
Hal Finkel
28f5cfb976 [PowerPC] Add Branch Hints for Highly-Biased Branches
This branch adds hints for highly biased branches on the PPC architecture. Even
in absence of profiling information, LLVM will mark code reaching unreachable
terminators and other exceptional control flow constructs as highly unlikely to
be reached.

Patch by Tom Jablin!

llvm-svn: 255398
2015-12-12 00:32:00 +00:00
Derek Schuff
f8949427f2 [WebAssembly] Update test expectations
Many tests are now passing due to eliminateFrameIndex implementation and
the list needs to be re-triaged because it unblocks other failures, and
some previous failures are different. However I'm about to churn it more
by implementing more lowering, so will wait on that.

llvm-svn: 255396
2015-12-12 00:18:40 +00:00
Chen Li
a9e4a15783 Revert rL255391: [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
because it broke buildbot.

llvm-svn: 255395
2015-12-12 00:08:37 +00:00
Sanjay Patel
6151d885e6 use FileCheck for better checking
llvm-svn: 255394
2015-12-12 00:01:10 +00:00
Derek Schuff
75ae87426d [WebAssembly] Implement prolog/epilog insertion and FrameIndex elimination
Summary:
Use the SP32 physical register as the base for FrameIndex
lowering. Update it and the __stack_pointer global var in the prolog and
epilog. Extend the mapping of virtual registers to wasm locals to
include the physical registers.

Rather than modify the target-independent PrologEpilogInserter (which
asserts that there are no virtual registers left) include a
slightly-modified copy for Wasm that does not have this assertion and
only clears the virtual registers if scavenging was needed (which of
course it isn't for wasm).

Differential Revision: http://reviews.llvm.org/D15344

llvm-svn: 255392
2015-12-11 23:49:46 +00:00
Chen Li
c54223aa8c [X86ISelLowering] Add additional support for multiplication-to-shift conversion.
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.

Reviewers: craig.topper, RKSimon

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14603

llvm-svn: 255391
2015-12-11 23:39:32 +00:00
Diego Novillo
9bbd13f9a0 SamplePGO - Reduce memory utilization by 10x.
DenseMap is the wrong data structure to use for sample records and call
sites.  The keys are too large, causing massive core memory growth when
reading profiles.

Before this patch, a 21Mb input profile was causing the compiler to grow
to 3Gb in memory.  By switching to std::map, the compiler now grows to
300Mb in memory.

There still are some opportunities for memory footprint reduction. I'll
be looking at those next.

llvm-svn: 255389
2015-12-11 23:21:38 +00:00
Matt Arsenault
0640683138 SelectionDAG: Match min/max if the scalar operation is legal
llvm-svn: 255388
2015-12-11 23:16:47 +00:00
Hal Finkel
e58db13c29 Revert r248483, r242546, r242545, and r242409 - absdiff intrinsics
After much discussion, ending here:

  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html

it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.

r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation

llvm-svn: 255387
2015-12-11 23:11:52 +00:00
Rafael Espindola
8ea226400f Avoid buffered reads of /dev/urandom
I am seeing disappointing clang performance on a large PowerPC64
Linux box. GetRandomNumberSeed() does a buffered read from
/dev/urandom to seed its PRNG. As a result we read an entire page
even though we only need 4 bytes.

With every clang task reading a page worth of /dev/urandom we
end up spending a large amount of time stuck on kernel spinlock.

Patch by Anton Blanchard!

llvm-svn: 255386
2015-12-11 22:52:32 +00:00
Davide Italiano
5f81084374 [llvm-objdump/MachODump] Reduce code duplication.
llvm-svn: 255380
2015-12-11 22:27:59 +00:00
Sanjay Patel
5d71fe5292 Add tests for bitcast-bitcast sequences for all scalar/vector permutations
As noted in http://reviews.llvm.org/D15392 , we should be able to improve this.

llvm-svn: 255370
2015-12-11 20:26:30 +00:00
Xinliang David Li
2cad28318a [PGO] Revert r255365: solution incomplete, not handling lambda yet
llvm-svn: 255369
2015-12-11 20:23:22 +00:00
Xinliang David Li
de46641441 [PGO] Stop using invalid char in instr variable names.
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).

This patch fixed the issue. With the change, the index format
version will be bumped up by 1. Backward compatibility is 
preserved with this change.

Differential Revision: http://reviews.llvm.org/D15243

llvm-svn: 255365
2015-12-11 19:53:19 +00:00
Matthias Braun
b8675cada7 CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.

This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).

Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.

This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033

Differential Revision: http://reviews.llvm.org/D15320

llvm-svn: 255362
2015-12-11 19:42:09 +00:00
Matt Arsenault
fad94dae85 Start replacing vector_extract/vector_insert with extractelt/insertelt
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359
2015-12-11 19:20:16 +00:00
Derek Schuff
06821d8eaa [WebAssembly] Fix ADJCALLSTACKDOWN/UP use/defs
Summary:
ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use
and def the stack pointer. Since they do not, all the nodes are being
eliminated by DeadMachineInstructionElim, so they aren't in the IR when
PrologEpilogInserter/eliminateCallFramePseudo needs them.

This change fixes that, but since RegStackify will not stackify across
them (and it runs early, before PEI), change LowerCall to only emit them
when the call frame size is > 0. That makes the current code work the
same way and makes code handled by D15344 also work the same way. We can
expand the condition beyond NumBytes > 0 in the future if needed.

Reviewers: sunfish, jfb

Subscribers: jfb, dschuff, llvm-commits

Differential Revision: http://reviews.llvm.org/D15459

llvm-svn: 255356
2015-12-11 18:55:34 +00:00
Chad Rosier
2dd4cf7c08 Revert r255247, r255265, and r255286 due to serious compile-time regressions.
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."

llvm-svn: 255354
2015-12-11 18:39:41 +00:00
Manman Ren
6d08250fe8 CXX_FAST_TLS calling convention: target independent portion.
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given calling convention and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

rdar://problem/23557469

Differential Revision: http://reviews.llvm.org/D15340

llvm-svn: 255353
2015-12-11 18:24:30 +00:00
Sanjay Patel
cc9c766d9f fix typos; NFC
llvm-svn: 255352
2015-12-11 18:12:01 +00:00
Frederic Riss
1db2534294 [dsymutil] Ignore absolute symbols in the debug map
Quoting from the comment added to the code:

    // Objective-C on i386 uses artificial absolute symbols to
    // perform some link time checks. Those symbols have a fixed 0
    // address that might conflict with real symbols in the object
    // file. As I cannot see a way for absolute symbols to find
    // their way into the debug information, let's just ignore those.

llvm-svn: 255350
2015-12-11 17:50:37 +00:00
Hal Finkel
4c089491a5 AlignmentFromAssumptions and SLPVectorizer preserves AA and GlobalsAA
GlobalsAA's assumptions that passes do not escape globals not previously
escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking
them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline.

http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html

Patch by Vaivaswatha Nagaraj!

llvm-svn: 255348
2015-12-11 17:46:01 +00:00
Hal Finkel
94fa3b65bf [TableGen] Correct Namespace lookup with AltNames in AsmWriterEmitter
AsmWriterEmitter will generate a getRegisterName function with an alternate
register name index as its second argument if the target makes use of them. The
enum of these values is generated in RegisterInfoEmitter. The getRegisterName
generator would assume the namespace could always be found by reading index 1
of the list of AltNameIndices, but this will fail if this list is sorted such
that the NoRegAltName is at index 1. Because this list is sorted by record name
(in CodeGenTarget::ReadRegAltNameIndices), you only run in to problems if your
MyTargetRegisterInfo.td defines a single RegAltNameIndex that sorts lexically
before NoRegAltName.

For example, if a target has something like

  def AnAltNameIndex : RegAltNameIndex

and defines RegAltNameIndices for some registers then, prior to this change,
AsmWriterEmitter would generate references to

  ::AnAltNameIndex and ::NoRegAltName

Patch by Alex Bradbury!

llvm-svn: 255344
2015-12-11 17:31:27 +00:00
Artur Pilipenko
8b6635b2f6 PruneEH pass incorrectly reports that a change was made
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D14097

llvm-svn: 255343
2015-12-11 16:30:26 +00:00
James Molloy
f5a3f4d77c [Mem2Reg] Respect optnone
Mem2Reg shouldn't be optimizing a function that is marked
optnone. There is a test checking this that fails when mem2reg is
explicitly added to the standard pass pipeline.

llvm-svn: 255336
2015-12-11 13:36:59 +00:00
James Molloy
d8003c7bf6 [InstCombine] Make MatchBSwap also match bit reversals
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.

llvm-svn: 255334
2015-12-11 10:04:51 +00:00
Maxim Ostapenko
c10d3f5ee5 Revert previous test commit.
llvm-svn: 255331
2015-12-11 07:40:25 +00:00
Maxim Ostapenko
c888ff9a5a This is a test commit to check my commit access works.
llvm-svn: 255330
2015-12-11 07:31:29 +00:00
Xinliang David Li
b2d316e534 [PGO] Read VP raw data without depending on the Value field
Before this patch, each function's on-disk VP data is 'pointed'
to by the Value field of per-function ProfileData structue, and 
read relies on this field (relocated with ValueDataDelta field)
to read the value data. However this means the Value field needs
to be updated during runtime before dumping, which creates undesirable
data races.

With this patch, the reading of VP data no longer depends on Value
field. There is no format change. ValueDataDelta header field becomes
obsolute but will be kept for compatibility reason (will be removed
next time the raw format change is needed).

llvm-svn: 255329
2015-12-11 06:53:53 +00:00
Hans Wennborg
903b7737d9 Fix build after r255319.
llvm-svn: 255322
2015-12-11 00:58:32 +00:00
Eric Christopher
b4999f7af6 Fix a spurious if.
llvm-svn: 255321
2015-12-11 00:51:59 +00:00
Akira Hatanaka
ef959cd337 [LazyValueInfo] Stop inserting overdefined values into ValueCache to
reduce memory usage.

Previously, LazyValueInfoCache inserted overdefined lattice values into
both ValueCache and OverDefinedCache. This wasn't necessary and was
causing LazyValueInfo to use an excessive amount of memory in some cases.

This patch changes LazyValueInfoCache to insert overdefined values only
into OverDefinedCache. The memory usage decreases by 70 to 75% when one
of the files in llvm is compiled.

rdar://problem/11388615

Differential revision: http://reviews.llvm.org/D15391

llvm-svn: 255320
2015-12-11 00:49:47 +00:00
Kyle Butt
d374fb8cd3 [PPC]: Peephole optimize small accesss to aligned globals.
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
        addis 3, 2, b4v@toc@ha
        addi 4, 3, b4v@toc@l
        lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
        lbz 6, 1(4)         ; optimizer
        lbz 7, 2(4)
        lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
        addis 3, 2, b4v@toc@ha
        lbz 4, b4v@toc@l(3)
        lbz 5, b4v@toc@l+1(3)
        lbz 6, b4v@toc@l+2(3)
        lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.

llvm-svn: 255319
2015-12-11 00:47:36 +00:00
Hans Wennborg
b179aaa5e1 Check in the script for building Win snapshots
llvm-svn: 255318
2015-12-11 00:43:42 +00:00
Vedant Kumar
6f3e981379 [ProfileData] clang-format TextInstrProfReader::hasFormat. NFC.
llvm-svn: 255317
2015-12-11 00:40:05 +00:00
Cong Hou
649a5e80cc [X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1.
Previously in the conversion cost table there are no entries for integer-integer
conversions on SSE2. This will result in imprecise costs for certain vectorized
operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers
are counted from the result of running llc on the new test case in this patch.


Differential revision: http://reviews.llvm.org/D15132

llvm-svn: 255315
2015-12-11 00:31:39 +00:00
Xinliang David Li
d864384cd6 Format fix (NFC)
llvm-svn: 255313
2015-12-10 23:48:05 +00:00
Eric Christopher
5f9e955382 s/need/needs
llvm-svn: 255306
2015-12-10 22:29:26 +00:00
Eric Christopher
971f116a1c Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst,
x)) combines for ppc_fp128, since signbit computation is more
complicated.

Discussion thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html

Patch by Tim Shen!

llvm-svn: 255305
2015-12-10 22:09:06 +00:00