Commit Graph

17024 Commits

Author SHA1 Message Date
NAKAMURA Takumi
2811ae66a8 llvm/test/MC/X86/x86_nop.s: Make sure -arch=x86 when -mcpu=geode.
-mcpu doesn't infer -arch. Consider non-x86 host.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164185 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-19 00:56:20 +00:00
Chandler Carruth
38f35fd3b7 Fix the last crasher I've gotten a reproduction for in SROA. This one
from the dragonegg build bots when we turned on the full version of the
pass. Included a much reduced test case for this pesky bug, despite
bugpoint's uncooperative behavior.

Also, I audited all the similar code I could find and didn't spot any
other cases where this mistake cropped up.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164178 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 22:37:19 +00:00
Evan Cheng
b37b6ca4bb MOVi16 (movw) is only legal on cpus with V6T2 support. rdar://12300648
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164169 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 21:24:16 +00:00
Benjamin Kramer
30ce40e3f7 FileCheck: Fix off-by-one bug that made CHECK-NOT: ignore the next character after the colon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164165 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 20:51:39 +00:00
Roman Divacky
51ca601977 Add test for r164155 and remove two tests superseded by ppc64-calls.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164162 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 19:51:44 +00:00
Jan Sjödin
18505b3522 Add hidden flag to exclude aliases from output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164158 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 18:47:58 +00:00
Andrew Trick
f08c115e6c LSR critical edge splitting fix for PR13756.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164147 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 17:51:33 +00:00
Chandler Carruth
7c8df7aa0c Fix getCommonType in a different way from the way I fixed it when
working on FCA splitting. Instead of refusing to form a common type when
there are uses of a subsection of the alloca as well as a use of the
entire alloca, just skip the subsection uses and continue looking for
a whole-alloca use with a type that we can use.

This produces slightly prettier IR I think, and also fixes the other
failure in the test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164146 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 17:49:37 +00:00
Roman Divacky
f145c135f3 Avoid symbol name clash when filling TOC.
Patch by Adhemerval Zanella.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164141 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 17:10:37 +00:00
Roman Divacky
4cd56014ae On PPC64 emit the environment pointer. Patch by Adhemerval Zanella.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164139 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 16:55:29 +00:00
Roman Divacky
eb8b7dc536 Optimize local func calls to not emit nop for TOC restoration.
Patch by Adhemerval Zanella.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164138 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 16:47:58 +00:00
Roman Divacky
36b07f2f07 Add test for r164132.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164134 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 16:19:10 +00:00
NAKAMURA Takumi
0480064526 llvm/test/DebugInfo: Move two tests, 2010-04-13-PubType.ll and linkage-name.ll to X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164129 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 14:57:11 +00:00
Benjamin Kramer
6ab4bd7145 XFAIL SROA test until Chandler can get to it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164128 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 14:27:53 +00:00
Chandler Carruth
9e3f639579 Fix a warning in release builds and a test case I forgot to update with
a fix to getCommonType in the previous patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164120 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 13:02:06 +00:00
Chandler Carruth
c370acdf96 Add a major missing piece to the new SROA pass: aggressive splitting of
FCAs. This is essential in order to promote allocas that are used in
struct returns by frontends like Clang. The FCA load would block the
rest of the pass from firing, resulting is significant regressions with
the bullet benchmark in the nightly test suite.

Thanks to Duncan for repeated discussions about how best to do this, and
to both him and Benjamin for review.

This appears to have blocked many places where the pass tries to fire,
and so I'm expect somewhat different results with this fix added.

As with the last big patch, I'm including a change to enable the SROA by
default *temporarily*. Ben is going to remove this as soon as the LNT
bots pick up the patch. I'm just trying to get a round of LNT numbers
from the stable machines in the lab.

NOTE: Four clang tests are expected to fail in the brief window where
this is enabled. Sorry for the noise!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164119 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 12:57:43 +00:00
Richard Osborne
d7cc8b839c Fix instcombine to obey requested alignment when merging allocas.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164117 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 09:31:44 +00:00
James Molloy
97ecb83dff More domain conversion; convert VFP VMOVS to NEON instructions in more cases - when we may clobber the other S-lane by converting an S to a D instruction, make an effort to work out if the S lane is clobberable or not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164114 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 08:31:15 +00:00
Evan Cheng
d10eab0a95 Use vld1 / vst2 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byte
aligned address. Based on patch by David Peixotto.

Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment
hints. rdar://12090772, rdar://12238782


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164089 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 01:42:45 +00:00
Manman Ren
222d6192ad PGO: preserve branch-weight metadata when simplifying Switch to a sub, an icmp
and a conditional branch; also when removing dead cases from a switch.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164084 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 00:47:33 +00:00
Manman Ren
b010277b59 PGO: preserve branch-weight metadata when simplifying Switch
Hanlde the case when we split the default edge if the default target has "icmp"
and unconditinal branch.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164076 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-17 23:07:43 +00:00
Jakob Stoklund Olesen
87f7864c6d Merge into undefined lanes under -new-coalescer.
Add LIS::pruneValue() and extendToIndices(). These two functions are
used by the register coalescer when merging two live ranges requires
more than a trivial value mapping as supported by LiveInterval::join().

The pruneValue() function can remove the part of a value number that is
going to conflict in join(). Afterwards, extendToIndices can restore the
live range, using any new dominating value numbers and updating the SSA
form.

Use this complex value mapping to support merging a register into a
vector lane that has a conflicting value, but the clobbered lane is
undef.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164074 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-17 23:03:25 +00:00
Manman Ren
b11cbe6b23 PGO: preserve branch-weight metadata when simplifying SwitchOnSelect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164068 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-17 22:28:55 +00:00
Jan Wen Voung
b024b7014a Add some cases to x86 OptimizeCompare to handle DEC and INC, too.
While we are setting the earlier def to true, also make it live.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164056 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-17 22:04:23 +00:00
Manman Ren
566540332f PGO: preserve branch-weight metadata when simplifying two branches with a common
destination in SimplifyCondBranchToCondBranch.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164054 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-17 21:30:40 +00:00
Michael Liao
bb73002247 Fix PR13859
- Preserve the original NOutVT during casting from vector to integer by
  extracting vector elements.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164042 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-17 18:05:20 +00:00
Silviu Baranga
c8bf0f8662 Removed the VMLxForwarding feature for the Cortex-A15 target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164030 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-17 14:10:54 +00:00
Nadav Rotem
6fc671ca63 Fix the testcase to work on all platforms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163997 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-16 07:58:47 +00:00
Nadav Rotem
638e4c13cb The PMOVZXWD family of functions had patterns extends narrow vector types to wide vector types.
It had patterns for zext-loading and extending. This commit adds patterns for loading a wide type, performing a bitcast,
and extending. This is an odd pattern, but it is commonly used when writing code with intrinsics.

rdar://11897677



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163995 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-16 07:39:07 +00:00
Benjamin Kramer
562b240fc5 X86: Emitting x87 fsin/fcos for sinf/cosf is not safe without unsafe fp math.
This was only an issue if sse is disabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163967 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-15 12:44:27 +00:00
Chandler Carruth
1c8db50a9a Port the SSAUpdater-based promotion logic from the old SROA pass to the
new one, and add support for running the new pass in that mode and in
that slot of the pass manager. With this the new pass can completely
replace the old one within the pipeline.

The strategy for enabling or disabling the SSAUpdater logic is to do it
by making the requirement of the domtree analysis optional. By default,
it is required and we get the standard mem2reg approach. This is usually
the desired strategy when run in stand-alone situations. Within the
CGSCC pass manager, we disable requiring of the domtree analysis and
consequentially trigger fallback to the SSAUpdater promotion.

In theory this would allow the pass to re-use a domtree if one happened
to be available even when run in a mode that doesn't require it. In
practice, it lets us have a single pass rather than two which was
simpler for me to wrap my head around.

There is a hidden flag to force the use of the SSAUpdater code path for
the purpose of testing. The primary testing strategy is just to run the
existing tests through that path. One notable difference is that it has
custom code to handle lifetime markers, and one of the tests has been
enhanced to exercise that code.

This has survived a bootstrap and the test suite without serious
correctness issues, however my run of the test suite produced *very*
alarming performance numbers. I don't entirely understand or trust them
though, so more investigation is on-going.

To aid my understanding of the performance impact of the new SROA now
that it runs throughout the optimization pipeline, I'm enabling it by
default in this commit, and will disable it again once the LNT bots have
picked up one iteration with it. I want to get those bots (which are
much more stable) to evaluate the impact of the change before I jump to
any conclusions.

NOTE: Several Clang tests will fail because they run -O3 and check the
result's order of output. They'll go back to passing once I disable it
again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163965 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-15 11:43:14 +00:00
Akira Hatanaka
f934d159ae Handled unaligned load/stores properly in Mips16
Patch by Reed Kotler.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163956 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-15 01:02:03 +00:00
Manman Ren
062986c2f0 PGO: preserve branch-weight metadata when simplifying two branches with a common
destination.

Updated previous implementation to fix a case not covered:
// PBI: br i1 %x, TrueDest, BB
// BI:  br i1 %y, TrueDest, FalseDest
The other case was handled correctly.
// PBI: br i1 %x, BB, FalseDest
// BI:  br i1 %y, TrueDest, FalseDest

Also tried to use 64-bit arithmetic instead of APInt with scale to simplify the
computation. Let me know if you have other opinions about this.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163954 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-15 00:39:57 +00:00
Manman Ren
ad2890760f PGO: preserve branch-weight metadata when simplifying a switch with a single
case to a conditional branch and when removing dead cases.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163942 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-14 21:53:06 +00:00
Alex Rosenberg
eee94b3432 Review feedback from Duncan Sands. Alphabetize includes and simplify
lit config.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163928 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-14 19:19:57 +00:00
Manman Ren
a8a2b99aec PGO: preserve branch-weight metadata when merging two switches where
the default target of the first switch is not the basic block the second switch
is in (PredDefault != BB).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163916 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-14 17:29:56 +00:00
Chandler Carruth
713aa9431d Introduce a new SROA implementation.
This is essentially a ground up re-think of the SROA pass in LLVM. It
was initially inspired by a few problems with the existing pass:
- It is subject to the bane of my existence in optimizations: arbitrary
  thresholds.
- It is overly conservative about which constructs can be split and
  promoted.
- The vector value replacement aspect is separated from the splitting
  logic, missing many opportunities where splitting and vector value
  formation can work together.
- The splitting is entirely based around the underlying type of the
  alloca, despite this type often having little to do with the reality
  of how that memory is used. This is especially prevelant with unions
  and base classes where we tail-pack derived members.
- When splitting fails (often due to the thresholds), the vector value
  replacement (again because it is separate) can kick in for
  preposterous cases where we simply should have split the value. This
  results in forming i1024 and i2048 integer "bit vectors" that
  tremendously slow down subsequnet IR optimizations (due to large
  APInts) and impede the backend's lowering.

The new design takes an approach that fundamentally is not susceptible
to many of these problems. It is the result of a discusison between
myself and Duncan Sands over IRC about how to premptively avoid these
types of problems and how to do SROA in a more principled way. Since
then, it has evolved and grown, but this remains an important aspect: it
fixes real world problems with the SROA process today.

First, the transform of SROA actually has little to do with replacement.
It has more to do with splitting. The goal is to take an aggregate
alloca and form a composition of scalar allocas which can replace it and
will be most suitable to the eventual replacement by scalar SSA values.
The actual replacement is performed by mem2reg (and in the future
SSAUpdater).

The splitting is divided into four phases. The first phase is an
analysis of the uses of the alloca. This phase recursively walks uses,
building up a dense datastructure representing the ranges of the
alloca's memory actually used and checking for uses which inhibit any
aspects of the transform such as the escape of a pointer.

Once we have a mapping of the ranges of the alloca used by individual
operations, we compute a partitioning of the used ranges. Some uses are
inherently splittable (such as memcpy and memset), while scalar uses are
not splittable. The goal is to build a partitioning that has the minimum
number of splits while placing each unsplittable use in its own
partition. Overlapping unsplittable uses belong to the same partition.
This is the target split of the aggregate alloca, and it maximizes the
number of scalar accesses which become accesses to their own alloca and
candidates for promotion.

Third, we re-walk the uses of the alloca and assign each specific memory
access to all the partitions touched so that we have dense use-lists for
each partition.

Finally, we build a new, smaller alloca for each partition and rewrite
each use of that partition to use the new alloca. During this phase the
pass will also work very hard to transform uses of an alloca into a form
suitable for promotion, including forming vector operations, speculating
loads throguh PHI nodes and selects, etc.

After splitting is complete, each newly refined alloca that is
a candidate for promotion to a scalar SSA value is run through mem2reg.

There are lots of reasonably detailed comments in the source code about
the design and algorithms, and I'm going to be trying to improve them in
subsequent commits to ensure this is well documented, as the new pass is
in many ways more complex than the old one.

Some of this is still a WIP, but the current state is reasonbly stable.
It has passed bootstrap, the nightly test suite, and Duncan has run it
successfully through the ACATS and DragonEgg test suites. That said, it
remains behind a default-off flag until the last few pieces are in
place, and full testing can be done.

Specific areas I'm looking at next:
- Improved comments and some code cleanup from reviews.
- SSAUpdater and enabling this pass inside the CGSCC pass manager.
- Some datastructure tuning and compile-time measurements.
- More aggressive FCA splitting and vector formation.

Many thanks to Duncan Sands for the thorough final review, as well as
Benjamin Kramer for lots of review during the process of writing this
pass, and Daniel Berlin for reviewing the data structures and algorithms
and general theory of the pass. Also, several other people on IRC, over
lunch tables, etc for lots of feedback and advice.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163883 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-14 09:22:59 +00:00
Eric Christopher
ffaf69b8b1 Fix both the test for zero and what we do if we have a zero for
umulo legalization.

Fixes PR13839

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163856 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 23:24:02 +00:00
Jim Grosbach
3f90a4c42d Assembler: Darwin variables defined via .set are no-dead-strip.
For gas compatibility.

rdar://12219394

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163854 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 23:11:31 +00:00
Dan Gohman
b998913ff4 Handle the new !tbaa.struct metadata tags when converting a memcpy into scalar
loads and stores.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163844 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 21:51:01 +00:00
Michael Liao
f966e4e5b3 Add wider vector/integer support for PR12312
- Enhance the fix to PR12312 to support wider integer, such as 256-bit
  integer. If more than 1 fully evaluated vectors are found, POR them
  first followed by the final PTEST.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163832 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 20:24:54 +00:00
Michael Liao
092122f124 Enhance type legalization on bitcast from vector to integer
- Find a legal vector type before casting and extracting element from it.
- As the new vector type may have more than 2 elements, build the final
  hi/lo pair by BFS pairing them from bottom to top.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163830 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 19:58:21 +00:00
Jakob Stoklund Olesen
da0e8219b7 Fix test case to avoid PIC magic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163827 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 19:47:45 +00:00
Jakob Stoklund Olesen
7bba7d0efc Fix the TCRETURNmi64 bug differently.
Add a PatFrag to match X86tcret using 6 fixed registers or less. This
avoids folding loads into TCRETURNmi64 using 7 or more volatile
registers.

<rdar://problem/12282281>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163819 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 18:31:27 +00:00
Jakob Stoklund Olesen
0767dc546e Revert r163761 "Don't fold indexed loads into TCRETURNmi64."
The patch caused "Wrong topological sorting" assertions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163810 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 16:52:17 +00:00
Benjamin Kramer
39acdb0200 MemCpyOpt: When forming a memset from stores also take GEP constexprs into account.
This is common when storing to global variables.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163809 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 16:29:49 +00:00
Silviu Baranga
616471d4bf This patch introduces A15 as a target in LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163803 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 15:05:10 +00:00
Nadav Rotem
91a7e0184a Fix a dagcombine optimization. The optimization attempts to optimize a bitcast of fneg to integers
by xoring the high-bit. This fails if the source operand is a vector because we need to negate
each of the elements in the vector.

Fix rdar://12281066 PR13813.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163802 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 14:54:28 +00:00
Nadav Rotem
0cd19b9301 Stack Coloring: We have code that checks that all of the uses of allocas
are within the lifetime zone. Sometime legitimate usages of allocas are
hoisted outside of the lifetime zone. For example, GEPS may calculate the
address of a member of an allocated struct. This commit makes sure that
we only check (abort regions or assert) for instructions that read and write
memory using stack frames directly. Notice that by allowing legitimate
usages outside the lifetime zone we also stop checking for instructions
which use derivatives of allocas. We will catch less bugs in user code
and in the compiler itself.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163791 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 12:38:37 +00:00
Jakob Stoklund Olesen
aa0cfea9a4 Don't fold indexed loads into TCRETURNmi64.
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.

This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.

<rdar://problem/12282281>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163761 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-13 00:25:00 +00:00