Commit Graph

105 Commits

Author SHA1 Message Date
Stefan Pintilie
76f5d80f5b [Power9] Add new instructions for floating point status and control registers.
Added the following P9 instructions: mffsce, mffscdrn, mffscdrni, mffscrn,
  mffscrni, mffsl

Differential Revision: https://reviews.llvm.org/D37167

llvm-svn: 311903
2017-08-28 18:46:01 +00:00
Tony Jiang
87eac04da6 [PowerPC] Implement missing ISA 2.06 instructions.
Instructions: fctidu[.], fctiwu[.], ftdiv, ftsqrt are not implemented. Implement
them and add corresponding test cases in this patch.

llvm-svn: 291116
2017-01-05 15:00:45 +00:00
Ehsan Amiri
e2a6f26a8d [PPC] Generate positive FP zero using xor insn instead of loading from constant area
https://reviews.llvm.org/D23614

Currently we load +0.0 from constant area. That can change to be generated using
XOR instruction.

llvm-svn: 284995
2016-10-24 17:31:09 +00:00
Nemanja Ivanovic
fe9adb9248 [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190
2016-10-04 06:59:23 +00:00
Nemanja Ivanovic
5779a8862c [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.

llvm-svn: 282246
2016-09-23 13:25:31 +00:00
Hal Finkel
68985b72e3 [PowerPC] Support asm parsing for bc[l][a][+-] mnemonics
PowerPC assembly code in the wild, so it seems, has things like this:

  bc+     12, 28, .L9

This is a bit odd because the '+' here becomes part of the BO field, and the BO
field is otherwise the first operand. Nevertheless, the ISA specification does
clearly say that the +- hint syntax applies to all conditional-branch mnemonics
(that test either CTR or a condition register, although not the forms which
check both), both basic and extended, so this is supposed to be valid.

This introduces some asm-parser-only definitions which take only the upper
three bits from the specified BO value, and the lower two bits are implied by
the +- suffix (via some associated aliases).

Fixes PR23646.

llvm-svn: 280571
2016-09-03 02:31:44 +00:00
Hal Finkel
a0698b2905 [PowerPC] Add asm parser/disassembler support for hrfid,nap,slbmfev
These few book-III instructions are used by the Linux kernel.

Partially fixes PR24796.

llvm-svn: 280560
2016-09-02 23:42:01 +00:00
Kit Barton
d03785d120 This reverts commit r265505.
Revert "[Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance".
This patch has caused a functional regression in SPEC2k6 namd, and a performance regression in mesa-pipe.

llvm-svn: 267927
2016-04-28 20:00:42 +00:00
Nemanja Ivanovic
f6399ae1dc [PowerPC] Basic support for P9 byte comparison and count trailing zero insns
This patch corresponds to review:
http://reviews.llvm.org/D17850

This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.

llvm-svn: 266228
2016-04-13 18:51:18 +00:00
Chuang-Yu Cheng
45f8677a56 [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance
This patch implement the following instructions:
- addpcis subpcis
- maddhd maddhdu maddld
- modsw moduw modsd modud
- darn
- extswsli extswsli.
- setb
- dtstsfi dtstsfiq

Total 15 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton

http://reviews.llvm.org/D17885

llvm-svn: 265505
2016-04-06 01:47:02 +00:00
Chuang-Yu Cheng
dff7434b0f [Power9] Implement copy-paste, msgsync, slb, and stop instructions
This patch implements the following BookII and Book III instructions:
- copy copy_first cp_abort paste paste. paste_last
- msgsync
- slbieg slbsync
- stop

Total 10 instructions

Reviewers: nemanjai hfinkel tjablin amehsan kbarton
llvm-svn: 265504
2016-04-06 01:46:45 +00:00
Nemanja Ivanovic
bbd7e26280 [PowerPC] Basic support for P9 atomic loads and stores
This patch corresponds to review:
http://reviews.llvm.org/D18032

This patch provides asm implementation for the following instructions:
lwat, ldat, stwat, stdat, ldmx, mcrxrx

llvm-svn: 265022
2016-03-31 15:26:37 +00:00
Chuang-Yu Cheng
946c8b7db0 [Power9] Implement new altivec instructions: bcd* series
This patch implements the following altivec instructions:

- Decimal Convert From/to National/Zoned/Signed-QWord:
    bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.

- Decimal Copy-Sign/Set-Sign:
    bcdcpsgn. bcdsetsgn.

- Decimal Shift/Unsigned-Shift/Shift-and-Round:
    bcds. bcdus. bcdsr.

- Decimal (Unsigned) Truncate:
    bcdtrunc. bcdutrunc.

Total 13 instructions

Thanks Amehsan's advice! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D17838

llvm-svn: 264568
2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng
86334cecad [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat
This change implements the following vsx instructions:

- Scalar Insert/Extract
    xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp

- Vector Insert/Extract
    xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
    xxextractuw xxinsertw

- Scalar/Vector Test Data Class
    xststdcdp xststdcsp xststdcqp
    xvtstdcdp xvtstdcsp

- Maximum/Minimum
    xsmaxcdp xsmaxjdp
    xsmincdp xsminjdp

- Vector Byte-Reverse/Permute/Splat
    xxbrd xxbrh xxbrq xxbrw
    xxperm xxpermr
    xxspltib

30 instructions

Thanks Nemanja for invaluable discussion! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16842

llvm-svn: 264567
2016-03-28 08:34:28 +00:00
Chuang-Yu Cheng
e4ba18d52f [Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10
This change implements the following vector operations:
- vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
- vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
- vnegd vnegw
- vprtybd vprtybq vprtybw
- vbpermd vpermr
- vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
- vmul10cuq vmul10uq vmul10ecuq vmul10euq

28 instructions

Thanks Nemanja, Kit for invaluable hints and discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

Phabricator: http://reviews.llvm.org/D15887
llvm-svn: 264504
2016-03-26 05:46:11 +00:00
Kit Barton
8b3a860234 [Power9] Implement new vsx instructions: load, store instructions for vector and scalar
We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to
implement this new patch.

This patch implements the following vsx instructions:

Vector load/store:
lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
stxv stxvb16x stxvh8x stxvl stxvll stxvx
Scalar load/store:
lxsd lxssp lxsibzx lxsihzx
stxsd stxssp stxsibx stxsihx
21 instructions

Phabricator: http://reviews.llvm.org/D16919
llvm-svn: 262906
2016-03-08 03:49:13 +00:00
Kit Barton
e51a13ad8a Power9] Implement new vsx instructions: compare and conversion
This change implements the following vsx instructions:

Quad/Double-Precision Compare:
xscmpoqp xscmpuqp
xscmpexpdp xscmpexpqp
xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
xvcmpnedp(.) xvcmpnesp(.)
Quad-Precision Floating-Point Conversion
xscvqpdp(o) xscvdpqp
xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
xscvdphp xscvhpdp xvcvhpsp xvcvsphp
xsrqpi xsrqpix xsrqpxp
28 instructions

Phabricator: http://reviews.llvm.org/D16709
llvm-svn: 262068
2016-02-26 21:11:55 +00:00
Bill Schmidt
e55f451b6f [PPC64] Add support for clrbhrb, mfbhrbe, rfebb.
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching.  These will
not be used by typical applications, so built-in support is not
required.  They will only be available via inline assembly.

Assembly/disassembly tests are included in the patch.

llvm-svn: 238032
2015-05-22 16:44:10 +00:00
Hal Finkel
5faf4416ba [PowerPC] Add asm/disasm support for dcbt with hint
Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint
field specified (non-zero). Unforunately, the syntax for this instruction is
special in that it differs for server vs. embedded cores:
   dcbt ra, rb, th [server]
   dcbt th, ra, rb [embedded]
where th can be omitted when it is 0. dcbtst is the same. Thus we need to play
games in the parser and the printer to flip the operands around on the embedded
cores. We'll use the server syntax as the default (binutils currently uses the
embedded form by default, but IBM is changing that).

We also stop marking dcbtst as having unmodeled side effects (this is not
necessary, it is just a hint like dcbt -- noticed by inspection, so no separate
test case).

llvm-svn: 235657
2015-04-23 22:47:57 +00:00
Nemanja Ivanovic
9d5491e285 Add direct moves to/from VSR and exploit them for FP/INT conversions
This patch corresponds to review:
http://reviews.llvm.org/D8928

It adds direct move instructions to/from VSX registers to GPR's. These are
exploited for FP <-> INT conversions.

llvm-svn: 234682
2015-04-11 10:40:42 +00:00
Kit Barton
d0dd6e5750 Add Hardware Transactional Memory (HTM) Support
This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07
(POWER8). The intrinsic support is based on GCC one [1], but currently only the
'PowerPC HTM Low Level Built-in Function' are implemented.

The HTM instructions follows the RC ones and the transaction initiation result
is set on RC0 (with exception of tcheck). Currently approach is to create a
register copy from CR0 to GPR and comapring. Although this is suboptimal, since
the branch could be taken directly by comparing the CR0 value, it generates code
correctly on both test and branch and just return value. A possible future
optimization could be elimitate the MFCR instruction to branch directly.

The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on
powerpc64 and powerpc64le.

This is send along a clang patch to enabled the builtins and option switch.

[1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html

Phabricator Review: http://reviews.llvm.org/D8247

llvm-svn: 233204
2015-03-25 19:36:23 +00:00
Nemanja Ivanovic
38e13136f3 Add LLVM support for PPC cryptography builtins
Review: http://reviews.llvm.org/D7955

llvm-svn: 231285
2015-03-04 20:44:33 +00:00
Hal Finkel
67b5b15e9e [PowerPC] Add support for the QPX vector instruction set
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially  { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).

I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).

The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.

llvm-svn: 230413
2015-02-25 01:06:45 +00:00
Hal Finkel
28c9c891ec [PowerPC] Add assembler support for mcrfs and friends
Fill out our support for the floating-point status and control register
instructions (mcrfs and friends). As it turns out, these are necessary for
compiling src/test/harness_fp.h in TBB for PowerPC.

Thanks to Raf Schietekat for reporting the issue!

llvm-svn: 226070
2015-01-15 01:00:53 +00:00
Hal Finkel
e37bd38154 [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64
On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl
instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction
sequence is interpreted by the unwinding code in libgcc. To make sure these
occur as a pair, as with other pairings interpreted by the linker, fuse the two
instructions into one instruction (for code generation only).

In the future, we might wish to do this by emitting CFI directives instead,
but this solution is simpler, and mirrors what GCC does. Additional discussion
on this point is contained in the PR.

Fixes PR22015.

llvm-svn: 224788
2014-12-23 22:29:40 +00:00
Hal Finkel
d19aa4cdf8 [PowerPC] Add the 'attn' instruction
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.

llvm-svn: 222712
2014-11-25 00:30:11 +00:00
Hal Finkel
7014e3d50f [PowerPC] Add support for dcbtst and icbt (prefetch)
Adds code generation support for dcbtst (data cache prefetch for write) and
icbt (instruction cache prefetch for read - Book E cores only).

We still end up with a 'cannot select' error for the non-supported prefetch
intrinsic forms. This will be fixed in a later commit.

Fixes PR20692.

llvm-svn: 216339
2014-08-23 23:21:04 +00:00
Joerg Sonnenberger
0f4e0ad62e tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs.
llvm-svn: 214784
2014-08-04 21:28:22 +00:00
Joerg Sonnenberger
ca3cb82714 Don't use additional arguments for dss and friends to satisfy DSS_Form,
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.

llvm-svn: 214636
2014-08-02 15:09:41 +00:00
Joerg Sonnenberger
3845a9a9b8 Refactor TLBIVAX and add tlbsx.
llvm-svn: 214354
2014-07-30 22:51:15 +00:00
Joerg Sonnenberger
afb8bfcb47 Recognize BookE's mbar instruction.
llvm-svn: 214244
2014-07-29 23:16:31 +00:00
Joerg Sonnenberger
d52d4c80b5 Support move to/from segment register.
llvm-svn: 214234
2014-07-29 22:21:57 +00:00
Ulrich Weigand
e256aa48b9 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).

llvm-svn: 211193
2014-06-18 17:52:49 +00:00
Hal Finkel
8b6358ead9 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

llvm-svn: 203768
2014-03-13 07:58:58 +00:00
Hal Finkel
883c64377d Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

 - Reduction in register pressure (because we no longer need GPRs to store
   boolean values).

 - Logical operations on booleans can be handled more efficiently; we used to
   have to move all results from comparisons into GPRs, perform promoted
   logical operations in GPRs, and then move the result back into condition
   register bits to be used by conditional branches. This can be very
   inefficient, because the throughput of these CR <-> GPR moves have high
   latency and low throughput (especially when other associated instructions
   are accounted for).

 - On the POWER7 and similar cores, we can increase total throughput by using
   the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
  trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451
2014-02-28 00:27:01 +00:00
Hal Finkel
ce61543897 Add a disassembler to the PowerPC backend
The tests for the disassembler were adapted from the encoder tests, and for the
most part, the output from the disassembler matches that encoder-test inputs.
There are some places where more-informative mnemonics could be produced
(notably for the branch instructions), and those cases are noted in the tests
with FIXMEs.

Future work includes:

 - Generating more-informative mnemonics when possible (this may also be done
   in the printer).

 - Remove the dependence on positional "numbered" operand-to-variable mapping
   (for both encoding and decoding).

 - Internally using 64-bit instruction variants in 64-bit mode (if this turns
   out to matter).

llvm-svn: 197693
2013-12-19 16:13:01 +00:00
Hal Finkel
e8c7fef754 Improve instruction scheduling for the PPC POWER7
Aside from a few minor latency corrections, the major change here is a new
hazard recognizer which focuses on better dispatch-group formation on the
POWER7. As with the PPC970's hazard recognizer, the most important thing it
does is avoid load-after-store hazards within the same dispatch group. It uses
the POWER7's special dispatch-group-terminating nop instruction (instead of
inserting multiple regular nop instructions). This new hazard recognizer makes
use of the scheduling dependency graph itself, built using AA information, to
robustly detect the possibility of load-after-store hazards.

significant test-suite performance changes (the error bars are 99.5% confidence
intervals based on 5 test-suite runs both with and without the change --
speedups are negative):

speedups:

MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.55171% +/- 0.333168%

MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl
	-17.5576% +/- 14.598%

MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
	-29.5708% +/- 7.09058%

MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt
	-34.9471% +/- 11.4391%

SingleSource/Benchmarks/BenchmarkGame/puzzle
	-25.1347% +/- 11.0104%

SingleSource/Benchmarks/Misc/flops-8
	-17.7297% +/- 9.79061%

SingleSource/Benchmarks/Shootout-C++/ary3
	-35.5018% +/- 23.9458%

SingleSource/Regression/C/uint64_to_float
	-56.3165% +/- 25.4234%

SingleSource/UnitTests/Vectorizer/gcc-loops
	-18.5309% +/- 6.8496%

regressions:

MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000
	18.351% +/- 12.156%

SingleSource/Benchmarks/Shootout-C++/methcall
	27.3086% +/- 14.4733%

llvm-svn: 197099
2013-12-12 00:19:11 +00:00
Hal Finkel
40fc5609c6 Add IIC_ prefix to PPC instruction-class names
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.

Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.

No functionality change intended.

llvm-svn: 195890
2013-11-27 23:26:09 +00:00
Roman Divacky
582df28090 Implement asm support for a few PowerPC bookIII that are needed for assembling
FreeBSD kernel.

llvm-svn: 190618
2013-09-12 17:50:54 +00:00
Ulrich Weigand
22cc4fd86f [PowerPC] Support "eieio" instruction
This adds support for the "eieio" instruction to
the asm parser.

llvm-svn: 185349
2013-07-01 17:06:26 +00:00
Ulrich Weigand
9dfc60786f [PowerPC] Add variants of "sync" instruction
This adds support for the "sync $L" instruction with operand,
and provides aliases for "lwsync" and "ptesync".

llvm-svn: 185344
2013-07-01 16:37:52 +00:00
Ulrich Weigand
5f83058705 [PowerPC] Support generic conditional branches in asm parser
This adds instruction patterns to cover the generic forms of
the conditional branch instructions.  This allows the assembler
to support the generic mnemonics.

The compiler will still generate the various specific forms
of the instruction that were already supported.

llvm-svn: 184722
2013-06-24 11:55:21 +00:00
Bill Schmidt
b5b390b792 Implement the PowerPC system call (sc) instruction.
Instruction added at request of Roman Divacky.  Tested via asm-parser.

llvm-svn: 181821
2013-05-14 19:35:45 +00:00
Ulrich Weigand
c7ad3c20c4 [PowerPC] Add some Book II instructions to AsmParser
This patch adds a couple of Book II instructions (isync, icbi) to the
PowerPC assembler parser.  These are needed when bootstrapping clang
with the integrated assembler forced on, because they are used in
inline asm statements in the code base.

The test case adds the full list of Book II storage control instructions,
including associated extended mnemonics.  Again, those that are not yet
supported as marked as FIXME.

llvm-svn: 181052
2013-05-03 19:51:09 +00:00
Ulrich Weigand
71ccd2df46 PowerPC: Fix encoding of rldimi and rldcl instructions
When testing the asm parser, I noticed wrong encodings for the
above instructions (wrong operand name in rldimi, wrong form
and sub-opcode for rldcl).

Tests will be added together with the asm parser.

llvm-svn: 180606
2013-04-26 15:39:12 +00:00
Hal Finkel
c6d837e387 Add a comment about the PPC Interpretation64Bit bit
llvm-svn: 179391
2013-04-12 18:17:38 +00:00
Hal Finkel
a4429c79f5 Add PPC instruction record forms and associated query functions
This is prep. work for the implementation of optimizeCompare. Many PPC
instructions have 'record' forms (in almost all cases, this means that the RC
bit is set) that cause the result of the instruction to be compared with zero,
and the result of that comparison saved in a predefined condition register. In
order to add the record forms of the instructions without too much
copy-and-paste, the relevant functions have been refactored into multiclasses
which define both the record and normal forms.

Also, two TableGen-generated mapping functions have been added which allow
querying the instruction code for the record form given the normal form (and
vice versa).

No functionality change intended.

llvm-svn: 179356
2013-04-12 02:18:09 +00:00
Hal Finkel
0daaa8e2de Generate PPC early conditional returns
PowerPC has a conditional branch to the link register (return) instruction: BCLR.
This should be used any time when we'd otherwise have a conditional branch to a
return. This adds a small pass, PPCEarlyReturn, which runs just prior to the
branch selection pass (and, importantly, after block placement) to generate
these conditional returns when possible. It will also eliminate unconditional
branches to returns (these happen rarely; most of the time these have already
been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice
optimization for small functions that do not maintain a stack frame.

llvm-svn: 179026
2013-04-08 16:24:03 +00:00
Ulrich Weigand
b1ad5b2c91 PowerPC: Mark patterns as isCodeGenOnly.
There remain a number of patterns that cannot (and should not)
be handled by the asm parser, in particular all the Pseudo patterns.

This commit marks those patterns as isCodeGenOnly.

No change in generated code.

llvm-svn: 178008
2013-03-26 10:57:16 +00:00
Ulrich Weigand
5c8ca72b53 PowerPC: Simplify FADD in round-to-zero mode.
As part of the the sequence generated to implement long double -> int
conversions, we need to perform an FADD in round-to-zero mode.  This is
problematical since the FPSCR is not at all modeled at the SelectionDAG
level, and thus there is a risk of getting floating point instructions
generated out of sequence with the instructions to modify FPSCR.

The current code handles this by somewhat "special" patterns that in part
have dummy operands, and/or duplicate existing instructions, making them
awkward to handle in the asm parser.

This commit changes this by leaving the "FADD in round-to-zero mode"
as an atomic operation on the SelectionDAG level, and only split it up into
real instructions at the MI level (via custom inserter).  Since at *this*
level the FPSCR *is* modeled (via the "RM" hard register), much of the
"special" stuff can just go away, and the resulting patterns can be used by
the asm parser.

No significant change in generated code expected.

llvm-svn: 178006
2013-03-26 10:56:22 +00:00