Henrik Rydgård
|
a1f5c537d4
|
Merge pull request #7672 from unknownbrackets/jit-minor
More x86jit micro optimizations for the FPU
|
2015-04-13 09:57:02 +02:00 |
|
Henrik Rydgård
|
c1b91ff5c1
|
x86: Add a way to eliminate some mov instructions.
Not used yet.
|
2015-04-12 13:50:23 -07:00 |
|
Henrik Rydgård
|
70fa830ba5
|
Split out the ReplaceJalTo test logic.
This lets a future IR handle replacements correctly.
|
2015-04-12 13:35:10 -07:00 |
|
Henrik Rydgård
|
c6113b831d
|
Remove unused and duplicate define.
|
2015-04-12 13:16:01 -07:00 |
|
Henrik Rydgård
|
f81781d25c
|
Unify JitOptions in FakeJit also.
|
2015-04-12 13:15:00 -07:00 |
|
Henrik Rydgård
|
a8b50d0c9b
|
Fix MIPSInfo masking for 64-bit flags.
|
2015-04-12 11:57:49 -07:00 |
|
Henrik Rydgård
|
6dcf56530b
|
Add some missing FPU flags to MIPSTables.
|
2015-04-12 11:56:04 -07:00 |
|
Henrik Rydgård
|
071b6b986a
|
Best-effort update of the MipsJit prototype
|
2015-04-12 11:53:16 -07:00 |
|
Henrik Rydgård
|
d014d420db
|
Unify JitOptions across the backends.
This is required to make ExtractIR not a member of the various backends.
|
2015-04-12 11:41:26 -07:00 |
|
Henrik Rydgård
|
7bf67509d1
|
ARM: Cleanup a TODO in NEON VFPU.
|
2015-04-12 11:21:53 -07:00 |
|
Unknown W. Brackets
|
56f071d26a
|
x86jit: Support SIMD load/store with fastmem off.
This is a lot faster, since it usually takes the fast path.
|
2015-04-11 01:22:50 -07:00 |
|
Henrik Rydgård
|
81dec36da8
|
Use an accessor to read the compilerPC.
In the IR it will be read from the block.
|
2015-04-11 01:14:37 -07:00 |
|
Henrik Rydgård
|
a897723e6a
|
Separate out jit reading nearby instructions.
This makes it easier to use an IR for these things, or remove them.
|
2015-04-11 00:53:24 -07:00 |
|
Henrik Rydgård
|
59d0baca93
|
Add a way to print some block bloat stats.
|
2015-04-11 00:12:56 -07:00 |
|
Henrik Rydgård
|
115486e431
|
Fix some fp instruction in/out flags
|
2015-04-11 00:03:56 -07:00 |
|
Unknown W. Brackets
|
7ea9bcbc13
|
x86jit: Avoid mapping rs in vfpu load/store.
This allows immediate address load/store, when possible, which can be
faster (especially with slow mem enabled).
|
2015-04-10 20:30:14 -07:00 |
|
Unknown W. Brackets
|
eaed080add
|
x86jit: Fix immediate kernel addresses.
A signed add with a value that has the top bit set produces a wrong
address. We lose the kernel bit here, which should be fine.
|
2015-04-10 20:25:29 -07:00 |
|
Unknown W. Brackets
|
e58eb5e186
|
x86jit: Small optimization for fd->fd fp convert.
We just generate a little less code. This is also slightly faster
generally.
|
2015-04-10 20:07:43 -07:00 |
|
Unknown W. Brackets
|
7e38df077f
|
x86jit: Prefer MOVAPS over MOVSS for reg->reg.
|
2015-04-10 20:07:43 -07:00 |
|
Unknown W. Brackets
|
9069c84928
|
x86jit: Use ANDPS for abs.s.
Should be faster, since the result is likely to feed other floating point
math. As long as that's the case, this is faster than PAND.
|
2015-04-10 13:20:52 -07:00 |
|
Unknown W. Brackets
|
b2b20a6eee
|
Correct an invalid format parameter.
|
2015-04-08 12:17:24 -07:00 |
|
Unknown W. Brackets
|
8e8a18e9b5
|
Log read failures from hashmap too.
|
2015-04-08 12:10:45 -07:00 |
|
Unknown W. Brackets
|
3cb474047b
|
Fix potential shift by negative number.
|
2015-04-08 11:57:59 -07:00 |
|
Unknown W. Brackets
|
fc0788bc95
|
Avoid unpredictable behavior in error condition.
|
2015-04-08 11:57:57 -07:00 |
|
Unknown W. Brackets
|
b0d291032d
|
armjit: Avoid cfc1/mfc1 to $0.
|
2015-04-07 18:30:36 -07:00 |
|
Unknown W. Brackets
|
7ce5841f30
|
jit: Avoid mfhi/mflo to $0.
|
2015-04-07 18:25:28 -07:00 |
|
Unknown W. Brackets
|
788b9d78f8
|
jit: Avoid a super unlikely write to zero.
|
2015-04-07 18:20:37 -07:00 |
|
Henrik Rydgård
|
a8c2d0945a
|
ARM64: lwl: Pass INVALID_REG to be sure SCRATCH1 doesn't get overwritten...
|
2015-04-06 18:13:41 +02:00 |
|
Henrik Rydgård
|
13c9390c53
|
ARM64: Emitter fix, disable swl/swr/lwl/lwr again fully
|
2015-04-06 18:13:38 +02:00 |
|
Henrik Rydgård
|
95cd1478de
|
Restore the x86 build.
|
2015-04-06 18:13:37 +02:00 |
|
Henrik Rydgård
|
fbaffdceab
|
Remove some outdated comments, minor stuff
|
2015-04-06 18:13:36 +02:00 |
|
Henrik Rydgard
|
0a70618f87
|
ARM64: Accurate floating point rounding. For some reason, FTZ doesn't seem to work though.
|
2015-04-06 18:13:36 +02:00 |
|
Henrik Rydgard
|
ad3d539451
|
ARM64: Attempt at lwl/lwr/swl/swr. The first two don't work
|
2015-04-06 18:13:35 +02:00 |
|
Henrik Rydgard
|
44286a2b37
|
ARM64: Accurate float->int conversion with rounding mode.
|
2015-04-06 18:13:34 +02:00 |
|
Henrik Rydgard
|
acf08eefa8
|
ARM64: Fix FCVTL, use it in v2hf
|
2015-04-06 18:13:33 +02:00 |
|
Henrik Rydgard
|
8eedcc7fb0
|
ARM64: Speedup fpu/vfpu load/stores too using "pointerification". Actually noticeable gain.
|
2015-04-06 18:13:32 +02:00 |
|
Henrik Rydgard
|
ad648baa9c
|
ARM64 regcache: Add support to "pointerify" registers. Use in load/store to cut down instructions.
|
2015-04-06 18:13:32 +02:00 |
|
Henrik Rydgard
|
2780eef595
|
ARM64: Another couple of VFPU ops
|
2015-04-06 18:13:31 +02:00 |
|
Henrik Rydgard
|
ca58f322e5
|
ARM64: Port over some missing VFPU instructions from ARM. Not much left now.
|
2015-04-06 18:13:30 +02:00 |
|
Henrik Rydgard
|
f06e9a9d18
|
ARM64: Even more VFPU instructions
|
2015-04-06 18:13:30 +02:00 |
|
Henrik Rydgard
|
1b1ab73b0f
|
ARM64: Enable some more VFPU instructions, some code cleanup
|
2015-04-06 18:13:29 +02:00 |
|
Henrik Rydgard
|
500ca94ab8
|
ARM64: Port over tons of VFPU code from ARM, leave most of it disabled.
|
2015-04-06 18:13:28 +02:00 |
|
Henrik Rydgard
|
8df8c210d1
|
ARM64: Start porting over VFPU stuff from ARM, fix regalloc bug
|
2015-04-06 18:13:28 +02:00 |
|
Henrik Rydgard
|
6cb107d6fc
|
ARM64: Fix LDP disassembly
|
2015-04-06 18:13:25 +02:00 |
|
Henrik Rydgard
|
34e61ab875
|
ARM64: More FPU instructions (int<->float convert), minor stuff
|
2015-04-06 18:13:25 +02:00 |
|
Henrik Rydgard
|
25ec85551f
|
ARM64: Implement FP compares, misc
|
2015-04-06 18:13:22 +02:00 |
|
Henrik Rydgard
|
ceb9f66502
|
ARM64: Fix bug in mult
|
2015-04-06 18:13:21 +02:00 |
|
Henrik Rydgard
|
1a02e32ad1
|
ARM64: Implement the multiplication instructions
|
2015-04-06 18:13:20 +02:00 |
|
Henrik Rydgard
|
a12e448fb4
|
ARM64: Stub vertex decoder jit, implementing just enough for the cube.elf cube.
|
2015-04-06 18:13:18 +02:00 |
|
Henrik Rydgard
|
acd9502b44
|
ARM64: stp/ldp disasm improvements
|
2015-04-06 18:13:17 +02:00 |
|
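
Note on eaed080add ("x86jit: Fix immediate kernel addresses"): the message describes a signed add going wrong when the immediate address has its top bit set. The sketch below models that in plain Python. It is an illustration only, not the JIT's code: `HOST_BASE`, `MEMVIEW_MASK`, and all function names are hypothetical, and the mask value is an assumption chosen to show what "losing the kernel bit" could look like.

```python
MASK32 = 0xFFFFFFFF
# Assumption: a mask that clears the top two address bits, so a kernel
# address aliases its user-space mirror. Not taken from the commit.
MEMVIEW_MASK = 0x3FFFFFFF
# Hypothetical host base pointer for the emulated memory view.
HOST_BASE = 0x1_0000_0000


def sign_extend32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def host_address_signed(base, guest_addr):
    """Model `base + disp32`, where the 32-bit displacement is sign-extended."""
    return base + sign_extend32(guest_addr)


def host_address_masked(base, guest_addr):
    """Same add, but with the top bits cleared first so the value stays positive."""
    return base + sign_extend32(guest_addr & MEMVIEW_MASK)


kernel_addr = 0x88000000  # top bit set
user_addr = 0x08000000    # same offset without the kernel bit

# The signed add lands below the base pointer for the kernel address.
broken = host_address_signed(HOST_BASE, kernel_addr)
# Masking drops the kernel bit and yields the same host address as user_addr.
fixed = host_address_masked(HOST_BASE, kernel_addr)
```

With these assumed values, `broken` falls below `HOST_BASE`, while `fixed` equals the host address computed for `user_addr`. That is the trade-off the commit message accepts.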