557 Commits

Author SHA1 Message Date
Henrik Rydgard
804de50711 x86 jit: SIMD-ify VFPU register file writebacks where possible 2014-11-26 01:33:05 +01:00
Henrik Rydgard
b3c8a82c49 x86 jit: SIMD-ify some more 2014-11-25 23:56:46 +01:00
Henrik Rydgard
b5ee47a80c x86 jit: SIMD-ify lv.q and sv.q 2014-11-25 23:28:29 +01:00
Henrik Rydgård
4db6b7f3e2 SIMD-ify a couple instructions a bit 2014-11-25 22:47:26 +01:00
Unknown W. Brackets
a4b9122943 x86jit: Use NS instead of NBE for checked entries.
This may cause us to more correctly bail on linked blocks in some cases.
2014-11-23 11:05:49 -08:00
Unknown W. Brackets
fe525a52f9 Update native (shutdown crash) + comment. 2014-11-23 11:04:07 -08:00
Unknown W. Brackets
473f388088 Disable the simd stuff for now.
Won't have time to look at this for a bit...
2014-11-20 14:07:56 -08:00
Henrik Rydgård
6a49337a0c Merge pull request #7096 from unknownbrackets/jit-simd
x86jit: Add basic support for mapping SIMD
2014-11-18 18:25:39 +01:00
Unknown W. Brackets
ab7dd0df25 x86jit: Add an option to enable/disable vfpu simd. 2014-11-17 20:37:27 -08:00
Henrik Rydgard
53b5d331b4 Assorted minor optimizations 2014-11-17 21:21:44 +01:00
Unknown W. Brackets
921b39ebf5 x86jit: Optimize a 2-reg simd load. 2014-11-16 15:05:17 -08:00
Unknown W. Brackets
e68eb0a292 x86jit: Load sequential regs in one shot. 2014-11-16 15:05:17 -08:00
Unknown W. Brackets
ed501302a2 x86jit: Add a check to see if we can map simd. 2014-11-16 15:05:16 -08:00
Unknown W. Brackets
27148d3712 x86jit: Add some helpers to check state. 2014-11-16 13:33:16 -08:00
Unknown W. Brackets
de566be2ce x86jit: Split out the logic for loading simd regs. 2014-11-16 13:33:15 -08:00
Unknown W. Brackets
5347431c20 x86jit: Initial simd for VecDo3(). Broken.
I'm not sure why/where it's broken...
2014-11-16 13:33:15 -08:00
Unknown W. Brackets
aad505e7b3 x86jit: Add a TryMapDirtyInInVS() for 3-op. 2014-11-16 13:33:14 -08:00
Unknown W. Brackets
88a753eff3 x86jit: Add an invariant contract to the fpu cache.
This should help catch things better in debug mode.
2014-11-16 13:33:14 -08:00
Unknown W. Brackets
39afeb490f x86jit: Add some typesafety. 2014-11-16 13:33:13 -08:00
Unknown W. Brackets
4335bf3346 x86jit: Add basic mapping of SIMD regs.
Not tested yet, just sketched out.  All very suboptimal.
2014-11-16 13:33:13 -08:00
Unknown W. Brackets
9429359b47 x86jit: Add fallbacks when moving from VS -> V. 2014-11-16 13:33:12 -08:00
Unknown W. Brackets
2862367927 x86jit: Add force-non-simd to all current ops.
Unless they already use MapRegs, because that will automatically handle
it.
2014-11-16 13:33:12 -08:00
Unknown W. Brackets
4cf0913692 x86jit: Sketch some initial SIMD apis. 2014-11-16 13:33:07 -08:00
Henrik Rydgard
bfcd3690b6 x86 jit: Fix+enable quaternion product, optimize "sw zero, *" 2014-11-16 18:37:38 +01:00
Henrik Rydgard
28ca8d4818 x86 jit: Use LEA to emulate addu but only when it can save a few bytes 2014-11-16 17:39:47 +01:00
Henrik Rydgard
1c78e29c79 x86 jit: For clarity, use TEMPREG where it doesn't matter that it's EAX.
Might have missed a few places.
2014-11-16 17:38:26 +01:00
Henrik Rydgard
8b90f881b8 x86 jit: A tiny optimization and a tiny bugfix 2014-11-16 16:46:35 +01:00
Unknown W. Brackets
096b41cceb x86jit: Interleave reg usage in vcmp. 2014-11-10 23:22:04 -08:00
Unknown W. Brackets
0e1aa35e84 x86jit: Just do the ES/NS compare once. 2014-11-10 23:04:38 -08:00
Unknown W. Brackets
2758e8fa3c x86jit: Optimize vcmp for single and simd. 2014-11-10 23:04:37 -08:00
Unknown W. Brackets
86e3739a3e x86jit: Optimize some cases of ins/ext.
They happen but are minor.
2014-11-09 09:22:29 -08:00
Unknown W. Brackets
e05263af32 x86jit: Allow EBX sign extension for 32-bit. 2014-11-09 09:07:52 -08:00
Unknown W. Brackets
8dbd3c3b9c x86jit: Don't lie about ZERO when it's not an imm. 2014-11-09 08:27:02 -08:00
Unknown W. Brackets
d0a2ced2f9 x86jit: Flip cc in slt* to avoid reg loads.
Unfortunately, this zero thing is now concerning me...
2014-11-09 08:15:39 -08:00
Unknown W. Brackets
59f491eddb x86jit: Micro optimize slt* a bit.
This improves their performance and hopefully latency.  It also avoids
filling registers that are not likely to be used again.

Fixed a small mistake.
2014-11-09 07:23:44 -08:00
Henrik Rydgard
18495a452d Rename an enum 2014-11-09 14:55:23 +01:00
Henrik Rydgard
a19d0b648a x86 jit: Add a simple speedhack (ignore masking stack pointers) but disable due to low impact. 2014-11-09 14:54:39 +01:00
Henrik Rydgard
a528921f3c x86 JIT: EBX was free in 32-bit mode, let's use it in the regcache. 2014-11-09 12:55:17 +01:00
Henrik Rydgard
5888b3bdc4 Revert "x86jit: Micro optimize slt* a bit."
This reverts commit ee66596b8d.

Broke a lot of games, probably some small bug.

Conflicts:
	Core/MIPS/x86/CompALU.cpp
2014-11-09 12:07:21 +01:00
Unknown W. Brackets
313d9e95c7 Clarify a comment. 2014-11-09 01:05:03 -08:00
Unknown W. Brackets
ee66596b8d x86jit: Micro optimize slt* a bit.
This improves their performance and hopefully latency.  It also avoids
filling registers that are not likely to be used again.
2014-11-08 22:54:03 -08:00
Unknown W. Brackets
27d8108bb2 x86jit: Optimize loads of 0 into fp regs. 2014-11-08 18:41:16 -08:00
Unknown W. Brackets
7d8858687e x86jit: Avoid speculative loads in mtc1/mfc1. 2014-11-08 18:35:15 -08:00
Unknown W. Brackets
57caa95273 x86jit: Implement round.w.s and friends.
They are not terribly fast, though, updating MXCSR.
2014-11-08 17:59:38 -08:00
Unknown W. Brackets
3908e0f445 x86jit: Small optimization for add.s f1, f2, f2.
Doubles the speed of that particular case.  Biggest difference is not
loading fd for no reason.
2014-11-08 17:32:53 -08:00
Unknown W. Brackets
f9893c29ce x86jit: Very small optimization to c.nge.s. 2014-11-08 17:01:02 -08:00
Unknown W. Brackets
78dfe43776 x86jit: Optimize neg.s and abs.s a tiny bit.
Same reg is probably a common case, improves micro benchmark.
2014-11-08 16:50:41 -08:00
Unknown W. Brackets
bed0d0b059 x86jit: Improve cvt.w.s when fd is loaded or fs.
We have no need to store it.
2014-11-08 16:40:54 -08:00
Unknown W. Brackets
1917d946ea x86jit: Micro optimize cvt.s.w a bit.
This implementation is about 5x faster for micro benchmarks.  Little
impact to overall perf in games I tested, though.
2014-11-08 13:30:38 -08:00
Unknown W. Brackets
671dee85c7 x86jit: Micro optimize vi2f a little bit.
This didn't help overall perf much but micro benchmarks are better.
2014-11-08 13:07:01 -08:00
Unknown W. Brackets
c29b126357 x86jit: Oops, can't have an imm here. 2014-11-08 12:41:48 -08:00
Unknown W. Brackets
c0be19edb6 x86jit: Simplify vavg a bit. 2014-11-08 12:40:04 -08:00
Unknown W. Brackets
761e269e5f x86jit: Avoid some regcache pollution. 2014-11-08 12:38:08 -08:00
Unknown W. Brackets
bc7497857a x86jit: Micro optimize vi2x a bit with ssse3/sse4.
Both are small wins.
2014-11-08 12:13:26 -08:00
Unknown W. Brackets
0e646f748a x86jit: Implement vi2x instructions.
Also, my opcodes were wrong in the test (shifted the pair bit the wrong
way, oops.)

AFAICT, there's no reason PSRAD/etc. were not encoding REX...
2014-11-08 12:13:26 -08:00
Unknown W. Brackets
ddc90ee550 x86jit: Implement vfad and vavg. 2014-11-08 12:13:25 -08:00
Unknown W. Brackets
5ae43defd9 Oops, these should be signed. 2014-11-08 09:39:17 -08:00
Unknown W. Brackets
316e923b40 x86jit: Implement other forms of vx2i.
Gains 3.2% performance in Grand Knights History.
2014-11-08 00:39:40 -08:00
Unknown W. Brackets
097a483d77 x86jit: Micro optimize vs2i a bit. 2014-11-06 22:45:54 -08:00
Unknown W. Brackets
3061e89250 Fix copy/paste mistake. 2014-11-04 01:41:17 -08:00
Unknown W. Brackets
0d36d4e082 Add a helper to reduce duplicate code.
This is not performance critical.  I wonder if compilers can inline
closures?
2014-11-03 23:50:23 -08:00
Unknown W. Brackets
16ca2b0155 x86jit: Fix trig vv2ops on 32-bit, arg. 2014-11-03 23:43:18 -08:00
Unknown W. Brackets
3e95763a3f x86jit: Implement other rounding modes in vf2i.
3% improvement in Grand Knights History.  I know other games use these
too.
2014-11-03 23:27:05 -08:00
Unknown W. Brackets
717cf25f0d x86jit: Use our sincos funcs for VV2Op as well.
Small (0.7%) speedup in Gods Eater Burst.  There are probably SSE
approximations we could use instead, but those will also need at least xmm
reg flushing/thunking.

At least this avoids flushing gprs, etc.  The sin and cos ops are fairly
common.
2014-11-03 22:13:38 -08:00
Unknown W. Brackets
5bb9d32eaa jit: Fix partial invalidation of larger blocks.
Fixes #7031.
2014-10-27 19:04:19 -07:00
Unknown W. Brackets
100afc07a2 x86jit: Fix andLink cases of imm blezl, etc. 2014-10-24 08:57:56 -07:00
Unknown W. Brackets
b53f13480a x86jit: Centralize continuing logic. 2014-10-12 19:01:04 -07:00
Unknown W. Brackets
d98adf27d6 x86jit: Add proxy blocks for continuing. 2014-10-12 17:15:31 -07:00
Unknown W. Brackets
01f9521dc5 jit: Invalidate blocks even if they end unevenly.
This allows blocks to start and end wherever they need, which should be
good for replacements and for continuing.
2014-10-12 17:13:04 -07:00
Unknown W. Brackets
90821b761d x86jit: Pad linked exits with breakpoints.
So that we don't get garbage, and so we see if we end up there.
2014-10-12 16:00:58 -07:00
Unknown W. Brackets
5fd402222b x86jit: Use the shorter MDisp() offset for andLink. 2014-10-12 15:18:22 -07:00
Unknown W. Brackets
0f32103615 x86jit: Consistently use mips_. 2014-10-12 15:16:09 -07:00
Henrik Rydgård
afbe50d3b9 Merge pull request #6998 from unknownbrackets/jit-minor2
x86jit: Preload sp and similar regs used often
2014-10-13 00:00:28 +02:00
Unknown W. Brackets
e3a04aa2d2 x86jit: Preload sp and similar regs used often.
This can help us avoid using a temporary.

Very tiny performance improvement.
2014-10-12 14:53:56 -07:00
Unknown W. Brackets
6fae78cd3f x86jit: Fix a bug in branch continuing.
When we predict it won't take a likely delay slot, we'd lose our register
allocation state.
2014-10-12 12:51:47 -07:00
Unknown W. Brackets
2f598e8f38 jit: Statically jump for fixed branches.
This handles both loops (first step is known) and static branches (some
code uses them instead of jumps, and we disassemble that to "b".)

Not likely to be a big improvement, but might help if the branch predictor
was wrong.

This is as opposed to continuing, which would build a larger jit block.
2014-10-12 12:51:47 -07:00
Unknown W. Brackets
9228ac72da jit: Reorganize imm branch logic a bit. 2014-10-12 12:51:46 -07:00
Unknown W. Brackets
4d30288601 x86jit: Fix force flush to zero. 2014-10-12 12:51:46 -07:00
Unknown W. Brackets
928e2adfc9 jit: Avoid applying/restoring the rounding mode.
If the game never sets it, we can skip around syscalls, interpreter,
replacements, etc.
2014-10-12 12:51:45 -07:00
Unknown W. Brackets
8d0dca71fe jit: Rename the rounding mode funcs to clarify.
They apply/restore the value, set/clear is confusing.
2014-10-12 11:35:20 -07:00
Henrik Rydgard
8177b4c43b Avoid an ifdef using PTRBITS 2014-10-12 19:35:55 +02:00
Henrik Rydgård
eab010a0c0 x86 JIT: Sacrifice a register for a pointer to the MIPS context. Shrinks emitted x86 code considerably.
Nice in 64-bit, but might be a bit too much in 32-bit though... Needs testing.
2014-10-12 19:35:55 +02:00
Henrik Rydgård
f99c2cd010 x86 Jit: Generate nicer code for some cases of addiu 2014-10-12 17:47:53 +02:00
Unknown W. Brackets
4210ba44eb Clean up a few more ImmPtr() cases. 2014-09-21 08:34:27 -07:00
Unknown W. Brackets
52b6f1095e armjit: Fix rounding mode, allow non flush-to-zero.
Default: force flush to zero (for RunFast mode.)  But now it's an ini
option so we can more easily compare armjit differences.
2014-09-11 07:58:51 -07:00
Andrew Church
3033dc5138 Revert to unconditional ClearRoundingMode() when setting FCR31. 2014-09-04 11:36:56 +09:00
Andrew Church
128122af39 Fix broken rounding mode handling. 2014-09-04 11:30:11 +09:00
Andrew Church
726cb851b9 Don't unconditionally ClearRoundingMode() before setting it. 2014-09-04 09:28:56 +09:00
Andrew Church
5816685668 Handle the FS (flush-to-zero) bit in FCR31 for x86 JIT. 2014-09-04 01:50:24 +09:00
Unknown W. Brackets
4a1514730f x86jit/ppcjit: Correct some bad sltiu compares. 2014-09-02 08:04:22 -07:00
Unknown W. Brackets
4459b8f483 jit: Actually jit vmtfc/vmfvc.
Since we have them and they are easy.
2014-09-01 23:13:39 -07:00
Unknown W. Brackets
5f6f6827b5 jit: Update rounding mode immediately on ctc1. 2014-08-30 23:48:27 -07:00
Unknown W. Brackets
e8cdbcc33f x86jit: Fix some flags/EAX trashing in rounding.
Fixes #6810.
2014-08-30 16:46:43 -07:00
Unknown W. Brackets
925557ed47 x86jit: Maintain the rounding mode always.
This should be less often than doing it per block that uses fpu, unless
the game doesn't use fpu much at all.
2014-08-22 09:53:00 -07:00
Unknown W. Brackets
ab13b36484 x86jit: Implement cvt.w.s.
Not really used that often, anyway, but easy enough and good for testing
that we set the rounding mode correctly.
2014-08-22 00:01:06 -07:00
Unknown W. Brackets
dc91dc1ce8 x86jit: Support fpu rounding modes for mul, etc.
Fixes Gods Eater Burst loading PSP savedata, but can no longer load old
savedata.
2014-08-21 23:59:55 -07:00
Unknown W. Brackets
245a2a3be0 Don't zero out downcount in replacements.
It doesn't write out js.downcountAmount in any of these cases, so zeroing
it is wrong.
2014-08-03 13:22:30 -07:00
Unknown W. Brackets
d060a06fa6 Disable a bunch of function replacements.
These are just for speed, let's turn them off.  Using a flag because:
 * I think there's still some issue with savestates, not sure.
 * We might swap this flag to a separate option.
2014-08-03 13:15:41 -07:00
Henrik Rydgard
82421f4dcf x86 jit: Further fix for nor, thanks unknown
See #6638
2014-07-27 22:26:35 +02:00
Henrik Rydgard
903ddbc513 x86 JIT: Fix bug where NOR would not get computed correctly in corner case
(CompTriArith can end up not actually mapping rd to a register when taking
a shortcut)

May fix the JIT issue mentioned by CPkmn and located by daniel229 as an aside in #6638
2014-07-27 21:41:41 +02:00