Commit Graph

469 Commits

Author SHA1 Message Date
Henrik Rydgard
5888b3bdc4 Revert "x86jit: Micro optimize slt* a bit."
This reverts commit ee66596b8d.

Broke a lot of games, probably some small bug.

Conflicts:
	Core/MIPS/x86/CompALU.cpp
2014-11-09 12:07:21 +01:00
Unknown W. Brackets
313d9e95c7 Clarify a comment. 2014-11-09 01:05:03 -08:00
Unknown W. Brackets
ee66596b8d x86jit: Micro optimize slt* a bit.
This improves their performance and hopefully latency.  It also avoids
filling registers that are not likely to be used again.
2014-11-08 22:54:03 -08:00
Unknown W. Brackets
27d8108bb2 x86jit: Optimize loads of 0 into fp regs. 2014-11-08 18:41:16 -08:00
Unknown W. Brackets
7d8858687e x86jit: Avoid speculative loads in mtc1/mfc1. 2014-11-08 18:35:15 -08:00
Unknown W. Brackets
57caa95273 x86jit: Implement round.w.s and friends.
They are not terribly fast, though, updating MXCSR.
2014-11-08 17:59:38 -08:00
Unknown W. Brackets
3908e0f445 x86jit: Small optimization for add.s f1, f2, f2.
Doubles the speed of that particular case.  Biggest difference is not
loading fd for no reason.
2014-11-08 17:32:53 -08:00
Unknown W. Brackets
f9893c29ce x86jit: Very small optimization to c.nge.s. 2014-11-08 17:01:02 -08:00
Unknown W. Brackets
78dfe43776 x86jit: Optimize neg.s and abs.s a tiny bit.
Same reg is probably a common case, improves micro benchmark.
2014-11-08 16:50:41 -08:00
Unknown W. Brackets
bed0d0b059 x86jit: Improve cvt.w.s when fd is loaded or fs.
We have no need to store it.
2014-11-08 16:40:54 -08:00
Unknown W. Brackets
1917d946ea x86jit: Micro optimize cvt.s.w a bit.
This implementation is about 5x faster for micro benchmarks.  Little
impact to overall perf in games I tested, though.
2014-11-08 13:30:38 -08:00
Unknown W. Brackets
671dee85c7 x86jit: Micro optimize vi2f a little bit.
This didn't help overall perf much but micro benchmarks are better.
2014-11-08 13:07:01 -08:00
Unknown W. Brackets
c29b126357 x86jit: Oops, can't have an imm here. 2014-11-08 12:41:48 -08:00
Unknown W. Brackets
c0be19edb6 x86jit: Simplify vavg a bit. 2014-11-08 12:40:04 -08:00
Unknown W. Brackets
761e269e5f x86jit: Avoid some regcache pollution. 2014-11-08 12:38:08 -08:00
Unknown W. Brackets
bc7497857a x86jit: Micro optimize vi2x a bit with ssse3/sse4.
Both are small wins.
2014-11-08 12:13:26 -08:00
Unknown W. Brackets
0e646f748a x86jit: Implement vi2x instructions.
Also, my opcodes were wrong in the test (shifted the pair bit the wrong
way, oops.)

AFAICT, there's no reason PSRAD/etc. were not encoding REX...
2014-11-08 12:13:26 -08:00
Unknown W. Brackets
ddc90ee550 x86jit: Implement vfad and vavg. 2014-11-08 12:13:25 -08:00
Unknown W. Brackets
5ae43defd9 Oops, these should be signed. 2014-11-08 09:39:17 -08:00
Unknown W. Brackets
316e923b40 x86jit: Implement other forms of vx2i.
Gains 3.2% performance in Grand Knights History.
2014-11-08 00:39:40 -08:00
Unknown W. Brackets
097a483d77 x86jit: Micro optimize vs2i a bit. 2014-11-06 22:45:54 -08:00
Unknown W. Brackets
3061e89250 Fix copy/paste mistake. 2014-11-04 01:41:17 -08:00
Unknown W. Brackets
0d36d4e082 Add a helper to reduce duplicate code.
This is not performance critical.  I wonder if compilers can inline
closures?
2014-11-03 23:50:23 -08:00
Unknown W. Brackets
16ca2b0155 x86jit: Fix trig vv2ops on 32-bit, arg. 2014-11-03 23:43:18 -08:00
Unknown W. Brackets
3e95763a3f x86jit: Implement other rounding modes in vf2i.
3% improvement in Grand Knights History.  I know other games use these
too.
2014-11-03 23:27:05 -08:00
Unknown W. Brackets
717cf25f0d x86jit: Use our sincos funcs for VV2Op as well.
Small (0.7%) speedup in Gods Eater Burst.  There's probably SSE
approximations we could use instead, but those will also need at least xmm
reg flushing/thunking.

At least this avoids flushing gprs, etc.  The sin and cos ops are fairly
common.
2014-11-03 22:13:38 -08:00
Unknown W. Brackets
5bb9d32eaa jit: Fix partial invalidation of larger blocks.
Fixes #7031.
2014-10-27 19:04:19 -07:00
Unknown W. Brackets
100afc07a2 x86jit: Fix andLink cases of imm blezl, etc. 2014-10-24 08:57:56 -07:00
Unknown W. Brackets
b53f13480a x86jit: Centralize continuing logic. 2014-10-12 19:01:04 -07:00
Unknown W. Brackets
d98adf27d6 x86jit: Add proxy blocks for continuing. 2014-10-12 17:15:31 -07:00
Unknown W. Brackets
01f9521dc5 jit: Invalidate blocks even if they end unevenly.
This allows blocks to start and end where ever they need, which should be
good for replacements and for continuing.
2014-10-12 17:13:04 -07:00
Unknown W. Brackets
90821b761d x86jit: Pad linked exits with breakpoints.
So that we don't get garbage, and so we see if we end up there.
2014-10-12 16:00:58 -07:00
Unknown W. Brackets
5fd402222b x86jit: Use the shorter MDisp() offset for andLink. 2014-10-12 15:18:22 -07:00
Unknown W. Brackets
0f32103615 x86jit: Consistently use mips_. 2014-10-12 15:16:09 -07:00
Henrik Rydgård
afbe50d3b9 Merge pull request #6998 from unknownbrackets/jit-minor2
x86jit: Preload sp and similar regs used often
2014-10-13 00:00:28 +02:00
Unknown W. Brackets
e3a04aa2d2 x86jit: Preload sp and similar regs used often.
This can help us avoid using a temporary.

Very tiny performance improvement.
2014-10-12 14:53:56 -07:00
Unknown W. Brackets
6fae78cd3f x86jit: Fix a bug in branch continuing.
When we predict it won't take a likely delay slot, we'd lose our register
allocation state.
2014-10-12 12:51:47 -07:00
Unknown W. Brackets
2f598e8f38 jit: Statically jump for fixed branches.
This handles both loops (first step is known) and static branches (some
code uses them instead of jumps, and we disassemble that to "b".)

Not likely to be a big improvement, but might help if the branch predictor
was wrong.

This is as opposed to continuing, which would build a larger jit block.
2014-10-12 12:51:47 -07:00
Unknown W. Brackets
9228ac72da jit: Reorganize imm branch logic a bit. 2014-10-12 12:51:46 -07:00
Unknown W. Brackets
4d30288601 x86jit: Fix force flush to zero. 2014-10-12 12:51:46 -07:00
Unknown W. Brackets
928e2adfc9 jit: Avoid applying/restoring the rounding mode.
If the game never sets it, we can skip around syscalls, interpreter,
replacements, etc.
2014-10-12 12:51:45 -07:00
Unknown W. Brackets
8d0dca71fe jit: Rename the rounding mode funcs to clarify.
They apply/restore the value, set/clear is confusing.
2014-10-12 11:35:20 -07:00
Henrik Rydgard
8177b4c43b Avoid an ifdef using PTRBITS 2014-10-12 19:35:55 +02:00
Henrik Rydgård
eab010a0c0 x86 JIT: Sacrifice a register for a pointer to the MIPS context. Shrinks emitted x86 code considerably.
Nice in 64-bit, but might be a bit too much in 32-bit though... Needs testing.
2014-10-12 19:35:55 +02:00
Henrik Rydgård
f99c2cd010 x86 Jit: Generate nicer code for some cases of addiu 2014-10-12 17:47:53 +02:00
Unknown W. Brackets
4210ba44eb Clean up a few more ImmPtr() cases. 2014-09-21 08:34:27 -07:00
Unknown W. Brackets
52b6f1095e armjit: Fix rounding mode, allow non flush-to-zero.
Default: force flush to zero (for RunFast mode.)  But now it's an ini
option so we can more easily compare armjit differences.
2014-09-11 07:58:51 -07:00
Andrew Church
3033dc5138 Revert to unconditional ClearRoundingMode() when setting FCR31. 2014-09-04 11:36:56 +09:00
Andrew Church
128122af39 Fix broken rounding mode handling. 2014-09-04 11:30:11 +09:00
Andrew Church
726cb851b9 Don't unconditionally ClearRoundingMode() before setting it. 2014-09-04 09:28:56 +09:00