Commit Graph

63 Commits

Author SHA1 Message Date
Unknown W. Brackets
a201d3f561 samplerjit: Fix non-AVX three-op shift.
Oops, was still shifting the source register.
2022-02-15 20:12:45 -08:00
Unknown W. Brackets
16dca4f69b x86jit: Use BMI2 for variable shifts.
We don't actually regalloc ECX, but this still saves a copy, and on modern
CPUs these seem to be pretty fast.
2022-01-31 19:38:17 -08:00
Unknown W. Brackets
c1e657ed47 samplerjit: Better vectorize UV linear calc.
Gives about 1-2% when mips are used.
2022-01-24 20:42:07 -08:00
Unknown W. Brackets
8573c34f85 x86jit: Check CALL dist for safe memory funcs. 2022-01-22 00:14:15 -08:00
Unknown W. Brackets
0ba2d05da5 samplerjit: Simplify AVX shift-copies.
These have been the most common and the fallback is safe.  Let's just add
a helper.
2022-01-17 15:15:36 -08:00
Unknown W. Brackets
ce6ea8da11 samplerjit: Apply gather lookup to all CLUT4. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
22f770c828 samplerjit: Use VPGATHERDD for simple CLUT4 loads.
Planning to expand this to more paths.
2022-01-02 17:19:17 -08:00
Unknown W. Brackets
1addf84e90 samplerjit: Use SSSE3/SSE4 in linear filtering. 2021-12-30 23:22:56 -08:00
Unknown W. Brackets
7aa9664d20 x64jit: Add AVX2-only instructions. 2021-12-29 19:46:26 -08:00
Unknown W. Brackets
7508fcc22d x64jit: Add AVX-only instructions. 2021-12-29 19:46:26 -08:00
Unknown W. Brackets
147b81d6f7 x64jit: Add AVX/AVX2 encodings.
Also fix the FMA double ones, which were passing W wrongly.
2021-12-29 19:46:26 -08:00
Unknown W. Brackets
bf06342f9d samplerjit: Minor SSE4 optimizations.
These seem to be a bit faster.
2021-12-29 07:07:35 -08:00
Unknown W. Brackets
820361f34b samplerjit: Calculate texel byte offset as vector. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
3f3e0ea8cf softjit: Optimize typical alpha/depth test.
Messed with SSE4 then realized there's no point, just use SHR.
2021-11-26 08:21:14 -08:00
Unknown W. Brackets
4178f09e57 Build: More consistently avoid _M_ defines.
We use PPSSPP_ARCH in several places already, this makes it more complete.
2021-03-02 21:49:21 -08:00
Gleb Mazovetskiy
7305ba9d9b x64Emitter: Fix unaligned store UBSAN errors
This compiles to the same assembly as before even without optimizations and avoids UB.

https://godbolt.org/z/4G5edM

While the UB here is benign, this improves signal-to-noise ratio of UBSAN errors.

Fixes #14005
2021-01-30 12:26:01 +00:00
Henrik Rydgård
e8a9845d93 First step of cleaning up Log.h. Plus a few other bits and bobs. 2020-08-16 14:48:54 +02:00
Henrik Rydgård
0829543987 Third part of getting rid of PanicAlert 2020-07-19 20:34:02 +02:00
Henrik Rydgård
47a3bf1dd7 Step 2 of removing PanicAlert 2020-07-19 20:34:02 +02:00
Henrik Rydgård
c5e0b799d9 Remove category from _assert_msg_ functions. We don't filter these by category anyway.
Fixes the inconsistency where _assert_ didn't take a category but
_assert_msg_ did.
2020-07-19 20:33:25 +02:00
Unknown W. Brackets
7910b4029a arm64jit: Track writable and non-writable pointers.
Switch uses different memory regions.  We can handle this, might as well
clean up some const abuse.
2020-05-17 00:15:12 -07:00
Henrik Rydgård
381c4ca4b2 X64: Fix bug in a case in the MOVQ emitter: REX byte should be after the 0x66 prefix 2017-07-07 11:33:07 +02:00
Henrik Rydgård
0645677fea Access FPU temps through CTXREG 2017-07-07 11:33:06 +02:00
Unknown W. Brackets
cb3db559bd SoftGPU: Jit the linear sampling too.
For now, just reducing overhead.  Could be smarter.
2017-05-30 22:57:46 -07:00
Henrik Rydgård
0ec1e5e3b2 Don't erase and rewrite the dispatcher when the cache is cleared. Fixes #9708 2017-05-26 15:48:03 +02:00
Henrik Rydgard
323eb72b7c Write-protect the dispatcher on all platforms. 2016-08-28 13:35:27 +02:00
Henrik Rydgard
ffe4c266ef Add CodeBlockCommon base class to remove further arch-specificity in JitBlockCache
Remove unused ArmThunk.
2016-05-01 11:40:00 +02:00
Henrik Rydgard
88f25fd50e x86-64: Fix L bit in VEX instruction emitter. Ported fix from Citra.
Currently unused in the emulator, though.
2016-02-28 13:07:24 +01:00
aroulin
8a09dedf94 x64Emitter: add RCPPS and RCPSS SSE instructions 2015-08-23 16:43:07 +02:00
Henrik Rydgard
604abe933e Update submodules, add x64Emitter bugfix from Dolphin (plus a few new instrs), misc 2015-01-11 00:12:32 +01:00
Henrik Rydgard
4ec30d98e1 Port the x86 and ARM emitters over to use the generic CodeBlock class 2014-12-15 22:32:55 +01:00
Henrik Rydgard
2bce7bc460 X64Emitter: Merge some AVX stuff from Dolphin 2014-12-07 23:09:38 +01:00
Henrik Rydgard
5290ffd929 Minor cleanup in vtfm. Re-enable vrot combination. Optimize vfad/vavg when dpps is available.
Also fixes bug in emitter of dpps.
2014-12-03 22:44:32 +01:00
Henrik Rydgard
344f71b092 x86 jit: Commit commented-out haddps-based vdot.q as reminder not to use haddps... 2014-11-28 00:19:11 +01:00
Henrik Rydgard
5033babb10 x86 Jit: SIMD-ify vdot 2014-11-26 23:47:18 +01:00
Henrik Rydgard
28ca8d4818 x86 jit: Use LEA to emulate addu but only when it can save a few bytes 2014-11-16 17:39:47 +01:00
Unknown W. Brackets
bc7497857a x86jit: Micro optimize vi2x a bit with ssse3/sse4.
Both are small wins.
2014-11-08 12:13:26 -08:00
Unknown W. Brackets
0e646f748a x86jit: Implement vi2x instructions.
Also, my opcodes were wrong in the test (shifted the pair bit the wrong
way, oops.)

AFAICT, there's no reason PSRAD/etc. were not encoding REX...
2014-11-08 12:13:26 -08:00
Unknown W. Brackets
d7bdded6f8 x86jit: fix rip addressing on PEXTRW/PINSRW.
I think this is right anyway, not 100% sure.
2014-11-03 23:18:32 -08:00
Unknown W. Brackets
844c7e73d3 x86jit: Add SSE 4.1 rounding ops to emitter. 2014-11-03 23:18:09 -08:00
Henrik Rydgård
7bde976069 Merge x64 emitter from a newer Dolphin version.
This one can generate slightly smaller code by exploiting some EAX-only
encoding and various other short forms, and adds support for many newer
CPU instructions.
2014-10-12 19:46:58 +02:00
Henrik Rydgård
281ab5f9cb Sync x64 emitter to Dolphin's. 2014-10-12 19:45:26 +02:00
Unknown W. Brackets
e1a57abcb4 Fix mixed newline style. 2014-09-20 08:30:37 -07:00
Henrik Rydgard
62054b1e7b Fix PINSRW/PEXTRW emitters.
Fixes crash introduced in 5276487611
(apparently we haven't used PINSRW before)
2014-09-20 11:46:05 +02:00
Henrik Rydgard
215abfb951 Some cleanup in /Common 2014-09-06 10:47:25 +02:00
Henrik Rydgard
d3dce422a8 X64emitter: merge from dolphin 2014-07-20 00:21:28 +02:00
Henrik Rydgard
221216b5b2 Bugfix in x64 emitter, thanks magumagu 2014-03-27 22:25:30 +01:00
Unknown W. Brackets
632eec38e8 vertexjit: Use SSE4.1 where available on x86.
Just because we can.
2014-03-22 16:11:16 -07:00
Unknown W. Brackets
162f229294 vertexjit: Support the color morphs on x86. 2014-03-22 15:56:29 -07:00
Unknown W. Brackets
f14361c3b8 Add a bunch more missing cstring includes. 2013-12-30 21:37:19 -08:00
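
Note on 7305ba9d9b (x64Emitter: Fix unaligned store UBSAN errors): the log does not show the change itself, so the following is only a minimal sketch of one common way to avoid this class of UB, assuming the emitter writes multi-byte immediates into a byte-addressed code buffer. The helper names WriteUnaligned32 and ReadUnaligned32 are hypothetical and are not taken from the emitter source. Copying through std::memcpy instead of casting the byte pointer to a wider pointer type is well defined at any alignment, and compilers typically lower it to a single move on x86.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PPSSPP source): store a 32-bit value
// at an arbitrary, possibly unaligned, byte address.
// Casting dst to uint32_t* and dereferencing it would be undefined
// behavior when dst is misaligned; std::memcpy has no such requirement.
inline void WriteUnaligned32(uint8_t *dst, uint32_t value) {
    std::memcpy(dst, &value, sizeof(value));
}

// Hypothetical counterpart: load a 32-bit value from an arbitrary address.
inline uint32_t ReadUnaligned32(const uint8_t *src) {
    uint32_t value;
    std::memcpy(&value, src, sizeof(value));
    return value;
}
```

A caller would use these wherever a code pointer is advanced one byte at a time, for example WriteUnaligned32(code, imm); code += 4; after emitting a one-byte opcode.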