I was looking at some other JIT overheads and this cropped up as a source
of overhead. Instead of materializing a constant using mov+movk+movk+movk,
load it from the named vector constant array.
In a micro-benchmark this improved performance by 34%.
In bytemark this improved one subbench by 0.82%.
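As a rough illustration of why the four-instruction case is the costly one,
the sketch below counts how many instructions a plain mov+movk sequence needs
for a 64-bit constant. `MovMovkCount` is a hypothetical helper, not the
emitter's real code, and it ignores movn and logical-immediate encodings that
a real assembler would also try.

```cpp
#include <cstdint>

// Hypothetical helper: number of mov/movk instructions a naive sequence
// needs, one per non-zero 16-bit chunk (a zero constant still needs one mov).
constexpr int MovMovkCount(uint64_t Value) {
  int Count = 0;
  for (int Shift = 0; Shift < 64; Shift += 16) {
    if ((Value >> Shift) & 0xFFFF) {
      ++Count;
    }
  }
  return Count == 0 ? 1 : Count;
}
```

A constant with all four chunks populated costs four instructions, which is
the case where a single load from a constant array is most attractive.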
Missed this instruction when implementing rdtscp. It returns the same ID
result in a register, just like rdtscp, but without the cycle counter
results. It doesn't touch any flags, just like rdtscp.
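A minimal sketch of the difference in register effects, assuming a simplified
guest register file. `GuestRegs`, `EmulateRdtscp` and `EmulateIdOnly` are
hypothetical names for illustration, and the destination register of the
ID-only variant is picked arbitrarily here.

```cpp
#include <cstdint>

// Hypothetical, simplified guest state: only the registers rdtscp writes.
struct GuestRegs {
  uint32_t Eax = 0;
  uint32_t Ecx = 0;
  uint32_t Edx = 0;
};

// rdtscp: cycle counter in EDX:EAX, ID value in ECX. No flags are modified.
constexpr GuestRegs EmulateRdtscp(GuestRegs Regs, uint64_t Cycles, uint32_t Id) {
  Regs.Eax = static_cast<uint32_t>(Cycles);
  Regs.Edx = static_cast<uint32_t>(Cycles >> 32);
  Regs.Ecx = Id;
  return Regs;
}

// ID-only variant: writes just the ID and leaves the counter registers alone.
// ECX is used as the destination purely for this illustration.
constexpr GuestRegs EmulateIdOnly(GuestRegs Regs, uint32_t Id) {
  Regs.Ecx = Id;
  return Regs;
}
```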
This unit test recreates the error condition that #3478 causes.
When a string operation is a backwards copy, the optimization reads past
the end of the page and crashes.
This seemingly only happens with backwards string operations, but the test
covers both forward and backward copies.
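One plausible shape for this class of bug, sketched under assumptions: if an
optimized path issues a wide access at the current pointer, the access can
extend into the following page when the pointer sits near the end of a page.
This is not a description of the actual code path behind #3478;
`WideAccessLeavesPage` and the 4 KiB `PageSize` are hypothetical.

```cpp
#include <cstdint>

// Assumed page size for the illustration.
constexpr uint64_t PageSize = 4096;

// Hypothetical predicate: does an access of Width bytes starting at Address
// extend beyond the page that contains Address?
constexpr bool WideAccessLeavesPage(uint64_t Address, uint64_t Width) {
  return (Address % PageSize) + Width > PageSize;
}
```

A test that wants to provoke such a fault would place its buffer so the last
element is the final byte of a mapped page with nothing mapped after it.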
x86 has a few prefetch instructions.
- prefetch - One of two classic 3DNow! instructions
  - Prefetch into L1 data cache
- prefetchw - One of two classic 3DNow! instructions
  - Implies prefetch into L1 data cache
  - Prefetch cacheline with intent to write and exclusive ownership
- prefetchnta
  - Prefetch non-temporal data with respect to /all/ cache levels
    - Assumes inclusive caches?
- prefetch{t0,t1,t2}
  - Prefetch data with respect to each cache level
    - T0 = L1 and higher
    - T1 = L2 and higher
    - T2 = L3 and higher
**Some silly duplicates**
- prefetchwt1
  - Duplicate of prefetchw but explicitly L1 data cache
- prefetch_exclusive
  - Duplicate of prefetch
God Of War 2018 uses prefetchw as a hint for exclusive ownership of the
cacheline in some very aggressive spin-loops. Let's implement the
operations to help it along.
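The hint levels listed above could be lowered to AArch64 PRFM operations along
these lines. The PRFM operation names are real, but this particular mapping
and the `X86Prefetch`, `Arm64Prfm` and `LowerPrefetch` names are assumptions
for illustration, not necessarily what the implementation chooses.

```cpp
// Hypothetical enumeration of the x86 prefetch flavours discussed above.
enum class X86Prefetch { Prefetch, PrefetchW, PrefetchNTA, PrefetchT0, PrefetchT1, PrefetchT2 };

// Subset of AArch64 PRFM operations (load/store, cache level, keep/stream).
enum class Arm64Prfm { PLDL1KEEP, PLDL2KEEP, PLDL3KEEP, PLDL1STRM, PSTL1KEEP };

// Assumed lowering: write-intent maps to a store prefetch, non-temporal maps
// to the streaming policy, and T0/T1/T2 map to L1/L2/L3 respectively.
constexpr Arm64Prfm LowerPrefetch(X86Prefetch Hint) {
  switch (Hint) {
    case X86Prefetch::PrefetchW:   return Arm64Prfm::PSTL1KEEP;
    case X86Prefetch::PrefetchNTA: return Arm64Prfm::PLDL1STRM;
    case X86Prefetch::PrefetchT1:  return Arm64Prfm::PLDL2KEEP;
    case X86Prefetch::PrefetchT2:  return Arm64Prfm::PLDL3KEEP;
    case X86Prefetch::Prefetch:
    case X86Prefetch::PrefetchT0:  return Arm64Prfm::PLDL1KEEP;
  }
  return Arm64Prfm::PLDL1KEEP;  // Unreachable for valid enum values.
}
```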
This function can be unit-tested more easily, and the stack special case is
handled more cleanly as a post-collection step.
There is a minor functional change: The stack special case didn't trigger
previously if the range end was within the stack mapping. This is now fixed.
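For illustration, a symmetric overlap test on half-open ranges catches a query
whose end falls inside the stack mapping as well as one whose start does. The
`Range` type and `Overlaps` function are hypothetical and only sketch the kind
of check involved, not the actual code.

```cpp
#include <cstdint>

// Hypothetical half-open address range [Begin, End).
struct Range {
  uint64_t Begin;
  uint64_t End;
};

// True if the two ranges share at least one address.
constexpr bool Overlaps(Range Query, Range Mapping) {
  return Query.Begin < Mapping.End && Mapping.Begin < Query.End;
}
```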
This can help a lot of x86 code because x86 is 2-address and a64 is 3-address,
so x86 ends up with piles of movs that end up dead after translation.
It's not a win across the board because our RA isn't aware of tied registers so
sometimes we regress moves. But it's a win on average, and the RA bits can be
improved with time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Patch written by Sonicadvance1. Unclear how this wasn't already broken, but we
need this to keep CI happy with the rest of this series.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The arguments and the conditional don't get optimized out in release builds
for the inline function call, unlike with the define.
This was showing up an annoying amount of the time when testing.
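A small sketch of the language-level difference, assuming a check that is
compiled out in release builds. All names here (`Expensive`, `CheckFunction`,
`CHECK_MACRO`) are hypothetical. With a function, the argument expressions are
still evaluated before the call unless the compiler can prove they have no
effect; with a macro that expands to nothing, they never appear in the program.

```cpp
// Hypothetical argument expression with an observable side effect.
constexpr int Expensive(int& Calls) {
  ++Calls;
  return 42;
}

// Release-build stand-in for a check implemented as a function: the body is
// empty, but callers still evaluate the arguments.
constexpr void CheckFunction(bool, int) {}

// Release-build stand-in for the same check as a define: arguments vanish.
#define CHECK_MACRO(Cond, Value) do { } while (false)

constexpr int CallsWithFunction() {
  int Calls = 0;
  CheckFunction(true, Expensive(Calls));
  return Calls;
}

constexpr int CallsWithMacro() {
  int Calls = 0;
  CHECK_MACRO(true, Expensive(Calls));
  return Calls;
}
```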
The NVIF ioctl isn't publicly described in the nouveau headers and it is
required for anything to work with Nouveau.
Pass the ioctl command through without modification and hope that this
ioctl is architecture agnostic.
Folds a reg+const memory address into the addressing mode
if the constant is within 16KB.
Update instcountci files.
Add test 32Bit_ASM/FEX_bugs/SubAddrBug.asm
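For context on where a limit of roughly 16KB can come from: the A64 LDR/STR
unsigned-offset form encodes a 12-bit immediate scaled by the access size, so
a 4-byte access reaches offsets up to 4095 * 4 = 16380. The check below is a
sketch of that encoding rule only; `FitsScaledUnsignedOffset` is a
hypothetical helper and the exact condition used by the pass may differ.

```cpp
#include <cstdint>

// Hypothetical helper: can Offset be encoded in the scaled 12-bit unsigned
// immediate of an A64 load/store with the given access size in bytes?
constexpr bool FitsScaledUnsignedOffset(uint64_t Offset, uint32_t AccessSize) {
  return AccessSize != 0 &&
         Offset % AccessSize == 0 &&  // Must be a multiple of the access size.
         Offset / AccessSize <= 4095; // Scaled value must fit in 12 bits.
}
```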