Since we immediately overwrite the file we are copying, we can do a rename
instead. A failed rename is fine. It means either that the telemetry file
didn't exist initially, or that some other error (e.g. a permission error)
occurred. In that case the telemetry would be lost regardless.
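The rename-then-write idea can be sketched in C++17 with `std::filesystem`. This is a minimal sketch, not the actual code. The function name `rotate_and_write` and the file names in the usage are hypothetical. It assumes the old file is kept under a backup name before the new data is written.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical sketch: keep the previous telemetry file under a backup name,
// then write the new data. The rename replaces a copy-then-overwrite step.
bool rotate_and_write(const fs::path& current, const fs::path& backup,
                      const std::string& data) {
  std::error_code ec;
  // Non-throwing overload. Failure is ignored on purpose: either `current`
  // didn't exist yet, or e.g. a permission error means the old telemetry
  // would be lost regardless.
  fs::rename(current, backup, ec);

  // Truncate (or create) the current file and write the fresh telemetry.
  std::ofstream out(current, std::ios::trunc);
  out << data;
  return static_cast<bool>(out);
}
```

On the first call there is nothing to rename, so the error is silently dropped. On later calls the previous file moves to the backup name before the new contents are written.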
It was a funny joke that this was here, but it is fundamentally
incompatible with what we're doing. All of those users run under proot
anyway, because running directly under Termux is so broken.
Just remove it from here.
Take, e.g., a forward rep movsb copy from address 0 to address 1. Since
this is a bytewise copy, the expected behaviour is:
before: aaabbbb...
after: aaaaaaa...
but by copying in 32-byte chunks we end up with:
after: aaaabbbb...
because the self-overwrites don't happen within a single 32-byte copy:
every byte of a chunk is read before any of it is written.
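The difference can be reproduced with a small C++ model. This is a sketch only. Both function names are hypothetical. The chunked version models each 32-byte copy as "load the whole chunk, then store it", which is an assumption about how the optimized path behaves.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Byte-at-a-time forward copy, matching forward (DF=0) rep movsb semantics:
// each byte is read after all earlier bytes were written, so with an
// overlapping destination the written bytes propagate forward.
std::string bytewise_forward_copy(std::string buf, std::size_t dst,
                                  std::size_t src, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    buf[dst + i] = buf[src + i];
  }
  return buf;
}

// Hypothetical chunked copy: each 32-byte chunk is fully loaded before it is
// stored, so inside a chunk no byte sees an earlier byte's write.
std::string chunked_forward_copy(std::string buf, std::size_t dst,
                                 std::size_t src, std::size_t count) {
  constexpr std::size_t kChunk = 32;
  std::size_t i = 0;
  for (; i + kChunk <= count; i += kChunk) {
    char tmp[kChunk];
    std::memcpy(tmp, &buf[src + i], kChunk);  // load the whole chunk first
    std::memcpy(&buf[dst + i], tmp, kChunk);  // then store it
  }
  for (; i < count; ++i) {  // byte-wise tail for the remainder
    buf[dst + i] = buf[src + i];
  }
  return buf;
}
```

With `"aaa"` followed by `b`s, copying 32 bytes from offset 0 to offset 1 fills the first 33 bytes with `a` in the bytewise model. The chunked model produces `aaaabbbb...`.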
When TSO is disabled, vector LDP/STP can be used for a two-instruction
32-byte memory copy, which is significantly faster than the current
byte-by-byte copy. Performing two such copies back to back also marginally
increases copy speed for all sizes >= 64 bytes.
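The loop structure can be sketched portably in C++. This is only a model. The real change emits AArch64 code in the JIT. Here each vector LDP/STP pair is modelled as one 32-byte `memcpy`, and `copy32`/`block_copy` are hypothetical names. The sketch assumes non-overlapping buffers and the TSO-disabled case described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stands in for one vector LDP (two 16-byte q registers) plus one STP:
// a two-instruction, 32-byte copy.
inline void copy32(std::uint8_t* dst, const std::uint8_t* src) {
  std::uint8_t tmp[32];
  std::memcpy(tmp, src, 32);
  std::memcpy(dst, tmp, 32);
}

// Forward copy; assumes src and dst do not overlap.
void block_copy(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) {
  // Two back-to-back 32-byte copies per iteration for sizes >= 64.
  while (n >= 64) {
    copy32(dst, src);
    copy32(dst + 32, src + 32);
    dst += 64;
    src += 64;
    n -= 64;
  }
  // At most one more 32-byte copy.
  if (n >= 32) {
    copy32(dst, src);
    dst += 32;
    src += 32;
    n -= 32;
  }
  // The remaining 0..31 bytes fall back to a byte-by-byte copy.
  while (n--) {
    *dst++ = *src++;
  }
}
```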
While looking at some other JIT overheads, I noticed this one too.
Instead of materializing a constant using mov+movk+movk+movk, load it
from the named vector constant array.
In a micro-benchmark this improved performance by 34%.
In Bytemark this improved one sub-benchmark by 0.82%.
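A 64-bit immediate costs up to four instructions because MOVZ/MOVK each place one 16-bit halfword. A load from a constant array needs just one load. The C++ helper below is a simplified, hypothetical sketch that prints the MOVZ/MOVK sequence for a value in `x0`. It emits one instruction per non-zero halfword and ignores the MOVN and bitmask-immediate encodings a real assembler would also try.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Builds a MOVZ/MOVK sequence materializing `value` in x0: one instruction
// per non-zero 16-bit halfword (simplified; no MOVN/ORR-immediate forms).
std::vector<std::string> mov_sequence(std::uint64_t value) {
  std::vector<std::string> seq;
  for (int shift = 0; shift < 64; shift += 16) {
    const auto half = static_cast<unsigned>((value >> shift) & 0xFFFF);
    if (half == 0) continue;  // zero halfwords are covered by MOVZ
    const char* op = seq.empty() ? "movz" : "movk";
    seq.push_back(std::string(op) + " x0, #" + std::to_string(half) +
                  ", lsl #" + std::to_string(shift));
  }
  if (seq.empty()) seq.push_back("movz x0, #0, lsl #0");
  return seq;
}
```

A value like `0x123456789abcdef0` needs all four instructions, which is the worst case the constant-array load avoids.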
Missed this instruction when implementing rdtscp. It returns the same ID
result in a register, just like rdtscp, but without the cycle counter
results. Like rdtscp, it doesn't touch any flags.
This unit test recreates the error condition that #3478 causes.
When the string operation is a backwards copy, the optimization reads
past the end of the page, resulting in a crash.
This seemingly only happens with backwards string operations, but the
test covers both forward and backward copies.
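A common way to make an out-of-bounds read crash deterministically is a guard page: map two pages and make the second one inaccessible. The C++ sketch below is hypothetical and Linux/POSIX-only. It is not the actual unit test. It uses a byte-wise model of a backwards (DF=1) `rep movsb` whose source ends on the last accessible byte. A correct copy finishes, and any read past the page faults.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

// Hypothetical helper: maps two pages and makes the second inaccessible, so
// any access past the end of the first page faults immediately.
std::uint8_t* map_with_guard_page(std::size_t page_size) {
  void* mem = mmap(nullptr, 2 * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return nullptr;
  auto* base = static_cast<std::uint8_t*>(mem);
  if (mprotect(base + page_size, page_size, PROT_NONE) != 0) {
    munmap(mem, 2 * page_size);
    return nullptr;
  }
  return base;
}

// Byte-wise model of a backwards rep movsb: starts at the last byte of each
// range and walks downwards, `count` bytes in total.
void backward_copy(std::uint8_t* dst_last, const std::uint8_t* src_last,
                   std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    *(dst_last - i) = *(src_last - i);
  }
}
```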
x86 has a few prefetch instructions.
- prefetch - One of two classic 3DNow! instructions
  - Prefetch into L1 data cache
- prefetchw - One of two classic 3DNow! instructions
  - Implies prefetch into L1 data cache
  - Prefetch cacheline with intent to write and exclusive ownership
- prefetchnta
  - Prefetch non-temporal data with respect to /all/ cache levels
  - Assumes inclusive caches?
- prefetch{t0,t1,t2}
  - Prefetch data with respect to each cache level
  - T0 = L1 and higher
  - T1 = L2 and higher
  - T2 = L3 and higher
**Some silly duplicates**
- prefetchwt1
  - Duplicate of prefetchw but explicitly L1 data cache
- prefetch_exclusive
  - Duplicate of prefetch
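One plausible way to lower the listed variants to AArch64 is a table onto `PRFM` prefetch operations. The C++ mapping below is a hypothetical sketch and is not necessarily what the JIT emits. `PLD`/`PST` mean prefetch for load/store, `L1`/`L2`/`L3` is the target level, and `KEEP`/`STRM` is temporal vs. streaming. The duplicates follow the notes in the list.

```cpp
#include <string_view>

// The x86 prefetch variants from the list (hypothetical enum).
enum class X86Prefetch {
  Prefetch, PrefetchW, PrefetchNTA, PrefetchT0, PrefetchT1, PrefetchT2,
  PrefetchWT1, PrefetchExclusive
};

// Hypothetical lowering to AArch64 PRFM prefetch operation names.
constexpr std::string_view to_prfm(X86Prefetch op) {
  switch (op) {
    case X86Prefetch::Prefetch:
    case X86Prefetch::PrefetchExclusive:  // duplicate of prefetch
    case X86Prefetch::PrefetchT0:         // L1 and higher
      return "PLDL1KEEP";
    case X86Prefetch::PrefetchW:
    case X86Prefetch::PrefetchWT1:        // treated as duplicate of prefetchw
      return "PSTL1KEEP";                 // intent to write
    case X86Prefetch::PrefetchNTA:
      return "PLDL1STRM";                 // non-temporal / streaming
    case X86Prefetch::PrefetchT1:
      return "PLDL2KEEP";
    case X86Prefetch::PrefetchT2:
      return "PLDL3KEEP";
  }
  return "PLDL1KEEP";  // unreachable for valid enumerators
}
```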
God of War (2018) uses prefetchw as a hint for exclusive ownership of the
cacheline in some very aggressive spin-loops. Let's implement these
operations to help it along.
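The kind of spin-loop described above can be sketched in C++ with the GCC/Clang builtin `__builtin_prefetch(addr, 1)`, which requests a prefetch with write intent. On x86 targets that support it, the compiler may emit `prefetchw` for this. The `SpinLock` class is a hypothetical test-and-test-and-set lock. It is not taken from the game.

```cpp
#include <atomic>

// Hypothetical test-and-test-and-set spinlock (GCC/Clang only, because of
// __builtin_prefetch). The write-intent prefetch hints that this core wants
// the cacheline in an exclusive state before the atomic exchange.
class SpinLock {
 public:
  void lock() {
    for (;;) {
      __builtin_prefetch(&locked_, 1);  // rw = 1: prefetch with intent to write
      if (!locked_.exchange(true, std::memory_order_acquire)) return;
      while (locked_.load(std::memory_order_relaxed)) {
        // Spin on plain loads until the lock looks free, then retry.
      }
    }
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};
```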
This function can be unit-tested more easily, and the stack special case is
more cleanly handled as a post-collection step.
There is a minor functional change: previously, the stack special case didn't
trigger if the range end was within the stack mapping. This is now fixed.
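The functional change amounts to a start-only check versus a full overlap check. The C++ sketch below assumes that and that ranges are half-open `[begin, end)`. `Range`, `starts_in` and `overlaps` are hypothetical names, not the actual function.

```cpp
#include <cstdint>

// Hypothetical half-open address range [begin, end).
struct Range {
  std::uint64_t begin;
  std::uint64_t end;
};

// Start-only check: misses ranges that begin below the stack mapping and
// end inside it.
bool starts_in(const Range& r, const Range& stack) {
  return r.begin >= stack.begin && r.begin < stack.end;
}

// Overlap check: true if the ranges share at least one byte, whichever end
// of `r` falls inside the stack mapping.
bool overlaps(const Range& r, const Range& stack) {
  return r.begin < stack.end && stack.begin < r.end;
}
```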