Just like #3508, clang-18 complains about VLA usage.
This vector is relatively small, only around 18 elements, but its size
is semi-dynamic depending on the arch and whether FEXCore is targeting
Linux or Win32.
It has been a long time coming that FEX no longer needs to leak IR
implementation details to the frontend. This was a legacy of IR CI and
various other problems.
Now that the last bits of IR leaking have been removed, move everything
that we can internally to the implementation.
The IR.h exposed to the frontend still has a couple of minor details,
but these are limited to a few enums and some thunking struct
information rather than all the implementation details.
No functional change with this, just moving headers around.
The FEXCore includes were including an FHU header, which would result in
compilation failure for external projects trying to link to libFEXCore.
Moves it over to fix this; it was the only FHU usage in FEXCore/include.
NFC
This no longer needs to be part of the public API. Moves the
header internally.
Needed to pass through `IsAddressInCodeBuffer` from CPUBackend through
the Context object, but otherwise no functional change.
In the old case:
* if we take the branch, 1 instruction
* if we don't take the branch, 3 instructions
* branch predictor fun
* 3 instructions of icache pressure
In the new case:
* unconditionally 2 instructions
* no branch predictor dependence
* 2 instructions of icache pressure
This should not be non-negligibly worse, and it simplifies things for RA.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
exhaustively checked against the Intel pseudocode since this is tricky:
    def intel(AL, CF, AF):
        old_AL = AL
        old_CF = CF
        CF = False

        if (AL & 0x0F) > 9 or AF:
            Borrow = AL < 6
            AL = (AL - 6) & 0xff
            CF = old_CF or Borrow
            AF = True
        else:
            AF = False

        if (old_AL > 0x99) or old_CF:
            AL = (AL - 0x60) & 0xff
            CF = True

        return (AL & 0xff, CF, AF)

    def fex(AL, CF, AF):
        AF = AF | ((AL & 0xf) > 9)
        CF = CF | (AL > 0x99)
        NewCF = CF | (AF if (AL < 6) else CF)
        AL = (AL - 6) if AF else AL
        AL = (AL - 0x60) if CF else AL
        return (AL & 0xff, NewCF, AF)

    for AL in range(256):
        for CF in [False, True]:
            for AF in [False, True]:
                ref = intel(AL, CF, AF)
                test = fex(AL, CF, AF)
                print(AL, "CF" if CF else "", "AF" if AF else "", ref, test)
                assert(ref == test)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Based on https://www.righto.com/2023/01/
The new implementation is branchless, which is theoretically easier to RA. It's
also massively simpler, which is good for a demon opcode.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Since we do an immediate overwrite of the file we are copying, we can
instead do a rename. Failure on rename is fine: it either means the
telemetry file didn't exist initially, or there was some other permission
error, so the telemetry would get lost regardless.
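
A minimal sketch of the rename-instead-of-copy idea, in Python rather than
FEX's own C++. The function name, the `.1` backup suffix and the
`app.telem` file name are hypothetical and only serve the illustration:

```python
import os
import tempfile


def rotate_telemetry(path: str) -> bool:
    """Move an existing telemetry file aside; return True if it was moved."""
    try:
        # os.replace renames in place and overwrites any previous backup.
        os.replace(path, path + ".1")
        return True
    except OSError:
        # No file yet, or a permission problem: the old data is lost either
        # way, so the caller just carries on and writes a fresh file.
        return False


with tempfile.TemporaryDirectory() as tmp:
    telem = os.path.join(tmp, "app.telem")  # hypothetical file name
    print(rotate_telemetry(telem))          # nothing to move yet
    with open(telem, "w") as f:
        f.write("old data")
    print(rotate_telemetry(telem))          # file is now app.telem.1
```

The rename avoids reading and rewriting the old contents, and the failure
path needs no special handling because the file is overwritten right after.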
This may be useful for tracking TSO faulting when it manages to fetch
stale data. While most TSO crashes are due to nullptr dereferences, this
can still check for the corruption case.
In 64-bit mode, the LOOP instruction's RCX register usage is 64-bit or
32-bit.
In 32-bit mode, the LOOP instruction's RCX register usage is 32-bit or
16-bit.
FEX wasn't handling the 16-bit case at all, which was causing the LOOP
instruction to effectively always operate at 32-bit size. Now this is
correctly supported, and it also stops treating the operation as 64-bit.
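
To illustrate why the counter width matters, here is a small Python model of
a single LOOP step. It only models the counter view of the register (it does
not model what happens to the upper register bits), and `loop_step` is a
hypothetical helper, not FEX code:

```python
def loop_step(reg: int, counter_bits: int) -> tuple[int, bool]:
    """Decrement the low `counter_bits` of `reg`.

    Returns (new counter value, whether the branch is taken).
    """
    mask = (1 << counter_bits) - 1
    count = ((reg & mask) - 1) & mask  # wraps around within the counter width
    return count, count != 0


ecx = 0x0001_0001
# 16-bit counter: CX is 1, so it reaches zero and the loop exits.
print(loop_step(ecx, 16))
# 32-bit counter: ECX is 0x10001, so the loop keeps going.
print(loop_step(ecx, 32))
```

With the same register contents, the 16-bit and 32-bit interpretations
disagree on whether the branch is taken, which is the kind of mismatch that
treating every LOOP as 32-bit produces.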
It was a funny joke that this was here, but it is fundamentally
incompatible with what we're doing. All those users are running proot
anyway because of how broken running under termux directly is.
Just remove this from here.
Take e.g. a forward rep movsb copy from addr 0 to 1. The expected
behaviour, since this is a bytewise copy, is:
before: aaabbbb...
after: aaaaaaa...
but by copying in 32-byte chunks we end up with:
after: aaaabbbb...
due to the self-overwrites not occurring within a single 32-byte copy.
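
The difference can be reproduced with a short Python model. Both helpers are
hypothetical stand-ins: `copy_bytewise` mimics the architectural byte-by-byte
behaviour, while `copy_chunked` loads a whole chunk before storing it, as a
wide load/store pair would:

```python
def copy_bytewise(mem: bytearray, dst: int, src: int, n: int) -> None:
    """Forward copy one byte at a time, so later reads see earlier writes."""
    for i in range(n):
        mem[dst + i] = mem[src + i]


def copy_chunked(mem: bytearray, dst: int, src: int, n: int, chunk: int) -> None:
    """Forward copy in chunks; each chunk is fully loaded before it is stored."""
    for off in range(0, n, chunk):
        size = min(chunk, n - off)
        block = bytes(mem[src + off : src + off + size])
        mem[dst + off : dst + off + size] = block


expected = bytearray(b"aaabbbbb")
copy_bytewise(expected, dst=1, src=0, n=7)
print(expected)  # bytearray(b'aaaaaaaa')

actual = bytearray(b"aaabbbbb")
copy_chunked(actual, dst=1, src=0, n=7, chunk=32)
print(actual)    # bytearray(b'aaaabbbb')
```

Because the chunked version reads the source before any of its own writes
land, the first byte is never propagated through the rest of the buffer.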