At the moment we always run ctest with the maximum number of CPUs. If
TEST_JOB_COUNT is undefined, the current behaviour is kept; otherwise
ctest honours TEST_JOB_COUNT.
Therefore to run ctest one test at a time, use
`cmake ... -DTEST_JOB_COUNT=1`
Found out that Far Cry uses this instruction and it is viable to use in
CPL-3. This only returns constant data but its behaviour is a little
quirky.
This instruction has a weird behaviour where the 32-bit operation does an
insert into the 64-bit destination, which might be an Intel versus AMD
difference. I don't have an Intel machine available to test whether that
theory is true, though. This assumption would match similar behaviour
where segment registers are inserted instead of zero-extended.
Gets the game farther but then it crashes in a `___ascii_strnicmp`
function where the arguments end up being `___ascii_strnicmp(nullptr, "Color", 5);`.
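
To make the insert-versus-zero-extend distinction concrete, here is a minimal sketch of the two write rules. The helper names and the sample values are hypothetical and are not taken from FEX or from the instruction in question.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write32_zext(dest64: int, value32: int) -> int:
    """Usual x86-64 rule: a 32-bit register write clears bits 63:32."""
    return value32 & MASK32


def write32_insert(dest64: int, value32: int) -> int:
    """Quirky rule described above: bits 63:32 of the destination survive."""
    upper = dest64 & (MASK64 & ~MASK32)
    return upper | (value32 & MASK32)


# Arbitrary illustrative values, not real instruction results.
old = 0x1122_3344_5566_7788
new = 0x8005_0033
print(hex(write32_zext(old, new)))    # 0x80050033
print(hex(write32_insert(old, new)))  # 0x1122334480050033
```

A test that preloads the destination with a known 64-bit pattern can tell the two behaviours apart by checking the upper half afterwards.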
Folds reg+const memory address into addressing mode,
if the constant is within 16Kb.
Update instcountci files.
Add test 32Bit_ASM/FEX_bugs/SubAddrBug.asm
Doesn't quite match the libc code directly because it uses `[gs:eax]`
with both having the sign bit set and we can't deal with that with ASM
tests. So match the behaviour in a different way.
When the source arguments for LoadMem/StoreMem have bit 31 set then they
are incorrectly sign extending in some instances.
Detected this when testing #3421 but I don't have a proper fix for it.
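
As a rough illustration of why bit 31 matters, the sketch below compares zero extension and sign extension of a 32-bit address to 64 bits. The function names and the sample address are assumptions for illustration only and do not reflect how LoadMem/StoreMem are implemented.

```python
def zext32(value: int) -> int:
    """Zero-extend a 32-bit value to 64 bits (what a 32-bit guest expects)."""
    return value & 0xFFFF_FFFF


def sext32(value: int) -> int:
    """Sign-extend a 32-bit value to 64 bits (the incorrect behaviour)."""
    value &= 0xFFFF_FFFF
    if value & 0x8000_0000:
        return value | 0xFFFF_FFFF_0000_0000
    return value


addr = 0x8000_1000  # hypothetical guest address with bit 31 set
print(hex(zext32(addr)))  # 0x80001000
print(hex(sext32(addr)))  # 0xffffffff80001000
```

Addresses below 0x80000000 come out identical under both rules, which is why the problem only shows up once bit 31 is set.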
When a 32-bit imul was executed, it had a chance of returning
garbage data in the upper 32 bits of the 64-bit result.
While this didn't typically cause problems, it is exacerbated by
32-bit applications executing multiplies for address calculations.
A combination of commits 714669136086cee0d2cc4dfb479e26b204206c37 and
d01b457727208fd34511d48e850e3b4a33d76147 exposed this problem. Previously
there would be multiple moves between the calculation and the
data use, which would have zeroed the upper bits for us.
Now that we are no longer doing that, we need to make sure the opcode
dispatcher doesn't generate broken code instead.
Fixes Dungeon Defenders, which hasn't worked since FEX-2308.
Adds an ASM test that ensures we don't break it again.
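
One way the upper bits can end up non-zero is sketched below: the product is computed at 64-bit width and never truncated. This is only a model under that assumption, with hypothetical helper names; it is not the opcode dispatcher's actual code.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def imul32_untruncated(a: int, b: int) -> int:
    """64-bit multiply whose upper half is left in the result register."""
    return (a * b) & MASK64


def imul32_truncated(a: int, b: int) -> int:
    """32-bit result with bits 63:32 cleared, as a 32-bit guest expects."""
    return (a * b) & MASK32


# The low 32 bits of a product are the same for signed and unsigned inputs,
# so plain non-negative integers are enough for the illustration.
index, stride = 0x0001_0000, 0x0001_0000
base = 0x1000  # hypothetical base address

print(hex(base + imul32_truncated(index, stride)))    # 0x1000
print(hex(base + imul32_untruncated(index, stride)))  # 0x100001000
```

When the result feeds an address calculation directly, the stray upper bits push the effective address outside the 32-bit range.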
This PR has a bug around flags calculation and REP LODS{B,W,D,Q}.
This currently passes on main but fails on #3162.
The bug only occurs in 32-bit mode, not in 64-bit mode, with the same test.
Should help diagnose the bugs in #3162.
It is scarcely used today, and like the x86 jit, it is a significant
maintenance burden complicating work on FEXCore and arm64 optimization. Remove
it, bringing us down to 2 backends.
1 down, 1 to go.
Some interpreter scaffolding remains for x87 fallbacks. That is not a problem
here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This needs to default to 64-bit addresses. It was previously
defaulting to 32-bit, which meant the destination address was
getting truncated. In a 32-bit process the address is still 32-bit.
I'm actually surprised this hasn't caused spurious SIGSEGV before this
point.
Adds a 32-bit test to ensure that side is tested as well.
This is an undocumented but supported instruction. It behaves just like
an `sbb al, al` but doesn't set flags and is one byte shorter.
The end result is that al is set to 0xFF or 0 depending on if CF is set
or not.
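
The described result can be modelled in a few lines. This is a minimal sketch with a hypothetical `salc` helper that takes the full register value and the carry flag; flags are not modelled because the instruction leaves them untouched.

```python
def salc(rax: int, cf: bool) -> int:
    """Return the register value after the instruction.

    AL becomes 0xFF when CF is set and 0x00 otherwise; every bit above AL
    is preserved.
    """
    al = 0xFF if cf else 0x00
    return (rax & ~0xFF) | al


print(hex(salc(0x1234, True)))   # 0x12ff
print(hex(salc(0x1234, False)))  # 0x1200
```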
All of these operations were only tested with positive integers, which is
why they didn't show 16-bit failures.
Adds a bunch of negative tests to each one now that #2314 is merged,
which would have caught them.
32-bit GOT calculation needs to do a call+pop to get the EIP.
LEA doesn't work because there are no EIP-relative ops on 32-bit, unlike on
x86-64.
This causes a terrible block split on every GOT calculation without the
optimization in place.
Now the block can continue through this weird GOT calculation.
This will be worthwhile for our 32-bit thunks where for some reason the
GOT calculation can't be removed. The GOT is calculated even though it
isn't used.
The Zen+ CI runner doesn't support the UMIP hardware feature, so it
doesn't hit the kernel emulated path.
Instead the instruction returns real data on this hardware. Still in
kernel space, so it is unmapped as expected.
Instead of relying on runner features, classify based on CPU features.
This fixes an annoying issue where running unit tests locally without
it set results in an unexpected failure.
Fixes #1807
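
A minimal sketch of classifying by CPU feature is shown below. It assumes the Linux `/proc/cpuinfo` text format and its `umip` flag name; the helper names and the sample text are hypothetical, and this is not how the test harness itself performs the check.

```python
def has_cpu_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if `flag` appears in the first `flags` line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return flag in value.split()
    return False


def expected_result_kind(cpuinfo_text: str) -> str:
    """Pick which expectation applies: kernel-emulated or real data."""
    return "kernel-emulated" if has_cpu_flag(cpuinfo_text, "umip") else "real-data"


# Hand-written sample lines, not output captured from a real machine.
with_umip = "processor\t: 0\nflags\t\t: fpu vme sse2 umip avx2\n"
without_umip = "processor\t: 0\nflags\t\t: fpu vme sse2 avx2\n"
print(expected_result_kind(with_umip))     # kernel-emulated
print(expected_result_kind(without_umip))  # real-data
```

On a real machine the text would come from reading `/proc/cpuinfo`, so the expectation follows the hardware rather than a manually set option.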
Was incorrectly setting the FCW to 037h when it was supposed to be
037Fh.
Fixes a bug in a visual novel where its CPUID state wouldn't initialize
if this was set incorrectly.
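
Decoding the two values shows how much the missing digit changes. The sketch below uses the architectural x87 control word layout (exception masks in bits 0 to 5, precision control in bits 8 to 9, rounding control in bits 10 to 11); the `decode_fcw` helper is hypothetical.

```python
def decode_fcw(fcw: int) -> dict:
    """Split an x87 control word into its main fields."""
    return {
        "exception_masks": fcw & 0x3F,          # bits 0-5
        "precision_control": (fcw >> 8) & 0x3,  # bits 8-9
        "rounding_control": (fcw >> 10) & 0x3,  # bits 10-11
    }


print(decode_fcw(0x037F))
# {'exception_masks': 63, 'precision_control': 3, 'rounding_control': 0}
print(decode_fcw(0x037))
# {'exception_masks': 55, 'precision_control': 0, 'rounding_control': 0}
```

With 037Fh all six exceptions are masked and precision control selects 64-bit extended precision. With 037h the overflow exception is left unmasked and precision control selects 24-bit single precision.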
I misread the implementation details of this instruction when
implementing.
The pseudocode says `ST(0) = ST(0) ∗ 2^rndint(ST(1))` so I understood
the instruction to use the current rounding mode of the host to extract
the integer portion of `ST(1)`.
The actual implementation is in the details of the statement `the
integer portion of the floating-point value in ST(1).`
This behaves like round towards zero (truncate); additional hardware
testing and documentation reading confirm this.
Fixes #1584
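
The difference between the two readings can be sketched for finite inputs as follows. The function names are hypothetical, and NaN, infinity, and out-of-range exponents are deliberately not handled.

```python
import math


def fscale_truncate(st0: float, st1: float) -> float:
    """Scale ST(0) by 2 to the integer portion of ST(1), truncating."""
    return math.ldexp(st0, math.trunc(st1))


def fscale_round_nearest(st0: float, st1: float) -> float:
    """The misread version: round ST(1) to nearest before scaling."""
    return math.ldexp(st0, round(st1))


print(fscale_truncate(1.0, 1.7), fscale_round_nearest(1.0, 1.7))    # 2.0 4.0
print(fscale_truncate(1.0, -1.7), fscale_round_nearest(1.0, -1.7))  # 0.5 0.25
```

Any `ST(1)` with a fractional part above one half, positive or negative, distinguishes the two, so such values make useful test inputs.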