We can fold the Not into the And. This requires flipping the arguments
to Andn, but we do not flip the order of the assignments since that
requires an extra register in a test I'm looking at.
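A minimal sketch of the identity behind this fold. The `Andn` and `AndOfNot` helpers are hypothetical models, and the sketch assumes `Andn(a, b)` computes `a & ~b`; the real IR operand order may differ.

```cpp
#include <cstdint>

// Hypothetical model of an Andn op: first operand AND NOT second operand.
constexpr uint64_t Andn(uint64_t a, uint64_t b) { return a & ~b; }

// The unfolded form: a separate Not feeding an And.
constexpr uint64_t AndOfNot(uint64_t x, uint64_t y) { return ~x & y; }

// Folding And(Not(x), y) gives Andn(y, x): the arguments are flipped.
static_assert(AndOfNot(0b1100, 0b1010) == Andn(0b1010, 0b1100),
              "fold must preserve the result");
```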
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The WIN32 headers already have a define called `GetObject`, which causes
our symbol to have an A appended to it and breaks linking.
Just rename it to `GetTelemetryValue`.
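A small sketch of why the clash happens. The `#define` below is a stand-in for what the Windows headers do in a non-UNICODE build, and the `STRINGIFY` helpers and the two functions are hypothetical, used only to make the expansion visible.

```cpp
#include <string>

// Stand-in for the Win32 header macro; the real headers pick an A or W suffix.
#define GetObject GetObjectA

#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// The preprocessor rewrites every use of the identifier, including ours.
inline std::string ExpandedName() { return STRINGIFY(GetObject); }

// A name with no matching macro survives untouched.
inline std::string RenamedName() { return STRINGIFY(GetTelemetryValue); }
```

Any translation unit that sees the macro emits the suffixed name, while one that does not emits the plain name, so the two no longer link against each other.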
Noticed during introspection that we were generating zero constants
redundantly. Each one costs a single cycle hit or a zero-register rename.
Every time a `SetRFLAG` helper was called, it was /always/ doing a BFE
on everything passed in to extract the lowest bit. In nearly all cases
the data getting passed in is already only the lowest bit.
Instead, stop the helper from doing this BFE, and ensure the
OpcodeDispatcher does BFE in the couple of cases it still needs to do.
As I was skimming through all these to ensure BFE isn't necessary, I did
notice that some of the BCD instructions are wrong or questionable. So I
left a comment on those so we can come back to it.
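A rough sketch of the change described above. `Bfe`, `SetFlagMasked` and `SetFlagRaw` are hypothetical stand-ins, not the actual `SetRFLAG` helper; they only illustrate moving the bit extraction from the helper to the few callers that need it.

```cpp
#include <cstdint>

// Hypothetical bitfield extract: `width` bits starting at bit `lsb`.
constexpr uint64_t Bfe(uint64_t value, unsigned lsb, unsigned width) {
  const uint64_t mask = width >= 64 ? ~0ULL : ((1ULL << width) - 1);
  return (value >> lsb) & mask;
}

// Old behaviour: the helper always extracts the lowest bit.
constexpr uint64_t SetFlagMasked(uint64_t value) { return Bfe(value, 0, 1); }

// New behaviour: the helper trusts the caller to pass only the lowest bit.
constexpr uint64_t SetFlagRaw(uint64_t value) { return value; }

// A caller whose input may carry extra bits now extracts explicitly.
constexpr uint64_t CallerWithWideValue(uint64_t wide) {
  return SetFlagRaw(Bfe(wide, 0, 1));
}
```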
These address calculations were failing to understand that they can be
optimized. When TSO emulation is disabled these were fine, but with TSO
we were eating one more instruction.
Before:
```
add x20, x12, #0x4 (4)
dmb ish
ldr s16, [x20]
dmb ish
```
After:
```
dmb ish
ldr s16, [x12, #4]
dmb ish
```
Also left a note that once LRCPC3 is supported in hardware, we can do a similar optimization there.
When this instruction returns the index into the ecx register, this is
defined as a 32-bit result. This means it actually gets zero-extended to
the full 64-bit GPR size on 64-bit processes.
Previously FEX was doing a 32-bit insert, which leaves garbage data in
the upper 32 bits of the RCX register.
Adds a unit test to ensure the result is zero extended.
Fixes running Java games under FEX now that SSE4.2 is exposed.
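A minimal sketch of the difference between the two behaviours. `Insert32` and `ZeroExtend32` are hypothetical helpers that model the register write, not FEX code.

```cpp
#include <cstdint>

// Old behaviour: insert the low 32 bits and keep whatever was in the top half.
constexpr uint64_t Insert32(uint64_t old_rcx, uint32_t result) {
  return (old_rcx & 0xFFFFFFFF00000000ULL) | result;
}

// Expected behaviour: a 32-bit result zero-extends to the full 64-bit register.
constexpr uint64_t ZeroExtend32(uint32_t result) {
  return static_cast<uint64_t>(result);
}
```

With stale data in the upper half of RCX, the insert keeps that data while the zero extension clears it.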
ARM64 BFI doesn't allow you to encode two source registers here to match
our SSA semantics. Also since we don't support RA constraints to ensure
that these match, just do the optimal case in the backend.
Leave a comment for future RA constraint excavators to make this more
optimal.
In libstdc++ version 13, they moved the implementation of
`polymorphic_allocator` to `bits/memory_resource.h`.
In doing so they forgot to move the template's default argument to that
header. This causes the problem that `bits/memory_resource.h` is
included first without the template's default argument defined. This
breaks the automatic type deduction of `std::byte`.
Still broken in
[upstream](be240fc6ac/libstdc%2B%2B-v3/include/std/memory_resource (L79-L83))
and is unlikely to be fixed and backported. Since this is the only place
we use this type, just fix it here.
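One way to sidestep a missing default argument is to spell the argument out. The sketch below assumes that approach; the `ByteAllocator` alias is hypothetical and this is not necessarily the exact fix applied here.

```cpp
#include <cstddef>
#include <memory_resource>
#include <type_traits>

// Hypothetical alias: name std::byte explicitly instead of relying on the
// template's default argument being visible.
using ByteAllocator = std::pmr::polymorphic_allocator<std::byte>;

static_assert(std::is_same_v<ByteAllocator::value_type, std::byte>,
              "explicit argument matches the intended default");
```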
When a fork occurs FEX needs to be incredibly careful as any thread
(that isn't forking) that holds a lock will vanish when the fork occurs.
At this point if the newly forked process tries to use these mutexes
then the process hangs indefinitely.
The three major mutexes that need to be held during a fork:
- Code Invalidation mutex
  - This is the highest priority and causes us to hang frequently.
  - This is highly likely to occur when one thread is loading shared
    libraries and another thread is forking.
  - Happens frequently with Wine and Steam.
- VMA tracking mutex
  - This one happens when one thread is allocating memory while a fork
    occurs.
  - This closely relates to the code invalidation mutex, just happens at
    the syscall layer instead of the FEXCore layer.
  - Happens as frequently as the code invalidation mutex.
- Allocation mutex
  - This mutex is used for FEX's 64-bit Allocator. The hang happens when
    FEX is allocating memory on one thread and a fork occurs.
  - Fairly infrequent because jemalloc doesn't allocate VMA regions that
    often.
While this likely doesn't hit all of the FEX mutexes, this hits the ones
that are burning fires and are happening frequently.
- FEXCore: Adds forkable mutex/locks
Necessary since we have a few locations in FEX that need to be locked
before and after a fork.
When a fork occurs the locks must be locked prior to the fork. Then
afterwards they either need to unlock or be set to default
initialization state.
- Parent
  - Does an unlock
- Child
  - Sets the lock to default initialization state
    - This is because pthreads does TID based ownership checking on
      unique locks and refcount based waiting for shared locks.
    - No way to "unlock" after fork in this case other than default
      initializing.
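The parent/child handling above can be sketched roughly as follows. `ForkableMutexSketch`, `ResetAfterForkInChild` and `ForkWithLockHeld` are hypothetical names for illustration, not the FEXCore API, and the sketch only covers a unique (non-shared) lock.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical forkable mutex built on a plain pthread mutex.
class ForkableMutexSketch {
public:
  void lock() { pthread_mutex_lock(&mutex_); }
  void unlock() { pthread_mutex_unlock(&mutex_); }
  bool try_lock() { return pthread_mutex_trylock(&mutex_) == 0; }

  // Child side: discard the inherited ownership state by overwriting the
  // mutex with a freshly default-initialized one.
  void ResetAfterForkInChild() {
    pthread_mutex_t fresh = PTHREAD_MUTEX_INITIALIZER;
    mutex_ = fresh;
  }

private:
  pthread_mutex_t mutex_ = PTHREAD_MUTEX_INITIALIZER;
};

// Take the lock before forking so no other thread can hold it across fork.
inline pid_t ForkWithLockHeld(ForkableMutexSketch& mutex) {
  mutex.lock();
  const pid_t pid = fork();
  if (pid == 0) {
    mutex.ResetAfterForkInChild();  // child: default initialize
  } else {
    mutex.unlock();                 // parent (or failed fork): plain unlock
  }
  return pid;
}
```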
Fixes a spurious `No such file or directory` error when `ls` is trying
to query a path's xattributes that come from the emulated rootfs.
These syscalls don't support the *at variants, so they can't use the optimized `GetEmulatedFDPath` implementation.
They must also return an error on a found file path, which makes their
implementation slightly different from the other users of
`GetEmulatedPath`. In the case of error, they must only return an error
from the emulated path if it is /not/ ENOENT.
Before:
```
$ FEXInterpreter /usr/bin/ls -alth /usr/bin/wine-stable
/usr/bin/ls: /usr/bin/wine-stable: No such file or directory
-rwxr-xr-x 1 ryanh ryanh 1.1K Sep 24 2022 /usr/bin/wine-stable
```
After:
```
$ FEXInterpreter /usr/bin/ls -alth /usr/bin/wine-stable
-rwxr-xr-x 1 ryanh ryanh 1.1K Sep 24 2022 /usr/bin/wine-stable
```
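The ENOENT rule described above can be sketched like this. `XattrCall` and `QueryXattrWithRootfsFallback` are hypothetical names, and the sketch assumes the raw-syscall convention of returning a non-negative value on success or a negated errno on failure.

```cpp
#include <cerrno>
#include <functional>
#include <string>

// Hypothetical wrapper around an xattr syscall: >= 0 on success, -errno on error.
using XattrCall = std::function<long(const std::string&)>;

// Try the emulated rootfs path first. Its result is final unless the file is
// simply missing there (ENOENT), in which case fall back to the host path.
inline long QueryXattrWithRootfsFallback(const XattrCall& call,
                                         const std::string& emulated_path,
                                         const std::string& host_path) {
  if (!emulated_path.empty()) {
    const long result = call(emulated_path);
    if (result != -ENOENT) {
      return result;  // success, or a real error on a file that exists
    }
  }
  return call(host_path);
}
```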