The Vulkan specification states that querying "global commands" like
vkCreateInstance with a non-NULL instance is undefined behavior. Indeed, some
implementations will return null pointers in such cases.
Instead, we can drop the query from DoSetupWithInstance altogether, since
the library initializer already loads the function pointer with dlsym.
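A minimal sketch of the dlsym approach (illustrative, not FEX's actual
code; the loader path and helper name are assumptions):

#include <dlfcn.h>
#include <vulkan/vulkan.h>

// Resolve the global entry point straight from the loader library instead
// of querying vkGetInstanceProcAddr with a non-NULL instance.
static PFN_vkCreateInstance LoadVkCreateInstance() {
  void *Loader = dlopen("libvulkan.so.1", RTLD_NOW);  // assumed loader name
  if (!Loader) {
    return nullptr;
  }
  return reinterpret_cast<PFN_vkCreateInstance>(
      dlsym(Loader, "vkCreateInstance"));
}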
Fixes #3519.
Exhaustively checked against the Intel pseudocode since this is tricky:
def intel(AL, CF, AF):
    # Direct transcription of the Intel SDM pseudocode for DAS.
    old_AL = AL
    old_CF = CF
    CF = False
    if (AL & 0x0F) > 9 or AF:
        Borrow = AL < 6
        AL = (AL - 6) & 0xff
        CF = old_CF or Borrow
        AF = True
    else:
        AF = False
    if (old_AL > 0x99) or old_CF:
        AL = (AL - 0x60) & 0xff
        CF = True
    return (AL & 0xff, CF, AF)

def fex(AL, CF, AF):
    # Branchless reformulation used by FEX.
    AF = AF | ((AL & 0xf) > 9)
    CF = CF | (AL > 0x99)
    NewCF = CF | (AF if (AL < 6) else CF)
    AL = (AL - 6) if AF else AL
    AL = (AL - 0x60) if CF else AL
    return (AL & 0xff, NewCF, AF)

# Exhaustive check over every AL/CF/AF combination.
for AL in range(256):
    for CF in [False, True]:
        for AF in [False, True]:
            ref = intel(AL, CF, AF)
            test = fex(AL, CF, AF)
            print(AL, "CF" if CF else "", "AF" if AF else "", ref, test)
            assert(ref == test)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Based on https://www.righto.com/2023/01/
The new implementation is branchless, which is theoretically easier to RA.
It's also massively simpler, which is good for a demon opcode.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Same situation as the last stack memory leak fix, this is fairly tricky
since it deals with stack pivoting. Fixes the memory leak around
pthread stack allocations, lowering memory usage for applications
that constantly spin up and destroy threads (like Steam).
We need to let glibc allocate a minimum-sized stack (128KB, which we
can't control) to work around a race condition with DTV/TLS regions.
This means we need to do a stack pivot once the thread starts executing.
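Roughly, the pivot looks like this on AArch64 (a sketch with assumed
names; FEX's real code differs, and error handling is elided):

#include <sys/mman.h>
#include <cstddef>

extern "C" void RunThread(void *Arg);  // hypothetical entry point, never returns

[[noreturn]] void PivotToPrivateStack(void *Arg, size_t Size) {
  // glibc owns the small startup stack; map the stack we actually want.
  void *Stack = mmap(nullptr, Size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
  void *Top = static_cast<char *>(Stack) + Size;  // stacks grow downward
  // After moving sp, the old glibc stack must never be touched again, so
  // branch straight into the real entry point instead of returning.
  asm volatile(
      "mov x0, %[arg]\n"
      "mov sp, %[top]\n"
      "b RunThread\n"
      : : [arg] "r"(Arg), [top] "r"(Top)
      : "x0");
  __builtin_unreachable();
}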
We also need to be careful because the `PThread` object is deleted
inside the execution thread, which was resulting in a use-after-free
bug.
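The shape of the fix, with hypothetical names throughout:

#include <cstddef>

struct PThread {  // stand-in for the real object
  void *StackBase;
  size_t StackSize;
};

void UnmapStackAndExit(void *Base, size_t Size);  // hypothetical helper

void ExitThread(PThread *Thread) {
  // The object is deleted on the very thread it describes, so copy out
  // anything the exit path still needs before the delete.
  void *StackBase = Thread->StackBase;
  size_t StackSize = Thread->StackSize;
  delete Thread;
  // Only locals from here on; touching Thread again would be the use-after-free.
  UnmapStackAndExit(StackBase, StackSize);
}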
There are definitely more memory leaks that I'm still fighting, and I
have noticed with my abusive thread-creation program that we might want
to change some jemalloc options to cut down on residency more
aggressively (a sketch of those knobs follows). This fix is just one of many.
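For reference, jemalloc 5 exposes its decay behaviour through the
malloc_conf symbol; whether these particular values are the right
trade-off for FEX is an open question:

// Read by jemalloc at initialization. Zero decay times return dirty and
// muzzy pages to the OS immediately, trading CPU for lower residency.
extern "C" const char *malloc_conf = "dirty_decay_ms:0,muzzy_decay_ms:0";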
I remember seeing some application last year that closed a FEX-owned
FD, but I don't remember what it was. This can really mess us up, so
add some debug tracking (sketched below) so we can try to find it again.
Might be something specifically around Flatpak, AppImage, or Chrome's
sandbox. I have some ideas about how to work around these problems if
they crop up, but I need to find the problem applications again.
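A sketch of what the debug tracking could look like, assuming the guest
close() path already funnels through a FEX handler (all names hypothetical):

#include <cstdio>
#include <mutex>
#include <unordered_set>

static std::mutex OwnedFDLock;
static std::unordered_set<int> OwnedFDs;  // FDs FEX opened for its own use

void TrackFEXOwnedFD(int FD) {
  std::lock_guard lk(OwnedFDLock);
  OwnedFDs.insert(FD);
}

// Called from the guest close() path in debug builds.
void CheckGuestClose(int FD) {
  std::lock_guard lk(OwnedFDLock);
  if (OwnedFDs.count(FD)) {
    // The guest is closing an FD FEX owns; log loudly so we can find the app.
    std::fprintf(stderr, "Guest closed FEX-owned FD %d\n", FD);
  }
}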
Since we immediately overwrite the file we are copying, we can do a
rename instead. Failure on rename is fine: it either means the
telemetry file didn't exist initially or some other permission error
occurred, so the telemetry would get lost regardless.
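The whole change boils down to something like this (paths and names assumed):

#include <cstdio>

// Atomically replace the destination rather than copying over it. A failed
// rename means the source never existed or a permission problem; either
// way the telemetry was already lost, so the result is deliberately ignored.
void CommitTelemetry(const char *TempPath, const char *FinalPath) {
  (void)std::rename(TempPath, FinalPath);
}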
This may be useful for tracking TSO faulting when it manages to fetch
stale data. While most TSO crashes are due to nullptr dereferences, this
can still check for the corruption case.
In 64-bit mode, the LOOP instruction's counter register is RCX or ECX,
selected by the address-size prefix.
In 32-bit mode, the LOOP instruction's counter register is ECX or CX.
FEX wasn't handling the 16-bit case at all, which caused the LOOP
instruction to effectively always operate at 32-bit size. Now this is
correctly supported, and the operation is also no longer treated as
64-bit.
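The counter width follows from the mode and the 0x67 address-size
prefix; a sketch of the selection logic (FEX's real decoder differs):

#include <cstdint>

uint8_t LoopCounterBytes(bool Is64BitMode, bool HasAddrSizeOverride) {
  if (Is64BitMode) {
    return HasAddrSizeOverride ? 4 : 8;  // 0x67 selects ECX, default is RCX
  }
  return HasAddrSizeOverride ? 2 : 4;    // 0x67 selects CX, default is ECX
}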
It was a funny joke that this was here, but it is fundamentally
incompatible with what we're doing. All those users are running proot
anyway because of how broken running directly under Termux is.
Just remove it.
Take e.g. a forward rep movsb copy from addr 0 to addr 1. Since this is
a bytewise copy, the expected behaviour is:
before: aaabbbb...
after:  aaaaaaa...
but by copying in 32-byte chunks we end up with:
after:  aaaabbbb...
because the self-overwrites do not occur within a single 32-byte copy.
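A small standalone demonstration of the difference:

#include <cstdio>
#include <cstring>

int main() {
  char byte_wise[8] = "aaabbbb";
  char chunked[8] = "aaabbbb";
  // Bytewise forward copy from offset 0 to offset 1: every store feeds the
  // next load, so the first byte replicates across the buffer.
  for (int i = 0; i < 6; ++i) byte_wise[i + 1] = byte_wise[i];
  // A single chunked copy misses those self-overwrites entirely.
  char tmp[6];
  std::memcpy(tmp, chunked, 6);
  std::memcpy(chunked + 1, tmp, 6);
  std::printf("%s\n%s\n", byte_wise, chunked);  // aaaaaaa vs aaaabbb
}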
When TSO is disabled, vector LDP/STP can be used for a two-instruction
32-byte memory copy, which is significantly faster than the current
byte-by-byte copy. Performing two such copies directly after one
another also marginally increases copy speed for all sizes >= 64.
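The 32-byte primitive could look like this on AArch64 (illustrative
sketch; only usable when these accesses don't need TSO semantics):

static inline void Copy32(void *Dst, const void *Src) {
  // One vector load-pair plus one store-pair moves 32 bytes in two instructions.
  asm volatile(
      "ldp q0, q1, [%[src]]\n"
      "stp q0, q1, [%[dst]]\n"
      : : [src] "r"(Src), [dst] "r"(Dst)
      : "v0", "v1", "memory");
}

Two of these back to back give the 64-byte case mentioned above.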