3987 Commits

Author SHA1 Message Date
Ryan Houdek
eaed5c4704
Merge pull request #2862 from Sonicadvance1/optimize_vector_zero
ARM64: Optimize vector zeroing
2023-08-09 03:51:04 -07:00
Ryan Houdek
5f0efda8fe ARM64: Fixes shift by immediate zero
These would emit invalid instructions in most cases. Turn into a move
or a no-op if the shift is zero.
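A minimal sketch of the kind of lowering involved, assuming the invalid cases are the SIMD right-shift-by-immediate encodings (registers and element sizes here are illustrative, not FEX's actual output):
```asm
; arm64 SIMD right-shift immediates encode counts 1..elementsize, so something
; like "sshr v4.4s, v5.4s, #0" has no valid encoding.
; When the guest shift count is zero, just copy the source instead:
mov v4.16b, v5.16b
```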
2023-08-09 02:16:17 -07:00
Ryan Houdek
d198d701aa OpcodeDispatcher: Fixes vector shifts by immediate zero
pslldq logic was wrong in the case of a zero shift.
The rest should just return their source when the shift is zero.
2023-08-09 02:16:17 -07:00
Ryan Houdek
bf5719770e Arm64: Remove erroneous LoadConstant
This was a debug LoadConstant that would load the entry into a temporary
register to make it easier to see what RIP a block came from.

This was implemented when FEX stopped storing the RIP in the CPU state
for every block. This is no longer necessary since FEX stores the RIP
in the tail data of the block.

This was affecting instructioncountci when in a debug build.
2023-08-08 22:56:36 -07:00
Ryan Houdek
e0461497a0 ARM64: Optimize vector zeroing
`eor <reg>, <reg>, <reg>` is not the optimal way to zero a vector
register on ARM CPUs. Instead we should move from a constant or the zero
register to take advantage of zero-latency moves.
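As a rough sketch of the difference (the register choice is arbitrary, and the exact replacement instruction is assumed here):
```asm
; before: exclusive-or of a register with itself; still a real SIMD ALU op
eor  v4.16b, v4.16b, v4.16b
; after: an immediate move of zero, which cores that recognize zeroing idioms
; can handle in the renamer with effectively zero latency
movi v4.2d, #0
```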
2023-08-08 22:24:11 -07:00
Ryan Houdek
68cb6e61d1
Merge pull request #2860 from lioncash/alias
ARMEmitter: Add missing atomic aliases
2023-08-06 02:05:30 -07:00
Mai
5d0b2060e2
Merge pull request #2858 from Sonicadvance1/fix_clzero
X86Tables: Fixes CLZero destination address
2023-08-06 05:05:07 -04:00
Lioncache
b7d05a65c7 ARMEmitter: Add missing atomic aliases 2023-08-04 21:49:59 -04:00
Lioncache
93fe2fe06c ARMEmitter: Detemplatize LoadStoreAtomicLSE
Lets us reduce the number of template instantiations.
2023-08-04 19:47:03 -04:00
Lioncache
1b53337925 ARMEmitter: Simplify LoadStoreAtomicLSE variant
We can move the base opcode into the implementation function.
2023-08-04 19:35:54 -04:00
Billy Laws
52e5b8ccd9 OpcodeDispatcher: Fix 16-bit popa insertion behaviour
The 16-bit writes shouldn't overwrite the upper half of the 32-bit register for
POPA.
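Roughly, the 16-bit case needs an insert into the low half rather than a full-register write; something along these lines (registers are hypothetical):
```asm
; wrong: writes the whole 32-bit register with the popped 16-bit value
;   mov   w20, w21
; right: merge the popped value into bits 15:0, preserving bits 31:16
bfxil w20, w21, #0, #16
```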
2023-08-04 17:24:55 +01:00
Ryan Houdek
617977357a X86Tables: Fixes CLZero destination address
This needs to default to 64-bit addresses; it was previously
defaulting to 32-bit, which meant the destination address was
getting truncated. In a 32-bit process the address is still 32-bit.

I'm actually surprised this hasn't caused spurious SIGSEGV before this
point.

Adds a 32-bit test to ensure that side is tested as well.
2023-08-04 02:31:30 -07:00
Alyssa Rosenzweig
a996e5300e JIT: Use TST instead of CMN
This is more obvious. llvm-mca says TST is half the cycle count of CMN
for whatever it's defaulting to. dougallj's reference shows both as the
same performance.
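For reference, both forms produce the same N and Z results when testing a value against zero; a sketch with illustrative operands:
```asm
cmn x20, #0       ; before: computes x20 + 0 and sets NZCV from the sum
tst x20, x20      ; after: computes x20 & x20; same N/Z outcome, reportedly cheaper
```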

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 17:51:33 -04:00
Ryan Houdek
25b2af14fd
Merge pull request #2854 from alyssarosenzweig/flags/rotate-harder
OpcodeDispatcher: Optimize rotates
2023-08-02 14:09:40 -07:00
Alyssa Rosenzweig
76059949ea
Merge pull request #2852 from Sonicadvance1/optimize_phsubsw
OpcodeDispatcher: Optimize phsubsw/phaddsw
2023-08-02 17:02:13 -04:00
Ryan Houdek
6e15c9c213
Merge pull request #2851 from Sonicadvance1/optimize_cas128_select
OpcodeDispatcher: Optimize CMPXCHG{8B,16B} final comparison
2023-08-02 14:01:58 -07:00
Alyssa Rosenzweig
01fcca884b OpcodeDispatcher: Optimize rotates
In the non-immediate cases, we can amortize some work between the two
flags to come out 1 instruction ahead.

In the immediate case, this costs an extra 2 instructions compared to
before we packed NZCV flags, but it mitigates a bigger instruction count
regression that this PR would otherwise have. Coming out ahead will
require FlagM and smarter RA, but is doable.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 16:54:56 -04:00
Alyssa Rosenzweig
7a0119b092 OpcodeDispatcher: Optimize right shifts
Same technique as the left shifts. Gets rid of all our COND_FLAG_SET
use, which is good because it's a performance footgun.

Overall saves 17 instructions (!!!!) from the flag calculation code for
`sar eax, cl`.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 14:45:44 -04:00
Alyssa Rosenzweig
fa42c1616e OpcodeDispatcher: Preserve AF for non-immediate shift
The selection logic is expensive. Saves 5 instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 14:45:10 -04:00
Alyssa Rosenzweig
18783948f7 OpcodeDispatcher: Optimize non-immediate shift
Similar to the immediate case, but now we select between the entire old
and new NZCV registers. This is faster than selecting each bit
independently. Saves 11 instructions for calculating flags for "shl eax,
cl".
2023-08-02 14:45:10 -04:00
Alyssa Rosenzweig
969d2e4b6a OpcodeDispatcher: Zero OF for shift > 1
It is undefined in this case. We prefer to zero (rather than preserve
the existing value) as it avoids a costly RMW of the NZCV register.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 14:14:08 -04:00
Alyssa Rosenzweig
7ceaf56407 OpcodeDispatcher: Use SetNZ_ZeroCV for immediate shifts
We need to be careful to preserve V if needed. For `shl 1` and `shr 1`,
this saves 2 instructions overall compared to before the PR. For `sar 1`,
it saves 3 overall.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 14:12:00 -04:00
Alyssa Rosenzweig
7c52375267 OpcodeDispatcher: Optimize (U)MUL flags calculation
csel the value we want directly. Saves 2 instructions. Could do better
still but hey, progress is progress. Currently looks like:

2995: 0x0000ffff6c600040  320407f5		orr w21, wzr, #0x30000000
2995: 0x0000ffff6c600044  f10000df		cmp x6, #0x0 (0)
2995: 0x0000ffff6c600048  9a950294		csel x20, x20, x21, eq
2995: 0x0000ffff6c60004c  b902db94		str w20, [x28, #728]

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:40:44 -04:00
Alyssa Rosenzweig
a4ea792d03 OpcodeDispatcher: Optimize GetRFLAG of definitely-0 flag
We can just return zero, no need to do a pointless Bfe. Saves yet
another instruction for GetPackedRFLAG in a test I'm looking at.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:18:39 -04:00
Alyssa Rosenzweig
cec98637c9 OpcodeDispatcher: Avoid extra OR in GetPackedRFLAG
We know that bit 0 is CF, so we can do CF first and then avoid setting
Original to 2 (for reserved) with a silly `or xzr, #2` instruction.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:18:39 -04:00
Alyssa Rosenzweig
d5026f5815 OpcodeDispatcher: Handle SF/ZF together for GetPackedRFLAG
They're together on both x86 and arm64, so this is faster if we're
getting both.
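A sketch of why handling them together pays off, assuming the internal packed layout mirrors arm64 NZCV (N at bit 31, Z at bit 30) and that ZF/SF land at their usual RFLAGS bits 6/7; registers are hypothetical:
```asm
ubfx w20, w21, #30, #2    ; grab Z and N together (w20[0]=Z, w20[1]=N)
bfi  w22, w20, #6, #2     ; drop them into ZF (bit 6) and SF (bit 7) with one insert
```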

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:18:39 -04:00
Alyssa Rosenzweig
1e4456ec40 OpcodeDispatcher: Use orlshl in GetPackedRFLAG
Saves some moves.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:18:39 -04:00
Alyssa Rosenzweig
e285c7c9a0 OpcodeDispatcher: Use TEST when possible
Faster sign/negate testing for 32-bit/64-bit inputs. This could maybe be
extended to 8/16-bit if we have FlagM but that's for later.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:18:39 -04:00
Alyssa Rosenzweig
8244c7f2a6 OpcodeDispatcher: Use orlshl when possible
If we can prove that a flag bit could not possibly be set, we can use
orlshl rather than bfi, which can be more efficient.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:18:39 -04:00
Alyssa Rosenzweig
747c5e17f8 OpcodeDispatcher: Add ZeroNZCV helper
In some cases we just want to insert one bit at a time; add a helper
to zero the 4 flags together so we can avoid the extra RMW cycle at the
beginning.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 13:18:39 -04:00
Alyssa Rosenzweig
52ce027e9b OpcodeDispatcher: Set N flag more efficiently
We can set N more efficiently with some bit math, and zero ZCV at the
same time. In the future we'll be able to use TST for this to make it
even faster.
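One plausible form of that bit math, assuming the packed flag word uses the arm64 NZCV layout with N at bit 31 (register names are illustrative):
```asm
; keep only the sign bit of the result: N ends up set for a negative
; result, and the Z/C/V bits are cleared by the same instruction
and w20, w21, #0x80000000
```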

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
0fb3903889 OpcodeDispatcher: Pack NZCV flags together
Later, this will let us take advantage of the arm64 flags. For
now, this just turns some strb's into bfi's for dubious benefit.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
2832afd0f7 OpcodeDispatcher: Calculate deferred flags more
If we read or write NZCV flags we need to call CalculateDeferredFlags on
block boundaries, if only to flush out the cached copy.

Also, when leaving a block we call it to flush the cached flags out. This
is annoyingly invasive, but I don't know of a better way to do it that
doesn't involve rearchitecting the dispatcher.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
4af42477a7 OpcodeDispatcher: Use GetRFLAG more
We'll add an extra caching layer in a moment, so we can't call _LoadFlag
directly and expect correct results.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
4786aa479c OpcodeDispatcher: Unify SetRFLAG impls
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
8a049aa0c3 Context: Make BackendFeatures public
So that the OpcodeDispatcher can check the supported features and emit
code accordingly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
64bac687b5 IR: Add TEST opcode
Maps to arm64 tst, except properly SSA. This will need some RA support
to avoid redundant mrs/msr sequences.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
6e05494ad0 IR: Add Orlshr
Similar to Orlshl. This will let us save an instruction in
GetPackedRFLAG.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
4d13f5d97d IR: Add Orlshl op
On arm64, orr (with a shifted register) is maybe fewer cycles and
definitely easier on the RA than bfi (=> fewer moves generated). So,
it's preferred when we know the corresponding bit is 0 in the
destination.

It's not useful on other targets, so it's gated behind a backend feature
bit.
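A sketch of the difference at the instruction level, assuming the flag value being inserted is already 0 or 1 and using bit 11 as an arbitrary example position:
```asm
; bfi reads and writes the destination, so the RA has to keep it live in place
bfi w20, w21, #11, #1
; orr with a shifted register reads two sources and takes a separate destination,
; and is equivalent whenever bit 11 of w20 is known to be zero
orr w20, w20, w21, lsl #11
```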

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Alyssa Rosenzweig
1ad928dc2c IR: Add special NZCV "flag"
Reserve 4 bytes of "flags" to model the 32-bit arm64 NZCV register, so
we can start porting FEX's flag handling code over to using NZCV without
needing the whole compiler to be aware of instructions that might
clobber host flags.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 12:38:00 -04:00
Ryan Houdek
c235ab883d OpcodeDispatcher: Optimize phsubsw/phaddsw
Goes from 41 instructions down to 20(!) instructions.
The primary optimization here is removing a bunch of dull inserts and
instead using zip logic to get the elements where we want them.

The previous implementation was trying to retain semantics around how
the original instruction is implemented, which makes no sense at all.
Unzip the even and odd elements and just do the saturating operations
directly!

Huge shoutout to @dougallj again, showing that I was thinking about this implementation far too much like an x86 developer.

Before:
```asm
0x0000ffff8f300c50  10ffffe0    adr x0, #-0x4 (addr 0xffff8f300c4c)
0x0000ffff8f300c54  f9005f80    str x0, [x28, #184]
0x0000ffff8f300c58  3dc010c4    ldr q4, [x6, #64]
0x0000ffff8f300c5c  6e60bae5    neg v5.8h, v23.8h
0x0000ffff8f300c60  6e60b886    neg v6.8h, v4.8h
0x0000ffff8f300c64  4eb71ee0    mov v0.16b, v23.16b
0x0000ffff8f300c68  6e0614a0    mov v0.h[1], v5.h[1]
0x0000ffff8f300c6c  4ea01c07    mov v7.16b, v0.16b
0x0000ffff8f300c70  6e0614c4    mov v4.h[1], v6.h[1]
0x0000ffff8f300c74  6e0e34a7    mov v7.h[3], v5.h[3]
0x0000ffff8f300c78  6e0e34c4    mov v4.h[3], v6.h[3]
0x0000ffff8f300c7c  6e1654a7    mov v7.h[5], v5.h[5]
0x0000ffff8f300c80  6e1654c4    mov v4.h[5], v6.h[5]
0x0000ffff8f300c84  4ea71ce0    mov v0.16b, v7.16b
0x0000ffff8f300c88  6e1e74a0    mov v0.h[7], v5.h[7]
0x0000ffff8f300c8c  4ea01c05    mov v5.16b, v0.16b
0x0000ffff8f300c90  6e1e74c4    mov v4.h[7], v6.h[7]
0x0000ffff8f300c94  0f10a4a6    sxtl v6.4s, v5.4h
0x0000ffff8f300c98  4f10a4a5    sxtl2 v5.4s, v5.8h
0x0000ffff8f300c9c  0f10a487    sxtl v7.4s, v4.4h
0x0000ffff8f300ca0  4f10a484    sxtl2 v4.4s, v4.8h
0x0000ffff8f300ca4  4ea5bcc5    addp v5.4s, v6.4s, v5.4s
0x0000ffff8f300ca8  4ea4bce4    addp v4.4s, v7.4s, v4.4s
0x0000ffff8f300cac  0e6148a5    sqxtn v5.4h, v5.4s
0x0000ffff8f300cb0  4ea51ca0    mov v0.16b, v5.16b
0x0000ffff8f300cb4  4e614880    sqxtn2 v0.8h, v4.4s
0x0000ffff8f300cb8  4ea01c17    mov v23.16b, v0.16b
0x0000ffff8f300cbc  58000040    ldr x0, pc+8 (addr 0xffff8f300cc4)
0x0000ffff8f300cc0  d63f0000    blr x0
0x0000ffff8f300cc4  a4e97128    ldff1h {z8.d}, p4/z, [x9, x9, lsl #1]
0x0000ffff8f300cc8  0000ffff    udf #0xffff
0x0000ffff8f300ccc  000100dd    unallocated (Unallocated)
0x0000ffff8f300cd0  00000000    udf #0x0
[DEBUG] RIP: 0x100d7
[DEBUG] Guest Code instructions: 1
[DEBUG] Host Code instructions: 41
[DEBUG] Blow-up Amt: 41x
```

After:
```asm
0x0000ffffe2500a04  10ffffe0            adr x0, #-0x4 (addr 0xffffe2500a00)
0x0000ffffe2500a08  f9005f80            str x0, [x28, #184]
0x0000ffffe2500a0c  3dc010c4            ldr q4, [x6, #64]
0x0000ffffe2500a10  4e441ae5            uzp1 v5.8h, v23.8h, v4.8h
0x0000ffffe2500a14  4e445ae4            uzp2 v4.8h, v23.8h, v4.8h
0x0000ffffe2500a18  4e642cb7            sqsub v23.8h, v5.8h, v4.8h
0x0000ffffe2500a1c  58000040            ldr x0, pc+8 (addr 0xffffe2500a24)
0x0000ffffe2500a20  d63f0000            blr x0
0x0000ffffe2500a24  f7fec128            unallocated (Unallocated)
0x0000ffffe2500a28  0000ffff            udf #0xffff
0x0000ffffe2500a2c  000100dd            unallocated (Unallocated)
0x0000ffffe2500a30  00000000            udf #0x0
[DEBUG] RIP: 0x100d7
[DEBUG] Guest Code instructions: 1
[DEBUG] Host Code instructions: 20
[DEBUG] Blow-up Amt: 20x
```
2023-08-02 00:10:32 -07:00
Ryan Houdek
4bf3a0888b OpcodeDispatcher: Optimize CMPXCHG{8B,16B} final comparison
Optimizes the instruction blow-up from 36x to 34x.
The issue with this instruction is that AArch64, unlike x86, doesn't
set a flag indicating whether the CAS was successful. This means we
need to do additional comparisons after the fact to see whether it
actually succeeded.

Previously this was implemented as eor+eor+orr+cmp+cset; now it is
cmp+ccmp+cset, saving two instructions.
A small optimization, but an easy one to do. This instruction is still mostly
killed by the overhead of moving registers all over.

Before:
```asm
0x0000ffffe25002ec  10ffffe0            adr x0, #-0x4 (addr 0xffffe25002e8)
0x0000ffffe25002f0  f9005f80            str x0, [x28, #184]
0x0000ffffe25002f4  aa0403f4            mov x20, x4
0x0000ffffe25002f8  aa0603f5            mov x21, x6
0x0000ffffe25002fc  aa0703f6            mov x22, x7
0x0000ffffe2500300  aa0503f7            mov x23, x5
0x0000ffffe2500304  aa1403e2            mov x2, x20
0x0000ffffe2500308  aa1503e3            mov x3, x21
0x0000ffffe250030c  4862ffb6            caspal x2, x3, x22, x23, [x29]
0x0000ffffe2500310  aa0203f4            mov x20, x2
0x0000ffffe2500314  aa0303f5            mov x21, x3
0x0000ffffe2500318  aa1403f6            mov x22, x20
0x0000ffffe250031c  aa1503f4            mov x20, x21
; ZF Flag + branch
0x0000ffffe2500320  ca0402d5            eor x21, x22, x4
0x0000ffffe2500324  ca060297            eor x23, x20, x6
0x0000ffffe2500328  aa1702b5            orr x21, x21, x23
0x0000ffffe250032c  f10002bf            cmp x21, #0x0 (0)
0x0000ffffe2500330  9a9f17f7            cset x23, eq
0x0000ffffe2500334  390b1b97            strb w23, [x28, #710]
0x0000ffffe2500338  b4000075            cbz x21, #+0xc (addr 0xffffe2500344)

0x0000ffffe250033c  aa1603e4            mov x4, x22
0x0000ffffe2500340  aa1403e6            mov x6, x20
0x0000ffffe2500344  58000040            ldr x0, pc+8 (addr 0xffffe250034c)
0x0000ffffe2500348  d63f0000            blr x0
0x0000ffffe250034c  f7fec128            unallocated (Unallocated)
0x0000ffffe2500350  0000ffff            udf #0xffff
0x0000ffffe2500354  00010053            unallocated (Unallocated)
0x0000ffffe2500358  00000000            udf #0x0
[DEBUG] RIP: 0x1004f
[DEBUG] Guest Code instructions: 1
[DEBUG] Host Code instructions: 36
[DEBUG] Blow-up Amt: 36x
```

After:
```asm
0x0000ffffe25002ec  10ffffe0            adr x0, #-0x4 (addr 0xffffe25002e8)
0x0000ffffe25002f0  f9005f80            str x0, [x28, #184]
0x0000ffffe25002f4  aa0403f4            mov x20, x4
0x0000ffffe25002f8  aa0603f5            mov x21, x6
0x0000ffffe25002fc  aa0703f6            mov x22, x7
0x0000ffffe2500300  aa0503f7            mov x23, x5
0x0000ffffe2500304  aa1403e2            mov x2, x20
0x0000ffffe2500308  aa1503e3            mov x3, x21
0x0000ffffe250030c  4862ffb6            caspal x2, x3, x22, x23, [x29]
0x0000ffffe2500310  aa0203f6            mov x22, x2
0x0000ffffe2500314  aa0303f7            mov x23, x3
0x0000ffffe2500318  aa1603f8            mov x24, x22
0x0000ffffe250031c  aa1703f9            mov x25, x23
; ZF Flag + branch
0x0000ffffe2500320  eb1402df            cmp x22, x20
0x0000ffffe2500324  fa5502e0            ccmp x23, x21, #nzcv, eq
0x0000ffffe2500328  9a9f17f4            cset x20, eq
0x0000ffffe250032c  390b1b94            strb w20, [x28, #710]
0x0000ffffe2500330  b5000074            cbnz x20, #+0xc (addr 0xffffe250033c)

0x0000ffffe2500334  aa1803e4            mov x4, x24
0x0000ffffe2500338  aa1903e6            mov x6, x25
0x0000ffffe250033c  58000040            ldr x0, pc+8 (addr 0xffffe2500344)
0x0000ffffe2500340  d63f0000            blr x0
0x0000ffffe2500344  f7fec128            unallocated (Unallocated)
0x0000ffffe2500348  0000ffff            udf #0xffff
0x0000ffffe250034c  00010053            unallocated (Unallocated)
0x0000ffffe2500350  00000000            udf #0x0
[DEBUG] RIP: 0x1004f
[DEBUG] Guest Code instructions: 1
[DEBUG] Host Code instructions: 34
[DEBUG] Blow-up Amt: 34x
```
2023-08-01 20:01:02 -07:00
Ryan Houdek
6834fe32e4 IR: Adds support for GPRPair to Select IR op
Previously this IR operation was limited to GPRs for the values
being compared.

This adds support for GPRPair (and technically FPR) so that it can be
used directly with GPR pairs. I say technically FPR because the
IREmitter disallowed FPRs for the comparison, but they were already
supported in all of the backends; we just never used them.

Some minor changes to the constant prop pass ensure that we don't try
to propagate a select into a CondJump, otherwise pair comparisons would
be duplicated. That code expects to be able to merge a simple
comparison into a `cbnz`, which doesn't happen with GPR pairs.
2023-08-01 19:50:15 -07:00
Ryan Houdek
475a6a38fa FEXCore: Ensure that the man page follows DESTDIR
When cmake's `install` function is invoked with a relative path, it is
interpreted as being relative to the `CMAKE_INSTALL_PREFIX` variable.

A path resolved this way follows both `DESTDIR` and
`CMAKE_INSTALL_PREFIX`, so it is best to use relative paths in the
install call.

Thanks for the report Mike!
Fixes #2849
2023-08-01 13:28:21 -07:00
Ryan Houdek
173b70d191
Merge pull request #2817 from Sonicadvance1/psad_you_know
OpcodeDispatcher: Optimize PSAD* to use vuabdl{2,}
2023-07-31 14:54:57 -07:00
Ryan Houdek
52d7efda10
Merge pull request #2839 from Sonicadvance1/shrdi_of
OpcodeDispatcher: Fixes SHRD by immediate OF flag calculation
2023-07-31 10:56:40 -07:00
Lioncache
f9fa0f25e2 ARMEmitter: Add cinc/cinv/csetm aliases
Also corrects a cneg test that used CC_AL. The ARM ARM says this alias
shouldn't be used with either the AL or NV condition.
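For reference, these aliases expand as follows; the condition is inverted in the underlying encoding, which is why AL/NV are disallowed:
```asm
cinc  w0, w1, ne    ; == csinc w0, w1, w1, eq
cinv  w0, w1, ne    ; == csinv w0, w1, w1, eq
csetm w0, ne        ; == csinv w0, wzr, wzr, eq
```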
2023-07-30 22:41:03 -04:00
Lioncache
1483ddb538 ARMEmitter: Add ngc/ngcs aliases 2023-07-30 21:55:54 -04:00
Lioncache
8f48021a63 ARMEmitter: Add bfxil alias 2023-07-30 21:18:20 -04:00
Lioncache
1da5d7f2e6 ARMEmitter: Add bfc alias 2023-07-30 21:18:17 -04:00