It is not an external component, and it makes paths needlessly long.
Ryan seemed amenable to this when we discussed on IRC earlier.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Now all vbroadcast implementations go down the more optimal path.
For non-SVE 128-bit cases where we only have 128-bit wide registers,
we behave like ld1rqb and just act as a normal 128-bit load for
interface convenience.
When running on a 128-bit SVE system, this predicate wasn't set up.
Since we never had any predicate usage before, this wasn't an issue.
Now that #2914 is using the 128-bit predicate, we need to make sure
that we are generating it.
Allows the implementations of the vbroadcast instructions to perform the
load and broadcast in one operation as opposed to doing the load and then
broadcast separately.
Notably, the broadcasting loads can also be used on systems that have
SVE 128-bit support, not only 256-bit.
On non-SVE systems, we use the equivalent AdvSIMD instructions.
For config values that were string objects, we were unnecessarily creating
copies each time the string was accessed.
Convert the () operator over to returning a reference.
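A minimal sketch of the idea, assuming a simple wrapper type. `ConfigValue`
is a hypothetical stand-in and not the actual FEX config class; it only
illustrates why returning a reference from `operator()` avoids the copy.

```cpp
#include <string>
#include <utility>

// Hypothetical config value wrapper, for illustration only.
template <typename T>
class ConfigValue {
public:
  explicit ConfigValue(T Initial) : Value(std::move(Initial)) {}

  // Returning `T` by value here would copy a std::string on every access.
  // Returning a const reference hands out the stored object instead.
  const T& operator()() const { return Value; }

private:
  T Value;
};
```

Callers that only read the value are unaffected, since `Option()` still
yields something usable as a `const std::string&`.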
The current implementation uses orr excessively. This causes FEX to miss
hardware optimization opportunities, since some CPU cores will zero-cycle
move constants that fit into the 16 bits of movz/movk.
First evaluate up front whether the number of 16-bit segments is > 1. In
those cases we should check if it is a bitfield that can be moved in one
instruction with orr.
After that point we will use movz for 16-bit constant moves.
Additionally this optimizes the case where a constant of zero is loaded
to be a `mov <reg>, zr` which gets renamed in most hardware.
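The selection order described above could be sketched roughly as follows.
All names here (`LoadStrategy`, `ChooseStrategy`, and the helpers) are
hypothetical and not the emitter's real interface; the bitmask check is a
plain implementation of the AArch64 logical-immediate rule for 64-bit values.

```cpp
#include <cstdint>

enum class LoadStrategy { MoveFromZeroReg, SingleMovz, OrrBitmask, MovzMovkSequence };

// Number of non-zero 16-bit segments in the constant.
inline unsigned CountNonZeroSegments(uint64_t Constant) {
  unsigned Count = 0;
  for (unsigned i = 0; i < 4; ++i) {
    if ((Constant >> (i * 16)) & 0xFFFF) ++Count;
  }
  return Count;
}

// True if V is a single contiguous run of set bits.
inline bool IsContiguousRun(uint64_t V) {
  if (V == 0) return false;
  const uint64_t Filled = V | (V - 1); // Fill the trailing zeros.
  return ((Filled + 1) & Filled) == 0;
}

// True if the value is encodable as a 64-bit logical (bitmask) immediate:
// a replicated element whose bits form a rotated run of ones.
inline bool IsBitmaskImmediate64(uint64_t Imm) {
  if (Imm == 0 || Imm == ~0ULL) return false;
  unsigned Size = 64;
  while (Size > 2) {
    const unsigned Half = Size / 2;
    const uint64_t Mask = (1ULL << Half) - 1;
    if ((Imm & Mask) != ((Imm >> Half) & Mask)) break;
    Size = Half;
  }
  const uint64_t Mask = Size == 64 ? ~0ULL : (1ULL << Size) - 1;
  const uint64_t Elem = Imm & Mask;
  // A rotated run either is contiguous or has a contiguous complement.
  return IsContiguousRun(Elem) || IsContiguousRun(~Elem & Mask);
}

inline LoadStrategy ChooseStrategy(uint64_t Constant) {
  if (Constant == 0) return LoadStrategy::MoveFromZeroReg; // mov <reg>, zr
  if (CountNonZeroSegments(Constant) > 1) {
    if (IsBitmaskImmediate64(Constant)) return LoadStrategy::OrrBitmask;
    return LoadStrategy::MovzMovkSequence;
  }
  return LoadStrategy::SingleMovz; // One segment: a single (shifted) movz.
}
```

For example, `0x1234` and `0x12340000` each need only one movz, while
`0x00FFFF00` spans two segments but is a contiguous bitfield, so it goes
through orr.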
Commonly we are doing a BFI into a 32-bit register, which is hitting the
ubfx (lsr alias) path.
In the case of a 32-bit destination we can also do a regular move, which
will take advantage of the CPU's register renaming and give a minor speed
boost.
Didn't notice this in the previous PR. When DUMPIR=stderr is set without any
selection of where to place it in PASSMANAGERDUMPIR, it was supposed to
put the dumper at the end of the passes.
We need to make sure that it is placed at the end of the passes rather
than at the current `it`.
We can perform the SQRT first and then broadcast 1.0 into the destination
since all the intermediary work is done, meaning we don't have to worry
about Dst and Vector aliasing one another.
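A scalar sketch of that ordering, assuming the operation is a reciprocal
square root (1.0 / sqrt(x)), which the message does not state explicitly.
`Vec4` and `ReciprocalSqrt` are hypothetical names, and the real code emits
vector instructions rather than loops.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Dst and Vector may refer to the same object.
inline void ReciprocalSqrt(Vec4& Dst, const Vec4& Vector) {
  // Step 1: consume the source completely into a temporary.
  Vec4 Temp{};
  for (std::size_t i = 0; i < Temp.size(); ++i) {
    Temp[i] = std::sqrt(Vector[i]);
  }
  // Step 2: overwriting Dst with 1.0 is now safe even if it aliases Vector.
  Dst.fill(1.0f);
  // Step 3: divide the broadcast 1.0 by the square roots.
  for (std::size_t i = 0; i < Dst.size(); ++i) {
    Dst[i] /= Temp[i];
  }
}
```

Because the source is fully read before the destination is written, calling
`ReciprocalSqrt(V, V)` gives the same result as using separate objects.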
If DumpIR is enabled but the PassManagerDumpIR option isn't enabled then
this currently does nothing.
As a convenience, enable dumping the final optimized IR if an option
hasn't been specified.
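The fallback could look roughly like this. The pass names and the
`BuildPassList` function are placeholders invented for illustration, not the
real pass manager; the sketch only covers the default case and leaves out
placement when a dump point has been selected explicitly.

```cpp
#include <string>
#include <vector>

// Hypothetical pass list builder; pass names are placeholders.
inline std::vector<std::string> BuildPassList(bool DumpIREnabled, bool DumpPointSelected) {
  std::vector<std::string> Passes{"ConstProp", "DeadCodeElimination", "RegisterAllocation"};

  // If dumping is on but no dump point was chosen, append the dumper so
  // that it sees the final optimized IR.
  if (DumpIREnabled && !DumpPointSelected) {
    Passes.push_back("IRDumper");
  }
  return Passes;
}
```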
We need to move the modifier enum out of the SVEMemOperand class
since it's also used with adr. It is also more convenient when the enum
isn't tied to the class itself.
This also makes accessing modifiers less noisy, since the class
When a 32-bit adcx instruction was encountered, it was getting treated
as a 16-bit adcx instruction instead. This is because of the 0x66 prefix
required to handle this instruction.
Adds a unit test to ensure it doesn't break again.
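A simplified sketch of the distinction, assuming a decoder that tracks the
0x66 and REX.W prefixes. `PrefixState` and `OperandSizeBits` are
hypothetical names and not the frontend's real types. For adcx the 0x66
byte is part of the opcode encoding, so it must not select a 16-bit operand
size.

```cpp
// Hypothetical prefix tracking, for illustration only.
struct PrefixState {
  bool Has66;
  bool HasRexW;
};

// Returns the operand size in bits for a general-purpose instruction.
// Prefix66IsMandatory is true when 0x66 is part of the opcode itself
// (as with adcx) rather than an operand-size override.
constexpr unsigned OperandSizeBits(PrefixState Prefixes, bool Prefix66IsMandatory) {
  if (Prefixes.HasRexW) return 64;
  if (Prefixes.Has66 && !Prefix66IsMandatory) return 16;
  return 32;
}
```

With this, `OperandSizeBits({true, false}, true)` reports 32 bits for adcx,
while the same prefixes on an instruction that treats 0x66 as an override
report 16 bits.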
This was confusingly split between Arm64Emitter, Arm64Dispatcher, and
Arm64JIT.
- Arm64JIT objects were unnecessary and free to be deleted.
- Arm64Dispatcher simulator and decoder moved to Arm64Emitter
- Arm64Emitter disassembler and decoder renamed
- Dropped usage of the PrintDisassembler since it is hardcoded to go
through a FILE* type
- We instead want its output to go through LogMan, which means using a
split Decoder+Disassembler object pair.
- Can't reuse the object from the vixl simulator since the simulator
registers the decoder as a visitor, causing the simulator to execute
while disassembling instructions if reused.
- Disassembly output for blocks and the dispatcher now goes through LogMan
- Blocks wrapped in Begin/End text for tracking purposes for CI.