We need to move the modifier enum out of the SVEMemOperand class
since it's also used with adr. It's also convenient for the enum not
to be tied to the class itself.
This also makes accessing modifiers less noisy, since the class name
no longer needs to prefix every use.
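A minimal sketch of the shape of the change (the names here are
illustrative, not the exact vixl declarations):

  // Before: every caller must spell out the owning class.
  class SVEMemOperand {
   public:
    enum Modifier { kNoModifier, kLSL, kMulVL };
    // ...
  };
  // Usage: SVEMemOperand::kLSL

  // After: the enum lives at namespace scope and can be shared
  // with adr without dragging in SVEMemOperand.
  enum SVEOffsetModifier { kNoModifier, kLSL, kMulVL };
  class SVEMemOperand { /* now takes an SVEOffsetModifier */ };
  // Usage: kLSL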
When a 32-bit adcx instruction was encountered, it was getting treated
as a 16-bit adcx instruction instead. This is because the 0x66 prefix
is a mandatory part of this instruction's encoding rather than an
operand-size override.
Adds a unit test to ensure it doesn't break again.
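For reference, the encodings involved (from the x86 manual; shown as
comments only):

  // ADCX r32, r/m32:  66 0F 38 F6 /r        (0x66 is mandatory and
  //                                           selects the opcode)
  // ADCX r64, r/m64:  66 REX.W 0F 38 F6 /r
  // Contrast ADD ax, imm16: 66 05 iw        (here 0x66 really is an
  //                                           operand-size override)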
This was confusingly split between Arm64Emitter, Arm64Dispatcher, and
Arm64JIT.
- Arm64JIT objects were unnecessary and free to be deleted.
- Arm64Dispatcher simulator and decoder moved to Arm64Emitter
- Arm64Emitter disassembler and decoder renamed
- Dropped usage of the PrintDisassembler since it is hardcoded to write
  through a FILE* handle.
- We instead want its output to go through LogMan, which means using a
  split Decoder+Disassembler object pair (see the sketch below).
- Can't reuse the decoder from the vixl simulator since the simulator
  registers itself as a visitor on it, which would cause the simulator
  to execute instructions while they are being disassembled.
- Disassembly output for blocks and the dispatcher now goes through
  LogMan.
- Blocks are wrapped in Begin/End text so CI can track them.
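A minimal sketch of the split pair using vixl's public API (exact
method names can vary by vixl version, and the LogMan call is
illustrative):

  #include "aarch64/decoder-aarch64.h"
  #include "aarch64/disasm-aarch64.h"

  void DisassembleBlock(const vixl::aarch64::Instruction* Begin,
                        const vixl::aarch64::Instruction* End) {
    vixl::aarch64::Decoder Decoder;
    vixl::aarch64::Disassembler Disasm;
    // Only the disassembler visits instructions; no simulator is
    // attached, so nothing executes as a side effect of decoding.
    Decoder.AppendVisitor(&Disasm);

    for (const auto* Instr = Begin; Instr < End;
         Instr = Instr->GetNextInstruction()) {
      Decoder.Decode(Instr);
      // GetOutput() holds the text of the last decoded instruction;
      // forward it to LogMan instead of a FILE*.
      LogMan::Msg::DFmt("{}", Disasm.GetOutput());
    }
  }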
This was causing test failures locally where some values were set to
uninitialized data. Ensure that the gregs, YMM, and MMX registers are
all zero-initialized.
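A sketch of the idea (the state struct and field names are
illustrative, not FEX's actual layout):

  #include <cstdint>
  #include <cstring>

  struct HarnessState {
    uint64_t GRegs[16];     // general purpose registers
    uint8_t YMM[16][32];    // 256-bit AVX registers
    uint64_t MMX[8];        // MMX registers
  };

  HarnessState State{};     // value-initialization zeroes every member
  // Or, for an already-allocated block:
  // memset(&State, 0, sizeof(State));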
Requires the IR headerop to house the number of host instructions this
code translates to, for the stats.
Fixes compiling with disassembly enabled; this will be used with the
instruction count CI.
This is incredibly useful and I find myself hacking this feature in
every time I am optimizing IR. Adds a new configuration option which
allows dumping IR at various times:
- Before any optimization passes have run
- After all optimization passes have run
- Before and after each IRPass, to see which pass is breaking something
Needs #2864 merged first
This is a /very/ simple optimization purely because of a choice that ARM
made with SVE in their latest Cortex cores.
Cortex-A715:
- sxtl/sxtl2/uxtl/uxtl2 can execute 1 instruction per cycle.
- sunpklo/sunpkhi/uunpklo/uunpkhi can execute 2 instructions per cycle.
Cortex-X3:
- sxtl/sxtl2/uxtl/uxtl2 can execute 2 instructions per cycle.
- sunpklo/sunpkhi/uunpklo/uunpkhi can execute 4 instructions per cycle.
This is fairly quirky since the optimization only works on SVE systems
with a 128-bit vector length. Since that covers all of the current
consumer platforms, it works out in practice.
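A sketch of the substitution at a 128-bit vector length (vixl-style
emitter calls; register choices are illustrative):

  // NEON widen of the low half:   sxtl  v0.8h, v1.8b
  // SVE equivalent at VL=128:     sunpklo z0.h, z1.b
  sunpklo(z0.VnH(), z1.VnB());

  // NEON widen of the high half:  sxtl2 v0.8h, v1.16b
  // SVE equivalent at VL=128:     sunpkhi z0.h, z1.b
  sunpkhi(z0.VnH(), z1.VnB());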
We need to know the difference between the host supporting SVE with
128-bit registers versus 256-bit registers. Ensure we track that
difference.
No functional change here.
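One way to probe this on Linux (a sketch; FEX's actual detection may
differ):

  #include <sys/prctl.h>

  // PR_SVE_GET_VL returns the current SVE vector length in bytes,
  // ORed with flag bits that must be masked off.
  bool HostHasSVE256() {
    int Ret = prctl(PR_SVE_GET_VL);
    if (Ret < 0) {
      return false;  // SVE not supported
    }
    int VLBytes = Ret & PR_SVE_VL_LEN_MASK;
    return VLBytes >= 32;  // 32 bytes == 256-bit registers
  }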
This allows us to both enable and disable features regardless of what
the host supports. This replaces the old `EnableAVX` option.
Unlike the old EnableAVX option, which was a binary option that could
only disable, each of these options is technically a trinary state.
Not setting an option gives you the default host detection, while
explicitly enabling or disabling toggles the feature regardless of what
the host supports.
This will be used by the instruction count CI in the future.
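A sketch of the trinary behaviour (the option plumbing is
illustrative):

  #include <optional>

  // Unset -> follow host detection; set -> force on or off.
  bool ResolveFeature(std::optional<bool> Override, bool HostSupports) {
    return Override.value_or(HostSupports);
  }

  // ResolveFeature(std::nullopt, true) -> true  (host detection)
  // ResolveFeature(false, true)        -> false (forced off)
  // ResolveFeature(true, false)        -> true  (forced on)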
This was a debug LoadConstant that would load the entry into a temporary
register to make it easier to see which RIP a block belonged to.
This was implemented when FEX stopped storing the RIP in the CPU state
for every block. It is no longer necessary since FEX stores the RIP in
the tail data of the block.
This was affecting the instruction count CI when in a debug build.
I use this locally when looking for optimization opportunities in the
JIT.
The instruction count CI will use this as well in the future.
Just get it upstreamed right away.
`eor <reg>, <reg>, <reg>` is not the optimal way to zero a vector
register on ARM CPUs. Instead we should move from a constant or the
zero register to take advantage of zero-latency moves.
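A sketch of the replacement (vixl-style emitter calls; register
choices are illustrative):

  // Before: data-processing idiom, not handled as a move.
  //   eor v0.16b, v0.16b, v0.16b
  eor(v0.V16B(), v0.V16B(), v0.V16B());

  // After: an immediate move, eligible for zero-latency handling in
  // the rename stage on recent cores.
  //   movi v0.2d, #0
  movi(v0.V2D(), 0);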
This needs to default to 64-bit addresses. It was previously defaulting
to 32-bit, which meant the destination address was getting truncated.
In a 32-bit process the address is still 32-bit.
I'm actually surprised this hasn't caused spurious SIGSEGVs before this
point.
Adds a 32-bit test to ensure that side is tested as well.
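A sketch of the failure mode (types and names are illustrative):

  #include <cstdint>

  uint64_t ResolveAddress(uint64_t Address, bool Is64BitProcess) {
    if (!Is64BitProcess) {
      // 32-bit processes legitimately truncate to 32 bits.
      return static_cast<uint32_t>(Address);
    }
    // Bug: defaulting the operand size to 32-bit here would also
    // truncate 64-bit addresses, dropping the high 32 bits.
    return Address;
  }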
This is more obvious. llvm-mca says TST is half the cycle count of CMN
for whatever target it defaults to. dougallj's reference shows both as
the same performance.
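A sketch of the kind of substitution, assuming a compare against zero
where the N and Z flags are what's consumed (vixl-style calls):

  // Before: compare-negative against zero; flags come from x0 + 0.
  cmn(x0, 0);
  // After: more obviously a test of x0 itself; N and Z match the
  // cmn form for a zero operand.
  tst(x0, x0);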
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>