RCLSE: optimize out pointless stores

This can help a lot of x86 code: x86 is 2-address while arm64 is 3-address, so x86
code ends up with piles of movs that turn out to be dead after translation.
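
As a hedged illustration (not part of the commit; all names below are invented stand-ins,
not FEX's IR), here is the kind of dead store the 2-address pattern creates once each guest
instruction writes its result back to the guest context:

// Minimal sketch: a naive translation emits one context store per guest instruction.
#include <cstdint>
#include <cstdio>

static uint32_t Ctx[16];                        // stand-in for the guest register file in the context
static void StoreGuestReg(unsigned r, uint32_t v) { Ctx[r] = v; }

int main() {
  enum { EAX = 0, EDX = 2, EBX = 3 };
  Ctx[EAX] = 5;
  Ctx[EBX] = 7;

  // Guest x86 (2-address):   mov edx, eax   ;   add edx, ebx
  StoreGuestReg(EDX, Ctx[EAX]);                 // dead: EDX is overwritten below with no read in between
  StoreGuestReg(EDX, Ctx[EAX] + Ctx[EBX]);      // the store that actually matters
  std::printf("edx = %u\n", Ctx[EDX]);          // prints 12
}

The first store to the EDX slot is exactly the "pointless store" this pass now removes.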

It's not a win across the board: our RA isn't aware of tied registers, so sometimes we
regress moves. But it's a win on average, and the RA side can be improved over time.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig 2024-03-11 10:59:38 -04:00
parent 85f8ad3842
commit 92f31648b9

@@ -544,10 +544,20 @@ bool RCLSE::ClassifyContextLoad(FEXCore::IR::IREmitter *IREmit, ContextInfo *Loc
 bool RCLSE::ClassifyContextStore(FEXCore::IR::IREmitter *IREmit, ContextInfo *LocalInfo, FEXCore::IR::RegisterClassType Class, uint32_t Offset, uint8_t Size, FEXCore::IR::OrderedNode *CodeNode, FEXCore::IR::OrderedNode *ValueNode) {
   auto Info = FindMemberInfo(LocalInfo, Offset, Size);
+  ContextMemberInfo PreviousMemberInfoCopy = *Info;
   RecordAccess(Info, Class, Offset, Size, LastAccessType::WRITE, ValueNode,
                CodeNode);
-  // TODO: Optimize redundant stores.
-  // ContextMemberInfo PreviousMemberInfoCopy = *Info;
+  if (PreviousMemberInfoCopy.AccessRegClass == Info->AccessRegClass &&
+      PreviousMemberInfoCopy.AccessOffset == Info->AccessOffset &&
+      PreviousMemberInfoCopy.AccessSize == Size &&
+      PreviousMemberInfoCopy.Accessed == LastAccessType::WRITE) {
+    // This optimizes redundant stores with no intervening load
+    IREmit->Remove(PreviousMemberInfoCopy.StoreNode);
+    return true;
+  }
+  // TODO: Optimize the case of partial stores.
   return false;
 }
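
As a rough standalone sketch of the same idea (not FEX code; every name below is invented),
the check above amounts to a last-access table per context slot: when a store lands on a slot
whose previous access was also a store, the earlier store was never read and can be dropped.

#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

enum class Access { None, Read, Write };

struct SlotInfo {
  Access LastAccess = Access::None;
  size_t LastStoreIndex = 0;   // index of the op that last wrote this slot
};

struct Op {
  bool IsStore;       // store vs load of a context slot
  uint32_t Offset;    // which slot is touched
  bool Dead = false;  // marked instead of physically removed, for printing
};

static void EliminateRedundantStores(std::vector<Op>& Ops) {
  std::map<uint32_t, SlotInfo> Slots;
  for (size_t i = 0; i < Ops.size(); ++i) {
    auto& Slot = Slots[Ops[i].Offset];
    if (Ops[i].IsStore) {
      // Previous access to this slot was also a write: that store was never read.
      if (Slot.LastAccess == Access::Write) {
        Ops[Slot.LastStoreIndex].Dead = true;
      }
      Slot.LastAccess = Access::Write;
      Slot.LastStoreIndex = i;
    } else {
      Slot.LastAccess = Access::Read;
    }
  }
}

int main() {
  // store slot 8; store slot 8            -> the first store is dead
  // store slot 0; load slot 0; store slot 0 -> nothing dead (intervening load)
  std::vector<Op> Ops = {
    {true, 8}, {true, 8},
    {true, 0}, {false, 0}, {true, 0},
  };
  EliminateRedundantStores(Ops);
  for (size_t i = 0; i < Ops.size(); ++i) {
    std::printf("op %zu: %s offset %u%s\n", i, Ops[i].IsStore ? "store" : "load",
                Ops[i].Offset, Ops[i].Dead ? "  [removed]" : "");
  }
}

The real pass additionally keys on register class, offset, and access size, as the diff shows,
so a narrower overwrite does not delete a wider earlier store; that partial-store case is left
as the remaining TODO.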