Enable loop interchange for floating point reductions
when reordering floating point operations is allowed.
Previously, when we encountered a floating point PHI node in the
outer loop exit block, we bailed out because floating point
reductions could not be detected at the time. Now that they can be
detected, this limitation is removed.
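To illustrate why permission to reorder floating point operations matters, here is a minimal sketch of a reduction before and after interchange. The function names and the `Matrix` alias are hypothetical and only serve the example; this is not the pass's implementation. Both functions add the same elements, but in a different order, so for general inputs the results may differ in the low-order bits.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Original nest: outer loop over rows, inner loop over columns.
// `sum` is the floating point reduction carried across both loops.
float sumRowsOuter(const Matrix &a) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[i].size(); ++j)
      sum += a[i][j];
  return sum;
}

// Interchanged nest: outer loop over columns, inner loop over rows.
// Assumes a rectangular matrix (every row has the same length).
float sumColsOuter(const Matrix &a) {
  float sum = 0.0f;
  if (a.empty())
    return sum;
  for (std::size_t j = 0; j < a[0].size(); ++j)
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += a[i][j];
  return sum;
}
```

With small integer-valued inputs every partial sum is exact, so both orders agree; that is not guaranteed for arbitrary values.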
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D117450
rv64izbb has RORW/ROLW instructions that operate on the lower
32-bits of a 64-bit value and sign extend bit 31 of the result.
DAGCombiner won't match rotate idioms because the i32 type isn't Legal
on riscv64.
This patch teaches DAGCombiner to allow it if the type is going to
be promoted and the target has Custom type legalization for ISD::ROTL
or ISD::ROTR. I've restricted this to scalar types. It doesn't appear
that any in-tree targets other than riscv64 have custom type
legalization for rotates.
If this patch isn't acceptable, I guess I can match SRLW, SLLW, and OR
after type legalization, but I'd like to avoid that if possible.
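For reference, the kind of source-level rotate idiom in question, together with a rough model of the behaviour described above (rotate the lower 32 bits, then sign extend bit 31), can be sketched as follows. The names `rotl32`, `rotr32`, `rolwModel`, and `rorwModel` are hypothetical helpers for this illustration, not LLVM or RISC-V APIs.

```cpp
#include <cstdint>

// Classic 32-bit rotate idioms built from shifts and an OR.
// The masking keeps both shift amounts in [0, 31].
uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}

uint32_t rotr32(uint32_t x, unsigned n) {
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}

// Sign extend bit 31 of a 32-bit value into a 64-bit result.
int64_t signExtend32(uint32_t x) {
  return static_cast<int64_t>(static_cast<int32_t>(x));
}

// Rough model of the described instructions: only the lower 32 bits of
// rs1 and the low 5 bits of rs2 are used, and the result is sign extended.
int64_t rolwModel(uint64_t rs1, uint64_t rs2) {
  return signExtend32(
      rotl32(static_cast<uint32_t>(rs1), static_cast<unsigned>(rs2 & 31)));
}

int64_t rorwModel(uint64_t rs1, uint64_t rs2) {
  return signExtend32(
      rotr32(static_cast<uint32_t>(rs1), static_cast<unsigned>(rs2 & 31)));
}
```

The upper 32 bits of `rs1` do not influence the result in this model, which is why matching the idiom on i32 before promotion is attractive.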
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D119062
Adds `-pagezero_size`, which is commonly used for kernel development.
`-pagezero_size` changes the `__PAGEZERO` size, removing that segment if it is set to zero.
One of the four flags from {D118570}
Now with error messages and tests.
Differential Revision: https://reviews.llvm.org/D118724
When the shift amount is known and a known sign bit analysis of
the shiftee indicates that no saturation will occur, then we can
replace SSHLSAT/USHLSAT by SHL.
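A minimal sketch of the reasoning on 32-bit values, assuming shift amounts below 32: an unsigned shift cannot saturate if the value has at least as many leading zeros as the shift amount, and a signed shift cannot saturate if the number of sign bits exceeds the shift amount. All function names here are hypothetical and model the node semantics only loosely; this is not the DAGCombiner code.

```cpp
#include <cstdint>
#include <limits>

// Number of leading zero bits in x (32 when x == 0).
unsigned countLeadingZeros(uint32_t x) {
  unsigned n = 0;
  for (int bit = 31; bit >= 0 && ((x >> bit) & 1u) == 0; --bit)
    ++n;
  return n;
}

// Number of leading bits equal to the sign bit, counting the sign bit itself.
unsigned numSignBits(int32_t x) {
  uint32_t u = static_cast<uint32_t>(x);
  uint32_t sign = u >> 31;
  unsigned n = 0;
  for (int bit = 31; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// Plain wrapping shift; assumes s < 32.
uint32_t shl(uint32_t x, unsigned s) { return x << s; }

// Unsigned saturating shift: clamp to UINT32_MAX on overflow.
uint32_t ushlsat(uint32_t x, unsigned s) {
  uint64_t wide = static_cast<uint64_t>(x) << s;
  uint64_t limit = std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(wide > limit ? limit : wide);
}

// Signed saturating shift: clamp to INT32_MIN / INT32_MAX on overflow.
int32_t sshlsat(int32_t x, unsigned s) {
  int64_t wide = static_cast<int64_t>(x) * (int64_t(1) << s);
  int64_t lo = std::numeric_limits<int32_t>::min();
  int64_t hi = std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(wide < lo ? lo : (wide > hi ? hi : wide));
}

// Conditions under which the saturating forms equal the plain shift.
bool unsignedShiftIsExact(uint32_t x, unsigned s) {
  return countLeadingZeros(x) >= s;
}

bool signedShiftIsExact(int32_t x, unsigned s) { return numSignBits(x) > s; }
```

When either predicate holds, the saturating shift and the plain shift produce the same bits, which is the case the fold relies on.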
Differential Revision: https://reviews.llvm.org/D118765
It should be possible to replace SSHLSAT and USHLSAT with SHL when
it is known that no saturation will take place (e.g. by analysing
known sign bits in the first shift operand).
Differential Revision: https://reviews.llvm.org/D118764
In scalarizeInstruction(), isUniformAfterVectorization is used to detect
cases where it is sufficient to always access the first lane. This
should map directly to checking whether the operand is a uniform
replicate recipe.
Differential Revision: https://reviews.llvm.org/D116654
This way they get lowered through the ARMISD::BUILD_VECTOR, which can
produce more efficient D register moves.
Also helps D115653 not get stuck in a loop.
The FSAFDO profile loader is currently disabled even when --enable-fs-discriminator is enabled.
The loaders need to be turned on by options, which makes experiments cumbersome.
This patch enables the FSAFDO profile loader by default. Since the loaders are
guarded by EnableFSDiscriminator, they will only be turned on if
--enable-fs-discriminator is enabled. Note that --enable-fs-discriminator is
still disabled by default.
Differential Revision: https://reviews.llvm.org/D119033
SharedSymbol::SharedSymbol initializes verdefIndex and Symbol::replace
copies verdefIndex.
By moving the verdefIndex assignment out of the ctor, Symbol::replace can be
changed to not copy verdefIndex. This can be used to decrease work for
ObjKind/BitcodeKind.
The WWM register has unmodeled register liveness. For v_set_inactive_*,
clobbering the source register is dangerous because it will overwrite the
inactive lanes. When the source vgpr is dead at v_set_inactive_lane,
the inactive lanes may not really be dead. This may cause common
optimizations to produce wrong code.
For example, in a simple if-then cfg in Machine IR:
    bb.if:
      %src =
    bb.then:
      %src1 = COPY %src
      %dst = V_SET_INACTIVE %src1(tied-def 0), %inactive
    bb.end:
      ... = PHI [0, %bb.then] [%src, %bb.if]
The register coalescer will think it is safe to optimize "%src1 = COPY %src"
in bb.then. At the same time, there is no interference for the PHI in
bb.end, so the source and destination values of the PHI will be assigned
the same register. That single PHI register will be overwritten by the
v_set_inactive, and we would then get the wrong value in bb.end.
With this change, after register allocation we copy the contents of the
source register before setting the inactive lanes. Yes, this pessimizes
WWM code generation a little, but I don't have a better idea for doing this
correctly.
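The hazard and the fix can be sketched with a toy lane model. Everything here is an assumption made for illustration: a four-lane register, an `exec` bitmask with one bit per lane, and the helper names `setInactiveInPlace` and `setInactiveWithCopy`. It is not the AMDGPU backend's implementation.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 4;
using VReg = std::array<int, kLanes>;

// Toy model of a tied-def V_SET_INACTIVE: lanes whose exec bit is clear
// are overwritten with `inactive`; active lanes keep their value.
void setInactiveInPlace(VReg &reg, uint32_t exec, int inactive) {
  for (int lane = 0; lane < kLanes; ++lane)
    if (((exec >> lane) & 1u) == 0)
      reg[lane] = inactive;
}

// Variant that copies the source first, so the source register keeps the
// values held in its inactive lanes.
VReg setInactiveWithCopy(const VReg &src, uint32_t exec, int inactive) {
  VReg dst = src;
  setInactiveInPlace(dst, exec, inactive);
  return dst;
}
```

If `%src` and `%dst` share one register, `setInactiveInPlace` models the clobber: lanes 2 and 3 of `{10, 11, 12, 13}` become 0 under `exec = 0x3`, even though a later use may still need 12 and 13. The copying variant leaves the source intact.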
Differential Revision: https://reviews.llvm.org/D117482
Since 23a5090c6, some style option markers have indicated 'clang-format 14',
though their respective options were available in earlier releases.
Note: Even though the value type of the 'SpacesInAngles' option changed,
the option itself has been present since version 3.4.
Differential Revision: https://reviews.llvm.org/D118991
Currently `this->getName() == newSym.getName()`.
By keeping the old nameData/nameSize, newSym's nameData/nameSize will be
ignored. The call sites can avoid calling getName().
printTraceSymbol needs to take the symbol name since `other`'s name is empty.
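A toy sketch of the idea, assuming a simplified symbol with a `std::string` name instead of nameData/nameSize. `ToySymbol` and its fields are hypothetical and do not mirror lld's Symbol layout.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a linker symbol.
struct ToySymbol {
  std::string name;
  int kind;
  unsigned long value;

  // Take every field from newSym except the name, which is kept from the
  // existing symbol. Callers can therefore pass a newSym with an empty name.
  void replace(const ToySymbol &newSym) {
    std::string oldName = std::move(name);
    *this = newSym;
    name = std::move(oldName);
  }
};
```

Because the replacement may carry an empty name, any tracing helper that runs on it has to be handed the name explicitly, which matches the note about printTraceSymbol above.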