The .file directive is changed to only have basename in D36018 for
ELF.
But on AIX, we require the .file directive to also contain the
directory info. This aligns with other AIX compiler like XLC and is
required by some AIX tool like DBX.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D99785
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
We currently do not utilize instructions that convert single
precision vectors to doubleword integer vectors. These conversions
come up in code occasionally and this improvement allows us to
open code some functions that need to be added to altivec.h.
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.
Differential revision: https://reviews.llvm.org/D100478
This patch exploits mtvsrdd instruction (available in ISA3.0+) to save
two callee-saved GPR registers into a single VSR, making it more
efficient.
Reviewed By: jsji, nemanjai
Differential Revision: https://reviews.llvm.org/D62565
This patch is the last one in backend to support fp128 type in
pre-POWER9 subtargets with VSX, removing temporary option and updating
remaining tests.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92374
The VSX tablegen file has some rather eggregious uses of
COPY_TO_REGCLASS even in situations where it needs to use
SUBREG_TO_REG. While this produces correct code, it often doesn't
allow the register coalescer to coalesce copies and the resulting
code ends up being suboptimal. This patch just changes over
patterns that should use SUBREG_TO_REG.
We should consider the feeder user number when we do reverse memory
operation transformation. Otherwise, we may get negative impact.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D100166
XSCMPUQP is not available for pre-P9 subtargets. This patch will lower
them into libcall for correct behavior on power7/power8.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92083
Regiser types for xxsplti32dx for two td file patterns was incorrect.
Fixed the two types and added a test case that was reduced from a larger
failing test.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D100223
LLVM test CodeGen/PowerPC/ctrloops-softfloat.ll tries to check for the
absence of sequences of instructions with several CHECK-NOT with one of
those directives using a variable defined in another. However CHECK-NOT
are checked independently so that is using a variable defined in a
pattern that should not occur in the input.
This commit changes occurence of the variable for the regex used in its
definition, thereby making each CHECK-NOT independent.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D99881
Commit 6ad3d05b681b36f6ecc98523257d154053e4116d disables the definition
of CSR that a follow-up CHECK-NOT directive depends on. This commit
replaces the undefined CSR variable use by the regex used to define it.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D99870
Commit 6646033e6e759657b6122fde64844fd28a2c9635 removed the definition
of variable RESULT used in two CHECK-NOT directives in LLVM test
CodeGen/PowerPC/ppc64-i128-abi.ll. This commit replaces the uses by the
regex that was used to define that variable.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D99868
LLVM test CodeGen/PowerPC/ppc-disable-non-volatile-cr.ll tries to check
for the absence of a sequence of instructions with several CHECK-NOT
with one of those directives using a variable defined in another.
However CHECK-NOT are checked independently so that is using a variable
defined in a pattern that should not occur in the input.
This commit changes occurence of the variable for the regex used in its
definition, thereby making each CHECK-NOT independent.
Reviewed By: NeHuang, nemanjai
Differential Revision: https://reviews.llvm.org/D99880
Previously, 34-bit constants were materialized in selectI64Imm(), and we relied
on td pattern matching to instead produce a pli. This becomes problematic as
there is no guarantee that the 34-bit constant will reach the td pattern
selection for pli. It is also possible for other transformations (such as complex
bit permutations) to also produce and utilize the 34-bit constant materialized
through selectI64Imm().
This patch instead produces pli on Power10 directly whenever the constant fits
within 34-bits.
Differential Revision: https://reviews.llvm.org/D99906
This patch adds support for TLS variables to the XCOFF object writer:
- Add TData and TBSS sections
- Add CsectGroups for the mapping classes XCOFF::XMC_TL and XCOFF::XMC_UL
- Add XMC_UL in the enum entry of CsectStorageMapping class to print the string
while reading the symbol properties for TLS variables
- Fix the starting address of TData and TBSS sections
Reviewed by: hubert.reinterpretcast, DiggerLin
Differential Revision: https://reviews.llvm.org/D98946
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.
Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D88785
On ppc64 linux , MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This is to enable this on AIX for both ppc64 and ppc32.
Differential Revision: https://reviews.llvm.org/D99076
This is currently performed in SelectionDAGLegalize, here we make it also
happen in LegalizeVectorOps, allowing a target to lower the SETCC condition
codes first in LegalizeVectorOps and then lower to a custom node afterwards,
without having to duplicate all of the SETCC condition legalization in the
target specific lowering.
As a result of this, fixed length floating point SETCC nodes can now be
properly lowered for SVE.
Differential Revision: https://reviews.llvm.org/D98939
This patch exploits the xxsplti32dx instruction available on Power10
in place of constant pool loads where xxspltidp would not be able to,
usually because the immediate cannot fit into 32 bits.
Differential Revision: https://reviews.llvm.org/D95458
Add an option to tell the compiler that it can use privileged instructions.
This patch only adds the option. Backend implementation will be added in a
future patch.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D99193
In order to have the same option on power PC LLVM and power PC gcc
the option will be changed from -mrop-protection to -mrop-protect.
The feature will be off by default and turned on when the option is used.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D99185
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.
Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.
Differential Revision: https://reviews.llvm.org/D98857
When a D-Form instruction is fed by an add-immediate, we attempt
to merge the two immediates to form a single displacement so we
can remove the add-immediate.
However, we don't check whether the new displacement fits into
a 16-bit signed immediate field early enough. Namely, we do a
sign-extend from 16 bits first which will discard high bits and
then we check whether the result is a 16-bit signed immediate.
It of course will always be.
Move the check prior to the sign extend to ensure we are checking
the correct value.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49640
This patch is to replace the fixed value with expression.
Keep .file section as fixed values as it might be changed. The
remaining sections will hardly be modified. So the Index values
are sequential. By using expression, we can avoid the fixed value
changes in coming patches.
This is a follow-up of patch D97117.
Reviewed By: hubert.reinterpretcast, shchenz
Differential Revision: https://reviews.llvm.org/D98620
This patch adds additional load/store test cases involving scalars, vectors,
and PC-Rel in preparation for the refactored load and store implementation
introduced in D93370.
Differential Revision: https://reviews.llvm.org/D97391
This reverts commit 329aeb5db43f5e69df038fb20d2def77fe6f8595,
and relands commit 61f006ac655431bd44b9e089f74c73bec0c1a48c.
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
This is a continuation of D89456.
As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.
This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D98147
This also briefly tests a larger set of architectures than the more
exhaustive functionality tests for AArch64 and x86.
As requested in D88785
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D98339
Starting with Power 10 the instruction paddi is available to use.
The instruction allows for immediates that are 34 bits.
This patch adds exploitation of the paddi instruction to allow us
to materialize constants.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D93300
4c973ae implemented reduction of vector swap for lane-insensitive
operations. This commit fixes it for checking number of uses of the
vector operation.
If we encounter a degenerate select node where both operands are
the same, then we can continue negating the condition while swapping
operands, resulting in an infinite loop. Avoid this by bailing out
if both operands are the same.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49509.
Differential Revision: https://reviews.llvm.org/D98340
As we're going to replace this ambiguous option with more precise
instruction-level fast-math description, some tests need to be updated
and the option doesn't play any role in some of them.
This patch simplifies pattern (xxswap (vec-op (xxswap a) (xxswap b)))
into (vec-op a b) if vec-op is lane-insensitive. The motivating case
is ScalarToVector-VecOp-ExtractElement sequence on LE, but the
peephole itself is not related to endianness, so BE may also benefit
from this.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D97658
This pull request implements patterns to exploit the load rightmost vector
element instructions for loading element 0 on little endian PowerPC subtargets
into v8i16 and v16i8 vector registers for i16 and i8 data types.
Differential Revision: https://reviews.llvm.org/D94816#inline-921403
Add support for the TLS general dynamic access model to assembly
files on AIX 64-bit.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D98078
Since P8 is the oldest machine supported by MASSV pass,
_massv place holder is removed and the oldest version of
MASSV functions is assumed. If the P9 vector specific is
detected in the compilation process, the P8 prefix will
be updated to P9.
Differential Revision: https://reviews.llvm.org/D98064
Adds support for the TLS general dynamic access model to
assembly files on AIX 32-bit.
To generate the correct code sequence when accessing a TLS variable
`v`, we first create two TOC entry nodes, one for the variable offset, one
for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX`
node (new node introduced by this patch).
The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is
expanded to 2 copies (to put the variable offset and region handle in
the right registers) and a call to `__tls_get_addr`.
This patch also changes the way TC entries are generated in asm files.
If the generated TC entry is for the region handle of a TLS variable,
we add the `@m` relocation and the `.` prefix to the entry name.
For example:
```
L..C0:
.tc .v[TC],v[TL]@m -> region handle
L..C1:
.tc v[TC],v[TL] -> variable offset
```
Reviewed By: nemanjai, sfertile
Differential Revision: https://reviews.llvm.org/D97948