Commit Graph

487439 Commits

Author SHA1 Message Date
Luke Lau
aca7586ac9 [RISCV] Fix M1 shuffle on wrong SrcVec in lowerShuffleViaVRegSplitting
This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do
the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended
up taking it from V1 instead of V2.
2024-01-31 22:49:19 -08:00
Luke Lau
5605312fc5 [RISCV] Add test to showcase miscompile from #79072 2024-01-31 22:49:19 -08:00
Sander de Smalen
e502141a42 [AArch64][SME] Fix inlining bug introduced in #78703 (#79994)
Calling a `__arm_locally_streaming` function from a function that
is not a streaming-SVE function would lead to incorrect inlining.

The issue didn't surface because the tests were not testing what
they were supposed to test.

(cherry picked from commit 3abf55a68caefd45042c27b73a658c638afbbb8b)
2024-01-31 22:29:42 -08:00
Sander de Smalen
27139bceb2 [SME] Stop RA from coalescing COPY instructions that transcend beyond smstart/smstop. (#78294)
This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that
expands to
a 'nop', but which stops the register allocator from coalescing a COPY
node when
its use/def crosses a SMSTART or SMSTOP instruction.

For example:

    %0:fpr64 = COPY killed $d0
    undef %2.dsub:zpr = COPY %0       // <- Do not coalesce this COPY
    ADJCALLSTACKDOWN 0, 0
MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
    $d0 = COPY killed %0
    BL @use_f64, csr_aarch64_aapcs

If the COPY would be coalesced, that would lead to:

    $d0 = COPY killed %0

being replaced by:

    $d0 = COPY killed %2.dsub

which means the whole ZPR reg would be live upto the call, causing the
MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:

    str     q0, [sp]   // 16-byte Folded Spill
    smstop  sm
    ldr     z0, [sp]   // 16-byte Folded Reload
    bl      use_f64

which would be incorrect for two reasons:
1. The program may load more data than it has allocated.
2. If there are other SVE objects on the stack, the compiler might use
the
   'mul vl' addressing modes to access the spill location.

By disabling the coalescing, we get the desired results:

    str     d0, [sp, #8]  // 8-byte Folded Spill
    smstop  sm
    ldr     d0, [sp, #8]  // 8-byte Folded Reload
    bl      use_f64

(cherry picked from commit dd736661826e215ac70ff3a4a4ccd75bda0c5ccd)
2024-01-31 15:41:15 +00:00
Sam James
400a02bb28 [sanitizer] Handle Gentoo's libstdc++ path
On Gentoo, libc++ is indeed in /usr/include/c++/*, but libstdc++ is at
e.g. /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14.

Use '/include/g++' as it should be unique enough. Note that the omission of
a trailing slash is intentional to match g++-*.

See https://github.com/llvm/llvm-project/pull/78534#issuecomment-1904145839.

Reviewed by: mgorny
Closes: https://github.com/llvm/llvm-project/pull/79264

Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit e8f882f83acf30d9b4da8846bd26314139660430)
2024-01-30 15:47:38 -08:00
Alex Bradbury
fe2fca3b8e
Backport [RISCV] Graduate Zicond to non-experimental (#79811) (#80018)
The Zicond extension was ratified in the last few months, with no
changes that affect the LLVM implementation. Although there's surely
more tuning that could be done about when to select Zicond or not, there
are no known correctness issues. Therefore, we should mark support as
non-experimental.

(cherry-picked from commit d833b9d677c9dd0a35a211e2fdfada21ea9a464b)
2024-01-30 15:31:38 -08:00
Yi Wu
a2d4a4c0b2 Apply kind code check on exitstat and cmdstat (#78286)
When testing on gcc, both exitstat and cmdstat must be a kind=4 integer,
e.g. DefaultInt. This patch changes the input arg requirement from
`AnyInt` to `TypePattern{IntType, KindCode::greaterOrEqualToKind, n}`.

The standard stated in 16.9.73
- EXITSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least nine.
- CMDSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least four.

```fortran
program bug
  implicit none
  integer(kind = 2) :: exitstatvar
  integer(kind = 4) :: cmdstatvar
  character(len=256) :: msg
  character(len=:), allocatable :: command
  command='echo hello'
  call execute_command_line(command, exitstat=exitstatvar, cmdstat=cmdstatvar)
end program
```
When testing the above program with exitstatvar kind<4, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'exitstat=' has bad type or kind 'INTEGER(2)'
    call execute_command_line(command, exitstat=exitstatvar)
```

When testing the above program with exitstatvar kind<2, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'cmdstat=' has bad type or kind 'INTEGER(1)'
    call execute_command_line(command, cmdstat=cmdstatvar)
```

Test file for this semantics has been added to `flang/test/Semantic`
Fixes: https://github.com/llvm/llvm-project/issues/77990

(cherry picked from commit 14a15103cc9dbdb3e95c04627e0b96b5e3aa4944)
2024-01-30 16:45:35 +00:00
David Green
bab01aead7 Revert "[AArch64] merge index address with large offset into base address"
This reverts commit 32878c2065 due to #79756 and #76202.

(cherry picked from commit 915c3d9e5a2d1314afe64cd6116a3b6c9809ec90)
2024-01-29 15:17:53 -08:00
Andrei Golubev
0680e84a3f [mlir] Revert to old fold logic in IR::Dialect::add{Types, Attributes}() (#79582)
Fold expressions on Clang are limited to 256 elements. This causes
compilation errors in cases when the amount of elements added exceeds
this limit. Side-step the issue by restoring the original trick that
would use the std::initializer_list. For the record, in our downstream
Clang 16 gives:

mlir/include/mlir/IR/Dialect.h:269:23: fatal error: instantiating fold
expression with 688 arguments exceeded expression nesting limit of 256
    (addType<Args>(), ...);

Partially reverts 26d811b3ec.

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit e3a38a75ddc6ff00301ec19a0e2488d00f2cc297)
2024-01-29 15:10:47 -08:00
David Sherwood
bdaf16d59f [LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.

This fixes a 25% performance regression introduced by commit

49b0e6dcc2

when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.

(cherry picked from commit 962fbafecf4730ba84a3b9fd7a662a5c30bb2c7c)
2024-01-29 15:05:18 -08:00
Jay Foad
824a3e5dec [AMDGPU] Do not bother adding reserved registers to liveins (#79436)
Tweak the implementation of llvm.amdgcn.wave.id to not add TTMP8 to the
function liveins.
2024-01-29 15:00:47 -08:00
Jay Foad
4c8cf4a1c2 [AMDGPU] New llvm.amdgcn.wave.id intrinsic (#79325)
This is only valid on targets with architected SGPRs.
2024-01-29 15:00:47 -08:00
Alexander Kornienko
b73cd5ec71 Revert "[SemaCXX] Implement CWG2137 (list-initialization from objects of the same type) (#77768)"
This reverts commit 924701311a. Causes compilation
errors on valid code, see
https://github.com/llvm/llvm-project/pull/77768#issuecomment-1908062472.

(cherry picked from commit 6e4930c67508a90bdfd756f6e45417b5253cd741)
2024-01-29 14:57:23 -08:00
Andrei Golubev
3df71e5a3f [mlir][LLVM] Use int32_t to indirectly construct GEPArg (#79562)
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples of such are:

```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]

mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]
```

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit 89cd345667a5f8f4c37c621fd8abe8d84e85c050)
2024-01-29 10:29:59 -08:00
Michał Górny
2c3214135f [llvm] [cmake] Include httplib in LLVMConfig.cmake (#79305)
Include LLVM_ENABLE_HTTPLIB along with httplib package finding in
LLVMConfig.cmake, as this dependency is needed by LLVMDebuginfod that is
now used by LLDB. Without it, building LLDB standalone fails with:

```
CMake Error at /usr/lib/llvm/19/lib64/cmake/llvm/LLVMExports.cmake:90 (set_target_properties):
  The link interface of target "LLVMDebuginfod" contains:

    httplib::httplib

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  /usr/lib/llvm/19/lib64/cmake/llvm/LLVMConfig.cmake:357 (include)
  cmake/modules/LLDBStandalone.cmake:9 (find_package)
  CMakeLists.txt:34 (include)
```

(cherry picked from commit 3c9f34c12450345c6eb524e47cf79664271e4260)
2024-01-29 10:25:40 -08:00
Tom Stellard
b79e6a3c8a
[workflows] Fix argument passing in abi-dump jobs (#79658) (#79836)
This was broken by 859e6aa100, which added
quotes around the EXTRA_ARGS variable.
2024-01-29 10:15:50 -08:00
Jay Foad
27654471cc [AMDGPU] Move architected SGPR implementation into isel (#79120)
(cherry picked from commit 70fc9703788e8965813c5b677a85cb84b66671b6)
2024-01-27 23:18:25 -08:00
Tom Stellard
ddbdd7b267
workflows: Merge LLVM tests together into a single job (#78877) (#79710)
This is possible now that the free GitHub runners for Windows and Linux
have more disk space:


https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/

I also had to switch from macOS-11 to macOS-13 in order to prevent the
job from timing out. macOS-13 runners have 4 vCPUs and the macOS-11
runners only have 3.
2024-01-27 23:14:59 -08:00
Paschalis Mpeis
fa0a72b584 [LTO] Fix Veclib flags correctly pass to LTO flags (#78749)
Flags `-fveclib=name` were not passed to LTO flags.
This pass fixes that by converting the `-fveclib` flags to their
relevant names for opt's `-vector-lib=name` flags.

For example:
`-fveclib=SLEEF` would become `-vector-library=sleefgnuabi` and passed
through the `-plugin-opt` flag.

(cherry picked from commit 03cf0e9354e7e56ff794e9efb682ed2971bc91ec)
2024-01-27 15:52:23 -08:00
erichkeane
16bfe1e89f Fix comparison of Structural Values
Fixes a regression from #78041 as reported in the review.  The original
patch failed to compare the canonical type, which this adds.  A slightly
modified test of the original report is added.

(cherry picked from commit e3ee3762304aa81e4a240500844bfdd003401b36)
2024-01-27 10:31:53 -08:00
Adhemerval Zanella
2cf04c020f [X86] Do not end 'note.gnu.property' section with -fcf-protection (#79360)
The glibc now adds the required minimum ISA level for libc-nonshared.a
(linked on all programs) and this is done with an inline asm along with
.note.gnu.property and .pushsection/.popsection. However, the x86
backend always ends the 'note.gnu.property' section when building with
-fcf-protection, leading to assert failure:

llvm/llvm-project-git/llvm/lib/MC/MCStreamer.cpp:1251: virtual void
llvm::MCStreamer::switchSection(llvm::MCSection*, const llvm::MCExpr*):
Assertion `!Section->hasEnded() && "Section already ended"' failed.

[1]
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/isa-level.c;h=3f1b269848a52f994275bab6f60dded3ded6b144;hb=HEAD

(cherry picked from commit a58c62fa824fd24d20fa2366e0ec8f241cb321fe)
2024-01-27 10:19:11 -08:00
Sean Fertile
15aeb35c53 [LTO] Fix fat-lto output for -c -emit-llvm. (#79404)
Fix and add a test case for combining '-ffat-lto-objects -c -emit-llvm'
options and fix a spelling mistake in same test.

(cherry picked from commit f1b1611148fa533fe198fec3fa4ef8139224dc80)
2024-01-27 06:51:08 -08:00
Mariusz Sikora
2d759eff89 [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (#78414)
…bf8 instructions

    Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
    instructions that were supported on GFX940 (MI300):
    - V_CVT_F32_FP8
    - V_CVT_F32_BF8
    - V_CVT_PK_F32_FP8
    - V_CVT_PK_F32_BF8
    - V_CVT_PK_FP8_F32
    - V_CVT_PK_BF8_F32
    - V_CVT_SR_FP8_F32
    - V_CVT_SR_BF8_F32

---------

Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com>
(cherry picked from commit cfddb59be2124f7ec615f48a2d0395c6fdb1bb56)
2024-01-27 06:45:32 -08:00
Fangrui Song
e2521eaa1a [ELF] Implement R_RISCV_TLSDESC for RISC-V
Support
R_RISCV_TLSDESC_HI20/R_RISCV_TLSDESC_LOAD_LO12/R_RISCV_TLSDESC_ADD_LO12/R_RISCV_TLSDESC_CALL.
LOAD_LO12/ADD_LO12/CALL relocations reference a label at the HI20
location, which requires special handling. We save the value of HI20 to
be reused. Two interleaved TLSDESC code sequences, which compilers do
not generate, are unsupported.

For -no-pie/-pie links, TLSDESC to initial-exec or local-exec
optimizations are eligible. Implement the relevant hooks
(R_RELAX_TLS_GD_TO_LE, R_RELAX_TLS_GD_TO_IE): the first two instructions
are converted to NOP while the latter two are converted to a GOT load or
a lui+addi.

The first two instructions, which would be converted to NOP, are removed
instead in the presence of relaxation. Relaxation is eligible as long as
the R_RISCV_TLSDESC_HI20 relocation has a pairing R_RISCV_RELAX,
regardless of whether the following instructions have a R_RISCV_RELAX.
In addition, for the TLSDESC to LE optimization (`lui a0,<hi20>; addi a0,a0,<lo12>`),
`lui` can be removed (i.e. use the short form) if hi20 is 0.

```
// TLSDESC to LE/IE optimization
.Ltlsdesc_hi2:
  auipc a4, %tlsdesc_hi(c)                      # if relax: remove; otherwise, NOP
  load  a5, %tlsdesc_load_lo(.Ltlsdesc_hi2)(a4) # if relax: remove; otherwise, NOP
  addi  a0, a4, %tlsdesc_add_lo(.Ltlsdesc_hi2)  # if LE && !hi20 {if relax: remove; otherwise, NOP}
  jalr  t0, 0(a5), %tlsdesc_call(.Ltlsdesc_hi2)
  add   a0, a0, tp
```

The implementation carefully ensures that an instruction unrelated to
the current TLSDESC code sequence, if immediately follows a removable
instruction (HI20 or LOAD_LO12 OR (LE-specific) ADD_LO12), is not
converted to NOP.

* `riscv64-tlsdesc.s` is inspired by `i386-tlsdesc-gd.s` (https://reviews.llvm.org/D112582).
* `riscv64-tlsdesc-relax.s` tests linker relaxation.
* `riscv-tlsdesc-gd-mixed.s` is inspired by `x86-64-tlsdesc-gd-mixed.s` (https://reviews.llvm.org/D116900).

Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373

Reviewed By: ilovepi

Pull Request: https://github.com/llvm/llvm-project/pull/79239

(cherry picked from commit 1117fdd7c16873eb389e988c6a39ad922bae0fd0)
2024-01-26 21:34:49 -08:00
Fangrui Song
3d02473ac5 [ELF] Fix terminology: TLS optimizations instead of TLS relaxation. NFC
(cherry picked from commit 849951f8759171cb6c74d3ccbcf154506fc1f0ae)
2024-01-26 21:34:49 -08:00
Fangrui Song
e9d99e5183 [ELF] Clean up R_RISCV_RELAX code. NFC
(cherry picked from commit ccb99f221422b8de5e1ae04d3427f15878f7cd93)
2024-01-26 21:34:48 -08:00
Wang Pengcheng
d9e26c223b [RISCV] Use TableGen-based macro fusion (#72224)
We convert existed macro fusions to TableGen.

Bacause `Fusion` depend on `Instruction` definitions which is defined
below `RISCVFeatures.td`, so we recommend user to add fusion features
when defining new processor.

(cherry picked from commit 3fdb431b636975f2062b1931158aa4dfce6a3ff1)
2024-01-26 21:28:01 -08:00
Wang Pengcheng
7cfa0c1b7c [TableGen] Add predicates for immediates comparison (#76004)
These predicates can be used to represent `<`, `<=`, `>`, `>=`.

And a predicate for `in range` is added.

(cherry picked from commit 664a0faac464708fc061d12e5cd492fcbfea979a)
2024-01-26 21:28:00 -08:00
Wang Pengcheng
62877e375c [TableGen] Use MapVector to remove non-determinism
This fixes found non-determinism when `LLVM_REVERSE_ITERATION`
option is `ON`.

Fixes #79420.

Reviewers: ilovepi, MaskRay

Reviewed By: MaskRay

Pull Request: https://github.com/llvm/llvm-project/pull/79411

(cherry picked from commit 41fe98a6e7e5cdcab4a4e9e0d09339231f480c01)
2024-01-26 21:24:37 -08:00
Mirko Brkušanin
ed48280f8e [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-26 20:01:08 -08:00
Fangrui Song
aa4cb0e313 [Driver,CodeGen] Support -mtls-dialect= (#79256)
GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values

* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)

AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems very little interest in AArch32's TLSDESC support.

TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.

There seems no motivation to have fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm, and don't bother with a modules flag
metadata or function attribute.

Co-authored-by: Paul Kirth <paulkirth@google.com>
(cherry picked from commit 36b4a9ccd9f7e04010476e6b2a311f2052a4ac20)
2024-01-26 19:51:03 -08:00
dyung
147c623a86
Change check for embedded llvm version number to a regex to make test more flexible. (#79528) (#79642)
This test started to fail when LLVM created the release/18.x branch and
the main branch subsequently had the version number increased from 18 to
19.

I investigated this failure (it was blocking our internal automation)
and discovered that the CHECK statement on line 27 seemed to have the
compiler version number (1800) encoded in octal that it was checking
for. I don't know if this is something that explicitly needs to be
checked, so I am leaving it in, but it should be more flexible so the
test doesn't fail anytime the version number is changed. To accomplish
that, I changed the check for the 4-digit version number to be a regex.

I originally updated this test for the 18->19 transition in
a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK
line more flexible so it doesn't need to be continually updated.

(cherry picked from commit 45f883ed06f39fba7557dfbbff4d10595b45f874)
2024-01-26 19:27:28 -08:00
Fangrui Song
0991d3c7b5 [ELF] Don't resolve relocations referencing SHN_ABS to tombstone in non-SHF_ALLOC sections (#79238)
A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before #74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses

(https://github.com/llvm/llvm-project/pull/74686#issuecomment-1904101711)
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.

```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }

0x00000023:   DW_TAG_variable [2]   (0x0000000c)
                ...
                DW_AT_location [DW_FORM_exprloc]        (DW_OP_addrx 0x0)

```

.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.

The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b6b9, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.

We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.

(cherry picked from commit 8abf8d124ae346016c56209de7f57b85671d4367)
2024-01-25 17:27:37 -08:00
Shengchen Kan
453ff4b733 [X86][CodeGen] Fix crash when commute operands of Instruction for code size (#79245)
Reported in 134fcc6278

Incorrect opcode is used  b/c there is a `[[fallthrough]]` at line 2386.

(cherry picked from commit 33ecef9812e2c9bfadef035b8e34a949acae2abc)
2024-01-25 17:24:02 -08:00
Weining Lu
c9e73cdd9a [test] Update dwarf-loongarch-relocs.ll
Address buildbot faiures:
http://45.33.8.238/macm1/77360/step_11.txt
http://45.33.8.238/linux/128902/step_12.txt

(cherry picked from commit baba7e4175b6ca21e83b1cf8229f29dbba02e979)
2024-01-25 17:14:34 -08:00
Louis Dionne
3173faaff1
[🍒][ci] Fix the base branch we use to determine changes (#79503) (#79506)
We should diff against the base branch, not always against `main`. This
allows the BuildKite pre-commit CI to work properly when we target other
branches, such as `release/18.x`.

(cherry picked from commit 3b762891826192ded07286852d326f9c9060f52e)
2024-01-25 13:57:41 -08:00
Tom Stellard
6abd792a67 [workflows] Fix version-check.yml to work with the new minor release bump
(cherry picked from commit d5e69147b9d261bd53b4dd027f17131677be8613)
2024-01-25 13:33:17 -08:00
Tom Stellard
2268346374 Use rc version suffix 2024-01-23 20:27:37 -08:00
Tom Stellard
f0c79331a0 Bump version to 18.1.0 2024-01-23 20:19:45 -08:00
Wang Pengcheng
93248729cf
[RISCV][MC] Split tests for A into Zaamo and Zalrsc parts
So that we don't duplicate tests in later patch.

Reviewers: topperc, dtcxzyw, asb

Reviewed By: asb

Pull Request: https://github.com/llvm/llvm-project/pull/79111
2024-01-24 10:49:14 +08:00
Michael Maitland
63f742c15f
[RISCV] Add sifive-p670 processor (#79015)
This is an OOO core that has a vector unit. For more information see
https://www.sifive.com/cores/performance-p650-670.

Scheduler model and other tuning will come in separate patches.
2024-01-23 21:45:24 -05:00
paperchalice
7bda0ce15a
[llc] Remove C backend support (#79237)
C backend is removed in 3.1.
2024-01-24 10:40:11 +08:00
Chuanqi Xu
f0c3870388
[Modules] [HeaderSearch] Don't reenter headers if it is pragma once (#76119)
Close https://github.com/llvm/llvm-project/issues/73023

The direct issue of https://github.com/llvm/llvm-project/issues/73023 is
that we entered a header which is marked as pragma once since the
compiler think it is OK if there is controlling macro.

It doesn't make sense. I feel like it should be sufficient to skip it
after we see the '#pragma once'.

From the context, it looks like the workaround is primarily for
ObjectiveC. So we might need reviewers from OC.
2024-01-24 10:22:35 +08:00
Nico Weber
ecde13b1a8 [gn build] port 7e50f006f7 2024-01-23 21:14:36 -05:00
Craig Topper
3dea0aa8f4
[LSR] Fix incorrect comment. NFC (#79207) 2024-01-23 17:57:34 -08:00
Christudasan Devadasan
230c13d59d
[AMDGPU] Pick available high VGPR for CSR SGPR spilling (#78669)
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.

This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.
2024-01-24 07:08:43 +05:30
paperchalice
7e50f006f7
[NewPM][CodeGen][llc] Add NPM support (#70922)
Add new pass manager support to `llc`. Users can use
`--passes=pass1,pass2...` to run mir passes, and use `--enable-new-pm`
to run default codegen pipeline.
This patch is taken from [D83612](https://reviews.llvm.org/D83612), the
original author is @yuanfang-chen.

---------

Co-authored-by: Yuanfang Chen <455423+yuanfang-chen@users.noreply.github.com>
2024-01-24 09:27:25 +08:00
Fangrui Song
c663c8b883 [ELF,test] Improve dead-reloc-in-nonalloc.s
Test an absolute relocation referencing a DSO symbol, relocating a
non-SHF_ALLOC section. Also test --gc-sections.
2024-01-23 17:23:52 -08:00
Jeffrey Byrnes
f709fbb1bb
[SROA] Only try additional vector type candidates when needed (#77678)
f9c2a341b9
causes regressions when we have a slice with integer vector type that is
the same size as the partition, and a ptr load/store slice that is not
the size of the element type.

Ref `vector-promotion.ll:ptrLoadStoreTys`. 

Before the patch, we would only consider `<4 x i32>` as a candidate type
for vector promotion, and would find that it is a viable type for all
the slices.

After the patch, we now add `<2 x ptr>` as a candidate type due to slice
with user `store ptr %val0, ptr %obj, align 8` -- and flag that we
`HaveVecPtrTy`. The pre-existing behavior of this flag results in
removing the viable `<4 x i32>` and keeping only the unviable `<2 x
ptr>`, which results in a failure to promote.

The end result is failing to promote an alloca that was previously
promoted -- this does not appear to be the intent of that patch, which
has the goal of increasing promotions by providing more promotion
opportunities.

This PR preserves this behavior via a simple reorganization of the
implemention: try first the slice types with same size as the partition,
then, if there is no promotable type, try the `LoadStoreTys.`
2024-01-23 17:22:49 -08:00
Jinyang He
c51ab483e6
[LoongArch] Insert nops and emit align reloc when handle alignment directive (#72962)
Refer to RISCV, we will fix up the alignment if linker relaxation
changes code size and breaks alignment. Insert enough Nops and emit
R_LARCH_ALIGN relocation type so that linker could satisfy the alignment
by removing Nops.
It does so only in sections with the SHF_EXECINSTR flag.

In LoongArch psABI v2.30, R_LARCH_ALIGN requires symbol index. The
lowest 8 bits of addend represent alignment and the other bits of addend
represent the maximum number of bytes to emit.
2024-01-24 09:17:49 +08:00