65934 Commits

Author SHA1 Message Date
Matt Arsenault e9835c0f31 AMDGPU: Remove optnone from a test
It's not clear why the test had this. I'm unable to break the original
case with the original patch reverted with or without optnone.

This avoids a failure in a future commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375321 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-19 01:34:59 +00:00
Matt Arsenault 8672594561 LiveIntervals: Fix handleMoveUp with subreg def moving across a def
If a subregister def was moved across another subregister def and
another use, the main range was not correctly updated. The end point
of the moved interval ended too early and missed the use from the
other lanes in the subreg def.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375300 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 23:24:25 +00:00
Stanislav Mekhanoshin 6cfc726a65 [AMDGPU] move PHI nodes to AGPR class
If all uses of a PHI are in AGPR register class we should
avoid unneeded copies via VGPRs.

Differential Revision: https://reviews.llvm.org/D69200

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375297 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 22:48:45 +00:00
Wei Mi 762a6a2124 [SampleFDO] Add profile remapping support for profile on-demand loading used
by ExtBinary format profile

Profile on-demand loading was added for ExtBinary format profile in rL374233,
but currently profile on-demand loading doesn't work well with profile
remapping. The patch adds the support.

Suppose a function in the current module has outline instance in the profile.
The function name in the module is different from the name of the outline
instance, but remapper knows the two names are equal. When loading profile
on-demand, the outline instance has to be loaded with remapper's help.

At the same time SampleProfileReaderItaniumRemapper is changed from a proxy
of SampleProfileReader to a helper member in SampleProfileReader.

Differential Revision: https://reviews.llvm.org/D68901

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375295 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 22:35:20 +00:00
Jay Foad 1e88075ba3 [AMDGPU] Remove -amdgpu-spill-sgpr-to-smem.
Summary: The implementation was never completed and never used except in tests.

Reviewers: arsenm, mareko

Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69163

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375293 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 21:48:22 +00:00
Reid Kleckner 88cdc6927e [X86] Fix register parsing in .seh_* in Intel syntax
Previously, the parser checked for a '%' prefix to indicate a register.
In Intel syntax mode, LLVM does not print a '%' prefix on registers, so
LLVM could not parse its own assembly output. Instead, require that
register numbers be integer literals, or at least start with an integer
literal, which is consistent with .cfi_* directive register parsing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375287 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 21:01:41 +00:00
Roman Lebedev a82e2a53ab [NFC][CVP] Some tests for mul no-wrap deduction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375285 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 20:36:19 +00:00
Thomas Lively 523325a26c [WebAssembly] Allow multivalue signatures in object files
Summary:
Also changes the wasm YAML format to reflect the possibility of having
multiple return types and to put the returns after the params for
consistency with the binary encoding.

Reviewers: aheejin, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, arphaman, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69156

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375283 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 20:27:30 +00:00
Roman Lebedev e1659b963e [CVP] After proving that @llvm.with.overflow()/@llvm.sat() don't overflow, also try to prove other no-wrap
Summary:
CVP, unlike InstCombine, does not run till exhaustion.
It only does a single pass.

When dealing with those special binops, if we prove that they can
safely be demoted into their usual binop form,
we do set the no-wrap we deduced. But when dealing with usual binops,
we try to deduce both no-wraps.

So if we convert e.g. @llvm.uadd.with.overflow() to `add nuw`,
we won't attempt to check whether it can be `add nuw nsw`.

This patch proposes to call `processBinOp()` on newly-created binop,
which is identical to what we do for div/rem already.
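
The missed deduction can be illustrated with a small range check. This is a minimal sketch, not CVP's implementation: the function name `add_no_wrap_flags` and the closed-interval model of operand ranges are assumptions made for illustration, and the signed check is deliberately conservative.

```python
def add_no_wrap_flags(lhs, rhs, bits):
    """Return the no-wrap flags provable for an `add` from operand ranges.

    lhs and rhs are (lo, hi) closed ranges of unsigned values. This
    interval model is a simplification chosen for the example.
    """
    umax = (1 << bits) - 1
    smax = (1 << (bits - 1)) - 1
    flags = []
    # nuw: the largest possible unsigned sum still fits in `bits` bits.
    if lhs[1] + rhs[1] <= umax:
        flags.append("nuw")
    # nsw (conservative): both operands are non-negative as signed values
    # and the largest possible sum does not exceed the signed maximum.
    if lhs[1] <= smax and rhs[1] <= smax and lhs[1] + rhs[1] <= smax:
        flags.append("nsw")
    return flags


# An 8-bit add whose operands are known to be small earns both flags,
# which is the case a single nuw-only conversion would miss.
print(add_no_wrap_flags((0, 100), (0, 20), 8))
print(add_no_wrap_flags((0, 100), (0, 100), 8))
```

Checking both flags on the newly created binop is what lets `add nuw` become `add nuw nsw` in a single pass.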

Reviewers: nikic, spatel, reames

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69183

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375273 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 19:32:47 +00:00
Matt Arsenault 7d97468a23 AMDGPU: Relax 32-bit SGPR register class
Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This
will allow the register coalescer to do a better job eliminating
copies to m0.

For GlobalISel, as a terrible hack, use SGPR_32 for things that should
use SCC until booleans are solved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375267 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 18:26:37 +00:00
Austin Kerbow 0a8dc0861b AMDGPU: Fix SMEM WAR hazard for gfx10 readlane
Summary: Hazard recognizer fails to see hazard with V_READLANE_B32_gfx10.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69172

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375265 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 18:20:30 +00:00
Roman Lebedev 07fddc38a9 [NFC][CVP] Add @llvm.*.sat tests where we could prove both no-overflows
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375260 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 17:18:12 +00:00
Joseph Tremoulet eab53d737b Update MinidumpYAML to use minidump::Exception for exception stream
Reviewers: labath, jhenderson, clayborg, MaskRay, grimar

Reviewed By: grimar

Subscribers: lldb-commits, grimar, MaskRay, hiraditya, llvm-commits

Tags: #llvm, #lldb

Differential Revision: https://reviews.llvm.org/D68657

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375242 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 14:56:19 +00:00
Dmitry Preobrazhensky c90b4243b5 [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32
See https://bugs.llvm.org/show_bug.cgi?id=43608

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69096

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375241 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 14:49:53 +00:00
Eugene Leviant db52ae4706 One more attempt to fix PS4 buildbot after r375219
PS4 buildbot seems to be dropping variable names for some reason


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375237 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 14:11:19 +00:00
Eugene Leviant 732f3bb372 Attempt to fix PS4 buildbot after r375219
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375235 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 13:52:51 +00:00
Nemanja Ivanovic 6ad9a8a867 Revert r375152 as it is causing failures on EXPENSIVE_CHECKS bot
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375233 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 13:38:46 +00:00
Dmitry Preobrazhensky 7abc3dffcc [AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwa
See https://bugs.llvm.org/show_bug.cgi?id=43607

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69095

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375231 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 13:31:53 +00:00
Victor Campos 3e5185e268 [AArch64] Adding support for PMMIR_EL1 register
Summary:
The PMMIR_EL1 register is present in Armv8.4 with PMU extension.
This patch adds support for it.

Reviewers: t.p.northover, dnsampaio

Reviewed By: dnsampaio

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68940

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375228 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 12:40:29 +00:00
Graham Hunter 77a2562985 [AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.

Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.

At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.

I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.

Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy

Reviewed By: jmolloy

Differential Revision: https://reviews.llvm.org/D47775

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375222 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 11:48:35 +00:00
Eugene Leviant 7944d32db7 [ThinLTOCodeGenerator] Add support for index-based WPD
Differential revision: https://reviews.llvm.org/D68950


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375219 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 10:54:14 +00:00
David Green 40c2ef733e [AArch64] Don't combine callee-save and local stack adjustment when optimizing for size
For arm64, D18619 introduced the ability to combine bumping the stack pointer
upfront in case it needs to be bumped for both the callee-save area as well as
the local stack area.

That diff already remarks that "This change can cause an increase in
instructions", but argues that even when that happens, it should still be a
performance benefit because the number of micro-ops is reduced.

We have observed that this code-size increase can be significant in practice.
This diff disables combining stack bumping for methods that are marked as
optimize-for-size.

Example of a prologue with the behavior before this diff (combining stack bumping when possible):
  sub        sp, sp, #0x40
  stp        d9, d8, [sp, #0x10]
  stp        x20, x19, [sp, #0x20]
  stp        x29, x30, [sp, #0x30]
  add        x29, sp, #0x30
  [... compute x8 somehow ...]
  stp        x0, x8, [sp]

And after this diff, if the method is marked as optimize-for-size:
  stp        d9, d8, [sp, #-0x30]!
  stp        x20, x19, [sp, #0x10]
  stp        x29, x30, [sp, #0x20]
  add        x29, sp, #0x20
  [... compute x8 somehow ...]
  stp        x0, x8, [sp, #-0x10]!

Note that without combining the stack bump there are two auto-decrements,
nicely folded into the stp instructions, whereas otherwise there is a single
sub sp, ... instruction, but not folded.

Patch by Nikolai Tillmann!

Differential Revision: https://reviews.llvm.org/D68530


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375217 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 10:35:46 +00:00
Simon Pilgrim f1a723063c [X86] Regenerate memcmp tests and add X64-AVX512 common prefix
Should help make the changes in D69157 clearer

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375215 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 09:59:51 +00:00
David Green eaef0d10bd [Codegen] Alter the default promotion for saturating adds and subs
The default promotion for the add_sat/sub_sat nodes currently does:
    ANY_EXTEND iN to iM
    SHL by M-N
    [US][ADD|SUB]SAT
    L/ASHR by M-N

If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
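
A minimal sketch of the manual saturation, assuming the narrow operands are sign-extended (signed case) or zero-extended (unsigned case) before the wide addition. The function names are hypothetical, and Python's unbounded integers stand in for the wider type iM.

```python
def sadd_sat_promoted(a, b, n):
    """Signed saturating add of two n-bit values, computed in a wider type."""
    lo = -(1 << (n - 1))
    hi = (1 << (n - 1)) - 1
    wide = a + b  # cannot overflow in the wider type
    # Clamp against the saturating bounds of the narrow type.
    return max(lo, min(hi, wide))


def uadd_sat_promoted(a, b, n):
    """Unsigned saturating add of two n-bit values, computed in a wider type."""
    return min(a + b, (1 << n) - 1)


print(sadd_sat_promoted(100, 100, 8))    # clamps to the i8 maximum
print(sadd_sat_promoted(-100, -100, 8))  # clamps to the i8 minimum
print(uadd_sat_promoted(200, 100, 8))    # clamps to the u8 maximum
```

No shifts or overflow flags are involved: one wide add plus a min/max pair is enough.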

Differential Revision: https://reviews.llvm.org/D68926


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375211 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 09:47:48 +00:00
Kerry McLaughlin 9d9055cfe2 [AArch64][SVE] Implement unpack intrinsics
Summary:
Implements the following intrinsics:
  - int_aarch64_sve_sunpkhi
  - int_aarch64_sve_sunpklo
  - int_aarch64_sve_uunpkhi
  - int_aarch64_sve_uunpklo

This patch also adds AArch64ISD nodes for UNPK instead of implementing
the intrinsics directly, as they are required for a future patch which
implements the sign/zero extension of legal vectors.

This patch includes tests for the Subdivide2Argument type added by D67549

Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka

Reviewed By: greened

Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D67550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375210 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 09:40:16 +00:00
Bjorn Pettersson c395575313 [InstCombine] Fix miscompile bug in canEvaluateShuffled
Summary:
Add restrictions in canEvaluateShuffled to prevent that we for example
transform

  %0 = insertelement <2 x i16> undef, i16 %a, i32 0
  %1 = srem <2 x i16> %0, <i16 2, i16 1>
  %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0>

into

   %1 = insertelement <2 x i16> undef, i16 %a, i32 1
   %2 = srem <2 x i16> %1, <i16 undef, i16 2>

as having an undef denominator makes the srem undefined (for all
vector elements).

Fixes: https://bugs.llvm.org/show_bug.cgi?id=43689

Reviewers: spatel, lebedev.ri

Reviewed By: spatel, lebedev.ri

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69038

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375208 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 07:42:02 +00:00
Bjorn Pettersson acb3705a54 [InstCombine] Pre-commit of test case showing miscompile bug in canEvaluateShuffled
Adding the reproducer from  https://bugs.llvm.org/show_bug.cgi?id=43689,
showing that instcombine is doing a bad transform. It transforms

  %0 = insertelement <2 x i16> undef, i16 %a, i32 0
  %1 = srem <2 x i16> %0, <i16 2, i16 1>
  %2 = shufflevector <2 x i16> %1, <2 x i16> undef, <2 x i32> <i32 undef, i32 0>

into

   %1 = insertelement <2 x i16> undef, i16 %a, i32 1
   %2 = srem <2 x i16> %1, <i16 undef, i16 2>

The undef denominator makes the whole srem undefined.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375207 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 07:41:53 +00:00
David Zarzycki 3a2363c797 [X86] Emit KTEST when possible
https://reviews.llvm.org/D69111

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375197 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-18 03:45:52 +00:00
Philip Reames 8aca037140 [Test] Precommit test for D69006
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375190 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 23:32:35 +00:00
Jordan Rupprecht dcc80a5730 Reland [llvm-objdump] Use a counter for llvm-objdump -h instead of the section index.
This relands r374931 (reverted in r375088). It fixes 32-bit builds by using the right format string specifier for uint64_t (PRIu64) instead of `%d`.

Original description:

When listing the index in `llvm-objdump -h`, use a zero-based counter instead of the actual section index (e.g. shdr->sh_index for ELF).

While this is effectively a noop for now (except one unit test for XCOFF), the index values will change in a future patch that filters certain sections out (e.g. symbol tables). See D68669 for more context. Note: the test case in `test/tools/llvm-objdump/X86/section-index.s` already covers the case of incrementing the section index counter when sections are skipped.

Reviewers: grimar, jhenderson, espindola

Reviewed By: grimar

Subscribers: emaste, sbc100, arichardson, aheejin, arphaman, seiya, llvm-commits, MaskRay

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68848

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375178 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 21:55:43 +00:00
Jordan Rupprecht c86b9d718d [llvm-objcopy] Add support for shell wildcards
Summary: GNU objcopy accepts the --wildcard flag to allow wildcard matching on symbol-related flags. (Note: it's implicitly true for section flags).

The basic syntax is to allow *, ?, \, and [] which work similarly to how they work in a shell. Additionally, starting a wildcard with ! negates it: a name that matches such a wildcard is prevented from matching the flag.

Use an updated GlobPattern in libSupport to handle these patterns. It does not fully match the `fnmatch` used by GNU objcopy since named character classes (e.g. `[[:digit:]]`) are not supported, but this should support most existing use cases (mostly just `*` is what's used anyway).
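
The matching rules can be sketched with Python's `fnmatch` module. This is an illustration only, not the GlobPattern implementation: the helper name `selected` is hypothetical, the reading that a `!` pattern overrides any positive match is an assumption, and `fnmatch` does not treat `\` as an escape, so that part of the syntax is not modelled.

```python
from fnmatch import fnmatchcase


def selected(name, patterns):
    """Return True if `name` is selected by a list of wildcard patterns.

    A pattern starting with '!' is treated as an exclusion that wins over
    any positive match (an assumed reading of the '!' rule).
    """
    positive = [p for p in patterns if not p.startswith("!")]
    negative = [p[1:] for p in patterns if p.startswith("!")]
    if any(fnmatchcase(name, p) for p in negative):
        return False
    return any(fnmatchcase(name, p) for p in positive)


patterns = ["foo*", "!foo2"]
for symbol in ["foo1", "foo2", "bar"]:
    print(symbol, selected(symbol, patterns))
```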

Reviewers: jhenderson, MaskRay, evgeny777, espindola, alexshap

Reviewed By: MaskRay

Subscribers: nickdesaulniers, emaste, arichardson, hiraditya, jakehehrlich, abrachet, seiya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66613

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375169 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 20:51:00 +00:00
Sanjay Patel 8c2e05f1ac [x86] add test for setcc to shift transform; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375158 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 19:32:24 +00:00
Nemanja Ivanovic 3c336dad2a [PowerPC] Turn on CR-Logical reducer pass
Quite a while ago, we implemented a pass that will reduce the number of
CR-logical operations we emit. It does so by converting a CR-logical operation
into a branch. We have kept this off by default because it seemed to cause a
significant regression with one benchmark.
However, that regression turned out to be due to a completely unrelated
reason - AADB introducing a self-copy that is a priority-setting nop and it was
just exacerbated by this pass.

Now that we understand the reason for the only degradation, we can turn this
pass on by default. We have long since fixed the cause for the degradation.

Differential revision: https://reviews.llvm.org/D52431


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375152 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 18:24:28 +00:00
Sanjay Patel 40eb6f64fe [PowerPC] add tests for popcount with zext; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375142 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 17:44:04 +00:00
Reid Kleckner c21ecb75c2 [codeview] Workaround for PR43479, don't re-emit instr labels
Summary:
In the long run we should come up with another mechanism for marking
call instructions as heap allocation sites, and remove this workaround.
For now, we've had two bug reports about this, so let's apply this
workaround. SLH (the other client of instruction labels) probably has
the same bug, but the solution there is more likely to be to mark the
call instruction as not duplicatable, which doesn't work for debug info.

Reviewers: akhuang

Subscribers: aprantl, hiraditya, aganea, chandlerc, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69068

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375137 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 17:28:31 +00:00
Roman Lebedev 5600873160 [NFC][InstCombine] Tests for "fold variable mask before variable shift-of-trunc" (PR42563)
https://bugs.llvm.org/show_bug.cgi?id=42563

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375135 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 17:20:12 +00:00
Xiangling Liao a0544616ea [AIX] TOC pseudo expansion for 64bit large + 64bit small + 32bit large models
This patch provides support for pseudo ops including ADDIStocHA8, ADDIStocHA, LWZtocL,
LDtoc, LDtocL for AIX, lowering them from MIR to assembly.

Differential Revision: https://reviews.llvm.org/D68341

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375113 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 13:20:25 +00:00
Daniil Fukalov 8fac8d9af4 [AMDGPU] Improve code size cost model
Summary:
Added estimation for zero size insertelement, extractelement
and llvm.fabs operators.
Updated inline/unroll parameters default values.

Reviewers: rampitec, arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68881

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375109 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 12:15:35 +00:00
Sam Parker 5a5be13090 [ARM][MVE] Enable truncating masked stores
Allow us to generate truncating masked stores which take v4i32 and
v8i16 vectors and can store to v4i8, v4i16 and v8i8 memory.
Removed support for unaligned masked stores.

Differential Revision: https://reviews.llvm.org/D68461

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375108 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 12:11:18 +00:00
Fangrui Song dd84a1ead8 [llvm-ar] Implement the O modifier: display member offsets inside the archive
Since GNU ar 2.31, the 't' operation prints member offsets beside file
names if the 'O' modifier is specified. 'O' is ignored for thin
archives.
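
As a rough sketch of where such offsets come from, the following walks the members of a regular (non-thin) archive in the common ar layout: an 8-byte magic followed by 60-byte member headers. The function name `member_offsets` is hypothetical. Reporting the offset of the member header rather than of the member data is an assumption, and name decoding (trailing `/`, long names, symbol tables) is left out.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def member_offsets(data):
    """Yield (raw_name, header_offset) for each member of a regular archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        name = header[0:16].decode("ascii").rstrip()
        # The member size is a decimal ASCII field at bytes 48..57.
        size = int(header[48:58].decode("ascii"))
        yield name, pos
        # Member data is padded to an even length.
        pos += HEADER_SIZE + size + (size & 1)
```

Given the bytes of an archive, `for name, off in member_offsets(data): print(name, hex(off))` prints one line per member.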

Reviewed By: gbreynoo, ruiu

Differential Revision: https://reviews.llvm.org/D69087

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375106 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 11:34:29 +00:00
Fangrui Song 327f5f9ccd [llvm-objcopy] --add-symbol: fix crash if SHT_SYMTAB does not exist
Exposed by D69041. If SHT_SYMTAB does not exist, ELFObjcopy.cpp:handleArgs will crash due
to a null pointer dereference.

  for (const NewSymbolInfo &SI : Config.ELF->SymbolsToAdd) {
    ...
    Obj.SymbolTable->addSymbol(

Fix this by creating .symtab and .strtab on demand in ELFBuilder<ELFT>::readSections,
if --add-symbol is specified.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D69093

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375105 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 11:21:54 +00:00
Roman Lebedev f7a287e981 [LoopIdiom] BCmp: check, not assert that loop exits exit out of the loop (PR43687)
We can't normally stumble into that assertion because a tautological
*conditional* `br` in loop body is required, one that always
branches to loop latch. But that should have been always folded
to an unconditional branch before we get it.
But that is not guaranteed if the pass is run standalone.
So let's just promote the assertion into a proper check.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43687

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375100 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 11:01:29 +00:00
George Rimar 2172294141 [llvm-readobj] - Refine the LLVM-style output to be consistent.
Our LLVM-style output was inconsistent.
This patch changes the output in the following way:

SHT_GNU_verdef { -> VersionDefinitions [
SHT_GNU_verneed { -> VersionRequirements [
Version symbols [ -> VersionSymbols [
EH_FRAME Header [ -> EHFrameHeader {

Differential revision: https://reviews.llvm.org/D68636

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375095 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 10:23:48 +00:00
Oliver Stannard 3400920f53 Reland: Dead Virtual Function Elimination
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.

Original commit message:

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to always be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375094 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 09:58:57 +00:00
Mikhail Maltsev faeea2dc5e [Analysis] Don't assume that unsigned overflow can't happen in EmitGEPOffset (PR42699)
Summary:
Currently when computing a GEP offset using the function EmitGEPOffset
for the following instruction

  getelementptr inbounds i32, i32* %p, i64 %offs

we get

  mul nuw i64 %offs, 4

Unfortunately we cannot assume that unsigned wrapping won't happen
here because %offs is allowed to be negative.

Making such assumptions can lead to miscompilations: see the new test
test24_neg_offs in InstCombine/icmp.ll. Without the patch InstCombine
would generate the following comparison:

   icmp eq i64 %offs, 4611686018427387902; 0x3ffffffffffffffe

Whereas the correct value to compare with is -2.

This patch replaces the NUW flag with NSW in the multiplication
instructions generated by EmitGEPOffset and adjusts the test suite.
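
The wrong constant can be reproduced with plain integer arithmetic. This sketch assumes the comparison in the test is effectively against a byte offset of -8 (that is, `%offs == -2`), which the message implies but does not spell out. The helper `to_signed` is hypothetical.

```python
BITS = 64
MASK = (1 << BITS) - 1


def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    return x - (1 << BITS) if x >> (BITS - 1) else x


offs = -2
product = (offs * 4) & MASK  # bit pattern produced by `mul i64 %offs, 4`

# Believing the multiply is nuw licenses an unsigned divide by 4.
unsigned_solution = product // 4
# Believing it is nsw licenses a signed divide instead.
signed_solution = to_signed(product) // 4

print(hex(unsigned_solution), signed_solution)
```

The unsigned interpretation yields 0x3ffffffffffffffe, the constant quoted above, while the signed one yields the expected -2.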

https://bugs.llvm.org/show_bug.cgi?id=42699

Reviewers: chandlerc, craig.topper, ostannard, lebedev.ri, spatel, efriedma, nlopes, aqjune

Reviewed By: lebedev.ri

Subscribers: reames, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68342

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375089 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 08:59:06 +00:00
Hans Wennborg 4531183453 Revert r374931 "[llvm-objdump] Use a counter for llvm-objdump -h instead of the section index."
This broke llvm-objdump in 32-bit builds, see e.g.
http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10925

> Summary:
> When listing the index in `llvm-objdump -h`, use a zero-based counter instead of the actual section index (e.g. shdr->sh_index for ELF).
>
> While this is effectively a noop for now (except one unit test for XCOFF), the index values will change in a future patch that filters certain sections out (e.g. symbol tables). See D68669 for more context. Note: the test case in `test/tools/llvm-objdump/X86/section-index.s` already covers the case of incrementing the section index counter when sections are skipped.
>
> Reviewers: grimar, jhenderson, espindola
>
> Reviewed By: grimar
>
> Subscribers: emaste, sbc100, arichardson, aheejin, arphaman, seiya, llvm-commits, MaskRay
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D68848

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375088 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 08:52:29 +00:00
Sam Parker d9a54d74ef [ARM][MVE] Change VPST to use, not def, VPR
Unlike VPT, VPST just uses the current value of VPR.P0.

Differential Revision: https://reviews.llvm.org/D69037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375087 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 08:46:31 +00:00
James Molloy 9d008e4cc9 [DFAPacketizer] Use DFAEmitter. NFC.
Summary:
This is a NFC change that removes the NFA->DFA construction and emission logic from DFAPacketizerEmitter and instead uses the generic DFAEmitter logic. This allows DFAPacketizer to use the Automaton class from Support and remove a bunch of logic there too.

After this patch, DFAPacketizer is mostly logic for grepping Itineraries and collecting functional units, with no state machine logic. This will allow us to modernize by removing the 16-functional-unit limit and supporting non-itinerary functional units. This is all for followup patches.
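
For readers unfamiliar with the NFA->DFA step being moved, here is a textbook subset construction for an NFA without epsilon moves. It is a generic sketch, not the DFAEmitter code: the function name and the dictionary-based transition table are assumptions for illustration.

```python
from collections import deque


def nfa_to_dfa(start, transitions):
    """Subset construction for an NFA without epsilon moves.

    transitions maps (state, symbol) -> set of next states.
    Returns (dfa_start, dfa_transitions); DFA states are frozensets.
    """
    symbols = sorted({sym for (_, sym) in transitions})
    dfa_start = frozenset([start])
    dfa = {}
    seen = {dfa_start}
    work = deque([dfa_start])
    while work:
        current = work.popleft()
        for sym in symbols:
            # Union of every NFA move on `sym` from the states in `current`.
            target = frozenset(
                nxt for s in current for nxt in transitions.get((s, sym), ())
            )
            if not target:
                continue  # no transition on this symbol
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    return dfa_start, dfa


nfa = {(0, "a"): {0, 1}, (1, "b"): {2}}
start_state, table = nfa_to_dfa(0, nfa)
print(len(table))
```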

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375086 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 08:34:29 +00:00
Sam Parker 3a4bfa616e [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.
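
The lane-wise behaviour of an extending masked load can be modelled as below. This is a minimal sketch of the semantics, not the DAG combine itself: the function name is hypothetical, and inactive lanes are assumed to take a pass-through value as in the `llvm.masked.load` intrinsic.

```python
def masked_load_ext(memory, mask, passthru, src_bits, signed):
    """Model a sign- or zero-extending masked load, lane by lane.

    memory: narrow lane values held as unsigned src_bits-wide integers.
    mask: one boolean per lane; inactive lanes take the passthru value.
    """
    def extend(v):
        v &= (1 << src_bits) - 1
        if signed and v >> (src_bits - 1):
            v -= 1 << src_bits  # sign-extend a value with the top bit set
        return v

    return [extend(m) if on else p for m, on, p in zip(memory, mask, passthru)]


lanes = [0xFF, 0x01, 0x80, 0x7F]
active = [True, False, True, True]
print(masked_load_ext(lanes, active, [0, 0, 0, 0], 8, signed=True))
print(masked_load_ext(lanes, active, [0, 0, 0, 0], 8, signed=False))
```

The four i8 lanes here correspond to the v4i8 to v4i32 case. Python integers stand in for the wider lanes.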

Differential Revision: https://reviews.llvm.org/D68337

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375085 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 07:55:55 +00:00
Eugene Leviant dfe2edfdcb [ThinLTO] Import virtual method with single implementation in hybrid mode
Differential revision: https://reviews.llvm.org/D68782


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375083 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-17 07:46:18 +00:00