`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.
Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.
This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.
This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.
As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]
Differential Revision: https://reviews.llvm.org/D50680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339740 91177308-0d34-0410-b5e6-96231b3b80d8
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.
To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)
Differential Revision: https://reviews.llvm.org/D50667
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339734 91177308-0d34-0410-b5e6-96231b3b80d8
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339676 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.
For example:
```
extern int bar(int[]);
int foo(int i) {
int a[i]; // VLA
asm volatile(
"mov r7, #1"
:
:
: "r7"
);
return 1 + bar(a);
}
```
Compiled for thumb, this gives:
```
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
.fnstart
@ %bb.0: @ %entry
.save {r4, r5, r6, r7, lr}
push {r4, r5, r6, r7, lr}
.setfp r7, sp, #12
add r7, sp, #12
.pad #4
sub sp, #4
movs r1, #7
add.w r0, r1, r0, lsl #2
bic r0, r0, #7
sub.w r0, sp, r0
mov sp, r0
@APP
mov.w r7, #1
@NO_APP
bl bar
adds r0, #1
sub.w r4, r7, #12
mov sp, r4
pop {r4, r5, r6, r7, pc}
...
```
r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.
This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.
If we find a reserved register, we print a warning:
```
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
"mov r7, #1"
^
```
Reviewers: eli.friedman, olista01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: efriedma, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D49727
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339257 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The accelerator tables use the debug_str section to store their strings.
However, they do not support the indirect method of access that is
available for the debug_info section (DW_FORM_strx et al.).
Currently our code is assuming that all strings can/will be referenced
indirectly, and puts all of them into the debug_str_offsets section.
This is generally true for regular (unsplit) dwarf, but in the DWO case,
most of the strings in the debug_str section will only be used from the
accelerator tables. Therefore the contents of the debug_str_offsets
section will be largely unused and bloating the main executable.
This patch rectifies this by teaching the DwarfStringPool to
differentiate between strings accessed directly and indirectly. When a
user inserts a string into the pool it has to declare whether that
string will be referenced directly or not. If at least one user requsts
indirect access, that string will be assigned an index ID and put into
debug_str_offsets table. Otherwise, the offset table is skipped.
This approach reduces the overall binary size (when compiled with
-gdwarf-5 -gsplit-dwarf) in my tests by about 2% (debug_str_offsets is
shrunk by 99%).
Reviewers: probinson, dblaikie, JDevlieghere
Subscribers: aprantl, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D49493
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339122 91177308-0d34-0410-b5e6-96231b3b80d8
This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators.
It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV.
Differential Revision: https://reviews.llvm.org/D49248
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339121 91177308-0d34-0410-b5e6-96231b3b80d8
Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339101 91177308-0d34-0410-b5e6-96231b3b80d8
getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339096 91177308-0d34-0410-b5e6-96231b3b80d8
Add a parameter for testing specifically for
sNaNs - at least one instruction pattern on AMDGPU
needs to check specifically for this.
Also handle more cases, and add a target hook
for custom nodes, similar to the hooks for known
bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338910 91177308-0d34-0410-b5e6-96231b3b80d8
First step towards a BuildSDIV equivalent to D49248 for non-uniform vector support - this just pushes the splat detection down into TargetLowering::BuildSDIV where its still used.
Differential Revision: https://reviews.llvm.org/D50185
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338838 91177308-0d34-0410-b5e6-96231b3b80d8
There are two forms for label debug information in DWARF format.
1. Labels in a non-inlined function:
DW_TAG_label
DW_AT_name
DW_AT_decl_file
DW_AT_decl_line
DW_AT_low_pc
2. Labels in an inlined function:
DW_TAG_label
DW_AT_abstract_origin
DW_AT_low_pc
We will collect label information from DBG_LABEL. Before every DBG_LABEL,
we will generate a temporary symbol to denote the location of the label.
The symbol could be used to get DW_AT_low_pc afterwards. So, we create a
mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase.
The DBG_LABEL in the mapping is used to query the symbol before it.
The AbstractLabels in DwarfCompileUnit is used to process labels in inlined
functions.
We also keep a mapping between scope and labels in DwarfFile to help to
generate correct tree structure of DIEs.
It also generates label debug information under global isel.
Differential Revision: https://reviews.llvm.org/D45556
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338390 91177308-0d34-0410-b5e6-96231b3b80d8
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338329 91177308-0d34-0410-b5e6-96231b3b80d8
This teaches the outliner to save LR to a register rather than the stack when
possible. This allows us to avoid bumping the stack in outlined functions in
some cases. By doing this, in a later patch, we can teach the outliner to do
something like this:
f1:
...
bl OUTLINED_FUNCTION
...
f2:
...
move LR's contents to a register
bl OUTLINED_FUNCTION
move the register's contents back
instead of falling back to saving LR in both cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338278 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the need for an assert to ensure the pointer isn't null.
Years ago we had ifs the checked the pointer was non-null before very access to the vector. These checks were removed and replaced with a single assert. But a reference seems more suitable here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338205 91177308-0d34-0410-b5e6-96231b3b80d8
This seems like a pretty glaring omission, and AMDGPU
wants to treat kernels differently from other calling
conventions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338194 91177308-0d34-0410-b5e6-96231b3b80d8
There was a missing check for if a candidate list was entirely deleted. This
adds that check.
This fixes an asan failure caused by running test/CodeGen/AArch64/addsub_ext.ll
with the MachineOutliner enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338148 91177308-0d34-0410-b5e6-96231b3b80d8
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.
The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).
This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.
One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.
Testing:
- check-llvm, and an end-to-end test using lldb to debug an optimized
program.
- Existing unit tests for DIExpression::appendToStack fully cover the
new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)
Differential Revision: https://reviews.llvm.org/D49454
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338069 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
NVPTX target dos not use register-based frame information. Instead it
relies on the artificial local_depot that is used instead of the frame
and the data for variables must be emitted relatively to this
local_depot.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45963
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338039 91177308-0d34-0410-b5e6-96231b3b80d8
- Remove unnecessary anchor function
- Remove unnecessary override of getAnalysisUsage
- Use reference instead of pointers where things cannot be nullptr
- Use ArrayRef instead of std::vector where possible
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337989 91177308-0d34-0410-b5e6-96231b3b80d8
- Avoid duplication of regmask size calculation.
- Simplify allocateRegisterMask() call.
- Rename allocateRegisterMask() to allocateRegMask() to be consistent
with naming in MachineOperand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337986 91177308-0d34-0410-b5e6-96231b3b80d8
Just some gardening here.
Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.
Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337848 91177308-0d34-0410-b5e6-96231b3b80d8
When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subidrectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location.
Fixes PR34169
Patch by Caroline Tice!
Differential Revision: https://reviews.llvm.org/D49434
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337847 91177308-0d34-0410-b5e6-96231b3b80d8
Before this, TCI contained all the call information for each Candidate.
This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.
A follow-up patch for the AArch64 outliner will demonstrate this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337840 91177308-0d34-0410-b5e6-96231b3b80d8
Just some simple gardening to improve clarity.
Before, we had something along the lines of
1) Create a std::vector of Candidates
2) Create an OutlinedFunction
3) Create a std::vector of pointers to Candidates
4) Copy those over to the OutlinedFunction and the Candidate list
Now, OutlinedFunctions create the Candidate pointers. They're still copied
over to the main list of Candidates, but it makes it a bit clearer what's
going on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337838 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Each of the four methods had a dozen lines and was doing almost exactly
the same thing: get the appropriate accelerator table kind and insert an
entry into it. I move this common logic to a helper function and make
these methods delegate to it.
This came up in the context of D49493, where I've needed to make adding
a string to a string pool slightly more complicated, and it seemed to
make sense to do it in one place instead of five.
To make this work I've needed to unify the interface of the AccelTable
data types, as some used to store DIE& and others DIE*. I chose to unify
to a reference as that's what the caller uses.
This technically isn't NFC, because it changes the StringPool used for
apple tables in the DWO case (now it uses the main file like DWARF v5
instead of the DWO file). However, that shouldn't matter, as DWO is not
a thing on apple targets (clang frontend simply ignores -gsplit-dwarf).
Reviewers: JDevlieghere, aprantl, probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49542
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337562 91177308-0d34-0410-b5e6-96231b3b80d8
For dsymutil we want to store offsets in the accelerator table entries
rather than DIE pointers. In addition, we need a way to communicate
which CU a DIE belongs to. This patch provides support for both of these
issues.
Differential revision: https://reviews.llvm.org/D49102
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337158 91177308-0d34-0410-b5e6-96231b3b80d8
Spectre variant #1 for x86.
There is a lengthy, detailed RFC thread on llvm-dev which discusses the
high level issues. High level discussion is probably best there.
I've split the design document out of this patch and will land it
separately once I update it to reflect the latest edits and updates to
the Google doc used in the RFC thread.
This patch is really just an initial step. It isn't quite ready for
prime time and is only exposed via debugging flags. It has two major
limitations currently:
1) It only supports x86-64, and only certain ABIs. Many assumptions are
currently hard-coded and need to be factored out of the code here.
2) It doesn't include any options for more fine-grained control, either
of which control flow edges are significant or which loads are
important to be hardened.
3) The code is still quite rough and the testing lighter than I'd like.
However, this is enough for people to begin using. I have had numerous
requests from people to be able to experiment with this patch to
understand the trade-offs it presents and how to use it. We would also
like to encourage work to similar effect in other toolchains.
The ARM folks are actively developing a system based on this for
AArch64. We hope to merge this with their efforts when both are far
enough along. But we also don't want to block making this available on
that effort.
Many thanks to the *numerous* people who helped along the way here. For
this patch in particular, both Eric and Craig did a ton of review to
even have confidence in it as an early, rough cut at this functionality.
Differential Revision: https://reviews.llvm.org/D44824
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336990 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336964 91177308-0d34-0410-b5e6-96231b3b80d8
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336929 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.
I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D40092
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336589 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.
As discussed later, that was worse at least for the code size,
and potentially for the performance, too.
https://rise4fun.com/Alive/Zmpl
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: spatel
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D48768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336585 91177308-0d34-0410-b5e6-96231b3b80d8
When emitting the DWARF accelerator tables from dsymutil, we don't have
a DwarfDebug instance and we use a custom class to represent Dwarf
compile units. This patch adds an interface AccelTableWriterInfo to
abstract these from the Dwarf5AccelTableWriter, so we can have a custom
implementation for this in dsymutil.
Differential revision: https://reviews.llvm.org/D49031
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336529 91177308-0d34-0410-b5e6-96231b3b80d8