Summary:
In SelectionDAG, when a store is immediately chained to another store
to the same address, elide the first store as it has no observable
effects. This is causes small improvements dealing with intrinsics
lowered to stores.
Test notes:
* Many testcases overwrite store addresses multiple times and needed
minor changes, mainly making stores volatile to prevent the
optimization from optimizing the test away.
* Many X86 test cases optimized out instructions associated with
associated with va_start.
* Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has
dependencies to check and can probably be removed and potentially
replaced with another test.
Reviewers: rnk, john.brawn
Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33206
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303198 91177308-0d34-0410-b5e6-96231b3b80d8
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.
Differential Revision: https://reviews.llvm.org/D32858
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303054 91177308-0d34-0410-b5e6-96231b3b80d8
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.
It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.
Differential Revision: https://reviews.llvm.org/D32857
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303053 91177308-0d34-0410-b5e6-96231b3b80d8
This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB
and MUL to 32 bits since we only have TableGen patterns for 32 bits.
See the commit message for r292827 for more details.
At this point we could just remove some of the tests for regbankselect
and instruction-select, since we're not going to see any narrow
operations at those levels anymore. Instead I decided to update them
with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences
generated by the legalizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302782 91177308-0d34-0410-b5e6-96231b3b80d8
G_ANYEXT can be introduced by the legalizer when widening scalars. Add
support for it in the register bank info (same mapping as everything
else) and in the instruction selector.
When selecting it, we treat it as a COPY, just like G_TRUNC. On this
occasion we get rid of some assertions in selectCopy so we can reuse it.
This shouldn't be a problem at the moment since we're not supporting any
complicated cases (e.g. FPR, different register banks). We might want to
separate the paths when we do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302778 91177308-0d34-0410-b5e6-96231b3b80d8
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
affects all targets that use frame pseudo instructions and touched many
files although the changes are uniform.
- Access to frame properties are implemented using special instructions
rather than calls getOperand(N).getImm(). For X86 and ARM such
replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
instruction. These involve proper instruction initialization and
methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
frame parts initialized inside frame instruction pair and outside it.
The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302527 91177308-0d34-0410-b5e6-96231b3b80d8
The separated libcalls are implemented in terms of __divmodsi4 and __udivmodsi4
anyway, so we should always use them if possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302462 91177308-0d34-0410-b5e6-96231b3b80d8
This exposes a method in MachineFrameInfo that calculates
MaxCallFrameSize and calls it after instruction selection in the ARM
target.
This avoids
ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame()
giving different answers in early/late phases of codegen.
The testcase shows a particular nasty example result of that where we
would fail to properly align an alloca.
Differential Revision: https://reviews.llvm.org/D32622
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302303 91177308-0d34-0410-b5e6-96231b3b80d8
- MIParser: If the successor list is not specified successors will be
added based on basic block operands in the block and possible
fallthrough.
- MIRPrinter: Adds a new `simplify-mir` option, with that option set:
Skip printing of block successor lists in cases where the
parser is guaranteed to reconstruct it. This means we still print the
list if some successor cannot be determined (happens for example for
jump tables), if the successor order changes or branch probabilities
being unequal.
Differential Revision: https://reviews.llvm.org/D31262
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302289 91177308-0d34-0410-b5e6-96231b3b80d8
Added the integer data processing intrinsics from ACLE v2.1 Chapter 9
but I have missed out the saturation_occurred intrinsics for now. For
the instructions that read and write the GE bits, a chain is included
and the only instruction that reads these flags (sel) is only
selectable via the implemented intrinsic.
Differential Revision: https://reviews.llvm.org/D32281
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302126 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change adds a new section to the xray-instrumented binary that
stores an index into ranges of the instrumentation map, where sleds
associated with the same function can be accessed as an array. At
runtime, we can get access to this index by function ID offset allowing
for selective patching and unpatching by function ID.
Each entry in this new section (xray_fn_idx) will include two pointers
indicating the start and one past the end of the sleds associated with
the same function. These entries will be 16 bytes long on x86 and
aarch64. On arm, we align to 16 bytes anyway so the runtime has to take
that into consideration.
__{start,stop}_xray_fn_idx will be the symbols that the runtime will
look for when we implement the selective patching/unpatching by function
id APIs. Because XRay synthesizes the function id's in a monotonically
increasing manner at runtime now, implementations (and users) can use
this table to look up the sleds associated with a specific function.
This is useful in implementations that want to do things like:
- Implement coverage mode for functions by patching everything
pre-main, then as functions are encountered, the installed handler
can unpatch the function that's been encountered after recording
that it's been called.
- Do "learning mode", so that the implementation can figure out some
statistical information about function calls by function id for a
time being, and then determine which functions are worth
uninstrumenting at runtime.
- Do "selective instrumentation" where an implementation can
specifically instrument only certain function id's at runtime
(either based on some external data, or through some other
heuristics) instead of patching all the instrumented functions at
runtime.
Reviewers: dblaikie, echristo, chandlerc, javed.absar
Subscribers: pelikan, aemerson, kpw, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D32693
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302109 91177308-0d34-0410-b5e6-96231b3b80d8
I was worried we might replace a mul with a mul+shift even if there were later
uses. Turns out to be unfounded but I'd just as well add an actual test for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302051 91177308-0d34-0410-b5e6-96231b3b80d8
When we replaced the multiplicand the destination node might already exist.
When that happens the original gets CSEd and deleted. However, it's actually
used as the offset so nonsense is produced.
Should fix PR32726.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301983 91177308-0d34-0410-b5e6-96231b3b80d8
I doubt anyone actually uses it, and I'm not even entirely convinced it exists
myself; but it is our default for "clang -arch armv6". Functionally, if it does
exist it's identical to the arm1176jz-f from LLVM's point of view (the
difference is apparently in the "Security Extensions").
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301962 91177308-0d34-0410-b5e6-96231b3b80d8
Fix a crash when trying to extend a value passed as a sign- or
zero-extended stack parameter. The cause of the crash was that we were
setting the size of the loaded value to 32 bits, and then tyring to
extend again to 32 bits.
This patch addresses the issue by also introducing a G_TRUNC after the
load. This will leave the unused bits to their original values set by
the caller, while being consistent about the types. For values that are
not extended, we just use a smaller load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301531 91177308-0d34-0410-b5e6-96231b3b80d8
Besides better codegen, the motivation is to be able to canonicalize this pattern
in IR (currently we don't) knowing that the backend is prepared for that.
This may also allow removing code for special constant cases in
DAGCombiner::foldSelectOfConstants() that was added in D30180.
Differential Revision: https://reviews.llvm.org/D31944
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301457 91177308-0d34-0410-b5e6-96231b3b80d8
For targets that don't have ISD::MULHS or ISD::SMUL_LOHI for the type
and the double width type is illegal, then the two operands are
sign extended to twice their size then multiplied to check for overflow.
The extended upper halves were mismatched causing an incorrect result.
This fixes the mismatch.
A test was added for ARM V6-M where the bug was detected.
Patch by James Duley.
Differential Revision: https://reviews.llvm.org/D31807
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301404 91177308-0d34-0410-b5e6-96231b3b80d8
We have to widen the operands to 32 bits and then we can either use
hardware division if it is available or lower to a libcall otherwise.
At the moment it is not enough to set the Legalizer action to
WidenScalar, since for libcalls it won't know what to do (it won't be
able to find what size to widen to, because it will find Libcall and not
Legal for 32 bits). To hack around this limitation, we request Custom
lowering, and as part of that we widen first and then we run another
legalizeInstrStep on the widened DIV.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301166 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for both targets with hardware division and without. For
hardware division we have to add support throughout the pipeline
(legalizer, reg bank select, instruction select). For targets without
hardware division, we only need to mark it as a libcall.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301164 91177308-0d34-0410-b5e6-96231b3b80d8
When selecting a G_CONSTANT to a MOVi, we need the value to be an Imm
operand. We used to just leave the G_CONSTANT operand unchanged, which
works in some cases (such as the GEP offsets that we create when
referring to stack slots). However, in many other places the G_CONSTANTs
are created with CImm operands. This patch makes sure to handle those as
well, and to error out gracefully if in the end we don't end up with an
Imm operand.
Thanks to Oliver Stannard for reporting this issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301162 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise there's some mismatch, and we'll either form an illegal type or an
illegal node.
Thanks to Eli Friedman for pointing out the problem with my original solution.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301036 91177308-0d34-0410-b5e6-96231b3b80d8
DAG combine was mistakenly assuming that the step-up it was looking at was
always a doubling, but it can sometimes be a larger extension in which case
we'd crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301002 91177308-0d34-0410-b5e6-96231b3b80d8
Select them as copies. We only select if both the source and the
destination are on the same register bank, so this shouldn't cause any
trouble.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300971 91177308-0d34-0410-b5e6-96231b3b80d8
The condition in isSupportedType didn't handle struct/array arguments
properly. Fix the check and add a test to make sure we use the fallback
path in this kind of situation. The test deals with some common cases
where the call lowering should error out. There are still some issues
here that need to be addressed (tail calls come to mind), but they can
be addressed in other patches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300967 91177308-0d34-0410-b5e6-96231b3b80d8
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300904 91177308-0d34-0410-b5e6-96231b3b80d8
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.
Should fix PR32658.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300878 91177308-0d34-0410-b5e6-96231b3b80d8
Re-commit after revert in r300668. Changed getMaxFPOffset() to a
more conservative heuristic instead of trying to be clever and missing
for some exotic calling conventions.
We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.
rdar://31317893
Differential Revision: https://reviews.llvm.org/D31643
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300761 91177308-0d34-0410-b5e6-96231b3b80d8
Support G_MUL, very similar to G_ADD and G_SUB. The only difference is
in the instruction selector, where we have to select either MUL or MULv5
depending on the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300665 91177308-0d34-0410-b5e6-96231b3b80d8
In the assembler, we should emit build attributes based on the target
selected with command-line options. This matches the GNU assembler's
behaviour. We only do this for build attributes which describe the
hardware that is expected to be available, not the ones that describe
ABI compatibility.
This is done by moving some of the attribute emission code to
ARMTargetStreamer, so that it can be shared between the assembly and
code-generation code paths. Since the assembler only creates a
MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to
check raw features, and not use the convenience functions in
ARMSubtarget.
If different attributes are later specified using the .eabi_attribute
directive, then they will take precedence, as happens when the same
.eabi_attribute is specified twice.
This must be enabled by an option, because we don't want to do this when
parsing inline assembly. The attributes would match the ones emitted at
the start of the file, so wouldn't actually change the emitted object
file, but the extra directives would be added to every inline assembly
block when emitting assembly, which we'd like to avoid.
The majority of the changes in the build-attributes.ll test are just
re-ordering the directives, because the hardware attributes are now
emitted before the ABI ones. However, I did fix one bug which I spotted:
Tag_CPU_arch_profile was not being emitted for v6M.
Differential revision: https://reviews.llvm.org/D31812
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300547 91177308-0d34-0410-b5e6-96231b3b80d8
For subtargets that use the custom lowering for divmod, e.g. gnueabi,
we used to check if the subtarget has hardware divide and then lower to
a div-mul-sub sequence if true, or to a libcall if false.
However, judging by the usage of hasDivide vs hasDivideInARMMode, it
seems that hasDivide only refers to Thumb. For instance, in the
ARMTargetLowering constructor, the code that specifies whether to use
libcalls for (S|U)DIV looks like this:
bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
: Subtarget->hasDivideInARMMode();
In the case of divmod for arm-gnueabi, using only hasDivide() to
determine what to do means that instead of lowering to __aeabi_idivmod
to get the remainder, we lower to div-mul-sub and then further lower the
div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but
not in Thumb, we generate a libcall instead of using it (this is not an
issue in practice since AFAICT none of the cores that we support have
hardware divide in ARM but not Thumb).
This patch fixes the code dealing with custom lowering to take into
account the mode (Thumb or ARM) when deciding whether or not hardware
division is available.
Differential Revision: https://reviews.llvm.org/D32005
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300536 91177308-0d34-0410-b5e6-96231b3b80d8
Use the same handling in the generic legalizer code as for the other
libcalls (G_FREM, G_FPOW).
Enable it on ARM for float and double so we can test it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299931 91177308-0d34-0410-b5e6-96231b3b80d8
A fix for the bug reported in PR30911.
The issue arises when multiple CALLSEQ_BEGIN nodes are unscheduled as
the last node to be unscheduled will gain access to the CallResource
register. But when a node is being picked, only CALLSEQ_END nodes are
checked against the CallResource and have their chains evaluated.
This then means that other CALLSEQ_BEGIN nodes can be scheduled
before the existing call sequence has been finalised. This patch adds
a check against the FrameSetup nodes in DelayForLiveRegs to prevent
this from happening.
Differential Revision: https://reviews.llvm.org/D31536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299926 91177308-0d34-0410-b5e6-96231b3b80d8
BIC is generally faster, and it can put the output in a different
register from the input.
We already do this in Thumb2 mode; not sure why the equivalent fix
never got applied to ARM mode.
Differential Revision: https://reviews.llvm.org/D31797
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299803 91177308-0d34-0410-b5e6-96231b3b80d8
It turns out -float-abi=hard doesn't set the hard float calling
convention for libcalls. We need to use a hard float triple instead
(e.g. gnueabihf).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299761 91177308-0d34-0410-b5e6-96231b3b80d8
Legalize to a libcall.
On this occasion, also start allowing soft float subtargets. For the
moment G_FREM is the only legal floating point operation for them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299753 91177308-0d34-0410-b5e6-96231b3b80d8