The unused upper bits are guaranteed to be 0, so we don't need to worry about accidentally counting them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301091 91177308-0d34-0410-b5e6-96231b3b80d8
The later uses of dyn_castNotVal in this block are either
incomplete (they don't handle vector constants) or overstepping
(they shouldn't handle constants at all), but this first use is
just unnecessary: 'I' is obviously not a constant, and it
can't be a not-of-a-not because that would already have been
instsimplified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301088 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Some targets need to be able to do more complex rendering than just adding an
operand or two to an instruction. For example, a target may need to insert an
instruction to extract a subreg first, or it may need to perform an operation
on the operand.
In SelectionDAG, targets would create SDNodes to achieve the desired effect
during the complex pattern predicate. This worked because SelectionDAG had a
form of garbage collection that would take care of SDNodes that were created
but not used due to a later predicate rejecting a match. This doesn't translate
well to GlobalISel, and the churn was wasteful.
The API changes in this patch enable GlobalISel to accomplish the same thing
without the waste. The API is now:
InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const;
where Root is the root of the match. The return value can be omitted to
indicate that the predicate failed to match, or a function with the signature
ComplexRendererFn can be returned. For example:
  return OptionalComplexRendererFn(
      [=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); });
adds two immediate operands to the rendered instruction. Immed and ShVal are
captured from the predicate function.
As an added bonus, this also reduces the amount of information we need to
provide to GIComplexOperandMatcher.
Depends on D31418
Reviewers: aditya_nandakumar, t.p.northover, qcolombet, rovka, ab, javed.absar
Reviewed By: ab
Subscribers: dberris, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31761
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301079 91177308-0d34-0410-b5e6-96231b3b80d8
In dwo files the fixed offset can be used. If the dwos are linked into
a dwp, the dwo consumer must use the dwp tables to find out where the
original debug_info range was and resolve the "section relative"
value relative to that original range, effectively
avoiding/reimplementing the relocation handling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301072 91177308-0d34-0410-b5e6-96231b3b80d8
The bug was introduced by r301018, "[InstCombine] fadd double (sitofp x), y check that the promotion is valid". That patch didn't account for fadd operating on vectors, not just scalars. Add vector support along with a test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301070 91177308-0d34-0410-b5e6-96231b3b80d8
If a switch was in an unreachable block that branched
to a block with a phi, it would leave phis with missing
predecessors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301064 91177308-0d34-0410-b5e6-96231b3b80d8
Since Split DWARF needs to name the actual .dwo file that is generated,
it can't be known at the time the llvm::Module is produced as it may be
merged with other Modules before the object is generated and that object
may be generated with any name.
By passing the Split DWARF file name when LLVM is producing object code,
the .dwo file name in the object file can match correctly.
The support for Split DWARF for implicit modules remains the same -
using metadata to store the dwo name and dwo id so that potentially
multiple skeleton CUs referring to different dwo files can be generated
from one llvm::Module.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301062 91177308-0d34-0410-b5e6-96231b3b80d8
The code assumed that when saving an additional CSR register
(ExtraCSSpill==true) we would have a free register throughout the
function. This was not true if this CSR register was also used to pass
values, as in the swiftself case.
rdar://31451816
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301057 91177308-0d34-0410-b5e6-96231b3b80d8
There are two reasons why users might want to build libfuzzer:
- To fuzz LLVM itself
- To get the libFuzzer.a archive file, so that they can attach it to their code
This change always builds libfuzzer, and supports the second use case if the specified flag is set.
The point of this patch is to have something that can potentially be shipped with the compiler, and this also ensures that the version of libFuzzer is correct to use with that compiler.
Patch by George Karpenkov.
Differential Revision: https://reviews.llvm.org/D32096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301054 91177308-0d34-0410-b5e6-96231b3b80d8
In addition to the original commit, tighten the condition so that empty
functions are only padded when targeting COFF Windows. This avoids running
into problems when targeting e.g. Win32 AMDGPU, which caused test failures
when this was committed initially.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301047 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes leaving behind intermediate flat addressing computations
when a GEP instruction's source is a constant expression.
Still leaves behind a trivial addrspacecast + gep pair that
instcombine is able to handle, which ideally could be folded
here directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301044 91177308-0d34-0410-b5e6-96231b3b80d8
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.
We had a terrific bug due to this in Chromium.
It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop(), since it
was apparently used for more than just Mach-O anyway.
Differential Revision: https://reviews.llvm.org/D32330
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301040 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise there's some mismatch, and we'll either form an illegal type or an
illegal node.
Thanks to Eli Friedman for pointing out the problem with my original solution.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301036 91177308-0d34-0410-b5e6-96231b3b80d8
SI_MASKED_UNREACHABLE does not have a machine instruction encoding.
It needs special handling in AMDGPUAsmPrinter::EmitInstruction, like some
other pseudo instructions.
This patch fixes compilation failure of RadeonRays.
Differential Revision: https://reviews.llvm.org/D32364
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301025 91177308-0d34-0410-b5e6-96231b3b80d8
It seems we have a situation in a sanitizer-enabled bootstrap build
where the return instruction has a frame index operand that does not
point to a fixed object and fails the assert added here.
This reverts commit r300923.
This reverts commit r300922.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301024 91177308-0d34-0410-b5e6-96231b3b80d8
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300932 and r300930, which were causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.
This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301019 91177308-0d34-0410-b5e6-96231b3b80d8
There are two reasons why users might want to build libfuzzer:
- To fuzz LLVM itself
- To get the libFuzzer.a archive file, so that they can attach it to their code
This change always builds libfuzzer, and supports the second use case if the specified flag is set.
The point of this patch is to have something that can potentially be shipped with the compiler, and this also ensures that the version of libFuzzer is correct to use with that compiler.
Patch by George Karpenkov.
Differential Revision: https://reviews.llvm.org/D32096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301010 91177308-0d34-0410-b5e6-96231b3b80d8
Factor out the common code used for generating addresses into
templated functions that call overloaded versions of a new function,
getTargetNode.
Tested with make check-llvm with the AArch64 target.
Differential Revision: https://reviews.llvm.org/D32169
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301005 91177308-0d34-0410-b5e6-96231b3b80d8
DAG combine was mistakenly assuming that the step-up it was looking at was
always a doubling, but it can sometimes be a larger extension in which case
we'd crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301002 91177308-0d34-0410-b5e6-96231b3b80d8
Older compilers (e.g. LLVM 3.4) do not support the attribute target("popcnt").
In order to support those, this diff checks for attribute support using the preprocessor.
Patch by George Karpenkov.
Differential Revision: https://reviews.llvm.org/D32311
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300999 91177308-0d34-0410-b5e6-96231b3b80d8
Currently sle and ule have to call slt/ult and eq to get the proper answer. This results in extra code for both calls and additional scans of multiword APInts.
This patch replaces slt/ult with a compareSigned/compare that can return -1, 0, or 1 so we can cover all the comparison functions with a single call.
While I was there I removed the activeBits calls and other checks at the start of the slow part of ult. Both of the activeBits calls potentially scan through each of the APInts separately. I can't imagine that's any better than just scanning them in parallel and doing the compares. Now we just share the code with tcCompare.
These changes seem to be good for about a 7-8k reduction on the size of the opt binary on my local x86-64 build.
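A minimal standalone sketch of the three-way-compare idea (not the actual APInt implementation; compareSigned/compare are the names mentioned above):
  #include <cstdint>

  // One word-by-word scan returns -1/0/1; every relational predicate is
  // derived from it, so sle/ule no longer need a separate eq scan.
  static int compareWords(const uint64_t *LHS, const uint64_t *RHS, unsigned N) {
    for (unsigned I = N; I-- > 0;)        // most significant word first
      if (LHS[I] != RHS[I])
        return LHS[I] < RHS[I] ? -1 : 1;
    return 0;
  }
  static bool ultWords(const uint64_t *LHS, const uint64_t *RHS, unsigned N) {
    return compareWords(LHS, RHS, N) < 0;
  }
  static bool uleWords(const uint64_t *LHS, const uint64_t *RHS, unsigned N) {
    return compareWords(LHS, RHS, N) <= 0;
  }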
Differential Revision: https://reviews.llvm.org/D32339
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300995 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The SelectionDAG importer now imports rules with Predicates attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitsets to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for its own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).
Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.
Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab
Reviewed By: rovka
Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D31418
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300993 91177308-0d34-0410-b5e6-96231b3b80d8
places based on it.
The existing constant hoisting pass will merge a group of constants in a small range
and hoist the constant materialization code to the common dominator of their uses.
However, if the uses are all on cold paths, the existing implementation may hoist
the materialization code from cold paths to a hot place. This may hurt performance.
The patch introduces BFI to the pass and selects the best insertion places based
on it.
The change is controlled by an option consthoist-with-block-frequency which is
off by default for now.
Differential Revision: https://reviews.llvm.org/D28962
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300989 91177308-0d34-0410-b5e6-96231b3b80d8
It's causing llvm-clang-x86_64-expensive-checks-win to fail to compile and I
haven't worked out why. Reverting to make it green while I figure it out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300978 91177308-0d34-0410-b5e6-96231b3b80d8
Select them as copies. We only select if both the source and the
destination are on the same register bank, so this shouldn't cause any
trouble.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300971 91177308-0d34-0410-b5e6-96231b3b80d8
The condition in isSupportedType didn't handle struct/array arguments
properly. Fix the check and add a test to make sure we use the fallback
path in this kind of situation. The test deals with some common cases
where the call lowering should error out. There are still some issues
here that need to be addressed (tail calls come to mind), but they can
be addressed in other patches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300967 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The SelectionDAG importer now imports rules with Predicates attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitsets to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for its own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).
Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.
Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab
Reviewed By: rovka
Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D31418
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300964 91177308-0d34-0410-b5e6-96231b3b80d8
when the subtarget has fast strings.
This has two advantages:
- Speed is improved. For example, on Haswell throughput improvements increase
linearly with size from 256 to 512 bytes, after which they plateau
(e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
- Code is much smaller (no need to handle boundaries).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300957 91177308-0d34-0410-b5e6-96231b3b80d8
This is split from D32228.
Currently the DWARF parser code has a few places that apply relocation values manually.
These places have similar duplicated code. This patch introduces a separate method that can be
used to obtain a relocated value. That helps reduce code and simplifies things.
Differential revision: https://reviews.llvm.org/D32284
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300956 91177308-0d34-0410-b5e6-96231b3b80d8
- Mark an internal function static
- Remove the llvm namespace (just holding on to the `using namespace
llvm;` Works on My Machine(TM))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300947 91177308-0d34-0410-b5e6-96231b3b80d8
CodeExtractor looks up the dominator node corresponding to return blocks
when splitting them. If one of these blocks is unreachable, there's no
node in the Dom and CodeExtractor crashes because it doesn't check
for domtree node validity.
In theory, we could add just a check for skipping null DTNodes in
`splitReturnBlock` but the fix I propose here is slightly different. To the
best of my knowledge, unreachable blocks are irrelevant for the algorithm,
therefore we can just skip them when building the candidate set in the
constructor.
Differential Revision: https://reviews.llvm.org/D32335
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300946 91177308-0d34-0410-b5e6-96231b3b80d8
This should fix the bug https://bugs.llvm.org/show_bug.cgi?id=12906
To print an FP constant, AsmWriter does the following:
1) Convert the FP value to a String (using the snprintf function, which is locale dependent).
2) Convert the String back to an FP value.
3) Compare the original and resulting FP values. If they are not equal, just dump as hex.
The problem happens in the 2nd step, when APFloat does not expect a group delimiter or
a fraction delimiter other than the period symbol, which can be produced in the
first step if the LLVM library is used in an environment with a corresponding locale set.
To fix this issue, the locale-independent APFloat::toString function is used.
However, it prints FP values slightly differently than snprintf does: specifically,
it suppresses trailing zeros in the significand, uses a capital E, and so on.
This resulted in 117 test failures during make check.
To avoid this I've also updated APFloat::toString a bit to at least pass make check.
Reviewers: sberg, bogner, majnemer, sanjoy, timshen, rnk
Reviewed By: timshen, rnk
Subscribers: rnk, llvm-commits
Differential Revision: https://reviews.llvm.org/D32276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300943 91177308-0d34-0410-b5e6-96231b3b80d8
It seems that r300930 was creating an infinite loop in dag-combine when
compiling the following file:
MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300940 91177308-0d34-0410-b5e6-96231b3b80d8
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300930 91177308-0d34-0410-b5e6-96231b3b80d8
There have been multiple reports of this causing problems: a
compile-time explosion on the LLVM testsuite, and a stack
overflow for an opencl kernel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300928 91177308-0d34-0410-b5e6-96231b3b80d8
The demanded mask and the constant should always be the same width for all callers today.
Also stop copying the demanded mask as it's passed in. We should avoid allocating memory unless we are going to do something. The final AND to create the new constant will take care of it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300927 91177308-0d34-0410-b5e6-96231b3b80d8
X86RegisterInfo::eliminateFrameIndex() and
X86FrameLowering::getFrameIndexReference() both had logic to compute the
base register. This consolidates the code.
Also use MachineInstr::isReturn instead of manually enumerating tail
call instructions (return instructions were not included in the previous
list because they never reference frame indexes).
Differential Revision: https://reviews.llvm.org/D32206
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300923 91177308-0d34-0410-b5e6-96231b3b80d8
AfterFPPop is used for tailcall/tailjump instructions. We shouldn't ever
have frame-pointer/base-pointer relative addressing for those. After all,
the frame/base pointer should already be restored to their previous
values at the return.
Make this fact explicit in preparation for an upcoming refactoring.
Differential Revision: https://reviews.llvm.org/D32205
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300922 91177308-0d34-0410-b5e6-96231b3b80d8
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300913 91177308-0d34-0410-b5e6-96231b3b80d8
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.
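A C++-level analogy for the distinction (illustrative only; the commit itself changes ARM fence lowering):
  #include <atomic>

  // Only constrains the compiler; corresponds to a single-thread fence and
  // should not need a DMB.
  void compilerOnlyBarrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
  }
  // Must also order the hardware; on ARM this is where a DMB is required.
  void crossThreadBarrier() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }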
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300905 91177308-0d34-0410-b5e6-96231b3b80d8
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300904 91177308-0d34-0410-b5e6-96231b3b80d8
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.
Should fix PR32658.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300878 91177308-0d34-0410-b5e6-96231b3b80d8
Associate the version-when-defined with definitions of standard DWARF
constants. Identify the "vendor" for DWARF extensions.
Use this information to verify FORMs in .debug_abbrev are defined as
of the DWARF version specified in the associated unit.
Removed two tests that had specified DWARF v1 (which essentially does
not exist).
Differential Revision: http://reviews.llvm.org/D30785
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300875 91177308-0d34-0410-b5e6-96231b3b80d8
This enables use-after-free and uninitialized-memory checking for memory
returned by a recycler. SelectionDAG currently relies on the opcode of a
freed node being ISD::DELETED_NODE, so poke a hole in the asan poison
for SDNode opcodes. This means that we won't find some issues, but only
in SDag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300868 91177308-0d34-0410-b5e6-96231b3b80d8
These will become asan errors once the patch lands that poisons the
memory after free. The x86 change is a hack, but I don't see how to
solve this properly at the moment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300867 91177308-0d34-0410-b5e6-96231b3b80d8
Recently an alloca address space has been added to the data layout. Due to this
change, the pointer returned by alloca may have a different size than a pointer
in address space 0.
However, currently the value type of a frame index is assumed to be of the
same size as a pointer in address space 0.
This patch fixes that.
Most targets assume alloca returns a pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.
The AMDGCN target with the amdgiz environment requires this change since it
assumes alloca returns a pointer to address space 5 whose size is 32 bits,
which is different from the 64-bit size of a pointer in address space 0.
Differential Revision: https://reviews.llvm.org/D32021
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300864 91177308-0d34-0410-b5e6-96231b3b80d8
Have the AttributeList overload delegate to the AttrBuilder one.
Simplify the AttrBuilder overload by avoiding getSlotAttributes, which
creates temporary AttributeLists.
Simplify `AttrBuilder::removeAttributes(AttributeList, unsigned)` by
using getAttributes instead of manually iterating over slots.
Extracted from https://reviews.llvm.org/D32262
NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300863 91177308-0d34-0410-b5e6-96231b3b80d8
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact, several places use it and then store the result in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300856 91177308-0d34-0410-b5e6-96231b3b80d8
We started with zero-based params and switched to one-based locals...
Also, variables start with a capital and functions do not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300854 91177308-0d34-0410-b5e6-96231b3b80d8
This question comes up in many places in SimplifyDemandedBits. This makes it easy to ask without allocating additional temporary APInts.
The BitVector class provides similar functionality through its (IMHO badly named) test(const BitVector&) method, though its output polarity is reversed.
I've provided one example use case in this patch. I plan to do more as a follow up.
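A rough word-level sketch of the kind of query this enables (helper and variable names are illustrative, not the actual APInt code):
  #include <cstdint>

  // "Are all of my set bits also set in RHS?" -- answered word by word,
  // without materializing a temporary APInt for the intermediate result.
  static bool isSubsetOfWords(const uint64_t *LHS, const uint64_t *RHS,
                              unsigned NumWords) {
    for (unsigned I = 0; I != NumWords; ++I)
      if (LHS[I] & ~RHS[I])   // a bit set here but not in RHS
        return false;
    return true;
  }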
Differential Revision: https://reviews.llvm.org/D32258
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300851 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we don't explicitly process ConstantDataSequential, ConstantAggregateZero, ConstantVector, or Undef before applying the Depth limit. Instead they occur after the depth check in the non-instruction path.
For the constant types that we do handle, the code is replicated from computeKnownBits.
This patch fixes the missing constant handling and reduces the amount of code by just using computeKnownBits directly for any type of Constant.
Differential Revision: https://reviews.llvm.org/D32123
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300849 91177308-0d34-0410-b5e6-96231b3b80d8
Masked vectors which hold shift amounts when creating the following nodes:
ISD::SHL, ISD::SRL or ISD::SRA.
Instructions that use said nodes, and which have had their arguments altered, are
sll, srl, sra, bneg, bclr and bset.
For said instructions, the shift amount or the bit position that is
specified in the corresponding vector elements will be interpreted as the
shift amount/bit position modulo the size of the element in bits.
The problem lies in compiling with -O2 enabled, where the instructions for
formats .w and .d are not generated, but are instead optimized away.
In this case, having shift amounts that are either negative or greater than
the element bit size leads to incorrect results when
constant folding.
We remedy this by masking the operands of the nodes mentioned above before
actually creating them, so that the final result is correct before being placed
into the constant pool.
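A rough sketch of the masking step (operand indices, variable names and the surrounding lowering context are illustrative; Op, DAG and DL are assumed in scope):
  EVT VecTy = Op->getValueType(0);
  unsigned EltBits = VecTy.getScalarSizeInBits();
  // Splat of (element size in bits - 1): reduces each shift amount modulo
  // the element width before the node is created.
  SDValue Mask = DAG.getConstant(EltBits - 1, DL, VecTy);
  SDValue Amt = DAG.getNode(ISD::AND, DL, VecTy, Op->getOperand(2), Mask);
  SDValue Res = DAG.getNode(ISD::SHL, DL, VecTy, Op->getOperand(1), Amt);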
Patch by Stefan Maksimovic.
Differential Revision: https://reviews.llvm.org/D31331
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300839 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a few helper functions to obtain new vector
value types based on existing ones without needing to care
about whether they are scalable or not.
I've confined their use to a few common locations right now,
and targets that don't have scalable vectors should never
need to care about these.
Patch by Graham Hunter.
Differential Revision: https://reviews.llvm.org/D32017
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300838 91177308-0d34-0410-b5e6-96231b3b80d8
ChangeSection incorrectly registers LastEMSInfo as belonging to the previous
section, not the current section. This happens to work when changing sections
using .section, as the previous section is set to the current section before
the call to ChangeSection, but not when using .popsection.
Differential Revision: https://reviews.llvm.org/D32225
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300831 91177308-0d34-0410-b5e6-96231b3b80d8
Currently fmov #0 with a vector destination is handled incorrectly and results in
fmov #-1.9375 being emitted, but it should instead give an error. This is due to the
way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so
fix this by actually doing it through an alias.
Differential Revision: https://reviews.llvm.org/D31949
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300830 91177308-0d34-0410-b5e6-96231b3b80d8
When an integer is used as an fp immediate we're failing to check the return
value of getFP64Imm, so invalid values are silently permitted. Fix this by
merging together the integer and real handling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300828 91177308-0d34-0410-b5e6-96231b3b80d8
The hardware div feature refers only to Thumb, but because of its name
it is tempting to use it to check for hardware division in general,
which may cause problems in ARM mode. See https://reviews.llvm.org/D32005.
This patch adds "Thumb" to its name, to make its scope clear. One
notable place where I haven't made the change is in the feature flag
(used with -mattr), which is still hwdiv. Changing it would also require
changes in a lot of tests, including clang tests, and it doesn't seem
like it's worth the effort.
Differential Revision: https://reviews.llvm.org/D32160
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300827 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This is a simple question we should be able to answer without creating a temporary to hold the AND result. We can also get an early out as soon as we find a word that intersects.
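A word-level sketch of the early-out test described above (standalone; the real APInt method differs in details):
  #include <cstdint>

  // True as soon as any word overlaps; no temporary AND result is created.
  static bool intersectsWords(const uint64_t *LHS, const uint64_t *RHS,
                              unsigned NumWords) {
    for (unsigned I = 0; I != NumWords; ++I)
      if (LHS[I] & RHS[I])
        return true;
    return false;
  }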
Reviewers: RKSimon, hans, spatel, davide
Reviewed By: hans, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32253
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300812 91177308-0d34-0410-b5e6-96231b3b80d8
- introduced in r300522 and found via the Swift LLDB testsuite.
The fix is to set the location kind to memory whenever a FrameIndex
location is emitted.
rdar://problem/31707602
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300793 91177308-0d34-0410-b5e6-96231b3b80d8
- introduced in r300522 and found via the Swift LLDB testsuite.
The fix is to set the location kind to memory whenever a FrameIndex
location is emitted.
rdar://problem/31707602
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300790 91177308-0d34-0410-b5e6-96231b3b80d8
There are two reasons why users might want to build libfuzzer:
- To fuzz LLVM itself
- To get the libFuzzer.a archive file, so that they can attach it to their code
This change always builds libfuzzer, and supports the second use case if the specified flag is set.
The point of this patch is to have something that can potentially be shipped with the compiler, and this also ensures that the version of libFuzzer is correct to use with that compiler.
Differential Revision: https://reviews.llvm.org/D32096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300789 91177308-0d34-0410-b5e6-96231b3b80d8
This change is correct because the verifier requires that at most one
argument be marked 'sret'.
NFC, removes a use of AttributeList slot APIs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300784 91177308-0d34-0410-b5e6-96231b3b80d8
Re-commit after revert in r300668. Changed getMaxFPOffset() to a
more conservative heuristic instead of trying to be clever and getting it
wrong for some exotic calling conventions.
We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.
rdar://31317893
Differential Revision: https://reviews.llvm.org/D31643
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300761 91177308-0d34-0410-b5e6-96231b3b80d8
This is preparation for a clang change to improve the [[nodiscard]] warning to not be ignored on methods that return a class marked [[nodiscard]] that are defined in the class itself. See D32207.
We should consider adding wrapper methods to APInt that return the overflow flag directly and discard the APInt result. This would eliminate the void casts and the need to create a bool before the call to pass to the out param.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300758 91177308-0d34-0410-b5e6-96231b3b80d8
Move the BFI logic to computeKnownBitsForTargetNode, and delete
the redundant CMOV logic.
This is intended as a cleanup, but it's probably possible to construct
a case where moving the BFI logic allows more combines.
Differential Revision: https://reviews.llvm.org/D31795
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300752 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In the current implementation, finding the inline stack for an address incurs an expensive linear search in 2 places:
* a linear search for the top-level DIE
* a recursive linear traversal of the DIE tree to find the path to the leaf DIE
In this patch, a map is built from each address to its corresponding leaf DIE. The inline stack is built by traversing from the leaf DIE up to the root DIE. This speeds up batch symbolization by ~10X without noticeable memory overhead.
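A simplified standalone sketch of the lookup scheme (DieNode and the map layout are hypothetical, not the real DWARF classes):
  #include <cstdint>
  #include <map>
  #include <vector>

  struct DieNode {
    DieNode *Parent = nullptr;   // subprogram / inlined_subroutine chain
  };

  // Built once per CU: start address of each leaf DIE's range -> leaf DIE.
  using AddrToLeafDie = std::map<uint64_t, DieNode *>;

  std::vector<DieNode *> getInlineStack(const AddrToLeafDie &Map, uint64_t Addr) {
    std::vector<DieNode *> Stack;
    auto It = Map.upper_bound(Addr);
    if (It == Map.begin())
      return Stack;                       // address below every range
    --It;                                 // range-end check omitted in this sketch
    for (DieNode *D = It->second; D; D = D->Parent)
      Stack.push_back(D);                 // leaf first, root DIE last
    return Stack;
  }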
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32177
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300742 91177308-0d34-0410-b5e6-96231b3b80d8
This is inserted directly in the text section. The relocation
for the function ends up resolving to the beginning of the
amd_kernel_code_t header rather than the actual function
entry point.
Also skip some of the comments for initialization
that only make sense for kernels.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300736 91177308-0d34-0410-b5e6-96231b3b80d8
The most common case for a branch condition is
a single-use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
instructions, which the DAG will have to fold later.
This produces nicer-to-read structurizer output.
This produces some random changes in codegen
because the DAG swaps branch conditions itself
and then does a poor job of dealing with those
inverts.
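A rough sketch of the single-use-compare case (Br is assumed to be a conditional BranchInst in scope; this is not the actual structurizer code):
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Instead of emitting "xor i1 %cond, true", flip the compare's predicate
  // and swap the branch successors to preserve the original semantics.
  void invertBranch(BranchInst *Br) {
    if (auto *Cmp = dyn_cast<CmpInst>(Br->getCondition())) {
      if (Cmp->hasOneUse()) {
        Cmp->setPredicate(CmpInst::getInversePredicate(Cmp->getPredicate()));
        Br->swapSuccessors();
        return;
      }
    }
    // Otherwise a xor-with-true (or equivalent) is still needed.
  }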
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300732 91177308-0d34-0410-b5e6-96231b3b80d8
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in
SimplifyDemandedBits().
Some notes for reference:
1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions.
Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since
they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards
improving the vector codegen in that patch without writing any actual new code.
Differential Revision: https://reviews.llvm.org/D32230
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300725 91177308-0d34-0410-b5e6-96231b3b80d8
This should simplify the call sites, which typically want to tweak one
attribute at a time. It should also avoid creating ephemeral
AttributeLists that live forever.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300718 91177308-0d34-0410-b5e6-96231b3b80d8
This patch simplifies the examples from D31509 and D31927 (PR30630) and catches
the basic identity shuffle tests that Zvi recently added.
I'm not sure if we have something like this in DAGCombiner, but we should?
It's worth noting that "MaxRecurse / RecursionLimit" is only 3 on entry at the moment.
We might want to bump that up if there are longer shuffle chains like this in the wild.
For now, we're ignoring shuffles that have undef mask elements because it's not
clear how those should be handled.
Differential Revision: https://reviews.llvm.org/D31960
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300714 91177308-0d34-0410-b5e6-96231b3b80d8
Also, make a few changes to allow using the pass in .mir testcases.
Among other things, change the abbreviation from opt-amode to amode-opt,
because otherwise lit would expand the "opt" part to the full path to
the opt binary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300707 91177308-0d34-0410-b5e6-96231b3b80d8
The list has a single element 75+% of the time, reservation of 4 elements
is sufficient in 95% of cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300705 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In the current implementation, finding the inline stack for an address incurs an expensive linear search in 2 places:
* a linear search for the top-level DIE
* a recursive linear traversal of the DIE tree to find the path to the leaf DIE
In this patch, a map is built from each address to its corresponding leaf DIE. The inline stack is built by traversing from the leaf DIE up to the root DIE. This speeds up batch symbolization by ~10X without noticeable memory overhead.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32177
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300697 91177308-0d34-0410-b5e6-96231b3b80d8
Support G_MUL, very similar to G_ADD and G_SUB. The only difference is
in the instruction selector, where we have to select either MUL or MULv5
depending on the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300665 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defensible design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; it also needs quite a few code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
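As a small illustration of what this enables (exact LLT factory signatures in this revision may differ slightly):
  // A 64-bit pointer in address space 0, and a 2-element vector of them.
  LLT P0   = LLT::pointer(0, 64);
  LLT V2P0 = LLT::vector(2, P0);   // printed as <2 x p0>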
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300664 91177308-0d34-0410-b5e6-96231b3b80d8
This showed up in r300535/r300537, which were reverted in r300538 because
some of the tests introduced there failed on some bots due to the
non-determinism fixed in this commit.
Re-committing r300535/r300537 will add 2 tests for the change in this
commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300663 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: In case all predecessors go to a single successor of the current BB, we want to fold (not thread).
Reviewers: efriedma, sanjoy
Reviewed By: sanjoy
Subscribers: dberlin, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D30869
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300657 91177308-0d34-0410-b5e6-96231b3b80d8
In r300196 several methods were added to TargetInstrInfo to access
data stored with call frame setup/destroy instructions. This change
replaces calls to getOperand with calls to such special methods in
the ARM target.
Differential Revision: https://reviews.llvm.org/D32127
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300655 91177308-0d34-0410-b5e6-96231b3b80d8
The 'addAttributes(unsigned, AttrBuilder)' overload delegated to 'get'
instead of 'addAttributes'.
Since we can implicitly construct an AttrBuilder from an AttributeSet,
just standardize on AttrBuilder.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300651 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
VersionPrinter by default outputs information about the Host CPU
and Default target. Printing this information requires linking in
a large amount of data, such as supported target triples as C
strings, which in turn bloats the binary size.
Enable a new CMake option LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO
which controls printing of the host and target info. This allows
the target triple names to be dead-code stripped. This is a nice
win for LLVM clients that wish to minimize their binary size, such
as graphics drivers.
By default this is ON, so there is no change in the default behavior.
Clients who wish to suppress this printing can do so by setting this
option to off via CMake.
A test app on Linux that uses ParseCommandLineOptions() shows a binary
size reduction of 23KB (from 149K to 126K) for a Release build, and 24KB
(from 135K to 111K) in a MinSizeRel build.
Reviewers: klimek, beanz, bogner, chandlerc, compnerd
Reviewed By: compnerd
Patch by pammon (Peter Ammon) !
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30904
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300630 91177308-0d34-0410-b5e6-96231b3b80d8
We were creating an APInt at the top of these methods that isn't always returned. For ranges wider than 64 bits this results in an allocation and deallocation when it's not used.
In getSignedMax we were creating Upper-1 to use in a compare and then creating it again for a return value. The compiler is unable to determine that these can be shared. So help it out and create the Upper-1 in a temporary that can be reused.
This provides a little compile time improvement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300621 91177308-0d34-0410-b5e6-96231b3b80d8
BasicAA wants to know if a function is either a malloc-like or calloc-like function. Currently we have to check both separately. This means both calls check if it's an intrinsic, query TLI, check the nobuiltin attribute, scan the AllocationFnData, etc.
This patch adds an isMallocOrCallocLikeFn so we can go through all of the checks once per call.
This also changes the one other location I saw that called both together.
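Sketch of the call-site change this enables (V and TLI, a const TargetLibraryInfo *, assumed in scope):
  #include "llvm/Analysis/MemoryBuiltins.h"

  // Before: two separate queries, each re-doing the intrinsic/TLI/attribute
  // checks and the AllocationFnData scan.
  bool WasAllocLike = isMallocLikeFn(V, TLI) || isCallocLikeFn(V, TLI);
  // After: a single combined query.
  bool IsAllocLike = isMallocOrCallocLikeFn(V, TLI);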
Differential Revision: https://reviews.llvm.org/D32188
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300608 91177308-0d34-0410-b5e6-96231b3b80d8
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompanying test case is fully vectorized
when we swap the order of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is a simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.
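An illustrative source pattern of the kind that benefits (hypothetical example: a loop body carrying exactly two phi-fed accumulators whose textual order should not matter):
  // Accumulates the product of two arrays of complex numbers; the real and
  // imaginary accumulators form the two phis mentioned above.
  void cmul_accumulate(const double *Ar, const double *Ai,
                       const double *Br, const double *Bi,
                       double *SumR, double *SumI, int N) {
    double Re = 0.0, Im = 0.0;
    for (int I = 0; I < N; ++I) {
      Re += Ar[I] * Br[I] - Ai[I] * Bi[I];
      Im += Ar[I] * Bi[I] + Ai[I] * Br[I];
    }
    *SumR = Re;
    *SumI = Im;
  }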
Differential revision: https://reviews.llvm.org/D32065
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300574 91177308-0d34-0410-b5e6-96231b3b80d8
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
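Sketch of the pattern being replaced (Known is an APInt and ShiftAmt an unsigned, assumed in scope):
  // Before: builds a temporary APInt and copies it back.
  Known = Known.lshr(ShiftAmt);
  // After: shifts the existing words in place, avoiding the temporary.
  Known.lshrInPlace(ShiftAmt);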
Differential Revision: https://reviews.llvm.org/D32155
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300566 91177308-0d34-0410-b5e6-96231b3b80d8
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.
Reviewers: jyknight, bogner, javed.absar, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32086
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300561 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is part of D28975's breakdown.
Add caching for block masks similar to the cache already used for edge masks,
replacing generation per user with reusing the first generated value which
dominates all uses.
Differential Revision: https://reviews.llvm.org/D32054
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300557 91177308-0d34-0410-b5e6-96231b3b80d8
In the assembler, we should emit build attributes based on the target
selected with command-line options. This matches the GNU assembler's
behaviour. We only do this for build attributes which describe the
hardware that is expected to be available, not the ones that describe
ABI compatibility.
This is done by moving some of the attribute emission code to
ARMTargetStreamer, so that it can be shared between the assembly and
code-generation code paths. Since the assembler only creates a
MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to
check raw features, and not use the convenience functions in
ARMSubtarget.
If different attributes are later specified using the .eabi_attribute
directive, then they will take precedence, as happens when the same
.eabi_attribute is specified twice.
This must be enabled by an option, because we don't want to do this when
parsing inline assembly. The attributes would match the ones emitted at
the start of the file, so wouldn't actually change the emitted object
file, but the extra directives would be added to every inline assembly
block when emitting assembly, which we'd like to avoid.
The majority of the changes in the build-attributes.ll test are just
re-ordering the directives, because the hardware attributes are now
emitted before the ABI ones. However, I did fix one bug which I spotted:
Tag_CPU_arch_profile was not being emitted for v6M.
Differential revision: https://reviews.llvm.org/D31812
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300547 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, we always called method 'findCalleeFunctionSamples()' on
intrinsic calls. However, intrinsic calls like llvm.dbg.value() are not viable
candidates for obvious reasons.
No functional change intended.
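A rough sketch of the added guard (the surrounding loop and exact signature are simplified; findCalleeFunctionSamples is the function named above):
  // Inside the per-instruction loop: don't ask for callee samples on
  // intrinsic calls such as llvm.dbg.value().
  if (isa<IntrinsicInst>(&I))
    continue;
  const FunctionSamples *CalleeSamples = findCalleeFunctionSamples(I);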
Differential Revision: https://reviews.llvm.org/D32008
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300541 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produce slightly different code depending on the compiler LLVM is built with.
E.g., either one of the following
can be produced:
remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)
Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300538 91177308-0d34-0410-b5e6-96231b3b80d8
For subtargets that use the custom lowering for divmod, e.g. gnueabi,
we used to check if the subtarget has hardware divide and then lower to
a div-mul-sub sequence if true, or to a libcall if false.
However, judging by the usage of hasDivide vs hasDivideInARMMode, it
seems that hasDivide only refers to Thumb. For instance, in the
ARMTargetLowering constructor, the code that specifies whether to use
libcalls for (S|U)DIV looks like this:
  bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
                                         : Subtarget->hasDivideInARMMode();
In the case of divmod for arm-gnueabi, using only hasDivide() to
determine what to do means that instead of lowering to __aeabi_idivmod
to get the remainder, we lower to div-mul-sub and then further lower the
div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but
not in Thumb, we generate a libcall instead of using it (this is not an
issue in practice since AFAICT none of the cores that we support have
hardware divide in ARM but not Thumb).
This patch fixes the code dealing with custom lowering to take into
account the mode (Thumb or ARM) when deciding whether or not hardware
division is available.
Differential Revision: https://reviews.llvm.org/D32005
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300536 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defensible design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; it also needs quite a few code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300535 91177308-0d34-0410-b5e6-96231b3b80d8
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
- describe a variable in a register
- consist of only a DW_OP_reg
2. Memory location descriptions
- describe the address of a variable
3. Implicit location descriptions
- describe the value of a variable
- end with DW_OP_stack_value & friends
The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident. This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.
This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.
There are two major changes in this patch:
I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.
When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
DBG_VALUE, [RBP-8] --> DW_OP_fbreg -8
DBG_VALUE, RAX --> DW_OP_reg0 +0
DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0
All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.
https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>
Differential Revision: https://reviews.llvm.org/D31439
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300522 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of storing an UncommonIndex on the Symbol, use a flag bit to store
whether the Symbol has an Uncommon. This shrinks Chromium's .bc files (after
D32061) by about 1%.
Differential Revision: https://reviews.llvm.org/D32070
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300514 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: If there is a suffix added to the function name (e.g. a module hash added by ThinLTO), we will not be able to find a match in the profile as the suffix does not exist in the profile. This patch builds a map from function name to Function *. The map includes an entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.
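A rough sketch of the map construction (the suffix-stripping rule shown is an assumption; the actual canonicalization may differ):
  StringMap<Function *> SymbolMap;
  for (Function &F : M) {
    StringRef Name = F.getName();
    SymbolMap[Name] = &F;
    // Also register the name with the ThinLTO-style suffix stripped, so a
    // profile recorded for "foo" still matches "foo.llvm.123456".
    StringRef Canon = Name.split('.').first;
    if (Canon != Name)
      SymbolMap[Canon] = &F;
  }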
Reviewers: davidxl, dnovillo, tejohnson
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31952
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300507 91177308-0d34-0410-b5e6-96231b3b80d8
The use list is a linked list, so getNumUses requires a linear scan through the whole list. hasNUses will stop scanning at N and see if that is the end.
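Sketch (V assumed in scope):
  // Before: walks the entire use list even when it is long.
  bool TwoUsesSlow = V->getNumUses() == 2;
  // After: stops after scanning at most three use-list nodes.
  bool TwoUsesFast = V->hasNUses(2);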
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300505 91177308-0d34-0410-b5e6-96231b3b80d8
This merges the two different multiword shift right implementations into a single version located in tcShiftRight. lshrInPlace now calls tcShiftRight for the multiword case.
I retained the memmove fast path from lshrInPlace and used a memset for the zeroing. The for loop is basically tcShiftRight's implementation with the zeroing and the intra-shift of 0 removed.
Differential Revision: https://reviews.llvm.org/D32114
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300503 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to
paramHasAttr(i).
Add more tests to WebAssemblyOptimizeReturned that catch that
regression.
Reviewers: dschuff
Subscribers: jfb, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D32136
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300502 91177308-0d34-0410-b5e6-96231b3b80d8
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.
OK'ed by Sanjay/Craig. Fixes PR32686.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300495 91177308-0d34-0410-b5e6-96231b3b80d8
the exponential behavior.
The patch is to fix PR32043. The functions getZeroExtendExpr and getSignExtendExpr
may call themselves recursively more than once. This is potentially 2^N
complexity behavior. The exponential behavior was not commonly exposed before
because of existing global cache mechanisms like UniqueSCEVs or the early return
mechanism used when the flags FlagNSW or FlagNUW are seen. However, we still have cases
which can expose the exponential behavior, like the case in PR32043, so we add
a local cache to getZeroExtendExpr and getSignExtendExpr. If the input of the
functions -- a SCEV and type pair -- has been seen before, we can find the extended
expression directly in the local cache.
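A rough sketch of the memoization idea (the real cache type, key and plumbing differ; SE is a ScalarEvolution reference assumed in scope):
  // Keyed on the (operand SCEV, destination type) pair, as described above.
  DenseMap<std::pair<const SCEV *, Type *>, const SCEV *> ExtendCache;

  const SCEV *getZeroExtendExprCached(ScalarEvolution &SE, const SCEV *Op,
                                      Type *Ty) {
    auto Key = std::make_pair(Op, Ty);
    auto It = ExtendCache.find(Key);
    if (It != ExtendCache.end())
      return It->second;            // already extended this pair once
    const SCEV *Result = SE.getZeroExtendExpr(Op, Ty);
    ExtendCache[Key] = Result;
    return Result;
  }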
Differential Revision: https://reviews.llvm.org/D30350
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300494 91177308-0d34-0410-b5e6-96231b3b80d8
Avoid looping through the program to determine register counts.
This avoids needing to look at regmask operands.
Also fixes some counting errors with flat_scr when there
are no stack objects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300482 91177308-0d34-0410-b5e6-96231b3b80d8
While the incoming stack for a kernel is 256-byte aligned,
this refers to the base address of the entire wave. This isn't
useful information for most of codegen. Fixes unnecessarily
aligning stack objects in callees.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300481 91177308-0d34-0410-b5e6-96231b3b80d8
The splitIndirectCriticalEdges function generates an invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.
This occurs when there is an edge from block D back to D. Since D is split
into D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.
Differential Revision: https://reviews.llvm.org/D32126
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300480 91177308-0d34-0410-b5e6-96231b3b80d8
Currently we use getTypeSizeInBits, which contains a switch statement to dispatch based on what the Type is. We know we always have a pointer type here, but the compiler isn't able to figure that out to remove the switch.
This patch changes it to just handle the pointer type directly by calling getPointerSizeInBits without going through a switch.
getPointerTypeSizeInBits is called pretty often, particularly by getOrEnforceKnownAlignment which is used by InstCombine. This should speed that up a little bit.
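Sketch (DL is a DataLayout and PtrTy a pointer Type *, assumed in scope):
  // Before: generic query that switches over the type kind.
  uint64_t BitsSlow = DL.getTypeSizeInBits(PtrTy);
  // After: pointer-specific query, no switch.
  unsigned BitsFast = DL.getPointerTypeSizeInBits(PtrTy);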
Differential Revision: https://reviews.llvm.org/D31841
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300475 91177308-0d34-0410-b5e6-96231b3b80d8
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300474 91177308-0d34-0410-b5e6-96231b3b80d8
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.
This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.
On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html
Differential Revision: https://reviews.llvm.org/D31838
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300464 91177308-0d34-0410-b5e6-96231b3b80d8
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300462 91177308-0d34-0410-b5e6-96231b3b80d8
This makes statements like KnownZero.isNegative() (which means the value we're tracking is positive) less confusing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300457 91177308-0d34-0410-b5e6-96231b3b80d8
Causes some VGPR usage improvements in shaderdb, but
introduces some SGPR spilling regressions due to random
scheduling changes later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300453 91177308-0d34-0410-b5e6-96231b3b80d8