Requires callers to directly associate relocations with a DataExtractor
used to read data from a DWARF section, which helps a callee not make
assumptions about which section it is reading.
This is the next step in reducing DWARFFormValue's dependence on DWARFUnit.
Differential Revision: https://reviews.llvm.org/D34704
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306699 91177308-0d34-0410-b5e6-96231b3b80d8
In LLVM IR the following code:
%r = urem <ty> %t, %b
is equivalent to:
%q = udiv <ty> %t, %b
%s = mul <ty> nuw %q, %b
%r = sub <ty> nuw %t, %q ; (t / b) * b + (t % b) = t
As UDiv, Mul and Sub are already supported by SCEV, URem can be
implemented with minimal effort this way.
Note: While SRem and SDiv are also related this way, SCEV does not
provides SDiv yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306695 91177308-0d34-0410-b5e6-96231b3b80d8
Relanding after restricting equalBaseIndex to not erroneuosly consider
a FrameIndices stemming from alloca from being comparable as its
offset is set post-selectionDAG.
Pull FrameIndex comparision reasoning from DAGCombiner::isAlias to
general BaseIndexOffset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306688 91177308-0d34-0410-b5e6-96231b3b80d8
For networking-type bpf program, it often needs to access
packet data. A context data structure is provided to the bpf
programs with two fields:
u32 data;
u32 data_end;
User can access these two fields with ctx->data and ctx->data_end.
During program verification process, the kernel verifier modifies
the bpf program with loading of actual pointer value from kernel
data structure.
r = ctx->data ===> r = actual data start ptr
r = ctx->data_end ===> r = actual data end ptr
A typical program accessing ctx->data like
char *data_ptr = (char *)(long)ctx->data
will result in a 32-bit load followed by a zero extension.
Such an operation is combined into a single LDW in DAG combiner
as bpf LDW does zero extension automatically.
In cases like the below (which can be a result of global value numbering
and partial redundancy elimination before insn selection):
B1:
u32 a = load-32-bit &ctx->data
u64 pa = zext a
...
B2:
u32 b = load-32-bit &ctx->data
u64 pb = zext b
...
B3:
u32 m = PHI(a, b)
u64 pm = zext m
In B3, "pm = zext m" cannot be removed, which although is legal
from compiler perspective, will generate incorrect code after
kernel verification.
This patch recognizes this pattern and traces through PHI node
to see whether the operand of "zext m" is defined with LDWs or not.
If it is, the "zext m" itself can be removed.
The patch also recognizes the pattern where the load and use of
the load value not in the same basic block, where truncate operation
may be removed as well.
The patch handles 1-byte, 2-byte and 4-byte truncation.
Two test cases are added to verify the transformation happens properly
for the above code pattern.
Signed-off-by: Yonghong Song <yhs@fb.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306685 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed.
Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling to avoid _tls_get_addr is scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new ADJCALLSTACKUP.
Differential Revision: https://reviews.llvm.org/D34347
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306678 91177308-0d34-0410-b5e6-96231b3b80d8
Because of mistake introduced in r306517,
wrong variable ("name" instead of "Name") was used
in error message.
As a result it reported section name instead of
relocation name.
This file still needs cleanup to match LLVM coding style
and more tests I think.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306677 91177308-0d34-0410-b5e6-96231b3b80d8
The changes are a result of discussion of https://reviews.llvm.org/D33685.
It solves the following problem:
1. We can inform getGEPCost about simplified indices to help it with
calculating the cost. But getGEPCost does not take into account the
context which GEPs are used in.
2. We have getUserCost which can take the context into account but we cannot
inform about simplified indices.
With the changes getUserCost will have access to additional information
as getGEPCost has.
The one parameter getUserCost is also provided.
Differential Revision: https://reviews.llvm.org/D34057
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306674 91177308-0d34-0410-b5e6-96231b3b80d8
The difference from the previous version is the use of decltype, as the
implementation of std::result_of in libc++ did not work correctly for
variadic function like open(2).
Original summary:
This function retries an operation if it was interrupted by a signal
(failed with EINTR). It's inspired by the TEMP_FAILURE_RETRY macro in
glibc, but I've turned that into a template function. I've also added a
fail-value argument, to enable the function to be used with e.g.
fopen(3), which is documented to fail for any reason that open(2) can
fail (which includes EINTR).
The main user of this function will be lldb, but there were also a
couple of uses within llvm that I could simplify using this function.
Reviewers: zturner, silvas, joerg
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D33895
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306671 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665
Reviewers: qcolombet, t.p.northover, zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, kristof.beyls, guyblank, llvm-commits
Differential Revision: https://reviews.llvm.org/D33958
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306665 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
TBB and THH allow using a Thumb GPR or the PC as destination operand.
A few machine verifier failures where due to those instructions not
expecting PC as destination operand.
Add -verify-machineinstrs to test/CodeGen/ARM/jump-table-tbh.ll to add
test coverage even if expensive checks are disabled.
Reviewers: MatzeB, t.p.northover, jmolloy
Reviewed By: MatzeB
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34610
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306654 91177308-0d34-0410-b5e6-96231b3b80d8
Examining a large profile example, it seems relatively few records have
non-empty IndirectCall and MemOP data, so indirecting these through a
unique_ptr (non-null only when they are non-empty) Reduces memory usage
on this particular example from 14GB to 10GB according to valgrind's
massif.
I suspect it'd still be worth moving InstrProfWriter to its own data
structure that had Counts and the indirected IndirectCall+MemOP, and did
not include the Name, Hash, or Error fields. This would reduce the size
of this dominant data structure by half of this new, lower amount.
(Name(2), Hash(1), Error(1) ~= Counts(vector, 3), ValueProfData
(unique_ptr, 1))
-> From code review feedback, might actually refactor InstrProfRecord
itself to have a sub-struct with all the counts, and use that from
InstrProfWriter, rather than InstrProfWriter owning its own data
structure for this.
Reviewers: davidxl
Differential Revision: https://reviews.llvm.org/D34694
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306631 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit d4c7e9fc63c10dbab0c30186ef8575474a704496.
This is done in order to address the failure of CrWinClangLLD etc. bots.
These throw an error of "side-by-side configuration is incorrect" during
compilation, which sounds suspiciously related to these manifest
changes.
Revert "Switch external cvtres.exe for llvm's own resource library."
This reverts commit 71fe8ef283a9dab9a3f21432c98466cbc23990d1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306618 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
As discussed on the mailing list it is legal to propagate TBAA to loads/stores
from/to smaller regions of a larger load tagged with TBAA. Do so for
(load->extractvalue)=>(gep->load) and similar foldings.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D31954
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306615 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of creating symbols directly in the findChildren methods of the native
symbol implementations, they will rely on the NativeSession to act as a factory
for these types. This lets NativeSession cache the NativeRawSymbols in its
new symbol cache and makes that cache the source of unique IDs for the symbols.
Right now, this affects only NativeCompilandSymbols. There's no external
change yet, so I think the existing tests are still sufficient. Coming soon
are patches to extend this to built-in types and enums.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306610 91177308-0d34-0410-b5e6-96231b3b80d8
r306381 caused PR33613, by reversing the order in which insertelements were
generated per unroll part. This patch fixes PR33613 by retraining this order,
placing each set of insertelements per part immediately after the last scalar
being packed for this part. Includes a test case derived from PR33613.
Reference: https://bugs.llvm.org/show_bug.cgi?id=33613
Differential Revision: https://reviews.llvm.org/D34760
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306575 91177308-0d34-0410-b5e6-96231b3b80d8
Some conditional branch instructions generated by this pass are checking
the wrong condition code. The instructions TBZ and TBNZ are transformed
into B.GE and B.LT instead of B.PL and B.MI respectively. They should
only be checking the Negative bit.
Differential Revision: https://reviews.llvm.org/D34743
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306550 91177308-0d34-0410-b5e6-96231b3b80d8
The current heuristic in isProfitableToIfCvt assumes we have a branch predictor,
and so gives the wrong answer in some cases when we don't. This patch adds a
subtarget feature to indicate that a subtarget has no branch predictor, and
changes the heuristic in isProfitableToiIfCvt when it's present. This gives a
slight overall improvement in a set of embedded benchmarks on Cortex-M4 and
Cortex-M33.
Differential Revision: https://reviews.llvm.org/D34398
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306547 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I was testing using this expansion logic in other cases besides
NVPTX, and found some runtime failures due to the lack of a check
for a zero length memcpy/memset before the loop. There is already
such a check in the memmove expansion code though.
Reviewers: hfinkel
Subscribers: jholewinski, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34707
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306541 91177308-0d34-0410-b5e6-96231b3b80d8
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.
Majority of the changes in this patch:
1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.
These changes are target independent and described below.
Changed CFI instructions so that they:
1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal
Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.
Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D18046
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306529 91177308-0d34-0410-b5e6-96231b3b80d8
The original patch was an improvement to IR ValueTracking on non-negative
integers. It has been checked in to trunk (D18777, r284022). But was disabled by
default due to performance regressions.
Perf impact has improved. The patch would be enabled by default.
Reviewers: reames
Differential Revision: https://reviews.llvm.org/D34101
Patch by: Olga Chupina <olga.chupina@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306528 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306525 91177308-0d34-0410-b5e6-96231b3b80d8
With fix in include folder character case:
#include "llvm/Codegen/AsmPrinter.h" -> #include "llvm/CodeGen/AsmPrinter.h"
Original commit message:
Change introduces error reporting policy for DWARFContextInMemory.
New callback provided by client is able to handle error on it's
side and return Halt or Continue.
That allows to either keep current behavior when parser prints all errors
but continues parsing object or implement something very different, like
stop parsing on a first error and report an error in a client style.
Differential revision: https://reviews.llvm.org/D34328
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306517 91177308-0d34-0410-b5e6-96231b3b80d8
The benchmarking summarized in
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113525.html showed
this is beneficial for a wide range of cores.
As is to be expected, quite a few small adaptations are needed to the
regressions tests, as the difference in scheduling results in:
- Quite a few small instruction schedule differences.
- A few changes in register allocation decisions caused by different
instruction schedules.
- A few changes in IfConversion decisions, due to a difference in
instruction schedule and/or the estimated cost of a branch mispredict.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306514 91177308-0d34-0410-b5e6-96231b3b80d8
Change introduces error reporting policy for DWARFContextInMemory.
New callback provided by client is able to handle error on it's
side and return Halt or Continue.
That allows to either keep current behavior when parser prints all errors
but continues parsing object or implement something very different, like
stop parsing on a first error and report an error in a client style.
Differential revision: https://reviews.llvm.org/D34328
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306512 91177308-0d34-0410-b5e6-96231b3b80d8
That is pretty common for clang to produce code like
(shl %x, (and %amt, 31)). In this situation we can still perform
trunc (shl) into shl (trunc) conversion given the known value
range of shift amount.
Differential Revision: https://reviews.llvm.org/D34723
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306499 91177308-0d34-0410-b5e6-96231b3b80d8
When simplifying an instruction that has been re-mapped, it should never
simplify to an instruction in the original function. In the edge case
where we are inlining a function into itself, the existing code led to
incorrect behavior. Replace the incorrect code with an assert verifying
that we never expect simplification to produce an instruction in the old
function, unless the functions are the same.
Differential Revision: https://reviews.llvm.org/D33850
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306495 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is the llvm part of the initial implementation to support Windows ARM64 COFF format.
I will gradually add more functionality in subsequent patches.
Reviewers: ruiu, rnk, t.p.northover, compnerd
Reviewed By: ruiu, compnerd
Subscribers: aemerson, mgorny, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D34705
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306490 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in D34071, there are some IR optimization opportunities that could be
handled by normal IR passes if this expansion wasn't happening so late in CGP.
Regardless of that, it seems wasteful to knowingly produce suboptimal IR here,
so I'm proposing this change:
%s = sub i32 %x, %y
%r = icmp ne %s, 0
=>
%r = icmp ne %x, %y
Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.
The fact that the PowerPC backend doesn't eliminate the 'subf.' might be
something for PPC folks to investigate separately.
Differential Revision: https://reviews.llvm.org/D34416
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306471 91177308-0d34-0410-b5e6-96231b3b80d8
Without this check, COPY instructions can actually be one of the generic casts
in disguise. That's confusing and bad.
At some point during ISel this restriction has to be relaxed since the fully
selected instructions will usually use COPY for those purposes. Right now I
think it's possible that relaxation occurs during RegBankSelect (hence the
change there). I'm not convinced that's where it belongs long-term though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306470 91177308-0d34-0410-b5e6-96231b3b80d8
The check to see if we can propagate the nsw flag used m_ConstantInt(uint64_t*&) which doesn't work with splat vectors and has a restriction that the bitwidth of the ConstantInt must be 64-bits are less.
This patch changes it to use m_APInt to remove both these issues
Differential Revision: https://reviews.llvm.org/D34699
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306457 91177308-0d34-0410-b5e6-96231b3b80d8
Depending on the compare code that can be either an argument of
sext or negate of it. This helps to avoid v_cndmask_b64 instruction
for sext. A reversed value can be further simplified and folded into
its parent comparison if possible.
Differential Revision: https://reviews.llvm.org/D34545
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306446 91177308-0d34-0410-b5e6-96231b3b80d8
Account for the fact that both, the feeder and the compare can be moved
over instructions that kill registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306443 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently this replacement can really be substituting the
same as the original register. Avoid restarting the loop
when there's been no change in the register uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306441 91177308-0d34-0410-b5e6-96231b3b80d8
Also factored out function to check if a boolean is an already
deserialized value which does not require v_cndmask_b32 to be
loaded. Added binary logical operators to its check.
Differential Revision: https://reviews.llvm.org/D34500
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306439 91177308-0d34-0410-b5e6-96231b3b80d8
- DenseMap should be faster than std::map
- Use the `InsertRes = insert() if (!InsertRes.inserted)` pattern rather
than the `if (!X.contains(...)) { X.insert(...); }` to save one map
lookup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306436 91177308-0d34-0410-b5e6-96231b3b80d8
This canonicalization was suggested in D33172 as a way to make InstCombine behavior more uniform.
We have this transform for icmp+br, so unless there's some reason that icmp+select should be
treated differently, we should do the same thing here.
The benefit comes from increasing the chances of creating identical instructions. This is shown in
the tests in logical-select.ll (PR32791). InstCombine doesn't fold those directly, but EarlyCSE
can simplify the identical cmps, and then InstCombine can fold the selects together.
The possible regression for the tests in select.ll raises questions about poison/undef:
http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html
...but that transform is just as likely to be triggered by this canonicalization as it is to be
missed, so we're just pointing out a commutation deficiency in the pattern matching:
https://reviews.llvm.org/rL228409
Differential Revision: https://reviews.llvm.org/D34242
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306435 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
1. Instruction V_CVT_U32_F32 allow omod operand (see SIInstrInfo.td:1435). In fact this operand shouldn't be allowed here. This fix checks if SDWA pseudo instruction has OMod operand and then copy it.
2. There were several problems with support of VOPC instructions in SDWA peephole pass.
Reviewers: tstellar, arsenm, vpykhtin, airlied, kzhuravl
Subscribers: wdng, nhaehnle, yaxunl, dstuttard, tpr, sarnex, t-tye
Differential Revision: https://reviews.llvm.org/D34626
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306413 91177308-0d34-0410-b5e6-96231b3b80d8
This patch modifies the conditional compares pass so that it keeps successor
probabilities up-to-date after the conversion. Previously, successor
probabilities were being normalized to a uniform distribution, even though they
may have been heavily biased prior to the conversion (e.g., if one of the edges
was the back edge of a loop). This loss of information affected passes later in
the pipeline.
Differential Revision: https://reviews.llvm.org/D34109
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306412 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block
(which is proven to be the only exiting block in the loop to be unrolled).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306410 91177308-0d34-0410-b5e6-96231b3b80d8
Add the instruction aliases for ds(r|l)l for the two operand alias
of ds(r|l)lv and the aliases ds(r|l)l with the three register operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306405 91177308-0d34-0410-b5e6-96231b3b80d8
When SelectionDAG merges consecutive stores and loads in MergeConsecutiveStores, it does not set dereferenceable flag for a created load instruction. This results in an assertion failure if SelectionDAG commonizes this load instruction with other load instructions, as well as it may miss optimization opportunities.
This patch sat dereferenceable flag for the newly created load instruction if all the load instructions to be merged are dereferenceable.
Differential Revision: https://reviews.llvm.org/D34679
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306404 91177308-0d34-0410-b5e6-96231b3b80d8
[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).
AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.
When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.
Differential Revision: https://reviews.llvm.org/D33188
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306402 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
After this patch, we finally have test cases that require multiple
instruction emission.
Depends on D33590
Reviewers: ab, qcolombet, t.p.northover, rovka, kristof.beyls
Subscribers: javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D33596
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306388 91177308-0d34-0410-b5e6-96231b3b80d8
Borrow from the logic for 'jal' in MipsAsmParser::processInstruction
and add the extra condition of bypassing CALL16 if the destination symbol
is an ELF symbol with STB_LOCAL binding.
Patch by: John Baldwin
Reviewers: sdardis
Differential Revision: https://reviews.llvm.org/D33999
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306387 91177308-0d34-0410-b5e6-96231b3b80d8
* Mark as legal for (s32, i1, s32, s32)
* Map everything into GPRs
* Select to two instructions: a CMP of the condition against 0, to set
the flags, and a MOVCCr to select between the two inputs based on the
flags that we've just set
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306382 91177308-0d34-0410-b5e6-96231b3b80d8
Undoing revert 306338 after fixed bug: add metadata to the load instead of the
reverse shuffle added to it, retaining the original ValueMap implementation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306381 91177308-0d34-0410-b5e6-96231b3b80d8
This is based heavily on the work done ni D34285. I mostly wanted to do
test cleanup for the author to save them some time, but I had a really
hard time understanding why it was so hard to write better test cases
for these issues.
The problem is that because SROA does a second rewrite of the loads and
because we *don't* propagate !nonnull for non-pointer loads, we first
introduced invalid !nonnull metadata and then stripped it back off just
in time to avoid most ways of this PR manifesting. Moving to the more
careful utility only fixes this by changing the predicate to look at the
new load's type rather than the target type. However, that *does* fix
the bug, and the utility is much nicer including adding range metadata
to model the nonnull property after a conversion to an integer.
However, we have bigger problems because we don't actually propagate
*range* metadata, and the utility to do this extracted from instcombine
isn't really in good shape to do this currently. It *only* handles the
case of copying range metadata from an integer load to a pointer load.
It doesn't even handle the trivial cases of propagating from one integer
load to another when they are the same width! This utility will need to
be beefed up prior to using in this location to get the metadata to
fully survive.
And even then, we need to go and teach things to turn the range metadata
into an assume the way we do with nonnull so that when we *promote* an
integer we don't lose the information.
All of this will require a new test case that looks kind-of like
`preserve-nonnull.ll` does here but focuses on range metadata. It will
also likely require more testing because it needs to correctly handle
changes to the integer width, especially as SROA actively tries to
change the integer width!
Last but not least, I'm a little worried about hooking the range
metadata up here because the instcombine logic for converting from
a range metadata *to* a nonnull metadata node seems broken in the face
of non-zero address spaces where null is not mapped to the integer `0`.
So that probably needs to get fixed with test cases both in SROA and in
instcombine to cover it.
But this *does* extract the core PR fix from D34285 of preventing the
!nonnull metadata from being propagated in a broken state just long
enough to feed into promotion and crash value tracking.
On D34285 there is some discussion of zero-extend handling because it
isn't necessary. First, the new load size covers all of the non-undef
(ie, possibly initialized) bits. This may even extend past the original
alloca if loading those bits could produce valid data. The only way its
valid for us to zero-extend an integer load in SROA is if the original
code had a zero extend or those bits were undef. And we get to assume
things like undef *never* satifies nonnull, so non undef bits can
participate here. No need to special case the zero-extend handling, it
just falls out correctly.
The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to
save a few rounds of trivial edits fixing style issues and test case
formulation.
Differental Revision: D34285
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306379 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
With scalar stores, M0 is clobbered and therefore marked as implicitly
defined. However, it is also dead.
This fixes an assertion when the Greedy Register Allocator decides to
optimize a spill/restore pair away again (via tryHintsRecoloring).
Reviewers: arsenm
Subscribers: qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D33319
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306375 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
EraseInst didn't report that it made IR changes through MadeChange.
It is essential that changes to the IR are reported correctly,
since for example ReassociatePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().
Reviewers: craig.topper, rnk, davide
Reviewed By: rnk, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34616
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306368 91177308-0d34-0410-b5e6-96231b3b80d8
PowerPC backend does not pass the current optimization level to SelectionDAGISel and so SelectionDAGISel works with the default optimization level regardless of the current optimization level.
This patch makes the PowerPC backend set the optimization level correctly.
Differential Revision: https://reviews.llvm.org/D34615
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306367 91177308-0d34-0410-b5e6-96231b3b80d8
Remove invalid shortcut in fixupKills(): A register needs to be marked
live even when we are not adding a kill flag. This is because a
partially live register must not get a kill flags, but it still needs to
be fully marked live when walking backwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306352 91177308-0d34-0410-b5e6-96231b3b80d8
Using Optional<> here doesn't seem to be terribly valuable, but
this is not the main point of this change. The change enables
us to merge the (now) two identical copies of parentFunctionOfValue()
that Steensgaard's and Andersens' provide.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306351 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
spec/2006/fp/C++/444.namd 26.84 -0.31%
spec/2006/fp/C++/447.dealII 46.19 +0.89%
spec/2006/fp/C++/450.soplex 42.92 -0.44%
spec/2006/fp/C++/453.povray 38.57 -2.25%
spec/2006/fp/C/433.milc 24.54 -0.76%
spec/2006/fp/C/470.lbm 41.08 +0.26%
spec/2006/fp/C/482.sphinx3 47.58 -0.99%
spec/2006/int/C++/471.omnetpp 22.06 +1.87%
spec/2006/int/C++/473.astar 22.65 -0.12%
spec/2006/int/C++/483.xalancbmk 33.69 +4.97%
spec/2006/int/C/400.perlbench 33.43 +1.70%
spec/2006/int/C/401.bzip2 23.02 -0.19%
spec/2006/int/C/403.gcc 32.57 -0.43%
spec/2006/int/C/429.mcf 40.35 +0.27%
spec/2006/int/C/445.gobmk 26.96 +0.06%
spec/2006/int/C/456.hmmer 24.4 +0.19%
spec/2006/int/C/458.sjeng 27.91 -0.08%
spec/2006/int/C/462.libquantum 57.47 -0.20%
spec/2006/int/C/464.h264ref 46.52 +1.35%
geometric mean +0.29%
The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.
I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.
Reviewers: hfinkel, mkuper, davidxl, chandlerc
Reviewed By: chandlerc
Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33341
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306336 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes bug 33597.
Use of substituteRegister in the tied operand case messes
up the register use iterator, causing some uses to be left
unprocessed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306333 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of providing access to the internal MapStorage holding all Values
associated with a given Key, used for setting or resetting them all together,
ValueMap keeps its MapStorage internal; its new interface allows getting,
setting or resetting a single Value, per part or per part-and-lane.
Follows the discussion in https://reviews.llvm.org/D32871.
Differential Revision: https://reviews.llvm.org/D34473
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306331 91177308-0d34-0410-b5e6-96231b3b80d8
This is the dual problem to legalizing G_INSERTs so most of the code and
testing was cribbed from there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306328 91177308-0d34-0410-b5e6-96231b3b80d8
When we forward a stored value to a load and eliminate it entirely we need to
make sure the liveness of the register is maintained all the way to its use.
Previously we only cleared liveness on the store doing the forwarding, but
there could be other killing uses in between.
We already do the right thing when the load has to be converted into something
else, it was just this one path that skipped it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306318 91177308-0d34-0410-b5e6-96231b3b80d8
Some forms have sizes that depend on the DWARF version, DWARF format
(32/64-bit), or the size of an address. Collect these into a struct
to simplify passing them around. Require callers to provide one when
they query a form's size.
Differential Revision: http://reviews.llvm.org/D34570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306315 91177308-0d34-0410-b5e6-96231b3b80d8
The recommit fixes three bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306313 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the dependency on the external rc.exe tool by writing
a simple .res file using our own library. In this patch I also added an
explicit definition for the .res file magic. Furthermore, I added a
unittest for embeded manifests and fixed a bug exposed by the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306311 91177308-0d34-0410-b5e6-96231b3b80d8
We sometimes need emergency spill slots for the register scavenger.
This may be the case when code needs to access a stack slot that
has an offset of 4096 or more relative to the stack pointer.
To make that determination, processFunctionBeforeFrameFinalized
currently simply checks the total stack frame size of the current
function. But this is not enough, since code may need to access
stack slots in the caller's stack frame as well, in particular
incoming arguments stored on the stack.
This commit fixes the problem by taking argument slots into account.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306305 91177308-0d34-0410-b5e6-96231b3b80d8
The non-AVX-512 behavior was changed in r248266 to match N1778
(C bindings for IEEE-754 (2008)), which defined the four functions
to not raise the inexact exception ("rint" is still defined as raising
it).
Update the AVX-512 lowering of these functions to match that: it should
not be different.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306299 91177308-0d34-0410-b5e6-96231b3b80d8
Convert vector increment or decrement to sub/add with an all-ones constant:
add X, <1, 1...> --> sub X, <-1, -1...>
sub X, <1, 1...> --> add X, <-1, -1...>
The all-ones vector constant can be materialized using a pcmpeq instruction that is
commonly recognized as an idiom (has no register dependency), so that's better than
loading a splat 1 constant.
AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better
way to produce 512 one-bits.
The general advantages of this lowering are:
1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables,
so in theory, this could be better for perf, but...
2. That seems unlikely to affect any OOO implementation, and I can't measure any real
perf difference from this transform on Haswell or Jaguar, but...
3. It doesn't look like it from the diffs, but this is an overall size win because we
eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting
a scalar load (which might itself be a bug), then we're replacing a scalar constant
load + broadcast with a single cheap op, so that should always be smaller/better too.
4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1
and psub x, -1, so we should use that form for +1 too because we can. If there's some
reason to favor a constant load on some CPU, let's make the reverse transform for all
of these cases (either here in the DAG or in a later machine pass).
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=33483
Differential Revision: https://reviews.llvm.org/D34336
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306289 91177308-0d34-0410-b5e6-96231b3b80d8
Csmith discovered that this function can be called with a zero argument,
in which case an assert for this triggered.
This patch also adds a guard before the other call to this function since
it was missing, although the test only covers the case where it was
discovered.
Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll.
Review: Ulrich Weigand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306287 91177308-0d34-0410-b5e6-96231b3b80d8
This method doesn't do any initializing. It just contains asserts. So renaming to AssertOK makes it consistent with similar instructions in other Instruction classes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306277 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "[MBP] do not rotate loop if it creates extra branch"
It breaks the sanitizer build bots. Need to fix this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306276 91177308-0d34-0410-b5e6-96231b3b80d8
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.
We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.
Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one
So if C is not a predecessor of H then we introduce extra branch.
This change actually prohibits rotation of the loop if both true
1) Best Exit has next element in chain as successor.
2) Last element in chain is not a predecessor of first element of chain.
Reviewers: iteratee, xur
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34271
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306272 91177308-0d34-0410-b5e6-96231b3b80d8
metadata out of InstCombine and into helpers.
NFC, this just exposes the logic used by InstCombine when propagating
metadata from one load instruction to another. The plan is to use this
in SROA to address PR32902.
If anyone has better ideas about how to factor this or name variables,
I'm all ears, but this seemed like a pretty good start and lets us make
progress on the PR.
This is based on a patch by Ariel Ben-Yehuda (D34285).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306267 91177308-0d34-0410-b5e6-96231b3b80d8
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.
Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306264 91177308-0d34-0410-b5e6-96231b3b80d8
This was reverted in r306252, but I already had the bug fixed and was
just trying to form a test case.
The original commit factored the logic for forming dedicated exits
inside of LoopSimplify into a helper that could be used elsewhere and
with an approach that required fewer intermediate data structures. See
that commit for full details including the change to the statistic, etc.
The code looked fine to me and my reviewers, but in fact didn't handle
indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty.
If you have code that looks *just* right, you can end up leaking these
predecessors into a subsequent rewrite, and crash deep down when trying
to update PHI nodes for predecessors that don't exist.
I've added an assert that makes the bug much more obvious, and then
changed the code to reliably clear the vector so we don't get this bug
again in some other form as the code changes.
I've also added a test case that *does* manage to catch this while also
giving some nice positive coverage in the face of indirectbr.
The real code that found this came out of what I think is CPython's
interpreter loop, but any code with really "creative" interpreter loops
mixing indirectbr and other exit paths could manage to tickle the bug.
I was hard to reduce the original test case because in addition to
having a particular pattern of IR, the whole thing depends on the order
of the predecessors which is in turn depends on use list order. The test
case added here was designed so that in multiple different predecessor
orderings it should always end up going down the same path and tripping
the same bug. I hope. At least, it tripped it for me without
manipulating the use list order which is better than anything bugpoint
could do...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306257 91177308-0d34-0410-b5e6-96231b3b80d8
Recommit NFC patch (rL306157) where I missed incrementing the basic block iterator,
which caused loop deletion tests to hang due to infinite loop.
Had reverted it in rL306162.
rL306157 commit message:
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306254 91177308-0d34-0410-b5e6-96231b3b80d8
This leads to a segfault. Chandler already has a test case and should be
able to recommit with a fix soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306252 91177308-0d34-0410-b5e6-96231b3b80d8
http://rise4fun.com/Alive/i8Q
A narrow bitwise logic op is obviously better than math for value tracking,
and zext is better than sext. Typically, the 'not' will be folded into an
icmp predicate.
The IR difference would even survive through codegen for x86, so we would see
worse code:
https://godbolt.org/g/C14HMF
one_or_zero(int, int): # @one_or_zero(int, int)
xorl %eax, %eax
cmpl %esi, %edi
setle %al
retq
one_or_zero_alt(int, int): # @one_or_zero_alt(int, int)
xorl %ecx, %ecx
cmpl %esi, %edi
setg %cl
movl $1, %eax
subl %ecx, %eax
retq
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306243 91177308-0d34-0410-b5e6-96231b3b80d8
The compiler fails with assertion during legalization of SETCC for <3 x i8> operands.
The result is extended to <4 x i8> and then truncated <4 x i1>. It does not happen on AVX2, because the final result of SETCC is <4 x i32>.
Differential Revision: https://reviews.llvm.org/D34503
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306242 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Make sure we are comparing the unknown instructions in the alias set and the instruction interested in.
I believe this is clearly a bug (missed opportunity). I can also add some test cases if desired.
Reviewers: hfinkel, davide, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34597
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306241 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer.
Split from https://reviews.llvm.org/D33665
Reviewers: qcolombet, t.p.northover, zvi, guyblank
Reviewed By: guyblank
Subscribers: guyblank, rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33957
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306240 91177308-0d34-0410-b5e6-96231b3b80d8
The cost of an interleaved access was only implemented for AVX512. For other
X86 targets an overly conservative Base cost was returned, resulting in
avoiding vectorization where it is actually profitable to vectorize.
This patch starts to add costs for AVX2 for most prominent cases of
interleaved accesses (stride 3,4 chars, for now).
Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb
workloads; There is also a known issue of 15-30% degradations on some of these
workloads, associated with an interleaved access followed by type
promotion/widening; the resulting shuffle sequence is currently inefficient and
will be improved by a series of patches that extend the X86InterleavedAccess pass
(such as D34601 and more to follow).
Note 2: The costs in this patch do not reflect port pressure penalties which can
be very dominant in the case of interleaved accesses since most of the shuffle
operations are restricted to a single port. Further tuning, that may incorporate
these considerations, will be done on top of the upcoming improved shuffle
sequences (that is, along with the abovementioned work to extend
X86InterleavedAccess pass).
Differential Revision: https://reviews.llvm.org/D34023
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306238 91177308-0d34-0410-b5e6-96231b3b80d8
If you dump a pdb to yaml, and then round-trip it back to a pdb,
and run cvdump -l <file> on the new pdb, cvdump will generate
output such as this.
*** LINES
** Module: "d:\src\llvm\test\DebugInfo\PDB\Inputs\empty.obj"
Error: Line number corrupted: invalid file id 0
<Unknown> (MD5), 0001:00000010-0000001A, line/addr pairs = 3
5 00000010 6 00000013 7 00000018
Note the error message about the corrupted line number.
It turns out that the problem is that cvdump cannot find the
/names stream (e.g. the global string table), and the reason it
can't find the /names stream is because it doesn't understand
the NameMap that we serialize which tells pdb consumers which
stream has the string table.
Some experimentation shows that if we add items to the hash
table in a specific order before serializing it, cvdump can read
it. This suggests that either we're using the wrong hash function,
or we're serializing something incorrectly, but it will take some
deeper investigation to figure out how / why. For now, this at
least allows cvdump to read our line information (and incidentally,
produces an identical byte sequence to what Microsoft tools
produce when writing the named stream map).
Differential Revision: https://reviews.llvm.org/D34491
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306233 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch changes getRange to getRangeRef and returns a reference to the ConstantRange object stored inside the DenseMap caches. We then take advantage of that to add new helper methods that can return min/max value of a signed or unsigned ConstantRange using that reference without first copying the ConstantRange.
getRangeRef calls itself recursively and I believe the reference return is fine for those calls.
I've left getSignedRange and getUnsignedRange returning a ConstantRange object so they will make a copy now. This is to ensure safety since the reference will be invalidated if the DenseMap changes.
I'm sure there are still more places that can take advantage of the reference and I'll submit future patches as I find them.
Reviewers: sanjoy, davide
Reviewed By: sanjoy
Subscribers: zzheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D32978
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306229 91177308-0d34-0410-b5e6-96231b3b80d8
When SelectionDAG expands memcpy (or memmove) call into a sequence of load and store instructions, it disregards dereferenceable flag even the source pointer is known to be dereferenceable.
This results in an assertion failure if SelectionDAG commonizes a load instruction generated for memcpy with another load instruction for the source pointer.
This patch makes SelectionDAG to set the dereferenceable flag for the load instructions properly to avoid the assertion failure.
Differential Revision: https://reviews.llvm.org/D34467
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306209 cdac9f57-aa62-4fd3-8940-286f4534e8a0
Summary:
m_CombineOr isn't very efficient. The code using it is also quite verbose.
This patch adds m_Shift and m_BitwiseLogic matchers to make the using code more concise and improve the match efficiency.
Reviewers: spatel, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34593
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306206 cdac9f57-aa62-4fd3-8940-286f4534e8a0
The intention of processFixupValue is not to redefine the semantics of
MCExpr. It is odd enough that a expression lowers to a PCRel MCExpr or
not depending on what it looks like. At least it is a local hack now.
I left a fix for anyone trying to figure out what producers should be
producing a different expression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306200 cdac9f57-aa62-4fd3-8940-286f4534e8a0
Summary:
InstCombine replaces large allocas with small globals consts causing buffer overflows
on valid code, see PR33372.
This fix permits this optimization only if the global is dereference for alloca size.
Fixes PR33372
Reviewers: eugenis, majnemer, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34311
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306194 91177308-0d34-0410-b5e6-96231b3b80d8
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306177 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "[ORC] Remove redundant semicolons from DEFINE_SIMPLE_CONVERSION_FUNCTIONS uses."
Revert "[ORC] Move ORC IR layer interface from addModuleSet to addModule and fix the module type as std::shared_ptr<Module>."
They broke ExecutionEngine/OrcMCJIT/test-global-ctors.ll on linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306176 91177308-0d34-0410-b5e6-96231b3b80d8
After fixing (r306173) a failing test in the lld test suite (r306173),
reland r306095.
Original commit message:
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs register in the aui/daui
instructions for mips32r6 and mips64r6. With this change, the format of
the generated instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306174 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r306157.
It caused some timeouts in clang tests. Perhaps unreachable loops have
far too many phi nodes.
Reverting and investigating.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306162 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, the implementation of delete dead loops has a special case
when the loop being deleted is never executed. This special case
(updating of exit block's incoming values for phis) can be
run as a prepass for non-executable loops before performing
the actual deletion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306157 91177308-0d34-0410-b5e6-96231b3b80d8
This patch dumps the raw bytes of the pdb name map which contains
the mapping of stream name to stream index for the string table
and other reserved streams.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306148 91177308-0d34-0410-b5e6-96231b3b80d8
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.
A few examples:
add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS.
cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed.
add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses.
cbz w8, .LBB1_2 -> b.eq .LBB1_2
sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses.
tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2
In looking at all current sub-target machine descriptions, this transformation
appears to be either positive or neutral.
Differential Revision: https://reviews.llvm.org/D34220.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306144 91177308-0d34-0410-b5e6-96231b3b80d8
Commit r306010 adjusted the condition as follows:
- if (Is64Bit) {
+ if (!STI.isTargetWin32()) {
The intent was to preserve the behavior on all Windows platforms
but extend the behavior on 64-bit Windows platforms to every
other one. (Before r306010, emitStackProbeCall only ever executed
when emitting code for Windows triples.)
Unfortunately,
if (Is64Bit && STI.isOSWindows())
is not the same as
if (!STI.isTargetWin32())
because of the way isTargetWin32() is defined:
bool isTargetWin32() const {
return !In64BitMode && (isTargetCygMing() ||
isTargetKnownWindowsMSVC());
}
In practice this broke the JIT tests on 32-bit Windows, which did not
satisfy the new condition:
LLVM :: ExecutionEngine/MCJIT/2003-01-15-AlignmentTest.ll
LLVM :: ExecutionEngine/MCJIT/2003-08-15-AllocaAssertion.ll
LLVM :: ExecutionEngine/MCJIT/2003-08-23-RegisterAllocatePhysReg.ll
LLVM :: ExecutionEngine/MCJIT/test-loadstore.ll
LLVM :: ExecutionEngine/OrcMCJIT/2003-01-15-AlignmentTest.ll
LLVM :: ExecutionEngine/OrcMCJIT/2003-08-15-AllocaAssertion.ll
LLVM :: ExecutionEngine/OrcMCJIT/2003-08-23-RegisterAllocatePhysReg.ll
LLVM :: ExecutionEngine/OrcMCJIT/test-loadstore.ll
because %esp was not updated correctly. The failures are only visible
on a MSVC 2017 Debug build, for which we do not have bots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306142 91177308-0d34-0410-b5e6-96231b3b80d8
The goal here is to make it possible to display absolute
file offsets when dumping byets from an MSF. The problem is
that when dumping bytes from an MSF, often the bytes will
cross a block boundary and encounter a discontinuity. We
can't use the normal formatBinary() function for this because
this would just treat the sequence as entirely ascending, and
not account out-of-order blocks.
This patch adds a formatMsfData() function to our printer, and
then uses this function to improve the output of the -stream-data
command line option for dumping bytes from a particular stream.
Test coverage is also expanded to make sure to include all possible
scenarios of offsets, sizes, and crossing block boundaries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306141 91177308-0d34-0410-b5e6-96231b3b80d8
It causes an extra pass of the machine verifier to be added to the pass
manager, and causes test/CodeGen/Generic/llc-start-stop.ll to fail.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306140 91177308-0d34-0410-b5e6-96231b3b80d8
This is useful when an upper limit on the cache size needs to be
controlled independently of the amount of the amount of free space.
One use case is a machine with a large number of cache directories
(e.g. a buildbot slave hosting a large number of independent build
jobs). By imposing an upper size limit on each cache directory,
users can more easily estimate the server's capacity.
Differential Revision: https://reviews.llvm.org/D34547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306126 91177308-0d34-0410-b5e6-96231b3b80d8
This is essentially just a BinaryStreamRef packaged with an
offset and the logic for reading one is no different than the
logic for reading a BinaryStreamRef, except that we save the
current offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306122 91177308-0d34-0410-b5e6-96231b3b80d8
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306120 91177308-0d34-0410-b5e6-96231b3b80d8
G_SEQUENCE is going away soon so as a first step the MachineIRBuilder needs to
be taught how to emulate it with alternatives. We use G_MERGE_VALUES where
possible, and a sequence of G_INSERTs if not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306119 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX , we also use a valid upperbound cost when overflow occurs in Cost.
Reviewers: hans, echristo, dmgreen
Reviewed By: dmgreen
Subscribers: mcrosier, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D34436
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306118 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad
introduced by r196905. Nothing in the semantics of the "volatile"
keyword or the definition of the z/Architecture actually requires
that volatile loads are preceded by a serialization operation, and
no other compiler on the platform actually implements this.
Since we've now seen a use case where this additional serialization
causes noticable performance degradation, this patch removes it.
The patch still leaves in the serialization before atomic loads,
which is now implemented directly in lowerATOMIC_LOAD. (This also
seems overkill, but that can be addressed separately.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306117 91177308-0d34-0410-b5e6-96231b3b80d8
The isBarrier/isTerminator flags have been removed from the SystemZ trap
instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just
an issue at -O0 and did not affect code output on benchmarks.
(Like Eli pointed out: "targets are split over whether they consider their
"trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and
SystemZ do. We should probably try to be consistent here.". This is still the
case, although SystemZ has switched sides).
SystemZ now returns true in isMachineVerifierClean() :-)
These Generic tests have been modified so that they can be run with or without
EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and
CodeGen/Generic/print-machineinstrs.ll
Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman
https://bugs.llvm.org/show_bug.cgi?id=33047https://reviews.llvm.org/D34143
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306106 91177308-0d34-0410-b5e6-96231b3b80d8
The single exit block allowed in runtime unrolling is guaranteed to be
the Latch's successor, so rename it as LatchExitBlock.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306105 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Many languages have a three way comparison idiom where comparing two values
produces not a boolean, but a tri-state value. Typical values (e.g. as used in
the lcmp/fcmp bytecodes from Java) are -1 for less than, 0 for equality, and +1
for greater than.
We actually do a great job already of converting three way comparisons into
binary comparisons when the result produced has one a single use. Unfortunately,
such values can have more than one use, and in that case, our existing
optimizations break down.
The patch adds a peephole which converts a three-way compare + test idiom into a
binary comparison on the original inputs. It focused on replacing the test on
the result of the three way compare and does nothing about removing the three
way compare itself. That's left to other optimizations (which do actually kick
in commonly.)
We currently recognize one idiom on signed integer compare. In the future, we
plan to recognize and simplify other comparison idioms on
other signed/unsigned datatypes such as floats, vectors etc.
This is a resurrection of Philip Reames' original patch:
https://reviews.llvm.org/D19452
Reviewers: majnemer, apilipenko, reames, sanjoy, mkazantsev
Reviewed by: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34278
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306100 91177308-0d34-0410-b5e6-96231b3b80d8
ELF/mips-plt-r6.s in lld-test is failing. Reverting the change.
Original commit message:
[mips] Fix register positions in the aui/daui instructions
Swapped the position of the rt and rs register in the aut/daui
instructions for mips32r6 and mips64r6. With this change, the format of
the generated instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306099 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The function matches the interface of llvm::to_integer, but as we are
calling out to a C library function, I let it take a Twine argument, so
we can avoid a string copy at least in some cases.
I add a test and replace a couple of existing uses of strtod with this
function.
Reviewers: zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34518
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306096 91177308-0d34-0410-b5e6-96231b3b80d8
Swapped the position of the rt and rs register in the aut/daui instructions
for mips32r6 and mips64r6. With this change, the format of the generated
instructions complies with specifications and GCC.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D33988
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306095 91177308-0d34-0410-b5e6-96231b3b80d8
Before this change, it was always the first element of a vector that got splatted since the lower 6 bits of vshf.d $wd were always zero for little endian.
Additionally, masking has been performed for vshf via which splat.d is created.
Vshf has a property where if its first operand's elements have either bit 6 or 7 set, destination element is set to zero.
Initially masked with 63 to avoid this property, which would result in generation of and.v + vshf.d in all cases.
Masking with one results in generating a single splati.d instruction when possible.
Differential Revision: https://reviews.llvm.org/D32216
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306090 91177308-0d34-0410-b5e6-96231b3b80d8
Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of compare if the compare is fed by a livein of a basic block. This can be used to to prove the condition can't be met for some predecessor and the jump from that predecessor can be moved to the false path of the condition.
But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare.
This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it.
Differential Revision: https://reviews.llvm.org/D33262
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306085 91177308-0d34-0410-b5e6-96231b3b80d8
X86_64 COFF only has support for 32 bit pcrel relocations. Produce an
error on all others.
Note that gnu as has extended the relocation values to support
this. It is not clear if we should support the gnu extension.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306082 91177308-0d34-0410-b5e6-96231b3b80d8
I want to use the same logic as LoopSimplify to form dedicated exits in
another pass (SimpleLoopUnswitch) so I wanted to factor it out here.
I also noticed that there is a pretty significantly more efficient way
to implement this than the way the code in LoopSimplify worked. We don't
need to actually retain the set of unique exit blocks, we can just
rewrite them as we find them and use only a set to deduplicate.
This did require changing one part of LoopSimplify to not re-use the
unique set of exits, but it only used it to check that there was
a single unique exit. That part of the code is about to walk the exiting
blocks anyways, so it seemed better to rewrite it to use those exiting
blocks to compute this property on-demand.
I also had to ditch a statistic, but it doesn't seem terribly valuable.
Differential Revision: https://reviews.llvm.org/D34049
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306081 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: LVI can reason about an AND of icmps on the true dest of a branch. I believe we can do similar for the false dest of ORs. This allows us to get the same answer for the demorganed versions of some of the AND test cases as you can see.
Reviewers: anna, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34431
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306076 91177308-0d34-0410-b5e6-96231b3b80d8
This is very similar to the transform in:
https://reviews.llvm.org/rL306040
...but in this case, we use cmp X, 1 to set the carry bit as needed.
Again, we can show that all of these are logically equivalent (although
InstCombine currently canonicalizes to a form not seen here), and if
we believe IACA, then this is the smallest/fastest code. Eg, with SNB:
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1 | 1.0 | | | | | | | cmp edi, 0x1
| 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax
The larger motivation is to clean up all select-of-constants combining/lowering
because we're missing some common cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306072 91177308-0d34-0410-b5e6-96231b3b80d8
For whatever reason, when processing
.globl foo
foo:
.data
bar:
.long foo-bar
llvm-mc creates a relocation with the section:
0x0 IMAGE_REL_I386_REL32 .text
This is different than when the relocation is relative from the
beginning. For example, a file with
call foo
produces
0x0 IMAGE_REL_I386_REL32 foo
I would like to refactor the logic for converting "foo - ." into a
relative relocation so that it is shared with ELF. This is the first
step and just changes the coff implementation to match what ELF (and
COFF in the case of calls) does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306063 91177308-0d34-0410-b5e6-96231b3b80d8
The feeder instruction will be moved to right before the compare, so
the updating code should not be looking for kills past the compare.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306059 91177308-0d34-0410-b5e6-96231b3b80d8
move the ObjectCache from the IRCompileLayer to SimpleCompiler.
This is the first in a series of patches aimed at cleaning up and improving the
robustness and performance of the ORC APIs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306058 91177308-0d34-0410-b5e6-96231b3b80d8
There's nothing incorrect about emitting such relocations against
symbols defined in other objects. The code in EmitCOFFSec* was missing
the visitUsedExpr part of MCStreamer::EmitValueImpl, so these symbols
were not being registered with the object file assembler.
This will be used to make reduced test cases for LLD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306057 91177308-0d34-0410-b5e6-96231b3b80d8
It looks like that when this code was written recordRelocation could
be called with A-B where A and B are in the same section. The
expression evaluation logic these days makes sure those are folded, so
some of this code was dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306053 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently, we incorrectly update exit blocks of loops when there are multiple
edges from a single exiting block to the exit block. This can happen when we
have switches as the terminator of the exiting blocks.
The fix here is to correctly update the phi nodes in the exit block, and remove
all incoming values *except* for one which is from the preheader.
Note: Currently, this error can manifest only while deleting non-executed loops. However, it
is possible to trigger this error in invariant loops, once we enhance the logic
around the exit conditions for the loop check.
Reviewers: chandlerc, dberlin, sanjoy, efriedma
Reviewed by: efriedma
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34516
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306048 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
These intrinsics aren't used by clang and haven't been for a while.
There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today.
Reviewers: spatel, RKSimon, zvi, igorb
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34389
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306047 91177308-0d34-0410-b5e6-96231b3b80d8
This matches the checks done at the beginning of isKnownNonEqual that this code is partially emulating.
Without this we can get assertion failures due to the bit widths of the KnownBits not matching.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306044 91177308-0d34-0410-b5e6-96231b3b80d8