The issue is not if the value is pcrel. It is whether we have a
relocation or not.
If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307730 91177308-0d34-0410-b5e6-96231b3b80d8
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307729 91177308-0d34-0410-b5e6-96231b3b80d8
I encountered these when linking LLD, which uses atls.lib. Those objects
appear to use these uncommon symbol records:
0x115E S_HEAPALLOCSITE
0x113D S_ENVBLOCK
0x1113 S_GTHREAD32
0x1153 S_FILESTATIC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307725 91177308-0d34-0410-b5e6-96231b3b80d8
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307722 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the .module and .set directives for the MT ASE,
notably that .module sets the relevant flags in .MIPS.abiflags and .set
doesn't.
Reviewers: slthakur, atanasyan
Differential Revision: https://reviews.llvm.org/D35249
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307716 91177308-0d34-0410-b5e6-96231b3b80d8
For ELF, a movw+movt pair is handled as two separate relocations.
If an offset should be applied to the symbol address, this offset is
stored as an immediate in the instruction (as opposed to stored as an
offset in the relocation itself).
Even though the actual value stored in the movt immediate after linking
is the top half of the value, we need to store the unshifted offset
prior to linking. When the relocation is made during linking, the offset
gets added to the target symbol value, and the upper half of the value
is stored in the instruction.
This makes sure that movw+movt with offset symbols get properly
handled, in case the offset addition in the lower half should be
carried over to the upper half.
This makes the output from the additions to the test case match
the output from GNU binutils.
For COFF and MachO, the movw/movt relocations are handled as a pair,
and the overflow from the lower half gets carried over to the movt,
so they should keep the shifted offset just as before.
Differential Revision: https://reviews.llvm.org/D35242
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307713 91177308-0d34-0410-b5e6-96231b3b80d8
This is fine as nothing in the code relies on leader and memory
leader being the same for a given congruency class. Ack'ed by
Dan.
Fixes PR33720.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307699 91177308-0d34-0410-b5e6-96231b3b80d8
The warning is reproducible with GCC 4.8. Thanks to David Blaikie for
the suggested fix.
The reported warning was
```
/usr/local/google/home/echristo/sources/llvm/lib/Fuzzer/FuzzerExtFunctions.def:29:10: warning: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Wpedantic]
EXT_FUNC(__lsan_enable, void, (), false);
^
/usr/local/google/home/echristo/sources/llvm/lib/Fuzzer/FuzzerExtFunctionsWeak.cpp:44:24: note: in definition of macro ‘EXT_FUNC’
CheckFnPtr((void *)::NAME, #NAME, WARN);
^
```
Differential Revision: https://reviews.llvm.org/D35243
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307686 91177308-0d34-0410-b5e6-96231b3b80d8
The loop structure for the outer loop does not contain the epilog
preheader when we try to unroll inner loop with multiple exits and
epilog code is generated. For now, we just bail out in such cases.
Added a test case that shows the problem. Without this bailout, we would
trip on assert saying LCSSA form is incorrect for outer loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307676 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Patch by Klaus Kretzschmar
We would like to introduce a new type of llvm error handler for handling
bad alloc fault situations. LLVM already provides a fatal error handler
for serious non-recoverable error situations which by default writes
some error information to stderr and calls exit(1) at the end (functions
are marked as 'noreturn').
For long running processes (e.g. a server application), exiting the
process is not an acceptable option, especially not when the system is
in a temporary resource bottleneck with a good chance to recover from
this fault situation. In such a situation you would rather throw an
exception to stop the current compilation and try to overcome the
resource bottleneck. The user should be aware of the problem of throwing
an exception in bad alloc situations, e.g. you must not do any
allocations in the unwind chain. This is especially true when adding
exceptions in existing unfamiliar code (as already stated in the comment
of the current fatal error handler)
So the new handler can also be used to distinguish from general fatal
error situations where recovering is no option. It should be used in
cases where a clean unwind after the allocation is guaranteed.
This patch contains:
- A report_bad_alloc function which calls a user defined bad alloc
error handler. If no user handler is registered the
report_fatal_error function is called. This function is not marked as
'noreturn'.
- A install/restore_bad_alloc_error_handler to install/restore the bad
alloc handler.
- An example (in Mutex.cpp) where the report_bad_alloc function is
called in case of a malloc returns a nullptr.
If this patch gets accepted we would create similar patches to fix
corresponding malloc/calloc usages in the llvm code.
Reviewers: chandlerc, greened, baldrick, rnk
Reviewed By: rnk
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D34753
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307673 91177308-0d34-0410-b5e6-96231b3b80d8
1. The available program storage region of the red zone to compilers is 288
bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
Differential Revision: https://reviews.llvm.org/D34337
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307672 91177308-0d34-0410-b5e6-96231b3b80d8
This change allows the pc to be used as a destination register for the
pseudo instruction LDR pc,=expression . The pseudo instruction must not be
transformed into a MOV, but it can use the Thumb2 LDR (literal) instruction
to a constant pool entry. See (A7.7.43 from ARMv7M ARM ARM).
Differential Revision: https://reviews.llvm.org/D34751
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307640 91177308-0d34-0410-b5e6-96231b3b80d8
We used to forget to erase the original instruction when replacing a
G_FCMP true/false. Fix this bug and make sure the tests check for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307639 91177308-0d34-0410-b5e6-96231b3b80d8
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.
The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307634 91177308-0d34-0410-b5e6-96231b3b80d8
This is a second attempt to land this patch.
The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.
We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.
Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one
So if C is not a predecessor of H then we introduce extra branch.
This change actually prohibits rotation of the loop if both true
Best Exit has next element in chain as successor.
Last element in chain is not a predecessor of first element of chain.
Reviewers: iteratee, xur, sammccall, chandlerc
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307631 91177308-0d34-0410-b5e6-96231b3b80d8
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.
However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.
The added test case shows an example.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307628 91177308-0d34-0410-b5e6-96231b3b80d8
querying for analysis results on a function declaration rather than
a definition.
The only reason this worked previously is by chance -- because the way
we got alias analysis results with the legacy PM, we happened to not
compute a dominator tree and so we happened to not hit an assert even
though it didn't make any real sense. Now we bail out before trying to
compute alias analysis so that we don't hit these asserts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307625 91177308-0d34-0410-b5e6-96231b3b80d8
In the POWER9 instruction scheduler, SchedWriteRes for the simple integer instructions are misconfigured to use that of (costly) DFU instructions.
This results in surprisingly long instruction latency estimation and causes misbehavior in some optimizers such as if-conversion.
Differential Revision: https://reviews.llvm.org/D34869
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307624 91177308-0d34-0410-b5e6-96231b3b80d8
This patch reduces compilation time by avoiding redundant analysis while selecting instructions to create an immediate.
If the instruction count required to create the input number without rotate is 2, we do not need further analysis to find a shorter instruction sequence with rotate; rotate + load constant cannot be done by 1 instruction (i.e. getInt64CountDirectnever return 0).
This patch should not change functionality.
Differential Revision: https://reviews.llvm.org/D34986
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307623 91177308-0d34-0410-b5e6-96231b3b80d8
This is part of the continuing effort to increase parity between
LLD and MSVC PDBs. link still doesn't like our PDBs, so the most
obvious thing to check was whether adding an empty publics stream
would get it to do something else. It still fails in the same way
but at least this removes one more variable from the equation.
The next logical step would be to try creating an empty globals
stream.
Differential Revision: https://reviews.llvm.org/D35224
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307598 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
As metioned in https://reviews.llvm.org/D34576, checkings in
`collectConstantCandidates` can be replaced by using
`llvm::canReplaceOperandWithVariable`.
The only special case is that `collectConstantCandidates` return false for
all `IntrinsicInst` but it is safe for us to collect constant candidates from
`IntrinsicInst`.
Reviewers: pirama, efriedma, srhines
Reviewed By: efriedma
Subscribers: llvm-commits, javed.absar
Differential Revision: https://reviews.llvm.org/D34921
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307587 91177308-0d34-0410-b5e6-96231b3b80d8
Immediates can be folded as long as the immediate is a vreg.
Also undo commuting instructions if it didn't fold an immediate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307575 91177308-0d34-0410-b5e6-96231b3b80d8
An instruction that has an immediate operand can't reach
this point. This is only called for a freshly shrunk instruction,
which prevously couldn't have had a literal constant operand.
This was also not conservative enough since it woudl also have
had to filter other constant-like inputs like frame indexes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307574 91177308-0d34-0410-b5e6-96231b3b80d8
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307546 91177308-0d34-0410-b5e6-96231b3b80d8
When unrolling under multiple exits which is under off-by-default option,
the assert that checks for VMap entry in loop exit values is too strong.
(assert if VMap entry did not exist, the value should be a
constant). However, values derived from
constants or from values outside loop, does not have a VMap entry too.
Removed the assert and added a testcase showcasing the property for
non-constant values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307542 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds a callback registration API to the PassBuilder,
enabling registering out-of-tree passes with it.
Through the Callback API, callers may register callbacks with the
various stages at which passes are added into pass managers, including
parsing of a pass pipeline as well as at extension points within the
default -O pipelines.
Registering utilities like `require<>` and `invalidate<>` needs to be
handled manually by the caller, but a helper is provided.
Additionally, adding passes at pipeline extension points is exposed
through the opt tool. This patch adds a `-passes-ep-X` commandline
option for every extension point X, which opt parses into pipelines
inserted into that extension point.
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: lksbhm, grosser, davide, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307532 91177308-0d34-0410-b5e6-96231b3b80d8
Variable was called 'Name' and contained text
name of relocation type. Problem was that
outside of this error handling scope we already
have different 'Name' variable that contains
section name.
Change helps to avoid confusion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307530 91177308-0d34-0410-b5e6-96231b3b80d8
The SandyBridge architects have provided us with a more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information.
Please note that the patch extensively affects the X86 MC instr scheduling for SNB.
Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.
The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOps from which the instruction consists of
•all ports used by the instruction's' uOPs
For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5:
def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];
}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;
Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script.
Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb
Differential Revision: https://reviews.llvm.org/D35019#inline-304691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307529 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16, i8 to i16 as legal.
Support G_ZEXT i1 to i8/i16 instruction selection ( C++ code).
This patch requred to support G_LOAD/G_STORE i1.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D35177
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307526 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This solves PR33641.
When removing a dead argument we must also handle possibly existing calls
to llvm.dbg.value that use the removed argument. Now we change the use
of the otherwise dead argument to an undef for some other pass to cleanup
later.
If the calls are left untouched, they will later on cause errors:
"function-local metadata used in wrong function"
since the ArgumentPromotion rewrites the code by creating a new function
with the wanted signature, but the metadata is not recreated so the new
function may then erroneously use metadata from the old function.
Reviewers: mstorsjo, rnk, arsenm
Reviewed By: rnk
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34874
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307521 91177308-0d34-0410-b5e6-96231b3b80d8
These asserts could only occur if we fail to properly detect the compiler, but an assert is not a good way to do that because it doesn't work in release builds.
I wonder if we could use #error?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307520 91177308-0d34-0410-b5e6-96231b3b80d8
Reduces llvm-profdata memory usage on a large profile from 7.8GB to 5.1GB.
The ProfData API now supports reporting all the errors/warnings rather
than only the first, though llvm-profdata ignores everything after the
first for now to preserve existing behavior. (if there's a desire for
other behavior, happy to implement that - but might be as well left for
a separate patch)
Reviewers: davidxl
Differential Revision: https://reviews.llvm.org/D35149
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307516 91177308-0d34-0410-b5e6-96231b3b80d8
WidenVSELECTAndMask can fold (and it folds in this case) so we
get a BUILD_VECTOR of constants as mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end. but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.
Fixes PR33715.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307508 91177308-0d34-0410-b5e6-96231b3b80d8
This change fixes a bug in SelectionDAGBuilder::visitInsertValue and SelectionDAGBuilder::visitExtractValue where constant expressions (InsertValueConstantExpr and ExtractValueConstantExpr) would be treated as non-constant instructions (InsertValueInst and ExtractValueInst). This bug resulted in an incorrect memory access, which manifested as an assertion failure in SDValue::SDValue.
Fixes PR#33094.
Submitted on behalf of @Praetonus (Benoit Vey)
Differential Revision: https://reviews.llvm.org/D34538
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307502 91177308-0d34-0410-b5e6-96231b3b80d8
invalidation of analyses when merging SCCs.
While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.
However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.
With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307498 91177308-0d34-0410-b5e6-96231b3b80d8
dependencies between analyses.
This uncovers even more issues with the proxies and the splitting apart
of SCCs which are fixed in this patch. I discovered this while trying to
add more rigorous testing for a change I'm making to the call graph
update invalidation logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307497 91177308-0d34-0410-b5e6-96231b3b80d8
Users of getHostCPUName should also use getHostCPUFeatures which will take care of making sure avx512 is disabled if the CPU doesn't support it. This is consistent with what we do for other CPUs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307495 91177308-0d34-0410-b5e6-96231b3b80d8
the invalidation propagation logic from an SCC to a Function.
I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.
However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.
Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.
Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307487 91177308-0d34-0410-b5e6-96231b3b80d8
I recently changed m_One and m_AllOnes to use Constant::isOneValue/isAllOnesValue which work on floating point values too. The original implementation looked specifically for ConstantInt scalars and splats. So I'm guessing we are accidentally trying to issue sext/zexts on floating point types now.
Hopefully I figure out how to reproduce the failure from the PR soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307486 91177308-0d34-0410-b5e6-96231b3b80d8
Add breaks - doesn't affect results as both GPR/FPU both check for 32/64 bit sizes. So will still default to GenericOps in the same way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307484 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We don't want to autocomplete flags whose Flags class has `NoDriverOption` when argv[1] is not `-cc1`.
Another idea for this implementation is to make --autocomplete a cc1
option and handle it in clang Frontend, by porting --autocomplete
handler from Driver to Frontend, so that we can handle Driver options
and CC1 options in unified manner.
Differential Revision: https://reviews.llvm.org/D34770
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307479 91177308-0d34-0410-b5e6-96231b3b80d8
The patch was reverted due to a bug. The bug was that if the IV is the 2nd operand of the icmp
instruction, then the "Pred" variable gets swapped and differs from the instruction's predicate.
In this patch we use the original predicate to do the transformation.
Also added a test case that exercises this situation.
Differentian Revision: https://reviews.llvm.org/D35107
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307477 91177308-0d34-0410-b5e6-96231b3b80d8
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. Selecting 0 or -1
needs extra attention to produce the optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)
As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307471 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
(re)definition of _RESTRICT_KYWD rightfully causes a warning message during the Solaris build.
This hack is not needed if build compiler is properly configured (.e.g /usr/bin/gcc) so just remove it.
Reviewers: ro, mgorny, krytarowski, joerg
Reviewed By: joerg
Subscribers: quenelle, llvm-commits
Patch by Fedor Sergeev (Oracle).
Differential Revision: https://reviews.llvm.org/D35054
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307469 91177308-0d34-0410-b5e6-96231b3b80d8
The CPU name is really just used for scheduler and other microarchitectural optimizations. The feature flags should be determined by getHostCPUFeatures which should always be used with getHostCPUName. Trying to alter CPU name strings to control features just isn't practical.
Most of these types of things were removed from Intel CPUs a while ago.
This is part of my plan to bring compiler-rt's cpu_model.c file up to date with the equivalent functionality in libgcc. A lot of the code in that file is copied from Host.cpp and we want to keep them reasonably in sync.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307467 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 147f45ff24456aea59575fa4ac16c8fa554df46a.
Revert "Revert "Revert "Revert "Replace trivial use of external rc.exe by writing our own .res file.""""
This reverts commit 61a90a67ed54a1f0dfeab457b65abffa129569e4.
The patches were intially reverted because they were causing a failure
on CrWinClangLLD. Unfortunately, this was done haphazardly and didn't
compile, so the revert was reverted again quickly to fix this. One that
was done, the revert of the revert was itself reverted. This allowed me
to finally fix the actual bug in r307452. This patch re-enables the
code path that had originally been causing the bug, now that it (should)
be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307460 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The original cvtres.exe sets the high bit when an identifier offset
points to a string. Even though this is not mentioned in the spec, and
in fact does not seem to cause errors with most cases, for some reason
this causes a failure in Chromium where the new resource file is not
verified as a new version. This patch sets this high bit flag, and also
adds a test case to check that the output of our library is always
identical to original cvtres.
Reviewers: zturner, ruiu
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35099
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307452 91177308-0d34-0410-b5e6-96231b3b80d8
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.
This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307451 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D35096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307439 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For SamplePGO + ThinLTO, because profile annotation is done twice at both PrepareForThinLTO pipeline and backend compiler, the following changes are needed at the PrepareForThinLTO phase to ensure the IR is not changed dramatically. Otherwise the profile annotation will be inaccurate in the backend compiler.
* disable hot-caller heuristic
* disable loop unrolling
* disable indirect call promotion
This will unblock the new PM testing for sample PGO (tools/clang/test/CodeGen/pgo-sample-thinlto-summary.c), which will be covered in another cfe patch.
Reviewers: chandlerc, tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, Prazek, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D34895
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307437 91177308-0d34-0410-b5e6-96231b3b80d8
1) Don't write a /src/headerblock stream. This appears to be
written conditionally by MSVC, but it's not clear what the
condition is. For now, just remove it since we dont' know
what it is anyway and the particular pdb we've checked in
for the test doesn't have one.
2) Write a valid timestamp for the PDB file signature. This
leads to non-reproducible builds, but it matches the default
behavior of link, so it should be out default as well. If
we need reproducibility, we should add a separate command
line option for it that is off by default.
3) Write an empty FPO stream. MSVC seems to always write an
FPO stream. This change makes the stream directory match
up, although we still need to make the contents of the FPO
stream match.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307436 91177308-0d34-0410-b5e6-96231b3b80d8
With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic
is in place to support multiple exit/exiting blocks when prolog
remainder is generated.
This patch removed the assert that multiple exit blocks unrolling is only
supported when epilog remainder is generated.
Also, added test runs and checks with PROLOG prefix in
runtime-loop-multiple-exits.ll test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307435 91177308-0d34-0410-b5e6-96231b3b80d8
When reusing a register for a new definition, the fast register allocator
used to insert a kill flag at the previous last use of that register to
inform later passes that this register is free between the redef and the
last use. However, this may be wrong when subregisters are involved.
Indeed, a partially redef would have trigger a kill of the full super
register, potentially wrongly marking all the other subregisters as
free. Given we don't track which lanes are still live, we cannot set the
kill flag in such case.
Note: This bug has been latent for about 7 years (r104056).
llvmg.org/PR33677
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307428 91177308-0d34-0410-b5e6-96231b3b80d8
It referenced a wrong function name, and didn't mention what the
second argument did. This should be slightly more accurate now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307425 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug where unmodifiable strings where passed to posix_spawn.
This is an attempt to unbreak the greendragon libFuzzer bot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307424 91177308-0d34-0410-b5e6-96231b3b80d8
A couple of things were different about our generated PDBs.
1) We were outputting the wrong Version on the PDB Stream.
The version we were setting was newer than what MSVC is setting.
It's not clear what the implications are, but we change LLD
to use PdbImplVC70, as MSVC does.
2) For the optional debug stream indices in the DBI Stream, we
were outputting 0 to mean "the stream is not present". MSVC
outputs uint16_t(-1), which is the "correct" way to specify
that a stream is not present. So we fix that as well.
3) We were setting the PDB Stream signature to 0. This is supposed
to be the result of calling time(nullptr). Although this leads
to non-deterministic builds, a better way to solve that is by
having a command line option explicitly for generating a
reproducible build, and have the default behavior of lld-link
match the default behavior of link.
To test this, I'm making use of the new and improved `pdb diff`
sub command. To make it suitable for writing tests against, I had
to modify the diff subcommand slightly to print less verbose output.
Previously it would always print | <column> | <value1> | <value2> |
which is quite verbose, and the values are fragile. All we really
want to know is "did we produce the same value as link?" So I added
command line options to print a single character representing the
result status (different, identical, equivalent), and another to
hide the value display. Note that just inspecting the diff output
used to write the test, you can see some things that are obviously
wrong. That is just reflective of the fact that this is the state
of affairs today, not that we're asserting that this is "correct".
We can use this as a starting point to discover differences, fix
them, and update the test.
Differential Revision: https://reviews.llvm.org/D35086
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307422 91177308-0d34-0410-b5e6-96231b3b80d8
We're getting to the point that some MS tools (e.g. DIA) can recognize
our PDBs but others (e.g. link.exe) cannot. I think the way forward is
to improve our tooling to help us find differences more easily. For
example, if we can compile the same program with clang-cl and cl and
have a tool tell us all the places where the PDBs differ, this could
tell us what we're doing wrong. It's tricky though, because there are a
lot of "benign" differences in a PDB. For example, if the string table
in one PDB consists of "foo" followed by "bar" and in the other PDB it
consists of "bar" followed by "foo", this is not necessarily a critical
difference, as long as the uses of these strings also refer to the
correct location. On the other hand, if the second PDB doesn't even
contain the string "foo" at all, this is a critical difference.
diff mode has been in llvm-pdbutil for quite a while, but because of the
above challenge along with some others, it's been hard to make it
useful. I think this patch addresses that. It looks for all the same
things, but it now prints the output in tabular format (carefully
formatted and aligned into tables and fields), and it highlights
critical differences in red, non-critical differences in yellow, and
identical fields in green. This makes it easy to spot the places we
differ, and the general concept of outputting arbitrary fields in
tabular format can be extended to provide analysis into many of the
different types of information that show up in a PDB.
Differential Revision: https://reviews.llvm.org/D35039
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307421 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is an addon to the change rl304488 cloning fixes. (Originally rl304226 reverted rl304228 and reapplied rl304488 https://reviews.llvm.org/D33655)
rl304488 works great when DILocalVariables that comes from the inlined function has a 'unique-ed' type, but,
in the case when the variable type is distinct we will create a second DILocalVariable in the scope of the original function that was inlined.
Consider cloning of the following function:
```
define private void @f() !dbg !5 {
%1 = alloca i32, !dbg !11
call void @llvm.dbg.declare(metadata i32* %1, metadata !14, metadata !12), !dbg !18
ret void, !dbg !18
}
!14 = !DILocalVariable(name: "inlined", scope: !15, file: !6, line: 5, type: !17) ; came from an inlined function
!15 = distinct !DISubprogram(name: "inlined", linkageName: "inlined", scope: null, file: !6, line: 8, type: !7, isLocal: true, isDefinition: true, scopeLine: 9, isOptimized: false, unit: !0, variables: !16)
!16 = !{!14}
!17 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "some_struct", size: 32, align: 32)
```
Without this fix, when function 'f' is cloned, we will create another DILocalVariable for "inlined", due to its type being distinct.
```
define private void @f.1() !dbg !23 {
%1 = alloca i32, !dbg !26
call void @llvm.dbg.declare(metadata i32* %1, metadata !28, metadata !12), !dbg !30
ret void, !dbg !30
}
!14 = !DILocalVariable(name: "inlined", scope: !15, file: !6, line: 5, type: !17)
!15 = distinct !DISubprogram(name: "inlined", linkageName: "inlined", scope: null, file: !6, line: 8, type: !7, isLocal: true, isDefinition: true, scopeLine: 9, isOptimized: false, unit: !0, variables: !16)
!16 = !{!14}
!17 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "some_struct", size: 32, align: 32)
;
!28 = !DILocalVariable(name: "inlined", scope: !15, file: !6, line: 5, type: !29) ; OOPS second DILocalVariable
!29 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "some_struct", size: 32, align: 32)
```
Now we have two DILocalVariable for "inlined" within the same scope. This result in assert in AsmPrinter/DwarfDebug.h:131: void llvm::DbgVariable::addMMIEntry(const llvm::DbgVariable &): Assertion `V.Var == Var && "conflicting variable"' failed.
(Full example: See: https://bugs.llvm.org/show_bug.cgi?id=33492)
In this change we prevent duplication of types so that when a metadata for DILocalVariable is cloned it will get uniqued to the same metadate node as an original variable.
Reviewers: loladiro, dblaikie, aprantl, echristo
Reviewed By: loladiro
Subscribers: EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D35106
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307418 91177308-0d34-0410-b5e6-96231b3b80d8
Minor refactoring to use the preexisting loop exit that's already
calculated. We do not need to recompute the loop exit in ConnectProlog.
Apart from avoiding redundant computation, this is required for
supporting multiple loop exits when Prolog remainder loops are generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307417 91177308-0d34-0410-b5e6-96231b3b80d8
r306334 fixed a bug in AArch64 dealing with wide interleaved accesses having
pointer types. The bug also exists in ARM, so this patch copies over the fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307409 91177308-0d34-0410-b5e6-96231b3b80d8
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. DAGCombiner already
has the foundation to allow the transforms, so we just need to fill in the holes
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
Differential Revision: https://reviews.llvm.org/D34652
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307404 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this commit both of the added test cases were passing. However, in the
latter case (test7) we were doing a lot more work to arrive at the same answer
(i.e., we were using isImpliedCondMatchingOperands() to determine the
implication.).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307400 91177308-0d34-0410-b5e6-96231b3b80d8
Today the safepoint IR verifier catches some unrelocated uses of base
pointers that are actually valid.
With this change, we narrow down the set of false positives.
Specifically, the verifier knows about compares to null and compares
between 2 unrelocated pointers.
Reviewed by: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35057
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307392 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change gives a 0.89% speed on execution time, a 0.94% improvement
in benchmark scores and a 0.62% increase in binary size on a Cortex-A57.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A57 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, mcrosier, javed.absar, kristof.beyls, sbaranga
Reviewed By: kristof.beyls
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34954
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307389 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change gives a 0.34% speed on execution time, a 0.61% improvement
in benchmark scores and a 0.57% increase in binary size on a Cortex-A72.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A72 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, kristof.beyls, rengolin, sbaranga, mcrosier, javed.absar
Reviewed By: kristof.beyls
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D34961
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307380 91177308-0d34-0410-b5e6-96231b3b80d8
the system's version of macOS
sys::getProcessTriple returns LLVM_HOST_TRIPLE, whose system version might not
be the actual version of the system on which the compiler running. This commit
ensures that, for macOS, sys::getProcessTriple returns a triple with the
system's macOS version.
rdar://33177551
Differential Revision: https://reviews.llvm.org/D34446
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307372 91177308-0d34-0410-b5e6-96231b3b80d8
We lower to a sequence consisting of:
- MOVi 0 into a register
- VCMPS to do the actual comparison and set the VFP flags
- FMSTAT to move the flags out of the VFP unit
- MOVCCi to either use the "zero register" that we have previously set
with the MOVi, or move 1 into the result register, based on the values
of the flags
As was the case with soft-float, for some predicates (one, ueq) we
actually need two comparisons instead of just one. When that happens, we
generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of
using the result of the first MOVCCi as the "zero register" for the
second one. This is a bit overkill, since one comparison followed by
two non-flag-setting conditional moves should be enough. In any case,
the backend manages to CSE one of the comparisons away so it doesn't
matter much.
Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not
VCMPES. This makes the code a lot simpler, and it also seems correct
since the LLVM Lang Ref defines simple true/false returns if the
operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand
exception, so they won't be slipping through unnoticed.
Implementation-wise, this introduces a template so we can share the same
code that we use for handling integer comparisons, since the only
differences are in the details (exact opcodes to be used etc). Hopefully
this will be easy to extend to s64 G_FCMP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307365 91177308-0d34-0410-b5e6-96231b3b80d8
Based strictly on the name, this seems to have something to do
width edit & continue. The goal of this patch has nothing to do
with supporting edit and continue though. msvc link.exe writes
very basic information into this area even when *not* compiling
with support for E&C, and so the goal here is to bring lld-link
to parity. Since we cannot know what assumptions standard tools
make about the content of PDB files, we need to be as close as
possible.
This ECNames data structure is a standard PDB string hash table.
link.exe puts a single string into this hash table, which is the
full path to the PDB file on disk. It then references this string
from the module descriptor for the compiler generated `* Linker *`
module.
With this patch, lld-link will generate the exact same sequence of
bytes as MSVC link for this subsection for a given object file
input (as reported by `llvm-pdbutil bytes -ec`).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307356 91177308-0d34-0410-b5e6-96231b3b80d8
Contrary to the stepForward()/stepBackward() method accumulate() doesn't
have a direction as defs, uses and clobbers all have the same effect.
Also improve the documentation comment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307351 91177308-0d34-0410-b5e6-96231b3b80d8
This patch updates the ORC layers and utilities to return and propagate
llvm::Errors where appropriate. This is necessary to allow ORC to safely handle
error cases in cross-process and remote JITing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307350 91177308-0d34-0410-b5e6-96231b3b80d8
InferAddressSpaces does not check address space in collectFlatAddressExpressions,
which causes values with non flat address space put into Postorder and causes
assertion in cloneValueWithNewAddressSpace.
This patch fixes assertion in OpenCL 2.0 conformance test generic_address_space
subtest for amdgcn target.
Differential Revision: https://reviews.llvm.org/D34991
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307349 91177308-0d34-0410-b5e6-96231b3b80d8
Model weakly defined symbols as symbols that are both
exports and imported and marked as weak. Local references
to the symbols refer to the import but the linker can
resolve this to the weak export if not strong symbol
is found at link time.
Differential Revision: https://reviews.llvm.org/D35029
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307348 91177308-0d34-0410-b5e6-96231b3b80d8
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307346 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "Copy arguments passed by value into explicit allocas for ASan."
Revert "[asan] Add end-to-end tests for overflows of byval arguments."
Build failure on lldb-x86_64-ubuntu-14.04-buildserver.
Test failure on clang-cmake-aarch64-42vma and sanitizer-x86_64-linux-android.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307345 91177308-0d34-0410-b5e6-96231b3b80d8
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.
Patch by Matt Morehouse.
Differential revision: https://reviews.llvm.org/D34789
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307342 91177308-0d34-0410-b5e6-96231b3b80d8
Added a new Enum to identify if the base pointer is exclusively null or
exlusively some constant or not exclusively any constant.
Converted the base pointer identification method from recursive to
iterative form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307340 91177308-0d34-0410-b5e6-96231b3b80d8
Using profile information to guide consthoisting is generally helpful for
performance, so the patch turns it on by default. No compile time or perf
regression were found using spec2000 and spec2006 on x86. Some significant
improvement (>20%) was seen on internal benchmarks.
Differential Revision: https://reviews.llvm.org/D35063
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307338 91177308-0d34-0410-b5e6-96231b3b80d8
The patch is to adjust the strategy of frequency based consthoisting:
Previously when the candidate block has the same frequency with the existing
blocks containing a const, it will not hoist the const to the candidate block.
For that case, now we change the strategy to hoist the const if only existing
blocks have more than one block member. This is helpful for reducing code size.
Differential Revision: https://reviews.llvm.org/D35084
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307328 91177308-0d34-0410-b5e6-96231b3b80d8
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307326 91177308-0d34-0410-b5e6-96231b3b80d8
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12!12 = !{float 2.500000e+00}
Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307308 91177308-0d34-0410-b5e6-96231b3b80d8
This is the same as r304719 but for ThinLTO.
The substantial difference is that in this case we don't have
whole visibility, just the summary.
In the LTO case, when we got the resolution for the input file we
could just see if the linker told us whether a symbol was linker
redefined (using --wrap or --defsym) and switch the linkage directly
for the GV.
Here, we have the summary. So, we record that the linkage changed
from <whatever it was> to $weakany to prevent IPOs across this symbol
boundaries and actually just switch the linkage at FunctionImport time.
This patch should also fixes the lld bits (as all the scaffolding for
communicating if a symbol is linker redefined should be there & should
be the same), but I'll make sure to add some tests there as well.
Fixes PR33192.
Differential Revision: https://reviews.llvm.org/D35064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307303 91177308-0d34-0410-b5e6-96231b3b80d8
Allows the MachineIRBuilder APIs to directly create registers (based on
LLT or TargetRegisterClass) as well as accept MachineInstrBuilders
and implicitly converts to register(with getOperand(0).getReg()).
Eg usage:
LLT s32 = LLT::scalar(32);
auto C32 = Builder.buildConstant(s32, 32);
auto Tmp = Builder.buildInstr(TargetOpcode::G_SUB, s32, C32,
OtherReg);
auto Tmp2 = Builder.buildInstr(Opcode, DstReg,
Builder.buildConstant(s32, 31)); ....
Only a few methods added for now.
Reviewed by Tim
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307302 91177308-0d34-0410-b5e6-96231b3b80d8
The InstrProfWriter already stores the name and hash of the record in
the nested maps it uses for lookup while merging - this data is
duplicated in the value within the maps.
Refactor the InstrProfRecord to use a nested struct for the counters
themselves so that InstrProfWriter can use this nested struct alone
without the name or hash duplicated there.
This work is incomplete, but enough to demonstrate the value (around a
50% decrease in memory usage for a large test case (10GB -> 5GB)).
Though most of that decrease is probably from removing the
SoftInstrProfError as well, but I haven't implemented a replacement for
it yet. (it needs to go with the counters, because the operations on the
counters - merging, etc, are where the failures are - unlike the
name/hash which are totally unused by those counter-related operations
and thus easy to split out)
Ongoing discussion about removing SoftInstrProfError as a field of the
InstrProfRecord is happening on the thread that added it - including
the possibility of moving back towards an earlier version of that
proposed patch that passed SoftInstrProfError through the various APIs,
rather than as a member of InstrProfRecord.
Reviewers: davidxl
Differential Revision: https://reviews.llvm.org/D34838
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307298 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
`Instruction::Switch`: only first operand can be set to a non-constant value.
`Instruction::InsertValue` both the first and the second operand can be set to a non-constant value.
`Instruction::Alloca` return true for non-static allocation.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: srhines, pirama, llvm-commits
Differential Revision: https://reviews.llvm.org/D34905
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307294 91177308-0d34-0410-b5e6-96231b3b80d8
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307292 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, we do not support multiple exiting blocks to the
latch exit block. However, this bailout wasn't triggered when we had a
unique exit block (which is the latch exit), with multiple exiting
blocks to that unique exit.
Moved the bailout so that it's triggered in both cases and added
testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307291 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: In this code we got to Dom by following the predecessor link of BB. So it stands to reason that BB should also show up as a successor of Dom's terminator right? There isn't a way to have the CFG connect in only one direction is there?
Reviewers: jmolloy, davide, mcrosier
Reviewed By: mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D35025
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307276 91177308-0d34-0410-b5e6-96231b3b80d8
Bswap isn't a simple operation so we need to make sure we are really removing a call to it before doing these simplifications.
For the case when both LHS and RHS are bswaps I've allowed it to be moved if either LHS or RHS has a single use since that at least allows us to move it later where it might find another bswap to combine with and it decreases the use count on the other side so maybe the other user can be optimized.
Differential Revision: https://reviews.llvm.org/D34974
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307273 91177308-0d34-0410-b5e6-96231b3b80d8
When the formulae search space is huge, LSR uses a series of heuristic to keep
pruning the search space until the number of possible solutions are within
certain limit.
The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs,
which picks the register which is used by the most LSRUses and deletes the other
formulae which don't use the register. This is a effective way to prune the search
space, but quite often not a good way to keep the best solution. We saw cases before
that the heuristic pruned the best formula candidate out of search space.
To relieve the problem, we introduce a new heuristic called
NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to
reduce the search space while keeping the best formula, we want to keep as many
formulae with different Scale and ScaledReg as possible. That is because the central
idea of LSR is to choose a group of loop induction variables and use those induction
variables to represent LSRUses. An induction variable candidate is often represented
by the Scale and ScaledReg in a formula. If we have more formulae with different
ScaledReg and Scale to choose, we have better opportunity to find the best solution.
That is why we believe pruning search space by only keeping the best formula with the
same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use
two criteria to choose the best formula with the same Scale and ScaledReg. The first
criteria is to select the formula using less non shared registers, and the second
criteria is to select the formula with less cost got from RateFormula. The patch
implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the
last resort.
Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly
testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help
on the two improved internal cases we saw.
Differential Revision: https://reviews.llvm.org/D34583
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307269 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Added MachineVerifier code to check register ties more thoroughly, especially so that physical registers that are tied are the same. This may help e.g. when creating MIR files.
Original patch by Jesper Antonsson
Reviewers: stoklund, sanjoy, qcolombet
Reviewed By: qcolombet
Subscribers: qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D34394
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307259 91177308-0d34-0410-b5e6-96231b3b80d8
It appears that the problem is still there. Needs more analysis to understand why
SaturatedMultiply test fails.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307249 91177308-0d34-0410-b5e6-96231b3b80d8