Summary: LowerSwitch crashed with the attached test case after deleting the default block. This happened because the current implementation of deleting dead blocks is wrong. After the default block being deleted, it contains no instruction or terminator, and it should no be traversed anymore. However, since the iterator is advanced before processSwitchInst() function is executed, the block advanced to could be deleted inside processSwitchInst(). The deleted block would then be visited next and crash dyn_cast<SwitchInst>(Cur->getTerminator()) because Cur->getTerminator() returns a nullptr. This patch fixes this problem by recording dead default blocks into a list, and delete them after all processSwitchInst() has been done. It still possible to visit dead default blocks and waste time process them. But it is a compile time issue, and I plan to have another patch to add support to skip dead blocks.
Reviewers: kariddi, resistor, hans, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11852
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244642 91177308-0d34-0410-b5e6-96231b3b80d8
Other objects can never reference the MergedGlobals symbol so external linkage
is never needed. Using private instead of internal linkage means the object is
more similar to what it looks like when global merging is not enabled, with
the only difference being that the merged variables are addressed indirectly
relative to the start of the section they are in.
Also add aliases for merged variables with internal linkage, as this also makes
the object be more like what it is when they are not merged.
Differential Revision: http://reviews.llvm.org/D11942
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244615 91177308-0d34-0410-b5e6-96231b3b80d8
I incorrectly wrote CHECK-NEXT with followin with ':', the check was
ignored by FileCheck.
The non-inbound GEP is folded here because the DataLayout is no longer
optional, the fold was originally guarded with a comment that said:
We need TD information to know the pointer size unless this is inbounds.
Now we always have "TD information" and perform the fold.
Thanks Jonathan Roelofs for noticing.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244613 91177308-0d34-0410-b5e6-96231b3b80d8
First step in preventing immediates that occur more than once within a single
basic block from being pulled into their users, in order to prevent unnecessary
large instruction encoding .Currently enabled only when optimizing for size.
Patch by: zia.ansari@intel.com
Differential Revision: http://reviews.llvm.org/D11363
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244601 91177308-0d34-0410-b5e6-96231b3b80d8
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:
- Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
- f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244595 91177308-0d34-0410-b5e6-96231b3b80d8
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.
matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.
C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.
ARM and AArch64 at least have different instructions for these different cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244580 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch remaps the assembly idiom 'move' to 'or' instead of 'daddu' or
'addu'. The use of addu/daddu instead of or as move was highlighted as a
performance issue during the analysis of a recent 64bit design. Originally
move was encoded as 'or' by binutils but was changed for the r10k cpu family
due to their pipeline which had 2 arithmetic units and a single logical unit,
and so could issue multiple (d)addu based moves at the same time but only 1
logical move.
This patch preserves the disassembly behaviour so that disassembling a old style
(d)addu move still appears as move, but assembling move always gives an or
Patch by Simon Dardis.
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11796
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244579 91177308-0d34-0410-b5e6-96231b3b80d8
When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp"
following a call by one or two pops, respectively. We don't try to do it in
general, but only when the stack adjustment immediately follows a call - which
is the most common case.
That allows taking a short-cut when trying to find a free register to pop into,
instead of a full-blown liveness check. If the adjustment immediately follows a
call, then every register the call clobbers but doesn't define should be dead at
that point, and can be used.
Differential Revision: http://reviews.llvm.org/D11749
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244578 91177308-0d34-0410-b5e6-96231b3b80d8
The condition for clearing the folding candidate list was clamped together
with the "uninteresting instruction" condition. This is too conservative,
e.g. we don't need to clear the list when encountering an IMPLICIT_DEF.
Differential Revision: http://reviews.llvm.org/D11591
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244577 91177308-0d34-0410-b5e6-96231b3b80d8
These tests pass with Windows 7 x64 + MSYS2. I'll see if the bots like
them as well and disable the failing ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244572 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: I somehow forgot to add these when I added the basic floating-point opcodes. Also remove ceil/floor/trunc/nearestint for now, and add them only when properly tested.
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11927
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244562 91177308-0d34-0410-b5e6-96231b3b80d8
This patch and a relatec clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwasyPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>’. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244555 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: convertToHexString doesn't represent them correctly at this point in time. This is a follow-up to sunfish's suggestion in D11914.
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11925
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244551 91177308-0d34-0410-b5e6-96231b3b80d8
This commit serializes the UsedPhysRegMask register mask from the machine
register information class. The mask is serialized as an inverted
'calleeSavedRegisters' mask to keep the output minimal.
This commit also allows the MIR parser to infer this mask from the register
mask operands if the machine function doesn't specify it.
Reviewers: Duncan P. N. Exon Smith
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244548 91177308-0d34-0410-b5e6-96231b3b80d8
This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244523 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For now output using C99's hexadecimal floating-point representation.
This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11914
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244520 91177308-0d34-0410-b5e6-96231b3b80d8
The PATCHPOINT instructions have a single optional defined register operand,
but the machine verifier can't verify the optional defined register operands.
This commit makes sure that the machine verifier won't report an error when a
PATCHPOINT instruction doesn't have its optional defined register operand.
This change will allow us to enable the machine verifier for the code
generation tests for the patchpoint intrinsics.
Reviewers: Juergen Ributzka
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244513 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This makes it so that reports symbolized after the fact with
llvm-symbolizer are more similar to the ones we generate at runtime with
in-process dbghelp.
Reviewers: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11785
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244512 91177308-0d34-0410-b5e6-96231b3b80d8
frame setup instruction.
This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.
The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.
Reviewers: Juergen Ributzka
Differential Revision: http://reviews.llvm.org/D11853
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244508 91177308-0d34-0410-b5e6-96231b3b80d8
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.
As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.
I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
| Time per call (ms) | Runtime (ms) | Benchmark |
| 0.000012514 | 6257 | sete.i386 |
| 0.000012810 | 6405 | sete.i386-fast |
| 0.000010456 | 5228 | sete.x86-64 |
| 0.000010496 | 5248 | sete.x86-64-fast |
| 0.000012906 | 6453 | lahf-sahf.i386 |
| 0.000013236 | 6618 | lahf-sahf.i386-fast |
| 0.000010580 | 5290 | lahf-sahf.x86-64 |
| 0.000010304 | 5152 | lahf-sahf.x86-64-fast |
| 0.000028056 | 14028 | pushf-popf.i386 |
| 0.000027160 | 13580 | pushf-popf.i386-fast |
| 0.000023810 | 11905 | pushf-popf.x86-64 |
| 0.000026468 | 13234 | pushf-popf.x86-64-fast |
Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.
Reviewers: rnk, jvoung, t.p.northover
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D6629
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244503 91177308-0d34-0410-b5e6-96231b3b80d8