Reapplying the patch, as it was reverted when first committed because of
an assertion failure when the mrrc2 intrinsic was called in ARM mode. The
failure was happening because the instruction was being built in
ARMISelDAGToDAG.cpp, while the TableGen description for the mrrc2
instruction doesn't allow a predicate.
The ARM architecture manuals do say that mrrc2 in ARM mode can be
predicated with AL in assembly, but this has no effect on the encoding of
the instruction: the top 4 bits will always be 1111, not 1110, which is
the encoding for the condition AL.
Differential Revision: http://reviews.llvm.org/D21408
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272982 91177308-0d34-0410-b5e6-96231b3b80d8
When calculating a square root using Newton-Raphson with two constants,
a naive implementation uses five multiplications (four muls to calculate
the reciprocal square root and another one to calculate the square root
itself).
However, after some reassociation and CSE the same result can be obtained
with only four multiplications. Unfortunately, there's no reliable way to do
such a reassociation in the back-end. So, the patch modifies NR code itself
so that it directly builds optimal code for SQRT and doesn't rely on any
further reassociation.
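A minimal C sketch of the reassociation (illustrative only: the patch
changes the DAG-level NR expansion itself, and 'est' stands for the
hardware's reciprocal-square-root estimate):
```
/* Naive: refine rsqrt (four muls), then multiply by a (a fifth mul). */
float sqrt_naive(float a, float est) {
  float r = est * (1.5f - 0.5f * a * est * est); /* muls 1-4 */
  return a * r;                                  /* mul 5    */
}

/* Reassociated: fold the final multiply by a into the iteration,
   reusing h = a*est, so only four muls are needed in total. */
float sqrt_opt(float a, float est) {
  float h = a * est;                  /* mul 1: h ~= sqrt(a) */
  return h * (1.5f - 0.5f * h * est); /* muls 2-4            */
}
```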
Patch by Nikolai Bozhenov!
Differential Revision: http://reviews.llvm.org/D21127
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272920 91177308-0d34-0410-b5e6-96231b3b80d8
The R_ARM_PLT32 relocation is deprecated and is not produced by MC.
This means that the code being deleted is dead from the .o point of
view and was making the .s more confusing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272909 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
[ls][bh] and [ls][bh]u cannot use sp-relative addresses and must therefore
lower frameindex nodes such that there is a copy to a CPU16Regs register. This
is now done consistently using a separate addressing mode that does not
permit frameindex nodes.
As part of this I've had to remove an optimization that reduced the
number of instructions needed to work around the lack of sp-relative
addresses on [ls][bh] and [ls][bh]u. This optimization used one of the
eight CPU16Regs registers as a copy of the stack pointer and its
implementation was the root cause of many of the register vs register
class mismatches.
lw/sw can use sp-relative addresses, but we ought to ensure that we use
the correct version of lw/sw internally for things like IAS. This is not
currently the case and this change does not fix it. However, it does
clean things up sufficiently well to fix the machine verifier failures.
Also removed irrelevant functions from stchar.ll.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21062
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272882 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The Mips implementation only covers the feature bits described by the ELF
e_flags so far. Mips stores additional feature bits such as MSA in the
.MIPS.abiflags section.
Also fixed a small bug this revealed where microMIPS wouldn't add the
EF_MIPS_MICROMIPS flag when using -filetype=obj.
Reviewers: echristo, rafael
Subscribers: rafael, mehdi_amini, dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21125
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272880 91177308-0d34-0410-b5e6-96231b3b80d8
The backend has been around for years, it's pretty ridiculous that we can't
even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen
can't handle the complex predicates when printing so it's a bunch of nasty C++.
Oh well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272865 91177308-0d34-0410-b5e6-96231b3b80d8
Of course the assembly was right, but because the opcode was MOVZWi it
was encoded as "movz w16, #65535, lsl #32", which is an unallocated
encoding and would go horribly wrong on a CPU.
No idea how this bug survived this long. It seems nobody is using that aspect
of patchpoints.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272831 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to emit native IR in Clang (next commit).
Also, update the intrinsic tests to show that codegen already knows how to handle
the IR that Clang will soon produce.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272806 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.
Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.
An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21326
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272761 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
... when the offset is not statically known.
Prioritize addresses relative to the stack pointer in the stackmap, but
fall back gracefully to other modes of addressing if the offset to the
stack pointer is not a known constant.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm
Differential Revision: http://reviews.llvm.org/D21259
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272756 91177308-0d34-0410-b5e6-96231b3b80d8
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM. LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.
Differential Revision: http://reviews.llvm.org/D21316
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272737 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: With a runtime profile, we have more confidence in the branch
probabilities, so during basic block layout we set a lower
hot-probability threshold so that blocks can be laid out optimally.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20991
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272729 91177308-0d34-0410-b5e6-96231b3b80d8
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272709 91177308-0d34-0410-b5e6-96231b3b80d8
For an <N x i32> mul, pmuludq will be used on targets without SSE41,
which often introduces many extra pack and unpack instructions in the
vectorized loop body because pmuludq generates an <N/2 x i64> result.
However, when the operands of the <N x i32> mul are extended from smaller
values like i8 and i16, the mul may be shrunk to use pmullw +
pmulhw/pmulhuw instead of pmuludq, which generates better code. On
targets with SSE41, pmulld is supported, so no shrinking is needed.
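For illustration, a hypothetical C loop of the affected kind (the
shrinking happens on the vectorized IR when the mul operands are known
to be extended from narrower types):
```
#include <stdint.h>

/* Each i32 product has i16 operands, so on pre-SSE41 targets the
   vectorizer's <N x i32> mul can be shrunk to pmullw + pmulhuw
   instead of going through <N/2 x i64> pmuludq. */
void mul_u16(uint32_t *d, const uint16_t *a, const uint16_t *b, int n) {
  for (int i = 0; i < n; ++i)
    d[i] = (uint32_t)a[i] * (uint32_t)b[i];
}
```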
Differential Revision: http://reviews.llvm.org/D20931
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272694 91177308-0d34-0410-b5e6-96231b3b80d8
Change EmitGlobalVariable to check that the final assembler section is in
BSS before using the .lcomm/.comm directive. This prevents globals from
being put into .bss erroneously when -data-sections is used.
This fixes PR26570.
Reviewers: echristo, rafael
Subscribers: llvm-commits, mehdi_amini
Differential Revision: http://reviews.llvm.org/D21146
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272674 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of always using addu to adjust the stack pointer when the size
is out of the range of an addiu instruction, use subu so that a smaller
constant can be generated.
This can give savings of ~3 instructions whenever a function has a stack
frame whose size is out of range of an addiu instruction.
This change may break some naive stack unwinders.
Partially resolves PR/26291.
Thanks to David Chisnall for reporting the issue.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D21321
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272666 91177308-0d34-0410-b5e6-96231b3b80d8
We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256.
Found by Oliver Stannard and csmith!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272665 91177308-0d34-0410-b5e6-96231b3b80d8
PR27458 highlights that the MIPS backend does not have well-formed
MIR for atomic operations (among other errors).
This patch expands and corrects the LL/SC descriptions and their uses
for MIPS(64).
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D19719
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272655 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The machine verifier reports 'Explicit operand marked as def' when it is
manually specified even though it agrees with the operand info.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21065
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272646 91177308-0d34-0410-b5e6-96231b3b80d8
The exit-on-error flag in the ARM test is necessary in order to avoid an
unreachable in the DAGTypeLegalizer when trying to expand a physical
register.
We can also avoid this situation by introducing a bitcast early on, where the
invalid scalar-to-vector conversion is detected.
We also add a test for PowerPC, which goes through a similar code path in the
SelectionDAGBuilder.
Fixes PR27765.
Differential Revision: http://reviews.llvm.org/D21061
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272644 91177308-0d34-0410-b5e6-96231b3b80d8
In rL272580 I accidentally added a test case to test/CodeGen when
test/Transforms/DeadStoreElimination/ is a better place for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272581 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D21286
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272580 91177308-0d34-0410-b5e6-96231b3b80d8
The need for these intrinsics has been obviated by r272564 which
reimplements their functionality using generic IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272566 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1
This makes one particular compute shader 8x faster.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D21136
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272556 91177308-0d34-0410-b5e6-96231b3b80d8
This enables use of the 'R' and 'T' memory constraints for inline ASM
operands on SystemZ, which allow an index register as well as an
immediate displacement. This patch includes corresponding documentation
and test case updates.
As with the last patch of this kind, I moved the 'm' constraint to the
most general case, which is now 'T' (base + 20-bit signed displacement +
index register).
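A hedged sketch of what this enables (hypothetical inline asm; 'l' is
the SystemZ 32-bit load, whose RX form matches the base + index + 12-bit
displacement that 'R' describes):
```
/* With 'R', the operand may be formed as base + index + 12-bit
   displacement; 'T' additionally allows a 20-bit signed displacement. */
int load_via_R(int *p) {
  int v;
  asm("l %0, %1" : "=r"(v) : "R"(*p));
  return v;
}
```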
Author: colpell
Differential Revision: http://reviews.llvm.org/D21239
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272547 91177308-0d34-0410-b5e6-96231b3b80d8
The MRRC/MRRC2 instructions write to two registers. The intrinsic
definition returns a single uint64_t to represent the write; this is a
compact way of representing a write to two 32-bit registers. The
alternative might have been to return a struct of two uint32_t's, but
this isn't as nice.
Differential Revision:
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272544 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044
By changing the definition of X86ISD::CMPP to use float types, we allow it to be created
and pass legalization for an SSE1-only target where v4i32 is not legal.
The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001
and eventually makes this trigger:
http://reviews.llvm.org/D21190
I.e., after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for the SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.
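As a hedged illustration of that end state (the Clang-side change is
D21190, not this patch):
```
#include <xmmintrin.h>

/* Once Clang emits this as an IR 'fcmp olt' plus sign-extension
   instead of an llvm.x86.sse.cmp.ps call, the float-typed
   X86ISD::CMPP node lets it legalize even on SSE1-only targets. */
__m128 cmplt(__m128 a, __m128 b) {
  return _mm_cmplt_ps(a, b);
}
```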
Differential Revision: http://reviews.llvm.org/D21235
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272511 91177308-0d34-0410-b5e6-96231b3b80d8
A lot of the codegen is pretty awful for these, as they are mostly implemented as generic bit-twiddling ops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272508 91177308-0d34-0410-b5e6-96231b3b80d8
The script now replaces '.LCPI888_8' style asm symbols with the
{{\.LCPI.*}} regex pattern - this helps stop hardcoded symbols in 32-bit
x86 tests changing with every edit of the file.
Refreshed some tests to demonstrate the new check.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272488 91177308-0d34-0410-b5e6-96231b3b80d8
PSHUFB can speed up BITREVERSE of byte vectors by performing a LUT on
the low/high nibbles separately and ORing the results. Wider integer
vector types are already BSWAP'd beforehand, so they also make use of
this approach.
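The scalar idea behind the lowering, sketched in C (PSHUFB performs this
16-entry table lookup on every byte lane in parallel):
```
#include <stdint.h>

/* Reverse of each 4-bit value; PSHUFB uses this as its lookup table. */
static const uint8_t rev_nibble[16] = {
  0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
  0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF
};

uint8_t bitreverse8(uint8_t b) {
  /* LUT on the low and high nibbles separately, then OR the results. */
  return (uint8_t)((rev_nibble[b & 0xF] << 4) | rev_nibble[b >> 4]);
}
```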
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272477 91177308-0d34-0410-b5e6-96231b3b80d8
Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272471 91177308-0d34-0410-b5e6-96231b3b80d8
The vector cases don't change because we already have folds in X86ISelLowering
to look through and remove bitcasts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272427 91177308-0d34-0410-b5e6-96231b3b80d8
Support and generate Compare and Traps like CRT, CIT, etc.
Support Trap as a legal DAG opcode and generate "j .+2" for it by default.
Add support for Conditional Traps and use the If Converter to convert them into
the corresponding compare and trap opcodes.
Differential Revision: http://reviews.llvm.org/D21155
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272419 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21153
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272417 91177308-0d34-0410-b5e6-96231b3b80d8
Adds a MachineFunctionPass that scans the body to find calls, and
updates the register mask with the one saved by the
RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272414 91177308-0d34-0410-b5e6-96231b3b80d8
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts. RegUsageInfoCollector is the
MachineFunction pass that runs post-RA and collects the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector and keeps it available. A future transformation
pass will use this information to update every call site after
instruction selection.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D20769
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272403 91177308-0d34-0410-b5e6-96231b3b80d8
Somehow, the codegen logic for these sequences has gone completely untested
until now (note the 2 compare instructions generated per test).
There's also an *Intel* AVX optimization opportunity exposed in these cases
and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP
predicates were added, so a single comparison should always be sufficient,
and operand commutation should never be necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272397 91177308-0d34-0410-b5e6-96231b3b80d8
The memory operand is new for AVX512 (SSE/AVX2 didn't support it).
Also dropped the 'mask' from the tests (VPSLLDQ/VPSRLDQ don't support
masked operations).
Regenerated the VPALIGNR test now that the shuffle comments work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272383 91177308-0d34-0410-b5e6-96231b3b80d8
The test case is not great, especially because it is still cumbersome to
run the regalloc pass with run-pass. (We miss a bunch of initializers
that need to be properly implemented.)
Related to llvm.org/PR27983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272360 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we could run only one machine pass with the run-pass option.
With this patch, we can now specify several passes with several run-pass
options (or just one option with a comma-separated list of passes) and
llc will build the related pipeline.
This is great for testing the interaction of two passes that are not
necessarily next to each other in the pipeline, or for playing with pass
ordering.
Now, we should be at parity with opt for the flexibility of running
passes.
Note: I also moved the run pass option from CommandFlags.h to llc.cpp
because, really, this is needed only there!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272356 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes a bug with the ds_*permute instructions where, if the
instruction was passed a constant address, the offset operand would get
assigned a register operand instead of an immediate.
Reviewers: scchan, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19994
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272349 91177308-0d34-0410-b5e6-96231b3b80d8
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272345 91177308-0d34-0410-b5e6-96231b3b80d8
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.
There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272344 91177308-0d34-0410-b5e6-96231b3b80d8
Auto-upgrade to generic shuffles, like the sse/avx2 implementations, now that we can lower to VPSLLDQ/VPSRLDQ.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272308 91177308-0d34-0410-b5e6-96231b3b80d8
512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets, so lowerVectorShuffleAsShift had to be adjusted to include the subtarget.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272300 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D21160
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272298 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies commit r271930, r271915, r271923. They hit a bug in
Thumb which is fixed in r272258 now.
The original message:
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272267 91177308-0d34-0410-b5e6-96231b3b80d8
This enables use of the 'S' constraint for inline ASM operands on
SystemZ, which allows for a memory reference with a signed 20-bit
immediate displacement. This patch includes corresponding documentation
and test case updates.
I've changed the 'T' constraint to match the new behavior for 'S', as
'T' also uses a long displacement (though index constraints are still
not implemented). I also changed 'm' to match the behavior for 'S' as
this will allow for a wider range of displacements for 'm', though
correct me if that's not the right decision.
Author: colpell
Differential Revision: http://reviews.llvm.org/D21097
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272266 91177308-0d34-0410-b5e6-96231b3b80d8
This is made possible by removing an assert in llc that assumed
MIRParser::parseLLVMModule would exit on error. MIRParser's documentation states
that it returns null if a parsing error occurs, so there's no reason to assert.
We can instead just fall through to where the check for a module is performed
and exit if it is null.
This commit is part of the clean-up after r269655.
Fixes PR27770
Differential Revision: http://reviews.llvm.org/D20371
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272254 91177308-0d34-0410-b5e6-96231b3b80d8
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead:
  int i(int a) {
    return a & 0xfffffeec;
  }
Used to produce:
  ldr  r1, [CONSTPOOL]
  ands r0, r1
CONSTPOOL: 0xfffffeec
And now produces:
  movs r1, #255
  adds r1, #20    ; Less costly immediate generation
  bics r0, r1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272251 91177308-0d34-0410-b5e6-96231b3b80d8
Without that check it was possible to write test cases where the size
was not specified and we ended up with weird asserts down the road,
because the default value (1) would not make sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272226 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Consider the following diamond CFG:
    A
   / \
  B   C
   \ /
    D
Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20989
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272203 91177308-0d34-0410-b5e6-96231b3b80d8
Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
FPR for 64-bit or 32-bit values.
Add test cases demonstrating how this information is used to coalesce a
computation on a single register bank.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272170 91177308-0d34-0410-b5e6-96231b3b80d8
The MSR instructions can write to the CPSR, but we did not model this
fact, so we could emit them in the middle of IT blocks, changing the
condition flags for later instructions in the block.
The tests use two calls to llvm.write_register.i32 because it is valid
to use these instructions at the end of an IT block, which if-conversion
does do in some cases. With two calls, the first clobbers the flags, so
a branch has to be used to make the second one conditional.
Differential Revision: http://reviews.llvm.org/D21139
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272154 91177308-0d34-0410-b5e6-96231b3b80d8
The MachineMemOperand parser lacked the code to handle %stack.X
references (%fixed-stack.X was working).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272082 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.
The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.
Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20839
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272063 91177308-0d34-0410-b5e6-96231b3b80d8
There are no VEX-encoded versions of the SSE4A instructions, so make sure that AVX targets give the same output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272060 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds support for the MSVC buffer security check implementation.
The buffer security check is turned on with the '/GS' compiler switch.
* https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
* To be added to clang here: http://reviews.llvm.org/D20347
Some overview of buffer security check feature and implementation:
* https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
* http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
* http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
For the following example:
```
int example(int offset, int index) {
char buffer[10];
memset(buffer, 0xCC, index);
return buffer[index];
}
```
The MSVC compiler adds these instructions to perform the stack integrity check:
```
push ebp
mov ebp,esp
sub esp,50h
[1] mov eax,dword ptr [__security_cookie (01068024h)]
[2] xor eax,ebp
[3] mov dword ptr [ebp-4],eax
push ebx
push esi
push edi
mov eax,dword ptr [index]
push eax
push 0CCh
lea ecx,[buffer]
push ecx
call _memset (010610B9h)
add esp,0Ch
mov eax,dword ptr [index]
movsx eax,byte ptr buffer[eax]
pop edi
pop esi
pop ebx
[4] mov ecx,dword ptr [ebp-4]
[5] xor ecx,ebp
[6] call @__security_check_cookie@4 (01061276h)
mov esp,ebp
pop ebp
ret
```
The instrumentation above is:
* [1] is loading the global security canary,
* [3] is storing the locally computed ([2]) canary to the guard slot,
* [4] is loading the guard slot and ([5]) re-computing the global canary,
* [6] is validating the resulting canary with '__security_check_cookie' and performing error handling.
Overview of the current stack-protection implementation:
* lib/CodeGen/StackProtector.cpp
* There is a default stack-protection implementation applied on intermediate representation.
* The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
* Basic Blocks are added to every instrumented function to receive the code for stack guard validation and error handling.
* Guard manipulation and comparison are added directly to the intermediate representation.
* lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
* lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
* There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls).
* see long comment above 'class StackProtectorDescriptor' declaration.
* The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
* 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
* The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
* include/llvm/Target/TargetLowering.h
* Contains a function to retrieve the default Guard 'Value'; this should be overridden by each target to select which implementation is used and to provide the Guard 'Value'.
* lib/Target/X86/X86ISelLowering.cpp
* Contains the x86 specialisation; the Guard 'Value' used by the SelectionDAG algorithm.
Function-based Instrumentation:
* MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction.
* To support function-based instrumentation, this patch is
* adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h); if provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue,
* modifying SelectionDAGISel.cpp to avoid producing the basic blocks used for inline instrumentation,
* generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
* if FastISEL (not SelectionDAG), using the fallback, which relies on the same function-based instrumentation implemented over the intermediate representation (StackProtector.cpp).
Modifications:
* adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
* adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
Results:
* IR-generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```
```
*** Final LLVM Code input to ISel ***
; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
%StackGuardSlot = alloca i8* <<<-- Allocated guard slot
%0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value
call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot)
%index.addr = alloca i32, align 4
%offset.addr = alloca i32, align 4
%buffer = alloca [10 x i8], align 1
store i32 %index, i32* %index.addr, align 4
store i32 %offset, i32* %offset.addr, align 4
%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
%1 = load i32, i32* %index.addr, align 4
call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
%2 = load i32, i32* %index.addr, align 4
%arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
%3 = load i8, i8* %arrayidx, align 1
%conv = sext i8 %3 to i32
%4 = load volatile i8*, i8** %StackGuardSlot <<<-- Loading Guard slot
call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check
ret i32 %conv
}
```
* SelectionDAG generated instrumentation:
```
clang-cl /GS test.cc /O1 /c /FA
```
```
"?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z"
# BB#0: # %entry
pushl %esi
subl $16, %esp
movl ___security_cookie, %eax <<<-- Loading Stack Guard value
movl 28(%esp), %esi
movl %eax, 12(%esp) <<<-- Store to Guard slot
leal 2(%esp), %eax
pushl %esi
pushl $204
pushl %eax
calll _memset
addl $12, %esp
movsbl 2(%esp,%esi), %esi
movl 12(%esp), %ecx <<<-- Loading Guard slot
calll @__security_check_cookie@4 <<<-- Epilogue function-based check
movl %esi, %eax
addl $16, %esp
popl %esi
retl
```
Reviewers: kcc, pcc, eugenis, rnk
Subscribers: majnemer, llvm-commits, hans, thakis, rnk
Differential Revision: http://reviews.llvm.org/D20346
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272053 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.
This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well), or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction.
There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.
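For reference, general IR with !nontemporal metadata is what, e.g.,
Clang's __builtin_nontemporal_load produces (hedged example; the
alignment requirements of (V)MOVNTDQA still apply):
```
/* A 16-byte vector nontemporal load; with this patch it can lower to
   (V)MOVNTDQA rather than requiring the SSE4.1 builtin. */
typedef int v4si __attribute__((vector_size(16)));

v4si nt_load(const v4si *p) {
  return __builtin_nontemporal_load(p);
}
```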
Differential Review: http://reviews.llvm.org/D20965
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272010 91177308-0d34-0410-b5e6-96231b3b80d8
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.
This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).
This is needed for Swift and (possibly) other non-clang frontends.
Fixes PR26190.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272005 91177308-0d34-0410-b5e6-96231b3b80d8
We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened.
This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272003 91177308-0d34-0410-b5e6-96231b3b80d8
A Thumb-2 post-indexed LDR instruction such as:
  ldr.w r0, [r1], #4
can be rewritten as:
  ldm.n r1!, {r0}
LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272002 91177308-0d34-0410-b5e6-96231b3b80d8
If we have an LDM that uses only low registers and doesn't write to its base register:
  ldm.w r0, {r1, r2, r3}
and that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding:
  ldm.n r0!, {r1, r2, r3}
Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does but we don't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272000 91177308-0d34-0410-b5e6-96231b3b80d8
TLS access requires an offset from the TLS index. The index itself is the
section-relative distance of the symbol. For ARM, the relevant relocation
(IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may
not be an immediate and must be lowered into a constant pool. This offset will
not be base relocated. We were previously emitting the actual address of the
symbol, which would be base relocated and would therefore be the value
offset by the ImageBase + TLS Offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271974 91177308-0d34-0410-b5e6-96231b3b80d8
If we had a constant group address space cast, the queue pointer wasn't
enabled for the function, resulting in a crash on a noreg later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271935 91177308-0d34-0410-b5e6-96231b3b80d8
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
Differential Revision: http://reviews.llvm.org/D20276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271925 91177308-0d34-0410-b5e6-96231b3b80d8
Another step toward unifying the llvm assembler/disassembler with sp3.
Besides this, CodeGen output is a bit improved, hence the changes in the
CodeGen tests.
Assembler/Disassembler tests updated/added.
Differential Revision: http://reviews.llvm.org/D20796
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271900 91177308-0d34-0410-b5e6-96231b3b80d8
Could do this for other types too, but this is what's needed to replace the intrinsic with native IR in clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271828 91177308-0d34-0410-b5e6-96231b3b80d8