Summary:
This fixes a bug with the ds_*permute instructions: when they were passed a
constant address, the offset operand was assigned a register operand
instead of an immediate.
Reviewers: scchan, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19994
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272349 91177308-0d34-0410-b5e6-96231b3b80d8
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272345 91177308-0d34-0410-b5e6-96231b3b80d8
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.
There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272344 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We failed to unpoison uninteresting allocas on return, as unpoisoning is part of
the main instrumentation, which skips such allocas.
Added a check of -asan-instrument-allocas for dynamic allocas. If instrumentation of
dynamic allocas is disabled, they will not be unpoisoned.
PR27453
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21207
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272341 91177308-0d34-0410-b5e6-96231b3b80d8
I'm still not sure under what circumstances the offset here is non-zero,
but private memory is not limited to 27 bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272337 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this patch, we used argument/global stratified attributes in
order to note that a value could have come from either dereferencing a
global/arg, or from the assignment from a global/arg.
Now, AttrUnknown is placed on sets when we see a dereference, instead of
the global/arg attributes. This allows us to be more aggressive in the
future when we see global/arg attributes without AttrUnknown.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21110
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272335 91177308-0d34-0410-b5e6-96231b3b80d8
[cpu-detection] [amdfam10] Return barcelona, and amdfam10 for all other
subtypes. Address Bug 28067.
Along with the refactoring of Host.cpp, getHostCPUName() was modified to
return more precise types for CPUs in amdfam10.
However, callers of getHostCPUName() do string matching on the returned
name, so those values cannot be changed.
Currently there is support in the x86 backend for barcelona.
For all other subtypes the assumed return value is amdfam10.
Fix: getHostCPUName() returns barcelona subtype and amdfam10 for all
others. This can be extended further when support for the other subtypes
is added.
Differential revision: http://reviews.llvm.org/D21193
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272333 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
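As a rough sketch of the intended usage (the exact signatures of isHotFunction/isColdFunction are assumed here, not taken from this patch):
```
// Sketch only: assumes ProfileSummaryInfo exposes isHotFunction/isColdFunction
// taking a Function*, as described above.
#include "llvm/Analysis/ProfileSummaryInfo.h"
using namespace llvm;

static int adjustThreshold(ProfileSummaryInfo &PSI, Function &Callee,
                           int Threshold, int HotBonus, int ColdPenalty) {
  if (PSI.isHotFunction(&Callee))
    return Threshold + HotBonus;    // be more eager to inline hot callees
  if (PSI.isColdFunction(&Callee))
    return Threshold - ColdPenalty; // and more reluctant for cold ones
  return Threshold;
}
```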
Differential revision: http://reviews.llvm.org/D21045
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272321 91177308-0d34-0410-b5e6-96231b3b80d8
When we delete a live-range, we check if that live-range is the origin of others
to keep it around for rematerialization. For that we check that the instruction
we are about to remove is the same as the definition of the VNI of the original
live-range.
If this is the case, we just shrink the live-range to an empty one.
Now, when we try to delete one of the children of such live-range (product of
splitting), we do the same check.
However, now the original live-range is empty and there is no way we can
access the VNI to check its definition, and we crash.
When we cannot get the VNI for the original live-range, that means we are not in
the presence of the original definition. Thus, this check does not need to happen
in that case and the crash is solved!
This bug was introduced in r266162 | wmi | 2016-04-12 20:08:27. It affects every
target that uses the greedy register allocator.
To happen, we need to delete both the original instruction and its split
products, in that order. This is likely to happen when rematerialization comes
into play.
Trying to produce a more robust test case. Will follow in a coming commit.
This fixes llvm.org/PR27983.
rdar://problem/26651519
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272314 91177308-0d34-0410-b5e6-96231b3b80d8
Auto-upgrade to generic shuffles, matching the sse/avx2 implementations, now that we can lower to VPSLLDQ/VPSRLDQ.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272308 91177308-0d34-0410-b5e6-96231b3b80d8
r267296 used std::piecewise_construct without using
std::forward_as_tuple, and r267298 hacked it out (using an emplace_back
followed by a couple of reset() calls) because of a problem on a bot.
I'm finally circling back to call forward_as_tuple as I should have to
begin with (thanks to David Blaikie for pointing out the missing piece).
Note that this code uses emplace_back() instead of
push_back(make_pair()) because the move constructor for TrackingMDRef is
expensive (cheaper than a copy, but still expensive).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272306 91177308-0d34-0410-b5e6-96231b3b80d8
512-bit VPSLLDQ/VPSRLDQ can only be used on avx512bw targets, so lowerVectorShuffleAsShift had to be adjusted to take the subtarget into account.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272300 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D21160
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272298 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, we materialized secondary vector IVs from the primary scalar IV,
by offsetting the primary to match the correct start value, and then broadcasting
it - inside the loop body. Instead, we can use a real vector IV, like we do for
the primary.
This enables using vector IVs for secondary integer IVs whose type matches the
type of the primary.
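A hypothetical example of the kind of loop this affects (not taken from the patch's tests):
```
// Both i (the primary IV) and j (a secondary IV with a loop-invariant step)
// are 32-bit integers, so j can now be kept as a real vector IV instead of
// being rebuilt from i and broadcast inside the loop body.
void fill(int *a, int n, int start, int step) {
  int j = start;
  for (int i = 0; i < n; ++i) {
    a[i] = j;
    j += step;
  }
}
```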
Differential Revision: http://reviews.llvm.org/D20932
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272283 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies commit r271930, r271915, r271923. They hit a bug in
Thumb which is fixed in r272258 now.
The original message:
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272267 91177308-0d34-0410-b5e6-96231b3b80d8
This enables use of the 'S' constraint for inline ASM operands on
SystemZ, which allows for a memory reference with a signed 20-bit
immediate displacement. This patch includes corresponding documentation
and test case updates.
I've changed the 'T' constraint to match the new behavior for 'S', as
'T' also uses a long displacement (though index constraints are still
not implemented). I also changed 'm' to match the behavior for 'S' as
this will allow for a wider range of displacements for 'm', though
correct me if that's not the right decision.
Author: colpell
Differential Revision: http://reviews.llvm.org/D21097
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272266 91177308-0d34-0410-b5e6-96231b3b80d8
ReplaceTailWithBranchTo assumed that if an instruction is predicated, it must be part of an IT block. This is not correct for conditional branches.
No testcase as this was triggered by the reverted patch r272017 - test coverage will occur when that patch is re-reverted and there is no known way to trigger this in the meantime.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272258 91177308-0d34-0410-b5e6-96231b3b80d8
If an immediate is only used in an AND node, it is possible that the immediate can be materialized more optimally when negated. If this is the case, we can negate the immediate and use a BIC instead:
  int i(int a) {
    return a & 0xfffffeec;
  }
Used to produce:
ldr r1, [CONSTPOOL]
ands r0, r1
CONSTPOOL: 0xfffffeec
And now produces:
movs r1, #255
adds r1, #20 ; Less costly immediate generation
bics r0, r1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272251 91177308-0d34-0410-b5e6-96231b3b80d8
Add support to the AArch64 IAS for the `.arch` directive. This allows the
assembly input to use architectural functionality in part of a file. This is
used in existing code like BoringSSL.
Resolves PR26016!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272241 91177308-0d34-0410-b5e6-96231b3b80d8
We can safely rely on a NoWrap add recurrence causing UB down the road
only if we know the loop does not have an exit expressed in a way that is
opaque to ScalarEvolution (e.g. by a function call that conditionally
calls exit(0)).
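As a hypothetical illustration of such an opaque exit (not from the patch):
```
#include <cstdlib>

// A call like may_stop() is opaque to ScalarEvolution: it may call exit(0),
// so SCEV cannot assume the nsw increment of i keeps executing until its
// no-wrap guarantee would be violated.
void may_stop(int x) {
  if (x == 100)
    std::exit(0);
}

int sum(const int *a, int n) {
  int s = 0;
  for (int i = 0; i != n; i += 2) { // add recurrence with an nsw increment
    may_stop(i);                    // the loop may terminate here instead
    s += a[i];
  }
  return s;
}
```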
I believe with this change PR28012 is fixed.
Note: I had to change some llvm-lit tests in LoopReroll, since it looks
like they were depending on this incorrect behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272237 91177308-0d34-0410-b5e6-96231b3b80d8
looking for it along $PATH. This allows installs of LLVM tools outside of
$PATH to find the symbolizer and produce pretty backtraces if they crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272232 91177308-0d34-0410-b5e6-96231b3b80d8
The TPI hash table contains a parallel array for the type records.
For each type record R, a hash value is calculated by `H(R) % NumBuckets`,
where H is a hash function, and the result is stored in a bucket element.
H is the TPI1::hashPrec function in the microsoft-pdb repository.
Our hash function does not support all type record kinds yet.
Currently it supports only records for line numbers.
I'll extend it in a follow-up patch.
The aim of verifying the hash table is not only to detect corrupted files.
It also ensures that our understanding of how the hash values are calculated
is correct.
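A minimal sketch of the check (names here are illustrative, not the ones in the patch):
```
#include <cstdint>
#include <vector>

struct TypeRecord { std::vector<uint8_t> Data; };

// Stand-in for TPI1::hashPrec; the real function is record-kind specific.
uint32_t hashTypeRecord(const TypeRecord &R) {
  uint32_t H = 0;
  for (uint8_t B : R.Data)
    H = H * 31 + B;
  return H;
}

// Recompute H(R) % NumBuckets for each record and compare it with the value
// stored in the parallel hash-value array read from the TPI stream.
bool verifyTpiHashes(const std::vector<TypeRecord> &Records,
                     const std::vector<uint32_t> &HashValues,
                     uint32_t NumBuckets) {
  for (size_t I = 0; I != Records.size(); ++I)
    if (hashTypeRecord(Records[I]) % NumBuckets != HashValues[I])
      return false; // corrupted file, or our hash implementation is wrong
  return true;
}
```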
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272229 91177308-0d34-0410-b5e6-96231b3b80d8
Without that check it was possible to write test cases where the size
was not specified and we ended up with weird asserts down the road,
because the default value (1) would not make sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272226 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes PR26682. Also add LCSSA as a preserved pass to LoopSimplify;
that looks correct to me and allows writing a test for the issue.
Reviewers: chandlerc, bogner, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21112
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272224 91177308-0d34-0410-b5e6-96231b3b80d8
For complex rewritings, which do not occur currently, the related
machine instruction may have been deleted in the process. Therefore, do
not try to print it after the mapping is applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272209 91177308-0d34-0410-b5e6-96231b3b80d8
Refactor the code so that we do not compute in two different places the
end iterator for the range of new virtual registers for a given operand.
Although this refactoring was intended as NFC, this is not the case
because it actually fixes a bug where we were returning a range off by 1
(too long). Right now, this could not result in an actual bug because we
were accessing this range via the BreakDown size of the related operand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272208 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes PR27617.
Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64.
The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert.
Reviewers: nadav, mzolotukhin
Subscribers: JesperAntonsson, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20685
Patch by Jesper Antonsson!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272206 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Consider the following diamond CFG:
    A
   / \
  B   C
   \ /
    D
Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20989
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272203 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Now DISubroutineType has a 'cc' field which should be a DW_CC_ enum. If
it is present and non-zero, the backend will emit it as a
DW_AT_calling_convention attribute. On the CodeView side, we translate
it to the appropriate enum for the LF_PROCEDURE record.
I added a new LLVM vendor specific enum to the list of DWARF calling
conventions. DWARF does not appear to attempt to standardize these, so I
assume it's OK to do this until we coordinate with GCC on how to emit
vectorcall convention functions.
Reviewers: dexonsmith, majnemer, aaboud, amccarth
Subscribers: mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D21114
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272197 91177308-0d34-0410-b5e6-96231b3b80d8
with user specified count has been applied.
Summary:
Previously SetLoopAlreadyUnrolled() set the disable pragma only if
there was some loop metadata.
Now it sets the pragma in all cases. This helps to prevent multiple
unrolling when -unroll-count=N is given.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D20765
From: Evgeny Stupachenko <evstupac@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272195 91177308-0d34-0410-b5e6-96231b3b80d8
Again, the Microsoft linker does not like empty substreams.
We still emit an empty string table if CodeView is enabled, but that
doesn't cause problems because it always contains at least one null
byte.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272183 91177308-0d34-0410-b5e6-96231b3b80d8
This is NFC as far as externally visible behavior is concerned, but will
keep us from spinning in the worklist traversal algorithm unnecessarily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272182 91177308-0d34-0410-b5e6-96231b3b80d8
Absence of may-unwind calls is not enough to guarantee that a
UB-generating use of an add-rec poison in the loop latch will actually
cause UB. We also need to guard against calls that terminate the thread
or loop forever themselves.
This partially addresses PR28012.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272181 91177308-0d34-0410-b5e6-96231b3b80d8
The worklist algorithm introduced in rL271151 didn't check to see if the
direct users of the post-inc add recurrence propagates poison. This
change fixes the problem and makes the code structure more obvious.
Note for release managers: correctness wise, this bug wasn't a
regression introduced by rL271151 -- the behavior of SCEV around
post-inc add recurrences was strictly improved (in terms of correctness)
in rL271151.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272179 91177308-0d34-0410-b5e6-96231b3b80d8
When repairing with a copy, instead of accounting for the cost of that
copy and actually inserting it, we may be able to find an alternative
source for the register to repair and just use that instead.
Make sure this is documented, so that we consider that opportunity at
some point.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272176 91177308-0d34-0410-b5e6-96231b3b80d8
r272064 apparently made them angry. This undoes some changes made in
r272064 (defaulting move ctors) to make them happy again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272173 91177308-0d34-0410-b5e6-96231b3b80d8
Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
FPR for 64-bit or 32-bit values.
Add test cases demonstrating how this information is used to coalesce a
computation on a single register bank.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272170 91177308-0d34-0410-b5e6-96231b3b80d8
Now, the target will be able to provide its own implementation to remap
an instruction. This opens the way to crazier optimizations, but to
begin with, we will be able to handle something other than the
default mapping.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272165 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have an entity that holds the remap information, the
rewriting should be easier to do.
No functional changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272164 91177308-0d34-0410-b5e6-96231b3b80d8
When the command line option is set, it overrides anything that the
target may have set. The rationale is that we get what we asked for.
The options are, respectively, regbankselect-fast and regbankselect-greedy
for fast and greedy mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272158 91177308-0d34-0410-b5e6-96231b3b80d8
repairing.
Copies are easy because we repair only when there is a mismatch. For
non-copy repairing, i.e., cases that involve breaking down or gathering
up the value, one of the operands may not have a register bank yet. Thus,
deriving a cost from that requires more work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272157 91177308-0d34-0410-b5e6-96231b3b80d8
The MSR instructions can write to the CPSR, but we did not model this
fact, so we could emit them in the middle of IT blocks, changing the
condition flags for later instructions in the block.
The tests use two calls to llvm.write_register.i32 because it is valid
to use these instructions at the end of an IT block, which if-conversion
does in some cases. With two calls, the first clobbers the flags, so
a branch has to be used to make the second one conditional.
Differential Revision: http://reviews.llvm.org/D21139
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272154 91177308-0d34-0410-b5e6-96231b3b80d8
The architecture enumeration is shared across ARM and AArch64. However, the
data is not. The code incorrectly would index into the array using the
architecture index which was offset by the ARMv7 architecture enumeration. We
do not have a marker for indicating the architectural family to which the
enumeration belongs so we cannot be clever about offsetting the index (at least
it is not immediately apparent to me). Instead, fall back to the tried-and-true
method of slowly iterating the array (it's not a large array, so the impact of
this is not too high).
Because of the incorrect indexing, if we were lucky, we would crash, but usually
we would return an invalid StringRef. We did not have any tests for the AArch64
target parser previously; extend the previous tests I had added for ARM to
cover AArch64, ensuring that we return the expected StringRefs.
Take the opportunity to change some iterator types to references.
This work is needed to support parsing `.arch name` directives in the AArch64
target asm parser.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272145 91177308-0d34-0410-b5e6-96231b3b80d8
Also, switch to using functions from LiveIntervalAnalysis to update
live intervals, instead of performing the updates manually.
Re-committing r272045.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272135 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272126 91177308-0d34-0410-b5e6-96231b3b80d8
The cost of a copy may be different based on how many bits we have to
copy around. E.g., an 8-bit copy may be different from a 32-bit copy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272084 91177308-0d34-0410-b5e6-96231b3b80d8
The MachineMemOperand parser lacked the code to handle %stack.X
references (%fixed-stack.X was working).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272082 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes linking problems on OSX.
Unfortunately it turns out we need to use an instance of the
``fuzzer::ExternalFunctions`` object in several places so this
commit also replaces all instances with a single global instance.
It also turns out initializing a global ``fuzzer::ExternalFunctions``
before main is entered (i.e. letting the object be initialised by the
global initializers) is not safe (on OSX the call to ``Printf()`` in the
CTOR crashes if it is called from a global initializer) so we instead
have a global ``fuzzer::ExternalFunctions*`` and initialize it inside
``FuzzerDriver()``.
Multiple unit tests also depend on the
``fuzzer::ExternalFunctions*`` global, so a ``main()`` function has been
added that initializes it before running any tests.
Differential Revision: http://reviews.llvm.org/D20943
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272072 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.
The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.
Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20839
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272063 91177308-0d34-0410-b5e6-96231b3b80d8
This is necessary because the existing fuzzer-oom.test was Linux
specific due to its use of __sanitizer_print_memory_profile() which
is only available on Linux right now and so the test would fail on OSX.
Differential Revision: http://reviews.llvm.org/D20977
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272061 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch is adding support for the MSVC buffer security check implementation.
The buffer security check is turned on with the '/GS' compiler switch.
* https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
* To be added to clang here: http://reviews.llvm.org/D20347
Some overview of buffer security check feature and implementation:
* https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
* http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
* http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
For the following example:
```
int example(int offset, int index) {
char buffer[10];
memset(buffer, 0xCC, index);
return buffer[index];
}
```
The MSVC compiler is adding these instructions to perform a stack integrity check:
```
push ebp
mov ebp,esp
sub esp,50h
[1] mov eax,dword ptr [__security_cookie (01068024h)]
[2] xor eax,ebp
[3] mov dword ptr [ebp-4],eax
push ebx
push esi
push edi
mov eax,dword ptr [index]
push eax
push 0CCh
lea ecx,[buffer]
push ecx
call _memset (010610B9h)
add esp,0Ch
mov eax,dword ptr [index]
movsx eax,byte ptr buffer[eax]
pop edi
pop esi
pop ebx
[4] mov ecx,dword ptr [ebp-4]
[5] xor ecx,ebp
[6] call @__security_check_cookie@4 (01061276h)
mov esp,ebp
pop ebp
ret
```
The instrumentation above is:
* [1] is loading the global security canary,
* [3] is storing the locally computed ([2]) canary to the guard slot,
* [4] is loading the guard slot and ([5]) re-computing the global canary,
* [6] is validating the resulting canary with '__security_check_cookie' and performing error handling.
Overview of the current stack-protection implementation:
* lib/CodeGen/StackProtector.cpp
* There is a default stack-protection implementation applied on intermediate representation.
* The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
* Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
* Guard manipulation and comparison are added directly to the intermediate representation.
* lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
* lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
* There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls).
* see long comment above 'class StackProtectorDescriptor' declaration.
* The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
* 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
* The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
* include/llvm/Target/TargetLowering.h
* Contains a function to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and to provide the Guard 'Value'.
* lib/Target/X86/X86ISelLowering.cpp
* Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.
Function-based Instrumentation:
* MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction.
* To support function-based instrumentation, this patch is
* adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
* If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the epilogue.
* modifying SelectionDAGISel.cpp to avoid producing the basic blocks used for inline instrumentation,
* generating the function-based instrumentation during the ISel pass (SelectionDAGBuilder.cpp),
* if FastISel is used (not SelectionDAG), falling back to the same function-based check implemented over the intermediate representation (StackProtector.cpp).
Modifications
* adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
* adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
Results
* IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```
```
*** Final LLVM Code input to ISel ***
; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
%StackGuardSlot = alloca i8* <<<-- Allocated guard slot
%0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value
call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot)
%index.addr = alloca i32, align 4
%offset.addr = alloca i32, align 4
%buffer = alloca [10 x i8], align 1
store i32 %index, i32* %index.addr, align 4
store i32 %offset, i32* %offset.addr, align 4
%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
%1 = load i32, i32* %index.addr, align 4
call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
%2 = load i32, i32* %index.addr, align 4
%arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
%3 = load i8, i8* %arrayidx, align 1
%conv = sext i8 %3 to i32
%4 = load volatile i8*, i8** %StackGuardSlot <<<-- Loading Guard slot
call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check
ret i32 %conv
}
```
* SelectionDAG generated instrumentation:
```
clang-cl /GS test.cc /O1 /c /FA
```
```
"?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z"
# BB#0: # %entry
pushl %esi
subl $16, %esp
movl ___security_cookie, %eax <<<-- Loading Stack Guard value
movl 28(%esp), %esi
movl %eax, 12(%esp) <<<-- Store to Guard slot
leal 2(%esp), %eax
pushl %esi
pushl $204
pushl %eax
calll _memset
addl $12, %esp
movsbl 2(%esp,%esi), %esi
movl 12(%esp), %ecx <<<-- Loading Guard slot
calll @__security_check_cookie@4 <<<-- Epilogue function-based check
movl %esi, %eax
addl $16, %esp
popl %esi
retl
```
Reviewers: kcc, pcc, eugenis, rnk
Subscribers: majnemer, llvm-commits, hans, thakis, rnk
Differential Revision: http://reviews.llvm.org/D20346
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272053 91177308-0d34-0410-b5e6-96231b3b80d8
Also, switch to using functions from LiveIntervalAnalysis to update
live intervals, instead of performing the updates manually.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272045 91177308-0d34-0410-b5e6-96231b3b80d8
This patch does a few things:
- Unifies AttrAll and AttrUnknown (since they were used for more or less
the same purpose anyway).
- Introduces AttrEscaped, an attribute that notes that a value escapes
our analysis for a given set, but not that an unknown value flows into
said set.
- Removes functions that take bit indices, since we also had functions
that took bitsets, and the use of both (with similar names) was
unclear and bug-prone.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21000
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272040 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins.
This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction.
There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.
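For example (assuming clang's __builtin_nontemporal_load, which emits a regular IR load tagged with !nontemporal metadata), a load like this can now be selected to (V)MOVNTDQA without going through the builtin intrinsic:
```
#include <immintrin.h>

// Sketch: a 128-bit nontemporal vector load expressed in general IR rather
// than via int_x86_sse41_movntdqa.
__m128i load_stream(__m128i *p) {
  return __builtin_nontemporal_load(p);
}
```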
Differential Review: http://reviews.llvm.org/D20965
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272011 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins.
This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction.
There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.
Differential Review: http://reviews.llvm.org/D20965
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272010 91177308-0d34-0410-b5e6-96231b3b80d8
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.
This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).
This is needed for Swift and (possibly) other non-clang frontends.
Fixes PR26190.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272005 91177308-0d34-0410-b5e6-96231b3b80d8
We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened.
This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272003 91177308-0d34-0410-b5e6-96231b3b80d8
A Thumb-2 post-indexed LDR instruction such as:
ldr.w r0, [r1], #4
Can be rewritten as:
ldm.n r1!, {r0}
LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272002 91177308-0d34-0410-b5e6-96231b3b80d8
If we have an LDM that uses only low registers and doesn't write to its base register:
ldm.w r0, {r1, r2, r3}
And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding:
ldm.n r0!, {r1, r2, r3}
Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272000 91177308-0d34-0410-b5e6-96231b3b80d8
The Thumb2 conditional branch B<cond>.W has a different encoding (T3)
to the unconditional branch B.W (T4) as it needs to record <cond>.
As the encoding is different the B<cond>.W is given a different
relocation type.
ELF for the ARM Architecture 4.6.1.6 (Table-13) states that
R_ARM_THM_JUMP19 should be used for B<cond>.W. At present the
MC layer is using the R_ARM_THM_JUMP24 from B.W.
This change makes B<cond>.W use R_ARM_THM_JUMP19 and alters the
existing test that checks for R_ARM_THM_JUMP24 to expect
R_ARM_THM_JUMP19.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271997 91177308-0d34-0410-b5e6-96231b3b80d8
Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).
If the shift amount is constant we can sometimes convert these instructions to native shifts:
1 - if all shift amounts are in range then the conversion is trivial.
2 - out of range arithmetic shifts can be clamped to (bitwidth - 1), a legal shift amount, before conversion.
3 - logical shifts just return zero if all elements have out of range shift amounts.
In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.
Differential Revision: http://reviews.llvm.org/D19675
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271996 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions.
The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.
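A hypothetical example of the kind of input this enables folding:
```
#include <emmintrin.h>

// With constant folding of MOVMSK, a movemask of an all-zero vector can be
// reduced to the constant 0 at compile time instead of emitting pmovmskb.
int movmsk_of_zero() {
  return _mm_movemask_epi8(_mm_setzero_si128());
}
```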
Differential Revision: http://reviews.llvm.org/D20998
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271990 91177308-0d34-0410-b5e6-96231b3b80d8
In order to efficiently write PDBs, we need to be able to make a
StreamWriter class similar to a StreamReader, which can transparently deal
with writing to discontiguous streams, and we need to use this for all
writing, similar to how we use StreamReader for all reading.
Most discontiguous streams are the typical numbered streams that appear in
a PDB file and are described by the directory, but the exception to this,
that until now has been parsed by hand, is the directory itself.
MappedBlockStream works by querying the directory to find out which blocks
a stream occupies and various other things, so naturally the same logic
could not possibly work to describe the blocks that the directory itself
resided on.
To solve this, I've introduced an abstraction IPDBStreamData, which allows
the client to query for the list of blocks occupied by the stream, as well
as the stream length. I provide two implementations of this: one which
queries the directory (for indexed streams), and one which queries the
super block (for the directory stream).
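A minimal sketch of the shape of that abstraction (member names are illustrative, not necessarily the exact ones in the patch):
```
#include <cstdint>
#include <vector>

// Both indexed streams (described by the directory) and the directory stream
// itself (described by the super block) answer the same two queries, so the
// same MappedBlockStream logic can read either one.
class IPDBStreamData {
public:
  virtual ~IPDBStreamData() = default;
  virtual uint32_t getLength() const = 0;
  virtual std::vector<uint32_t> getStreamBlocks() const = 0;
};
```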
This has the side benefit of vastly simplifying the code to parse the
directory. Whereas before a mini state machine was rolled by hand, now we
simply use FixedStreamArray to read out the stream sizes, then build a
vector of FixedStreamArrays for the stream map, all in just a few lines of
code.
Reviewed By: ruiu
Differential Revision: http://reviews.llvm.org/D21046
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271982 91177308-0d34-0410-b5e6-96231b3b80d8
TLS access requires an offset from the TLS index. The index itself is the
section-relative distance of the symbol. For ARM, the relevant relocation
(IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may
not be an immediate and must be lowered into a constant pool. This offset will
not be base relocated. We were previously emitting the actual address of the
symbol which would be base relocated and would therefore be the value offset by
the ImageBase + TLS Offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271974 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r271962 and reinstates r271957.
MSVC's linker doesn't appear to like it if you have an empty symbol
substream, so only open a symbol substream if we're going to emit
something about globals into it.
Makes check-asan pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271965 91177308-0d34-0410-b5e6-96231b3b80d8
scalarizePHI only looked for phis that have exactly two uses - the "latch"
use, and an extract. Unfortunately, we cannot assume all equivalent extracts
are CSE'd, since InstCombine itself may create an extract which is a duplicate
of an existing one. This extends it to handle several distinct extracts from
the same index.
This should fix at least some of the performance regressions from PR27988.
Differential Revision: http://reviews.llvm.org/D20983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271961 91177308-0d34-0410-b5e6-96231b3b80d8
This currently emits everything as S_GDATA32, which isn't right for
things like thread locals, but it's a start.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271957 91177308-0d34-0410-b5e6-96231b3b80d8
Arrange to call verify(Function &) on each function, followed by
verify(Module &), whether the verifier is being used from the pass or
from verifyModule(). As a side effect, this fixes an issue that caused
us not to call verify(Function &) on unmaterialized functions from
verifyModule().
Differential Revision: http://reviews.llvm.org/D21042
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271956 91177308-0d34-0410-b5e6-96231b3b80d8
Remove previously unreachable code that verifies that a function definition has
an entry block. By definition, a function definition has at least one block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271948 91177308-0d34-0410-b5e6-96231b3b80d8
Calls to this function are currently injected by the
``SanitizerCoverageModule`` pass when both the ``indirect-calls``
and ``trace-pc`` sanitizer coverage options are enabled and the code
being instrumented has indirect calls. Previously, because LibFuzzer did
not define this function, this would lead to link errors when building
some of the tests on OSX.
Differential Revision: http://reviews.llvm.org/D20946
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271938 91177308-0d34-0410-b5e6-96231b3b80d8
If there was a constant group address space cast, the queue pointer
wasn't enabled for the function, resulting in a crash on a noreg operand
later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271935 91177308-0d34-0410-b5e6-96231b3b80d8
In some cases, when simplifying with SCEV, we might consider pointer values as
just usual integer values. Thus, we might get a different type from what we
had originally in the map of simplified values, and hence we need to check
types before operating on the values.
This fixes PR28015.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271931 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs. This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.
Originally reviewed in http://reviews.llvm.org/D18001
Reviewers: atrick
Subscribers: llvm-commits, mzolotukhin, mcrosier
Differential Revision: http://reviews.llvm.org/D18480
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271929 91177308-0d34-0410-b5e6-96231b3b80d8
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
Differential Revision: http://reviews.llvm.org/D20276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271925 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Following D20970 (committed as r271726).
This is a substantial refactoring of the host CPU detection code.
There is no functionality change intended, but the changes are extensive.
Definitions of architecture types and subtypes are by no means exhaustive or
perfectly defined, but they are a fair starting point.
Suggestions for further improvements are welcome.
Reviewers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20988
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271921 91177308-0d34-0410-b5e6-96231b3b80d8
In r271810 ( http://reviews.llvm.org/rL271810 ), I loosened the check
above this to work for any Constant rather than ConstantInt. AFAICT,
that part makes sense if we can determine that the shrunken/extended
constant remained equal. But it doesn't make sense for this later
transform where we assume that the constant DID change.
This could assert for a ConstantExpr:
https://llvm.org/bugs/show_bug.cgi?id=28011
And it could be wrong for a vector as shown in the added regression test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271908 91177308-0d34-0410-b5e6-96231b3b80d8
Another step toward unifying the llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, hence the changes in CodeGen tests.
Assembler/Disassembler tests updated/added.
Differential Revision: http://reviews.llvm.org/D20796
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271900 91177308-0d34-0410-b5e6-96231b3b80d8
Since FoldOpIntoPhi speculates the binary operation to potentially each
of the predecessors of the PHI node (pulling it out of arbitrary control
dependence in the process), we can FoldOpIntoPhi only if we know the
operation doesn't have UB.
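A hypothetical illustration of the hazard (not the exact case from PR27968):
```
// The rem below is control dependent on "x != 0".  Cloning it into the
// predecessors of the phi that defines x would evaluate 100 % n even on
// executions where n == 0 and the guarded rem is never reached, i.e. it
// would introduce UB the original program does not have.
int f(int n, bool p) {
  int x;
  if (p)
    x = 5; // a constant incoming value is what makes FoldOpIntoPhi fire
  else
    x = n; // n may be 0 on this path
  if (x != 0)
    return 100 % x;
  return -1;
}
```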
This also brings up an interesting profitability question -- the way it
is written today, commonIRemTransforms will hoist out work from
dynamically dead code into code that will execute at runtime. Perhaps
that isn't the best canonicalization?
Fixes PR27968.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271857 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
There are some rough corners, since the new pass manager doesn't have
(as far as I can tell) LoopSimplify and LCSSA, so I've updated the
tests to run them separately in the old pass manager in the lit tests.
We also don't have an equivalent for AU.setPreservesCFG() in the new
pass manager, so I've left a FIXME.
Reviewers: bogner, chandlerc, davide
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20783
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271846 91177308-0d34-0410-b5e6-96231b3b80d8
Could do this for other types too, but this is what's needed to replace the intrinsic with native IR in clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271828 91177308-0d34-0410-b5e6-96231b3b80d8
Change the name of the ICmpInst to 'ICmp' and the Constant (was a ConstantInt) to 'C',
so that it's hopefully clearer that 'CI' refers to CastInst in this context.
While we're scrubbing, fix the documentation comment and use 'auto' with 'dyn_cast'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271817 91177308-0d34-0410-b5e6-96231b3b80d8