Summary: The CallTargetProfile should be added to FProfile to be consistent with other profile readers.
Reviewers: dnovillo, davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30233
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295852 91177308-0d34-0410-b5e6-96231b3b80d8
Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295848 91177308-0d34-0410-b5e6-96231b3b80d8
is_local can't pass on some our buildbots as some of our buildbots use network
shares for building and testing LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295840 91177308-0d34-0410-b5e6-96231b3b80d8
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.
Thus, remove the tPUSH as well in this case.
Differential Revision: https://reviews.llvm.org/D30207
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295816 91177308-0d34-0410-b5e6-96231b3b80d8
This needed a const_cast for the dominator tree recalculation in
OptimizationRemarkEmitter, but we do that all over the place already
and it's safe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295812 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.
Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295811 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.
I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.
Differential revision: https://reviews.llvm.org/D30186
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295810 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Motivation: fix PR31181 without regression (the actual fix is still in
progress). However, the actual content of PR31181 is not relevant
here.
This change makes poison propagation more aggressive in the following
cases:
1. poision * Val == poison, for any Val. In particular, this changes
existing intentional and documented behavior in these two cases:
a. Val is 0
b. Val is 2^k * N
2. poison << Val == poison, for any Val
3. getelementptr is poison if any input is poison
I think all of these are justified (and are axiomatically true in the
new poison / undef model):
1a: we need poison * 0 to be poison to allow transforms like these:
A * (B + C) ==> A * B + A * C
If poison * 0 were 0 then the above transform could not be allowed
since e.g. we could have A = poison, B = 1, C = -1, making the LHS
poison * (1 + -1) = poison * 0 = 0
and the RHS
poison * 1 + poison * -1 = poison + poison = poison
1b: we need e.g. poison * 4 to be poison since we want to allow
A * 4 ==> A + A + A + A
If poison * 4 were a value with all of their bits poison except the
last four; then we'd not be able to do this transform since then if A
were poison the LHS would only be "partially" poison while the RHS
would be "full" poison.
2: Same reasoning as (1b), we'd like have the following kinds
transforms be legal:
A << 1 ==> A + A
Reviewers: majnemer, efriedma
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30185
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295809 91177308-0d34-0410-b5e6-96231b3b80d8
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.
Also allow using clamp with f16, and use knowledge
of dx10_clamp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295788 91177308-0d34-0410-b5e6-96231b3b80d8
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.
This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.
General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
which returns an array of flags marking boundaries of vectorized
load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
that constructs a vector according to the plan.
Differential Revision: https://reviews.llvm.org/D30011
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295784 91177308-0d34-0410-b5e6-96231b3b80d8
Since I'm only seeing failures on OSX, and it's saying
permission denied, I'm suspecting this is due to the addition
of the MAP_RESILIENT_CODESIGN and/or MAP_RESILIENT_MEDIA flags.
Speculatively trying to remove those to get the bots working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295770 91177308-0d34-0410-b5e6-96231b3b80d8
For whatever reason ld64 requires that member headers (not the member
themselves) should be aligned. The only way to do that is to edit the
previous member so that it ends at an aligned boundary.
Since modifying data put in an archive is an undesirable property,
llvm-ar should only do it when it is absolutely necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295765 91177308-0d34-0410-b5e6-96231b3b80d8
Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295762 91177308-0d34-0410-b5e6-96231b3b80d8
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use
matchSelectPattern or m_SMax and friends.
The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392https://reviews.llvm.org/rL289738
If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295758 91177308-0d34-0410-b5e6-96231b3b80d8
There are still over 3400 files remaining with this property set, but there are tens of thousands more with the property not set. Until we decide what to do on a global scale, this at least unblocks me temporarily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295756 91177308-0d34-0410-b5e6-96231b3b80d8
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.
I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.
The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295753 91177308-0d34-0410-b5e6-96231b3b80d8