RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-06-30 21:08:10 -04:00

Author	SHA1	Message	Date
Simon Pilgrim	09c57f735c	[X86][AVX] Add i686 avx splat tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374719 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 13:18:07 +00:00
Craig Topper	213f1f4cbf	[X86] Add a one use check on the setcc to the min/max canonicalization code in combineSelect. This seems to improve std::midpoint code where we have a min and a max with the same condition. If we split the setcc we can end up with two compares if one of the operands is a constant, since we aggressively canonicalize compares with constants. For non-constants it can interfere with our ability to share control flow if we need to expand cmovs into control flow. I'm also not sure I understand this min/max canonicalization code. The motivating case talks about comparing with 0. But we don't check for 0 explicitly. Removes one instruction from the codegen for PR43658. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374706 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 06:48:05 +00:00
Craig Topper	e166f9ed84	[X86] Enable v4i32->v4i16 and v8i16->v8i8 saturating truncates to use pack instructions with avx512. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374705 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 05:47:47 +00:00
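For readers unfamiliar with the term, a signed saturating truncate clamps each lane to the range of the narrower type before narrowing, rather than discarding the high bits. The scalar sketch below only illustrates those semantics for the v4i32->v4i16 case; `ssat_trunc_i32_to_i16` and `ssat_trunc_v4i32` are hypothetical helper names, not LLVM or intrinsic APIs, and the sketch says nothing about how the backend selects the pack instructions.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an LLVM API): signed saturating truncate of one
// 32-bit lane to 16 bits. Out-of-range values clamp to the int16_t limits
// instead of wrapping.
inline std::int16_t ssat_trunc_i32_to_i16(std::int32_t v) {
  constexpr std::int32_t lo = std::numeric_limits<std::int16_t>::min();
  constexpr std::int32_t hi = std::numeric_limits<std::int16_t>::max();
  return static_cast<std::int16_t>(std::min(std::max(v, lo), hi));
}

// Hypothetical helper: lane-wise v4i32 -> v4i16 saturating truncate.
inline std::array<std::int16_t, 4> ssat_trunc_v4i32(
    const std::array<std::int32_t, 4>& in) {
  std::array<std::int16_t, 4> out{};
  std::transform(in.begin(), in.end(), out.begin(), ssat_trunc_i32_to_i16);
  return out;
}
```

A plain `static_cast<std::int16_t>(40000)` would wrap to a negative value; the saturating form yields 32767 instead.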
Craig Topper	aa7ea7f0db	[X86] Add v2i64->v2i32/v2i16/v2i8 test cases to the trunc packus/ssat/usat tests. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374704 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 05:47:42 +00:00
Johannes Doerfert	40b2b61e65	[Attributor][FIX] Avoid splitting blocks if possible Before, we eagerly split blocks even if it was not necessary, e.g., they had a single unreachable instruction and only a single predecessor. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374703 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 05:27:09 +00:00
Johannes Doerfert	b2985004e4	[Attributor][FIX] Ensure h2s doesn't trigger on escaped pointers We do not yet perform h2s because we know something is free'ed but we do it because we know the pointer does not escape. Storing the pointer allows it to escape so we have to prevent that. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374699 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 04:14:15 +00:00
Johannes Doerfert	cfbbef2231	[Attributor][FIX] Do not apply h2s for arbitrary mallocs H2S did apply to mallocs of non-constant sizes if the uses were OK. This is now forbidden through reordering of the "good" and "bad" cases in the conditional. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374698 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 03:54:08 +00:00
Johannes Doerfert	62bc91d4ab	[Attributor][FIX] Add missing function declaration in test case git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374696 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 02:42:09 +00:00
Johannes Doerfert	e6b56b94e2	[Attributor][FIX] Avoid modifying naked/optnone functions The check for naked/optnone was insufficient for different reasons. We now check before we initialize an abstract attribute and we do it for all abstract attributes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374694 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 02:24:02 +00:00
Johannes Doerfert	de0069a2f4	[SROA] Reuse existing lifetime markers if possible Summary: If the underlying alloca did not change, we do not necessarily need new lifetime markers. This patch adds a check and reuses the old ones if possible. Reviewers: reames, ssarda, t.p.northover, hfinkel Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68900 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374692 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-13 02:21:23 +00:00
Joel E. Denny	24feaedf06	Revert r374652: "[lit] Fix internal diff's --strip-trailing-cr and use it" This series of patches still breaks a Windows bot. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374679 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 18:51:51 +00:00
Joel E. Denny	672549cb42	Revert r374653: "[lit] Fix a few oversights in r374651 that broke some bots" This series of patches still breaks a Windows bot. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374678 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 18:51:34 +00:00
Roman Lebedev	3e0f2dcf49	[LoopIdiomRecognize] Recommit: BCmp loop idiom recognition Summary: This is a recommit, this originally landed in rL370454 but was subsequently reverted in rL370788 due to https://bugs.llvm.org/show_bug.cgi?id=43206 The reduced testcase was added to bcmp-negative-tests.ll as @pr43206_different_loops - we must ensure that the SCEV's we got are both for the same loop we are currently investigating. Original commit message: @mclow.lists brought this issue up in IRC. It is a reasonably common problem to compare some two values for equality. Those may be just some integers, strings or arrays of integers. In C, there are the `memcmp()` and `bcmp()` functions. In C++, there exists the `std::equal()` algorithm. One can also write that function manually. libstdc++'s `std::equal()` is specialized to directly call `memcmp()` for various types, but not `std::byte` from C++2a. https://godbolt.org/z/mx2ejJ libc++ does not do anything like that, it simply relies on simple C++'s `operator==()`. https://godbolt.org/z/er0Zwf (GOOD!) So likely, there exist certain performance opportunities. Let's compare performance of naive `std::equal()` (no `memcmp()`) with one that is using `memcmp()` (in this case, compiled with modified compiler).
{F8768213}
```
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iterator>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

#include "benchmark/benchmark.h"

// Naive element-wise equality check; this is the loop shape being benchmarked.
template <class T>
bool equal(T* a, T* a_end, T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

template <typename T>
std::vector<T> getVectorOfRandomNumbers(size_t count) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<T> dis(std::numeric_limits<T>::min(),
                                       std::numeric_limits<T>::max());
  std::vector<T> v;
  v.reserve(count);
  std::generate_n(std::back_inserter(v), count,
                  [&dis, &gen]() { return dis(gen); });
  assert(v.size() == count);
  return v;
}

struct Identical {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto Tmp = getVectorOfRandomNumbers<T>(count);
    return std::make_pair(Tmp, std::move(Tmp));
  }
};

struct InequalHalfway {
  template <typename T>
  static std::pair<std::vector<T>, std::vector<T>> Gen(size_t count) {
    auto V0 = getVectorOfRandomNumbers<T>(count);
    auto V1 = V0;
    V1[V1.size() / size_t(2)]++;  // just change the value.
    return std::make_pair(std::move(V0), std::move(V1));
  }
};

template <class T, class Gen>
void BM_bcmp(benchmark::State& state) {
  const size_t Length = state.range(0);

  const std::pair<std::vector<T>, std::vector<T>> Data =
      Gen::template Gen<T>(Length);
  const std::vector<T>& a = Data.first;
  const std::vector<T>& b = Data.second;
  assert(a.size() == Length && b.size() == a.size());

  benchmark::ClobberMemory();
  benchmark::DoNotOptimize(a);
  benchmark::DoNotOptimize(a.data());
  benchmark::DoNotOptimize(b);
  benchmark::DoNotOptimize(b.data());

  for (auto _ : state) {
    const bool is_equal = equal(a.data(), a.data() + a.size(), b.data());
    benchmark::DoNotOptimize(is_equal);
  }
  state.SetComplexityN(Length);
  state.counters["eltcnt"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariant);
  state.counters["eltcnt/sec"] =
      benchmark::Counter(Length, benchmark::Counter::kIsIterationInvariantRate);
  const size_t BytesRead = 2 * sizeof(T) * Length;
  state.counters["bytes_read/iteration"] =
      benchmark::Counter(BytesRead, benchmark::Counter::kDefaults,
                         benchmark::Counter::OneK::kIs1024);
  state.counters["bytes_read/sec"] = benchmark::Counter(
      BytesRead, benchmark::Counter::kIsIterationInvariantRate,
      benchmark::Counter::OneK::kIs1024);
}

template <typename T>
static void CustomArguments(benchmark::internal::Benchmark* b) {
  const size_t L2SizeBytes = []() {
    for (const benchmark::CPUInfo::CacheInfo& I :
         benchmark::CPUInfo::Get().caches) {
      if (I.level == 2) return I.size;
    }
    return 0;
  }();
  // What is the largest range we can check to always fit within given L2 cache?
  const size_t MaxLen = L2SizeBytes / /*total bufs*/ 2 /
                        /*maximal elt size*/ sizeof(T) / /*safety margin*/ 2;
  b->RangeMultiplier(2)->Range(1, MaxLen)->Complexity(benchmark::oN);
}

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, Identical)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, Identical)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, Identical)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, Identical)
    ->Apply(CustomArguments<uint64_t>);

BENCHMARK_TEMPLATE(BM_bcmp, uint8_t, InequalHalfway)
    ->Apply(CustomArguments<uint8_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint16_t, InequalHalfway)
    ->Apply(CustomArguments<uint16_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint32_t, InequalHalfway)
    ->Apply(CustomArguments<uint32_t>);
BENCHMARK_TEMPLATE(BM_bcmp, uint64_t, InequalHalfway)
    ->Apply(CustomArguments<uint64_t>);
```
{F8768210}
```
$ ~/src/googlebenchmark/tools/compare.py --no-utest benchmarks build-{old,new}/test/llvm-bcmp-bench
RUNNING: build-old/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpb6PEUx
2019-04-25 21:17:11
Running build-old/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.65, 3.90, 4.14
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 432131 ns 432101 ns 1613 bytes_read/iteration=1000k bytes_read/sec=2.20706G/s eltcnt=825.856M eltcnt/sec=1.18491G/s
BM_bcmp<uint8_t, Identical>_BigO 0.86 N 0.86 N
BM_bcmp<uint8_t, Identical>_RMS 8 % 8 %
<...>
BM_bcmp<uint16_t, Identical>/256000 161408 ns 161409 ns 4027 bytes_read/iteration=1000k bytes_read/sec=5.90843G/s eltcnt=1030.91M eltcnt/sec=1.58603G/s
BM_bcmp<uint16_t, Identical>_BigO 0.67 N 0.67 N
BM_bcmp<uint16_t, Identical>_RMS 25 % 25 %
<...>
BM_bcmp<uint32_t, Identical>/128000 81497 ns 81488 ns 8415 bytes_read/iteration=1000k bytes_read/sec=11.7032G/s eltcnt=1077.12M eltcnt/sec=1.57078G/s
BM_bcmp<uint32_t, Identical>_BigO 0.71 N 0.71 N
BM_bcmp<uint32_t, Identical>_RMS 42 % 42 %
<...>
BM_bcmp<uint64_t, Identical>/64000 50138 ns 50138 ns 10909 bytes_read/iteration=1000k bytes_read/sec=19.0209G/s eltcnt=698.176M eltcnt/sec=1.27647G/s
BM_bcmp<uint64_t, Identical>_BigO 0.84 N 0.84 N
BM_bcmp<uint64_t, Identical>_RMS 27 % 27 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 192405 ns 192392 ns 3638 bytes_read/iteration=1000k bytes_read/sec=4.95694G/s eltcnt=1.86266G eltcnt/sec=2.66124G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO 0.38 N 0.38 N
BM_bcmp<uint8_t, InequalHalfway>_RMS 3 % 3 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 127858 ns 127860 ns 5477 bytes_read/iteration=1000k bytes_read/sec=7.45873G/s eltcnt=1.40211G eltcnt/sec=2.00219G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO 0.50 N 0.50 N
BM_bcmp<uint16_t, InequalHalfway>_RMS 0 % 0 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 49140 ns 49140 ns 14281 bytes_read/iteration=1000k bytes_read/sec=19.4072G/s eltcnt=1.82797G eltcnt/sec=2.60478G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO 0.40 N 0.40 N
BM_bcmp<uint32_t, InequalHalfway>_RMS 18 % 18 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 32101 ns 32099 ns 21786 bytes_read/iteration=1000k bytes_read/sec=29.7101G/s eltcnt=1.3943G eltcnt/sec=1.99381G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO 0.50 N 0.50 N
BM_bcmp<uint64_t, InequalHalfway>_RMS 1 % 1 %
RUNNING: build-new/test/llvm-bcmp-bench --benchmark_out=/tmp/tmpQ46PP0
2019-04-25 21:19:29
Running build-new/test/llvm-bcmp-bench
Run on (8 X 4000 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 1.01, 2.85, 3.71
---------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 18593 ns 18590 ns 37565 bytes_read/iteration=1000k bytes_read/sec=51.2991G/s eltcnt=19.2333G eltcnt/sec=27.541G/s
BM_bcmp<uint8_t, Identical>_BigO 0.04 N 0.04 N
BM_bcmp<uint8_t, Identical>_RMS 37 % 37 %
<...>
BM_bcmp<uint16_t, Identical>/256000 18950 ns 18948 ns 37223 bytes_read/iteration=1000k bytes_read/sec=50.3324G/s eltcnt=9.52909G eltcnt/sec=13.511G/s
BM_bcmp<uint16_t, Identical>_BigO 0.08 N 0.08 N
BM_bcmp<uint16_t, Identical>_RMS 34 % 34 %
<...>
BM_bcmp<uint32_t, Identical>/128000 18627 ns 18627 ns 37895 bytes_read/iteration=1000k bytes_read/sec=51.198G/s eltcnt=4.85056G eltcnt/sec=6.87168G/s
BM_bcmp<uint32_t, Identical>_BigO 0.16 N 0.16 N
BM_bcmp<uint32_t, Identical>_RMS 35 % 35 %
<...>
BM_bcmp<uint64_t, Identical>/64000 18855 ns 18855 ns 37458 bytes_read/iteration=1000k bytes_read/sec=50.5791G/s eltcnt=2.39731G eltcnt/sec=3.3943G/s
BM_bcmp<uint64_t, Identical>_BigO 0.32 N 0.32 N
BM_bcmp<uint64_t, Identical>_RMS 33 % 33 %
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 9570 ns 9569 ns 73500 bytes_read/iteration=1000k bytes_read/sec=99.6601G/s eltcnt=37.632G eltcnt/sec=53.5046G/s
BM_bcmp<uint8_t, InequalHalfway>_BigO 0.02 N 0.02 N
BM_bcmp<uint8_t, InequalHalfway>_RMS 29 % 29 %
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 9547 ns 9547 ns 74343 bytes_read/iteration=1000k bytes_read/sec=99.8971G/s eltcnt=19.0318G eltcnt/sec=26.8159G/s
BM_bcmp<uint16_t, InequalHalfway>_BigO 0.04 N 0.04 N
BM_bcmp<uint16_t, InequalHalfway>_RMS 29 % 29 %
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 9396 ns 9394 ns 73521 bytes_read/iteration=1000k bytes_read/sec=101.518G/s eltcnt=9.41069G eltcnt/sec=13.6255G/s
BM_bcmp<uint32_t, InequalHalfway>_BigO 0.08 N 0.08 N
BM_bcmp<uint32_t, InequalHalfway>_RMS 30 % 30 %
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 9499 ns 9498 ns 73802 bytes_read/iteration=1000k bytes_read/sec=100.405G/s eltcnt=4.72333G eltcnt/sec=6.73808G/s
BM_bcmp<uint64_t, InequalHalfway>_BigO 0.16 N 0.16 N
BM_bcmp<uint64_t, InequalHalfway>_RMS 28 % 28 %
Comparing build-old/test/llvm-bcmp-bench to build-new/test/llvm-bcmp-bench
Benchmark Time CPU Time Old Time New CPU Old CPU New
---------------------------------------------------------------------------------------------------------------------------------------
<...>
BM_bcmp<uint8_t, Identical>/512000 -0.9570 -0.9570 432131 18593 432101 18590
<...>
BM_bcmp<uint16_t, Identical>/256000 -0.8826 -0.8826 161408 18950 161409 18948
<...>
BM_bcmp<uint32_t, Identical>/128000 -0.7714 -0.7714 81497 18627 81488 18627
<...>
BM_bcmp<uint64_t, Identical>/64000 -0.6239 -0.6239 50138 18855 50138 18855
<...>
BM_bcmp<uint8_t, InequalHalfway>/512000 -0.9503 -0.9503 192405 9570 192392 9569
<...>
BM_bcmp<uint16_t, InequalHalfway>/256000 -0.9253 -0.9253 127858 9547 127860 9547
<...>
BM_bcmp<uint32_t, InequalHalfway>/128000 -0.8088 -0.8088 49140 9396 49140 9394
<...>
BM_bcmp<uint64_t, InequalHalfway>/64000 -0.7041 -0.7041 32101 9499 32099 9498
```
What can we tell from the benchmark?
* Performance of naive equality check somewhat improves with element size, maxing out at eltcnt/sec=1.58603G/s for uint16_t, or bytes_read/sec=19.0209G/s for uint64_t. I think that instability implies performance problems.
* Performance of `memcmp()`-aware benchmark always maxes out at around bytes_read/sec=51.2991G/s for every type. That is 2.6x the throughput of the naive variant!
* eltcnt/sec metric for the `memcmp()`-aware benchmark maxes out at eltcnt/sec=27.541G/s for uint8_t (was: eltcnt/sec=1.18491G/s, so 24x) and linearly decreases with element size. For uint64_t, it's ~4x+ the elements/second.
* The call is obviously more pricey than the loop when the element count is small. As can be seen from the full output {F8768210}, the `memcmp()` is almost universally worse, independent of the element size (and thus buffer size), when the element count is less than 8.

So all in all, the bcmp idiom does indeed pose untapped performance headroom. This diff does implement said idiom recognition. I think a reasonable test coverage is present, but do tell if there is anything obvious missing.

Now, quality. This does succeed to build and pass the test-suite, at least without any non-bundled elements. {F8768216} {F8768217}

This transform fires 91 times:
```
$ /build/test-suite/utils/compare.py -m loop-idiom.NumBCmp result-new.json
Tests: 1149
Metric: loop-idiom.NumBCmp

Program                                       result-new
MultiSourc...Benchmarks/7zip/7zip-benchmark   79.00
MultiSource/Applications/d/make_dparser       3.00
SingleSource/UnitTests/vla                    2.00
MultiSource/Applications/Burg/burg            1.00
MultiSourc.../Applications/JM/lencod/lencod   1.00
MultiSource/Applications/lemon/lemon          1.00
MultiSource/Benchmarks/Bullet/bullet          1.00
MultiSourc...e/Benchmarks/MallocBench/gs/gs   1.00
MultiSourc...gs-C/TimberWolfMC/timberwolfmc   1.00
MultiSourc...Prolangs-C/simulator/simulator   1.00
```
The size changes are:
I'm not sure what's going on with SingleSource/UnitTests/vla.test yet, did not look.
```
$ /build/test-suite/utils/compare.py -m size..text result-{old,new}.json --filter-hash
Tests: 1149
Same hash: 907 (filtered out)
Remaining: 242
Metric: size..text

Program                                       result-old   result-new   diff
test-suite...ingleSource/UnitTests/vla.test   753.00       833.00       10.6%
test-suite...marks/7zip/7zip-benchmark.test   1001697.00   966657.00    -3.5%
test-suite...ngs-C/simulator/simulator.test   32369.00     32321.00     -0.1%
test-suite...plications/d/make_dparser.test   89585.00     89505.00     -0.1%
test-suite...ce/Applications/Burg/burg.test   40817.00     40785.00     -0.1%
test-suite.../Applications/lemon/lemon.test   47281.00     47249.00     -0.1%
test-suite...TimberWolfMC/timberwolfmc.test   250065.00    250113.00    0.0%
test-suite...chmarks/MallocBench/gs/gs.test   149889.00    149873.00    -0.0%
test-suite...ications/JM/lencod/lencod.test   769585.00    769569.00    -0.0%
test-suite.../Benchmarks/Bullet/bullet.test   770049.00    770049.00    0.0%
test-suite...HMARK_ANISTROPIC_DIFFUSION/128   NaN          NaN          nan%
test-suite...HMARK_ANISTROPIC_DIFFUSION/256   NaN          NaN          nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/64   NaN          NaN          nan%
test-suite...CHMARK_ANISTROPIC_DIFFUSION/32   NaN          NaN          nan%
test-suite...ENCHMARK_BILATERAL_FILTER/64/4   NaN          NaN          nan%
Geomean difference                                                      nan%

       result-old     result-new     diff
count  1.000000e+01   10.00000       10.000000
mean   3.152090e+05   311695.40000   0.006749
std    3.790398e+05   372091.42232   0.036605
min    7.530000e+02   833.00000      -0.034981
25%    4.243300e+04   42401.00000    -0.000866
50%    1.197370e+05   119689.00000   -0.000392
75%    6.397050e+05   639705.00000   -0.000005
max    1.001697e+06   966657.00000   0.106242
```
I don't have timings though.

And now to the code. The basic idea is to completely replace the whole loop. If we can't fully kill it, don't transform. I have left one or two comments in the code, so hopefully it can be understood.

Also, there are a few TODOs that I have left for follow-ups:
* widening of `memcmp()`/`bcmp()`
* step smaller than the comparison size
* Metadata propagation
* more than two blocks as long as there is still a single backedge?
* ??? Reviewers: reames, fhahn, mkazantsev, chandlerc, craig.topper, courbet Reviewed By: courbet Subscribers: miyuki, hiraditya, xbolva00, nikic, jfb, gchatelet, courbet, llvm-commits, mclow.lists Tags: #llvm Differential Revision: https://reviews.llvm.org/D61144 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374662 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 15:35:32 +00:00
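As a source-level illustration of the idiom described in the commit above, the sketch below pairs an element-wise equality loop with a single `memcmp()` call over the same byte range. This is only an analogy under stated assumptions: the actual transform operates on LLVM IR inside LoopIdiomRecognize, the names `equal_loop` and `equal_memcmp` are hypothetical, and the equivalence is assumed to hold only for element types without padding whose `operator==` is a plain bitwise comparison (e.g. fixed-width integers).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Element-wise comparison loop: bail out on the first mismatch.
template <class T>
bool equal_loop(const T* a, const T* a_end, const T* b) noexcept {
  for (; a != a_end; ++a, ++b) {
    if (*a != *b) return false;
  }
  return true;
}

// Hypothetical source-level equivalent: one library call over the whole
// byte range. Only valid when bitwise equality matches operator==.
template <class T>
bool equal_memcmp(const T* a, const T* a_end, const T* b) noexcept {
  assert(a <= a_end);
  const std::size_t bytes = static_cast<std::size_t>(a_end - a) * sizeof(T);
  return std::memcmp(a, b, bytes) == 0;
}
```

Both functions only answer "equal or not", which is why the weaker `bcmp()` contract is sufficient for the replacement.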
Roman Lebedev	d31582e52f	[NFC][LoopIdiom] Add bcmp loop idiom miscompile test from PR43206. The transform forgot to check SCEV loop scopes. https://bugs.llvm.org/show_bug.cgi?id=43206 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374661 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 15:35:16 +00:00
Roman Lebedev	c88bcb8459	[NFC][LoopIdiom] Move one bcmp test into the proper place git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374660 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 15:35:09 +00:00
Simon Pilgrim	276965a576	[X86][SSE] Avoid unnecessary PMOVZX in v4i8 sum reduction This should go away once D66004 has landed and we can simplify shuffle chains using demanded elts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374658 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 15:19:13 +00:00
Simon Pilgrim	fb4c141673	[CostModel][X86] Improve sum reduction costs. I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374655 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 13:21:50 +00:00
Joel E. Denny	8abc716d19	[lit] Fix a few oversights in r374651 that broke some bots git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374653 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 12:32:00 +00:00
Joel E. Denny	76211a391a	[lit] Fix internal diff's --strip-trailing-cr and use it Using GNU diff, `--strip-trailing-cr` removes a `\r` appearing before a `\n` at the end of a line. Without this patch, lit's internal diff only removes `\r` if it appears as the last character. That seems useless. This patch fixes that. This patch also adds `--strip-trailing-cr` to some tests that fail on Windows bots when D68664 is applied. Based on what I see in the bot logs, I think the following is happening. In each test there, lit diff is comparing a file with `\r\n` line endings to a file with `\n` line endings. Without D68664, lit diff reads those files with Python's universal newlines support activated, causing `\r` to be dropped. However, with D68664, lit diff reads the files in binary mode instead and thus reports that every line is different, just as GNU diff does (at least under Ubuntu). Adding `--strip-trailing-cr` to those tests restores the previous behavior while permitting the behavior of lit diff to be more like GNU diff. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D68839 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374652 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 11:58:30 +00:00
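The difference between the two stripping behaviors described above can be sketched on a single line of text. lit itself is written in Python; the C++ helper below is a hypothetical illustration (the name `strip_trailing_cr` as a function is an assumption, not lit's code). It also strips a bare trailing `\r` on an unterminated line, which is an assumption about the desired behavior rather than something the commit message states.

```cpp
#include <string>

// Hypothetical helper (not lit's implementation): drop a '\r' that sits
// immediately before the terminating '\n'. A '\r' elsewhere in the line
// is left alone.
inline std::string strip_trailing_cr(std::string line) {
  const bool has_newline = !line.empty() && line.back() == '\n';
  if (has_newline) line.pop_back();
  // Assumption: a '\r' ending an unterminated line is stripped as well.
  if (!line.empty() && line.back() == '\r') line.pop_back();
  if (has_newline) line.push_back('\n');
  return line;
}
```

With this shape, `"abc\r\n"` and `"abc\n"` normalize to the same string, which is what lets a CRLF file compare equal to an LF file.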
Craig Topper	ec998a2f4e	[X86] Use pack instructions for packus/ssat truncate patterns when 256-bit is the largest legal vector and the result type is at least 256 bits. Since the input type is larger than 256-bits we'll need to do some concatenating to reassemble the results. The pack instructions' ability to concatenate while packing makes this a shorter/faster sequence. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374643 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 07:59:29 +00:00
Craig Topper	6e215a38ba	[X86] Test SKX cpu in the vector-trunc-packus/ssat/usat.ll tests instead of min-legal-vector-width.ll This adds "min-legal-vector-width"="256" function attributes to all the tests for a larger than 256-bit input. Also switch any larger than 512-bit inputs to use a load. This makes the arguments consistent with min-legal-vector-width attribute which should usually be at least as large as the arguments. The SKX configuration will avoid using zmm registers on the modified test cases. For many of them we should use something closer to the AVX2 codegen with pack instructions instead of the avx512 saturating truncates. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374642 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 07:59:24 +00:00
Simon Atanasyan	684885085c	[mips] Fix `loadImmediate` calls when load non-address values. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374640 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 07:42:44 +00:00
Vitaly Buka	f5557e5aac	Revert 374629 "[sancov] Accommodate sancov and coverage report server for use under Windows" http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/27650/steps/ninja%20check%201/logs/stdio http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/31759 http://lab.llvm.org:8011/builders/clang-s390x-linux-lnt/builds/15095 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/21075 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/31759 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374636 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 05:23:43 +00:00
Zi Xuan Wu	704914973a	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, the interleave count and vector factor depend on the number of target registers. Currently, it does not estimate register pressure separately for different register classes (especially for scalar types, where float should not be counted together with int), so the estimate is not accurate. Specifically, it causes too much interleaving/unrolling, resulting in too many register spills in the loop body and hurting performance. So we need to classify the register classes at the IR level; importantly, these are abstract register classes, not the target register classes the backend provides in its td file. They are used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit some set of those types. For example, on the POWER target, the register count is special when VSX is enabled. When VSX is enabled, the number of int scalar registers is 32 (GPR) and float is 64 (VSR), while int and float vector registers are both 64 (VSR). So there should be 2 kinds of register class when VSX is enabled, and 3 kinds of register class when VSX is NOT enabled. On the POWER target, this gives a big (+~30%) performance improvement in one specific benchmark (503.bwaves_r) of SPEC2017 and no other obvious degradations. Differential revision: https://reviews.llvm.org/D67148 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374634 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 02:53:04 +00:00
Vitaly Buka	d8a8b4bb7c	[sancov] Accommodate sancov and coverage report server for use under Windows Summary: This patch makes the following changes to SanCov and its complementary Python script in order to resolve issues pertaining to non-UNIX file paths in JSON symbolization information: * Convert all paths to use forward slash. * Update `coverage-report-server.py` to correctly handle paths to sources which contain spaces. * Remove Linux platform restriction for all SanCov unit tests. All SanCov tests passed when run on my local Windows machine. Patch by Douglas Gliner. Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman Reviewed By: vitalybuka Subscribers: vsk, Dor1s, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D51018 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374629 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 02:29:26 +00:00
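The first bullet of the sancov change above (converting all paths to forward slashes) amounts to a simple character replacement. The following is a minimal sketch of that idea only; `to_forward_slashes` is a hypothetical name and this is not the code from the patch, which may well rely on LLVM's own path utilities instead.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not taken from the patch): rewrite Windows-style
// separators so the same path string is emitted on every platform.
// Spaces and all other characters are left untouched.
inline std::string to_forward_slashes(std::string path) {
  std::replace(path.begin(), path.end(), '\\', '/');
  return path;
}
```

Normalizing at the point where paths are written into the JSON keeps the consumer script free of platform-specific separator handling.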
Vitaly Buka	6dcc8f11e9	[sancov] Use LLVM Support library JSON writer in favor of individual implementation Summary: In this diff, I've replaced the individual implementation of `JSONWriter` with `json::OStream` provided by `llvm/Support/JSON.h`. Important Note: The output format of the JSON is considerably different compared to the original implementation. Important differences include: * New line for each entry in an array (should make diffs cleaner) * No space between keys and colon in attributed object entries. * Attributes with empty strings will now print the attribute name and a quote pair rather than excluding the attribute altogether Examples of these differences can be seen in the changes to the sancov tests which compare the JSON output. Patch by Douglas Gliner. Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman Subscribers: mehdi_amini, dexonsmith, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D68752 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374628 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 02:29:24 +00:00
Vedant Kumar	35cba0afc9	[llvm-profdata] Make "malformed-ptr-to-counter-array.test" textual As pointed out in https://reviews.llvm.org/D66979 post-commit, making this test textual would make it more maintainable. Differential Revision: https://reviews.llvm.org/D68718 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374617 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 00:23:15 +00:00
Craig Topper	f68c4aef85	[X86] Fold a VTRUNCS/VTRUNCUS+store into a saturating truncating store. We already did this for VTRUNCUS with a specific combination of types. This extends this to VTRUNCS and handles any types where a truncating store is legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374615 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 00:01:08 +00:00
Craig Topper	d42eac67f5	[X86] Add test case showing missing opportunity to fold vmovsdb into a store after type legalization. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374614 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-12 00:00:59 +00:00
Stanislav Mekhanoshin	1158580213	[AMDGPU] Use GCN prefix in dpp_combine.mir. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374607 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 22:28:04 +00:00
Stanislav Mekhanoshin	c409f7029b	[AMDGPU] link dpp pseudos and real instructions on gfx10 This defaults to zero fi operand, but we do not expose it anyway. Should we expose it later it needs to be added to the pseudo. This enables dpp combining on gfx10. Differential Revision: https://reviews.llvm.org/D68888 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374604 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 22:03:36 +00:00
David Blaikie	86b0b371bb	DebugInfo: Use base address selection entries for debug_loc Unify the range and loc emission (for both DWARFv4 and DWARFv5 style lists) and take advantage of that unification to use strategic base addresses for loclists. Differential Revision: https://reviews.llvm.org/D68620 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374600 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 21:52:41 +00:00
Simon Atanasyan	99343bd7e0	[mips] Store 64-bit `li.d` operand as a single 8-byte value Currently the assembler generates two consecutive `.4byte` directives to store the 64-bit `li.d` operand. The first directive stores the high 4 bytes of the value. The second directive stores the low 4 bytes of the value. But on a 64-bit system we load this value at once and get the wrong result if the system is little-endian. This patch fixes the bug. It stores the `li.d` operand as a single 8-byte value. Differential Revision: https://reviews.llvm.org/D68778 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374598 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 21:51:33 +00:00
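The little-endian failure described in the commit above can be reproduced with a small byte-level model. This sketch is an illustration under assumptions, not the assembler's code: `emit_le`, `emit_two_4byte_high_first`, `emit_single_8byte` and `load_le64` are hypothetical names, and little-endian layout is simulated with explicit shifts so the result does not depend on the host.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 8>;

// Write the low `size` bytes of `value` at `offset`, least significant first.
inline void emit_le(Bytes& out, std::size_t offset, std::uint64_t value,
                    std::size_t size) {
  for (std::size_t i = 0; i < size; ++i)
    out[offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Old scheme: two 4-byte stores, high word first.
inline Bytes emit_two_4byte_high_first(std::uint64_t bits) {
  Bytes out{};
  emit_le(out, 0, bits >> 32, 4);
  emit_le(out, 4, bits & 0xFFFFFFFFu, 4);
  return out;
}

// New scheme: a single 8-byte store.
inline Bytes emit_single_8byte(std::uint64_t bits) {
  Bytes out{};
  emit_le(out, 0, bits, 8);
  return out;
}

// Model of one 64-bit little-endian load of the stored bytes.
inline std::uint64_t load_le64(const Bytes& in) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < 8; ++i)
    v |= static_cast<std::uint64_t>(in[i]) << (8 * i);
  return v;
}
```

For the bit pattern of the double 1.0 (`0x3FF0000000000000`), the two-word layout reads back with its halves swapped, while the single 8-byte layout round-trips unchanged.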
Simon Atanasyan	a7f4c4e9f4	[mips] Use fewer instructions to load zero into FPR by li.s / li.d pseudos If `li.s` or `li.d` loads zero into an FPR, it's not necessary to load zero into the `at` GPR register and then move its value into a floating point register. We can use the `zero / $0` register as the source register. Differential Revision: https://reviews.llvm.org/D68777 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374597 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 21:51:23 +00:00
David Green	043238b41c	Revert 374373: [Codegen] Alter the default promotion for saturating adds and subs This commit is not extending the promoted integers as it should. Reverting whilst I look into the details. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374592 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 20:33:03 +00:00
Quentin Colombet	f4daa70c54	[GISel][CallLowering] Enable vector support in argument lowering The existing code is actually already enough to handle the splitting of vector arguments but we were lacking a test case. This commit adds a test case for vector argument lowering involving splitting and enables the related support in call lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374589 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 20:22:57 +00:00
David Blaikie	9a6583f43f	llvm-dwarfdump: Add verbose printing for debug_loclists git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374582 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 19:06:35 +00:00
Simon Pilgrim	072da28304	[X86][SSE] Add support for v4i8 add reduction git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374579 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 17:54:15 +00:00
Sanjay Patel	976a21434d	[AArch64] add tests for (v)select-of-constants; NFC These are copied from existing test files in x86/PPC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374568 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 16:10:23 +00:00
Kerry McLaughlin	aa6063f761	[AArch64][SVE] Implement sdot and udot (lane) intrinsics Summary: Implements the following arithmetic intrinsics: - int_aarch64_sve_sdot - int_aarch64_sve_sdot_lane - int_aarch64_sve_udot - int_aarch64_sve_udot_lane This patch includes tests for the Subdivide4Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67551 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374566 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 15:53:41 +00:00
David Tenty	f6f4d20ff0	[AIX] Use .space instead of .zero in assembly Summary: The AIX system assembler does not understand .zero, so we should prefer emitting .space. Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68815 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374564 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 15:07:28 +00:00
Dmitry Preobrazhensky	c51d5e1850	[AMDGPU][MC][GFX9][GFX10] Corrected number of src operands for ds_[read/write]_addtid_b32 See https://bugs.llvm.org/show_bug.cgi?id=37941 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68787 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374561 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 14:53:26 +00:00
Dmitry Preobrazhensky	dcea518d64	[AMDGPU][MC][GFX6][GFX7][GFX10] Added instructions buffer_atomic_[fcmpswap/fmin/fmax]* See https://bugs.llvm.org/show_bug.cgi?id=28232 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68788 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374559 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 14:44:51 +00:00
Dmitry Preobrazhensky	95015bdb1a	[AMDGPU][MC][GFX10] Enabled null for 64-bit dst operands See https://bugs.llvm.org/show_bug.cgi?id=43524 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D68785 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374557 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 14:35:11 +00:00
Sanjay Patel	f2c20cc531	[DAGCombiner] fold vselect-of-constants to shift The diffs suggest that we are missing some more basic analysis/transforms, but this keeps the vector path in sync with the scalar (rL374397). This is again a preliminary step for introducing the reverse transform in IR as proposed in D63382. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374555 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 14:17:56 +00:00
Dmitry Preobrazhensky	702b2af051	[AMDGPU][MC] Corrected parsing of optional operands See https://bugs.llvm.org/show_bug.cgi?id=43486 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D68350 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374553 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 14:05:09 +00:00
Simon Atanasyan	f9fcba66e2	[mips] Follow-up to r374544. Fix test case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374548 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 12:58:37 +00:00
Kai Nacke	f84bcd1a41	[Tests] Output of od can be lower or upper case (llvm-objcopy/yaml2obj). The command `od -t x` is used to dump data in hex format. The LIT tests assume that the hex characters are in lowercase. However, there are also platforms which use uppercase letters. To solve this issue the tests are updated to use the new `--ignore-case` option of FileCheck. Reviewers: Bigcheese, jakehehrlich, rupprecht, espindola, alexshap, jhenderson Differential Revision: https://reviews.llvm.org/D68693 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374547 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 12:50:57 +00:00
Simon Atanasyan	1699e3d2a0	[mips] Fix loading "double" immediate into a GPR and FPR If a "double" (64-bit) value has zero low 32 bits, it's possible to load such a value into a GP/FP register as an instruction immediate. But currently the assembler loads only the high 32 bits of the value. For example, if the target register is a GPR, the `li.d $4, 1.0` instruction converts into the `lui $4, 16368` one. As a result, we get `0x3FF00000` in the register, while the correct representation of the `1.0` value is `0x3FF0000000000000`. The patch fixes that. Differential Revision: https://reviews.llvm.org/D68776 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374544 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 12:33:12 +00:00
George Rimar	8726a32c2b	[llvm-readobj] - Remove excessive fields when dumping "Version symbols". This removes a few fields that are not useful: "Section Name", "Address", "Offset" and "Link" (they duplicated the information available under the "Sections [" tag). Differential revision: https://reviews.llvm.org/D68704 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374541 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-11 12:27:11 +00:00
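
The two [mips] `li.d` entries above (r374544 and r374598) both rest on the bit layout of a 64-bit double. The arithmetic can be checked with a small Python sketch. It is not taken from the LLVM sources: the helper names (`double_bits`, `lui`, `emit_two_4byte`, `emit_8byte`, `load_8byte`) are hypothetical, and the sketch only models byte layout, not the assembler itself.

```python
import struct


def double_bits(value: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of value as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def lui(imm16: int) -> int:
    """Model MIPS lui: place a 16-bit immediate in bits 31..16.

    Sign extension on 64-bit targets is ignored here (assumption);
    it does not matter for the positive value used below.
    """
    return (imm16 & 0xFFFF) << 16


def emit_two_4byte(bits: int, little_endian: bool = True) -> bytes:
    """Old layout: one 4-byte directive for the high word, then one for the low word."""
    order = "<" if little_endian else ">"
    hi, lo = bits >> 32, bits & 0xFFFFFFFF
    return struct.pack(order + "I", hi) + struct.pack(order + "I", lo)


def emit_8byte(bits: int, little_endian: bool = True) -> bytes:
    """New layout: the operand stored as a single 8-byte value."""
    order = "<" if little_endian else ">"
    return struct.pack(order + "Q", bits)


def load_8byte(data: bytes, little_endian: bool = True) -> int:
    """Model a single 64-bit load of the stored operand."""
    order = "<" if little_endian else ">"
    return struct.unpack(order + "Q", data)[0]


if __name__ == "__main__":
    one = double_bits(1.0)
    print(hex(one))          # full 64-bit pattern of 1.0
    print(one >> 48)         # top 16 bits, the lui immediate
    print(hex(lui(16368)))   # what lui alone leaves in the register

    tenth = double_bits(0.1)
    # On little-endian, the two-word layout reads back with the words swapped.
    print(hex(load_8byte(emit_two_4byte(tenth))))
    print(hex(load_8byte(emit_8byte(tenth))))
```

The top 16 bits of `1.0` are `0x3FF0`, which is 16368 in decimal, so `lui $4, 16368` alone leaves only the high 32 bits in the register. The two-word layout round-trips on big-endian but not on little-endian, which matches the failure described in r374598.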

65934 Commits