
Interleaved memory accesses are grouped and vectorized into vector load/store
and shufflevector. E.g.:

  for (i = 0; i < N; i+=2) {
    a = A[i];     // load of even element
    b = A[i+1];   // load of odd element
    ...           // operations on a, b, c, d
    A[i] = c;     // store of even element
    A[i+1] = d;   // store of odd element
  }

The loads of even and odd elements are identified as an interleave load group,
which will be transformed into vectorized IR like:

  %wide.vec = load <8 x i32>, <8 x i32>* %ptr
  %vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %vec.odd  = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>

The stores of even and odd elements are identified as an interleave store group,
which will be transformed into vectorized IR like:

  %interleaved.vec = shufflevector <4 x i32> %vec.even, <4 x i32> %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
  store <8 x i32> %interleaved.vec, <8 x i32>* %ptr

This optimization is currently disabled by default. To try it, add
'-enable-interleaved-mem-accesses=true'.

llvm-svn: 239291
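For readers who prefer plain C, below is a minimal sketch of what the widened
loop effectively computes for a vectorization factor of 4. The function name,
the '+1'/'-1' per-lane operations, and the epilogue comment are illustrative
placeholders, not part of the patch:

  /* Conceptual effect of vectorizing the interleaved loop with VF = 4:
   * each vector iteration loads 8 contiguous i32s, de-interleaves them
   * into even/odd lanes (the two shufflevectors above), operates on the
   * lanes, re-interleaves them (the single shufflevector), and stores 8
   * contiguous i32s back. */
  void widened_by_4(int *A, int N) {
    int i, lane;
    for (i = 0; i + 8 <= N; i += 8) {
      int even[4], odd[4], c[4], d[4];
      for (lane = 0; lane < 4; ++lane) {   /* wide load + de-interleave */
        even[lane] = A[i + 2 * lane];
        odd[lane]  = A[i + 2 * lane + 1];
      }
      for (lane = 0; lane < 4; ++lane) {   /* placeholder per-lane work */
        c[lane] = even[lane] + 1;
        d[lane] = odd[lane] - 1;
      }
      for (lane = 0; lane < 4; ++lane) {   /* re-interleave + wide store */
        A[i + 2 * lane]     = c[lane];
        A[i + 2 * lane + 1] = d[lane];
      }
    }
    /* A scalar epilogue would handle any remaining elements. */
  }

To actually exercise the flag, it should be enough to pass it to opt alongside
the loop vectorizer, e.g. 'opt -loop-vectorize
-enable-interleaved-mem-accesses=true -S in.ll', or to clang via '-mllvm
-enable-interleaved-mem-accesses=true'; these invocations are assumptions based
on how cl::opt flags are normally surfaced, not commands taken from the patch.
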
Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n), however
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is
forming this expression:

  ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

  (-1 * (trunc i64 undef to i32))

//===---------------------------------------------------------------------===//
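A quick sanity check of the first opportunity above: under the usual chrec
semantics, {1,+,3,+,2}<loop> evaluates at iteration k to
1 + 3*k + 2*(k*(k-1)/2) = (k+1)^2, so assuming the loop body runs %n times
(exit value at k = %n - 1), the result is %n * %n. That also agrees with the
complicated expression quoted above, since
-2 + (%n - 2)*(%n - 1) + 3*%n = %n * %n. A small self-contained C program
replaying the recurrence (the bound of 1000 is arbitrary):

  #include <assert.h>

  int main(void) {
    long long n, k, r, step;
    for (n = 1; n <= 1000; ++n) {
      /* Replay {1,+,3,+,2}: r starts at 1, its increment starts at 3 and
       * itself grows by 2 each iteration. */
      r = 1;
      step = 3;
      for (k = 1; k < n; ++k) {
        r += step;
        step += 2;
      }
      assert(r == n * n);   /* the simple closed form suggested above */
    }
    return 0;
  }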