archived-llvm/lib/Target/AMDGPU
Alexander Timofeev d767830f39 [AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
This change exploits the fact that LDS reads may be reordered even if they
access the same location.

Prior to this change, the algorithm stopped as soon as it encountered any
memory access between the loads that were expected to be merged, even
though a read-after-read conflict cannot affect execution correctness.
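
The relaxed check described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: the `Inst` and `Kind`
types and the `canMergeLoads` function are hypothetical stand-ins for the
real machine-instruction classes, and the sketch conservatively treats every
intervening store as a conflict without any alias analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
enum class Kind { Load, Store, Other };

struct Inst {
  Kind kind;
  int address; // simplified: an exact LDS address; unused for Kind::Other
};

// Returns true if the load at index `second` may be paired with the load at
// index `first`, given the instructions that sit between them.
bool canMergeLoads(const std::vector<Inst> &block, std::size_t first,
                   std::size_t second) {
  assert(first < second && second < block.size());
  assert(block[first].kind == Kind::Load && block[second].kind == Kind::Load);

  for (std::size_t i = first + 1; i < second; ++i) {
    // Read-after-read: an intervening load cannot change what either of the
    // two candidate loads observes, so it does not block the merge.
    if (block[i].kind == Kind::Load)
      continue;
    // Assumption of this sketch: any intervening store blocks the merge.
    if (block[i].kind == Kind::Store)
      return false;
  }
  return true;
}
```

Under the earlier behavior described in this message, the loop would have
returned false on the intervening load as well.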

Improves the performance of manually loop-unrolled hcBLAS CGEMM kernels by 44%.
An improvement is also expected on any long sequence of reads from LDS.

Differential Revision: https://reviews.llvm.org/D25944

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285919 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-03 14:37:13 +00:00