Use hyperbarrier by default on all architectures

All architectures except x86_64 used the linear barrier implementation
by default, which does not perform well for larger numbers of threads
because the master thread has to wait for every other thread in turn.

Improvement in PARALLEL overhead (EPCC) with this patch on a Power8
system (2 sockets x 10 cores x 8 threads, OMP_PLACES=cores):

 20 threads:  4.55us -> 3.49us
 40 threads:  8.84us -> 4.06us
 80 threads: 19.18us -> 4.74us
160 threads: 54.22us -> 6.73us

Differential Revision: https://reviews.llvm.org/D40358

llvm-svn: 320152
Jonas Hahnfeld 2017-12-08 15:07:07 +00:00
parent ce528acf0d
commit e628ab4c65

@@ -76,25 +76,16 @@ size_t __kmp_malloc_pool_incr = KMP_DEFAULT_MALLOC_POOL_INCR;
 // Barrier method defaults, settings, and strings.
 // branch factor = 2^branch_bits (only relevant for tree & hyper barrier types)
-#if KMP_ARCH_X86_64
 kmp_uint32 __kmp_barrier_gather_bb_dflt = 2;
 /* branch_factor = 4 */ /* hyper2: C78980 */
 kmp_uint32 __kmp_barrier_release_bb_dflt = 2;
 /* branch_factor = 4 */ /* hyper2: C78980 */
-#else
-kmp_uint32 __kmp_barrier_gather_bb_dflt = 2;
-/* branch_factor = 4 */ /* communication in core for MIC */
-kmp_uint32 __kmp_barrier_release_bb_dflt = 2;
-/* branch_factor = 4 */ /* communication in core for MIC */
-#endif // KMP_ARCH_X86_64
-#if KMP_ARCH_X86_64
-kmp_bar_pat_e __kmp_barrier_gather_pat_dflt = bp_hyper_bar; /* hyper2: C78980 */
-kmp_bar_pat_e __kmp_barrier_release_pat_dflt =
-bp_hyper_bar; /* hyper2: C78980 */
-#else
-kmp_bar_pat_e __kmp_barrier_gather_pat_dflt = bp_linear_bar;
-kmp_bar_pat_e __kmp_barrier_release_pat_dflt = bp_linear_bar;
-#endif
+kmp_bar_pat_e __kmp_barrier_gather_pat_dflt = bp_hyper_bar;
+/* hyper2: C78980 */
+kmp_bar_pat_e __kmp_barrier_release_pat_dflt = bp_hyper_bar;
+/* hyper2: C78980 */
 kmp_uint32 __kmp_barrier_gather_branch_bits[bs_last_barrier] = {0};
 kmp_uint32 __kmp_barrier_release_branch_bits[bs_last_barrier] = {0};
 kmp_bar_pat_e __kmp_barrier_gather_pattern[bs_last_barrier] = {bp_linear_bar};