I apologise in advance for the size of this check-in. At Intel we
understand that this is not friendly, and we are working to change our
internal development process so that we can make features available
more frequently and in finer-grained (more functional) chunks.
Unfortunately we haven't got that in place yet, and unpicking this
into multiple separate check-ins would be non-trivial, so please bear
with me on this one. We should be better in the future.

Apologies over, what do we have here?

GCC 4.9 compatibility
---------------------
* We have implemented the new entry points emitted by GCC 4.9 for
functionality that GCC 4.8 already provided, so code compiled with
GCC 4.9 that used to work will continue to do so. However, there are
some other new entry points (associated with task cancellation) which
are not implemented, so user code compiled by GCC 4.9 that uses these
new features will not link against the LLVM runtime. (It remains
unclear how to handle those entry points, since the GCC interface has
potentially unpleasant performance implications for join barriers even
when cancellation is not used.)

--- new parallel entry points ---
New entry points that are not OpenMP 4.0 related. These are fully
implemented (a sketch of the shim pattern follows the list):
      GOMP_parallel_loop_dynamic()
      GOMP_parallel_loop_guided()
      GOMP_parallel_loop_runtime()
      GOMP_parallel_loop_static()
      GOMP_parallel_sections()
      GOMP_parallel()
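
GCC 4.9 fuses the GOMP_parallel_start()/GOMP_parallel_end() pair used
by GCC 4.8 into a single fork-run-join entry point, which is why the
new functions can be expressed in terms of existing functionality. A
minimal sketch of the shim pattern (this is not the actual libiomp5
source):

    // GCC 4.8 era entry points, already supported by the runtime.
    extern "C" void GOMP_parallel_start(void (*fn)(void *), void *data,
                                        unsigned num_threads);
    extern "C" void GOMP_parallel_end(void);

    // GCC 4.9 entry point: fork, run the body on the master too, join.
    extern "C" void GOMP_parallel(void (*fn)(void *), void *data,
                                  unsigned num_threads, unsigned /*flags*/) {
        GOMP_parallel_start(fn, data, num_threads); // fork the team
        fn(data);                                   // master executes the body
        GOMP_parallel_end();                        // join barrier
    }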

--- cancellation entry points ---
Currently these only give a runtime error if OMP_CANCELLATION is true,
because our plain barriers don't check for cancellation while waiting
(a sketch of the stub behaviour follows the list):
        GOMP_barrier_cancel()
        GOMP_cancel()
        GOMP_cancellation_point()
        GOMP_loop_end_cancel()
        GOMP_sections_end_cancel()
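
A sketch of the current stub behaviour, with crude hypothetical
environment parsing; the real code uses the runtime's own environment
handling and the NoGompCancellation message added in this check-in:

    #include <cstdio>
    #include <cstdlib>

    // No-op when cancellation is disabled; otherwise we can only give a
    // runtime error, because the plain barriers never poll for
    // cancellation while waiting.
    extern "C" bool GOMP_cancellation_point(int /*which*/) {
        const char *env = std::getenv("OMP_CANCELLATION");
        if (env == nullptr)   // crude check; real parsing accepts true/1/on
            return false;     // cancellation disabled: nothing to do
        std::fprintf(stderr,
                     "libgomp cancellation is not currently supported\n");
        std::exit(EXIT_FAILURE);
    }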

--- taskgroup entry points ---
These are fully implemented (a usage sketch follows the list):
      GOMP_taskgroup_start()
      GOMP_taskgroup_end()
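
For reference, the user-level construct these two entry points
implement; GCC emits the start/end pair around the structured block:

    // #pragma omp taskgroup -> GOMP_taskgroup_start();
    // end of the block      -> GOMP_taskgroup_end(), which waits for
    //                          all tasks created inside the group.
    void process(int *v, int n) {
        #pragma omp taskgroup
        {
            for (int i = 0; i < n; ++i) {
                #pragma omp task firstprivate(i) shared(v)
                v[i] *= 2;
            }
        }   // all tasks in the group are complete past this point
    }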

--- target entry points ---
These are empty, as they are in libgomp:
     GOMP_target()
     GOMP_target_data()
     GOMP_target_end_data()
     GOMP_target_update()
     GOMP_teams()

Improvements in Barriers and Fork/Join
--------------------------------------
* Barrier and fork/join code is now in its own file (which makes it
easier to understand and modify).
* Wait/release code is now templated and in its own file; suspend/resume code is also templated
* There's a new, hierarchical, barrier, which exploits the
cache-hierarchy of the Intel(r) Xeon Phi(tm) coprocessor to improve
fork/join and barrier performance.
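
The gist of the hierarchical scheme, sketched for the gather phase
only (release and sense reversal elided; this is an illustration, not
the new kmp_barrier.cpp code):

    #include <atomic>

    struct counter_t { std::atomic<int> arrived{0}; };

    // The HW threads of one core meet at a counter resident in their
    // shared cache; only the last to arrive carries the result up a
    // level, so a single thread per core writes the machine-wide
    // counter instead of every thread doing so.
    void barrier_gather(counter_t &core, counter_t &machine,
                        int threads_per_core) {
        if (core.arrived.fetch_add(1) + 1 < threads_per_core)
            return;                    // in real code: spin on release flag
        machine.arrived.fetch_add(1);  // one cross-core write per core
    }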

***BEWARE*** the new source files have *not* been added to the legacy
CMake build system. If you want to use that, fixes will be required.

Statistics Collection Code
--------------------------
* New code has been added to collect application statistics (only if
this is enabled when the library is compiled; by default it is not).
The statistics code itself is generally useful; the lightweight timing
code uses the x86 rdtsc instruction, so it will require changes for
other architectures (a sketch of the timing idea follows this list).
The intent of this code is not for users to tune their codes, but
rather
1) to time code paths inside the runtime, and
2) to gather general properties of OpenMP codes, so as to focus
attention on the OpenMP features that are most used.
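
A sketch of the lightweight timing idea, assuming GCC/Clang-style
intrinsics (the real kmp_stats_timing code may differ in detail):

    #include <cstdint>
    #if defined(__x86_64__) || defined(__i386__)
    #include <x86intrin.h>     // __rdtsc()
    #endif

    // Read the x86 time-stamp counter directly rather than calling the
    // OS; as noted above, other architectures need their own counter.
    static inline uint64_t read_cycles() {
    #if defined(__x86_64__) || defined(__i386__)
        return __rdtsc();
    #else
    #   error "rdtsc-based timing needs a per-architecture replacement"
    #endif
    }

    // Typical use: bracket a runtime code path and accumulate ticks.
    //   uint64_t t0 = read_cycles();
    //   ... code path under measurement ...
    //   elapsed += read_cycles() - t0;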

Nested Hot Teams
----------------
* The runtime now maintains more state to reduce the overhead of
creating and destroying inner parallel teams. This improves the
performance of code that repeatedly uses nested parallelism with the
same resource allocation. Set the new KMP_HOT_TEAMS_MAX_LEVEL
environment variable to a nesting depth to enable this (and, of
course, set OMP_NESTED=true to enable nested parallelism at all). A
usage sketch follows.
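
A usage sketch; the environment settings and thread counts here are
part of the example, not defaults:

    // export OMP_NESTED=true
    // export KMP_HOT_TEAMS_MAX_LEVEL=2
    //
    // With the above set, the inner teams are kept "hot" across
    // iterations (threads and team structures retained) rather than
    // torn down and recreated, since the resource shape is the same
    // every time through the loop.
    void step(int iter);                             // hypothetical work

    void solver(int iterations) {
        for (int i = 0; i < iterations; ++i) {
            #pragma omp parallel num_threads(4)      // level 1
            {
                #pragma omp parallel num_threads(15) // level 2: hot team
                step(i);
            }
        }
    }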

Improved Intel(r) VTune(tm) Amplifier support
---------------------------------------------
* The runtime provides additional information to VTune via the
itt_notify interface, allowing it to display better OpenMP-specific
analyses of load imbalance.

Support for OpenMP Composite Statements
---------------------------------------
* Implemented the new entry points required by some of the OpenMP 4.1
composite statements.

Improved ifdefs
---------------
* More separation of concepts ("Does this platform do X?") from
platforms ("Are we compiling for platform Y?"), which should simplify
future porting. The pattern is illustrated below.
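
An illustration of the pattern, in the style of kmp_os.h (the exact
macro names here are illustrative):

    // Platform question answered once...
    #if defined(__linux__)
    # define KMP_OS_LINUX 1
    #else
    # define KMP_OS_LINUX 0
    #endif

    // ...then a concept macro derived from it, instead of repeating
    // the platform test at every use site.
    #if KMP_OS_LINUX && (defined(__x86_64__) || defined(__i386__))
    # define KMP_USE_FUTEX 1
    #else
    # define KMP_USE_FUTEX 0
    #endif

    #if KMP_USE_FUTEX
    // ... futex-based suspend/resume ...
    #endif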


ScaleMP* contribution
---------------------
Stack padding has been added to improve performance in ScaleMP's
environment, where cross-node coherency is managed at the page level
(see the sketch below).
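
The same motivation drives the new KMP_USE_INTERNODE_ALIGNMENT option
in the CMake changes below. A sketch of the idea (the type name is
illustrative):

    // Where coherency is per 4096-byte page, two hot objects sharing
    // a page ping-pong between nodes, so pad to a page rather than to
    // a cache line.
    #if KMP_USE_INTERNODE_ALIGNMENT
    # define KMP_HOT_ALIGN 4096   // one hot object per coherency page
    #else
    # define KMP_HOT_ALIGN 64     // ordinary cache-line alignment
    #endif

    struct alignas(KMP_HOT_ALIGN) padded_lock_t {
        int lk;   // the compiler pads the struct to KMP_HOT_ALIGN bytes
    };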

Redesign of wait and release code
---------------------------------
The code is simplified and its performance improved; a sketch of the
templated pattern follows.
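
A sketch of the templated wait/release pattern (cf. the new
__kmp_wait_32/__kmp_wait_64/__kmp_wait_oncore exports below); the
names here are placeholders, not the real internals:

    #include <atomic>

    template <typename FlagType>
    struct flag_t {
        std::atomic<FlagType> loc;   // location threads wait on
        FlagType checker;            // value that means "released"
        bool done() const {
            return loc.load(std::memory_order_acquire) == checker;
        }
    };

    template <typename FlagType>
    void wait(flag_t<FlagType> *f) {
        while (!f->done()) { /* real code spins, yields, then sleeps */ }
    }

    template <typename FlagType>
    void release(flag_t<FlagType> *f) {
        f->loc.store(f->checker, std::memory_order_release);
    }

    // The width-specific entry points are then thin instantiations,
    // e.g. a 32-bit flag for __kmp_wait_32 and a 64-bit flag for
    // __kmp_wait_64.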

Bug Fixes
---------
    * Fixes for Windows multiple processor groups.
    * Fix Fortran module build on Linux: offload attribute added.
    * Fix entry names for the distribute-parallel-loop construct to be consistent with the compiler codegen.
    * Fix an inconsistent error message for the KMP_PLACE_THREADS environment variable.

llvm-svn: 219214
Author: Jim Cownie 2014-10-07 16:25:50 +00:00
Parent: f72fa67fc3
Commit: 4cc4bb4c60
121 changed files with 31865 additions and 22060 deletions


@ -9,6 +9,7 @@ beautification by scripts. The fields are: name (N), email (E), web-address
(S).
N: Carlo Bertolli
W: http://ibm.com
D: IBM contributor to PowerPC support in CMake files and elsewhere.
N: Sunita Chandrasekaran
@ -28,6 +29,11 @@ D: Created the runtime.
N: Matthias Muller
D: Contributor to testsuite from OpenUH
N: Tal Nevo
E: tal@scalemp.com
D: ScaleMP contributor to improve runtime performance there.
W: http://scalemp.com
N: Pavel Neytchev
D: Contributor to testsuite from OpenUH


@ -14,7 +14,7 @@ software contained in this directory tree is included below.
University of Illinois/NCSA
Open Source License
Copyright (c) 1997-2013 Intel Corporation
Copyright (c) 1997-2014 Intel Corporation
All rights reserved.
@ -51,7 +51,7 @@ SOFTWARE.
==============================================================================
Copyright (c) 1997-2013 Intel Corporation
Copyright (c) 1997-2014 Intel Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@ -137,8 +137,7 @@ libiomp5 version can be 5 or 4.
OpenMP version can be either 40 or 30.
-Dmic_arch=knc|knf
Intel(R) MIC Architecture. Can be
knf (Knights Ferry) or knc (Knights Corner).
Intel(R) MIC Architecture, can be knf or knc.
This value is ignored if os != mic
-Dmic_os=lin|bsd


@ -238,8 +238,18 @@ set(USE_BUILDPL_RULES false CACHE BOOL "Should the build follow build.pl rules/r
# - these predefined linker flags should work for Windows, Mac, and True Linux for the most popular compilers/linkers
set(USE_PREDEFINED_LINKER_FLAGS true CACHE BOOL "Should the build use the predefined linker flags in CommonFlags.cmake?")
# - On multinode systems, larger alignment is desired to avoid false sharing
set(USE_INTERNODE_ALIGNMENT false CACHE BOOL "Should larger alignment (4096 bytes) be used for some locks and data structures?")
# - libgomp drop-in compatibility
if(${LINUX} AND NOT ${PPC64})
set(USE_VERSION_SYMBOLS true CACHE BOOL "Should version symbols be used? These provide binary compatibility with libgomp.")
else()
set(USE_VERSION_SYMBOLS false CACHE BOOL "Should version symbols be used? These provide binary compatibility with libgomp.")
endif()
# - TSX based locks have __asm code which can be troublesome for some compilers. This feature is also x86 specific.
if({${IA32} OR ${INTEL64})
if(${IA32} OR ${INTEL64})
set(USE_ADAPTIVE_LOCKS true CACHE BOOL "Should TSX-based lock be compiled (adaptive lock in kmp_lock.cpp). These are x86 specific.")
else()
set(USE_ADAPTIVE_LOCKS false CACHE BOOL "Should TSX-based lock be compiled (adaptive lock in kmp_lock.cpp). These are x86 specific.")


@ -37,7 +37,7 @@ omp_root: The path to the top-level directory containing the top-level
current working directory.
omp_os: Operating system. By default, the build will attempt to
detect this. Currently supports "linux", "freebsd", "macos", and
detect this. Currently supports "linux", "freebsd", "macos", and
"windows".
arch: Architecture. By default, the build will attempt to
@ -72,36 +72,44 @@ There is also an experimental CMake build system. This is *not* yet
supported for production use and resulting binaries have not been checked
for compatibility.
On OS X* machines, it is possible to build universal (or fat) libraries which
include both IA-32 architecture and Intel(R) 64 architecture objects in a
single archive; just build the 32 and 32e libraries separately, then invoke
make again with a special argument as follows:
make compiler=clang build_args=fat
Supported RTL Build Configurations
==================================
Supported Architectures: IA-32 architecture, Intel(R) 64, and
Intel(R) Many Integrated Core Architecture
--------------------------------------------
| icc/icl | gcc | clang |
--------------|---------------|--------------------------|
| Linux* OS | Yes(1,5) | Yes(2,4) | Yes(4,6,7) |
| FreeBSD* | No | No | Yes(4,6,7) |
| OS X* | Yes(1,3,4) | No | Yes(4,6,7) |
| Windows* OS | Yes(1,4) | No | No |
----------------------------------------------------------
----------------------------------------------
| icc/icl | gcc | clang |
--------------|---------------|----------------------------|
| Linux* OS | Yes(1,5) | Yes(2,4) | Yes(4,6,7) |
| FreeBSD* | No | No | Yes(4,6,7,8) |
| OS X* | Yes(1,3,4) | No | Yes(4,6,7) |
| Windows* OS | Yes(1,4) | No | No |
------------------------------------------------------------
(1) On IA-32 architecture and Intel(R) 64, icc/icl versions 12.x are
supported (12.1 is recommended).
(2) gcc version 4.6.2 is supported.
(2) GCC* version 4.6.2 is supported.
(3) For icc on OS X*, OS X* version 10.5.8 is supported.
(4) Intel(R) Many Integrated Core Architecture not supported.
(5) On Intel(R) Many Integrated Core Architecture, icc/icl versions 13.0
or later are required.
(6) clang version 3.3 is supported.
(7) clang currently does not offer a software-implemented 128 bit extended
(6) Clang* version 3.3 is supported.
(7) Clang* currently does not offer a software-implemented 128 bit extended
precision type. Thus, all entry points reliant on this type are removed
from the library and cannot be called in the user program. The following
functions are not available:
__kmpc_atomic_cmplx16_*
__kmpc_atomic_float16_*
__kmpc_atomic_*_fp
(8) Community contribution provided AS IS, not tested by Intel.
Front-end Compilers that work with this RTL
===========================================


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
###############################################################################
# This file contains additional build rules that correspond to build.pl's rules
# Building libiomp5.dbg is linux only, Windows will build libiomp5md.dll.pdb


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds Clang (clang/clang++) specific compiler dependent flags
# The flag types are:
# 1) Assembly flags


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds Clang (clang/clang++) specific compiler dependent flags
# The flag types are:
# 1) C/C++ Compiler flags
@ -19,6 +30,7 @@ function(append_compiler_specific_c_and_cxx_flags input_c_flags input_cxx_flags)
endif()
append_c_and_cxx_flags("-Wno-unused-value") # Don't warn about unused values
append_c_and_cxx_flags("-Wno-switch") # Don't warn about switch statements that don't cover entire range of values
append_c_and_cxx_flags("-Wno-deprecated-register") # Don't warn about using register keyword
set(${input_c_flags} ${${input_c_flags}} "${local_c_flags}" PARENT_SCOPE)
set(${input_cxx_flags} ${${input_cxx_flags}} "${local_cxx_flags}" PARENT_SCOPE)
endfunction()


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds the common flags independent of compiler
# The flag types are:
# 1) Assembly flags (append_asm_flags_common)
@ -71,22 +82,21 @@ function(append_linker_flags_common input_ld_flags input_ld_flags_libs)
set(local_ld_flags)
set(local_ld_flags_libs)
#################################
# Windows linker flags
if(${WINDOWS})
if(${USE_PREDEFINED_LINKER_FLAGS})
##################
# MAC linker flags
elseif(${MAC})
if(${USE_PREDEFINED_LINKER_FLAGS})
#################################
# Windows linker flags
if(${WINDOWS})
##################
# MAC linker flags
elseif(${MAC})
append_linker_flags("-single_module")
append_linker_flags("-current_version ${version}.0")
append_linker_flags("-compatibility_version ${version}.0")
endif()
#####################################################################################
# Intel(R) Many Integrated Core Architecture (Intel(R) MIC Architecture) linker flags
elseif(${MIC})
if(${USE_PREDEFINED_LINKER_FLAGS})
#####################################################################################
# Intel(R) Many Integrated Core Architecture (Intel(R) MIC Architecture) linker flags
elseif(${MIC})
append_linker_flags("-Wl,-x")
append_linker_flags("-Wl,--warn-shared-textrel") # Warn if the linker adds a DT_TEXTREL to a shared object.
append_linker_flags("-Wl,--as-needed")
@ -98,13 +108,11 @@ function(append_linker_flags_common input_ld_flags input_ld_flags_libs)
if(${STATS_GATHERING})
append_linker_flags_library("-Wl,-lstdc++") # link in standard c++ library (stats-gathering needs it)
endif()
endif()
#########################
# Unix based linker flags
else()
# For now, always include --version-script flag on Unix systems.
append_linker_flags("-Wl,--version-script=${src_dir}/exports_so.txt") # Use exports_so.txt as version script to create versioned symbols for ELF libraries
if(${USE_PREDEFINED_LINKER_FLAGS})
#########################
# Unix based linker flags
else()
# For now, always include --version-script flag on Unix systems.
append_linker_flags("-Wl,--version-script=${src_dir}/exports_so.txt") # Use exports_so.txt as version script to create versioned symbols for ELF libraries
append_linker_flags("-Wl,-z,noexecstack") # Marks the object as not requiring executable stack.
append_linker_flags("-Wl,--as-needed") # Only adds library dependencies as they are needed. (if libiomp5 actually uses a function from the library, then add it)
if(NOT ${STUBS_LIBRARY})
@ -117,8 +125,9 @@ function(append_linker_flags_common input_ld_flags input_ld_flags_libs)
append_linker_flags_library("-Wl,-ldl") # link in libdl (dynamic loader library)
endif()
endif()
endif() # if(${USE_PREDEFINED_LINKER_FLAGS})
endif() # if(${OPERATING_SYSTEM}) ...
endif() # if(${OPERATING_SYSTEM}) ...
endif() # USE_PREDEFINED_LINKER_FLAGS
set(${input_ld_flags} "${${input_ld_flags}}" "${local_ld_flags}" "${USER_LD_FLAGS}" PARENT_SCOPE)
set(${input_ld_flags_libs} "${${input_ld_flags_libs}}" "${local_ld_flags_libs}" "${USER_LD_LIB_FLAGS}" PARENT_SCOPE)


@ -42,6 +42,10 @@ function(append_cpp_flags input_cpp_flags)
endif()
append_definitions("-D INTEL_ITTNOTIFY_PREFIX=__kmp_itt_")
if(${USE_VERSION_SYMBOLS})
append_definitions("-D KMP_USE_VERSION_SYMBOLS")
endif()
#####################
# Windows definitions
if(${WINDOWS})
@ -133,6 +137,11 @@ function(append_cpp_flags input_cpp_flags)
append_definitions("-D KMP_USE_ADAPTIVE_LOCKS=0")
append_definitions("-D KMP_DEBUG_ADAPTIVE_LOCKS=0")
endif()
if(${USE_INTERNODE_ALIGNMENT})
append_definitions("-D KMP_USE_INTERNODE_ALIGNMENT=1")
else()
append_definitions("-D KMP_USE_INTERNODE_ALIGNMENT=0")
endif()
set(${input_cpp_flags} "${${input_cpp_flags}}" "${local_cpp_flags}" "${USER_CPP_FLAGS}" "$ENV{CPPFLAGS}" PARENT_SCOPE)
endfunction()


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds GNU (gcc/g++) specific compiler dependent flags
# The flag types are:
# 1) Assembly flags


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds GNU (gcc/g++) specific compiler dependent flags
# The flag types are:
# 2) C/C++ Compiler flags


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds GNU (gcc/g++) specific compiler dependent flags
# The flag types are:
# 1) Fortran Compiler flags


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds Intel(R) C Compiler / Intel(R) C++ Compiler / Intel(R) Fortran Compiler (icc/icpc/icl.exe/ifort) dependent flags
# The flag types are:
# 1) Assembly flags


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds Intel(R) C Compiler / Intel(R) C++ Compiler / Intel(R) Fortran Compiler (icc/icpc/icl.exe/ifort) dependent flags
# The flag types are:
# 2) C/C++ Compiler flags
@ -41,7 +52,6 @@ function(append_compiler_specific_c_and_cxx_flags input_c_flags input_cxx_flags)
endif()
else()
append_c_and_cxx_flags("-Wsign-compare") # warn on sign comparisons
append_c_and_cxx_flags("-Werror") # Changes all warnings to errors.
append_c_and_cxx_flags("-Qoption,cpp,--extended_float_types") # Enabled _Quad type.
append_c_and_cxx_flags("-fno-exceptions") # Exception handling table generation is disabled.
append_c_and_cxx_flags("-x c++") # Compile C files as C++ files


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds Intel(R) C Compiler / Intel(R) C++ Compiler / Intel(R) Fortran Compiler (icc/icpc/icl.exe/ifort) dependent flags
# The flag types are:
# 1) Fortran Compiler flags
@ -17,12 +28,20 @@ function(append_fortran_compiler_specific_fort_flags input_fort_flags)
append_fort_flags("-GS")
append_fort_flags("-DynamicBase")
append_fort_flags("-Zi")
# On Linux and Windows Intel(R) 64 architecture we need offload attribute
# for all Fortran entries in order to support OpenMP function calls inside device contructs
if(${INTEL64})
append_fort_flags("/Qoffload-attribute-target:mic")
endif()
else()
if(${MIC})
append_fort_flags("-mmic")
endif()
if(NOT ${MAC})
append_fort_flags("-sox")
if(${INTEL64} AND ${LINUX})
append_fort_flags("-offload-attribute-target=mic")
endif()
endif()
endif()
set(${input_fort_flags} ${${input_fort_flags}} "${local_fort_flags}" PARENT_SCOPE)


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds Microsoft Visual Studio dependent flags
# The flag types are:
# 1) Assembly flags


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
# This file holds Microsoft Visual Studio dependent flags
# The flag types are:
# 1) C/C++ Compiler flags


@ -1,3 +1,14 @@
#
#//===----------------------------------------------------------------------===//
#//
#// The LLVM Compiler Infrastructure
#//
#// This file is dual licensed under the MIT and the University of Illinois Open
#// Source Licenses. See LICENSE.txt for details.
#//
#//===----------------------------------------------------------------------===//
#
######################################################
# MICRO TESTS
# The following micro-tests are small tests to perform on
@ -219,15 +230,14 @@ if(${test_deps} AND ${tests})
set(td_exp libc.so.7 libthr.so.3 libunwind.so.5)
elseif(${LINUX})
set(td_exp libdl.so.2,libgcc_s.so.1)
if(NOT ${IA32} AND NOT ${INTEL64})
set(td_exp ${td_exp},libffi.so.6,libffi.so.5)
endif()
if(${IA32})
set(td_exp ${td_exp},libc.so.6,ld-linux.so.2)
elseif(${INTEL64})
set(td_exp ${td_exp},libc.so.6,ld-linux-x86-64.so.2)
elseif(${ARM})
set(td_exp ${td_exp},libc.so.6,ld-linux-armhf.so.3)
set(td_exp ${td_exp},libffi.so.6,libffi.so.5,libc.so.6,ld-linux-armhf.so.3)
elseif(${PPC64})
set(td_exp ${td_exp},libc.so.6,ld64.so.1)
endif()
if(${STD_CPP_LIB})
set(td_exp ${td_exp},libstdc++.so.6)


@ -69,7 +69,8 @@ endfunction()
function(set_cpp_files input_cpp_source_files)
set(local_cpp_source_files "")
if(NOT ${STUBS_LIBRARY})
#append_cpp_source_file("kmp_barrier.cpp")
append_cpp_source_file("kmp_barrier.cpp")
append_cpp_source_file("kmp_wait_release.cpp")
append_cpp_source_file("kmp_affinity.cpp")
append_cpp_source_file("kmp_dispatch.cpp")
append_cpp_source_file("kmp_lock.cpp")
@ -78,10 +79,10 @@ function(set_cpp_files input_cpp_source_files)
append_cpp_source_file("kmp_taskdeps.cpp")
append_cpp_source_file("kmp_cancel.cpp")
endif()
#if(${STATS_GATHERING})
# append_cpp_source_file("kmp_stats.cpp")
# append_cpp_source_file("kmp_stats_timing.cpp")
#endif()
if(${STATS_GATHERING})
append_cpp_source_file("kmp_stats.cpp")
append_cpp_source_file("kmp_stats_timing.cpp")
endif()
endif()
set(${input_cpp_source_files} "${local_cpp_source_files}" PARENT_SCOPE)

[File diff suppressed because it is too large]


@ -1539,7 +1539,7 @@ INCLUDE_FILE_PATTERNS =
# undefined via #undef or recursively expanded use the := operator
# instead of the = operator.
PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1
PREDEFINED = OMP_30_ENABLED=1, OMP_40_ENABLED=1, KMP_STATS_ENABLED=1
# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then
# this tag can be used to specify a list of macro names that should be expanded.


@ -208,6 +208,7 @@ are documented in different modules.
- @ref THREADPRIVATE functions to support thread private data, copyin etc
- @ref SYNCHRONIZATION functions to support `omp critical`, `omp barrier`, `omp master`, reductions etc
- @ref ATOMIC_OPS functions to support atomic operations
- @ref STATS_GATHERING macros to support developer profiling of libiomp5
- Documentation on tasking has still to be written...
@section SEC_EXAMPLES Examples
@ -319,8 +320,29 @@ These functions are used for implementing barriers.
@defgroup THREADPRIVATE Thread private data support
These functions support copyin/out and thread private data.
@defgroup STATS_GATHERING Statistics Gathering from OMPTB
These macros support profiling the libiomp5 library. Use --stats=on when building with build.pl to enable
and then use the KMP_* macros to profile (through counts or clock ticks) libiomp5 during execution of an OpenMP program.
@section sec_stats_env_vars Environment Variables
This section describes the environment variables relevent to stats-gathering in libiomp5
@code
KMP_STATS_FILE
@endcode
This environment variable is set to an output filename that will be appended *NOT OVERWRITTEN* if it exists. If this environment variable is undefined, the statistics will be output to stderr
@code
KMP_STATS_THREADS
@endcode
This environment variable indicates to print thread-specific statistics as well as aggregate statistics. Each thread's statistics will be shown as well as the collective sum of all threads. The values "true", "on", "1", "yes" will all indicate to print per thread statistics.
@defgroup TASKING Tasking support
These functions support are used to implement tasking constructs.
These functions support tasking constructs.
@defgroup USER User visible functions
These functions can be called directly by the user, but are runtime library specific, rather than being OpenMP interfaces.
*/


@ -1,6 +1,6 @@
# defs.mk
# $Revision: 42061 $
# $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
# $Revision: 42951 $
# $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
#
#//===----------------------------------------------------------------------===//


@ -161,10 +161,8 @@
# Regular entry points
__kmp_wait_yield_4
__kmp_wait_yield_8
__kmp_wait_sleep
__kmp_fork_call
__kmp_invoke_microtask
__kmp_release
__kmp_launch_monitor
__kmp_launch_worker
__kmp_reap_monitor
@ -192,6 +190,14 @@
_You_must_link_with_Microsoft_OpenMP_library DATA
%endif
__kmp_wait_32
__kmp_wait_64
__kmp_wait_oncore
__kmp_release_32
__kmp_release_64
__kmp_release_oncore
# VT_getthid 1
# vtgthid 2
@ -360,6 +366,18 @@ kmpc_set_defaults 224
__kmpc_cancel 244
__kmpc_cancellationpoint 245
__kmpc_cancel_barrier 246
__kmpc_dist_for_static_init_4 247
__kmpc_dist_for_static_init_4u 248
__kmpc_dist_for_static_init_8 249
__kmpc_dist_for_static_init_8u 250
__kmpc_dist_dispatch_init_4 251
__kmpc_dist_dispatch_init_4u 252
__kmpc_dist_dispatch_init_8 253
__kmpc_dist_dispatch_init_8u 254
__kmpc_team_static_init_4 255
__kmpc_team_static_init_4u 256
__kmpc_team_static_init_8 257
__kmpc_team_static_init_8u 258
%endif # OMP_40
%endif


@ -40,6 +40,8 @@ VERSION {
__kmp_thread_pool;
__kmp_thread_pool_nth;
__kmp_reset_stats;
#if USE_ITT_BUILD
#
# ITT support.
@ -64,8 +66,12 @@ VERSION {
__kmp_launch_worker;
__kmp_reap_monitor;
__kmp_reap_worker;
__kmp_release;
__kmp_wait_sleep;
__kmp_release_32;
__kmp_release_64;
__kmp_release_oncore;
__kmp_wait_32;
__kmp_wait_64;
__kmp_wait_oncore;
__kmp_wait_yield_4;
__kmp_wait_yield_8;


@ -1,7 +1,7 @@
/*
* extractExternal.cpp
* $Revision: 42181 $
* $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
* $Revision: 43084 $
* $Date: 2014-04-15 09:15:14 -0500 (Tue, 15 Apr 2014) $
*/


@ -1,6 +1,6 @@
# en_US.txt #
# $Revision: 42659 $
# $Date: 2013-09-12 09:22:48 -0500 (Thu, 12 Sep 2013) $
# $Revision: 43419 $
# $Date: 2014-08-27 14:59:52 -0500 (Wed, 27 Aug 2014) $
#
#//===----------------------------------------------------------------------===//
@ -40,7 +40,7 @@ Language "English"
Country "USA"
LangId "1033"
Version "2"
Revision "20130911"
Revision "20140827"
@ -290,7 +290,7 @@ ChangeThreadAffMaskError "Cannot change thread affinity mask."
ThreadsMigrate "%1$s: Threads may migrate across %2$d innermost levels of machine"
DecreaseToThreads "%1$s: decrease to %2$d threads"
IncreaseToThreads "%1$s: increase to %2$d threads"
BoundToOSProcSet "%1$s: Internal thread %2$d bound to OS proc set %3$s"
OBSOLETE "%1$s: Internal thread %2$d bound to OS proc set %3$s"
AffCapableUseCpuinfo "%1$s: Affinity capable, using cpuinfo file"
AffUseGlobCpuid "%1$s: Affinity capable, using global cpuid info"
AffCapableUseFlat "%1$s: Affinity capable, using default \"flat\" topology"
@ -395,9 +395,17 @@ AffThrPlaceInvalid "%1$s: invalid value \"%2$s\", valid format is \"nC
AffThrPlaceUnsupported "KMP_PLACE_THREADS ignored: unsupported architecture."
AffThrPlaceManyCores "KMP_PLACE_THREADS ignored: too many cores requested."
SyntaxErrorUsing "%1$s: syntax error, using %2$s."
AdaptiveNotSupported "%1$s: Adaptive locks are not supported; using queuing."
EnvSyntaxError "%1$s: Invalid symbols found. Check the value \"%2$s\"."
EnvSpacesNotAllowed "%1$s: Spaces between digits are not allowed \"%2$s\"."
AdaptiveNotSupported "%1$s: Adaptive locks are not supported; using queuing."
EnvSyntaxError "%1$s: Invalid symbols found. Check the value \"%2$s\"."
EnvSpacesNotAllowed "%1$s: Spaces between digits are not allowed \"%2$s\"."
BoundToOSProcSet "%1$s: pid %2$d thread %3$d bound to OS proc set %4$s"
CnsLoopIncrIllegal "%1$s error: parallel loop increment and condition are inconsistent."
NoGompCancellation "libgomp cancellation is not currently supported."
AffThrPlaceNonUniform "KMP_PLACE_THREADS ignored: non-uniform topology."
AffThrPlaceNonThreeLevel "KMP_PLACE_THREADS ignored: only three-level topology is supported."
AffGranTopGroup "%1$s: granularity=%2$s is not supported with KMP_TOPOLOGY_METHOD=group. Using \"granularity=fine\"."
AffGranGroupType "%1$s: granularity=group is not supported with KMP_AFFINITY=%2$s. Using \"granularity=core\"."
# --------------------------------------------------------------------------------------------------
-*- HINTS -*-


@ -1,7 +1,7 @@
/*
* include/25/iomp.h.var
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,6 +1,6 @@
! include/25/iomp_lib.h.var
! $Revision: 42061 $
! $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//


@ -1,7 +1,7 @@
/*
* include/25/omp.h.var
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,6 +1,6 @@
! include/25/omp_lib.f.var
! $Revision: 42181 $
! $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//
@ -314,7 +314,7 @@
!dec$ else
!***
!*** On Windows* OS IA-32 architecture, the Fortran entry points have an
!*** On Windows* OS IA-32 architecture, the Fortran entry points have an
!*** underscore prepended.
!***


@ -1,6 +1,6 @@
! include/25/omp_lib.f90.var
! $Revision: 42061 $
! $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//


@ -1,6 +1,6 @@
! include/25/omp_lib.h.var
! $Revision: 42181 $
! $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//
@ -301,7 +301,7 @@
!dec$ else
!***
!*** On Windows* OS IA-32 architecture, the Fortran entry points have an
!*** On Windows* OS IA-32 architecture, the Fortran entry points have an
!*** underscore prepended.
!***


@ -1,7 +1,7 @@
/*
* include/30/iomp.h.var
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,6 +1,6 @@
! include/30/iomp_lib.h.var
! $Revision: 42061 $
! $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//


@ -1,7 +1,7 @@
/*
* include/30/omp.h.var
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,6 +1,6 @@
! include/30/omp_lib.f.var
! $Revision: 42181 $
! $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//


@ -1,6 +1,6 @@
! include/30/omp_lib.f90.var
! $Revision: 42061 $
! $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//


@ -1,6 +1,6 @@
! include/30/omp_lib.h.var
! $Revision: 42181 $
! $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
! $Revision: 42951 $
! $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
!
!//===----------------------------------------------------------------------===//


@ -91,7 +91,7 @@
} kmp_cancel_kind_t;
extern int __KAI_KMPC_CONVENTION kmp_get_cancellation_status(kmp_cancel_kind_t);
# undef __KAI_KMPC_CONVENTION
/* Warning:

[File diff suppressed because it is too large]


@ -1,7 +1,7 @@
/*
* kmp_affinity.cpp -- affinity management
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43473 $
* $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
*/
@ -19,7 +19,7 @@
#include "kmp_i18n.h"
#include "kmp_io.h"
#include "kmp_str.h"
#include "kmp_wrapper_getpid.h"
#if KMP_AFFINITY_SUPPORTED
@ -49,7 +49,7 @@ __kmp_affinity_print_mask(char *buf, int buf_len, kmp_affin_mask_t *mask)
return buf;
}
sprintf(scan, "{%ld", i);
sprintf(scan, "{%ld", (long)i);
while (*scan != '\0') scan++;
i++;
for (; i < KMP_CPU_SETSIZE; i++) {
@ -66,7 +66,7 @@ __kmp_affinity_print_mask(char *buf, int buf_len, kmp_affin_mask_t *mask)
if (end - scan < 15) {
break;
}
sprintf(scan, ",%-ld", i);
sprintf(scan, ",%-ld", (long)i);
while (*scan != '\0') scan++;
}
if (i < KMP_CPU_SETSIZE) {
@ -89,7 +89,6 @@ __kmp_affinity_entire_machine_mask(kmp_affin_mask_t *mask)
if (__kmp_num_proc_groups > 1) {
int group;
struct GROUP_AFFINITY ga;
KMP_DEBUG_ASSERT(__kmp_GetActiveProcessorCount != NULL);
for (group = 0; group < __kmp_num_proc_groups; group++) {
int i;
@ -315,6 +314,106 @@ __kmp_affinity_cmp_Address_child_num(const void *a, const void *b)
return 0;
}
/** A structure for holding machine-specific hierarchy info to be computed once at init. */
class hierarchy_info {
public:
/** Typical levels are threads/core, cores/package or socket, packages/node, nodes/machine,
etc. We don't want to get specific with nomenclature */
static const kmp_uint32 maxLevels=7;
/** This is specifically the depth of the machine configuration hierarchy, in terms of the
number of levels along the longest path from root to any leaf. It corresponds to the
number of entries in numPerLevel if we exclude all but one trailing 1. */
kmp_uint32 depth;
kmp_uint32 base_depth;
kmp_uint32 base_num_threads;
bool uninitialized;
/** Level 0 corresponds to leaves. numPerLevel[i] is the number of children the parent of a
node at level i has. For example, if we have a machine with 4 packages, 4 cores/package
and 2 HT per core, then numPerLevel = {2, 4, 4, 1, 1}. All empty levels are set to 1. */
kmp_uint32 numPerLevel[maxLevels];
kmp_uint32 skipPerLevel[maxLevels];
void deriveLevels(AddrUnsPair *adr2os, int num_addrs) {
int hier_depth = adr2os[0].first.depth;
int level = 0;
for (int i=hier_depth-1; i>=0; --i) {
int max = -1;
for (int j=0; j<num_addrs; ++j) {
int next = adr2os[j].first.childNums[i];
if (next > max) max = next;
}
numPerLevel[level] = max+1;
++level;
}
}
hierarchy_info() : depth(1), uninitialized(true) {}
void init(AddrUnsPair *adr2os, int num_addrs)
{
uninitialized = false;
for (kmp_uint32 i=0; i<maxLevels; ++i) { // init numPerLevel[*] to 1 item per level
numPerLevel[i] = 1;
skipPerLevel[i] = 1;
}
// Sort table by physical ID
if (adr2os) {
qsort(adr2os, num_addrs, sizeof(*adr2os), __kmp_affinity_cmp_Address_labels);
deriveLevels(adr2os, num_addrs);
}
else {
numPerLevel[0] = 4;
numPerLevel[1] = num_addrs/4;
if (num_addrs%4) numPerLevel[1]++;
}
base_num_threads = num_addrs;
for (int i=maxLevels-1; i>=0; --i) // count non-empty levels to get depth
if (numPerLevel[i] != 1 || depth > 1) // only count one top-level '1'
depth++;
kmp_uint32 branch = 4;
if (numPerLevel[0] == 1) branch = num_addrs/4;
if (branch<4) branch=4;
for (kmp_uint32 d=0; d<depth-1; ++d) { // optimize hierarchy width
while (numPerLevel[d] > branch || (d==0 && numPerLevel[d]>4)) { // max 4 on level 0!
if (numPerLevel[d] & 1) numPerLevel[d]++;
numPerLevel[d] = numPerLevel[d] >> 1;
if (numPerLevel[d+1] == 1) depth++;
numPerLevel[d+1] = numPerLevel[d+1] << 1;
}
if(numPerLevel[0] == 1) {
branch = branch >> 1;
if (branch<4) branch = 4;
}
}
for (kmp_uint32 i=1; i<depth; ++i)
skipPerLevel[i] = numPerLevel[i-1] * skipPerLevel[i-1];
base_depth = depth;
}
};
static hierarchy_info machine_hierarchy;
void __kmp_get_hierarchy(kmp_uint32 nproc, kmp_bstate_t *thr_bar) {
if (machine_hierarchy.uninitialized)
machine_hierarchy.init(NULL, nproc);
if (nproc <= machine_hierarchy.base_num_threads)
machine_hierarchy.depth = machine_hierarchy.base_depth;
KMP_DEBUG_ASSERT(machine_hierarchy.depth > 0);
while (nproc > machine_hierarchy.skipPerLevel[machine_hierarchy.depth-1]) {
machine_hierarchy.depth++;
machine_hierarchy.skipPerLevel[machine_hierarchy.depth-1] = 2*machine_hierarchy.skipPerLevel[machine_hierarchy.depth-2];
}
thr_bar->depth = machine_hierarchy.depth;
thr_bar->base_leaf_kids = (kmp_uint8)machine_hierarchy.numPerLevel[0]-1;
thr_bar->skip_per_level = machine_hierarchy.skipPerLevel;
}
//
// When sorting by labels, __kmp_affinity_assign_child_nums() must first be
@ -1963,7 +2062,7 @@ __kmp_affinity_create_cpuinfo_map(AddrUnsPair **address2os, int *line,
// A newline has signalled the end of the processor record.
// Check that there aren't too many procs specified.
//
if (num_avail == __kmp_xproc) {
if ((int)num_avail == __kmp_xproc) {
CLEANUP_THREAD_INFO;
*msg_id = kmp_i18n_str_TooManyEntries;
return -1;
@ -2587,7 +2686,7 @@ static int nextNewMask;
#define ADD_MASK_OSID(_osId,_osId2Mask,_maxOsId) \
{ \
if (((_osId) > _maxOsId) || \
(! KMP_CPU_ISSET((_osId), KMP_CPU_INDEX(_osId2Mask, (_osId))))) {\
(! KMP_CPU_ISSET((_osId), KMP_CPU_INDEX((_osId2Mask), (_osId))))) { \
if (__kmp_affinity_verbose || (__kmp_affinity_warnings \
&& (__kmp_affinity_type != affinity_none))) { \
KMP_WARNING(AffIgnoreInvalidProcID, _osId); \
@ -3045,14 +3144,15 @@ __kmp_process_place(const char **scan, kmp_affin_mask_t *osId2Mask,
(*setSize)++;
}
*scan = next; // skip num
}
}
else {
KMP_ASSERT2(0, "bad explicit places list");
}
}
static void
//static void
void
__kmp_affinity_process_placelist(kmp_affin_mask_t **out_masks,
unsigned int *out_numMasks, const char *placelist,
kmp_affin_mask_t *osId2Mask, int maxOsId)
@ -3109,71 +3209,41 @@ __kmp_affinity_process_placelist(kmp_affin_mask_t **out_masks,
// valid follow sets are ',' ':' and EOL
//
SKIP_WS(scan);
int stride;
if (*scan == '\0' || *scan == ',') {
int i;
for (i = 0; i < count; i++) {
int j;
if (setSize == 0) {
break;
}
ADD_MASK(tempMask);
setSize = 0;
for (j = __kmp_affin_mask_size * CHAR_BIT - 1; j > 0; j--) {
//
// Use a temp var in case macro is changed to evaluate
// args multiple times.
//
if (KMP_CPU_ISSET(j - 1, tempMask)) {
KMP_CPU_SET(j, tempMask);
setSize++;
}
else {
KMP_CPU_CLR(j, tempMask);
}
}
for (; j >= 0; j--) {
KMP_CPU_CLR(j, tempMask);
}
}
KMP_CPU_ZERO(tempMask);
setSize = 0;
stride = +1;
}
else {
KMP_ASSERT2(*scan == ':', "bad explicit places list");
scan++; // skip ':'
if (*scan == '\0') {
//
// Read stride parameter
//
int sign = +1;
for (;;) {
SKIP_WS(scan);
if (*scan == '+') {
scan++; // skip '+'
continue;
}
if (*scan == '-') {
sign *= -1;
scan++; // skip '-'
continue;
}
break;
}
scan++; // skip ','
continue;
}
KMP_ASSERT2(*scan == ':', "bad explicit places list");
scan++; // skip ':'
//
// Read stride parameter
//
int sign = +1;
for (;;) {
SKIP_WS(scan);
if (*scan == '+') {
scan++; // skip '+'
continue;
}
if (*scan == '-') {
sign *= -1;
scan++; // skip '-'
continue;
}
break;
KMP_ASSERT2((*scan >= '0') && (*scan <= '9'),
"bad explicit places list");
next = scan;
SKIP_DIGITS(next);
stride = __kmp_str_to_int(scan, *next);
KMP_DEBUG_ASSERT(stride >= 0);
scan = next;
stride *= sign;
}
SKIP_WS(scan);
KMP_ASSERT2((*scan >= '0') && (*scan <= '9'),
"bad explicit places list");
next = scan;
SKIP_DIGITS(next);
int stride = __kmp_str_to_int(scan, *next);
KMP_DEBUG_ASSERT(stride >= 0);
scan = next;
stride *= sign;
if (stride > 0) {
int i;
@ -3185,12 +3255,20 @@ __kmp_affinity_process_placelist(kmp_affin_mask_t **out_masks,
ADD_MASK(tempMask);
setSize = 0;
for (j = __kmp_affin_mask_size * CHAR_BIT - 1; j >= stride; j--) {
if (KMP_CPU_ISSET(j - stride, tempMask)) {
KMP_CPU_SET(j, tempMask);
setSize++;
if (! KMP_CPU_ISSET(j - stride, tempMask)) {
KMP_CPU_CLR(j, tempMask);
}
else if ((j > maxOsId) ||
(! KMP_CPU_ISSET(j, KMP_CPU_INDEX(osId2Mask, j)))) {
if (__kmp_affinity_verbose || (__kmp_affinity_warnings
&& (__kmp_affinity_type != affinity_none))) {
KMP_WARNING(AffIgnoreInvalidProcID, j);
}
KMP_CPU_CLR(j, tempMask);
}
else {
KMP_CPU_CLR(j, tempMask);
KMP_CPU_SET(j, tempMask);
setSize++;
}
}
for (; j >= 0; j--) {
@ -3201,23 +3279,31 @@ __kmp_affinity_process_placelist(kmp_affin_mask_t **out_masks,
else {
int i;
for (i = 0; i < count; i++) {
unsigned j;
int j;
if (setSize == 0) {
break;
}
ADD_MASK(tempMask);
setSize = 0;
for (j = 0; j < (__kmp_affin_mask_size * CHAR_BIT) + stride;
for (j = 0; j < ((int)__kmp_affin_mask_size * CHAR_BIT) + stride;
j++) {
if (KMP_CPU_ISSET(j - stride, tempMask)) {
if (! KMP_CPU_ISSET(j - stride, tempMask)) {
KMP_CPU_CLR(j, tempMask);
}
else if ((j > maxOsId) ||
(! KMP_CPU_ISSET(j, KMP_CPU_INDEX(osId2Mask, j)))) {
if (__kmp_affinity_verbose || (__kmp_affinity_warnings
&& (__kmp_affinity_type != affinity_none))) {
KMP_WARNING(AffIgnoreInvalidProcID, j);
}
KMP_CPU_CLR(j, tempMask);
}
else {
KMP_CPU_SET(j, tempMask);
setSize++;
}
else {
KMP_CPU_CLR(j, tempMask);
}
}
for (; j < __kmp_affin_mask_size * CHAR_BIT; j++) {
for (; j < (int)__kmp_affin_mask_size * CHAR_BIT; j++) {
KMP_CPU_CLR(j, tempMask);
}
}
@ -3270,9 +3356,13 @@ __kmp_apply_thread_places(AddrUnsPair **pAddr, int depth)
}
__kmp_place_num_cores = nCoresPerPkg; // use all available cores
}
if ( !__kmp_affinity_uniform_topology() || depth != 3 ) {
KMP_WARNING( AffThrPlaceUnsupported );
return; // don't support non-uniform topology or not-3-level architecture
if ( !__kmp_affinity_uniform_topology() ) {
KMP_WARNING( AffThrPlaceNonUniform );
return; // don't support non-uniform topology
}
if ( depth != 3 ) {
KMP_WARNING( AffThrPlaceNonThreeLevel );
return; // don't support not-3-level topology
}
if ( __kmp_place_num_threads_per_core == 0 ) {
__kmp_place_num_threads_per_core = __kmp_nThreadsPerCore; // use all HW contexts
@ -3400,18 +3490,14 @@ __kmp_aux_affinity_initialize(void)
}
if (depth < 0) {
if ((msg_id != kmp_i18n_null)
&& (__kmp_affinity_verbose || (__kmp_affinity_warnings
&& (__kmp_affinity_type != affinity_none)))) {
# if KMP_MIC
if (__kmp_affinity_verbose) {
if (__kmp_affinity_verbose) {
if (msg_id != kmp_i18n_null) {
KMP_INFORM(AffInfoStrStr, "KMP_AFFINITY", __kmp_i18n_catgets(msg_id),
KMP_I18N_STR(DecodingLegacyAPIC));
}
# else
KMP_WARNING(AffInfoStrStr, "KMP_AFFINITY", __kmp_i18n_catgets(msg_id),
KMP_I18N_STR(DecodingLegacyAPIC));
# endif
else {
KMP_INFORM(AffInfoStr, "KMP_AFFINITY", KMP_I18N_STR(DecodingLegacyAPIC));
}
}
file_name = NULL;
@ -3428,19 +3514,13 @@ __kmp_aux_affinity_initialize(void)
# if KMP_OS_LINUX
if (depth < 0) {
if ((msg_id != kmp_i18n_null)
&& (__kmp_affinity_verbose || (__kmp_affinity_warnings
&& (__kmp_affinity_type != affinity_none)))) {
# if KMP_MIC
if (__kmp_affinity_verbose) {
if (__kmp_affinity_verbose) {
if (msg_id != kmp_i18n_null) {
KMP_INFORM(AffStrParseFilename, "KMP_AFFINITY", __kmp_i18n_catgets(msg_id), "/proc/cpuinfo");
}
# else
KMP_WARNING(AffStrParseFilename, "KMP_AFFINITY", __kmp_i18n_catgets(msg_id), "/proc/cpuinfo");
# endif
}
else if (__kmp_affinity_verbose) {
KMP_INFORM(AffParseFilename, "KMP_AFFINITY", "/proc/cpuinfo");
else {
KMP_INFORM(AffParseFilename, "KMP_AFFINITY", "/proc/cpuinfo");
}
}
FILE *f = fopen("/proc/cpuinfo", "r");
@ -3461,20 +3541,32 @@ __kmp_aux_affinity_initialize(void)
# endif /* KMP_OS_LINUX */
# if KMP_OS_WINDOWS && KMP_ARCH_X86_64
if ((depth < 0) && (__kmp_num_proc_groups > 1)) {
if (__kmp_affinity_verbose) {
KMP_INFORM(AffWindowsProcGroupMap, "KMP_AFFINITY");
}
depth = __kmp_affinity_create_proc_group_map(&address2os, &msg_id);
KMP_ASSERT(depth != 0);
}
# endif /* KMP_OS_WINDOWS && KMP_ARCH_X86_64 */
if (depth < 0) {
if (msg_id != kmp_i18n_null
&& (__kmp_affinity_verbose || (__kmp_affinity_warnings
&& (__kmp_affinity_type != affinity_none)))) {
if (__kmp_affinity_verbose && (msg_id != kmp_i18n_null)) {
if (file_name == NULL) {
KMP_WARNING(UsingFlatOS, __kmp_i18n_catgets(msg_id));
KMP_INFORM(UsingFlatOS, __kmp_i18n_catgets(msg_id));
}
else if (line == 0) {
KMP_WARNING(UsingFlatOSFile, file_name, __kmp_i18n_catgets(msg_id));
KMP_INFORM(UsingFlatOSFile, file_name, __kmp_i18n_catgets(msg_id));
}
else {
KMP_WARNING(UsingFlatOSFileLine, file_name, line, __kmp_i18n_catgets(msg_id));
KMP_INFORM(UsingFlatOSFileLine, file_name, line, __kmp_i18n_catgets(msg_id));
}
}
// FIXME - print msg if msg_id = kmp_i18n_null ???
file_name = "";
depth = __kmp_affinity_create_flat_map(&address2os, &msg_id);
@ -3508,7 +3600,6 @@ __kmp_aux_affinity_initialize(void)
KMP_ASSERT(address2os == NULL);
return;
}
if (depth < 0) {
KMP_ASSERT(msg_id != kmp_i18n_null);
KMP_FATAL(MsgExiting, __kmp_i18n_catgets(msg_id));
@ -3526,7 +3617,6 @@ __kmp_aux_affinity_initialize(void)
KMP_ASSERT(address2os == NULL);
return;
}
if (depth < 0) {
KMP_ASSERT(msg_id != kmp_i18n_null);
KMP_FATAL(MsgExiting, __kmp_i18n_catgets(msg_id));
@ -3597,23 +3687,9 @@ __kmp_aux_affinity_initialize(void)
depth = __kmp_affinity_create_proc_group_map(&address2os, &msg_id);
KMP_ASSERT(depth != 0);
if (depth < 0) {
if ((msg_id != kmp_i18n_null)
&& (__kmp_affinity_verbose || (__kmp_affinity_warnings
&& (__kmp_affinity_type != affinity_none)))) {
KMP_WARNING(UsingFlatOS, __kmp_i18n_catgets(msg_id));
}
depth = __kmp_affinity_create_flat_map(&address2os, &msg_id);
if (depth == 0) {
KMP_ASSERT(__kmp_affinity_type == affinity_none);
KMP_ASSERT(address2os == NULL);
return;
}
// should not fail
KMP_ASSERT(depth > 0);
KMP_ASSERT(address2os != NULL);
KMP_ASSERT(msg_id != kmp_i18n_null);
KMP_FATAL(MsgExiting, __kmp_i18n_catgets(msg_id));
}
}
@ -3658,7 +3734,7 @@ __kmp_aux_affinity_initialize(void)
kmp_affin_mask_t *osId2Mask = __kmp_create_masks(&maxIndex, &numUnique,
address2os, __kmp_avail_proc);
if (__kmp_affinity_gran_levels == 0) {
KMP_DEBUG_ASSERT(numUnique == __kmp_avail_proc);
KMP_DEBUG_ASSERT((int)numUnique == __kmp_avail_proc);
}
//
@ -3852,6 +3928,7 @@ __kmp_aux_affinity_initialize(void)
}
__kmp_free(osId2Mask);
machine_hierarchy.init(address2os, __kmp_avail_proc);
}
@ -3953,7 +4030,7 @@ __kmp_affinity_set_init_mask(int gtid, int isa_root)
}
# endif
KMP_ASSERT(fullMask != NULL);
i = -1;
i = KMP_PLACE_ALL;
mask = fullMask;
}
else {
@ -4020,7 +4097,8 @@ __kmp_affinity_set_init_mask(int gtid, int isa_root)
char buf[KMP_AFFIN_MASK_PRINT_LEN];
__kmp_affinity_print_mask(buf, KMP_AFFIN_MASK_PRINT_LEN,
th->th.th_affin_mask);
KMP_INFORM(BoundToOSProcSet, "KMP_AFFINITY", gtid, buf);
KMP_INFORM(BoundToOSProcSet, "KMP_AFFINITY", (kmp_int32)getpid(), gtid,
buf);
}
# if KMP_OS_WINDOWS
@ -4058,14 +4136,14 @@ __kmp_affinity_set_place(int gtid)
// Check that the new place is within this thread's partition.
//
KMP_DEBUG_ASSERT(th->th.th_affin_mask != NULL);
KMP_DEBUG_ASSERT(th->th.th_new_place >= 0);
KMP_DEBUG_ASSERT((unsigned)th->th.th_new_place <= __kmp_affinity_num_masks);
KMP_ASSERT(th->th.th_new_place >= 0);
KMP_ASSERT((unsigned)th->th.th_new_place <= __kmp_affinity_num_masks);
if (th->th.th_first_place <= th->th.th_last_place) {
KMP_DEBUG_ASSERT((th->th.th_new_place >= th->th.th_first_place)
KMP_ASSERT((th->th.th_new_place >= th->th.th_first_place)
&& (th->th.th_new_place <= th->th.th_last_place));
}
else {
KMP_DEBUG_ASSERT((th->th.th_new_place <= th->th.th_first_place)
KMP_ASSERT((th->th.th_new_place <= th->th.th_first_place)
|| (th->th.th_new_place >= th->th.th_last_place));
}
@ -4082,7 +4160,8 @@ __kmp_affinity_set_place(int gtid)
char buf[KMP_AFFIN_MASK_PRINT_LEN];
__kmp_affinity_print_mask(buf, KMP_AFFIN_MASK_PRINT_LEN,
th->th.th_affin_mask);
KMP_INFORM(BoundToOSProcSet, "OMP_PROC_BIND", gtid, buf);
KMP_INFORM(BoundToOSProcSet, "OMP_PROC_BIND", (kmp_int32)getpid(),
gtid, buf);
}
__kmp_set_system_affinity(th->th.th_affin_mask, TRUE);
}
@ -4153,6 +4232,11 @@ __kmp_aux_set_affinity(void **mask)
th->th.th_new_place = KMP_PLACE_UNDEFINED;
th->th.th_first_place = 0;
th->th.th_last_place = __kmp_affinity_num_masks - 1;
//
// Turn off 4.0 affinity for the current tread at this parallel level.
//
th->th.th_current_task->td_icvs.proc_bind = proc_bind_false;
# endif
return retval;
@ -4207,7 +4291,6 @@ __kmp_aux_get_affinity(void **mask)
}
int
__kmp_aux_set_affinity_mask_proc(int proc, void **mask)
{
@ -4360,7 +4443,8 @@ void __kmp_balanced_affinity( int tid, int nthreads )
if (__kmp_affinity_verbose) {
char buf[KMP_AFFIN_MASK_PRINT_LEN];
__kmp_affinity_print_mask(buf, KMP_AFFIN_MASK_PRINT_LEN, mask);
KMP_INFORM(BoundToOSProcSet, "KMP_AFFINITY", tid, buf);
KMP_INFORM(BoundToOSProcSet, "KMP_AFFINITY", (kmp_int32)getpid(),
tid, buf);
}
__kmp_set_system_affinity( mask, TRUE );
} else { // Non-uniform topology
@ -4535,7 +4619,8 @@ void __kmp_balanced_affinity( int tid, int nthreads )
if (__kmp_affinity_verbose) {
char buf[KMP_AFFIN_MASK_PRINT_LEN];
__kmp_affinity_print_mask(buf, KMP_AFFIN_MASK_PRINT_LEN, mask);
KMP_INFORM(BoundToOSProcSet, "KMP_AFFINITY", tid, buf);
KMP_INFORM(BoundToOSProcSet, "KMP_AFFINITY", (kmp_int32)getpid(),
tid, buf);
}
__kmp_set_system_affinity( mask, TRUE );
}
@ -4543,4 +4628,50 @@ void __kmp_balanced_affinity( int tid, int nthreads )
# endif /* KMP_MIC */
#else
// affinity not supported
kmp_uint32 mac_skipPerLevel[7];
kmp_uint32 mac_depth;
kmp_uint8 mac_leaf_kids;
void __kmp_get_hierarchy(kmp_uint32 nproc, kmp_bstate_t *thr_bar) {
static int first = 1;
if (first) {
const kmp_uint32 maxLevels = 7;
kmp_uint32 numPerLevel[maxLevels];
for (kmp_uint32 i=0; i<maxLevels; ++i) { // init numPerLevel[*] to 1 item per level
numPerLevel[i] = 1;
mac_skipPerLevel[i] = 1;
}
mac_depth = 2;
numPerLevel[0] = nproc;
kmp_uint32 branch = 4;
if (numPerLevel[0] == 1) branch = nproc/4;
if (branch<4) branch=4;
for (kmp_uint32 d=0; d<mac_depth-1; ++d) { // optimize hierarchy width
while (numPerLevel[d] > branch || (d==0 && numPerLevel[d]>4)) { // max 4 on level 0!
if (numPerLevel[d] & 1) numPerLevel[d]++;
numPerLevel[d] = numPerLevel[d] >> 1;
if (numPerLevel[d+1] == 1) mac_depth++;
numPerLevel[d+1] = numPerLevel[d+1] << 1;
}
if(numPerLevel[0] == 1) {
branch = branch >> 1;
if (branch<4) branch = 4;
}
}
for (kmp_uint32 i=1; i<mac_depth; ++i)
mac_skipPerLevel[i] = numPerLevel[i-1] * mac_skipPerLevel[i-1];
mac_leaf_kids = (kmp_uint8)numPerLevel[0]-1;
first=0;
}
thr_bar->depth = mac_depth;
thr_bar->base_leaf_kids = mac_leaf_kids;
thr_bar->skip_per_level = mac_skipPerLevel;
}
#endif // KMP_AFFINITY_SUPPORTED


@ -1,7 +1,7 @@
/*
* kmp_alloc.c -- private/shared dyanmic memory allocation and management
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43450 $
* $Date: 2014-09-09 10:07:22 -0500 (Tue, 09 Sep 2014) $
*/
@ -1228,7 +1228,7 @@ bpoold( kmp_info_t *th, void *buf, int dumpalloc, int dumpfree)
bufdump( th, (void *) (((char *) b) + sizeof(bhead_t)));
}
} else {
char *lerr = "";
const char *lerr = "";
KMP_DEBUG_ASSERT(bs > 0);
if ((b->ql.blink->ql.flink != b) || (b->ql.flink->ql.blink != b)) {
@ -1772,7 +1772,11 @@ ___kmp_free( void * ptr KMP_SRC_LOC_DECL )
#ifndef LEAK_MEMORY
KE_TRACE( 10, ( " free( %p )\n", descr.ptr_allocated ) );
# ifdef KMP_DEBUG
_free_src_loc( descr.ptr_allocated, _file_, _line_ );
# else
free_src_loc( descr.ptr_allocated KMP_SRC_LOC_PARM );
# endif
#endif
KMP_MB();
@ -1790,7 +1794,7 @@ ___kmp_free( void * ptr KMP_SRC_LOC_DECL )
// Otherwise allocate normally using kmp_thread_malloc.
// AC: How to choose the limit? Just get 16 for now...
static int const __kmp_free_list_limit = 16;
#define KMP_FREE_LIST_LIMIT 16
// Always use 128 bytes for determining buckets for caching memory blocks
#define DCACHE_LINE 128
@ -1932,7 +1936,7 @@ ___kmp_fast_free( kmp_info_t *this_thr, void * ptr KMP_SRC_LOC_DECL )
kmp_mem_descr_t * dsc = (kmp_mem_descr_t *)( (char*)head - sizeof(kmp_mem_descr_t) );
kmp_info_t * q_th = (kmp_info_t *)(dsc->ptr_aligned); // allocating thread, same for all queue nodes
size_t q_sz = dsc->size_allocated + 1; // new size in case we add current task
if ( q_th == alloc_thr && q_sz <= __kmp_free_list_limit ) {
if ( q_th == alloc_thr && q_sz <= KMP_FREE_LIST_LIMIT ) {
// we can add current task to "other" list, no sync needed
*((void **)ptr) = head;
descr->size_allocated = q_sz;


@ -1,7 +1,7 @@
/*
* kmp_atomic.c -- ATOMIC implementation routines
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43421 $
* $Date: 2014-08-28 08:56:10 -0500 (Thu, 28 Aug 2014) $
*/
@ -690,7 +690,7 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID( ident_t *id_ref, int gtid, TYPE * lh
#endif /* KMP_GOMP_COMPAT */
#if KMP_MIC
# define KMP_DO_PAUSE _mm_delay_32( 30 )
# define KMP_DO_PAUSE _mm_delay_32( 1 )
#else
# define KMP_DO_PAUSE KMP_CPU_PAUSE()
#endif /* KMP_MIC */
@ -700,14 +700,10 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID( ident_t *id_ref, int gtid, TYPE * lh
// TYPE - operands' type
// BITS - size in bits, used to distinguish low level calls
// OP - operator
// Note: temp_val introduced in order to force the compiler to read
// *lhs only once (w/o it the compiler reads *lhs twice)
#define OP_CMPXCHG(TYPE,BITS,OP) \
{ \
TYPE KMP_ATOMIC_VOLATILE temp_val; \
TYPE old_value, new_value; \
temp_val = *lhs; \
old_value = temp_val; \
old_value = *(TYPE volatile *)lhs; \
new_value = old_value OP rhs; \
while ( ! KMP_COMPARE_AND_STORE_ACQ##BITS( (kmp_int##BITS *) lhs, \
*VOLATILE_CAST(kmp_int##BITS *) &old_value, \
@ -715,8 +711,7 @@ RET_TYPE __kmpc_atomic_##TYPE_ID##_##OP_ID( ident_t *id_ref, int gtid, TYPE * lh
{ \
KMP_DO_PAUSE; \
\
temp_val = *lhs; \
old_value = temp_val; \
old_value = *(TYPE volatile *)lhs; \
new_value = old_value OP rhs; \
} \
}
@ -765,13 +760,6 @@ ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void) \
KMP_TEST_THEN_ADD##BITS( lhs, OP rhs ); \
}
// -------------------------------------------------------------------------
#define ATOMIC_FLOAT_ADD(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void) \
OP_GOMP_CRITICAL(OP##=,GOMP_FLAG) \
/* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */ \
KMP_TEST_THEN_ADD_REAL##BITS( lhs, OP rhs ); \
}
// -------------------------------------------------------------------------
#define ATOMIC_CMPXCHG(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void) \
OP_GOMP_CRITICAL(OP##=,GOMP_FLAG) \
@ -803,17 +791,6 @@ ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void) \
} \
}
// -------------------------------------------------------------------------
#define ATOMIC_FLOAT_ADD(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void) \
OP_GOMP_CRITICAL(OP##=,GOMP_FLAG) \
if ( ! ( (kmp_uintptr_t) lhs & 0x##MASK) ) { \
OP_CMPXCHG(TYPE,BITS,OP) /* aligned address */ \
} else { \
KMP_CHECK_GTID; \
OP_CRITICAL(OP##=,LCK_ID) /* unaligned address - use critical */ \
} \
}
// -------------------------------------------------------------------------
#define ATOMIC_CMPXCHG(TYPE_ID,OP_ID,TYPE,BITS,OP,LCK_ID,MASK,GOMP_FLAG) \
ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void) \
OP_GOMP_CRITICAL(OP##=,GOMP_FLAG) \
@ -845,25 +822,15 @@ ATOMIC_BEGIN(TYPE_ID,OP_ID,TYPE,void)
ATOMIC_FIXED_ADD( fixed4, add, kmp_int32, 32, +, 4i, 3, 0 ) // __kmpc_atomic_fixed4_add
ATOMIC_FIXED_ADD( fixed4, sub, kmp_int32, 32, -, 4i, 3, 0 ) // __kmpc_atomic_fixed4_sub
#if KMP_MIC
ATOMIC_CMPXCHG( float4, add, kmp_real32, 32, +, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_add
ATOMIC_CMPXCHG( float4, sub, kmp_real32, 32, -, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub
#else
ATOMIC_FLOAT_ADD( float4, add, kmp_real32, 32, +, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_add
ATOMIC_FLOAT_ADD( float4, sub, kmp_real32, 32, -, 4r, 3, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub
#endif // KMP_MIC
// Routines for ATOMIC 8-byte operands addition and subtraction
ATOMIC_FIXED_ADD( fixed8, add, kmp_int64, 64, +, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_add
ATOMIC_FIXED_ADD( fixed8, sub, kmp_int64, 64, -, 8i, 7, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_sub
#if KMP_MIC
ATOMIC_CMPXCHG( float8, add, kmp_real64, 64, +, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_add
ATOMIC_CMPXCHG( float8, sub, kmp_real64, 64, -, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub
#else
ATOMIC_FLOAT_ADD( float8, add, kmp_real64, 64, +, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_add
ATOMIC_FLOAT_ADD( float8, sub, kmp_real64, 64, -, 8r, 7, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub
#endif // KMP_MIC
// ------------------------------------------------------------------------
// Entries definition for integer operands
@ -1867,35 +1834,16 @@ ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE) \
return old_value; \
}
// -------------------------------------------------------------------------
#define ATOMIC_FLOAT_ADD_CPT(TYPE_ID,OP_ID,TYPE,BITS,OP,GOMP_FLAG) \
ATOMIC_BEGIN_CPT(TYPE_ID,OP_ID,TYPE,TYPE) \
TYPE old_value, new_value; \
OP_GOMP_CRITICAL_CPT(OP,GOMP_FLAG) \
/* OP used as a sign for subtraction: (lhs-rhs) --> (lhs+-rhs) */ \
old_value = KMP_TEST_THEN_ADD_REAL##BITS( lhs, OP rhs ); \
if( flag ) { \
return old_value OP rhs; \
} else \
return old_value; \
}
// -------------------------------------------------------------------------
ATOMIC_FIXED_ADD_CPT( fixed4, add_cpt, kmp_int32, 32, +, 0 ) // __kmpc_atomic_fixed4_add_cpt
ATOMIC_FIXED_ADD_CPT( fixed4, sub_cpt, kmp_int32, 32, -, 0 ) // __kmpc_atomic_fixed4_sub_cpt
ATOMIC_FIXED_ADD_CPT( fixed8, add_cpt, kmp_int64, 64, +, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_add_cpt
ATOMIC_FIXED_ADD_CPT( fixed8, sub_cpt, kmp_int64, 64, -, KMP_ARCH_X86 ) // __kmpc_atomic_fixed8_sub_cpt
#if KMP_MIC
ATOMIC_CMPXCHG_CPT( float4, add_cpt, kmp_real32, 32, +, KMP_ARCH_X86 ) // __kmpc_atomic_float4_add_cpt
ATOMIC_CMPXCHG_CPT( float4, sub_cpt, kmp_real32, 32, -, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub_cpt
ATOMIC_CMPXCHG_CPT( float8, add_cpt, kmp_real64, 64, +, KMP_ARCH_X86 ) // __kmpc_atomic_float8_add_cpt
ATOMIC_CMPXCHG_CPT( float8, sub_cpt, kmp_real64, 64, -, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub_cpt
#else
ATOMIC_FLOAT_ADD_CPT( float4, add_cpt, kmp_real32, 32, +, KMP_ARCH_X86 ) // __kmpc_atomic_float4_add_cpt
ATOMIC_FLOAT_ADD_CPT( float4, sub_cpt, kmp_real32, 32, -, KMP_ARCH_X86 ) // __kmpc_atomic_float4_sub_cpt
ATOMIC_FLOAT_ADD_CPT( float8, add_cpt, kmp_real64, 64, +, KMP_ARCH_X86 ) // __kmpc_atomic_float8_add_cpt
ATOMIC_FLOAT_ADD_CPT( float8, sub_cpt, kmp_real64, 64, -, KMP_ARCH_X86 ) // __kmpc_atomic_float8_sub_cpt
#endif // KMP_MIC
// ------------------------------------------------------------------------
// Entries definition for integer operands


@ -1,7 +1,7 @@
/*
* kmp_atomic.h - ATOMIC header file
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43191 $
* $Date: 2014-05-27 07:44:11 -0500 (Tue, 27 May 2014) $
*/
@ -33,7 +33,7 @@
#if defined( __cplusplus ) && ( KMP_OS_WINDOWS )
// create shortcuts for c99 complex types
#ifdef _DEBUG
#if (_MSC_VER < 1600) && defined(_DEBUG)
// Workaround for the problem of _DebugHeapTag unresolved external.
// This problem prevented us from using our static debug library for C tests
// compiled with /MDd option (the library itself built with /MTd),

File diff suppressed because it is too large.


@ -1,7 +1,7 @@
/*
* kmp_csupport.c -- kfront linkage support for OpenMP.
* $Revision: 42826 $
* $Date: 2013-11-20 03:39:45 -0600 (Wed, 20 Nov 2013) $
* $Revision: 43473 $
* $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
*/
@ -20,6 +20,7 @@
#include "kmp_i18n.h"
#include "kmp_itt.h"
#include "kmp_error.h"
#include "kmp_stats.h"
#define MAX_MESSAGE 512
@ -35,7 +36,7 @@
* @param flags in for future use (currently ignored)
*
* Initialize the runtime library. This call is optional; if it is not made then
* it will be implicilty called by attempts to use other library functions.
* it will be implicitly called by attempts to use other library functions.
*
*/
void
@ -276,13 +277,18 @@ Do the actual fork and call the microtask in the relevant number of threads.
void
__kmpc_fork_call(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...)
{
KMP_STOP_EXPLICIT_TIMER(OMP_serial);
KMP_COUNT_BLOCK(OMP_PARALLEL);
int gtid = __kmp_entry_gtid();
// maybe to save thr_state is enough here
{
va_list ap;
va_start( ap, microtask );
__kmp_fork_call( loc, gtid, TRUE,
#if INCLUDE_SSC_MARKS
SSC_MARK_FORKING();
#endif
__kmp_fork_call( loc, gtid, fork_context_intel,
argc,
VOLATILE_CAST(microtask_t) microtask,
VOLATILE_CAST(launch_t) __kmp_invoke_task_func,
@ -293,10 +299,14 @@ __kmpc_fork_call(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...)
ap
#endif
);
#if INCLUDE_SSC_MARKS
SSC_MARK_JOINING();
#endif
__kmp_join_call( loc, gtid );
va_end( ap );
}
KMP_START_EXPLICIT_TIMER(OMP_serial);
}
#if OMP_40_ENABLED
@ -337,17 +347,18 @@ __kmpc_fork_teams(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...)
va_start( ap, microtask );
// remember teams entry point and nesting level
this_thr->th.th_team_microtask = microtask;
this_thr->th.th_teams_microtask = microtask;
this_thr->th.th_teams_level = this_thr->th.th_team->t.t_level; // AC: can be >0 on host
// check if __kmpc_push_num_teams called, set default number of teams otherwise
if ( this_thr->th.th_set_nth_teams == 0 ) {
if ( this_thr->th.th_teams_size.nteams == 0 ) {
__kmp_push_num_teams( loc, gtid, 0, 0 );
}
KMP_DEBUG_ASSERT(this_thr->th.th_set_nproc >= 1);
KMP_DEBUG_ASSERT(this_thr->th.th_set_nth_teams >= 1);
KMP_DEBUG_ASSERT(this_thr->th.th_teams_size.nteams >= 1);
KMP_DEBUG_ASSERT(this_thr->th.th_teams_size.nth >= 1);
__kmp_fork_call( loc, gtid, TRUE,
__kmp_fork_call( loc, gtid, fork_context_intel,
argc,
VOLATILE_CAST(microtask_t) __kmp_teams_master,
VOLATILE_CAST(launch_t) __kmp_invoke_teams_master,
@ -358,9 +369,9 @@ __kmpc_fork_teams(ident_t *loc, kmp_int32 argc, kmpc_micro microtask, ...)
#endif
);
__kmp_join_call( loc, gtid );
this_thr->th.th_team_microtask = NULL;
this_thr->th.th_teams_microtask = NULL;
this_thr->th.th_teams_level = 0;
*(kmp_int64*)(&this_thr->th.th_teams_size) = 0L;
va_end( ap );
}
#endif /* OMP_40_ENABLED */
@ -393,252 +404,9 @@ when the condition is false.
void
__kmpc_serialized_parallel(ident_t *loc, kmp_int32 global_tid)
{
kmp_info_t *this_thr;
kmp_team_t *serial_team;
KC_TRACE( 10, ("__kmpc_serialized_parallel: called by T#%d\n", global_tid ) );
/* Skip all this code for autopar serialized loops since it results in
unacceptable overhead */
if( loc != NULL && (loc->flags & KMP_IDENT_AUTOPAR ) )
return;
if( ! TCR_4( __kmp_init_parallel ) )
__kmp_parallel_initialize();
this_thr = __kmp_threads[ global_tid ];
serial_team = this_thr -> th.th_serial_team;
/* utilize the serialized team held by this thread */
KMP_DEBUG_ASSERT( serial_team );
KMP_MB();
#if OMP_30_ENABLED
if ( __kmp_tasking_mode != tskm_immediate_exec ) {
KMP_DEBUG_ASSERT( this_thr -> th.th_task_team == this_thr -> th.th_team -> t.t_task_team );
KMP_DEBUG_ASSERT( serial_team -> t.t_task_team == NULL );
KA_TRACE( 20, ( "__kmpc_serialized_parallel: T#%d pushing task_team %p / team %p, new task_team = NULL\n",
global_tid, this_thr -> th.th_task_team, this_thr -> th.th_team ) );
this_thr -> th.th_task_team = NULL;
}
#endif // OMP_30_ENABLED
#if OMP_40_ENABLED
kmp_proc_bind_t proc_bind = this_thr->th.th_set_proc_bind;
if ( this_thr->th.th_current_task->td_icvs.proc_bind == proc_bind_false ) {
proc_bind = proc_bind_false;
}
else if ( proc_bind == proc_bind_default ) {
//
// No proc_bind clause was specified, so use the current value
// of proc-bind-var for this parallel region.
//
proc_bind = this_thr->th.th_current_task->td_icvs.proc_bind;
}
//
// Reset for next parallel region
//
this_thr->th.th_set_proc_bind = proc_bind_default;
#endif /* OMP_40_ENABLED */
if( this_thr -> th.th_team != serial_team ) {
#if OMP_30_ENABLED
// Nested level will be an index in the nested nthreads array
int level = this_thr->th.th_team->t.t_level;
#endif
if( serial_team -> t.t_serialized ) {
/* this serial team was already used
* TODO increase performance by making these locks more specific */
kmp_team_t *new_team;
int tid = this_thr->th.th_info.ds.ds_tid;
__kmp_acquire_bootstrap_lock( &__kmp_forkjoin_lock );
new_team = __kmp_allocate_team(this_thr->th.th_root, 1, 1,
#if OMP_40_ENABLED
proc_bind,
#endif
#if OMP_30_ENABLED
& this_thr->th.th_current_task->td_icvs,
#else
this_thr->th.th_team->t.t_set_nproc[tid],
this_thr->th.th_team->t.t_set_dynamic[tid],
this_thr->th.th_team->t.t_set_nested[tid],
this_thr->th.th_team->t.t_set_blocktime[tid],
this_thr->th.th_team->t.t_set_bt_intervals[tid],
this_thr->th.th_team->t.t_set_bt_set[tid],
#endif // OMP_30_ENABLED
0);
__kmp_release_bootstrap_lock( &__kmp_forkjoin_lock );
KMP_ASSERT( new_team );
/* setup new serialized team and install it */
new_team -> t.t_threads[0] = this_thr;
new_team -> t.t_parent = this_thr -> th.th_team;
serial_team = new_team;
this_thr -> th.th_serial_team = serial_team;
KF_TRACE( 10, ( "__kmpc_serialized_parallel: T#%d allocated new serial team %p\n",
global_tid, serial_team ) );
/* TODO the above breaks the requirement that if we run out of
* resources, then we can still guarantee that serialized teams
* are ok, since we may need to allocate a new one */
} else {
KF_TRACE( 10, ( "__kmpc_serialized_parallel: T#%d reusing cached serial team %p\n",
global_tid, serial_team ) );
}
/* we have to initialize this serial team */
KMP_DEBUG_ASSERT( serial_team->t.t_threads );
KMP_DEBUG_ASSERT( serial_team->t.t_threads[0] == this_thr );
KMP_DEBUG_ASSERT( this_thr->th.th_team != serial_team );
serial_team -> t.t_ident = loc;
serial_team -> t.t_serialized = 1;
serial_team -> t.t_nproc = 1;
serial_team -> t.t_parent = this_thr->th.th_team;
#if OMP_30_ENABLED
serial_team -> t.t_sched = this_thr->th.th_team->t.t_sched;
#endif // OMP_30_ENABLED
this_thr -> th.th_team = serial_team;
serial_team -> t.t_master_tid = this_thr->th.th_info.ds.ds_tid;
#if OMP_30_ENABLED
KF_TRACE( 10, ( "__kmpc_serialized_parallel: T#d curtask=%p\n",
global_tid, this_thr->th.th_current_task ) );
KMP_ASSERT( this_thr->th.th_current_task->td_flags.executing == 1 );
this_thr->th.th_current_task->td_flags.executing = 0;
__kmp_push_current_task_to_thread( this_thr, serial_team, 0 );
/* TODO: GEH: do the ICVs work for nested serialized teams? Don't we need an implicit task for
each serialized task represented by team->t.t_serialized? */
copy_icvs(
& this_thr->th.th_current_task->td_icvs,
& this_thr->th.th_current_task->td_parent->td_icvs );
// Thread value exists in the nested nthreads array for the next nested level
if ( __kmp_nested_nth.used && ( level + 1 < __kmp_nested_nth.used ) ) {
this_thr->th.th_current_task->td_icvs.nproc = __kmp_nested_nth.nth[ level + 1 ];
}
#if OMP_40_ENABLED
if ( __kmp_nested_proc_bind.used && ( level + 1 < __kmp_nested_proc_bind.used ) ) {
this_thr->th.th_current_task->td_icvs.proc_bind
= __kmp_nested_proc_bind.bind_types[ level + 1 ];
}
#endif /* OMP_40_ENABLED */
#else /* pre-3.0 icv's */
serial_team -> t.t_set_nproc[0] = serial_team->t.t_parent->
t.t_set_nproc[serial_team->
t.t_master_tid];
serial_team -> t.t_set_dynamic[0] = serial_team->t.t_parent->
t.t_set_dynamic[serial_team->
t.t_master_tid];
serial_team -> t.t_set_nested[0] = serial_team->t.t_parent->
t.t_set_nested[serial_team->
t.t_master_tid];
serial_team -> t.t_set_blocktime[0] = serial_team->t.t_parent->
t.t_set_blocktime[serial_team->
t.t_master_tid];
serial_team -> t.t_set_bt_intervals[0] = serial_team->t.t_parent->
t.t_set_bt_intervals[serial_team->
t.t_master_tid];
serial_team -> t.t_set_bt_set[0] = serial_team->t.t_parent->
t.t_set_bt_set[serial_team->
t.t_master_tid];
#endif // OMP_30_ENABLED
this_thr -> th.th_info.ds.ds_tid = 0;
/* set thread cache values */
this_thr -> th.th_team_nproc = 1;
this_thr -> th.th_team_master = this_thr;
this_thr -> th.th_team_serialized = 1;
#if OMP_30_ENABLED
serial_team -> t.t_level = serial_team -> t.t_parent -> t.t_level + 1;
serial_team -> t.t_active_level = serial_team -> t.t_parent -> t.t_active_level;
#endif // OMP_30_ENABLED
#if KMP_ARCH_X86 || KMP_ARCH_X86_64
if ( __kmp_inherit_fp_control ) {
__kmp_store_x87_fpu_control_word( &serial_team->t.t_x87_fpu_control_word );
__kmp_store_mxcsr( &serial_team->t.t_mxcsr );
serial_team->t.t_mxcsr &= KMP_X86_MXCSR_MASK;
serial_team->t.t_fp_control_saved = TRUE;
} else {
serial_team->t.t_fp_control_saved = FALSE;
}
#endif /* KMP_ARCH_X86 || KMP_ARCH_X86_64 */
/* check if we need to allocate dispatch buffers stack */
KMP_DEBUG_ASSERT(serial_team->t.t_dispatch);
if ( !serial_team->t.t_dispatch->th_disp_buffer ) {
serial_team->t.t_dispatch->th_disp_buffer = (dispatch_private_info_t *)
__kmp_allocate( sizeof( dispatch_private_info_t ) );
}
this_thr -> th.th_dispatch = serial_team->t.t_dispatch;
KMP_MB();
} else {
/* this serialized team is already being used,
* that's fine, just add another nested level */
KMP_DEBUG_ASSERT( this_thr->th.th_team == serial_team );
KMP_DEBUG_ASSERT( serial_team -> t.t_threads );
KMP_DEBUG_ASSERT( serial_team -> t.t_threads[0] == this_thr );
++ serial_team -> t.t_serialized;
this_thr -> th.th_team_serialized = serial_team -> t.t_serialized;
#if OMP_30_ENABLED
// Nested level will be an index in the nested nthreads array
int level = this_thr->th.th_team->t.t_level;
// Thread value exists in the nested nthreads array for the next nested level
if ( __kmp_nested_nth.used && ( level + 1 < __kmp_nested_nth.used ) ) {
this_thr->th.th_current_task->td_icvs.nproc = __kmp_nested_nth.nth[ level + 1 ];
}
serial_team -> t.t_level++;
KF_TRACE( 10, ( "__kmpc_serialized_parallel: T#%d increasing nesting level of serial team %p to %d\n",
global_tid, serial_team, serial_team -> t.t_level ) );
#else
KF_TRACE( 10, ( "__kmpc_serialized_parallel: T#%d reusing team %p for nested serialized parallel region\n",
global_tid, serial_team ) );
#endif // OMP_30_ENABLED
/* allocate/push dispatch buffers stack */
KMP_DEBUG_ASSERT(serial_team->t.t_dispatch);
{
dispatch_private_info_t * disp_buffer = (dispatch_private_info_t *)
__kmp_allocate( sizeof( dispatch_private_info_t ) );
disp_buffer->next = serial_team->t.t_dispatch->th_disp_buffer;
serial_team->t.t_dispatch->th_disp_buffer = disp_buffer;
}
this_thr -> th.th_dispatch = serial_team->t.t_dispatch;
KMP_MB();
}
if ( __kmp_env_consistency_check )
__kmp_push_parallel( global_tid, NULL );
// t_level is not available in 2.5 build, so check for OMP_30_ENABLED
#if USE_ITT_BUILD && OMP_30_ENABLED
// Mark the start of the "parallel" region for VTune. Only one frame-notification scheme is used at the moment.
if ( ( __itt_frame_begin_v3_ptr && __kmp_forkjoin_frames && ! __kmp_forkjoin_frames_mode ) || KMP_ITT_DEBUG )
{
__kmp_itt_region_forking( global_tid, 1 );
}
if( ( __kmp_forkjoin_frames_mode == 1 || __kmp_forkjoin_frames_mode == 3 ) && __itt_frame_submit_v3_ptr && __itt_get_timestamp_ptr )
{
#if USE_ITT_NOTIFY
if( this_thr->th.th_team->t.t_level == 1 ) {
this_thr->th.th_frame_time_serialized = __itt_get_timestamp();
}
#endif
}
#endif /* USE_ITT_BUILD */
__kmp_serialized_parallel(loc, global_tid); /* The implementation is now in kmp_runtime.c so that it can share static functions with
* kmp_fork_call since the tasks to be done are similar in each case.
*/
}
/*!
@ -680,26 +448,13 @@ __kmpc_end_serialized_parallel(ident_t *loc, kmp_int32 global_tid)
/* If necessary, pop the internal control stack values and replace the team values */
top = serial_team -> t.t_control_stack_top;
if ( top && top -> serial_nesting_level == serial_team -> t.t_serialized ) {
#if OMP_30_ENABLED
copy_icvs(
&serial_team -> t.t_threads[0] -> th.th_current_task -> td_icvs,
top );
#else
serial_team -> t.t_set_nproc[0] = top -> nproc;
serial_team -> t.t_set_dynamic[0] = top -> dynamic;
serial_team -> t.t_set_nested[0] = top -> nested;
serial_team -> t.t_set_blocktime[0] = top -> blocktime;
serial_team -> t.t_set_bt_intervals[0] = top -> bt_intervals;
serial_team -> t.t_set_bt_set[0] = top -> bt_set;
#endif // OMP_30_ENABLED
copy_icvs( &serial_team -> t.t_threads[0] -> th.th_current_task -> td_icvs, top );
serial_team -> t.t_control_stack_top = top -> next;
__kmp_free(top);
}
#if OMP_30_ENABLED
//if( serial_team -> t.t_serialized > 1 )
serial_team -> t.t_level--;
#endif // OMP_30_ENABLED
/* pop dispatch buffers stack */
KMP_DEBUG_ASSERT(serial_team->t.t_dispatch->th_disp_buffer);
@ -735,7 +490,6 @@ __kmpc_end_serialized_parallel(ident_t *loc, kmp_int32 global_tid)
this_thr -> th.th_dispatch = & this_thr -> th.th_team ->
t.t_dispatch[ serial_team -> t.t_master_tid ];
#if OMP_30_ENABLED
__kmp_pop_current_task_from_thread( this_thr );
KMP_ASSERT( this_thr -> th.th_current_task -> td_flags.executing == 0 );
@ -752,32 +506,37 @@ __kmpc_end_serialized_parallel(ident_t *loc, kmp_int32 global_tid)
KA_TRACE( 20, ( "__kmpc_end_serialized_parallel: T#%d restoring task_team %p / team %p\n",
global_tid, this_thr -> th.th_task_team, this_thr -> th.th_team ) );
}
#endif // OMP_30_ENABLED
}
else {
#if OMP_30_ENABLED
} else {
if ( __kmp_tasking_mode != tskm_immediate_exec ) {
KA_TRACE( 20, ( "__kmpc_end_serialized_parallel: T#%d decreasing nesting depth of serial team %p to %d\n",
global_tid, serial_team, serial_team -> t.t_serialized ) );
}
#endif // OMP_30_ENABLED
}
// t_level is not available in 2.5 build, so check for OMP_30_ENABLED
#if USE_ITT_BUILD && OMP_30_ENABLED
#if USE_ITT_BUILD
kmp_uint64 cur_time = 0;
#if USE_ITT_NOTIFY
if( __itt_get_timestamp_ptr ) {
cur_time = __itt_get_timestamp();
}
#endif /* USE_ITT_NOTIFY */
// Report the barrier
if( ( __kmp_forkjoin_frames_mode == 1 || __kmp_forkjoin_frames_mode == 3 ) && __itt_frame_submit_v3_ptr ) {
if( this_thr->th.th_team->t.t_level == 0 ) {
__kmp_itt_frame_submit( global_tid, this_thr->th.th_frame_time_serialized, cur_time, 0, loc, this_thr->th.th_team_nproc, 0 );
}
}
// Mark the end of the "parallel" region for VTune. Only one frame-notification scheme is used at the moment.
if ( ( __itt_frame_end_v3_ptr && __kmp_forkjoin_frames && ! __kmp_forkjoin_frames_mode ) || KMP_ITT_DEBUG )
{
this_thr->th.th_ident = loc;
__kmp_itt_region_joined( global_tid, 1 );
}
if( ( __kmp_forkjoin_frames_mode == 1 || __kmp_forkjoin_frames_mode == 3 ) && __itt_frame_submit_v3_ptr ) {
if( this_thr->th.th_team->t.t_level == 0 ) {
__kmp_itt_frame_submit( global_tid, this_thr->th.th_frame_time_serialized, __itt_timestamp_none, 0, loc );
}
if ( ( __itt_frame_submit_v3_ptr && __kmp_forkjoin_frames_mode == 3 ) || KMP_ITT_DEBUG )
{
this_thr->th.th_ident = loc;
// Since the barrier frame for a serialized region coincides with the region itself, we use the same begin timestamp as for the barrier.
__kmp_itt_frame_submit( global_tid, serial_team->t.t_region_time, cur_time, 0, loc, this_thr->th.th_team_nproc, 2 );
}
#endif /* USE_ITT_BUILD */
@ -805,55 +564,50 @@ __kmpc_flush(ident_t *loc, ...)
/* need explicit __mf() here since use volatile instead in library */
KMP_MB(); /* Flush all pending memory write invalidates. */
// This is not an OMP 3.0 feature.
// This macro is used here just to keep the change out of 10.1.
// This change will go to the mainline first.
#if OMP_30_ENABLED
#if ( KMP_ARCH_X86 || KMP_ARCH_X86_64 )
#if KMP_MIC
// fence-style instructions do not exist, but lock; xaddl $0,(%rsp) can be used.
// We shouldn't need it, though, since the ABI rules require that
// * If the compiler generates NGO stores it also generates the fence
// * If users hand-code NGO stores they should insert the fence
// therefore no incomplete unordered stores should be visible.
#else
// C74404
// This addresses non-temporal store instructions (sfence needed).
// The clflush instruction is also addressed (mfence needed).
// The non-temporal load movntdqa instruction should probably be addressed as well.
// mfence is an SSE2 instruction. Do not execute it if the CPU is not SSE2.
if ( ! __kmp_cpuinfo.initialized ) {
__kmp_query_cpuid( & __kmp_cpuinfo );
}; // if
if ( ! __kmp_cpuinfo.sse2 ) {
// CPU cannot execute SSE2 instructions.
} else {
#if KMP_COMPILER_ICC || KMP_COMPILER_MSVC
_mm_mfence();
#else
__sync_synchronize();
#endif // KMP_COMPILER_ICC
}; // if
#endif // KMP_MIC
#elif KMP_ARCH_ARM
// Nothing yet
#elif KMP_ARCH_PPC64
// Nothing needed here (we have a real MB above).
#if KMP_OS_CNK
// The flushing thread needs to yield here; this prevents a
// busy-waiting thread from saturating the pipeline. flush is
// often used in loops like this:
// while (!flag) {
// #pragma omp flush(flag)
// }
// and adding the yield here is good for at least a 10x speedup
// when running >2 threads per core (on the NAS LU benchmark).
__kmp_yield(TRUE);
#endif
#if ( KMP_ARCH_X86 || KMP_ARCH_X86_64 )
#if KMP_MIC
// fence-style instructions do not exist, but lock; xaddl $0,(%rsp) can be used.
// We shouldn't need it, though, since the ABI rules require that
// * If the compiler generates NGO stores it also generates the fence
// * If users hand-code NGO stores they should insert the fence
// therefore no incomplete unordered stores should be visible.
#else
#error Unknown or unsupported architecture
// C74404
// This addresses non-temporal store instructions (sfence needed).
// The clflush instruction is also addressed (mfence needed).
// The non-temporal load movntdqa instruction should probably be addressed as well.
// mfence is an SSE2 instruction. Do not execute it if the CPU is not SSE2.
if ( ! __kmp_cpuinfo.initialized ) {
__kmp_query_cpuid( & __kmp_cpuinfo );
}; // if
if ( ! __kmp_cpuinfo.sse2 ) {
// CPU cannot execute SSE2 instructions.
} else {
#if KMP_COMPILER_ICC || KMP_COMPILER_MSVC
_mm_mfence();
#else
__sync_synchronize();
#endif // KMP_COMPILER_ICC
}; // if
#endif // KMP_MIC
#elif KMP_ARCH_ARM
// Nothing yet
#elif KMP_ARCH_PPC64
// Nothing needed here (we have a real MB above).
#if KMP_OS_CNK
// The flushing thread needs to yield here; this prevents a
// busy-waiting thread from saturating the pipeline. flush is
// often used in loops like this:
// while (!flag) {
// #pragma omp flush(flag)
// }
// and adding the yield here is good for at least a 10x speedup
// when running >2 threads per core (on the NAS LU benchmark).
__kmp_yield(TRUE);
#endif
#endif // OMP_30_ENABLED
#else
#error Unknown or unsupported architecture
#endif
}
@ -871,6 +625,8 @@ Execute a barrier.
void
__kmpc_barrier(ident_t *loc, kmp_int32 global_tid)
{
KMP_COUNT_BLOCK(OMP_BARRIER);
KMP_TIME_BLOCK(OMP_barrier);
int explicit_barrier_flag;
KC_TRACE( 10, ("__kmpc_barrier: called T#%d\n", global_tid ) );
@ -906,6 +662,7 @@ __kmpc_barrier(ident_t *loc, kmp_int32 global_tid)
kmp_int32
__kmpc_master(ident_t *loc, kmp_int32 global_tid)
{
KMP_COUNT_BLOCK(OMP_MASTER);
int status = 0;
KC_TRACE( 10, ("__kmpc_master: called T#%d\n", global_tid ) );
@ -1014,11 +771,6 @@ __kmpc_end_ordered( ident_t * loc, kmp_int32 gtid )
__kmp_parallel_dxo( & gtid, & cid, loc );
}
inline void
__kmp_static_yield( int arg ) { // AC: needed in macro __kmp_acquire_user_lock_with_checks
__kmp_yield( arg );
}
static kmp_user_lock_p
__kmp_get_critical_section_ptr( kmp_critical_name * crit, ident_t const * loc, kmp_int32 gtid )
{
@ -1082,6 +834,7 @@ This function blocks until the executing thread can enter the critical section.
*/
void
__kmpc_critical( ident_t * loc, kmp_int32 global_tid, kmp_critical_name * crit ) {
KMP_COUNT_BLOCK(OMP_CRITICAL);
kmp_user_lock_p lck;
@ -1194,6 +947,9 @@ __kmpc_barrier_master(ident_t *loc, kmp_int32 global_tid)
if ( __kmp_env_consistency_check )
__kmp_check_barrier( global_tid, ct_barrier, loc );
#if USE_ITT_NOTIFY
__kmp_threads[global_tid]->th.th_ident = loc;
#endif
status = __kmp_barrier( bs_plain_barrier, global_tid, TRUE, 0, NULL, NULL );
return (status != 0) ? 0 : 1;
@ -1243,6 +999,9 @@ __kmpc_barrier_master_nowait( ident_t * loc, kmp_int32 global_tid )
__kmp_check_barrier( global_tid, ct_barrier, loc );
}
#if USE_ITT_NOTIFY
__kmp_threads[global_tid]->th.th_ident = loc;
#endif
__kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
ret = __kmpc_master (loc, global_tid);
@ -1280,6 +1039,7 @@ introduce an explicit barrier if it is required.
kmp_int32
__kmpc_single(ident_t *loc, kmp_int32 global_tid)
{
KMP_COUNT_BLOCK(OMP_SINGLE);
kmp_int32 rc = __kmp_enter_single( global_tid, loc, TRUE );
return rc;
}
@ -1353,8 +1113,6 @@ ompc_set_nested( int flag )
set__nested( thread, flag ? TRUE : FALSE );
}
#if OMP_30_ENABLED
void
ompc_set_max_active_levels( int max_active_levels )
{
@ -1384,8 +1142,6 @@ ompc_get_team_size( int level )
return __kmp_get_team_size( __kmp_entry_gtid(), level );
}
#endif // OMP_30_ENABLED
void
kmpc_set_stacksize( int arg )
{
@ -1427,8 +1183,6 @@ kmpc_set_defaults( char const * str )
__kmp_aux_set_defaults( str, strlen( str ) );
}
#ifdef OMP_30_ENABLED
int
kmpc_set_affinity_mask_proc( int proc, void **mask )
{
@ -1468,7 +1222,6 @@ kmpc_get_affinity_mask_proc( int proc, void **mask )
#endif
}
#endif /* OMP_30_ENABLED */
/* -------------------------------------------------------------------------- */
/*!
@ -1533,6 +1286,9 @@ __kmpc_copyprivate( ident_t *loc, kmp_int32 gtid, size_t cpy_size, void *cpy_dat
if (didit) *data_ptr = cpy_data;
/* This barrier is not a barrier region boundary */
#if USE_ITT_NOTIFY
__kmp_threads[gtid]->th.th_ident = loc;
#endif
__kmp_barrier( bs_plain_barrier, gtid, FALSE , 0, NULL, NULL );
if (! didit) (*cpy_func)( cpy_data, *data_ptr );
@ -1540,6 +1296,9 @@ __kmpc_copyprivate( ident_t *loc, kmp_int32 gtid, size_t cpy_size, void *cpy_dat
/* Consider next barrier the user-visible barrier for barrier region boundaries */
/* Nesting checks are already handled by the single construct checks */
#if USE_ITT_NOTIFY
__kmp_threads[gtid]->th.th_ident = loc; // TODO: check if it is needed (e.g. tasks can overwrite the location)
#endif
__kmp_barrier( bs_plain_barrier, gtid, FALSE , 0, NULL, NULL );
}
@ -1722,6 +1481,7 @@ __kmpc_destroy_nest_lock( ident_t * loc, kmp_int32 gtid, void ** user_lock ) {
void
__kmpc_set_lock( ident_t * loc, kmp_int32 gtid, void ** user_lock ) {
KMP_COUNT_BLOCK(OMP_set_lock);
kmp_user_lock_p lck;
if ( ( __kmp_user_lock_kind == lk_tas )
@ -1866,6 +1626,8 @@ __kmpc_unset_nest_lock( ident_t *loc, kmp_int32 gtid, void **user_lock )
int
__kmpc_test_lock( ident_t *loc, kmp_int32 gtid, void **user_lock )
{
KMP_COUNT_BLOCK(OMP_test_lock);
KMP_TIME_BLOCK(OMP_test_lock);
kmp_user_lock_p lck;
int rc;
@ -2028,9 +1790,14 @@ __kmpc_reduce_nowait(
kmp_int32 num_vars, size_t reduce_size, void *reduce_data, void (*reduce_func)(void *lhs_data, void *rhs_data),
kmp_critical_name *lck ) {
KMP_COUNT_BLOCK(REDUCE_nowait);
int retval;
PACKED_REDUCTION_METHOD_T packed_reduction_method;
#if OMP_40_ENABLED
kmp_team_t *team;
kmp_info_t *th;
int teams_swapped = 0, task_state;
#endif
KA_TRACE( 10, ( "__kmpc_reduce_nowait() enter: called T#%d\n", global_tid ) );
// why do we need this initialization here at all?
@ -2045,7 +1812,25 @@ __kmpc_reduce_nowait(
if ( __kmp_env_consistency_check )
__kmp_push_sync( global_tid, ct_reduce, loc, NULL );
// it's better to check an assertion ASSERT( thr_state == THR_WORK_STATE )
#if OMP_40_ENABLED
th = __kmp_thread_from_gtid(global_tid);
if( th->th.th_teams_microtask ) { // AC: check if we are inside the teams construct?
team = th->th.th_team;
if( team->t.t_level == th->th.th_teams_level ) {
// this is reduction at teams construct
KMP_DEBUG_ASSERT(!th->th.th_info.ds.ds_tid); // AC: check that tid == 0
// Let's swap teams temporarily for the reduction barrier
teams_swapped = 1;
th->th.th_info.ds.ds_tid = team->t.t_master_tid;
th->th.th_team = team->t.t_parent;
th->th.th_task_team = th->th.th_team->t.t_task_team;
th->th.th_team_nproc = th->th.th_team->t.t_nproc;
task_state = th->th.th_task_state;
if( th->th.th_task_team )
th->th.th_task_state = th->th.th_task_team->tt.tt_state;
}
}
#endif // OMP_40_ENABLED
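// (Annotation, not part of the diff: the tid/team/task_team/nproc/task_state
// fields saved and overwritten here are restored after the reduction
// completes -- see the matching "Restore thread structure" block added
// near the end of this function.)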
// the packed_reduction_method value will be reused by the __kmp_end_reduce* functions, so it should be kept in a variable
// that is a construct-specific or thread-specific property, not a team-specific one
@ -2091,6 +1876,9 @@ __kmpc_reduce_nowait(
// this barrier should be invisible to a customer and to the thread profiler
// (it's neither a terminating barrier nor customer's code, it's used for an internal purpose)
#if USE_ITT_NOTIFY
__kmp_threads[global_tid]->th.th_ident = loc;
#endif
retval = __kmp_barrier( UNPACK_REDUCTION_BARRIER( packed_reduction_method ), global_tid, FALSE, reduce_size, reduce_data, reduce_func );
retval = ( retval != 0 ) ? ( 0 ) : ( 1 );
@ -2108,7 +1896,16 @@ __kmpc_reduce_nowait(
KMP_ASSERT( 0 ); // "unexpected method"
}
#if OMP_40_ENABLED
if( teams_swapped ) {
// Restore thread structure
th->th.th_info.ds.ds_tid = 0;
th->th.th_team = team;
th->th.th_task_team = team->t.t_task_team;
th->th.th_team_nproc = team->t.t_nproc;
th->th.th_task_state = task_state;
}
#endif
KA_TRACE( 10, ( "__kmpc_reduce_nowait() exit: called T#%d: method %08x, returns %08x\n", global_tid, packed_reduction_method, retval ) );
return retval;
@ -2187,6 +1984,7 @@ __kmpc_reduce(
void (*reduce_func)(void *lhs_data, void *rhs_data),
kmp_critical_name *lck )
{
KMP_COUNT_BLOCK(REDUCE_wait);
int retval;
PACKED_REDUCTION_METHOD_T packed_reduction_method;
@ -2204,8 +2002,6 @@ __kmpc_reduce(
if ( __kmp_env_consistency_check )
__kmp_push_sync( global_tid, ct_reduce, loc, NULL );
// it's better to check an assertion ASSERT( thr_state == THR_WORK_STATE )
packed_reduction_method = __kmp_determine_reduction_method( loc, global_tid, num_vars, reduce_size, reduce_data, reduce_func, lck );
__KMP_SET_REDUCTION_METHOD( global_tid, packed_reduction_method );
@ -2228,6 +2024,9 @@ __kmpc_reduce(
//case tree_reduce_block:
// this barrier should be visible to a customer and to the thread profiler
// (it's a terminating barrier on constructs if NOWAIT not specified)
#if USE_ITT_NOTIFY
__kmp_threads[global_tid]->th.th_ident = loc; // needed for correct notification of frames
#endif
retval = __kmp_barrier( UNPACK_REDUCTION_BARRIER( packed_reduction_method ), global_tid, TRUE, reduce_size, reduce_data, reduce_func );
retval = ( retval != 0 ) ? ( 0 ) : ( 1 );
@ -2277,6 +2076,9 @@ __kmpc_end_reduce( ident_t *loc, kmp_int32 global_tid, kmp_critical_name *lck )
__kmp_end_critical_section_reduce_block( loc, global_tid, lck );
// TODO: implicit barrier: should be exposed
#if USE_ITT_NOTIFY
__kmp_threads[global_tid]->th.th_ident = loc;
#endif
__kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
} else if( packed_reduction_method == empty_reduce_block ) {
@ -2284,11 +2086,17 @@ __kmpc_end_reduce( ident_t *loc, kmp_int32 global_tid, kmp_critical_name *lck )
// usage: if team size == 1, no synchronization is required ( Intel platforms only )
// TODO: implicit barrier: should be exposed
#if USE_ITT_NOTIFY
__kmp_threads[global_tid]->th.th_ident = loc;
#endif
__kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
} else if( packed_reduction_method == atomic_reduce_block ) {
// TODO: implicit barrier: should be exposed
#if USE_ITT_NOTIFY
__kmp_threads[global_tid]->th.th_ident = loc;
#endif
__kmp_barrier( bs_plain_barrier, global_tid, FALSE, 0, NULL, NULL );
} else if( TEST_REDUCTION_METHOD( packed_reduction_method, tree_reduce_block ) ) {
@ -2319,23 +2127,15 @@ __kmpc_end_reduce( ident_t *loc, kmp_int32 global_tid, kmp_critical_name *lck )
kmp_uint64
__kmpc_get_taskid() {
#if OMP_30_ENABLED
kmp_int32 gtid;
kmp_info_t * thread;
gtid = __kmp_get_gtid();
if ( gtid < 0 ) {
return 0;
}; // if
thread = __kmp_thread_from_gtid( gtid );
return thread->th.th_current_task->td_task_id;
#else
kmp_int32 gtid;
kmp_info_t * thread;
gtid = __kmp_get_gtid();
if ( gtid < 0 ) {
return 0;
#endif
}; // if
thread = __kmp_thread_from_gtid( gtid );
return thread->th.th_current_task->td_task_id;
} // __kmpc_get_taskid
@ -2343,25 +2143,17 @@ __kmpc_get_taskid() {
kmp_uint64
__kmpc_get_parent_taskid() {
#if OMP_30_ENABLED
kmp_int32 gtid;
kmp_info_t * thread;
kmp_taskdata_t * parent_task;
gtid = __kmp_get_gtid();
if ( gtid < 0 ) {
return 0;
}; // if
thread = __kmp_thread_from_gtid( gtid );
parent_task = thread->th.th_current_task->td_parent;
return ( parent_task == NULL ? 0 : parent_task->td_task_id );
#else
kmp_int32 gtid;
kmp_info_t * thread;
kmp_taskdata_t * parent_task;
gtid = __kmp_get_gtid();
if ( gtid < 0 ) {
return 0;
#endif
}; // if
thread = __kmp_thread_from_gtid( gtid );
parent_task = thread->th.th_current_task->td_parent;
return ( parent_task == NULL ? 0 : parent_task->td_task_id );
} // __kmpc_get_parent_taskid


@ -1,7 +1,7 @@
/*
* kmp_debug.c -- debug utilities for the Guide library
* $Revision: 42150 $
* $Date: 2013-03-15 15:40:38 -0500 (Fri, 15 Mar 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_debug.h -- debug / assertion code for Assure library
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_dispatch.cpp: dynamic scheduling - iteration initialization and dispatch.
* $Revision: 42674 $
* $Date: 2013-09-18 11:12:49 -0500 (Wed, 18 Sep 2013) $
* $Revision: 43457 $
* $Date: 2014-09-17 03:57:22 -0500 (Wed, 17 Sep 2014) $
*/
@ -32,6 +32,7 @@
#include "kmp_itt.h"
#include "kmp_str.h"
#include "kmp_error.h"
#include "kmp_stats.h"
#if KMP_OS_WINDOWS && KMP_ARCH_X86
#include <float.h>
#endif
@ -39,6 +40,34 @@
/* ------------------------------------------------------------------------ */
/* ------------------------------------------------------------------------ */
// template for type limits
template< typename T >
struct i_maxmin {
static const T mx;
static const T mn;
};
template<>
struct i_maxmin< int > {
static const int mx = 0x7fffffff;
static const int mn = 0x80000000;
};
template<>
struct i_maxmin< unsigned int > {
static const unsigned int mx = 0xffffffff;
static const unsigned int mn = 0x00000000;
};
template<>
struct i_maxmin< long long > {
static const long long mx = 0x7fffffffffffffffLL;
static const long long mn = 0x8000000000000000LL;
};
template<>
struct i_maxmin< unsigned long long > {
static const unsigned long long mx = 0xffffffffffffffffLL;
static const unsigned long long mn = 0x0000000000000000LL;
};
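The i_maxmin specializations above hard-code the extreme values of each iteration type; __kmp_dist_get_bounds (added later in this file) uses them to clamp an upper bound whose chunk arithmetic overflowed. A minimal sketch of that use, assuming the template above is in scope (clamp_upper is a hypothetical helper, not in the diff); in modern C++, std::numeric_limits<T>::max()/min() would supply the same constants:

--- illustrative sketch (not part of the diff) ---
template <typename T>
T clamp_upper(T computed_upper, T lower) {
    if (computed_upper < lower)       // the addition wrapped past the maximum
        return i_maxmin<T>::mx;       // clamp, as __kmp_dist_get_bounds does
    return computed_upper;
}
--- end sketch ---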
//-------------------------------------------------------------------------
#ifdef KMP_STATIC_STEAL_ENABLED
// replaces dispatch_private_info{32,64} structures and dispatch_private_info{32,64}_t types
@ -148,22 +177,6 @@ struct dispatch_shared_info_template {
/* ------------------------------------------------------------------------ */
/* ------------------------------------------------------------------------ */
static void
__kmp_static_delay( int arg )
{
/* Work around weird code-gen bug that causes assert to trip */
#if KMP_ARCH_X86_64 && KMP_OS_LINUX
#else
KMP_ASSERT( arg >= 0 );
#endif
}
static void
__kmp_static_yield( int arg )
{
__kmp_yield( arg );
}
#undef USE_TEST_LOCKS
// test_then_add template (general template should NOT be used)
@ -294,8 +307,6 @@ __kmp_wait_yield( volatile UT * spinner,
/* if ( TCR_4(__kmp_global.g.g_done) && __kmp_global.g.g_abort)
__kmp_abort_thread(); */
__kmp_static_delay(TRUE);
// if we are oversubscribed,
// or have waited a bit (and KMP_LIBRARY=throughput), then yield
// pause is in the following code
@ -589,6 +600,9 @@ __kmp_dispatch_init(
if ( ! TCR_4( __kmp_init_parallel ) )
__kmp_parallel_initialize();
#if INCLUDE_SSC_MARKS
SSC_MARK_DISPATCH_INIT();
#endif
#ifdef KMP_DEBUG
{
const char * buff;
@ -606,6 +620,9 @@ __kmp_dispatch_init(
active = ! team -> t.t_serialized;
th->th.th_ident = loc;
#if USE_ITT_BUILD
kmp_uint64 cur_chunk = chunk;
#endif
if ( ! active ) {
pr = reinterpret_cast< dispatch_private_info_template< T >* >
( th -> th.th_dispatch -> th_disp_buffer ); /* top of the stack */
@ -640,23 +657,16 @@ __kmp_dispatch_init(
schedule = __kmp_static;
} else {
if ( schedule == kmp_sch_runtime ) {
#if OMP_30_ENABLED
// Use the scheduling specified by OMP_SCHEDULE (or __kmp_sch_default if not specified)
schedule = team -> t.t_sched.r_sched_type;
// Detail the schedule if needed (global controls are differentiated appropriately)
if ( schedule == kmp_sch_guided_chunked ) {
schedule = __kmp_guided;
} else if ( schedule == kmp_sch_static ) {
schedule = __kmp_static;
}
// Use the chunk size specified by OMP_SCHEDULE (or default if not specified)
chunk = team -> t.t_sched.chunk;
#else
kmp_r_sched_t r_sched = __kmp_get_schedule_global();
// Use the scheduling specified by OMP_SCHEDULE and/or KMP_SCHEDULE or default
schedule = r_sched.r_sched_type;
chunk = r_sched.chunk;
#endif
// Use the scheduling specified by OMP_SCHEDULE (or __kmp_sch_default if not specified)
schedule = team -> t.t_sched.r_sched_type;
// Detail the schedule if needed (global controls are differentiated appropriately)
if ( schedule == kmp_sch_guided_chunked ) {
schedule = __kmp_guided;
} else if ( schedule == kmp_sch_static ) {
schedule = __kmp_static;
}
// Use the chunk size specified by OMP_SCHEDULE (or default if not specified)
chunk = team -> t.t_sched.chunk;
#ifdef KMP_DEBUG
{
@ -678,7 +688,6 @@ __kmp_dispatch_init(
}
}
#if OMP_30_ENABLED
if ( schedule == kmp_sch_auto ) {
// mapping and differentiation: in the __kmp_do_serial_initialize()
schedule = __kmp_auto;
@ -694,7 +703,6 @@ __kmp_dispatch_init(
}
#endif
}
#endif // OMP_30_ENABLED
/* guided analytical not safe for too many threads */
if ( team->t.t_nproc > 1<<20 && schedule == kmp_sch_guided_analytical_chunked ) {
@ -848,6 +856,12 @@ __kmp_dispatch_init(
break;
}
}
#if USE_ITT_BUILD
// Calculate chunk for metadata report
if( __itt_metadata_add_ptr && __kmp_forkjoin_frames_mode == 3 ) {
cur_chunk = limit - init + 1;
}
#endif
if ( st == 1 ) {
pr->u.p.lb = lb + init;
pr->u.p.ub = lb + limit;
@ -1101,6 +1115,39 @@ __kmp_dispatch_init(
}; // if
#endif /* USE_ITT_BUILD */
}; // if
#if USE_ITT_BUILD
// Report loop metadata
if( __itt_metadata_add_ptr && __kmp_forkjoin_frames_mode == 3 ) {
kmp_uint32 tid = __kmp_tid_from_gtid( gtid );
if (KMP_MASTER_TID(tid)) {
kmp_uint64 schedtype = 0;
switch ( schedule ) {
case kmp_sch_static_chunked:
case kmp_sch_static_balanced:// Chunk is calculated in the switch above
break;
case kmp_sch_static_greedy:
cur_chunk = pr->u.p.parm1;
break;
case kmp_sch_dynamic_chunked:
schedtype = 1;
break;
case kmp_sch_guided_iterative_chunked:
case kmp_sch_guided_analytical_chunked:
schedtype = 2;
break;
default:
// Should we put this case under "static"?
// case kmp_sch_static_steal:
schedtype = 3;
break;
}
__kmp_itt_metadata_loop(loc, schedtype, tc, cur_chunk);
}
}
#endif /* USE_ITT_BUILD */
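// (Annotation, not part of the diff: the schedtype values passed to
// __kmp_itt_metadata_loop above encode 0 = static, 1 = dynamic chunked,
// 2 = guided, 3 = anything else, e.g. static steal.)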
#ifdef KMP_DEBUG
{
const char * buff;
@ -1302,6 +1349,7 @@ __kmp_dispatch_next(
kmp_info_t * th = __kmp_threads[ gtid ];
kmp_team_t * team = th -> th.th_team;
KMP_DEBUG_ASSERT( p_last && p_lb && p_ub && p_st ); // AC: these cannot be NULL
#ifdef KMP_DEBUG
{
const char * buff;
@ -1323,9 +1371,10 @@ __kmp_dispatch_next(
if ( (status = (pr->u.p.tc != 0)) == 0 ) {
*p_lb = 0;
*p_ub = 0;
if ( p_st != 0 ) {
// if ( p_last != NULL )
// *p_last = 0;
if ( p_st != NULL )
*p_st = 0;
}
if ( __kmp_env_consistency_check ) {
if ( pr->pushed_ws != ct_none ) {
pr->pushed_ws = __kmp_pop_workshare( gtid, pr->pushed_ws, loc );
@ -1346,7 +1395,10 @@ __kmp_dispatch_next(
if ( (status = (init <= trip)) == 0 ) {
*p_lb = 0;
*p_ub = 0;
if ( p_st != 0 ) *p_st = 0;
// if ( p_last != NULL )
// *p_last = 0;
if ( p_st != NULL )
*p_st = 0;
if ( __kmp_env_consistency_check ) {
if ( pr->pushed_ws != ct_none ) {
pr->pushed_ws = __kmp_pop_workshare( gtid, pr->pushed_ws, loc );
@ -1363,12 +1415,10 @@ __kmp_dispatch_next(
pr->u.p.last_upper = pr->u.p.ub;
#endif /* KMP_OS_WINDOWS */
}
if ( p_last ) {
if ( p_last != NULL )
*p_last = last;
}
if ( p_st != 0 ) {
if ( p_st != NULL )
*p_st = incr;
}
if ( incr == 1 ) {
*p_lb = start + init;
*p_ub = start + limit;
@ -1395,19 +1445,15 @@ __kmp_dispatch_next(
} // if
} else {
pr->u.p.tc = 0;
*p_lb = pr->u.p.lb;
*p_ub = pr->u.p.ub;
#if KMP_OS_WINDOWS
pr->u.p.last_upper = *p_ub;
#endif /* KMP_OS_WINDOWS */
if ( p_st != 0 ) {
*p_st = pr->u.p.st;
}
if ( p_last ) {
if ( p_last != NULL )
*p_last = TRUE;
}
if ( p_st != NULL )
*p_st = pr->u.p.st;
} // if
#ifdef KMP_DEBUG
{
@ -1415,12 +1461,15 @@ __kmp_dispatch_next(
// create format specifiers before the debug output
buff = __kmp_str_format(
"__kmp_dispatch_next: T#%%d serialized case: p_lb:%%%s " \
"p_ub:%%%s p_st:%%%s p_last:%%p returning:%%d\n",
"p_ub:%%%s p_st:%%%s p_last:%%p %%d returning:%%d\n",
traits_t< T >::spec, traits_t< T >::spec, traits_t< ST >::spec );
KD_TRACE(10, ( buff, gtid, *p_lb, *p_ub, *p_st, p_last, status) );
KD_TRACE(10, ( buff, gtid, *p_lb, *p_ub, *p_st, p_last, *p_last, status) );
__kmp_str_free( &buff );
}
#endif
#if INCLUDE_SSC_MARKS
SSC_MARK_DISPATCH_NEXT();
#endif
return status;
} else {
kmp_int32 last = 0;
@ -1572,7 +1621,7 @@ __kmp_dispatch_next(
if ( !status ) {
*p_lb = 0;
*p_ub = 0;
if ( p_st != 0 ) *p_st = 0;
if ( p_st != NULL ) *p_st = 0;
} else {
start = pr->u.p.parm2;
init *= chunk;
@ -1582,10 +1631,7 @@ __kmp_dispatch_next(
KMP_DEBUG_ASSERT(init <= trip);
if ( (last = (limit >= trip)) != 0 )
limit = trip;
if ( p_last ) {
*p_last = last;
}
if ( p_st != 0 ) *p_st = incr;
if ( p_st != NULL ) *p_st = incr;
if ( incr == 1 ) {
*p_lb = start + init;
@ -1622,10 +1668,7 @@ __kmp_dispatch_next(
*p_lb = pr->u.p.lb;
*p_ub = pr->u.p.ub;
last = pr->u.p.parm1;
if ( p_last ) {
*p_last = last;
}
if ( p_st )
if ( p_st != NULL )
*p_st = pr->u.p.st;
} else { /* no iterations to do */
pr->u.p.lb = pr->u.p.ub + pr->u.p.st;
@ -1665,10 +1708,7 @@ __kmp_dispatch_next(
if ( (last = (limit >= trip)) != 0 )
limit = trip;
if ( p_last ) {
*p_last = last;
}
if ( p_st != 0 ) *p_st = incr;
if ( p_st != NULL ) *p_st = incr;
pr->u.p.count += team->t.t_nproc;
@ -1713,7 +1753,7 @@ __kmp_dispatch_next(
if ( (status = (init <= trip)) == 0 ) {
*p_lb = 0;
*p_ub = 0;
if ( p_st != 0 ) *p_st = 0;
if ( p_st != NULL ) *p_st = 0;
} else {
start = pr->u.p.lb;
limit = chunk + init - 1;
@ -1721,10 +1761,8 @@ __kmp_dispatch_next(
if ( (last = (limit >= trip)) != 0 )
limit = trip;
if ( p_last ) {
*p_last = last;
}
if ( p_st != 0 ) *p_st = incr;
if ( p_st != NULL ) *p_st = incr;
if ( incr == 1 ) {
*p_lb = start + init;
@ -1801,8 +1839,6 @@ __kmp_dispatch_next(
incr = pr->u.p.st;
if ( p_st != NULL )
*p_st = incr;
if ( p_last != NULL )
*p_last = last;
*p_lb = start + init * incr;
*p_ub = start + limit * incr;
if ( pr->ordered ) {
@ -1906,8 +1942,6 @@ __kmp_dispatch_next(
incr = pr->u.p.st;
if ( p_st != NULL )
*p_st = incr;
if ( p_last != NULL )
*p_last = last;
*p_lb = start + init * incr;
*p_ub = start + limit * incr;
if ( pr->ordered ) {
@ -1951,7 +1985,7 @@ __kmp_dispatch_next(
if ( (status = ((T)index < parm3 && init <= trip)) == 0 ) {
*p_lb = 0;
*p_ub = 0;
if ( p_st != 0 ) *p_st = 0;
if ( p_st != NULL ) *p_st = 0;
} else {
start = pr->u.p.lb;
limit = ( (index+1) * ( 2*parm2 - index*parm4 ) ) / 2 - 1;
@ -1960,10 +1994,7 @@ __kmp_dispatch_next(
if ( (last = (limit >= trip)) != 0 )
limit = trip;
if ( p_last != 0 ) {
*p_last = last;
}
if ( p_st != 0 ) *p_st = incr;
if ( p_st != NULL ) *p_st = incr;
if ( incr == 1 ) {
*p_lb = start + init;
@ -1991,6 +2022,17 @@ __kmp_dispatch_next(
} // if
} // case
break;
default:
{
status = 0; // to avoid complaints on uninitialized variable use
__kmp_msg(
kmp_ms_fatal, // Severity
KMP_MSG( UnknownSchedTypeDetected ), // Primary message
KMP_HNT( GetNewerLibrary ), // Hint
__kmp_msg_null // Variadic argument list terminator
);
}
break;
} // switch
} // if tc == 0;
@ -2010,7 +2052,7 @@ __kmp_dispatch_next(
}
#endif
if ( num_done == team->t.t_nproc-1 ) {
if ( (ST)num_done == team->t.t_nproc-1 ) {
/* NOTE: release this buffer to be reused */
KMP_MB(); /* Flush all pending memory write invalidates. */
@ -2048,6 +2090,8 @@ __kmp_dispatch_next(
pr->u.p.last_upper = pr->u.p.ub;
}
#endif /* KMP_OS_WINDOWS */
if ( p_last != NULL && status != 0 )
*p_last = last;
} // if
#ifdef KMP_DEBUG
@ -2062,9 +2106,129 @@ __kmp_dispatch_next(
__kmp_str_free( &buff );
}
#endif
#if INCLUDE_SSC_MARKS
SSC_MARK_DISPATCH_NEXT();
#endif
return status;
}
template< typename T >
static void
__kmp_dist_get_bounds(
ident_t *loc,
kmp_int32 gtid,
kmp_int32 *plastiter,
T *plower,
T *pupper,
typename traits_t< T >::signed_t incr
) {
KMP_COUNT_BLOCK(OMP_DISTR_FOR_dynamic);
typedef typename traits_t< T >::unsigned_t UT;
typedef typename traits_t< T >::signed_t ST;
register kmp_uint32 team_id;
register kmp_uint32 nteams;
register UT trip_count;
register kmp_team_t *team;
kmp_info_t * th;
KMP_DEBUG_ASSERT( plastiter && plower && pupper );
KE_TRACE( 10, ("__kmpc_dist_get_bounds called (%d)\n", gtid));
#ifdef KMP_DEBUG
{
const char * buff;
// create format specifiers before the debug output
buff = __kmp_str_format( "__kmpc_dist_get_bounds: T#%%d liter=%%d "\
"iter=(%%%s, %%%s, %%%s) signed?<%s>\n",
traits_t< T >::spec, traits_t< T >::spec, traits_t< ST >::spec,
traits_t< T >::spec );
KD_TRACE(100, ( buff, gtid, *plastiter, *plower, *pupper, incr ) );
__kmp_str_free( &buff );
}
#endif
if( __kmp_env_consistency_check ) {
if( incr == 0 ) {
__kmp_error_construct( kmp_i18n_msg_CnsLoopIncrZeroProhibited, ct_pdo, loc );
}
if( incr > 0 ? (*pupper < *plower) : (*plower < *pupper) ) {
// The loop is illegal.
// Some zero-trip loops maintained by compiler, e.g.:
// for(i=10;i<0;++i) // lower >= upper - run-time check
// for(i=0;i>10;--i) // lower <= upper - run-time check
// for(i=0;i>10;++i) // incr > 0 - compile-time check
// for(i=10;i<0;--i) // incr < 0 - compile-time check
// Compiler does not check the following illegal loops:
// for(i=0;i<10;i+=incr) // where incr<0
// for(i=10;i>0;i-=incr) // where incr<0
__kmp_error_construct( kmp_i18n_msg_CnsLoopIncrIllegal, ct_pdo, loc );
}
}
th = __kmp_threads[gtid];
KMP_DEBUG_ASSERT(th->th.th_teams_microtask); // we are in the teams construct
team = th->th.th_team;
#if OMP_40_ENABLED
nteams = th->th.th_teams_size.nteams;
#endif
team_id = team->t.t_master_tid;
KMP_DEBUG_ASSERT(nteams == team->t.t_parent->t.t_nproc);
// compute global trip count
if( incr == 1 ) {
trip_count = *pupper - *plower + 1;
} else if(incr == -1) {
trip_count = *plower - *pupper + 1;
} else {
trip_count = (ST)(*pupper - *plower) / incr + 1; // cast to signed to cover incr<0 case
}
if( trip_count <= nteams ) {
KMP_DEBUG_ASSERT(
__kmp_static == kmp_sch_static_greedy || \
__kmp_static == kmp_sch_static_balanced
); // Unknown static scheduling type.
// only some teams get single iteration, others get nothing
if( team_id < trip_count ) {
*pupper = *plower = *plower + team_id * incr;
} else {
*plower = *pupper + incr; // zero-trip loop
}
if( plastiter != NULL )
*plastiter = ( team_id == trip_count - 1 );
} else {
if( __kmp_static == kmp_sch_static_balanced ) {
register UT chunk = trip_count / nteams;
register UT extras = trip_count % nteams;
*plower += incr * ( team_id * chunk + ( team_id < extras ? team_id : extras ) );
*pupper = *plower + chunk * incr - ( team_id < extras ? 0 : incr );
if( plastiter != NULL )
*plastiter = ( team_id == nteams - 1 );
} else {
register T chunk_inc_count =
( trip_count / nteams + ( ( trip_count % nteams ) ? 1 : 0) ) * incr;
register T upper = *pupper;
KMP_DEBUG_ASSERT( __kmp_static == kmp_sch_static_greedy );
// Unknown static scheduling type.
*plower += team_id * chunk_inc_count;
*pupper = *plower + chunk_inc_count - incr;
// Check/correct bounds if needed
if( incr > 0 ) {
if( *pupper < *plower )
*pupper = i_maxmin< T >::mx;
if( plastiter != NULL )
*plastiter = *plower <= upper && *pupper > upper - incr;
if( *pupper > upper )
*pupper = upper; // tracker C73258
} else {
if( *pupper > *plower )
*pupper = i_maxmin< T >::mn;
if( plastiter != NULL )
*plastiter = *plower >= upper && *pupper < upper - incr;
if( *pupper < upper )
*pupper = upper; // tracker C73258
}
}
}
}
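To make the kmp_sch_static_balanced arithmetic above concrete, here is a small self-contained sketch (not the runtime's code) that reproduces the split for 10 iterations over 4 teams: chunk = 2, extras = 2, so the first two teams take 3 iterations each and the last two take 2:

--- illustrative sketch (not part of the diff) ---
#include <cstdio>

int main() {
    const unsigned trip_count = 10, nteams = 4;
    const int lower0 = 0, incr = 1;
    const unsigned chunk  = trip_count / nteams;   // 2
    const unsigned extras = trip_count % nteams;   // 2
    for (unsigned team_id = 0; team_id < nteams; ++team_id) {
        // Same formulas as the balanced branch above.
        int lo = lower0 + incr * (int)(team_id * chunk
                 + (team_id < extras ? team_id : extras));
        int hi = lo + (int)chunk * incr - (team_id < extras ? 0 : incr);
        printf("team %u: [%d, %d]\n", team_id, lo, hi);
    }
    return 0;   // prints [0,2] [3,5] [6,7] [8,9]
}
--- end sketch ---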
//-----------------------------------------------------------------------------------------
// Dispatch routines
// Transfer call to template< type T >
@ -2091,6 +2255,7 @@ void
__kmpc_dispatch_init_4( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_int32 lb, kmp_int32 ub, kmp_int32 st, kmp_int32 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dispatch_init< kmp_int32 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
@ -2101,6 +2266,7 @@ void
__kmpc_dispatch_init_4u( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_uint32 lb, kmp_uint32 ub, kmp_int32 st, kmp_int32 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dispatch_init< kmp_uint32 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
@ -2113,6 +2279,7 @@ __kmpc_dispatch_init_8( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_int64 lb, kmp_int64 ub,
kmp_int64 st, kmp_int64 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dispatch_init< kmp_int64 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
@ -2125,10 +2292,60 @@ __kmpc_dispatch_init_8u( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_uint64 lb, kmp_uint64 ub,
kmp_int64 st, kmp_int64 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dispatch_init< kmp_uint64 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
/*!
See @ref __kmpc_dispatch_init_4
These differ from the __kmpc_dispatch_init set of functions in that they are
called for the composite distribute parallel for construct, so the per-team
iteration space must be computed before regular iteration dispatching begins.
These functions are all identical apart from the types of the arguments.
*/
void
__kmpc_dist_dispatch_init_4( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_int32 *p_last, kmp_int32 lb, kmp_int32 ub, kmp_int32 st, kmp_int32 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dist_get_bounds< kmp_int32 >( loc, gtid, p_last, &lb, &ub, st );
__kmp_dispatch_init< kmp_int32 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
void
__kmpc_dist_dispatch_init_4u( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_int32 *p_last, kmp_uint32 lb, kmp_uint32 ub, kmp_int32 st, kmp_int32 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dist_get_bounds< kmp_uint32 >( loc, gtid, p_last, &lb, &ub, st );
__kmp_dispatch_init< kmp_uint32 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
void
__kmpc_dist_dispatch_init_8( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_int32 *p_last, kmp_int64 lb, kmp_int64 ub, kmp_int64 st, kmp_int64 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dist_get_bounds< kmp_int64 >( loc, gtid, p_last, &lb, &ub, st );
__kmp_dispatch_init< kmp_int64 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
void
__kmpc_dist_dispatch_init_8u( ident_t *loc, kmp_int32 gtid, enum sched_type schedule,
kmp_int32 *p_last, kmp_uint64 lb, kmp_uint64 ub, kmp_int64 st, kmp_int64 chunk )
{
KMP_COUNT_BLOCK(OMP_FOR_dynamic);
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_dist_get_bounds< kmp_uint64 >( loc, gtid, p_last, &lb, &ub, st );
__kmp_dispatch_init< kmp_uint64 >( loc, gtid, schedule, lb, ub, st, chunk, true );
}
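For context, a sketch of how a compiler might lower #pragma omp distribute parallel for schedule(dynamic, 4) onto these entry points. N and body() are hypothetical stand-ins, and the real calling convention belongs to the compiler; __kmpc_dispatch_next_4 is the companion routine in this file:

--- illustrative sketch (not part of the diff) ---
kmp_int32 last = 0, lb = 0, ub = N - 1, st = 1;   // N: hypothetical trip count
__kmpc_dist_dispatch_init_4(loc, gtid, kmp_sch_dynamic_chunked,
                            &last, lb, ub, st, /*chunk=*/4);
kmp_int32 p_lb, p_ub, p_st;
while (__kmpc_dispatch_next_4(loc, gtid, &last, &p_lb, &p_ub, &p_st)) {
    for (kmp_int32 i = p_lb; i <= p_ub; i += p_st)
        body(i);   // body(): hypothetical outlined loop body
}
--- end sketch ---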
/*!
@param loc Source code location
@param gtid Global thread id
@ -2284,8 +2501,6 @@ __kmp_wait_yield_4(volatile kmp_uint32 * spinner,
/* if ( TCR_4(__kmp_global.g.g_done) && __kmp_global.g.g_abort)
__kmp_abort_thread(); */
__kmp_static_delay(TRUE);
/* if we have waited a bit, or are oversubscribed, yield */
/* pause is in the following code */
KMP_YIELD( TCR_4(__kmp_nth) > __kmp_avail_proc );
@ -2320,8 +2535,6 @@ __kmp_wait_yield_8( volatile kmp_uint64 * spinner,
/* if ( TCR_4(__kmp_global.g.g_done) && __kmp_global.g.g_abort)
__kmp_abort_thread(); */
__kmp_static_delay(TRUE);
// if we are oversubscribed,
// or have waited a bit (and KMP_LIBRARY=throughput), then yield
// pause is in the following code


@ -1,7 +1,7 @@
/*
* kmp_environment.c -- Handle environment variables OS-independently.
* $Revision: 42263 $
* $Date: 2013-04-04 11:03:19 -0500 (Thu, 04 Apr 2013) $
* $Revision: 43084 $
* $Date: 2014-04-15 09:15:14 -0500 (Tue, 15 Apr 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_environment.h -- Handle environment variables OS-independently.
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_error.c -- KPTS functions for error checking at runtime
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_error.h -- PTS functions for error checking at runtime.
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_ftn_cdecl.c -- Fortran __cdecl linkage support for OpenMP.
* $Revision: 42757 $
* $Date: 2013-10-18 08:20:57 -0500 (Fri, 18 Oct 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_ftn_entry.h -- Fortran entry linkage support for OpenMP.
* $Revision: 42798 $
* $Date: 2013-10-30 16:39:54 -0500 (Wed, 30 Oct 2013) $
* $Revision: 43435 $
* $Date: 2014-09-04 15:16:08 -0500 (Thu, 04 Sep 2014) $
*/
@ -217,8 +217,6 @@ FTN_GET_LIBRARY (void)
#endif
}
#if OMP_30_ENABLED
int FTN_STDCALL
FTN_SET_AFFINITY( void **mask )
{
@ -348,8 +346,6 @@ FTN_GET_AFFINITY_MASK_PROC( int KMP_DEREF proc, void **mask )
#endif
}
#endif /* OMP_30_ENABLED */
/* ------------------------------------------------------------------------ */
@ -391,12 +387,8 @@ xexpand(FTN_GET_MAX_THREADS)( void )
}
gtid = __kmp_entry_gtid();
thread = __kmp_threads[ gtid ];
#if OMP_30_ENABLED
//return thread -> th.th_team -> t.t_current_task[ thread->th.th_info.ds.ds_tid ] -> icvs.nproc;
return thread -> th.th_current_task -> td_icvs.nproc;
#else
return thread -> th.th_team -> t.t_set_nproc[ thread->th.th_info.ds.ds_tid ];
#endif
#endif
}
@ -533,7 +525,7 @@ xexpand(FTN_IN_PARALLEL)( void )
#else
kmp_info_t *th = __kmp_entry_thread();
#if OMP_40_ENABLED
if ( th->th.th_team_microtask ) {
if ( th->th.th_teams_microtask ) {
// AC: r_in_parallel does not work inside teams construct
// where real parallel is inactive, but all threads have same root,
// so setting it in one team affects other teams.
@ -546,8 +538,6 @@ xexpand(FTN_IN_PARALLEL)( void )
#endif
}
#if OMP_30_ENABLED
void FTN_STDCALL
xexpand(FTN_SET_SCHEDULE)( kmp_sched_t KMP_DEREF kind, int KMP_DEREF modifier )
{
@ -667,8 +657,6 @@ xexpand(FTN_IN_FINAL)( void )
#endif
}
#endif // OMP_30_ENABLED
#if OMP_40_ENABLED
@ -689,7 +677,7 @@ xexpand(FTN_GET_NUM_TEAMS)( void )
return 1;
#else
kmp_info_t *thr = __kmp_entry_thread();
if ( thr->th.th_team_microtask ) {
if ( thr->th.th_teams_microtask ) {
kmp_team_t *team = thr->th.th_team;
int tlevel = thr->th.th_teams_level;
int ii = team->t.t_level; // the level of the teams construct
@ -728,7 +716,7 @@ xexpand(FTN_GET_TEAM_NUM)( void )
return 0;
#else
kmp_info_t *thr = __kmp_entry_thread();
if ( thr->th.th_team_microtask ) {
if ( thr->th.th_teams_microtask ) {
kmp_team_t *team = thr->th.th_team;
int tlevel = thr->th.th_teams_level; // the level of the teams construct
int ii = team->t.t_level;
@ -1048,19 +1036,19 @@ FTN_GET_CANCELLATION_STATUS(int cancel_kind) {
#endif // OMP_40_ENABLED
// GCC compatibility (versioned symbols)
#if KMP_OS_LINUX
#ifdef KMP_USE_VERSION_SYMBOLS
/*
The following sections create function aliases (dummy symbols) for the omp_* routines.
These aliases will then be versioned according to how libgomp ``versions'' its
symbols (OMP_1.0, OMP_2.0, OMP_3.0, ...) while also retaining the
default version which libiomp5 uses: VERSION (defined in exports_so.txt)
If you want to see the versioned symbols for libgomp.so.1 then just type:
objdump -T /path/to/libgomp.so.1 | grep omp_
Example:
Step 1) Create __kmp_api_omp_set_num_threads_10_alias
        which is an alias of __kmp_api_omp_set_num_threads
Step 2) Set __kmp_api_omp_set_num_threads_10_alias to version: omp_set_num_threads@OMP_1.0
Step 2B) Set __kmp_api_omp_set_num_threads to default version : omp_set_num_threads@@VERSION
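For instance, the pair of steps for omp_set_num_threads corresponds to the
following sketch in gcc alias/symver syntax (illustrative only; the real work
is done by the xaliasify/xversionify macros in kmp_ftn_os.h):
    void __kmp_api_omp_set_num_threads_10_alias(int)
        __attribute__((alias("__kmp_api_omp_set_num_threads")));
    __asm__(".symver __kmp_api_omp_set_num_threads_10_alias,omp_set_num_threads@OMP_1.0");
    __asm__(".symver __kmp_api_omp_set_num_threads,omp_set_num_threads@@VERSION");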
@ -1092,7 +1080,6 @@ xaliasify(FTN_TEST_NEST_LOCK, 10);
xaliasify(FTN_GET_WTICK, 20);
xaliasify(FTN_GET_WTIME, 20);
#if OMP_30_ENABLED
// OMP_3.0 aliases
xaliasify(FTN_SET_SCHEDULE, 30);
xaliasify(FTN_GET_SCHEDULE, 30);
@ -1116,7 +1103,6 @@ xaliasify(FTN_TEST_NEST_LOCK, 30);
// OMP_3.1 aliases
xaliasify(FTN_IN_FINAL, 31);
#endif /* OMP_30_ENABLED */
#if OMP_40_ENABLED
// OMP_4.0 aliases
@ -1160,7 +1146,6 @@ xversionify(FTN_TEST_NEST_LOCK, 10, "OMP_1.0");
xversionify(FTN_GET_WTICK, 20, "OMP_2.0");
xversionify(FTN_GET_WTIME, 20, "OMP_2.0");
#if OMP_30_ENABLED
// OMP_3.0 versioned symbols
xversionify(FTN_SET_SCHEDULE, 30, "OMP_3.0");
xversionify(FTN_GET_SCHEDULE, 30, "OMP_3.0");
@ -1186,7 +1171,6 @@ xversionify(FTN_TEST_NEST_LOCK, 30, "OMP_3.0");
// OMP_3.1 versioned symbol
xversionify(FTN_IN_FINAL, 31, "OMP_3.1");
#endif /* OMP_30_ENABLED */
#if OMP_40_ENABLED
// OMP_4.0 versioned symbols
@ -1204,7 +1188,7 @@ xversionify(FTN_GET_CANCELLATION, 40, "OMP_4.0");
// OMP_5.0 versioned symbols
#endif
#endif /* KMP_OS_LINUX */
#endif // KMP_USE_VERSION_SYMBOLS
#ifdef __cplusplus
} //extern "C"

--- kmp_ftn_extra.c ---

@ -1,7 +1,7 @@
/*
* kmp_ftn_extra.c -- Fortran 'extra' linkage support for OpenMP.
* $Revision: 42757 $
* $Date: 2013-10-18 08:20:57 -0500 (Fri, 18 Oct 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_ftn_os.h ---

@ -1,7 +1,7 @@
/*
* kmp_ftn_os.h -- KPTS Fortran defines header file.
* $Revision: 42745 $
* $Date: 2013-10-14 17:02:04 -0500 (Mon, 14 Oct 2013) $
* $Revision: 43354 $
* $Date: 2014-07-22 17:15:02 -0500 (Tue, 22 Jul 2014) $
*/
@ -472,14 +472,14 @@
#define KMP_API_NAME_GOMP_TASKGROUP_START GOMP_taskgroup_start
#define KMP_API_NAME_GOMP_TASKGROUP_END GOMP_taskgroup_end
/* Target functions should be taken care of by liboffload */
//#define KMP_API_NAME_GOMP_TARGET GOMP_target
//#define KMP_API_NAME_GOMP_TARGET_DATA GOMP_target_data
//#define KMP_API_NAME_GOMP_TARGET_END_DATA GOMP_target_end_data
//#define KMP_API_NAME_GOMP_TARGET_UPDATE GOMP_target_update
#define KMP_API_NAME_GOMP_TARGET GOMP_target
#define KMP_API_NAME_GOMP_TARGET_DATA GOMP_target_data
#define KMP_API_NAME_GOMP_TARGET_END_DATA GOMP_target_end_data
#define KMP_API_NAME_GOMP_TARGET_UPDATE GOMP_target_update
#define KMP_API_NAME_GOMP_TEAMS GOMP_teams
#if KMP_OS_LINUX && !KMP_OS_CNK && !KMP_ARCH_PPC64
#define xstr(x) str(x)
#ifdef KMP_USE_VERSION_SYMBOLS
#define xstr(x) str(x)
#define str(x) #x
// If Linux, xexpand prepends __kmp_api_ to the real API name
@ -494,7 +494,7 @@
__asm__(".symver " xstr(__kmp_api_##api_name##_##version_num##_alias) "," xstr(api_name) "@" version_str "\n\t"); \
__asm__(".symver " xstr(__kmp_api_##api_name) "," xstr(api_name) "@@" default_ver "\n\t")
#else /* KMP_OS_LINUX */
#else // KMP_USE_VERSION_SYMBOLS
#define xstr(x) /* Nothing */
#define str(x) /* Nothing */
@ -508,7 +508,7 @@
#define xversionify(api_name, version_num, version_str) /* Nothing */
#define versionify(api_name, version_num, version_str, default_ver) /* Nothing */
#endif /* KMP_OS_LINUX */
#endif // KMP_USE_VERSION_SYMBOLS
#endif /* KMP_FTN_OS_H */

--- kmp_ftn_stdcall.c ---

@ -1,7 +1,7 @@
/*
* kmp_ftn_stdcall.c -- Fortran __stdcall linkage support for OpenMP.
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_global.c ---

@ -1,7 +1,7 @@
/*
* kmp_global.c -- KPTS global variables for runtime support library
* $Revision: 42816 $
* $Date: 2013-11-11 15:33:37 -0600 (Mon, 11 Nov 2013) $
* $Revision: 43473 $
* $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
*/
@ -25,6 +25,20 @@ kmp_key_t __kmp_gtid_threadprivate_key;
kmp_cpuinfo_t __kmp_cpuinfo = { 0 }; // Not initialized
#if KMP_STATS_ENABLED
#include "kmp_stats.h"
// lock for modifying the global __kmp_stats_list
kmp_tas_lock_t __kmp_stats_lock = KMP_TAS_LOCK_INITIALIZER(__kmp_stats_lock);
// global list of per thread stats, the head is a sentinel node which accumulates all stats produced before __kmp_create_worker is called.
kmp_stats_list __kmp_stats_list;
// thread local pointer to stats node within list
__thread kmp_stats_list* __kmp_stats_thread_ptr = &__kmp_stats_list;
// gives reference tick for all events (considered the 0 tick)
tsc_tick_count __kmp_stats_start_time;
#endif
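// Note: the head of __kmp_stats_list is a sentinel, so events recorded before
// __kmp_create_worker repoints __kmp_stats_thread_ptr still accumulate somewhere.
// Elsewhere in this check-in the counters are driven by macros, e.g.
//     KMP_COUNT_BLOCK(OMP_FOR_static);   // as used in __kmp_for_static_init below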
/* ----------------------------------------------------- */
/* INITIALIZATION VARIABLES */
@ -53,6 +67,7 @@ unsigned int __kmp_next_wait = KMP_DEFAULT_NEXT_WAIT; /* subsequent number of s
size_t __kmp_stksize = KMP_DEFAULT_STKSIZE;
size_t __kmp_monitor_stksize = 0; // auto adjust
size_t __kmp_stkoffset = KMP_DEFAULT_STKOFFSET;
int __kmp_stkpadding = KMP_MIN_STKPADDING;
size_t __kmp_malloc_pool_incr = KMP_DEFAULT_MALLOC_POOL_INCR;
@ -94,7 +109,7 @@ char const *__kmp_barrier_type_name [ bs_last_barrier ] =
, "reduction"
#endif // KMP_FAST_REDUCTION_BARRIER
};
char const *__kmp_barrier_pattern_name [ bp_last_bar ] = { "linear", "tree", "hyper" };
char const *__kmp_barrier_pattern_name [ bp_last_bar ] = { "linear", "tree", "hyper", "hierarchical" };
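// Assuming the existing barrier-pattern envirables accept the new name added
// to the array above, the hierarchical barrier would be selected with, e.g.:
//     KMP_PLAIN_BARRIER_PATTERN=hierarchical
//     KMP_FORKJOIN_BARRIER_PATTERN=hierarchical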
int __kmp_allThreadsSpecified = 0;
@ -114,16 +129,17 @@ int __kmp_dflt_team_nth_ub = 0;
int __kmp_tp_capacity = 0;
int __kmp_tp_cached = 0;
int __kmp_dflt_nested = FALSE;
#if OMP_30_ENABLED
int __kmp_dflt_max_active_levels = KMP_MAX_ACTIVE_LEVELS_LIMIT; /* max_active_levels limit */
#endif // OMP_30_ENABLED
#if KMP_NESTED_HOT_TEAMS
int __kmp_hot_teams_mode = 0; /* 0 - free extra threads when reduced */
/* 1 - keep extra threads when reduced */
int __kmp_hot_teams_max_level = 1; /* nesting level of hot teams */
#endif
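// These globals back the nested hot-teams support; assuming a KMP_HOT_TEAMS_MODE
// envirable maps onto __kmp_hot_teams_mode the same way KMP_HOT_TEAMS_MAX_LEVEL
// maps onto __kmp_hot_teams_max_level, a typical run would be:
//     OMP_NESTED=true KMP_HOT_TEAMS_MAX_LEVEL=2 KMP_HOT_TEAMS_MODE=1 ./app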
enum library_type __kmp_library = library_none;
enum sched_type __kmp_sched = kmp_sch_default; /* scheduling method for runtime scheduling */
enum sched_type __kmp_static = kmp_sch_static_greedy; /* default static scheduling method */
enum sched_type __kmp_guided = kmp_sch_guided_iterative_chunked; /* default guided scheduling method */
#if OMP_30_ENABLED
enum sched_type __kmp_auto = kmp_sch_guided_analytical_chunked; /* default auto scheduling method */
#endif // OMP_30_ENABLED
int __kmp_dflt_blocktime = KMP_DEFAULT_BLOCKTIME;
int __kmp_monitor_wakeups = KMP_MIN_MONITOR_WAKEUPS;
int __kmp_bt_intervals = KMP_INTERVALS_FROM_BLOCKTIME( KMP_DEFAULT_BLOCKTIME, KMP_MIN_MONITOR_WAKEUPS );
@ -242,7 +258,6 @@ unsigned int __kmp_place_num_threads_per_core = 0;
unsigned int __kmp_place_core_offset = 0;
#endif
#if OMP_30_ENABLED
kmp_tasking_mode_t __kmp_tasking_mode = tskm_task_teams;
/* This check ensures that the compiler is passing the correct data type
@ -255,8 +270,6 @@ KMP_BUILD_ASSERT( sizeof(kmp_tasking_flags_t) == 4 );
kmp_int32 __kmp_task_stealing_constraint = 1; /* Constrain task stealing by default */
#endif /* OMP_30_ENABLED */
#ifdef DEBUG_SUSPEND
int __kmp_suspend_count = 0;
#endif
@ -364,6 +377,29 @@ kmp_global_t __kmp_global = {{ 0 }};
/* ----------------------------------------------- */
/* GLOBAL SYNCHRONIZATION LOCKS */
/* TODO verify the need for these locks and if they need to be global */
#if KMP_USE_INTERNODE_ALIGNMENT
/* Multinode systems have larger cache line granularity which can cause
* false sharing if the alignment is not large enough for these locks */
KMP_ALIGN_CACHE_INTERNODE
kmp_bootstrap_lock_t __kmp_initz_lock = KMP_BOOTSTRAP_LOCK_INITIALIZER( __kmp_initz_lock ); /* Control initializations */
KMP_ALIGN_CACHE_INTERNODE
kmp_bootstrap_lock_t __kmp_forkjoin_lock; /* control fork/join access */
KMP_ALIGN_CACHE_INTERNODE
kmp_bootstrap_lock_t __kmp_exit_lock; /* exit() is not always thread-safe */
KMP_ALIGN_CACHE_INTERNODE
kmp_bootstrap_lock_t __kmp_monitor_lock; /* control monitor thread creation */
KMP_ALIGN_CACHE_INTERNODE
kmp_bootstrap_lock_t __kmp_tp_cached_lock; /* used for the hack to allow threadprivate cache and __kmp_threads expansion to co-exist */
KMP_ALIGN_CACHE_INTERNODE
kmp_lock_t __kmp_global_lock; /* Control OS/global access */
KMP_ALIGN_CACHE_INTERNODE
kmp_queuing_lock_t __kmp_dispatch_lock; /* Control dispatch access */
KMP_ALIGN_CACHE_INTERNODE
kmp_lock_t __kmp_debug_lock; /* Control I/O access for KMP_DEBUG */
#else
KMP_ALIGN_CACHE
kmp_bootstrap_lock_t __kmp_initz_lock = KMP_BOOTSTRAP_LOCK_INITIALIZER( __kmp_initz_lock ); /* Control initializations */
@ -378,6 +414,7 @@ KMP_ALIGN(128)
kmp_queuing_lock_t __kmp_dispatch_lock; /* Control dispatch access */
KMP_ALIGN(128)
kmp_lock_t __kmp_debug_lock; /* Control I/O access for KMP_DEBUG */
#endif
/* ----------------------------------------------- */

--- kmp_gsupport.c ---

@ -1,7 +1,7 @@
/*
* kmp_gsupport.c
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43473 $
* $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
*/
@ -244,7 +244,7 @@ xexpand(KMP_API_NAME_GOMP_ORDERED_END)(void)
// The parallel construct
//
#ifdef KMP_DEBUG
#ifndef KMP_DEBUG
static
#endif /* KMP_DEBUG */
void
@ -255,7 +255,7 @@ __kmp_GOMP_microtask_wrapper(int *gtid, int *npr, void (*task)(void *),
}
#ifdef KMP_DEBUG
#ifndef KMP_DEBUG
static
#endif /* KMP_DEBUG */
void
@ -276,7 +276,7 @@ __kmp_GOMP_parallel_microtask_wrapper(int *gtid, int *npr,
}
#ifdef KMP_DEBUG
#ifndef KMP_DEBUG
static
#endif /* KMP_DEBUG */
void
@ -287,7 +287,7 @@ __kmp_GOMP_fork_call(ident_t *loc, int gtid, microtask_t wrapper, int argc,...)
va_list ap;
va_start(ap, argc);
rc = __kmp_fork_call(loc, gtid, FALSE, argc, wrapper, __kmp_invoke_task_func,
rc = __kmp_fork_call(loc, gtid, fork_context_gnu, argc, wrapper, __kmp_invoke_task_func,
#if (KMP_ARCH_X86_64 || KMP_ARCH_ARM) && KMP_OS_LINUX
&ap
#else
@ -563,7 +563,7 @@ xexpand(KMP_API_NAME_GOMP_LOOP_END_NOWAIT)(void)
status = KMP_DISPATCH_NEXT_ULL(&loc, gtid, NULL, \
(kmp_uint64 *)p_lb, (kmp_uint64 *)p_ub, (kmp_int64 *)&stride); \
if (status) { \
KMP_DEBUG_ASSERT(stride == str2); \
KMP_DEBUG_ASSERT((long long)stride == str2); \
*p_ub += (str > 0) ? 1 : -1; \
} \
} \
@ -666,9 +666,6 @@ PARALLEL_LOOP_START(xexpand(KMP_API_NAME_GOMP_PARALLEL_LOOP_GUIDED_START), kmp_s
PARALLEL_LOOP_START(xexpand(KMP_API_NAME_GOMP_PARALLEL_LOOP_RUNTIME_START), kmp_sch_runtime)
#if OMP_30_ENABLED
/* */
//
// Tasking constructs
@ -742,9 +739,6 @@ xexpand(KMP_API_NAME_GOMP_TASKWAIT)(void)
}
#endif /* OMP_30_ENABLED */
/* */
//
// Sections worksharing constructs
@ -861,9 +855,268 @@ xexpand(KMP_API_NAME_GOMP_SECTIONS_END_NOWAIT)(void)
void
xexpand(KMP_API_NAME_GOMP_TASKYIELD)(void)
{
KA_TRACE(20, ("GOMP_taskyield: T#%d\n", __kmp_get_gtid()))
return;
}
#if OMP_40_ENABLED // these are new GOMP_4.0 entry points
void
xexpand(KMP_API_NAME_GOMP_PARALLEL)(void (*task)(void *), void *data, unsigned num_threads, unsigned int flags)
{
int gtid = __kmp_entry_gtid();
MKLOC(loc, "GOMP_parallel");
KA_TRACE(20, ("GOMP_parallel: T#%d\n", gtid));
if (__kmpc_ok_to_fork(&loc) && (num_threads != 1)) {
if (num_threads != 0) {
__kmp_push_num_threads(&loc, gtid, num_threads);
}
if(flags != 0) {
__kmp_push_proc_bind(&loc, gtid, (kmp_proc_bind_t)flags);
}
__kmp_GOMP_fork_call(&loc, gtid,
(microtask_t)__kmp_GOMP_microtask_wrapper, 2, task, data);
}
else {
__kmpc_serialized_parallel(&loc, gtid);
}
task(data);
xexpand(KMP_API_NAME_GOMP_PARALLEL_END)();
}
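// For reference, gcc 4.9 lowers a combined parallel region into a single call
// to this entry point; a sketch of the caller side (the outlined function name
// and data layout are compiler-internal, shown only for illustration):
//
//     static void outlined_fn(void *data) { /* region body */ }
//     ...
//     GOMP_parallel(outlined_fn, &shared_data, 4 /* num_threads(4) */, 0 /* flags */);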
void
xexpand(KMP_API_NAME_GOMP_PARALLEL_SECTIONS)(void (*task) (void *), void *data,
unsigned num_threads, unsigned count, unsigned flags)
{
int gtid = __kmp_entry_gtid();
int last = FALSE;
MKLOC(loc, "GOMP_parallel_sections");
KA_TRACE(20, ("GOMP_parallel_sections: T#%d\n", gtid));
if (__kmpc_ok_to_fork(&loc) && (num_threads != 1)) {
if (num_threads != 0) {
__kmp_push_num_threads(&loc, gtid, num_threads);
}
if(flags != 0) {
__kmp_push_proc_bind(&loc, gtid, (kmp_proc_bind_t)flags);
}
__kmp_GOMP_fork_call(&loc, gtid,
(microtask_t)__kmp_GOMP_parallel_microtask_wrapper, 9, task, data,
num_threads, &loc, kmp_nm_dynamic_chunked, (kmp_int)1,
(kmp_int)count, (kmp_int)1, (kmp_int)1);
}
else {
__kmpc_serialized_parallel(&loc, gtid);
}
KMP_DISPATCH_INIT(&loc, gtid, kmp_nm_dynamic_chunked, 1, count, 1, 1, TRUE);
task(data);
xexpand(KMP_API_NAME_GOMP_PARALLEL_END)();
KA_TRACE(20, ("GOMP_parallel_sections exit: T#%d\n", gtid));
}
#define PARALLEL_LOOP(func, schedule) \
void func (void (*task) (void *), void *data, unsigned num_threads, \
long lb, long ub, long str, long chunk_sz, unsigned flags) \
{ \
int gtid = __kmp_entry_gtid(); \
int last = FALSE; \
MKLOC(loc, #func); \
KA_TRACE(20, ( #func ": T#%d, lb 0x%lx, ub 0x%lx, str 0x%lx, chunk_sz 0x%lx\n", \
gtid, lb, ub, str, chunk_sz )); \
\
if (__kmpc_ok_to_fork(&loc) && (num_threads != 1)) { \
if (num_threads != 0) { \
__kmp_push_num_threads(&loc, gtid, num_threads); \
} \
if (flags != 0) { \
__kmp_push_proc_bind(&loc, gtid, (kmp_proc_bind_t)flags); \
} \
__kmp_GOMP_fork_call(&loc, gtid, \
(microtask_t)__kmp_GOMP_parallel_microtask_wrapper, 9, \
task, data, num_threads, &loc, (schedule), lb, \
(str > 0) ? (ub - 1) : (ub + 1), str, chunk_sz); \
} \
else { \
__kmpc_serialized_parallel(&loc, gtid); \
} \
\
KMP_DISPATCH_INIT(&loc, gtid, (schedule), lb, \
(str > 0) ? (ub - 1) : (ub + 1), str, chunk_sz, \
(schedule) != kmp_sch_static); \
task(data); \
xexpand(KMP_API_NAME_GOMP_PARALLEL_END)(); \
\
KA_TRACE(20, ( #func " exit: T#%d\n", gtid)); \
}
PARALLEL_LOOP(xexpand(KMP_API_NAME_GOMP_PARALLEL_LOOP_STATIC), kmp_sch_static)
PARALLEL_LOOP(xexpand(KMP_API_NAME_GOMP_PARALLEL_LOOP_DYNAMIC), kmp_sch_dynamic_chunked)
PARALLEL_LOOP(xexpand(KMP_API_NAME_GOMP_PARALLEL_LOOP_GUIDED), kmp_sch_guided_chunked)
PARALLEL_LOOP(xexpand(KMP_API_NAME_GOMP_PARALLEL_LOOP_RUNTIME), kmp_sch_runtime)
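// The four instantiations above provide the fused gcc 4.9 entry points that
// replace the older GOMP_parallel_loop_*_start + GOMP_parallel_end pairs.
// Sketch of the generated call for 'schedule(static, 8)' (names illustrative):
//
//     GOMP_parallel_loop_static(outlined_loop_fn, &data,
//                               0,     // num_threads: 0 = default
//                               0, n,  // lb, ub: GOMP bounds are [lb, ub)
//                               1, 8,  // str, chunk_sz
//                               0);    // flags (proc_bind)
//
// The (str > 0) ? (ub - 1) : (ub + 1) adjustment in the macro converts GOMP's
// exclusive upper bound into the inclusive bound the runtime expects.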
void
xexpand(KMP_API_NAME_GOMP_TASKGROUP_START)(void)
{
int gtid = __kmp_get_gtid();
MKLOC(loc, "GOMP_taskgroup_start");
KA_TRACE(20, ("GOMP_taskgroup_start: T#%d\n", gtid));
__kmpc_taskgroup(&loc, gtid);
return;
}
void
xexpand(KMP_API_NAME_GOMP_TASKGROUP_END)(void)
{
int gtid = __kmp_get_gtid();
MKLOC(loc, "GOMP_taskgroup_end");
KA_TRACE(20, ("GOMP_taskgroup_end: T#%d\n", gtid));
__kmpc_end_taskgroup(&loc, gtid);
return;
}
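// These are thin shims onto the __kmpc taskgroup machinery, so the lowering
// is direct:
//
//     GOMP_taskgroup_start();
//     /* ... body; GOMP_task() calls create child tasks ... */
//     GOMP_taskgroup_end();    // waits for all tasks created in the group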
#ifndef KMP_DEBUG
static
#endif /* KMP_DEBUG */
kmp_int32 __kmp_gomp_to_iomp_cancellation_kind(int gomp_kind) {
kmp_int32 cncl_kind = 0;
switch(gomp_kind) {
case 1:
cncl_kind = cancel_parallel;
break;
case 2:
cncl_kind = cancel_loop;
break;
case 4:
cncl_kind = cancel_sections;
break;
case 8:
cncl_kind = cancel_taskgroup;
break;
}
return cncl_kind;
}
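// The gomp_kind values above are libgomp's construct bitmask (1 = parallel,
// 2 = loop, 4 = sections, 8 = taskgroup). For example, a sketch of what
// '#pragma omp cancel for if(cond)' becomes on the caller side:
//
//     if (GOMP_cancel(2 /* loop */, cond)) goto loop_exit;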
bool
xexpand(KMP_API_NAME_GOMP_CANCELLATION_POINT)(int which)
{
if(__kmp_omp_cancellation) {
KMP_FATAL(NoGompCancellation);
}
int gtid = __kmp_get_gtid();
MKLOC(loc, "GOMP_cancellation_point");
KA_TRACE(20, ("GOMP_cancellation_point: T#%d\n", gtid));
kmp_int32 cncl_kind = __kmp_gomp_to_iomp_cancellation_kind(which);
return __kmpc_cancellationpoint(&loc, gtid, cncl_kind);
}
bool
xexpand(KMP_API_NAME_GOMP_BARRIER_CANCEL)(void)
{
if(__kmp_omp_cancellation) {
KMP_FATAL(NoGompCancellation);
}
int gtid = __kmp_get_gtid();
MKLOC(loc, "GOMP_barrier_cancel");
KA_TRACE(20, ("GOMP_barrier_cancel: T#%d\n", gtid));
return __kmpc_cancel_barrier(&loc, gtid);
}
bool
xexpand(KMP_API_NAME_GOMP_CANCEL)(int which, bool do_cancel)
{
if(__kmp_omp_cancellation) {
KMP_FATAL(NoGompCancellation);
} else {
return FALSE;
}
int gtid = __kmp_get_gtid();
MKLOC(loc, "GOMP_cancel");
KA_TRACE(20, ("GOMP_cancel: T#%d\n", gtid));
kmp_int32 cncl_kind = __kmp_gomp_to_iomp_cancellation_kind(which);
if(do_cancel == FALSE) {
return xexpand(KMP_API_NAME_GOMP_CANCELLATION_POINT)(which);
} else {
return __kmpc_cancel(&loc, gtid, cncl_kind);
}
}
bool
xexpand(KMP_API_NAME_GOMP_SECTIONS_END_CANCEL)(void)
{
if(__kmp_omp_cancellation) {
KMP_FATAL(NoGompCancellation);
}
int gtid = __kmp_get_gtid();
MKLOC(loc, "GOMP_sections_end_cancel");
KA_TRACE(20, ("GOMP_sections_end_cancel: T#%d\n", gtid));
return __kmpc_cancel_barrier(&loc, gtid);
}
bool
xexpand(KMP_API_NAME_GOMP_LOOP_END_CANCEL)(void)
{
if(__kmp_omp_cancellation) {
KMP_FATAL(NoGompCancellation);
}
int gtid = __kmp_get_gtid();
MKLOC(loc, "GOMP_loop_end_cancel");
KA_TRACE(20, ("GOMP_loop_end_cancel: T#%d\n", gtid));
return __kmpc_cancel_barrier(&loc, gtid);
}
// All target functions are empty as of 2014-05-29
void
xexpand(KMP_API_NAME_GOMP_TARGET)(int device, void (*fn) (void *), const void *openmp_target,
size_t mapnum, void **hostaddrs, size_t *sizes, unsigned char *kinds)
{
return;
}
void
xexpand(KMP_API_NAME_GOMP_TARGET_DATA)(int device, const void *openmp_target, size_t mapnum,
void **hostaddrs, size_t *sizes, unsigned char *kinds)
{
return;
}
void
xexpand(KMP_API_NAME_GOMP_TARGET_END_DATA)(void)
{
return;
}
void
xexpand(KMP_API_NAME_GOMP_TARGET_UPDATE)(int device, const void *openmp_target, size_t mapnum,
void **hostaddrs, size_t *sizes, unsigned char *kinds)
{
return;
}
void
xexpand(KMP_API_NAME_GOMP_TEAMS)(unsigned int num_teams, unsigned int thread_limit)
{
return;
}
#endif // OMP_40_ENABLED
/*
The following sections of code create aliases for the GOMP_* functions,
then create versioned symbols using the assembler directive .symver.
@ -871,7 +1124,7 @@ xexpand(KMP_API_NAME_GOMP_TASKYIELD)(void)
xaliasify and xversionify are defined in kmp_ftn_os.h
*/
#if KMP_OS_LINUX
#ifdef KMP_USE_VERSION_SYMBOLS
// GOMP_1.0 aliases
xaliasify(KMP_API_NAME_GOMP_ATOMIC_END, 10);
@ -917,10 +1170,8 @@ xaliasify(KMP_API_NAME_GOMP_SINGLE_COPY_START, 10);
xaliasify(KMP_API_NAME_GOMP_SINGLE_START, 10);
// GOMP_2.0 aliases
#if OMP_30_ENABLED
xaliasify(KMP_API_NAME_GOMP_TASK, 20);
xaliasify(KMP_API_NAME_GOMP_TASKWAIT, 20);
#endif
xaliasify(KMP_API_NAME_GOMP_LOOP_ULL_DYNAMIC_NEXT, 20);
xaliasify(KMP_API_NAME_GOMP_LOOP_ULL_DYNAMIC_START, 20);
xaliasify(KMP_API_NAME_GOMP_LOOP_ULL_GUIDED_NEXT, 20);
@ -942,9 +1193,27 @@ xaliasify(KMP_API_NAME_GOMP_LOOP_ULL_STATIC_START, 20);
xaliasify(KMP_API_NAME_GOMP_TASKYIELD, 30);
// GOMP_4.0 aliases
/* TODO: add GOMP_4.0 aliases when corresponding
GOMP_* functions are implemented
*/
// The GOMP_parallel* entry points below aren't OpenMP 4.0 related.
#if OMP_40_ENABLED
xaliasify(KMP_API_NAME_GOMP_PARALLEL, 40);
xaliasify(KMP_API_NAME_GOMP_PARALLEL_SECTIONS, 40);
xaliasify(KMP_API_NAME_GOMP_PARALLEL_LOOP_DYNAMIC, 40);
xaliasify(KMP_API_NAME_GOMP_PARALLEL_LOOP_GUIDED, 40);
xaliasify(KMP_API_NAME_GOMP_PARALLEL_LOOP_RUNTIME, 40);
xaliasify(KMP_API_NAME_GOMP_PARALLEL_LOOP_STATIC, 40);
xaliasify(KMP_API_NAME_GOMP_TASKGROUP_START, 40);
xaliasify(KMP_API_NAME_GOMP_TASKGROUP_END, 40);
xaliasify(KMP_API_NAME_GOMP_BARRIER_CANCEL, 40);
xaliasify(KMP_API_NAME_GOMP_CANCEL, 40);
xaliasify(KMP_API_NAME_GOMP_CANCELLATION_POINT, 40);
xaliasify(KMP_API_NAME_GOMP_LOOP_END_CANCEL, 40);
xaliasify(KMP_API_NAME_GOMP_SECTIONS_END_CANCEL, 40);
xaliasify(KMP_API_NAME_GOMP_TARGET, 40);
xaliasify(KMP_API_NAME_GOMP_TARGET_DATA, 40);
xaliasify(KMP_API_NAME_GOMP_TARGET_END_DATA, 40);
xaliasify(KMP_API_NAME_GOMP_TARGET_UPDATE, 40);
xaliasify(KMP_API_NAME_GOMP_TEAMS, 40);
#endif
// GOMP_1.0 versioned symbols
xversionify(KMP_API_NAME_GOMP_ATOMIC_END, 10, "GOMP_1.0");
@ -990,10 +1259,8 @@ xversionify(KMP_API_NAME_GOMP_SINGLE_COPY_START, 10, "GOMP_1.0");
xversionify(KMP_API_NAME_GOMP_SINGLE_START, 10, "GOMP_1.0");
// GOMP_2.0 versioned symbols
#if OMP_30_ENABLED
xversionify(KMP_API_NAME_GOMP_TASK, 20, "GOMP_2.0");
xversionify(KMP_API_NAME_GOMP_TASKWAIT, 20, "GOMP_2.0");
#endif
xversionify(KMP_API_NAME_GOMP_LOOP_ULL_DYNAMIC_NEXT, 20, "GOMP_2.0");
xversionify(KMP_API_NAME_GOMP_LOOP_ULL_DYNAMIC_START, 20, "GOMP_2.0");
xversionify(KMP_API_NAME_GOMP_LOOP_ULL_GUIDED_NEXT, 20, "GOMP_2.0");
@ -1015,11 +1282,28 @@ xversionify(KMP_API_NAME_GOMP_LOOP_ULL_STATIC_START, 20, "GOMP_2.0");
xversionify(KMP_API_NAME_GOMP_TASKYIELD, 30, "GOMP_3.0");
// GOMP_4.0 versioned symbols
/* TODO: add GOMP_4.0 versioned symbols when corresponding
GOMP_* functions are implemented
*/
#if OMP_40_ENABLED
xversionify(KMP_API_NAME_GOMP_PARALLEL, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_PARALLEL_SECTIONS, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_PARALLEL_LOOP_DYNAMIC, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_PARALLEL_LOOP_GUIDED, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_PARALLEL_LOOP_RUNTIME, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_PARALLEL_LOOP_STATIC, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_TASKGROUP_START, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_TASKGROUP_END, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_BARRIER_CANCEL, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_CANCEL, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_CANCELLATION_POINT, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_LOOP_END_CANCEL, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_SECTIONS_END_CANCEL, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_TARGET, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_TARGET_DATA, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_TARGET_END_DATA, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_TARGET_UPDATE, 40, "GOMP_4.0");
xversionify(KMP_API_NAME_GOMP_TEAMS, 40, "GOMP_4.0");
#endif
#endif /* KMP_OS_LINUX */
#endif // KMP_USE_VERSION_SYMBOLS
#ifdef __cplusplus
} //extern "C"

--- kmp_i18n.c ---

@ -1,7 +1,7 @@
/*
* kmp_i18n.c
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43084 $
* $Date: 2014-04-15 09:15:14 -0500 (Tue, 15 Apr 2014) $
*/
@ -815,7 +815,7 @@ sys_error(
// not issue warning if strerror_r() returns `int' instead of expected `char *'.
message = __kmp_str_format( "%s", err_msg );
#else // OS X*, FreeBSD etc.
#else // OS X*, FreeBSD* etc.
// XSI version of strerror_r.

--- kmp_i18n.h ---

@ -1,7 +1,7 @@
/*
* kmp_i18n.h
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_import.c ---

@ -1,7 +1,7 @@
/*
* kmp_import.c
* $Revision: 42286 $
* $Date: 2013-04-18 10:53:26 -0500 (Thu, 18 Apr 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_io.c ---

@ -1,7 +1,7 @@
/*
* kmp_io.c -- RTL IO
* $Revision: 42150 $
* $Date: 2013-03-15 15:40:38 -0500 (Fri, 15 Mar 2013) $
* $Revision: 43236 $
* $Date: 2014-06-04 16:42:35 -0500 (Wed, 04 Jun 2014) $
*/
@ -171,7 +171,7 @@ __kmp_vprintf( enum kmp_io __kmp_io, char const * format, va_list ap )
int chars = 0;
#ifdef KMP_DEBUG_PIDS
chars = sprintf( db, "pid=%d: ", getpid() );
chars = sprintf( db, "pid=%d: ", (kmp_int32)getpid() );
#endif
chars += vsprintf( db, format, ap );
@ -200,7 +200,8 @@ __kmp_vprintf( enum kmp_io __kmp_io, char const * format, va_list ap )
#if KMP_OS_WINDOWS
DWORD count;
#ifdef KMP_DEBUG_PIDS
__kmp_str_buf_print( &__kmp_console_buf, "pid=%d: ", getpid() );
__kmp_str_buf_print( &__kmp_console_buf, "pid=%d: ",
(kmp_int32)getpid() );
#endif
__kmp_str_buf_vprint( &__kmp_console_buf, format, ap );
WriteFile(
@ -213,7 +214,7 @@ __kmp_vprintf( enum kmp_io __kmp_io, char const * format, va_list ap )
__kmp_str_buf_clear( &__kmp_console_buf );
#else
#ifdef KMP_DEBUG_PIDS
fprintf( __kmp_stderr, "pid=%d: ", getpid() );
fprintf( __kmp_stderr, "pid=%d: ", (kmp_int32)getpid() );
#endif
vfprintf( __kmp_stderr, format, ap );
fflush( __kmp_stderr );

--- kmp_io.h ---

@ -1,7 +1,7 @@
/*
* kmp_io.h -- RTL IO header file.
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_itt.c ---

@ -1,8 +1,8 @@
#if USE_ITT_BUILD
/*
* kmp_itt.c -- ITT Notify interface.
* $Revision: 42489 $
* $Date: 2013-07-08 11:00:09 -0500 (Mon, 08 Jul 2013) $
* $Revision: 43457 $
* $Date: 2014-09-17 03:57:22 -0500 (Wed, 17 Sep 2014) $
*/
@ -25,8 +25,13 @@
#if USE_ITT_NOTIFY
kmp_int32 __kmp_frame_domain_count = 0;
__itt_domain* __kmp_itt_domains[KMP_MAX_FRAME_DOMAINS];
kmp_int32 __kmp_barrier_domain_count;
kmp_int32 __kmp_region_domain_count;
__itt_domain* __kmp_itt_barrier_domains[KMP_MAX_FRAME_DOMAINS];
__itt_domain* __kmp_itt_region_domains[KMP_MAX_FRAME_DOMAINS];
__itt_domain* __kmp_itt_imbalance_domains[KMP_MAX_FRAME_DOMAINS];
kmp_int32 __kmp_itt_region_team_size[KMP_MAX_FRAME_DOMAINS];
__itt_domain * metadata_domain = NULL;
#include "kmp_version.h"
#include "kmp_i18n.h"

--- kmp_itt.h ---

@ -1,8 +1,8 @@
#if USE_ITT_BUILD
/*
* kmp_itt.h -- ITT Notify interface.
* $Revision: 42829 $
* $Date: 2013-11-21 05:44:01 -0600 (Thu, 21 Nov 2013) $
* $Revision: 43457 $
* $Date: 2014-09-17 03:57:22 -0500 (Wed, 17 Sep 2014) $
*/
@ -55,12 +55,20 @@ void __kmp_itt_destroy();
// __kmp_itt_xxxed() function should be called after action.
// --- Parallel region reporting ---
__kmp_inline void __kmp_itt_region_forking( int gtid, int serialized = 0 ); // Master only, before forking threads.
__kmp_inline void __kmp_itt_region_forking( int gtid, int team_size, int barriers, int serialized = 0 ); // Master only, before forking threads.
__kmp_inline void __kmp_itt_region_joined( int gtid, int serialized = 0 ); // Master only, after joining threads.
// (*) Note: A thread may execute tasks after this point, though.
// --- Frame reporting ---
__kmp_inline void __kmp_itt_frame_submit( int gtid, __itt_timestamp begin, __itt_timestamp end, int imbalance, ident_t *loc );
// region = 0 - no regions, region = 1 - parallel, region = 2 - serialized parallel
__kmp_inline void __kmp_itt_frame_submit( int gtid, __itt_timestamp begin, __itt_timestamp end, int imbalance, ident_t *loc, int team_size, int region = 0 );
// --- Metadata reporting ---
// begin/end - begin/end timestamps of a barrier frame, imbalance - aggregated wait time value, reduction - if this is a reduction barrier
__kmp_inline void __kmp_itt_metadata_imbalance( int gtid, kmp_uint64 begin, kmp_uint64 end, kmp_uint64 imbalance, kmp_uint64 reduction );
// sched_type: 0 - static, 1 - dynamic, 2 - guided, 3 - custom (all others); iterations - loop trip count, chunk - chunk size
__kmp_inline void __kmp_itt_metadata_loop( ident_t * loc, kmp_uint64 sched_type, kmp_uint64 iterations, kmp_uint64 chunk );
__kmp_inline void __kmp_itt_metadata_single();
// --- Barrier reporting ---
__kmp_inline void * __kmp_itt_barrier_object( int gtid, int bt, int set_name = 0, int delta = 0 );
@ -135,8 +143,12 @@ __kmp_inline void __kmp_itt_stack_callee_leave(__itt_caller);
#if (INCLUDE_SSC_MARKS && KMP_OS_LINUX && KMP_ARCH_X86_64)
// Portable (at least for gcc and icc) code to insert the necessary instructions
// to set %ebx and execute the unlikely no-op.
# define INSERT_SSC_MARK(tag) \
__asm__ __volatile__ ("movl %0, %%ebx; .byte 0x64, 0x67, 0x90 " ::"i"(tag):"%ebx")
#if defined( __INTEL_COMPILER )
# define INSERT_SSC_MARK(tag) __SSC_MARK(tag)
#else
# define INSERT_SSC_MARK(tag) \
__asm__ __volatile__ ("movl %0, %%ebx; .byte 0x64, 0x67, 0x90 " ::"i"(tag):"%ebx")
#endif
#else
# define INSERT_SSC_MARK(tag) ((void)0)
#endif
@ -150,6 +162,18 @@ __kmp_inline void __kmp_itt_stack_callee_leave(__itt_caller);
#define SSC_MARK_SPIN_START() INSERT_SSC_MARK(0x4376)
#define SSC_MARK_SPIN_END() INSERT_SSC_MARK(0x4377)
// Markers for architecture simulation.
// FORKING : Before the master thread forks.
// JOINING : At the start of the join.
// INVOKING : Before the threads invoke microtasks.
// DISPATCH_INIT: At the start of a dynamically scheduled loop.
// DISPATCH_NEXT: After claiming the next iteration of a dynamically scheduled loop.
#define SSC_MARK_FORKING() INSERT_SSC_MARK(0xd693)
#define SSC_MARK_JOINING() INSERT_SSC_MARK(0xd694)
#define SSC_MARK_INVOKING() INSERT_SSC_MARK(0xd695)
#define SSC_MARK_DISPATCH_INIT() INSERT_SSC_MARK(0xd696)
#define SSC_MARK_DISPATCH_NEXT() INSERT_SSC_MARK(0xd697)
// The object is an address that associates a specific set of the prepare, acquire, release,
// and cancel operations.
@ -227,8 +251,14 @@ __kmp_inline void __kmp_itt_stack_callee_leave(__itt_caller);
const int KMP_MAX_FRAME_DOMAINS = 512; // Maximum number of frame domains to use (maps to
// different OpenMP regions in the user source code).
extern kmp_int32 __kmp_frame_domain_count;
extern __itt_domain* __kmp_itt_domains[KMP_MAX_FRAME_DOMAINS];
extern kmp_int32 __kmp_barrier_domain_count;
extern kmp_int32 __kmp_region_domain_count;
extern __itt_domain* __kmp_itt_barrier_domains[KMP_MAX_FRAME_DOMAINS];
extern __itt_domain* __kmp_itt_region_domains[KMP_MAX_FRAME_DOMAINS];
extern __itt_domain* __kmp_itt_imbalance_domains[KMP_MAX_FRAME_DOMAINS];
extern kmp_int32 __kmp_itt_region_team_size[KMP_MAX_FRAME_DOMAINS];
extern __itt_domain * metadata_domain;
#else
// Null definitions of the synchronization tracing functions.

--- kmp_itt.inl ---

@ -1,8 +1,8 @@
#if USE_ITT_BUILD
/*
* kmp_itt.inl -- Inline functions of ITT Notify.
* $Revision: 42866 $
* $Date: 2013-12-10 15:15:58 -0600 (Tue, 10 Dec 2013) $
* $Revision: 43457 $
* $Date: 2014-09-17 03:57:22 -0500 (Wed, 17 Sep 2014) $
*/
@ -63,6 +63,8 @@
#endif
#endif
static kmp_bootstrap_lock_t metadata_lock = KMP_BOOTSTRAP_LOCK_INITIALIZER( metadata_lock );
/*
------------------------------------------------------------------------------------------------
Parallel region reporting.
@ -89,12 +91,10 @@
// -------------------------------------------------------------------------------------------------
LINKAGE void
__kmp_itt_region_forking( int gtid, int serialized ) {
__kmp_itt_region_forking( int gtid, int team_size, int barriers, int serialized ) {
#if USE_ITT_NOTIFY
kmp_team_t * team = __kmp_team_from_gtid( gtid );
#if OMP_30_ENABLED
if (team->t.t_active_level + serialized > 1)
#endif
{
// The frame notifications are only supported for the outermost teams.
return;
@ -105,40 +105,81 @@ __kmp_itt_region_forking( int gtid, int serialized ) {
// Assume that reserved_2 contains zero initially. Since zero is special
// value here, store the index into domain array increased by 1.
if (loc->reserved_2 == 0) {
if (__kmp_frame_domain_count < KMP_MAX_FRAME_DOMAINS) {
int frm = KMP_TEST_THEN_INC32( & __kmp_frame_domain_count ); // get "old" value
if (__kmp_region_domain_count < KMP_MAX_FRAME_DOMAINS) {
int frm = KMP_TEST_THEN_INC32( & __kmp_region_domain_count ); // get "old" value
if (frm >= KMP_MAX_FRAME_DOMAINS) {
KMP_TEST_THEN_DEC32( & __kmp_frame_domain_count ); // revert the count
KMP_TEST_THEN_DEC32( & __kmp_region_domain_count ); // revert the count
return; // loc->reserved_2 is still 0
}
//if (!KMP_COMPARE_AND_STORE_ACQ32( &loc->reserved_2, 0, frm + 1 )) {
// frm = loc->reserved_2 - 1; // get value saved by other thread for same loc
//} // AC: this block is to replace next unsynchronized line
loc->reserved_2 = frm + 1; // save "new" value
// We need to save indexes for both region and barrier frames. We'll use loc->reserved_2
// field but put region index to the low two bytes and barrier indexes to the high
// two bytes. It is OK because KMP_MAX_FRAME_DOMAINS = 512.
loc->reserved_2 |= (frm + 1); // save "new" value
// Transform compiler-generated region location into the format
// that the tools more or less standardized on:
// "<func>$omp$parallel@[file:]<line>[:<col>]"
const char * buff = NULL;
kmp_str_loc_t str_loc = __kmp_str_loc_init( loc->psource, 1 );
buff = __kmp_str_format("%s$omp$parallel@%s:%d:%d",
str_loc.func, str_loc.file,
buff = __kmp_str_format("%s$omp$parallel:%d@%s:%d:%d",
str_loc.func, team_size, str_loc.file,
str_loc.line, str_loc.col);
__kmp_str_loc_free( &str_loc );
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_domains[ frm ] = __itt_domain_create( buff );
__kmp_itt_region_domains[ frm ] = __itt_domain_create( buff );
__itt_suppress_pop();
__kmp_str_free( &buff );
__itt_frame_begin_v3(__kmp_itt_domains[ frm ], NULL);
if( barriers ) {
if (__kmp_barrier_domain_count < KMP_MAX_FRAME_DOMAINS) {
int frm = KMP_TEST_THEN_INC32( & __kmp_barrier_domain_count ); // get "old" value
if (frm >= KMP_MAX_FRAME_DOMAINS) {
KMP_TEST_THEN_DEC32( & __kmp_barrier_domain_count ); // revert the count
return; // loc->reserved_2 is still 0
}
const char * buff = NULL;
buff = __kmp_str_format("%s$omp$barrier@%s:%d",
str_loc.func, str_loc.file, str_loc.col);
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_barrier_domains[ frm ] = __itt_domain_create( buff );
__itt_suppress_pop();
__kmp_str_free( &buff );
// Save the barrier frame index to the high two bytes.
loc->reserved_2 |= (frm + 1) << 16;
}
}
__kmp_str_loc_free( &str_loc );
__itt_frame_begin_v3(__kmp_itt_region_domains[ frm ], NULL);
}
} else { // Region domain exists for this location
// Check if team size was changed. Then create new region domain for this location
int frm = (loc->reserved_2 & 0x0000FFFF) - 1;
if( __kmp_itt_region_team_size[frm] != team_size ) {
const char * buff = NULL;
kmp_str_loc_t str_loc = __kmp_str_loc_init( loc->psource, 1 );
buff = __kmp_str_format("%s$omp$parallel:%d@%s:%d:%d",
str_loc.func, team_size, str_loc.file,
str_loc.line, str_loc.col);
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_region_domains[ frm ] = __itt_domain_create( buff );
__itt_suppress_pop();
__kmp_str_free( &buff );
__kmp_str_loc_free( &str_loc );
__kmp_itt_region_team_size[frm] = team_size;
__itt_frame_begin_v3(__kmp_itt_region_domains[frm], NULL);
} else { // Team size was not changed. Use existing domain.
__itt_frame_begin_v3(__kmp_itt_region_domains[frm], NULL);
}
} else { // if it is not 0 then it should be <= KMP_MAX_FRAME_DOMAINS
__itt_frame_begin_v3(__kmp_itt_domains[loc->reserved_2 - 1], NULL);
}
KMP_ITT_DEBUG_LOCK();
KMP_ITT_DEBUG_PRINT( "[frm beg] gtid=%d, idx=%d, serialized:%d, loc:%p\n",
gtid, loc->reserved_2 - 1, serialized, loc );
KMP_ITT_DEBUG_PRINT( "[frm beg] gtid=%d, idx=%x, serialized:%d, loc:%p\n",
gtid, loc->reserved_2, serialized, loc );
}
#endif
} // __kmp_itt_region_forking
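// A compact restatement of the reserved_2 packing used above; both indexes are
// stored 1-based so that zero keeps meaning "unset", and KMP_MAX_FRAME_DOMAINS
// (512) fits easily in 16 bits:
//
//     loc->reserved_2 |= (region_index + 1);          // low 16 bits
//     loc->reserved_2 |= (barrier_index + 1) << 16;   // high 16 bits
//     int region_frm  = (loc->reserved_2 & 0x0000FFFF) - 1;  // -1 if unset
//     int barrier_frm = (loc->reserved_2 >> 16) - 1;         // -1 if unset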
@ -146,50 +187,207 @@ __kmp_itt_region_forking( int gtid, int serialized ) {
// -------------------------------------------------------------------------------------------------
LINKAGE void
__kmp_itt_frame_submit( int gtid, __itt_timestamp begin, __itt_timestamp end, int imbalance, ident_t * loc ) {
__kmp_itt_frame_submit( int gtid, __itt_timestamp begin, __itt_timestamp end, int imbalance, ident_t * loc, int team_size, int region ) {
#if USE_ITT_NOTIFY
if( region ) {
kmp_team_t * team = __kmp_team_from_gtid( gtid );
int serialized = ( region == 2 ? 1 : 0 );
if (team->t.t_active_level + serialized > 1)
{
// The frame notifications are only supported for the outermost teams.
return;
}
// Check that the region domain has not been created before. Its index is saved in the low two bytes.
if ((loc->reserved_2 & 0x0000FFFF) == 0) {
if (__kmp_region_domain_count < KMP_MAX_FRAME_DOMAINS) {
int frm = KMP_TEST_THEN_INC32( & __kmp_region_domain_count ); // get "old" value
if (frm >= KMP_MAX_FRAME_DOMAINS) {
KMP_TEST_THEN_DEC32( & __kmp_region_domain_count ); // revert the count
return; // loc->reserved_2 is still 0
}
// We need to save indexes for both region and barrier frames. We'll use loc->reserved_2
// field but put region index to the low two bytes and barrier indexes to the high
// two bytes. It is OK because KMP_MAX_FRAME_DOMAINS = 512.
loc->reserved_2 |= (frm + 1); // save "new" value
// Transform compiler-generated region location into the format
// that the tools more or less standardized on:
// "<func>$omp$parallel:team_size@[file:]<line>[:<col>]"
const char * buff = NULL;
kmp_str_loc_t str_loc = __kmp_str_loc_init( loc->psource, 1 );
buff = __kmp_str_format("%s$omp$parallel:%d@%s:%d:%d",
str_loc.func, team_size, str_loc.file,
str_loc.line, str_loc.col);
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_region_domains[ frm ] = __itt_domain_create( buff );
__itt_suppress_pop();
__kmp_str_free( &buff );
__kmp_str_loc_free( &str_loc );
__kmp_itt_region_team_size[frm] = team_size;
__itt_frame_submit_v3(__kmp_itt_region_domains[ frm ], NULL, begin, end );
}
} else { // Region domain exists for this location
// Check if team size was changed. Then create new region domain for this location
int frm = (loc->reserved_2 & 0x0000FFFF) - 1;
if( __kmp_itt_region_team_size[frm] != team_size ) {
const char * buff = NULL;
kmp_str_loc_t str_loc = __kmp_str_loc_init( loc->psource, 1 );
buff = __kmp_str_format("%s$omp$parallel:%d@%s:%d:%d",
str_loc.func, team_size, str_loc.file,
str_loc.line, str_loc.col);
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_region_domains[ frm ] = __itt_domain_create( buff );
__itt_suppress_pop();
__kmp_str_free( &buff );
__kmp_str_loc_free( &str_loc );
__kmp_itt_region_team_size[frm] = team_size;
__itt_frame_submit_v3(__kmp_itt_region_domains[ frm ], NULL, begin, end );
} else { // Team size was not changed. Use existing domain.
__itt_frame_submit_v3(__kmp_itt_region_domains[ frm ], NULL, begin, end );
}
}
KMP_ITT_DEBUG_LOCK();
KMP_ITT_DEBUG_PRINT( "[reg sub] gtid=%d, idx=%x, region:%d, loc:%p, beg:%llu, end:%llu\n",
gtid, loc->reserved_2, region, loc, begin, end );
return;
} else { // called for barrier reporting
if (loc) {
if (loc->reserved_2 == 0) {
if (__kmp_frame_domain_count < KMP_MAX_FRAME_DOMAINS) {
int frm = KMP_TEST_THEN_INC32( & __kmp_frame_domain_count ); // get "old" value
if ((loc->reserved_2 & 0xFFFF0000) == 0) {
if (__kmp_barrier_domain_count < KMP_MAX_FRAME_DOMAINS) {
int frm = KMP_TEST_THEN_INC32( & __kmp_barrier_domain_count ); // get "old" value
if (frm >= KMP_MAX_FRAME_DOMAINS) {
KMP_TEST_THEN_DEC32( & __kmp_frame_domain_count ); // revert the count
KMP_TEST_THEN_DEC32( & __kmp_barrier_domain_count ); // revert the count
return; // loc->reserved_2 is still 0
}
// Should it be synchronized? See the comment in __kmp_itt_region_forking
loc->reserved_2 = frm + 1; // save "new" value
// Save the barrier frame index to the high two bytes.
loc->reserved_2 |= (frm + 1) << 16; // save "new" value
// Transform compiler-generated region location into the format
// that the tools more or less standardized on:
// "<func>$omp$frame@[file:]<line>[:<col>]"
const char * buff = NULL;
kmp_str_loc_t str_loc = __kmp_str_loc_init( loc->psource, 1 );
if( imbalance ) {
buff = __kmp_str_format("%s$omp$barrier-imbalance@%s:%d",
str_loc.func, str_loc.file, str_loc.col);
const char * buff_imb = NULL;
buff_imb = __kmp_str_format("%s$omp$barrier-imbalance:%d@%s:%d",
str_loc.func, team_size, str_loc.file, str_loc.col);
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_imbalance_domains[ frm ] = __itt_domain_create( buff_imb );
__itt_suppress_pop();
__itt_frame_submit_v3(__kmp_itt_imbalance_domains[ frm ], NULL, begin, end );
__kmp_str_free( &buff_imb );
} else {
const char * buff = NULL;
buff = __kmp_str_format("%s$omp$barrier@%s:%d",
str_loc.func, str_loc.file, str_loc.col);
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_barrier_domains[ frm ] = __itt_domain_create( buff );
__itt_suppress_pop();
__itt_frame_submit_v3(__kmp_itt_barrier_domains[ frm ], NULL, begin, end );
__kmp_str_free( &buff );
}
__kmp_str_loc_free( &str_loc );
__itt_suppress_push(__itt_suppress_memory_errors);
__kmp_itt_domains[ frm ] = __itt_domain_create( buff );
__itt_suppress_pop();
__kmp_str_free( &buff );
__itt_frame_submit_v3(__kmp_itt_domains[ frm ], NULL, begin, end );
}
} else { // if it is not 0 then it should be <= KMP_MAX_FRAME_DOMAINS
__itt_frame_submit_v3(__kmp_itt_domains[loc->reserved_2 - 1], NULL, begin, end );
if( imbalance ) {
__itt_frame_submit_v3(__kmp_itt_imbalance_domains[ (loc->reserved_2 >> 16) - 1 ], NULL, begin, end );
} else {
__itt_frame_submit_v3(__kmp_itt_barrier_domains[(loc->reserved_2 >> 16) - 1], NULL, begin, end );
}
}
KMP_ITT_DEBUG_LOCK();
KMP_ITT_DEBUG_PRINT( "[frm sub] gtid=%d, idx=%x, loc:%p, beg:%llu, end:%llu\n",
gtid, loc->reserved_2, loc, begin, end );
}
}
#endif
} // __kmp_itt_frame_submit
// -------------------------------------------------------------------------------------------------
LINKAGE void
__kmp_itt_metadata_imbalance( int gtid, kmp_uint64 begin, kmp_uint64 end, kmp_uint64 imbalance, kmp_uint64 reduction ) {
#if USE_ITT_NOTIFY
if( metadata_domain == NULL) {
__kmp_acquire_bootstrap_lock( & metadata_lock );
if( metadata_domain == NULL) {
__itt_suppress_push(__itt_suppress_memory_errors);
metadata_domain = __itt_domain_create( "OMP Metadata" );
__itt_suppress_pop();
}
__kmp_release_bootstrap_lock( & metadata_lock );
}
__itt_string_handle * string_handle = __itt_string_handle_create( "omp_metadata_imbalance");
kmp_uint64 imbalance_data[ 4 ];
imbalance_data[ 0 ] = begin;
imbalance_data[ 1 ] = end;
imbalance_data[ 2 ] = imbalance;
imbalance_data[ 3 ] = reduction;
__itt_metadata_add(metadata_domain, __itt_null, string_handle, __itt_metadata_u64, 4, imbalance_data);
#endif
} // __kmp_itt_metadata_imbalance
// -------------------------------------------------------------------------------------------------
LINKAGE void
__kmp_itt_metadata_loop( ident_t * loc, kmp_uint64 sched_type, kmp_uint64 iterations, kmp_uint64 chunk ) {
#if USE_ITT_NOTIFY
if( metadata_domain == NULL) {
__kmp_acquire_bootstrap_lock( & metadata_lock );
if( metadata_domain == NULL) {
__itt_suppress_push(__itt_suppress_memory_errors);
metadata_domain = __itt_domain_create( "OMP Metadata" );
__itt_suppress_pop();
}
__kmp_release_bootstrap_lock( & metadata_lock );
}
__itt_string_handle * string_handle = __itt_string_handle_create( "omp_metadata_loop");
kmp_str_loc_t str_loc = __kmp_str_loc_init( loc->psource, 1 );
kmp_uint64 loop_data[ 5 ];
loop_data[ 0 ] = str_loc.line;
loop_data[ 1 ] = str_loc.col;
loop_data[ 2 ] = sched_type;
loop_data[ 3 ] = iterations;
loop_data[ 4 ] = chunk;
__kmp_str_loc_free( &str_loc );
__itt_metadata_add(metadata_domain, __itt_null, string_handle, __itt_metadata_u64, 5, loop_data);
#endif
} // __kmp_itt_metadata_loop
// -------------------------------------------------------------------------------------------------
LINKAGE void
__kmp_itt_metadata_single( ) {
#if USE_ITT_NOTIFY
if( metadata_domain == NULL) {
__kmp_acquire_bootstrap_lock( & metadata_lock );
if( metadata_domain == NULL) {
__itt_suppress_push(__itt_suppress_memory_errors);
metadata_domain = __itt_domain_create( "OMP Metadata" );
__itt_suppress_pop();
}
__kmp_release_bootstrap_lock( & metadata_lock );
}
__itt_string_handle * string_handle = __itt_string_handle_create( "omp_metadata_single");
__itt_metadata_add(metadata_domain, __itt_null, string_handle, __itt_metadata_u64, 0, NULL);
#endif
} // __kmp_itt_metadata_single
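// All three metadata routines above create the shared "OMP Metadata" domain
// lazily with the same double-checked pattern; in isolation:
//
//     if (metadata_domain == NULL) {                    // unsynchronized fast path
//         __kmp_acquire_bootstrap_lock(&metadata_lock);
//         if (metadata_domain == NULL)                  // re-check under the lock
//             metadata_domain = __itt_domain_create("OMP Metadata");
//         __kmp_release_bootstrap_lock(&metadata_lock);
//     }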
// -------------------------------------------------------------------------------------------------
LINKAGE void
__kmp_itt_region_starting( int gtid ) {
#if USE_ITT_NOTIFY
@ -210,19 +408,21 @@ LINKAGE void
__kmp_itt_region_joined( int gtid, int serialized ) {
#if USE_ITT_NOTIFY
kmp_team_t * team = __kmp_team_from_gtid( gtid );
#if OMP_30_ENABLED
if (team->t.t_active_level + serialized > 1)
#endif
{
// The frame notifications are only supported for the outermost teams.
return;
}
ident_t * loc = __kmp_thread_from_gtid( gtid )->th.th_ident;
if (loc && loc->reserved_2 && loc->reserved_2 <= KMP_MAX_FRAME_DOMAINS) {
KMP_ITT_DEBUG_LOCK();
__itt_frame_end_v3(__kmp_itt_domains[loc->reserved_2 - 1], NULL);
KMP_ITT_DEBUG_PRINT( "[frm end] gtid=%d, idx=%d, serialized:%d, loc:%p\n",
gtid, loc->reserved_2 - 1, serialized, loc );
if (loc && loc->reserved_2)
{
int frm = (loc->reserved_2 & 0x0000FFFF) - 1;
if(frm < KMP_MAX_FRAME_DOMAINS) {
KMP_ITT_DEBUG_LOCK();
__itt_frame_end_v3(__kmp_itt_region_domains[frm], NULL);
KMP_ITT_DEBUG_PRINT( "[frm end] gtid=%d, idx=%x, serialized:%d, loc:%p\n",
gtid, loc->reserved_2, serialized, loc );
}
}
#endif
} // __kmp_itt_region_joined
@ -409,8 +609,6 @@ __kmp_itt_barrier_finished( int gtid, void * object ) {
#endif
} // __kmp_itt_barrier_finished
#if OMP_30_ENABLED
/*
------------------------------------------------------------------------------------------------
Taskwait reporting.
@ -507,8 +705,6 @@ __kmp_itt_task_finished(
// -------------------------------------------------------------------------------------------------
#endif /* OMP_30_ENABLED */
/*
------------------------------------------------------------------------------------------------
Lock reporting.
@ -757,7 +953,11 @@ __kmp_itt_thread_name( int gtid ) {
if ( __itt_thr_name_set_ptr ) {
kmp_str_buf_t name;
__kmp_str_buf_init( & name );
__kmp_str_buf_print( & name, "OMP Worker Thread #%d", gtid );
if( KMP_MASTER_GTID(gtid) ) {
__kmp_str_buf_print( & name, "OMP Master Thread #%d", gtid );
} else {
__kmp_str_buf_print( & name, "OMP Worker Thread #%d", gtid );
}
KMP_ITT_DEBUG_LOCK();
__itt_thr_name_set( name.str, name.used );
KMP_ITT_DEBUG_PRINT( "[thr nam] name( \"%s\")\n", name.str );

[File diff suppressed because it is too large]

--- kmp_lock.h ---

@ -1,7 +1,7 @@
/*
* kmp_lock.h -- lock header file
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43473 $
* $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
*/
@ -280,16 +280,16 @@ extern void __kmp_destroy_nested_ticket_lock( kmp_ticket_lock_t *lck );
#if KMP_USE_ADAPTIVE_LOCKS
struct kmp_adaptive_lock;
struct kmp_adaptive_lock_info;
typedef struct kmp_adaptive_lock kmp_adaptive_lock_t;
typedef struct kmp_adaptive_lock_info kmp_adaptive_lock_info_t;
#if KMP_DEBUG_ADAPTIVE_LOCKS
struct kmp_adaptive_lock_statistics {
/* So we can get stats from locks that haven't been destroyed. */
kmp_adaptive_lock_t * next;
kmp_adaptive_lock_t * prev;
kmp_adaptive_lock_info_t * next;
kmp_adaptive_lock_info_t * prev;
/* Other statistics */
kmp_uint32 successfulSpeculations;
@ -307,7 +307,7 @@ extern void __kmp_init_speculative_stats();
#endif // KMP_DEBUG_ADAPTIVE_LOCKS
struct kmp_adaptive_lock
struct kmp_adaptive_lock_info
{
/* Values used for adaptivity.
* Although these are accessed from multiple threads we don't access them atomically,
@ -348,10 +348,6 @@ struct kmp_base_queuing_lock {
kmp_int32 depth_locked; // depth locked, for nested locks only
kmp_lock_flags_t flags; // lock specifics, e.g. critical section lock
#if KMP_USE_ADAPTIVE_LOCKS
KMP_ALIGN(CACHE_LINE)
kmp_adaptive_lock_t adaptive; // Information for the speculative adaptive lock
#endif
};
typedef struct kmp_base_queuing_lock kmp_base_queuing_lock_t;
@ -379,6 +375,30 @@ extern void __kmp_release_nested_queuing_lock( kmp_queuing_lock_t *lck, kmp_int3
extern void __kmp_init_nested_queuing_lock( kmp_queuing_lock_t *lck );
extern void __kmp_destroy_nested_queuing_lock( kmp_queuing_lock_t *lck );
#if KMP_USE_ADAPTIVE_LOCKS
// ----------------------------------------------------------------------------
// Adaptive locks.
// ----------------------------------------------------------------------------
struct kmp_base_adaptive_lock {
kmp_base_queuing_lock qlk;
KMP_ALIGN(CACHE_LINE)
kmp_adaptive_lock_info_t adaptive; // Information for the speculative adaptive lock
};
typedef struct kmp_base_adaptive_lock kmp_base_adaptive_lock_t;
union KMP_ALIGN_CACHE kmp_adaptive_lock {
kmp_base_adaptive_lock_t lk;
kmp_lock_pool_t pool;
double lk_align;
char lk_pad[ KMP_PAD(kmp_base_adaptive_lock_t, CACHE_LINE) ];
};
typedef union kmp_adaptive_lock kmp_adaptive_lock_t;
# define GET_QLK_PTR(l) ((kmp_queuing_lock_t *) & (l)->lk.qlk)
#endif // KMP_USE_ADAPTIVE_LOCKS
// ----------------------------------------------------------------------------
// DRDPA ticket locks.
@ -913,7 +933,26 @@ __kmp_set_user_lock_flags( kmp_user_lock_p lck, kmp_lock_flags_t flags )
//
extern void __kmp_set_user_lock_vptrs( kmp_lock_kind_t user_lock_kind );
//
// Macros for binding user lock functions.
//
#define KMP_BIND_USER_LOCK_TEMPLATE(nest, kind, suffix) { \
__kmp_acquire##nest##user_lock_with_checks_ = ( void (*)( kmp_user_lock_p, kmp_int32 ) ) \
__kmp_acquire##nest##kind##_##suffix; \
__kmp_release##nest##user_lock_with_checks_ = ( void (*)( kmp_user_lock_p, kmp_int32 ) ) \
__kmp_release##nest##kind##_##suffix; \
__kmp_test##nest##user_lock_with_checks_ = ( int (*)( kmp_user_lock_p, kmp_int32 ) ) \
__kmp_test##nest##kind##_##suffix; \
__kmp_init##nest##user_lock_with_checks_ = ( void (*)( kmp_user_lock_p ) ) \
__kmp_init##nest##kind##_##suffix; \
__kmp_destroy##nest##user_lock_with_checks_ = ( void (*)( kmp_user_lock_p ) ) \
__kmp_destroy##nest##kind##_##suffix; \
}
#define KMP_BIND_USER_LOCK(kind) KMP_BIND_USER_LOCK_TEMPLATE(_, kind, lock)
#define KMP_BIND_USER_LOCK_WITH_CHECKS(kind) KMP_BIND_USER_LOCK_TEMPLATE(_, kind, lock_with_checks)
#define KMP_BIND_NESTED_USER_LOCK(kind) KMP_BIND_USER_LOCK_TEMPLATE(_nested_, kind, lock)
#define KMP_BIND_NESTED_USER_LOCK_WITH_CHECKS(kind) KMP_BIND_USER_LOCK_TEMPLATE(_nested_, kind, lock_with_checks)
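// For example, KMP_BIND_USER_LOCK(ticket) expands to assignments of the form
// (trimmed to the first two pointers; test/init/destroy follow the same shape):
//
//     __kmp_acquire_user_lock_with_checks_ =
//         (void (*)(kmp_user_lock_p, kmp_int32))__kmp_acquire_ticket_lock;
//     __kmp_release_user_lock_with_checks_ =
//         (void (*)(kmp_user_lock_p, kmp_int32))__kmp_release_ticket_lock;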
// ----------------------------------------------------------------------------
// User lock table & lock allocation

--- kmp_omp.h ---

@ -1,8 +1,8 @@
/*
* kmp_omp.h -- OpenMP definition for kmp_omp_struct_info_t.
* This is for information about runtime library structures.
* $Revision: 42105 $
* $Date: 2013-03-11 14:51:34 -0500 (Mon, 11 Mar 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_os.h ---

@ -1,7 +1,7 @@
/*
* kmp_os.h -- KPTS runtime header file.
* $Revision: 42820 $
* $Date: 2013-11-13 16:53:44 -0600 (Wed, 13 Nov 2013) $
* $Revision: 43473 $
* $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
*/
@ -69,7 +69,7 @@
#define KMP_OS_LINUX 0
#define KMP_OS_FREEBSD 0
#define KMP_OS_DARWIN 0
#define KMP_OS_WINDOWS 0
#define KMP_OS_CNK 0
#define KMP_OS_UNIX 0 /* disjunction of KMP_OS_LINUX, KMP_OS_DARWIN etc. */
@ -116,6 +116,12 @@
# define KMP_OS_UNIX 1
#endif
#if (KMP_OS_LINUX || KMP_OS_WINDOWS) && !KMP_OS_CNK && !KMP_ARCH_PPC64
# define KMP_AFFINITY_SUPPORTED 1
#else
# define KMP_AFFINITY_SUPPORTED 0
#endif
#if KMP_OS_WINDOWS
# if defined _M_AMD64
# undef KMP_ARCH_X86_64
@ -356,6 +362,8 @@ typedef double kmp_real64;
extern "C" {
#endif // __cplusplus
#define INTERNODE_CACHE_LINE 4096 /* for multi-node systems */
/* Define the default size of the cache line */
#ifndef CACHE_LINE
#define CACHE_LINE 128 /* cache line size in bytes */
@ -366,16 +374,6 @@ extern "C" {
#endif
#endif /* CACHE_LINE */
/* SGI's cache padding improvements using align decl specs (Ver 19) */
#if !defined KMP_PERF_V19
# define KMP_PERF_V19 KMP_ON
#endif
/* SGI's improvements for inline argv (Ver 106) */
#if !defined KMP_PERF_V106
# define KMP_PERF_V106 KMP_ON
#endif
#define KMP_CACHE_PREFETCH(ADDR) /* nothing */
/* Temporary note: if performance testing of this passes, we can remove
@ -383,10 +381,12 @@ extern "C" {
#if KMP_OS_UNIX && defined(__GNUC__)
# define KMP_DO_ALIGN(bytes) __attribute__((aligned(bytes)))
# define KMP_ALIGN_CACHE __attribute__((aligned(CACHE_LINE)))
# define KMP_ALIGN_CACHE_INTERNODE __attribute__((aligned(INTERNODE_CACHE_LINE)))
# define KMP_ALIGN(bytes) __attribute__((aligned(bytes)))
#else
# define KMP_DO_ALIGN(bytes) __declspec( align(bytes) )
# define KMP_ALIGN_CACHE __declspec( align(CACHE_LINE) )
# define KMP_ALIGN_CACHE_INTERNODE __declspec( align(INTERNODE_CACHE_LINE) )
# define KMP_ALIGN(bytes) __declspec( align(bytes) )
#endif
@ -525,7 +525,7 @@ extern kmp_real64 __kmp_xchg_real64( volatile kmp_real64 *p, kmp_real64 v );
# define KMP_XCHG_REAL64(p, v) __kmp_xchg_real64( (p), (v) );
#elif (KMP_ASM_INTRINS && (KMP_OS_LINUX || KMP_OS_FREEBSD || KMP_OS_DARWIN)) || !(KMP_ARCH_X86 || KMP_ARCH_X86_64)
#elif (KMP_ASM_INTRINS && KMP_OS_UNIX) || !(KMP_ARCH_X86 || KMP_ARCH_X86_64)
/* cast p to correct type so that proper intrinsic will be used */
# define KMP_TEST_THEN_INC32(p) __sync_fetch_and_add( (kmp_int32 *)(p), 1 )
@ -654,17 +654,6 @@ extern kmp_real64 __kmp_xchg_real64( volatile kmp_real64 *p, kmp_real64 v );
#endif /* KMP_ASM_INTRINS */
# if !KMP_MIC
//
// no routines for floating addition on MIC
// no intrinsic support for floating addition on UNIX
//
extern kmp_real32 __kmp_test_then_add_real32 ( volatile kmp_real32 *p, kmp_real32 v );
extern kmp_real64 __kmp_test_then_add_real64 ( volatile kmp_real64 *p, kmp_real64 v );
# define KMP_TEST_THEN_ADD_REAL32(p, v) __kmp_test_then_add_real32( (p), (v) )
# define KMP_TEST_THEN_ADD_REAL64(p, v) __kmp_test_then_add_real64( (p), (v) )
# endif
/* ------------- relaxed consistency memory model stuff ------------------ */

[File diff suppressed because it is too large]

--- kmp_sched.c ---

@ -1,7 +1,7 @@
/*
* kmp_sched.c -- static scheduling -- iteration initialization
* $Revision: 42358 $
* $Date: 2013-05-07 13:43:26 -0500 (Tue, 07 May 2013) $
* $Revision: 43457 $
* $Date: 2014-09-17 03:57:22 -0500 (Wed, 17 Sep 2014) $
*/
@ -28,6 +28,8 @@
#include "kmp_i18n.h"
#include "kmp_str.h"
#include "kmp_error.h"
#include "kmp_stats.h"
#include "kmp_itt.h"
// template for type limits
template< typename T >
@ -79,6 +81,7 @@ __kmp_for_static_init(
typename traits_t< T >::signed_t incr,
typename traits_t< T >::signed_t chunk
) {
KMP_COUNT_BLOCK(OMP_FOR_static);
typedef typename traits_t< T >::unsigned_t UT;
typedef typename traits_t< T >::signed_t ST;
/* this all has to be changed back to TID and such.. */
@ -88,6 +91,7 @@ __kmp_for_static_init(
register UT trip_count;
register kmp_team_t *team;
KMP_DEBUG_ASSERT( plastiter && plower && pupper && pstride );
KE_TRACE( 10, ("__kmpc_for_static_init called (%d)\n", global_tid));
#ifdef KMP_DEBUG
{
@ -108,12 +112,12 @@ __kmp_for_static_init(
__kmp_push_workshare( global_tid, ct_pdo, loc );
if ( incr == 0 ) {
__kmp_error_construct( kmp_i18n_msg_CnsLoopIncrZeroProhibited, ct_pdo, loc );
}
}
/* special handling for zero-trip loops */
if ( incr > 0 ? (*pupper < *plower) : (*plower < *pupper) ) {
*plastiter = FALSE;
if( plastiter != NULL )
*plastiter = FALSE;
/* leave pupper and plower set to entire iteration space */
*pstride = incr; /* value should never be used */
// *plower = *pupper - incr; // let compiler bypass the illegal loop (like for(i=1;i<10;i--)) THIS LINE CAUSED shape2F/h_tests_1.f TO HAVE A FAILURE ON A ZERO-TRIP LOOP (lower=1,\
@ -149,7 +153,8 @@ __kmp_for_static_init(
/* determine if "for" loop is an active worksharing construct */
if ( team -> t.t_serialized ) {
/* serialized parallel, each thread executes whole iteration space */
*plastiter = TRUE;
if( plastiter != NULL )
*plastiter = TRUE;
/* leave pupper and plower set to entire iteration space */
*pstride = (incr > 0) ? (*pupper - *plower + 1) : (-(*plower - *pupper + 1));
@ -169,8 +174,9 @@ __kmp_for_static_init(
}
nth = team->t.t_nproc;
if ( nth == 1 ) {
*plastiter = TRUE;
if( plastiter != NULL )
*plastiter = TRUE;
*pstride = (incr > 0) ? (*pupper - *plower + 1) : (-(*plower - *pupper + 1));
#ifdef KMP_DEBUG
{
const char * buff;
@ -192,12 +198,13 @@ __kmp_for_static_init(
} else if (incr == -1) {
trip_count = *plower - *pupper + 1;
} else {
if ( incr > 1 ) {
if ( incr > 1 ) { // the check is needed for unsigned division when incr < 0
trip_count = (*pupper - *plower) / incr + 1;
} else {
trip_count = (*plower - *pupper) / ( -incr ) + 1;
}
}
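/* A worked check of the trip-count arithmetic above:
       lower=0, upper=9, incr=3   ->  (9-0)/3 + 1 = 4 iterations (0,3,6,9)
       lower=9, upper=0, incr=-3  ->  (9-0)/3 + 1 = 4 iterations (9,6,3,0)
   The incr > 1 / else split keeps both division operands non-negative, which
   matters because trip_count has the unsigned type UT. */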
if ( __kmp_env_consistency_check ) {
/* tripcount overflow? */
if ( trip_count == 0 && *pupper != *plower ) {
@ -219,14 +226,16 @@ __kmp_for_static_init(
} else {
*plower = *pupper + incr;
}
*plastiter = ( tid == trip_count - 1 );
if( plastiter != NULL )
*plastiter = ( tid == trip_count - 1 );
} else {
if ( __kmp_static == kmp_sch_static_balanced ) {
register UT small_chunk = trip_count / nth;
register UT extras = trip_count % nth;
*plower += incr * ( tid * small_chunk + ( tid < extras ? tid : extras ) );
*pupper = *plower + small_chunk * incr - ( tid < extras ? 0 : incr );
*plastiter = ( tid == nth - 1 );
if( plastiter != NULL )
*plastiter = ( tid == nth - 1 );
} else {
register T big_chunk_inc_count = ( trip_count/nth +
( ( trip_count % nth ) ? 1 : 0) ) * incr;
@ -238,16 +247,16 @@ __kmp_for_static_init(
*plower += tid * big_chunk_inc_count;
*pupper = *plower + big_chunk_inc_count - incr;
if ( incr > 0 ) {
if ( *pupper < *plower ) {
if( *pupper < *plower )
*pupper = i_maxmin< T >::mx;
}
*plastiter = *plower <= old_upper && *pupper > old_upper - incr;
if( plastiter != NULL )
*plastiter = *plower <= old_upper && *pupper > old_upper - incr;
if ( *pupper > old_upper ) *pupper = old_upper; // tracker C73258
} else {
if ( *pupper > *plower ) {
if( *pupper > *plower )
*pupper = i_maxmin< T >::mn;
}
*plastiter = *plower >= old_upper && *pupper < old_upper - incr;
if( plastiter != NULL )
*plastiter = *plower >= old_upper && *pupper < old_upper - incr;
if ( *pupper < old_upper ) *pupper = old_upper; // tracker C73258
}
}
@ -256,7 +265,7 @@ __kmp_for_static_init(
}
case kmp_sch_static_chunked:
{
register T span;
register ST span;
if ( chunk < 1 ) {
chunk = 1;
}
@ -264,11 +273,8 @@ __kmp_for_static_init(
*pstride = span * nth;
*plower = *plower + (span * tid);
*pupper = *plower + span - incr;
/* TODO: is the following line a bug? Shouldn't it be plastiter instead of *plastiter ? */
if (*plastiter) { /* only calculate this if it was requested */
kmp_int32 lasttid = ((trip_count - 1) / ( UT )chunk) % nth;
*plastiter = (tid == lasttid);
}
if( plastiter != NULL )
*plastiter = (tid == ((trip_count - 1)/( UT )chunk) % nth);
break;
}
default:
@ -276,6 +282,18 @@ __kmp_for_static_init(
break;
}
#if USE_ITT_BUILD
// Report loop metadata
if ( KMP_MASTER_TID(tid) && __itt_metadata_add_ptr && __kmp_forkjoin_frames_mode == 3 ) {
kmp_uint64 cur_chunk = chunk;
// Calculate chunk in case it was not specified; it is specified for kmp_sch_static_chunked
if ( schedtype == kmp_sch_static ) {
cur_chunk = trip_count / nth + ( ( trip_count % nth ) ? 1 : 0);
}
// 0 - "static" schedule
__kmp_itt_metadata_loop(loc, 0, trip_count, cur_chunk);
}
#endif
#ifdef KMP_DEBUG
{
const char * buff;
@ -291,6 +309,355 @@ __kmp_for_static_init(
return;
}
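The incr > 1 check above is not cosmetic: this template is also instantiated for the unsigned entry points (_4u, _8u), where dividing by a negative increment that has been implicitly converted to unsigned would yield a huge bogus trip count. A minimal standalone sketch of the same guarded computation (illustrative only, not part of this change):

    #include <cassert>
    #include <cstdint>

    // Trip count of the inclusive range [lower, upper] with step incr,
    // where U may be an unsigned type but incr is signed.
    template <typename U, typename S>
    U trip_count(U lower, U upper, S incr) {
        assert(incr != 0);
        if (incr == 1)  return upper - lower + 1;
        if (incr == -1) return lower - upper + 1;
        if (incr > 1)   return (upper - lower) / (U)incr + 1;
        return (lower - upper) / (U)(-incr) + 1;  // negate before converting
    }
    // e.g. trip_count<uint32_t>(1u, 10u, 3)  == 4 (iterations 1,4,7,10)
    //      trip_count<uint32_t>(10u, 1u, -3) == 4 (iterations 10,7,4,1)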
template< typename T >
static void
__kmp_dist_for_static_init(
ident_t *loc,
kmp_int32 gtid,
kmp_int32 schedule,
kmp_int32 *plastiter,
T *plower,
T *pupper,
T *pupperDist,
typename traits_t< T >::signed_t *pstride,
typename traits_t< T >::signed_t incr,
typename traits_t< T >::signed_t chunk
) {
KMP_COUNT_BLOCK(OMP_DISTR_FOR_static);
typedef typename traits_t< T >::unsigned_t UT;
typedef typename traits_t< T >::signed_t ST;
register kmp_uint32 tid;
register kmp_uint32 nth;
register kmp_uint32 team_id;
register kmp_uint32 nteams;
register UT trip_count;
register kmp_team_t *team;
kmp_info_t * th;
KMP_DEBUG_ASSERT( plastiter && plower && pupper && pupperDist && pstride );
KE_TRACE( 10, ("__kmpc_dist_for_static_init called (%d)\n", gtid));
#ifdef KMP_DEBUG
{
const char * buff;
// create format specifiers before the debug output
buff = __kmp_str_format(
"__kmpc_dist_for_static_init: T#%%d schedLoop=%%d liter=%%d "\
"iter=(%%%s, %%%s, %%%s) chunk=%%%s signed?<%s>\n",
traits_t< T >::spec, traits_t< T >::spec, traits_t< ST >::spec,
traits_t< ST >::spec, traits_t< T >::spec );
KD_TRACE(100, ( buff, gtid, schedule, *plastiter,
*plower, *pupper, incr, chunk ) );
__kmp_str_free( &buff );
}
#endif
if( __kmp_env_consistency_check ) {
__kmp_push_workshare( gtid, ct_pdo, loc );
if( incr == 0 ) {
__kmp_error_construct( kmp_i18n_msg_CnsLoopIncrZeroProhibited, ct_pdo, loc );
}
if( incr > 0 ? (*pupper < *plower) : (*plower < *pupper) ) {
// The loop is illegal.
// Some zero-trip loops maintained by compiler, e.g.:
// for(i=10;i<0;++i) // lower >= upper - run-time check
// for(i=0;i>10;--i) // lower <= upper - run-time check
// for(i=0;i>10;++i) // incr > 0 - compile-time check
// for(i=10;i<0;--i) // incr < 0 - compile-time check
// Compiler does not check the following illegal loops:
// for(i=0;i<10;i+=incr) // where incr<0
// for(i=10;i>0;i-=incr) // where incr<0
__kmp_error_construct( kmp_i18n_msg_CnsLoopIncrIllegal, ct_pdo, loc );
}
}
tid = __kmp_tid_from_gtid( gtid );
th = __kmp_threads[gtid];
KMP_DEBUG_ASSERT(th->th.th_teams_microtask); // we are in the teams construct
nth = th->th.th_team_nproc;
team = th->th.th_team;
#if OMP_40_ENABLED
nteams = th->th.th_teams_size.nteams;
#endif
team_id = team->t.t_master_tid;
KMP_DEBUG_ASSERT(nteams == team->t.t_parent->t.t_nproc);
// compute global trip count
if( incr == 1 ) {
trip_count = *pupper - *plower + 1;
} else if(incr == -1) {
trip_count = *plower - *pupper + 1;
} else {
trip_count = (ST)(*pupper - *plower) / incr + 1; // cast to signed to cover incr<0 case
}
*pstride = *pupper - *plower; // just in case (can be unused)
if( trip_count <= nteams ) {
KMP_DEBUG_ASSERT(
__kmp_static == kmp_sch_static_greedy || \
__kmp_static == kmp_sch_static_balanced
); // Unknown static scheduling type.
// only the masters of some teams get a single iteration; other threads get nothing

if( team_id < trip_count && tid == 0 ) {
*pupper = *pupperDist = *plower = *plower + team_id * incr;
} else {
*pupperDist = *pupper;
*plower = *pupper + incr; // compiler should skip loop body
}
if( plastiter != NULL )
*plastiter = ( tid == 0 && team_id == trip_count - 1 );
} else {
// Get the team's chunk first (each team gets at most one chunk)
if( __kmp_static == kmp_sch_static_balanced ) {
register UT chunkD = trip_count / nteams;
register UT extras = trip_count % nteams;
*plower += incr * ( team_id * chunkD + ( team_id < extras ? team_id : extras ) );
*pupperDist = *plower + chunkD * incr - ( team_id < extras ? 0 : incr );
if( plastiter != NULL )
*plastiter = ( team_id == nteams - 1 );
} else {
register T chunk_inc_count =
( trip_count / nteams + ( ( trip_count % nteams ) ? 1 : 0) ) * incr;
register T upper = *pupper;
KMP_DEBUG_ASSERT( __kmp_static == kmp_sch_static_greedy );
// Unknown static scheduling type.
*plower += team_id * chunk_inc_count;
*pupperDist = *plower + chunk_inc_count - incr;
// Check/correct bounds if needed
if( incr > 0 ) {
if( *pupperDist < *plower )
*pupperDist = i_maxmin< T >::mx;
if( plastiter != NULL )
*plastiter = *plower <= upper && *pupperDist > upper - incr;
if( *pupperDist > upper )
*pupperDist = upper; // tracker C73258
if( *plower > *pupperDist ) {
*pupper = *pupperDist; // no iterations available for the team
goto end;
}
} else {
if( *pupperDist > *plower )
*pupperDist = i_maxmin< T >::mn;
if( plastiter != NULL )
*plastiter = *plower >= upper && *pupperDist < upper - incr;
if( *pupperDist < upper )
*pupperDist = upper; // tracker C73258
if( *plower < *pupperDist ) {
*pupper = *pupperDist; // no iterations available for the team
goto end;
}
}
}
// Get the parallel loop chunk now (for thread)
// compute trip count for team's chunk
if( incr == 1 ) {
trip_count = *pupperDist - *plower + 1;
} else if(incr == -1) {
trip_count = *plower - *pupperDist + 1;
} else {
trip_count = (ST)(*pupperDist - *plower) / incr + 1;
}
KMP_DEBUG_ASSERT( trip_count );
switch( schedule ) {
case kmp_sch_static:
{
if( trip_count <= nth ) {
KMP_DEBUG_ASSERT(
__kmp_static == kmp_sch_static_greedy || \
__kmp_static == kmp_sch_static_balanced
); // Unknown static scheduling type.
if( tid < trip_count )
*pupper = *plower = *plower + tid * incr;
else
*plower = *pupper + incr; // no iterations available
if( plastiter != NULL )
if( *plastiter != 0 && !( tid == trip_count - 1 ) )
*plastiter = 0;
} else {
if( __kmp_static == kmp_sch_static_balanced ) {
register UT chunkL = trip_count / nth;
register UT extras = trip_count % nth;
*plower += incr * (tid * chunkL + (tid < extras ? tid : extras));
*pupper = *plower + chunkL * incr - (tid < extras ? 0 : incr);
if( plastiter != NULL )
if( *plastiter != 0 && !( tid == nth - 1 ) )
*plastiter = 0;
} else {
register T chunk_inc_count =
( trip_count / nth + ( ( trip_count % nth ) ? 1 : 0) ) * incr;
register T upper = *pupperDist;
KMP_DEBUG_ASSERT( __kmp_static == kmp_sch_static_greedy );
// Unknown static scheduling type.
*plower += tid * chunk_inc_count;
*pupper = *plower + chunk_inc_count - incr;
if( incr > 0 ) {
if( *pupper < *plower )
*pupper = i_maxmin< T >::mx;
if( plastiter != NULL )
if( *plastiter != 0 && !(*plower <= upper && *pupper > upper - incr) )
*plastiter = 0;
if( *pupper > upper )
*pupper = upper;//tracker C73258
} else {
if( *pupper > *plower )
*pupper = i_maxmin< T >::mn;
if( plastiter != NULL )
if( *plastiter != 0 && !(*plower >= upper && *pupper < upper - incr) )
*plastiter = 0;
if( *pupper < upper )
*pupper = upper;//tracker C73258
}
}
}
break;
}
case kmp_sch_static_chunked:
{
register ST span;
if( chunk < 1 )
chunk = 1;
span = chunk * incr;
*pstride = span * nth;
*plower = *plower + (span * tid);
*pupper = *plower + span - incr;
if( plastiter != NULL )
if( *plastiter != 0 && !(tid == ((trip_count - 1) / ( UT )chunk) % nth) )
*plastiter = 0;
break;
}
default:
KMP_ASSERT2( 0, "__kmpc_dist_for_static_init: unknown loop scheduling type" );
break;
}
}
end:;
#ifdef KMP_DEBUG
{
const char * buff;
// create format specifiers before the debug output
buff = __kmp_str_format(
"__kmpc_dist_for_static_init: last=%%d lo=%%%s up=%%%s upDist=%%%s "\
"stride=%%%s signed?<%s>\n",
traits_t< T >::spec, traits_t< T >::spec, traits_t< T >::spec,
traits_t< ST >::spec, traits_t< T >::spec );
KD_TRACE(100, ( buff, *plastiter, *plower, *pupper, *pupperDist, *pstride ) );
__kmp_str_free( &buff );
}
#endif
KE_TRACE( 10, ("__kmpc_dist_for_static_init: T#%d return\n", gtid ) );
return;
}
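The shape of __kmp_dist_for_static_init is easier to see in isolation: the iteration space is first cut into nteams contiguous pieces, then each team's piece is cut again across its nth threads, using the same chunk/extras arithmetic both times. A simplified sketch of the balanced path for incr == 1 (hypothetical helper, not the runtime's code):

    // Balanced split of the inclusive range [*lo, *hi] (incr == 1) into
    // 'count' parts; part 'id' gets floor(trip/count) iterations, and the
    // remainder is spread one extra iteration each over the lowest ids.
    static void balanced_split(unsigned id, unsigned count, long *lo, long *hi) {
        unsigned long trip   = *hi - *lo + 1;
        unsigned long small  = trip / count;
        unsigned long extras = trip % count;
        *lo += id * small + (id < extras ? id : extras);
        *hi  = *lo + small - 1 + (id < extras ? 1 : 0);
    }

    // Two-level use, mirroring the code above:
    //   balanced_split(team_id, nteams, &lo, &hi);  // distribute chunk
    //   balanced_split(tid,     nth,    &lo, &hi);  // this thread's chunk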
template< typename T >
static void
__kmp_team_static_init(
ident_t *loc,
kmp_int32 gtid,
kmp_int32 *p_last,
T *p_lb,
T *p_ub,
typename traits_t< T >::signed_t *p_st,
typename traits_t< T >::signed_t incr,
typename traits_t< T >::signed_t chunk
) {
// The routine returns the first chunk distributed to the team and the
// stride to be used to compute subsequent chunks.
// The last-iteration flag is set for the team that will execute
// the last iteration of the loop.
// The routine is called for dist_schedule(static,chunk) only.
typedef typename traits_t< T >::unsigned_t UT;
typedef typename traits_t< T >::signed_t ST;
kmp_uint32 team_id;
kmp_uint32 nteams;
UT trip_count;
T lower;
T upper;
ST span;
kmp_team_t *team;
kmp_info_t *th;
KMP_DEBUG_ASSERT( p_last && p_lb && p_ub && p_st );
KE_TRACE( 10, ("__kmp_team_static_init called (%d)\n", gtid));
#ifdef KMP_DEBUG
{
const char * buff;
// create format specifiers before the debug output
buff = __kmp_str_format( "__kmp_team_static_init enter: T#%%d liter=%%d "\
"iter=(%%%s, %%%s, %%%s) chunk %%%s; signed?<%s>\n",
traits_t< T >::spec, traits_t< T >::spec, traits_t< ST >::spec,
traits_t< ST >::spec, traits_t< T >::spec );
KD_TRACE(100, ( buff, gtid, *p_last, *p_lb, *p_ub, *p_st, chunk ) );
__kmp_str_free( &buff );
}
#endif
lower = *p_lb;
upper = *p_ub;
if( __kmp_env_consistency_check ) {
if( incr == 0 ) {
__kmp_error_construct( kmp_i18n_msg_CnsLoopIncrZeroProhibited, ct_pdo, loc );
}
if( incr > 0 ? (upper < lower) : (lower < upper) ) {
// The loop is illegal.
// Some zero-trip loops maintained by compiler, e.g.:
// for(i=10;i<0;++i) // lower >= upper - run-time check
// for(i=0;i>10;--i) // lower <= upper - run-time check
// for(i=0;i>10;++i) // incr > 0 - compile-time check
// for(i=10;i<0;--i) // incr < 0 - compile-time check
// Compiler does not check the following illegal loops:
// for(i=0;i<10;i+=incr) // where incr<0
// for(i=10;i>0;i-=incr) // where incr<0
__kmp_error_construct( kmp_i18n_msg_CnsLoopIncrIllegal, ct_pdo, loc );
}
}
th = __kmp_threads[gtid];
KMP_DEBUG_ASSERT(th->th.th_teams_microtask); // we are in the teams construct
team = th->th.th_team;
#if OMP_40_ENABLED
nteams = th->th.th_teams_size.nteams;
#endif
team_id = team->t.t_master_tid;
KMP_DEBUG_ASSERT(nteams == team->t.t_parent->t.t_nproc);
// compute trip count
if( incr == 1 ) {
trip_count = upper - lower + 1;
} else if(incr == -1) {
trip_count = lower - upper + 1;
} else {
trip_count = (ST)(upper - lower) / incr + 1; // cast to signed to cover incr<0 case
}
if( chunk < 1 )
chunk = 1;
span = chunk * incr;
*p_st = span * nteams;
*p_lb = lower + (span * team_id);
*p_ub = *p_lb + span - incr;
if ( p_last != NULL )
*p_last = (team_id == ((trip_count - 1)/(UT)chunk) % nteams);
// Correct upper bound if needed
if( incr > 0 ) {
if( *p_ub < *p_lb ) // overflow?
*p_ub = i_maxmin< T >::mx;
if( *p_ub > upper )
*p_ub = upper; // tracker C73258
} else { // incr < 0
if( *p_ub > *p_lb )
*p_ub = i_maxmin< T >::mn;
if( *p_ub < upper )
*p_ub = upper; // tracker C73258
}
#ifdef KMP_DEBUG
{
const char * buff;
// create format specifiers before the debug output
buff = __kmp_str_format( "__kmp_team_static_init exit: T#%%d team%%u liter=%%d "\
"iter=(%%%s, %%%s, %%%s) chunk %%%s\n",
traits_t< T >::spec, traits_t< T >::spec, traits_t< ST >::spec,
traits_t< ST >::spec );
KD_TRACE(100, ( buff, gtid, team_id, *p_last, *p_lb, *p_ub, *p_st, chunk ) );
__kmp_str_free( &buff );
}
#endif
}
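Note that __kmp_team_static_init hands back only the team's first chunk plus the stride needed to reach its next one; the caller is expected to loop. A hedged sketch of that consumption pattern for incr == 1 (variable names are illustrative):

    // lb/ub describe this team's first chunk; st jumps nteams chunks ahead,
    // exactly as computed above (*p_st = span * nteams).
    long lb = lower + (long)chunk * team_id;
    long st = (long)chunk * nteams;
    for (; lb <= upper; lb += st) {
        long ub = lb + (long)chunk - 1;
        if (ub > upper) ub = upper;        // clamp the final, partial chunk
        for (long i = lb; i <= ub; ++i)
            /* body(i) */;
    }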
//--------------------------------------------------------------------------------------
extern "C" {
@ -310,7 +677,7 @@ Each of the four functions here are identical apart from the argument types.
The functions compute the upper and lower bounds and stride to be used for the set of iterations
to be executed by the current thread from the statically scheduled loop that is described by the
initial values of the bround, stride, increment and chunk size.
initial values of the bounds, stride, increment and chunk size.
@{
*/
@ -362,5 +729,155 @@ __kmpc_for_static_init_8u( ident_t *loc, kmp_int32 gtid, kmp_int32 schedtype, km
@}
*/
/*!
@ingroup WORK_SHARING
@param loc Source code location
@param gtid Global thread id of this thread
@param scheduleD Scheduling type for the distribute
@param scheduleL Scheduling type for the parallel loop
@param plastiter Pointer to the "last iteration" flag
@param plower Pointer to the lower bound
@param pupper Pointer to the upper bound of loop chunk
@param pupperD Pointer to the upper bound of dist_chunk
@param pstrideD Pointer to the stride for distribute
@param pstrideL Pointer to the stride for parallel loop
@param incr Loop increment
@param chunkD The chunk size for the distribute
@param chunkL The chunk size for the parallel loop
Each of the four functions here is identical apart from the argument types.
The functions compute the upper and lower bounds and strides to be used for the set of iterations
to be executed by the current thread from the statically scheduled loop that is described by the
initial values of the bounds, strides, increment and chunks for parallel loop and distribute
constructs.
@{
*/
void
__kmpc_dist_for_static_init_4(
ident_t *loc, kmp_int32 gtid, kmp_int32 schedule, kmp_int32 *plastiter,
kmp_int32 *plower, kmp_int32 *pupper, kmp_int32 *pupperD,
kmp_int32 *pstride, kmp_int32 incr, kmp_int32 chunk )
{
__kmp_dist_for_static_init< kmp_int32 >(
loc, gtid, schedule, plastiter, plower, pupper, pupperD, pstride, incr, chunk );
}
/*!
See @ref __kmpc_dist_for_static_init_4
*/
void
__kmpc_dist_for_static_init_4u(
ident_t *loc, kmp_int32 gtid, kmp_int32 schedule, kmp_int32 *plastiter,
kmp_uint32 *plower, kmp_uint32 *pupper, kmp_uint32 *pupperD,
kmp_int32 *pstride, kmp_int32 incr, kmp_int32 chunk )
{
__kmp_dist_for_static_init< kmp_uint32 >(
loc, gtid, schedule, plastiter, plower, pupper, pupperD, pstride, incr, chunk );
}
/*!
See @ref __kmpc_dist_for_static_init_4
*/
void
__kmpc_dist_for_static_init_8(
ident_t *loc, kmp_int32 gtid, kmp_int32 schedule, kmp_int32 *plastiter,
kmp_int64 *plower, kmp_int64 *pupper, kmp_int64 *pupperD,
kmp_int64 *pstride, kmp_int64 incr, kmp_int64 chunk )
{
__kmp_dist_for_static_init< kmp_int64 >(
loc, gtid, schedule, plastiter, plower, pupper, pupperD, pstride, incr, chunk );
}
/*!
See @ref __kmpc_dist_for_static_init_4
*/
void
__kmpc_dist_for_static_init_8u(
ident_t *loc, kmp_int32 gtid, kmp_int32 schedule, kmp_int32 *plastiter,
kmp_uint64 *plower, kmp_uint64 *pupper, kmp_uint64 *pupperD,
kmp_int64 *pstride, kmp_int64 incr, kmp_int64 chunk )
{
__kmp_dist_for_static_init< kmp_uint64 >(
loc, gtid, schedule, plastiter, plower, pupper, pupperD, pstride, incr, chunk );
}
/*!
@}
*/
//-----------------------------------------------------------------------------------------
// Auxiliary routines for Distribute Parallel Loop construct implementation
// Transfer call to template< type T >
// __kmp_team_static_init( ident_t *loc, int gtid,
// int *p_last, T *lb, T *ub, ST *st, ST incr, ST chunk )
/*!
@ingroup WORK_SHARING
@{
@param loc Source location
@param gtid Global thread id
@param p_last pointer to last iteration flag
@param p_lb pointer to Lower bound
@param p_ub pointer to Upper bound
@param p_st Step (or increment if you prefer)
@param incr Loop increment
@param chunk The chunk size to block with
The functions compute the upper and lower bounds and stride to be used for the set of iterations
to be executed by the current team from the statically scheduled loop that is described by the
initial values of the bounds, stride, increment and chunk for the distribute construct as part of
the composite distribute parallel loop construct.
These functions are all identical apart from the types of the arguments.
*/
void
__kmpc_team_static_init_4(
ident_t *loc, kmp_int32 gtid, kmp_int32 *p_last,
kmp_int32 *p_lb, kmp_int32 *p_ub, kmp_int32 *p_st, kmp_int32 incr, kmp_int32 chunk )
{
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_team_static_init< kmp_int32 >( loc, gtid, p_last, p_lb, p_ub, p_st, incr, chunk );
}
/*!
See @ref __kmpc_team_static_init_4
*/
void
__kmpc_team_static_init_4u(
ident_t *loc, kmp_int32 gtid, kmp_int32 *p_last,
kmp_uint32 *p_lb, kmp_uint32 *p_ub, kmp_int32 *p_st, kmp_int32 incr, kmp_int32 chunk )
{
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_team_static_init< kmp_uint32 >( loc, gtid, p_last, p_lb, p_ub, p_st, incr, chunk );
}
/*!
See @ref __kmpc_team_static_init_4
*/
void
__kmpc_team_static_init_8(
ident_t *loc, kmp_int32 gtid, kmp_int32 *p_last,
kmp_int64 *p_lb, kmp_int64 *p_ub, kmp_int64 *p_st, kmp_int64 incr, kmp_int64 chunk )
{
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_team_static_init< kmp_int64 >( loc, gtid, p_last, p_lb, p_ub, p_st, incr, chunk );
}
/*!
See @ref __kmpc_team_static_init_4
*/
void
__kmpc_team_static_init_8u(
ident_t *loc, kmp_int32 gtid, kmp_int32 *p_last,
kmp_uint64 *p_lb, kmp_uint64 *p_ub, kmp_int64 *p_st, kmp_int64 incr, kmp_int64 chunk )
{
KMP_DEBUG_ASSERT( __kmp_init_serial );
__kmp_team_static_init< kmp_uint64 >( loc, gtid, p_last, p_lb, p_ub, p_st, incr, chunk );
}
/*!
@}
*/
} // extern "C"
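For reference, an outlined schedule(static) loop over 32-bit iterators would drive the _4 entry point roughly as follows. This is a sketch of the calling convention the wrappers above define, not literal compiler output; loc, gtid and body() are assumed to be in scope:

    kmp_int32 last = 0, lo = 0, hi = 99, stride = 1;
    __kmpc_for_static_init_4(loc, gtid, kmp_sch_static,
                             &last, &lo, &hi, &stride,
                             /* incr = */ 1, /* chunk = */ 1);
    for (kmp_int32 i = lo; i <= hi; ++i)
        body(i);                        // this thread's share of 0..99
    __kmpc_for_static_fini(loc, gtid);  // close the worksharing region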


@ -1,7 +1,7 @@
/*
* kmp_settings.c -- Initialize environment variables
* $Revision: 42816 $
* $Date: 2013-11-11 15:33:37 -0600 (Mon, 11 Nov 2013) $
* $Revision: 43473 $
* $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
*/
@ -534,9 +534,9 @@ __kmp_stg_parse_file(
* out = __kmp_str_format( "%s", buffer );
} // __kmp_stg_parse_file
#ifdef KMP_DEBUG
static char * par_range_to_print = NULL;
#ifdef KMP_DEBUG
static void
__kmp_stg_parse_par_range(
char const * name,
@ -944,6 +944,26 @@ __kmp_stg_print_settings( kmp_str_buf_t * buffer, char const * name, void * data
__kmp_stg_print_bool( buffer, name, __kmp_settings );
} // __kmp_stg_print_settings
// -------------------------------------------------------------------------------------------------
// KMP_STACKPAD
// -------------------------------------------------------------------------------------------------
static void
__kmp_stg_parse_stackpad( char const * name, char const * value, void * data ) {
__kmp_stg_parse_int(
name, // Env var name
value, // Env var value
KMP_MIN_STKPADDING, // Min value
KMP_MAX_STKPADDING, // Max value
& __kmp_stkpadding // Var to initialize
);
} // __kmp_stg_parse_stackpad
static void
__kmp_stg_print_stackpad( kmp_str_buf_t * buffer, char const * name, void * data ) {
__kmp_stg_print_int( buffer, name, __kmp_stkpadding );
} // __kmp_stg_print_stackpad
// -------------------------------------------------------------------------------------------------
// KMP_STACKOFFSET
// -------------------------------------------------------------------------------------------------
@ -1229,7 +1249,6 @@ __kmp_stg_print_num_threads( kmp_str_buf_t * buffer, char const * name, void * d
// OpenMP 3.0: KMP_TASKING, OMP_MAX_ACTIVE_LEVELS,
// -------------------------------------------------------------------------------------------------
#if OMP_30_ENABLED
static void
__kmp_stg_parse_tasking( char const * name, char const * value, void * data ) {
__kmp_stg_parse_int( name, value, 0, (int)tskm_max, (int *)&__kmp_tasking_mode );
@ -1259,7 +1278,41 @@ static void
__kmp_stg_print_max_active_levels( kmp_str_buf_t * buffer, char const * name, void * data ) {
__kmp_stg_print_int( buffer, name, __kmp_dflt_max_active_levels );
} // __kmp_stg_print_max_active_levels
#endif // OMP_30_ENABLED
#if KMP_NESTED_HOT_TEAMS
// -------------------------------------------------------------------------------------------------
// KMP_HOT_TEAMS_MAX_LEVEL, KMP_HOT_TEAMS_MODE
// -------------------------------------------------------------------------------------------------
static void
__kmp_stg_parse_hot_teams_level( char const * name, char const * value, void * data ) {
if ( TCR_4(__kmp_init_parallel) ) {
KMP_WARNING( EnvParallelWarn, name );
return;
} // read value before first parallel only
__kmp_stg_parse_int( name, value, 0, KMP_MAX_ACTIVE_LEVELS_LIMIT, & __kmp_hot_teams_max_level );
} // __kmp_stg_parse_hot_teams_level
static void
__kmp_stg_print_hot_teams_level( kmp_str_buf_t * buffer, char const * name, void * data ) {
__kmp_stg_print_int( buffer, name, __kmp_hot_teams_max_level );
} // __kmp_stg_print_hot_teams_level
static void
__kmp_stg_parse_hot_teams_mode( char const * name, char const * value, void * data ) {
if ( TCR_4(__kmp_init_parallel) ) {
KMP_WARNING( EnvParallelWarn, name );
return;
} // read value before first parallel only
__kmp_stg_parse_int( name, value, 0, KMP_MAX_ACTIVE_LEVELS_LIMIT, & __kmp_hot_teams_mode );
} // __kmp_stg_parse_hot_teams_mode
static void
__kmp_stg_print_hot_teams_mode( kmp_str_buf_t * buffer, char const * name, void * data ) {
__kmp_stg_print_int( buffer, name, __kmp_hot_teams_mode );
} // __kmp_stg_print_hot_teams_mode
#endif // KMP_NESTED_HOT_TEAMS
// -------------------------------------------------------------------------------------------------
// KMP_HANDLE_SIGNALS
@ -1438,12 +1491,10 @@ __kmp_stg_parse_barrier_branch_bit( char const * name, char const * value, void
const char *var;
/* ---------- Barrier branch bit control ------------ */
for ( int i=bs_plain_barrier; i<bs_last_barrier; i++ ) {
var = __kmp_barrier_branch_bit_env_name[ i ];
if ( ( strcmp( var, name) == 0 ) && ( value != 0 ) ) {
char *comma;
char *comma;
comma = (char *) strchr( value, ',' );
__kmp_barrier_gather_branch_bits[ i ] = ( kmp_uint32 ) __kmp_str_to_int( value, ',' );
@ -1455,7 +1506,6 @@ __kmp_stg_parse_barrier_branch_bit( char const * name, char const * value, void
if ( __kmp_barrier_release_branch_bits[ i ] > KMP_MAX_BRANCH_BITS ) {
__kmp_msg( kmp_ms_warning, KMP_MSG( BarrReleaseValueInvalid, name, comma + 1 ), __kmp_msg_null );
__kmp_barrier_release_branch_bits[ i ] = __kmp_barrier_release_bb_dflt;
}
}
@ -2037,11 +2087,6 @@ __kmp_parse_affinity_env( char const * name, char const * value,
# if OMP_40_ENABLED
KMP_DEBUG_ASSERT( ( __kmp_nested_proc_bind.bind_types != NULL )
&& ( __kmp_nested_proc_bind.used > 0 ) );
if ( ( __kmp_affinity_notype != NULL )
&& ( ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_default )
|| ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_intel ) ) ) {
type = TRUE;
}
# endif
while ( *buf != '\0' ) {
@ -2049,29 +2094,53 @@ __kmp_parse_affinity_env( char const * name, char const * value,
if (__kmp_match_str("none", buf, (const char **)&next)) {
set_type( affinity_none );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_false;
# endif
buf = next;
} else if (__kmp_match_str("scatter", buf, (const char **)&next)) {
set_type( affinity_scatter );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
buf = next;
} else if (__kmp_match_str("compact", buf, (const char **)&next)) {
set_type( affinity_compact );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
buf = next;
} else if (__kmp_match_str("logical", buf, (const char **)&next)) {
set_type( affinity_logical );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
buf = next;
} else if (__kmp_match_str("physical", buf, (const char **)&next)) {
set_type( affinity_physical );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
buf = next;
} else if (__kmp_match_str("explicit", buf, (const char **)&next)) {
set_type( affinity_explicit );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
buf = next;
# if KMP_MIC
} else if (__kmp_match_str("balanced", buf, (const char **)&next)) {
set_type( affinity_balanced );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
buf = next;
# endif
} else if (__kmp_match_str("disabled", buf, (const char **)&next)) {
set_type( affinity_disabled );
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_false;
# endif
buf = next;
} else if (__kmp_match_str("verbose", buf, (const char **)&next)) {
set_verbose( TRUE );
@ -2451,6 +2520,9 @@ __kmp_stg_parse_gomp_cpu_affinity( char const * name, char const * value, void *
__kmp_affinity_proclist = temp_proclist;
__kmp_affinity_type = affinity_explicit;
__kmp_affinity_gran = affinity_gran_fine;
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
}
else {
KMP_WARNING( AffSyntaxError, name );
@ -2772,6 +2844,21 @@ __kmp_stg_parse_places( char const * name, char const * value, void * data )
const char *scan = value;
const char *next = scan;
const char *kind = "\"threads\"";
kmp_setting_t **rivals = (kmp_setting_t **) data;
int rc;
rc = __kmp_stg_check_rivals( name, value, rivals );
if ( rc ) {
return;
}
//
// If OMP_PROC_BIND is not specified but OMP_PLACES is,
// then let OMP_PROC_BIND default to true.
//
if ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_default ) {
__kmp_nested_proc_bind.bind_types[0] = proc_bind_true;
}
//__kmp_affinity_num_places = 0;
@ -2805,10 +2892,17 @@ __kmp_stg_parse_places( char const * name, char const * value, void * data )
__kmp_affinity_type = affinity_explicit;
__kmp_affinity_gran = affinity_gran_fine;
__kmp_affinity_dups = FALSE;
if ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_default ) {
__kmp_nested_proc_bind.bind_types[0] = proc_bind_true;
}
}
return;
}
if ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_default ) {
__kmp_nested_proc_bind.bind_types[0] = proc_bind_true;
}
SKIP_WS(scan);
if ( *scan == '\0' ) {
return;
@ -2855,8 +2949,7 @@ __kmp_stg_print_places( kmp_str_buf_t * buffer, char const * name,
}
if ( ( __kmp_nested_proc_bind.used == 0 )
|| ( __kmp_nested_proc_bind.bind_types == NULL )
|| ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_false )
|| ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_intel ) ) {
|| ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_false ) ) {
__kmp_str_buf_print( buffer, ": %s\n", KMP_I18N_STR( NotDefined ) );
}
else if ( __kmp_affinity_type == affinity_explicit ) {
@ -2913,7 +3006,7 @@ __kmp_stg_print_places( kmp_str_buf_t * buffer, char const * name,
# endif /* OMP_40_ENABLED */
# if OMP_30_ENABLED && (! OMP_40_ENABLED)
# if (! OMP_40_ENABLED)
static void
__kmp_stg_parse_proc_bind( char const * name, char const * value, void * data )
@ -2943,7 +3036,7 @@ __kmp_stg_parse_proc_bind( char const * name, char const * value, void * data )
}
} // __kmp_parse_proc_bind
# endif /* if OMP_30_ENABLED && (! OMP_40_ENABLED) */
# endif /* if (! OMP_40_ENABLED) */
static void
@ -3132,11 +3225,7 @@ __kmp_stg_parse_proc_bind( char const * name, char const * value, void * data )
buf = next;
SKIP_WS( buf );
__kmp_nested_proc_bind.used = 1;
//
// "true" currently maps to "spread"
//
__kmp_nested_proc_bind.bind_types[0] = proc_bind_spread;
__kmp_nested_proc_bind.bind_types[0] = proc_bind_true;
}
else {
//
@ -3454,7 +3543,7 @@ __kmp_stg_parse_schedule( char const * name, char const * value, void * data ) {
KMP_WARNING( InvalidClause, name, value );
} else
KMP_WARNING( EmptyClause, name );
} while ( value = semicolon ? semicolon + 1 : NULL );
} while ( (value = semicolon ? semicolon + 1 : NULL) );
}
}; // if
@ -3499,7 +3588,6 @@ __kmp_stg_parse_omp_schedule( char const * name, char const * value, void * data
else if (!__kmp_strcasecmp_with_sentinel("guided", value, ',')) /* GUIDED */
__kmp_sched = kmp_sch_guided_chunked;
// AC: TODO: add AUTO schedule, and probably remove TRAPEZOIDAL (OMP 3.0 does not allow it)
#if OMP_30_ENABLED
else if (!__kmp_strcasecmp_with_sentinel("auto", value, ',')) { /* AUTO */
__kmp_sched = kmp_sch_auto;
if( comma ) {
@ -3507,7 +3595,6 @@ __kmp_stg_parse_omp_schedule( char const * name, char const * value, void * data
comma = NULL;
}
}
#endif // OMP_30_ENABLED
else if (!__kmp_strcasecmp_with_sentinel("trapezoidal", value, ',')) /* TRAPEZOIDAL */
__kmp_sched = kmp_sch_trapezoidal;
else if (!__kmp_strcasecmp_with_sentinel("static", value, ',')) /* STATIC */
@ -4016,7 +4103,7 @@ __kmp_stg_parse_adaptive_lock_props( const char *name, const char *value, void *
break;
}
// Next character is neither a digit nor a comma, OR the number of values > 2 => end of list
if ( ( ( *next < '0' ) || ( *next > '9' ) ) && ( *next !=',') || ( total > 2 ) ) {
if ( ( ( *next < '0' || *next > '9' ) && *next !=',' ) || total > 2 ) {
KMP_WARNING( EnvSyntaxError, name, value );
return;
}
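This rewrite does not change the grouping, since && already binds tighter than ||; it spells the parentheses out so the intent is visible and compilers stop warning about mixed &&/||. The precedence rule in miniature:

    #include <cstdio>
    int main() {
        bool a = false, b = true, c = true;
        // '&&' binds tighter than '||': both tests parse as a || (b && c).
        std::printf("%d\n", a || b && c);    // may draw -Wparentheses
        std::printf("%d\n", a || (b && c));  // explicit, warning-free
        return 0;
    }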
@ -4314,6 +4401,10 @@ __kmp_stg_print_omp_display_env( kmp_str_buf_t * buffer, char const * name, void
static void
__kmp_stg_parse_omp_cancellation( char const * name, char const * value, void * data ) {
if ( TCR_4(__kmp_init_parallel) ) {
KMP_WARNING( EnvParallelWarn, name );
return;
} // read value before first parallel only
__kmp_stg_parse_bool( name, value, & __kmp_omp_cancellation );
} // __kmp_stg_parse_omp_cancellation
@ -4340,6 +4431,7 @@ static kmp_setting_t __kmp_stg_table[] = {
{ "KMP_SETTINGS", __kmp_stg_parse_settings, __kmp_stg_print_settings, NULL, 0, 0 },
{ "KMP_STACKOFFSET", __kmp_stg_parse_stackoffset, __kmp_stg_print_stackoffset, NULL, 0, 0 },
{ "KMP_STACKSIZE", __kmp_stg_parse_stacksize, __kmp_stg_print_stacksize, NULL, 0, 0 },
{ "KMP_STACKPAD", __kmp_stg_parse_stackpad, __kmp_stg_print_stackpad, NULL, 0, 0 },
{ "KMP_VERSION", __kmp_stg_parse_version, __kmp_stg_print_version, NULL, 0, 0 },
{ "KMP_WARNINGS", __kmp_stg_parse_warnings, __kmp_stg_print_warnings, NULL, 0, 0 },
@ -4347,13 +4439,15 @@ static kmp_setting_t __kmp_stg_table[] = {
{ "OMP_NUM_THREADS", __kmp_stg_parse_num_threads, __kmp_stg_print_num_threads, NULL, 0, 0 },
{ "OMP_STACKSIZE", __kmp_stg_parse_stacksize, __kmp_stg_print_stacksize, NULL, 0, 0 },
#if OMP_30_ENABLED
{ "KMP_TASKING", __kmp_stg_parse_tasking, __kmp_stg_print_tasking, NULL, 0, 0 },
{ "KMP_TASK_STEALING_CONSTRAINT", __kmp_stg_parse_task_stealing, __kmp_stg_print_task_stealing, NULL, 0, 0 },
{ "OMP_MAX_ACTIVE_LEVELS", __kmp_stg_parse_max_active_levels, __kmp_stg_print_max_active_levels, NULL, 0, 0 },
{ "OMP_THREAD_LIMIT", __kmp_stg_parse_all_threads, __kmp_stg_print_all_threads, NULL, 0, 0 },
{ "OMP_WAIT_POLICY", __kmp_stg_parse_wait_policy, __kmp_stg_print_wait_policy, NULL, 0, 0 },
#endif // OMP_30_ENABLED
#if KMP_NESTED_HOT_TEAMS
{ "KMP_HOT_TEAMS_MAX_LEVEL", __kmp_stg_parse_hot_teams_level, __kmp_stg_print_hot_teams_level, NULL, 0, 0 },
{ "KMP_HOT_TEAMS_MODE", __kmp_stg_parse_hot_teams_mode, __kmp_stg_print_hot_teams_mode, NULL, 0, 0 },
#endif // KMP_NESTED_HOT_TEAMS
#if KMP_HANDLE_SIGNALS
{ "KMP_HANDLE_SIGNALS", __kmp_stg_parse_handle_signals, __kmp_stg_print_handle_signals, NULL, 0, 0 },
@ -4411,18 +4505,16 @@ static kmp_setting_t __kmp_stg_table[] = {
# ifdef KMP_GOMP_COMPAT
{ "GOMP_CPU_AFFINITY", __kmp_stg_parse_gomp_cpu_affinity, NULL, /* no print */ NULL, 0, 0 },
# endif /* KMP_GOMP_COMPAT */
# if OMP_30_ENABLED
# if OMP_40_ENABLED
# if OMP_40_ENABLED
{ "OMP_PROC_BIND", __kmp_stg_parse_proc_bind, __kmp_stg_print_proc_bind, NULL, 0, 0 },
{ "OMP_PLACES", __kmp_stg_parse_places, __kmp_stg_print_places, NULL, 0, 0 },
# else
# else
{ "OMP_PROC_BIND", __kmp_stg_parse_proc_bind, NULL, /* no print */ NULL, 0, 0 },
# endif /* OMP_40_ENABLED */
# endif /* OMP_30_ENABLED */
# endif /* OMP_40_ENABLED */
{ "KMP_TOPOLOGY_METHOD", __kmp_stg_parse_topology_method, __kmp_stg_print_topology_method, NULL, 0, 0 },
#elif !KMP_AFFINITY_SUPPORTED
#else
//
// KMP_AFFINITY is not supported on OS X*, nor is OMP_PLACES.
@ -4432,8 +4524,6 @@ static kmp_setting_t __kmp_stg_table[] = {
{ "OMP_PROC_BIND", __kmp_stg_parse_proc_bind, __kmp_stg_print_proc_bind, NULL, 0, 0 },
# endif
#else
#error "Unknown or unsupported OS"
#endif // KMP_AFFINITY_SUPPORTED
{ "KMP_INIT_AT_FORK", __kmp_stg_parse_init_at_fork, __kmp_stg_print_init_at_fork, NULL, 0, 0 },
@ -4571,7 +4661,6 @@ __kmp_stg_init( void
}
#if OMP_30_ENABLED
{ // Initialize KMP_LIBRARY and OMP_WAIT_POLICY data.
kmp_setting_t * kmp_library = __kmp_stg_find( "KMP_LIBRARY" ); // 1st priority.
@ -4595,21 +4684,12 @@ __kmp_stg_init( void
}; // if
}
#else
{
kmp_setting_t * kmp_library = __kmp_stg_find( "KMP_LIBRARY" );
static kmp_stg_wp_data_t kmp_data = { 0, NULL };
kmp_library->data = & kmp_data;
}
#endif /* OMP_30_ENABLED */
{ // Initialize KMP_ALL_THREADS, KMP_MAX_THREADS, and OMP_THREAD_LIMIT data.
kmp_setting_t * kmp_all_threads = __kmp_stg_find( "KMP_ALL_THREADS" ); // 1st priority.
kmp_setting_t * kmp_max_threads = __kmp_stg_find( "KMP_MAX_THREADS" ); // 2nd priority.
#if OMP_30_ENABLED
kmp_setting_t * omp_thread_limit = __kmp_stg_find( "OMP_THREAD_LIMIT" ); // 3rd priority.
#endif
// !!! volatile keyword is Intel (R) C Compiler bug CQ49908 workaround.
static kmp_setting_t * volatile rivals[ 4 ];
@ -4617,20 +4697,16 @@ __kmp_stg_init( void
rivals[ i ++ ] = kmp_all_threads;
rivals[ i ++ ] = kmp_max_threads;
#if OMP_30_ENABLED
if ( omp_thread_limit != NULL ) {
rivals[ i ++ ] = omp_thread_limit;
}; // if
#endif
rivals[ i ++ ] = NULL;
kmp_all_threads->data = (void*)& rivals;
kmp_max_threads->data = (void*)& rivals;
#if OMP_30_ENABLED
if ( omp_thread_limit != NULL ) {
omp_thread_limit->data = (void*)& rivals;
}; // if
#endif
}
@ -4645,18 +4721,11 @@ __kmp_stg_init( void
KMP_DEBUG_ASSERT( gomp_cpu_affinity != NULL );
# endif
# if OMP_30_ENABLED
kmp_setting_t * omp_proc_bind = __kmp_stg_find( "OMP_PROC_BIND" ); // 3rd priority.
KMP_DEBUG_ASSERT( omp_proc_bind != NULL );
# endif
# if OMP_40_ENABLED
kmp_setting_t * omp_places = __kmp_stg_find( "OMP_PLACES" ); // 3rd priority.
KMP_DEBUG_ASSERT( omp_places != NULL );
# endif
// !!! volatile keyword is Intel (R) C Compiler bug CQ49908 workaround.
static kmp_setting_t * volatile rivals[ 5 ];
static kmp_setting_t * volatile rivals[ 4 ];
int i = 0;
rivals[ i ++ ] = kmp_affinity;
@ -4666,23 +4735,30 @@ __kmp_stg_init( void
gomp_cpu_affinity->data = (void*)& rivals;
# endif
# if OMP_30_ENABLED
rivals[ i ++ ] = omp_proc_bind;
omp_proc_bind->data = (void*)& rivals;
# endif
rivals[ i ++ ] = NULL;
# if OMP_40_ENABLED
rivals[ i ++ ] = omp_places;
omp_places->data = (void*)& rivals;
static kmp_setting_t * volatile places_rivals[ 4 ];
i = 0;
kmp_setting_t * omp_places = __kmp_stg_find( "OMP_PLACES" ); // 3rd priority.
KMP_DEBUG_ASSERT( omp_places != NULL );
places_rivals[ i ++ ] = kmp_affinity;
# ifdef KMP_GOMP_COMPAT
places_rivals[ i ++ ] = gomp_cpu_affinity;
# endif
places_rivals[ i ++ ] = omp_places;
omp_places->data = (void*)& places_rivals;
places_rivals[ i ++ ] = NULL;
# endif
rivals[ i ++ ] = NULL;
}
#else
// KMP_AFFINITY not supported, so OMP_PROC_BIND has no rivals.
// OMP_PLACES not supported yet.
#endif
#endif // KMP_AFFINITY_SUPPORTED
{ // Initialize KMP_DETERMINISTIC_REDUCTION and KMP_FORCE_REDUCTION data.
@ -4917,8 +4993,33 @@ __kmp_env_initialize( char const * string ) {
&& ( FIND( aff_str, "disabled" ) == NULL ) ) {
__kmp_affinity_notype = __kmp_stg_find( "KMP_AFFINITY" );
}
else {
//
// A new affinity type is specified.
// Reset the affinity flags to their default values,
// in case this is called from kmp_set_defaults().
//
__kmp_affinity_type = affinity_default;
__kmp_affinity_gran = affinity_gran_default;
__kmp_affinity_top_method = affinity_top_method_default;
__kmp_affinity_respect_mask = affinity_respect_mask_default;
}
# undef FIND
#if OMP_40_ENABLED
//
// Also reset the affinity flags if OMP_PROC_BIND is specified.
//
aff_str = __kmp_env_blk_var( & block, "OMP_PROC_BIND" );
if ( aff_str != NULL ) {
__kmp_affinity_type = affinity_default;
__kmp_affinity_gran = affinity_gran_default;
__kmp_affinity_top_method = affinity_top_method_default;
__kmp_affinity_respect_mask = affinity_respect_mask_default;
}
#endif /* OMP_40_ENABLED */
}
#endif /* KMP_AFFINITY_SUPPORTED */
#if OMP_40_ENABLED
@ -4956,9 +5057,15 @@ __kmp_env_initialize( char const * string ) {
else {
KMP_DEBUG_ASSERT( string != NULL); // kmp_set_defaults() was called
KMP_DEBUG_ASSERT( __kmp_user_lock_kind != lk_default );
__kmp_set_user_lock_vptrs( __kmp_user_lock_kind );
// Binds lock functions again to follow the transition between different
// KMP_CONSISTENCY_CHECK values. Calling this again is harmless as long
// as we do not allow lock kind changes after making a call to any
// user lock functions (true).
}
#if KMP_AFFINITY_SUPPORTED
if ( ! TCR_4(__kmp_init_middle) ) {
//
// Determine if the machine/OS is actually capable of supporting
@ -4984,102 +5091,87 @@ __kmp_env_initialize( char const * string ) {
}
# if OMP_40_ENABLED
if ( __kmp_affinity_type == affinity_disabled ) {
__kmp_nested_proc_bind.bind_types[0] = proc_bind_disabled;
}
else if ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_default ) {
else if ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_true ) {
//
// Where supported the default is to use the KMP_AFFINITY
// mechanism. On OS X* etc. it is none.
// OMP_PROC_BIND=true maps to OMP_PROC_BIND=spread.
//
# if KMP_AFFINITY_SUPPORTED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# else
__kmp_nested_proc_bind.bind_types[0] = proc_bind_false;
# endif
__kmp_nested_proc_bind.bind_types[0] = proc_bind_spread;
}
//
// If OMP_PROC_BIND was specified (so we are using OpenMP 4.0 affinity)
// but OMP_PLACES was not, then it defaults to the equivalent of
// KMP_AFFINITY=compact,noduplicates,granularity=fine.
//
if ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_intel ) {
if ( ( __kmp_affinity_type == affinity_none )
# if ! KMP_MIC
|| ( __kmp_affinity_type == affinity_default )
# endif
) {
__kmp_nested_proc_bind.bind_types[0] = proc_bind_false;
}
}
else if ( ( __kmp_nested_proc_bind.bind_types[0] != proc_bind_false )
&& ( __kmp_nested_proc_bind.bind_types[0] != proc_bind_disabled ) ) {
if ( __kmp_affinity_type == affinity_default ) {
__kmp_affinity_type = affinity_compact;
__kmp_affinity_dups = FALSE;
}
if ( __kmp_affinity_gran == affinity_gran_default ) {
__kmp_affinity_gran = affinity_gran_fine;
}
}
# endif // OMP_40_ENABLED
# endif /* OMP_40_ENABLED */
if ( KMP_AFFINITY_CAPABLE() ) {
# if KMP_OS_WINDOWS && KMP_ARCH_X86_64
if ( __kmp_num_proc_groups > 1 ) {
//
// Handle the Win 64 group affinity stuff if there are multiple
// processor groups, or if the user requested it, and OMP 4.0
// affinity is not in effect.
//
if ( ( ( __kmp_num_proc_groups > 1 )
&& ( __kmp_affinity_type == affinity_default )
# if OMP_40_ENABLED
&& ( __kmp_nested_proc_bind.bind_types[0] == proc_bind_default ) )
# endif
|| ( __kmp_affinity_top_method == affinity_top_method_group ) ) {
if ( __kmp_affinity_respect_mask == affinity_respect_mask_default ) {
__kmp_affinity_respect_mask = FALSE;
__kmp_affinity_respect_mask = FALSE;
}
if ( ( __kmp_affinity_type == affinity_default )
|| ( __kmp_affinity_type == affinity_none ) ) {
if ( __kmp_affinity_type == affinity_none ) {
if ( __kmp_affinity_verbose || ( __kmp_affinity_warnings
&& ( __kmp_affinity_type != affinity_none ) ) ) {
KMP_WARNING( AffTypeCantUseMultGroups, "none", "compact" );
}
}
if ( __kmp_affinity_type == affinity_default ) {
__kmp_affinity_type = affinity_compact;
if ( __kmp_affinity_top_method == affinity_top_method_default ) {
__kmp_affinity_top_method = affinity_top_method_group;
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
}
if ( __kmp_affinity_top_method == affinity_top_method_default ) {
if ( __kmp_affinity_gran == affinity_gran_default ) {
__kmp_affinity_top_method = affinity_top_method_group;
__kmp_affinity_gran = affinity_gran_group;
}
else if ( __kmp_affinity_gran == affinity_gran_group ) {
__kmp_affinity_top_method = affinity_top_method_group;
}
else {
__kmp_affinity_top_method = affinity_top_method_all;
}
}
else if ( __kmp_affinity_top_method == affinity_top_method_default ) {
__kmp_affinity_top_method = affinity_top_method_all;
}
if ( __kmp_affinity_gran_levels < 0 ) {
if ( __kmp_affinity_top_method == affinity_top_method_group ) {
if ( __kmp_affinity_gran == affinity_gran_default ) {
__kmp_affinity_gran = affinity_gran_group;
}
else if ( __kmp_affinity_gran == affinity_gran_core ) {
if ( __kmp_affinity_verbose || ( __kmp_affinity_warnings
&& ( __kmp_affinity_type != affinity_none ) ) ) {
KMP_WARNING( AffGranCantUseMultGroups, "core", "thread" );
}
__kmp_affinity_gran = affinity_gran_thread;
}
else if ( __kmp_affinity_gran == affinity_gran_package ) {
if ( __kmp_affinity_verbose || ( __kmp_affinity_warnings
&& ( __kmp_affinity_type != affinity_none ) ) ) {
KMP_WARNING( AffGranCantUseMultGroups, "package", "group" );
}
__kmp_affinity_gran = affinity_gran_group;
}
else if ( __kmp_affinity_gran == affinity_gran_node ) {
if ( __kmp_affinity_verbose || ( __kmp_affinity_warnings
&& ( __kmp_affinity_type != affinity_none ) ) ) {
KMP_WARNING( AffGranCantUseMultGroups, "node", "group" );
}
__kmp_affinity_gran = affinity_gran_group;
}
else if ( __kmp_affinity_top_method == affinity_top_method_group ) {
if ( __kmp_affinity_gran == affinity_gran_default ) {
__kmp_affinity_gran = affinity_gran_group;
}
else if ( __kmp_affinity_gran == affinity_gran_default ) {
else if ( ( __kmp_affinity_gran != affinity_gran_group )
&& ( __kmp_affinity_gran != affinity_gran_fine )
&& ( __kmp_affinity_gran != affinity_gran_thread ) ) {
char *str = NULL;
switch ( __kmp_affinity_gran ) {
case affinity_gran_core: str = "core"; break;
case affinity_gran_package: str = "package"; break;
case affinity_gran_node: str = "node"; break;
default: KMP_DEBUG_ASSERT( 0 );
}
KMP_WARNING( AffGranTopGroup, var, str );
__kmp_affinity_gran = affinity_gran_fine;
}
}
else {
if ( __kmp_affinity_gran == affinity_gran_default ) {
__kmp_affinity_gran = affinity_gran_core;
}
else if ( __kmp_affinity_gran == affinity_gran_group ) {
char *str = NULL;
switch ( __kmp_affinity_type ) {
case affinity_physical: str = "physical"; break;
case affinity_logical: str = "logical"; break;
case affinity_compact: str = "compact"; break;
case affinity_scatter: str = "scatter"; break;
case affinity_explicit: str = "explicit"; break;
// No MIC on windows, so no affinity_balanced case
default: KMP_DEBUG_ASSERT( 0 );
}
KMP_WARNING( AffGranGroupType, var, str );
__kmp_affinity_gran = affinity_gran_core;
}
}
@ -5087,27 +5179,52 @@ __kmp_env_initialize( char const * string ) {
else
# endif /* KMP_OS_WINDOWS && KMP_ARCH_X86_64 */
{
if ( __kmp_affinity_respect_mask == affinity_respect_mask_default ) {
__kmp_affinity_respect_mask = TRUE;
# if KMP_OS_WINDOWS && KMP_ARCH_X86_64
if ( __kmp_num_proc_groups > 1 ) {
__kmp_affinity_respect_mask = FALSE;
}
else
# endif /* KMP_OS_WINDOWS && KMP_ARCH_X86_64 */
{
__kmp_affinity_respect_mask = TRUE;
}
}
# if OMP_40_ENABLED
if ( ( __kmp_nested_proc_bind.bind_types[0] != proc_bind_intel )
&& ( __kmp_nested_proc_bind.bind_types[0] != proc_bind_default ) ) {
if ( __kmp_affinity_type == affinity_default ) {
__kmp_affinity_type = affinity_compact;
__kmp_affinity_dups = FALSE;
}
}
else
# endif /* OMP_40_ENABLED */
if ( __kmp_affinity_type == affinity_default ) {
# if KMP_MIC
__kmp_affinity_type = affinity_scatter;
__kmp_affinity_type = affinity_scatter;
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_intel;
# endif
# else
__kmp_affinity_type = affinity_none;
__kmp_affinity_type = affinity_none;
# if OMP_40_ENABLED
__kmp_nested_proc_bind.bind_types[0] = proc_bind_false;
# endif
# endif
}
if ( ( __kmp_affinity_gran == affinity_gran_default )
&& ( __kmp_affinity_gran_levels < 0 ) ) {
# if KMP_MIC
__kmp_affinity_gran = affinity_gran_fine;
__kmp_affinity_gran = affinity_gran_fine;
# else
__kmp_affinity_gran = affinity_gran_core;
__kmp_affinity_gran = affinity_gran_core;
# endif
}
if ( __kmp_affinity_top_method == affinity_top_method_default ) {
__kmp_affinity_top_method = affinity_top_method_all;
__kmp_affinity_top_method = affinity_top_method_all;
}
}
}
@ -5164,9 +5281,8 @@ __kmp_env_print() {
char const * name = block.vars[ i ].name;
char const * value = block.vars[ i ].value;
if (
strlen( name ) > 4
&&
( strncmp( name, "KMP_", 4 ) == 0 ) || strncmp( name, "OMP_", 4 ) == 0
( strlen( name ) > 4 && strncmp( name, "KMP_", 4 ) == 0 )
|| strncmp( name, "OMP_", 4 ) == 0
#ifdef KMP_GOMP_COMPAT
|| strncmp( name, "GOMP_", 5 ) == 0
#endif // KMP_GOMP_COMPAT


@ -1,7 +1,7 @@
/*
* kmp_settings.h -- Initialize environment variables
* $Revision: 42598 $
* $Date: 2013-08-19 15:40:56 -0500 (Mon, 19 Aug 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/


@ -0,0 +1,615 @@
/** @file kmp_stats.cpp
* Statistics gathering and processing.
*/
//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//
#if KMP_STATS_ENABLED
#include "kmp.h"
#include "kmp_str.h"
#include "kmp_lock.h"
#include "kmp_stats.h"
#include <algorithm>
#include <sstream>
#include <iomanip>
#include <stdlib.h> // for atexit
#define STRINGIZE2(x) #x
#define STRINGIZE(x) STRINGIZE2(x)
#define expandName(name,flags,ignore) {STRINGIZE(name),flags},
statInfo timeStat::timerInfo[] = {
KMP_FOREACH_TIMER(expandName,0)
{0,0}
};
const statInfo counter::counterInfo[] = {
KMP_FOREACH_COUNTER(expandName,0)
{0,0}
};
#undef expandName
#define expandName(ignore1,ignore2,ignore3) {0.0,0.0,0.0},
kmp_stats_output_module::rgb_color kmp_stats_output_module::timerColorInfo[] = {
KMP_FOREACH_TIMER(expandName,0)
{0.0,0.0,0.0}
};
#undef expandName
const kmp_stats_output_module::rgb_color kmp_stats_output_module::globalColorArray[] = {
{1.0, 0.0, 0.0}, // red
{1.0, 0.6, 0.0}, // orange
{1.0, 1.0, 0.0}, // yellow
{0.0, 1.0, 0.0}, // green
{0.0, 0.0, 1.0}, // blue
{0.6, 0.2, 0.8}, // purple
{1.0, 0.0, 1.0}, // magenta
{0.0, 0.4, 0.2}, // dark green
{1.0, 1.0, 0.6}, // light yellow
{0.6, 0.4, 0.6}, // dirty purple
{0.0, 1.0, 1.0}, // cyan
{1.0, 0.4, 0.8}, // pink
{0.5, 0.5, 0.5}, // grey
{0.8, 0.7, 0.5}, // brown
{0.6, 0.6, 1.0}, // light blue
{1.0, 0.7, 0.5}, // peach
{0.8, 0.5, 1.0}, // lavender
{0.6, 0.0, 0.0}, // dark red
{0.7, 0.6, 0.0}, // gold
{0.0, 0.0, 0.0} // black
};
// Ensure that the atexit handler only runs once.
static uint32_t statsPrinted = 0;
// output interface
static kmp_stats_output_module __kmp_stats_global_output;
/* ****************************************************** */
/* ************* statistic member functions ************* */
void statistic::addSample(double sample)
{
double delta = sample - meanVal;
sampleCount = sampleCount + 1;
meanVal = meanVal + delta/sampleCount;
m2 = m2 + delta*(sample - meanVal);
minVal = std::min(minVal, sample);
maxVal = std::max(maxVal, sample);
}
statistic & statistic::operator+= (const statistic & other)
{
if (sampleCount == 0)
{
*this = other;
return *this;
}
uint64_t newSampleCount = sampleCount + other.sampleCount;
double dnsc = double(newSampleCount);
double dsc = double(sampleCount);
double dscBydnsc = dsc/dnsc;
double dosc = double(other.sampleCount);
double delta = other.meanVal - meanVal;
// Try to order these calculations to avoid overflows.
// If this were Fortran, then the compiler would not be able to re-order over brackets.
// In C++ the compiler may be allowed to do that (we certainly hope it doesn't; The C++ Programming Language,
// 2nd edition, suggests it shouldn't, since associativity may only be exploited when the operation
// really is associative, which floating-point addition isn't...).
meanVal = meanVal*dscBydnsc + other.meanVal*(1-dscBydnsc);
m2 = m2 + other.m2 + dscBydnsc*dosc*delta*delta;
minVal = std::min (minVal, other.minVal);
maxVal = std::max (maxVal, other.maxVal);
sampleCount = newSampleCount;
return *this;
}
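addSample() is the classic one-pass (Welford) update, and operator+= is the matching pairwise merge, which is what allows per-thread statistics to be combined at shutdown without revisiting any samples. A standalone check of the merge identity (a sketch; the struct is a stand-in for the statistic class):

    #include <cassert>
    #include <cmath>
    #include <initializer_list>

    struct Stat {
        double n = 0, mean = 0, m2 = 0;
        void add(double x) {              // Welford one-pass update
            n += 1;
            double d = x - mean;
            mean += d / n;
            m2 += d * (x - mean);
        }
        void merge(const Stat &o) {       // pairwise merge (Chan et al.)
            double nn = n + o.n, d = o.mean - mean;
            mean = (n * mean + o.n * o.mean) / nn;
            m2 += o.m2 + d * d * n * o.n / nn;
            n = nn;
        }
    };

    int main() {
        Stat a, b, all;
        for (double x : {1.0, 2.0, 3.0}) { a.add(x); all.add(x); }
        for (double x : {4.0, 5.0})      { b.add(x); all.add(x); }
        a.merge(b);   // combine "per-thread" stats, as operator+= does
        assert(std::fabs(a.mean - all.mean) < 1e-12);
        assert(std::fabs(a.m2   - all.m2)   < 1e-12);
        return 0;
    }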
void statistic::scale(double factor)
{
minVal = minVal*factor;
maxVal = maxVal*factor;
meanVal= meanVal*factor;
m2 = m2*factor*factor;
return;
}
std::string statistic::format(char unit, bool total) const
{
std::string result = formatSI(sampleCount,9,' ');
result = result + std::string(", ") + formatSI(minVal, 9, unit);
result = result + std::string(", ") + formatSI(meanVal, 9, unit);
result = result + std::string(", ") + formatSI(maxVal, 9, unit);
if (total)
result = result + std::string(", ") + formatSI(meanVal*sampleCount, 9, unit);
result = result + std::string(", ") + formatSI(getSD(), 9, unit);
return result;
}
/* ********************************************************** */
/* ************* explicitTimer member functions ************* */
void explicitTimer::start(timer_e timerEnumValue) {
startTime = tsc_tick_count::now();
if(timeStat::logEvent(timerEnumValue)) {
__kmp_stats_thread_ptr->incrementNestValue();
}
return;
}
void explicitTimer::stop(timer_e timerEnumValue) {
if (startTime.getValue() == 0)
return;
tsc_tick_count finishTime = tsc_tick_count::now();
//stat->addSample ((tsc_tick_count::now() - startTime).ticks());
stat->addSample ((finishTime - startTime).ticks());
if(timeStat::logEvent(timerEnumValue)) {
__kmp_stats_thread_ptr->push_event(startTime.getValue() - __kmp_stats_start_time.getValue(), finishTime.getValue() - __kmp_stats_start_time.getValue(), __kmp_stats_thread_ptr->getNestValue(), timerEnumValue);
__kmp_stats_thread_ptr->decrementNestValue();
}
/* We accept the risk that we drop a sample because it really did start at t==0. */
startTime = 0;
return;
}
/* ******************************************************************* */
/* ************* kmp_stats_event_vector member functions ************* */
void kmp_stats_event_vector::deallocate() {
__kmp_free(events);
internal_size = 0;
allocated_size = 0;
events = NULL;
}
// This function is for qsort() which requires the compare function to return
// either a negative number if event1 < event2, a positive number if event1 > event2
// or zero if event1 == event2.
// This sorts by start time (lowest to highest).
int compare_two_events(const void* event1, const void* event2) {
kmp_stats_event* ev1 = (kmp_stats_event*)event1;
kmp_stats_event* ev2 = (kmp_stats_event*)event2;
if(ev1->getStart() < ev2->getStart()) return -1;
else if(ev1->getStart() > ev2->getStart()) return 1;
else return 0;
}
void kmp_stats_event_vector::sort() {
qsort(events, internal_size, sizeof(kmp_stats_event), compare_two_events);
}
/* *********************************************************** */
/* ************* kmp_stats_list member functions ************* */
// returns a pointer to newly created stats node
kmp_stats_list* kmp_stats_list::push_back(int gtid) {
kmp_stats_list* newnode = (kmp_stats_list*)__kmp_allocate(sizeof(kmp_stats_list));
// placement new: construct the object in memory we allocated ourselves (hence __kmp_allocate rather than C++ new)
new (newnode) kmp_stats_list();
newnode->setGtid(gtid);
newnode->prev = this->prev;
newnode->next = this;
newnode->prev->next = newnode;
newnode->next->prev = newnode;
return newnode;
}
void kmp_stats_list::deallocate() {
kmp_stats_list* ptr = this->next;
kmp_stats_list* delptr = this->next;
while(ptr != this) {
delptr = ptr;
ptr=ptr->next;
// placement new means we have to explicitly call destructor.
delptr->_event_vector.deallocate();
delptr->~kmp_stats_list();
__kmp_free(delptr);
}
}
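push_back() and deallocate() pair placement new with an explicit destructor call because the node memory comes from __kmp_allocate rather than operator new. The idiom in isolation (a sketch; malloc/free stand in for the runtime allocators):

    #include <cstdlib>
    #include <new>        // placement new

    struct Node { Node() { /* ... */ } ~Node() { /* ... */ } };

    void lifecycle() {
        void *mem = std::malloc(sizeof(Node));  // cf. __kmp_allocate
        Node *n = new (mem) Node();             // construct in our own memory
        // ... use n ...
        n->~Node();                             // placement new => explicit dtor
        std::free(mem);                         // cf. __kmp_free
    }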
kmp_stats_list::iterator kmp_stats_list::begin() {
kmp_stats_list::iterator it;
it.ptr = this->next;
return it;
}
kmp_stats_list::iterator kmp_stats_list::end() {
kmp_stats_list::iterator it;
it.ptr = this;
return it;
}
int kmp_stats_list::size() {
int retval;
kmp_stats_list::iterator it;
for(retval=0, it=begin(); it!=end(); it++, retval++) {}
return retval;
}
/* ********************************************************************* */
/* ************* kmp_stats_list::iterator member functions ************* */
kmp_stats_list::iterator::iterator() : ptr(NULL) {}
kmp_stats_list::iterator::~iterator() {}
kmp_stats_list::iterator kmp_stats_list::iterator::operator++() {
this->ptr = this->ptr->next;
return *this;
}
kmp_stats_list::iterator kmp_stats_list::iterator::operator++(int dummy) {
this->ptr = this->ptr->next;
return *this;
}
kmp_stats_list::iterator kmp_stats_list::iterator::operator--() {
this->ptr = this->ptr->prev;
return *this;
}
kmp_stats_list::iterator kmp_stats_list::iterator::operator--(int dummy) {
this->ptr = this->ptr->prev;
return *this;
}
bool kmp_stats_list::iterator::operator!=(const kmp_stats_list::iterator & rhs) {
return this->ptr!=rhs.ptr;
}
bool kmp_stats_list::iterator::operator==(const kmp_stats_list::iterator & rhs) {
return this->ptr==rhs.ptr;
}
kmp_stats_list* kmp_stats_list::iterator::operator*() const {
return this->ptr;
}
/* *************************************************************** */
/* ************* kmp_stats_output_module functions ************** */
const char* kmp_stats_output_module::outputFileName = NULL;
const char* kmp_stats_output_module::eventsFileName = NULL;
const char* kmp_stats_output_module::plotFileName = NULL;
int kmp_stats_output_module::printPerThreadFlag = 0;
int kmp_stats_output_module::printPerThreadEventsFlag = 0;
// init() is called very early in execution, from the constructor of __kmp_stats_global_output
void kmp_stats_output_module::init()
{
char * statsFileName = getenv("KMP_STATS_FILE");
eventsFileName = getenv("KMP_STATS_EVENTS_FILE");
plotFileName = getenv("KMP_STATS_PLOT_FILE");
char * threadStats = getenv("KMP_STATS_THREADS");
char * threadEvents = getenv("KMP_STATS_EVENTS");
// set the stats output filenames based on environment variables and defaults
outputFileName = statsFileName;
eventsFileName = eventsFileName ? eventsFileName : "events.dat";
plotFileName = plotFileName ? plotFileName : "events.plt";
// set the flags based on environment variables matching: true, on, 1, .true., .t., yes
printPerThreadFlag = __kmp_str_match_true(threadStats);
printPerThreadEventsFlag = __kmp_str_match_true(threadEvents);
if(printPerThreadEventsFlag) {
// assigns a color to each timer for printing
setupEventColors();
} else {
// clear the event flags so that no events are logged
timeStat::clearEventFlags();
}
return;
}
void kmp_stats_output_module::setupEventColors() {
int i;
int globalColorIndex = 0;
int numGlobalColors = sizeof(globalColorArray) / sizeof(rgb_color);
for(i=0;i<TIMER_LAST;i++) {
if(timeStat::logEvent((timer_e)i)) {
timerColorInfo[i] = globalColorArray[globalColorIndex];
globalColorIndex = (globalColorIndex+1)%numGlobalColors;
}
}
return;
}
void kmp_stats_output_module::printStats(FILE *statsOut, statistic const * theStats, bool areTimers)
{
if (areTimers)
{
// Check whether we have any useful timers: since we don't print zero-value timers,
// we need to avoid printing a header followed by no data.
bool haveTimers = false;
for (int s = 0; s<TIMER_LAST; s++)
{
if (theStats[s].getCount() != 0)
{
haveTimers = true;
break;
}
}
if (!haveTimers)
return;
}
// Print
const char * title = areTimers ? "Timer, SampleCount," : "Counter, ThreadCount,";
fprintf (statsOut, "%s Min, Mean, Max, Total, SD\n", title);
if (areTimers) {
for (int s = 0; s<TIMER_LAST; s++) {
statistic const * stat = &theStats[s];
if (stat->getCount() != 0) {
char tag = timeStat::noUnits(timer_e(s)) ? ' ' : 'T';
fprintf (statsOut, "%-25s, %s\n", timeStat::name(timer_e(s)), stat->format(tag, true).c_str());
}
}
} else { // Counters
for (int s = 0; s<COUNTER_LAST; s++) {
statistic const * stat = &theStats[s];
fprintf (statsOut, "%-25s, %s\n", counter::name(counter_e(s)), stat->format(' ', true).c_str());
}
}
}
void kmp_stats_output_module::printCounters(FILE * statsOut, counter const * theCounters)
{
// We print all the counters even if they are zero.
// That makes it easier to slice them into a spreadsheet if you need to.
fprintf (statsOut, "\nCounter, Count\n");
for (int c = 0; c<COUNTER_LAST; c++) {
counter const * stat = &theCounters[c];
fprintf (statsOut, "%-25s, %s\n", counter::name(counter_e(c)), formatSI(stat->getValue(), 9, ' ').c_str());
}
}
void kmp_stats_output_module::printEvents(FILE* eventsOut, kmp_stats_event_vector* theEvents, int gtid) {
// sort by start time before printing
theEvents->sort();
for (int i = 0; i < theEvents->size(); i++) {
kmp_stats_event ev = theEvents->at(i);
rgb_color color = getEventColor(ev.getTimerName());
fprintf(eventsOut, "%d %lu %lu %1.1f rgb(%1.1f,%1.1f,%1.1f) %s\n",
gtid,
ev.getStart(),
ev.getStop(),
1.2 - (ev.getNestLevel() * 0.2),
color.r, color.g, color.b,
timeStat::name(ev.getTimerName())
);
}
return;
}
void kmp_stats_output_module::windupExplicitTimers()
{
// Wind up any explicit timers. We assume that it's fair at this point to just walk all the explicit timers in all threads
// and say "it's over".
// If the timer wasn't running, this won't record anything anyway.
kmp_stats_list::iterator it;
for(it = __kmp_stats_list.begin(); it != __kmp_stats_list.end(); it++) {
for (int timer=0; timer<EXPLICIT_TIMER_LAST; timer++) {
(*it)->getExplicitTimer(explicit_timer_e(timer))->stop((timer_e)timer);
}
}
}
void kmp_stats_output_module::printPloticusFile() {
int i;
int size = __kmp_stats_list.size();
FILE* plotOut = fopen(plotFileName, "w+");
fprintf(plotOut, "#proc page\n"
" pagesize: 15 10\n"
" scale: 1.0\n\n");
fprintf(plotOut, "#proc getdata\n"
" file: %s\n\n",
eventsFileName);
fprintf(plotOut, "#proc areadef\n"
" title: OpenMP Sampling Timeline\n"
" titledetails: align=center size=16\n"
" rectangle: 1 1 13 9\n"
" xautorange: datafield=2,3\n"
" yautorange: -1 %d\n\n",
size);
fprintf(plotOut, "#proc xaxis\n"
" stubs: inc\n"
" stubdetails: size=12\n"
" label: Time (ticks)\n"
" labeldetails: size=14\n\n");
fprintf(plotOut, "#proc yaxis\n"
" stubs: inc 1\n"
" stubrange: 0 %d\n"
" stubdetails: size=12\n"
" label: Thread #\n"
" labeldetails: size=14\n\n",
size-1);
fprintf(plotOut, "#proc bars\n"
" exactcolorfield: 5\n"
" axis: x\n"
" locfield: 1\n"
" segmentfields: 2 3\n"
" barwidthfield: 4\n\n");
// create legend entries corresponding to the timer color
for(i=0;i<TIMER_LAST;i++) {
if(timeStat::logEvent((timer_e)i)) {
rgb_color c = getEventColor((timer_e)i);
fprintf(plotOut, "#proc legendentry\n"
" sampletype: color\n"
" label: %s\n"
" details: rgb(%1.1f,%1.1f,%1.1f)\n\n",
timeStat::name((timer_e)i),
c.r, c.g, c.b);
}
}
fprintf(plotOut, "#proc legend\n"
" format: down\n"
" location: max max\n\n");
fclose(plotOut);
return;
}
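// [Editor's note] The .plt file written above is a ploticus script; assuming a ploticus
// install whose driver binary is named "pl", the timeline could be rendered with
// something like:
//     pl -png -o events.png events.plt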
void kmp_stats_output_module::outputStats(const char* heading)
{
statistic allStats[TIMER_LAST];
statistic allCounters[COUNTER_LAST];
// stop all the explicit timers for all threads
windupExplicitTimers();
FILE * eventsOut;
FILE * statsOut = outputFileName ? fopen (outputFileName, "a+") : stderr;
if (eventPrintingEnabled()) {
eventsOut = fopen(eventsFileName, "w+");
}
if (!statsOut)
statsOut = stderr;
fprintf(statsOut, "%s\n",heading);
// Accumulate across threads.
kmp_stats_list::iterator it;
for (it = __kmp_stats_list.begin(); it != __kmp_stats_list.end(); it++) {
int t = (*it)->getGtid();
// Output per thread stats if requested.
if (perThreadPrintingEnabled()) {
fprintf (statsOut, "Thread %d\n", t);
printStats(statsOut, (*it)->getTimers(), true);
printCounters(statsOut, (*it)->getCounters());
fprintf(statsOut,"\n");
}
// Output per thread events if requested.
if (eventPrintingEnabled()) {
kmp_stats_event_vector events = (*it)->getEventVector();
printEvents(eventsOut, &events, t);
}
for (int s = 0; s<TIMER_LAST; s++) {
// See if we should ignore this timer when aggregating
if ((timeStat::masterOnly(timer_e(s)) && (t != 0)) || // Timer is only valid on the master and this thread is a worker
(timeStat::workerOnly(timer_e(s)) && (t == 0)) || // Timer is only valid on a worker and this thread is the master
timeStat::synthesized(timer_e(s)) // It's a synthesized stat, so there's no raw data for it.
)
{
continue;
}
statistic * threadStat = (*it)->getTimer(timer_e(s));
allStats[s] += *threadStat;
}
// Special handling for synthesized statistics.
// These just have to be coded specially here for now.
// At present we only have one: the total parallel work done in each thread.
// The variance here makes it easy to see load imbalance over the whole program (though, of course,
// it's possible to have a code with awful load balance in every parallel region but perfect load
// balance over the whole program.)
allStats[TIMER_Total_work].addSample ((*it)->getTimer(TIMER_OMP_work)->getTotal());
// Time waiting for work (synthesized)
if ((t != 0) || !timeStat::workerOnly(timer_e(TIMER_OMP_await_work)))
allStats[TIMER_Total_await_work].addSample ((*it)->getTimer(TIMER_OMP_await_work)->getTotal());
// Time in explicit barriers.
allStats[TIMER_Total_barrier].addSample ((*it)->getTimer(TIMER_OMP_barrier)->getTotal());
for (int c = 0; c<COUNTER_LAST; c++) {
if (counter::masterOnly(counter_e(c)) && t != 0)
continue;
allCounters[c].addSample ((*it)->getCounter(counter_e(c))->getValue());
}
}
if (eventPrintingEnabled()) {
printPloticusFile();
fclose(eventsOut);
}
fprintf (statsOut, "Aggregate for all threads\n");
printStats (statsOut, &allStats[0], true);
fprintf (statsOut, "\n");
printStats (statsOut, &allCounters[0], false);
if (statsOut != stderr)
fclose(statsOut);
}
/* ************************************************** */
/* ************* exported C functions ************** */
// No name mangling for these functions: we want the C files to be able to get at them
extern "C" {
void __kmp_reset_stats()
{
kmp_stats_list::iterator it;
for(it = __kmp_stats_list.begin(); it != __kmp_stats_list.end(); it++) {
timeStat * timers = (*it)->getTimers();
counter * counters = (*it)->getCounters();
explicitTimer * eTimers = (*it)->getExplicitTimers();
for (int t = 0; t<TIMER_LAST; t++)
timers[t].reset();
for (int c = 0; c<COUNTER_LAST; c++)
counters[c].reset();
for (int t=0; t<EXPLICIT_TIMER_LAST; t++)
eTimers[t].reset();
// reset the event vector so all previous events are "erased"
(*it)->resetEventVector();
// May need to restart the explicit timers in thread zero?
}
KMP_START_EXPLICIT_TIMER(OMP_serial);
KMP_START_EXPLICIT_TIMER(OMP_start_end);
}
// This function stops all threads' explicit timers (if they haven't already been stopped), outputs all stats, and then resets them.
void __kmp_output_stats(const char * heading)
{
__kmp_stats_global_output.outputStats(heading);
__kmp_reset_stats();
}
void __kmp_accumulate_stats_at_exit(void)
{
// Only do this once.
if (KMP_XCHG_FIXED32(&statsPrinted, 1) != 0)
return;
__kmp_output_stats("Statistics on exit");
return;
}
void __kmp_stats_init(void)
{
return;
}
} // extern "C"
#endif // KMP_STATS_ENABLED


@ -0,0 +1,706 @@
#ifndef KMP_STATS_H
#define KMP_STATS_H
/** @file kmp_stats.h
* Functions for collecting statistics.
*/
//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//
#if KMP_STATS_ENABLED
/*
* Statistics accumulator.
* Accumulates the number of samples and computes min, max, mean, and standard deviation on the fly.
*
* Online variance calculation algorithm from http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm
*/
#include <limits>
#include <math.h>
#include <string>
#include <stdint.h>
#include <new> // placement new
#include "kmp_stats_timing.h"
/*!
* @ingroup STATS_GATHERING
* \brief Flags to describe the statistic (timers or counters).
*
*/
class stats_flags_e {
public:
const static int onlyInMaster = 1<<0; //!< statistic is valid only for master
const static int noUnits = 1<<1; //!< statistic doesn't need units printed next to it in output
const static int synthesized = 1<<2; //!< statistic's value is computed at exit time in the __kmp_output_stats function
const static int notInMaster = 1<<3; //!< statistic is valid for non-master threads
const static int logEvent = 1<<4; //!< statistic can be logged when KMP_STATS_EVENTS is on (valid only for timers)
};
/*!
* \brief Add new counters under KMP_FOREACH_COUNTER() macro in kmp_stats.h
*
* @param macro a user defined macro that takes three arguments - macro(COUNTER_NAME, flags, arg)
* @param arg a user defined argument to send to the user defined macro
*
* \details A counter counts the occurrence of some event.
* Each thread accumulates its own count; at the end of execution the counts are aggregated, treating each thread
* as a separate measurement. (Unless onlyInMaster is set, in which case there's only a single measurement.)
* The min, mean, and max are therefore the values over the threads.
* Adding the counter here and then putting a KMP_COUNT_BLOCK(name) in the code is all you need to do.
* All of the tables and printing are generated from this macro.
* Format is "macro(name, flags, arg)"
*
* @ingroup STATS_GATHERING
*/
#define KMP_FOREACH_COUNTER(macro, arg) \
macro (OMP_PARALLEL, stats_flags_e::onlyInMaster, arg) \
macro (OMP_FOR_static, 0, arg) \
macro (OMP_FOR_dynamic, 0, arg) \
macro (OMP_DISTR_FOR_static, 0, arg) \
macro (OMP_DISTR_FOR_dynamic, 0, arg) \
macro (OMP_BARRIER, 0, arg) \
macro (OMP_CRITICAL,0, arg) \
macro (OMP_SINGLE, 0, arg) \
macro (OMP_MASTER, 0, arg) \
macro (OMP_set_lock, 0, arg) \
macro (OMP_test_lock, 0, arg) \
macro (OMP_test_lock_failure, 0, arg) \
macro (REDUCE_wait, 0, arg) \
macro (REDUCE_nowait, 0, arg) \
macro (LAST,0,arg)
/*!
* \brief Add new timers under KMP_FOREACH_TIMER() macro in kmp_stats.h
*
* @param macro a user defined macro that takes three arguments - macro(TIMER_NAME, flags, arg)
* @param arg a user defined argument to send to the user defined macro
*
* \details A timer collects multiple samples of some count in each thread and then finally aggregates over all the threads.
* The count is normally a time (in ticks), hence the name "timer". (But it can be any value, so we also use this for "number of arguments passed to fork";
* we could equally collect "loop iteration count" if we wanted to.)
* For timers the threads are not significant; it's the individual observations that count, so the statistics are at that level.
* Format is "macro(name, flags, arg)"
*
* @ingroup STATS_GATHERING
*/
#define KMP_FOREACH_TIMER(macro, arg) \
macro (OMP_PARALLEL_args, stats_flags_e::onlyInMaster | stats_flags_e::noUnits, arg) \
macro (FOR_static_iterations, stats_flags_e::onlyInMaster | stats_flags_e::noUnits, arg) \
macro (FOR_dynamic_iterations, stats_flags_e::noUnits, arg) \
macro (OMP_start_end, stats_flags_e::onlyInMaster, arg) \
macro (OMP_serial, stats_flags_e::onlyInMaster, arg) \
macro (OMP_work, 0, arg) \
macro (Total_work, stats_flags_e::synthesized, arg) \
macro (OMP_await_work, stats_flags_e::notInMaster, arg) \
macro (Total_await_work, stats_flags_e::synthesized, arg) \
macro (OMP_barrier, 0, arg) \
macro (Total_barrier, stats_flags_e::synthesized, arg) \
macro (OMP_test_lock, 0, arg) \
macro (FOR_static_scheduling, 0, arg) \
macro (FOR_dynamic_scheduling, 0, arg) \
macro (KMP_fork_call, 0, arg) \
macro (KMP_join_call, 0, arg) \
macro (KMP_fork_barrier, stats_flags_e::logEvent, arg) \
macro (KMP_join_barrier, stats_flags_e::logEvent, arg) \
macro (KMP_barrier, 0, arg) \
macro (KMP_end_split_barrier, 0, arg) \
macro (KMP_wait_sleep, 0, arg) \
macro (KMP_release, 0, arg) \
macro (KMP_hier_gather, 0, arg) \
macro (KMP_hier_release, 0, arg) \
macro (KMP_hyper_gather, stats_flags_e::logEvent, arg) \
macro (KMP_hyper_release, stats_flags_e::logEvent, arg) \
macro (KMP_linear_gather, 0, arg) \
macro (KMP_linear_release, 0, arg) \
macro (KMP_tree_gather, 0, arg) \
macro (KMP_tree_release, 0, arg) \
macro (USER_master_invoke, stats_flags_e::logEvent, arg) \
macro (USER_worker_invoke, stats_flags_e::logEvent, arg) \
macro (USER_resume, stats_flags_e::logEvent, arg) \
macro (USER_suspend, stats_flags_e::logEvent, arg) \
macro (USER_launch_thread_loop, stats_flags_e::logEvent, arg) \
macro (KMP_allocate_team, 0, arg) \
macro (KMP_setup_icv_copy, 0, arg) \
macro (USER_icv_copy, 0, arg) \
macro (LAST,0, arg)
// OMP_PARALLEL_args -- the number of arguments passed to a fork
// FOR_static_iterations -- Number of available parallel chunks of work in a static for
// FOR_dynamic_iterations -- Number of available parallel chunks of work in a dynamic for
// Both adjust for any chunking, so if there were an iteration count of 20 but a chunk size of 10, we'd record 2.
// OMP_serial -- thread zero time executing serial code
// OMP_start_end -- time from when OpenMP is initialized until the stats are printed at exit
// OMP_work -- elapsed time in code dispatched by a fork (measured in the thread)
// Total_work -- a synthesized statistic summarizing how much parallel work each thread executed.
// OMP_barrier -- time at "real" barriers
// Total_barrier -- a synthesized statistic summarizing how much time at real barriers in each thread
// OMP_set_lock -- time in lock setting
// OMP_test_lock -- time in testing a lock
// LOCK_WAIT -- time waiting for a lock
// FOR_static_scheduling -- time spent doing scheduling for a static "for"
// FOR_dynamic_scheduling -- time spent doing scheduling for a dynamic "for"
// KMP_wait_sleep -- time in __kmp_wait_sleep
// KMP_release -- time in __kmp_release
// KMP_fork_barrier -- time in __kmp_fork_barrier
// KMP_join_barrier -- time in __kmp_join_barrier
// KMP_barrier -- time in __kmp_barrier
// KMP_end_split_barrier -- time in __kmp_end_split_barrier
// KMP_setup_icv_copy -- time in __kmp_setup_icv_copy
// KMP_icv_copy -- start/stop timer for any ICV copying
// KMP_linear_gather -- time in __kmp_linear_barrier_gather
// KMP_linear_release -- time in __kmp_linear_barrier_release
// KMP_tree_gather -- time in __kmp_tree_barrier_gather
// KMP_tree_release -- time in __kmp_tree_barrier_release
// KMP_hyper_gather -- time in __kmp_hyper_barrier_gather
// KMP_hyper_release -- time in __kmp_hyper_barrier_release
/*!
* \brief Add new explicit timers under KMP_FOREACH_EXPLICIT_TIMER() macro.
*
* @param macro a user defined macro that takes three arguments - macro(TIMER_NAME, flags, arg)
* @param arg a user defined argument to send to the user defined macro
*
* \warning YOU MUST HAVE THE SAME NAMED TIMER UNDER KMP_FOREACH_TIMER() OR ELSE BAD THINGS WILL HAPPEN!
*
* \details Explicit timers are ones where we need to allocate a timer itself (as well as the accumulated timing statistics).
* We allocate these on a per-thread basis, and explicitly start and stop them.
* Block timers just allocate the timer itself on the stack, and use the destructor to notice block exit; they don't
* need to be defined here.
* The name here should be the same as that of a timer above.
*
* @ingroup STATS_GATHERING
*/
#define KMP_FOREACH_EXPLICIT_TIMER(macro, arg) \
macro(OMP_serial, 0, arg) \
macro(OMP_start_end, 0, arg) \
macro(USER_icv_copy, 0, arg) \
macro(USER_launch_thread_loop, stats_flags_e::logEvent, arg) \
macro(LAST, 0, arg)
#define ENUMERATE(name,ignore,prefix) prefix##name,
enum timer_e {
KMP_FOREACH_TIMER(ENUMERATE, TIMER_)
};
enum explicit_timer_e {
KMP_FOREACH_EXPLICIT_TIMER(ENUMERATE, EXPLICIT_TIMER_)
};
enum counter_e {
KMP_FOREACH_COUNTER(ENUMERATE, COUNTER_)
};
#undef ENUMERATE
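// [Editor's illustration] The X-macro lists above expand into parallel tables as well as
// these enums. A hypothetical expansion for the timer name/flags table (the real table
// lives in kmp_stats.cpp, and INFO_ENTRY is an invented name) would be:
//
//     #define INFO_ENTRY(name, flagsVal, ignore) { #name, flagsVal },
//     statInfo timeStat::timerInfo[] = { KMP_FOREACH_TIMER(INFO_ENTRY, 0) };
//     #undef INFO_ENTRY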
class statistic
{
double minVal;
double maxVal;
double meanVal;
double m2;
uint64_t sampleCount;
public:
statistic() { reset(); }
statistic (statistic const &o): minVal(o.minVal), maxVal(o.maxVal), meanVal(o.meanVal), m2(o.m2), sampleCount(o.sampleCount) {}
double getMin() const { return minVal; }
double getMean() const { return meanVal; }
double getMax() const { return maxVal; }
uint64_t getCount() const { return sampleCount; }
double getSD() const { return sqrt(m2/sampleCount); }
double getTotal() const { return sampleCount*meanVal; }
void reset()
{
minVal = std::numeric_limits<double>::max();
maxVal = -std::numeric_limits<double>::max();
meanVal= 0.0;
m2 = 0.0;
sampleCount = 0;
}
void addSample(double sample);
void scale (double factor);
void scaleDown(double f) { scale (1./f); }
statistic & operator+= (statistic const & other);
std::string format(char unit, bool total=false) const;
};
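// [Editor's sketch] addSample() is defined in kmp_stats.cpp; a minimal version of the
// on-line (Welford) update referenced at the top of this file, written against the
// members above, would be:
//
//     void statistic::addSample(double sample)
//     {
//         minVal = (sample < minVal) ? sample : minVal;
//         maxVal = (sample > maxVal) ? sample : maxVal;
//         sampleCount++;
//         double delta = sample - meanVal;
//         meanVal += delta / sampleCount;
//         m2 += delta * (sample - meanVal);  // second term uses the updated mean
//     }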
struct statInfo
{
const char * name;
uint32_t flags;
};
class timeStat : public statistic
{
static statInfo timerInfo[];
public:
timeStat() : statistic() {}
static const char * name(timer_e e) { return timerInfo[e].name; }
static bool masterOnly (timer_e e) { return timerInfo[e].flags & stats_flags_e::onlyInMaster; }
static bool workerOnly (timer_e e) { return timerInfo[e].flags & stats_flags_e::notInMaster; }
static bool noUnits (timer_e e) { return timerInfo[e].flags & stats_flags_e::noUnits; }
static bool synthesized(timer_e e) { return timerInfo[e].flags & stats_flags_e::synthesized; }
static bool logEvent (timer_e e) { return timerInfo[e].flags & stats_flags_e::logEvent; }
static void clearEventFlags() {
int i;
for(i=0;i<TIMER_LAST;i++) {
timerInfo[i].flags &= (~(stats_flags_e::logEvent));
}
}
};
// Where we need to start and end the timer explicitly, this version can be used.
// Since these timers normally aren't nicely scoped, they don't have a good place to live
// on the stack of the thread, which makes them more work to use.
class explicitTimer
{
timeStat * stat;
tsc_tick_count startTime;
public:
explicitTimer () : stat(0), startTime(0) { }
explicitTimer (timeStat * s) : stat(s), startTime() { }
void setStat (timeStat *s) { stat = s; }
void start(timer_e timerEnumValue);
void stop(timer_e timerEnumValue);
void reset() { startTime = 0; }
};
// Where all you need is to time a block, this is enough.
// (It avoids the need to have an explicit end, leaving the scope suffices.)
class blockTimer : public explicitTimer
{
timer_e timerEnumValue;
public:
blockTimer (timeStat * s, timer_e newTimerEnumValue) : explicitTimer(s), timerEnumValue(newTimerEnumValue) { start(timerEnumValue); }
~blockTimer() { stop(timerEnumValue); }
};
// If all you want is a count, then you can use this...
// The individual per-thread counts will be aggregated into a statistic at program exit.
class counter
{
uint64_t value;
static const statInfo counterInfo[];
public:
counter() : value(0) {}
void increment() { value++; }
uint64_t getValue() const { return value; }
void reset() { value = 0; }
static const char * name(counter_e e) { return counterInfo[e].name; }
static bool masterOnly (counter_e e) { return counterInfo[e].flags & stats_flags_e::onlyInMaster; }
};
/* ****************************************************************
Class to implement an event
There are four components to an event: start time, stop time,
nest_level, and timer_name.
The start and stop time should be obvious (recorded in clock ticks).
The nest_level relates to the bar width in the timeline graph.
The timer_name is used to determine which timer event triggered this event.
The interface to this class is through four read-only operations:
1) getStart() -- returns the start time as 64 bit integer
2) getStop() -- returns the stop time as 64 bit integer
3) getNestLevel() -- returns the nest level of the event
4) getTimerName() -- returns the timer name that triggered event
*MORE ON NEST_LEVEL*
The nest level is used in the bar graph that represents the timeline.
Its main purpose is to show how events are nested inside each other.
For example, say events A, B, and C are recorded. If the timeline
looks like this:
Begin -------------------------------------------------------------> Time
| | | | | |
A B C C B A
start start start end end end
Then A, B, C will have a nest level of 1, 2, 3 respectively.
These values are then used to calculate the bar width, so you can
see that inside A, B has occurred, and inside B, C has occurred.
Currently, this is shown with A's bar width being larger than B's
bar width, and B's bar width being larger than C's bar width.
**************************************************************** */
class kmp_stats_event {
uint64_t start;
uint64_t stop;
int nest_level;
timer_e timer_name;
public:
kmp_stats_event() : start(0), stop(0), nest_level(0), timer_name(TIMER_LAST) {}
kmp_stats_event(uint64_t strt, uint64_t stp, int nst, timer_e nme) : start(strt), stop(stp), nest_level(nst), timer_name(nme) {}
inline uint64_t getStart() const { return start; }
inline uint64_t getStop() const { return stop; }
inline int getNestLevel() const { return nest_level; }
inline timer_e getTimerName() const { return timer_name; }
};
/* ****************************************************************
Class to implement a dynamically expandable array of events
---------------------------------------------------------
| event 1 | event 2 | event 3 | event 4 | ... | event N |
---------------------------------------------------------
An event is pushed onto the back of this array at every
explicitTimer->stop() call. The event records the thread #,
start time, stop time, and nest level related to the bar width.
The event vector starts at size INIT_SIZE and grows (doubles in size)
if needed. An implication of this behavior is that log(N)
reallocations are needed (where N is number of events). If you want
to avoid reallocations, then set INIT_SIZE to a large value.
The interface to this class is through six operations:
1) reset() -- sets the internal_size back to 0 but does not deallocate any memory
2) size() -- returns the number of valid elements in the vector
3) push_back(start, stop, nest, timer_name) -- pushes an event onto
the back of the array
4) deallocate() -- frees all memory associated with the vector
5) sort() -- sorts the vector by start time
6) operator[index] or at(index) -- returns event reference at that index
**************************************************************** */
class kmp_stats_event_vector {
kmp_stats_event* events;
int internal_size;
int allocated_size;
static const int INIT_SIZE = 1024;
public:
kmp_stats_event_vector() {
events = (kmp_stats_event*)__kmp_allocate(sizeof(kmp_stats_event)*INIT_SIZE);
internal_size = 0;
allocated_size = INIT_SIZE;
}
~kmp_stats_event_vector() {}
inline void reset() { internal_size = 0; }
inline int size() const { return internal_size; }
void push_back(uint64_t start_time, uint64_t stop_time, int nest_level, timer_e name) {
int i;
if(internal_size == allocated_size) {
kmp_stats_event* tmp = (kmp_stats_event*)__kmp_allocate(sizeof(kmp_stats_event)*allocated_size*2);
for(i=0;i<internal_size;i++) tmp[i] = events[i];
__kmp_free(events);
events = tmp;
allocated_size*=2;
}
events[internal_size] = kmp_stats_event(start_time, stop_time, nest_level, name);
internal_size++;
return;
}
void deallocate();
void sort();
const kmp_stats_event & operator[](int index) const { return events[index]; }
kmp_stats_event & operator[](int index) { return events[index]; }
const kmp_stats_event & at(int index) const { return events[index]; }
kmp_stats_event & at(int index) { return events[index]; }
};
/* ****************************************************************
Class to implement a doubly-linked, circular, statistics list
|---| ---> |---| ---> |---| ---> |---| ---> ... next
| | | | | | | |
|---| <--- |---| <--- |---| <--- |---| <--- ... prev
Sentinel first second third
Node node node node
The Sentinel Node is the user handle on the list.
The first node corresponds to thread 0's statistics.
The second node corresponds to thread 1's statistics and so on...
Each node has a _timers, _counters, and _explicitTimers array to
hold that thread's statistics. The _explicitTimers
point to the correct _timer and update its statistics at every stop() call.
The explicitTimers' pointers are set up in the constructor.
Each node also has an event vector to hold that thread's timing events.
The event vector expands as necessary and records the start-stop times
for each timer.
The nestLevel variable is for plotting events and is related
to the bar width in the timeline graph.
Every thread will have a __thread local pointer to its node in
the list. The sentinel node is used by the master thread to
store "dummy" statistics before __kmp_create_worker() is called.
**************************************************************** */
class kmp_stats_list {
int gtid;
timeStat _timers[TIMER_LAST+1];
counter _counters[COUNTER_LAST+1];
explicitTimer _explicitTimers[EXPLICIT_TIMER_LAST+1];
int _nestLevel; // one per thread
kmp_stats_event_vector _event_vector;
kmp_stats_list* next;
kmp_stats_list* prev;
public:
kmp_stats_list() : _nestLevel(0), _event_vector(), next(this), prev(this) {
#define doInit(name,ignore1,ignore2) \
getExplicitTimer(EXPLICIT_TIMER_##name)->setStat(getTimer(TIMER_##name));
KMP_FOREACH_EXPLICIT_TIMER(doInit,0);
#undef doInit
}
~kmp_stats_list() { }
inline timeStat * getTimer(timer_e idx) { return &_timers[idx]; }
inline counter * getCounter(counter_e idx) { return &_counters[idx]; }
inline explicitTimer * getExplicitTimer(explicit_timer_e idx) { return &_explicitTimers[idx]; }
inline timeStat * getTimers() { return _timers; }
inline counter * getCounters() { return _counters; }
inline explicitTimer * getExplicitTimers() { return _explicitTimers; }
inline kmp_stats_event_vector & getEventVector() { return _event_vector; }
inline void resetEventVector() { _event_vector.reset(); }
inline void incrementNestValue() { _nestLevel++; }
inline int getNestValue() { return _nestLevel; }
inline void decrementNestValue() { _nestLevel--; }
inline int getGtid() const { return gtid; }
inline void setGtid(int newgtid) { gtid = newgtid; }
kmp_stats_list* push_back(int gtid); // returns newly created list node
inline void push_event(uint64_t start_time, uint64_t stop_time, int nest_level, timer_e name) {
_event_vector.push_back(start_time, stop_time, nest_level, name);
}
void deallocate();
class iterator;
kmp_stats_list::iterator begin();
kmp_stats_list::iterator end();
int size();
class iterator {
kmp_stats_list* ptr;
friend kmp_stats_list::iterator kmp_stats_list::begin();
friend kmp_stats_list::iterator kmp_stats_list::end();
public:
iterator();
~iterator();
iterator operator++();
iterator operator++(int dummy);
iterator operator--();
iterator operator--(int dummy);
bool operator!=(const iterator & rhs);
bool operator==(const iterator & rhs);
kmp_stats_list* operator*() const; // dereference operator
};
};
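// [Editor's sketch] push_back() is defined out of line (and takes __kmp_stats_lock);
// ignoring locking and the __kmp_allocate/placement-new details, appending a per-thread
// node just before the sentinel amounts to:
//
//     kmp_stats_list* kmp_stats_list::push_back(int gtid) {
//         kmp_stats_list* newNode = new kmp_stats_list();
//         newNode->setGtid(gtid);
//         newNode->prev = this->prev;   // old tail
//         newNode->next = this;         // sentinel closes the circle
//         this->prev->next = newNode;
//         this->prev = newNode;
//         return newNode;
//     }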
/* ****************************************************************
Class to encapsulate all output functions and the environment variables
This module holds filenames for various outputs (normal stats, events, plot file),
as well as coloring information for the plot file.
The filenames and flags variables are read from environment variables.
These are read once by the constructor of the global variable __kmp_stats_output
which calls init().
During this init() call, event flags for the timeStat::timerInfo[] global array
are cleared if KMP_STATS_EVENTS is not true (on, 1, yes).
The only interface function that is public is outputStats(heading). This function
should print out everything it needs to, either to files or stderr,
depending on the environment variables described below
ENVIRONMENT VARIABLES:
KMP_STATS_FILE -- if set, all statistics (not events) will be printed to this file,
otherwise, print to stderr
KMP_STATS_THREADS -- if set to "on", per-thread statistics will also be printed, to either
KMP_STATS_FILE or stderr
KMP_STATS_PLOT_FILE -- if set, print the ploticus plot file to this filename,
otherwise, the plot file is sent to "events.plt"
KMP_STATS_EVENTS -- if set to "on", then log events, otherwise, don't log events
KMP_STATS_EVENTS_FILE -- if set, all events are written to this file,
otherwise, output is sent to "events.dat"
**************************************************************** */
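// [Editor's example] Given a library built with KMP_STATS_ENABLED, the environment
// variables documented above could be combined like this (shell syntax):
//     KMP_STATS_FILE=stats.csv KMP_STATS_THREADS=on \
//     KMP_STATS_EVENTS=on KMP_STATS_EVENTS_FILE=run.dat ./a.out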
class kmp_stats_output_module {
public:
struct rgb_color {
float r;
float g;
float b;
};
private:
static const char* outputFileName;
static const char* eventsFileName;
static const char* plotFileName;
static int printPerThreadFlag;
static int printPerThreadEventsFlag;
static const rgb_color globalColorArray[];
static rgb_color timerColorInfo[];
void init();
static void setupEventColors();
static void printPloticusFile();
static void printStats(FILE *statsOut, statistic const * theStats, bool areTimers);
static void printCounters(FILE * statsOut, counter const * theCounters);
static void printEvents(FILE * eventsOut, kmp_stats_event_vector* theEvents, int gtid);
static rgb_color getEventColor(timer_e e) { return timerColorInfo[e]; }
static void windupExplicitTimers();
bool eventPrintingEnabled() { return printPerThreadEventsFlag != 0; }
bool perThreadPrintingEnabled() { return printPerThreadFlag != 0; }
public:
kmp_stats_output_module() { init(); }
void outputStats(const char* heading);
};
#ifdef __cplusplus
extern "C" {
#endif
void __kmp_stats_init();
void __kmp_reset_stats();
void __kmp_output_stats(const char *);
void __kmp_accumulate_stats_at_exit(void);
// thread local pointer to stats node within list
extern __thread kmp_stats_list* __kmp_stats_thread_ptr;
// head to stats list.
extern kmp_stats_list __kmp_stats_list;
// lock for __kmp_stats_list
extern kmp_tas_lock_t __kmp_stats_lock;
// reference start time
extern tsc_tick_count __kmp_stats_start_time;
// interface to output
extern kmp_stats_output_module __kmp_stats_output;
#ifdef __cplusplus
}
#endif
// Simple, standard interfaces that drop out completely if stats aren't enabled
/*!
* \brief Uses specified timer (name) to time code block.
*
* @param name timer name as specified under the KMP_FOREACH_TIMER() macro
*
* \details Use KMP_TIME_BLOCK(name) macro to time a code block. This will record the time taken in the block
* and use the destructor to stop the timer. Convenient!
* With this definition you can't have more than one KMP_TIME_BLOCK in the same code block.
* I don't think that's a problem.
*
* @ingroup STATS_GATHERING
*/
#define KMP_TIME_BLOCK(name) \
blockTimer __BLOCKTIME__(__kmp_stats_thread_ptr->getTimer(TIMER_##name), TIMER_##name)
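// [Editor's usage sketch] e.g., timing a function body with one of the timers declared
// under KMP_FOREACH_TIMER():
//     void example() {                    // hypothetical caller
//         KMP_TIME_BLOCK(KMP_fork_call);
//         /* ... work to be timed ... */
//     }                                   // destructor records the elapsed ticks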
/*!
* \brief Adds value to specified timer (name).
*
* @param name timer name as specified under the KMP_FOREACH_TIMER() macro
* @param value double precision sample value to add to statistics for the timer
*
* \details Use the KMP_COUNT_VALUE(name, value) macro to add a particular value to a timer's statistics.
*
* @ingroup STATS_GATHERING
*/
#define KMP_COUNT_VALUE(name, value) \
__kmp_stats_thread_ptr->getTimer(TIMER_##name)->addSample(value)
/*!
* \brief Increments specified counter (name).
*
* @param name counter name as specified under the KMP_FOREACH_COUNTER() macro
*
* \details Use the KMP_COUNT_BLOCK(name) macro to increment a statistics counter for the executing thread.
*
* @ingroup STATS_GATHERING
*/
#define KMP_COUNT_BLOCK(name) \
__kmp_stats_thread_ptr->getCounter(COUNTER_##name)->increment()
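// [Editor's usage sketch] counting an event and recording a value sample (nArgs is a
// stand-in variable):
//     KMP_COUNT_BLOCK(OMP_PARALLEL);               // bump this thread's counter
//     KMP_COUNT_VALUE(OMP_PARALLEL_args, nArgs);   // add a sample to the "timer" stat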
/*!
* \brief "Starts" an explicit timer which will need a corresponding KMP_STOP_EXPLICIT_TIMER() macro.
*
* @param name explicit timer name as specified under the KMP_FOREACH_EXPLICIT_TIMER() macro
*
* \details Use to start a timer. This will need a corresponding KMP_STOP_EXPLICIT_TIMER()
* macro to stop the timer, unlike the KMP_TIME_BLOCK(name) macro, which stops implicitly at the end
* of the code block. All explicit timers are stopped at library exit time before the final statistics are output.
*
* @ingroup STATS_GATHERING
*/
#define KMP_START_EXPLICIT_TIMER(name) \
__kmp_stats_thread_ptr->getExplicitTimer(EXPLICIT_TIMER_##name)->start(TIMER_##name)
/*!
* \brief "Stops" an explicit timer.
*
* @param name explicit timer name as specified under the KMP_FOREACH_EXPLICIT_TIMER() macro
*
* \details Use KMP_STOP_EXPLICIT_TIMER(name) to stop a timer. When this is done, the time between the last KMP_START_EXPLICIT_TIMER(name)
* and this KMP_STOP_EXPLICIT_TIMER(name) will be added to the timer's stat value. The timer will then be reset.
* After the KMP_STOP_EXPLICIT_TIMER(name) macro is called, another call to KMP_START_EXPLICIT_TIMER(name) will start the timer once again.
*
* @ingroup STATS_GATHERING
*/
#define KMP_STOP_EXPLICIT_TIMER(name) \
__kmp_stats_thread_ptr->getExplicitTimer(EXPLICIT_TIMER_##name)->stop(TIMER_##name)
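// [Editor's usage sketch] explicit timers bracket regions that don't scope nicely:
//     KMP_START_EXPLICIT_TIMER(OMP_serial);
//     /* ... serial portion ... */
//     KMP_STOP_EXPLICIT_TIMER(OMP_serial);   // adds the elapsed time, then resets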
/*!
* \brief Outputs the current thread statistics and resets them.
*
* @param heading_string heading put above the final stats output
*
* \details Explicitly stops all timers and outputs all stats.
* The environment variable `KMP_STATS_FILE=filename` can be used to send the stats to a file instead of stderr.
* The environment variable `KMP_STATS_THREADS=true` can be used to print thread-specific stats as well;
* if it is undefined (not specified in the environment), thread-specific stats won't be printed.
* Note that all statistics are reset when this macro is called.
*
* @ingroup STATS_GATHERING
*/
#define KMP_OUTPUT_STATS(heading_string) \
__kmp_output_stats(heading_string)
/*!
* \brief resets all stats (counters to 0, timers to 0 elapsed ticks)
*
* \details Reset all stats for all threads.
*
* @ingroup STATS_GATHERING
*/
#define KMP_RESET_STATS() __kmp_reset_stats()
#else // KMP_STATS_ENABLED
// Null definitions
#define KMP_TIME_BLOCK(n) ((void)0)
#define KMP_COUNT_VALUE(n,v) ((void)0)
#define KMP_COUNT_BLOCK(n) ((void)0)
#define KMP_START_EXPLICIT_TIMER(n) ((void)0)
#define KMP_STOP_EXPLICIT_TIMER(n) ((void)0)
#define KMP_OUTPUT_STATS(heading_string) ((void)0)
#define KMP_RESET_STATS() ((void)0)
#endif // KMP_STATS_ENABLED
#endif // KMP_STATS_H


@ -0,0 +1,167 @@
/** @file kmp_stats_timing.cpp
* Timing functions
*/
//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//
#include <stdlib.h>
#include <unistd.h>
#include <iostream>
#include <iomanip>
#include <sstream>
#include "kmp_stats_timing.h"
using namespace std;
#if KMP_OS_LINUX
# if KMP_MIC
double tsc_tick_count::tick_time()
{
// pretty bad assumption of 1GHz clock for MIC
return 1/((double)1000*1.e6);
}
# else
# include <string.h>
// Extract the value from the CPUID information
double tsc_tick_count::tick_time()
{
static double result = 0.0;
if (result == 0.0)
{
int cpuinfo[4];
char brand[256];
__cpuid(cpuinfo, 0x80000000);
memset(brand, 0, sizeof(brand));
int ids = cpuinfo[0];
for (unsigned int i=2; i<(ids^0x80000000)+2; i++)
__cpuid(brand+(i-2)*sizeof(cpuinfo), i | 0x80000000);
char * start = &brand[0];
for (;*start == ' '; start++)
;
char * end = brand + strlen(brand) - 3;
uint64_t multiplier;
if (*end == 'M') multiplier = 1000LL*1000LL;
else if (*end == 'G') multiplier = 1000LL*1000LL*1000LL;
else if (*end == 'T') multiplier = 1000LL*1000LL*1000LL*1000LL;
else
{
cout << "Error determining multiplier '" << *end << "'\n";
exit (-1);
}
*end = 0;
while (*end != ' ') end--;
end++;
double freq = strtod(end, &start);
if (freq == 0.0)
{
cout << "Error calculating frequency " << end << "\n";
exit (-1);
}
result = ((double)1.0)/(freq * multiplier);
}
return result;
}
# endif
#endif
static bool useSI = true;
// Return a formatted string after normalising the value into
// engineering style and using a suitable unit prefix (e.g. ms, us, ns).
std::string formatSI(double interval, int width, char unit)
{
std::stringstream os;
if (useSI)
{
// Preserve accuracy for small numbers, since we only multiply and the positive powers
// of ten are precisely representable.
static struct { double scale; char prefix; } ranges[] = {
{1.e12,'f'},
{1.e9, 'p'},
{1.e6, 'n'},
{1.e3, 'u'},
{1.0, 'm'},
{1.e-3,' '},
{1.e-6,'k'},
{1.e-9,'M'},
{1.e-12,'G'},
{1.e-15,'T'},
{1.e-18,'P'},
{1.e-21,'E'},
{1.e-24,'Z'},
{1.e-27,'Y'}
};
if (interval == 0.0)
{
os << std::setw(width-3) << std::right << "0.00" << std::setw(3) << unit;
return os.str();
}
bool negative = false;
if (interval < 0.0)
{
negative = true;
interval = -interval;
}
for (int i=0; i<(int)(sizeof(ranges)/sizeof(ranges[0])); i++)
{
if (interval*ranges[i].scale < 1.e0)
{
interval = interval * 1000.e0 * ranges[i].scale;
os << std::fixed << std::setprecision(2) << std::setw(width-3) << std::right <<
(negative ? -interval : interval) << std::setw(2) << ranges[i].prefix << std::setw(1) << unit;
return os.str();
}
}
}
os << std::setprecision(2) << std::fixed << std::right << std::setw(width-3) << interval << std::setw(3) << unit;
return os.str();
}
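// [Editor's worked example] formatSI(0.000125, 9, 'S') walks the table above until
// 0.000125 * 1.e3 < 1 (the 'u' row), scales the value to 0.000125 * 1000 * 1.e3 = 125.00,
// and so returns "125.00 uS".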
tsc_tick_count::tsc_interval_t computeLastInLastOutInterval(timePair * times, int nTimes)
{
timePair lastTimes = times[0];
tsc_tick_count * startp = lastTimes.get_startp();
tsc_tick_count * endp = lastTimes.get_endp();
for (int i=1; i<nTimes; i++)
{
(*startp) = startp->later(times[i].get_start());
(*endp) = endp->later (times[i].get_end());
}
return lastTimes.duration();
}
std::string timePair::format() const
{
std::ostringstream oss;
oss << start.getValue() << ":" << end.getValue() << " = " << (end-start).getValue();
return oss.str();
}


@ -0,0 +1,104 @@
#ifndef KMP_STATS_TIMING_H
#define KMP_STATS_TIMING_H
/** @file kmp_stats_timing.h
* Access to real time clock and timers.
*/
//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//
#include <stdint.h>
#include <string>
#include <limits>
#include "kmp_os.h"
class tsc_tick_count {
private:
int64_t my_count;
public:
class tsc_interval_t {
int64_t value;
explicit tsc_interval_t(int64_t _value) : value(_value) {}
public:
tsc_interval_t() : value(0) {}; // Construct 0 time duration
double seconds() const; // Return the length of a time interval in seconds
double ticks() const { return double(value); }
int64_t getValue() const { return value; }
friend class tsc_tick_count;
friend tsc_interval_t operator-(
const tsc_tick_count t1, const tsc_tick_count t0);
};
tsc_tick_count() : my_count(static_cast<int64_t>(__rdtsc())) {};
tsc_tick_count(int64_t value) : my_count(value) {};
int64_t getValue() const { return my_count; }
tsc_tick_count later (tsc_tick_count const other) const {
return my_count > other.my_count ? (*this) : other;
}
tsc_tick_count earlier(tsc_tick_count const other) const {
return my_count < other.my_count ? (*this) : other;
}
static double tick_time(); // returns seconds per cycle (period) of clock
static tsc_tick_count now() { return tsc_tick_count(); } // returns the rdtsc register value
friend tsc_tick_count::tsc_interval_t operator-(const tsc_tick_count t1, const tsc_tick_count t0);
};
inline tsc_tick_count::tsc_interval_t operator-(const tsc_tick_count t1, const tsc_tick_count t0)
{
return tsc_tick_count::tsc_interval_t( t1.my_count-t0.my_count );
}
inline double tsc_tick_count::tsc_interval_t::seconds() const
{
return value*tick_time();
}
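// [Editor's usage sketch] measuring a region in ticks and converting to seconds:
//     tsc_tick_count t0;                                       // reads rdtsc now
//     /* ... region to measure ... */
//     tsc_tick_count::tsc_interval_t dt = tsc_tick_count::now() - t0;
//     double secs = dt.seconds();                              // ticks * tick_time()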
extern std::string formatSI(double interval, int width, char unit);
inline std::string formatSeconds(double interval, int width)
{
return formatSI(interval, width, 'S');
}
inline std::string formatTicks(double interval, int width)
{
return formatSI(interval, width, 'T');
}
class timePair
{
tsc_tick_count KMP_ALIGN_CACHE start;
tsc_tick_count end;
public:
timePair() : start(-std::numeric_limits<int64_t>::max()), end(-std::numeric_limits<int64_t>::max()) {}
tsc_tick_count get_start() const { return start; }
tsc_tick_count get_end() const { return end; }
tsc_tick_count * get_startp() { return &start; }
tsc_tick_count * get_endp() { return &end; }
void markStart() { start = tsc_tick_count::now(); }
void markEnd() { end = tsc_tick_count::now(); }
void set_start(tsc_tick_count s) { start = s; }
void set_end (tsc_tick_count e) { end = e; }
tsc_tick_count::tsc_interval_t duration() const { return end-start; }
std::string format() const;
};
extern tsc_tick_count::tsc_interval_t computeLastInLastOutInterval(timePair * times, int nTimes);
#endif // KMP_STATS_TIMING_H


@ -1,7 +1,7 @@
/*
* kmp_str.c -- String manipulation routines.
* $Revision: 42810 $
* $Date: 2013-11-07 12:06:33 -0600 (Thu, 07 Nov 2013) $
* $Revision: 43084 $
* $Date: 2014-04-15 09:15:14 -0500 (Tue, 15 Apr 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_str.h -- String manipulation routines.
* $Revision: 42613 $
* $Date: 2013-08-23 13:29:50 -0500 (Fri, 23 Aug 2013) $
* $Revision: 43435 $
* $Date: 2014-09-04 15:16:08 -0500 (Thu, 04 Sep 2014) $
*/


@ -1,7 +1,7 @@
/*
* kmp_stub.c -- stub versions of user-callable OpenMP RT functions.
* $Revision: 42826 $
* $Date: 2013-11-20 03:39:45 -0600 (Wed, 20 Nov 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/
@ -15,13 +15,13 @@
//===----------------------------------------------------------------------===//
#include "kmp_stub.h"
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#include "kmp_os.h" // KMP_OS_*
#include "omp.h" // Function renamings.
#include "kmp.h" // KMP_DEFAULT_STKSIZE
#include "kmp_stub.h"
#if KMP_OS_WINDOWS
#include <windows.h>
@ -29,20 +29,12 @@
#include <sys/time.h>
#endif
#include "omp.h" // Function renamings.
#include "kmp.h" // KMP_DEFAULT_STKSIZE
#include "kmp_version.h"
// Moved from omp.h
#if OMP_30_ENABLED
#define omp_set_max_active_levels ompc_set_max_active_levels
#define omp_set_schedule ompc_set_schedule
#define omp_get_ancestor_thread_num ompc_get_ancestor_thread_num
#define omp_get_team_size ompc_get_team_size
#endif // OMP_30_ENABLED
#define omp_set_num_threads ompc_set_num_threads
#define omp_set_dynamic ompc_set_dynamic
#define omp_set_nested ompc_set_nested
@ -95,15 +87,13 @@ static size_t __kmps_init() {
void omp_set_num_threads( omp_int_t num_threads ) { i; }
void omp_set_dynamic( omp_int_t dynamic ) { i; __kmps_set_dynamic( dynamic ); }
void omp_set_nested( omp_int_t nested ) { i; __kmps_set_nested( nested ); }
#if OMP_30_ENABLED
void omp_set_max_active_levels( omp_int_t max_active_levels ) { i; }
void omp_set_schedule( omp_sched_t kind, omp_int_t modifier ) { i; __kmps_set_schedule( (kmp_sched_t)kind, modifier ); }
int omp_get_ancestor_thread_num( omp_int_t level ) { i; return ( level ) ? ( -1 ) : ( 0 ); }
int omp_get_team_size( omp_int_t level ) { i; return ( level ) ? ( -1 ) : ( 1 ); }
int kmpc_set_affinity_mask_proc( int proc, void **mask ) { i; return -1; }
int kmpc_unset_affinity_mask_proc( int proc, void **mask ) { i; return -1; }
int kmpc_get_affinity_mask_proc( int proc, void **mask ) { i; return -1; }
#endif // OMP_30_ENABLED
void omp_set_max_active_levels( omp_int_t max_active_levels ) { i; }
void omp_set_schedule( omp_sched_t kind, omp_int_t modifier ) { i; __kmps_set_schedule( (kmp_sched_t)kind, modifier ); }
int omp_get_ancestor_thread_num( omp_int_t level ) { i; return ( level ) ? ( -1 ) : ( 0 ); }
int omp_get_team_size( omp_int_t level ) { i; return ( level ) ? ( -1 ) : ( 1 ); }
int kmpc_set_affinity_mask_proc( int proc, void **mask ) { i; return -1; }
int kmpc_unset_affinity_mask_proc( int proc, void **mask ) { i; return -1; }
int kmpc_get_affinity_mask_proc( int proc, void **mask ) { i; return -1; }
/* kmp API functions */
void kmp_set_stacksize( omp_int_t arg ) { i; __kmps_set_stacksize( arg ); }
@ -178,8 +168,6 @@ int __kmps_get_stacksize( void ) {
return __kmps_stacksize;
} // __kmps_get_stacksize
#if OMP_30_ENABLED
static kmp_sched_t __kmps_sched_kind = kmp_sched_default;
static int __kmps_sched_modifier = 0;
@ -195,8 +183,6 @@ static int __kmps_sched_modifier = 0;
*modifier = __kmps_sched_modifier;
} // __kmps_get_schedule
#endif // OMP_30_ENABLED
#if OMP_40_ENABLED
static kmp_proc_bind_t __kmps_proc_bind = proc_bind_false;


@ -1,7 +1,7 @@
/*
* kmp_stub.h
* $Revision: 42061 $
* $Date: 2013-02-28 16:36:24 -0600 (Thu, 28 Feb 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/
@ -33,7 +33,6 @@ int __kmps_get_nested( void );
void __kmps_set_stacksize( int arg );
int __kmps_get_stacksize();
#if OMP_30_ENABLED
#ifndef KMP_SCHED_TYPE_DEFINED
#define KMP_SCHED_TYPE_DEFINED
typedef enum kmp_sched {
@ -46,11 +45,10 @@ typedef enum kmp_sched {
#endif
void __kmps_set_schedule( kmp_sched_t kind, int modifier );
void __kmps_get_schedule( kmp_sched_t *kind, int *modifier );
#endif // OMP_30_ENABLED
#if OMP_40_ENABLED
void __kmps_set_proc_bind( enum kmp_proc_bind_t arg );
enum kmp_proc_bind_t __kmps_get_proc_bind( void );
void __kmps_set_proc_bind( kmp_proc_bind_t arg );
kmp_proc_bind_t __kmps_get_proc_bind( void );
#endif /* OMP_40_ENABLED */
double __kmps_get_wtime();


@ -19,6 +19,7 @@
#include "kmp.h"
#include "kmp_io.h"
#include "kmp_wait_release.h"
#if OMP_40_ENABLED
@ -88,20 +89,20 @@ static kmp_dephash_t *
__kmp_dephash_create ( kmp_info_t *thread )
{
kmp_dephash_t *h;
kmp_int32 size = kmp_dephash_size * sizeof(kmp_dephash_entry_t) + sizeof(kmp_dephash_t);
#if USE_FAST_MEMORY
h = (kmp_dephash_t *) __kmp_fast_allocate( thread, size );
#else
h = (kmp_dephash_t *) __kmp_thread_malloc( thread, size );
#endif
#ifdef KMP_DEBUG
#ifdef KMP_DEBUG
h->nelements = 0;
#endif
h->buckets = (kmp_dephash_entry **)(h+1);
for ( kmp_int32 i = 0; i < kmp_dephash_size; i++ )
h->buckets[i] = 0;
@ -137,11 +138,11 @@ static kmp_dephash_entry *
__kmp_dephash_find ( kmp_info_t *thread, kmp_dephash_t *h, kmp_intptr_t addr )
{
kmp_int32 bucket = __kmp_dephash_hash(addr);
kmp_dephash_entry_t *entry;
for ( entry = h->buckets[bucket]; entry; entry = entry->next_in_bucket )
if ( entry->addr == addr ) break;
if ( entry == NULL ) {
// create entry. This is only done by one thread so no locking required
#if USE_FAST_MEMORY
@ -212,6 +213,8 @@ static inline kmp_int32
__kmp_process_deps ( kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t *hash,
bool dep_barrier,kmp_int32 ndeps, kmp_depend_info_t *dep_list)
{
KA_TRACE(30, ("__kmp_process_deps<%d>: T#%d processing %d depencies : dep_barrier = %d\n", filter, gtid, ndeps, dep_barrier ) );
kmp_info_t *thread = __kmp_threads[ gtid ];
kmp_int32 npredecessors=0;
for ( kmp_int32 i = 0; i < ndeps ; i++ ) {
@ -232,6 +235,8 @@ __kmp_process_deps ( kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t *hash,
if ( indep->dn.task ) {
__kmp_track_dependence(indep,node);
indep->dn.successors = __kmp_add_node(thread, indep->dn.successors, node);
KA_TRACE(40,("__kmp_process_deps<%d>: T#%d adding dependence from %p to %p",
filter,gtid, KMP_TASK_TO_TASKDATA(indep->dn.task), KMP_TASK_TO_TASKDATA(node->dn.task)));
npredecessors++;
}
KMP_RELEASE_DEPNODE(gtid,indep);
@ -246,13 +251,16 @@ __kmp_process_deps ( kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t *hash,
if ( last_out->dn.task ) {
__kmp_track_dependence(last_out,node);
last_out->dn.successors = __kmp_add_node(thread, last_out->dn.successors, node);
KA_TRACE(40,("__kmp_process_deps<%d>: T#%d adding dependence from %p to %p",
filter,gtid, KMP_TASK_TO_TASKDATA(last_out->dn.task), KMP_TASK_TO_TASKDATA(node->dn.task)));
npredecessors++;
}
KMP_RELEASE_DEPNODE(gtid,last_out);
}
if ( dep_barrier ) {
// if this is a sync point in the serial sequence and previous outputs are guaranteed to be completed after
// if this is a sync point in the serial sequence, then the previous outputs are guaranteed to be completed after
// the execution of this task so the previous output nodes can be cleared.
__kmp_node_deref(thread,last_out);
info->last_out = NULL;
@ -265,6 +273,9 @@ __kmp_process_deps ( kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t *hash,
}
}
KA_TRACE(30, ("__kmp_process_deps<%d>: T#%d found %d predecessors\n", filter, gtid, npredecessors ) );
return npredecessors;
}
@ -278,7 +289,10 @@ __kmp_check_deps ( kmp_int32 gtid, kmp_depnode_t *node, kmp_task_t *task, kmp_de
kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list )
{
int i;
kmp_taskdata_t * taskdata = KMP_TASK_TO_TASKDATA(task);
KA_TRACE(20, ("__kmp_check_deps: T#%d checking dependencies for task %p : %d possibly aliased dependencies, %d non-aliased depedencies : dep_barrier=%d .\n", gtid, taskdata, ndeps, ndeps_noalias, dep_barrier ) );
// Filter deps in dep_list
// TODO: Different algorithm for large dep_list ( > 10 ? )
for ( i = 0; i < ndeps; i ++ ) {
@ -292,8 +306,8 @@ __kmp_check_deps ( kmp_int32 gtid, kmp_depnode_t *node, kmp_task_t *task, kmp_de
}
// doesn't need to be atomic as no other thread is going to be accessing this node just yet
// npredecessors is set 1 to ensure that none of the releasing tasks queues this task before we have finished processing all the dependencies
node->dn.npredecessors = 1;
// npredecessors is set -1 to ensure that none of the releasing tasks queues this task before we have finished processing all the dependencies
node->dn.npredecessors = -1;
// used to pack all npredecessors additions into a single atomic operation at the end
int npredecessors;
@ -301,12 +315,16 @@ __kmp_check_deps ( kmp_int32 gtid, kmp_depnode_t *node, kmp_task_t *task, kmp_de
npredecessors = __kmp_process_deps<true>(gtid, node, hash, dep_barrier, ndeps, dep_list);
npredecessors += __kmp_process_deps<false>(gtid, node, hash, dep_barrier, ndeps_noalias, noalias_dep_list);
KMP_TEST_THEN_ADD32(&node->dn.npredecessors, npredecessors);
// Remove the fake predecessor and find out if there's any outstanding dependence (some tasks may have finished while we processed the dependences)
node->dn.task = task;
KMP_MB();
npredecessors = KMP_TEST_THEN_DEC32(&node->dn.npredecessors) - 1;
// Account for our initial fake value
npredecessors++;
// Update predecessors and obtain current value to check if there are still any outstanding dependences (some tasks may have finished while we processed the dependences)
npredecessors = KMP_TEST_THEN_ADD32(&node->dn.npredecessors, npredecessors) + npredecessors;
KA_TRACE(20, ("__kmp_check_deps: T#%d found %d predecessors for task %p \n", gtid, npredecessors, taskdata ) );
// beyond this point the task could be queued (and executed) by a releasing task...
return npredecessors > 0 ? true : false;
@ -318,11 +336,15 @@ __kmp_release_deps ( kmp_int32 gtid, kmp_taskdata_t *task )
kmp_info_t *thread = __kmp_threads[ gtid ];
kmp_depnode_t *node = task->td_depnode;
if ( task->td_dephash )
if ( task->td_dephash ) {
KA_TRACE(40, ("__kmp_realease_deps: T#%d freeing dependencies hash of task %p.\n", gtid, task ) );
__kmp_dephash_free(thread,task->td_dephash);
}
if ( !node ) return;
KA_TRACE(20, ("__kmp_realease_deps: T#%d notifying succesors of task %p.\n", gtid, task ) );
KMP_ACQUIRE_DEPNODE(gtid,node);
node->dn.task = NULL; // mark this task as finished, so no new dependencies are generated
KMP_RELEASE_DEPNODE(gtid,node);
@ -335,9 +357,10 @@ __kmp_release_deps ( kmp_int32 gtid, kmp_taskdata_t *task )
// successor task can be NULL for wait_depends or because deps are still being processed
if ( npredecessors == 0 ) {
KMP_MB();
if ( successor->dn.task )
// loc_ref was already stored in successor's task_data
__kmpc_omp_task(NULL,gtid,successor->dn.task);
if ( successor->dn.task ) {
KA_TRACE(20, ("__kmp_realease_deps: T#%d successor %p of %p scheduled for execution.\n", gtid, successor->dn.task, task ) );
__kmp_omp_task(gtid,successor->dn.task,false);
}
}
next = p->next;
@ -350,6 +373,8 @@ __kmp_release_deps ( kmp_int32 gtid, kmp_taskdata_t *task )
}
__kmp_node_deref(thread,node);
KA_TRACE(20, ("__kmp_realease_deps: T#%d all successors of %p notified of completation\n", gtid, task ) );
}
/*!
@ -368,15 +393,20 @@ Schedule a non-thread-switchable task with dependences for execution
*/
kmp_int32
__kmpc_omp_task_with_deps( ident_t *loc_ref, kmp_int32 gtid, kmp_task_t * new_task,
kmp_int32 ndeps, kmp_depend_info_t *dep_list,
kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list )
kmp_int32 ndeps, kmp_depend_info_t *dep_list,
kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list )
{
kmp_taskdata_t * new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
KA_TRACE(10, ("__kmpc_omp_task_with_deps(enter): T#%d loc=%p task=%p\n",
gtid, loc_ref, new_taskdata ) );
kmp_info_t *thread = __kmp_threads[ gtid ];
kmp_taskdata_t * current_task = thread->th.th_current_task;
bool serial = current_task->td_flags.team_serial || current_task->td_flags.tasking_ser || current_task->td_flags.final;
if ( !serial && ( ndeps > 0 || ndeps_noalias > 0 )) {
if ( !serial && ( ndeps > 0 || ndeps_noalias > 0 )) {
/* if no dependencies have been tracked yet, create the dependence hash */
if ( current_task->td_dephash == NULL )
current_task->td_dephash = __kmp_dephash_create(thread);
@ -388,13 +418,21 @@ __kmpc_omp_task_with_deps( ident_t *loc_ref, kmp_int32 gtid, kmp_task_t * new_ta
#endif
__kmp_init_node(node);
KMP_TASK_TO_TASKDATA(new_task)->td_depnode = node;
new_taskdata->td_depnode = node;
if ( __kmp_check_deps( gtid, node, new_task, current_task->td_dephash, NO_DEP_BARRIER,
ndeps, dep_list, ndeps_noalias,noalias_dep_list ) )
ndeps, dep_list, ndeps_noalias,noalias_dep_list ) ) {
KA_TRACE(10, ("__kmpc_omp_task_with_deps(exit): T#%d task had blocking dependencies: "
"loc=%p task=%p, return: TASK_CURRENT_NOT_QUEUED\n", gtid, loc_ref,
new_taskdata ) );
return TASK_CURRENT_NOT_QUEUED;
}
}
KA_TRACE(10, ("__kmpc_omp_task_with_deps(exit): T#%d task had no blocking dependencies : "
"loc=%p task=%p, transferring to __kmpc_omp_task\n", gtid, loc_ref,
new_taskdata ) );
return __kmpc_omp_task(loc_ref,gtid,new_task);
}
@ -413,35 +451,44 @@ void
__kmpc_omp_wait_deps ( ident_t *loc_ref, kmp_int32 gtid, kmp_int32 ndeps, kmp_depend_info_t *dep_list,
kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list )
{
if ( ndeps == 0 && ndeps_noalias == 0 ) return;
KA_TRACE(10, ("__kmpc_omp_wait_deps(enter): T#%d loc=%p\n", gtid, loc_ref) );
if ( ndeps == 0 && ndeps_noalias == 0 ) {
KA_TRACE(10, ("__kmpc_omp_wait_deps(exit): T#%d has no dependencies to wait upon : loc=%p\n", gtid, loc_ref) );
return;
}
kmp_info_t *thread = __kmp_threads[ gtid ];
kmp_taskdata_t * current_task = thread->th.th_current_task;
// dependences are not computed in serial teams
if ( current_task->td_flags.team_serial || current_task->td_flags.tasking_ser || current_task->td_flags.final)
// We can return immediately as:
// - dependences are not computed in serial teams
// - if the dephash is not yet created it means we have nothing to wait for
if ( current_task->td_flags.team_serial || current_task->td_flags.tasking_ser || current_task->td_flags.final || current_task->td_dephash == NULL ) {
KA_TRACE(10, ("__kmpc_omp_wait_deps(exit): T#%d has no blocking dependencies : loc=%p\n", gtid, loc_ref) );
return;
// if the dephash is not yet created it means we have nothing to wait for
if ( current_task->td_dephash == NULL ) return;
}
kmp_depnode_t node;
__kmp_init_node(&node);
if (!__kmp_check_deps( gtid, &node, NULL, current_task->td_dephash, DEP_BARRIER,
ndeps, dep_list, ndeps_noalias, noalias_dep_list ))
ndeps, dep_list, ndeps_noalias, noalias_dep_list )) {
KA_TRACE(10, ("__kmpc_omp_wait_deps(exit): T#%d has no blocking dependencies : loc=%p\n", gtid, loc_ref) );
return;
int thread_finished = FALSE;
while ( node.dn.npredecessors > 0 ) {
__kmp_execute_tasks( thread, gtid, (volatile kmp_uint32 *)&(node.dn.npredecessors),
0, FALSE, &thread_finished,
#if USE_ITT_BUILD
NULL,
#endif
__kmp_task_stealing_constraint );
}
int thread_finished = FALSE;
kmp_flag_32 flag((volatile kmp_uint32 *)&(node.dn.npredecessors), 0U);
while ( node.dn.npredecessors > 0 ) {
flag.execute_tasks(thread, gtid, FALSE, &thread_finished,
#if USE_ITT_BUILD
NULL,
#endif
__kmp_task_stealing_constraint );
}
KA_TRACE(10, ("__kmpc_omp_wait_deps(exit): T#%d finished waiting : loc=%p\n", gtid, loc_ref) );
}
#endif /* OMP_40_ENABLED */


@ -1,7 +1,7 @@
/*
* kmp_tasking.c -- OpenMP 3.0 tasking support.
* $Revision: 42852 $
* $Date: 2013-12-04 10:50:49 -0600 (Wed, 04 Dec 2013) $
* $Revision: 43389 $
* $Date: 2014-08-11 10:54:01 -0500 (Mon, 11 Aug 2014) $
*/
@ -18,9 +18,9 @@
#include "kmp.h"
#include "kmp_i18n.h"
#include "kmp_itt.h"
#include "kmp_wait_release.h"
#if OMP_30_ENABLED
/* ------------------------------------------------------------------------ */
/* ------------------------------------------------------------------------ */
@ -31,26 +31,12 @@ static void __kmp_enable_tasking( kmp_task_team_t *task_team, kmp_info_t *this_t
static void __kmp_alloc_task_deque( kmp_info_t *thread, kmp_thread_data_t *thread_data );
static int __kmp_realloc_task_threads_data( kmp_info_t *thread, kmp_task_team_t *task_team );
#ifndef KMP_DEBUG
# define __kmp_static_delay( arg ) /* nothing to do */
#else
static void
__kmp_static_delay( int arg )
{
/* Work around weird code-gen bug that causes assert to trip */
# if KMP_ARCH_X86_64 && KMP_OS_LINUX
KMP_ASSERT( arg != 0 );
# else
KMP_ASSERT( arg >= 0 );
# endif
}
#endif /* KMP_DEBUG */
static void
__kmp_static_yield( int arg )
{
__kmp_yield( arg );
static inline void __kmp_null_resume_wrapper(int gtid, volatile void *flag) {
switch (((kmp_flag_64 *)flag)->get_type()) {
case flag32: __kmp_resume_32(gtid, NULL); break;
case flag64: __kmp_resume_64(gtid, NULL); break;
case flag_oncore: __kmp_resume_oncore(gtid, NULL); break;
}
}
#ifdef BUILD_TIED_TASK_STACK
@ -605,9 +591,7 @@ __kmp_task_finish( kmp_int32 gtid, kmp_task_t *task, kmp_taskdata_t *resumed_tas
}
#endif /* BUILD_TIED_TASK_STACK */
KMP_DEBUG_ASSERT( taskdata -> td_flags.executing == 1 );
KMP_DEBUG_ASSERT( taskdata -> td_flags.complete == 0 );
taskdata -> td_flags.executing = 0; // suspend the finishing task
taskdata -> td_flags.complete = 1; // mark the task as completed
KMP_DEBUG_ASSERT( taskdata -> td_flags.started == 1 );
KMP_DEBUG_ASSERT( taskdata -> td_flags.freed == 0 );
@ -624,6 +608,12 @@ __kmp_task_finish( kmp_int32 gtid, kmp_task_t *task, kmp_taskdata_t *resumed_tas
#endif
}
// td_flags.executing must be marked as 0 after __kmp_release_deps has been called
// Otherwise, if a task is executed immediately from the release_deps code
// the flag will be reset to 1 again by this same function
KMP_DEBUG_ASSERT( taskdata -> td_flags.executing == 1 );
taskdata -> td_flags.executing = 0; // suspend the finishing task
KA_TRACE(20, ("__kmp_task_finish: T#%d finished task %p, %d incomplete children\n",
gtid, taskdata, children) );
@ -908,7 +898,7 @@ __kmp_task_alloc( ident_t *loc_ref, kmp_int32 gtid, kmp_tasking_flags_t *flags,
taskdata->td_taskgroup = parent_task->td_taskgroup; // task inherits the taskgroup from the parent task
taskdata->td_dephash = NULL;
taskdata->td_depnode = NULL;
#endif
#endif
// Only need to keep track of child task counts if team parallel and tasking not serialized
if ( !( taskdata -> td_flags.team_serial || taskdata -> td_flags.tasking_ser ) ) {
KMP_TEST_THEN_INC32( (kmp_int32 *)(& parent_task->td_incomplete_child_tasks) );
@ -1047,9 +1037,38 @@ __kmpc_omp_task_parts( ident_t *loc_ref, kmp_int32 gtid, kmp_task_t * new_task)
return TASK_CURRENT_NOT_QUEUED;
}
//---------------------------------------------------------------------
// __kmp_omp_task: Schedule a non-thread-switchable task for execution
// gtid: Global Thread ID of encountering thread
// new_task: non-thread-switchable task thunk allocated by __kmp_omp_task_alloc()
// serialize_immediate: if TRUE then if the task is executed immediately its execution will be serialized
// returns:
//
// TASK_CURRENT_NOT_QUEUED (0) if did not suspend and queue current task to be resumed later.
// TASK_CURRENT_QUEUED (1) if suspended and queued the current task to be resumed later.
kmp_int32
__kmp_omp_task( kmp_int32 gtid, kmp_task_t * new_task, bool serialize_immediate )
{
kmp_taskdata_t * new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
/* Should we execute the new task or queue it? For now, let's just always try to
queue it. If the queue fills up, then we'll execute it. */
if ( __kmp_push_task( gtid, new_task ) == TASK_NOT_PUSHED ) // if cannot defer
{ // Execute this task immediately
kmp_taskdata_t * current_task = __kmp_threads[ gtid ] -> th.th_current_task;
if ( serialize_immediate )
new_taskdata -> td_flags.task_serial = 1;
__kmp_invoke_task( gtid, new_task, current_task );
}
return TASK_CURRENT_NOT_QUEUED;
}
//---------------------------------------------------------------------
// __kmpc_omp_task: Schedule a non-thread-switchable task for execution
// __kmpc_omp_task: Wrapper around __kmp_omp_task to schedule a non-thread-switchable task from
// the parent thread only!
// loc_ref: location of original task pragma (ignored)
// gtid: Global Thread ID of encountering thread
// new_task: non-thread-switchable task thunk allocated by __kmp_omp_task_alloc()
@ -1062,28 +1081,18 @@ kmp_int32
__kmpc_omp_task( ident_t *loc_ref, kmp_int32 gtid, kmp_task_t * new_task)
{
kmp_taskdata_t * new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
kmp_int32 rc;
kmp_int32 res;
KA_TRACE(10, ("__kmpc_omp_task(enter): T#%d loc=%p task=%p\n",
gtid, loc_ref, new_taskdata ) );
/* Should we execute the new task or queue it? For now, let's just always try to
queue it. If the queue fills up, then we'll execute it. */
if ( __kmp_push_task( gtid, new_task ) == TASK_NOT_PUSHED ) // if cannot defer
{ // Execute this task immediately
kmp_taskdata_t * current_task = __kmp_threads[ gtid ] -> th.th_current_task;
new_taskdata -> td_flags.task_serial = 1;
__kmp_invoke_task( gtid, new_task, current_task );
}
res = __kmp_omp_task(gtid,new_task,true);
KA_TRACE(10, ("__kmpc_omp_task(exit): T#%d returning TASK_CURRENT_NOT_QUEUED: loc=%p task=%p\n",
gtid, loc_ref, new_taskdata ) );
return TASK_CURRENT_NOT_QUEUED;
return res;
}
//-------------------------------------------------------------------------------------
// __kmpc_omp_taskwait: Wait until all tasks generated by the current task are complete
@ -1117,11 +1126,10 @@ __kmpc_omp_taskwait( ident_t *loc_ref, kmp_int32 gtid )
if ( ! taskdata->td_flags.team_serial ) {
// GEH: if team serialized, avoid reading the volatile variable below.
kmp_flag_32 flag(&(taskdata->td_incomplete_child_tasks), 0U);
while ( TCR_4(taskdata -> td_incomplete_child_tasks) != 0 ) {
__kmp_execute_tasks( thread, gtid, &(taskdata->td_incomplete_child_tasks),
0, FALSE, &thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj),
__kmp_task_stealing_constraint );
flag.execute_tasks(thread, gtid, FALSE, &thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), __kmp_task_stealing_constraint );
}
}
#if USE_ITT_BUILD
@ -1153,7 +1161,7 @@ __kmpc_omp_taskyield( ident_t *loc_ref, kmp_int32 gtid, int end_part )
KA_TRACE(10, ("__kmpc_omp_taskyield(enter): T#%d loc=%p end_part = %d\n",
gtid, loc_ref, end_part) );
if ( __kmp_tasking_mode != tskm_immediate_exec ) {
if ( __kmp_tasking_mode != tskm_immediate_exec && __kmp_init_parallel ) {
// GEH TODO: shouldn't we have some sort of OMPRAP API calls here to mark begin wait?
thread = __kmp_threads[ gtid ];
@ -1172,11 +1180,14 @@ __kmpc_omp_taskyield( ident_t *loc_ref, kmp_int32 gtid, int end_part )
__kmp_itt_taskwait_starting( gtid, itt_sync_obj );
#endif /* USE_ITT_BUILD */
if ( ! taskdata->td_flags.team_serial ) {
__kmp_execute_tasks( thread, gtid, NULL, 0, FALSE, &thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj),
__kmp_task_stealing_constraint );
kmp_task_team_t * task_team = thread->th.th_task_team;
if (task_team != NULL) {
if (KMP_TASKING_ENABLED(task_team, thread->th.th_task_state)) {
__kmp_execute_tasks_32( thread, gtid, NULL, FALSE, &thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), __kmp_task_stealing_constraint );
}
}
}
#if USE_ITT_BUILD
if ( itt_sync_obj != NULL )
__kmp_itt_taskwait_finished( gtid, itt_sync_obj );
@ -1236,11 +1247,10 @@ __kmpc_end_taskgroup( ident_t* loc, int gtid )
#endif /* USE_ITT_BUILD */
if ( ! taskdata->td_flags.team_serial ) {
kmp_flag_32 flag(&(taskgroup->count), 0U);
while ( TCR_4(taskgroup->count) != 0 ) {
__kmp_execute_tasks( thread, gtid, &(taskgroup->count),
0, FALSE, &thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj),
__kmp_task_stealing_constraint );
flag.execute_tasks(thread, gtid, FALSE, &thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), __kmp_task_stealing_constraint );
}
}
@ -1433,7 +1443,7 @@ __kmp_steal_task( kmp_info_t *victim, kmp_int32 gtid, kmp_task_team_t *task_team
__kmp_release_bootstrap_lock( & victim_td -> td.td_deque_lock );
KA_TRACE(10, ("__kmp_steal_task(exit #3): T#%d stole task %p from T#d: task_team=%p "
KA_TRACE(10, ("__kmp_steal_task(exit #3): T#%d stole task %p from T#%d: task_team=%p "
"ntasks=%d head=%u tail=%u\n",
gtid, taskdata, __kmp_gtid_from_thread( victim ), task_team,
victim_td->td.td_deque_ntasks, victim_td->td.td_deque_head,
@ -1445,7 +1455,7 @@ __kmp_steal_task( kmp_info_t *victim, kmp_int32 gtid, kmp_task_team_t *task_team
//-----------------------------------------------------------------------------
// __kmp_execute_tasks: Choose and execute tasks until either the condition
// __kmp_execute_tasks_template: Choose and execute tasks until either the condition
// is satisfied (return true) or there are none left (return false).
// final_spin is TRUE if this is the spin at the release barrier.
// thread_finished indicates whether the thread is finished executing all
@ -1453,16 +1463,10 @@ __kmp_steal_task( kmp_info_t *victim, kmp_int32 gtid, kmp_task_team_t *task_team
// spinner is the location on which to spin.
// spinner == NULL means only execute a single task and return.
// checker is the value to check to terminate the spin.
int
__kmp_execute_tasks( kmp_info_t *thread,
kmp_int32 gtid,
volatile kmp_uint *spinner,
kmp_uint checker,
int final_spin,
int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj),
kmp_int32 is_constrained )
template <class C>
static inline int __kmp_execute_tasks_template(kmp_info_t *thread, kmp_int32 gtid, C *flag, int final_spin,
int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj), kmp_int32 is_constrained)
{
kmp_task_team_t * task_team;
kmp_team_t * team;
@ -1478,7 +1482,7 @@ __kmp_execute_tasks( kmp_info_t *thread,
task_team = thread -> th.th_task_team;
KMP_DEBUG_ASSERT( task_team != NULL );
KA_TRACE(15, ("__kmp_execute_tasks(enter): T#%d final_spin=%d *thread_finished=%d\n",
KA_TRACE(15, ("__kmp_execute_tasks_template(enter): T#%d final_spin=%d *thread_finished=%d\n",
gtid, final_spin, *thread_finished) );
threads_data = (kmp_thread_data_t *)TCR_PTR(task_team -> tt.tt_threads_data);
@ -1512,8 +1516,8 @@ __kmp_execute_tasks( kmp_info_t *thread,
// If this thread is in the last spin loop in the barrier, waiting to be
// released, we know that the termination condition will not be satisfied,
// so don't waste any cycles checking it.
if ((spinner == NULL) || ((!final_spin) && (TCR_4(*spinner) == checker))) {
KA_TRACE(15, ("__kmp_execute_tasks(exit #1): T#%d spin condition satisfied\n", gtid) );
if (flag == NULL || (!final_spin && flag->done_check())) {
KA_TRACE(15, ("__kmp_execute_tasks_template(exit #1): T#%d spin condition satisfied\n", gtid) );
return TRUE;
}
KMP_YIELD( __kmp_library == library_throughput ); // Yield before executing next task
@ -1527,7 +1531,7 @@ __kmp_execute_tasks( kmp_info_t *thread,
// result in the termination condition being satisfied.
if (! *thread_finished) {
kmp_uint32 count = KMP_TEST_THEN_DEC32( (kmp_int32 *)unfinished_threads ) - 1;
KA_TRACE(20, ("__kmp_execute_tasks(dec #1): T#%d dec unfinished_threads to %d task_team=%p\n",
KA_TRACE(20, ("__kmp_execute_tasks_template(dec #1): T#%d dec unfinished_threads to %d task_team=%p\n",
gtid, count, task_team) );
*thread_finished = TRUE;
}
@ -1537,8 +1541,8 @@ __kmp_execute_tasks( kmp_info_t *thread,
// thread to pass through the barrier, where it might reset each thread's
// th.th_team field for the next parallel region.
// If we can steal more work, we know that this has not happened yet.
if ((spinner != NULL) && (TCR_4(*spinner) == checker)) {
KA_TRACE(15, ("__kmp_execute_tasks(exit #2): T#%d spin condition satisfied\n", gtid) );
if (flag != NULL && flag->done_check()) {
KA_TRACE(15, ("__kmp_execute_tasks_template(exit #2): T#%d spin condition satisfied\n", gtid) );
return TRUE;
}
}
@ -1569,8 +1573,8 @@ __kmp_execute_tasks( kmp_info_t *thread,
#endif /* USE_ITT_BUILD */
// Check to see if this thread can proceed.
if ((spinner == NULL) || ((!final_spin) && (TCR_4(*spinner) == checker))) {
KA_TRACE(15, ("__kmp_execute_tasks(exit #3): T#%d spin condition satisfied\n",
if (flag == NULL || (!final_spin && flag->done_check())) {
KA_TRACE(15, ("__kmp_execute_tasks_template(exit #3): T#%d spin condition satisfied\n",
gtid) );
return TRUE;
}
@ -1579,7 +1583,7 @@ __kmp_execute_tasks( kmp_info_t *thread,
// If the execution of the stolen task resulted in more tasks being
// placed on our run queue, then restart the whole process.
if (TCR_4(threads_data[ tid ].td.td_deque_ntasks) != 0) {
KA_TRACE(20, ("__kmp_execute_tasks: T#%d stolen task spawned other tasks, restart\n",
KA_TRACE(20, ("__kmp_execute_tasks_template: T#%d stolen task spawned other tasks, restart\n",
gtid) );
goto start;
}
@ -1596,7 +1600,7 @@ __kmp_execute_tasks( kmp_info_t *thread,
// result in the termination condition being satisfied.
if (! *thread_finished) {
kmp_uint32 count = KMP_TEST_THEN_DEC32( (kmp_int32 *)unfinished_threads ) - 1;
KA_TRACE(20, ("__kmp_execute_tasks(dec #2): T#%d dec unfinished_threads to %d "
KA_TRACE(20, ("__kmp_execute_tasks_template(dec #2): T#%d dec unfinished_threads to %d "
"task_team=%p\n", gtid, count, task_team) );
*thread_finished = TRUE;
}
@ -1607,8 +1611,8 @@ __kmp_execute_tasks( kmp_info_t *thread,
// thread to pass through the barrier, where it might reset each thread's
// th.th_team field for the next parallel region.
// If we can steal more work, we know that this has not happened yet.
if ((spinner != NULL) && (TCR_4(*spinner) == checker)) {
KA_TRACE(15, ("__kmp_execute_tasks(exit #4): T#%d spin condition satisfied\n",
if (flag != NULL && flag->done_check()) {
KA_TRACE(15, ("__kmp_execute_tasks_template(exit #4): T#%d spin condition satisfied\n",
gtid) );
return TRUE;
}
@ -1640,8 +1644,7 @@ __kmp_execute_tasks( kmp_info_t *thread,
(__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) &&
(TCR_PTR(other_thread->th.th_sleep_loc) != NULL))
{
__kmp_resume( __kmp_gtid_from_thread( other_thread ), NULL );
__kmp_null_resume_wrapper(__kmp_gtid_from_thread(other_thread), other_thread->th.th_sleep_loc);
// A sleeping thread should not have any tasks on its queue.
// There is a slight possibility that it resumes, steals a task from
// another thread, which spawns more tasks, all in the time that it takes
@ -1677,8 +1680,8 @@ __kmp_execute_tasks( kmp_info_t *thread,
}
// Check to see if this thread can proceed.
if ((spinner == NULL) || ((!final_spin) && (TCR_4(*spinner) == checker))) {
KA_TRACE(15, ("__kmp_execute_tasks(exit #5): T#%d spin condition satisfied\n",
if (flag == NULL || (!final_spin && flag->done_check())) {
KA_TRACE(15, ("__kmp_execute_tasks_template(exit #5): T#%d spin condition satisfied\n",
gtid) );
return TRUE;
}
@ -1687,7 +1690,7 @@ __kmp_execute_tasks( kmp_info_t *thread,
// If the execution of the stolen task resulted in more tasks being
// placed on our run queue, then restart the whole process.
if (TCR_4(threads_data[ tid ].td.td_deque_ntasks) != 0) {
KA_TRACE(20, ("__kmp_execute_tasks: T#%d stolen task spawned other tasks, restart\n",
KA_TRACE(20, ("__kmp_execute_tasks_template: T#%d stolen task spawned other tasks, restart\n",
gtid) );
goto start;
}
@ -1704,7 +1707,7 @@ __kmp_execute_tasks( kmp_info_t *thread,
// result in the termination condition being satisfied.
if (! *thread_finished) {
kmp_uint32 count = KMP_TEST_THEN_DEC32( (kmp_int32 *)unfinished_threads ) - 1;
KA_TRACE(20, ("__kmp_execute_tasks(dec #3): T#%d dec unfinished_threads to %d; "
KA_TRACE(20, ("__kmp_execute_tasks_template(dec #3): T#%d dec unfinished_threads to %d; "
"task_team=%p\n",
gtid, count, task_team) );
*thread_finished = TRUE;
@ -1716,18 +1719,42 @@ __kmp_execute_tasks( kmp_info_t *thread,
// thread to pass through the barrier, where it might reset each thread's
// th.th_team field for the next parallel region.
// If we can steal more work, we know that this has not happened yet.
if ((spinner != NULL) && (TCR_4(*spinner) == checker)) {
KA_TRACE(15, ("__kmp_execute_tasks(exit #6): T#%d spin condition satisfied\n",
gtid) );
if (flag != NULL && flag->done_check()) {
KA_TRACE(15, ("__kmp_execute_tasks_template(exit #6): T#%d spin condition satisfied\n", gtid) );
return TRUE;
}
}
}
KA_TRACE(15, ("__kmp_execute_tasks(exit #7): T#%d can't find work\n", gtid) );
KA_TRACE(15, ("__kmp_execute_tasks_template(exit #7): T#%d can't find work\n", gtid) );
return FALSE;
}
int __kmp_execute_tasks_32(kmp_info_t *thread, kmp_int32 gtid, kmp_flag_32 *flag, int final_spin,
int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj), kmp_int32 is_constrained)
{
return __kmp_execute_tasks_template(thread, gtid, flag, final_spin, thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), is_constrained);
}
int __kmp_execute_tasks_64(kmp_info_t *thread, kmp_int32 gtid, kmp_flag_64 *flag, int final_spin,
int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj), kmp_int32 is_constrained)
{
return __kmp_execute_tasks_template(thread, gtid, flag, final_spin, thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), is_constrained);
}
int __kmp_execute_tasks_oncore(kmp_info_t *thread, kmp_int32 gtid, kmp_flag_oncore *flag, int final_spin,
int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj), kmp_int32 is_constrained)
{
return __kmp_execute_tasks_template(thread, gtid, flag, final_spin, thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), is_constrained);
}
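// The three wrappers above pin __kmp_execute_tasks_template to a concrete flag
// type; the flag classes in kmp_wait_release.h route their execute_tasks()
// methods back through them. Sketch of the equivalence (ITT disabled, count
// being a hypothetical kmp_uint32):
//     kmp_flag_32 flag( &count, 0U );
//     flag.execute_tasks( thread, gtid, FALSE, &thread_finished, is_constrained );
// behaves identically to
//     __kmp_execute_tasks_32( thread, gtid, &flag, FALSE, &thread_finished, is_constrained );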
//-----------------------------------------------------------------------------
// __kmp_enable_tasking: Allocate task team and resume threads sleeping at the
@ -1770,7 +1797,7 @@ __kmp_enable_tasking( kmp_task_team_t *task_team, kmp_info_t *this_thr )
// tasks and execute them. In extra barrier mode, tasks do not sleep
// at the separate tasking barrier, so this isn't a problem.
for (i = 0; i < nthreads; i++) {
volatile kmp_uint *sleep_loc;
volatile void *sleep_loc;
kmp_info_t *thread = threads_data[i].td.td_thr;
if (i == this_thr->th.th_info.ds.ds_tid) {
@ -1779,17 +1806,16 @@ __kmp_enable_tasking( kmp_task_team_t *task_team, kmp_info_t *this_thr )
// Since we haven't locked the thread's suspend mutex lock at this
// point, there is a small window where a thread might be putting
// itself to sleep, but hasn't set the th_sleep_loc field yet.
// To work around this, __kmp_execute_tasks() periodically checks
// To work around this, __kmp_execute_tasks_template() periodically checks
// to see if other threads are sleeping (using the same random
// mechanism that is used for task stealing) and awakens them if
// they are.
if ( ( sleep_loc = (volatile kmp_uint *)
TCR_PTR( thread -> th.th_sleep_loc) ) != NULL )
if ( ( sleep_loc = TCR_PTR( thread -> th.th_sleep_loc) ) != NULL )
{
KF_TRACE( 50, ( "__kmp_enable_tasking: T#%d waking up thread T#%d\n",
__kmp_gtid_from_thread( this_thr ),
__kmp_gtid_from_thread( thread ) ) );
__kmp_resume( __kmp_gtid_from_thread( thread ), sleep_loc );
__kmp_null_resume_wrapper(__kmp_gtid_from_thread(thread), sleep_loc);
}
else {
KF_TRACE( 50, ( "__kmp_enable_tasking: T#%d don't wake up thread T#%d\n",
@ -1805,7 +1831,7 @@ __kmp_enable_tasking( kmp_task_team_t *task_team, kmp_info_t *this_thr )
/* ------------------------------------------------------------------------ */
/*
/* // TODO: Check the comment consistency
* Utility routines for "task teams". A task team (kmp_task_t) is kind of
* like a shadow of the kmp_team_t data struct, with a different lifetime.
* After a child thread checks into a barrier and calls __kmp_release() from
@ -1839,6 +1865,7 @@ __kmp_enable_tasking( kmp_task_team_t *task_team, kmp_info_t *this_thr )
* barriers, when no explicit tasks were spawned (pushed, actually).
*/
static kmp_task_team_t *__kmp_free_task_teams = NULL; // Free list for task_team data structures
// Lock for task team data structures
static kmp_bootstrap_lock_t __kmp_task_team_lock = KMP_BOOTSTRAP_LOCK_INITIALIZER( __kmp_task_team_lock );
@ -2193,7 +2220,6 @@ __kmp_wait_to_unref_task_teams(void)
thread != NULL;
thread = thread->th.th_next_pool)
{
volatile kmp_uint *sleep_loc;
#if KMP_OS_WINDOWS
DWORD exit_val;
#endif
@ -2218,11 +2244,12 @@ __kmp_wait_to_unref_task_teams(void)
__kmp_gtid_from_thread( thread ) ) );
if ( __kmp_dflt_blocktime != KMP_MAX_BLOCKTIME ) {
volatile void *sleep_loc;
// If the thread is sleeping, awaken it.
if ( ( sleep_loc = (volatile kmp_uint *) TCR_PTR( thread->th.th_sleep_loc) ) != NULL ) {
if ( ( sleep_loc = TCR_PTR( thread->th.th_sleep_loc) ) != NULL ) {
KA_TRACE( 10, ( "__kmp_wait_to_unref_task_team: T#%d waking up thread T#%d\n",
__kmp_gtid_from_thread( thread ), __kmp_gtid_from_thread( thread ) ) );
__kmp_resume( __kmp_gtid_from_thread( thread ), sleep_loc );
__kmp_null_resume_wrapper(__kmp_gtid_from_thread(thread), sleep_loc);
}
}
}
@ -2350,9 +2377,9 @@ __kmp_task_team_wait( kmp_info_t *this_thr,
// contention, only the master thread checks for the
// termination condition.
//
__kmp_wait_sleep( this_thr, &task_team->tt.tt_unfinished_threads, 0, TRUE
USE_ITT_BUILD_ARG(itt_sync_obj)
);
kmp_flag_32 flag(&task_team->tt.tt_unfinished_threads, 0U);
flag.wait(this_thr, TRUE
USE_ITT_BUILD_ARG(itt_sync_obj));
//
// Kill the old task team, so that the worker threads will
@ -2390,8 +2417,9 @@ __kmp_tasking_barrier( kmp_team_t *team, kmp_info_t *thread, int gtid )
#if USE_ITT_BUILD
KMP_FSYNC_SPIN_INIT( spin, (kmp_uint32*) NULL );
#endif /* USE_ITT_BUILD */
while (! __kmp_execute_tasks( thread, gtid, spin, 0, TRUE, &flag
USE_ITT_BUILD_ARG(NULL), 0 ) ) {
kmp_flag_32 spin_flag(spin, 0U);
while (! spin_flag.execute_tasks(thread, gtid, TRUE, &flag
USE_ITT_BUILD_ARG(NULL), 0 ) ) {
#if USE_ITT_BUILD
// TODO: What about itt_sync_obj??
KMP_FSYNC_SPIN_PREPARE( spin );
@ -2409,5 +2437,3 @@ __kmp_tasking_barrier( kmp_team_t *team, kmp_info_t *thread, int gtid )
#endif /* USE_ITT_BUILD */
}
#endif // OMP_30_ENABLED

--- kmp_taskq.c ---

@ -1,7 +1,7 @@
/*
* kmp_taskq.c -- TASKQ support for OpenMP.
* $Revision: 42582 $
* $Date: 2013-08-09 06:30:22 -0500 (Fri, 09 Aug 2013) $
* $Revision: 43389 $
* $Date: 2014-08-11 10:54:01 -0500 (Mon, 11 Aug 2014) $
*/
@ -33,23 +33,6 @@
#define THREAD_ALLOC_FOR_TASKQ
static void
__kmp_static_delay( int arg )
{
/* Work around weird code-gen bug that causes assert to trip */
#if KMP_ARCH_X86_64 && KMP_OS_LINUX
KMP_ASSERT( arg != 0 );
#else
KMP_ASSERT( arg >= 0 );
#endif
}
static void
__kmp_static_yield( int arg )
{
__kmp_yield( arg );
}
static int
in_parallel_context( kmp_team_t *team )
{
@ -790,7 +773,7 @@ __kmp_dequeue_task (kmp_int32 global_tid, kmpc_task_queue_t *queue, int in_paral
* 1. Walk up the task queue tree from the current queue's parent and look
* on the way up (for loop, below).
* 2. Do a depth-first search back down the tree from the root and
* look (find_task_in_descandent_queue()).
* look (find_task_in_descendant_queue()).
*
* Here are the rules for deciding which task to take from a queue
* (__kmp_find_task_in_queue ()):
@ -1608,7 +1591,6 @@ __kmpc_end_taskq(ident_t *loc, kmp_int32 global_tid, kmpc_thunk_t *taskq_thunk)
&& (! __kmp_taskq_has_any_children(queue) )
&& (! (queue->tq_flags & TQF_ALL_TASKS_QUEUED) )
) {
__kmp_static_delay( 1 );
KMP_YIELD_WHEN( TRUE, spins );
}

--- kmp_threadprivate.c ---

@ -1,7 +1,7 @@
/*
* kmp_threadprivate.c -- OpenMP threadprivate support library
* $Revision: 42618 $
* $Date: 2013-08-27 09:15:45 -0500 (Tue, 27 Aug 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_utility.c ---

@ -1,7 +1,7 @@
/*
* kmp_utility.c -- Utility routines for the OpenMP support library.
* $Revision: 42588 $
* $Date: 2013-08-13 01:26:00 -0500 (Tue, 13 Aug 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_version.c ---

@ -1,7 +1,7 @@
/*
* kmp_version.c
* $Revision: 42806 $
* $Date: 2013-11-05 16:16:45 -0600 (Tue, 05 Nov 2013) $
* $Revision: 43435 $
* $Date: 2014-09-04 15:16:08 -0500 (Thu, 04 Sep 2014) $
*/
@ -20,7 +20,7 @@
#include "kmp_version.h"
// Replace with snapshot date YYYYMMDD for promotion build.
#define KMP_VERSION_BUILD 00000000
#define KMP_VERSION_BUILD 20140926
// Helper macros to convert value of macro to string literal.
#define _stringer( x ) #x
@ -46,6 +46,8 @@
#define KMP_COMPILER "Intel C++ Compiler 14.0"
#elif __INTEL_COMPILER == 1410
#define KMP_COMPILER "Intel C++ Compiler 14.1"
#elif __INTEL_COMPILER == 1500
#define KMP_COMPILER "Intel C++ Compiler 15.0"
#elif __INTEL_COMPILER == 9999
#define KMP_COMPILER "Intel C++ Compiler mainline"
#endif
@ -54,7 +56,7 @@
#elif KMP_COMPILER_GCC
#define KMP_COMPILER "GCC " stringer( __GNUC__ ) "." stringer( __GNUC_MINOR__ )
#elif KMP_COMPILER_MSVC
#define KMP_COMPILER "MSVC " stringer( __MSC_FULL_VER )
#define KMP_COMPILER "MSVC " stringer( _MSC_FULL_VER )
#endif
#ifndef KMP_COMPILER
#warning "Unknown compiler"
@ -77,7 +79,7 @@
// Finally, define strings.
#define KMP_LIBRARY KMP_LIB_TYPE " library (" KMP_LINK_TYPE ")"
#define KMP_COPYRIGHT "Copyright (C) 1997-2013, Intel Corporation. All Rights Reserved."
#define KMP_COPYRIGHT ""
int const __kmp_version_major = KMP_VERSION_MAJOR;
int const __kmp_version_minor = KMP_VERSION_MINOR;
@ -85,10 +87,8 @@ int const __kmp_version_build = KMP_VERSION_BUILD;
int const __kmp_openmp_version =
#if OMP_40_ENABLED
201307;
#elif OMP_30_ENABLED
201107;
#else
200505;
201107;
#endif
/* Do NOT change the format of this string! Intel(R) Thread Profiler checks for a
@ -128,7 +128,6 @@ __kmp_print_version_1( void )
kmp_str_buf_t buffer;
__kmp_str_buf_init( & buffer );
// Print version strings skipping initial magic.
__kmp_str_buf_print( & buffer, "%s\n", & __kmp_version_copyright[ KMP_VERSION_MAGIC_LEN ] );
__kmp_str_buf_print( & buffer, "%s\n", & __kmp_version_lib_ver[ KMP_VERSION_MAGIC_LEN ] );
__kmp_str_buf_print( & buffer, "%s\n", & __kmp_version_lib_type[ KMP_VERSION_MAGIC_LEN ] );
__kmp_str_buf_print( & buffer, "%s\n", & __kmp_version_link_type[ KMP_VERSION_MAGIC_LEN ] );
@ -164,8 +163,6 @@ __kmp_print_version_1( void )
); // __kmp_str_buf_print
}; // for i
__kmp_str_buf_print( & buffer, "%s\n", & __kmp_version_lock[ KMP_VERSION_MAGIC_LEN ] );
__kmp_str_buf_print( & buffer, "%s\n", & __kmp_version_perf_v19[ KMP_VERSION_MAGIC_LEN ] );
__kmp_str_buf_print( & buffer, "%s\n", & __kmp_version_perf_v106[ KMP_VERSION_MAGIC_LEN ] );
#endif
__kmp_str_buf_print(
& buffer,

--- kmp_version.h ---

@ -1,7 +1,7 @@
/*
* kmp_version.h -- version number for this release
* $Revision: 42181 $
* $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
* $Revision: 42982 $
* $Date: 2014-02-12 10:11:02 -0600 (Wed, 12 Feb 2014) $
*/
@ -55,8 +55,6 @@ extern char const __kmp_version_alt_comp[];
extern char const __kmp_version_omp_api[];
// ??? extern char const __kmp_version_debug[];
extern char const __kmp_version_lock[];
extern char const __kmp_version_perf_v19[];
extern char const __kmp_version_perf_v106[];
extern char const __kmp_version_nested_stats_reporting[];
extern char const __kmp_version_ftnstdcall[];
extern char const __kmp_version_ftncdecl[];

--- kmp_wait_release.cpp (new file) ---

@ -0,0 +1,52 @@
/*
* kmp_wait_release.cpp -- Wait/Release implementation
* $Revision: 43417 $
* $Date: 2014-08-26 14:06:38 -0500 (Tue, 26 Aug 2014) $
*/
//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//
#include "kmp_wait_release.h"
void __kmp_wait_32(kmp_info_t *this_thr, kmp_flag_32 *flag, int final_spin
USE_ITT_BUILD_ARG(void * itt_sync_obj) )
{
__kmp_wait_template(this_thr, flag, final_spin
USE_ITT_BUILD_ARG(itt_sync_obj) );
}
void __kmp_wait_64(kmp_info_t *this_thr, kmp_flag_64 *flag, int final_spin
USE_ITT_BUILD_ARG(void * itt_sync_obj) )
{
__kmp_wait_template(this_thr, flag, final_spin
USE_ITT_BUILD_ARG(itt_sync_obj) );
}
void __kmp_wait_oncore(kmp_info_t *this_thr, kmp_flag_oncore *flag, int final_spin
USE_ITT_BUILD_ARG(void * itt_sync_obj) )
{
__kmp_wait_template(this_thr, flag, final_spin
USE_ITT_BUILD_ARG(itt_sync_obj) );
}
void __kmp_release_32(kmp_flag_32 *flag) {
__kmp_release_template(flag);
}
void __kmp_release_64(kmp_flag_64 *flag) {
__kmp_release_template(flag);
}
void __kmp_release_oncore(kmp_flag_oncore *flag) {
__kmp_release_template(flag);
}
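These out-of-line instances give the rest of the runtime concrete, non-template entry points onto the templated wait/release paths; kmp_flag_oncore::notdone_check() in kmp_wait_release.h, for example, calls __kmp_wait_64() on a plain kmp_flag_64 when a thread has to switch back to waiting on its own flag. A hypothetical direct use (ITT disabled, illustrative names):

      kmp_flag_32 flag( &some_counter, waiting_thr );  // register the waiter
      __kmp_release_32( &flag );                       // same path as flag.release()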

--- kmp_wait_release.h (new file) ---

@ -0,0 +1,496 @@
/*
* kmp_wait_release.h -- Wait/Release implementation
* $Revision: 43417 $
* $Date: 2014-08-26 14:06:38 -0500 (Tue, 26 Aug 2014) $
*/
//===----------------------------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is dual licensed under the MIT and the University of Illinois Open
// Source Licenses. See LICENSE.txt for details.
//
//===----------------------------------------------------------------------===//
#ifndef KMP_WAIT_RELEASE_H
#define KMP_WAIT_RELEASE_H
#include "kmp.h"
#include "kmp_itt.h"
/*!
@defgroup WAIT_RELEASE Wait/Release operations
The definitions and functions here implement the lowest level thread
synchronizations of suspending a thread and awaking it. They are used
to build higher level operations such as barriers and fork/join.
*/
/*!
@ingroup WAIT_RELEASE
@{
*/
/*!
* The flag_type describes the storage used for the flag.
*/
enum flag_type {
flag32, /**< 32 bit flags */
flag64, /**< 64 bit flags */
flag_oncore /**< special 64-bit flag for on-core barrier (hierarchical) */
};
/*!
* Base class for wait/release volatile flag
*/
template <typename P>
class kmp_flag {
volatile P * loc; /**< Pointer to the flag storage that is modified by another thread */
flag_type t; /**< "Type" of the flag in loc */
public:
typedef P flag_t;
kmp_flag(volatile P *p, flag_type ft) : loc(p), t(ft) {}
/*!
* @result the pointer to the actual flag
*/
volatile P * get() { return loc; }
/*!
* @result the flag_type
*/
flag_type get_type() { return t; }
// Derived classes must provide the following:
/*
kmp_info_t * get_waiter(kmp_uint32 i);
kmp_uint32 get_num_waiters();
bool done_check();
bool done_check_val(P old_loc);
bool notdone_check();
P internal_release();
P set_sleeping();
P unset_sleeping();
bool is_sleeping();
bool is_sleeping_val(P old_loc);
*/
};
/* Spin wait loop that first does pause, then yield, then sleep. A thread that calls __kmp_wait_*
must make certain that another thread calls __kmp_release to wake it back up to prevent deadlocks! */
template <class C>
static inline void __kmp_wait_template(kmp_info_t *this_thr, C *flag, int final_spin
USE_ITT_BUILD_ARG(void * itt_sync_obj) )
{
// NOTE: We may not belong to a team at this point.
volatile typename C::flag_t *spin = flag->get();
kmp_uint32 spins;
kmp_uint32 hibernate;
int th_gtid;
int tasks_completed = FALSE;
KMP_FSYNC_SPIN_INIT(spin, NULL);
if (flag->done_check()) {
KMP_FSYNC_SPIN_ACQUIRED(spin);
return;
}
th_gtid = this_thr->th.th_info.ds.ds_gtid;
KA_TRACE(20, ("__kmp_wait_sleep: T#%d waiting for flag(%p)\n", th_gtid, flag));
// Setup for waiting
KMP_INIT_YIELD(spins);
if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
// The worker threads cannot rely on the team struct existing at this point.
// Use the bt values cached in the thread struct instead.
#ifdef KMP_ADJUST_BLOCKTIME
if (__kmp_zero_bt && !this_thr->th.th_team_bt_set)
// Force immediate suspend if not set by user and more threads than available procs
hibernate = 0;
else
hibernate = this_thr->th.th_team_bt_intervals;
#else
hibernate = this_thr->th.th_team_bt_intervals;
#endif /* KMP_ADJUST_BLOCKTIME */
/* If the blocktime is nonzero, we want to make sure that we spin wait for the entirety
of the specified #intervals, plus up to one interval more. This increment makes
certain that this thread doesn't go to sleep too soon. */
if (hibernate != 0)
hibernate++;
// Add in the current time value.
hibernate += TCR_4(__kmp_global.g.g_time.dt.t_value);
KF_TRACE(20, ("__kmp_wait_sleep: T#%d now=%d, hibernate=%d, intervals=%d\n",
th_gtid, __kmp_global.g.g_time.dt.t_value, hibernate,
hibernate - __kmp_global.g.g_time.dt.t_value));
}
KMP_MB();
// Main wait spin loop
while (flag->notdone_check()) {
int in_pool;
/* If the task team is NULL, it means one of these things:
1) A newly-created thread is first being released by __kmp_fork_barrier(), and
its task team has not been set up yet.
2) All tasks have been executed to completion, this thread has decremented the task
team's ref ct and possibly deallocated it, and should no longer reference it.
3) Tasking is off for this region. This could be because we are in a serialized region
(perhaps the outer one), or else tasking was manually disabled (KMP_TASKING=0). */
kmp_task_team_t * task_team = NULL;
if (__kmp_tasking_mode != tskm_immediate_exec) {
task_team = this_thr->th.th_task_team;
if (task_team != NULL) {
if (!TCR_SYNC_4(task_team->tt.tt_active)) {
KMP_DEBUG_ASSERT(!KMP_MASTER_TID(this_thr->th.th_info.ds.ds_tid));
__kmp_unref_task_team(task_team, this_thr);
} else if (KMP_TASKING_ENABLED(task_team, this_thr->th.th_task_state)) {
flag->execute_tasks(this_thr, th_gtid, final_spin, &tasks_completed
USE_ITT_BUILD_ARG(itt_sync_obj), 0);
}
} // if
} // if
KMP_FSYNC_SPIN_PREPARE(spin);
if (TCR_4(__kmp_global.g.g_done)) {
if (__kmp_global.g.g_abort)
__kmp_abort_thread();
break;
}
// If we are oversubscribed, or have waited a bit (and KMP_LIBRARY=throughput), then yield
KMP_YIELD(TCR_4(__kmp_nth) > __kmp_avail_proc);
// TODO: Should it be number of cores instead of thread contexts? Like:
// KMP_YIELD(TCR_4(__kmp_nth) > __kmp_ncores);
// Need performance improvement data to make the change...
KMP_YIELD_SPIN(spins);
// Check if this thread was transferred from a team
// to the thread pool (or vice-versa) while spinning.
in_pool = !!TCR_4(this_thr->th.th_in_pool);
if (in_pool != !!this_thr->th.th_active_in_pool) {
if (in_pool) { // Recently transferred from team to pool
KMP_TEST_THEN_INC32((kmp_int32 *)&__kmp_thread_pool_active_nth);
this_thr->th.th_active_in_pool = TRUE;
/* Here, we cannot assert that:
KMP_DEBUG_ASSERT(TCR_4(__kmp_thread_pool_active_nth) <= __kmp_thread_pool_nth);
__kmp_thread_pool_nth is inc/dec'd by the master thread while the fork/join
lock is held, whereas __kmp_thread_pool_active_nth is inc/dec'd asynchronously
by the workers. The two can get out of sync for brief periods of time. */
}
else { // Recently transferred from pool to team
KMP_TEST_THEN_DEC32((kmp_int32 *) &__kmp_thread_pool_active_nth);
KMP_DEBUG_ASSERT(TCR_4(__kmp_thread_pool_active_nth) >= 0);
this_thr->th.th_active_in_pool = FALSE;
}
}
// Don't suspend if KMP_BLOCKTIME is set to "infinite"
if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME)
continue;
// Don't suspend if there is a likelihood of new tasks being spawned.
if ((task_team != NULL) && TCR_4(task_team->tt.tt_found_tasks))
continue;
// If we have waited a bit more, fall asleep
if (TCR_4(__kmp_global.g.g_time.dt.t_value) < hibernate)
continue;
KF_TRACE(50, ("__kmp_wait_sleep: T#%d suspend time reached\n", th_gtid));
flag->suspend(th_gtid);
if (TCR_4(__kmp_global.g.g_done)) {
if (__kmp_global.g.g_abort)
__kmp_abort_thread();
break;
}
// TODO: If thread is done with work and times out, disband/free
}
KMP_FSYNC_SPIN_ACQUIRED(spin);
}
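// Worked example of the hibernate arithmetic above, with hypothetical numbers:
// if the cached blocktime th_team_bt_intervals is 5 monitor intervals and the
// monitor tick counter __kmp_global.g.g_time.dt.t_value currently reads 37, then
// hibernate = 5 + 1 + 37 = 43. The +1 guarantees the thread spins for at least
// the full blocktime, and adding t_value converts the count into an absolute
// deadline: once t_value reaches 43, the checks above stop short-circuiting and
// the loop calls flag->suspend(th_gtid).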
/* Release any threads specified as waiting on the flag by releasing the flag and resuming the waiting thread
if indicated by the sleep bit(s). A thread that calls __kmp_wait_template must call this function to wake
up the potentially sleeping thread and prevent deadlocks! */
template <class C>
static inline void __kmp_release_template(C *flag)
{
#ifdef KMP_DEBUG
// FIX ME
kmp_info_t * wait_thr = flag->get_waiter(0);
int target_gtid = wait_thr->th.th_info.ds.ds_gtid;
int gtid = TCR_4(__kmp_init_gtid) ? __kmp_get_gtid() : -1;
#endif
KF_TRACE(20, ("__kmp_release: T#%d releasing T#%d spin(%p)\n", gtid, target_gtid, flag->get()));
KMP_DEBUG_ASSERT(flag->get());
KMP_FSYNC_RELEASING(flag->get());
typename C::flag_t old_spin = flag->internal_release();
KF_TRACE(100, ("__kmp_release: T#%d old spin(%p)=%d, set new spin=%d\n",
gtid, flag->get(), old_spin, *(flag->get())));
if (__kmp_dflt_blocktime != KMP_MAX_BLOCKTIME) {
// Only need to check sleep stuff if infinite block time not set
if (flag->is_sleeping_val(old_spin)) {
for (unsigned int i=0; i<flag->get_num_waiters(); ++i) {
kmp_info_t * waiter = flag->get_waiter(i);
int wait_gtid = waiter->th.th_info.ds.ds_gtid;
// Wake up thread if needed
KF_TRACE(50, ("__kmp_release: T#%d waking up thread T#%d since sleep spin(%p) set\n",
gtid, wait_gtid, flag->get()));
flag->resume(wait_gtid);
}
} else {
KF_TRACE(50, ("__kmp_release: T#%d don't wake up thread T#%d since sleep spin(%p) not set\n",
gtid, target_gtid, flag->get()));
}
}
}
template <typename FlagType>
struct flag_traits {};
template <>
struct flag_traits<kmp_uint32> {
typedef kmp_uint32 flag_t;
static const flag_type t = flag32;
static inline flag_t tcr(flag_t f) { return TCR_4(f); }
static inline flag_t test_then_add4(volatile flag_t *f) { return KMP_TEST_THEN_ADD4_32((volatile kmp_int32 *)f); }
static inline flag_t test_then_or(volatile flag_t *f, flag_t v) { return KMP_TEST_THEN_OR32((volatile kmp_int32 *)f, v); }
static inline flag_t test_then_and(volatile flag_t *f, flag_t v) { return KMP_TEST_THEN_AND32((volatile kmp_int32 *)f, v); }
};
template <>
struct flag_traits<kmp_uint64> {
typedef kmp_uint64 flag_t;
static const flag_type t = flag64;
static inline flag_t tcr(flag_t f) { return TCR_8(f); }
static inline flag_t test_then_add4(volatile flag_t *f) { return KMP_TEST_THEN_ADD4_64((volatile kmp_int64 *)f); }
static inline flag_t test_then_or(volatile flag_t *f, flag_t v) { return KMP_TEST_THEN_OR64((volatile kmp_int64 *)f, v); }
static inline flag_t test_then_and(volatile flag_t *f, flag_t v) { return KMP_TEST_THEN_AND64((volatile kmp_int64 *)f, v); }
};
template <typename FlagType>
class kmp_basic_flag : public kmp_flag<FlagType> {
typedef flag_traits<FlagType> traits_type;
FlagType checker; /**< Value the flag is compared against to check whether it has been released. */
kmp_info_t * waiting_threads[1]; /**< Array of threads sleeping on this flag. */
kmp_uint32 num_waiting_threads; /**< Number of threads sleeping on this flag. */
public:
kmp_basic_flag(volatile FlagType *p) : kmp_flag<FlagType>(p, traits_type::t), num_waiting_threads(0) {}
kmp_basic_flag(volatile FlagType *p, kmp_info_t *thr) : kmp_flag<FlagType>(p, traits_type::t), num_waiting_threads(1) {
waiting_threads[0] = thr;
}
kmp_basic_flag(volatile FlagType *p, FlagType c) : kmp_flag<FlagType>(p, traits_type::t), checker(c), num_waiting_threads(0) {}
/*!
* @param i in index into waiting_threads
* @result the thread that is waiting at index i
*/
kmp_info_t * get_waiter(kmp_uint32 i) {
KMP_DEBUG_ASSERT(i<num_waiting_threads);
return waiting_threads[i];
}
/*!
* @result num_waiting_threads
*/
kmp_uint32 get_num_waiters() { return num_waiting_threads; }
/*!
* @param thr in the thread which is now waiting
*
* Insert a waiting thread at index 0.
*/
void set_waiter(kmp_info_t *thr) {
waiting_threads[0] = thr;
num_waiting_threads = 1;
}
/*!
* @result true if the flag object has been released.
*/
bool done_check() { return traits_type::tcr(*(this->get())) == checker; }
/*!
* @param old_loc in old value of flag
* @result true if the flag's old value indicates it was released.
*/
bool done_check_val(FlagType old_loc) { return old_loc == checker; }
/*!
* @result true if the flag object is not yet released.
* Used in __kmp_wait_template like:
* @code
* while (flag.notdone_check()) { pause(); }
* @endcode
*/
bool notdone_check() { return traits_type::tcr(*(this->get())) != checker; }
/*!
* @result Actual flag value before release was applied.
* Trigger all waiting threads to run by modifying flag to release state.
*/
FlagType internal_release() {
return traits_type::test_then_add4((volatile FlagType *)this->get());
}
/*!
* @result Actual flag value before sleep bit(s) set.
* Notes that there is at least one thread sleeping on the flag by setting sleep bit(s).
*/
FlagType set_sleeping() {
return traits_type::test_then_or((volatile FlagType *)this->get(), KMP_BARRIER_SLEEP_STATE);
}
/*!
* @result Actual flag value before sleep bit(s) cleared.
* Notes that there are no longer threads sleeping on the flag by clearing sleep bit(s).
*/
FlagType unset_sleeping() {
return traits_type::test_then_and((volatile FlagType *)this->get(), ~KMP_BARRIER_SLEEP_STATE);
}
/*!
* @param old_loc in old value of flag
* Test whether there are threads sleeping on the flag's old value in old_loc.
*/
bool is_sleeping_val(FlagType old_loc) { return old_loc & KMP_BARRIER_SLEEP_STATE; }
/*!
* Test whether there are threads sleeping on the flag.
*/
bool is_sleeping() { return is_sleeping_val(*(this->get())); }
};
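// Note on internal_release(): the flag advances in units of 4 (test_then_add4),
// which keeps the two low-order bits free for status, and the returned old value
// tells the releaser whether anyone was asleep. Assuming the sleep bit is bit 0
// (i.e. KMP_BARRIER_SLEEP_STATE == 1), a sketch:
//     old value 0x1 (a waiter set the sleep bit)  ->  new value 0x5
//     is_sleeping_val(0x1) == true, so __kmp_release_template wakes the waiter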
class kmp_flag_32 : public kmp_basic_flag<kmp_uint32> {
public:
kmp_flag_32(volatile kmp_uint32 *p) : kmp_basic_flag<kmp_uint32>(p) {}
kmp_flag_32(volatile kmp_uint32 *p, kmp_info_t *thr) : kmp_basic_flag<kmp_uint32>(p, thr) {}
kmp_flag_32(volatile kmp_uint32 *p, kmp_uint32 c) : kmp_basic_flag<kmp_uint32>(p, c) {}
void suspend(int th_gtid) { __kmp_suspend_32(th_gtid, this); }
void resume(int th_gtid) { __kmp_resume_32(th_gtid, this); }
int execute_tasks(kmp_info_t *this_thr, kmp_int32 gtid, int final_spin, int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj), kmp_int32 is_constrained) {
return __kmp_execute_tasks_32(this_thr, gtid, this, final_spin, thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), is_constrained);
}
void wait(kmp_info_t *this_thr, int final_spin
USE_ITT_BUILD_ARG(void * itt_sync_obj)) {
__kmp_wait_template(this_thr, this, final_spin
USE_ITT_BUILD_ARG(itt_sync_obj));
}
void release() { __kmp_release_template(this); }
};
class kmp_flag_64 : public kmp_basic_flag<kmp_uint64> {
public:
kmp_flag_64(volatile kmp_uint64 *p) : kmp_basic_flag<kmp_uint64>(p) {}
kmp_flag_64(volatile kmp_uint64 *p, kmp_info_t *thr) : kmp_basic_flag<kmp_uint64>(p, thr) {}
kmp_flag_64(volatile kmp_uint64 *p, kmp_uint64 c) : kmp_basic_flag<kmp_uint64>(p, c) {}
void suspend(int th_gtid) { __kmp_suspend_64(th_gtid, this); }
void resume(int th_gtid) { __kmp_resume_64(th_gtid, this); }
int execute_tasks(kmp_info_t *this_thr, kmp_int32 gtid, int final_spin, int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj), kmp_int32 is_constrained) {
return __kmp_execute_tasks_64(this_thr, gtid, this, final_spin, thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), is_constrained);
}
void wait(kmp_info_t *this_thr, int final_spin
USE_ITT_BUILD_ARG(void * itt_sync_obj)) {
__kmp_wait_template(this_thr, this, final_spin
USE_ITT_BUILD_ARG(itt_sync_obj));
}
void release() { __kmp_release_template(this); }
};
// Hierarchical 64-bit on-core barrier instantiation
class kmp_flag_oncore : public kmp_flag<kmp_uint64> {
kmp_uint64 checker;
kmp_info_t * waiting_threads[1];
kmp_uint32 num_waiting_threads;
kmp_uint32 offset; /**< Portion of flag that is of interest for an operation. */
bool flag_switch; /**< Indicates a switch in flag location. */
enum barrier_type bt; /**< Barrier type. */
kmp_info_t * this_thr; /**< Thread that may be redirected to different flag location. */
#if USE_ITT_BUILD
void *itt_sync_obj; /**< ITT object that must be passed to new flag location. */
#endif
char& byteref(volatile kmp_uint64* loc, size_t offset) { return ((char *)loc)[offset]; }
public:
kmp_flag_oncore(volatile kmp_uint64 *p)
: kmp_flag<kmp_uint64>(p, flag_oncore), num_waiting_threads(0), flag_switch(false) {}
kmp_flag_oncore(volatile kmp_uint64 *p, kmp_uint32 idx)
: kmp_flag<kmp_uint64>(p, flag_oncore), offset(idx), num_waiting_threads(0), flag_switch(false) {}
kmp_flag_oncore(volatile kmp_uint64 *p, kmp_uint64 c, kmp_uint32 idx, enum barrier_type bar_t,
kmp_info_t * thr
#if USE_ITT_BUILD
, void *itt
#endif
)
: kmp_flag<kmp_uint64>(p, flag_oncore), checker(c), offset(idx), bt(bar_t), this_thr(thr)
#if USE_ITT_BUILD
, itt_sync_obj(itt)
#endif
, num_waiting_threads(0), flag_switch(false) {}
kmp_info_t * get_waiter(kmp_uint32 i) {
KMP_DEBUG_ASSERT(i<num_waiting_threads);
return waiting_threads[i];
}
kmp_uint32 get_num_waiters() { return num_waiting_threads; }
void set_waiter(kmp_info_t *thr) {
waiting_threads[0] = thr;
num_waiting_threads = 1;
}
bool done_check_val(kmp_uint64 old_loc) { return byteref(&old_loc,offset) == checker; }
bool done_check() { return done_check_val(*get()); }
bool notdone_check() {
// Calculate flag_switch
if (this_thr->th.th_bar[bt].bb.wait_flag == KMP_BARRIER_SWITCH_TO_OWN_FLAG)
flag_switch = true;
if (byteref(get(),offset) != 1 && !flag_switch)
return true;
else if (flag_switch) {
this_thr->th.th_bar[bt].bb.wait_flag = KMP_BARRIER_SWITCHING;
kmp_flag_64 flag(&this_thr->th.th_bar[bt].bb.b_go, (kmp_uint64)KMP_BARRIER_STATE_BUMP);
__kmp_wait_64(this_thr, &flag, TRUE
#if USE_ITT_BUILD
, itt_sync_obj
#endif
);
}
return false;
}
kmp_uint64 internal_release() {
kmp_uint64 old_val;
if (__kmp_dflt_blocktime == KMP_MAX_BLOCKTIME) {
old_val = *get();
byteref(get(),offset) = 1;
}
else {
kmp_uint64 mask=0;
byteref(&mask,offset) = 1;
old_val = KMP_TEST_THEN_OR64((volatile kmp_int64 *)get(), mask);
}
return old_val;
}
kmp_uint64 set_sleeping() {
return KMP_TEST_THEN_OR64((kmp_int64 volatile *)get(), KMP_BARRIER_SLEEP_STATE);
}
kmp_uint64 unset_sleeping() {
return KMP_TEST_THEN_AND64((kmp_int64 volatile *)get(), ~KMP_BARRIER_SLEEP_STATE);
}
bool is_sleeping_val(kmp_uint64 old_loc) { return old_loc & KMP_BARRIER_SLEEP_STATE; }
bool is_sleeping() { return is_sleeping_val(*get()); }
void wait(kmp_info_t *this_thr, int final_spin
USE_ITT_BUILD_ARG(void * itt_sync_obj)) {
__kmp_wait_template(this_thr, this, final_spin
USE_ITT_BUILD_ARG(itt_sync_obj));
}
void release() { __kmp_release_template(this); }
void suspend(int th_gtid) { __kmp_suspend_oncore(th_gtid, this); }
void resume(int th_gtid) { __kmp_resume_oncore(th_gtid, this); }
int execute_tasks(kmp_info_t *this_thr, kmp_int32 gtid, int final_spin, int *thread_finished
USE_ITT_BUILD_ARG(void * itt_sync_obj), kmp_int32 is_constrained) {
return __kmp_execute_tasks_oncore(this_thr, gtid, this, final_spin, thread_finished
USE_ITT_BUILD_ARG(itt_sync_obj), is_constrained);
}
};
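// Sketch of the byteref/offset packing (assumption: the new hierarchical barrier
// gives each thread on a core one byte of a shared kmp_uint64, so up to eight
// sub-flags share one word). With illustrative names and ITT disabled:
//     volatile kmp_uint64 core_flag = 0;
//     kmp_flag_oncore f( &core_flag, 1ULL, 2, bs_plain_barrier, this_thr );
//     f.internal_release();  // with a finite blocktime, ORs in the mask whose
//                            // byte 2 is 1 (0x10000 on a little-endian target),
//                            // so only the waiter watching byte 2 sees done_check()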
/*!
@}
*/
#endif // KMP_WAIT_RELEASE_H
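Putting these classes together, a typical use pairs one waiting thread with one releasing thread on a shared 64-bit flag. A minimal sketch, assuming the flag starts at 0, an ITT-disabled build (so the USE_ITT_BUILD_ARG parameters drop out), and illustrative variable names; the checker value mirrors the b_go usage in kmp_flag_oncore::notdone_check() above:

      volatile kmp_uint64 go = 0;                          // shared release flag

      // waiting side: pause -> yield -> sleep until go == KMP_BARRIER_STATE_BUMP
      kmp_flag_64 wait_flag( &go, (kmp_uint64)KMP_BARRIER_STATE_BUMP );
      wait_flag.wait( this_thr, FALSE );

      // releasing side: register the waiter, bump the flag via internal_release(),
      // and wake the thread if its sleep bit was set while it was suspended
      kmp_flag_64 rel_flag( &go, waiting_thr );
      rel_flag.release();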

--- kmp_wrapper_getpid.h ---

@ -1,7 +1,7 @@
/*
* kmp_wrapper_getpid.h -- getpid() declaration.
* $Revision: 42181 $
* $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
* $Revision: 42951 $
* $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
*/

--- kmp_wrapper_malloc.h ---

@ -1,8 +1,8 @@
/*
* kmp_wrapper_malloc.h -- Wrappers for memory allocation routines
* (malloc(), free(), and others).
* $Revision: 42181 $
* $Date: 2013-03-26 15:04:45 -0500 (Tue, 26 Mar 2013) $
* $Revision: 43084 $
* $Date: 2014-04-15 09:15:14 -0500 (Tue, 15 Apr 2014) $
*/

--- libiomp.rc.var ---

@ -1,6 +1,6 @@
// libiomp.rc.var
// $Revision: 42219 $
// $Date: 2013-03-29 13:36:05 -0500 (Fri, 29 Mar 2013) $
// $Revision: 42994 $
// $Date: 2014-03-04 02:22:15 -0600 (Tue, 04 Mar 2014) $
//
////===----------------------------------------------------------------------===//
@ -41,8 +41,6 @@ VS_VERSION_INFO VERSIONINFO
// FileDescription and LegalCopyright should be short.
VALUE "FileDescription", "Intel(R) OpenMP* Runtime Library${{ our $MESSAGE_CATALOG; $MESSAGE_CATALOG ? " Message Catalog" : "" }}\0"
VALUE "LegalCopyright", "Copyright (C) 1997-2013, Intel Corporation. All rights reserved.\0"
// Following values may be relatively long.
VALUE "CompanyName", "Intel Corporation\0"
// VALUE "LegalTrademarks", "\0" // Not used for now.

--- makefile.mk ---

@ -1,6 +1,6 @@
# makefile.mk #
# $Revision: 42820 $
# $Date: 2013-11-13 16:53:44 -0600 (Wed, 13 Nov 2013) $
# $Revision: 43473 $
# $Date: 2014-09-26 15:02:57 -0500 (Fri, 26 Sep 2014) $
#
#//===----------------------------------------------------------------------===//
@ -221,6 +221,18 @@ ifeq "$(filter gcc clang,$(c))" ""
endif
endif
# On Linux and Windows Intel64 we need offload attribute for all Fortran entries
# in order to support OpenMP function calls inside Device constructs
ifeq "$(fort)" "ifort"
ifeq "$(os)_$(arch)" "lin_32e"
# TODO: change to -qoffload... when we stop supporting 14.0 compiler (-offload is deprecated)
fort-flags += -offload-attribute-target=mic
endif
ifeq "$(os)_$(arch)" "win_32e"
fort-flags += /Qoffload-attribute-target:mic
endif
endif
ifeq "$(os)" "lrb"
c-flags += -mmic
cxx-flags += -mmic
@ -361,6 +373,7 @@ ifeq "$(os)" "lin"
# to remove dependency on libgcc_s:
ifeq "$(c)" "gcc"
ld-flags-dll += -static-libgcc
# omp_os is non-empty only in the open-source code
ifneq "$(omp_os)" "freebsd"
ld-flags-extra += -Wl,-ldl
endif
@ -417,11 +430,15 @@ ifeq "$(os)" "lrb"
ld-flags += -ldl
endif
endif
# include the c++ library for stats-gathering code
ifeq "$(stats)" "on"
ld-flags-extra += -Wl,-lstdc++
endif
endif
endif
ifeq "$(os)" "mac"
ifeq "$(c)" "icc"
ifeq "$(ld)" "icc"
ld-flags += -no-intel-extensions
endif
ld-flags += -single_module
@ -483,6 +500,13 @@ endif
cpp-flags += -D KMP_ADJUST_BLOCKTIME=1
cpp-flags += -D BUILD_PARALLEL_ORDERED
cpp-flags += -D KMP_ASM_INTRINS
cpp-flags += -D KMP_USE_INTERNODE_ALIGNMENT=0
# Linux and MIC compile with version symbols
ifneq "$(filter lin lrb,$(os))" ""
ifeq "$(filter ppc64,$(arch))" ""
cpp-flags += -D KMP_USE_VERSION_SYMBOLS
endif
endif
ifneq "$(os)" "lrb"
cpp-flags += -D USE_LOAD_BALANCE
endif
@ -506,43 +530,52 @@ else # 5
cpp-flags += -D KMP_GOMP_COMPAT
endif
endif
cpp-flags += -D KMP_NESTED_HOT_TEAMS
ifneq "$(filter 32 32e,$(arch))" ""
cpp-flags += -D KMP_USE_ADAPTIVE_LOCKS=1 -D KMP_DEBUG_ADAPTIVE_LOCKS=0
endif
# is the std c++ library needed? (for stats-gathering, it is)
std_cpp_lib=0
ifneq "$(filter lin lrb,$(os))" ""
ifeq "$(stats)" "on"
cpp-flags += -D KMP_STATS_ENABLED=1
std_cpp_lib=1
else
cpp-flags += -D KMP_STATS_ENABLED=0
endif
else # no mac or windows support for stats-gathering
ifeq "$(stats)" "on"
$(error Statistics-gathering functionality not available on $(os) platform)
endif
cpp-flags += -D KMP_STATS_ENABLED=0
endif
# define compatibility with different OpenMP versions
have_omp_50=0
have_omp_41=0
have_omp_40=0
have_omp_30=0
ifeq "$(OMP_VERSION)" "50"
have_omp_50=1
have_omp_41=1
have_omp_40=1
have_omp_30=1
endif
ifeq "$(OMP_VERSION)" "41"
have_omp_50=0
have_omp_41=1
have_omp_40=1
have_omp_30=1
endif
ifeq "$(OMP_VERSION)" "40"
have_omp_50=0
have_omp_41=0
have_omp_40=1
have_omp_30=1
endif
ifeq "$(OMP_VERSION)" "30"
have_omp_50=0
have_omp_41=0
have_omp_40=0
have_omp_30=1
endif
cpp-flags += -D OMP_50_ENABLED=$(have_omp_50) -D OMP_41_ENABLED=$(have_omp_41)
cpp-flags += -D OMP_40_ENABLED=$(have_omp_40) -D OMP_30_ENABLED=$(have_omp_30)
cpp-flags += -D OMP_50_ENABLED=$(have_omp_50) -D OMP_41_ENABLED=$(have_omp_41) -D OMP_40_ENABLED=$(have_omp_40)
# Using ittnotify is enabled by default.
USE_ITT_NOTIFY = 1
@ -598,8 +631,8 @@ ifneq "$(os)" "win"
z_Linux_asm$(obj) : \
cpp-flags += -D KMP_ARCH_PPC64
else
z_Linux_asm$(obj) : \
cpp-flags += -D KMP_ARCH_X86$(if $(filter 32e,$(arch)),_64)
z_Linux_asm$(obj) : \
cpp-flags += -D KMP_ARCH_X86$(if $(filter 32e,$(arch)),_64)
endif
endif
@ -699,6 +732,8 @@ else # norm or prof
kmp_i18n \
kmp_io \
kmp_runtime \
kmp_wait_release \
kmp_barrier \
kmp_settings \
kmp_str \
kmp_tasking \
@ -715,6 +750,10 @@ ifeq "$(OMP_VERSION)" "40"
lib_cpp_items += kmp_taskdeps
lib_cpp_items += kmp_cancel
endif
ifeq "$(stats)" "on"
lib_cpp_items += kmp_stats
lib_cpp_items += kmp_stats_timing
endif
# OS-specific files.
ifeq "$(os)" "win"
@ -1272,8 +1311,20 @@ ifneq "$(os)" "lrb"
# On Linux* OS and OS X* the test is good enough because GNU compiler knows nothing
# about libirc and Intel compiler private lib directories, but we will grep verbose linker
# output just in case.
tt-c = cc
ifeq "$(os)" "lin" # GCC on OS X* does not recognize -pthread.
# Using clang on OS X* because of discontinued support of GNU compilers.
ifeq "$(os)" "mac"
ifeq "$(std_cpp_lib)" "1"
tt-c = clang++
else
tt-c = clang
endif
else # lin
ifeq "$(std_cpp_lib)" "1"
tt-c = g++
else
tt-c = gcc
endif
# GCC on OS X* does not recognize -pthread.
tt-c-flags += -pthread
endif
tt-c-flags += -o $(tt-exe-file)
@ -1416,6 +1467,10 @@ ifneq "$(filter %-dyna win-%,$(os)-$(LINK_TYPE))" ""
td_exp += libc.so.6
td_exp += ld64.so.1
endif
ifeq "$(std_cpp_lib)" "1"
td_exp += libstdc++.so.6
endif
td_exp += libdl.so.2
td_exp += libgcc_s.so.1
ifeq "$(filter 32 32e 64 ppc64,$(arch))" ""
@ -1428,6 +1483,9 @@ ifneq "$(filter %-dyna win-%,$(os)-$(LINK_TYPE))" ""
endif
ifeq "$(os)" "lrb"
ifeq "$(MIC_OS)" "lin"
ifeq "$(std_cpp_lib)" "1"
td_exp += libstdc++.so.6
endif
ifeq "$(MIC_ARCH)" "knf"
td_exp += "ld-linux-l1om.so.2"
td_exp += libc.so.6
@ -1459,8 +1517,9 @@ ifneq "$(filter %-dyna win-%,$(os)-$(LINK_TYPE))" ""
td_exp += uuid
endif
endif
ifeq "$(omp_os)" "freebsd"
td_exp =
td_exp =
td_exp += libc.so.7
td_exp += libthr.so.3
td_exp += libunwind.so.5

--- rules.mk ---

@ -1,6 +1,6 @@
# rules.mk #
# $Revision: 42423 $
# $Date: 2013-06-07 09:25:21 -0500 (Fri, 07 Jun 2013) $
# $Revision: 42951 $
# $Date: 2014-01-21 14:41:41 -0600 (Tue, 21 Jan 2014) $
#
#//===----------------------------------------------------------------------===//

Some files were not shown because too many files have changed in this diff.