archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Jon Chesterfield	47e63cb60e	[openmp] Apply code change from D109500 (cherry picked from commit 71052ea1e3c63b7209731fdc1726d10640d97480)	2021-09-13 20:57:17 -07:00
Joseph Huber	4f1fd1c209	[Attributor] Change function internalization to not replace uses in internalized callers The current implementation of function internalization creates a copy of each function and replaces every use. This has the downside that the external versions of the functions will call into the internalized versions of the functions. This prevents them from being fully independent of each other. This patch replaces the current internalization scheme with a method that creates all the copies of the functions intended to be internalized first and then replaces the uses as long as their caller is not already internalized. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106931 (cherry picked from commit adbaa39dfce7a8361d89b6a3b382fd8f50b94727)	2021-08-04 16:35:01 -07:00
Jose M Monsalve Diaz	5b7208da36	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
Johannes Doerfert	e9073c9b78	[OpenMP] Try to simplify all loads in device code Eliminating loads/stores in the device code is worth the extra effort, especially for the new device runtime. At the same time we do not compute AAExecutionDomain for non-device code anymore, there is no point. Differential Revision: https://reviews.llvm.org/D106845	2021-07-27 01:44:15 -05:00
Shilei Tian	14491b35e0	[AbstractAttributor] Fold __kmpc_parallel_level if possible Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration: based on the current `deviceRTLs`, it returns values such as 0, 1, 2 rather than 0, 129, 130, etc., which would also indicate activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154	2021-07-26 22:46:19 -04:00
Johannes Doerfert	6a849a320d	[OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass While rewriteDeviceCodeStateMachine should probably be folded into buildCustomStateMachine, we at least need the optimization to happen. This was not reliably the case in the CGSCC pass but in the Module pass it seems to work reliably. This also ports a test to the new kernel encoding (target_init/deinit), and makes sure we cannot run the kernel in SPMD mode. Differential Revision: https://reviews.llvm.org/D106345	2021-07-26 21:26:07 -05:00
Joseph Huber	71556de73f	[OpenMP][NFC] Remove unnecessary capture in RAII struct Summary: There was an unnecessary variable assigned to the information cache when we only need it in the constructor to extract the function declaration.	2021-07-26 15:05:55 -04:00
Joseph Huber	daea2dd14a	[OpenMP] Introduce RAII to protect certain RTL calls from DCE This patch introduces a new RAII struct that will temporarily make an OpenMP RTL function have external linkage. This is done before the attributor is invoked to prevent it from incorrectly removing some function definitions that we will use later. For example, if we determine all calls to one function are dead, because it has internal linkage it can safely be removed. Later when we try to get an instance to that function to modify the source using `getOrCreateRuntimeFunction` we will then get an empty declaration for that function that won't be defined anywhere. This patch prevents this from occurring. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106707	2021-07-25 14:15:47 -04:00
Shilei Tian	89d6157c2c	[AbstractAttributor] Refine logic to indicate pessimistic fixed point when folding `__kmpc_is_spmd_exec_mode` Since we are using assumed information now, the logic should be refined to avoid an unnecessary assertion. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106630	2021-07-23 13:36:47 -04:00
Giorgis Georgakoudis	85a5c6ecfe	[OpenMPOpt] Move dedup runtime calls after init for target regions Deduplication in OpenMPOpt finds redundant OpenMP runtime calls and replaces them with a single call placed in the earliest safe location in the IR. When deduplication happens in a target region this patch makes sure replacement calls are put after target_init. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106556	2021-07-23 05:54:01 -07:00
Giorgis Georgakoudis	6805993080	[Attributor][Fix] Add overrides for AA2HS analysis	2021-07-22 18:20:14 -07:00
Giorgis Georgakoudis	d1dd1d3743	[OpenMP] Use AAHeapToStack/AAHeapToShared analysis in SPMDization SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105634	2021-07-22 18:08:37 -07:00
Shilei Tian	04b998f247	[OpenMPOpt] Add support for BooleanStateWithSetVector D101977 added `BooleanStateWithPtrSetVector` to store pointers in a set while tracking a boolean state. One limitation is that it can only store pointers. We might want it to store other types of values, such as integers for the parallel level. This patch generalizes the idea and creates `BooleanStateWithSetVector`. `BooleanStateWithPtrSetVector` therefore becomes a type alias of `BooleanStateWithSetVector`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106149	2021-07-22 13:12:29 -04:00
Johannes Doerfert	76691e60d6	[OpenMP][FIX] Use name + type checks not only name checks for calls A call that is analyzed in an optimization needs to be verified against the name and type of the runtime function to avoid that we look at arguments that do not exist (anymore). This can happen if the signature was rewritten. Since we will not set RFI.Declaration if the type doesn't match we can use it (if it's not null) to determine if the signature is as expected. Differential Revision: https://reviews.llvm.org/D106341	2021-07-21 22:51:05 -05:00
Joseph Huber	191a71d3e8	[OpenMP] Strip NoInline from known OpenMP runtime functions This patch strips the NoInline attribute from known OpenMP runtime functions. This is done so that we can denote certain runtime functions as NoInline to ensure their call sites are intact so they can be checked by OpenMPOpt. However, we don't want this noinline attribute to remain on any function after OpenMPOpt has been run. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106482	2021-07-21 21:18:26 -04:00
Joseph Huber	4323f25227	[OpenMP] Fold `__kmpc_is_generic_main_thread_id` if possible This patch adds the ability to fold `__kmpc_is_generic_main_thread_id` if we know for a fact that it is executed by the initial thread using AAExecutionDomain. This combined with folding `__kmpc_is_spmd_exec_mode` will allow us to fully fold `__kmpc_is_generic_main_thread`. Depends on D106438 D106437 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106439	2021-07-21 21:18:22 -04:00
Joseph Huber	e6c2e59f71	[OpenMP] Add an option to disable function internalization Function internalization can sometimes occur in situations where we want to keep the call sites intact. This patch adds an option to disable function internalization and prevents the device runtime from being internalized while creating the bitcode library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106438	2021-07-21 21:18:18 -04:00
Joseph Huber	2c3ddf5d6f	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this far too few blocks will be scheduled for a generic region as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Giorgis Georgakoudis	7e612fb3a1	[Attributor] Preserve BBs and instructions added in AA manifests Manifesting AbstractAttributes may add new BBs in the IR. This patch provides an interface to register those BBs in the Attributor so that those BBs and containing instructions are not deleted as dead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106383	2021-07-21 11:27:00 -07:00
Giorgis Georgakoudis	43d4e670a4	[OpenMP] Set RequiresFullRuntime false in SPMDization SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105556	2021-07-20 09:54:51 -07:00
Johannes Doerfert	c445095c97	[OpenMP] Fix carefully track SPMDCompatibilityTracker We did not properly use SPMDCompatibilityTracker in various places. This patch makes sure we look at the validity properly and also fix the state if we can. Differential Revision: https://reviews.llvm.org/D106085	2021-07-19 22:47:03 -05:00
Shilei Tian	21ccc39fd4	[AbstractAttributor] Fix two issues in folding __kmpc_is_spmd_exec_mode This patch fixed two issues found when folding `__kmpc_is_spmd_exec_mode`: 1. When the reaching kernels are empty, it should not fold to generic mode. 2. When creating AA for the caller when updating information, the dependency should be required. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D106209	2021-07-17 13:13:44 -04:00
Joseph Huber	f405aaa753	[OpenMP][NFC] Update the comment header for optimizations.	2021-07-16 14:13:13 -04:00
Joseph Huber	c2bfd1f7ef	[OpenMP] Add IDs to OpenMP remarks This patch adds unique identifiers to the existing OpenMP remarks. This makes it easier to identify the corresponding documentation for each remark that will be hosted on the OpenMP webpage. Depends on D105898 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105939	2021-07-16 14:07:03 -04:00
Joseph Huber	c740546f66	[OpenMP] Rework OpenMP remarks This patch rewrites and reworks a few of the existing remarks to make them more concise and consistent prior to writing the documentation for them. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105898	2021-07-16 14:07:00 -04:00
Shilei Tian	08c004d674	[Attributor] Add support for compound assignment for ChangeStatus A common use of `ChangeStatus` is as follows: ``` ChangeStatus Changed = ChangeStatus::UNCHANGED; Changed |= foo(); ``` where `foo` returns `ChangeStatus` as well. Currently `ChangeStatus` doesn't support compound assignment, so we have to write ``` Changed = Changed | foo(); ``` which is not that convenient. This patch adds support for compound assignment for `ChangeStatus`. Compound assignment is usually implemented as a member function, and the binary arithmetic operator is then implemented using compound assignment. However, unlike a regular C++ class, an enum class doesn't support member functions. As a result, the operators can only be implemented in the way shown in the patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106109	2021-07-15 23:51:46 -04:00
Shilei Tian	4f5c97bb0f	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of current kernels. In many cases, user programs only contain target regions executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compilation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-15 18:23:23 -04:00
Shilei Tian	4ef4182afa	Revert "[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible" This reverts commit 1100e4aafea233bc8bbc307c5758a7d287ad3bae.	2021-07-15 11:19:28 -04:00
Shilei Tian	7cc27af3c6	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of current kernels. In many cases, user programs only contain target regions executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compilation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-13 22:28:35 -04:00
Johannes Doerfert	f4830fc58d	[Attributor][NFCI] Add UsedAssumedInformation to more interfaces As with other Attributor interfaces we often want to know if assumed information was used to answer a query. This is important if only known information is allowed or if known information can lead to an early fixpoint. The users have been adjusted but none of them utilizes the new information yet.	2021-07-11 19:18:03 -05:00
Johannes Doerfert	3839fcc5cf	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 18:44:25 -05:00
Johannes Doerfert	150c925a38	[OpenMP][FIX] Add missing `)` to remark	2021-07-10 18:40:32 -05:00
Johannes Doerfert	84bebfe406	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 17:57:08 -05:00
Johannes Doerfert	51153424db	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Johannes Doerfert	510ec2aa11	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helpers to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 16:32:24 -05:00
Nico Weber	b314064dc7	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	90355478cc	[Attributor][NFCI] Add UsedAssumedInformation to more interfaces As with other Attributor interfaces we often want to know if assumed information was used to answer a query. This is important if only known information is allowed or if known information can lead to an early fixpoint. The users have been adjusted but none of them utilizes the new information yet.	2021-07-10 12:32:51 -05:00
Johannes Doerfert	df82045809	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 12:32:51 -05:00
Johannes Doerfert	aca5749760	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 12:32:50 -05:00
Johannes Doerfert	63e4735bba	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Johannes Doerfert	1e74d94d7c	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helpers to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 12:32:50 -05:00
Joseph Huber	de68609f74	[OpenMP] Change analysis remarks to not emit on cold functions The remarks will trigger on some functions that are marked cold, such as the `__muldc3` intrinsic functions. Change the remarks to avoid these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105196	2021-06-30 11:54:24 -04:00
Joseph Huber	58b4d4e578	[OpenMP] Add additional remarks for OpenMPOpt This patch adds additional remarks, suggesting the use of `noescape` for failed globalization and indicating when internalization failed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105150	2021-06-30 09:49:25 -04:00
Joseph Huber	2d3d957ab1	[OpenMP] Prevent OpenMPOpt from internalizing uncalled functions Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnecessary. Depends on D102423 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104890	2021-06-28 16:47:53 -04:00
Joseph Huber	a3e4541847	[OpenMP][NFC] Fix typo in OpenMPOpt	2021-06-28 09:49:14 -04:00
Joseph Huber	ab7e4dfd0f	[OpenMP][NFC] Fix missing argument	2021-06-28 09:15:01 -04:00
Joseph Huber	877292e5c5	[OpenMP] Increase attributor iterations on the GPU Increase the number of attributor iterations on a GPU target. I forgot to change this in D104416. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104920	2021-06-28 08:50:49 -04:00
Joseph Huber	b8d800fd9c	[OpenMP] Change OpenMPOpt to check openmp metadata The metadata added in D102361 introduces a module flag that we can check to determine if the module was compiled with `-fopenmp` enabled. We can now check for the presence of this flag instead of scanning the call graph for OpenMP runtime functions. Depends on D102361 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102423	2021-06-25 16:34:22 -04:00
Joseph Huber	1aa9483992	[Attributor] Fix AAExecutionDomain returning true on invalid states This patch fixes a problem with the AAExecutionDomain attributor not checking if it is in a valid state. This can cause it to incorrectly return that a block is executed in a single threaded context after the attributor failed for any reason. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103186	2021-06-22 18:12:43 -04:00
Joseph Huber	9e84441896	[OpenMP] Change remaining globalization from an analysis remark to missed After landing the globalization optimizations, the presence of globalization on the device that was not put in shared or stack memory is a failed optimization with performance consequences, so it should be reported as a missed remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104735	2021-06-22 16:52:06 -04:00
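
Commit 08c004d674 above describes adding compound assignment to `ChangeStatus`, an enum class that cannot have member functions. A minimal sketch of that idea follows, assuming a stand-in `ChangeStatus` enum defined locally rather than the Attributor's own definition; the helper names `stepThatChanges`, `stepThatDoesNothing`, and `runAllSteps` are hypothetical and exist only for illustration. Both operators are written as free functions, which is the usual way to overload operators for an enum class.

```cpp
// Stand-in for the Attributor's ChangeStatus (assumption: the real
// definition lives in LLVM and may differ in detail).
enum class ChangeStatus { UNCHANGED, CHANGED };

// Binary "or": the result is CHANGED if either operand is CHANGED.
inline ChangeStatus operator|(ChangeStatus L, ChangeStatus R) {
  return L == ChangeStatus::CHANGED ? L : R;
}

// Compound assignment as a free function. An enum class cannot declare
// member functions, so the left operand is taken by reference instead.
inline ChangeStatus &operator|=(ChangeStatus &L, ChangeStatus R) {
  L = L | R;
  return L;
}

// Hypothetical update steps that each report whether they changed anything.
inline ChangeStatus stepThatChanges() { return ChangeStatus::CHANGED; }
inline ChangeStatus stepThatDoesNothing() { return ChangeStatus::UNCHANGED; }

// Accumulate the status of several steps with the compound operator.
inline ChangeStatus runAllSteps() {
  ChangeStatus Changed = ChangeStatus::UNCHANGED;
  Changed |= stepThatDoesNothing();
  Changed |= stepThatChanges();
  Changed |= stepThatDoesNothing();
  return Changed;
}
```

Here `runAllSteps()` stays `CHANGED` once any step reports a change, which is the accumulation pattern the commit message quotes as `Changed |= foo();`.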


138 Commits