60 Commits

Author SHA1 Message Date
Jason Henline
492c5a1674 [Acxxel] Remove -Wno-missing-braces in build
Summary:
I originally added the -Wno-missing-braces flag because I thought it was
erroneously flagging std::array initializations. Now I realize the extra
braces really are desired for these initializations, so I'm turning the
warning flag back on.
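
A minimal sketch of the initialization style in question, for illustration only. It is not taken from the Acxxel sources, and `makeDims` and `checkDims` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// std::array<T, N> is an aggregate wrapping a built-in array, so the fully
// braced initializer has an outer brace for the struct and an inner brace for
// the wrapped array. The single-brace form also compiles through brace
// elision, but some compilers flag it under -Wmissing-braces.
inline std::array<std::size_t, 3> makeDims() {
  std::array<std::size_t, 3> Dims = {{8, 4, 2}};
  return Dims;
}

// Small self-check showing the values land where expected.
inline void checkDims() {
  const std::array<std::size_t, 3> Dims = makeDims();
  assert(Dims[0] == 8 && Dims[1] == 4 && Dims[2] == 2);
}
```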

Reviewers: jlebar

Subscribers: mgorny, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D27941

llvm-svn: 290137
2016-12-19 21:34:07 +00:00
Jason Henline
bdc410baba [Acxxel] Remove setActiveDeviceForThread
Summary:
After experimenting with CUDA, I realized that we really only need to
set the active context right before creating an object such as a stream
or a device memory allocation. When we go on to use these objects later,
it is fine if the context that created them is no longer active;
operations with those objects will succeed anyway.

Since it turns out that we don't have to check the active context for
every operation, it makes sense to hide this active context from users
(by removing the "ActiveDeviceForThread" setter and getter) and to
change the Acxxel API to explicitly pass in the device ID to create
objects.

This change improves the Acxxel API and greatly simplifies the CUDA and
OpenCL implementations because they no longer require thread_local data.
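
The API shape described above can be sketched roughly as follows. Every name here (`MockPlatform`, `MockStream`, `createStream`, `streamCount`) is a hypothetical stand-in rather than the actual Acxxel interface, and no real CUDA or OpenCL calls are made.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a stream handle. It remembers which device
// created it, so later operations need no "active device" lookup.
struct MockStream {
  int DeviceIndex;
};

// Hypothetical platform: the device is an explicit argument at creation time
// instead of hidden thread_local state set through a separate setter.
class MockPlatform {
public:
  explicit MockPlatform(int DeviceCount) : StreamsPerDevice(DeviceCount, 0) {}

  MockStream createStream(int DeviceIndex) {
    assert(DeviceIndex >= 0 &&
           DeviceIndex < static_cast<int>(StreamsPerDevice.size()));
    // A real backend would make this device's context current here, and only
    // here, before creating the underlying object.
    ++StreamsPerDevice[DeviceIndex];
    return MockStream{DeviceIndex};
  }

  int streamCount(int DeviceIndex) const {
    return StreamsPerDevice.at(DeviceIndex);
  }

private:
  std::vector<int> StreamsPerDevice;
};
```

Because the device ID travels with the call, two threads can create objects on different devices without coordinating through shared per-thread state.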

Reviewers: jlebar, jprice

Subscribers: mgorny, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D26050

llvm-svn: 285372
2016-10-28 00:54:02 +00:00
Jason Henline
b3f709e10f [SE] Remove StreamExecutor
Summary:
The project has been renamed to Acxxel, so this old directory needs to
be deleted.

Reviewers: jlebar, jprice

Subscribers: beanz, mgorny, parallel_libs-commits, modocache

Differential Revision: https://reviews.llvm.org/D25964

llvm-svn: 285115
2016-10-25 20:38:08 +00:00
Jason Henline
ac232ddc23 Initial check-in of Acxxel (StreamExecutor renamed)
Summary:
Acxxel is basically a simplified redesign of StreamExecutor.

Here are the major points where Acxxel differs from the current
StreamExecutor design:

* Acxxel doesn't support the kernel and kernel loader types designed for
  emission by the compiler to support type-safe kernel launches. For
  CUDA, kernels in Acxxel can be seamlessly launched using the standard
  CUDA triple-chevron kernel launch syntax that is available with clang
  and nvcc. For CUDA and OpenCL, kernel arguments can be passed in the
  old-fashioned way, as one array of pointers to arguments and another
  array of argument sizes. Although OpenCL doesn't get a type-safe
  kernel launch method, it does still get the benefit of all the memory
  management wrappers. In the future, clang may add support for
  triple-chevron OpenCL kernel launches, or some other type-safe OpenCL
  kernel launch method.
* Acxxel does not depend on any other code in LLVM, so it builds
  completely independently from LLVM.
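
The "old-fashioned" argument convention mentioned in the first bullet, one array of pointers to arguments plus a parallel array of sizes, could look roughly like this. `mockLaunch` and `packAndLaunch` are hypothetical names, and nothing is launched on a device; the mock only marshals bytes the way a backend might.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical launcher: copies each argument's bytes into one contiguous
// buffer. Nothing checks that the sizes match the pointed-to types, which is
// why this style is not type-safe.
inline std::vector<unsigned char> mockLaunch(void **Arguments,
                                             const std::size_t *ArgumentSizes,
                                             std::size_t ArgumentCount) {
  std::vector<unsigned char> Marshalled;
  for (std::size_t I = 0; I < ArgumentCount; ++I) {
    const unsigned char *Bytes =
        static_cast<const unsigned char *>(Arguments[I]);
    Marshalled.insert(Marshalled.end(), Bytes, Bytes + ArgumentSizes[I]);
  }
  return Marshalled;
}

// Packs a float and an int, "launches", and returns the marshalled size.
inline std::size_t packAndLaunch() {
  float Alpha = 2.0f;
  int Count = 16;
  void *Arguments[] = {&Alpha, &Count};
  std::size_t ArgumentSizes[] = {sizeof(Alpha), sizeof(Count)};
  std::vector<unsigned char> Marshalled =
      mockLaunch(Arguments, ArgumentSizes, 2);

  // The first argument's bytes come back unchanged.
  float AlphaCopy;
  std::memcpy(&AlphaCopy, Marshalled.data(), sizeof(AlphaCopy));
  assert(AlphaCopy == Alpha);
  return Marshalled.size();
}
```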

The goal will be to check in Acxxel and remove StreamExecutor, or
perhaps to remove the old StreamExecutor and rename Acxxel to
StreamExecutor, so I think Acxxel should be thought of as a new version
of StreamExecutor, not as a separate project.

Reviewers: jlebar, jprice

Subscribers: beanz, mgorny, modocache, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D25701

llvm-svn: 285111
2016-10-25 20:18:56 +00:00
Jason Henline
7bb01a2dc4 [SE] Change CoreTests target name
Summary:
Call it StreamExecutorCoreTests in order to prevent collision with
targets from other modules.

Reviewers: jlebar, jprice

Subscribers: beanz, mgorny, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24949

llvm-svn: 282491
2016-09-27 15:32:52 +00:00
Jason Henline
9fc16d4e11 [SE] Fix config bug with CUDA tests
Summary:
It turns out CMake errors out if a processed directory contains source
files that are not used. This was causing an error with the CUDATest.cpp
file when configuring StreamExecutor with the CUDA platform disabled.

Moving CUDATest.cpp to its own directory fixes this problem.

Reviewers: jlebar, jprice

Subscribers: beanz, mgorny, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24618

llvm-svn: 281654
2016-09-15 20:26:28 +00:00
Jason Henline
70720a7e1b [SE] Support CUDA dynamic shared memory
Summary:
Add proper handling for shared memory arguments in the CUDA platform. Also add
in unit tests for CUDA.

Reviewers: jlebar

Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24596

llvm-svn: 281635
2016-09-15 18:11:04 +00:00
Jason Henline
b2d62bd071 [SE] Let users specify CUDA path
Summary: Add logic to allow users to specify the CUDA path at configuration time.

Reviewers: jlebar

Subscribers: beanz, mgorny, jlebar, jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24580

llvm-svn: 281626
2016-09-15 16:48:55 +00:00
Jason Henline
6bfc863d74 [SE] Add CUDA platform
Summary:
Basic CUDA platform implementation and cmake infrastructure to control
whether it's used. A few important TODOs will be handled in later
patches:

* Log some error messages that can't easily be returned as Errors.
* Cache modules and kernels to prevent reloading them if someone tries to
  reload a kernel that's already loaded.
* Tolerate shared memory arguments for kernel launches.

Reviewers: jlebar

Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24538

llvm-svn: 281524
2016-09-14 19:58:34 +00:00
Jason Henline
b38d8a3a3b [SE] Pack global dev handle addresses
Summary:
We were packing global device memory handles in
`PackedKernelArgumentArray`, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24528

llvm-svn: 281424
2016-09-13 23:59:10 +00:00
Jason Henline
3a90112591 Device doc says device is small
llvm-svn: 281423
2016-09-13 23:56:47 +00:00
Jason Henline
16a5352121 [SE] Platforms return Device values
Summary:
Platforms were returning Device pointers, but a Device is now basically
just a pointer to an underlying PlatformDevice, so we will now just pass
it around as a value.

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24537

llvm-svn: 281422
2016-09-13 23:56:46 +00:00
Jason Henline
b459eb3529 [SE] KernelSpec return best PTX
Summary:
Before, the kernel spec would only return PTX for exactly the requested
compute capability. With this patch it will now return the PTX with the
largest compute capability that does not exceed the requested compute
capability.
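
The selection rule described above can be sketched with an ordered map. This is an illustration under assumptions, not the KernelSpec implementation: `ComputeCapability`, `PTXTable`, `findBestPTX`, and `checkBestPTX` are hypothetical names.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// (major, minor) pairs compare lexicographically, which matches the ordering
// of compute capabilities.
using ComputeCapability = std::pair<int, int>;
using PTXTable = std::map<ComputeCapability, std::string>;

// Returns the PTX with the largest compute capability that does not exceed
// Requested, or nullptr if every entry is too new.
inline const std::string *findBestPTX(const PTXTable &Table,
                                      ComputeCapability Requested) {
  // upper_bound yields the first entry strictly greater than Requested, so
  // the entry just before it (if any) is the best match.
  PTXTable::const_iterator It = Table.upper_bound(Requested);
  if (It == Table.begin())
    return nullptr;
  return &std::prev(It)->second;
}

inline void checkBestPTX() {
  PTXTable Table;
  Table[{3, 0}] = "ptx for 3.0";
  Table[{3, 5}] = "ptx for 3.5";
  Table[{6, 0}] = "ptx for 6.0";
  const std::string *Best = findBestPTX(Table, {5, 2});
  assert(Best != nullptr && *Best == "ptx for 3.5");
  assert(findBestPTX(Table, {2, 0}) == nullptr);
}
```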

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24531

llvm-svn: 281417
2016-09-13 23:29:25 +00:00
Jason Henline
46b5e48fde [SE] Use real HostPlatformDevice for testing
Summary:
Replace uses of SimpleHostPlatformDevice in tests with
HostPlatformDevice.

Reviewers: jlebar

Subscribers: jlebar, jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24519

llvm-svn: 281384
2016-09-13 20:14:44 +00:00
Jason Henline
3088696499 [SE] Host platform implementation
Summary:
This implementation does not currently support multiple concurrent streams, and
it won't allow kernels to be launched with grids larger than one block or
blocks larger than one thread. These limitations could be removed in the future
by launching new threads on the host, but that is not done in this
implementation.

Reviewers: jlebar

Subscribers: beanz, mgorny, jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24473

llvm-svn: 281377
2016-09-13 19:28:02 +00:00
Jason Henline
fb62147949 [SE] Add .clang-tidy
Summary:
The .clang-tidy file is copied from the top-level LLVM source directory.

Also fix warnings generated by clang-tidy:

* Moved SimpleHostPlatformDevice.h so its header include guard could
  have the right format.
* Changed signatures of methods taking llvm::Twine by value to take it
  by const ref instead.
* Add "noexcept" to some move constructors and assignment operators.
* Removed a bunch of places where single-statement loops and
  conditionals were surrounded with braces. (This was not found by the
  current clang-tidy, but with a local patch that I hope to upstream
  soon.)

Reviewers: jlebar, jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24468

llvm-svn: 281374
2016-09-13 19:25:43 +00:00
Jason Henline
45b467523b [SE] Stop using llvm-config --cxxflags
Summary:
Build configuration was adding $(llvm-config --cxxflags) to the
StreamExecutor CXXFLAGS, but this was causing "-O3" to be passed even
for debug builds, and was making debugging difficult.

The llvm-config call was originally introduced to handle the -fno-rtti
flag because an RTTI StreamExecutor could not link with a no-RTTI LLVM.
This patch converts to using LLVM_ENABLE_RTTI and only adding the
`-fno-rtti` flag if needed, not all the rest of the LLVM CXXFLAGS.

I have tested this with clang-4.0 and gcc-4.8 on Ubuntu. Some work will
probably have to be done to support MSVC.

Reviewers: jlebar

Subscribers: beanz, jprice, parallel_libs-commits, mgorny

Differential Revision: https://reviews.llvm.org/D24474

llvm-svn: 281347
2016-09-13 15:44:18 +00:00
Jason Henline
c16fb8748d [SE] Clean up device and host memory slices
Summary:
* Add LLVM_ATTRIBUTE_UNUSED_RESULT to slicing methods in order to
  emphasize that the slicing is not done in place.
* Change device memory slice function name from `drop_front` to `slice`
  in order to match the naming convention of `llvm::ArrayRef` and host
  memory slice.
* Change the parameter names of host memory slice functions to
  `DropCount` and `TakeCount` to match device memory slice declarations.
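
A rough sketch of the slicing convention described in the list above. `MemorySlice` is a hypothetical class, and the standard `[[nodiscard]]` attribute (C++17) stands in for LLVM_ATTRIBUTE_UNUSED_RESULT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over a run of elements.
template <typename T> class MemorySlice {
public:
  MemorySlice(T *Data, std::size_t ElementCount)
      : Data(Data), ElementCount(ElementCount) {}

  // Drops DropCount leading elements and keeps the next TakeCount. The
  // original slice is untouched, so ignoring the result is almost certainly
  // a bug; the attribute asks the compiler to warn about that.
  [[nodiscard]] MemorySlice slice(std::size_t DropCount,
                                  std::size_t TakeCount) const {
    assert(DropCount <= ElementCount && TakeCount <= ElementCount - DropCount);
    return MemorySlice(Data + DropCount, TakeCount);
  }

  T *data() const { return Data; }
  std::size_t size() const { return ElementCount; }

private:
  T *Data;
  std::size_t ElementCount;
};
```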

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24464

llvm-svn: 281239
2016-09-12 17:20:43 +00:00
Jason Henline
57ea481945 [SE] RegisteredHostMemory for async device copies
Summary:
Improve the error-prone interface that allows users to pass host
pointers that haven't been registered to asynchronous copy methods. In
CUDA, this is an extremely easy error to make, and instead of failing at
runtime, it succeeds and gives the right answers by turning the async
copy into a sync copy. So, you silently get a huge performance
degradation if you misuse the old interface. This new interface should
prevent that.
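
One way to read the idea above is that registration becomes part of the type, so an unregistered pointer cannot reach an asynchronous copy at all. The sketch below is an assumption-laden illustration, not the StreamExecutor interface: `RegisteredHostSpan`, `registerHostMemory`, and `mockAsyncCopy` are hypothetical, and `std::memcpy` stands in for a real asynchronous transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: the only way to obtain one is registerHostMemory(),
// so holding an instance shows the memory went through registration.
template <typename T> class RegisteredHostSpan {
public:
  static RegisteredHostSpan registerHostMemory(T *Data, std::size_t Count) {
    // A real backend would register (pin) the host pages here.
    return RegisteredHostSpan(Data, Count);
  }

  T *data() const { return Data; }
  std::size_t size() const { return Count; }

private:
  RegisteredHostSpan(T *Data, std::size_t Count) : Data(Data), Count(Count) {}
  T *Data;
  std::size_t Count;
};

// Hypothetical async copy: it accepts only the registered wrapper.
template <typename T>
void mockAsyncCopy(const RegisteredHostSpan<T> &Source, T *Destination) {
  assert(Destination != nullptr);
  std::memcpy(Destination, Source.data(), Source.size() * sizeof(T));
}

// A raw pointer does not convert to the wrapper, so passing unregistered
// memory to mockAsyncCopy fails to compile instead of silently going slow.
static_assert(!std::is_convertible<int *, RegisteredHostSpan<int>>::value,
              "raw pointers must be registered first");
```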

Reviewers: jlebar

Subscribers: jprice, beanz, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24353

llvm-svn: 281225
2016-09-12 16:09:41 +00:00
Jason Henline
a3ad6dcfaf [SE] Remove Utils directory
Summary:
There is no purpose in splitting out the Error class from the rest of
the StreamExecutor code. This organization was just a vestige of an old
failed design.

Plus, this change fixes a bug in the build where the utilities library
was not being statically linked in with libstreamexecutor.

Reviewers: jlebar, jprice

Subscribers: beanz, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24434

llvm-svn: 281118
2016-09-09 23:33:58 +00:00
Justin Lebar
b9e51397bf [StreamExecutor] Make SE work with an in-tree LLVM build.
Summary:
With these changes, we can put parallel-libs within llvm/projects and
build as normal.

This is kind of the minimal change I could figure out how to make while
still making us compatible with llvm's build system.  Some things I'm
not thrilled about include:

 * The creation of a CoreTests directory (the macros really seemed to
   want this)

 * Pulling SimpleHostPlatformDevice.h into CoreTests.  It seems to me
   this should live inside unittests/include, or maybe tests/include,
   but I didn't want to make that change in this patch.

One important piece of work that remains to be done is to make

  $ ninja check-streamexecutor

run all the tests.  Right now the only way I've figured out to run the
tests is

  $ ninja projects/parallel-libs/streamexecutor/unittests/StreamExecutorUnitTests
  $ projects/parallel-libs/streamexecutor/unittests/CoreTests/CoreTests

Reviewers: jhen

Subscribers: beanz, parallel_libs-commits, jprice

Differential Revision: https://reviews.llvm.org/D24368

llvm-svn: 281091
2016-09-09 21:01:02 +00:00
Jason Henline
5755bb42ff Add streamexecutor-config
Summary:
Similar to llvm-config, gets command-line flags that are needed to build
applications linking against StreamExecutor.

Reviewers: jprice, jlebar

Subscribers: beanz, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24302

llvm-svn: 280955
2016-09-08 16:12:33 +00:00
Jason Henline
fe51c2f7b4 [SE] Add getName method to Device class
Reviewers: jhen

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24240

llvm-svn: 280872
2016-09-07 22:26:20 +00:00
Jason Henline
19eeb37b8c [SE] Rename PlatformInterfaces to PlatformDevice
Summary:
The only interface that we ever plan to have in this file is
PlatformDevice, so it makes sense to rename the file to reflect that.

Reviewers: jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24269

llvm-svn: 280737
2016-09-06 19:27:00 +00:00
Jason Henline
18ea094df1 [SE] Remove Platform*Handle classes
Summary:
As pointed out by jprice, these classes don't serve a purpose. Instead,
we stay consistent with the way memory is managed and let the Stream and
Kernel classes directly hold opaque handles to device Stream and Kernel
instances, respectively.

Reviewers: jprice, jlebar

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24213

llvm-svn: 280719
2016-09-06 17:07:22 +00:00
Jason Henline
3956b2840b [SE] Add getByteCount methods for device memory
Summary:
Simple utility methods will prevent users from making mistakes when
converting element counts to byte counts.
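
A minimal sketch of such a utility, assuming a typed buffer that tracks its element count. `TypedBuffer` and `checkByteCount` are hypothetical names; only `getByteCount` comes from the commit title.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical typed buffer descriptor that knows its element type.
template <typename T> class TypedBuffer {
public:
  explicit TypedBuffer(std::size_t ElementCount) : ElementCount(ElementCount) {}

  std::size_t getElementCount() const { return ElementCount; }

  // Doing the multiplication in one place keeps callers from forgetting
  // sizeof(T) or using the wrong element type.
  std::size_t getByteCount() const { return ElementCount * sizeof(T); }

private:
  std::size_t ElementCount;
};

inline void checkByteCount() {
  TypedBuffer<double> Buffer(10);
  assert(Buffer.getByteCount() == 10 * sizeof(double));
}
```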

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24197

llvm-svn: 280563
2016-09-03 00:32:07 +00:00
Jason Henline
91f199c4ca [SE] Remove broken doc ref
llvm-svn: 280512
2016-09-02 18:07:48 +00:00
Jason Henline
1ce1856133 [SE] Doc tweaks
Summary:
* Sections on main page.
* Use std algorithm for equality check in example.
* Add tree view on left side.
* Add extra CSS sheet to restrict content width.
* Add mild background color.
* Restrict alphabetic indexes to 1 column.
* Round corners of content boxes.
* Rename example to CUDASaxpy.cpp.
* Add CUDASaxpy.cpp to "Examples" section.

Reviewers: jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24198

llvm-svn: 280511
2016-09-02 17:59:12 +00:00
Jason Henline
31b88cb030 [SE] GlobalDeviceMemory owns its handle
Summary:
Final step in getting GlobalDeviceMemory to own its handle.

* Make GlobalDeviceMemory movable, but no longer copyable.
* Make Device::freeDeviceMemory function private and make
  GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase
  can free its memory in its destructor.
* Make GlobalDeviceMemory constructor private and make Device a friend
  so it can construct GlobalDeviceMemory.
* Remove SharedDeviceMemoryBase class because it is never used.
* Remove explicit memory freeing from example code.

This change just consumes any errors generated during device memory freeing.
The real error handling will be added in a future patch.
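
The ownership model described above (movable, not copyable, memory freed in the destructor) can be sketched as below. This is a simplified illustration: `OwnedDeviceMemory` and `checkOwnership` are hypothetical, `std::malloc` and `std::free` stand in for device allocation, and the private-constructor and friend arrangement from the list is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical owning handle for a device allocation.
class OwnedDeviceMemory {
public:
  explicit OwnedDeviceMemory(std::size_t ByteCount)
      : Handle(std::malloc(ByteCount)) {}

  // Copying is disabled so two objects can never free the same handle.
  OwnedDeviceMemory(const OwnedDeviceMemory &) = delete;
  OwnedDeviceMemory &operator=(const OwnedDeviceMemory &) = delete;

  // Moving transfers ownership and leaves the source empty.
  OwnedDeviceMemory(OwnedDeviceMemory &&Other) noexcept
      : Handle(Other.Handle) {
    Other.Handle = nullptr;
  }
  OwnedDeviceMemory &operator=(OwnedDeviceMemory &&Other) noexcept {
    if (this != &Other) {
      std::free(Handle);
      Handle = Other.Handle;
      Other.Handle = nullptr;
    }
    return *this;
  }

  // Freeing happens here, so user code needs no explicit free call.
  // std::free(nullptr) is a no-op, which covers moved-from objects.
  ~OwnedDeviceMemory() { std::free(Handle); }

  void *getHandle() const { return Handle; }

private:
  void *Handle;
};

static_assert(!std::is_copy_constructible<OwnedDeviceMemory>::value,
              "the owning handle must not be copyable");

inline void checkOwnership() {
  OwnedDeviceMemory First(64);
  void *Raw = First.getHandle();
  OwnedDeviceMemory Second(std::move(First));
  assert(First.getHandle() == nullptr);
  assert(Second.getHandle() == Raw);
}
```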

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24195

llvm-svn: 280509
2016-09-02 17:22:42 +00:00
Jason Henline
75fbe01eeb [SE] Add "install" actions to cmake build
The "install" build target will now copy the StreamExecutor library and
headers to the appropriate subdirectories of CMAKE_INSTALL_PREFIX.

llvm-svn: 280506
2016-09-02 17:19:19 +00:00
Jason Henline
f26ef0a27a [SE] Don't pack raw device mem args
Summary:
Step 4 of getting GlobalDeviceMemory to own its handle.

Take out code to pack untyped device memory types as kernel arguments.
When GlobalDeviceMemory owns its handle, users will never touch untyped
device memory types, so they will never pass them as kernel args.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24177

llvm-svn: 280496
2016-09-02 16:10:51 +00:00
Jason Henline
c15c9ebb1d [StreamExecutor] Pass device memory by ref
Summary:
Step 3 of getting GlobalDeviceMemory to own its handle.

Since GlobalDeviceMemory will no longer be copy-constructible, we must
pass instances by reference rather than by value.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24172

llvm-svn: 280439
2016-09-02 00:25:52 +00:00
Jason Henline
dc2dff6c68 [SE] Make Kernel movable
Summary:
Kernel is basically just a smart pointer to the underlying
implementation, so making it movable prevents having to store a
std::unique_ptr to it.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24150

llvm-svn: 280437
2016-09-02 00:22:05 +00:00
Jason Henline
e091f8e814 [StreamExecutor] Read dev array directly in test
Summary:
Step 2 of getting GlobalDeviceMemory to own its handle.

Use the SimpleHostPlatformDevice allocate methods to create device
arrays for tests, and check for successful copies by dereferencing the
device array handle directly because we know it is really a host
pointer.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24148

llvm-svn: 280428
2016-09-01 23:27:39 +00:00
Jason Henline
8e5b54021e [StreamExecutor] Dev handles in platform interface
Summary:
This is the first in a series of patches that will convert
GlobalDeviceMemory to own its device memory handle. The first step is to
remove GlobalDeviceMemoryBase from the PlatformInterface interfaces and
use raw handles there instead. This is useful because
GlobalDeviceMemoryBase is going to lose its importance in this process.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24114

llvm-svn: 280401
2016-09-01 18:48:21 +00:00
Jason Henline
e9a12f1175 [SE] Make Stream movable
Summary:
The example code makes it clear that this is a much better design
decision.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24142

llvm-svn: 280397
2016-09-01 18:35:37 +00:00
Jason Henline
a8a7fb95ef [SE] Docs use JAVADOC_AUTOBRIEF
That way we don't have to explicitly annotate each brief description as
\brief.

llvm-svn: 280384
2016-09-01 17:47:17 +00:00
Jason Henline
c1e2b83d09 [StreamExecutor] getOrDie and dieIfError utils
Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24107

llvm-svn: 280312
2016-08-31 23:30:41 +00:00
Jason Henline
2eb1da8ed0 Exclude examples, unittests from doc gen
Public documentation shouldn't be generated for unit test code and code
that is only meant to be used as snippets in other documentation.

llvm-svn: 280278
2016-08-31 19:02:47 +00:00
Jason Henline
5b363dd294 [StreamExecutor] Add Doxygen main page
Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24066

llvm-svn: 280277
2016-08-31 19:02:44 +00:00
Jason Henline
ba65d4412e [StreamExecutor] Add Stream::blockHostUntilDone
Summary: Add the type-safe wrapper to the platform-specific implementation.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24063

llvm-svn: 280182
2016-08-31 00:11:14 +00:00
Jason Henline
90ce6e1e64 [StreamExecutor] Simplify Kernel classes
Summary:
Make the Kernel class follow the pattern of the other classes. It now
has a type-safe user wrapper and a typeless, platform-specific handle.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24043

llvm-svn: 280176
2016-08-30 23:35:24 +00:00
Jason Henline
f14306b01e [StreamExecutor] Fix KernelSpec Doxygen
Summary:
There was a typo where \endcode was spelled as \encode and it was
keeping the whole file's documentation from rendering. I also added in some \c
annotations for inline code stuff to make it look nicer.

Reviewers: jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23941

llvm-svn: 279855
2016-08-26 19:55:32 +00:00
Jason Henline
20cf1eb161 [StreamExecutor] Add Platform and PlatformManager
Summary: Abstractions for a StreamExecutor platform

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23857

llvm-svn: 279779
2016-08-25 21:33:07 +00:00
Jason Henline
bcc77b6249 [StreamExecutor] Rename Executor to Device
Summary: This more clearly describes what the class is.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23851

llvm-svn: 279669
2016-08-24 21:31:53 +00:00
Jason Henline
3053bbf3b2 [StreamExecutor] Fix allocateDeviceMemory
Summary:
The return value from PlatformExecutor::allocateDeviceMemory needs to be
converted from Expected<GlobalDeviceMemoryBase> to
Expected<GlobalDeviceMemory<T>> in Executor::allocateDeviceMemory.

A similar bug is also fixed for Executor::allocateHostMemory.

Thanks to jprice for identifying this bug.

Reviewers: jprice, jlebar

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23849

llvm-svn: 279658
2016-08-24 19:42:03 +00:00
Jason Henline
424fc7e611 [StreamExecutor] Clean up device copy comments
Summary:
Consolidate Executor::synchronousCopy* and Stream::thenCopy* methods into
Doxygen method groups and combine all their comments into one section.

Also add a "doc" target to the build files to use Doxygen to build the
documentation.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23845

llvm-svn: 279654
2016-08-24 18:56:26 +00:00
Jason Henline
bb1322d495 [StreamExecutor] Executor add synchronous methods
Summary:
Add Executor methods that block the host until completion. Since these
methods are host-synchronous, they don't require Stream arguments.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23577

llvm-svn: 279640
2016-08-24 16:58:20 +00:00
Jason Henline
a91dc70b18 [StreamExecutor] Rename StreamExecutor to Executor
Summary: No functional changes, just renaming this class for better readability.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23574

llvm-svn: 278833
2016-08-16 18:18:32 +00:00
Jason Henline
68b97c7dc9 [StreamExecutor] Add basic Stream operations
Summary: Add the Stream class and a few of the operations it supports.

Reviewers: jlebar, tra

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23333

llvm-svn: 278829
2016-08-16 17:58:31 +00:00