add_compile_options is more sensitive to its location in the file than add_definitions--it only takes effect for sources that are added after it. This updated patch ensures that the add_compile_options is done before adding any source files that depend on it.
Using add_definitions caused the flag to be passed to rc.exe on Windows and thus broke Windows builds.
After lots of follow-up fixes, there are still problems, such as
-Wno-suggest-override getting passed to the Windows Resource Compiler
because it was added with add_definitions in the CMake file.
Rather than piling on another fix, let's revert so this can be re-landed
when there's a proper fix.
This reverts commit 21c0b4c1e8d6a171899b31d072a47dac27258fc5.
This reverts commit 81d68ad27b29b1e6bc93807c6e42b14e9a77eade.
This reverts commit a361aa5249856e333a373df90947dabf34cd6aab.
This reverts commit fa42b7cf2949802ff0b8a63a2e111a2a68711067.
This reverts commit 955f87f947fda3072a69b0b00ca83c1f6a0566f6.
This reverts commit 8b16e45f66e24e4c10e2cea1b70d2b85a7ce64d5.
This reverts commit 308a127a38d1111f3940420b98ff45fc1c17715f.
This reverts commit 274b6b0c7a8b584662595762eaeff57d61c6807f.
This reverts commit 1c7037a2a5576d0bb083db10ad947a8308e61f65.
all missed!
Thanks to Alex Bradbury for pointing this out, and the fact that I never
added the intended `legacy` anchor to the developer policy. Add that
anchor too. With hope, this will cause the links to all resolve
successfully.
llvm-svn: 351731
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This installs the new developer policy and moves all of the license
files across all LLVM projects in the monorepo to the new license
structure. The remaining projects will be moved independently.
Note that I've left odd formatting and other idiosyncracies of the
legacy license structure text alone to make the diff easier to read.
Critically, note that we do not in any case *remove* the old license
notice or terms, as that remains necessary until we finish the
relicensing process.
I've updated a few license files that refer to the LLVM license to
instead simply refer generically to whatever license the LLVM project is
under, basically trying to minimize confusion.
This is really the culmination of so many people. Chris led the
community discussions, drafted the policy update and organized the
multi-year string of meeting between lawyers across the community to
figure out the strategy. Numerous lawyers at companies in the community
spent their time figuring out initial answers, and then the Foundation's
lawyer Heather Meeker has done *so* much to help refine and get us ready
here. I could keep going on, but I just want to make sure everyone
realizes what a huge community effort this has been from the begining.
Differential Revision: https://reviews.llvm.org/D56897
llvm-svn: 351631
Summary:
I originally added the -Wno-missing-braces flag because I thought it was
erroneously flagging std::array initializations. Now I realize the extra
braces really are desired for these initializations, so I'm turning the
warning flag back on.
Reviewers: jlebar
Subscribers: mgorny, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D27941
llvm-svn: 290137
Summary:
After experimenting with CUDA, I realized that we really only need to
set the active context right before creating an object such as a stream
or a device memory allocation. When we go on to use these objects later,
it is fine if the context that created them is no longer active,
operations with those objects will succeed anyway.
Since it turns out that we don't have to check the active context for
every operation, it makes sense to hide this active context from users
(by removing the "ActiveDeviceForThread" setter and getter) and to
change the Acxxel API to explicitly pass in the device ID to create
objects.
This change improves the Acxxel API and greatly simplifies the CUDA and
OpenCL implementations because they no longer require thread_local data.
Reviewers: jlebar, jprice
Subscribers: mgorny, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D26050
llvm-svn: 285372
Summary:
The project has been renamed to Acxxel, so this old directory needs to
be deleted.
Reviewers: jlebar, jprice
Subscribers: beanz, mgorny, parallel_libs-commits, modocache
Differential Revision: https://reviews.llvm.org/D25964
llvm-svn: 285115
Summary:
Acxxel is basically a simplified redesign of StreamExecutor.
Here are the major points where Acxxel differs from the current
StreamExecutor design:
* Acxxel doesn't support the kernel and kernel loader types designed for
emission by the compiler to support type-safe kernel launches. For
CUDA, kernels in Acxxel can be seamlessly launched using the standard
CUDA triple-chevron kernel launch syntax that is available with clang
and nvcc. For CUDA and OpenCL, kernel arguments can be passed in the
old-fashioned way, as one array of pointers to arguments and another
array of argument sizes. Although OpenCL doesn't get a type-safe
kernel launch method, it does still get the benefit of all the memory
management wrappers. In the future, clang may add support for
triple-chevron OpenCL kernel launchs, or some other type-safe OpenCL
kernel launch method.
* Acxxel does not depend on any other code in LLVM, so it builds
completely independently from LLVM.
The goal will be to check in Acxxel and remove StreamExecutor, or
perhaps to remove the old StreamExecutor and rename Acxxel to
StreamExecutor, so I think Acxxel should be thought of as a new version
of StreamExecutor, not as a separate project.
Reviewers: jlebar, jprice
Subscribers: beanz, mgorny, modocache, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D25701
llvm-svn: 285111
Summary:
Call it StreamExecutorCoreTests in order to prevent collision with
targets from other modules.
Reviewers: jlebar, jprice
Subscribers: beanz, mgorny, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24949
llvm-svn: 282491
Summary:
It turns out CMake errors out if a processed directory contains source
files that are not used. This was causing an error with the CUDATest.cpp
file when configuring StreamExecutor with the CUDA platform disabled.
Moving CUDATest.cpp to its own directory fixes this problem.
Reviewers: jlebar, jprice
Subscribers: beanz, mgorny, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24618
llvm-svn: 281654
Summary:
Add proper handling for shared memory arguments in the CUDA platform. Also add
in unit tests for CUDA.
Reviewers: jlebar
Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24596
llvm-svn: 281635
Summary:
Basic CUDA platform implementation and cmake infrastructure to control
whether it's used. A few important TODOs will be handled in later
patches:
* Log some error messages that can't easily be returned as Errors.
* Cache modules and kernels to prevent reloading them if someone tries to
reload a kernel that's already loaded.
* Tolerate shared memory arguments for kernel launches.
Reviewers: jlebar
Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24538
llvm-svn: 281524
Summary:
We were packing global device memory handles in
`PackedKernelArgumentArray`, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24528
llvm-svn: 281424
Summary:
Platforms were returning Device pointers, but a Device is now basically
just a pointer to an underlying PlatformDevice, so we will now just pass
it around as a value.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24537
llvm-svn: 281422
Summary:
Before, the kernel spec would only return PTX for exactly the requested
compute capability. With this patch it will now return the PTX with the
largest compute capability that does not exceed that requested compute
capability.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24531
llvm-svn: 281417
Summary:
This implementation does not currently support multiple concurrent streams, and
it won't allow kernels to be launched with grids larger than one block or
blocks larger than one thread. These limitations could be removed in the future
by launching new threads on the host, but that is not done in this
implementation.
Reviewers: jlebar
Subscribers: beanz, mgorny, jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24473
llvm-svn: 281377
Summary:
The .clang-tidy file is copied from the top-level LLVM source directory.
Also fix warnings generated by clang-format:
* Moved SimpleHostPlatformDevice.h so its header include guard could
have the right format.
* Changed signatures of methods taking llvm::Twine by value to take it
by const ref instead.
* Add "noexcept" to some move constructors and assignment operators.
* Removed a bunch of places where single-statement loops and
conditionals were surrounded with braces. (This was not found by the
current clang-tidy, but with a local patch that I hope to upstream
soon.)
Reviewers: jlebar, jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24468
llvm-svn: 281374
Summary:
Build configuration was adding $(llvm-config --cxxflags) to the
StreamExecutor CXXFLAGS, but this was causing "-O3" to be passed even
for debug builds, and was making debugging difficult.
The llvm-config call was originally introduced to handle the -fno-rtti
flag because an RTTI StreamExecutor could not link with a no-RTTI LLVM.
This patch converts to using LLVM_ENABLE_RTTI and only adding the
`-fno-rtti` flag if needed, not all the rest of the LLVM CXXFLAGS.
I have tested this with clang-4.0 and gcc-4.8 on Ubuntu. Some work will
probably have to be done to support MSVC.
Reviewers: jlebar
Subscribers: beanz, jprice, parallel_libs-commits, mgorny
Differential Revision: https://reviews.llvm.org/D24474
llvm-svn: 281347
Summary:
* Add LLVM_ATTRIBUTE_UNUSED_RESULT used to slicing methods in order to
emphasize that the slicing is not done in place.
* Change device memory slice function name from `drop_front` to `slice`
in order to match the naming convention of `llvm::ArrayRef` and host
memory slice.
* Change the parameter names of host memory slice functions to
`DropCount` and `TakeCount` to match device memory slice declarations.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24464
llvm-svn: 281239
Summary:
Improve the error-prone interface that allows users to pass host
pointers that haven't been registered to asynchronous copy methods. In
CUDA, this is an extremely easy error to make, and instead of failing at
runtime, it succeeds and gives the right answers by turning the async
copy into a sync copy. So, you silently get a huge performance
degradation if you misuse the old interface. This new interface should
prevent that.
Reviewers: jlebar
Subscribers: jprice, beanz, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24353
llvm-svn: 281225
Summary:
There is no purpose in splitting out the Error class from the rest of
the StreamExecutor code. This organization was just a vestige of an old
failed design.
Plus, this change fixes a bug in the build where the utilites library
was not being statically linked in with libstreamexecutor.
Reviewers: jlebar, jprice
Subscribers: beanz, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24434
llvm-svn: 281118
Summary:
With these changes, we can put parallel-libs within llvm/projects and
build as normal.
This is kind of the minimal change I could figure out how to make while
still making us compatible with llvm's build system. Some things I'm
not thrilled about include:
* The creation of a CoreTests directory (the macros really seemed to
want this)
* Pulling SimpleHostPlatformDevice.h into CoreTests. It seems to me
this should live inside unittests/include, or maybe tests/include,
but I didn't want to make that change in this patch.
One important piece of work that remains to be done is to make
$ ninja check-streamexecutor
run all the tests. Right now the only way I've figured out to run the
tests is
$ ninja projects/parallel-libs/streamexecutor/unittests/StreamExecutorUnitTests
$ projects/parallel-libs/streamexecutor/unittests/CoreTests/CoreTests
Reviewers: jhen
Subscribers: beanz, parallel_libs-commits, jprice
Differential Revision: https://reviews.llvm.org/D24368
llvm-svn: 281091
Summary:
Similar to llvm-config, gets command-line flags that are needed to build
applications linking against StreamExecutor.
Reviewers: jprice, jlebar
Subscribers: beanz, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24302
llvm-svn: 280955
Summary:
The only interface that we ever plan to have in this file is
PlatformDevice, so it makes sense to rename the file to reflect that.
Reviewers: jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24269
llvm-svn: 280737
Summary:
As pointed out by jprice, these classes don't serve a purpose. Instead,
we stay consistent with the way memory is managed and let the Stream and
Kernel classes directly hold opaque handles to device Stream and Kernel
instances, respectively.
Reviewers: jprice, jlebar
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24213
llvm-svn: 280719
Summary:
Simple utility methods will prevent users from making mistakes when
converting element counts to byte counts.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24197
llvm-svn: 280563
Summary:
* Sections on main page.
* Use std algorithm for equality check in example.
* Add tree view on left side.
* Add extra CSS sheet to restrict content width.
* Add mild background color.
* Restrict alphabetic indexes to 1 column.
* Round corners of content boxes.
* Rename example to CUDASaxpy.cpp.
* Add CUDASaxpy.cpp to "Examples" section.
Reviewers: jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24198
llvm-svn: 280511
Summary:
Final step in getting GlobalDeviceMemory to own its handle.
* Make GlobalDeviceMemory movable, but no longer copyable.
* Make Device::freeDeviceMemory function private and make
GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase
can free its memory in its destructor.
* Make GlobalDeviceMemory constructor private and make Device a friend
so it can construct GlobalDeviceMemory.
* Remove SharedDeviceMemoryBase class because it is never used.
* Remove explicit memory freeing from example code.
This change just consumes any errors generated during device memory freeing.
The real error handling will be added in a future patch.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24195
llvm-svn: 280509
The "install" build target will now copy the StreamExecutor library and
headers to the appropriate subdirectories of CMAKE_INSTALL_PREFIX.
llvm-svn: 280506
Summary:
Step 4 of getting GlobalDeviceMemory to own its handle.
Take out code to pack untyped device memory types as kernel arguments.
When GlobalDeviceMemory owns its handle, users will never touch untyped
device memory types, so they will never pass them as kernel args.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24177
llvm-svn: 280496
Summary:
Step 3 of getting GlobalDeviceMemory to own its handle.
Since GlobalDeviceMemory will no longer by copy-constructible, we must
pass instances by reference rather than by value.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24172
llvm-svn: 280439
Summary:
Kernel is basically just a smart pointer to the underlying
implementation, so making it movable prevents having to store a
std::unique_ptr to it.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24150
llvm-svn: 280437
Summary:
Step 2 of getting GlobalDeviceMemory to own its handle.
Use the SimpleHostPlatformDevice allocate methods to create device
arrays for tests, and check for successful copies by dereferncing the
device array handle directly because we know it is really a host
pointer.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24148
llvm-svn: 280428
Summary:
This is the first in a series of patches that will convert
GlobalDeviceMemory to own its device memory handle. The first step is to
remove GlobalDeviceMemoryBase from the PlatformInterface interfaces and
use raw handles there instead. This is useful because
GlobalDeviceMemoryBase is going to lose its importance in this process.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24114
llvm-svn: 280401
Summary:
The example code makes it clear that this is a much better design
decision.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24142
llvm-svn: 280397