Commit Graph

590 Commits

Author SHA1 Message Date
Jeroen Ketema
1364d268a4 Implement mem_fence on ptx
PTX does not differentiate between read and write fences. Hence, these a
lowered to a mem_fence call. The mem_fence function compiles to the
“member.cta” instruction, which commits all outstanding reads and writes
of a thread such that these become visible to all other threads in the same
CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored, except
for deciding whether a “member.cta” instruction should be issued at all.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315235
2017-10-09 19:43:04 +00:00
Jeroen Ketema
4f5a3d5d6f Make ptx barrier work irrespective of the cl_mem_fence_flags
This generates a "bar.sync 0” instruction, which not only causes the
threads to wait, but does acts as a memory fence, as required by
OpenCL. The fence does not differentiate between local and global
memory. Unfortunately, there is no similar instruction which does
not include a memory fence. Hence, we cannot optimize the case
where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is
passed.

llvm-svn: 315228
2017-10-09 18:36:48 +00:00
Jan Vesely
3c51ae5bd9 travis: Make sure we report failure even if only earlier checked files fail
for loop would only report status of the last command
v2: return '1'
    call test instead of '['

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315193
2017-10-08 20:07:58 +00:00
Jan Vesely
136381dc38 check_external_calls.sh: Print number of calls in tested file.
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315192
2017-10-08 20:07:56 +00:00
Jan Vesely
80bb52ae75 ptx: Use __clc_nextafter to implement nextafter
using clang builtin results in external library call

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315191
2017-10-08 19:34:00 +00:00
Jan Vesely
1de1444d62 Do not include clc_nextafter header globally
Drop unused clc/math/clc_nextafter.h header

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315190
2017-10-08 19:33:58 +00:00
Jan Vesely
6a5c8ddb3a math/nextafter: Use custom declaration inc file
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315189
2017-10-08 19:33:55 +00:00
Jan Vesely
72be1cc0be math/binary_decl.inc: Do not declare mixed float/double functions
fmin/fmax only need vector/scalar mix

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315188
2017-10-08 19:33:53 +00:00
Jan Vesely
beb6591753 ldexp: Fix double precision function return type
Fixes ~1200 external calls from nvtpx library.

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315170
2017-10-08 06:56:14 +00:00
Jan Vesely
391305638c configure: Fix handling of directories with compats only source lists
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315018
2017-10-05 20:16:28 +00:00
Jeroen Ketema
957151bd86 Add vload_half helpers for ptx
The removes the vload_half unresolved calls from the nvptx libraries.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314998
2017-10-05 18:17:40 +00:00
Jeroen Ketema
feefb0870f Add vstore_half helpers for ptx
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314925
2017-10-04 19:07:48 +00:00
Jan Vesely
a02d0e2c50 integer/sub_sat: Use clang builtin instead of llvm asm
reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314703
2017-10-02 18:39:03 +00:00
Jan Vesely
1964df8fad integer/add_sat: Use clang builtin instead of llvm asm
reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314702
2017-10-02 18:39:00 +00:00
Jan Vesely
943057a288 integer/clz: Use clang builtin instead of llvm asm
The generated llvm IR mostly identical. char/uchar case is a bit worse.

reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314701
2017-10-02 18:38:57 +00:00
Jeroen Ketema
fe9fa89854 Let get_work_dim take exactly 0 arguments
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314634
2017-10-01 20:11:46 +00:00
Jeroen Ketema
17fdf263c5 Do no circularly define NULL
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314633
2017-10-01 20:10:14 +00:00
Jan Vesely
2b7fa1c6f6 Fix amdgcn-amdhsa on llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314548
2017-09-29 19:06:52 +00:00
Jan Vesely
aee030f284 travis: Check built libraries on llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314547
2017-09-29 19:06:50 +00:00
Jan Vesely
8c8c287adf Add script to check for unresolved function calls
v2: add shell shebang
    improve error checks and reporting
v3: fix typo

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314546
2017-09-29 19:06:48 +00:00
Jan Vesely
41b1500db0 geometric: geometric functions are only supported for vector lengths <=4
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314545
2017-09-29 19:06:47 +00:00
Jan Vesely
8d08f01eff travis: add build using llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314544
2017-09-29 19:06:45 +00:00
Jan Vesely
ce29e8cde1 Restore support for llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314543
2017-09-29 19:06:41 +00:00
Jan Vesely
3bb50f6f7b Add missing HAVE_LLVM define to fix build with latest llvm
Broken since r314111

V2: pointed out by Jan Vesely
   - Use format() instead of % formating

Patch-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314261
2017-09-26 23:15:54 +00:00
Jan Vesely
1fa727d615 Rework atomic ops to use clang builtins rather than llvm asm
reviewer: Aaron Watry

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314112
2017-09-25 16:07:34 +00:00
Jan Vesely
760052047b prepare_builtins: Fix compile breakage with older LLVM
Fixes r314050

reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314111
2017-09-25 16:04:37 +00:00
Reid Kleckner
3fc649cb76 [Support] Rename tool_output_file to ToolOutputFile, NFC
This class isn't similar to anything from the STL, so it shouldn't use
the STL naming conventions.

llvm-svn: 314050
2017-09-23 01:03:17 +00:00
Jan Vesely
c9bbbe2403 Implement cl_khr_int64_extended_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313811
2017-09-20 20:42:19 +00:00
Jan Vesely
1c81f4b0e3 Implement cl_khr_int64_base_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313810
2017-09-20 20:42:14 +00:00
Jan Vesely
d0320d5289 Add travis CI configuration file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313773
2017-09-20 17:28:58 +00:00
Aaron Watry
e62f5fa64d Add native_recip(x) as ((1)/(x))
Signed-off-by: Aaron Watry <awatry@gmail.com>
Acked-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313107
2017-09-13 01:40:25 +00:00
Aaron Watry
415a60f303 integer: Add popcount implementation using ctpop intrinsic
Also copy/modify the unary_intrin.inc from math/ to make the
intrinsic declaration somewhat reusable.

Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2

Tested-by on GCN 1.0 (Pitcairn)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312854
2017-09-09 02:23:54 +00:00
Jan Vesely
285d2fb85c Implement vload_half{,n} and vload(half)
v2: add vload(half) as well
    make helpers amdgpu specific (NVPTX uses different private AS numbering)
    use clang builtin on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
2017-09-08 23:59:00 +00:00
Jan Vesely
661ac03a1b vstore: Cleanup and add vstore(half)
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
2017-09-08 23:58:57 +00:00
Jan Vesely
b9dbaae3fb configure.py: Simplify compatibility sources
Just add the SOURCE_X.Y list to the list of sources if X.Y is the current llvm version.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312837
2017-09-08 23:58:53 +00:00
Jan Vesely
3d1db3de74 amdgcn,waitcnt: Add datalayout info
This file is only compiled for GCN which all share the same layout

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312493
2017-09-04 15:52:07 +00:00
Jan Vesely
e337b30c7d r600: Cleanup barrier implementation.
We don't have memory fences for r600 so just call group barrier directly
Make sure that barrier is called even with 0 flags

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312492
2017-09-04 15:52:05 +00:00
Jan Vesely
1796d590c1 Fixup clc.h comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312491
2017-09-04 15:52:03 +00:00
Aaron Watry
0bf96b1712 relational: Implement shuffle2 builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input

v2: Add half support to shuffle2
    Move shuffle2 to misc/

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312404
2017-09-02 02:23:28 +00:00
Aaron Watry
880f15dae6 relational: Implement shuffle builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in

v2: Add half-precision support to shuffle when available.
    Move to misc/ and add section 6.12.12 to clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312403
2017-09-02 02:23:26 +00:00
Aaron Watry
da8dfefd1c Add halfN types and enable fp16 when generating builtin declarations
Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312402
2017-09-02 02:23:16 +00:00
Jan Vesely
999b1d9426 amdgcn: rewrite barrier() using fence and clang __builtin_amdgcn_s_barrier
Specs require using fences when barrier() is invoked:
"The barrier function will either flush any variables stored in local memory
or queue a memory fence to ensure correct ordering of memory operations to local memory."
and
"The barrier function will queue a memory fence to ensure correct ordering
of memory operations to global memory."

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311022
2017-08-16 17:09:00 +00:00
Jan Vesely
1977092dc3 amdgcn: Implement {read_,write_,}mem_fence builtin
v2: add more detailed comment about waitcnt instruction

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311021
2017-08-16 17:08:56 +00:00
Jan Vesely
7fc4c79fa5 configure.py: Drop explicit import of int builtin
I can't reproduce the error that made me add this.

Reported-by: Kim Gräsman <kim.grasman@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Kim Gräsman <kim.grasman@gmail.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 310968
2017-08-15 22:24:05 +00:00
Jan Vesely
a4a20cd2f3 configure.py: Make python3 friendly
mostly prints and exceptions.
Few behavioral changes are documented in the text
Generated Makefile is identical between python2 and python3

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309820
2017-08-02 15:00:59 +00:00
Jan Vesely
09f0a560e1 add __kernel_exec macros
also consolidate macros into one file, and rename to clcmacros.h

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309358
2017-07-28 03:39:03 +00:00
Jan Vesely
2f2a3bc0dc generic: add missing get_work_dim include
Fixes few piglits since clang r304193

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 304556
2017-06-02 15:58:35 +00:00
Jan Vesely
9f7172965c math: Implement sinh function
mostly copied form amd_builtins

llvm-svn: 296233
2017-02-25 02:46:53 +00:00
Jan Vesely
c3868c8f8d .gitignore: Ignore amdgcn-mesa object directory
llvm-svn: 296164
2017-02-24 20:32:18 +00:00
Aaron Watry
dfec3c8e95 math: Add native_tan as wrapper to tan
Trivially define native_tan as a redirect to tan.

If there are any targets with a native implementation, we can deal with it later.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
llvm-svn: 295920
2017-02-23 01:46:57 +00:00
Jeroen Ketema
80d2e8ffc1 Move BufferPtr into the block where it it being used
The previous location outside the block would crash prepare-builtins
when no the builtins file accidentially not passed on the command line.

llvm-svn: 294916
2017-02-12 21:33:49 +00:00
Jeroen Ketema
ed98e8d099 Add the correct prefixes to the cl_khr_fp64 pragma
llvm-svn: 294915
2017-02-12 21:31:41 +00:00
Matt Arsenault
9df2b9781c math: Add native_rsqrt builtin function
Trivial define to rsqrt.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 294608
2017-02-09 18:39:26 +00:00
Aaron Watry
c606efabb7 math: Add logb builtin
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292335
2017-01-18 03:14:10 +00:00
Aaron Watry
900bd7eb7f math: Add expm1 builtin function
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292334
2017-01-18 03:13:37 +00:00
Tom Stellard
d83eb34ee7 Fix build since r286752.
llvm-svn: 286839
2016-11-14 16:06:33 +00:00
Tom Stellard
088faab429 Fix build since llvm r286566 and require at least llvm 4.0
llvm-svn: 286634
2016-11-11 21:34:47 +00:00
Jan Vesely
0a5aac3fc4 Provide vstore_half helper to workaround clc restrictions
clang won't accept half precision loads and stores without cl_khr_fp16 since r281904

llvm-svn: 282106
2016-09-21 20:15:55 +00:00
Tom Stellard
6b195ece57 configure: Add amdgcn-mesa-mesa3d target
llvm-svn: 281793
2016-09-16 22:43:33 +00:00
Tom Stellard
f19cf403c4 amdgcn-amdhsa: Add get_num_groups implementation
llvm-svn: 281792
2016-09-16 22:43:31 +00:00
Tom Stellard
e7ad23bad3 amdgcn-amdhsa: Add get_global_size() implementation
llvm-svn: 281791
2016-09-16 22:43:29 +00:00
Aaron Watry
af569547fa math: Implement tgamma
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281566
2016-09-15 00:17:34 +00:00
Aaron Watry
e9009cdd21 math: Implement lgamma
Just use lgamma_r and ignore the value returned in the second argument

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281565
2016-09-15 00:17:31 +00:00
Aaron Watry
0ab07e1bde math: Implement lgamma_r
Ported from the amd-builtins branch, which is itself based on the
Sun Microsystems implementation.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281564
2016-09-15 00:17:28 +00:00
Aaron Watry
f969413a82 Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZE
This macro is currently unused, but I plan to use it shortly.

The previous form did casts of pointers without an address space, which
doesn't work so well for CL 1.x.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281563
2016-09-15 00:17:22 +00:00
Matt Arsenault
fbfd828d2a Replace nextafter implementation
This one passes conformance.

llvm-svn: 280961
2016-09-08 16:37:56 +00:00
Jan Vesely
eade17271a Avoid ambiguity in calling atom_add functions.
clang (since r280553) allows pointer casts in function overloads,
so we need to disambiguate the second argument.

clang might be smarter about overloads in the future
see https://reviews.llvm.org/D24113, but let's be safe in libclc anyway.

llvm-svn: 280871
2016-09-07 22:11:02 +00:00
Niels Ole Salscheider
63f71057c0 configure.py: Add polaris10 and polaris11
llvm-svn: 280121
2016-08-30 18:00:41 +00:00
Matt Arsenault
958fce3192 amdgcn: Fix return type of get_num_groups
llvm-svn: 279723
2016-08-25 07:31:40 +00:00
Matt Arsenault
7ef7e6aacd Strip opencl.ocl.version metadata
This should be uniqued when linking, but right now it creates
a lot of metadata spam listing the same version. This should also
probably be reporting the compiled version of the user program,
which may differ from the library. Currently the library IR files report
1.0 while 1.1/1.2 are the default for user programs.

llvm-svn: 279692
2016-08-25 00:25:10 +00:00
Matt Arsenault
d0a275228e amdgcn: Also correct get_local_size type for HSA
llvm-svn: 279656
2016-08-24 19:11:52 +00:00
Matt Arsenault
26d9c41ff6 amdgcn: Fix return type for get_global_size
llvm-svn: 279644
2016-08-24 17:52:04 +00:00
Matt Arsenault
314364cbd2 amdgpu: Fix default case value for get_local_size
llvm-svn: 279359
2016-08-20 04:17:17 +00:00
Matt Arsenault
220268d177 amdgcn: Fix get_local_size IR return type
llvm-svn: 279350
2016-08-20 00:01:21 +00:00
Matt Arsenault
2ce3d94a01 amdgcn: Correct return types to be size_t
llvm-svn: 279343
2016-08-19 22:49:39 +00:00
Jan Vesely
ad8672727c Implement vstore_half{,n}
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 278962
2016-08-17 20:02:11 +00:00
Jan Vesely
4c59714a52 Make min follow the OCL 1.0 specs
OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y
are infinite or NaN, the return values are undefined."

OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y
are infinite or NaN, the return values are undefined."

The 1.0 version is stricter so use that one.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276704
2016-07-25 22:36:22 +00:00
Tom Stellard
d835b3f1af Implement cbrt builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276497
2016-07-22 23:45:15 +00:00
Tom Stellard
9cb070f96a Implement cosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276496
2016-07-22 23:45:13 +00:00
Tom Stellard
ff13926a60 geometric/floatn.inc: Add vec8 and vec16 types
llvm-svn: 276495
2016-07-22 23:45:11 +00:00
Jan Vesely
a82e080b57 AMDGPU: Implement get_global_offset builtin
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.

v2: split to a separate patch

v3: Switch R600 to use implictarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
2016-07-22 17:24:24 +00:00
Jan Vesely
74f02db922 AMDGPU: Use clang intrinsics for workitem builtins
v2: split into 2 patches
    use clang builtins for other intrinsics as well

v3: Fix warnings
    Switch r600 to use implictarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
2016-07-22 17:24:20 +00:00
Jan Vesely
7846c9b8f0 ptx: Fix builtin names after clang r274770
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 276423
2016-07-22 15:00:08 +00:00
Matt Arsenault
633d749da7 amdgpu: Use right builtn for rsq
The r600 path has never actually worked sinced double is not implemented
there.

llvm-svn: 276009
2016-07-19 19:02:01 +00:00
Matt Arsenault
1ab0d9c1ee R600: Use new barrier intrinsic
llvm-svn: 275874
2016-07-18 18:42:17 +00:00
Matt Arsenault
b456c6dd56 Replace llvm.AMDGPU.ldexp with llvm.amdgcn.ldexp
It didn't really work on r600 to begin with, which should
get its own intrinsic.

llvm-svn: 275813
2016-07-18 16:42:50 +00:00
Jan Vesely
e97deffb6a configure: Remove device specific defines
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273044
2016-06-17 20:30:50 +00:00
Jan Vesely
5fd84d028d nvptx: Drop feature defines.
This is now handled by clang

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273043
2016-06-17 20:30:49 +00:00
Jan Vesely
3317f253de 64 bit integers are legal in full profile without an extension
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273042
2016-06-17 20:30:41 +00:00
Jan Vesely
973c1fa5f5 math: Use single precision fmax in sp path
Fixes fdim piglit on Turks

v2: use CL fmax instead of __builtin

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom.stellard@amd.com>
llvm-svn: 269807
2016-05-17 19:44:01 +00:00
Jan Vesely
c374cb76f4 math: Add erf ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in float function body that
requires subnormals.

reviewers: jvesely

Patch by: Vedran Miletić <rivanvx@gmail.com>

llvm-svn: 268766
2016-05-06 18:02:30 +00:00
Aaron Watry
55a8e0fd6d math: Add fdim implementation
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.

Passes piglit (float) tests on pitcairn.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
2016-05-06 03:34:45 +00:00
Tom Stellard
6cb18a09b1 prepare-builtins: Remove call to getGlobalContext()
This function has been removed from LLVM.

Patch By: Laurent Carlier

llvm-svn: 266430
2016-04-15 14:18:58 +00:00
Konstantin Zhuravlyov
f8a81f869f [AMDGPU] Implement get_local_size for amdgcn--amdhsa triple
Differential Revision: http://reviews.llvm.org/D18284

llvm-svn: 265713
2016-04-07 19:54:19 +00:00
Paul Robinson
36a6eaeb34 Update copyright year to 2016.
llvm-svn: 264949
2016-03-30 22:39:03 +00:00
Aaron Watry
09f3c99a86 math: Fix ilogb(double) return type
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 261714
2016-02-24 00:52:15 +00:00
Aaron Watry
d6d0454231 math: Add ilogb ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste
with usage of the CLC wrappers to vectorize them.

This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are
equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.

v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

llvm-svn: 261639
2016-02-23 14:43:09 +00:00
Matt Arsenault
34766ca34b Add .gitignore for build directories
llvm-svn: 261043
2016-02-17 00:27:31 +00:00
Matt Arsenault
45e6eaaa05 amdgcn: Use new workitem intrinsics
llvm-svn: 261042
2016-02-17 00:27:27 +00:00
Matt Arsenault
d4cd67ab9f Update page to list supported targets
llvm-svn: 260778
2016-02-13 01:02:06 +00:00
Matt Arsenault
a48e15c6cb Split sources for amdgcn and r600
Most files remain in a common amdgpu directory.

Also switches barriers to to use convergent,
and use llvm.amdgcn.s.barrier.

This now requires 3.9/trunk to build amdgcn.

llvm-svn: 260777
2016-02-13 01:01:59 +00:00
Jan Vesely
6d870d2e84 configure: Remove llvm 3.6 defines
we require llvm 3.7

reviewer: tstellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260304
2016-02-09 22:17:48 +00:00
Jan Vesely
46b7dd99e0 configure: Remove cl_khr_fp64 for device that don't support doubles
Also remove definitions if provided by clang (3.7+)
This halves the size of builtin.opt.{cedar,barts}.bc

reviewer: tstellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260303
2016-02-09 22:17:46 +00:00
Jan Vesely
4f38142feb configure: Introduce per device defines
Make cl_khr_fp64 define per-device.
This patch does not change the generated Makefile (for llvm 3.6, 3.7)

v2: Make the device defines per LLVM version, 'all' for all versions

reviewer: tstellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260302
2016-02-09 22:17:45 +00:00
Jan Vesely
7fbb96b907 math: Fix log2 vectorization on non-fp64 hw
reviewer: tstellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260301
2016-02-09 22:17:42 +00:00
Aaron Watry
8872800eff math: Add frexp ported from amd-builtins
The float implementation is almost a direct port from the amd-builtins,
but instead of just having a scalar and float4 implementation, it has
a scalar and arbitrary width vector implementation.

The double scalar is also a direct port from AMD's builtin release.

The double vector implementation copies the logic in the float vector
implementation using the values from the double scalar version.

Both have been tested in piglit using tests sent to that project's
mailing list.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260114
2016-02-08 17:07:21 +00:00
Tom Stellard
37d19875fa Implement modf math builtin
V2: use the reference implementation as suggested by Matt Arsenault

Patch By: Pavel Ondračka

llvm-svn: 258933
2016-01-27 14:52:10 +00:00
Tom Stellard
a249f50970 Add _CLC_V_V_VP_VECTORIZE macro
Patch by: Pavel Ondračka

llvm-svn: 258932
2016-01-27 14:52:07 +00:00
Tom Stellard
0abd28140b AMDGPU: Add aliases for all VI targets
llvm-svn: 255663
2015-12-15 18:37:04 +00:00
Tom Stellard
655680a22b AMDGPU: Add alias for tonga
Patch by: Vedran Mileti

llvm-svn: 255662
2015-12-15 18:37:02 +00:00
Aaron Watry
23faa5a1f9 integer: remove explicit casts from _MIN definitions
The spec says (section 6.12.3, CL version 1.2):
  The macro names given in the following list must use the values
  specified. The values shall all be constant expressions suitable
  for use in #if preprocessing directives.

This commit addresses the second part of that statement.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
CC: Serge Martin <edb+libclc@sigluy.net>
llvm-svn: 249445
2015-10-06 19:12:12 +00:00
Niels Ole Salscheider
f51df5ba8c Implement tanh builtin
This is a port from the AMD builtin library.

llvm-svn: 248780
2015-09-29 06:39:09 +00:00
Tom Stellard
e2bab44ca9 Add sampler defines.
Patch by: Zoltan Gilian

llvm-svn: 248163
2015-09-21 14:59:58 +00:00
Tom Stellard
50dfd44577 Add image attribute defines.
Patch by: Zoltan Gilian

llvm-svn: 248162
2015-09-21 14:59:57 +00:00
Tom Stellard
a59fd49ba4 r600: Add image writing builtins.
Patch by: Zoltan Gilian

llvm-svn: 248161
2015-09-21 14:59:56 +00:00
Tom Stellard
9a7d4a940f r600: Add image reading builtins.
Patch by: Zoltan Gilian

llvm-svn: 248160
2015-09-21 14:59:54 +00:00
Tom Stellard
ccc0ec1ddb Add image attribute getter builtins
Added get_image_* OpenCL builtins to the headers.
Added implementation to the r600 target.

Patch by: Zoltan Gilian

llvm-svn: 248159
2015-09-21 14:47:53 +00:00
Aaron Watry
43ee367d1e integer: Update integer limits to comply with spec
The values for the char/short/integer/long minimums were declared with
their actual values, not the definitions from the CL spec (v1.1).  As
a result, (-2147483648) was actually being treated as a long by the
compiler, not an int, which caused issues when trying to add/subtract
that value from a vector.

Update the definitions to use the values declared by the spec, and also
add explicit casts for the char/short/int minimums so that the compiler
actually treats them as shorts/chars. Without those casts, they
actually end up stored as integers, and the compiler may end up storing
the INT_MIN as a long.

The compiler can sign extend the values if it needs to convert the
char->short, short->int, or int->long

v2: Add explicit cast for INT_MIN and fix some type-o's and wrapping
    in the commit message.

Reported-by: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 247661
2015-09-15 03:56:21 +00:00
Peter Collingbourne
008ff14acf Update mailing list reference.
llvm-svn: 245894
2015-08-24 22:43:24 +00:00
Jeroen Ketema
d7be603ab1 Remove files accidentally not removed in r244310
llvm-svn: 244987
2015-08-13 23:43:12 +00:00
Jeroen Ketema
d915739b38 Require LLVM >=3.7 and bump version to 0.2.0
v2: Also remove LLVM 3.6 traces from prepare-builtins.cpp

Patch by: EdB

llvm-svn: 244310
2015-08-07 08:31:37 +00:00
Tom Stellard
7a09e88b6e Fix double implementation of log
We need to use M_LOG2E instead of M_LOG2E_F.

llvm-svn: 243132
2015-07-24 18:07:14 +00:00
Tom Stellard
44b6117dfd Implement accurate log2 function
Use the implementation was ported from the AMD builtin library rather
than LLVM Intrinsics.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 243131
2015-07-24 18:07:12 +00:00
Tom Stellard
f01ffa9ddc Use llvm intrinsics for native_log and native_log2
llvm-svn: 243130
2015-07-24 18:07:06 +00:00
Tom Stellard
4f8d26230c R600: Implement accurate double precision sqrt v2
v2:
  - Use same implementation for R600 and gcn.

llvm-svn: 241907
2015-07-10 13:37:08 +00:00
Tom Stellard
2ef5ec6b2b Fix implementation of sqrt v2
Passing values less than 0 to the llvm.sqrt() intrinsic results in
undefined behavior, so we need to check the input and return NaN if
is is less than 0.

v2:
  - Fix build failures.

llvm-svn: 241906
2015-07-10 13:37:07 +00:00
Tom Stellard
0e4152e391 prepare-builtins: Fix build with LLVM 3.6
Patch by: Tomasz Borowik

llvm-svn: 241905
2015-07-10 13:37:04 +00:00
Jeroen Ketema
2697d2a43b Properly initialize Module pointer
llvm-svn: 240881
2015-06-27 12:35:54 +00:00
Tom Stellard
4957e4057d prepare-builtins: Fix build with LLVM 3.7
llvm-svn: 240552
2015-06-24 17:03:50 +00:00
Tom Stellard
a64bad8338 Use a more accurate implementation for exp
Using exp2(x * M_LOG2E_F) does not give us accurate enough results for
OpenCL.  If you look at the new exp implementation you'll see that
it does multiply the input by M_LOG2E_F, but it still uses the original
input in part of the calculation.

This exp implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237229
2015-05-13 03:55:09 +00:00
Tom Stellard
d538fdc217 Implement exp2 using OpenCL C rather than using an intrinsic
Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.

This exp2 implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237228
2015-05-13 03:55:07 +00:00
Tom Stellard
4294541290 Implement sin for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237155
2015-05-12 17:18:47 +00:00
Tom Stellard
2e6ff0c66e Implement cos for double types
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237154
2015-05-12 17:18:46 +00:00
Tom Stellard
37406a209c Implement atan2pi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237138
2015-05-12 14:48:26 +00:00
Tom Stellard
79cc3eda1e Implement atan2 for doubles
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237131
2015-05-12 13:48:51 +00:00
Jan Vesely
b0fb990b54 math: limit half_sqrt to single precision
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236941
2015-05-09 22:31:03 +00:00
Jan Vesely
7c829fe149 geometric: Limit fast_{distance,length} functions to single precision
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236940
2015-05-09 22:31:01 +00:00
Jan Vesely
071833d454 Fix ldexp fp64 build error
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 236939
2015-05-09 22:30:59 +00:00
Tom Stellard
17ec3a51c3 Implement fast_normalize builtin v4
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove f suffix from constant in double implementations.
  - Consolidate implementations using the .cl/.inc approach.

v3:
 - Use __CLC_FPSIZE instead of __CLC_FP{32,64}

v4 (Jan Vesely):
 - Limit to single precision.

llvm-svn: 236920
2015-05-09 00:04:12 +00:00
Tom Stellard
2ddfa0c5b2 Implement half_rsqrt builtin v3
This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.

v2:
  - Alphabettize SOURCES

v3 (Jan Vesely):
  Limit to single precision types.

llvm-svn: 236915
2015-05-08 23:28:44 +00:00
Jan Vesely
30978bb99d r600: Use __clc_ldexp on asics that don't implement the intruction
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236649
2015-05-06 21:59:30 +00:00
Jan Vesely
90e7ad589e Move ldexp soft implementation to a separate file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236648
2015-05-06 21:59:29 +00:00
Jan Vesely
bc81ebefb7 Implement sinpi builtin
Ported from AMD builtin library, passes piglit on Turks.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236647
2015-05-06 21:59:26 +00:00
Tom Stellard
2ca909d824 math: Add ldexp implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>

Tom Stellard:
  - Add denormal handling.
  - Share vectorization code with r600 implementation.

Patch By: Aaron Watry

llvm-svn: 236639
2015-05-06 20:53:32 +00:00
Tom Stellard
f30d5fc01d Implement ldexp for R600/SI
llvm-svn: 236638
2015-05-06 20:53:29 +00:00
Tom Stellard
aed5f3cf7e Fix implementation of normalize builtin
The new implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 236608
2015-05-06 16:06:31 +00:00
Tom Stellard
ba742f58af Allow compilation depending to the LLVM version
It allows to keep temporary compatibilty with older version.
For exemple, this can be use when change are not to large.

Patch by: EdB

llvm-svn: 236113
2015-04-29 15:37:06 +00:00
Jan Vesely
44e768e777 Fix compilation warnings without cl_khr_fp64
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 235762
2015-04-24 19:54:17 +00:00
Tom Stellard
9447de37a9 Implement fract builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 235620
2015-04-23 18:50:14 +00:00
Tom Stellard
d9ca1f1596 configure: Add --enable-runtime-subnormal option
This makes it possible for runtime implementations to disable
subnormal handling at runtime.

When this flag is enabled, decisions about how to handle subnormals
in the library will be controlled by an external variable called
__CLC_SUBNORMAL_DISABLE.

Function implementations should use these new helpers for querying subnormal
support:
__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

In order for the library to link correctly with this feature,
users will be required to either:

1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs).

2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the
linker.  These files are distributed with liblclc and installed to
$(installdir).  e.g.:

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc

or

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc

If you do not supply the --enable-runtime-subnormal then the library
behaves the same as it did before this commit.

In addition to these changes, the patch adds helper functions that
should be used when implementing library functions that need
special handling for denormals:

__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

llvm-svn: 235329
2015-04-20 18:49:50 +00:00
Tom Stellard
da2969fca7 Implement atanh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234324
2015-04-07 16:20:22 +00:00
Tom Stellard
ca4d382e11 Implement acosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234323
2015-04-07 16:20:20 +00:00
Tom Stellard
03dc366e79 Implement atanpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233928
2015-04-02 17:01:58 +00:00
Tom Stellard
eea0997566 Implement asinpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233927
2015-04-02 17:01:56 +00:00
Tom Stellard
2b4ef39b2f Implement asinh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233926
2015-04-02 17:01:54 +00:00
Tom Stellard
084124a8fa Implement acospi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233925
2015-04-02 17:01:52 +00:00
Tom Stellard
1ded220cc0 Implement fmax using __builtin_fmax
This ensures correct handling of NaNi.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233713
2015-03-31 16:59:23 +00:00
Tom Stellard
310da7bfd2 Implement fmin using __builtin_fmin
This ensures correct handling of NaN.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233712
2015-03-31 16:59:21 +00:00
Tom Stellard
bd4da7a0ef Implement fast_distance builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232978
2015-03-23 18:10:04 +00:00
Tom Stellard
cb80e14f2c Implement fast_length builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232977
2015-03-23 18:10:02 +00:00
Tom Stellard
d2a1559846 Implement half_sqrt builtin v2
This is a generic implementation which just calls sqrt.  Targets should
override this if they want a faster implementation.

v2:
  - Alphabetize SOURCES

llvm-svn: 232965
2015-03-23 17:01:37 +00:00
Tom Stellard
551a669e80 Implement distance builtin v2
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove unnecessary copyright.

llvm-svn: 232964
2015-03-23 17:01:35 +00:00
Tom Stellard
cb1c0d7939 Fix implementation of length builtin v2
v2:
  - Move common code into a macro
  - Use the same constant for all vector types.

llvm-svn: 232963
2015-03-23 17:01:33 +00:00
Tom Stellard
8d3a4e3af2 Add __clc_ prefix to functions in sincos_helpers.cl
This will help avoid naming conflicts with functions defined in
kernels linking with libclc.

llvm-svn: 232960
2015-03-23 16:20:24 +00:00
Aaron Watry
2cf4d5f312 math: Implement erfc
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 232674
2015-03-18 21:52:07 +00:00
Tom Stellard
adfd96f742 Fix bitselect for float/double types v2
We need to reinterpret float/double types as uint/ulong in order to
perform the bitwise operations.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Use vector operations rather than splitting vectors into scalar
    components.

Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 231373
2015-03-05 15:31:05 +00:00
Aaron Watry
1314630ec3 Move mix from math to common
It has been part of the common functions since 1.0

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
2015-03-03 21:25:08 +00:00
Tom Stellard
9d0d374c5b Implement step builtin
This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 230970
2015-03-02 15:29:41 +00:00
Tom Stellard
1f28b14bba Implement smoothstep builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Fix typo in smoothstep.h

llvm-svn: 230969
2015-03-02 15:29:39 +00:00
Tom Stellard
f5e5b0171d Implement radians builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230968
2015-03-02 15:29:37 +00:00
Tom Stellard
8336b3a604 Implement degrees builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230967
2015-03-02 15:29:35 +00:00
Aaron Watry
f89bcca0b7 libclc/math: Add cospi
Ported from the libclc/amd-builtins branch

v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
    Add cospi(double) implementation instead of using llvm.cos

Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
2015-02-26 15:42:00 +00:00
Jan Vesely
51702e6e75 Implement log10
v2: Use constant and multiplication instead of division
v3: Use hex constants

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
2015-01-30 18:00:34 +00:00
Tom Stellard
0f39721261 Use amdgcn triple for SI+ GPUs
llvm-svn: 225296
2015-01-06 20:42:12 +00:00
Tom Stellard
24ea64e050 r600: get_work_dim: Update metadata syntax for LLVM 3.6
llvm-svn: 225042
2014-12-31 15:27:59 +00:00
Tom Stellard
1d77071e0d Require LLVM 3.6 and bump version to 0.1.0
Some functions are implemented using hand-written LLVM IR, and
LLVM assembly format is allowed to change between versions, so we
should require a specific version of LLVM.

llvm-svn: 225041
2014-12-31 15:27:53 +00:00
Jeroen Ketema
c9526139bc Remove wrong semi-colons
Patch by Alastair Donaldson

llvm-svn: 224568
2014-12-19 09:18:23 +00:00
Jeroen Ketema
7a22aebbda Don't include <stddef.h>
Including a standard or system header isn't allowed in OpenCL.

The type "size_t" needs to be explicitely defined now.

v2: Use __SIZE_TYPE__ instead of unsigned int.
v3: Define ptrdiff_t and NULL.

Patch-by: Jean-Sébastien Pédron
Reviewed-by: Jeroen Ketema
Reviewed-by: Jan Vesely
llvm-svn: 222235
2014-11-18 14:19:27 +00:00
NAKAMURA Takumi
729be14435 Prune CRLF.
llvm-svn: 220678
2014-10-27 12:37:26 +00:00
Jan Vesely
ae50c89589 r600: Fix get_work_dim range metadata
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220388
2014-10-22 14:32:53 +00:00
Jan Vesely
260827caa2 r600: Use llvm intrinsic to read work dimension information
v2: Fix function declaration
    Add range metadata to r600 implementation
v3: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
2014-10-15 15:08:06 +00:00
Tom Stellard
bf9f76fbe0 Implement log1p builtin
llvm-svn: 219230
2014-10-07 20:22:42 +00:00
Jan Vesely
8f64c3d842 Implement fmod
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
2014-10-05 20:24:52 +00:00
Tom Stellard
081e778d22 Implement async_work_group_copy builtin v3
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

v3:
  - Fix possible race condition by splitting the copy among multiple
    work items.

llvm-svn: 219008
2014-10-03 19:49:39 +00:00
Tom Stellard
ed5bbfdb1b Implement async_work_group_strided_copy builtin v2
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

llvm-svn: 219007
2014-10-03 19:49:37 +00:00
Tom Stellard
b5064f79ef Implement wait_group_events builtin v2
This is a simple default implemetation which just calls barrier().

v2:
  - Only call barrier() once.

llvm-svn: 219006
2014-10-03 19:49:34 +00:00
Jeroen Ketema
87d2ca57d7 Remove more redundant semi-colons
llvm-svn: 218039
2014-09-18 09:23:40 +00:00
Aaron Watry
db770b5bb5 atomic: undef macros that are included from atomic_decl.inc
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 217958
2014-09-17 15:27:39 +00:00
Jeroen Ketema
839b8a62d9 Remove redundant semi-colons
llvm-svn: 217954
2014-09-17 14:40:48 +00:00
Aaron Watry
f4133b8a10 R600: Map Address spaces for atomic_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217925
2014-09-16 22:34:59 +00:00
Aaron Watry
e210cae126 R600: Map address spaces for atomic_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217924
2014-09-16 22:34:58 +00:00
Aaron Watry
0545fa3fb0 R600: Map address spaces for atomic_min
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217923
2014-09-16 22:34:56 +00:00
Aaron Watry
dd754f4b33 R600: Map address spaces for atomic_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217922
2014-09-16 22:34:55 +00:00
Aaron Watry
ea32a57060 R600: Map addr spaces and use atomic_max
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217921
2014-09-16 22:34:53 +00:00
Aaron Watry
5ab82be926 R600: Map address spaces for atomic_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217920
2014-09-16 22:34:52 +00:00
Aaron Watry
348db3c666 R600: Map atomic_and address spaces
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217919
2014-09-16 22:34:51 +00:00
Aaron Watry
0d976ba497 atomic: Add generic atom[ic]_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217918
2014-09-16 22:34:49 +00:00
Aaron Watry
025d79ad6c atomic: Implement generic atom[ic]_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217917
2014-09-16 22:34:45 +00:00
Aaron Watry
7cfa12c2a5 atomic: Add generic atomic_min implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217916
2014-09-16 22:34:41 +00:00
Aaron Watry
3f0a1a4c27 atomic: Add generic atom[ic]_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217915
2014-09-16 22:34:36 +00:00
Aaron Watry
31e67d1cff atomic: Add atom[ic]_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217914
2014-09-16 22:34:32 +00:00
Aaron Watry
cc68405761 atomics: Add generic atom[ic]_and
Not used yet.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217913
2014-09-16 22:34:28 +00:00
Aaron Watry
49614fbfd9 atomic: Add generic implementation of atom[ic]_max
Not used yet...

v2: Correct int/uint behavior

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217912
2014-09-16 22:34:24 +00:00
Aaron Watry
c9b88d32be atomic: define extension functions for existing atomic implementations
We were missing the local versions of the atom_* before

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
2014-09-16 22:34:21 +00:00
Aaron Watry
947bdd059a math: Add tan implementation
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))

An alternative is:
tan(x) = sin(x) / cos(x)

Which produces more verbose bitcode and longer assembly.

Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
2014-09-10 15:43:35 +00:00
Aaron Watry
951ab64d19 math: Add asin implementation
asin(x) = atan2(x, sqrt( 1-x^2 ))

alternatively:
asin(x) = PI/2 - acos(x)

Use the atan2 implementation since it produces slightly shorter bitcode and
R600 machine code.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217510
2014-09-10 15:43:32 +00:00
Aaron Watry
268beab921 math: Add acos implementation
Passes the tests that were submitted to the piglit list

Tested on R600 (Pitcairn)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509
2014-09-10 15:43:29 +00:00
Jan Vesely
05a60b7ac3 add isordered builtin
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217247
2014-09-05 13:59:15 +00:00
Jan Vesely
63486c1f0e add isunordered builtin
v2: remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217246
2014-09-05 13:59:13 +00:00
Jan Vesely
41a0c491de add islessgreater builtin
v2: remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217245
2014-09-05 13:59:11 +00:00
Jan Vesely
369e20353c add isnormal builtin
v2: simplify and remove isnan leftovers
    remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217244
2014-09-05 13:59:09 +00:00
Jan Vesely
a5a3b023b4 add isfinite builtin
v2: simplify and remove isinf leftovers
    remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217243
2014-09-05 13:59:06 +00:00
Tom Stellard
7a9e2c6879 Implement isinf builtin
llvm-svn: 217046
2014-09-03 15:55:40 +00:00
Tom Stellard
d8a73abfc3 Fix implementation of copysign
This was previously implemented with a macro and we were using
__builtin_copysign(), which takes double inputs for the float
version of copysign().

Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217045
2014-09-03 15:55:38 +00:00
Jan Vesely
ef513d392b Implement generic mad_sat
v2: Fix trailing whitespace
    Fix signed long overflow
    improve comment

v3: fix typo

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 216923
2014-09-02 17:55:02 +00:00
Jan Vesely
62496142d5 configure: Add rpath to prepare-builtins util
v2: use space instead of '=' to make Mac happy

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 216922
2014-09-02 17:54:59 +00:00
Michel Danzer
7b77ab7b2c Fix build against LLVM SVN >= r216488
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 216654
2014-08-28 06:19:37 +00:00
Michel Danzer
a10b492ce3 Fix build against LLVM SVN >= r216393
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 216653
2014-08-28 06:19:33 +00:00
Aaron Watry
9447097636 Revert "Implement generic mad_sat"
This reverts commit cf62eded8b623a1c10d3692d25e5882b7939f564.

I didn't mean to commit this...  Jan has a v3 incoming

llvm-svn: 216322
2014-08-23 14:06:01 +00:00
Aaron Watry
a4fdda01b8 Add int3/uint3 to integer-gentype.inc
These were missing and caused mad24/mul24 with int3/uint3 arg type to fail

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216321
2014-08-23 14:04:36 +00:00
Aaron Watry
6bfac7ae69 Implement generic mad_sat
v2: Fix trailing whitespace
    Fix signed long overflow
    improve comment

Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
llvm-svn: 216320
2014-08-23 14:04:33 +00:00
Niels Ole Salscheider
109ce69e75 Include llvm-config.h instead of config.h
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 216296
2014-08-22 22:24:28 +00:00
Tom Stellard
b8478abd2e Add missing file from r216127
llvm-svn: 216128
2014-08-20 21:28:44 +00:00
Tom Stellard
2ad4243bf7 Implement prefetch builtin
The default implementation is a no-op.  Targets should override this
with their own implementations.

llvm-svn: 216127
2014-08-20 21:23:03 +00:00
Tom Stellard
425c181fb4 R600: Add aliases for hainan and mullins
llvm-svn: 216126
2014-08-20 21:23:01 +00:00
Aaron Watry
f991505d02 vload/vstore: Use casts instead of scalarizing everything in CLC version
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.

v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
2014-08-20 13:58:57 +00:00
Jan Vesely
12c660827e relational: Add islessequal(floatN) builtin
v2: remove the initial undef

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 214568
2014-08-01 21:50:59 +00:00
Jan Vesely
acba2c98eb relational: Add isless(floatN) builtin
v2: remove the initial undef

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 214567
2014-08-01 21:50:55 +00:00
Tom Stellard
903a78b7c6 Implement sin builtin for float types
This double version still uses @llvm.sin.

llvm-svn: 213762
2014-07-23 15:16:21 +00:00
Tom Stellard
c0ab2f81e3 Implement cos builtin for float types
The double version still uses @llvm.cos.

llvm-svn: 213761
2014-07-23 15:16:18 +00:00
Tom Stellard
f9caca8b9d Implement atan2 builtin
llvm-svn: 213760
2014-07-23 15:16:16 +00:00
Tom Stellard
47882923c7 Implement atan builtin
llvm-svn: 213759
2014-07-23 15:16:13 +00:00
Aaron Watry
9ef589e9cf Add several missing double constant definitions
These were present in CL 1.0, just not implemented yet.

v2: Use hex values and fix commit message

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 213321
2014-07-17 22:07:35 +00:00
Aaron Watry
d7f022a582 relational: Implement isnotequal
v2: Use relational macros instead of hand-rolled ones

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213320
2014-07-17 22:07:32 +00:00
Aaron Watry
30102536c0 relational: Implement isgreaterequal
v2: Use relational macros instead of hand-rolled macros

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213319
2014-07-17 22:07:27 +00:00
Aaron Watry
803a992f04 relational: Implement isgreater
v2: Use relational macros instead of hand-rolled macros

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213318
2014-07-17 22:07:19 +00:00
Aaron Watry
9335fe8eff relational/signbit: Refactor to use relational macros
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213317
2014-07-17 22:05:25 +00:00
Aaron Watry
d5aace4874 Fix isnan definition for vector results
Vector true is -1, not 1, which means we need to use the relational unary
macro instead of the normal unary builtin one.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213316
2014-07-17 22:05:22 +00:00
Aaron Watry
13116cf01a relational: create re-usable macros for relational declarations
relational.h includes relational macros for defining functions which need to
return 1 for scalar true and -1 for vector true.

I believe that this is the only place that this behavior is required, so the
macro is placed at its lowest useful level (same directory as it is used in).

This also creates re-usable unary/binary declaration and floatn includes which
should simplify relational builtin declarations.

Mostly patterned off of include/math/[binary_decl|unary_decl|floatn].inc
but with required changes for relational functions.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213315
2014-07-17 22:05:16 +00:00
Tom Stellard
8d92c396d9 prepare-builtins: Fix broken build due to recent LLVM API change
llvm-svn: 212470
2014-07-07 17:46:45 +00:00
Jeroen Ketema
575fb84cc3 OpenCL 1.1 does not define CL_VERSION_1_2 so use hardcoded number instead
Otherwise the test evaluates to true on OpenCL 1.1 and earlier. Since we
therefore cannot use the CL_VERSION_?_? macros move them to the proper
position in the top-level header.

llvm-svn: 211787
2014-06-26 15:26:38 +00:00
Aaron Watry
d7f158e006 relational: Fix signbit
The vector components were mistakenly using () instead of {}, which caused
all but the last vector component to be dropped on the floor.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 211733
2014-06-25 21:08:38 +00:00
Aaron Watry
d9ee196eab relational: Implement signbit
v2 Changes:
   - use __builtin_signbit instead of shifting by hand
   - significantly improve vector shuffling
   - Works correctly now for signbit(float16) on radeonsi

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 211696
2014-06-25 13:29:23 +00:00
Jeroen Ketema
42df5d2a8f Add exp10
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211680
2014-06-25 10:06:35 +00:00
Jeroen Ketema
dd1fbc0082 Add half limits
These are apparently only defined in OpenCL 1.2.

HALF_MAX, HALF_MIN and HALF_EPSILON are currently omitted. Clang does
not seem to support the ‘h’ suffix for half float constants even with
the cl_khr_fp16 extension enabled.

Reviewed-by: Tom Sellard <tom@stellard.net>
llvm-svn: 211579
2014-06-24 09:51:01 +00:00
Jeroen Ketema
046b47fbbe Introduce CLC_VERSION macros v2
Add these out-of-order in clc.h so we can use these in other headers.

v2: Take into account the lack of a definition in OpenCL 1.0

Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211578
2014-06-24 09:46:52 +00:00
Jeroen Ketema
985a1381b2 Add MAXFLOAT
Align definitions while we are here.

Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211577
2014-06-24 09:41:28 +00:00
Jeroen Ketema
526fe2d501 Move clcmacro.h to avoid cluttering user namespace v2
v2: - use quotes instead of <>
    - add include to r600/lib/math/nextafter.c changed

Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 211576
2014-06-24 09:36:32 +00:00
Jeroen Ketema
bfdb1c0c2f Protect functions taking double by #ifdef cl_khr_fp64
Also change the order of the functions to be consistent with
the order in the header files.

llvm-svn: 211496
2014-06-23 14:15:39 +00:00
Jeroen Ketema
d253e66ec7 Fix breakage after r211259
While we are here introduce the proper headers for the error code.

Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 211432
2014-06-21 09:20:31 +00:00