Commit Graph

315 Commits

Author SHA1 Message Date
Yaxun Liu 3cab24aa4f [OpenCL] Fix typos in emitted enqueue kernel function names
Two typos: 
vaarg => vararg
get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple

Differential Revision: https://reviews.llvm.org/D46601

llvm-svn: 331895
2018-05-09 17:07:06 +00:00
Anastasia Stulova 59055b94af [OpenCL] Add constant address space to __func__ in AST.
Added string literal helper function to obtain the type
attributed by a constant address space.

Also fixed predefind __func__ expr to use the helper
to constract the string literal correctly.

Differential Revision: https://reviews.llvm.org/D46049

llvm-svn: 331877
2018-05-09 13:23:26 +00:00
Erich Keane 14c1085317 Add Microsoft Mangling for OpenCL Half Type
Half-type mangling is accomplished following the method introduced by Erich 
Keane for mangling _Float16. Updated the half.cl LIT test to cover this 
particular case.

Patch By: vbridgers

Differential Revision: https://reviews.llvm.org/D46131

llvm-svn: 331263
2018-05-01 14:16:15 +00:00
Matt Arsenault d2da3c20d7 AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

llvm-svn: 331216
2018-04-30 19:08:27 +00:00
Sven van Haastregt 4700faa28e [OpenCL] Add separate read_only and write_only pipe IR types
SPIR-V encodes the read_only and write_only access qualifiers of pipes,
so separate LLVM IR types are required to target SPIR-V.  Other backends
may also find this useful.

These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
replace `opencl.pipe_t`.

This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...)
which took a read_only pipe with separate versions for read_only and
write_only pipes, namely:

 * __get_pipe_num_packets_ro(...)
 * __get_pipe_num_packets_wo(...)
 * __get_pipe_max_packets_ro(...)
 * __get_pipe_max_packets_wo(...)

These separate versions exist to avoid needing a bitcast to one of the
two qualified pipe types.

Patch by Stuart Brady.

Differential Revision: https://reviews.llvm.org/D46015

llvm-svn: 331026
2018-04-27 10:37:04 +00:00
Hans Wennborg a417362c28 Fix some tests that were failing on Windows
llvm-svn: 330441
2018-04-20 15:33:44 +00:00
Alexey Sotkin 3858e26f22 [OpenCL] Add 'denorms-are-zero' function attribute
Summary:
Generate attribute 'denorms-are-zero'='true' if '-cl-denorms-are-zero'
compile option was specified and 'denorms-are-zero'='false' otherwise.

Patch by krisb

Reviewers: Anastasia, yaxunl

Reviewed By:  yaxunl 

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D45808

llvm-svn: 330404
2018-04-20 08:08:04 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Matt Arsenault b130ea5605 AMDGPU: Update datalayout for stack alignment
llvm-svn: 328657
2018-03-27 19:26:51 +00:00
Yaxun Liu ac1263cd54 [AMDGPU] Fix codegen for inline assembly
Need to override convertConstraint to recognise amdgpu specific register names.

Differential Revision: https://reviews.llvm.org/D44533

llvm-svn: 328359
2018-03-23 19:43:42 +00:00
Tony Tye 68e11a6eca [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU (CLANG)
Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue.

Differential Revision: https://reviews.llvm.org/D44696

llvm-svn: 328350
2018-03-23 18:51:45 +00:00
Tony Tye 1a3f3a2d14 [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU (CLANG)
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use a function attribute to communicate to the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D43735

llvm-svn: 328347
2018-03-23 18:43:15 +00:00
Yaxun Liu 5b330e8d61 Recommit r326946 after reducing CallArgList memory footprint
llvm-svn: 327634
2018-03-15 15:25:19 +00:00
Richard Smith 007cb6df58 Revert r326946. It caused stack overflows by significantly increasing the size of a CallArgList.
llvm-svn: 327195
2018-03-10 01:47:22 +00:00
Yaxun Liu 06dd81149f CodeGen: Fix address space of indirect function argument
The indirect function argument is in alloca address space in LLVM IR. However,
during Clang codegen for C++, the address space of indirect function argument
should match its address space in the source code, i.e., default addr space, even
for indirect argument. This is because destructor of the indirect argument may
be called in the caller function, and address of the indirect argument may be
taken, in either case the indirect function argument is expected to be in default
addr space, not the alloca address space.

Therefore, the indirect function argument should be mapped to the temp var
casted to default address space. The caller will cast it to alloca addr space
when passing it to the callee. In the callee, the argument is also casted to the
default address space and used.

CallArg is refactored to facilitate this fix.

Differential Revision: https://reviews.llvm.org/D34367

llvm-svn: 326946
2018-03-07 21:45:40 +00:00
Yaxun Liu cb35e9fa94 [OpenCL] Remove block invoke function from emitted block literal struct
OpenCL runtime tracks the invoke function emitted for
any block expression. Due to restrictions on blocks in
OpenCL (v2.0 s6.12.5), it is always possible to know the
block invoke function when emitting call of block expression
or __enqueue_kernel builtin functions. Since __enqueu_kernel
already has an argument for the invoke function, it is redundant
to have invoke function member in the llvm block literal structure.

This patch removes invoke function from the llvm block literal
structure. It also removes the bitcast of block invoke function
to the generic block literal type which is useless for OpenCL.

This will save some space for the kernel argument, and also
eliminate some store instructions.

Differential Revision: https://reviews.llvm.org/D43783

llvm-svn: 326937
2018-03-07 19:32:58 +00:00
Rafael Espindola 922f2aa9b2 Bring r325915 back.
The tests that failed on a windows host have been fixed.

Original message:

Start setting dso_local for COFF.

With this there are still some GVs where we don't set dso_local
because setGVProperties is never called. I intend to fix that in
followup commits. This is just the bare minimum to teach
shouldAssumeDSOLocal what it should do for COFF.

llvm-svn: 325940
2018-02-23 19:30:48 +00:00
Alexey Sotkin 20f65928e1 [OpenCL] Add '-cl-uniform-work-group-size' compile option
Summary:
OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option,
which requires that the global work-size be a multiple of the work-group
size specified to clEnqueueNDRangeKernel and allows optimizations that
are made possible by this restriction.

The patch introduces the support of this option.

To keep information about whether an OpenCL kernel has uniform work
group size or not, clang generates 'uniform-work-group-size' function
attribute for every kernel:
- "uniform-work-group-size"="true" for OpenCL 1.2 and lower,
- "uniform-work-group-size"="true" for OpenCL 2.0 and higher if
 '-cl-uniform-work-group-size' option was specified,
- "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no
 '-cl-uniform-work-group-size' options was specified.

If the function is not an OpenCL kernel, 'uniform-work-group-size'
attribute isn't generated.

Patch by: krisb

Reviewers: yaxunl, Anastasia, b-sumner

Reviewed By: yaxunl, Anastasia

Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits

Differential Revision: https://reviews.llvm.org/D43570

llvm-svn: 325771
2018-02-22 11:54:14 +00:00
Yaxun Liu f8ad59d99d Clean up AMDGCN tests
Differential Revision: https://reviews.llvm.org/D43340

llvm-svn: 325279
2018-02-15 19:12:41 +00:00
Yaxun Liu fa13d015a3 [OpenCL] Fix __enqueue_block for block with captures
The following test case causes issue with codegen of __enqueue_block

void (^block)(void) = ^{ callee(id, out); };

enqueue_kernel(queue, 0, ndrange, block);
Clang first does codegen for block expression in the first line and deletes its block info.
Clang then tries to do codegen for the same block expression again for the second line,
and fails because the block info is gone.

The fix is to do normal codegen for both lines. Introduce an API to OpenCL runtime to
record llvm block invoke function and llvm block literal emitted for each AST block
expression, and use the recorded information for generating the wrapper kernel.

The EmitBlockLiteral APIs are cleaned up to minimize changes to the normal codegen
of blocks.

Another minor issue is that some clean up AST expression is generated for block
with captures, which can be stripped by IgnoreImplicit.

Differential Revision: https://reviews.llvm.org/D43240

llvm-svn: 325264
2018-02-15 16:39:19 +00:00
Yaxun Liu 651bd73c02 [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43171

llvm-svn: 325031
2018-02-13 18:01:21 +00:00
Matt Arsenault e7da136a74 AMDGPU: Update for datalayout change
llvm-svn: 324748
2018-02-09 16:58:41 +00:00
Matt Arsenault 935574a490 Fix crash on array initializer with non-0 alloca addrspace
llvm-svn: 324641
2018-02-08 19:37:09 +00:00
Daniil Fukalov da2a0558ea Recommit rL323890: [AMDGPU] Add ds_fadd, ds_fmin, ds_fmax builtins functions
Fixed asserts in tests.

llvm-svn: 324201
2018-02-04 22:32:07 +00:00
Yaxun Liu f5f45e5e63 [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding llvm change.

Differential Revision: https://reviews.llvm.org/D40956

llvm-svn: 324102
2018-02-02 16:08:24 +00:00