Summary:
We rely on these flags in the runtime and to print the contents of
binaries correctly. CUDA updated their ABI encoding recently and we
didn't handle that. It's a new ABI entirely, so we just select on it
when it shows up.
Fixes: https://github.com/llvm/llvm-project/issues/148703
[LLVM] Fix offload and update CUDA ABI for all SM values (#159354)
Summary:
Turns out the new CUDA ABI now applies retroactively to all the other
SMs if you upgrade to CUDA 13.0. This patch changes the scheme, keeping
all the SM flags consistent but using an offset.
Fixes: https://github.com/llvm/llvm-project/issues/159088
This is a hotfix for #148615 - it fixes the issue for me locally.
I think a broader issue is that in the test environment we're calling
olShutDown from a global destructor in the test binaries. We should do
something more controlled: either calling olInit/olShutDown in every
test, or moving those to a GTest global environment (see the sketch
below). I didn't do that originally because it looked like it needed
changes to LLVM's GTest wrapper.
Add `OffloadDeviceTest::getPlatformBackend()` and use it to skip event
tests which currently fail on AMDGPU due to:
```
OL_ERRC_UNIMPLEMENTED: synchronize event not implemented
```
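For illustration, the skip looks roughly like this in a test body; the fixture name and the `OL_PLATFORM_BACKEND_AMDGPU` enum value are assumptions here, only `getPlatformBackend()` comes from this change:
```cpp
#include <gtest/gtest.h>

// Sketch of an event test bailing out early on the unimplemented backend.
TEST_F(olSyncEventTest, SuccessSynchronize) {
  if (getPlatformBackend() == OL_PLATFORM_BACKEND_AMDGPU)
    GTEST_SKIP() << "synchronize event not implemented on AMDGPU";
  // ... exercise event synchronization here ...
}
```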
`olGetKernel` has been replaced by `olGetSymbol` which accepts a
`Kind` parameter. As well as loading information about kernels, it
can now also load information about global variables.
In the future, we want `ol_symbol_handle_t` to represent both kernels
and global variables. The first step in this process is a rename and a
promotion to a "typed handle".
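A rough usage sketch of the replacement entry point; the header name, parameter order, and kind enumerator spellings are assumptions based on the description above:
```cpp
#include <OffloadAPI.h> // assumed header for the liboffload C API

// Look up a kernel and a global variable from an already-created program handle.
void lookupSymbols(ol_program_handle_t Program) {
  ol_symbol_handle_t Kernel = nullptr;
  olGetSymbol(Program, "foo", OL_SYMBOL_KIND_KERNEL, &Kernel);

  ol_symbol_handle_t Global = nullptr;
  olGetSymbol(Program, "global_counter", OL_SYMBOL_KIND_GLOBAL_VARIABLE, &Global);
}
```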
The `GlobalTy` helper has been extended to make both the Size and Ptr
optional. Now `getGlobalMetadataFromDevice`/`Image` can write the size
of the global to the struct instead of just verifying it.
* Add spec generation to offload-tblgen tool
* This patch adds generation of Sphinx-compatible reStructuredText,
utilizing the C domain, to document the Offload API directly from the
spec definition `.td` files.
* Add Sphinx HTML documentation target
* Introduces the `docs-offload-html` target when CMake is configured
with `LLVM_ENABLE_SPHINX=ON` and `SPHINX_OUTPUT_HTML=ON`. It uses
`offload-tblgen -gen-spec` to generate the Offload API specification docs.
Add info queries for queues and events.
`olGetQueueInfo` only supports getting the associated device. We were
already tracking this, so we can implement it for free. We will likely
add other queries in the future (whether the queue is empty, what
flags it was created with, etc.).
`olGetEventInfo` only supports getting the associated queue. This is
another thing we were already storing in the handle. We'll be able to
add other queries in the future (the event type, status, etc.).
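A sketch of the two new queries, assuming the usual liboffload getInfo convention (property enum plus a sized output buffer); the header name and exact signatures are assumptions:
```cpp
#include <OffloadAPI.h> // assumed header for the liboffload C API

// Query the device a queue was created on, then the queue an event belongs to.
void inspect(ol_queue_handle_t Queue, ol_event_handle_t Event) {
  ol_device_handle_t Device = nullptr;
  olGetQueueInfo(Queue, OL_QUEUE_INFO_DEVICE, sizeof(Device), &Device);

  ol_queue_handle_t OwningQueue = nullptr;
  olGetEventInfo(Event, OL_EVENT_INFO_QUEUE, sizeof(OwningQueue), &OwningQueue);
}
```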
Adds two "launch kernel" tests for lib offload, one testing that
global memory works and persists between different kernels, and one
verifying that `[[gnu::constructor]]` works correctly.
Since we now have tests that contain multiple kernels in the same
binary, the test framework has been updated a bit.
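Roughly, the device side of the new tests looks like the following; the names and the `KERNEL` annotation are hypothetical stand-ins for whatever kernel attribute the test framework actually uses:
```cpp
// Sketch of the device code the two tests cover: a global persists across
// launches, and a [[gnu::constructor]] runs before any kernel does.
static int Counter;

[[gnu::constructor]] static void initCounter() { Counter = 100; }

KERNEL void increment() { ++Counter; }                  // first kernel mutates the global
KERNEL void readCounter(int *Out) { *Out = Counter; }   // second kernel observes the value
```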
The Offload and Flang-RT runtimes had the ability to compile GTest
themselves. But in bootstrapping builds, LLVM_LIBRARY_OUTPUT_INTDIR
points to the same location as the stage1 build. If both are building
GTest, they overwrite each other's `libllvm_gtest.a` and
`libllvm_gtest_main.a`, which causes #143134.
This PR removes the ability for the Offload/Flang-RT runtimes to build
their own GTest and instead relies on the stage1 build of GTest. This
was already the case with LLVM_INSTALL_GTEST=ON configurations. For
LLVM_INSTALL_GTEST=OFF configurations, we now also export gtest into the
build tree configuration. Ultimately, this reduces the combinatorial
explosion of configurations in which the unittests could be built
(LLVM_INSTALL_GTEST=ON, GTest built by Offload, GTest built by Flang-RT,
GTest built by Offload and also used by Flang-RT).
GTest and therefore Offload/Runtime unittests will not be available if
the runtimes are configured against an LLVM install tree. Since llvm-lit
isn't available in the install tree either, it doesn't matter.
Note that compiler-rt and libc also use GTest in non-default
configurations. libc also depends on LLVM's GTest build (and would
error out if unavailable), but compiler-rt builds it completely
differently.
Fixes #143134
This is a generated file which contains a macro for all Device Info
keys. This is visible to the plugin interface so that it can use the
definitions in a future patch.
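As an illustration of the pattern (not the actual generated contents), such a header typically exposes an X-macro that the plugin interface can expand; the macro and key names below are hypothetical:
```cpp
// Hypothetical stand-in for the generated header: one entry per Device Info key.
#define OFFLOAD_DEVICE_INFO_KEYS(X)                                            \
  X(NAME)                                                                      \
  X(VENDOR)                                                                    \
  X(DRIVER_VERSION)

// A plugin can expand the list however it likes, e.g. to build an enum of keys...
enum class DeviceInfoKey {
#define MAKE_ENUM(Key) Key,
  OFFLOAD_DEVICE_INFO_KEYS(MAKE_ENUM)
#undef MAKE_ENUM
};

// ...or to count the keys at compile time.
#define COUNT_KEY(Key) +1
constexpr int NumDeviceInfoKeys = 0 OFFLOAD_DEVICE_INFO_KEYS(COUNT_KEY);
#undef COUNT_KEY
```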
The `unloadBinaryImpl` method on the host plugin is now implemented
properly (rather than just being a stub). When an image is unloaded,
it is deallocated and the library associated with it is closed.
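A rough sketch of what the host-plugin unload path now has to do; this is just the shape of it, not the actual plugin code:
```cpp
#include <cstdlib>
#include <dlfcn.h>

// Minimal model of a loaded host image: the buffer allocated for it and the
// dlopen() handle created when it was loaded.
struct HostImage {
  void *Buffer = nullptr;
  void *DLHandle = nullptr;
};

static void unloadImage(HostImage &Img) {
  if (Img.DLHandle)
    dlclose(Img.DLHandle); // close the library associated with the image
  std::free(Img.Buffer);   // deallocate the image itself
  Img = {};
}
```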
The output of the compile-and-run tests is incorrect. These will be used
for reference in future commits that resolve the issues.
Also updated the existing clang LIT test,
target_map_both_pointer_pointee_codegen.cpp, with more constructs and
fewer CHECKs (through more update_cc_test_checks filters).
After #146345 the device info implementation requires a value for every
query, rather than silently returning an empty string. This broke the
test for `OL_DEVICE_INFO_VENDOR` on CUDA.
Add a value to the CUDA plugin. We can quite safely hard code this one.
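Conceptually the fix amounts to something like this (sketch, not the exact plugin code; the function name and returned string are assumptions):
```cpp
#include <string>

// CUDA devices only ever have one vendor, so a literal answer is safe.
std::string getDeviceVendor() { return "NVIDIA"; }
```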
Previously, the user was not able to use more than 48 KB of dynamic
shared memory on NVIDIA GPUs. Doing so requires setting the function
attribute `CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES`, which was
not present in the code base. With this commit, we add the ability to
set this attribute, allowing the user to utilize the full power of
their GPU.
In order to avoid resetting the function attribute for each launch of
the same kernel, we keep track of the maximum memory limit (in the
variable `MaxDynCGroupMemLimit`) and only set the attribute if our
desired amount exceeds the limit. By default, this limit is set to 48
KB.
Feedback is greatly appreciated, especially around making the new
variable mutable. I did this because the `launchImpl` method is const
and I am not able to modify the variable otherwise.
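A minimal sketch of the caching logic, using the CUDA driver API; the surrounding struct and function are hypothetical, while `cuFuncSetAttribute` and the attribute name are real driver entry points:
```cpp
#include <cuda.h>

struct KernelState {
  // Mutable so the cached limit can be updated from the const launchImpl path.
  mutable int MaxDynCGroupMemLimit = 48 * 1024; // driver default: 48 KB
};

static CUresult ensureDynSharedMem(const KernelState &State, CUfunction Func,
                                   int RequestedBytes) {
  // Only touch the attribute when the request exceeds the cached limit.
  if (RequestedBytes <= State.MaxDynCGroupMemLimit)
    return CUDA_SUCCESS;

  CUresult Res = cuFuncSetAttribute(
      Func, CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, RequestedBytes);
  if (Res == CUDA_SUCCESS)
    State.MaxDynCGroupMemLimit = RequestedBytes; // remember the raised limit
  return Res;
}
```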
---------
Co-authored-by: Giorgi Gvalia <ggvalia@login33.chn.perlmutter.nersc.gov>
Co-authored-by: Giorgi Gvalia <ggvalia@login07.chn.perlmutter.nersc.gov>