This can be used by all test users. It avoids duplicated code and lets us
handle all known issues in one place.
There is a Docker bug that causes restore to fail sporadically; see
https://github.com/moby/moby/issues/42900. It has been broken since at least
Docker v19.03.12 (when the issue was reported) and was fixed in v25.0.4. Add
handling for this issue.
Also remove the testutil.Poll() around restore, since retrying can hide gVisor
restore flakiness. The Poll() was added in 0990ef7517 ("Make
checkpoint/restore e2e test less flaky"); this change restores the original sleep.
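One way a shared helper could decide whether the moby/moby#42900 handling applies is to compare the Docker daemon's version against v25.0.4, the release this message names as containing the fix. This is a minimal sketch assuming the version string is already available (for example from `docker version --format '{{.Server.Version}}'`); `needsRestoreWorkaround` and `parseVersion` are hypothetical names, not part of the gVisor test utilities.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fixedIn is the first Docker release containing the fix for moby/moby#42900.
var fixedIn = [3]int{25, 0, 4}

// parseVersion parses "MAJOR.MINOR.PATCH", tolerating a leading "v" and a
// non-numeric suffix on a component (e.g. "12-ce" is read as 12).
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	s = strings.TrimPrefix(s, "v")
	parts := strings.SplitN(s, ".", 3)
	if len(parts) != 3 {
		return v, fmt.Errorf("unexpected version %q", s)
	}
	for i, p := range parts {
		// Keep only the leading digits of each component.
		end := 0
		for end < len(p) && p[end] >= '0' && p[end] <= '9' {
			end++
		}
		n, err := strconv.Atoi(p[:end])
		if err != nil {
			return v, fmt.Errorf("unexpected version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// needsRestoreWorkaround reports whether dockerVersion predates the fix.
func needsRestoreWorkaround(dockerVersion string) (bool, error) {
	v, err := parseVersion(dockerVersion)
	if err != nil {
		return false, err
	}
	for i := range v {
		if v[i] != fixedIn[i] {
			return v[i] < fixedIn[i], nil
		}
	}
	return false, nil // Exactly the fixed release.
}

func main() {
	for _, ver := range []string{"19.03.12", "24.0.7", "25.0.4", "26.1.0"} {
		affected, err := needsRestoreWorkaround(ver)
		fmt.Println(ver, affected, err)
	}
}
```

Keeping the version gate in the shared helper would mean newer Docker releases exercise restore without the workaround, so a regression there is not masked.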
PiperOrigin-RevId: 734303878
Add an image for ARM workloads to cuda-tests, and mark the tests that work on ARM.
Most tests don't work due to cross-compiling between sbsa and aarch64.
However, a few do, so add an image to support them.
While this test does use P2P, it also has fallback code that performs
regular memory copies when P2P is not available, so it does not require the
P2P capability.
PiperOrigin-RevId: 714344567
COS can take a day or two to release a new version. While this is happening,
new versions can appear in gcloud projects but not yet on the public site. In
that case, pass the tests so that we don't see failures while the release is
in progress.
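The pass-while-pending rule amounts to a set difference between what gcloud projects show and what the public site lists. This is a minimal sketch assuming the test has already collected both lists; `classify` and the `verdict` constants are hypothetical names, and the version strings are placeholders.

```go
package main

import "fmt"

// verdict is what the (hypothetical) test does with one COS version.
type verdict int

const (
	checkDrivers verdict = iota // Listed publicly: run the normal driver checks.
	passPending                 // Only visible in gcloud: release in progress, pass.
)

// classify maps each version seen in gcloud projects to a verdict, based on
// whether the public release-notes site already lists it.
func classify(gcloudVersions []string, published map[string]bool) map[string]verdict {
	out := make(map[string]verdict, len(gcloudVersions))
	for _, v := range gcloudVersions {
		if published[v] {
			out[v] = checkDrivers
		} else {
			out[v] = passPending
		}
	}
	return out
}

func main() {
	// Placeholder version strings, not real COS releases.
	got := classify([]string{"cos-a", "cos-b"}, map[string]bool{"cos-a": true})
	fmt.Println(got["cos-a"] == checkDrivers, got["cos-b"] == passPending) // true true
}
```

Once the public site catches up, the same version moves from `passPending` to `checkDrivers` without any code change.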
PiperOrigin-RevId: 713388040
This test was broken by 5e6589e0b7 ("Update CUDA test compatibility to keep
up with added gVisor support."), which requires all
images/gpu/cuda-tests/run_sample.go users to specify "all" driver capabilities.
PiperOrigin-RevId: 713170590
These CUDA tests were initially broken in gVisor but now appear to pass.
The test now also verifies that all capabilities are enabled when running.
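Checking that all capabilities are enabled comes down to parsing the comma-separated `NVIDIA_DRIVER_CAPABILITIES` value, where `all` enables every capability. This sketch assumes the capability names documented by the NVIDIA Container Toolkit; `missingCapabilities` is a hypothetical helper, not the test's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// allCapabilities are the individual values the NVIDIA Container Toolkit
// documents for NVIDIA_DRIVER_CAPABILITIES; "all" enables every one of them.
var allCapabilities = []string{"compute", "compat32", "graphics", "utility", "video", "display"}

// missingCapabilities returns, in allCapabilities order, the capabilities that
// a comma-separated NVIDIA_DRIVER_CAPABILITIES value does not enable.
func missingCapabilities(value string) []string {
	enabled := make(map[string]bool)
	for _, c := range strings.Split(value, ",") {
		c = strings.TrimSpace(c)
		if c == "all" {
			return nil // Everything is enabled.
		}
		if c != "" {
			enabled[c] = true
		}
	}
	var missing []string
	for _, c := range allCapabilities {
		if !enabled[c] {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	fmt.Println(missingCapabilities("all"))             // []
	fmt.Println(missingCapabilities("compute,utility")) // [compat32 graphics video display]
}
```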
PiperOrigin-RevId: 713094806
This refreshes the models built into the image with a more diverse set,
while keeping the same categories covered.
It also adds support for embedding generation and benchmark metrics for
embedding-type models.
The image is also (slightly) smaller, which helps keep benchmarks from taking
forever.
PiperOrigin-RevId: 706594537
Our current check of COS drivers often lags behind COS releases.
This is because GPU docker images must be preloaded onto the
images that run in our CI pipelines.
In addition, COS is more complex than originally thought: it releases
driver versions across both GPU types and release branches.
Thus, this test searches the latest COS image in each family for
new drivers. It does this by reading COS's published release notes,
which include a proto of the LATEST/DEFAULT drivers selected for each device.
This flags new versions faster, and with more coverage, than our
current CI pipeline. Because it does not need a GPU to run, it can
run on any VM.
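After the release-notes proto has been parsed, flagging new drivers reduces to collecting every LATEST/DEFAULT version that is not already supported. This sketch assumes parsing is done elsewhere and the result is a map from image to device to labels; `driverLabels`, `newDrivers`, and all image, device, and version strings are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// driverLabels holds the LATEST and DEFAULT driver versions that a COS image's
// release notes select for one GPU device. The field names are hypothetical.
type driverLabels struct {
	Latest  string
	Default string
}

// newDrivers returns every LATEST or DEFAULT driver version found across the
// scanned images that is not in the supported set, deduplicated and sorted
// lexicographically for stable output.
func newDrivers(images map[string]map[string]driverLabels, supported map[string]bool) []string {
	seen := make(map[string]bool)
	var out []string
	for _, devices := range images { // image name -> device -> labels
		for _, labels := range devices {
			for _, v := range []string{labels.Latest, labels.Default} {
				if v == "" || supported[v] || seen[v] {
					continue
				}
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative input; image, device, and version strings are placeholders.
	images := map[string]map[string]driverLabels{
		"image-1": {"T4": {Latest: "550.1", Default: "535.1"}},
		"image-2": {"L4": {Latest: "560.1", Default: "535.1"}},
	}
	supported := map[string]bool{"535.1": true}
	fmt.Println(newDrivers(images, supported)) // [550.1 560.1]
}
```

Because this only reads release metadata, nothing in it touches a GPU, which matches the point that the test can run on any VM.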
PiperOrigin-RevId: 693736100
The presubmit pipeline does not exercise these tests, so commit
a689c11a76 broke them without it being caught.
This change disables the sniffer on all the non-smoke tests to unbreak
the release pipeline. I will then send another change to re-enable them
on tests where the sniffer works fine after manual testing.
Updates #10885
PiperOrigin-RevId: 673555026
This wraps the command line of every GPU test with the nvproxy ioctl sniffer.
This has multiple functions:
- Verifying that the application does not call ioctls unsupported by
nvproxy. This is controlled by an `AllowIncompatibleIoctl` option, which
is initially set to `true` in all tests to mirror current behavior, but
should be flipped as we verify that they do not call unsupported ioctls.
- Verifying that the sniffer itself works transparently for a wide range
of applications.
- Later down the line, enforcing that the application only calls ioctls
that are part of GPU capabilities that it has a need for. This is
controlled by a capability string which is currently only used to set
the `NVIDIA_DRIVER_CAPABILITIES` environment variable.
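The wrapping described in the list above can be sketched as prefixing each test's argv with the sniffer and passing the capability string through the environment. This assumes the sniffer is invoked as a prefix command and that disallowing incompatible ioctls maps to a single enforcement flag. `wrapCommand`, the `Capabilities` field, the binary path, and the `--enforce_compatibility` flag are hypothetical; only `AllowIncompatibleIoctl`, `run_sniffer`, and `NVIDIA_DRIVER_CAPABILITIES` come from these messages.

```go
package main

import (
	"fmt"
	"strings"
)

// sniffOptions mirrors the two per-test knobs described above.
// AllowIncompatibleIoctl comes from the commit message; Capabilities is a
// hypothetical field name for the capability string.
type sniffOptions struct {
	AllowIncompatibleIoctl bool
	Capabilities           string // e.g. "compute,utility"
}

// Hypothetical sniffer location and enforcement flag, for illustration only.
const (
	snifferPath = "/usr/local/bin/run_sniffer"
	enforceFlag = "--enforce_compatibility"
)

// wrapCommand prefixes a test's argv with the sniffer and returns the
// environment entry that carries the capability string.
func wrapCommand(argv []string, opts sniffOptions) (wrapped, env []string) {
	wrapped = []string{snifferPath}
	if !opts.AllowIncompatibleIoctl {
		// Ask the sniffer to fail on ioctls that nvproxy does not support.
		wrapped = append(wrapped, enforceFlag)
	}
	wrapped = append(wrapped, "--") // End of sniffer flags; the test command follows.
	wrapped = append(wrapped, argv...)
	env = []string{"NVIDIA_DRIVER_CAPABILITIES=" + opts.Capabilities}
	return wrapped, env
}

func main() {
	w, e := wrapCommand([]string{"nvidia-smi"}, sniffOptions{AllowIncompatibleIoctl: true, Capabilities: "compute,utility"})
	fmt.Println(strings.Join(w, " ")) // /usr/local/bin/run_sniffer -- nvidia-smi
	fmt.Println(e[0])                 // NVIDIA_DRIVER_CAPABILITIES=compute,utility
}
```

Defaulting `AllowIncompatibleIoctl` to `true` keeps today's behavior, and flipping it per test only adds the enforcement flag, so tests can be tightened one at a time.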
Updates issue #10856
PiperOrigin-RevId: 672714520
Previously, run_sniffer was not correctly reporting when unsupported ioctls
were found with the compatibility flag set. At the same time, the sniffer test
was not correctly testing a supported CUDA program, since it was using
run_sample, which is currently broken with the sniffer.
This also adds the sniffer test to the list of GPU tests.
PiperOrigin-RevId: 663911778