595 Commits

Author SHA1 Message Date
Jing Chen 63f6dd704d Install CNI plugins from the containerd repo.
Remove the legacy settings; the minimum supported CNI version is 0.4.0.

PiperOrigin-RevId: 736255397
2025-03-12 13:57:52 -07:00
Ayush Ranjan 138e98fb7d nvproxy: Refactor DriverVersion out to nvconf package.
This allows runsc to use DriverVersion without having to depend on the
entirety of nvproxy.

PiperOrigin-RevId: 733912696
2025-03-05 16:43:03 -08:00
Jing Chen 521cc86a91 Use the minimum recommended version of cri-tools for containerd tests.
PiperOrigin-RevId: 733575043
2025-03-04 21:34:30 -08:00
gVisor bot cd3cec20c6 Replace outdated select() on --cpu with platform API equivalent.
PiperOrigin-RevId: 733519356
2025-03-04 17:18:31 -08:00
gVisor bot e0435b9a53 Merge pull request #11415 from avagin:codespell
PiperOrigin-RevId: 721421397
2025-01-30 09:44:28 -08:00
Andrei Vagin f010ae01ac Fix a few typos
2025-01-29 21:16:51 -08:00
gVisor bot 8bdf76c5ca Internal change.
PiperOrigin-RevId: 721019932
2025-01-29 10:13:52 -08:00
Etienne Perot 3649ca9d9e On COS, add NVIDIA library directory to LD configuration and update cache.
Unlike Ubuntu VMs where we use Docker's `--gpus` flag, COS VMs do not use
this flag and instead mount the NVIDIA library directories automatically.
However, nothing guarantees that these directories are added to the LD
config. This change fixes that. It takes advantage of the fact that all GPU
tests have the sniffer binary as entrypoint, which slightly overloads the
role of the sniffer within the GPU test infrastructure... but then again
the ioctl sniffer is already deeply intertwined with ld configuration
because it already overrides the `ioctl` libc function, so this doesn't
seem like too big of a stretch.

This change makes the ffmpeg tests succeed with `runc` on COS, but they still
fail with gVisor (with `CUDA_ERROR_OUT_OF_MEMORY` errors). So there must be
some further gVisor-specific error.

Updates #11351
Updates #11321

PiperOrigin-RevId: 715222952
2025-01-13 21:13:14 -08:00
Etienne Perot 3f0c7ccf85 PGO: Add make target to refresh profiles for PGO.
This runs all benchmarks tagged as PGO-enabled, of which there is
currently just one for simplicity (the ffmpeg benchmark). All other
benchmarks are initially tagged out of PGO. I will send a different
change to enroll other benchmarks in PGO.

The make target runs each such benchmark and gathers profiles for each
benchmark. Multiple profile files for multiple runs of the same benchmark
are merged into one, then compared against the existing checked-in profile
used for PGO builds (which right now doesn't exist). If such a profile
doesn't exist or differs widely from the freshly-collected profile, then
this new profile is copied into the repository.

Such profiles are not used at all in builds yet; this is just the glue
that keeps them fresh in the repo.

PiperOrigin-RevId: 714311978
2025-01-10 19:44:15 -08:00
Etienne Perot 501618dbe2 Modify make rules to allow using a fully-local build cache.
This includes the ability to force using local images only (do not check
for updated manifests), and to explicitly mount and specify external caches
for Go repositories via `rules_go`'s `GO_REPOSITORY_USE_HOST_MODCACHE`.

PiperOrigin-RevId: 713473497
2025-01-09 12:23:53 -08:00
Ayush Ranjan b11efeaecd nvproxy: Clean up struct field tags.
Before this change, there were 2 places in which the driver struct names were
defined for nvproxy structs:
1. As struct field tags. The first field of structs had a tag `nvproxy:*`. This
   was kind of awkward. Such metadata is usually a struct comment.
2. In version.go while registering the struct with a name. Not all structs are
   defined in nvproxy (for example simple structs). In such cases, the driver
   struct name is directly assigned while registering struct info.

This change gets rid of (1). Most of the struct tags were `nvproxy:"same"`. Now
driverStructs() always infers the driver struct name from the nvproxy struct
name itself. The few cases where the nvproxy tag was needed, because the driver
struct name was lower-cased, are handled by driverStructWithName(), which
allows specifying a different name. Now all driver struct name definitions are
in one place.

Along the way, also made the following fixes:
- For some reason, many structs defined in pkg/abi/nvgpu/frontend.go had
  camel-cased naming, while all other structs in pkg/abi/nvgpu/ctrl.go and
  pkg/abi/nvgpu/classes.go were named the same as their driver structs. The
  convention in the abi/* packages is to follow the kernel source naming.
  This is against the Go style guide, but is more readable for gVisor purposes and
  has been a long accepted convention. This also makes the task of removing (1)
  easier. So renamed all such structs as per their driver names.
- A lot of code in pkg/sentry/devices/nvproxy/version.go was still referring to
  driver struct info as "struct names", even though it was representing more
  than just struct names. It also contains the reflect.Type of the struct which
  is used to compare the nvproxy struct layout to the driver struct layout.
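
The move from field tags to name inference described in this message can be pictured roughly as follows. This is only a sketch: the `driverStruct` type, the two helper functions, and both example struct names are hypothetical stand-ins and are not the actual nvproxy code.

```go
package main

import (
	"fmt"
	"reflect"
)

// driverStruct pairs a driver struct name with the Go type that mirrors it.
// Hypothetical type, used only for this illustration.
type driverStruct struct {
	name string
	typ  reflect.Type
}

// inferName registers v under the name of its own Go type. This works when
// the Go struct is named exactly like the driver struct.
func inferName(v any) driverStruct {
	t := reflect.TypeOf(v)
	return driverStruct{name: t.Name(), typ: t}
}

// withName registers v under an explicitly supplied driver struct name, for
// the cases where the Go name cannot match (e.g. lower-cased driver names).
func withName(v any, name string) driverStruct {
	return driverStruct{name: name, typ: reflect.TypeOf(v)}
}

// NV_EXAMPLE_PARAMS is a made-up struct named after a made-up driver struct.
type NV_EXAMPLE_PARAMS struct {
	Handle uint32
}

// ExampleLower is a made-up struct whose driver name is lower-cased.
type ExampleLower struct {
	Value uint64
}

func main() {
	structs := []driverStruct{
		inferName(NV_EXAMPLE_PARAMS{}),
		withName(ExampleLower{}, "example_lower_t"),
	}
	for _, s := range structs {
		fmt.Println(s.name, s.typ.Size())
	}
}
```

Keeping the `reflect.Type` next to the name is what lets a layout comparison run later against the same registration entry.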

PiperOrigin-RevId: 710648105
2024-12-30 01:32:41 -08:00
Etienne Perot 84172e4e70 Go benchstat parser: Support alternate syntax for parameters.
This supports benchmark names where parameter-value pairs are separated
by `=` rather than `.`, and benchmarks where `GOMAXPROCS` is not appended
to the name of the benchmark.

This is used in Kubernetes benchmarks where the `GOMAXPROCS` value of the
machine running the Kubernetes client has no bearing on the benchmark's
performance.

This also moves the logic for how to handle sub-test names to the caller.
Docker benchmarks continue to have the behavior of treating sub-names
as a `Condition` with key equal to its value. Kubernetes benchmarks will
instead treat sub-test names as a single `Condition` called `subtest`.
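
As a rough illustration of the parsing described above, the sketch below accepts both `=` and `.` as the parameter separator and leaves sub-test handling to the caller. The `Condition` shape, the function names, and the benchmark name in `main` are assumptions made for this example; they are not the parser's real API, and `GOMAXPROCS` suffix handling is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a hypothetical key/value pair attached to a benchmark.
type Condition struct {
	Name, Value string
}

// parseName splits "Base/key=value/key.value/sub" into the base name and its
// conditions. Components without a separator are handed to subName, so the
// caller decides how sub-test names are represented.
func parseName(name string, subName func(string) Condition) (string, []Condition) {
	parts := strings.Split(name, "/")
	var conds []Condition
	for _, p := range parts[1:] {
		if k, v, ok := strings.Cut(p, "="); ok {
			conds = append(conds, Condition{k, v})
		} else if k, v, ok := strings.Cut(p, "."); ok {
			conds = append(conds, Condition{k, v})
		} else {
			conds = append(conds, subName(p))
		}
	}
	return parts[0], conds
}

// keyEqualsValue treats a sub-name as a condition whose key equals its value.
func keyEqualsValue(s string) Condition { return Condition{s, s} }

// asSubtest folds a sub-name into a single condition called "subtest".
func asSubtest(s string) Condition { return Condition{"subtest", s} }

func main() {
	name := "BenchmarkExample/image=alpine/runtime.runsc/cold"
	base, conds := parseName(name, asSubtest)
	fmt.Println(base, conds)
	_, conds = parseName(name, keyEqualsValue)
	fmt.Println(conds)
}
```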

PiperOrigin-RevId: 709861103
2024-12-26 12:51:43 -08:00
Etienne Perot 5fdedb5276 Parallelize BuildKite "All GPU drivers test".
Each shard takes on a subset of the supported driver versions, as determined
by a counter.

Also sort the order in which the list of supported versions is written out
so that console output isn't confusing.
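
One way to picture the sharding is a round-robin split over the sorted version list, as sketched below. The function name, the 0-based shard index, and the version strings other than 550.90.07 are illustrative assumptions; the sort here is plain lexicographic, which may differ from the ordering the real test uses.

```go
package main

import (
	"fmt"
	"sort"
)

// shardVersions returns the versions assigned to shard `index` (0-based) out
// of `total` shards. Versions are sorted first so every shard sees the same
// order, then dealt out round-robin by a running counter.
func shardVersions(versions []string, index, total int) []string {
	sorted := append([]string(nil), versions...) // copy; do not mutate input
	sort.Strings(sorted)
	var out []string
	for counter, v := range sorted {
		if counter%total == index {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	versions := []string{"550.90.07", "535.183.01", "560.35.03", "550.54.15"}
	for shard := 0; shard < 2; shard++ {
		fmt.Println(shard, shardVersions(versions, shard, 2))
	}
}
```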

PiperOrigin-RevId: 706887043
2024-12-16 17:24:57 -08:00
Andrei Vagin 6eaa40f345 Automated rollback of changelist 696823576
PiperOrigin-RevId: 703567599
2024-12-06 11:50:03 -08:00
Etienne Perot 5ca21867d9 tools/bigquery: Skip results upload if there are no results.
(Diffbased)

PiperOrigin-RevId: 703297420
2024-12-05 17:01:51 -08:00
Lucas Manning fc249c4464 Add a script for benchmarking vLLM startup time with various models.
PiperOrigin-RevId: 703159146
2024-12-05 10:19:11 -08:00
gVisor bot b78f2ee7c4 Internal change.
PiperOrigin-RevId: 702733881
2024-12-04 08:26:45 -08:00
gVisor bot 395c0ac172 Internal change.
PiperOrigin-RevId: 702043346
2024-12-02 12:08:23 -08:00
Etienne Perot 92c1208157 profiletool: Add subcommand to check for similarity between two profiles.
This is meant to be used as part of a pipeline to refresh the profiles used
for PGO builds. This is used to quantify how similar the profiles are to
existing checked-in profiles. If they are similar enough, then the checked-in
profiles do not need to be updated.

PiperOrigin-RevId: 700788407
2024-11-27 13:17:41 -08:00
Lucas Manning 2267c24a41 Add support for custom socket options and setting the experiment IP option.
PiperOrigin-RevId: 700011458
2024-11-25 09:43:21 -08:00
Jing Chen 39a6242b54 Allow DOCKER_FORCE_PUSH for test images.
The push-* command skips the rebuild and the image push with the `latest` tag
when an image tag already exists remotely. Most gVisor test images never got a
chance to be tagged because they already existed when `DOCKER_PUSH_AS_LATEST`
was introduced.

We shouldn't enable DOCKER_FORCE_PUSH in our pipelines; it is better used as a
one-off tool when the `latest` tag of an image is missing from remote repos.

PiperOrigin-RevId: 696823576
2024-11-15 03:02:36 -08:00
Zach Koopmans 891ab9fd14 Add -test.v to cos gpu driver test.
PiperOrigin-RevId: 693919284
2024-11-06 17:25:19 -08:00
Zach Koopmans 23c8b4b042 Add test to check COS drivers as they are posted.
Our current check of COS drivers often lags behind COS releases.
This is due to needing to preload GPU docker images onto the
images that run in our CI pipelines.

In addition, COS is more complex than originally thought: it releases
driver versions across both GPU types and release branches.

Thus, this test searches the latest COS images on each family for
new drivers. It does this by looking at COS's published release notes
which include a proto of LATEST/DEFAULT drivers selected for each device.

This will flag new versions faster and with more coverage than our
CI pipeline currently does. Because this test does not actually need
a GPU, it can run on any VM.

PiperOrigin-RevId: 693736100
2024-11-06 08:30:38 -08:00
Nevena Kotlaja 676b9db40f Remove implicit dependencies of _allowlist_function_transition in third_party
Rules are not required to have an implicit dependency on the transition allowlist since Bazel knows where the file is.

PiperOrigin-RevId: 693260553
2024-11-05 02:04:19 -08:00
Jamie Liu 0608f803c8 Work around Ubuntu 22.04 kernel compiler version problem.
Before this CL (extraneous whitespace trimmed):

```
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.90.07..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
ERROR: An error occurred while performing the step: "Checking to see whether the nvidia kernel module was successfully built". See /var/log/nvidia-installer.log for details.
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
...
[nvidia-installer]:    warning: the compiler differs from the one used to build the kernel
[nvidia-installer]:      The kernel was built by: x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
[nvidia-installer]:      You are using:           cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
[nvidia-installer]:
[nvidia-installer]:    Warning: Compiler version check failed:
[nvidia-installer]:
[nvidia-installer]:    The major and minor number of the compiler used to
[nvidia-installer]:    compile the kernel:
[nvidia-installer]:
[nvidia-installer]:    x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38
[nvidia-installer]:
[nvidia-installer]:    does not match the compiler used here:
[nvidia-installer]:
[nvidia-installer]:    cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
[nvidia-installer]:    Copyright (C) 2021 Free Software Foundation, Inc.
[nvidia-installer]:    This is free software; see the source for copying conditions.  There is NO
[nvidia-installer]:    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[nvidia-installer]:
[nvidia-installer]:
[nvidia-installer]:    It is recommended to set the CC environment variable
[nvidia-installer]:    to the compiler that was used to compile the kernel.
[nvidia-installer]:
[nvidia-installer]:    To skip the test and silence this warning message, set
[nvidia-installer]:    the IGNORE_CC_MISMATCH environment variable to "1".
[nvidia-installer]:    However, mixing compiler versions between the kernel
[nvidia-installer]:    and kernel modules can result in subtle bugs that are
[nvidia-installer]:    difficult to diagnose.
[nvidia-installer]:
[nvidia-installer]:    *** Failed CC version check. ***
...
[nvidia-installer]:    cc: error: unrecognized command-line option '-ftrivial-auto-var-init=zero'
[nvidia-installer]:    make[4]: *** [scripts/Makefile.build:243: /tmp/selfgz40736/NVIDIA-Linux-x86_64-550.90.07/kernel/nvidia/nv.o] Error 1
```

After this CL:

```
I1025 19:39:00.337553   62025 install_driver.go:325] getOSRelease() = map[BUG_REPORT_URL:https://bugs.launchpad.net/ubuntu/ HOME_URL:https://www.ubuntu.com/ ID:ubuntu ID_LIKE:debian NAME:Ubuntu PRETTY_NAME:Ubuntu 22.04.5 LTS PRIVACY_POLICY_URL:https://www.ubuntu.com/legal/terms-and-policies/privacy-policy SUPPORT_URL:https://help.ubuntu.com/ UBUNTU_CODENAME:jammy VERSION:22.04.5 LTS (Jammy Jellyfish) VERSION_CODENAME:jammy VERSION_ID:22.04]
I1025 19:39:00.337657   62025 install_driver.go:281] Forcing gcc-12
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.90.07..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib' and X module path '/usr/lib/xorg/modules'; these paths were not queryable from the system.  If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
WARNING: This NVIDIA driver package includes Vulkan components, but no Vulkan ICD loader was detected on this system. The NVIDIA Vulkan ICD will not function without the loader. Most distributions package the Vulkan loader; try installing the "vulkan-loader", "vulkan-icd-loader", or "libvulkan1" package.

Fri Oct 25 19:42:46 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   76C    P0             31W /   70W |       1MiB /  15360MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
I1025 19:42:46.758261   62025 install_driver.go:125] Installation Complete!
```

PiperOrigin-RevId: 692271389
2024-11-01 13:18:16 -07:00