This page documents production updates to Cluster Toolkit. Check this page for announcements about new or updated features, bug fixes, known issues, and deprecated functionality.
You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.
November 19, 2025
Cluster Toolkit version v1.73.0 is available. This release adds support for the GKE Inference Gateway and adds a new blueprint for A3 High machines that automates the process of building a custom image with a TCPx-patched kernel for enhanced network performance. This version also includes an initial blueprint for G4 machine types and parameterizes the gIB NCCL RDMA plugin installer in the gke-a4x.yaml blueprint. For more information about this release, see the Release announcement on GitHub.
November 13, 2025
Cluster Toolkit version v1.72.0 is available. This release adds support for Google Cloud Managed Lustre as an optional storage solution for the gke-tpu-v6-advanced blueprint. This release also adds four example blueprints to support the deployment of Sycomp storage. In addition, this release makes improvements to the gke-node-pool module and a4xhigh-slurm-blueprint.yaml blueprint. For more information about this release, see the Release announcement on GitHub.
November 03, 2025
Cluster Toolkit version v1.71.0 is available. This release fixes a munge mount failure on login nodes caused by slow controller setup in Slurm v6, and adds Managed Lustre support in the gke-a4x blueprint. For more information about this release, see the Release announcement on GitHub.
October 24, 2025
Cluster Toolkit version v1.70.0 is available. This release adds automated TPU support and Cloud Storage FUSE mounts in the TPU v6 blueprint and refactors the H4D blueprint. This version also includes breaking changes, such as removing support for the maintenance_interval field for reservations created by Technical Account Managers (TAMs) and migrating Jobset from static manifests to a Helm chart. For a complete list of changes, see the Release announcement on GitHub.
October 21, 2025
Cluster Toolkit version v1.69.0 is available. This release adds NUMA-aware scheduling in GKE clusters for G4 machines and adds a new module that provides mount scripts for WEKA filesystems. This version also includes PSA updates and adds a GKE sample for running the nvidia-bug-report shell script. For details, see the Release announcement on GitHub.
October 10, 2025
Cluster Toolkit version v1.68.0 is available. This release lets you download the NVIDIA Collective Communications Library (NCCL) software packages libnccl2 and libnccl-dev for A3U and A4H machine types. For more information about this release, see the Release announcement on GitHub.
This release supports the generally available, open-source IBM Spectrum Symphony HostFactory connectors for Google Compute Engine and Google Kubernetes Engine. You can deploy these connectors through Cluster Toolkit to extend your on-premises cluster or to run entirely within Google Cloud. For more information, see Run IBM Spectrum Symphony workloads.
September 19, 2025
Cluster Toolkit version v1.67.0 is available. This release adds support for aarch64-based architecture. For more information about this release, see the Release announcement on GitHub.
September 15, 2025
Cluster Toolkit version v1.66.0 is available. This release lets you use Cloud Storage FUSE for H4D machine types and sets the default cluster availability to zonal. For more information about this release, see the Release announcement on GitHub.
September 09, 2025
Cluster Toolkit version v1.65.0 is available. This release expands support for Managed Lustre on A4X instances and provides an improved GPU network wait solution for A-family machine types. This version also deprecates Debian-based blueprints for A3 Mega GPUs. For a complete list of changes, see the Release announcement on GitHub.
September 01, 2025
Cluster Toolkit version v1.64.0 is available. This release integrates GKE Managed Lustre, which provides high-performance, scalable storage for your GKE clusters. The storage for A3 Ultra machine types now uses basic SSD for improved performance. This version also improves support for alternative services for private service access. For details about these changes and other updates, see the Release announcement on GitHub.
August 26, 2025
Cluster Toolkit version v1.63.0 is available. This release upgrades Slurm image versions to the 6-11 iteration, for example, slurm-gcp-6-11-hpc-rocky-linux-8. This version migrates Dynamic Workload Scheduler (DWS) Flex-start to regional managed instance groups (MIGs). This version also includes breaking changes, for example, updating the file storage for A3 Ultra machine types to basic HDD. For a complete list of new features, improvements, and bug fixes, see the Release announcement on GitHub.
August 14, 2025
Cluster Toolkit version v1.62.0 is available. This release adds new blueprints for A4X instances and adds a community scheduler module for Slinky (Slurm on Kubernetes). For details, see the Release announcement on GitHub.
August 04, 2025
Cluster Toolkit version v1.61.0 is available. This release adds a namespace for GKE modules and optimizes Cloud Storage FUSE configurations for A3 Ultra and A4 blueprints. For details, see the Release announcement on GitHub.
July 22, 2025
Cluster Toolkit version v1.59.0 is available. This release adds a configurable number of IP addresses per NAT for A-family blueprints and fixes an issue with additional disks for login nodes. For details, see the Release announcement on GitHub.
July 15, 2025
Cluster Toolkit version v1.58.0 is available. This release adds a new blueprint for deploying GKE clusters with H4D instances and deprecates blueprints for deploying Parallelstore. For details, see the Release announcement on GitHub.
June 30, 2025
Cluster Toolkit version v1.57.0 is available. This release integrates Cluster Health Scripts (CHS) with GKE blueprints for A3 Mega, A3 Ultra, and A4 instances. For details, see the Release announcement on GitHub.
June 23, 2025
Cluster Toolkit version v1.56.0 is available. This release improves SlurmGCP Resume functionality and includes several bug fixes. For details, see the Release announcement on GitHub.
June 16, 2025
Cluster Toolkit version v1.55.0 is available. This release adds a new blueprint for a high-throughput AlphaFold 3 execution environment and aligns Cloud Storage FUSE configurations with best practices. For details, see the Release announcement on GitHub.
June 10, 2025
Cluster Toolkit version v1.54.0 is available. This release adds Managed Lustre support for non-default ports in GKE and adds a network blocking script for A3 High instances. For details, see the Release announcement on GitHub.
June 05, 2025
Cluster Toolkit version v1.53.0 is available. This release standardizes the A3 Mega Slurm solution on Ubuntu and adds a new GKE blueprint for A4X instances. For details, see the Release announcement on GitHub.
May 22, 2025
Cluster Toolkit version v1.52.0 is available. This release improves CloudSQL support with database flags and query insights, and adds several bug fixes. For details, see the Release announcement on GitHub.
May 13, 2025
Cluster Toolkit version v1.51.0 is available. This release adds GPU health-check epilogs for A3 High and A3 Mega Slurm blueprints and adds a new GKE TPU v6e example. For details, see the Release announcement on GitHub.
May 05, 2025
Cluster Toolkit version v1.50.0 is available. This release adds new blueprints for Managed Lustre attached to VMs and Slurm clusters. For details, see the Release announcement on GitHub.
April 24, 2025
Cluster Toolkit version v1.49.0 is available. This release adds TPU support to the GKE nodepool module and adds support for Managed Lustre. For details, see the Release announcement on GitHub.
April 01, 2025
Cluster Toolkit version v1.48.0 is available. This release updates the GKE nodepool module to support multiple nodepools and adds automatic GPU health checks for Slurm. For details, see the Release announcement on GitHub.
February 27, 2025
Cluster Toolkit version v1.47.0 is available. This release adds support for the A4 machine family in GKE and Slurm blueprints and adds Dynamic Workload Scheduler Flex support for GKE. For details, see the Release announcement on GitHub.
February 07, 2025
Cluster Toolkit version v1.46.0 is available. This release officially supports Kueue as the workload scheduler for A3U and adds new blueprints for A3U, H4D VMs, and Slurm. For details, see the Release announcement on GitHub.
January 15, 2025
Cluster Toolkit version v1.45.0 is available. This release updates A3 Ultra GKE blueprints to newer versions of Kueue and Jobset, and adds support for cluster deletion protection. For details, see the Release announcement on GitHub.
For information about previous releases of Cluster Toolkit, see the Announcements page on GitHub.