InfiniBand fabrics
Each GPU cluster is created in one of the physical InfiniBand fabrics, which is where the GPUs interconnected over InfiniBand are located. Each fabric has limited GPU capacity. When creating a GPU cluster, select an InfiniBand fabric for it, taking into account the type of GPUs you are going to use. For example, if you select fabric-7, you can only add NVIDIA® H200 NVLink with Intel Sapphire Rapids GPUs to this cluster.
Available fabrics and corresponding regions (private regions are marked with *):
| Fabric | GPU platform | Region |
|---|---|---|
| fabric-2 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-3 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-4 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-5 | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | eu-west1 |
| fabric-6 | NVIDIA® H100 NVLink with Intel Sapphire Rapids (gpu-h100-sxm) | eu-north1 |
| fabric-7 | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | eu-north1 |
| eu-north2-a | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | eu-north2 |
| me-west1-a | NVIDIA® B200 NVLink with Intel Emerald Rapids (gpu-b200-sxm-a) | me-west1 |
| uk-south1-a | NVIDIA® B300 NVLink with Intel Granite Rapids (gpu-b300-sxm) | uk-south1 |
| us-central1-a | NVIDIA® H200 NVLink with Intel Sapphire Rapids (gpu-h200-sxm) | us-central1 |
| us-central1-b | NVIDIA® B200 NVLink with Intel Emerald Rapids (gpu-b200-sxm) | us-central1 |
In most cases, you do not need to change the preselected fabric. We recommend that you create a GPU cluster in another fabric only if it is better suited for a different platform or if you experience capacity issues with an existing GPU cluster.
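If you choose a fabric in a script rather than by hand, the table above can be encoded as a small lookup. The following is a minimal sketch, not part of any Nebius tooling: the `Fabric` type, the `FABRICS` list and the `fabrics_for` helper are hypothetical names, and the data is copied from the table as it appears on this page, so it may go out of date.

```python
from typing import List, NamedTuple, Optional


class Fabric(NamedTuple):
    name: str      # fabric name as listed in the table
    platform: str  # GPU platform identifier from the table
    region: str    # region where the fabric is located


# Snapshot of the table above (hypothetical structure, data copied from this page).
FABRICS = [
    Fabric("fabric-2", "gpu-h100-sxm", "eu-north1"),
    Fabric("fabric-3", "gpu-h100-sxm", "eu-north1"),
    Fabric("fabric-4", "gpu-h100-sxm", "eu-north1"),
    Fabric("fabric-5", "gpu-h200-sxm", "eu-west1"),
    Fabric("fabric-6", "gpu-h100-sxm", "eu-north1"),
    Fabric("fabric-7", "gpu-h200-sxm", "eu-north1"),
    Fabric("eu-north2-a", "gpu-h200-sxm", "eu-north2"),
    Fabric("me-west1-a", "gpu-b200-sxm-a", "me-west1"),
    Fabric("uk-south1-a", "gpu-b300-sxm", "uk-south1"),
    Fabric("us-central1-a", "gpu-h200-sxm", "us-central1"),
    Fabric("us-central1-b", "gpu-b200-sxm", "us-central1"),
]


def fabrics_for(platform: str, region: Optional[str] = None) -> List[str]:
    """Return names of fabrics that host the platform, optionally in one region."""
    return [
        f.name
        for f in FABRICS
        if f.platform == platform and (region is None or f.region == region)
    ]


if __name__ == "__main__":
    print(fabrics_for("gpu-h200-sxm", region="eu-north1"))
```

A lookup like this only tells you which fabrics match a platform and region. It says nothing about remaining capacity, which you still need to check when creating the cluster.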
Isolation and security of InfiniBand traffic
Nebius AI Cloud isolates InfiniBand traffic between GPU clusters by using InfiniBand partition keys (P-Keys). Each GPU cluster is assigned a unique P-Key to create isolation inside shared physical InfiniBand fabrics. This way, nodes in different GPU clusters cannot communicate over InfiniBand even if they use the same fabric infrastructure. This creates isolation between tenants without requiring a dedicated physical fabric for each cluster.
How to enable InfiniBand for VMs with GPUs
In the web console:
- Create a GPU cluster:
  - In the sidebar, go to Compute → GPU clusters.
  - Click Create GPU cluster.
  - On the page that opens, specify the cluster name. It must be 3 to 63 characters long and contain only lowercase letters, numbers and hyphens.
  - Select the InfiniBand fabric.
  - Click Create GPU cluster.
- Add VMs to the cluster. You can do it only when creating the VMs:
  - In the sidebar, go to Compute → Virtual machines.
  - Click Create virtual machine.
  - On the page that opens, specify the VM’s details and select the GPU cluster name in the GPU cluster list.
- In the Computing resources section of the VM creation form:
  - Select With GPU.
  - Select a platform and a preset compatible with GPU clusters. For more information, see Types of virtual machines and GPUs in Nebius AI Cloud.
- In the Boot disk section of the VM creation form, select the boot disk for NVIDIA GPUs. For details, see Boot disk images for Compute virtual machines.
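If you generate cluster names automatically, you can check them against the naming rule from the cluster creation step before submitting the form. The sketch below covers only the rule stated on this page (3 to 63 characters: lowercase letters, numbers and hyphens). The function name `is_valid_cluster_name` is hypothetical, and the service may enforce additional constraints that are not described here.

```python
import re

# Rule as stated on this page: 3 to 63 characters, each a lowercase letter,
# a digit or a hyphen. Any stricter rule (for example, on the first or last
# character) is not covered by this sketch.
_CLUSTER_NAME_RE = re.compile(r"[a-z0-9-]{3,63}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the documented length and character rule."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["training-cluster-01", "ab", "My-Cluster"]:
        print(candidate, is_valid_cluster_name(candidate))
```

`fullmatch` is used instead of `match` so that a valid prefix followed by a disallowed character is rejected.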
How to test the connection with the NCCL tests
To test InfiniBand performance in a Compute cluster, you can run the NVIDIA NCCL test in it. For instructions, see our tutorial on running distributed jobs with MPIrun: it uses the NCCL test as an example.
How to delete a GPU cluster
Before deleting a GPU cluster, make sure all virtual machines in the cluster are deleted or moved to another cluster.
In the web console:
- In the sidebar, go to Compute → GPU clusters.
- In the row of the GPU cluster you want to delete, click → Delete.
- In the window that opens, confirm the deletion.
See also
- How to test a GPU cluster physical state in Compute
- InfiniBand networking for Compute virtual machines with GPUs
- How to create a virtual machine in Nebius AI Cloud
- Running the all-reduce NCCL performance test in Soperator clusters
InfiniBand and InfiniBand Trade Association are registered trademarks of the InfiniBand Trade Association.