vllm/vllm-openai:latest image. vLLM automatically downloads the model from Hugging Face when the endpoint starts. The container exposes an OpenAI-compatible /v1/chat/completions API.
For a quick walkthrough of the web console workflow, watch the video below. If you prefer other interfaces or written instructions, follow the steps further down.
Costs
Nebius AI Cloud charges you for Compute virtual machines.Prerequisites
- Web console
- CLI
-
Make sure that you are in a group that has the
adminrole within your tenant; for example, the defaultadminsgroup. -
On the Administration → Limits → Quotas page of the web console, check that you have quotas on the following resources in the region you use:
- NVIDIA® L40S for regular VMs without reservations, under Compute, there should be at least one GPU available.
- Number of virtual machines, under Compute, there should be at least one VM available.
- Total number of allocations, under Virtual Private Cloud, there should be at least one allocation available.
Steps
Create an endpoint
- Web console
- CLI
-
In the sidebar, go to
AI Services → Endpoints.
-
Click
Create endpoint.
-
On the page that opens, specify the following endpoint settings:
-
Image path:
vllm/vllm-openai:v0.18.0-cu130. -
Ports:
8000. -
Entrypoint command:
- Authentication: Token authentication. Copy and save the generated token.
- Computing resources: With GPU.
- Available platform: NVIDIA® L40S PCIe with Intel Ice Lake.
- Preset: 1GPU — 8 CPUs — 32 GiB RAM.
- Network: Public static IP.
-
Image path:
- Click Create.
Check the endpoint status
- Web console
- CLI
Wait until the endpoint status is
Running. You can check the status on the endpoint page.Test the endpoint
- Web console
- CLI
- In the sidebar, go to
AI Services → Endpoints.
- Open the page of the required endpoint.
- In the Network section, copy the IP address from the Public endpoints or Private endpoints field.
How to delete the created resources
The endpoint and its computing resources are chargeable. If you don’t need the endpoint, delete it, so Nebius AI Cloud doesn’t charge for it:- Web console
- CLI
- In the sidebar, go to
AI Services → Endpoints.
- Locate the endpoint and then click
→ Delete.
- In the window that opens, confirm the deletion.