The vLLM template lets you spin up an inference server in minutes. Pick a model from the catalog or provide your own weights.

Available models

  • Falcon3 3B
  • Falcon3 Mamba-7B
  • Falcon3 7B
  • Falcon3 10B
These ship as ready-to-run choices in the vLLM template. You can also mount your own models from storage or pull them from public registries; see the sketch below for how vLLM picks up custom weights.
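Whether a model comes from the catalog or your own storage, vLLM loads it the same way: you point it at a local directory or a Hugging Face-style model ID. A minimal sketch using vLLM's offline Python API, assuming custom weights are mounted at a hypothetical path such as /mnt/models/my-custom-model:

```python
from vllm import LLM, SamplingParams

# Hypothetical mount point for your own weights; a Hugging Face model ID
# (e.g. "tiiuae/Falcon3-7B-Instruct") works here as well.
llm = LLM(model="/mnt/models/my-custom-model")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain vLLM in one sentence."], params)
print(outputs[0].outputs[0].text)
```

The hosted template wires this up for you; the sketch only illustrates how a model path or ID maps onto vLLM's loading behavior.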

Coming soon

  • Llama family
  • Mistral family
  • Qwen family
  • GPT-OSS

How to deploy (quick)

  1. Create instance → choose vLLM inference template.
  2. Select a model and size (GPU or CPU as needed).
  3. Launch. The server exposes an HTTP endpoint by default; add extra ports or SSH access if required. You can query the endpoint with any OpenAI-compatible client, as shown in the sketch below.
Host hardware: AMD EPYC 7713; available GPU sizes are listed in GPU types and sizes.
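Once the instance is running, the endpoint speaks vLLM's OpenAI-compatible API, so any OpenAI-style client can query it. A minimal sketch in Python, assuming a placeholder address such as http://<instance-ip>:8000/v1 and the Falcon3 7B catalog model (swap in whatever model you deployed):

```python
from openai import OpenAI

# Placeholder endpoint; substitute your instance's address and port.
client = OpenAI(base_url="http://<instance-ip>:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="tiiuae/Falcon3-7B-Instruct",  # assumed model ID; use the one you deployed
    messages=[{"role": "user", "content": "Summarize what vLLM does in one sentence."}],
)
print(response.choices[0].message.content)
```

The same endpoint also serves /v1/completions and /v1/models, so plain curl or any other OpenAI-compatible SDK works just as well.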