Available models
- Falcon3-3B
- Falcon3-7B
- Falcon3-10B
- Falcon3-Mamba-7B
These ship as ready-to-run options in the vLLM template. You can also mount your own models from storage or pull them from public registries.
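To confirm which model a running instance is actually serving, you can query the vLLM server's OpenAI-compatible `/v1/models` route. A minimal sketch in Python; the hostname and port are placeholders for your deployment, and the printed model id is only an example:

```python
import requests

# Hypothetical endpoint; substitute your instance's host and port.
BASE_URL = "http://<instance-host>:8000"

# vLLM's OpenAI-compatible server lists its loaded model(s) at /v1/models.
resp = requests.get(f"{BASE_URL}/v1/models", timeout=10)
resp.raise_for_status()

for model in resp.json()["data"]:
    print(model["id"])  # e.g. "tiiuae/Falcon3-7B-Instruct"
```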
Coming soon
- Llama family [confirm versions]
- Mistral family [confirm versions]
- Qwen family [confirm versions]
- GPT-OSS [confirm details]
How to deploy (quick)
- Create an instance → choose the vLLM inference template.
- Select a model and an instance size (GPU or CPU as needed).
- Launch. The server exposes an HTTP endpoint by default; add extra ports or SSH access if required (see the request sketch after these steps).
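Once the instance is up, you can exercise the endpoint directly. Below is a minimal sketch of a chat completion request against vLLM's OpenAI-compatible API; the hostname, port, and model id are placeholder assumptions, so replace them with the values from your deployment:

```python
import requests

BASE_URL = "http://<instance-host>:8000"  # placeholder for your endpoint

payload = {
    # The model id must match the model the server was launched with.
    "model": "tiiuae/Falcon3-7B-Instruct",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

# vLLM serves the standard OpenAI chat completions route.
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, any OpenAI client library should also work by pointing its base URL at the instance endpoint.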
Host hardware: AMD EPYC 7713 CPUs; available GPU sizes are listed in GPU types and sizes.