EVO2 40B is a 40-billion-parameter biological foundation model for DNA sequence generation and analysis. This tutorial walks you through creating a custom NIM template, deploying it on a 4 × RTX 4090 instance, and sending requests to its inference endpoints.
Estimated time: 15–20 minutes (includes model download)
What you’ll need
- A Compute with Hivenet medium instance (4 × RTX 4090 GPUs)
- A valid NVIDIA NGC API key
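As a rough sizing check, assume the weights are held in a 16-bit format (2 bytes per parameter) and ignore activations and runtime overhead. Under that assumption, the weights alone exceed what three 24 GB RTX 4090 cards can hold:

```latex
\begin{aligned}
40 \times 10^{9}\ \text{parameters} \times 2\ \tfrac{\text{bytes}}{\text{parameter}} &= 80 \times 10^{9}\ \text{bytes} \approx 80\ \text{GB} \\
\frac{80\ \text{GB}}{24\ \text{GB per GPU}} &\approx 3.3 \;\Rightarrow\; 4\ \text{GPUs}\ (4 \times 24 = 96\ \text{GB})
\end{aligned}
```

This estimate is consistent with the ~80 GB of weights the container downloads on first start.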
Steps
Create the custom NIM template
The template uses a container image based on the official EVO2 40B NIM container and adapted to run in Hivenet’s Compute environment.
Create a medium instance
- From the Compute console, select Create new instance.
- Choose your location.
- Under Setup, pick 4 × RTX 4090.
- In Template, select the custom template you just created (e.g. my-evo2-40b-template).
- Under Connectivity, add your public SSH key (if not already added) and expose HTTPS port 8000.
- In Instance name, give your instance a name (e.g. my-evo2-40b-instance).
- Click Create instance and wait until its state changes to Running.
The first start can take several minutes while the image initializes.
Use your EVO2 40B model
Once your instance is running, open the Logs panel. The NIM container automatically downloads the model weights (~80 GB). When it starts serving, you’ll see messages similar to these:
2025-10-14 12:12:18 INFO:nimlib.nim_inference_api_builder.api:{'message': 'Starting HTTP Inference server', 'port': 8000, 'workers_count': 1, 'host': '0.0.0.0', 'log_level': 'info', 'SSL': 'disabled'}
2025-10-14 12:12:18 0.0.0.0:8000/v1/manifest (GET)
2025-10-14 12:12:18 0.0.0.0:8000/v1/metadata (GET)
2025-10-14 12:12:18 0.0.0.0:8000/v1/license (GET)
2025-10-14 12:12:18 0.0.0.0:8000/v1/metrics (GET)
2025-10-14 12:12:18 0.0.0.0:8000/v1/health/ready (GET)
2025-10-14 12:12:18 0.0.0.0:8000/v1/health/live (GET)
2025-10-14 12:12:18 0.0.0.0:8000/biology/arc/evo2/generate (POST)
2025-10-14 12:12:18 0.0.0.0:8000/biology/arc/evo2/forward (POST)
2025-10-14 12:12:18 INFO:nimlib.nim_inference_api_builder.api:Serving endpoints:
2025-10-14 12:12:18 INFO 2025-10-14 10:12:18.042 http_api.py:73] {'message': 'Starting HTTP Inference server', 'port': 8000, 'workers_count': 1, 'host': '0.0.0.0', 'log_level': 'info', 'SSL': 'disabled'}
Check model health
When the model is ready, test the health endpoint:
curl -X GET "https://<YOUR_INSTANCE_ID>-8000.tenants.hivenet.com/v1/health/ready"
You should receive:
You can also check the model metadata:
curl -X GET "https://<YOUR_INSTANCE_ID>-8000.tenants.hivenet.com/v1/metadata"
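Because the weight download can take a while, you may prefer to poll the readiness endpoint instead of re-running curl by hand. The sketch below assumes /v1/health/ready answers with a non-2xx status (or refuses the connection) until the model is loaded. The function name wait_for_ready and its default attempt count and delay are hypothetical choices, not part of the NIM API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a readiness URL until it returns a 2xx status,
# giving up after a fixed number of attempts.
# Usage: wait_for_ready URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ready() {
  local url="$1" attempts="${2:-60}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: exit non-zero on HTTP errors (assumed while the model is still loading)
    # -s: silence progress output; --max-time: don't hang on a stalled connection
    if curl -sf --max-time 10 -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    # Only wait if another attempt will follow
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_ready "https://<YOUR_INSTANCE_ID>-8000.tenants.hivenet.com/v1/health/ready" 90 10` keeps checking for up to about 15 minutes before giving up.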
Run inference
To generate a short example DNA sequence:
curl -X POST "https://<YOUR_INSTANCE_ID>-8000.tenants.hivenet.com/biology/arc/evo2/generate" \
-H "Content-Type: application/json" \
--data '{
"sequence": "ACTGACTGACTGACTG",
"num_tokens": 8,
"top_k": 1,
"enable_sampled_probs": true
}'
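Expected response (example):
{"sequence":"ACTGACTG","elapsed_ms":1319}
Here, num_tokens sets how many nucleotides to generate, and setting top_k to 1 selects the most likely token at each step (greedy decoding).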
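To script repeated requests, a small wrapper can build the request body for you. This is a sketch. The helper names build_generate_payload and evo2_generate are hypothetical. The A/C/G/T-only check is an assumption about acceptable input, not a documented API constraint. Only the request fields from the curl example above are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for /biology/arc/evo2/generate
# using the same fields as the curl example.
# Usage: build_generate_payload SEQUENCE [NUM_TOKENS] [TOP_K]
build_generate_payload() {
  local sequence="$1" num_tokens="${2:-8}" top_k="${3:-1}"
  # Assumption: accept only uppercase A/C/G/T. This also keeps the
  # sequence safe to embed in JSON without escaping.
  case "$sequence" in
    "" | *[!ACGT]*)
      echo "invalid DNA sequence: '$sequence'" >&2
      return 1
      ;;
  esac
  printf '{"sequence":"%s","num_tokens":%d,"top_k":%d,"enable_sampled_probs":true}\n' \
    "$sequence" "$num_tokens" "$top_k"
}

# Hypothetical wrapper: POST the payload to a running instance.
# BASE_URL looks like https://<YOUR_INSTANCE_ID>-8000.tenants.hivenet.com
evo2_generate() {
  local base_url="$1" payload
  shift
  payload="$(build_generate_payload "$@")" || return 1
  curl -sS -X POST "${base_url}/biology/arc/evo2/generate" \
    -H "Content-Type: application/json" \
    --data "$payload"
}
```

For example, `evo2_generate "https://<YOUR_INSTANCE_ID>-8000.tenants.hivenet.com" ACTGACTGACTGACTG 16 1` requests 16 new tokens greedily. Validating input locally gives a clear error before any network round trip.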
(Optional) Monitor GPU usage
You can SSH into your instance to check GPU activity:
ssh -i ~/.ssh/id_rsa -o "ProxyCommand=ssh bastion@ssh.hivenet.com %h" nvs@<YOUR_INSTANCE_ID>.ssh.hivenet.com
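If you connect often, the options from the ssh command above can live in ~/.ssh/config instead. The sketch below prints an equivalent entry. The function name hivenet_ssh_config and the default alias evo2 are hypothetical. The entry relies on standard OpenSSH behavior, where %h in ProxyCommand expands to the HostName.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ~/.ssh/config entry equivalent to
#   ssh -i KEY -o "ProxyCommand=ssh bastion@ssh.hivenet.com %h" nvs@ID.ssh.hivenet.com
# Usage: hivenet_ssh_config INSTANCE_ID [HOST_ALIAS] [KEY_FILE]
hivenet_ssh_config() {
  local instance_id="$1" host_alias="${2:-evo2}" key_file="${3:-$HOME/.ssh/id_rsa}"
  if [ -z "$instance_id" ]; then
    echo "usage: hivenet_ssh_config INSTANCE_ID [HOST_ALIAS] [KEY_FILE]" >&2
    return 1
  fi
  # %h is left literal here; ssh expands it to the HostName at connect time
  cat <<EOF
Host ${host_alias}
    HostName ${instance_id}.ssh.hivenet.com
    User nvs
    IdentityFile ${key_file}
    ProxyCommand ssh bastion@ssh.hivenet.com %h
EOF
}
```

For example, `hivenet_ssh_config <YOUR_INSTANCE_ID> >> ~/.ssh/config` appends the entry, after which `ssh evo2` opens the same session.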
Then run:
nvidia-smi
Here is an example output:
Tue Oct 14 11:22:42 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.03              Driver Version: 575.64.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
|  0%   29C    P8             17W /  450W |   22208MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:23:00.0 Off |                  Off |
|  0%   30C    P8             30W /  450W |   22166MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        Off |   00000000:41:00.0 Off |                  Off |
|  0%   27C    P8              9W /  450W |   22296MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        Off |   00000000:61:00.0 Off |                  Off |
|  0%   26C    P8              8W /  450W |   22210MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
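For a compact view of the same information, nvidia-smi can print selected fields as CSV using --query-gpu and --format. The sketch below pipes that CSV into a small awk summarizer. The function name summarize_gpu_memory is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize per-GPU and total memory use.
# Input lines look like "0, 22208, 24564" (index, used MiB, total MiB),
# as printed by:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
summarize_gpu_memory() {
  awk -F', *' '
    NF >= 3 {
      used += $2; total += $3
      printf "GPU %s: %d / %d MiB\n", $1, $2, $3
    }
    END { printf "Total: %d / %d MiB\n", used, total }
  '
}
```

On the instance, run `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits | summarize_gpu_memory`. For the example output above, this reports `Total: 88880 / 98256 MiB` (roughly 87 GiB in use), consistent with ~80 GB of weights plus runtime overhead spread across the four cards.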
You’ve served EVO2 40B successfully
Your Compute instance is now hosting the EVO2 40B model through a NIM container, ready for inference requests. You can monitor usage directly from the Compute dashboard or SSH into the instance to view live GPU metrics.