Python API Reference¶
This section documents the Python API for vector-inference.
Client Interface¶
vec_inf.client.api.VecInfClient¶
Client for interacting with Vector Inference programmatically.
This class provides methods for launching models, checking their status, retrieving metrics, and shutting down models using the Vector Inference infrastructure.
Methods:

Name | Description |
---|---|
list_models | List all available models |
get_model_config | Get configuration for a specific model |
launch_model | Launch a model on the cluster |
get_status | Get status of a running model |
get_metrics | Get performance metrics of a running model |
shutdown_model | Shutdown a running model |
wait_until_ready | Wait for a model to become ready |
Examples:
>>> from vec_inf.client.api import VecInfClient
>>> from vec_inf.client.models import ModelStatus
>>> client = VecInfClient()
>>> response = client.launch_model("Meta-Llama-3.1-8B-Instruct")
>>> job_id = response.slurm_job_id
>>> status = client.get_status(job_id)
>>> if status.server_status == ModelStatus.READY:
...     print(f"Model is ready at {status.base_url}")
>>> client.shutdown_model(job_id)
Source code in vec_inf/client/api.py
__init__¶
Initialize the Vector Inference client.
list_models¶
List all available models.
Returns:

Type | Description |
---|---|
list[ModelInfo] | List of ModelInfo objects containing information about available models, including their configurations and specifications. |
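A minimal usage sketch (field names follow the ModelInfo documentation below; not verified against a live cluster):

>>> from vec_inf.client.api import VecInfClient
>>> client = VecInfClient()
>>> for model in client.list_models():
...     print(model.name, model.model_type)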
get_model_config¶
Get the configuration for a specific model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_name | str | Name of the model to get configuration for | required |
Returns:

Type | Description |
---|---|
ModelConfig | Complete configuration for the specified model |

Raises:

Type | Description |
---|---|
ModelNotFoundError | If the specified model is not found in the configuration |
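For example (a sketch; a name that is not configured raises ModelNotFoundError):

>>> from vec_inf.client.api import VecInfClient
>>> client = VecInfClient()
>>> config = client.get_model_config("Meta-Llama-3.1-8B-Instruct")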
launch_model¶
Launch a model on the cluster.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_name | str | Name of the model to launch | required |
options | LaunchOptions | Launch options to override default configuration | None |

Returns:

Type | Description |
---|---|
LaunchResponse | Response containing launch details, including the SLURM job ID, model configuration, and launch status |

Raises:

Type | Description |
---|---|
ModelConfigurationError | If the model configuration is invalid |
SlurmJobError | If there's an error launching the SLURM job |
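For example, overriding a couple of defaults via LaunchOptions (a sketch; fields are documented under Data Models below):

>>> from vec_inf.client.api import VecInfClient
>>> from vec_inf.client.models import LaunchOptions
>>> client = VecInfClient()
>>> opts = LaunchOptions(num_nodes=1, gpus_per_node=4)
>>> response = client.launch_model("Meta-Llama-3.1-8B-Instruct", options=opts)
>>> response.slurm_job_id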
get_status¶
Get the status of a running model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
slurm_job_id | int | The SLURM job ID to check | required |
log_dir | str | Path to the SLURM log directory. If None, uses default location | None |

Returns:

Type | Description |
---|---|
StatusResponse | Status information including the model name, server status, job state, base URL (if ready), and error information (if failed) |
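A sketch of inspecting the optional reason fields (names follow the StatusResponse documentation below; continues the earlier example):

>>> status = client.get_status(job_id)
>>> if status.server_status == ModelStatus.PENDING:
...     print(status.pending_reason)
... elif status.server_status == ModelStatus.FAILED:
...     print(status.failed_reason)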
get_metrics¶
Get the performance metrics of a running model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
slurm_job_id | int | The SLURM job ID to get metrics for | required |
log_dir | str | Path to the SLURM log directory. If None, uses default location | None |

Returns:

Type | Description |
---|---|
MetricsResponse | Response containing the model name, the performance metrics (or an error message), and the timestamp of collection |
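Because the metrics field of MetricsResponse is Union[dict[str, float], str], callers should branch on the type before use. A self-contained sketch of that handling (the helper name and metric keys are illustrative, not part of the API):

```python
from typing import Union


def format_metrics(metrics: Union[dict[str, float], str]) -> str:
    """Render a MetricsResponse.metrics payload: either metrics or an error string."""
    if isinstance(metrics, str):
        # The API returns a plain string when metrics could not be collected.
        return f"metrics unavailable: {metrics}"
    return ", ".join(f"{name}={value:.2f}" for name, value in sorted(metrics.items()))


print(format_metrics("server not ready"))
print(format_metrics({"throughput": 1234.5, "latency_s": 0.25}))
```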
shutdown_model¶
Shutdown a running model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
slurm_job_id | int | The SLURM job ID to shut down | required |

Returns:

Type | Description |
---|---|
bool | True if the model was successfully shut down |

Raises:

Type | Description |
---|---|
SlurmJobError | If there was an error shutting down the model |
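For example (a sketch continuing the earlier example):

>>> if client.shutdown_model(job_id):
...     print("shutdown requested")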
wait_until_ready¶
Wait until a model is ready or fails.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
slurm_job_id | int | The SLURM job ID to wait for | required |
timeout_seconds | int | Maximum time to wait in seconds, by default 1800 (30 mins) | 1800 |
poll_interval_seconds | int | How often to check status in seconds, by default 10 | 10 |
log_dir | str | Path to the SLURM log directory. If None, uses default location | None |

Returns:

Type | Description |
---|---|
StatusResponse | Status information when the model becomes ready |

Raises:

Type | Description |
---|---|
SlurmJobError | If the specified job is not found or there's an error with the job |
ServerError | If the server fails to start within the timeout period |
APIError | If there was an error checking the status |
Notes
The timeout is reset if the model is still in PENDING state after the initial timeout period. This allows for longer queue times in the SLURM scheduler.
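The polling behaviour described above can be sketched generically. The loop below is a simplified stand-in for what wait_until_ready does (it accepts any status-returning callable, so it runs without a cluster; it omits the PENDING timeout reset noted above, and all names are illustrative):

```python
import time
from typing import Callable


def wait_for(get_status: Callable[[], str],
             timeout_seconds: float = 1800,
             poll_interval_seconds: float = 10) -> str:
    """Poll get_status() until READY or FAILED, or until the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("READY", "FAILED"):
            return status
        time.sleep(poll_interval_seconds)
    raise TimeoutError("server did not become ready in time")


# Stand-in status source: pending, launching, then ready.
states = iter(["PENDING", "LAUNCHING", "READY"])
print(wait_for(lambda: next(states), timeout_seconds=5, poll_interval_seconds=0))
```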
Data Models¶
vec_inf.client.models¶
Data models for Vector Inference API.
This module contains the data model classes used by the Vector Inference API for both request parameters and response objects.
Classes:

Name | Description |
---|---|
ModelStatus : Enum | Status states of a model |
ModelType : Enum | Types of supported models |
LaunchResponse : dataclass | Response from model launch operation |
StatusResponse : dataclass | Response from model status check |
MetricsResponse : dataclass | Response from metrics collection |
LaunchOptions : dataclass | Options for model launch |
LaunchOptionsDict : TypedDict | Dictionary representation of launch options |
ModelInfo : dataclass | Information about available models |
ModelStatus¶
Bases: str, Enum
Enum representing the possible status states of a model.
Attributes:

Name | Type | Description |
---|---|---|
PENDING | str | Model is waiting for SLURM to allocate resources |
LAUNCHING | str | Model is in the process of starting |
READY | str | Model is running and ready to serve requests |
FAILED | str | Model failed to start or encountered an error |
SHUTDOWN | str | Model was intentionally stopped |
UNAVAILABLE | str | Model status cannot be determined |
Source code in vec_inf/client/models.py
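Because ModelStatus mixes in str, its members behave as plain strings, which makes comparisons and serialization straightforward. A stand-in enum demonstrating the pattern (member values here are illustrative, not taken from the library):

```python
from enum import Enum


# Stand-in mirroring the (str, Enum) pattern used by ModelStatus.
class DemoStatus(str, Enum):
    READY = "READY"
    FAILED = "FAILED"


# Members compare equal to plain strings and are themselves str instances.
assert DemoStatus.READY == "READY"
assert isinstance(DemoStatus.FAILED, str)
```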
ModelType¶
Bases: str, Enum
Enum representing the possible model types.
Attributes:

Name | Type | Description |
---|---|---|
LLM | str | Large Language Model |
VLM | str | Vision Language Model |
TEXT_EMBEDDING | str | Text Embedding Model |
REWARD_MODELING | str | Reward Modeling Model |
LaunchResponse dataclass¶
Response from launching a model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
slurm_job_id | int | ID of the launched SLURM job | required |
model_name | str | Name of the launched model | required |
config | dict[str, Any] | Configuration used for the launch | required |
raw_output | str | Raw output from the launch command (hidden from repr) | required |
StatusResponse dataclass¶
Response from checking a model's status.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_name | str | Name of the model | required |
server_status | ModelStatus | Current status of the server | required |
job_state | Union[str, ModelStatus] | Current state of the SLURM job | required |
raw_output | str | Raw output from status check (hidden from repr) | required |
base_url | str | Base URL of the model server if ready | None |
pending_reason | str | Reason for pending state if applicable | None |
failed_reason | str | Reason for failure if applicable | None |
MetricsResponse dataclass¶
Response from retrieving model metrics.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_name | str | Name of the model | required |
metrics | Union[dict[str, float], str] | Either a dictionary of metrics or an error message | required |
timestamp | float | Unix timestamp of when metrics were collected | required |
LaunchOptions dataclass¶
Options for launching a model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_family | str | Family/architecture of the model | None |
model_variant | str | Specific variant/version of the model | None |
partition | str | SLURM partition to use | None |
num_nodes | int | Number of nodes to allocate | None |
gpus_per_node | int | Number of GPUs per node | None |
account | str | Account name for job scheduling | None |
qos | str | Quality of Service level | None |
time | str | Time limit for the job | None |
vocab_size | int | Size of model vocabulary | None |
data_type | str | Data type for model weights | None |
venv | str | Virtual environment to use | None |
log_dir | str | Directory for logs | None |
model_weights_parent_dir | str | Parent directory containing model weights | None |
vllm_args | str | Additional arguments for vLLM | None |
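LaunchOptionsDict mirrors these fields as a plain dictionary, which is convenient when building overrides from a config file. A sketch, assuming the keys match the field names above (the values shown are illustrative):

```python
# Hypothetical overrides expressed as a LaunchOptionsDict-style plain dict.
# Only the fields being overridden need to appear; omitted fields keep their
# default of None, deferring to the model's configured values.
overrides = {
    "num_nodes": 1,
    "gpus_per_node": 4,
    "time": "02:00:00",       # illustrative SLURM-style time limit
    "data_type": "bfloat16",  # illustrative data type for model weights
}
print(sorted(overrides))
```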
ModelInfo dataclass¶
Information about an available model.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the model | required |
family | str | Family/architecture of the model | required |
variant | str | Specific variant/version of the model | required |
model_type | ModelType | Type of the model | required |
config | dict[str, Any] | Additional configuration parameters | required |