Python API Reference¶
This section documents the Python API for vector-inference.
Client Interface¶
vec_inf.client.api.VecInfClient¶
Client for interacting with Vector Inference programmatically.
This class provides methods for launching models, checking their status, retrieving metrics, and shutting down models using the Vector Inference infrastructure.
Methods:

| Name | Description |
| --- | --- |
| `list_models` | List all available models |
| `get_model_config` | Get configuration for a specific model |
| `launch_model` | Launch a model on the cluster |
| `batch_launch_models` | Launch multiple models on the cluster |
| `get_status` | Get status of a running model |
| `get_metrics` | Get performance metrics of a running model |
| `shutdown_model` | Shutdown a running model |
| `wait_until_ready` | Wait for a model to become ready |
| `cleanup_logs` | Remove logs from the log directory |
Examples:
>>> from vec_inf.api import VecInfClient
>>> client = VecInfClient()
>>> response = client.launch_model("Meta-Llama-3.1-8B-Instruct")
>>> job_id = response.slurm_job_id
>>> status = client.get_status(job_id)
>>> if status.status == ModelStatus.READY:
... print(f"Model is ready at {status.base_url}")
>>> client.shutdown_model(job_id)
Source code in vec_inf/client/api.py
__init__¶
list_models¶
List all available models.

Returns:

| Type | Description |
| --- | --- |
| `list[ModelInfo]` | List of `ModelInfo` objects containing information about available models, including their configurations and specifications. |
Source code in vec_inf/client/api.py
get_model_config¶
Get the configuration for a specific model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model to get configuration for | required |

Returns:

| Type | Description |
| --- | --- |
| `ModelConfig` | Complete configuration for the specified model |

Raises:

| Type | Description |
| --- | --- |
| `ModelNotFoundError` | If the specified model is not found in the configuration |
Source code in vec_inf/client/api.py
launch_model¶
Launch a model on the cluster.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model to launch | required |
| `options` | `LaunchOptions` | Launch options to override default configuration | `None` |

Returns:

| Type | Description |
| --- | --- |
| `LaunchResponse` | Response containing launch details including: SLURM job ID, model configuration, and launch status |

Raises:

| Type | Description |
| --- | --- |
| `ModelConfigurationError` | If the model configuration is invalid |
| `SlurmJobError` | If there's an error launching the SLURM job |
Source code in vec_inf/client/api.py
batch_launch_models¶
Launch multiple models on the cluster.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_names` | `list[str]` | List of model names to launch | required |

Returns:

| Type | Description |
| --- | --- |
| `BatchLaunchResponse` | Response containing launch details for each model |

Raises:

| Type | Description |
| --- | --- |
| `ModelConfigurationError` | If the model configuration is invalid |
Source code in vec_inf/client/api.py
get_status¶
Get the status of a running model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `slurm_job_id` | `str` | The SLURM job ID to check | required |

Returns:

| Type | Description |
| --- | --- |
| `StatusResponse` | Status information including: model name, server status, job state, base URL (if ready), and error information (if failed) |
Source code in vec_inf/client/api.py
get_metrics¶
Get the performance metrics of a running model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `slurm_job_id` | `str` | The SLURM job ID to get metrics for | required |

Returns:

| Type | Description |
| --- | --- |
| `MetricsResponse` | Response containing: model name, performance metrics or an error message, and the timestamp of collection |
Source code in vec_inf/client/api.py
shutdown_model¶
Shutdown a running model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `slurm_job_id` | `str` | The SLURM job ID to shut down | required |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the model was successfully shut down |

Raises:

| Type | Description |
| --- | --- |
| `SlurmJobError` | If there was an error shutting down the model |
Source code in vec_inf/client/api.py
wait_until_ready¶
Wait until a model is ready or fails.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `slurm_job_id` | `str` | The SLURM job ID to wait for | required |
| `timeout_seconds` | `int` | Maximum time to wait in seconds, by default 1800 (30 mins) | `1800` |
| `poll_interval_seconds` | `int` | How often to check status in seconds, by default 10 | `10` |

Returns:

| Type | Description |
| --- | --- |
| `StatusResponse` | Status information when the model becomes ready |

Raises:

| Type | Description |
| --- | --- |
| `SlurmJobError` | If the specified job is not found or there's an error with the job |
| `ServerError` | If the server fails to start within the timeout period |
| `APIError` | If there was an error checking the status |

Notes
The timeout is reset if the model is still in PENDING state after the initial timeout period. This allows for longer queue times in the SLURM scheduler.
Source code in vec_inf/client/api.py
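The timeout-reset behavior described in the notes can be sketched as a simple polling loop. This is an illustrative sketch only, not the actual implementation; the status strings mirror the documented `ModelStatus` states, and `get_status` here is any callable returning the current status:

```python
import time

def wait_until_ready_sketch(get_status, timeout_seconds=1800, poll_interval_seconds=10):
    """Poll until READY, failing on FAILED or on timeout.

    While the job is still PENDING (queued in SLURM), the deadline is pushed
    back so queue time does not count against the server-startup timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "READY":
            return status
        if status == "FAILED":
            raise RuntimeError("server failed to start")
        if status == "PENDING":
            # Still queued: reset the timeout window, as described in Notes.
            deadline = time.monotonic() + timeout_seconds
        elif time.monotonic() > deadline:
            raise TimeoutError("server did not become ready in time")
        time.sleep(poll_interval_seconds)
```

The real method returns a `StatusResponse` rather than a string; the sketch only shows the control flow.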
cleanup_logs¶
cleanup_logs(
    log_dir=None,
    model_family=None,
    model_name=None,
    job_id=None,
    before_job_id=None,
    dry_run=False,
)
Remove logs from the log directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `log_dir` | `str` or `Path` | Root directory containing log files. Defaults to `~/.vec-inf-logs`. | `None` |
| `model_family` | `str` | Only delete logs for this model family. | `None` |
| `model_name` | `str` | Only delete logs for this model name. | `None` |
| `job_id` | `int` | If provided, only match directories with this exact SLURM job ID. | `None` |
| `before_job_id` | `int` | If provided, only delete logs with job ID less than this value. | `None` |
| `dry_run` | `bool` | If True, return matching files without deleting them. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `list[Path]` | List of deleted (or matched if dry_run) log file paths. |
Source code in vec_inf/client/api.py
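The way the `job_id` and `before_job_id` filters could combine can be sketched as below. This is a hypothetical illustration, not the actual implementation: the `<name>.<job_id>` directory naming is an assumption made only for this example.

```python
from pathlib import Path

def match_log_dirs(dirs, job_id=None, before_job_id=None):
    """Select log directories assumed to be named like '<model>.<job_id>'."""
    matched = []
    for d in dirs:
        try:
            jid = int(d.name.rsplit(".", 1)[-1])  # hypothetical naming scheme
        except ValueError:
            continue  # not a job log directory; skip it
        if job_id is not None and jid != job_id:
            continue  # exact-match filter
        if before_job_id is not None and jid >= before_job_id:
            continue  # keep only logs from jobs older than the cutoff
        matched.append(d)
    return matched
```

With `dry_run=True` the real method returns the matched paths without deleting anything, which is a useful first step before an actual cleanup.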
Model Config¶
vec_inf.client.config.ModelConfig¶
Bases: BaseModel
Pydantic model for validating and managing model deployment configurations.
A configuration class that handles validation and management of model deployment settings, including model specifications, hardware requirements, and runtime parameters.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model, must be alphanumeric with allowed characters: '-', '_', '.' | required |
| `model_family` | `str` | Family/architecture of the model | required |
| `model_variant` | `str` | Specific variant or version of the model family | required |
| `model_type` | `(LLM, VLM, Text_Embedding, Reward_Modeling)` | Type of model architecture | `'LLM'` |
| `gpus_per_node` | `int` | Number of GPUs to use per node (1-MAX_GPUS_PER_NODE) | required |
| `num_nodes` | `int` | Number of nodes to use for deployment (1-MAX_NUM_NODES) | required |
| `cpus_per_task` | `int` | Number of CPU cores per task (1-MAX_CPUS_PER_TASK) | required |
| `mem_per_node` | `str` | Memory allocation per node in GB format (e.g., '32G') | required |
| `vocab_size` | `int` | Size of the model's vocabulary (1-1,000,000) | required |
| `account` | `str` | Charge resources used by this job to the specified account | required |
| `work_dir` | `str` | Working directory for the batch job | required |
| `qos` | `Union[QOS, str]` | Quality of Service tier for job scheduling | required |
| `time` | `str` | Time limit for the job in HH:MM:SS format | required |
| `partition` | `Union[PARTITION, str]` | SLURM partition for job scheduling | required |
| `resource_type` | `Union[RESOURCE_TYPE, str]` | Type of resource to request for the job | required |
| `venv` | `str` | Virtual environment or container system to use | required |
| `log_dir` | `Path` | Directory path for storing logs | required |
| `model_weights_parent_dir` | `Path` | Base directory containing model weights | required |
| `vllm_args` | `dict[str, Any]` | Additional arguments for vLLM engine configuration | required |
Notes
All fields are validated using Pydantic's validation system. The model is configured to be immutable (frozen) and forbids extra fields.
Source code in vec_inf/client/config.py
model_name class-attribute instance-attribute¶
model_variant class-attribute instance-attribute¶
model_type class-attribute instance-attribute¶
gpus_per_node class-attribute instance-attribute¶
num_nodes class-attribute instance-attribute¶
cpus_per_task class-attribute instance-attribute¶
cpus_per_task = Field(
default=int(DEFAULT_ARGS["cpus_per_task"]),
gt=0,
le=MAX_CPUS_PER_TASK,
description="CPUs per task",
)
mem_per_node class-attribute instance-attribute¶
mem_per_node = Field(
default=DEFAULT_ARGS["mem_per_node"],
pattern="^\\d{1,4}G$",
description="Memory per node",
)
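The `pattern` above means `mem_per_node` must be one to four digits followed by the literal unit `G`. A quick check of the same regex:

```python
import re

# Same pattern as the mem_per_node field: 1-4 digits followed by 'G'.
MEM_PATTERN = re.compile(r"^\d{1,4}G$")

assert MEM_PATTERN.match("32G") is not None
assert MEM_PATTERN.match("1024G") is not None
assert MEM_PATTERN.match("32GB") is None  # suffix must be exactly 'G'
assert MEM_PATTERN.match("32") is None    # unit is required
```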
account class-attribute instance-attribute¶
work_dir class-attribute instance-attribute¶
qos class-attribute instance-attribute¶
qos = Field(
default=DEFAULT_ARGS["qos"]
if DEFAULT_ARGS["qos"] != ""
else None,
description="Quality of Service tier",
)
time class-attribute instance-attribute¶
time = Field(
default=DEFAULT_ARGS["time"],
pattern="^\\d{2}:\\d{2}:\\d{2}$",
description="HH:MM:SS time limit",
)
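The `time` pattern requires exactly two digits per component, i.e. `HH:MM:SS`. Checking the same regex:

```python
import re

# Same pattern as the time field: HH:MM:SS with two digits per component.
TIME_PATTERN = re.compile(r"^\d{2}:\d{2}:\d{2}$")

assert TIME_PATTERN.match("08:00:00") is not None
assert TIME_PATTERN.match("8:00:00") is None  # hours must be zero-padded
assert TIME_PATTERN.match("08:00") is None    # seconds are required
```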
partition class-attribute instance-attribute¶
partition = Field(
default=DEFAULT_ARGS["partition"]
if DEFAULT_ARGS["partition"] != ""
else None,
description="GPU partition type",
)
resource_type class-attribute instance-attribute¶
resource_type = Field(
default=DEFAULT_ARGS["resource_type"]
if DEFAULT_ARGS["resource_type"] != ""
else None,
description="Resource type",
)
exclude class-attribute instance-attribute¶
exclude = Field(
default=DEFAULT_ARGS["exclude"],
description="Exclude certain nodes from the resources granted to the job",
)
nodelist class-attribute instance-attribute¶
nodelist = Field(
default=DEFAULT_ARGS["nodelist"],
description="Request a specific list of nodes for deployment",
)
bind class-attribute instance-attribute¶
venv class-attribute instance-attribute¶
log_dir class-attribute instance-attribute¶
model_weights_parent_dir class-attribute instance-attribute¶
model_weights_parent_dir = Field(
default=Path(DEFAULT_ARGS["model_weights_parent_dir"]),
description="Base directory for model weights",
)
vllm_args class-attribute instance-attribute¶
env class-attribute instance-attribute¶
model_config class-attribute instance-attribute¶
model_config = ConfigDict(
extra="forbid",
str_strip_whitespace=True,
validate_default=True,
frozen=True,
)
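`frozen=True` in the `ConfigDict` above means `ModelConfig` instances cannot be mutated after construction. The stdlib analogue of the same idea is a frozen dataclass (this sketch is not the pydantic model itself; `FrozenConfigSketch` is an illustrative stand-in):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenConfigSketch:
    """Illustrative stand-in for an immutable config object."""
    model_name: str
    gpus_per_node: int

cfg = FrozenConfigSketch(model_name="Meta-Llama-3.1-8B-Instruct", gpus_per_node=1)
try:
    cfg.gpus_per_node = 4
except FrozenInstanceError:
    pass  # assignment is rejected; the config stays as constructed
```

Immutability means a launched job's configuration cannot be changed out from under it; to vary settings, construct a new config instead.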
Data Models¶
vec_inf.client.models¶
Data models for Vector Inference API.
This module contains the data model classes used by the Vector Inference API for both request parameters and response objects.
Classes:

| Name | Description |
| --- | --- |
| `ModelStatus` : Enum | Status states of a model |
| `ModelType` : Enum | Types of supported models |
| `LaunchResponse` : dataclass | Response from model launch operation |
| `BatchLaunchResponse` : dataclass | Response from batch model launch operation |
| `StatusResponse` : dataclass | Response from model status check |
| `MetricsResponse` : dataclass | Response from metrics collection |
| `LaunchOptions` : dataclass | Options for model launch |
| `LaunchOptionsDict` : TypedDict | Dictionary representation of launch options |
| `ModelInfo` : dataclass | Information about available models |
ModelStatus¶
Bases: str, Enum
Enum representing the possible status states of a model.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `PENDING` | `str` | Model is waiting for Slurm to allocate resources |
| `LAUNCHING` | `str` | Model is in the process of starting |
| `READY` | `str` | Model is running and ready to serve requests |
| `FAILED` | `str` | Model failed to start or encountered an error |
| `SHUTDOWN` | `str` | Model was intentionally stopped |
| `UNAVAILABLE` | `str` | Model status cannot be determined |
Source code in vec_inf/client/models.py
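Because `ModelStatus` inherits from both `str` and `Enum`, its members compare equal to plain strings, which is what makes checks like `status.status == ModelStatus.READY` interchangeable with string comparisons. A minimal sketch of the same pattern with the documented states:

```python
from enum import Enum

class ModelStatusSketch(str, Enum):
    """Illustrative str-valued enum with the documented status states."""
    PENDING = "PENDING"
    LAUNCHING = "LAUNCHING"
    READY = "READY"
    FAILED = "FAILED"
    SHUTDOWN = "SHUTDOWN"
    UNAVAILABLE = "UNAVAILABLE"

# Members behave as strings, so they can be compared, serialized,
# and used as dict keys interchangeably with plain str values.
assert ModelStatusSketch.READY == "READY"
assert isinstance(ModelStatusSketch.READY, str)
```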
ModelType¶
Bases: str, Enum
Enum representing the possible model types.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `LLM` | `str` | Large Language Model |
| `VLM` | `str` | Vision Language Model |
| `TEXT_EMBEDDING` | `str` | Text Embedding Model |
| `REWARD_MODELING` | `str` | Reward Modeling Model |
Source code in vec_inf/client/models.py
LaunchResponse dataclass¶
Response from launching a model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `slurm_job_id` | `str` | ID of the launched SLURM job | required |
| `model_name` | `str` | Name of the launched model | required |
| `config` | `dict[str, Any]` | Configuration used for the launch | required |
| `raw_output` | `str` | Raw output from the launch command (hidden from repr) | required |
Source code in vec_inf/client/models.py
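"Hidden from repr" for `raw_output` corresponds to `dataclasses.field(repr=False)`: the attribute is stored normally but omitted from the printed representation, keeping potentially long launch output out of logs. An illustrative sketch (not the actual class definition):

```python
from dataclasses import dataclass, field

@dataclass
class LaunchResponseSketch:
    """Illustrative response object; raw_output is excluded from repr."""
    slurm_job_id: str
    model_name: str
    config: dict
    raw_output: str = field(repr=False)

resp = LaunchResponseSketch("12345678", "Meta-Llama-3.1-8B-Instruct", {}, "long sbatch output...")
print(resp)  # repr shows job ID, model name, and config, but not raw_output
```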
BatchLaunchResponse dataclass¶
Response from launching multiple models in batch mode.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `slurm_job_id` | `str` | ID of the launched SLURM job | required |
| `slurm_job_name` | `str` | Name of the launched SLURM job | required |
| `model_names` | `list[str]` | Names of the launched models | required |
| `config` | `dict[str, Any]` | Configuration used for the launch | required |
| `raw_output` | `str` | Raw output from the launch command (hidden from repr) | required |
Source code in vec_inf/client/models.py
StatusResponse dataclass¶
Response from checking a model's status.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model | required |
| `log_dir` | `str` | Path to the SLURM log directory | required |
| `server_status` | `ModelStatus` | Current status of the server | required |
| `job_state` | `Union[str, ModelStatus]` | Current state of the SLURM job | required |
| `raw_output` | `str` | Raw output from status check (hidden from repr) | required |
| `base_url` | `str` | Base URL of the model server if ready | `None` |
| `pending_reason` | `str` | Reason for pending state if applicable | `None` |
| `failed_reason` | `str` | Reason for failure if applicable | `None` |
Source code in vec_inf/client/models.py
MetricsResponse dataclass¶
Response from retrieving model metrics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the model | required |
| `metrics` | `Union[dict[str, float], str]` | Either a dictionary of metrics or an error message | required |
| `timestamp` | `float` | Unix timestamp of when metrics were collected | required |
Source code in vec_inf/client/models.py
LaunchOptions dataclass¶
Options for launching a model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_family` | `str` | Family/architecture of the model | `None` |
| `model_variant` | `str` | Specific variant/version of the model | `None` |
| `partition` | `str` | SLURM partition to use | `None` |
| `resource_type` | `str` | Type of resource to request for the job | `None` |
| `num_nodes` | `int` | Number of nodes to allocate | `None` |
| `gpus_per_node` | `int` | Number of GPUs per node | `None` |
| `account` | `str` | Account name for job scheduling | `None` |
| `work_dir` | `str` | Working directory for the batch job | `None` |
| `qos` | `str` | Quality of Service level | `None` |
| `time` | `str` | Time limit for the job | `None` |
| `exclude` | `str` | Exclude certain nodes from the resources granted to the job | `None` |
| `node_list` | `str` | Request a specific list of nodes for deployment | `None` |
| `bind` | `str` | Additional binds for the container as a comma-separated list of bind paths | `None` |
| `vocab_size` | `int` | Size of model vocabulary | `None` |
| `data_type` | `str` | Data type for model weights | `None` |
| `venv` | `str` | Virtual environment to use | `None` |
| `log_dir` | `str` | Directory for logs | `None` |
| `model_weights_parent_dir` | `str` | Parent directory containing model weights | `None` |
| `vllm_args` | `str` | Additional arguments for vLLM | `None` |
| `env` | `str` | Environment variables to be set | `None` |
| `config` | `str` | Path to custom model config yaml | `None` |
Source code in vec_inf/client/models.py
ModelInfo dataclass¶
Information about an available model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | Name of the model | required |
| `family` | `str` | Family/architecture of the model | required |
| `variant` | `str` | Specific variant/version of the model | required |
| `model_type` | `ModelType` | Type of the model | required |
| `config` | `dict[str, Any]` | Additional configuration parameters | required |