Vector Inference: Easy inference on Slurm clusters¶
This repository provides an easy-to-use solution to run inference servers on Slurm-managed computing clusters using open-source inference engines (vLLM, SGLang). This package runs natively on the Vector Institute cluster environments. To adapt to other environments, follow the instructions in Installation.
NOTE: Supported models on Killarney are tracked here
Installation¶
If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
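The install command itself appears to be missing here; a minimal sketch, assuming the package is published on PyPI under the name `vec-inf`:

```shell
# Install vec-inf from PyPI (assumed package name)
pip install vec-inf
```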
Otherwise, we recommend using the provided vllm.Dockerfile and sglang.Dockerfile to set up your own environment with the package. The built images are available on Docker Hub.
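For example, building a local image from one of the provided Dockerfiles might look like the following (the image tag `vec-inf-vllm` is a hypothetical name chosen for illustration):

```shell
# Build an inference image from the repository's vLLM Dockerfile,
# guarded so the command is skipped on machines without Docker.
if command -v docker > /dev/null; then
  docker build -f vllm.Dockerfile -t vec-inf-vllm .
fi
```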
If you'd like to use vec-inf on your own Slurm cluster, you need to update the configuration files. There are three ways to do this:
- Clone the repository and update the `environment.yaml` and `models.yaml` files in `vec_inf/config`, then install from source by running `pip install .`.
- The package looks for cached configuration files in your environment before falling back to the default configuration. The default cached configuration directory is `/model-weights/vec-inf-shared`; create an `environment.yaml` and a `models.yaml` there, following the format of the files in `vec_inf/config`.
- [OPTIONAL] The package also checks the environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set `VEC_INF_CONFIG_DIR` to point to that location.
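The third option above can be sketched as follows (the directory path is a hypothetical example):

```shell
# Create a custom config directory (example location)
mkdir -p "$HOME/vec-inf-config"

# Place your environment.yaml and models.yaml in this directory,
# following the format of the files in vec_inf/config, then point
# vec-inf at it via the environment variable:
export VEC_INF_CONFIG_DIR="$HOME/vec-inf-config"
```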