Vector Inference: Easy inference on Slurm clusters¶
This repository provides an easy-to-use solution to run inference servers on Slurm-managed computing clusters using open-source inference engines (vLLM, SGLang). This package runs natively on the Vector Institute cluster environments. To adapt to other environments, follow the instructions in Installation.
NOTE: Supported models on Killarney are tracked here
Installation¶
If you are using the Vector cluster environment and don't need any customization to the inference server environment, run the following to install the package:
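The install command itself appears to be missing here; a minimal sketch, assuming the package is published on PyPI under the name `vec-inf`:

```shell
# Install vec-inf from PyPI (assumed package name)
pip install vec-inf
```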
Otherwise, we recommend using the provided vllm.Dockerfile and sglang.Dockerfile to set up your own environment with the package. The built images are available on Docker Hub.
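For example, building a local image from one of the provided Dockerfiles might look like the following (the image tag `vec-inf-vllm` is a hypothetical name chosen for illustration):

```shell
# Build an inference image from the repository's vLLM Dockerfile,
# guarded so the command is skipped on machines without Docker.
if command -v docker > /dev/null; then
  docker build -f vllm.Dockerfile -t vec-inf-vllm .
fi
```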
If you'd like to use vec-inf on your own Slurm cluster, you need to update the configuration files. There are three ways to do this:
- Clone the repository and update the `environment.yaml` and `models.yaml` files in `vec_inf/config`, then install from source by running `pip install .`.
- The package looks for cached configuration files in your environment before falling back to the default configuration. The default cached configuration directory is `/model-weights/vec-inf-shared`; create an `environment.yaml` and a `models.yaml` there, following the format of the files in `vec_inf/config`.
- [OPTIONAL] The package also checks the environment variable `VEC_INF_CONFIG_DIR`. You can put your `environment.yaml` and `models.yaml` in a directory of your choice and set `VEC_INF_CONFIG_DIR` to point to that location.
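The third option above can be sketched as follows (the directory path is a hypothetical example):

```shell
# Create a custom config directory (example location)
mkdir -p "$HOME/vec-inf-config"

# Place your environment.yaml and models.yaml in this directory,
# following the format of the files in vec_inf/config, then point
# vec-inf at it via the environment variable:
export VEC_INF_CONFIG_DIR="$HOME/vec-inf-config"
```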