Vector Institute Implementation Catalog

A curated collection of high-quality AI implementations developed by researchers and engineers at the Vector Institute

115 Implementations

7 Years of Research

Browse Implementations by Type

applied-research, bootcamp, tool

A library for handling atomistic graph datasets, focused on transformer-based implementations, with utilities for training various models and experimenting with different pre-training tasks, plus a suite of pre-trained models with Hugging Face integrations

AtomFormer, SchNet, TokenGT

Datasets: S2EF Datasets, Misc. Atomistic Graph Datasets

A repository for social bias mitigation in LLMs using machine unlearning

Negation via Task Vectors, PCGU

Datasets: BBQ, Stereoset, RedditBias

A comprehensive framework for Knowledge Graph Retrieval Augmented Generation (KG-RAG).

KG-RAG, GraphRAG

Datasets: SEC 10-Q

A toolkit to download, augment, and benchmark Open-PMC data

PMC Data Extraction

Datasets: PubMed Central, HuggingFace PMC Dataset

A repository of reference implementations for retrieval-augmented generation

Web Search, Document Search, SQL Search, Cloud Search, PubMed QA, RAG Evaluation

Datasets: PubMed, Banking Dataset - Marketing Targets

A repository with reference implementations for deploying AI models in production environments, focusing on best practices and cloud-native solutions.

AWS, GCP

A repository with implementations of anomaly detection techniques

Logistic Regression (Supervised), Random Forest (Supervised), XGBoost (Supervised), CatBoost (Supervised), Light GBM (Supervised), TabNet (Supervised and Semi-supervised), Autoencoder (AE) (Unsupervised), Isolation Forest (Unsupervised)

Datasets: Bank Account Fraud Detection, DGraph dataset, MVTec dataset, UCSD Anomaly Detection Dataset, UCF Crime Dataset

A repository with demos for various diffusion models for tabular and time series data

TabDDPM, TabSyn, ClavaDDPM, CSDI, TSDiff

Datasets: Physionet Challenge 2012, Electricity dataset (UCI Machine Learning Repository)

A repository with implementations of advanced fine-tuning techniques and approaches that enhance Large Language Model performance and reduce computational cost, with a focus on alignment with human values

FSDP, DDP, Instruction Tuning, PEFT, Quantization, Supervised Fine-tuning

Datasets: SAMSum dataset, TweetEval

A repository providing reference implementations and resources for the 2025 Bootcamp on Interpretable and Explainable AI, covering both post-hoc explainability methods and interpretable models

LIME, SHAP, PDP (Partial Dependence Plot), ALE (Accumulated Local Effects), Integrated Gradients, Counterfactual Explanations, Generalized Additive Model, Neural Additive Model, Explainable Boosting Machine

Datasets: Gas turbine dataset

A repository with implementations of privacy-enhancing techniques for machine learning

Differential Privacy (tensorflow_privacy), PATE, Membership Inference Attacks, Horizontal Federated Learning, Vertical Federated Learning, Homomorphic Encryption

A repository with implementations of recommender systems

Matrix Factorization, Collaborative Filtering, Content-Based Filtering, Sequence Aware Recommender Systems, Session-Based Recommender Systems, Knowledge Graph-Based Recommender Systems

A repository with reference implementations of self-supervised learning techniques

Internal Contrastive Learning (ICL) + Latent Outlier Exposure (LOE), SimMTM, TabRet, Data2Vec

Datasets: STL-10, Beijing PM 2.5

A toolkit for facilitating research and deployment of ML models for healthcare

Binary Classification, Multi-label Classification, Tabular Data Processing, Time-series Data Processing, Image Data Processing, Dataset Shift Detection, Model Report Card Generation

Datasets: NIH Chest X-ray, MIMIC-IV

An AI-powered tool designed to analyze bias in text and visual content, with a focus on risk identification, mitigation, and promoting sustainable and trustworthy AI systems

Text Bias Analysis, Image Bias Analysis, Batch Text CSV Analysis, Batch Image Analysis, AI Risk Management, Green AI Optimization, Bias Scoring and Assessment

A framework for fine-tuning retrieval-augmented generation (RAG) systems.

Basic fine-tuning with FL, RA-DIT

A flexible, modular, and easy-to-use library to facilitate federated learning research and development in healthcare settings

FedAvg, FedOpt, FedProx, SCAFFOLD, MOON, FedDG-GA, FLASH, FedPM, Personal FL, FedBN, FedPer, FedRep, Ditto, MR-MTL, APFL, PerFCL, FENDA-FL, FENDA+Ditto

A platform to launch and monitor Federated Learning (FL) training jobs, designed to bridge the gap between FL algorithm implementations and practical healthcare applications

FL Job Orchestration, Training Job Monitoring, Client-Server Communication, FL4Health Integration, Web-based Job Configuration, Docker-based Deployment, Multi-client Support

A toolkit for research on multimodal representation learning

Contrastive Pretraining, I-JEPA

Datasets: ImageNet, LibriSpeech, RGB-D

A comprehensive library for developing foundation models using Electronic Health Record (EHR) data, with a focus on advanced medical data processing and modeling

EHRMamba, CEHR-BERT, BigBird, MultiBird, LSTM, XGBoost, Multitask Prompted Finetuning (MPF), Next Token Prediction (NTP)

Datasets: MIMIC-IV

Efficient LLM inference on Slurm clusters using vLLM

CLI, Python API, OpenAI compatible server

Browse Implementations by Type

atomgen

bias-mitigation-unlearning

kg-rag

pmc-data-extraction

retrieval-augmented-generation

ai-deployment

anomaly-detection

diffusion-models

finetuning-and-alignment

interpretability

privacy-enhancing-techniques

recommender-systems

self-supervised-learning

cyclops

fair-sense-ai

fed-rag

fl4health

florist

mmlearn

odyssey

vector-inference