Vector Institute Reference Implementation Catalog

Welcome to the Vector Institute Reference Implementation Catalog! The catalog is a curated collection of high-quality implementations developed by researchers and engineers at the Vector Institute. It provides access to repositories that demonstrate state-of-the-art techniques across a wide range of AI domains.

100+ Reference Implementations
7 Years of Research

Browse Implementations by Year

RAG

2024

This repository contains demos for various Retrieval-Augmented Generation (RAG) techniques using different libraries.

Cloud search via LlamaHub, Document search via LangChain, LlamaIndex for OpenAI and Cohere models, Hybrid Search via Weaviate Vector Store, Evaluation via RAGAS library
Datasets: Vectors 2021 Annual Report, PubMed Doc, Banking Deposits

This repository contains demos for LLM finetuning techniques focussed on reducing computational cost.

DDP, FSDP, Instruction Tuning, LoRA, DoRA
Datasets: samsum, imdb, Bias-DeBiased

This repository contains demos for various Prompt Engineering techniques, along with examples of bias quantification and text classification.

Stereotypical Bias Analysis, Sentiment inference, Finetuning using HF Library, Activation Generation, Train and Test Model for Activations without Prompts
Datasets: Crows-pairs, sst5, czarnowska templates, cnn_dailymail, ag_news, Weather and sports data, Other

This repository contains code for the paper "Can Machine Unlearning Reduce Social Bias in Language Models?", which was published at EMNLP'24 in the Industry track.
Authors are Omkar Dige, Diljot Arneja, Tsz Fung Yau, Qixuan Zhang, Mohammad Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak.

PCGU, Task vectors and DPO for Machine Unlearning
Datasets: BBQ, Stereoset, Link1, Link2

This repository contains demos for using the CyclOps package for clinical ML evaluation and monitoring.

XGBoost
Datasets: Diabetes 130-US hospitals dataset for years 1999-2008

odyssey

2024

This library was created from the research for the paper "EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records", published on arXiv in 2024.
Authors are Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, Amrit Krishnan.

EHRMamba, XGBoost, Bi-LSTM
Datasets: MIMIC-IV

This repository contains demos for various diffusion models for tabular and time series data.

TabDDPM, TabSyn, ClavaDDPM, CSDI, TSDiff
Datasets: Physionet Challenge 2012, wiki2000

This repository contains code for libraries and experiments to recognise and evaluate bias and fakeness within news media articles via LLMs.

Bias evaluation via LLMs, Finetuning and data annotation via LLM for fake news detection, Supervised finetuning for debiasing sentences, NER for biased phrases via LLMs, Evaluation using the DeepEval library
Datasets: News Media Bias Full data, Toxigen, Nela GT, Debiaser data

A continuation of the News Media Bias project, this repository contains code for libraries and experiments to collect and annotate data, and to recognise and evaluate bias and fakeness within news media articles via LLMs and VLMs.

Bias evaluation via LLMs and VLMs, Finetuning and data annotation via LLM for fake news detection, Supervised finetuning for debiasing sentences, NER for biased entities via LLMs
Datasets: News Media Bias Plus Full Data, NMB Plus Named Entities

This repository contains demos for various supervised and unsupervised anomaly detection techniques in domains such as fraud detection, network intrusion detection, system monitoring, and image and video analysis.

AMNet, GCN, SAGE, OCGNN, DON
Datasets: On Vector Cluster

This repository contains demos for self-supervised techniques such as contrastive learning, masked modeling, and self-distillation.

Internal Contrastive Learning, LatentOD-AD, TabRet, SimMTM, Data2Vec
Datasets: Beijing Air Quality, BRFSS, Stroke Prediction, STL10, Link1, Link2

This repository contains code to estimate the causal effect of an intervention on a measurable outcome, primarily in the health domain.

Naive ATE, TARNet, DragonNet, Double Machine Learning, T Learner
Datasets: Infant Health and Development Program (IHDP), Jobs, Twins, Berkeley admission, Government Census, Compas

HV-Ai-C

2023

This repository implements a reinforcement learning agent to optimize energy consumption in data centers.

RL agents performing Random action, Fixed action, Q Learning, Hyperspace Neighbor Penetration
Datasets: No public datasets available

Flex Model

2023

This repository contains code for the paper "FlexModel: A Framework for Interpretability of Distributed Large Language Models".
Authors are Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson.

Distributed Interpretability
Datasets: No public datasets available

VBLL

2023

This repository contains code for the paper "Variational Bayesian Last Layers".
Authors are James Harrison, John Willes, Jasper Snoek.

Variational Bayesian Last Layers
Datasets: MNIST, FashionMNIST

This repository contains demos for various RecSys techniques, including collaborative filtering, knowledge graph, RL-based, sequence-aware, and session-based methods.

SVD++, NeuMF, Plot based, Two tower, SVD
Datasets: Amazon-recsys, careervillage, movielens-recsys, tmdb, LastFM, yoochoose

This repository contains demos for a variety of forecasting techniques, including univariate and multivariate time series forecasting and spatiotemporal forecasting.

Exponential Smoothing, Persistence Forecasting, Mean Window Forecast, Prophet, NeuralProphet
Datasets: Canadian Weather Station Data, BoC Exchange rate, Electricity Consumption, Road Traffic Occupancy, Influenza-Like Illness Patient Ratios, Walmart M5 Retail Product Sales, WeatherBench, Grocery Store Sales, Economic Data with Food CPI

This repository contains demos for a variety of Prompt Engineering techniques, such as fairness measurement via sentiment analysis, finetuning, prompt tuning, and prompt ensembling.

Bias Quantification & Probing, Stereotypical Bias Analysis, Binary sentiment analysis task, Finetuning using HF Library, Gradient-Search for Instruction Prefix
Datasets: Crow-pairs, sst5, cnn_dailymail, ag_news, Tweet-data, Other

NAA

2022

This repository contains code for the paper "Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support", published at EMNLP'22 in the Industry track.
Authors are Stephen Obadinma, Faiza Khan Khattak, Shirley Wang, Tania Sidhorn, Elaine Lau, Sean Robertson, Jingcheng Niu, Winnie Au, Alif Munim, Karthik Raja Kalaiselvi Bhaskar.

Context Retrieval using SBERT bi-encoder, Context Retrieval using SBERT cross-encoder, Intent identification using BERT, Few Shot Multi-Class Text Classification with BERT, Multi-Class Text Classification with BERT
Datasets: ELI5, MSMARCO

This repository contains demos for Privacy, Homomorphic Encryption, Horizontal and Vertical Federated Learning, MIA, and PATE.

Vanilla SGD, DP SGD, DP Logistic Regression, Homomorphic Encryption for MLP, Horizontal FL
Datasets: Heart Disease, Credit Card Fraud, Breast Cancer Data, TCGA, CIFAR10, Home Credit Default Risk, Yelp, Airbnb

SSGVQAP

2021

This repository contains code for the paper "A Smart System to Generate and Validate Question Answer Pairs for COVID-19 Literature", which was accepted at ACL'20.
Authors are Rohan Bhambhoria, Luna Feng, Dawn Sepehr, John Chen, Conner Cowling, Sedef Kocak, Elham Dolatabadi.

An Active Learning Strategy for Data Selection, AL-Uncertainty, AL-Clustering
Datasets: CORD-19

This repository replicates the experiments described on pages 16 and 17 of the 2022 Edition of Canada's Food Price Report.

Time series forecasting using Prophet, Time series forecasting using NeuralProphet, Interpretable time series forecasting using N-BEATS, Ensemble of the above methods
Datasets: FRED Economic Data

This repository tackles different problems such as defect detection, footprint extraction, road obstacle detection, traffic incident detection, and segmentation of medical procedures.

Semantic segmentation using Unet, Unet++, FCN, DeepLabv3, Anomaly segmentation
Datasets: SpaceNet Building Detection V2, MVTEC, ICDAR2015, PASCAL_VOC, DOTA, AVA, UCF101-24, J-HMDB-21