Vector Institute Reference Implementation Catalog

Welcome to the Vector Institute Reference Implementation Catalog! The catalog is a curated collection of high-quality implementations developed by researchers and engineers at the Vector Institute. It provides access to repositories that demonstrate state-of-the-art techniques across a wide range of AI domains.

100+ Reference Implementations

7 Years of Research

Browse Implementations by Year

2024 | 2023 | 2022 | 2021 | 2020

This repository contains demos for various Retrieval Augmented Generation techniques using different libraries.

Cloud search via LlamaHub; Document search via LangChain; LlamaIndex for OpenAI and Cohere models; Hybrid Search via Weaviate Vector Store; Evaluation via RAGAS library

Datasets: Vectors 2021 Annual Report; PubMed Doc; Banking Deposits

This repository contains demos for finetuning techniques for LLMs focussed on reducing computational cost.

DDP; FSDP; Instruction Tuning; LoRA; DoRA

Datasets: samsam; imdb; Bias-DeBiased

This repository contains demos for various Prompt Engineering techniques, along with examples of bias quantification and text classification.

Stereotypical Bias Analysis; Sentiment inference; Finetuning using HF Library; Activation Generation; Train and Test Model for Activations without Prompts

Datasets: Crows-pairs; sst5; czarnowska templates; cnn_dailymail; ag_news; Weather and sports data; Other

This repository contains code for the paper "Can Machine Unlearning Reduce Social Bias in Language Models?", published at EMNLP'24 in the Industry track.
Authors are Omkar Dige, Diljot Arneja, Tsz Fung Yau, Qixuan Zhang, Mohammad Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak.

PCGU; Task vectors and DPO for Machine Unlearning

Datasets: BBQ; Stereoset; Link1; Link2

This repository contains demos for using the CyclOps package for clinical ML evaluation and monitoring.

XGBoost

Datasets: Diabetes 130-US hospitals dataset for years 1999-2008

This library was created as part of the research for the paper "EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records", published on arXiv in 2024.
Authors are Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, Amrit Krishnan.

EHRMamba; XGBoost; Bi-LSTM

Datasets: MIMIC-IV

This repository contains demos for various diffusion models for tabular and time series data.

TabDDPM; TabSyn; ClavaDDPM; CSDI; TSDiff

Datasets: Physionet Challenge 2012; wiki2000

This repository contains code for libraries and experiments to recognise and evaluate bias and fakeness within news media articles via LLMs.

Bias evaluation via LLMs; Finetuning and data annotation via LLM for fake news detection; Supervised finetuning for debiasing sentences; NER for biased phrases via LLMs; Evaluation using DeepEval library

Datasets: News Media Bias Full data; Toxigen; Nela GT; Debiaser data

A continuation of the News Media Bias project, this repository contains code for libraries and experiments to collect and annotate data, and to recognise and evaluate bias and fakeness within news media articles via LLMs and VLMs.

Bias evaluation via LLMs and VLMs; Finetuning and data annotation via LLM for fake news detection; Supervised finetuning for debiasing sentences; NER for biased entities via LLMs

Datasets: News Media Bias Plus Full Data; NMB Plus Named Entities

This repository contains demos for various supervised and unsupervised anomaly detection techniques in domains such as fraud detection, network intrusion detection, system monitoring, and image and video analysis.

AMNet; GCN; SAGE; OCGNN; DON

Datasets: On Vector Cluster

This repository contains demos for self-supervised techniques such as contrastive learning, masked modeling, and self-distillation.

Internal Contrastive Learning; LatentOD-AD; TabRet; SimMTM; Data2Vec

Datasets: Beijing Air Quality; BRFSS; Stroke Prediction; STL10; Link1; Link2

This repository contains code to estimate the causal effects of an intervention on some measurable outcome primarily in the health domain.

Naive ATE; TARNet; DragonNet; Double Machine Learning; T Learner

Datasets: Infant Health and Development Program (IHDP); Jobs; Twins; Berkeley admission; Government Census; Compas
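
As a point of reference for the simplest estimator named above, the naive ATE can be sketched as the difference in mean outcome between treated and control units. This is a minimal illustration and not code from the repository: the function name `naive_ate` and the toy data are assumptions, and the estimate is only trustworthy when treatment assignment is independent of the potential outcomes (for example, in a randomized trial).

```python
import numpy as np

def naive_ate(outcome, treatment):
    """Difference in mean outcome between treated (1) and control (0) units."""
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment, dtype=bool)
    # Both groups must be non-empty, otherwise a group mean is undefined.
    if treatment.all() or (~treatment).all():
        raise ValueError("need at least one treated and one control unit")
    return outcome[treatment].mean() - outcome[~treatment].mean()

# Hypothetical toy data: three treated units followed by three controls.
y = [3.0, 5.0, 4.0, 1.0, 2.0, 3.0]
t = [1, 1, 1, 0, 0, 0]
print(naive_ate(y, t))  # treated mean 4.0 minus control mean 2.0
```

Estimators such as TARNet, DragonNet, Double Machine Learning, and the T Learner exist largely because this difference in means is biased when confounders influence both treatment and outcome.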

This repository implements a Reinforcement Learning agent to optimize energy consumption within Data Centers.

RL agents performing Random action; Fixed action; Q Learning; Hyperspace Neighbor Penetration

Datasets: No public datasets available

This repository contains code for the paper "FlexModel: A Framework for Interpretability of Distributed Large Language Models".
Authors are Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson.

Distributed Interpretability

Datasets: No public datasets available

This repository contains code for the paper "Variational Bayesian Last Layers".
Authors are James Harrison, John Willes, Jasper Snoek.

Variational Bayesian Last Layers

Datasets: MNIST; FashionMNIST

This repository contains demos for various recommender system (RecSys) techniques, such as collaborative filtering, knowledge graph, RL-based, sequence-aware, and session-based methods.

SVD++; NeuMF; Plot based; Two tower; SVD

Datasets: Amazon-recsys; careervillage; movielens-recsys; tmdb; LastFM; yoochoose

This repository contains demos for a variety of forecasting techniques, including univariate and multivariate time series forecasting and spatiotemporal forecasting.

Exponential Smoothing; Persistence Forecasting; Mean Window Forecast; Prophet; NeuralProphet

Datasets: Canadian Weather Station Data; BoC Exchange rate; Electricity Consumption; Road Traffic Occupancy; Influenza-Like Illness Patient Ratios; Walmart M5 Retail Product Sales; WeatherBench; Grocery Store Sales; Economic Data with Food CPI
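
The three simplest baselines named above can be sketched in a few lines. This is an illustrative sketch rather than the repository's code: the function names, the interpretation of "Mean Window Forecast" as the mean of the most recent observations, and the toy series are all assumptions.

```python
import numpy as np

def persistence_forecast(series, horizon):
    """Repeat the last observed value for every future step."""
    series = np.asarray(series, dtype=float)
    return np.full(horizon, series[-1])

def mean_window_forecast(series, horizon, window):
    """Forecast every future step as the mean of the last `window` observations."""
    series = np.asarray(series, dtype=float)
    return np.full(horizon, series[-window:].mean())

def simple_exponential_smoothing(series, alpha):
    """Final smoothed level, using level_t = alpha * y_t + (1 - alpha) * level_{t-1}."""
    series = np.asarray(series, dtype=float)
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

# Hypothetical toy series.
history = [10.0, 12.0, 14.0, 16.0]
print(persistence_forecast(history, horizon=2))         # both steps equal 16.0
print(mean_window_forecast(history, horizon=2, window=2))  # both steps equal 15.0
print(simple_exponential_smoothing(history, alpha=0.5))    # 14.25
```

Baselines like these are cheap to compute and give a floor against which models such as Prophet or NeuralProphet can be compared.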

This repository contains demos for a variety of Prompt Engineering techniques, such as fairness measurement via sentiment analysis, finetuning, prompt tuning, and prompt ensembling.

Bias Quantification & Probing; Stereotypical Bias Analysis; Binary sentiment analysis task; Finetuning using HF Library; Gradient-Search for Instruction Prefix

Datasets: Crow-pairs; sst5; cnn_dailymail; ag_news; Tweet-data; Other

This repository contains code for the paper "Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support", published at EMNLP'22 in the Industry track.
Authors are Stephen Obadinma, Faiza Khan Khattak, Shirley Wang, Tania Sidhorn, Elaine Lau, Sean Robertson, Jingcheng Niu, Winnie Au, Alif Munim, Karthik Raja Kalaiselvi Bhaskar.

Context Retrieval using SBERT bi-encoder; Context Retrieval using SBERT cross-encoder; Intent identification using BERT; Few Shot Multi-Class Text Classification with BERT; Multi-Class Text Classification with BERT

Datasets: ELI5; MSMARCO

This repository contains demos on privacy topics, including homomorphic encryption, horizontal and vertical federated learning, membership inference attacks (MIA), and PATE (Private Aggregation of Teacher Ensembles).

Vanilla SGD; DP SGD; DP Logistic Regression; Homomorphic Encryption for MLP; Horizontal FL

Datasets: Heart Disease; Credit Card Fraud; Breast Cancer Data; TCGA; CIFAR10; Home Credit Default Risk; Yelp; Airbnb

This repository contains code for the paper "A Smart System to Generate and Validate Question Answer Pairs for COVID-19 Literature", which was accepted at ACL'20.
Authors are Rohan Bhambhoria, Luna Feng, Dawn Sepehr, John Chen, Conner Cowling, Sedef Kocak, Elham Dolatabadi.

An Active Learning Strategy for Data Selection; AL-Uncertainty; AL-Clustering

Datasets: CORD-19

This repository replicates the experiments described on pages 16 and 17 of the 2022 Edition of Canada's Food Price Report.

Time series forecasting using Prophet; Time series forecasting using NeuralProphet; Interpretable time series forecasting using N-BEATS; Ensemble of the above methods

Datasets: FRED Economic Data

This repository tackles different problems such as defect detection, footprint extraction, road obstacle detection, traffic incident detection, and segmentation of medical procedures.

Semantic segmentation using Unet; Unet++; FCN; DeepLabv3; Anomaly segmentation

Datasets: SpaceNet Building Detection V2; MVTEC; ICDAR2015; PASCAL_VOC; DOTA; AVA; UCF101-24; J-HMDB-21

RAG

Finetuning and Alignment

Prompt Engineering Laboratory

bias-mitigation-unlearning

cyclops-workshop

odyssey

Diffusion model bootcamp

News Media Bias

News Media Bias Plus

Anomaly Detection Project

SSL Bootcamp

Causal Inference Lab

HV-Ai-C

Flex Model

VBLL

Recommendation Systems

Forecasting with Deep Learning

Prompt Engineering

NAA

Privacy Enhancing Technologies

SSGVQAP

foodprice-forecasting

Computer_Vision_Project