Announcing the Winners of the MIDST Challenge!

The Vector Institute MIDST challenge (Membership Inference over Diffusion-models-based Synthetic Tabular data) will be hosted at the 3rd IEEE Conference on Secure and Trustworthy Machine Learning (SaTML 2025). The competition was launched in December 2024, and final submissions were due on February 28th, 2025. We are excited to announce the winning submissions!

The goal of this challenge was to evaluate the resilience of synthetic tabular data generated by diffusion models against black-box and white-box membership inference attacks (MIAs). We sought a quantitative evaluation of the privacy gain offered by such synthetic data, with a specific focus on its resistance to MIAs. Given the heterogeneity and complexity of tabular data, we considered multiple target models, including diffusion models for single tables of mixed data types and for multi-relational tables with interconnected constraints. A key expected outcome was the development of novel black-box and white-box MIAs tailored to these target diffusion models, enabling a comprehensive evaluation of their privacy efficacy. The challenge materials are available in the competition GitHub repository.

Challenge Tasks

This challenge was composed of four tasks, each associated with a separate category. The categories were defined by the level of access to the generative models and the type of tabular data:

- Black-box access, single table
- White-box access, single table
- Black-box access, multi-table
- White-box access, multi-table
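To make the black-box setting concrete: there, the attacker sees only the released synthetic data, not the model. The sketch below shows one simple, widely used baseline, scoring each candidate record by its distance to the closest synthetic record (DCR). It is an illustration only, not the method of the organizers or of any winning submission; the function name `dcr_scores` is hypothetical, and it assumes every column has already been numerically encoded and scaled (real mixed-type tables would first need categorical encoding).

```python
import math
from typing import List, Sequence


def dcr_scores(
    candidates: Sequence[Sequence[float]],
    synthetic: Sequence[Sequence[float]],
) -> List[float]:
    """Score each candidate record by its distance to the closest synthetic record.

    The score is the negative Euclidean distance, so a higher score means the
    candidate sits closer to some synthetic row and is treated as more likely
    to have been in the generator's training set.
    Assumes every column is already numeric and comparably scaled.
    """
    if not synthetic:
        raise ValueError("synthetic data must contain at least one record")
    scores = []
    for record in candidates:
        # Distance from this candidate to its nearest synthetic neighbour.
        nearest = min(math.dist(record, row) for row in synthetic)
        scores.append(-nearest)
    return scores


# Toy usage with already-encoded, hypothetical data.
synthetic = [[0.0, 0.0], [10.0, 10.0]]
candidates = [[0.0, 1.0], [5.0, 5.0]]
print(dcr_scores(candidates, synthetic))
```

The heuristic behind DCR is that a generator that partially memorizes its training data tends to emit rows unusually close to real members; higher scores simply mean "more likely a member".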

To facilitate participation in MIDST, we developed shadow models for both the single-table and multi-table tasks. The same shadow models were used for the black-box and white-box tasks. Participants were free to use these shadow models, train their own, or both when developing their MIAs.
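One way shadow models help is that their training sets are known, so an attack that must output hard member/non-member decisions (for example, to report accuracy) can calibrate its decision threshold on them first. This minimal sketch assumes you already have attack scores (higher means more likely a member) for records known to be non-members of a shadow model; `calibrate_threshold` and `predict_members` are hypothetical helpers, and reusing the threshold on the target model assumes the shadow and target models behave similarly.

```python
import math
from typing import List, Sequence


def calibrate_threshold(nonmember_scores: Sequence[float], target_fpr: float = 0.1) -> float:
    """Return a threshold t such that at most target_fpr of the given
    non-member scores are strictly greater than t.

    Records are then predicted to be members when score > t.
    """
    if not nonmember_scores:
        raise ValueError("need at least one non-member score")
    ranked = sorted(nonmember_scores, reverse=True)
    # Number of known non-members we can afford to flag as members.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every record may be flagged
    # At most `allowed` scores lie strictly above the (allowed+1)-th highest one.
    return ranked[allowed]


def predict_members(scores: Sequence[float], threshold: float) -> List[int]:
    """Hard decisions: 1 = predicted member, 0 = predicted non-member."""
    return [1 if s > threshold else 0 for s in scores]


# Toy usage: hypothetical attack scores for ten shadow-model non-members.
shadow_nonmember_scores = list(range(10))  # 0..9
t = calibrate_threshold(shadow_nonmember_scores, target_fpr=0.1)
print(t, predict_members(shadow_nonmember_scores, t))
```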

Each task was hosted as a separate competition on CodaBench.

Evaluation Criteria

Submissions were evaluated and ranked based on their true positive rate at a 10% false positive rate (TPR @ 10% FPR). This metric reflects a realistic attack scenario in which an adversary aims to accurately identify as many members as possible while allowing only a small margin for error. We also plotted full Receiver Operating Characteristic (ROC) curves for each attack and reported additional metrics, including the Area Under the Curve (AUC), overall accuracy, and membership inference advantage (defined as TPR - FPR).
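The ranking metric and the additional metrics can all be derived from an attack's membership scores. The following is a minimal pure-Python sketch, assuming each attack outputs one score per candidate record (higher means more likely a member) and that ground-truth labels are 1 for members and 0 for non-members. The function names are hypothetical. We take the best TPR among ROC points with FPR at most 10% and define the advantage as the maximum of TPR - FPR over thresholds; these are common conventions, but the official scoring code in the competition repository may differ in detail (for example, by interpolating between ROC points).

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) operating points from sweeping a threshold over the scores.

    labels: 1 = member, 0 = non-member; higher score = "more likely a member".
    Records with identical scores share one threshold, so ties are never split.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one member and one non-member")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume every record tied at this score before recording a point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def tpr_at_fpr(labels, scores, max_fpr: float = 0.1) -> float:
    """Best TPR among operating points whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in roc_points(labels, scores) if fpr <= max_fpr)


def roc_auc(labels, scores) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def mia_advantage(labels, scores) -> float:
    """Membership inference advantage, taken here as max over thresholds of TPR - FPR."""
    return max(tpr - fpr for fpr, tpr in roc_points(labels, scores))


# Toy usage: 3 members and 7 non-members with hypothetical attack scores.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(tpr_at_fpr(labels, scores), roc_auc(labels, scores), mia_advantage(labels, scores))
```

Grouping tied scores matters because a constant-score attack should land on the chance diagonal (AUC of 0.5) rather than gain or lose TPR from arbitrary tie-breaking.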

Accessibility

We structured the competition to ensure accessibility, minimizing the need for extensive computational resources. To support this, we provided a calibrated set of shadow models so that participants were not required to train additional models themselves. For context, training the 450 models made available during the competition required approximately 1500 GPU hours. Participants were welcome to join any subset of the four competition tracks.

Transparency

The implementations for this competition are based on the Diffusion Model Bootcamp provided by the Vector Institute. A more detailed technical description of the competition as well as the code used to train models and score submissions is available on the competition GitHub repository.

Results

We received entries from 71 distinct participants across the 4 tracks. We congratulate all participants for taking part in this competition, and we are particularly excited to announce the winner and runner-up in each track.

Track | Winner | Runner-up
Black-box Single Table | Tartan Federer | CITADEL & UQAM
White-box Single Table | Tartan Federer | Yan Pang
Black-box Multi Table | Tartan Federer | Cyber@BGU
White-box Multi Table | Tartan Federer | **

The winner of each track is eligible for a $2000 CAD award from Vector; runners-up are eligible for a $1000 CAD award.

** We received several submissions for the white-box multi-table task; however, their performance did not significantly exceed that of random guessing.

Acknowledgements

We’d like to thank the MICO organizers for their open-source project and their very helpful comments.

Next Steps

MIDST is part of ongoing efforts at the Vector Institute to provide guidance on the privacy evaluation of synthetic data. If you are interested in joining a discussion with our team or collaborating with us on this topic, please contact us at any of the following email addresses: masoumeh@vectorinstitute.ai, xi.he@uwaterloo.ca, or veronica.chatrath@vectorinstitute.ai.