
Bias Evaluation Across Domains (BEADs) 💠🔷🔹🔹 🔹 🔹🔷💠

We introduce the Bias Evaluation Across Domains (BEADs) dataset page. BEADs was developed by Shaina Raza, PhD, on the AI Engineering team at the Vector Institute, and is licensed under CC BY-NC 4.0. The dataset has been evaluated on large language models and is designed to address critical challenges in identifying, quantifying, and mitigating biases in language models. This resource supports a variety of NLP tasks, facilitating comprehensive studies in bias evaluation.

BEADs Dataset Overview

Contact and Access Information

This dataset provides a comprehensive resource for detecting and evaluating bias across multiple NLP tasks.

Highlights of the BEADs Dataset

  • ๐ŸŒ Multi-Aspects Coverage: Specifically targets biases related to gender, ethnicity, age, and more, using data from diverse social media platforms.
  • ๐Ÿค– Hybrid Annotation Approach: Employs advanced machine learning models combined with human verification to ensure accuracy and reliability.
  • ๐Ÿ› ๏ธ Applications: Supports tasks such as text classification, token classification, and language generation, making it highly versatile for bias studies.
  • ๐Ÿงช Evaluation: Evaluation on LLMs.
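For the text-classification task listed above, a typical workflow is to inspect the label distribution and filter samples by bias label. The sketch below illustrates this with hypothetical records; the field names (`text`, `label`) and the label values are assumptions for illustration only, so consult the dataset's documentation for the actual schema.

```python
# Minimal sketch of working with bias-labeled text samples.
# NOTE: the "text"/"label" fields and the label values below are
# hypothetical placeholders, not the confirmed BEADs schema.
from collections import Counter

# Hypothetical records mimicking a text-classification split.
samples = [
    {"text": "Example sentence one.", "label": "biased"},
    {"text": "Example sentence two.", "label": "non-biased"},
    {"text": "Example sentence three.", "label": "biased"},
]

def label_distribution(records):
    """Count how many samples carry each bias label."""
    return Counter(r["label"] for r in records)

def filter_by_label(records, label):
    """Return only the samples with the given bias label."""
    return [r for r in records if r["label"] == label]

print(label_distribution(samples))
print(len(filter_by_label(samples, "biased")))
```

The same pattern applies once the real splits are downloaded from the links below, substituting the actual field names from the dataset files.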

Direct Dataset Downloads

Access specific datasets directly through the links below for convenient downloading:

Text Classification Datasets

Token Classification Datasets

Aspects of Bias Dataset

Bias Quantification Demographics

Language Generation Datasets

The datasets above are labeled using GPT-4 and verified by human annotators.
For GPT-3.5 and active-learning labels, refer to Full Annotations.

License

This dataset has been prepared by Shaina Raza, Vector Institute, and is licensed under CC BY-NC 4.0.

Feedback

Provide your feedback or ask a question:

Click here to provide feedback