Bias Evaluation Across Domains (BEADs)
This page introduces the Bias Evaluation Across Domains (BEADs) dataset, developed by Shaina Raza, PhD, of the AI Engineering team at the Vector Institute, and licensed under CC BY-NC 4.0. The dataset is designed to address critical challenges in identifying, quantifying, and mitigating biases in language models, and it has been evaluated on large language models. It supports a variety of NLP tasks for studies in bias evaluation.

Contact and Access Information
This dataset provides a comprehensive resource for detecting and evaluating bias across multiple NLP tasks.
Links
- Access the BEADs Dataset on Hugging Face
- Datasheet
- License
- Contact Shaina Raza
Highlights of the BEADs Dataset
- Multi-Aspect Coverage: Targets biases related to gender, ethnicity, age, and more, using data from diverse social media platforms.
- Hybrid Annotation Approach: Combines machine learning models with human verification to improve label accuracy and reliability.
- Applications: Supports text classification, token classification, and language generation tasks for bias studies.
- Evaluation: The dataset has been evaluated on large language models (LLMs).
Direct Dataset Downloads
Access specific datasets directly through the links below for convenient downloading:
Text Classification Datasets
- README
- Bias Training Data
- Bias Validation Data
- Sentiment Training Data
- Sentiment Validation Data
- Toxicity Training Data
- Toxicity Validation Data
Token Classification Datasets
- README
- Bias Tokens Data
- CoNLL Format Data
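As a rough illustration of how CoNLL-format token data is typically consumed, the sketch below parses a small sample and extracts tagged spans. The exact column layout and tag names of the BEADs file are not stated on this page, so the two-column "token tag" layout, the blank-line sentence separator, and the `B-BIAS`/`I-BIAS`/`O` tags used here are assumptions; check the README before relying on them.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]

def parse_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text into sentences of (token, tag) pairs.

    Assumes the token is the first column and the tag is the last column,
    with blank lines separating sentences (an assumption, not confirmed
    for the BEADs file).
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("-DOCSTART-"):
            # Document markers carry no tokens.
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences

def tagged_spans(sentence: Sentence) -> List[str]:
    """Join consecutive B-/I- tagged tokens into text spans."""
    spans: List[str] = []
    buf: List[str] = []
    for token, tag in sentence:
        if tag.startswith("B-"):
            if buf:
                spans.append(" ".join(buf))
            buf = [token]
        elif tag.startswith("I-") and buf:
            buf.append(token)
        else:
            if buf:
                spans.append(" ".join(buf))
            buf = []
    if buf:
        spans.append(" ".join(buf))
    return spans

# Hypothetical sample; the tag names are illustrative only.
SAMPLE = """\
The O
report O
was O
utterly B-BIAS
ridiculous I-BIAS
. O

Nothing O
here O
"""

if __name__ == "__main__":
    for sent in parse_conll(SAMPLE):
        print(tagged_spans(sent))
```

Reading the tag from the last column keeps the parser usable if the real file carries extra columns between the token and the label.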
Aspects of Bias Dataset
- README
- Aspects Data
Bias Quantification Demographics
- README
- Demographic Templates
- Stereotype Prompts
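Template-based bias quantification generally works by substituting different identity terms into the same sentence and comparing how a model responds to each variant. The sketch below shows that idea in miniature. The placeholder names, the identity terms, and the scoring function are all hypothetical; the actual template syntax used in BEADs is described in its README, not here.

```python
from itertools import product
from typing import Callable, Dict, List

def fill_template(template: str, slots: Dict[str, List[str]]) -> List[str]:
    """Expand a template into one prompt per combination of slot values.

    Placeholders use Python's str.format syntax, e.g. "{identity}"
    (an assumption made for this sketch).
    """
    keys = list(slots)
    prompts = []
    for combo in product(*(slots[k] for k in keys)):
        prompts.append(template.format(**dict(zip(keys, combo))))
    return prompts

def score_gap(prompts: List[str], score: Callable[[str], float]) -> float:
    """Return the spread between the highest and lowest score.

    `score` stands in for any model-based measure (for example a
    toxicity or sentiment score); a larger gap suggests the model
    treats the variants differently.
    """
    scores = [score(p) for p in prompts]
    return max(scores) - min(scores)

if __name__ == "__main__":
    # Hypothetical template and identity terms, for illustration only.
    prompts = fill_template(
        "The {identity} person is {trait}.",
        {"identity": ["young", "old"], "trait": ["lazy"]},
    )
    for p in prompts:
        print(p)
    # Placeholder scorer: prompt length. Replace with a real model score.
    print(score_gap(prompts, lambda p: float(len(p))))
```

In practice the placeholder scorer would be replaced by a call to the model under evaluation, and the gap would be aggregated over many templates rather than a single one.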
Language Generation Datasets
- README
- Language Generation Data
The datasets above were labeled with GPT-4 and verified by humans.
For GPT-3.5 and active-learning labels, refer to Full Annotations.
License
This dataset has been prepared by Shaina Raza, Vector Institute, and is licensed under CC BY-NC 4.0.
Feedback
Provide your feedback or ask a question