
Bias Evaluation Across Domains (BEADs) 💠🔷🔹🔹 🔹 🔹🔷💠

We introduce the Bias Evaluation Across Domains (BEADs) dataset page. BEADs was developed by Shaina Raza, PhD, on the AI Engineering team at the Vector Institute, and is licensed under CC BY-NC 4.0. The dataset has been evaluated on large language models and is designed to address critical challenges in identifying, quantifying, and mitigating biases in language models. This resource supports a variety of NLP tasks, facilitating comprehensive studies in bias evaluation.

BEADs Dataset Overview

Contact and Access Information

This dataset provides a comprehensive resource for detecting and evaluating bias across multiple NLP tasks.

Highlights of the BEADs Dataset

  • ๐ŸŒ Multi-Aspects Coverage: Specifically targets biases related to gender, ethnicity, age, and more, using data from diverse social media platforms.
  • ๐Ÿค– Hybrid Annotation Approach: Employs advanced machine learning models combined with human verification to ensure accuracy and reliability.
  • ๐Ÿ› ๏ธ Applications: Supports tasks such as text classification, token classification, and language generation, making it highly versatile for bias studies.
  • ๐Ÿงช Evaluation: Evaluation on LLMs.
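For the text-classification task listed above, a typical workflow is to inspect the label distribution and filter samples by bias label. The sketch below illustrates this with hypothetical records; the field names (`text`, `label`) and the label values are assumptions for illustration only, so consult the dataset's documentation for the actual schema.

```python
# Minimal sketch of working with bias-labeled text samples.
# NOTE: the "text"/"label" fields and the label values below are
# hypothetical placeholders, not the confirmed BEADs schema.
from collections import Counter

# Hypothetical records mimicking a text-classification split.
samples = [
    {"text": "Example sentence one.", "label": "biased"},
    {"text": "Example sentence two.", "label": "non-biased"},
    {"text": "Example sentence three.", "label": "biased"},
]

def label_distribution(records):
    """Count how many samples carry each bias label."""
    return Counter(r["label"] for r in records)

def filter_by_label(records, label):
    """Return only the samples with the given bias label."""
    return [r for r in records if r["label"] == label]

print(label_distribution(samples))
print(len(filter_by_label(samples, "biased")))
```

The same pattern applies once the real splits are downloaded from the links below, substituting the actual field names from the dataset files.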

Direct Dataset Downloads

Access specific datasets directly through the links below for convenient downloading:

Text Classification Datasets

Token Classification Datasets

Aspects of Bias Dataset

Bias Quantification Demographics

Language Generation Datasets

The datasets above are labeled using GPT-4 and verified by human annotators.
For GPT-3.5 and active-learning labels, refer to Full Annotations.

License

This dataset has been prepared by Shaina Raza, Vector Institute, and is licensed under CC BY-NC 4.0.

Feedback

Provide your feedback or ask a question:

Click here to provide feedback