SQuAD v2¶

SQuAD 2.0 benchmark

HuggingFaceSQuADv2 ¶

Bases: HuggingFaceBenchmarkMixin, BaseBenchmark

HuggingFace SQuAD 2.0 Benchmark.

Stanford Question Answering Dataset (SQuAD) 2.0 combines 100,000 questions from SQuAD 1.1 with over 50,000 unanswerable questions. Systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph.

Example schema

{ "id": "56ddde2d66d3e219004dad4d", "title": "Symbiosis", "context": "Symbiotic relationships include those associations in which one organism lives on another (ectosymbiosis, such as...", "question": "What is an example of ectosymbiosis?", "answers": { "text": ["mistletoe"], "answer_start": [114] } }

For unanswerable questions, the answers field has empty lists: { "answers": { "text": [], "answer_start": [] } }

Source code in src/fed_rag/evals/benchmarks/huggingface/squad_v2.py

class HuggingFaceSQuADv2(HuggingFaceBenchmarkMixin, BaseBenchmark):
    """HuggingFace SQuAD 2.0 Benchmark.

    Stanford Question Answering Dataset (SQuAD) 2.0 combines 100,000 questions
    from SQuAD 1.1 with over 50,000 unanswerable questions. Systems must not
    only answer questions when possible, but also determine when no answer is
    supported by the paragraph.

    Example schema:
        {
            "id": "56ddde2d66d3e219004dad4d",
            "title": "Symbiosis",
            "context": "Symbiotic relationships include those associations in which one organism lives on another (ectosymbiosis, such as...",
            "question": "What is an example of ectosymbiosis?",
            "answers": {
                "text": ["mistletoe"],
                "answer_start": [114]
            }
        }

    For unanswerable questions, the answers field has empty lists:
        {
            "answers": {
                "text": [],
                "answer_start": []
            }
        }
    """

    dataset_name = "squad_v2"
    configuration_name: str | None = None

    def _get_query_from_example(self, example: dict[str, Any]) -> str:
        return str(example["question"])

    def _get_response_from_example(self, example: dict[str, Any]) -> str:
        answers = example.get("answers", {})
        answer_texts = answers.get("text", [])

        if answer_texts:
            # Return the first answer (they are typically variations of the same answer)
            return str(answer_texts[0])
        else:
            # For unanswerable questions, return a special token
            return "[NO ANSWER]"

    def _get_context_from_example(self, example: dict[str, Any]) -> str:
        return str(example["context"])

    @model_validator(mode="before")
    @classmethod
    def _validate_extra_installed(cls, data: Any) -> Any:
        """Validate that huggingface-evals dependencies are installed."""
        check_huggingface_evals_installed(cls.__name__)
        return data