Projects¶
This page summarizes each project's scope and links to its repository, documentation, and quickstart. Installation instructions are in each project's own README.
Overview¶
- What is this? — A collection of research tools, benchmarks, and evaluation frameworks for responsible AI: fairness, explainability, multimodal understanding, and agentic systems.
- Who is it for? — Researchers and practitioners working on fairness-aware AI, multimodal benchmarking, explainable agentic systems, and reproducible AI evaluation.
- How is it organized? — Some projects have their own standalone repositories; others live as modules in the main AIXpert repo. Each section below links to the relevant repo and docs.
UnBias-Plus¶
UnBias-Plus is an AI-driven toolkit for bias detection and debiasing in text. It locates biased segments, classifies severity, explains each span, suggests neutral wording, and returns a full neutral rewrite—usable from the CLI, REST API (FastAPI + demo UI), or Python (UnBiasPlus).
Links: GitHub · Project page · PyPI
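To illustrate the span-level workflow described above, here is a minimal, hypothetical sketch in plain Python. The field names and the `neutral_rewrite` helper are illustrative only; the real `UnBiasPlus` API may differ — see the project README for actual usage.

```python
from dataclasses import dataclass

# Hypothetical shape of one detected span; real UnBiasPlus output may differ.
@dataclass
class BiasSpan:
    text: str          # the flagged segment
    severity: str      # e.g. "low" | "medium" | "high"
    explanation: str   # why the span was flagged
    suggestion: str    # proposed neutral wording

def neutral_rewrite(original: str, spans: list[BiasSpan]) -> str:
    """Apply each span's suggested wording to produce a full neutral rewrite."""
    result = original
    for span in spans:
        result = result.replace(span.text, span.suggestion)
    return result

spans = [BiasSpan("chairman", "medium", "gendered job title", "chairperson")]
print(neutral_rewrite("The chairman opened the meeting.", spans))
# → The chairperson opened the meeting.
```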
FairSense-AgentiX¶
FairSense-AgentiX is an agentic bias detection and AI-risk analysis platform for text, images, and datasets. A reasoning agent plans per input type, selects tools (OCR, vision models, embeddings, retrieval), critiques and refines outputs, and explains steps via telemetry—aiming for more transparent, context-aware fairness checks than static classifiers.
Links: GitHub · Project page · PyPI
SONIC-O1¶
Real-world benchmark for evaluating multimodal LLMs on audio-video understanding: short to long-form videos across 13 conversational domains (job interviews, medical, legal, etc.), with three tasks — summarization, multiple-choice QA, and temporal localization — and demographic metadata for fairness analysis.
Links: GitHub · Project page · Dataset · Leaderboard
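For the temporal localization task, a common scoring approach is temporal intersection-over-union between predicted and gold `[start, end]` segments. Whether SONIC-O1 uses exactly this protocol is an assumption; the sketch below only illustrates the general idea.

```python
def temporal_iou(pred: tuple[float, float], gold: tuple[float, float]) -> float:
    """Intersection-over-union of two [start, end] intervals in seconds."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = max(pred[1], gold[1]) - min(pred[0], gold[0])
    return inter / union if union > 0 else 0.0

# A prediction overlapping the gold segment by 5 s out of a 15 s union:
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))  # → 0.333…
```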
SONIC-O1 Multi-Agent¶
Compound multi-agent system for audio-video understanding with Qwen3-Omni: planner, reasoner, and reflection agents with chain-of-thought reasoning, self-reflection, temporal grounding, and optional multi-step task decomposition. Built on LangGraph and vLLM.
Links: GitHub
Explainable Agentic Evaluation Framework¶
Framework for evaluating explainability in both traditional (static) and agentic AI systems. Compares attribution-based explanations (e.g. SHAP, LIME) with trace-based diagnostics; shows that attribution is stable for static prediction but trace-grounded rubrics are needed to localize agentic failures.
Links: GitHub · Project page
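Attribution methods such as SHAP and LIME estimate per-feature contributions by perturbing inputs and observing the score change. A minimal leave-one-out (occlusion) sketch of that idea — not the framework's implementation — in plain Python:

```python
def occlusion_attribution(score, features: dict[str, float]) -> dict[str, float]:
    """Leave-one-out attribution: how much the score drops when each feature is zeroed."""
    base = score(features)
    return {k: base - score({**features, k: 0.0}) for k in features}

# Toy linear model: occlusion recovers each feature's contribution exactly.
weights = {"income": 2.0, "age": -1.0}
score = lambda f: sum(weights[k] * v for k, v in f.items())
print(occlusion_attribution(score, {"income": 3.0, "age": 4.0}))
# → {'income': 6.0, 'age': -4.0}
```

For static models like this, such attributions are stable; the framework's point is that agentic failures (a wrong tool call mid-trajectory, say) need trace-grounded rubrics instead, since no single input attribution localizes them.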
Factual Preference Alignment (F-DPO)¶
Factuality-aware Direct Preference Optimization: extends DPO with binary factuality labels and a factuality-aware margin to reduce LLM hallucinations without an auxiliary reward model. Single-stage and compute-efficient.
Links: GitHub · Project page · Dataset
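The general shape of a factuality-aware margin can be sketched on top of the standard DPO objective. The exact formulation (how the label enters, the margin schedule) is the paper's; the version below is a hypothetical, simplified single-example loss for intuition only.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def fdpo_loss(logp_w: float, logp_l: float, ref_w: float, ref_l: float,
              factual: int, beta: float = 0.1, gamma: float = 1.0) -> float:
    """Sketch of DPO with a factuality-aware margin (hypothetical form).

    logp_* are policy log-probs of the chosen (w) and rejected (l) answers;
    ref_* are the frozen reference model's log-probs. When the binary
    factuality label is 1, a margin gamma widens the required gap between
    chosen and rejected, penalizing the policy more for hallucinated losers.
    """
    gap = (logp_w - ref_w) - (logp_l - ref_l)   # implicit DPO reward gap
    return -math.log(sigmoid(beta * gap - gamma * factual))

# The margin raises the loss for factuality-labeled pairs, pushing a larger gap.
print(fdpo_loss(-1.0, -2.0, -1.2, -1.5, factual=0))
print(fdpo_loss(-1.0, -2.0, -1.2, -1.5, factual=1))
```

Because the margin is just an additive term inside the sigmoid, the method stays single-stage: no auxiliary reward model is trained.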
Modules in AIXpert¶
These modules live in the main AIXpert repository. Clone once, run `uv sync`, then follow the READMEs below for setup and commands.

Controlled Images¶
Baseline vs. fairness-aware image sets for occupations or social groups; configurable attributes, matched prompts, and reproducible seeds.
Links: Module README
Synthetic Data Generation¶
Multi-modal synthesis: image + VQA pairs, textual scenes and MCQs, and video generation (Veo/Gemini). Driven by LLM-designed prompts and metadata templates.
Links: Images README · NLP README
Agent Pipeline (CrewAI)¶
Single-agent orchestration for prompt → image → metadata generation and large-scale data creation with structured JSON task definitions.
Links: Module README
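A structured JSON task definition for this kind of pipeline might look like the sketch below. The field names are invented for illustration — the module's actual schema is in its README.

```python
import json

# Hypothetical task definition; field names are illustrative, not the schema.
task_json = """
{
  "task": "generate_image_with_metadata",
  "prompt_template": "a portrait of a {occupation}",
  "variables": {"occupation": ["teacher", "doctor"]},
  "outputs": ["image", "metadata"]
}
"""

task = json.loads(task_json)
# Expand the template over each variable value, as one agent run might do
# before handing prompts to the image and metadata generation steps.
prompts = [task["prompt_template"].format(occupation=o)
           for o in task["variables"]["occupation"]]
print(prompts)
# → ['a portrait of a teacher', 'a portrait of a doctor']
```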
Fairness & Explainability¶
Statistical metrics (e.g. Statistical Parity, Equal Opportunity), zero-shot explainers (integrated gradients, concept attributions), and visualization (disparity plots, attribution maps).
Links: AIXpert repo
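Statistical Parity, for instance, compares positive-prediction rates across groups. A minimal, self-contained sketch of the metric (not the module's implementation):

```python
def statistical_parity_difference(y_pred: list[int], group: list[int]) -> float:
    """P(y_hat=1 | group=1) - P(y_hat=1 | group=0); zero means parity."""
    def rate(g: int) -> float:
        members = [p for p, s in zip(y_pred, group) if s == g]
        return sum(members) / len(members)
    return rate(1) - rate(0)

preds  = [1, 0, 1, 1, 0, 0]
groups = [1, 1, 1, 0, 0, 0]
print(statistical_parity_difference(preds, groups))  # → 0.333…
```

Equal Opportunity follows the same pattern but conditions the rates on the true positives only.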
Contributing & Documentation¶
- See CONTRIBUTING.md for coding standards (PEP8, Google docstrings), pre-commit hooks (`ruff`, `mypy`, `typos`, `nbQA`), branching, and tests.
- Run docs locally: `uv sync --no-group docs`, then `mkdocs serve` → http://127.0.0.1:8000
- CI: GitHub Actions (`code_checks.yml`, `unit_tests.yml`, `integration_tests.yml`)