Sustainable Open-Source AI Requires Tracking the Cumulative Footprint of Derivatives

Focus: Carbon & Water Footprint Coordination in Open-Source AI
Authors: Shaina Raza1, Iuliia Eyriay1,2, Ahmed Y. Radwan1, Nate Lesperance1,2, Deval Pandya1, Sedef Akinli Kocak1, Graham W. Taylor1,2
Affiliation: 1Vector Institute, 2University of Guelph
💧
Water Impact (Hidden Cost)
  • 76–170 ML water consumed training GPT-4 (estimated)
  • 1.3B+ gallons/year used by a single data center (e.g., Google Council Bluffs)
  • 1 in 4 data centers may face water scarcity by 2050

Water consumption from on-site cooling + off-site electricity generation

🏭
Carbon Footprint Growth
  • 415 TWh → 945 TWh data center electricity (2024-2030)
  • 15% annual growth rate (4× faster than total electricity demand)
  • 30% annual growth for AI-specific servers

IEA projections showing aggregate consumption rising despite efficiency gains

♻️
The Rebound Effect

Lower marginal costs → increased usage → higher aggregate demand. Efficiency improvements can be overwhelmed by scale without coordination. Thousands of derivative models (fine-tunes, LoRA adapters, quantizations) create cumulative impacts that exceed base model training.

💡
Position Statement

Efficiency gains are critical, but reducing AI's aggregate environmental impact also requires coordination infrastructure. Open-source ecosystems need ecosystem-level carbon accounting to enable measurement, disclosure, and shared targets that make emissions visible, assign responsibility, and prevent the tragedy of the commons.

Abstract

The open-source Artificial Intelligence (AI) ecosystem has grown explosively, with Hugging Face now hosting over 2 million models. While this growth democratizes AI, it also introduces a coordination gap. The downstream derivatives incur energy use, water consumption, and emissions that remain largely unobserved and inconsistently disclosed, limiting collective oversight and masking cumulative impact.

The Problem

While quantization, pruning, and efficient fine-tuning help make individual models more efficient, cheaper training and inference can also lead to more experimentation and deployment, which may outweigh these gains through rebound effects. A single foundation model like Meta Llama can spawn hundreds of derivatives within months, each consuming additional compute.

Our Proposal: DIA

We propose Data and Impact Accounting (DIA), a lightweight coordination mechanism that provides ecosystem-level visibility into carbon and water footprints without restricting open-source development. DIA combines standardized reporting, automated integration, and public aggregation dashboards.

Key Insight: Tragedy of the Commons

This dynamic represents the tragedy of the commons, where individually rational actions—such as fine-tuning models for specific use cases—can collectively increase total energy and water use. Here, the commons are the atmosphere and freshwater resources. The open-source ecosystem currently lacks governance mechanisms to coordinate responsible resource use.

Key Figures

Visual evidence of the hidden environmental reality in AI ecosystems

Figure 1. The hidden environmental reality of the AI ecosystem, illustrating: (A) Localized water stress across the United States with data center facilities competing for water in stressed basins, and (B) Estimated order-of-magnitude comparison of training-related carbon and water footprints for closed vs. open models.
Figure 2. Overview of Data and Impact Accounting (DIA). Top: Current state showing invisible footprint—base model training may be reported, but derivative artifacts (fine-tunes, LoRA adapters, quantizations, merges) are typically untracked, making aggregate ecosystem impact unobservable. Bottom: Proposed DIA system introducing a low-friction visibility layer with (1) standardized impact reporting in model metadata, (2) automated tracking via existing tools, and (3) ecosystem-level aggregation through public dashboards.
Critical Context: Derivative Proliferation

Meta reports that pretraining Llama 3 (8B and 70B combined) emitted approximately 2,290 tCO₂eq. However, research documents 146+ derivatives for a single model family. Even if most derivatives are cheaper individually, the aggregate emissions across hundreds can exceed base model training by multiples. Precise estimation is currently impossible because derivative compute is rarely disclosed—this motivates DIA.
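The scale of this effect can be sketched with a purely illustrative calculation. The per-derivative compute fraction below is a hypothetical assumption, not a measured value; only the base emissions figure and derivative count come from the sources cited above.

```python
# Illustrative only: if each derivative used a hypothetical 5% of the
# base model's training compute, 146 derivatives would together exceed
# the base run severalfold.
base_tco2eq = 2290           # Meta-reported Llama 3 pretraining (8B + 70B)
n_derivatives = 146          # documented derivatives for one model family
derivative_fraction = 0.05   # hypothetical assumption, not a measured value

aggregate = n_derivatives * derivative_fraction * base_tco2eq
ratio = aggregate / base_tco2eq   # aggregate is ~7.3x the base training run
```

Even under much smaller per-derivative fractions, the aggregate grows linearly with derivative count, which is exactly the quantity DIA would make observable.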

Model Training Footprints (2020-2024)

Training emissions and water consumption of selected GenAI models. Models marked with ⋆ are open-source. Tree equivalent assumes 25 kg CO₂/tree/year. Water in megalitres (ML; 1 ML = 10⁶ L).

Table columns: Model | Year | Params | Open | tCO₂eq | Tree Equiv. | Water (ML) | R/Est. (full data in the paper's Table 1)

Carbon vs Water (Training Phase)

Each point represents a model from the table. Axes use a logarithmic scale by default to show order-of-magnitude differences.

📊 Data Notes & Methodology
  • Water values: Estimated using WUE_total (Water Usage Effectiveness) ranges of 1.8-4.0 L/kWh, combining on-site cooling and off-site electricity generation water consumption (water not returned locally).
  • Reported vs Estimated: "R" indicates values disclosed by model creators; "Est." uses GPU-hours, TDP, PUE, and carbon intensity assumptions (see paper Appendix A, Equations 1-3).
  • Training phase only: This table covers training phase emissions and water use. The paper emphasizes that inference and derivative proliferation can dominate lifecycle impact—these downstream costs are largely untracked.
  • Upper bounds: TDP-based estimates provide upper bounds; actual power draw typically ranges from 60-80% of TDP depending on utilization patterns during training.
  • GPT-4 range: Based on IEA's 42.4 GWh estimate with carbon intensity 0.1-0.445 kgCO₂/kWh and WUE 1.2-4.4 L/kWh depending on cooling and grid source. These are third-party estimates, not audited disclosures.

Data and Impact Accounting (DIA)

A lightweight, non-regulatory transparency infrastructure for ecosystem-level sustainability coordination

📋
1. Lightweight Reporting Schema

Minimal footprint schema embedded in model cards or repository metadata:

  • Hardware type and device count
  • Training duration (GPU-hours)
  • Estimated electricity use (kWh)
  • Estimated water use (L) or facility WUE (L/kWh)
  • Grid carbon intensity (kgCO₂/kWh) or training region as proxy
  • Model lineage (base model(s) and major downstream derivatives)
  • For inference: standardized per-query benchmarks, optional aggregate usage reporting, deployment efficiency metadata
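A minimal sketch of what such a record might look like serialized into model-card or repository metadata. The field names and all values below are illustrative assumptions, not a finalized community standard:

```python
import json

# Hypothetical DIA footprint record; field names and values are
# illustrative, not a finalized standard.
dia_record = {
    "hardware": {"type": "NVIDIA H100", "device_count": 64},
    "training": {
        "gpu_hours": 12_000,
        "electricity_kwh": 7_500,                  # estimated
        "water_l": 18_750,                         # or report facility WUE
        "grid_carbon_intensity_kg_per_kwh": 0.3,   # or training region as proxy
        "region": "us-central",
    },
    "lineage": {
        "base_models": ["meta-llama/Llama-3-8B"],
        "derivative_type": "lora-finetune",
    },
    "inference": {"per_query_benchmark": None, "notes": "optional"},
}

# Embeddable as JSON (or YAML) in a model card / repository metadata.
serialized = json.dumps(dia_record, indent=2)
```

Because the schema is flat and declarative, small teams could fill in only the fields they know (e.g., hardware and GPU-hours) and leave the rest to region-based defaults.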
🔧
2. Low-Friction Instrumentation

Automated measurement tools integrated into training pipelines:

  • CodeCarbon for automated energy/emissions tracking
  • ML CO₂ Impact Calculator for job-level estimation
  • Cloud provider sustainability APIs for location-adjusted data
  • MLPerf Inference for standardized benchmarking protocols
  • Region-based defaults with explicit data-quality tiers when facility-level WUE unavailable
  • Generate reports with minimal manual effort
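As a stdlib-only sketch of what "low-friction" could mean in practice: wrap the training loop and derive estimates from wall time and region defaults. In a real pipeline a tool like CodeCarbon would do the measurement; the context manager, its defaults, and the region table below are all hypothetical.

```python
import time
from contextlib import contextmanager

# Hypothetical region-based defaults (data-quality tier: "regional default").
REGION_CI = {"us-central": 0.40, "eu-north": 0.10}   # kgCO2/kWh, illustrative
DEFAULT_WUE = 2.5                                    # L/kWh, illustrative

@contextmanager
def track_footprint(report, n_gpus, p_avg_w, pue=1.15, region="us-central"):
    """Estimate energy, carbon, and water for the wrapped block from wall time."""
    t0 = time.monotonic()
    try:
        yield
    finally:
        hours = (time.monotonic() - t0) / 3600
        kwh = hours * n_gpus * p_avg_w * pue / 1000
        report["kwh"] = kwh
        report["kg_co2eq"] = kwh * REGION_CI[region]
        report["water_l"] = kwh * DEFAULT_WUE

report = {}
with track_footprint(report, n_gpus=8, p_avg_w=500):
    pass  # training step(s) would run here
```

The point of the sketch is the shape of the integration: one wrapper, sensible defaults, and a report dict that can be dropped directly into the metadata schema above.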
📊
3. Ecosystem-Level Aggregation

Public registry or dashboard summarizing reported footprints:

  • Aggregate data across releases and model families
  • Track trends over time and identify high-impact families
  • Benchmark efficiency improvements at ecosystem scale
  • Enable comparative analysis across lineages
  • Estimate deployment-phase impacts via download statistics and voluntary provider reporting
  • Natural candidates: existing model hubs like Hugging Face
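One way a hub could roll reported footprints up by model family, sketched under the assumption that each record carries a base-model pointer and a tCO₂eq estimate. The record format, model IDs, and all emission values are hypothetical:

```python
from collections import defaultdict

# Hypothetical reported records: (model_id, base_model_id or None, tCO2eq).
records = [
    ("org/base-8b", None, 2290.0),
    ("user-a/base-8b-ft", "org/base-8b", 40.0),
    ("user-b/base-8b-lora", "org/base-8b", 5.0),
    ("user-c/base-8b-quant", "org/base-8b", 1.5),
]

def family_totals(records):
    """Aggregate reported emissions per root model family."""
    roots = {mid: base or mid for mid, base, _ in records}
    totals = defaultdict(float)
    for mid, _, tco2 in records:
        totals[roots[mid]] += tco2
    return dict(totals)

# family_totals(records) -> {'org/base-8b': 2336.5}
# i.e., the family total already exceeds the base run alone.
```

Even this single-level roll-up makes the lineage-level quantity visible that is invisible today; a hub could extend it to deeper derivative chains and time-windowed trends.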
Design Principles
  • Voluntary & low-friction: Adoption driven by social incentives and community norms, similar to model cards
  • Imperfect is acceptable: Approximate estimates based on hardware and duration sufficient for directional insight
  • Preserves open-source benefits: No barriers to entry; small teams provide minimal info, framework focuses on aggregate patterns
  • Positive feedback loop: Making efficiency visible and comparable creates incentive for optimization
What DIA is NOT
  • Not a regulation: Does not restrict who can train, fine-tune, or release models
  • Not a gate: Does not block access or participation in open-source ecosystem
  • Not auditing: Goal is visibility into trends and relative impacts, not policing individual projects
  • Not complete solution: Foundational infrastructure that can support complementary mechanisms (compute budgets, shared targets)
Implementation Pathway
(4 Phases)
1. Norm-setting: Major open-source labs adopt standardized reporting for flagship releases; conferences encourage emissions reporting in submissions (reproducibility/ethics checklists)
2. Friction reduction: Common training stacks (PyTorch, JAX, Transformers) expose optional emissions tracking by default; cloud providers surface location-adjusted carbon and water information in job summaries
3. Ecosystem visibility: Model hubs and community dashboards aggregate and display reported data; researchers can query footprint estimates for model families and track ecosystem trends
4. Accountability: Non-binding badges or "impact labels" on model pages; standardized citations for impact statements; benchmarking to support voluntary targets and progress tracking

Methodology: Estimating Environmental Footprint

When direct measurements are unavailable, the paper estimates training electricity from GPU-hours and hardware power, then derives CO₂ emissions and water consumption following established ML carbon accounting methodology.

Energy Consumption (kWh)
E_train = (H_GPU × P_avg × PUE) / 1000
H_GPU = aggregate GPU-hours across all devices | P_avg = average GPU power draw (W) | PUE = power usage effectiveness (typical hyperscale: 1.1-1.2)
When measured power is unavailable, use vendor TDP values as an upper bound (actual draw is typically 60-80% of TDP)
Carbon Emissions (tCO₂eq)
C_train = (E_train × CI) / 1000
CI = grid carbon intensity (kgCO₂/kWh, typical range 0.1-0.6 depending on region and energy mix)
Water Consumption (L, then ML)
W_train = E_train × WUE_total
WUE_total = water usage effectiveness (L/kWh), combining on-site cooling + off-site electricity generation
Typical range: 1.8-4.0 L/kWh | Report in megalitres: W(ML) = W(L)/10⁶
Critical distinction: Water consumption (evaporated/not returned) vs. water withdrawal (taken from source). Consumption drives local scarcity impacts.
The Water Dimension: A Hidden Cost

Beyond carbon emissions, AI consumes substantial amounts of water for evaporative cooling in data centers and indirectly through electricity generation. Unlike carbon, water impacts are highly localized and depend on basin-level scarcity.

Scale: A mid-sized data center consumes ~300,000 gallons/day (≈1,000 households); hyperscale facilities use up to 5 million gallons/day. Risk: MSCI analysis found 1 in 4 data-center assets may face increased water scarcity by 2050.

Open vs Closed Ecosystem Dynamics

Closed models (e.g., GPT-4): Trained and served centrally via API; inference scales with demand but remains centrally metered in provider data centers. Organizational boundaries enable internal accountability.

Open models (e.g., Llama 3): Trained once, then branch into many derivatives (fine-tunes, quantizations, adapters, merges) produced by independent users. This diffuses environmental impacts across a distributed ecosystem, making aggregate footprint harder to quantify and enabling the tragedy of the commons.

Call to Action

Concrete steps the ML community can take toward ecosystem-level sustainability

👨‍🔬

Researchers & Practitioners

  • Include emissions and water estimates in model cards and paper submissions
  • Use tools like CodeCarbon or Carbontracker to measure training costs
  • Document base models used and incremental compute required for derivatives
  • Estimate inference footprints for deployed systems
  • Remember: imperfect estimates are better than no estimates
📝

Conference Organizers & Reviewers

  • Encourage environmental reporting in reproducibility checklists
  • Provide graduated expectations (lightweight vs. resource-intensive work)
  • Recognize efficiency as a first-class contribution, not secondary consideration
  • Consider environmental impact when evaluating scaling-focused work
🗂️

Model Hub Operators

  • Implement standardized metadata fields for carbon and water reporting
  • Develop dashboards aggregating data across model families and derivatives
  • Surface efficiency metrics alongside accuracy benchmarks in discovery
  • Make sustainability visible and actionable for users
☁️

Cloud Providers & Hardware Vendors

  • Expose per-job carbon intensity and water usage through standardized APIs
  • Provide users with actionable data on environmental cost of workloads
  • Enable carbon-aware scheduling by default
  • Support ecosystem-level reporting standards
💰

Funding Agencies

  • Require environmental impact statements in grant proposals for compute-intensive research
  • Consider efficiency and sustainability as evaluation criteria alongside scientific merit
  • Support research on measurement and coordination infrastructure
  • Fund development of low-friction reporting tools
🏢

Open-Source Labs & Foundations

  • Lead by example with comprehensive environmental reporting for flagship releases
  • Invest in tooling that reduces reporting friction
  • Participate in developing community standards for sustainability accounting
  • Share lessons learned and best practices publicly
🌍
A Path Forward

We propose that by 2027, major open-weight model releases include standardized DIA reports covering training emissions, water usage, and documented lineage. Achieving this requires no regulatory mandate—only coordination. The tools exist; the data can be collected; the community has demonstrated its capacity for collective action on challenges like model cards, dataset documentation, and reproducibility standards.

Earth does not distinguish between emissions from open and closed-source models, between base models and derivatives, or between training and inference. The infrastructure we build today will determine whether open-source AI develops responsibly or experiences uncoordinated growth. What remains is the decision to act.

Acknowledgements

Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.

GWT acknowledges support from the Natural Sciences and Engineering Research Council (NSERC), the Canada Research Chairs program, and the Canadian Institute for Advanced Research (CIFAR) Canada CIFAR AI Chairs program.

For more information about Vector Institute partners, visit vectorinstitute.ai/#partners

Citation

Use the BibTeX below to cite this work. Authors and venue details will be updated after de-anonymization.

@article{Raza2026SustainableAI,
  title={Sustainable Open-Source AI Requires Tracking the Cumulative Footprint of Derivatives},
  author={Raza, Shaina and Eyriay, Iuliia and Radwan, Ahmed Y and Lesperance, Nate and Pandya, Deval and Kocak, Sedef Akinli and Taylor, Graham W.},
  journal={arXiv preprint arXiv:2601.21632},
  year={2026},
  doi={10.48550/arXiv.2601.21632}
}

Data attribution: Model training footprint data (Table 1) is compiled from public disclosures, published papers, and prior analyses following established ML carbon accounting methodology. See paper Section 2.3 and Appendix A for detailed estimation procedures and references for original sources.