Sustainable Open-Source AI Requires Tracking the Cumulative Footprint of Derivatives

Focus: Carbon & Water Footprint Coordination in Open-Source AI
Authors: Shaina Raza1, Iuliia Eyriay1,2, Ahmed Y. Radwan1, Nate Lesperance1,2, Deval Pandya1, Sedef Akinli Kocak1, Graham W. Taylor1,2
Affiliation: 1Vector Institute, 2University of Guelph
💧
Water Impact (Hidden Cost)
  • 76–170 ML water consumed training GPT-4 (estimated)
  • 1.3B+ gallons/year used by a single data center (e.g., Google Council Bluffs)
  • 1 in 4 data centers may face water scarcity by 2050

Water consumption from on-site cooling + off-site electricity generation

🏭
Carbon Footprint Growth
  • 415 TWh → 945 TWh data center electricity (2024-2030)
  • 15% annual growth rate (4× faster than total electricity demand)
  • 30% annual growth for AI-specific servers

IEA projections showing aggregate consumption rising despite efficiency gains

♻️
The Rebound Effect

Lower marginal costs → increased usage → higher aggregate demand. Efficiency improvements can be overwhelmed by scale without coordination. Thousands of derivative models (fine-tunes, LoRA adapters, quantizations) create cumulative impacts that exceed base model training.

💡
Position Statement

Efficiency gains are critical, but reducing AI's aggregate environmental impact also requires coordination infrastructure. Open-source ecosystems need ecosystem-level carbon accounting to enable measurement, disclosure, and shared targets that make emissions visible, assign responsibility, and prevent the tragedy of the commons.

Abstract

The open-source Artificial Intelligence (AI) ecosystem has grown explosively, with Hugging Face now hosting over 2 million models. While this growth democratizes AI, it also introduces a coordination gap. The downstream derivatives incur energy use, water consumption, and emissions that remain largely unobserved and inconsistently disclosed, limiting collective oversight and masking cumulative impact.

The Problem

While quantization, pruning, and efficient fine-tuning help make individual models more efficient, cheaper training and inference can also lead to more experimentation and deployment, which may outweigh these gains through rebound effects. A single foundation model like Meta Llama can spawn hundreds of derivatives within months, each consuming additional compute.

Our Proposal: DIA

We propose Data and Impact Accounting (DIA), a lightweight coordination mechanism that provides ecosystem-level visibility into carbon and water footprints without restricting open-source development. DIA combines standardized reporting, automated integration, and public aggregation dashboards.

Key Insight: Tragedy of the Commons

This dynamic represents the tragedy of the commons, where individually rational actions—such as fine-tuning models for specific use cases—can collectively increase total energy and water use. Here, the commons are the atmosphere and freshwater resources. The open-source ecosystem currently lacks governance mechanisms to coordinate responsible resource use.

Key Figures

Visual evidence of the hidden environmental reality in AI ecosystems

Figure 1. The hidden environmental reality of the AI ecosystem, illustrating: (A) Localized water stress across the United States with data center facilities competing for water in stressed basins, and (B) Estimated order-of-magnitude comparison of training-related carbon and water footprints for closed vs. open models.
Figure 2. Overview of Data and Impact Accounting (DIA). Top: Current state showing invisible footprint—base model training may be reported, but derivative artifacts (fine-tunes, LoRA adapters, quantizations, merges) are typically untracked, making aggregate ecosystem impact unobservable. Bottom: Proposed DIA system introducing a low-friction visibility layer with (1) standardized impact reporting in model metadata, (2) automated tracking via existing tools, and (3) ecosystem-level aggregation through public dashboards.
Critical Context: Derivative Proliferation

Meta reports that pretraining Llama 3 (8B and 70B combined) emitted approximately 2,290 tCO₂eq. However, research documents 146+ derivatives for a single model family. Even if most derivatives are cheaper individually, the aggregate emissions across hundreds can exceed base model training by multiples. Precise estimation is currently impossible because derivative compute is rarely disclosed—this motivates DIA.
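The scale of this effect can be sketched with a purely illustrative calculation. The per-derivative compute fraction below is a hypothetical assumption, not a measured value; only the base emissions figure and derivative count come from the sources cited above.

```python
# Illustrative only: if each derivative used a hypothetical 5% of the
# base model's training compute, 146 derivatives would together exceed
# the base run severalfold.
base_tco2eq = 2290           # Meta-reported Llama 3 pretraining (8B + 70B)
n_derivatives = 146          # documented derivatives for one model family
derivative_fraction = 0.05   # hypothetical assumption, not a measured value

aggregate = n_derivatives * derivative_fraction * base_tco2eq
ratio = aggregate / base_tco2eq   # aggregate is ~7.3x the base training run
```

Even under much smaller per-derivative fractions, the aggregate grows linearly with derivative count, which is exactly the quantity DIA would make observable.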

Model Training Footprints (2020-2024)

Training emissions and water consumption of selected GenAI models. Models marked with ⋆ are open-source. Tree equivalent assumes 25 kg CO₂/tree/year. Water in megalitres (ML; 1 ML = 10⁶ L).

Table columns: Model | Year | Params | Open | tCO₂eq | Tree Equiv. | Water (ML) | R/Est. (full data in the paper's Table 1)

Carbon vs Water (Training Phase)

Each point represents a model from the table. Axes use a logarithmic scale by default to show order-of-magnitude differences.

📊 Data Notes & Methodology
  • Water values: Estimated using WUE_total (Water Usage Effectiveness) ranges of 1.8-4.0 L/kWh, combining on-site cooling and off-site electricity generation water consumption (water not returned locally).
  • Reported vs Estimated: "R" indicates values disclosed by model creators; "Est." uses GPU-hours, TDP, PUE, and carbon intensity assumptions (see paper Appendix A, Equations 1-3).
  • Training phase only: This table covers training phase emissions and water use. The paper emphasizes that inference and derivative proliferation can dominate lifecycle impact—these downstream costs are largely untracked.
  • Upper bounds: TDP-based estimates provide upper bounds; actual power draw typically ranges from 60-80% of TDP depending on utilization patterns during training.
  • GPT-4 range: Based on IEA's 42.4 GWh estimate with carbon intensity 0.1-0.445 kgCO₂/kWh and WUE 1.2-4.4 L/kWh depending on cooling and grid source. These are third-party estimates, not audited disclosures.

Data and Impact Accounting (DIA)

A lightweight, non-regulatory transparency infrastructure for ecosystem-level sustainability coordination

📋
1. Lightweight Reporting Schema

Minimal footprint schema embedded in model cards or repository metadata:

  • Hardware type and device count
  • Training duration (GPU-hours)
  • Estimated electricity use (kWh)
  • Estimated water use (L) or facility WUE (L/kWh)
  • Grid carbon intensity (kgCO₂/kWh) or training region as proxy
  • Model lineage (base model(s) and major downstream derivatives)
  • For inference: standardized per-query benchmarks, optional aggregate usage reporting, deployment efficiency metadata
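A minimal sketch of what such a record might look like serialized into model-card or repository metadata. The field names and all values below are illustrative assumptions, not a finalized community standard:

```python
import json

# Hypothetical DIA footprint record; field names and values are
# illustrative, not a finalized standard.
dia_record = {
    "hardware": {"type": "NVIDIA H100", "device_count": 64},
    "training": {
        "gpu_hours": 12_000,
        "electricity_kwh": 7_500,                  # estimated
        "water_l": 18_750,                         # or report facility WUE
        "grid_carbon_intensity_kg_per_kwh": 0.3,   # or training region as proxy
        "region": "us-central",
    },
    "lineage": {
        "base_models": ["meta-llama/Llama-3-8B"],
        "derivative_type": "lora-finetune",
    },
    "inference": {"per_query_benchmark": None, "notes": "optional"},
}

# Embeddable as JSON (or YAML) in a model card / repository metadata.
serialized = json.dumps(dia_record, indent=2)
```

Because the schema is flat and declarative, small teams could fill in only the fields they know (e.g., hardware and GPU-hours) and leave the rest to region-based defaults.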
🔧
2. Low-Friction Instrumentation

Automated measurement tools integrated into training pipelines:

  • CodeCarbon for automated energy/emissions tracking
  • ML CO₂ Impact Calculator for job-level estimation
  • Cloud provider sustainability APIs for location-adjusted data
  • MLPerf Inference for standardized benchmarking protocols
  • Region-based defaults with explicit data-quality tiers when facility-level WUE unavailable
  • Generate reports with minimal manual effort
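As a stdlib-only sketch of what "low-friction" could mean in practice: wrap the training loop and derive estimates from wall time and region defaults. In a real pipeline a tool like CodeCarbon would do the measurement; the context manager, its defaults, and the region table below are all hypothetical.

```python
import time
from contextlib import contextmanager

# Hypothetical region-based defaults (data-quality tier: "regional default").
REGION_CI = {"us-central": 0.40, "eu-north": 0.10}   # kgCO2/kWh, illustrative
DEFAULT_WUE = 2.5                                    # L/kWh, illustrative

@contextmanager
def track_footprint(report, n_gpus, p_avg_w, pue=1.15, region="us-central"):
    """Estimate energy, carbon, and water for the wrapped block from wall time."""
    t0 = time.monotonic()
    try:
        yield
    finally:
        hours = (time.monotonic() - t0) / 3600
        kwh = hours * n_gpus * p_avg_w * pue / 1000
        report["kwh"] = kwh
        report["kg_co2eq"] = kwh * REGION_CI[region]
        report["water_l"] = kwh * DEFAULT_WUE

report = {}
with track_footprint(report, n_gpus=8, p_avg_w=500):
    pass  # training step(s) would run here
```

The point of the sketch is the shape of the integration: one wrapper, sensible defaults, and a report dict that can be dropped directly into the metadata schema above.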
📊
3. Ecosystem-Level Aggregation

Public registry or dashboard summarizing reported footprints:

  • Aggregate data across releases and model families
  • Track trends over time and identify high-impact families
  • Benchmark efficiency improvements at ecosystem scale
  • Enable comparative analysis across lineages
  • Estimate deployment-phase impacts via download statistics and voluntary provider reporting
  • Natural candidates: existing model hubs like Hugging Face
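One way a hub could roll reported footprints up by model family, sketched under the assumption that each record carries a base-model pointer and a tCO₂eq estimate. The record format, model IDs, and all emission values are hypothetical:

```python
from collections import defaultdict

# Hypothetical reported records: (model_id, base_model_id or None, tCO2eq).
records = [
    ("org/base-8b", None, 2290.0),
    ("user-a/base-8b-ft", "org/base-8b", 40.0),
    ("user-b/base-8b-lora", "org/base-8b", 5.0),
    ("user-c/base-8b-quant", "org/base-8b", 1.5),
]

def family_totals(records):
    """Aggregate reported emissions per root model family."""
    roots = {mid: base or mid for mid, base, _ in records}
    totals = defaultdict(float)
    for mid, _, tco2 in records:
        totals[roots[mid]] += tco2
    return dict(totals)

# family_totals(records) -> {'org/base-8b': 2336.5}
# i.e., the family total already exceeds the base run alone.
```

Even this single-level roll-up makes the lineage-level quantity visible that is invisible today; a hub could extend it to deeper derivative chains and time-windowed trends.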
Design Principles
  • Voluntary & low-friction: Adoption driven by social incentives and community norms, similar to model cards
  • Imperfect is acceptable: Approximate estimates based on hardware and duration sufficient for directional insight
  • Preserves open-source benefits: No barriers to entry; small teams provide minimal info, framework focuses on aggregate patterns
  • Positive feedback loop: Making efficiency visible and comparable creates incentive for optimization
What DIA is NOT
  • Not a regulation: Does not restrict who can train, fine-tune, or release models
  • Not a gate: Does not block access or participation in open-source ecosystem
  • Not auditing: Goal is visibility into trends and relative impacts, not policing individual projects
  • Not complete solution: Foundational infrastructure that can support complementary mechanisms (compute budgets, shared targets)
Implementation Pathway
(4 Phases)
1. Norm-setting: Major open-source labs adopt standardized reporting for flagship releases; conferences encourage emissions reporting in submissions (reproducibility/ethics checklists)
2. Friction reduction: Common training stacks (PyTorch, JAX, Transformers) expose optional emissions tracking by default; cloud providers surface location-adjusted carbon and water information in job summaries
3. Ecosystem visibility: Model hubs and community dashboards aggregate and display reported data; researchers can query footprint estimates for model families and track ecosystem trends
4. Accountability: Non-binding badges or "impact labels" on model pages; standardized citations for impact statements; benchmarking to support voluntary targets and progress tracking

Methodology: Estimating Environmental Footprint

When direct measurements are unavailable, the paper estimates training electricity from GPU-hours and hardware power, then derives CO₂ emissions and water consumption following established ML carbon accounting methodology.

Energy Consumption (kWh)
E_train = (H_GPU × P_avg × PUE) / 1000
H_GPU = aggregate GPU-hours across all devices | P_avg = average GPU power draw (W) | PUE = power usage effectiveness (typical hyperscale: 1.1-1.2)
When measured power is unavailable, use vendor TDP values as an upper bound (actual draw is typically 60-80% of TDP)
Carbon Emissions (tCO₂eq)
C_train = (E_train × CI) / 1000
CI = grid carbon intensity (kgCO₂/kWh, typical range 0.1-0.6 depending on region and energy mix)
Water Consumption (L, then ML)
W_train = E_train × WUE_total
WUE_total = water usage effectiveness (L/kWh), combining on-site cooling + off-site electricity generation
Typical range: 1.8-4.0 L/kWh | Report in megalitres: W(ML) = W(L)/10⁶
Critical distinction: Water consumption (evaporated/not returned) vs. water withdrawal (taken from source). Consumption drives local scarcity impacts.
The Water Dimension: A Hidden Cost

Beyond carbon emissions, AI consumes substantial amounts of water for evaporative cooling in data centers and indirectly through electricity generation. Unlike carbon, water impacts are highly localized and depend on basin-level scarcity.

Scale: A mid-sized data center consumes ~300,000 gallons/day (≈1,000 households); hyperscale facilities use up to 5 million gallons/day. Risk: MSCI analysis found 1 in 4 data-center assets may face increased water scarcity by 2050.

Open vs Closed Ecosystem Dynamics

Closed models (e.g., GPT-4): Trained and served centrally via API; inference scales with demand but remains centrally metered in provider data centers. Organizational boundaries enable internal accountability.

Open models (e.g., Llama 3): Trained once, then branch into many derivatives (fine-tunes, quantizations, adapters, merges) produced by independent users. This diffuses environmental impacts across a distributed ecosystem, making aggregate footprint harder to quantify and enabling the tragedy of the commons.

Call to Action

Concrete steps the ML community can take toward ecosystem-level sustainability

👨‍🔬

Researchers & Practitioners

  • Include emissions and water estimates in model cards and paper submissions
  • Use tools like CodeCarbon or Carbontracker to measure training costs
  • Document base models used and incremental compute required for derivatives
  • Estimate inference footprints for deployed systems
  • Remember: imperfect estimates are better than no estimates
📝

Conference Organizers & Reviewers

  • Encourage environmental reporting in reproducibility checklists
  • Provide graduated expectations (lightweight vs. resource-intensive work)
  • Recognize efficiency as a first-class contribution, not secondary consideration
  • Consider environmental impact when evaluating scaling-focused work
🗂️

Model Hub Operators

  • Implement standardized metadata fields for carbon and water reporting
  • Develop dashboards aggregating data across model families and derivatives
  • Surface efficiency metrics alongside accuracy benchmarks in discovery
  • Make sustainability visible and actionable for users
☁️

Cloud Providers & Hardware Vendors

  • Expose per-job carbon intensity and water usage through standardized APIs
  • Provide users with actionable data on environmental cost of workloads
  • Enable carbon-aware scheduling by default
  • Support ecosystem-level reporting standards
💰

Funding Agencies

  • Require environmental impact statements in grant proposals for compute-intensive research
  • Consider efficiency and sustainability as evaluation criteria alongside scientific merit
  • Support research on measurement and coordination infrastructure
  • Fund development of low-friction reporting tools
🏢

Open-Source Labs & Foundations

  • Lead by example with comprehensive environmental reporting for flagship releases
  • Invest in tooling that reduces reporting friction
  • Participate in developing community standards for sustainability accounting
  • Share lessons learned and best practices publicly
🌍
A Path Forward

We propose that by 2027, major open-weight model releases include standardized DIA reports covering training emissions, water usage, and documented lineage. Achieving this requires no regulatory mandate—only coordination. The tools exist; the data can be collected; the community has demonstrated its capacity for collective action on challenges like model cards, dataset documentation, and reproducibility standards.

Earth does not distinguish between emissions from open and closed-source models, between base models and derivatives, or between training and inference. The infrastructure we build today will determine whether open-source AI develops responsibly or experiences uncoordinated growth. What remains is the decision to act.

Acknowledgements

Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.

GWT acknowledges support from the Natural Sciences and Engineering Research Council (NSERC), the Canada Research Chairs program, and the Canadian Institute for Advanced Research (CIFAR) Canada CIFAR AI Chairs program.

For more information about Vector Institute partners, visit vectorinstitute.ai/#partners

Citation

Use the BibTeX below to cite this work. Authors and venue details will be updated after de-anonymization.

@article{Raza2026SustainableAI,
  title={Sustainable Open-Source AI Requires Tracking the Cumulative Footprint of Derivatives},
  author={Raza, Shaina and Eyriay, Iuliia and Radwan, Ahmed Y and Lesperance, Nate and Pandya, Deval and Kocak, Sedef Akinli and Taylor, Graham W.},
  journal={arXiv preprint arXiv:2601.21632},
  year={2026},
  doi={10.48550/arXiv.2601.21632}
}

Data attribution: Model training footprint data (Table 1) is compiled from public disclosures, published papers, and prior analyses following established ML carbon accounting methodology. See paper Section 2.3 and Appendix A for detailed estimation procedures and references for original sources.