https://github.com/shorvath/MammalianMethylationConsortium
14 October 2025, 19:56:10 UTC
Tip revision: da8c94df7c75063273be48b70db89ec5d7b6f8c8 authored by Amin Haghani on 12 August 2025, 18:58:48 UTC
Update README.md
# EnsembleAge

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![R](https://img.shields.io/badge/R-%3E%3D3.5.0-blue.svg)](https://www.r-project.org/)
[![Paper DOI](https://img.shields.io/badge/DOI-10.1007%2Fs11357--025--01808--1-blue)](https://doi.org/10.1007/s11357-025-01808-1)
[![Software DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.16814673.svg)](https://doi.org/10.5281/zenodo.16814673)

**Multi-platform epigenetic age prediction using ensemble clock methods**

## 📖 Description

EnsembleAge provides tools for predicting epigenetic age with several clock methods, including Universal Clocks, Elastic epigenetic clocks, static ensemble clocks, and dynamic ensemble clocks. The package **automatically detects your data platform** and prepares the data accordingly, supporting age prediction across a range of tissues and species.

**📄 This package accompanies the open access publication:**
> Haghani, A., et al. (2025). EnsembleAge: enhancing epigenetic age assessment with a multi‑clock framework. *GeroScience*. DOI: [10.1007/s11357-025-01808-1](https://doi.org/10.1007/s11357-025-01808-1)

The clocks and methods implemented in this package are freely available for research use under the MIT license.

## 🚀 Quick Start

**The easiest way to predict epigenetic age:**

```r
library(EnsembleAge)

# Ultimate one-liner - automatically detects platform and runs predictions
results <- predictEnsemble("path/to/methylation_data.RDS", "path/to/sample_sheet.csv")

# View predictions
head(results)
```

**That's it!** EnsembleAge automatically:
- ✅ Detects your platform (Human, Mammal40k, Mammal320k)
- ✅ Preprocesses your data correctly
- ✅ Selects optimal clock methods
- ✅ Handles missing probes with imputation
- ✅ Generates comprehensive age predictions

## 📦 Installation

```r
# Install devtools if you haven't already
if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
}

# Install EnsembleAge
devtools::install_github("ahaghani/EnsembleAge")
```

### Dependencies

The package automatically installs these required R packages:
- `dplyr`, `tibble`, `tidyr`, `plyr`, `stringr`, `data.table`

## 💡 Simple Examples

### Example 1: Mammal40k Data

```r
library(EnsembleAge)

# Create simulated mammal40k methylation data
set.seed(123)
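n_probes <- 1000  # small synthetic subset; a real Mammal40k array has ~40k probes
n_samples <- 10

# Methylation data: probes as rows, samples as columns, CpG IDs in a CGid column.
# These synthetic IDs will generally not match real array probes, so most clock
# CpGs would be imputed; treat the resulting predictions as illustrative only.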
methylation_data <- data.frame(
  CGid = paste0("cg", sprintf("%08d", 1:n_probes)),
  matrix(runif(n_probes * n_samples, 0.1, 0.9), nrow = n_probes)
)
names(methylation_data)[-1] <- paste0("Sample_", 1:n_samples)

# Sample sheet: required columns
sample_sheet <- data.frame(
  Basename = paste0("Sample_", 1:n_samples),
  Age = c(0.5, 1.2, 2.1, 0.8, 1.5, 2.8, 1.1, 0.9, 2.2, 1.7),  # Mouse ages in years
  Female = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 1),  # 1=female, 0=male
  Tissue = "Blood",
  SpeciesLatinName = "Mus musculus"
)

# Run predictions
results <- predictEnsemble(methylation_data, sample_sheet)

# View results (the values shown below are illustrative)
head(results)
#   Basename  Age epiClock   epiAge AgeAcceleration Female Tissue clockFamily
# 1 Sample_1  0.5   Static     0.8            0.12      1  Blood   Ensemble.Static
# 2 Sample_1  0.5 Dynamic1     0.7            0.05      1  Blood   EnsembleAge.Dynamic
# ...
```

### Example 2: Mammal320k Data (with file paths)

```r
# For mammal320k array data stored in files:
results <- predictEnsemble("mammal320k_data.RDS", "mouse_samples.csv")

# View results in wide format (traditional output)
results_wide <- predictEnsemble("mammal320k_data.RDS", "mouse_samples.csv", 
                                output_format = "wide")
head(results_wide)
#   Basename  Age  Ensemble.Static.clock.epiAge  EnsembleAge.Dynamic.X1.clock.epiAge
# 1 Mouse_01  1.2                          1.45                                  1.38
```

### Example 3: Human Data (with file paths)

```r
# For human EPIC array data stored in files:
results <- predictEnsemble("human_methylation.RData", "human_samples.csv")

# Get both long and wide format results
results_both <- predictEnsemble("human_methylation.RData", "human_samples.csv", 
                                output_format = "both")

long_format <- results_both$long    # One row per sample-clock combination
wide_format <- results_both$wide    # One row per sample, clocks as columns
```

### Example 4: Auto-generate Sample Sheet

```r
# If you don't have a sample sheet, let EnsembleAge create a template
results <- predictEnsemble(methylation_data, samples = NULL)

# This creates 'user_data_sample_sheet_template.csv' 
# Fill it out with your actual sample information and rerun
```
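If you would rather build the template yourself before calling `predictEnsemble()`, a minimal base-R sketch follows. It assumes probes are rows with a `CGid` column and that every other column is a sample. The helper `make_sample_sheet_template()` and the output filename are hypothetical; the template EnsembleAge writes may differ.

```r
# Hypothetical helper (not part of EnsembleAge): build a blank sample sheet
# from the sample columns of a probes-as-rows methylation table.
make_sample_sheet_template <- function(methylation_data, file = NULL) {
  sample_ids <- setdiff(colnames(methylation_data), "CGid")
  template <- data.frame(
    Basename         = sample_ids,
    Age              = NA_real_,       # required: chronological age in years
    Female           = NA_integer_,    # optional: 1 = female, 0 = male
    Tissue           = NA_character_,  # optional
    SpeciesLatinName = NA_character_,  # optional
    stringsAsFactors = FALSE
  )
  if (!is.null(file)) {
    write.csv(template, file, row.names = FALSE)
  }
  template
}

# Usage with a tiny toy data set
toy <- data.frame(CGid = c("cg00000001", "cg00000002"),
                  S1 = c(0.2, 0.8), S2 = c(0.4, 0.6))
template <- make_sample_sheet_template(toy, file.path(tempdir(), "sample_sheet_template.csv"))
print(template)
```

Fill in `Age` (and any optional columns) before passing the sheet to `predictEnsemble()`.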

## 📊 Data Structure Guide

### Methylation Data Format
```r
# Required format: probes as rows, samples as columns
#        CGid   Sample_1  Sample_2  Sample_3
# 1  cg00001     0.234     0.567     0.123
# 2  cg00002     0.789     0.345     0.678
# 3  cg00003     0.456     0.234     0.567
```

### Sample Sheet Format
```r
# Required columns: Basename, Age
# Optional columns: Female, Tissue, SpeciesLatinName
#   Basename   Age  Female    Tissue  SpeciesLatinName
# 1 Sample_1   1.2       1     Blood      Mus musculus
# 2 Sample_2   2.1       0     Liver      Mus musculus  
# 3 Sample_3   0.8       1    Kidney      Mus musculus
```

### Output Format
```r
# Long format (default): one row per sample-clock combination
#   Basename  Age epiClock  epiAge AgeAcceleration  clockFamily
# 1 Sample_1  1.2   Static    1.45           0.25  Ensemble.Static
# 2 Sample_1  1.2 Dynamic1    1.38           0.18  EnsembleAge.Dynamic

# Wide format: one row per sample, clocks as columns  
#   Basename  Age  Ensemble.Static.clock.epiAge  EnsembleAge.Dynamic.X1.clock.epiAge
# 1 Sample_1  1.2                           1.45                                  1.38
```
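To see how the two layouts relate, the sketch below pivots a toy long-format table into wide format with base R's `reshape()`. The values are made up, and the resulting column names (`epiAge.<epiClock>`) come from `reshape()` itself; they do not reproduce EnsembleAge's wide-format column names.

```r
# Toy long-format results: one row per sample-clock combination (made-up values)
long <- data.frame(
  Basename = c("Sample_1", "Sample_1", "Sample_2", "Sample_2"),
  Age      = c(1.2, 1.2, 2.1, 2.1),
  epiClock = c("Static", "Dynamic1", "Static", "Dynamic1"),
  epiAge   = c(1.45, 1.38, 2.30, 2.05),
  stringsAsFactors = FALSE
)

# Pivot to wide: one row per sample, one epiAge column per clock.
# reshape() names the new columns "epiAge.<epiClock>".
wide <- reshape(long,
                idvar     = c("Basename", "Age"),
                timevar   = "epiClock",
                v.names   = "epiAge",
                direction = "wide")
rownames(wide) <- NULL
print(wide)
```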

## 🔧 Supported Platforms

EnsembleAge **automatically detects and handles** multiple methylation array platforms:

| Platform | Probe Count | File Types | Auto-Detection | File Path Support |
|----------|-------------|------------|----------------|-------------------|
| **Human EPIC/450k** | ~850k/450k | `.RData`, `.RDS`, `.csv`, `.txt` | ✅ Automatic | ✅ Yes |
| **Mammal320k** | ~320k | `.RData`, `.RDS`, `.csv`, `.txt` | ✅ Automatic | ✅ Yes |
| **Mammal40k** | ~40k | `.RData`, `.RDS`, `.csv`, `.txt` | ✅ Automatic | ✅ Yes |

**File Path Support**: All platforms work with both file paths (`"data.RDS"`) and pre-loaded R objects (`my_data_frame`).

### ⚑ What Happens Automatically

- πŸ” **Platform Detection**: Identifies your array type from data dimensions and patterns
- πŸ”„ **Data Orientation**: Detects if probes are rows or columns and fixes automatically  
- 🧬 **Missing Probe Handling**: Imputes missing CpG sites with neutral values (0.5)
- πŸ“Š **Smart Method Selection**: Chooses best clock methods based on data coverage
- πŸ’Ύ **Multiple Output Formats**: Long format (analysis-ready) or wide format (traditional)

## πŸ“‹ Data Requirements

### Methylation Data
- **Format**: Data frame or matrix with CpG sites and sample methylation values
- **Values**: Beta values (0-1 range) representing methylation levels
- **Orientation**: Probes as rows OR columns (auto-detected and fixed)
- **File Types**: `.RDS`, `.RData`, `.csv` supported
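Before running predictions, it can help to check these requirements yourself. The helper below, `check_methylation_data()`, is hypothetical and not part of EnsembleAge. It assumes a data frame in the probes-as-rows layout with a `CGid` column and flags non-numeric sample columns and values outside the 0-1 beta range.

```r
# Hypothetical pre-flight check for a probes-as-rows methylation data frame.
check_methylation_data <- function(methylation_data) {
  problems <- character(0)
  if (!"CGid" %in% colnames(methylation_data)) {
    problems <- c(problems, "no 'CGid' column")
  }
  sample_cols <- setdiff(colnames(methylation_data), "CGid")
  is_num <- vapply(methylation_data[sample_cols], is.numeric, logical(1))
  non_numeric <- sample_cols[!is_num]
  if (length(non_numeric) > 0) {
    problems <- c(problems, paste("non-numeric columns:", paste(non_numeric, collapse = ", ")))
  }
  betas <- unlist(methylation_data[sample_cols[is_num]], use.names = FALSE)
  if (any(betas < 0 | betas > 1, na.rm = TRUE)) {
    problems <- c(problems, "values outside the 0-1 beta range (M-values?)")
  }
  problems  # character(0) means no problems were found
}

good <- data.frame(CGid = c("cg00000001", "cg00000002"), S1 = c(0.1, 0.9))
bad  <- data.frame(CGid = c("cg00000001", "cg00000002"), S1 = c(-2.3, 3.1))
check_methylation_data(good)  # character(0)
check_methylation_data(bad)
```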

### Sample Sheet
- **Required**: `Basename` (sample IDs), `Age` (chronological age in years)
- **Optional**: `Female` (1=female, 0=male), `Tissue`, `SpeciesLatinName`
- **Format**: CSV file or R data frame
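A quick sanity check of a sample sheet against these rules might look like the following. `check_sample_sheet()` is a hypothetical helper, not an EnsembleAge function; it checks only the required columns, that `Age` is numeric, and the 1/0 (or `NA`) coding of `Female`.

```r
# Hypothetical check of a sample sheet against the documented requirements.
check_sample_sheet <- function(sample_sheet) {
  required <- c("Basename", "Age")
  missing_cols <- setdiff(required, colnames(sample_sheet))
  if (length(missing_cols) > 0) {
    stop("Sample sheet is missing required column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(sample_sheet$Age)) {
    stop("'Age' must be numeric (chronological age in years)")
  }
  if ("Female" %in% colnames(sample_sheet) &&
      !all(sample_sheet$Female %in% c(0, 1, NA))) {
    stop("'Female' must be coded 1 = female, 0 = male, or NA if unknown")
  }
  invisible(TRUE)
}

sheet <- data.frame(Basename = c("Sample_1", "Sample_2"),
                    Age = c(1.2, 2.1), Female = c(1, 0))
check_sample_sheet(sheet)  # passes silently
```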

### 🎯 Quick Tips
- **Ages**: Use years (e.g., 1.5 for 18-month-old mouse, 25.3 for human)
- **Missing data**: Package handles missing probes automatically
- **No sample sheet?** Use `samples = NULL` to auto-generate a template
- **File paths**: Use full paths, or make sure the files are in the working directory (check with `getwd()`)
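Because ages must be given in years, records kept in months, weeks, or days need converting first. A small sketch; the helper name `to_years()` and the 365.25-day year are assumptions, not EnsembleAge conventions.

```r
# Hypothetical helper: convert ages recorded in other units to years.
# Assumes a 365.25-day year; adjust if your study uses a different convention.
to_years <- function(age, unit = c("years", "months", "weeks", "days")) {
  unit <- match.arg(unit)
  days_per_year <- 365.25
  divisor <- switch(unit,
                    years  = 1,
                    months = 12,
                    weeks  = days_per_year / 7,
                    days   = days_per_year)
  age / divisor
}

to_years(18, "months")  # [1] 1.5  (the 18-month-old mouse above)
to_years(c(90, 730.5), "days")
```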

## 🎯 Advanced Usage

### Multiple Output Formats

```r
# Get long format (default) - best for analysis
results_long <- predictEnsemble(methylation_data, sample_sheet, output_format = "long")

# Get wide format - one row per sample
results_wide <- predictEnsemble(methylation_data, sample_sheet, output_format = "wide") 

# Get both formats
results_both <- predictEnsemble(methylation_data, sample_sheet, output_format = "both")
```

### Method Selection

```r
# Automatic method selection (recommended)
results <- predictEnsemble(methylation_data, sample_sheet)  # Auto-selects best methods

# Available methods: "ensemble_static", "ensemble_dynamic", "ensemble_dual", "all"
```

### Working with Files vs Objects

```r
# Using file paths (works for ALL platforms: Human, Mammal320k, Mammal40k)
results <- predictEnsemble("methylation.RDS", "samples.csv")           # RDS files
results <- predictEnsemble("data.RData", "samples.csv")                # RData files (all platforms)
results <- predictEnsemble("mammal320k_data.RData", "mouse_samples.csv") # Mammal320k RData
results <- predictEnsemble("mammal40k_data.RData", "mouse_samples.csv")  # Mammal40k RData

# Using R objects directly  
results <- predictEnsemble(my_methylation_df, my_sample_df)

# Mixed approach (file + object, or object + file)
results <- predictEnsemble("methylation.RDS", my_sample_df)           # File + object
results <- predictEnsemble(my_methylation_df, "samples.csv")          # Object + file
```
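Because `predictEnsemble()` accepts either a path or an object, the input has to be turned into a data frame somewhere. The sketch below shows one way to do that in base R by file extension. `read_input()` is hypothetical and does not claim to mirror EnsembleAge's loader; it assumes `.RData` files hold a single object and `.txt` files are tab-delimited.

```r
# Hypothetical sketch: accept either a file path or an already-loaded R object.
read_input <- function(x) {
  if (!is.character(x) || length(x) != 1) {
    return(x)  # already an R object (data frame or matrix)
  }
  if (!file.exists(x)) stop("File not found: ", x)
  ext <- tolower(tools::file_ext(x))
  if (ext == "") stop("Cannot infer the file type (no extension): ", x)
  switch(ext,
    rds = readRDS(x),
    rda = ,                            # falls through to the rdata branch
    rdata = {
      env <- new.env()
      loaded <- load(x, envir = env)   # load() returns the names of the stored objects
      if (length(loaded) != 1) stop(".RData file should contain exactly one object")
      env[[loaded]]
    },
    csv = read.csv(x, check.names = FALSE, stringsAsFactors = FALSE),
    txt = read.delim(x, check.names = FALSE, stringsAsFactors = FALSE),
    stop("Unsupported file type: .", ext)
  )
}

df <- data.frame(CGid = "cg00000001", S1 = 0.3)
path <- file.path(tempdir(), "toy.RDS")
saveRDS(df, path)
identical(read_input(path), df)  # [1] TRUE
```

Reading CSV files with `check.names = FALSE` keeps sample IDs that start with a digit intact instead of prefixing them with `X`.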

### πŸ’Ύ Result Saving Control

EnsembleAge gives you full control over where and how your results are saved:

```r
# Default: saves to current directory with automatic naming
results <- predictEnsemble("data.RDS", "samples.csv")
# Creates: EnsembleAge_data_long_20250811_143052.csv

# Custom directory
results <- predictEnsemble("data.RDS", "samples.csv", 
                          output_dir = "results/2024/study1/")

# Custom prefix instead of "EnsembleAge"
results <- predictEnsemble("data.RDS", "samples.csv", 
                          output_prefix = "MouseStudy")
# Creates: MouseStudy_data_long_20250811_143052.csv

# Completely custom filename
results <- predictEnsemble("data.RDS", "samples.csv", 
                          output_filename = "my_final_results.csv")

# Turn off saving entirely (just return results)
results <- predictEnsemble("data.RDS", "samples.csv", 
                          save_results = FALSE)

# Both formats with custom naming
results <- predictEnsemble("data.RDS", "samples.csv", 
                          output_format = "both",
                          output_dir = "final_results/",
                          output_prefix = "Project_Alpha")
# Creates: Project_Alpha_data_long_20250811_143052.csv
#          Project_Alpha_data_wide_20250811_143052.csv
```

**Automatic Features:**
- ✅ **Directory Creation**: Output directories are created automatically if they don't exist
- ✅ **Timestamp**: Automatic timestamps prevent files from being overwritten
- ✅ **Smart Naming**: Custom filenames also work with the "both" format (`_long` and `_wide` are appended)
- ✅ **File Size**: Reports the size of each saved file for easy reference
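The naming pattern in the examples above (`<prefix>_<data name>_<format>_<YYYYMMDD_HHMMSS>.csv`) can be reproduced in base R, for instance to predict where a file will land. `build_output_path()` is a hypothetical re-creation inferred from the example filenames, not EnsembleAge's code.

```r
# Hypothetical re-creation of the file naming pattern in the examples above:
# <prefix>_<data name>_<format>_<YYYYMMDD_HHMMSS>.csv
build_output_path <- function(input_path, output_format = "long", prefix = "EnsembleAge",
                              output_dir = ".", time = Sys.time()) {
  data_name <- tools::file_path_sans_ext(basename(input_path))  # "data.RDS" -> "data"
  stamp <- format(time, "%Y%m%d_%H%M%S")                        # e.g. "20250811_143052"
  # Create the output directory (including parents) if it does not exist yet
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  file.path(output_dir, sprintf("%s_%s_%s_%s.csv", prefix, data_name, output_format, stamp))
}

fixed_time <- as.POSIXct("2025-08-11 14:30:52", tz = "UTC")
basename(build_output_path("data.RDS", time = fixed_time, output_dir = tempdir()))
# [1] "EnsembleAge_data_long_20250811_143052.csv"
```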

### πŸ“Š Clock Families Included

| Clock Family | Best For | # Clocks | Description | Reference |
|--------------|----------|----------|-------------|-----------|
| **EnsembleAge.Static** | 🐭 Mouse (primary) | 2 | Main mouse ensemble clocks | [GeroScience 2025](https://doi.org/10.1007/s11357-025-01808-1) |
| **EnsembleAge.Dynamic** | 🐭 Mouse (detailed) | 50 | Individual specialized mouse clocks | [GeroScience 2025](https://doi.org/10.1007/s11357-025-01808-1) |
| **EnsembleDualAge.Static** | πŸ‘¨ Human (primary) | 1 | Human-mouse optimized clock | [GeroScience 2025](https://doi.org/10.1007/s11357-025-01808-1) |
| **UniClock2/3** | 🌐 Cross-species | 6 | Universal mammalian clocks | [Nature Aging 2023](https://doi.org/10.1038/s43587-023-00462-6) |
| **LifespanUberClock** | 🐭 Mouse variants | 12 | Lifespan-focused clocks | [bioRxiv 2022](https://doi.org/10.1101/2022.01.16.476530) |
| **DNAmAge\*** | 🐭 Mouse categories | 39 | Development, Elastic, Intervention clocks | [eLife 2022](https://doi.org/10.7554/eLife.75244) |

**Total: 110+ individual clocks** covering development, aging, interventions, and cross-species predictions.

## πŸ› οΈ Troubleshooting

### Common Issues

**"Cannot determine data orientation"**
- Ensure your data has a `CGid` column or CpG IDs as row names
- For human data, make sure methylation values are present (not just annotations)
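If the error persists, you can convert the data to the probes-as-rows layout with a `CGid` column yourself. The sketch assumes CpG IDs start with `cg` and sit either in the row names or in the column names (samples as rows, with sample IDs as row names). `to_probes_as_rows()` is a hypothetical helper; probe IDs that do not start with `cg` would need a broader pattern.

```r
# Hypothetical helper: return a data frame with probes as rows and a CGid column.
to_probes_as_rows <- function(x) {
  x <- as.data.frame(x)
  looks_like_cpg <- function(ids) length(ids) > 0 && mean(grepl("^cg", ids)) > 0.5

  if ("CGid" %in% colnames(x)) {
    return(x)  # already in the expected layout
  }
  if (looks_like_cpg(rownames(x))) {
    # Probes are rows but their IDs are row names: move them into a CGid column
    return(data.frame(CGid = rownames(x), x, row.names = NULL,
                      check.names = FALSE, stringsAsFactors = FALSE))
  }
  if (looks_like_cpg(colnames(x))) {
    # Samples are rows and CpGs are columns: transpose (sample IDs come from row names)
    m <- t(as.matrix(x))
    return(data.frame(CGid = rownames(m), m, row.names = NULL,
                      check.names = FALSE, stringsAsFactors = FALSE))
  }
  stop("Cannot find CpG IDs in a CGid column, in the row names, or in the column names")
}

m <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2,
            dimnames = list(c("cg00000001", "cg00000002"), c("S1", "S2")))
to_probes_as_rows(m)     # CpG IDs were row names
to_probes_as_rows(t(m))  # samples were rows
```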

**"No matching samples found"**  
- Check that `Basename` column in sample sheet matches column names in methylation data
- Use `head(colnames(methylation_data))` and `head(sample_sheet$Basename)` to compare
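To see exactly which IDs fail to match, compare the two sets directly. A base-R sketch, assuming probes are rows with a `CGid` column; `report_sample_mismatch()` is a hypothetical helper.

```r
# Hypothetical helper: report sample IDs that appear in only one of the two inputs.
report_sample_mismatch <- function(methylation_data, sample_sheet) {
  data_ids  <- setdiff(colnames(methylation_data), "CGid")
  sheet_ids <- as.character(sample_sheet$Basename)
  list(
    matched       = intersect(sheet_ids, data_ids),
    only_in_sheet = setdiff(sheet_ids, data_ids),  # no methylation column for these
    only_in_data  = setdiff(data_ids, sheet_ids)   # no sample sheet row for these
  )
}

meth  <- data.frame(CGid = "cg00000001", Sample_1 = 0.4, Sample_2 = 0.6)
sheet <- data.frame(Basename = c("Sample_1", "sample_2"), Age = c(1.2, 2.1))
str(report_sample_mismatch(meth, sheet))
```

Matching is exact, so case and whitespace matter (`sample_2` does not match `Sample_2`). Another frequent cause is reading methylation data with `read.csv()` and its default `check.names = TRUE`, which prefixes column names that start with a digit with `X`.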

**"Low probe coverage warnings"**
- Normal for human data (relies on imputation)
- For mouse data, ensure you're using the correct platform (Mammal40k vs Mammal320k)
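To gauge coverage before running, you can count how many of a clock's CpGs your data contains. `probe_coverage()` is a hypothetical helper, and `clock_cpgs` below is a placeholder vector; the real clock CpG lists ship with the package and are not reproduced here.

```r
# Hypothetical helper: fraction of a clock's CpGs present in a probes-as-rows table.
probe_coverage <- function(methylation_data, clock_cpgs) {
  present <- clock_cpgs %in% methylation_data$CGid
  c(n_required = length(clock_cpgs), n_present = sum(present), fraction = mean(present))
}

clock_cpgs <- c("cg00000001", "cg00000002", "cg00000003", "cg00000004")  # placeholder IDs
meth <- data.frame(CGid = c("cg00000001", "cg00000003"), S1 = c(0.2, 0.7))
probe_coverage(meth, clock_cpgs)
```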

### Getting Help

```r
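# Check which platform was detected
platform <- detect_platform(your_methylation_data)
cat("Detected platform:", platform, "\n")

# Examine your data structure
str(your_methylation_data)
head(your_sample_sheet)

# Test with a few samples first. Keep all probes: platform detection uses
# data dimensions, so trimming rows may change the result.
# (Assumes probes are rows with a CGid column.)
small_test <- predictEnsemble(
  your_methylation_data[, c("CGid", your_sample_sheet$Basename[1:3])],
  your_sample_sheet[1:3, ]
)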
```

## πŸ“ˆ Understanding Results

### Output Columns

- **Basename**: Sample identifier  
- **Age**: Input chronological age
- **epiClock**: Clock name used for prediction
- **epiAge**: Predicted epigenetic age
- **AgeAcceleration**: Age acceleration (residualized difference)
- **Female**: Sex (1=female, 0=male, NA=unknown)
- **Tissue**: Tissue type
- **clockFamily**: Clock family group

### Interpreting Results

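- **epiAge**: The epigenetic (biological) age predicted by the clock
- **AgeAcceleration**: Positive values mean a sample is epigenetically older than expected for its chronological age (aging faster); negative values mean it is younger (aging slower)
- **Multiple clocks**: In long format, each row holds one clock's prediction for one sample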
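A "residualized" age acceleration is commonly computed as the residual from regressing epiAge on chronological age, so that the measure is uncorrelated with age in the analysed data set. The sketch below applies that common convention per clock to long-format results with `lm()`. It is an assumption for illustration: `add_age_acceleration()` and the `AgeAccelResid` column are hypothetical, and EnsembleAge's exact model (for example, which samples are pooled in each fit) may differ.

```r
# Hypothetical sketch: age acceleration as the residual of epiAge ~ Age,
# fitted separately for each clock in long-format results.
add_age_acceleration <- function(results) {
  results$AgeAccelResid <- NA_real_
  for (clock in unique(results$epiClock)) {
    rows <- results$epiClock == clock
    fit <- lm(epiAge ~ Age, data = results[rows, ], na.action = na.exclude)
    results$AgeAccelResid[rows] <- residuals(fit)  # positive = older than expected
  }
  results
}

# Toy data: 5 samples, one clock (made-up values)
toy <- data.frame(
  Basename = paste0("Sample_", 1:5),
  Age      = c(0.5, 1.0, 1.5, 2.0, 2.5),
  epiClock = "Static",
  epiAge   = c(0.7, 1.0, 1.4, 2.3, 2.4)
)
toy <- add_age_acceleration(toy)
toy[, c("Basename", "Age", "epiAge", "AgeAccelResid")]
```

Because the residuals come from a least-squares fit with an intercept, they sum to zero and are uncorrelated with `Age` within each clock.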

## πŸ‘₯ Author & Citation

**Author:** Amin Haghani (Altos Labs)  
**Email:** dr.a.haghani@gmail.com

### Citation

If you use this package in your research, please cite:

**EnsembleAge Package:**
```
Haghani, A., et al. (2025). EnsembleAge: enhancing epigenetic age assessment with a multi‑clock framework. GeroScience.
DOI: 10.1007/s11357-025-01808-1
```

**Universal Clocks:**
```
Lu, A.T., Fei, Z., Haghani, A. et al. (2023). Universal DNA methylation age across mammalian tissues. 
Nature Aging 3, 1144–1166. DOI: 10.1038/s43587-023-00462-6
```

**Original Mammalian Clocks:**
```
Haghani, A., et al. (2022). Methylation-based epigenetic clocks for mammalian species. eLife 11:e75244.
DOI: 10.7554/eLife.75244
```

**Lifespan Uber Clock:**
```
Lu, A.T., et al. (2022). Universal methylation clocks enable analysis of aging effects in mammals.
bioRxiv. DOI: 10.1101/2022.01.16.476530
```

**BibTeX:**
```bibtex
@article{haghani2025ensembleage,
  title={EnsembleAge: enhancing epigenetic age assessment with a multi‑clock framework},
  author={Haghani, Amin and others},
  journal={GeroScience},
  year={2025},
  doi={10.1007/s11357-025-01808-1}
}

@article{lu2023universal,
  title={Universal DNA methylation age across mammalian tissues},
  author={Lu, Ake T and Fei, Zhe and Haghani, Amin and others},
  journal={Nature Aging},
  volume={3},
  pages={1144--1166},
  year={2023},
  doi={10.1038/s43587-023-00462-6}
}

@article{haghani2022methylation,
  title={Methylation-based epigenetic clocks for mammalian species},
  author={Haghani, Amin and others},
  journal={eLife},
  volume={11},
  pages={e75244},
  year={2022},
  doi={10.7554/eLife.75244}
}

@article{lu2022lifespan,
  title={Universal methylation clocks enable analysis of aging effects in mammals},
  author={Lu, Ake T and others},
  journal={bioRxiv},
  year={2022},
  doi={10.1101/2022.01.16.476530}
}
```

## πŸ“„ License

MIT License - Free for research and commercial use.

## 🤝 Contributing & Issues

- **Contributing**: Contributions are welcome! Please submit a Pull Request.
- **Issues**: Report bugs or request features on our [GitHub issues page](https://github.com/ahaghani/EnsembleAge/issues).
- **Questions**: Contact Amin Haghani at dr.a.haghani@gmail.com

---

**🎯 TL;DR: Just run `predictEnsemble(your_data, your_samples)` - everything else is automatic!**
