A Resource Benchmark of Human Microbiota–Associated Mouse Models Reveals Donor-Dominated Outcomes


Updated on June 05, 2026

Contents

  1. Introduction
  2. Analysis-ready dataset
  3. Methods
  4. Results
    1. Pre-processing
    2. Microbiome
  5. Discussion
  6. Data Availability
  7. Code Availability
  8. References
  9. Abbreviations


 

Introduction

 

Analysis-ready dataset

  1. Donors, N=10
    1. Eight female (F) and two male (M) healthy donors

  2. Samples (post-QC), N=1,249
    1. Donor, N=10
    2. Donor replicates, N=9 (F1 not replicated)
    3. OMM12, N=12
    4. Neg. control, N=8
    5. Mouse, N=1,210

  3. Modality × tissue × timepoint

    Modality                             Tissue(s)                         Weeks post-FMT
    Bacterial metagenomics (long-read)   Fecal (F)                         0 (donor), 1, 2, 4, 6, 8
    Bacterial metagenomics (long-read)   Colon (C), Ileum (I), Cecum (M)   3, 8
    Mycobiome / Virome (long-read)       Fecal (F)                         1, 2, 4, 6, 8

    This report covers the metagenomics modalities only; metabolomics (LC–MS) and CyTOF immune phenotyping are reported separately.

  4. Sample naming convention
    1. Donor sex/ID _ Mouse sex/ID _ Mouse age (weeks) _ Sample type
    2. Donor sex/ID: [ F0 | F1 | F2 | F3 | M4 | M5 | F6 | F7 | F8 | F9 | Ctrl (OMM12) | Neg (neg. control)]
    3. Mouse sex/ID: [ M | F ] + Mouse ID
    4. Mouse age: [ 0 (donor) | 1 | 2 | 3 | 4 | 6 | 8 | X (neg) ]
    5. Sample type: [ C (colon) | F (fecal) | I (ileum) | M (cecum) | Ctrl | Neg | 1 (donor) | 2 (donor) ]

    6. Examples:
      • F0_F20_6_F: Fecal sample from 6-week-old female mouse #20 derived from female donor #0
      • M5_F509_3_M: Cecum sample from 3-week-old female mouse #509 derived from male donor #5
      • F6_NA_0_1: 1st fecal sample from female donor #6
      • Ctrl_Ctrl3_3_Ctrl: Control sample from the OMM12 group
      • Neg_6_X_Neg: Negative control sample
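
The naming convention above can be parsed mechanically. The following is a minimal sketch rather than the project's own tooling: the function name `parse_sample_name`, the `SampleName` record, and the group labels are hypothetical, and the handling of the Ctrl and Neg fields is inferred only from the examples listed above.

```python
import re
from typing import NamedTuple, Optional

# Tissue codes taken from the "Sample type" field of the convention.
TISSUES = {"C": "colon", "F": "fecal", "I": "ileum", "M": "cecum"}


class SampleName(NamedTuple):
    group: str            # "mouse", "donor", "control", or "negative" (hypothetical labels)
    donor: Optional[str]  # e.g. "F0" or "M5"; None for control samples
    mouse: Optional[str]  # e.g. "F20"; None for donor and negative samples
    week: Optional[int]   # third field of the name; None for negative controls
    sample_type: str


def parse_sample_name(name: str) -> SampleName:
    """Split a four-field sample name into its components."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {name!r}")
    donor, mouse, week, stype = parts

    if donor == "Neg":
        return SampleName("negative", None, None, None, "negative control")
    if donor == "Ctrl":
        return SampleName("control", None, mouse, int(week), "OMM12 control")
    if not re.fullmatch(r"[FM]\d", donor):
        raise ValueError(f"unrecognized donor field: {donor!r}")
    if week == "0":
        # Donor samples: the mouse field is NA and the last field is the
        # donor sample number (1 or 2).
        return SampleName("donor", donor, None, 0, f"donor fecal sample {stype}")
    if not re.fullmatch(r"[FM]\d+", mouse) or stype not in TISSUES:
        raise ValueError(f"unrecognized mouse sample: {name!r}")
    return SampleName("mouse", donor, mouse, int(week), TISSUES[stype])
```

For example, `parse_sample_name("M5_F509_3_M")` yields a mouse record with donor `M5`, mouse `F509`, week 3, and tissue `cecum`.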
 

Methods

  1. Sequencing and basecalling
    • Long-read sequencing on Oxford Nanopore Technologies (ONT) PromethION flow cells
    • Dorado (v1.1.1): basecalling from POD5 with model dna_r10.4.1_e8.2_400bps_sup@v5.2.0
    • Read filtering policy: reads shorter than 35 bp ignored by Kraken2 during classification; reads longer than 15,000 bp trimmed to 15,000 bp; reads longer than 100,000 bp excluded
    • NanoPlot (De Coster and Rademakers, 2023): per-read length, quality, and yield metrics for long-read data
    • MultiQC (Ewels et al., 2016): aggregates QC reports from many tools into a single summary report
    • FastQC: per-read quality control checks
    • SAMtools (Danecek et al., 2021): manipulate and inspect alignment files
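
The read filtering policy listed under sequencing and basecalling can be expressed as a small length-based rule. This is an illustrative sketch only: the function name `apply_length_policy` and the status labels are hypothetical, and keeping the first 15,000 bases when trimming is an assumption, since the source does not say which end is trimmed.

```python
from typing import Optional, Tuple

MAX_LENGTH = 100_000   # reads longer than this are excluded
TRIM_LENGTH = 15_000   # longer reads are trimmed to this length
MIN_CLASSIFIABLE = 35  # shorter reads are ignored by Kraken2


def apply_length_policy(seq: str) -> Tuple[Optional[str], str]:
    """Return (possibly trimmed sequence, status) for one read."""
    n = len(seq)
    if n > MAX_LENGTH:
        return None, "excluded"
    if n > TRIM_LENGTH:
        # Assumption: trimming keeps the 5' end of the read.
        return seq[:TRIM_LENGTH], "trimmed"
    if n < MIN_CLASSIFIABLE:
        # The read is not removed here; it simply cannot be classified.
        return seq, "unclassifiable"
    return seq, "kept"
```

Checking the exclusion limit before the trim limit matters: otherwise a 120,000 bp read would be trimmed and retained instead of dropped.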
  2. Donor-derived MAG reference catalogue
    • metaFlye (Kolmogorov et al., 2020): long-read metagenome assembly using repeat graphs
    • SemiBin2 (Pan et al., 2023): semi-supervised metagenomic binning
    • Medaka (v2.1.1): consensus polishing of long-read assemblies
    • GTDB-Tk (Chaumeil et al., 2022): standardized taxonomy assignment for bacterial and archaeal genomes
    • Output: 388 initial bins → 165 non-redundant donor-derived MAGs defining the feature universe for the custom Kraken2 database
  3. Taxonomic classifier evaluation and selection
    • Kraken2 (Wood et al., 2019): k-mer-based per-read taxonomic classification; selected as the primary classifier
    • Minimap2 (Li, 2018) + CoverM (Aroney et al., 2025): alignment-based read mapping and coverage estimation (comparator)
    • MetaPhlAn 4 (Metagenomic Phylogenetic Analysis; Blanco-Míguez et al., 2023): marker-gene profiling (comparator)
    • Sylph (Shaw and Yu, 2025): ANI/sketch-based profiling (comparator)
    • Selection criteria: support for fully customizable databases, per-read assignment outputs, and operational scalability under control-sample benchmarking (OMM12 only vs decoy-inclusive databases)
  4. Species-level abundance estimation and read-level QC
    • basen (v1.0.0): Base-level Abundance estimation with Species-assigned Evidence using Nanopore — species-rank evidence aggregated from per-read Kraken2 k-mer assignments, normalized by reference genome length to a coverage proxy, and rescaled within sample to relative abundance
    • Read-level compositional QC (basen): two-step filter using (1) the proportion of unassigned k-mers per read (excluded if >75%) and (2) the Shannon diversity of the per-read TaxID assignment profile (threshold >1.0 for high-depth donors, >0.75 for shallower mouse, replicate, and control samples)
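
The abundance calculation described for basen (species-level evidence, divided by reference genome length, then rescaled within the sample) can be illustrated with a few lines of arithmetic. This is a sketch of the general normalization idea, not the basen implementation: the function name, the input dictionaries, and the example numbers are all hypothetical.

```python
from typing import Dict


def relative_abundance(
    evidence: Dict[str, float],
    genome_length: Dict[str, int],
) -> Dict[str, float]:
    """Convert per-species evidence into within-sample relative abundance.

    evidence: species -> summed species-rank evidence (e.g. assigned bases)
    genome_length: species -> reference genome length in bp
    """
    # Step 1: length-normalize to a coverage proxy so that large genomes
    # do not appear more abundant simply because they attract more reads.
    coverage = {
        sp: ev / genome_length[sp] for sp, ev in evidence.items() if ev > 0
    }
    total = sum(coverage.values())
    if total == 0:
        return {}
    # Step 2: rescale so the values sum to 1 within the sample.
    return {sp: cov / total for sp, cov in coverage.items()}


# Made-up numbers: species A has twice the evidence of B but a genome
# four times as long, so B ends up with the higher relative abundance.
example = relative_abundance(
    {"A": 4_000_000, "B": 2_000_000},
    {"A": 4_000_000, "B": 1_000_000},
)
```

In this toy case the coverage proxies are 1.0 for A and 2.0 for B, giving relative abundances of one third and two thirds.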
  5. Mycobiome and virome profiling
    • Reads unclassified against the custom donor-derived Kraken2 database were re-classified against the k2_pluspf (fungi, protozoa) and k2_viral databases
    • Reads unmapped to both fungal and viral indices were screened against the mouse (GRCm39) and human (GRCh38) reference genomes
    • Donor-level cross-domain correlations: Spearman correlation with Phipson–Smyth permutation p-values (Phipson and Smyth, 2010) and BH correction within each kingdom-pair family
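
The donor-level correlation test can be sketched in plain Python. This is a simplified illustration, not the analysis code: it uses the basic (b + 1) / (m + 1) permutation estimator with randomly drawn permutations, whereas the full Phipson–Smyth treatment also accounts for the finite number of distinct permutations. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm: int = 999, seed: int = 0) -> float:
    """Two-sided permutation p-value, (b + 1) / (m + 1); never exactly zero."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    b = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            b += 1
    return (b + 1) / (n_perm + 1)


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

To mirror the per-family correction, `bh_adjust` would be called once per kingdom pair (for example, once on all bacteria-fungi p-values) rather than on the pooled set.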
  6. Procrustes analysis of cross-kingdom community structure
    • Symmetric Procrustes superimposition via vegan::procrustes / protest; m2 statistic and permutation p-value (999 permutations)
    • Donor-level centroid displacement vectors clustered with Euclidean distance and complete linkage
 

Results

This report covers the metagenomics component of MICROBENCH only. Section A (Pre-processing) expands on pipeline-internal results that the manuscript describes only briefly; Section B (Microbiome) presents the bacterial community results (Figs 1–2).

 

A. Pre-processing

Pipeline-internal steps described only briefly in the manuscript and expanded here with full detail: sequencing QC, classifier benchmarking, donor-derived custom Kraken2 database construction, basen method development, read-level compositional QC, and donor technical-replicate validation.
  1. Experimental design

  2. Basecalling

  3. Basecalling and demultiplexing results

  4. Quality assessment

  5. Samples included in the downstream analysis

  6. Read quality control

  7. Post-QC Stats

  8. Classifier evaluation

  9. Create a custom Kraken2 database tailored to the microbial community of interest

  10. Taxonomic classification via Kraken2 with the custom DB (165 MAGs)

  11. Develop a method to calculate relative abundance

  12. Filter low-quality reads
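
The two-step read-level compositional QC described in the Methods (unassigned k-mer proportion, then Shannon diversity of the per-read TaxID profile) can be sketched as follows. Several details are assumptions rather than documented behavior: that reads are excluded when the Shannon value exceeds the threshold, that the natural logarithm is used, and that unassigned k-mers are left out of the diversity profile. TaxID 0 marks an unassigned k-mer, following the Kraken2 output convention. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def shannon(counts: Counter) -> float:
    """Shannon diversity (natural log) of a TaxID count profile."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def keep_read(
    kmer_taxids: Sequence[int],
    max_unassigned: float = 0.75,
    max_shannon: float = 1.0,
) -> bool:
    """Return True if a read passes both compositional filters.

    kmer_taxids: one TaxID per k-mer in the read; 0 means unassigned.
    max_shannon: 1.0 for high-depth donors; 0.75 for shallower mouse,
        replicate, and control samples (thresholds from the Methods).
    """
    if not kmer_taxids:
        return False
    # Step 1: exclude reads dominated by unassigned k-mers (>75%).
    unassigned = sum(1 for t in kmer_taxids if t == 0)
    if unassigned / len(kmer_taxids) > max_unassigned:
        return False
    # Step 2 (assumed direction): exclude reads whose assigned k-mers
    # are spread across many taxa, i.e. a high-diversity profile.
    profile = Counter(t for t in kmer_taxids if t != 0)
    return shannon(profile) <= max_shannon
```

A read whose assigned k-mers split evenly across four taxa has a Shannon value of ln 4 (about 1.39) and fails under either threshold, while an even split across two taxa (ln 2, about 0.69) passes both.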

 

B. Microbiome (Figs 1–2)

Donor-stratified bacterial community structure: per-donor relative abundance heatmaps, prominent-taxa engraftment over time, donor-specific stable weeks, tic-tac-toe engraftment classification, α- and β-diversity across compartments, and the operational definition of convergence.
  1. Calculate relative abundance

  2. Correlation/Variability visualization

  3. Diversity

  4. Engraftment varies over time

  5. Taxa and engraftment characteristics

  6. Alpha and beta diversity

  7. Convergence
 

Discussion


Limitations

 

Data Availability

 

Code Availability

 

References

 

Abbreviations

    ANI, average nucleotide identity (sketch-based k-mer sampling)
    BH, Benjamini–Hochberg (false discovery rate correction)
    bp, base pair
    CLR, centered log-ratio
    ctrl, control
    CyTOF, cytometry by time of flight (mass cytometry)
    D, donor(s)
    DB, database
    FDR, false discovery rate
    FMT, fecal microbiota transplant
    HMA, human microbiota-associated
    LC–MS, liquid chromatography–mass spectrometry
    MAG, metagenome-assembled genome
    neg, negative
    OMM (OMM12), Oligo-Mouse-Microbiota (12-member defined bacterial consortium)
    ONT, Oxford Nanopore Technologies
    PCA, principal component analysis
    PCoA, principal coordinates analysis
    PERMANOVA, permutational multivariate analysis of variance
    QC, quality control
    SD, standard deviation