rMAP-TB is a reproducible, Dockerized WDL/Cromwell workflow for public-health-oriented analysis of Mycobacterium tuberculosis complex (MTBC) and non-MTBC Mycobacterium genomic data. The workflow supports paired-end Illumina FASTQ inputs and integrates read preprocessing, sequence quality control, Mycobacteria species typing, MTBC/non-MTBC routing, TB drug-resistance profiling, lineage interpretation, core-SNP phylogenomics, SNP clustering, and interactive surveillance reporting.
rMAP-TB was developed to support genomic surveillance of tuberculosis and clinically relevant Mycobacteria by combining species identification, drug-resistance interpretation, and phylogenomic analysis in a single portable workflow. It is designed to run on local workstations, servers, and cloud environments using Docker containers and Cromwell/WDL workflow orchestration.
The workflow begins with read trimming, FastQC-based sequence quality assessment, MultiQC aggregation, and Kraken2/Bracken-based Mycobacteria species typing. Species typing is used to route samples before MTBC-specific analyses: MTBC-supported samples proceed to TB-Profiler resistance, species, and lineage profiling, while non-MTBC Mycobacteria are summarized separately through an NTM speciation branch.
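The routing decision described above can be illustrated with a minimal Python sketch. This is not the workflow's own implementation: the `route_sample` function, the `min_fraction` cut-off, the abbreviated MTBC species set, and the extra "unresolved" and "non-Mycobacteria" labels are all assumptions made for illustration. The input is assumed to be a per-sample mapping of species name to estimated read fraction, in the style of a Bracken abundance table.

```python
from typing import Dict, Tuple

# Illustrative subset of MTBC member species (assumption: not rMAP-TB's actual list).
MTBC_SPECIES = {
    "Mycobacterium tuberculosis",
    "Mycobacterium bovis",
    "Mycobacterium africanum",
    "Mycobacterium canettii",
    "Mycobacterium microti",
}

# Genus prefixes treated as mycobacterial for the NTM branch (assumption).
MYCOBACTERIAL_GENERA = ("Mycobacterium", "Mycobacteroides", "Mycolicibacterium")


def route_sample(abundances: Dict[str, float], min_fraction: float = 0.5) -> Tuple[str, str]:
    """Return (top_species, branch) for one sample.

    abundances   : species name -> estimated fraction of reads (0..1).
    min_fraction : hypothetical minimum support needed to trust the top call.
    """
    if not abundances:
        return ("unclassified", "unresolved")

    # Pick the species with the highest estimated abundance.
    top_species, top_fraction = max(abundances.items(), key=lambda kv: kv[1])

    if top_fraction < min_fraction:
        return (top_species, "unresolved")
    if top_species in MTBC_SPECIES:
        return (top_species, "MTBC")
    if top_species.startswith(MYCOBACTERIAL_GENERA):
        return (top_species, "NTM")
    return (top_species, "non-Mycobacteria")


if __name__ == "__main__":
    sample = {"Mycobacterium tuberculosis": 0.97, "Mycobacterium avium": 0.01}
    print(route_sample(sample))
```

In this sketch, only samples labeled "MTBC" would continue to resistance and lineage profiling, while "NTM" samples would be summarized separately.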
For MTBC-supported samples, rMAP-TB performs TB-Profiler-based drug-resistance and lineage interpretation, Snippy-based variant calling, Snippy-core core-genome alignment, drug-resistance-associated non-synonymous mutation summarization, pairwise SNP distance estimation, SNP cluster interpretation, lineage distribution analysis, optional Gubbins recombination filtering, IQ-TREE2 maximum-likelihood phylogeny, and ETE3-based tree visualization.
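To make the pairwise SNP distance step concrete, the following Python sketch counts differing positions between aligned sequences from a core-genome alignment. It is a simplified illustration under stated assumptions, not the tool rMAP-TB uses for this step: the function names are hypothetical, and the choice to skip positions where either sequence has a gap or ambiguous base is an assumption.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where both sequences have an unambiguous base and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return SNP distances for every unordered pair of samples.

    alignment maps sample name -> aligned sequence; keys of the result are
    (sample_1, sample_2) tuples in sorted order.
    """
    return {
        (s1, s2): snp_distance(alignment[s1], alignment[s2])
        for s1, s2 in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Toy alignment with a gap and an ambiguous base (hypothetical data).
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACNTTCG-"}
    for pair, dist in pairwise_snp_distances(toy).items():
        print(pair, dist)
```

A distance table built this way is symmetric, so storing each unordered pair once is enough to populate a full matrix or heatmap.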
rMAP-TB generates integrated HTML reports and downloadable public-health surveillance outputs, including QC filtering rationale, Mycobacteria species typing summaries, NTM speciation summaries, TB-Profiler mutation-level resistance evidence, resistance-profile summaries, lineage distribution summaries, pairwise SNP distance tables, SNP cluster summaries, SNP distance heatmaps, phylogenetic tree visualizations, and surveillance metadata TSV files.
Key workflow capabilities include:
- Mycobacteria species typing and routing — uses Kraken2 and Bracken to identify the most probable Mycobacteria species and route samples into MTBC-supported or non-MTBC/NTM reporting branches.
- TB drug-resistance and lineage interpretation — applies TB-Profiler to MTBC-supported samples to summarize species, lineage, sub-lineage, predicted resistance profile, resistant drugs, and mutation-level resistance evidence.
- MTBC core-SNP phylogenomics — performs reference-guided variant calling, core-genome SNP alignment, optional recombination filtering, maximum-likelihood phylogenetic inference, and metadata-enhanced tree visualization.
- SNP distance and cluster interpretation — estimates pairwise SNP distances, generates SNP distance matrices and heatmaps, and summarizes genomically close sample pairs using configurable SNP-distance thresholds.
- Integrated surveillance reporting — produces GitHub Pages-compatible HTML reports with downloadable TSV outputs for QC, species typing, resistance interpretation, lineage distribution, SNP clustering, and surveillance metadata.
- Reproducible workflow execution — uses Dockerized tools and WDL/Cromwell orchestration to support consistent execution across laptops, servers, and cloud-based environments.
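As an illustration of the threshold-based SNP cluster interpretation listed above, the sketch below groups samples by single-linkage clustering: any two samples whose pairwise distance is at or below the threshold end up in the same cluster. Single linkage is an assumption here, since the clustering method is not specified, and the `snp_clusters` name and the example threshold are hypothetical rather than rMAP-TB defaults.

```python
from typing import Dict, List, Tuple


def snp_clusters(distances: Dict[Tuple[str, str], int], threshold: int) -> List[List[str]]:
    """Group samples linked by pairwise SNP distances <= threshold.

    Uses a small union-find structure; clusters with a single member are
    omitted from the result.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        root_a, root_b = find(a), find(b)  # also registers both samples
        if dist <= threshold and root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for sample in list(parent):
        groups.setdefault(find(sample), []).append(sample)

    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


if __name__ == "__main__":
    # Hypothetical distances; the threshold value is illustrative only.
    toy = {("S1", "S2"): 3, ("S1", "S3"): 40, ("S2", "S3"): 10, ("S4", "S5"): 100}
    print(snp_clusters(toy, threshold=12))
```

Note that single linkage can chain samples together: in the toy data, S1 and S3 are 40 SNPs apart but still share a cluster at a threshold of 12 because each is close to S2. A stricter threshold splits such chains.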
The rMAP-TB GitHub repository, documentation, workflow files, example inputs, and GitHub Pages-compatible reports are available here:
https://github.com/gmboowa/rMAP-TB
The GitHub Pages report site is available here:
https://gmboowa.github.io/rMAP-TB/
rMAP-TB manuscript: "rMAP-TB: a reproducible WDL/Cromwell workflow for Mycobacterium tuberculosis complex genomic surveillance and drug-resistance interpretation."