phyloFlash

phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic or transcriptomic dataset.

NOTE: Version 3 changes some input options and how mapping-based taxa (NTUs) are handled. For the old implementation, download the last release of v2.0 (tar.gz archive). The database setup is unchanged, so databases prepared for v2.0 can still be used with v3.0.

This manual explains how to install and use phyloFlash.

What does phyloFlash do?

  • Summarize taxonomic diversity of a metagenome/transcriptome library from SSU rRNA read affiliations
  • Assemble/reconstruct full-length SSU rRNA sequences suitable for phylogenetic analysis
  • Compare multiple samples quickly by their taxonomic composition, displayed as a heatmap

You may read more about the pipeline design and application in our paper.


Download via Conda

Conda is a package manager that also installs any required dependencies you do not already have.

phyloFlash is distributed through the Bioconda channel on Conda.

# If you haven't set up Bioconda already
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
# Try the following step if "solving environment" does not terminate
conda config --set channel_priority strict
# Install packages to current environment
conda install sortmerna=2.1b # Optional - if you want to use SortMeRNA option
conda install phyloflash

Download from GitHub

If you prefer not to use Conda, or are interested in a specific version that is not distributed there, you can download releases from the releases page on GitHub.

If you clone the repository directly off GitHub you might end up with a version that is still under development.

# Download the release archive (here pf3.4.tar.gz) from the GitHub releases page, then unpack it
tar -xzf pf3.4.tar.gz

# Check for dependencies and install them if necessary
cd phyloFlash-pf3.4
./phyloFlash.pl -check_env
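
If you want a quick look at your PATH before or after the dependency check, a small shell helper can report which executables are present. This is only a sketch and does not replace -check_env: check_tools is a hypothetical helper, not part of phyloFlash, and the tool names in the usage comment are assumptions that may differ on your system.

```shell
# check_tools NAME...: report which of the named executables are on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions; adjust to your setup):
# check_tools perl bbmap.sh spades.py
```

The function only tests whether a name resolves on PATH; it does not check versions, which is one reason the built-in check remains the authoritative one.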

Set up database and run

This assumes that the phyloFlash scripts are already in your PATH.

# Install reference database (takes some time)
phyloFlash_makedb.pl --remote

# Run with test data and 16 processors (default is to use all processors available)
phyloFlash.pl -lib TEST -CPUs 16 -read1 test_files/test_F.fq.gz -read2 test_files/test_R.fq.gz

# Run with interleaved reads
phyloFlash.pl -lib LIB -read1 reads_FR.fq.gz -interleaved

# Additionally run EMIRGE for 16S rRNA sequence reconstruction
phyloFlash.pl -lib LIB -emirge -read1 reads_F.fq.gz -read2 reads_R.fq.gz

# Compress output into tar.gz archive and write a log file
phyloFlash.pl -lib LIB -zip -log -read1 reads_F.fq.gz -read2 reads_R.fq.gz

# Run both SPAdes and EMIRGE and produce all optional outputs
phyloFlash.pl -lib LIB -everything -read1 reads_F.fq.gz -read2 reads_R.fq.gz

# Run SPAdes (skip EMIRGE) and produce all optional outputs (recommended)
phyloFlash.pl -lib LIB -almosteverything -read1 reads_F.fq.gz -read2 reads_R.fq.gz

# Supply trusted contigs containing SSU rRNA sequences to screen vs reads
phyloFlash.pl -lib LIB -read1 reads_F.fq.gz -read2 reads_R.fq.gz -trusted contigs.fasta

# Use SortMeRNA instead of BBmap for initial mapping (slower, but more sensitive)
phyloFlash.pl -lib LIB -read1 reads_F.fq.gz -read2 reads_R.fq.gz -sortmerna
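
Before a paired-end run it can help to confirm that the forward and reverse files hold the same number of reads. The following is a minimal sketch, assuming gzipped FASTQ files with the usual four lines per record; count_reads and same_read_count are hypothetical helpers and not part of phyloFlash.

```shell
# count_reads FILE: number of reads in a gzipped FASTQ file,
# assuming four lines per record.
count_reads() {
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# same_read_count FWD REV: succeed only if both files contain
# the same number of reads.
same_read_count() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example call with the file names used above:
# same_read_count reads_F.fq.gz reads_R.fq.gz || echo "read counts differ" >&2
```

Equal counts do not prove that the files are correctly paired, but a mismatch is a cheap early warning that something went wrong upstream.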

Use the -help option to display a brief help and the -man option to display the full help message.

Use the -sc switch for MDA (single-cell) datasets or other hard-to-assemble read sets.

Use the -zip switch to compress output files into a tar.gz archive, and -log to save run messages to a log file.
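
When several libraries need the same treatment, the commands can be generated in a loop. The sketch below only prints one command per read pair so you can inspect them before running anything. It assumes the main script is invoked as phyloFlash.pl and that pairs are named <sample>_F.fq.gz and <sample>_R.fq.gz; both the naming convention and the helper print_pf_commands are hypothetical.

```shell
# print_pf_commands DIR: print one phyloFlash.pl command per read pair in DIR.
# Assumes pairs are named <sample>_F.fq.gz and <sample>_R.fq.gz.
print_pf_commands() {
    for fwd in "$1"/*_F.fq.gz; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$fwd" ] || continue
        rev="${fwd%_F.fq.gz}_R.fq.gz"
        if [ ! -e "$rev" ]; then
            echo "skipping $fwd: no reverse file" >&2
            continue
        fi
        sample=$(basename "$fwd" _F.fq.gz)
        echo "phyloFlash.pl -lib $sample -read1 $fwd -read2 $rev -zip -log"
    done
}

# Example call: list the commands for every pair in the current directory
# print_pf_commands .
```

The printed lines are not quoted, so paths containing spaces would need extra care before the output is passed to a shell.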

An example phyloFlash report from the provided test data can be viewed here.



phyloFlash is written by Harald Gruber-Vodicka, Elmar A. Pruesse, and Brandon Seah.

You can find the source code for phyloFlash on GitHub: HRGV/phyloFlash

Max Planck Institute for Marine Microbiology


If you use phyloFlash for a publication, please cite our paper in mSystems:

Harald R Gruber-Vodicka, Brandon KB Seah, Elmar Pruesse. phyloFlash: Rapid SSU rRNA profiling and targeted assembly from metagenomes. mSystems 5:e00920-20. doi:10.1128/mSystems.00920-20

Please also cite the dependencies used, which are listed in each phyloFlash report file.