bio-workflows-metagenomics-pipeline

Metagenomics Pipeline

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-workflows-metagenomics-pipeline" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-workflows-metagenomics-pipeline

Metagenomics Pipeline

Complete workflow from metagenomic FASTQ to taxonomic and functional profiles.

Workflow Overview

FASTQ files | v [1. QC & Host Removal] --> fastp + Bowtie2 | v [2. Taxonomic Classification] | +---> Kraken2 + Bracken (fast, database-dependent) | +---> MetaPhlAn (marker-based, standardized) | v [3. Functional Profiling] --> HUMAnN | v Taxonomic profiles + Pathway abundances

Primary Path: Kraken2 + Bracken + HUMAnN

Step 1: Quality Control and Host Removal

QC with fastp

for sample in sample1 sample2 sample3; do fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz
-o trimmed/${sample}_R1.fq.gz -O trimmed/${sample}_R2.fq.gz
--detect_adapter_for_pe
--qualified_quality_phred 20
--length_required 50
--html qc/${sample}_fastp.html done

Remove host reads (human example)

for sample in sample1 sample2 sample3; do bowtie2 -p 8 -x human_index
-1 trimmed/${sample}_R1.fq.gz
-2 trimmed/${sample}_R2.fq.gz
--un-conc-gz host_removed/${sample}_R%.fq.gz
> /dev/null 2> qc/${sample}_host_removal.log done

Step 2A: Kraken2 Classification

Classify reads

for sample in sample1 sample2 sample3; do kraken2 --db kraken2_db
--threads 8
--paired
--report kraken/${sample}.report
--output kraken/${sample}.output
host_removed/${sample}_R1.fq.gz
host_removed/${sample}_R2.fq.gz done

Step 2B: Bracken Abundance Estimation

Estimate species abundance

for sample in sample1 sample2 sample3; do bracken -d kraken2_db
-i kraken/${sample}.report
-o bracken/${sample}.species.txt
-r 150
-l S
-t 10 done

Combine samples into abundance matrix

combine_bracken_outputs.py
--files bracken/*.species.txt
-o bracken/combined_species.txt

Step 2C: Alternative - MetaPhlAn Profiling

Profile with MetaPhlAn 4

for sample in sample1 sample2 sample3; do metaphlan host_removed/${sample}_R1.fq.gz,host_removed/${sample}_R2.fq.gz
--bowtie2out metaphlan/${sample}.bowtie2.bz2
--input_type fastq
--nproc 8
-o metaphlan/${sample}_profile.txt done

Merge profiles

merge_metaphlan_tables.py metaphlan/*_profile.txt > metaphlan/merged_abundance.txt

Step 3: Functional Profiling with HUMAnN

Run HUMAnN

for sample in sample1 sample2 sample3; do # Concatenate paired reads cat host_removed/${sample}_R1.fq.gz host_removed/${sample}_R2.fq.gz >
host_removed/${sample}_concat.fq.gz

humann --input host_removed/${sample}_concat.fq.gz \
    --output humann/${sample} \
    --threads 8 \
    --metaphlan-options "--bowtie2db metaphlan_db"

done

Normalize and join tables

humann_renorm_table --input humann/sample1/sample1_pathabundance.tsv
--output humann/sample1/sample1_pathabundance_cpm.tsv
--units cpm

humann_join_tables --input humann
--output humann/merged_pathabundance.tsv
--file_name pathabundance

Visualization

import pandas as pd import matplotlib.pyplot as plt import seaborn as sns

Load Bracken species table

species = pd.read_csv('bracken/combined_species.txt', sep='\t', index_col=0)

Top 20 species heatmap

top20 = species.sum(axis=1).nlargest(20).index plt.figure(figsize=(12, 8)) sns.heatmap(species.loc[top20], cmap='viridis', annot=False) plt.title('Top 20 Species Abundance') plt.tight_layout() plt.savefig('top20_species_heatmap.pdf')

Stacked bar plot

species_norm = species.div(species.sum()) * 100 top10 = species_norm.sum(axis=1).nlargest(10).index other = species_norm.loc[~species_norm.index.isin(top10)].sum()

plot_data = species_norm.loc[top10].T plot_data['Other'] = other plot_data.plot(kind='bar', stacked=True, figsize=(10, 6)) plt.ylabel('Relative Abundance (%)') plt.legend(bbox_to_anchor=(1.05, 1)) plt.tight_layout() plt.savefig('species_barplot.pdf')

Parameter Recommendations

Step Parameter Value

fastp --length_required 50 (metagenomic reads)

Kraken2 --confidence 0.0 (default) or 0.1

Bracken -r Read length (e.g., 150)

Bracken -l S (species) or G (genus)

Bracken -t 10 (min reads threshold)

MetaPhlAn --min_cu_len 2000 (default)

HUMAnN --threads 8+

Troubleshooting

Issue Likely Cause Solution

Low classification rate Database mismatch, novel organisms Try different database, check sample type

High unclassified Novel microbes, host contamination Remove host, use larger database

High host reads Incomplete host removal Use multiple host reference genomes

HUMAnN slow Large files Increase threads, pre-filter reads

Complete Pipeline Script

#!/bin/bash set -e

THREADS=8 KRAKEN_DB="kraken2_standard_db" HOST_INDEX="human_bt2_index" SAMPLES="sample1 sample2 sample3" OUTDIR="metagenomics_results"

mkdir -p ${OUTDIR}/{trimmed,host_removed,kraken,bracken,metaphlan,humann,qc}

Step 1: QC

echo "=== QC ===" for sample in $SAMPLES; do fastp -i ${sample}_R1.fastq.gz -I ${sample}_R2.fastq.gz
-o ${OUTDIR}/trimmed/${sample}_R1.fq.gz
-O ${OUTDIR}/trimmed/${sample}_R2.fq.gz
--length_required 50
--html ${OUTDIR}/qc/${sample}_fastp.html -w ${THREADS} done

Host removal

echo "=== Host Removal ===" for sample in $SAMPLES; do bowtie2 -p ${THREADS} -x ${HOST_INDEX}
-1 ${OUTDIR}/trimmed/${sample}_R1.fq.gz
-2 ${OUTDIR}/trimmed/${sample}_R2.fq.gz
--un-conc-gz ${OUTDIR}/host_removed/${sample}_R%.fq.gz
> /dev/null 2> ${OUTDIR}/qc/${sample}_host.log done

Step 2: Kraken2

echo "=== Kraken2 ===" for sample in $SAMPLES; do kraken2 --db ${KRAKEN_DB} --threads ${THREADS} --paired
--report ${OUTDIR}/kraken/${sample}.report
--output ${OUTDIR}/kraken/${sample}.output
${OUTDIR}/host_removed/${sample}_R1.fq.gz
${OUTDIR}/host_removed/${sample}_R2.fq.gz done

Bracken

echo "=== Bracken ===" for sample in $SAMPLES; do bracken -d ${KRAKEN_DB}
-i ${OUTDIR}/kraken/${sample}.report
-o ${OUTDIR}/bracken/${sample}.species.txt
-r 150 -l S -t 10 done

echo "=== Pipeline Complete ===" echo "Kraken reports: ${OUTDIR}/kraken/" echo "Bracken abundances: ${OUTDIR}/bracken/"

Related Skills

  • metagenomics/kraken-classification - Kraken2 details

  • metagenomics/metaphlan-profiling - MetaPhlAn parameters

  • metagenomics/abundance-estimation - Bracken options

  • metagenomics/functional-profiling - HUMAnN workflow

  • metagenomics/metagenome-visualization - Plotting functions

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

bio-read-qc-fastp-workflow

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

bio-workflow-management-snakemake-workflows

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

bio-workflows-genome-assembly-pipeline

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

bio-workflows-scrnaseq-pipeline

No summary provided by upstream source.

Repository SourceNeeds Review