bio-metagenomics-kraken

Kraken2 Classification

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "bio-metagenomics-kraken" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-metagenomics-kraken

Basic Classification

Classify reads against standard database

kraken2 --db /path/to/kraken2_db \
    --output output.kraken \
    --report report.txt \
    reads.fastq.gz

Paired-End Reads

kraken2 --db /path/to/kraken2_db \
    --paired \
    --output output.kraken \
    --report report.txt \
    reads_R1.fastq.gz reads_R2.fastq.gz

Common Options

# --threads               CPU threads
# --confidence            Confidence threshold
# --minimum-base-quality  Quality filter
# --use-names             Add taxon names to output
# --gzip-compressed       Input is gzipped
kraken2 --db /path/to/kraken2_db \
    --threads 8 \
    --confidence 0.1 \
    --minimum-base-quality 20 \
    --output output.kraken \
    --report report.txt \
    --use-names \
    --gzip-compressed \
    reads.fastq.gz
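
When the same options are reused across many samples, it can help to assemble the command in a script rather than by hand. The following is a minimal sketch, not part of the upstream skill: the function name `build_kraken2_command` and its keyword defaults are illustrative assumptions, and it only covers flags shown above.

```python
import shlex


def build_kraken2_command(db, reads, output="output.kraken", report="report.txt",
                          threads=1, confidence=0.0, paired=False, use_names=False):
    """Assemble a kraken2 argument list (hypothetical helper, not a Kraken2 API)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if paired and len(reads) != 2:
        raise ValueError("paired mode expects exactly two read files")
    cmd = ["kraken2", "--db", db,
           "--threads", str(threads),
           "--confidence", str(confidence),
           "--output", output,
           "--report", report]
    if paired:
        cmd.append("--paired")
    if use_names:
        cmd.append("--use-names")
    cmd.extend(reads)  # input files go last, as in the shell examples
    return cmd


cmd = build_kraken2_command("/path/to/kraken2_db", ["reads.fastq.gz"],
                            threads=8, confidence=0.1, use_names=True)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The returned list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.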

Memory-Efficient Mode

For systems with limited RAM

# --memory-mapping uses the database from disk instead of loading it into RAM
kraken2 --db /path/to/kraken2_db \
    --memory-mapping \
    --output output.kraken \
    --report report.txt \
    reads.fastq.gz

Report Only (No Per-Read Output)

Save space by not writing per-read classifications

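# --output - suppresses the per-read output, which otherwise goes to stdout
# --report-zero-counts includes taxa with 0 counts
kraken2 --db /path/to/kraken2_db \
    --output - \
    --report report.txt \
    --report-zero-counts \
    reads.fastq.gz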

Classified/Unclassified Output

Separate classified and unclassified reads

# With --paired, the # in each filename is replaced by _1 and _2
kraken2 --db /path/to/kraken2_db \
    --classified-out classified#.fq \
    --unclassified-out unclassified#.fq \
    --output output.kraken \
    --report report.txt \
    --paired \
    reads_R1.fastq.gz reads_R2.fastq.gz
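
Downstream steps usually need the actual file names that result from the `#` placeholder. A small sketch, assuming the `_1`/`_2` substitution described in the Kraken 2 manual for paired input; `expand_paired_pattern` is a hypothetical helper name, not a Kraken2 function.

```python
def expand_paired_pattern(pattern):
    """Return the two file names expected for a '#' pattern in paired mode."""
    if pattern.count("#") != 1:
        raise ValueError("pattern must contain exactly one '#'")
    # Assumption: kraken2 substitutes "_1" and "_2" for the '#' character.
    return tuple(pattern.replace("#", suffix) for suffix in ("_1", "_2"))


classified_files = expand_paired_pattern("classified#.fq")
unclassified_files = expand_paired_pattern("unclassified#.fq")
print(classified_files)
print(unclassified_files)
```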

Build Custom Database

Download taxonomy

kraken2-build --download-taxonomy --db custom_db

Download specific libraries

kraken2-build --download-library bacteria --db custom_db
kraken2-build --download-library archaea --db custom_db
kraken2-build --download-library viral --db custom_db

Build database

kraken2-build --build --db custom_db --threads 8

Clean up intermediate files

kraken2-build --clean --db custom_db
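
The four stages above (taxonomy, libraries, build, clean) have to run in order. One way to script them is sketched below; `custom_db_build_steps` is a hypothetical helper that only generates the argument lists and does not run anything itself.

```python
def custom_db_build_steps(db, libraries, threads=1):
    """Return the kraken2-build commands for a custom database, in run order."""
    steps = [["kraken2-build", "--download-taxonomy", "--db", db]]
    for library in libraries:
        steps.append(["kraken2-build", "--download-library", library, "--db", db])
    steps.append(["kraken2-build", "--build", "--db", db, "--threads", str(threads)])
    steps.append(["kraken2-build", "--clean", "--db", db])
    return steps


steps = custom_db_build_steps("custom_db", ["bacteria", "archaea", "viral"], threads=8)
for step in steps:
    print(" ".join(step))
```

Running each entry with `subprocess.run(step, check=True)` would stop at the first failing stage, so a failed download is not followed by a build.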

Add Custom Sequences

Add FASTA sequences to library

kraken2-build --add-to-library custom_genomes.fasta --db custom_db

Then build

kraken2-build --build --db custom_db

Inspect Database

View database contents

kraken2-inspect --db /path/to/kraken2_db | head -50

Report Format

 17.45  1745  1745  U   0       unclassified
 82.55  8255  48    R   1       root
 82.07  8207  2     R1  131567    cellular organisms
 81.99  8199  132   D   2           Bacteria
 76.23  7623  178   P   1224          Proteobacteria

Columns:

  • Percentage of reads

  • Number of reads rooted at taxon

  • Number of reads directly assigned

  • Rank code (U, R, D, K, P, C, O, F, G, S; a trailing digit such as R1 marks an intermediate rank)

  • NCBI taxon ID

  • Scientific name
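
The six columns can also be read without third-party libraries. The sketch below parses the example rows shown above and recovers each taxon's depth in the tree from the indentation of its name. Two assumptions are made: the real report is tab-separated, and names are indented two spaces per level. `parse_report` is a hypothetical helper name.

```python
import io

# Example rows from the report above; tabs and name indentation are assumed.
SAMPLE_REPORT = (
    " 17.45\t1745\t1745\tU\t0\tunclassified\n"
    " 82.55\t8255\t48\tR\t1\troot\n"
    " 82.07\t8207\t2\tR1\t131567\t  cellular organisms\n"
    " 81.99\t8199\t132\tD\t2\t    Bacteria\n"
    " 76.23\t7623\t178\tP\t1224\t      Proteobacteria\n"
)


def parse_report(handle):
    """Parse a six-column Kraken2 report into a list of dicts."""
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        pct, clade, direct, rank, taxid, name = fields
        indent = len(name) - len(name.lstrip(" "))
        rows.append({
            "pct": float(pct),
            "reads_clade": int(clade),
            "reads_taxon": int(direct),
            "rank": rank.strip(),
            "taxid": int(taxid),
            "name": name.strip(),
            "depth": indent // 2,  # assumes two spaces per tree level
        })
    return rows


rows = parse_report(io.StringIO(SAMPLE_REPORT))
for row in rows:
    print("  " * row["depth"] + row["name"], row["reads_clade"])
```

In this example the clade counts for `unclassified` and `root` sum to the total number of reads, which is a quick sanity check on a parsed report.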

Parse Kraken Output in Python

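import pandas as pd

# The report is tab-separated and has no header row
report = pd.read_csv('report.txt', sep='\t', header=None,
                     names=['pct', 'reads_clade', 'reads_taxon', 'rank', 'taxid', 'name'])

# Names are indented to show tree depth; strip the padding
report['name'] = report['name'].str.strip()

# Species-level rows, most abundant first
species = report[report['rank'] == 'S']
species_sorted = species.sort_values('pct', ascending=False)
print(species_sorted.head(20))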

Filter Report by Rank

Get only species-level classifications

awk '$4 == "S"' report.txt > species_report.txt

Get genus level

awk '$4 == "G"' report.txt > genus_report.txt
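
The same rank filter can be written in Python when awk is not available. This is a sketch under the assumption that the report is tab-separated; `filter_report_by_rank` is a hypothetical helper, and the sample lines reuse the example report rows from this page. Like the awk commands, it matches the rank code exactly, so `R` does not match `R1`.

```python
def filter_report_by_rank(lines, rank_code):
    """Keep report lines whose fourth column equals rank_code exactly."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[3].strip() == rank_code:
            kept.append(line)
    return kept


# Example rows from the report format section (tab separation assumed).
sample_lines = [
    " 17.45\t1745\t1745\tU\t0\tunclassified\n",
    " 82.55\t8255\t48\tR\t1\troot\n",
    " 82.07\t8207\t2\tR1\t131567\t  cellular organisms\n",
    " 81.99\t8199\t132\tD\t2\t    Bacteria\n",
    " 76.23\t7623\t178\tP\t1224\t      Proteobacteria\n",
]

phylum_lines = filter_report_by_rank(sample_lines, "P")
root_lines = filter_report_by_rank(sample_lines, "R")
print("".join(phylum_lines), end="")
```

For a real file, pass an open file handle as `lines` and write the returned list to the output file with `writelines`.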

Key Parameters

Parameter               Default   Description
--db                    required  Database path
--threads               1         CPU threads
--confidence            0.0       Confidence threshold (0-1)
--minimum-base-quality  0         Phred quality threshold
--memory-mapping        false     Use disk-based database
--paired                false     Paired-end mode
--use-names             false     Include taxon names
--report-zero-counts    false     Include 0-count taxa

Database Libraries

Library      Content
bacteria     RefSeq complete bacterial genomes
archaea      RefSeq complete archaeal genomes
viral        RefSeq complete viral genomes
plasmid      RefSeq plasmid nucleotide sequences
human        GRCh38 human genome
fungi        RefSeq fungi
protozoa     RefSeq protozoa
UniVec_Core  Common vector sequences

Related Skills

  • abundance-estimation - Estimate abundances with Bracken

  • metaphlan-profiling - Alternative marker-based profiling

  • metagenome-visualization - Visualize results
