bio-longread-qc

Long-Read Quality Control

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "bio-longread-qc" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-longread-qc


NanoPlot - Visualization

From FASTQ

NanoPlot --fastq reads.fastq.gz -o nanoplot_output -t 4

From BAM

NanoPlot --bam aligned.bam -o nanoplot_output -t 4

From sequencing summary (fastest)

NanoPlot --summary sequencing_summary.txt -o nanoplot_output

NanoPlot - Common Options

Flags: --N50 shows N50 in plots, --plots sets the plot types, --format sets the output formats, and --maxlength / --minlength bound the read lengths that are plotted.

NanoPlot --fastq reads.fastq.gz \
    -o nanoplot_output \
    -t 8 \
    --N50 \
    --title "Sample QC" \
    --plots hex dot \
    --format png pdf \
    --color darkblue \
    --maxlength 50000 \
    --minlength 500

NanoStat - Statistics Only

Quick statistics (no plots)

NanoStat --fastq reads.fastq.gz --threads 4

From BAM

NanoStat --bam aligned.bam --threads 4

Output to file

NanoStat --fastq reads.fastq.gz --threads 4 > qc_stats.txt

chopper - Filter Reads

Filter by length and quality

gunzip -c reads.fastq.gz | chopper -q 10 -l 1000 | gzip > filtered.fastq.gz

Keeps reads with quality >= 10 and length >= 1000 bp.
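To make the two thresholds concrete, here is a minimal Python sketch of a length-and-quality filter. It is not chopper's implementation: `mean_read_quality` and `passes_filter` are hypothetical helper names, and the sketch assumes Phred+33 encoding and a per-read quality obtained by averaging per-base error probabilities, which may differ from what chopper computes internally.

```python
import math


def mean_read_quality(quality_string, offset=33):
    """Mean Phred score of one read, averaged in error-probability space.

    Assumption: Phred+33 encoded quality string (hypothetical helper).
    """
    if not quality_string:
        return 0.0
    # Convert each quality character to its error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    # Average the probabilities, then convert back to the Phred scale.
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(sequence, quality_string, min_quality=10, min_length=1000):
    """Return True if the read meets both the length and quality thresholds."""
    return (
        len(sequence) >= min_length
        and mean_read_quality(quality_string) >= min_quality
    )


# A Q10 base ('+') and a Q30 base ('?') average to roughly Q13, not Q20,
# because low-quality bases dominate the mean error probability.
print(round(mean_read_quality("+?"), 2))
```

Averaging error probabilities rather than raw Phred scores is a design choice: a plain arithmetic mean of Q values would overstate the quality of reads with a few very poor bases.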

chopper - Common Options

Flags: --quality is the minimum quality, --minlength and --maxlength bound the read length, and --headcrop / --tailcrop remove bases from the start and end of each read.

gunzip -c reads.fastq.gz | chopper \
    --quality 10 \
    --minlength 1000 \
    --maxlength 50000 \
    --headcrop 50 \
    --tailcrop 50 \
    --threads 4 \
    | gzip > filtered.fastq.gz
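The head and tail cropping options remove a fixed number of bases from each end of a read, and the quality string has to be trimmed by the same amount so the two stay aligned. The sketch below illustrates that idea in plain Python. `crop_read` is a hypothetical helper, not part of chopper, and it says nothing about whether chopper applies cropping before or after its length and quality filters.

```python
def crop_read(sequence, quality, headcrop=0, tailcrop=0):
    """Drop `headcrop` bases from the start and `tailcrop` from the end.

    Sequence and quality are trimmed identically. If the crops consume
    the whole read, two empty strings are returned.
    """
    if headcrop < 0 or tailcrop < 0:
        raise ValueError("crop sizes must be non-negative")
    # Never let the end index fall before the start index.
    end = max(len(sequence) - tailcrop, headcrop)
    return sequence[headcrop:end], quality[headcrop:end]


# Trim 3 bases from the start and 2 from the end of a 10-base read.
print(crop_read("ACGTACGTAC", "0123456789", headcrop=3, tailcrop=2))
```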

NanoFilt - Alternative Filter

Filter with NanoFilt

gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 1000 | gzip > filtered.fastq.gz

With more options

gunzip -c reads.fastq.gz | NanoFilt \
    --quality 10 \
    --length 1000 \
    --maxlength 50000 \
    --headcrop 50 \
    | gzip > filtered.fastq.gz

Porechop - Adapter Trimming

Trim adapters

porechop -i reads.fastq.gz -o trimmed.fastq.gz --threads 8

With barcode splitting

porechop -i reads.fastq.gz -b output_dir/ --threads 8

Generate Summary Statistics

Quick summary with seqkit

seqkit stats reads.fastq.gz

Detailed stats

seqkit stats -a reads.fastq.gz

Watch stats during basecalling

seqkit watch --fields ReadLen,MeanQual reads.fastq.gz

PycoQC - From Basecalling

Generate QC report from sequencing_summary.txt

pycoQC -f sequencing_summary.txt -o pycoqc_report.html

With BAM for alignment stats

pycoQC -f sequencing_summary.txt -a aligned.bam -o pycoqc_report.html

Calculate N50

With seqkit

seqkit stats -a -T reads.fastq.gz | awk -F'\t' 'NR==1 {for (i=1; i<=NF; i++) if ($i=="N50") col=i; next} {print "N50:", $col}'

Manual calculation

seqkit fx2tab -n -l reads.fastq.gz | awk -F'\t' '{print $NF}' | sort -rn |
awk '{sum+=$1; len[NR]=$1} END { target=sum/2; cumsum=0; for(i=1; i<=NR; i++) { cumsum+=len[i]; if(cumsum>=target) {print "N50:", len[i]; break} } }'
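The awk pipeline above sorts read lengths from longest to shortest and reports the length at which the running total first reaches half of all bases. The same logic can be sketched in Python. `n50` is a hypothetical helper name and the read lengths in the example are made up for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together contain
    at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only, not real data.
print("N50:", n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```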

Parse FASTQ Quality in Python

import numpy as np
from Bio import SeqIO

lengths = []
qualities = []

# Collect the length and the arithmetic mean Phred score of every read
for record in SeqIO.parse('reads.fastq', 'fastq'):
    lengths.append(len(record))
    qualities.append(np.mean(record.letter_annotations['phred_quality']))

# Report run-level summary statistics
print(f'Total reads: {len(lengths)}')
print(f'Total bases: {sum(lengths):,}')
print(f'Mean length: {np.mean(lengths):.0f}')
print(f'Median length: {np.median(lengths):.0f}')
print(f'Mean quality: {np.mean(qualities):.1f}')
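Most commands in this guide work on gzip-compressed reads, while the snippet above opens an uncompressed file. As a dependency-free sketch, the length statistics can also be gathered with the standard library alone. `read_fastq` and `length_summary` are hypothetical helpers, and the sketch assumes strict four-line FASTQ records (no wrapped sequences).

```python
import gzip
import statistics


def read_fastq(path):
    """Yield (name, sequence, quality) tuples from a four-line FASTQ file.

    Paths ending in .gz are opened with gzip, anything else as plain text.
    Wrapped (multi-line) records are not handled in this sketch.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file, or a trailing blank line
            sequence = handle.readline().strip()
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().strip()
            yield header.strip()[1:], sequence, quality


def length_summary(path):
    """Return read count, total bases, and mean/median read length."""
    lengths = [len(sequence) for _, sequence, _ in read_fastq(path)]
    if not lengths:
        return {"reads": 0, "bases": 0, "mean_length": 0.0, "median_length": 0.0}
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
    }
```

Calling `length_summary("reads.fastq.gz")` returns a dictionary with the four values. Because the reads are streamed one record at a time, only the list of lengths is held in memory.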

NanoPlot Output Files

| File | Description |
| --- | --- |
| NanoStats.txt | Summary statistics |
| NanoPlot-report.html | Interactive report |
| LengthvsQualityScatterPlot | Length vs Q plot |
| WeightedHistogramReadlength | Read length distribution |
| Yield_By_Length | Cumulative yield |

Key Parameters - NanoPlot

| Parameter | Description |
| --- | --- |
| --fastq | Input FASTQ |
| --bam | Input BAM |
| --summary | Sequencing summary |
| -o | Output directory |
| -t | Threads |
| --N50 | Show N50 line |
| --plots | Plot types |
| --format | Output formats |

Key Parameters - chopper

| Parameter | Default | Description |
| --- | --- | --- |
| -q | 0 | Min quality |
| -l | 0 | Min length |
| --maxlength | inf | Max length |
| --headcrop | 0 | Trim from start |
| --tailcrop | 0 | Trim from end |
| -t | 4 | Threads |

Quality Thresholds

| Q Score | Accuracy | Typical Use |
| --- | --- | --- |
| Q7 | ~80% | Very low quality |
| Q10 | ~90% | Basic filtering |
| Q15 | ~97% | Moderate filtering |
| Q20 | ~99% | High quality (SUP) |
| Q30 | ~99.9% | Very high (HiFi) |
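The accuracy column follows from the Phred definition Q = -10 * log10(P_error), so accuracy is 1 - 10^(-Q/10). A short Python sketch of the conversion in both directions is given below. The function names are hypothetical and the loop simply recomputes the rounded values listed in the table.

```python
import math


def phred_to_accuracy(q):
    """Per-base accuracy implied by a Phred score (1 - error probability)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Phred score implied by a per-base accuracy in the range [0, 1)."""
    return -10 * math.log10(1 - accuracy)


# Recompute the accuracy for each threshold in the table.
for q in (7, 10, 15, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.2%}")
```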

Related Skills

  • long-read-alignment - Align filtered reads

  • sequence-io - FASTQ handling

  • medaka-polishing - Polish with filtered reads
