Long-Read Quality Control

NanoPlot - Visualization

From FASTQ

NanoPlot --fastq reads.fastq.gz -o nanoplot_output -t 4

From BAM

NanoPlot --bam aligned.bam -o nanoplot_output -t 4

From sequencing summary (fastest)

NanoPlot --summary sequencing_summary.txt -o nanoplot_output
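The summary file is a tab-separated table with one row per read, so a quick look at it needs no sequence parsing. The sketch below is a minimal illustration rather than what NanoPlot does internally. It assumes the columns `sequence_length_template` and `mean_qscore_template` are present. Column names can vary with basecaller version, so check the header of your own file. The function name `summarize_sequencing_summary` is made up for this example.

```python
import csv
import io
from statistics import mean, median


def summarize_sequencing_summary(handle):
    """Summarise read length and mean Q-score from a tab-separated summary.

    Assumes the columns `sequence_length_template` and `mean_qscore_template`
    exist; adjust the names if your file's header differs.
    """
    lengths, qscores = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        lengths.append(int(row["sequence_length_template"]))
        qscores.append(float(row["mean_qscore_template"]))
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "median_length": median(lengths),
        "mean_qscore": mean(qscores),
    }


# Tiny in-memory stand-in for sequencing_summary.txt.
example = io.StringIO(
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "r1\t1000\t10.0\n"
    "r2\t3000\t12.0\n"
    "r3\t5000\t14.0\n"
)
summary = summarize_sequencing_summary(example)
print(summary)
```

For a real run, pass an open file handle, for example `open("sequencing_summary.txt", newline="")`, instead of the in-memory example.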

NanoPlot - Common Options

--N50 shows the N50 in plots, --plots selects the plot types, --format selects the output formats, and --maxlength / --minlength set the maximum and minimum read length for plots.

NanoPlot --fastq reads.fastq.gz \
    -o nanoplot_output \
    -t 8 \
    --N50 \
    --title "Sample QC" \
    --plots hex dot \
    --format png pdf \
    --color darkblue \
    --maxlength 50000 \
    --minlength 500

NanoStat - Statistics Only

Quick statistics (no plots)

NanoStat --fastq reads.fastq.gz --threads 4

From BAM

NanoStat --bam aligned.bam --threads 4

Output to file

NanoStat --fastq reads.fastq.gz --threads 4 > qc_stats.txt

chopper - Filter Reads

Filter by length and quality

gunzip -c reads.fastq.gz | chopper -q 10 -l 1000 | gzip > filtered.fastq.gz

Keeps reads with quality >= 10 and length >= 1000 bp.

chopper - Common Options

--quality and --minlength / --maxlength set the minimum quality and the length bounds, --headcrop and --tailcrop remove bases from the start and end of each read.

gunzip -c reads.fastq.gz | chopper \
    --quality 10 \
    --minlength 1000 \
    --maxlength 50000 \
    --headcrop 50 \
    --tailcrop 50 \
    --threads 4 \
    | gzip > filtered.fastq.gz
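To make the effect of these options concrete, here is a minimal pure-Python sketch of the same kind of filter applied to a single read. It is an illustration, not chopper's implementation. The function names are hypothetical. Both the order of the steps (thresholds tested on the untrimmed read, then cropping) and the averaging of quality over error probabilities are assumptions of this sketch.

```python
import math


def mean_quality(qual):
    """Mean Phred score of a Phred+33 quality string, averaged via error probabilities."""
    probs = [10 ** (-(ord(ch) - 33) / 10) for ch in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_and_crop(seq, qual, min_quality=10, min_length=1000,
                    max_length=50000, headcrop=50, tailcrop=50):
    """Return the cropped (seq, qual), or None when the read is rejected.

    Assumption of this sketch: thresholds are tested on the untrimmed read,
    then the ends are cropped. The real tool may order these steps differently.
    """
    if len(seq) <= headcrop + tailcrop:
        return None  # nothing would remain after cropping (also covers empty reads)
    if not (min_length <= len(seq) <= max_length):
        return None
    if mean_quality(qual) < min_quality:
        return None
    end = len(seq) - tailcrop
    return seq[headcrop:end], qual[headcrop:end]


# A 1200-base read with Phred 20 at every position ('5' is Phred 20 in Phred+33).
kept = filter_and_crop("A" * 1200, "5" * 1200)
print(len(kept[0]))  # 1100: 50 bases removed from each end
```

Reads that fail a threshold return `None`, so a caller can skip them while streaming through a FASTQ file.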

NanoFilt - Alternative Filter

Filter with NanoFilt

gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 1000 | gzip > filtered.fastq.gz

With more options

gunzip -c reads.fastq.gz | NanoFilt \
    --quality 10 \
    --length 1000 \
    --maxlength 50000 \
    --headcrop 50 \
    | gzip > filtered.fastq.gz

Porechop - Adapter Trimming

Trim adapters

porechop -i reads.fastq.gz -o trimmed.fastq.gz --threads 8

With barcode splitting

porechop -i reads.fastq.gz -b output_dir/ --threads 8

Generate Summary Statistics

Quick summary with seqkit

seqkit stats reads.fastq.gz

Detailed stats

seqkit stats -a reads.fastq.gz

Watch stats during basecalling

seqkit watch --fields ReadLen,MeanQual reads.fastq.gz

PycoQC - From Basecalling

Generate QC report from sequencing_summary.txt

pycoQC -f sequencing_summary.txt -o pycoqc_report.html

With BAM for alignment stats

pycoQC -f sequencing_summary.txt -a aligned.bam -o pycoqc_report.html

Calculate N50

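With seqkit (print the N50 column from the tabular output)

seqkit stats -a -T reads.fastq.gz | awk -F'\t' 'NR==1 {for (i=1; i<=NF; i++) if ($i=="N50") col=i} NR==2 {print "N50:", $col}'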

Manual calculation

seqkit fx2tab -n -l reads.fastq.gz | awk -F'\t' '{print $NF}' | sort -rn |
awk '{sum+=$1; len[NR]=$1}
     END {
         target=sum/2; cumsum=0
         for (i=1; i<=NR; i++) {
             cumsum+=len[i]
             if (cumsum>=target) {print "N50:", len[i]; break}
         }
     }'
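The same calculation can be sketched in Python, which may be easier to reuse inside a script. This mirrors the logic of the pipeline above: sort the lengths in descending order and accumulate until half of the total bases is reached. The function name `n50` is chosen for this example.

```python
def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

In the example the total is 54 bases, and the three longest reads (10 + 9 + 8 = 27) reach half of that, so the N50 is 8.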

Parse FASTQ Quality in Python

import numpy as np
from Bio import SeqIO

lengths = []
qualities = []

for record in SeqIO.parse('reads.fastq', 'fastq'):
    lengths.append(len(record))
    qualities.append(np.mean(record.letter_annotations['phred_quality']))

print(f'Total reads: {len(lengths)}')
print(f'Total bases: {sum(lengths):,}')
print(f'Mean length: {np.mean(lengths):.0f}')
print(f'Median length: {np.median(lengths):.0f}')
print(f'Mean quality: {np.mean(qualities):.1f}')
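One caveat when comparing these numbers with tool output: the arithmetic mean of Phred scores is usually higher than a mean taken over error probabilities, which, to my knowledge, is how the Nano* tools average read quality. The stdlib-only sketch below shows the difference on a toy read. The function names are hypothetical, and this is an illustration rather than any tool's exact code.

```python
import math


def arithmetic_mean_q(phreds):
    """Plain average of the Phred scores (what np.mean on the scores gives)."""
    return sum(phreds) / len(phreds)


def error_based_mean_q(phreds):
    """Average the per-base error probabilities, then convert back to Phred."""
    probs = [10 ** (-q / 10) for q in phreds]
    return -10 * math.log10(sum(probs) / len(probs))


phreds = [10, 30]  # one poor base and one very good base
print(f"arithmetic: {arithmetic_mean_q(phreds):.2f}")
print(f"error-based: {error_based_mean_q(phreds):.2f}")
```

For this toy read the arithmetic mean is 20, while the error-based mean is about 13, because the single Q10 base dominates the average error rate. A read that passes a Q15 cutoff by the first measure can therefore fail it by the second.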

NanoPlot Output Files

File                          Description
NanoStats.txt                 Summary statistics
NanoPlot-report.html          Interactive report
LengthvsQualityScatterPlot    Length vs Q plot
WeightedHistogramReadlength   Read length distribution
Yield_By_Length               Cumulative yield

Key Parameters - NanoPlot

Parameter   Description
--fastq     Input FASTQ
--bam       Input BAM
--summary   Sequencing summary
-o          Output directory
-t          Threads
--N50       Show N50 line
--plots     Plot types
--format    Output formats

Key Parameters - chopper

Parameter     Default   Description
-q            0         Min quality
-l            0         Min length
--maxlength   inf       Max length
--headcrop    0         Trim from start
--tailcrop    0         Trim from end
-t            4         Threads

Quality Thresholds

Q Score   Accuracy   Typical Use
Q7        ~80%       Very low quality
Q10       ~90%       Basic filtering
Q15       ~97%       Moderate filtering
Q20       ~99%       High quality (SUP)
Q30       ~99.9%     Very high (HiFi)
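The accuracy column follows from the Phred definition, Q = -10 * log10(P_error), so accuracy = 1 - 10^(-Q/10). A short sketch that reproduces the table values (the function names are chosen for this example):

```python
import math


def phred_to_accuracy(q):
    """Per-base accuracy implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Inverse conversion: Phred score for a given per-base accuracy (< 1)."""
    return -10 * math.log10(1 - accuracy)


for q in (7, 10, 15, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.2%}")
```

This prints 80.05%, 90.00%, 96.84%, 99.00% and 99.90%, matching the rounded values in the table.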

Related Skills

long-read-alignment - Align filtered reads
sequence-io - FASTQ handling
medaka-polishing - Polish with filtered reads
