Long-Read Quality Control
NanoPlot - Visualization
From FASTQ
NanoPlot --fastq reads.fastq.gz -o nanoplot_output -t 4
From BAM
NanoPlot --bam aligned.bam -o nanoplot_output -t 4
From sequencing summary (fastest)
NanoPlot --summary sequencing_summary.txt -o nanoplot_output
NanoPlot - Common Options
# --N50: show N50 in plots; --plots: plot types; --format: output formats
# --maxlength / --minlength: length limits for plotted reads
NanoPlot --fastq reads.fastq.gz \
    -o nanoplot_output \
    -t 8 \
    --N50 \
    --title "Sample QC" \
    --plots hex dot \
    --format png pdf \
    --color darkblue \
    --maxlength 50000 \
    --minlength 500
NanoStat - Statistics Only
Quick statistics (no plots)
NanoStat --fastq reads.fastq.gz --threads 4
From BAM
NanoStat --bam aligned.bam --threads 4
Output to file
NanoStat --fastq reads.fastq.gz --threads 4 > qc_stats.txt
chopper - Filter Reads
Filter by length and quality
gunzip -c reads.fastq.gz | chopper -q 10 -l 1000 | gzip > filtered.fastq.gz
Keeps reads with mean quality >= 10 and length >= 1000 bp
chopper - Common Options
# --quality: min mean quality; --minlength / --maxlength: read length limits
# --headcrop / --tailcrop: bases to remove from the start / end of each read
gunzip -c reads.fastq.gz | chopper \
    --quality 10 \
    --minlength 1000 \
    --maxlength 50000 \
    --headcrop 50 \
    --tailcrop 50 \
    --threads 4 \
    | gzip > filtered.fastq.gz
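To make the filtering criteria concrete, here is a minimal Python sketch of the same decision applied to a single read. The function name `filter_and_crop` and its parameters are hypothetical and only mirror the options listed here. The sketch assumes the read is filtered on its original length and a precomputed mean quality, then cropped; a given tool may apply these steps in a different order.

```python
def filter_and_crop(seq, qual, mean_q, min_quality=10.0, min_length=1000,
                    max_length=50000, headcrop=50, tailcrop=50):
    """Return the cropped (seq, qual) pair, or None if the read is rejected.

    Assumption: filtering uses the uncropped length and a mean quality
    computed elsewhere; cropping happens only for reads that pass.
    """
    n = len(seq)
    # Reject on quality or length before any cropping
    if mean_q < min_quality or n < min_length or n > max_length:
        return None
    # Reject reads that would be empty after cropping
    if n <= headcrop + tailcrop:
        return None
    end = n - tailcrop
    return seq[headcrop:end], qual[headcrop:end]


kept = filter_and_crop("A" * 1200, "I" * 1200, mean_q=12.0)
print(len(kept[0]) if kept else "rejected")
```

Passing the mean quality in as an argument keeps the sketch independent of how that mean is computed.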
NanoFilt - Alternative Filter
Filter with NanoFilt
gunzip -c reads.fastq.gz | NanoFilt -q 10 -l 1000 | gzip > filtered.fastq.gz
With more options
gunzip -c reads.fastq.gz | NanoFilt \
    --quality 10 \
    --length 1000 \
    --maxlength 50000 \
    --headcrop 50 \
    | gzip > filtered.fastq.gz
Porechop - Adapter Trimming
Trim adapters
porechop -i reads.fastq.gz -o trimmed.fastq.gz --threads 8
With barcode splitting
porechop -i reads.fastq.gz -b output_dir/ --threads 8
Generate Summary Statistics
Quick summary with seqkit
seqkit stats reads.fastq.gz
Detailed stats
seqkit stats -a reads.fastq.gz
Watch stats during basecalling
seqkit watch --fields ReadLen,MeanQual reads.fastq.gz
PycoQC - From Basecalling
Generate QC report from sequencing_summary.txt
pycoQC -f sequencing_summary.txt -o pycoqc_report.html
With BAM for alignment stats
pycoQC -f sequencing_summary.txt -a aligned.bam -o pycoqc_report.html
Calculate N50
With seqkit
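# -T prints tab-separated output; awk selects the N50 column by its header name
seqkit stats -a -T reads.fastq.gz | awk -F'\t' 'NR==1 {for (i=1; i<=NF; i++) if ($i=="N50") col=i} {print $1, $col}'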
Manual calculation
# The read length is the last column of the fx2tab output
seqkit fx2tab -n -l reads.fastq.gz | awk -F'\t' '{print $NF}' | sort -rn |
awk '{sum+=$1; len[NR]=$1} END {
    target=sum/2; cumsum=0;
    for(i=1; i<=NR; i++) {
        cumsum+=len[i];
        if(cumsum>=target) {print "N50:", len[i]; break}
    }
}'
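The same calculation can be sketched in Python. The helper name `n50` is hypothetical. It follows the logic of the manual calculation: sort lengths in descending order and report the length at which the running total first reaches half of all bases.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First read at which the cumulative sum reaches half the total
        if running >= half:
            return length
    return 0


print("N50:", n50([2, 3, 4, 5, 6]))
```

For a real file, the list of lengths could come from the Biopython loop in the Python section of this document.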
Parse FASTQ Quality in Python
import numpy as np
from Bio import SeqIO

lengths = []
qualities = []

for record in SeqIO.parse('reads.fastq', 'fastq'):
    lengths.append(len(record))
    qualities.append(np.mean(record.letter_annotations['phred_quality']))

print(f'Total reads: {len(lengths)}')
print(f'Total bases: {sum(lengths):,}')
print(f'Mean length: {np.mean(lengths):.0f}')
print(f'Median length: {np.median(lengths):.0f}')
print(f'Mean quality: {np.mean(qualities):.1f}')
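One caveat about averaging Phred scores directly: Phred is a logarithmic scale, so an arithmetic mean of the scores tends to overstate read quality. Long-read QC tools such as NanoPlot are generally understood to average the per-base error probabilities and convert the result back to Phred. The sketch below illustrates that alternative; the function name `mean_error_quality` is hypothetical.

```python
import math


def mean_error_quality(phred_scores):
    """Average per-base error probabilities, then convert back to Phred."""
    probs = [10 ** (-q / 10) for q in phred_scores]
    return -10 * math.log10(sum(probs) / len(probs))


scores = [30, 30, 30, 5]  # three good bases and one poor base
arithmetic = sum(scores) / len(scores)
print(f"Arithmetic mean of Phred scores: {arithmetic:.2f}")
print(f"Error-probability-based mean:   {mean_error_quality(scores):.2f}")
```

A single low-quality base pulls the error-based mean down far more than the arithmetic mean, so the two values can differ noticeably from what a QC tool reports for the same reads.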
NanoPlot Output Files
File                          Description
NanoStats.txt                 Summary statistics
NanoPlot-report.html          Interactive report
LengthvsQualityScatterPlot    Read length vs. quality plot
WeightedHistogramReadlength   Read length distribution
Yield_By_Length               Cumulative yield
Key Parameters - NanoPlot
Parameter   Description
--fastq     Input FASTQ
--bam       Input BAM
--summary   Sequencing summary
-o          Output directory
-t          Threads
--N50       Show N50 line
--plots     Plot types
--format    Output formats
Key Parameters - chopper
Parameter     Default   Description
-q            0         Min quality
-l            1         Min length
--maxlength   inf       Max length
--headcrop    0         Trim from start
--tailcrop    0         Trim from end
-t            4         Threads
Quality Thresholds
Q Score   Accuracy   Typical Use
Q7        ~80%       Very low quality
Q10       ~90%       Basic filtering
Q15       ~97%       Moderate filtering
Q20       ~99%       High quality (SUP)
Q30       ~99.9%     Very high (HiFi)
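The accuracy column follows from the standard Phred definition, Q = -10 * log10(P_error). A minimal sketch of the conversion in both directions, with hypothetical helper names:

```python
import math


def phred_to_accuracy(q):
    """Per-base accuracy implied by a Phred score."""
    return 1 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy):
    """Phred score implied by a per-base accuracy (must be below 1.0)."""
    return -10 * math.log10(1 - accuracy)


for q in (7, 10, 15, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.2%}")
```

Running the loop reproduces the approximate percentages in the table.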
Related Skills
- long-read-alignment - Align filtered reads
- sequence-io - FASTQ handling
- medaka-polishing - Polish with filtered reads