bio-de-edger-basics

Differential expression analysis using edgeR's quasi-likelihood framework for RNA-seq count data.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-de-edger-basics" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-de-edger-basics

edgeR Basics

Differential expression analysis using edgeR's quasi-likelihood framework for RNA-seq count data.

Required Libraries

library(edgeR) library(limma) # For design matrices and voom

Installation

if (!require('BiocManager', quietly = TRUE)) install.packages('BiocManager') BiocManager::install('edgeR')

Creating DGEList Object

From count matrix

counts: matrix with genes as rows, samples as columns

group: factor indicating sample groups

y <- DGEList(counts = counts, group = group)

With gene annotation

y <- DGEList(counts = counts, group = group, genes = gene_info)

Check structure

y

Standard edgeR Workflow (Quasi-Likelihood)

Create DGEList

y <- DGEList(counts = counts, group = group)

Filter low-expression genes

keep <- filterByExpr(y, group = group) y <- y[keep, , keep.lib.sizes = FALSE]

Normalize (TMM by default)

y <- calcNormFactors(y)

Create design matrix

design <- model.matrix(~ group)

Estimate dispersion (optional in edgeR v4+ but improves BCV plots)

y <- estimateDisp(y, design)

Fit quasi-likelihood model

fit <- glmQLFit(y, design)

Perform quasi-likelihood F-test

qlf <- glmQLFTest(fit, coef = 2)

View top genes

topTags(qlf)

Filtering Low-Expression Genes

Automatic filtering (recommended)

keep <- filterByExpr(y, group = group) y <- y[keep, , keep.lib.sizes = FALSE]

Manual filtering: CPM threshold

keep <- rowSums(cpm(y) > 1) >= 2 # At least 2 samples with CPM > 1 y <- y[keep, , keep.lib.sizes = FALSE]

Filter by minimum counts

keep <- rowSums(y$counts >= 10) >= 3 # At least 3 samples with 10+ counts y <- y[keep, , keep.lib.sizes = FALSE]

Normalization Methods

TMM normalization (default, recommended)

y <- calcNormFactors(y, method = 'TMM')

Alternative methods

y <- calcNormFactors(y, method = 'RLE') # Relative Log Expression y <- calcNormFactors(y, method = 'upperquartile') y <- calcNormFactors(y, method = 'none') # No normalization

View normalization factors

y$samples$norm.factors

Design Matrices

Simple two-group comparison

design <- model.matrix(~ group)

With batch correction

design <- model.matrix(~ batch + group)

Interaction model

design <- model.matrix(~ genotype + treatment + genotype:treatment)

No intercept (for direct group comparisons)

design <- model.matrix(~ 0 + group) colnames(design) <- levels(group)

Dispersion Estimation

Estimate all dispersions

y <- estimateDisp(y, design)

Or estimate separately

y <- estimateGLMCommonDisp(y, design) y <- estimateGLMTrendedDisp(y, design) y <- estimateGLMTagwiseDisp(y, design)

View dispersions

y$common.dispersion y$trended.dispersion y$tagwise.dispersion

Plot BCV (biological coefficient of variation)

plotBCV(y)

Quasi-Likelihood Testing

Fit QL model

fit <- glmQLFit(y, design)

Test specific coefficient

qlf <- glmQLFTest(fit, coef = 2)

Test with contrast

contrast <- makeContrasts(groupB - groupA, levels = design) qlf <- glmQLFTest(fit, contrast = contrast)

Test multiple coefficients (ANOVA-like)

qlf <- glmQLFTest(fit, coef = 2:3)

Making Contrasts

Design without intercept

design <- model.matrix(~ 0 + group) colnames(design) <- levels(group) y <- estimateDisp(y, design) fit <- glmQLFit(y, design)

Pairwise comparisons

contrast <- makeContrasts( TreatedVsControl = treated - control, DrugAVsControl = drugA - control, DrugBVsControl = drugB - control, DrugAVsDrugB = drugA - drugB, levels = design )

Test each contrast

qlf_treated <- glmQLFTest(fit, contrast = contrast[, 'TreatedVsControl']) qlf_drugA <- glmQLFTest(fit, contrast = contrast[, 'DrugAVsControl'])

Accessing Results

Top differentially expressed genes

topTags(qlf, n = 20)

All results as data frame

results <- topTags(qlf, n = Inf)$table

Summary of DE genes at different thresholds

summary(decideTests(qlf))

Get DE genes with specific cutoffs

de_genes <- topTags(qlf, n = Inf, p.value = 0.05)$table

Result Columns

Column Description

logFC

Log2 fold change

logCPM

Average log2 counts per million

F

Quasi-likelihood F-statistic

PValue

Raw p-value

FDR

False discovery rate (adjusted p-value)

Alternative: Exact Test (Classic edgeR)

For simple two-group comparison only

y <- DGEList(counts = counts, group = group) y <- calcNormFactors(y) y <- estimateDisp(y)

Exact test

et <- exactTest(y) topTags(et)

Alternative: glmLRT (Likelihood Ratio Test)

Fit GLM

fit <- glmFit(y, design)

Likelihood ratio test

lrt <- glmLRT(fit, coef = 2) topTags(lrt)

Treat Test (Log Fold Change Threshold)

Test for |logFC| > threshold

tr <- glmTreat(fit, coef = 2, lfc = log2(1.5)) # |FC| > 1.5 topTags(tr)

Multi-Factor Designs

Design with batch correction

design <- model.matrix(~ batch + condition, data = sample_info) y <- estimateDisp(y, design) fit <- glmQLFit(y, design)

Test condition effect (controlling for batch)

Condition coefficient is typically the last

qlf <- glmQLFTest(fit, coef = ncol(design))

Getting Normalized Counts

Counts per million (CPM)

cpm_values <- cpm(y)

Log2 CPM

log_cpm <- cpm(y, log = TRUE)

RPM (reads per million, same as CPM)

rpm_values <- cpm(y)

With prior count for log transformation

log_cpm <- cpm(y, log = TRUE, prior.count = 2)

Exporting Results

Get all results

all_results <- topTags(qlf, n = Inf)$table

Add gene IDs as column

all_results$gene_id <- rownames(all_results)

Write to file

write.csv(all_results, file = 'edger_results.csv', row.names = FALSE)

Export normalized counts

write.csv(cpm(y), file = 'cpm_values.csv')

Common Errors

Error Cause Solution

"design matrix not full rank" Confounded variables Check sample metadata

"No residual df" Too few samples Need more replicates

"NA/NaN/Inf" Zero counts in all samples Filter more stringently

Deprecated/Changed Functions

Old Status New

decidetestsDGE()

Removed (v4.4) decideTests()

glmFit()

  • glmLRT()

Still works Prefer glmQLFit()

  • glmQLFTest()

estimateDisp()

Optional (v4+) glmQLFit() estimates internally

mglmLS() , mglmSimple()

Retired mglmLevenberg() or mglmOneWay()

Note: calcNormFactors() and normLibSizes() are synonyms - both work.

Quick Reference: Workflow Steps

1. Create DGEList

y <- DGEList(counts = counts, group = group)

2. Filter low-expression genes

keep <- filterByExpr(y, group = group) y <- y[keep, , keep.lib.sizes = FALSE]

3. Normalize

y <- calcNormFactors(y)

4. Create design matrix

design <- model.matrix(~ group)

5. Estimate dispersion (optional in v4+)

y <- estimateDisp(y, design)

6. Fit quasi-likelihood model

fit <- glmQLFit(y, design)

7. Test for DE

qlf <- glmQLFTest(fit, coef = 2)

8. Get results

topTags(qlf, n = 20)

Choosing edgeR vs DESeq2

Aspect edgeR DESeq2

Model Negative binomial + QL Negative binomial

Shrinkage Empirical Bayes on dispersions Shrinkage estimators for LFC

Small samples Robust with QL framework Good with shrinkage

Speed Generally faster Slower for large datasets

Output F-statistic, FDR Wald statistic, padj

Related Skills

  • deseq2-basics - Alternative DE analysis with DESeq2

  • de-visualization - MA plots, volcano plots, heatmaps

  • de-results - Extract and export significant genes

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

bio-microbiome-diversity-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Research

bio-metabolomics-statistical-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Research

bio-proteomics-dia-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Research

bio-spatial-transcriptomics-image-analysis

No summary provided by upstream source.

Repository SourceNeeds Review