GWAS Catalog Database
Overview
The GWAS Catalog is a comprehensive repository of published genome-wide association studies maintained by the National Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EBI). The catalog contains curated SNP-trait associations from thousands of GWAS publications, including genetic variants, associated traits and diseases, p-values, effect sizes, and full summary statistics for many studies.
When to Use This Skill
This skill should be used when queries involve:
- Genetic variant associations: Finding SNPs associated with diseases or traits
- SNP lookups: Retrieving information about specific genetic variants (rs IDs)
- Trait/disease searches: Discovering genetic associations for phenotypes
- Gene associations: Finding variants in or near specific genes
- GWAS summary statistics: Accessing complete genome-wide association data
- Study metadata: Retrieving publication and cohort information
- Population genetics: Exploring ancestry-specific associations
- Polygenic risk scores: Identifying variants for risk prediction models
- Functional genomics: Understanding variant effects and genomic context
- Systematic reviews: Comprehensive literature synthesis of genetic associations
Core Capabilities
- Understanding GWAS Catalog Data Structure
The GWAS Catalog is organized around four core entities:
- Studies: GWAS publications with metadata (PMID, author, cohort details)
- Associations: SNP-trait associations with statistical evidence (the Catalog curates associations with p < 1×10⁻⁵; genome-wide significance is conventionally p ≤ 5×10⁻⁸)
- Variants: Genetic markers (SNPs) with genomic coordinates and alleles
- Traits: Phenotypes and diseases (mapped to EFO ontology terms)
Key Identifiers:
- Study accessions: GCST IDs (e.g., GCST001234)
- Variant IDs: rs numbers (e.g., rs7903146) or variant_id format
- Trait IDs: EFO terms (e.g., EFO_0001360 for type 2 diabetes)
- Gene symbols: HGNC approved names (e.g., TCF7L2)
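The identifier formats listed above are regular enough to recognise with simple patterns. The sketch below is illustrative only: `classify_identifier` is a hypothetical helper, not part of any GWAS Catalog client, and the gene-symbol pattern is a loose heuristic rather than an implementation of the HGNC naming rules.

```python
import re

# Patterns follow the example identifiers listed above. The gene pattern is a
# rough heuristic (an assumption), not a check against HGNC-approved symbols.
_PATTERNS = [
    ("study", re.compile(r"^GCST\d+$")),
    ("variant", re.compile(r"^rs\d+$")),
    ("trait", re.compile(r"^EFO_\d+$")),
    ("gene", re.compile(r"^[A-Z][A-Z0-9-]*$")),
]


def classify_identifier(value: str) -> str:
    """Return 'study', 'variant', 'trait', 'gene', or 'unknown'."""
    text = value.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(text):
            return kind
    return "unknown"
```

Free-text trait names such as "type 2 diabetes" fall through to "unknown", which is a reasonable cue to resolve them to an EFO term first.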
- Web Interface Searches
The web interface at https://www.ebi.ac.uk/gwas/ supports multiple search modes:
By Variant (rs ID):
    rs7903146
Returns all trait associations for this SNP.
By Disease/Trait:
    type 2 diabetes
    Parkinson disease
    body mass index
Returns all associated genetic variants.
By Gene:
    APOE
    TCF7L2
Returns variants in or near the gene region.
By Chromosomal Region:
    10:114000000-115000000
Returns variants in the specified genomic interval.
By Publication:
    PMID:20581827
    Author: McCarthy MI
    GCST001234
Returns study details and all reported associations.
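When the same region string is reused in scripts, it helps to split it into chromosome, start, and end first. The following is a minimal sketch; `parse_region` and `Region` are hypothetical names, and accepting an optional `chr` prefix and thousands separators is a convenience assumed here, not something the web interface is documented to do.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


# Accepts "10:114000000-115000000", optionally with a "chr" prefix or commas.
_REGION_RE = re.compile(
    r"^(?:chr)?(\d{1,2}|X|Y|MT):([\d,]+)-([\d,]+)$", re.IGNORECASE
)


def parse_region(text: str) -> Region:
    """Parse a 'chrom:start-end' string into a Region tuple."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is greater than end in {text!r}")
    return Region(match.group(1).upper(), start, end)
```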
- REST API Access
The GWAS Catalog provides two REST APIs for programmatic access:
Base URLs:
- GWAS Catalog API: https://www.ebi.ac.uk/gwas/rest/api
- Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/api
API Documentation:
- Main API docs: https://www.ebi.ac.uk/gwas/rest/docs/api
- Summary stats docs: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
Core Endpoints:
Studies endpoint - /studies/{accessionID}
    import requests

    # Get a specific study
    # (Accept asks for a JSON response; Content-Type would describe a request body)
    url = "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001795"
    response = requests.get(url, headers={"Accept": "application/json"})
    study = response.json()
Associations endpoint - /associations
    # Find associations for a variant
    variant = "rs7903146"
    url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{variant}/associations"
    params = {"projection": "associationBySnp"}
    response = requests.get(url, params=params, headers={"Accept": "application/json"})
    associations = response.json()
Variants endpoint - /singleNucleotidePolymorphisms/{rsID}
    # Get variant details
    url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/rs7903146"
    response = requests.get(url, headers={"Accept": "application/json"})
    variant_info = response.json()
Traits endpoint - /efoTraits/{efoID}
    # Get trait information
    url = "https://www.ebi.ac.uk/gwas/rest/api/efoTraits/EFO_0001360"
    response = requests.get(url, headers={"Accept": "application/json"})
    trait_info = response.json()
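The four endpoint calls share the same base URL, header, and JSON decoding, so they can be wrapped in one small helper. The sketch below uses only the standard library so that it runs without extra dependencies; `build_url` and `get_json` are hypothetical names, and the injectable `opener` argument exists only to make the helper testable offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.ebi.ac.uk/gwas/rest/api"


def build_url(path, params=None):
    """Join an endpoint path (and optional query parameters) onto BASE_URL."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if params:
        url = f"{url}?{urlencode(params)}"
    return url


def get_json(path, params=None, opener=urlopen, timeout=30):
    """GET a resource and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so error
    pages are never parsed as if they were data.
    """
    request = Request(build_url(path, params), headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `get_json("studies/GCST001795")` would then replace the repeated `requests.get(...)` and `.json()` pair.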
- Query Examples and Patterns
Example 1: Find all associations for a disease
    import requests

    trait = "EFO_0001360"  # Type 2 diabetes
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"

    # Query associations for this trait
    url = f"{base_url}/efoTraits/{trait}/associations"
    response = requests.get(url, headers={"Accept": "application/json"})
    associations = response.json()

    # Process results
    for assoc in associations.get('_embedded', {}).get('associations', []):
        variant = assoc.get('rsId')
        pvalue = assoc.get('pvalue')
        risk_allele = assoc.get('strongestAllele')
        print(f"{variant}: p={pvalue}, risk allele={risk_allele}")
Example 2: Get variant information and all trait associations
    import requests

    variant = "rs7903146"
    base_url = "https://www.ebi.ac.uk/gwas/rest/api"

    # Get variant details
    url = f"{base_url}/singleNucleotidePolymorphisms/{variant}"
    response = requests.get(url, headers={"Accept": "application/json"})
    variant_data = response.json()

    # Get all associations for this variant
    url = f"{base_url}/singleNucleotidePolymorphisms/{variant}/associations"
    params = {"projection": "associationBySnp"}
    response = requests.get(url, params=params, headers={"Accept": "application/json"})
    associations = response.json()

    # Extract trait names and p-values
    for assoc in associations.get('_embedded', {}).get('associations', []):
        trait = assoc.get('efoTrait')
        pvalue = assoc.get('pvalue')
        print(f"Trait: {trait}, p-value: {pvalue}")
Example 3: Access summary statistics
    import requests

    # Query summary statistics API
    base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"

    # Find associations by trait with p-value threshold
    trait = "EFO_0001360"  # Type 2 diabetes
    p_upper = "0.000000001"  # p < 1e-9
    url = f"{base_url}/traits/{trait}/associations"
    params = {
        "p_upper": p_upper,
        "size": 100  # Number of results
    }
    response = requests.get(url, params=params)
    results = response.json()

    # Process genome-wide significant hits.
    # The associations may arrive as a list or as an index-keyed mapping,
    # so normalise to a sequence of records before iterating.
    hits = results.get('_embedded', {}).get('associations', [])
    if isinstance(hits, dict):
        hits = hits.values()
    for hit in hits:
        variant_id = hit.get('variant_id')
        chromosome = hit.get('chromosome')
        position = hit.get('base_pair_location')
        pvalue = hit.get('p_value')
        print(f"{chromosome}:{position} ({variant_id}): p={pvalue}")
Example 4: Query by chromosomal region
    import requests

    # Find variants in a specific genomic region
    chromosome = "10"
    start_pos = 114000000
    end_pos = 115000000

    base_url = "https://www.ebi.ac.uk/gwas/rest/api"
    url = f"{base_url}/singleNucleotidePolymorphisms/search/findByChromBpLocationRange"
    params = {
        "chrom": chromosome,
        "bpStart": start_pos,
        "bpEnd": end_pos
    }
    response = requests.get(url, params=params, headers={"Accept": "application/json"})
    variants_in_region = response.json()
- Working with Summary Statistics
The GWAS Catalog hosts full summary statistics for many studies, providing access to all tested variants (not just genome-wide significant hits).
Access Methods:
- FTP download: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
- REST API: Query-based access to summary statistics
- Web interface: Browse and download via the website
Summary Statistics API Features:
- Filter by chromosome, position, p-value
- Query specific variants across studies
- Retrieve effect sizes and allele frequencies
- Access harmonized and standardized data
Example: Download summary statistics for a study
    import requests

    # Get available summary statistics
    base_url = "https://www.ebi.ac.uk/gwas/summary-statistics/api"
    url = f"{base_url}/studies/GCST001234"
    response = requests.get(url)
    study_info = response.json()

    # Download link is provided in the response
    # Alternatively, use FTP:
    # ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/
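Once a summary statistics file has been downloaded, it can be streamed row by row rather than loaded whole, which matters for genome-wide files. The sketch below assumes a gzipped, tab-separated file with a header row and a `p_value` column (the column name is borrowed from the API fields used earlier and may differ in a given file); `iter_significant_rows` is a hypothetical helper.

```python
import csv
import gzip
import io


def iter_significant_rows(source, p_threshold=5e-8, p_column="p_value"):
    """Yield rows of a gzipped TSV whose p-value is at or below the threshold.

    `source` may be a file path or a binary file object.
    """
    with gzip.open(source, mode="rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                p_value = float(row.get(p_column))
            except (TypeError, ValueError):
                continue  # skip missing or non-numeric entries such as "NA"
            if p_value <= p_threshold:
                yield row


# Tiny in-memory example with made-up rows, standing in for a downloaded file.
_sample = "variant_id\tp_value\nrsA\t1e-9\nrsB\t0.5\nrsC\tNA\n"
_buffer = io.BytesIO(gzip.compress(_sample.encode("utf-8")))
significant = [row["variant_id"] for row in iter_significant_rows(_buffer)]
```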
- Data Integration and Cross-referencing
The GWAS Catalog provides links to external resources:
Genomic Databases:
- Ensembl: Gene annotations and variant consequences
- dbSNP: Variant identifiers and population frequencies
- gnomAD: Population allele frequencies
Functional Resources:
- Open Targets: Target-disease associations
- PGS Catalog: Polygenic risk scores
- UCSC Genome Browser: Genomic context
Phenotype Resources:
- EFO (Experimental Factor Ontology): Standardized trait terms
- OMIM: Disease gene relationships
- Disease Ontology: Disease hierarchies
Following Links in API Responses:
    import requests

    # API responses include _links for related resources
    response = requests.get("https://www.ebi.ac.uk/gwas/rest/api/studies/GCST001234")
    study = response.json()

    # Follow link to associations
    associations_url = study['_links']['associations']['href']
    associations_response = requests.get(associations_url)
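Link extraction can be made a little more defensive. HAL-style APIs sometimes return templated hrefs ending in something like `{?projection}`; whether a particular GWAS Catalog link is templated is an assumption here, so the hypothetical `link_href` helper below simply strips any template section and reports a clear error when the relation is missing.

```python
import re

# Matches URI-template sections such as "{?projection}" (assumed format).
_TEMPLATE_RE = re.compile(r"\{[^}]*\}")


def link_href(resource: dict, rel: str) -> str:
    """Return the plain URL for a named relation in a resource's _links."""
    try:
        href = resource["_links"][rel]["href"]
    except KeyError as exc:
        raise KeyError(f"resource has no '{rel}' link") from exc
    return _TEMPLATE_RE.sub("", href)
```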
Query Workflows
Workflow 1: Exploring Genetic Associations for a Disease
Identify the trait using EFO terms or free text:
- Search web interface for disease name
- Note the EFO ID (e.g., EFO_0001360 for type 2 diabetes)
Query associations via API:
    url = f"https://www.ebi.ac.uk/gwas/rest/api/efoTraits/{efo_id}/associations"
Filter by significance and population:
- Check p-values (genome-wide significant: p ≤ 5×10⁻⁸)
- Review ancestry information in study metadata
- Filter by sample size or discovery/replication status
Extract variant details:
- rs IDs for each association
- Effect alleles and directions
- Effect sizes (odds ratios, beta coefficients)
- Population allele frequencies
Cross-reference with other databases:
- Look up variant consequences in Ensembl
- Check population frequencies in gnomAD
- Explore gene function and pathways
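The significance-filtering step of this workflow can be sketched as a small pure function. It assumes association records are dictionaries carrying the `rsId` and `pvalue` fields used in the earlier examples; `top_hits` is a hypothetical helper that keeps the most significant record per variant.

```python
def top_hits(associations, p_threshold=5e-8):
    """Keep the most significant record per rsId at or below the threshold."""
    best = {}
    for assoc in associations:
        rs_id = assoc.get("rsId")
        try:
            p_value = float(assoc.get("pvalue"))
        except (TypeError, ValueError):
            continue  # record has no usable p-value
        # "not (a <= b)" also rejects NaN, which compares false to everything
        if rs_id is None or not (p_value <= p_threshold):
            continue
        if rs_id not in best or p_value < float(best[rs_id]["pvalue"]):
            best[rs_id] = assoc
    # Most significant first
    return sorted(best.values(), key=lambda a: float(a["pvalue"]))
```

Keeping one record per variant is a simplification; for a systematic review it may be preferable to retain every study-level record instead.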
Workflow 2: Investigating a Specific Genetic Variant
Query the variant:
    url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}"
Retrieve all trait associations:
    url = f"https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/{rs_id}/associations"
Analyze pleiotropy:
- Identify all traits associated with this variant
- Review effect directions across traits
- Look for shared biological pathways
Check genomic context:
- Determine nearby genes
- Identify if variant is in coding/regulatory regions
- Review linkage disequilibrium with other variants
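For the pleiotropy step, grouping traits by variant is usually enough to see which variants are associated with more than one phenotype. The sketch below assumes records with the `rsId` and `efoTrait` fields used earlier in this document; both function names are hypothetical.

```python
from collections import defaultdict


def traits_by_variant(associations):
    """Map each rsId to the set of trait names it is associated with."""
    grouped = defaultdict(set)
    for assoc in associations:
        rs_id, trait = assoc.get("rsId"), assoc.get("efoTrait")
        if rs_id and trait:
            grouped[rs_id].add(trait)
    return dict(grouped)


def pleiotropic_variants(associations, min_traits=2):
    """Return variants linked to at least `min_traits` distinct traits."""
    return {
        rs_id: sorted(traits)
        for rs_id, traits in traits_by_variant(associations).items()
        if len(traits) >= min_traits
    }
```

Distinct trait labels can still describe closely related phenotypes, so the counts are a starting point for review rather than evidence of pleiotropy on their own.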
Workflow 3: Gene-Centric Association Analysis
Search by gene symbol in web interface or:
    url = "https://www.ebi.ac.uk/gwas/rest/api/singleNucleotidePolymorphisms/search/findByGene"
    params = {"geneName": gene_symbol}
Retrieve variants in gene region:
- Get chromosomal coordinates for gene
- Query variants in region
- Include promoter and regulatory regions (extend boundaries)
Analyze association patterns:
- Identify traits associated with variants in this gene
- Look for consistent associations across studies
- Review effect sizes and directions
Functional interpretation:
- Determine variant consequences (missense, regulatory, etc.)
- Check expression QTL (eQTL) data
- Review pathway and network context
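Extending the gene boundaries to cover promoter and regulatory sequence amounts to padding the coordinates before the region query. The sketch below builds parameters in the `chrom`/`bpStart`/`bpEnd` form used by the region example earlier; `region_params` is a hypothetical helper, and the 50 kb default flank is an arbitrary illustrative choice, not a recommendation from the GWAS Catalog.

```python
def region_params(chromosome, gene_start, gene_end, flank=50_000):
    """Build region-query parameters for a gene padded by `flank` bases."""
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return {
        "chrom": str(chromosome),
        "bpStart": max(1, gene_start - flank),  # coordinates are 1-based
        "bpEnd": gene_end + flank,
    }
```

Gene coordinates must come from the same genome build as the Catalog's variant positions; mixing builds silently shifts the window.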
Workflow 4: Systematic Review of Genetic Evidence
Define research question:
- Specific trait or disease of interest
- Population considerations
- Study design requirements
Comprehensive variant extraction:
- Query all associations for trait
- Set significance threshold
- Note discovery and replication studies
Quality assessment:
- Review study sample sizes
- Check for population diversity
- Assess heterogeneity across studies
- Identify potential biases
Data synthesis:
- Aggregate associations across studies
- Perform meta-analysis if applicable
- Create summary tables
- Generate Manhattan or forest plots
Export and documentation:
- Download full association data
- Export summary statistics if needed
- Document search strategy and date
- Create reproducible analysis scripts
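Where a meta-analysis is applicable, one common choice is the fixed-effect inverse-variance method, sketched below. This is a generic statistical technique, not something the GWAS Catalog performs for you; it assumes the effect estimates are on the same scale (for example log odds ratios), refer to the same effect allele, and come from non-overlapping samples. `fixed_effect_meta` is a hypothetical name.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Pool effect estimates with inverse-variance weights.

    Returns (pooled_beta, pooled_standard_error).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need equally sized, non-empty inputs")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in standard_errors]  # w_i = 1 / se_i^2
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se
```

Substantial heterogeneity across studies would argue for a random-effects model instead, which this sketch does not cover.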
Workflow 5: Accessing and Analyzing Summary Statistics
Identify studies with summary statistics:
- Browse summary statistics portal
- Check FTP directory listings
- Query API for available studies
Download summary statistics:
    # Via FTP
    wget ftp://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/GCSTXXXXXX/harmonised/GCSTXXXXXX-harmonised.tsv.gz
Query via API for specific variants:
    url = f"https://www.ebi.ac.uk/gwas/summary-statistics/api/chromosomes/{chrom}/associations"
    # Position bounds use bp_lower/bp_upper; in this API "start" is the result offset
    params = {"bp_lower": start_pos, "bp_upper": end_pos}
Process and analyze:
- Filter by p-value thresholds
- Extract effect sizes and confidence intervals
- Perform downstream analyses (fine-mapping, colocalization, etc.)
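When a file reports an effect size and standard error but no interval, a normal-approximation confidence interval can be derived directly. The sketch below assumes `beta` is on the additive scale and, for the odds-ratio variant, that it is a log odds ratio; both function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def beta_ci(beta, se, z=Z_95):
    """Return (lower, upper) bounds of beta +/- z * se."""
    return beta - z * se, beta + z * se


def odds_ratio_ci(log_or, se, z=Z_95):
    """Return (odds_ratio, lower, upper) from a log odds ratio and its SE."""
    lower, upper = beta_ci(log_or, se, z)
    return math.exp(log_or), math.exp(lower), math.exp(upper)
```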
Response Formats and Data Fields
Key Fields in Association Records:
- rsId: Variant identifier (rs number)
- strongestAllele: Risk allele for the association
- pvalue: Association p-value
- pvalueText: P-value as text (may include inequality)
- orPerCopyNum: Odds ratio or beta coefficient
- betaNum: Effect size (for quantitative traits)
- betaUnit: Unit of measurement for beta
- range: Confidence interval
- efoTrait: Associated trait name
- mappedLabel: EFO-mapped trait term
Study Metadata Fields:
-
accessionId : GCST study identifier
-
pubmedId : PubMed ID
-
author : First author
-
publicationDate : Publication date
-
ancestryInitial : Discovery population ancestry
-
ancestryReplication : Replication population ancestry
-
sampleSize : Total sample size
Pagination: Results are paginated (default 20 items per page). Navigate using:
- size parameter: Number of results per page
- page parameter: Page number (0-indexed)
- _links in response: URLs for next/previous pages
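Pagination handling can be separated from the HTTP call so that it is easy to test. In the sketch below, `iter_pages` is a hypothetical generator that takes any `fetch_page(page, size)` callable returning decoded JSON. Reading a `page.totalPages` value from the response is an assumption about the response layout; when that field is absent, the loop falls back to stopping at the first empty page.

```python
def iter_pages(fetch_page, size=100, max_pages=1000):
    """Yield the list of associations from each page, in order.

    `fetch_page(page, size)` must return the decoded JSON for one page.
    `max_pages` is a safety cap against runaway loops.
    """
    page = 0
    while page < max_pages:
        data = fetch_page(page, size)
        items = data.get("_embedded", {}).get("associations", [])
        if not items:
            break
        yield items
        page += 1
        total_pages = data.get("page", {}).get("totalPages")  # assumed field
        if total_pages is not None and page >= total_pages:
            break
```

A real `fetch_page` would wrap the request shown in the examples above, passing `page` and `size` as query parameters.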
Best Practices
Query Strategy
- Start with web interface to identify relevant EFO terms and study accessions
- Use API for bulk data extraction and automated analyses
- Implement pagination handling for large result sets
- Cache API responses to minimize redundant requests
Data Interpretation
- Always check p-value thresholds (genome-wide: 5×10⁻⁸)
- Review ancestry information for population applicability
- Consider sample size when assessing evidence strength
- Check for replication across independent studies
- Be aware of winner's curse in effect size estimates
Rate Limiting and Ethics
- Respect API usage guidelines (no excessive requests)
- Use summary statistics downloads for genome-wide analyses
- Implement appropriate delays between API calls
- Cache results locally when performing iterative analyses
- Cite the GWAS Catalog in publications
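Caching and request spacing can be combined in one small wrapper. The sketch below is an in-memory illustration only: `ThrottledCache` is a hypothetical class, the 0.1 second default interval mirrors the delay used elsewhere in this document rather than any published rate limit, and the injectable `clock` and `sleep` arguments exist so the behaviour can be tested without waiting.

```python
import time


class ThrottledCache:
    """Cache results by URL and space out calls that miss the cache."""

    def __init__(self, fetch, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch              # callable: url -> decoded response
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_call = None

    def get(self, url):
        if url in self._cache:
            return self._cache[url]      # cached: no request, no delay
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._cache[url] = self._fetch(url)
        self._last_call = self._clock()
        return self._cache[url]
```

For long-running or repeated analyses, a persistent cache on disk would serve better than this per-process dictionary.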
Data Quality Considerations
- GWAS Catalog curates published associations (may contain inconsistencies)
- Effect sizes reported as published (may need harmonization)
- Some studies report conditional or joint associations
- Check for study overlap when combining results
- Be aware of ascertainment and selection biases
Python Integration Example
Complete workflow for querying and analyzing GWAS data:
    import requests
    import pandas as pd
    from time import sleep

    def query_gwas_catalog(trait_id, p_threshold=5e-8):
        """
        Query GWAS Catalog for trait associations

        Args:
            trait_id: EFO trait identifier (e.g., 'EFO_0001360')
            p_threshold: P-value threshold for filtering

        Returns:
            pandas DataFrame with association results
        """
        base_url = "https://www.ebi.ac.uk/gwas/rest/api"
        url = f"{base_url}/efoTraits/{trait_id}/associations"
        # Accept asks for a JSON response; a GET request has no body to describe
        headers = {"Accept": "application/json"}
        results = []
        page = 0

        while True:
            params = {"page": page, "size": 100}
            response = requests.get(url, params=params, headers=headers)
            if response.status_code != 200:
                break

            data = response.json()
            associations = data.get('_embedded', {}).get('associations', [])
            if not associations:
                break

            for assoc in associations:
                pvalue = assoc.get('pvalue')
                if pvalue and float(pvalue) <= p_threshold:
                    results.append({
                        'variant': assoc.get('rsId'),
                        'pvalue': pvalue,
                        'risk_allele': assoc.get('strongestAllele'),
                        'or_beta': assoc.get('orPerCopyNum') or assoc.get('betaNum'),
                        'trait': assoc.get('efoTrait'),
                        'pubmed_id': assoc.get('pubmedId')
                    })

            page += 1
            sleep(0.1)  # Rate limiting

        return pd.DataFrame(results)

    # Example usage
    df = query_gwas_catalog('EFO_0001360')  # Type 2 diabetes
    print(df.head())
    print(f"\nTotal associations: {len(df)}")
    print(f"Unique variants: {df['variant'].nunique()}")
Resources
references/api_reference.md
Comprehensive API documentation including:
- Detailed endpoint specifications for both APIs
- Complete list of query parameters and filters
- Response format specifications and field descriptions
- Advanced query examples and patterns
- Error handling and troubleshooting
- Integration with external databases
Consult this reference when:
- Constructing complex API queries
- Understanding response structures
- Implementing pagination or batch operations
- Troubleshooting API errors
- Exploring advanced filtering options
Training Materials
The GWAS Catalog team provides workshop materials:
- GitHub repository: https://github.com/EBISPOT/GWAS_Catalog-workshop
- Jupyter notebooks with example queries
- Google Colab integration for cloud execution
Important Notes
Data Updates
- The GWAS Catalog is updated regularly with new publications
- Re-run queries periodically for comprehensive coverage
- Summary statistics are added as studies release data
- EFO mappings may be updated over time
Citation Requirements
When using GWAS Catalog data, cite:
- Sollis E, et al. (2023) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Research. PMID: 37953337
- Include access date and version when available
- Cite original studies when discussing specific findings
Limitations
- Not all GWAS publications are included (curation criteria apply)
- Full summary statistics available for subset of studies
- Effect sizes may require harmonization across studies
- Population diversity is growing but historically limited
- Some associations represent conditional or joint effects
Data Access
- Web interface: Free, no registration required
- REST APIs: Free, no API key needed
- FTP downloads: Open access
- Rate limiting applies to API (be respectful)
Additional Resources
- GWAS Catalog website: https://www.ebi.ac.uk/gwas/
- Documentation: https://www.ebi.ac.uk/gwas/docs
- API documentation: https://www.ebi.ac.uk/gwas/rest/docs/api
- Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/docs/
- FTP site: http://ftp.ebi.ac.uk/pub/databases/gwas/
- Training materials: https://github.com/EBISPOT/GWAS_Catalog-workshop
- PGS Catalog (polygenic scores): https://www.pgscatalog.org/
- Help and support: gwas-info@ebi.ac.uk