Entrez Link
Navigate between NCBI databases using Biopython's Entrez module (ELink utility).
Required Setup
from Bio import Entrez
Entrez.email = 'your.email@example.com'  # Required by NCBI
Entrez.api_key = 'your_api_key'  # Optional, raises rate limit
Core Function
Entrez.elink() - Cross-Database Links
Find related records in the same or different databases.
# Find proteins linked to a gene
handle = Entrez.elink(dbfrom='gene', db='protein', id='672')
record = Entrez.read(handle)
handle.close()

# Extract linked IDs
linkset = record[0]
if linkset['LinkSetDb']:
    links = linkset['LinkSetDb'][0]['Link']
    protein_ids = [link['Id'] for link in links]
    print(f"Found {len(protein_ids)} linked proteins")
Key Parameters:
| Parameter | Description | Example |
|-----------|-------------|---------|
| dbfrom | Source database | 'gene' |
| db | Target database | 'protein' |
| id | Source record ID(s) | '672' or '672,675' |
| linkname | Specific link type | 'gene_protein_refseq' |
| cmd | Link command | 'neighbor', 'neighbor_score' |
ELink Result Structure
record[0]                                    # First linkset
record[0]['DbFrom']                          # Source database
record[0]['IdList']                          # Input IDs
record[0]['LinkSetDb']                       # List of link results
record[0]['LinkSetDb'][0]['DbTo']            # Target database
record[0]['LinkSetDb'][0]['LinkName']        # Link name
record[0]['LinkSetDb'][0]['Link']            # List of linked records
record[0]['LinkSetDb'][0]['Link'][0]['Id']   # Linked ID
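As a rough illustration of walking this structure, the sketch below flattens a parsed result into one row per link name. The helper name summarize_linksets and the sample record are hypothetical; the sample is a hand-written stand-in for the output of Entrez.read(handle), and its linked IDs are placeholders rather than real NCBI records.

```python
def summarize_linksets(record):
    """Flatten a parsed ELink result into (source_ids, db_to, link_name, linked_ids) rows."""
    rows = []
    for linkset in record:
        source_ids = list(linkset.get('IdList', []))
        for linksetdb in linkset.get('LinkSetDb', []):
            linked = [link['Id'] for link in linksetdb.get('Link', [])]
            rows.append((source_ids, linksetdb['DbTo'], linksetdb['LinkName'], linked))
    return rows


# Hand-written stand-in for Entrez.read(handle); linked IDs are placeholders
sample_record = [
    {
        'DbFrom': 'gene',
        'IdList': ['672'],
        'LinkSetDb': [
            {'DbTo': 'protein', 'LinkName': 'gene_protein',
             'Link': [{'Id': '111'}, {'Id': '222'}]},
            {'DbTo': 'protein', 'LinkName': 'gene_protein_refseq',
             'Link': [{'Id': '111'}]},
        ],
    }
]

for source_ids, db_to, link_name, linked in summarize_linksets(sample_record):
    print(f"{source_ids} -> {db_to} via {link_name}: {linked}")
```

A single linkset can carry several LinkSetDb entries (one per link name), which is why indexing LinkSetDb[0] alone may not return the link type you expect unless linkname is set on the request.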
Common Link Paths
Gene to Other Databases
| From | To | Link Name | Description |
|------|----|-----------|-------------|
| gene | protein | gene_protein | All proteins |
| gene | protein | gene_protein_refseq | RefSeq proteins only |
| gene | nucleotide | gene_nuccore | Nucleotide sequences |
| gene | nucleotide | gene_nuccore_refseqrna | RefSeq mRNA |
| gene | pubmed | gene_pubmed | Related publications |
| gene | homologene | gene_homologene | Homologs |
| gene | snp | gene_snp | SNPs in gene |
| gene | clinvar | gene_clinvar | Clinical variants |
Nucleotide to Other Databases
| From | To | Link Name | Description |
|------|----|-----------|-------------|
| nucleotide | protein | nuccore_protein | Encoded proteins |
| nucleotide | gene | nuccore_gene | Gene records |
| nucleotide | pubmed | nuccore_pubmed | Publications |
| nucleotide | taxonomy | nuccore_taxonomy | Organism taxonomy |
| nucleotide | biosample | nuccore_biosample | Sample info |
| nucleotide | sra | nuccore_sra | Related SRA data |
Protein to Other Databases
| From | To | Link Name | Description |
|------|----|-----------|-------------|
| protein | nucleotide | protein_nuccore | Coding sequences |
| protein | gene | protein_gene | Gene records |
| protein | pubmed | protein_pubmed | Publications |
| protein | structure | protein_structure | 3D structures |
| protein | cdd | protein_cdd | Conserved domains |
PubMed Links
| From | To | Link Name | Description |
|------|----|-----------|-------------|
| pubmed | pubmed | pubmed_pubmed | Related articles |
| pubmed | gene | pubmed_gene | Mentioned genes |
| pubmed | protein | pubmed_protein | Mentioned proteins |
| pubmed | nucleotide | pubmed_nuccore | Mentioned sequences |
Code Patterns
Gene to Protein
from Bio import Entrez
Entrez.email = 'your.email@example.com'

def get_proteins_for_gene(gene_id):
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id,
                          linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

protein_ids = get_proteins_for_gene('672')  # BRCA1
print(f"RefSeq proteins: {protein_ids[:5]}")
Nucleotide to Gene
def get_gene_for_nucleotide(nuc_id):
    handle = Entrez.elink(dbfrom='nucleotide', db='gene', id=nuc_id)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return None
    return record[0]['LinkSetDb'][0]['Link'][0]['Id']

gene_id = get_gene_for_nucleotide('NM_007294')
print(f"Gene ID: {gene_id}")
Find Related PubMed Articles
def get_related_articles(pmid, max_results=10):
    handle = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid,
                          linkname='pubmed_pubmed')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    links = record[0]['LinkSetDb'][0]['Link']
    return [link['Id'] for link in links[:max_results]]

related = get_related_articles('35412348')
print(f"Related articles: {related}")
Get All Available Links
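def discover_links(db, record_id):
    handle = Entrez.elink(dbfrom=db, id=record_id, cmd='acheck')
    record = Entrez.read(handle)
    handle.close()

    # acheck results are nested under IdCheckList -> IdLinkSet -> LinkInfo,
    # not under LinkSetDb, which stays empty for this command
    links = {}
    id_link_sets = record[0].get('IdCheckList', {}).get('IdLinkSet', [])
    for id_link_set in id_link_sets:
        for info in id_link_set.get('LinkInfo', []):
            links[info['LinkName']] = info['DbTo']
    return links

available = discover_links('gene', '672')
for name, target in available.items():
    print(f"{name} -> {target}")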
Navigate Gene -> Protein -> Structure
def gene_to_structures(gene_id):
    # Gene to protein
    handle = Entrez.elink(dbfrom='gene', db='protein', id=gene_id,
                          linkname='gene_protein_refseq')
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    protein_ids = [link['Id'] for link in record[0]['LinkSetDb'][0]['Link'][:5]]

    # Protein to structure
    handle = Entrez.elink(dbfrom='protein', db='structure', id=','.join(protein_ids))
    record = Entrez.read(handle)
    handle.close()

    structure_ids = []
    for linkset in record:
        if linkset['LinkSetDb']:
            structure_ids.extend([link['Id'] for link in linkset['LinkSetDb'][0]['Link']])
    return structure_ids

structures = gene_to_structures('672')
print(f"Structure IDs: {structures[:5]}")
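The same chaining idea can be generalized to any number of hops. The following is a minimal sketch, not part of Biopython: follow_chain and its link_fn argument are hypothetical names, and fake_link is an offline stand-in with placeholder IDs so the example runs without contacting NCBI. In real use, link_fn would wrap an Entrez.elink call and return the linked IDs.

```python
def follow_chain(start_ids, steps, link_fn):
    """Follow a sequence of (dbfrom, db) hops, feeding each hop's IDs into the next.

    link_fn(dbfrom, db, ids) is a hypothetical callable returning a list of linked IDs.
    """
    current = list(start_ids)
    for dbfrom, db in steps:
        if not current:
            break  # nothing left to link from
        # dict.fromkeys removes duplicate IDs while keeping first-seen order
        current = list(dict.fromkeys(link_fn(dbfrom, db, current)))
    return current


# Offline stand-in for ELink; all linked IDs here are placeholders
FAKE_LINKS = {
    ('gene', 'protein'): {'672': ['p1', 'p2']},
    ('protein', 'structure'): {'p1': ['s1'], 'p2': ['s1', 's2']},
}


def fake_link(dbfrom, db, ids):
    table = FAKE_LINKS.get((dbfrom, db), {})
    return [target for source in ids for target in table.get(source, [])]


steps = [('gene', 'protein'), ('protein', 'structure')]
print(follow_chain(['672'], steps, fake_link))
```

Deduplicating between hops keeps the ID list short, since several proteins often point at the same downstream record.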
Link Multiple IDs at Once
def batch_link(dbfrom, db, ids):
    # Pass IDs as a list so ELink returns one linkset per input ID;
    # a single comma-delimited string merges all links into one linkset
    if isinstance(ids, str):
        ids = ids.split(',')

    handle = Entrez.elink(dbfrom=dbfrom, db=db, id=ids)
    record = Entrez.read(handle)
    handle.close()

    # Returns one linkset per input ID
    results = {}
    for linkset in record:
        source_id = linkset['IdList'][0]
        linked_ids = []
        if linkset['LinkSetDb']:
            linked_ids = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']]
        results[source_id] = linked_ids
    return results

results = batch_link('gene', 'protein', ['672', '675', '7157'])
for gene, proteins in results.items():
    print(f"Gene {gene}: {len(proteins)} proteins")
Get Publications for a Sequence
def get_sequence_publications(accession):
    # First get the GI/UID
    handle = Entrez.esearch(db='nucleotide', term=f'{accession}[accn]')
    search = Entrez.read(handle)
    handle.close()

    if not search['IdList']:
        return []
    uid = search['IdList'][0]

    # Link to PubMed
    handle = Entrez.elink(dbfrom='nucleotide', db='pubmed', id=uid)
    record = Entrez.read(handle)
    handle.close()

    if not record[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in record[0]['LinkSetDb'][0]['Link']]

pmids = get_sequence_publications('NM_007294')
print(f"PubMed IDs: {pmids[:5]}")
Link Commands
| Command | Description |
|---------|-------------|
| neighbor | Default - get linked records |
| neighbor_score | Include relevance scores |
| neighbor_history | Store results in history |
| acheck | List all available links |
| ncheck | Check if any links exist |
| lcheck | Check specific link exists |
| llinks | Get URLs to Entrez links |
| prlinks | Get provider links (external) |
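With cmd='neighbor_score', each entry in Link carries a Score alongside its Id. The sketch below ranks links by that score. The helper name ranked_neighbors is hypothetical, scores are assumed to arrive as integer strings, and the sample record is a hand-written stand-in with placeholder IDs and scores. It also assumes the request set a specific linkname, since otherwise the same ID may appear under several link names.

```python
def ranked_neighbors(record, limit=10):
    """Return (id, score) pairs from a cmd='neighbor_score' result, highest score first."""
    pairs = []
    for linkset in record:
        for linksetdb in linkset.get('LinkSetDb', []):
            for link in linksetdb.get('Link', []):
                # Scores are parsed as strings; links without a Score are skipped
                if 'Score' in link:
                    pairs.append((link['Id'], int(link['Score'])))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:limit]


# Hand-written stand-in for a parsed neighbor_score result; values are placeholders
scored_record = [
    {
        'DbFrom': 'pubmed',
        'IdList': ['1'],
        'LinkSetDb': [
            {'DbTo': 'pubmed', 'LinkName': 'pubmed_pubmed',
             'Link': [{'Id': '10', 'Score': '500'},
                      {'Id': '20', 'Score': '900'},
                      {'Id': '30', 'Score': '700'}]},
        ],
    }
]

print(ranked_neighbors(scored_record, limit=2))
```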
Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| Empty LinkSetDb | No links exist | Check if record has linked data |
| HTTPError 400 | Invalid ID or database | Verify ID exists in source database |
| KeyError | Missing expected field | Check if LinkSetDb is empty first |
| Single linkset expected, got list | Multiple input IDs | Iterate through record list |
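Several of these errors can be avoided with one defensive accessor. The sketch below is an illustration under stated assumptions, not a Biopython function: linked_ids is a hypothetical helper that iterates every linkset, tolerates an empty or missing LinkSetDb, and optionally filters by link name. The sample data is hand-written with placeholder linked IDs.

```python
def linked_ids(record, linkname=None):
    """Map each linkset's source IDs to its linked IDs, tolerating empty results."""
    results = {}
    for linkset in record:  # iterate: there may be one linkset per input ID
        key = ','.join(linkset.get('IdList', []))
        ids = []
        for linksetdb in linkset.get('LinkSetDb', []):
            if linkname is None or linksetdb.get('LinkName') == linkname:
                ids.extend(link['Id'] for link in linksetdb.get('Link', []))
        # dict.fromkeys drops duplicates while keeping first-seen order
        results[key] = list(dict.fromkeys(ids))
    return results


# Hand-written stand-in for a parsed result; linked IDs are placeholders
mixed_record = [
    {'DbFrom': 'gene', 'IdList': ['672'],
     'LinkSetDb': [
         {'DbTo': 'protein', 'LinkName': 'gene_protein',
          'Link': [{'Id': 'a'}, {'Id': 'b'}]},
         {'DbTo': 'protein', 'LinkName': 'gene_protein_refseq',
          'Link': [{'Id': 'a'}]},
     ]},
    {'DbFrom': 'gene', 'IdList': ['675'], 'LinkSetDb': []},  # no links found
]

print(linked_ids(mixed_record))
print(linked_ids(mixed_record, linkname='gene_protein_refseq'))
```

Returning an empty list for a source ID with no links lets callers treat "no links" and "some links" uniformly instead of catching KeyError or IndexError.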
Decision Tree
Need to find related records?
├── Know what link you want?
│   └── Use elink with specific linkname
├── Discover what links exist?
│   └── Use elink with cmd='acheck'
├── Navigate to target database?
│   └── Use elink(dbfrom=X, db=Y, id=Z)
├── Find related records in same database?
│   └── Use elink(dbfrom=X, db=X) with neighbor
├── Chain multiple databases?
│   └── Call elink multiple times
└── Need the actual records?
    └── Use elink first, then efetch with IDs
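For the last branch (elink first, then efetch), a long list of linked IDs is usually easier to retrieve in batches. The helper below is a minimal sketch: chunked is a hypothetical name, the default batch size of 200 is an arbitrary assumption rather than an NCBI-documented limit, and the IDs in the demo are placeholders.

```python
def chunked(ids, size=200):
    """Yield successive comma-joined batches of at most `size` IDs."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(ids), size):
        yield ','.join(ids[start:start + size])


placeholder_ids = [str(n) for n in range(1, 8)]  # stand-ins for linked IDs
for batch in chunked(placeholder_ids, size=3):
    print(batch)
```

Each batch string can then be passed as the id argument of Entrez.efetch, for example Entrez.efetch(db='protein', id=batch, rettype='fasta', retmode='text').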
Related Skills
- entrez-search - Search databases before linking
- entrez-fetch - Retrieve records after finding linked IDs
- batch-downloads - Download many linked records efficiently