Drug Target Validation Pipeline
Validate drug target hypotheses using multi-dimensional computational evidence before committing to wet-lab work. Produces a quantitative Target Validation Score (0-100) with priority tier classification and GO/NO-GO recommendation.
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Target disambiguation FIRST - Resolve all identifiers before analysis
- Evidence grading - Grade all evidence as T1 (experimental) to T4 (computational)
- Disease-specific - Tailor analysis to disease context when provided
- Modality-aware - Consider small molecule vs biologics tractability
- Safety-first - Prominently flag safety concerns early
- Quantitative scoring - Every dimension scored numerically (0-100 composite)
- Negative results documented - "No data" is data; empty sections are failures
- Source references - Every statement must cite tool/database
- Completeness checklist - Mandatory section showing analysis coverage
- English-first queries - Always use English terms in tool calls. Respond in user's language
When to Use This Skill
Apply when users:
- Ask "Is [target] a good drug target for [disease]?"
- Need target validation or druggability assessment
- Want to compare targets for drug discovery prioritization
- Ask about safety risks of modulating a target
- Need chemical starting points for target validation
- Ask about pathway context for a target
- Need a GO/NO-GO recommendation for a target
- Want a comprehensive target dossier for investment decisions
NOT for (use other skills instead):
- General target biology overview -> Use tooluniverse-target-research
- Drug compound profiling -> Use tooluniverse-drug-research
- Variant interpretation -> Use tooluniverse-variant-interpretation
- Disease research -> Use tooluniverse-disease-research
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| target | Yes | Gene symbol, protein name, or UniProt ID | EGFR, P00533, Epidermal growth factor receptor |
| disease | No | Disease/indication for context | Non-small cell lung cancer, Pancreatic cancer |
| modality | No | Preferred therapeutic modality | small molecule, antibody, protein therapeutic, PROTAC |
Target Validation Scoring System
Score Components (Total: 0-100)
Disease Association (0-30 points):
- Genetic evidence: 0-10 (GWAS, rare variants, somatic mutations)
- Literature evidence: 0-10 (publications, clinical studies)
- Pathway evidence: 0-10 (disease pathway involvement)
Druggability (0-25 points):
- Structural tractability: 0-10 (structure quality, binding pockets)
- Chemical matter: 0-10 (known compounds, bioactivity data)
- Target class: 0-5 (validated target family bonus)
Safety Profile (0-20 points):
- Tissue expression selectivity: 0-5 (expression in critical tissues)
- Genetic validation: 0-10 (knockout phenotypes, human genetics)
- Known adverse events: 0-5 (safety signals from modulators)
Clinical Precedent (0-15 points):
- Approved drugs: 15 (strong precedent, validated target)
- Clinical trials: 10 (moderate precedent)
- Preclinical compounds: 5 (weak precedent)
- None: 0 (novel target)
Validation Evidence (0-10 points):
- Functional studies: 0-5 (CRISPR, siRNA, biochemical)
- Disease models: 0-5 (animal models, patient data)
Priority Tiers
| Score | Tier | Recommendation |
|---|---|---|
| 80-100 | Tier 1 | Highly validated - proceed with confidence |
| 60-79 | Tier 2 | Good target - needs focused validation |
| 40-59 | Tier 3 | Moderate risk - significant validation needed |
| 0-39 | Tier 4 | High risk - consider alternatives |
Evidence Grading System
| Tier | Symbol | Criteria | Examples |
|---|---|---|---|
| T1 | [T1] | Direct mechanistic, human clinical proof | FDA-approved drug, crystal structure with mechanism, patient mutation |
| T2 | [T2] | Functional studies, model organism | siRNA phenotype, mouse KO, biochemical assay, CRISPR screen |
| T3 | [T3] | Association, screen hits, computational | GWAS hit, DepMap essentiality, expression correlation |
| T4 | [T4] | Mention, review, text-mined, predicted | Review article, database annotation, AlphaFold prediction |
Phase 0: Target Disambiguation & ID Resolution (ALWAYS FIRST)
Objective: Resolve target to ALL needed identifiers before any analysis.
Resolution Strategy
Step 1: Determine input type and get initial identifiers
If gene symbol (e.g., "EGFR"):
mygene = tu.tools.MyGene_query_genes(query="EGFR", species="human", fields="symbol,name,ensembl.gene,uniprot.Swiss-Prot,entrezgene")
Extract: ensembl_id, uniprot_id, entrez_id, symbol, name
If UniProt ID (e.g., "P00533"):
uniprot = tu.tools.UniProt_get_entry_by_accession(accession="P00533")
Extract: gene names, Ensembl xrefs, function
Step 2: Resolve Ensembl ID and get versioned ID for GTEx
ensembl = tu.tools.ensembl_lookup_gene(gene_id=ensembl_id, species="homo_sapiens")
CRITICAL: species parameter is REQUIRED
CRITICAL: Response is wrapped in {status, data, url, content_type} - access via ensembl['data']
ensembl_data = ensembl.get('data', ensembl) if isinstance(ensembl, dict) else ensembl
Extract: version for versioned_id (e.g., "ENSG00000146648.18")
Step 3: Get Ensembl cross-references
xrefs = tu.tools.ensembl_get_xrefs(id=ensembl_id)
Extract: HGNC, UniProt, EntrezGene mappings
Step 4: Get OpenTargets target info
ot_target = tu.tools.OpenTargets_get_target_id_description_by_name(targetName="EGFR")
Verify ensemblId matches
Step 5: Get ChEMBL target ID
chembl_targets = tu.tools.ChEMBL_search_targets(pref_name__contains="EGFR", organism="Homo sapiens", limit=5)
Extract: target_chembl_id for later use
Step 6: Get UniProt function summary
function_info = tu.tools.UniProt_get_function_by_accession(accession=uniprot_id)
Returns list of strings (NOT dict)
Step 7: Get alternative names for collision detection
alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=uniprot_id)
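The unwrapping and versioning described in Step 2 can be sketched as follows. This is a minimal sketch, not the pipeline's actual code: `build_versioned_id` is a hypothetical helper, and it assumes the lookup payload exposes `id` and `version` keys, as the Ensembl REST lookup format does.

```python
def build_versioned_id(ensembl_response):
    """Return a versioned gene ID such as 'ENSG00000146648.18'.

    Accepts either the raw payload or the {status, data, url, content_type}
    wrapper. The 'id' and 'version' keys are an assumption about the payload;
    None is returned when either is missing.
    """
    if not isinstance(ensembl_response, dict):
        return None
    # Unwrap the envelope if present, otherwise treat the dict as the payload
    data = ensembl_response.get("data", ensembl_response)
    if not isinstance(data, dict):
        return None
    gene_id = data.get("id")
    version = data.get("version")
    if gene_id is None or version is None:
        return None
    return f"{gene_id}.{version}"


# Illustrative wrapped response (hand-written, not real tool output)
wrapped = {"status": "success", "data": {"id": "ENSG00000146648", "version": 18}}
print(build_versioned_id(wrapped))
```

Returning `None` rather than raising lets the caller fall back to the unversioned ID for GTEx, which matches the fallback noted in Phase 5.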
Identifier Resolution Output
1. Target Identity
| Database | Identifier | Verified |
|---|---|---|
| Gene Symbol | EGFR | Yes |
| Full Name | Epidermal growth factor receptor | Yes |
| Ensembl | ENSG00000146648 | Yes |
| Ensembl (versioned) | ENSG00000146648.18 | Yes |
| UniProt | P00533 | Yes |
| Entrez Gene | 1956 | Yes |
| ChEMBL | CHEMBL203 | Yes |
| HGNC | HGNC:3236 | Yes |
Protein Function: [from UniProt_get_function_by_accession]
Subcellular Location: [from UniProt_get_subcellular_location_by_accession]
Target Class: [from OpenTargets_get_target_classes_by_ensemblID]
Known Parameter Corrections
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| ensembl_lookup_gene | id | gene_id (+ species="homo_sapiens" REQUIRED) |
| Reactome_map_uniprot_to_pathways | uniprot_id | id |
| ensembl_get_xrefs | gene_id | id |
| GTEx_get_median_gene_expression | gencode_id only | gencode_id + operation="median" |
| OpenTargets_* | ensemblID (uppercase) | ensemblId (camelCase) |
| OpenTargets_get_publications_* | ensemblId | entityId |
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId only | ensemblId + size (REQUIRED) |
| MyGene_query_genes | q | query |
| PubMed_search_articles | returns {articles: [...]} | returns plain list of dicts |
| UniProt_get_function_by_accession | returns dict | returns list of strings |
| HPA_get_rna_expression_by_source | ensembl_id | gene_name + source_type + source_name (ALL required) |
| alphafold_get_prediction | uniprot_accession | qualifier |
| drugbank_get_safety_* | simple params | query, case_sensitive, exact_match, limit (ALL required) |
Phase 1: Disease Association Evidence (0-30 points)
Objective: Quantify the strength of target-disease association from genetic, literature, and pathway evidence.
1A. OpenTargets Disease Associations (Primary)
Get ALL disease associations for target
diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensembl(ensemblId=ensembl_id)
If specific disease provided, get detailed evidence
if disease_name:
    disease_info = tu.tools.OpenTargets_get_disease_id_description_by_name(diseaseName=disease_name)
    efo_id = disease_info.get('id')  # e.g., "EFO_0003060"
    evidence = tu.tools.OpenTargets_target_disease_evidence(
        efoId=efo_id, ensemblId=ensembl_id
    )
    # Get evidence by data source for detailed breakdown
    datasource_evidence = tu.tools.OpenTargets_get_evidence_by_datasource(
        efoId=efo_id, ensemblId=ensembl_id,
        datasourceIds=["ot_genetics_portal", "eva", "gene2phenotype", "genomics_england", "uniprot_literature"],
        size=100
    )
1B. GWAS Genetic Evidence
GWAS associations for target gene
gwas_snps = tu.tools.gwas_get_snps_for_gene(mapped_gene=gene_symbol, size=50)
If specific disease, search for trait-specific associations
if disease_name: gwas_studies = tu.tools.gwas_search_studies(query=disease_name, size=20)
1C. Constraint Scores (gnomAD)
Genetic constraint - intolerance to loss of function
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
Extract: pLI, LOEUF, missense_z, pRec
High pLI (>0.9) = highly intolerant to LoF = likely essential
1D. Literature Evidence
PubMed for target-disease association
articles = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}" AND "{disease_name}" AND (target OR therapeutic OR inhibitor)', limit=50 )
PubMed_search_articles returns a plain list of dicts
OpenTargets publications
pubs = tu.tools.OpenTargets_get_publications_by_target_ensemblID(entityId=ensembl_id)
Scoring Logic - Disease Association
Genetic Evidence (0-10):
- GWAS hits for specific disease: +3 per significant locus (max 6)
- Rare variant evidence (ClinVar pathogenic): +2
- Somatic mutations in disease: +2
- pLI > 0.9 (essential gene): +2
Literature Evidence (0-10):
- More than 100 publications on target+disease: 10
- 50-100 publications: 7
- 10-50 publications: 5
- 1-10 publications: 3
- 0 publications: 0
Pathway Evidence (0-10):
- OpenTargets overall score > 0.8: 10
- Score 0.5-0.8: 7
- Score 0.2-0.5: 4
- Score < 0.2: 1
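The three disease-association rubrics can be sketched as small functions. This is a sketch under stated assumptions rather than the pipeline's own implementation: the function names are hypothetical, the top literature band is read as more than 100 publications, shared band boundaries (10, 50, 0.5, 0.2) are assigned to the higher band, and the genetic sub-score is capped at 10 because the listed increments can sum to 12.

```python
def genetic_score(gwas_loci, clinvar_pathogenic=False, somatic_mutations=False, pli=None):
    """Genetic evidence sub-score (0-10)."""
    points = min(3 * gwas_loci, 6)            # +3 per significant locus, max 6
    points += 2 if clinvar_pathogenic else 0  # rare variant evidence
    points += 2 if somatic_mutations else 0   # somatic mutations in disease
    points += 2 if (pli is not None and pli > 0.9) else 0
    return min(points, 10)                    # cap at 10 (assumption: raw max is 12)


def literature_score(n_publications):
    """Literature evidence sub-score (0-10); boundary handling is an assumption."""
    if n_publications > 100:
        return 10
    if n_publications >= 50:
        return 7
    if n_publications >= 10:
        return 5
    if n_publications >= 1:
        return 3
    return 0


def pathway_score(ot_overall_score):
    """Pathway evidence sub-score from the OpenTargets overall score."""
    if ot_overall_score > 0.8:
        return 10
    if ot_overall_score >= 0.5:
        return 7
    if ot_overall_score >= 0.2:
        return 4
    return 1


# Illustrative inputs only, not retrieved data
disease_total = genetic_score(2, True, True, 0.2) + literature_score(5000) + pathway_score(0.9)
print(disease_total)  # 30
```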
Phase 2: Druggability Assessment (0-25 points)
Objective: Assess whether the target is amenable to therapeutic intervention.
2A. OpenTargets Tractability
Tractability assessment across modalities
tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblID(ensemblId=ensembl_id)
Returns: label, modality (SM, AB, PR, OC), value (boolean/score)
Modalities: Small Molecule, Antibody, PROTAC, Other Clinical
2B. Target Class & Family
Target classification (kinase, GPCR, ion channel, etc.)
target_classes = tu.tools.OpenTargets_get_target_classes_by_ensemblID(ensemblId=ensembl_id)
Pharos target development level
pharos = tu.tools.Pharos_get_target(gene=gene_symbol)
TDL: Tclin (approved drug) > Tchem (compounds) > Tbio (biology) > Tdark (unknown)
DGIdb druggability categories
druggability = tu.tools.DGIdb_get_gene_druggability(genes=[gene_symbol])
2C. Structural Tractability
PDB structures available
if uniprot_id: uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id) # Extract PDB cross-references from entry
AlphaFold prediction
alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_summary = tu.tools.alphafold_get_summary(qualifier=uniprot_id)
For top PDB structures, analyze binding pockets
ProteinsPlus DoGSiteScorer for pocket detection
for pdb_id in top_pdb_ids[:3]: pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=pdb_id) # Returns predicted druggable pockets with scores
2D. Chemical Probes & Enabling Packages
Chemical probes (validated tool compounds)
probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblID(ensemblId=ensembl_id)
Target Enabling Packages (TEPs)
teps = tu.tools.OpenTargets_get_target_enabling_packages_by_ensemblID(ensemblId=ensembl_id)
Scoring Logic - Druggability
Structural Tractability (0-10):
- High-res co-crystal structure with ligand: 10
- PDB structure available, pockets detected: 7
- AlphaFold only, confident pocket prediction: 5
- AlphaFold low confidence / no structure: 2
- No structural data: 0
Chemical Matter (0-10):
- Known drug-like compounds (IC50 < 100nM): 10
- Tool compounds (IC50 < 1uM): 7
- HTS hits only (IC50 > 1uM): 4
- No known ligands: 0
Target Class Bonus (0-5):
- Validated druggable family (kinase, GPCR, nuclear receptor): 5
- Enzyme, ion channel: 4
- Protein-protein interaction, transporter: 2
- Novel/unknown class: 0
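Two of the druggability rubrics reduce to simple lookups, sketched below. The names are hypothetical, and the sketch makes two simplifying assumptions: chemical matter is scored from the best measured IC50 alone (the rubric also distinguishes drug-like, tool, and HTS compounds), and an IC50 of exactly 1 uM is placed in the tool-compound band, consistent with the pChEMBL >= 6.0 cutoff used in Phase 3.

```python
# Target class bonus (0-5), keyed by lower-cased class label
TARGET_CLASS_BONUS = {
    "kinase": 5,
    "gpcr": 5,
    "nuclear receptor": 5,
    "enzyme": 4,
    "ion channel": 4,
    "protein-protein interaction": 2,
    "transporter": 2,
}


def target_class_bonus(target_class):
    """Return the class bonus; unknown or missing classes score 0."""
    key = (target_class or "").strip().lower()
    return TARGET_CLASS_BONUS.get(key, 0)


def chemical_matter_score(best_ic50_nm):
    """Chemical matter sub-score (0-10) from the most potent IC50 in nM.

    None means no known ligands. Scoring by potency alone is a simplification.
    """
    if best_ic50_nm is None:
        return 0
    if best_ic50_nm < 100:
        return 10
    if best_ic50_nm <= 1000:   # exactly 1 uM counted as tool compound (assumption)
        return 7
    return 4


print(target_class_bonus("Kinase"), chemical_matter_score(0.3))
```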
Phase 3: Known Modulators & Chemical Matter (Feeds into Phase 2 scoring)
Objective: Identify existing chemical starting points for target validation.
3A. ChEMBL Bioactivity
Search for ChEMBL target
chembl_targets = tu.tools.ChEMBL_search_targets( pref_name__contains=gene_symbol, organism="Homo sapiens", limit=10 )
Get activities for best matching target
target_chembl_id = chembl_targets[0]['target_chembl_id']
activities = tu.tools.ChEMBL_get_target_activities( target_chembl_id__exact=target_chembl_id, limit=100 )
Parse: compound IDs, pChEMBL values, activity types (IC50, Ki, Kd)
Filter: potent compounds (pChEMBL >= 6.0 = IC50 <= 1uM)
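The potency filter relies on the relation pChEMBL = -log10(molar value), so 1 uM corresponds to 6.0. A minimal sketch of the conversion and filter follows. It assumes each activity record is a dict with a `pchembl_value` key, as in the ChEMBL activity resource; whether the tool returns exactly that shape is an assumption, and both helper names are hypothetical.

```python
import math


def pchembl_from_nm(value_nm):
    """Convert a potency in nM to the pChEMBL scale: 9 - log10(nM).

    Raises ValueError for non-positive input (log10 is undefined there).
    """
    return 9.0 - math.log10(value_nm)


def filter_potent(activities, min_pchembl=6.0):
    """Keep records whose pchembl_value is present and >= min_pchembl."""
    potent = []
    for act in activities:
        raw = act.get("pchembl_value")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # missing or non-numeric values are skipped
        if value >= min_pchembl:
            potent.append(act)
    return potent


# Hand-written records for illustration, not real ChEMBL data
records = [
    {"molecule_chembl_id": "A", "pchembl_value": "7.2"},
    {"molecule_chembl_id": "B", "pchembl_value": None},
    {"molecule_chembl_id": "C", "pchembl_value": "5.1"},
]
print([r["molecule_chembl_id"] for r in filter_potent(records)])  # ['A']
```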
3B. BindingDB Ligands
Experimental binding data
ligands = tu.tools.BindingDB_get_ligands_by_uniprot( uniprot=uniprot_id, affinity_cutoff=10000 # nM )
Returns: SMILES, affinity_type (Ki/IC50/Kd), affinity value, PMID
3C. PubChem Bioassays
HTS screening data
assays = tu.tools.PubChem_search_assays_by_target_gene(gene_symbol=gene_symbol)
Get details for top assays
for aid in assay_ids[:5]:
    summary = tu.tools.PubChem_get_assay_summary(aid=str(aid))
    targets = tu.tools.PubChem_get_assay_targets(aid=str(aid))
    actives = tu.tools.PubChem_get_assay_active_compounds(aid=str(aid))
3D. Known Drugs Targeting This Protein
OpenTargets known drugs
drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblID( ensemblId=ensembl_id, size=25 )
ChEMBL drug mechanisms
drug_mechanisms = tu.tools.ChEMBL_search_mechanisms( target_chembl_id=target_chembl_id, limit=50 )
Drug interaction databases
dgidb = tu.tools.DGIdb_get_gene_info(genes=[gene_symbol])
Report Format - Chemical Matter
4. Known Modulators & Chemical Matter
4.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Indication | Source |
|---|---|---|---|---|---|
| Erlotinib | CHEMBL553 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |
| Gefitinib | CHEMBL939 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |
4.2 ChEMBL Bioactivity Summary
Total Activities: 12,456 datapoints across 2,341 assays
Most Potent Compound: CHEMBL413456 (IC50 = 0.3 nM) [T1]
Chemical Series: 8 distinct scaffolds with pChEMBL >= 7.0
Selectivity Data: Available for 45 compounds (kinase panel)
4.3 BindingDB Ligands
Total Ligands: 856 with measured affinity
Best Affinity: 0.1 nM (Ki)
Affinity Distribution: <1nM: 23, 1-10nM: 89, 10-100nM: 234, 100nM-1uM: 510
4.4 Chemical Probes
| Probe | Source | Potency | Selectivity | Use |
|---|---|---|---|---|
| SGC-1234 | SGC | IC50=5nM | >100x | In vitro |
Phase 4: Clinical Precedent (0-15 points)
Objective: Assess clinical validation from approved drugs and clinical trials.
4A. FDA-Approved Drugs
FDA label information (query by drug name, e.g. a drug identified in Phase 3D, not by gene symbol)
fda_moa = tu.tools.FDA_get_mechanism_of_action_by_drug_name(drug_name=known_drug_name)
fda_indications = tu.tools.FDA_get_indications_by_drug_name(drug_name=known_drug_name)
DrugBank pharmacology
drugbank_targets = tu.tools.drugbank_get_targets_by_drug_name_or_drugbank_id( query=known_drug_name, case_sensitive=False, exact_match=False, limit=10 )
DrugBank safety info
drugbank_safety = tu.tools.drugbank_get_safety_by_drug_name_or_drugbank_id( query=known_drug_name, case_sensitive=False, exact_match=False, limit=10 )
4B. Clinical Trials
Active clinical trials targeting this protein
trials = tu.tools.search_clinical_trials( query_term=gene_symbol, intervention=gene_symbol, pageSize=50 )
If specific disease context
if disease_name: disease_trials = tu.tools.search_clinical_trials( query_term=gene_symbol, condition=disease_name, pageSize=50 )
4C. Failed Programs (Learn from Failures)
Drug warnings and withdrawals
for drug_chembl_id in known_drug_ids:
    warnings = tu.tools.OpenTargets_get_drug_warnings_by_chemblId(chemblId=drug_chembl_id)
    adverse = tu.tools.OpenTargets_get_drug_adverse_events_by_chemblId(chemblId=drug_chembl_id)
Scoring Logic - Clinical Precedent
Clinical Precedent (0-15):
- FDA-approved drug for SAME disease: 15
- FDA-approved drug for DIFFERENT disease: 12
- Phase 3 clinical trial: 10
- Phase 2 clinical trial: 7
- Phase 1 clinical trial: 5
- Preclinical compounds only: 3
- No clinical development: 0
Adjustment factors:
- Failed clinical program for safety: -3
- Drug withdrawal: -5
- Multiple approved drugs (validated class): +2
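The base score plus adjustment factors can be sketched as below. The stage keys and function name are hypothetical, and clamping the result to the 0-15 range is an assumption: the rubric states the range but not what happens when an adjustment pushes the value outside it.

```python
# Base points by most advanced development stage
BASE_POINTS = {
    "approved_same_disease": 15,
    "approved_other_disease": 12,
    "phase3": 10,
    "phase2": 7,
    "phase1": 5,
    "preclinical": 3,
    "none": 0,
}


def clinical_precedent_score(stage, safety_failure=False, withdrawal=False,
                             multiple_approved=False):
    """Clinical precedent score: base points plus adjustments, clamped to 0-15."""
    score = BASE_POINTS[stage]      # KeyError on an unknown stage is intentional
    if safety_failure:
        score -= 3                  # failed clinical program for safety
    if withdrawal:
        score -= 5                  # drug withdrawal
    if multiple_approved:
        score += 2                  # validated class
    return max(0, min(score, 15))   # clamp (assumption)


print(clinical_precedent_score("approved_other_disease", safety_failure=True))  # 9
```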
Phase 5: Safety & Toxicity Considerations (0-20 points)
Objective: Identify safety risks from expression, genetics, and known adverse events.
5A. OpenTargets Safety Profile
safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblID(ensemblId=ensembl_id)
Returns: safety liabilities, adverse effects, experimental toxicity
5B. Expression in Critical Tissues
GTEx tissue expression (identifies essential organ expression)
gtex = tu.tools.GTEx_get_median_gene_expression( operation="median", gencode_id=ensembl_versioned_id )
If empty, try unversioned ID
HPA expression
NOTE: HPA_get_rna_expression_by_source requires gene_name, source_type, source_name
hpa = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
hpa_details = tu.tools.HPA_get_comprehensive_gene_details_by_ensembl_id(ensembl_id=ensembl_id)
Check expression in safety-critical tissues
Heart, liver, kidney, brain, bone marrow = high risk if target is expressed
5C. Knockout Phenotypes
Mouse model phenotypes
mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblID(ensemblId=ensembl_id)
Genetic constraint (proxy for essentiality)
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
High pLI = essential gene = potential safety concern
5D. Known Adverse Events from Target Modulation
For known drugs targeting this protein
for drug_name in known_drug_names:
    fda_adr = tu.tools.FDA_get_adverse_reactions_by_drug_name(drug_name=drug_name)
    fda_warnings = tu.tools.FDA_get_warnings_and_cautions_by_drug_name(drug_name=drug_name)
    fda_boxed = tu.tools.FDA_get_boxed_warning_info_by_drug_name(drug_name=drug_name)
    fda_contraindications = tu.tools.FDA_get_contraindications_by_drug_name(drug_name=drug_name)
5E. Homologs & Off-Target Risks
Paralogs (close family members that might be hit)
homologs = tu.tools.OpenTargets_get_target_homologues_by_ensemblID(ensemblId=ensembl_id)
Paralogs with high sequence identity = selectivity challenge
Scoring Logic - Safety
Tissue Expression Selectivity (0-5):
- Target restricted to disease tissue: 5
- Low expression in heart/liver/kidney/brain: 4
- Moderate expression in 1-2 critical tissues: 2
- High expression in multiple critical tissues: 0
Genetic Validation (0-10):
- Mouse KO viable, no severe phenotype: 10
- Mouse KO viable with mild phenotype: 7
- Mouse KO has concerning phenotype: 3
- Mouse KO lethal: 0
- No KO data, low pLI (<0.5): 5
- No KO data, high pLI (>0.9): 2
Known Adverse Events (0-5):
- No known safety signals: 5
- Mild, manageable ADRs: 3
- Serious ADRs reported: 1
- Black box warning or drug withdrawal: 0
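The genetic validation rubric mixes knockout phenotypes with a pLI-based fallback, which the sketch below makes explicit. The phenotype keys and function name are hypothetical. The rubric does not define a score when there is no KO data and pLI lies between 0.5 and 0.9 (or is unavailable), so the sketch returns `None` in those cases rather than inventing a value.

```python
# Points by mouse knockout outcome
KO_POINTS = {
    "viable_no_phenotype": 10,
    "viable_mild": 7,
    "concerning": 3,
    "lethal": 0,
}


def safety_genetic_score(ko_phenotype=None, pli=None):
    """Genetic validation sub-score (0-10), or None if the rubric is silent.

    KO data takes precedence; pLI is used only when no KO data exists.
    """
    if ko_phenotype is not None:
        return KO_POINTS[ko_phenotype]
    if pli is None:
        return None          # no KO data and no constraint score
    if pli < 0.5:
        return 5
    if pli > 0.9:
        return 2
    return None              # 0.5 <= pLI <= 0.9 is not covered by the rubric


print(safety_genetic_score("viable_mild"), safety_genetic_score(None, 0.95))
```

A `None` result is a cue to document the gap in the report instead of leaving the cell blank.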
Phase 6: Pathway Context & Network Analysis
Objective: Understand the target's role in biological networks and disease pathways.
6A. Reactome Pathways
Map target to pathways
pathways = tu.tools.Reactome_map_uniprot_to_pathways(id=uniprot_id)
Get pathway details for top pathways
for pathway in top_pathways[:5]:
    detail = tu.tools.Reactome_get_pathway(id=pathway['stId'])
    reactions = tu.tools.Reactome_get_pathway_reactions(id=pathway['stId'])
6B. Protein-Protein Interactions
STRING network
string_ppi = tu.tools.STRING_get_protein_interactions( protein_ids=[gene_symbol], species=9606, confidence_score=0.7 )
Higher confidence = more reliable
IntAct interactions (experimental)
intact_ppi = tu.tools.intact_get_interactions(identifier=uniprot_id)
OpenTargets interactions
ot_ppi = tu.tools.OpenTargets_get_target_interactions_by_ensemblID(ensemblId=ensembl_id)
6C. Functional Enrichment
GO annotations
go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblID(ensemblId=ensembl_id)
Direct GO query
go_annotations = tu.tools.GO_get_annotations_for_gene(gene_id=gene_symbol)
STRING functional enrichment of interaction partners
enrichment = tu.tools.STRING_functional_enrichment( protein_ids=[gene_symbol], species=9606 )
Report Format - Pathway Context
7. Pathway Context & Network Analysis
7.1 Key Pathways
| Pathway | Reactome ID | Relevance to Disease | Evidence |
|---|---|---|---|
| EGFR signaling | R-HSA-177929 | Driver pathway in NSCLC | [T1] |
| RAS-RAF-MEK-ERK | R-HSA-5673001 | Downstream effector | [T1] |
| PI3K-AKT signaling | R-HSA-2219528 | Resistance mechanism | [T2] |
7.2 Protein-Protein Interactions
Total Interactors: 45 (STRING confidence > 0.7)
Key Interactors: GRB2, SHC1, PLCG1, PIK3CA, STAT3
7.3 Pathway Redundancy Assessment
Compensation Risk: MODERATE
- Parallel pathways: HER2, HER3 can compensate
- Feedback loops: RAS activation bypasses EGFR
- Downstream convergence: MEK/ERK shared with other RTKs
Phase 7: Validation Evidence (0-10 points)
Objective: Assess existing functional validation data.
7A. DepMap Essentiality (CRISPR/RNAi)
Gene essentiality in cancer cell lines
deps = tu.tools.DepMap_get_gene_dependencies(gene_symbol=gene_symbol)
Negative scores = essential (cells die upon KO)
Score < -0.5: moderately essential
Score < -1.0: strongly essential
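The two essentiality thresholds can be applied across cell lines as sketched here. The function names and the "not essential" label for scores at or above -0.5 are assumptions, and the input is taken to be a flat list of per-cell-line dependency scores, which may differ from the tool's actual response shape.

```python
from collections import Counter


def classify_dependency(score):
    """Label a single dependency score using the thresholds above."""
    if score < -1.0:
        return "strongly essential"
    if score < -0.5:
        return "moderately essential"
    return "not essential"   # label is an assumption; the text names only two bands


def summarize_dependencies(scores):
    """Count cell lines per essentiality band."""
    return Counter(classify_dependency(s) for s in scores)


# Illustrative scores, not real DepMap values
summary = summarize_dependencies([-1.4, -0.8, -0.2, 0.1])
print(dict(summary))
```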
7B. Literature Validation Evidence
Search for functional studies
validation_papers = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}" AND (CRISPR OR siRNA OR knockdown OR knockout OR "loss of function") AND "{disease_name}"', limit=30 )
Search for biomarker studies
biomarker_papers = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}" AND (biomarker OR "target engagement" OR "pharmacodynamic")', limit=20 )
7C. Animal Model Evidence
Mouse phenotypes from OpenTargets (already retrieved in Phase 5)
Reuse mouse_models data
CTD gene-disease associations (complementary)
ctd_diseases = tu.tools.CTD_get_gene_diseases(input_terms=gene_symbol)
Scoring Logic - Validation Evidence
Functional Studies (0-5):
- CRISPR KO shows disease-relevant phenotype: 5
- siRNA knockdown shows phenotype: 4
- Biochemical assay validates mechanism: 3
- Overexpression study only: 2
- No functional data: 0
Disease Models (0-5):
- Patient-derived xenograft (PDX) response: 5
- Genetically engineered mouse model: 4
- Cell line model: 3
- In silico model only: 1
- No model data: 0
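When several kinds of validation evidence exist, one reasonable reading is that the strongest kind sets the sub-score. The sketch below follows that reading, which is an assumption the rubric does not state; the evidence keys and the `best_points` helper are hypothetical.

```python
FUNCTIONAL_POINTS = {
    "crispr_ko": 5,
    "sirna_knockdown": 4,
    "biochemical_assay": 3,
    "overexpression": 2,
}

MODEL_POINTS = {
    "pdx": 5,
    "gemm": 4,          # genetically engineered mouse model
    "cell_line": 3,
    "in_silico": 1,
}


def best_points(evidence_types, table):
    """Return the highest points among the evidence types found (0 if none)."""
    return max((table.get(e, 0) for e in evidence_types), default=0)


functional = best_points(["overexpression", "sirna_knockdown"], FUNCTIONAL_POINTS)
models = best_points([], MODEL_POINTS)
print(functional + models)  # 4
```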
Phase 8: Structural Insights
Objective: Leverage structural biology for druggability and mechanism understanding.
8A. PDB Structures
Get PDB entries from UniProt cross-references
uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id)
Parse: uniProtKBCrossReferences where database == "PDB"
Get details for each PDB
for pdb_id in pdb_ids[:10]:
    metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
    quality = tu.tools.pdbe_get_entry_quality(pdb_id=pdb_id)
    summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)
    experiment = tu.tools.pdbe_get_entry_experiment(pdb_id=pdb_id)
    molecules = tu.tools.pdbe_get_entry_molecules(pdb_id=pdb_id)
8B. AlphaFold Prediction
alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_info = tu.tools.alphafold_get_summary(qualifier=uniprot_id)
Check pLDDT scores for confidence
8C. Binding Pocket Analysis
ProteinsPlus DoGSiteScorer for best PDB structure
pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=best_pdb_id)
Returns: pocket locations, druggability scores, volume, surface
Interaction diagram for co-crystal structures
if has_ligand: diagram = tu.tools.ProteinsPlus_generate_interaction_diagram(pdb_id=pdb_id)
8D. Domain Architecture
InterPro domains
domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=uniprot_id)
Domain details for key domains
for domain in domains[:5]: detail = tu.tools.InterPro_get_domain_details(entry_id=domain['accession'])
Phase 9: Literature Deep Dive
Objective: Comprehensive literature analysis with collision-aware search.
9A. Collision Detection
Detect naming collisions before literature search
test_results = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}"[Title]', limit=20 )
PubMed returns plain list of dicts
Check if >20% of results are off-topic (no biology terms)
If collision detected, add filters: AND (protein OR gene OR receptor OR kinase)
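The 20% off-topic check can be sketched as a simple keyword test over the returned articles. The `title` and `abstract` keys, the helper names, and the term list are assumptions; substring matching is crude (for example, "gene" also matches "general"), so treat this as a starting point rather than a reliable classifier.

```python
BIOLOGY_TERMS = ("protein", "gene", "receptor", "kinase")


def off_topic_fraction(articles, terms=BIOLOGY_TERMS):
    """Fraction of articles whose title/abstract contains none of the terms."""
    if not articles:
        return 0.0
    off_topic = 0
    for article in articles:
        text = f"{article.get('title', '')} {article.get('abstract', '')}".lower()
        if not any(term in text for term in terms):
            off_topic += 1
    return off_topic / len(articles)


def build_query(gene_symbol, articles, threshold=0.2):
    """Add the biology filter when more than `threshold` of results are off-topic."""
    base = f'"{gene_symbol}"'
    if off_topic_fraction(articles) > threshold:
        return f"{base} AND (protein OR gene OR receptor OR kinase)"
    return base


# Hand-written titles for illustration
sample = [
    {"title": "CAT gene polymorphisms and oxidative stress"},
    {"title": "Cat behavior in animal shelters"},
]
print(build_query("CAT", sample))
```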
9B. Publication Metrics
Total publications
total = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}" AND (protein OR gene)', limit=1 )
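PubMed_search_articles returns a plain list of dicts, so use a total_count field only if the response exposes one; otherwise report the number of records retrieved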
Recent publications (5-year trend)
recent = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}" AND (protein OR gene) AND ("2021"[PDAT] : "2026"[PDAT])', limit=50 )
Drug-focused publications
drug_pubs = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}" AND (drug OR therapeutic OR inhibitor OR antibody)', limit=30 )
EuropePMC for broader coverage
epmc = tu.tools.EuropePMC_search_articles( query=f'"{gene_symbol}" AND drug target', limit=30 )
9C. Key Reviews and Landmark Papers
Reviews for target overview
reviews = tu.tools.PubMed_search_articles( query=f'"{gene_symbol}" AND drug target AND review[pt]', limit=10 )
OpenAlex for citation metrics
openalex_works = tu.tools.openalex_search_works( query=f'{gene_symbol} drug target', limit=20 )
Phase 10: Validation Roadmap (Synthesis)
Objective: Generate actionable recommendations based on all evidence.
This phase synthesizes all previous phases into:
- Target Validation Score (0-100)
- Priority Tier (1-4)
- GO/NO-GO Recommendation
- Recommended Experiments
- Tool Compounds for Testing
- Biomarker Strategy
- Key Risks & Mitigations
Score Calculation
def calculate_validation_score(phase_results):
    """
    Calculate Target Validation Score (0-100).

    Components:
    - Disease Association: 0-30
    - Druggability: 0-25
    - Safety: 0-20
    - Clinical Precedent: 0-15
    - Validation Evidence: 0-10
    """
    score = {
        'disease_genetic': 0,        # 0-10
        'disease_literature': 0,     # 0-10
        'disease_pathway': 0,        # 0-10
        'drug_structural': 0,        # 0-10
        'drug_chemical': 0,          # 0-10
        'drug_class': 0,             # 0-5
        'safety_expression': 0,      # 0-5
        'safety_genetic': 0,         # 0-10
        'safety_adverse': 0,         # 0-5
        'clinical': 0,               # 0-15
        'validation_functional': 0,  # 0-5
        'validation_models': 0,      # 0-5
    }

    # ... scoring logic from each phase ...

    total = sum(score.values())

    if total >= 80:
        tier = "Tier 1"
        recommendation = "GO - Highly validated target"
    elif total >= 60:
        tier = "Tier 2"
        recommendation = "CONDITIONAL GO - Needs focused validation"
    elif total >= 40:
        tier = "Tier 3"
        recommendation = "CAUTION - Significant validation needed"
    else:
        tier = "Tier 4"
        recommendation = "NO-GO - Consider alternatives"

    return total, tier, recommendation, score
Report Template
File: [TARGET]_[DISEASE]_validation_report.md
Drug Target Validation Report: [TARGET]
Target: [Gene Symbol] ([Full Name])
Disease Context: [Disease Name] (if provided)
Modality: [Small molecule / Antibody / etc.] (if specified)
Generated: [Date]
Status: In Progress
Executive Summary
Target Validation Score: [XX/100]
Priority Tier: [Tier X] - [Description]
Recommendation: [GO / CONDITIONAL GO / CAUTION / NO-GO]
Key Findings:
- [1-sentence disease association strength with evidence grade]
- [1-sentence druggability assessment]
- [1-sentence safety profile]
- [1-sentence clinical precedent]
Critical Risks:
- [Top risk 1]
- [Top risk 2]
Validation Scorecard
| Dimension | Score | Max | Assessment | Key Evidence |
|---|---|---|---|---|
| Disease Association | | 30 | | |
| - Genetic evidence | | 10 | | |
| - Literature evidence | | 10 | | |
| - Pathway evidence | | 10 | | |
| Druggability | | 25 | | |
| - Structural tractability | | 10 | | |
| - Chemical matter | | 10 | | |
| - Target class | | 5 | | |
| Safety Profile | | 20 | | |
| - Expression selectivity | | 5 | | |
| - Genetic validation | | 10 | | |
| - Known ADRs | | 5 | | |
| Clinical Precedent | | 15 | | |
| Validation Evidence | | 10 | | |
| - Functional studies | | 5 | | |
| - Disease models | | 5 | | |
| TOTAL | XX | 100 | [Tier] | |
1. Target Identity
[Researching...]
2. Disease Association Evidence
2.1 OpenTargets Disease Associations
[Researching...]
2.2 GWAS Genetic Evidence
[Researching...]
2.3 Constraint Scores (gnomAD)
[Researching...]
2.4 Literature Evidence
[Researching...]
3. Druggability Assessment
3.1 Tractability (OpenTargets)
[Researching...]
3.2 Target Classification
[Researching...]
3.3 Structural Tractability
[Researching...]
3.4 Chemical Probes & Enabling Packages
[Researching...]
4. Known Modulators & Chemical Matter
4.1 Approved/Clinical Drugs
[Researching...]
4.2 ChEMBL Bioactivity
[Researching...]
4.3 BindingDB Ligands
[Researching...]
4.4 PubChem Bioassays
[Researching...]
4.5 Chemical Probes
[Researching...]
5. Clinical Precedent
5.1 FDA-Approved Drugs
[Researching...]
5.2 Clinical Trial Landscape
[Researching...]
5.3 Failed Programs & Lessons
[Researching...]
6. Safety & Toxicity Profile
6.1 OpenTargets Safety Liabilities
[Researching...]
6.2 Expression in Critical Tissues
[Researching...]
6.3 Knockout Phenotypes
[Researching...]
6.4 Known Adverse Events
[Researching...]
6.5 Paralog & Off-Target Risks
[Researching...]
7. Pathway Context & Network Analysis
7.1 Biological Pathways
[Researching...]
7.2 Protein-Protein Interactions
[Researching...]
7.3 Functional Enrichment
[Researching...]
7.4 Pathway Redundancy Assessment
[Researching...]
8. Validation Evidence
8.1 Target Essentiality (DepMap)
[Researching...]
8.2 Functional Studies
[Researching...]
8.3 Animal Models
[Researching...]
8.4 Biomarker Potential
[Researching...]
9. Structural Insights
9.1 Experimental Structures (PDB)
[Researching...]
9.2 AlphaFold Prediction
[Researching...]
9.3 Binding Pocket Analysis
[Researching...]
9.4 Domain Architecture
[Researching...]
10. Literature Landscape
10.1 Publication Metrics
[Researching...]
10.2 Key Publications
[Researching...]
10.3 Research Trend
[Researching...]
11. Validation Roadmap
11.1 Recommended Validation Experiments
[Researching...]
11.2 Tool Compounds for Testing
[Researching...]
11.3 Biomarker Strategy
[Researching...]
11.4 Clinical Biomarker Candidates
[Researching...]
11.5 Disease Models to Test
[Researching...]
12. Risk Assessment
12.1 Key Risks
[Researching...]
12.2 Mitigation Strategies
[Researching...]
12.3 Competitive Landscape
[Researching...]
13. Completeness Checklist
[To be populated post-audit...]
14. Data Sources & Methodology
[Will be populated as research progresses...]
Completeness Checklist (MANDATORY)
Before finalizing, verify:
13. Completeness Checklist
Phase Coverage
- Phase 0: Target disambiguation (all IDs resolved)
- Phase 1: Disease association (OT + GWAS + gnomAD + literature)
- Phase 2: Druggability (tractability + class + structure + probes)
- Phase 3: Chemical matter (ChEMBL + BindingDB + PubChem + drugs)
- Phase 4: Clinical precedent (FDA + trials + failures)
- Phase 5: Safety (OT safety + expression + KO + ADRs + paralogs)
- Phase 6: Pathway context (Reactome + STRING + GO)
- Phase 7: Validation evidence (DepMap + literature + models)
- Phase 8: Structural insights (PDB + AlphaFold + pockets + domains)
- Phase 9: Literature (collision-aware + metrics + key papers)
- Phase 10: Validation roadmap (score + recommendations)
Data Quality
- All scores justified with specific data
- Evidence grades (T1-T4) assigned to key claims
- Negative results documented (not left blank)
- Failed tools with fallbacks documented
- Source citations for all data points
Scoring
- All 12 score components calculated
- Total score summed correctly
- Priority tier assigned
- GO/NO-GO recommendation justified
Fallback Chains
| Primary Tool | Fallback 1 | Fallback 2 | If All Fail |
|---|---|---|---|
| OpenTargets_get_diseases_phenotypes_* | CTD_get_gene_diseases | PubMed search | Note in report |
| GTEx_get_median_gene_expression (versioned) | GTEx (unversioned) | HPA_search_genes_by_query | Document gap |
| ChEMBL_get_target_activities | BindingDB_get_ligands_by_uniprot | DGIdb_get_gene_info | Note in report |
| gnomad_get_gene_constraints | OpenTargets_get_target_constraint_info_* | - | Note as unavailable |
| Reactome_map_uniprot_to_pathways | OpenTargets_get_target_gene_ontology_* | - | Use GO only |
| STRING_get_protein_interactions | intact_get_interactions | OpenTargets interactions | Note in report |
| ProteinsPlus_predict_binding_sites | alphafold_get_prediction | Literature pockets | Note as limited |
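A fallback chain can be expressed as an ordered list of callables that is walked until one returns data. The sketch below is a generic pattern, not part of ToolUniverse: `first_nonempty` is a hypothetical helper, and the stub callables stand in for real tool calls. The returned log supports the checklist item on documenting failed tools and their fallbacks.

```python
def first_nonempty(steps):
    """Try (name, zero-argument callable) pairs in order.

    Returns (name, result, log) for the first call that returns a truthy
    result, or (None, None, log) if every step fails or comes back empty.
    """
    log = []
    for name, call in steps:
        try:
            result = call()
        except Exception as exc:      # a failed tool should not abort the chain
            log.append(f"{name}: failed ({exc})")
            continue
        if result:
            log.append(f"{name}: ok")
            return name, result, log
        log.append(f"{name}: empty")
    return None, None, log


def _timeout_stub():
    # Stand-in for a tool call that raises
    raise RuntimeError("timeout")


used, data, log = first_nonempty([
    ("GTEx versioned", _timeout_stub),
    ("GTEx unversioned", lambda: []),
    ("HPA search", lambda: [{"gene": "EGFR"}]),
])
print(used)
print(log)
```

In real use each lambda would wrap one `tu.tools` call with its arguments bound, so the chain itself stays independent of any particular tool signature.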
Modality-Specific Considerations
Small Molecule Focus
- Emphasize: binding pockets, ChEMBL compounds, Lipinski compliance
- Key tractability: OpenTargets SM tractability bucket
- Structure: co-crystal structures with small molecule ligands
- Chemical matter: IC50/Ki/Kd data from ChEMBL/BindingDB
Antibody Focus
- Emphasize: extracellular domains, cell surface expression, glycosylation
- Key tractability: OpenTargets AB tractability bucket
- Structure: ectodomain structures, epitope mapping
- Expression: surface expression in disease vs normal tissue
PROTAC Focus
- Emphasize: intracellular targets, surface lysines, E3 ligase proximity
- Key tractability: OpenTargets PROTAC tractability
- Structure: full-length structures for linker design
- Chemical matter: known binders + E3 ligase binders
Quick Reference: Verified Tool Parameters
| Tool | Parameters | Notes |
|---|---|---|
| ensembl_lookup_gene | gene_id, species | species="homo_sapiens" REQUIRED; response wrapped in {status, data, url, content_type} |
| OpenTargets_get_*_by_ensemblID | ensemblId | camelCase, NOT ensemblID |
| OpenTargets_get_publications_by_target_ensemblID | entityId | NOT ensemblId |
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId, size | size is REQUIRED |
| OpenTargets_target_disease_evidence | efoId, ensemblId | Both REQUIRED |
| GTEx_get_median_gene_expression | operation, gencode_id | operation="median" REQUIRED |
| HPA_get_rna_expression_by_source | gene_name, source_type, source_name | ALL 3 required |
| PubMed_search_articles | query, limit | Returns plain list, NOT {articles:[]} |
| UniProt_get_function_by_accession | accession | Returns list of strings |
| alphafold_get_prediction | qualifier | NOT uniprot_accession |
| drugbank_get_safety_* | query, case_sensitive, exact_match, limit | ALL required |
| STRING_get_protein_interactions | protein_ids, species | protein_ids is array; species=9606 |
| Reactome_map_uniprot_to_pathways | id | NOT uniprot_id |
| ChEMBL_get_target_activities | target_chembl_id__exact | Note double underscore |
| search_clinical_trials | query_term | REQUIRED parameter |
| gnomad_get_gene_constraints | gene_symbol | NOT gene_id |
| DepMap_get_gene_dependencies | gene_symbol | NOT gene_id |
| BindingDB_get_ligands_by_uniprot | uniprot, affinity_cutoff | affinity in nM |
| Pharos_get_target | gene or uniprot | Both optional but need one |
Example Execution: EGFR for NSCLC
Phase 0 Result
- Symbol: EGFR, Ensembl: ENSG00000146648, UniProt: P00533, ChEMBL: CHEMBL203
Expected Scores (EGFR for NSCLC)
- Disease Association: ~28/30 (strong genetic + pathway + literature)
- Druggability: ~24/25 (kinase, many structures, abundant compounds)
- Safety: ~14/20 (widely expressed but manageable toxicity)
- Clinical Precedent: 15/15 (multiple approved drugs)
- Validation Evidence: ~9/10 (extensive functional data)
- Total: ~90/100 = Tier 1
Example for Novel Target (e.g., understudied kinase)
- Disease Association: ~8/30 (limited GWAS, few publications)
- Druggability: ~15/25 (kinase family bonus, AlphaFold structure)
- Safety: ~12/20 (limited data, unknown KO phenotype)
- Clinical Precedent: 0/15 (no clinical development)
- Validation Evidence: ~2/10 (minimal functional data)
- Total: ~37/100 = Tier 4
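The two worked examples can be checked by summing the dimension scores and applying the tier thresholds from the Priority Tiers table. The sketch below uses the approximate expected scores listed above; `tier_for` is a hypothetical helper name.

```python
def tier_for(total):
    """Map a 0-100 total to its priority tier."""
    if total >= 80:
        return "Tier 1"
    if total >= 60:
        return "Tier 2"
    if total >= 40:
        return "Tier 3"
    return "Tier 4"


# Approximate dimension scores taken from the two examples above
examples = {
    "EGFR for NSCLC": {"disease": 28, "druggability": 24, "safety": 14,
                       "clinical": 15, "validation": 9},
    "Understudied kinase": {"disease": 8, "druggability": 15, "safety": 12,
                            "clinical": 0, "validation": 2},
}

results = {}
for label, parts in examples.items():
    total = sum(parts.values())
    results[label] = (total, tier_for(total))
    print(f"{label}: {total}/100 -> {tier_for(total)}")
# EGFR for NSCLC: 90/100 -> Tier 1
# Understudied kinase: 37/100 -> Tier 4
```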