chai

Structure prediction using Chai-1, a foundation model for molecular structure. Use this skill when: (1) predicting protein-protein complex structures, (2) validating designed binders, (3) predicting protein-ligand complexes, (4) using the Chai API for high-throughput prediction, or (5) you need an alternative to AlphaFold2. For QC thresholds, use protein-qc. For AlphaFold2 prediction, use alphafold. For ESM-based analysis, use esm.

Install skill "chai" with this command: npx skills add adaptyvbio/protein-design-skills/adaptyvbio-protein-design-skills-chai

Chai-1 Structure Prediction

Prerequisites

Requirement | Minimum | Recommended
Python | 3.10+ | 3.11
CUDA | 12.0+ | 12.1+
GPU VRAM | 24GB | 40GB (A100)
RAM | 32GB | 64GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Modal

cd biomodals
modal run modal_chai1.py \
  --input-faa complex.fasta \
  --out-dir predictions/

GPU: A100 (40GB) | Timeout: 30min default

Option 2: Chai API (recommended)

pip install chai_lab

python -c "
from pathlib import Path
from chai_lab.chai1 import run_inference

# Run prediction (run_inference expects Path objects, not plain strings)
run_inference(
    fasta_file=Path('complex.fasta'),
    output_dir=Path('predictions/'),
    num_trunk_recycles=3,
)
"

Option 3: Local installation

git clone https://github.com/chaidiscovery/chai-lab.git
cd chai-lab
pip install -e .

chai-lab fold complex.fasta predictions/

FASTA Format

Protein complex

>protein|name=binder
MKTAYIAKQRQISFVKSHFSRQLE...
>protein|name=target
MVLSPADKTNVKAAWGKVGAHAGE...

Protein + ligand

>protein
MKTAYIAKQRQISFVKSHFSRQLE...
>ligand|smiles
CCO

Protein + DNA/RNA

>protein
MKTAYIAKQRQISFVKSHFSRQLE...
>dna
ATCGATCGATCG
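
When inputs are generated programmatically (for example, one FASTA per designed binder), it can help to assemble the file from records rather than by string concatenation. The following is a minimal sketch using only the standard library; `build_fasta` is a hypothetical helper, not part of chai_lab, and it writes headers through unchanged, so whichever header convention you use is preserved.

```python
def build_fasta(records):
    """Return FASTA text for an iterable of (header, sequence) pairs.

    Headers are written unchanged after '>'. Sequences must be non-empty
    and contain no whitespace (one line per entry, as in the examples).
    """
    lines = []
    for header, sequence in records:
        header = header.strip().lstrip(">")
        sequence = sequence.strip()
        if not header:
            raise ValueError("empty FASTA header")
        if not sequence or any(ch.isspace() for ch in sequence):
            raise ValueError(f"invalid sequence for entry {header!r}")
        lines.append(f">{header}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


# Illustrative sequences only (shortened placeholders, not real inputs)
text = build_fasta([("protein", "MKTAYIAKQR"), ("dna", "ATCGATCGATCG")])
print(text, end="")
```

Write the returned text to disk with `Path("complex.fasta").write_text(text)` before running a prediction.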

Key parameters

Parameter | Default | Range | Description
num_trunk_recycles | 3 | 1-10 | Trunk recycling iterations; more can improve accuracy but takes longer
num_diffn_timesteps | 200 | 50-500 | Diffusion steps; fewer is faster
seed | 0 | int | Random seed
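
To catch out-of-range settings before a long GPU job starts, the ranges in the table can be checked up front. This is a sketch only: `ChaiParams` is a hypothetical container, and the bounds are taken from the table here rather than enforced by chai_lab itself.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ChaiParams:
    """Hypothetical holder for the three parameters listed in the table."""

    num_trunk_recycles: int = 3
    num_diffn_timesteps: int = 200
    seed: int = 0

    def __post_init__(self):
        # Ranges below mirror the table; they are not read from chai_lab.
        if not 1 <= self.num_trunk_recycles <= 10:
            raise ValueError("num_trunk_recycles must be in 1-10")
        if not 50 <= self.num_diffn_timesteps <= 500:
            raise ValueError("num_diffn_timesteps must be in 50-500")
        if not isinstance(self.seed, int):
            raise ValueError("seed must be an integer")


# A faster configuration: fewer diffusion steps, other defaults unchanged
fast = ChaiParams(num_diffn_timesteps=100)
print(asdict(fast))
```

The resulting dict can be unpacked as keyword arguments, e.g. `run_inference(..., **asdict(fast))`.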

Output format

predictions/
├── pred.model_idx_0.cif    # Best model (CIF format)
├── pred.model_idx_1.cif    # Second model
├── scores.json             # Confidence scores
├── pae.npy                 # PAE matrix
└── plddt.npy               # pLDDT values

Note: Chai-1 outputs CIF format. Convert to PDB if needed:

from Bio.PDB import MMCIFParser, PDBIO
parser = MMCIFParser()
structure = parser.get_structure("pred", "pred.model_idx_0.cif")
io = PDBIO()
io.set_structure(structure)
io.save("pred.model_idx_0.pdb")

Extracting metrics

import numpy as np
import json

# Load scores
with open('predictions/scores.json') as f:
    scores = json.load(f)

plddt = np.load('predictions/plddt.npy')
pae = np.load('predictions/pae.npy')

print(f"pLDDT: {plddt.mean():.3f}")
print(f"pTM: {scores['ptm']:.3f}")
print(f"ipTM: {scores.get('iptm', 'N/A')}")
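
For single-chain runs the `iptm` key may be absent, so a summary step is easier to reuse if it treats missing values explicitly. The sketch below assumes the `scores.json` keys shown in this document (`ptm`, `iptm`, `ranking_score`); `summarize` and `format_summary` are hypothetical helpers, and the pLDDT values are illustrative.

```python
from statistics import fmean


def summarize(scores, plddt_values):
    """Collect headline metrics; optional keys become None when absent."""
    return {
        "mean_plddt": fmean(plddt_values),
        "ptm": scores["ptm"],
        "iptm": scores.get("iptm"),  # missing for single-chain predictions
        "ranking_score": scores.get("ranking_score"),
    }


def format_summary(summary):
    """Render the summary on one line, printing N/A for missing metrics."""
    parts = []
    for key, value in summary.items():
        parts.append(f"{key}=" + ("N/A" if value is None else f"{value:.3f}"))
    return " ".join(parts)


example_scores = {"ptm": 0.82, "iptm": 0.71, "ranking_score": 0.76}
print(format_summary(summarize(example_scores, [0.9, 0.8, 0.7])))
```

With NumPy arrays, pass `plddt.ravel().tolist()` as `plddt_values`.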

Use cases

Binder validation

# Predict complex with Chai (shell)
chai-lab fold binder_target.fasta val/

# Check ipTM > 0.5 (Python)
import json

with open('val/scores.json') as f:
    scores = json.load(f)
if scores['iptm'] > 0.5:
    print("Design passes validation")

Protein-ligand complex

# FASTA with SMILES
fasta = """
>protein
MKTA...
>ligand|smiles
CCO
"""

# Chai handles both protein and small molecules

Batch prediction

# Multiple sequences
for fasta in sequences/*.fasta; do
    chai-lab fold "$fasta" "predictions/$(basename "$fasta" .fasta)"
done
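
Long batches are often interrupted, so it can be useful to work out which inputs still need a prediction before relaunching. The sketch below assumes the layout used in this loop (one output directory per FASTA, named after the file) and treats the presence of `pred.model_idx_0.cif` as the completion marker; `plan_batch` is a hypothetical helper.

```python
from pathlib import Path


def plan_batch(fasta_dir, out_root, done_marker="pred.model_idx_0.cif"):
    """Return (fasta_path, out_dir) pairs that still need a prediction."""
    jobs = []
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        out_dir = Path(out_root) / fasta.stem
        if (out_dir / done_marker).exists():
            continue  # top-ranked model already written; skip on rerun
        jobs.append((fasta, out_dir))
    return jobs


# List the remaining work; prints nothing if every input is done
for fasta, out_dir in plan_batch("sequences", "predictions"):
    print(f"{fasta} -> {out_dir}")
```

Each pair can then be handed to whichever runner you use (Modal, the Python API, or the CLI).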

Comparison with AF2

Aspect | Chai-1 | AlphaFold2
MSA required | No | Yes
Small molecules | Yes | No
DNA/RNA | Yes | Limited
Speed | Faster | Slower
Accuracy | Comparable | Reference

Sample output

Successful run

$ chai-lab fold complex.fasta predictions/
[INFO] Loading Chai-1 model...
[INFO] Running inference...
[INFO] Saved 5 models to predictions/

predictions/scores.json:
{
  "ptm": 0.82,
  "iptm": 0.71,
  "ranking_score": 0.76
}

What good output looks like:

  • pTM: > 0.7 (confident global structure)
  • ipTM: > 0.5 (confident interface, > 0.7 for high confidence)
  • CIF files with reasonable atom positions
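
These thresholds can be applied mechanically when triaging many predictions. The following sketch encodes the cutoffs listed here (pTM > 0.7, ipTM > 0.5, ipTM > 0.7 for high confidence); `classify_prediction` and its labels are hypothetical, and project-specific cutoffs belong in protein-qc.

```python
def classify_prediction(scores, ptm_min=0.7, iptm_pass=0.5, iptm_high=0.7):
    """Label a scores dict using the pTM and ipTM cutoffs from this guide."""
    ptm = scores.get("ptm")
    iptm = scores.get("iptm")
    if ptm is None or ptm <= ptm_min:
        return "low global confidence"
    if iptm is None:
        return "no interface score"  # e.g. a single-chain prediction
    if iptm > iptm_high:
        return "high-confidence interface"
    if iptm > iptm_pass:
        return "confident interface"
    return "low interface confidence"


# Values taken from the sample scores.json in this section
print(classify_prediction({"ptm": 0.82, "iptm": 0.71, "ranking_score": 0.76}))
```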

Decision tree

Should I use Chai?
│
├─ What are you predicting?
│  ├─ Protein-protein complex → Chai ✓ or ColabFold
│  ├─ Protein + small molecule → Chai ✓
│  ├─ Protein + DNA/RNA → Chai ✓
│  └─ Single protein only → Use ESMFold (faster)
│
├─ Need MSA?
│  ├─ No / want speed → Chai ✓
│  └─ Yes / want accuracy → ColabFold
│
└─ Priority?
   ├─ Highest accuracy → ColabFold with MSA
   ├─ Speed / no MSA → Chai ✓
   └─ Ligand binding → Chai ✓
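
The tree can also be written as a small function, which is convenient when routing many targets in a pipeline. This is one possible reading of the branches, not a definitive rule: `recommend_tool` and its string labels are hypothetical, and the MSA and priority questions are folded into a single `priority` argument.

```python
def recommend_tool(target, priority="speed"):
    """Suggest a predictor following the decision tree in this section.

    target:   'single-protein', 'protein-protein', 'protein-ligand',
              or 'protein-nucleic-acid'
    priority: 'speed' (no MSA) or 'accuracy' (MSA-based)
    """
    if target == "single-protein":
        return "ESMFold"
    if target in ("protein-ligand", "protein-nucleic-acid"):
        return "Chai"
    if target == "protein-protein":
        return "ColabFold with MSA" if priority == "accuracy" else "Chai"
    raise ValueError(f"unknown target type: {target!r}")


print(recommend_tool("protein-protein", priority="accuracy"))
print(recommend_tool("protein-ligand"))
```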

Typical performance

Campaign size | Time (A100) | Cost (Modal) | Notes
100 complexes | 30-60 min | ~$10 | Standard validation
500 complexes | 2-4 h | ~$45 | Large campaign
1000 complexes | 5-8 h | ~$90 | Comprehensive

Per complex: ~20-40 s for a typical binder-target complex.
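
For campaign sizes not listed in the table, a rough wall-clock range follows from the per-complex figure. The sketch below assumes strictly sequential runs on one GPU at 20-40 s each and ignores model loading and queueing; `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(n_complexes, sec_low=20, sec_high=40):
    """Return (low, high) wall-clock minutes for a sequential campaign."""
    return (n_complexes * sec_low / 60, n_complexes * sec_high / 60)


for n in (100, 500, 1000):
    low, high = estimate_minutes(n)
    print(f"{n} complexes: {low:.0f}-{high:.0f} min")
```

For 100 complexes this gives roughly 33-67 min, in line with the first table row; the table's figures for larger campaigns come in somewhat below this linear estimate.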


Verify

find predictions -name "pred.model_idx_0.cif" | wc -l  # Should match input count (one top-ranked model per input)

Troubleshooting

Low pLDDT: Increase num_trunk_recycles
Low ipTM: Check chain order, interface region
OOM errors: Use A100-80GB or reduce batch
Slow prediction: Reduce num_diffn_timesteps

Error interpretation

Error | Cause | Fix
RuntimeError: CUDA out of memory | Complex too large | Use A100-80GB or split prediction
KeyError: 'iptm' | Single chain predicted | Ensure FASTA has multiple chains
ValueError: invalid SMILES | Malformed ligand | Validate SMILES with RDKit
torch.cuda.OutOfMemoryError | GPU exhausted | Reduce num_diffn_timesteps to 100
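
The `KeyError: 'iptm'` case can be caught before any GPU time is spent by counting chains in the input. A minimal pre-flight sketch, assuming one `>` header per chain as in the FASTA examples in this document; `count_chains` and `check_multichain` are hypothetical helpers.

```python
def count_chains(fasta_text):
    """Count FASTA entries (lines starting with '>')."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def check_multichain(fasta_text):
    """Raise if the input has fewer than two chains, so ipTM is undefined."""
    n = count_chains(fasta_text)
    if n < 2:
        raise ValueError(f"expected at least 2 chains for ipTM, found {n}")
    return n


# Illustrative two-chain input with shortened placeholder sequences
example = ">protein\nMKTAYIAKQR\n>dna\nATCGATCGATCG\n"
print(check_multichain(example))
```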

Next: protein-qc for filtering and ranking.
