Modern Structure Prediction

Predict protein structures using state-of-the-art machine learning models. This covers cloud APIs, local installations, and interpretation of results.

Model Comparison

Model Complexes Ligands Speed Access

AlphaFold3 Yes Yes Slow Server only (2025)

ESMFold No No Fast API or local

Chai-1 Yes Yes Moderate Local or API

Boltz-1 Yes Yes Moderate Local

ColabFold No* No Moderate Colab/local

*ColabFold can predict complexes with AlphaFold-Multimer.

ESMFold (Fastest Single-Chain)

Via ESM Atlas API

import requests

def predict_esmfold(sequence): '''Predict structure using ESMFold API''' url = 'https://api.esmatlas.com/foldSequence/v1/pdb/' response = requests.post(url, data=sequence, timeout=300) if response.status_code == 200: return response.text raise Exception(f'ESMFold failed: {response.status_code}')

sequence = 'MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH' pdb_text = predict_esmfold(sequence) with open('predicted.pdb', 'w') as f: f.write(pdb_text)

Local ESMFold

import torch import esm

def predict_esmfold_local(sequence, device='cuda'): '''Run ESMFold locally (requires ~16GB GPU memory)''' model = esm.pretrained.esmfold_v1() model = model.eval().to(device)

with torch.no_grad():
    output = model.infer_pdb(sequence)
return output

Extract pLDDT from ESMFold output

def extract_esmfold_plddt(pdb_text): plddt = {} for line in pdb_text.split('\n'): if line.startswith('ATOM') and line[12:16].strip() == 'CA': resnum = int(line[22:26]) bfactor = float(line[60:66]) plddt[resnum] = bfactor return plddt

AlphaFold3 (Server)

AlphaFold3 predictions via the server at alphafoldserver.com.

Prepare Input JSON

import json

def create_af3_input(sequences, job_name='prediction'): '''Create AlphaFold3 server input JSON''' entities = [] for i, seq in enumerate(sequences): entities.append({ 'type': 'protein', 'sequence': seq, 'count': 1 })

job = {
    'name': job_name,
    'modelSeeds': [1],
    'sequences': entities
}
return json.dumps(job, indent=2)

Single protein

input_json = create_af3_input(['MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH'])

Protein complex

input_json = create_af3_input([ 'MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH', 'MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSS' ])

Process AF3 Results

import json from Bio.PDB import PDBParser import numpy as np

def analyze_af3_result(result_dir): '''Analyze AlphaFold3 prediction results''' # Load summary with open(f'{result_dir}/summary_confidences.json') as f: summary = json.load(f)

# Extract confidence metrics
iptm = summary.get('iptm', None)  # Interface pTM (complexes)
ptm = summary.get('ptm', None)    # Predicted TM-score
ranking = summary.get('ranking_score', None)

print(f'pTM: {ptm:.3f}' if ptm else 'pTM: N/A')
print(f'ipTM: {iptm:.3f}' if iptm else 'ipTM: N/A')

return summary

AF3 Confidence Interpretation

Metric Range Interpretation

pTM 0-1 Overall structure confidence

ipTM 0-1 Interface prediction quality

pLDDT 0-100 Per-residue confidence

PAE 0-30A Position error between residue pairs

Chai-1 (Local Open-Source)

Installation

pip install chai-lab

Basic Prediction

from chai_lab.chai1 import run_inference import numpy as np from pathlib import Path

def predict_chai1(fasta_path, output_dir='chai_output'): '''Run Chai-1 structure prediction''' Path(output_dir).mkdir(exist_ok=True)

candidates = run_inference(
    fasta_file=Path(fasta_path),
    output_dir=Path(output_dir),
    num_trunk_recycles=3,        # 3: Standard. Use 5+ for difficult targets.
    num_diffn_timesteps=200,     # 200: Standard. 500 for higher quality.
    seed=42,
    device='cuda:0'
)
return candidates

Candidates are sorted by confidence

candidates.cif files contain predicted structures

Chai-1 with Ligands

Chai-1 supports protein-ligand complexes

Include ligand SMILES in input FASTA with special format

def create_chai_fasta_with_ligand(protein_seq, ligand_smiles, output_file): '''Create Chai-1 input with protein and ligand''' with open(output_file, 'w') as f: f.write('>protein|chain_A\n') f.write(f'{protein_seq}\n') f.write('>ligand|chain_B\n') f.write(f'{ligand_smiles}\n')

Boltz-1 (Open-Source Complex Prediction)

Installation

pip install boltz

Basic Prediction

from boltz import Boltz1

def predict_boltz1(sequences, output_dir='boltz_output'): '''Run Boltz-1 structure prediction''' model = Boltz1()

result = model.predict(
    sequences=sequences,
    output_dir=output_dir,
    recycling_steps=3,   # 3: Standard. Increase for difficult targets.
    sampling_steps=200   # 200: Standard. 500 for publication quality.
)
return result

Boltz-1 for Complexes

Boltz-1 handles heteromeric complexes

def predict_complex_boltz(chain_sequences): '''Predict protein complex with Boltz-1''' model = Boltz1()

result = model.predict(
    sequences=chain_sequences,  # List of sequences for each chain
    output_dir='complex_output'
)

# Extract interface metrics
return result

ColabFold (AlphaFold2 + MMseqs2)

Command Line

Install ColabFold

pip install colabfold

Run prediction

colabfold_batch input.fasta output_dir/

With custom templates

colabfold_batch input.fasta output_dir/ --templates

For complexes (use : to separate chains)

Create FASTA like: >complex\nSEQUENCE1:SEQUENCE2

Python API

from colabfold.batch import run_colabfold

def predict_colabfold(fasta_file, output_dir, use_templates=False): '''Run ColabFold prediction''' run_colabfold( input_path=fasta_file, result_dir=output_dir, use_templates=use_templates, num_models=5, # 5: Standard. Use 1 for quick predictions. num_recycles=3, # 3: Standard. Increase for multimers. model_order=[1,2,3,4,5] )

Comparing Predictions

from Bio.PDB import PDBParser, Superimposer import numpy as np

def compare_predictions(pdb_files, labels=None): '''Compare multiple structure predictions''' parser = PDBParser(QUIET=True) structures = [parser.get_structure(f'model_{i}', f) for i, f in enumerate(pdb_files)]

# Extract CA atoms from first chain
def get_ca_atoms(struct):
    return [r['CA'] for r in struct[0].get_residues() if 'CA' in r]

all_atoms = [get_ca_atoms(s) for s in structures]

# Pairwise RMSD
n = len(structures)
rmsd_matrix = np.zeros((n, n))

for i in range(n):
    for j in range(i+1, n):
        min_len = min(len(all_atoms[i]), len(all_atoms[j]))
        super_imposer = Superimposer()
        super_imposer.set_atoms(all_atoms[i][:min_len], all_atoms[j][:min_len])
        rmsd_matrix[i,j] = rmsd_matrix[j,i] = super_imposer.rms

return rmsd_matrix

Compare ESMFold vs AlphaFold3 vs Chai-1

rmsd = compare_predictions(['esmfold.pdb', 'af3.pdb', 'chai1.pdb']) print('RMSD matrix:') print(rmsd)

When to Use Each Model

Scenario Recommended Model

Quick single-chain prediction ESMFold (API)

Highest accuracy single chain AlphaFold3 or ColabFold

Protein-protein complex AlphaFold3, Chai-1, or Boltz-1

Protein-ligand complex AlphaFold3 or Chai-1

No GPU available ESMFold API or AlphaFold3 server

Large-scale screening ESMFold (local)

Open-source requirement Chai-1 or Boltz-1

Memory Requirements

Model GPU Memory Notes

ESMFold ~16 GB Sequence length dependent

ColabFold ~8-16 GB Model size dependent

Chai-1 ~24 GB Complex size dependent

Boltz-1 ~24 GB Complex size dependent

Related Skills

alphafold-predictions - Download pre-computed AlphaFold structures
structure-io - Parse and write structure files
geometric-analysis - RMSD, superimposition, distance calculations
structure-navigation - Navigate predicted structure hierarchy

bio-structural-biology-modern-structure-prediction

Safety Notice

Copy this and send it to your AI assistant to learn