protein-assembly

Protein Assembly Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "protein-assembly" with this command: npx skills add letta-ai/skills/letta-ai-skills-protein-assembly

This skill provides structured guidance for designing fusion protein gBlock sequences that combine multiple protein components (antibody fragments, fluorescent proteins, enzyme domains) into a single optimized DNA construct.

When to Use This Skill

This skill applies to tasks that involve:

  • Designing fusion proteins from multiple sources (PDB, plasmids, protein databases)

  • Creating gBlock sequences with specific linker requirements

  • Codon optimization for GC content constraints

  • Combining fluorescent proteins with specific excitation/emission wavelengths

  • Assembling multi-domain proteins with N-terminal methionine removal

Structured Approach

Phase 1: Information Gathering and Cataloging

Objective: Collect ALL required sequence data before any design work begins.

Inventory input files completely

  • Read ALL input files in their entirety (avoid truncated reads)

  • For GenBank (.gb) files, parse the complete file to extract CDS/protein sequences

  • For FASTA files, extract all sequences with their identifiers

  • For PDB ID lists, note all IDs for batch retrieval
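
The FASTA step above can be sketched as follows. This is a minimal illustration using only the standard library; the function name parse_fasta is hypothetical, and it assumes the whole file has already been read into a string (no truncation).

```python
def parse_fasta(text):
    """Return {identifier: sequence} for every record in FASTA-formatted text."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header with no identifier")
            # The identifier is the first whitespace-delimited token of the header.
            current_id = header[0]
            if current_id in records:
                raise ValueError(f"duplicate identifier: {current_id}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[current_id].append(line)
    # Join wrapped sequence lines into one string per record.
    return {name: "".join(parts) for name, parts in records.items()}
```

Raising on duplicate identifiers, rather than silently overwriting, keeps the inventory honest when two inputs reuse the same label.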

Fetch external sequences systematically

  • Query the PDB API for each PDB ID to retrieve its amino acid sequences

  • Query relevant protein databases (e.g., FPbase for fluorescent proteins)

  • Document each retrieved sequence with its source and identifier

Create a sequence catalog

  • List all available protein sequences with clear labels

  • Note the source of each sequence (PDB ID, plasmid CDS, database)

  • Identify any missing sequences before proceeding
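
One possible shape for such a catalog is sketched below. The CatalogEntry class, its field names, and the missing_entries helper are hypothetical names chosen for illustration, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    label: str          # human-readable name used in the design notes
    source: str         # where the sequence came from (PDB ID, plasmid CDS, database)
    sequence: str = ""  # stays empty until the sequence has been retrieved


def missing_entries(catalog):
    """Return the labels of entries whose sequence has not been filled in yet."""
    return [entry.label for entry in catalog if not entry.sequence]
```

Checking that missing_entries returns an empty list is one way to decide that Phase 1 is finished.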

Phase 2: Protein Identification and Selection

Objective: Match proteins to task requirements using specific criteria.

Wavelength matching for fluorescent proteins

  • Search for proteins with exact wavelength matches (not approximate)

  • Verify both excitation AND emission peaks against requirements

  • Document the selected donor and acceptor proteins with rationale
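
Exact matching on both peaks can be expressed as a simple filter. In this sketch the record layout (keys name, ex_max, em_max) is an assumption about how the retrieved data has been stored locally, and select_exact is a hypothetical helper name.

```python
def select_exact(candidates, excitation_nm, emission_nm):
    """Return names of candidates whose excitation AND emission peaks both match exactly.

    Each candidate is assumed to be a dict with 'name', 'ex_max' and 'em_max' keys
    (wavelengths in nm); this layout is illustrative only.
    """
    return [
        c["name"]
        for c in candidates
        if c["ex_max"] == excitation_nm and c["em_max"] == emission_nm
    ]
```

A candidate that matches only one of the two peaks is rejected, which is the behavior the bullets above ask for.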

Binding domain identification

  • Identify proteins that bind specific molecules (substrates, ligands)

  • Cross-reference PDB entries with known binding partners

  • Verify binding capability through database annotations

Target protein identification

  • For antibody-related tasks, identify the target antigen

  • Use sequence homology or database lookups as needed

  • Document the identification method and confidence

Phase 3: Sequence Processing

Objective: Prepare individual protein sequences for fusion.

N-terminal methionine handling

  • Remove the N-terminal methionine (the leading M) from every protein that is not first in the fusion

  • Keep the N-terminal methionine only on the first protein, and only if the task requires it

  • Document which sequences were modified
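
A minimal sketch of this step, assuming the proteins are held as (label, sequence) pairs in fusion order. The function name and the keep_first_met flag are hypothetical.

```python
def strip_internal_met(proteins, keep_first_met=True):
    """Remove a leading 'M' from every protein except, optionally, the first.

    proteins: list of (label, sequence) tuples in fusion order.
    Returns (processed_list, labels_of_modified_sequences).
    """
    processed = []
    modified = []
    for index, (label, seq) in enumerate(proteins):
        keep = index == 0 and keep_first_met
        if seq.startswith("M") and not keep:
            seq = seq[1:]           # drop only the single initiator residue
            modified.append(label)  # record the change for documentation
        processed.append((label, seq))
    return processed, modified
```

Returning the list of modified labels makes the "document which sequences were modified" bullet a by-product of the processing itself.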

Sequence validation

  • Verify each sequence is complete and valid

  • Check for unusual amino acids or sequence artifacts

  • Confirm sequences match expected lengths
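
These checks could look like the sketch below. It treats anything outside the 20 standard one-letter amino acid codes as unusual, which is an assumption; validate_protein and expected_length are hypothetical names.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid codes


def validate_protein(seq, expected_length=None):
    """Return a list of human-readable problems; an empty list means the sequence passed."""
    problems = []
    if not seq:
        problems.append("empty sequence")
    # Collect every character that is not a standard residue (X, *, gaps, lowercase, ...).
    bad = sorted({ch for ch in seq if ch not in CANONICAL})
    if bad:
        problems.append("non-canonical characters: " + "".join(bad))
    if expected_length is not None and len(seq) != expected_length:
        problems.append(f"length {len(seq)} != expected {expected_length}")
    return problems
```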

Phase 4: Fusion Protein Assembly

Objective: Construct the complete fusion protein sequence.

Follow the specified protein order exactly

  • Do not deviate from the required arrangement

  • Document the order: [Protein1]-[Linker]-[Protein2]-[Linker]-...

Design appropriate linkers

  • Use GS (Glycine-Serine) linkers of specified length

  • Common patterns: (GGGGS)n or (GS)n, where the repeat count n is chosen to reach the required length

  • Ensure linkers fall within length constraints (e.g., 5-20 amino acids)

Assemble the complete protein sequence

  • Concatenate proteins with linkers in correct order

  • Verify the assembled sequence is continuous and valid
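
Linker construction and assembly can be sketched as below. The 5-20 residue bounds are the example values from the text, used here as defaults; gs_linker and assemble_fusion are hypothetical names, and the sketch assumes the same linker is used at every junction.

```python
def gs_linker(repeats, unit="GGGGS", min_len=5, max_len=20):
    """Build a Gly/Ser linker by repeating `unit`, enforcing composition and length limits."""
    linker = unit * repeats
    if set(linker) - {"G", "S"}:
        raise ValueError("linker must contain only G and S")
    if not min_len <= len(linker) <= max_len:
        raise ValueError(f"linker length {len(linker)} outside {min_len}-{max_len}")
    return linker


def assemble_fusion(proteins, linker):
    """Join protein sequences, in the given order, with `linker` between each adjacent pair."""
    if not proteins or any(not p for p in proteins):
        raise ValueError("every component must be a non-empty sequence")
    return linker.join(proteins)
```

For example, assemble_fusion(["MKT", "AAA"], gs_linker(2)) yields "MKTGGGGSGGGGSAAA".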

Phase 5: Codon Optimization and DNA Generation

Objective: Convert protein to optimized DNA sequence.

Initial codon translation

  • Convert each amino acid to a codon

  • Use a standard codon table for the target organism
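
A minimal back-translation sketch follows. It builds the standard genetic code (NCBI translation table 1) and, absent other information, picks the first listed codon for each amino acid. That default is an arbitrary placeholder, not an organism-specific preference; the optional preferred mapping is a hypothetical hook for supplying one.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def back_translate(protein, preferred=None):
    """Convert a protein sequence to DNA, one codon per residue.

    preferred: optional {amino_acid: codon} overrides (e.g. from a usage table
    for the target organism); each override must encode the right residue.
    """
    preferred = preferred or {}
    codons = []
    for aa in protein:
        if aa == "*" or aa not in SYNONYMS:
            raise ValueError(f"cannot back-translate residue {aa!r}")
        codon = preferred.get(aa, SYNONYMS[aa][0])
        if CODON_TO_AA.get(codon) != aa:
            raise ValueError(f"codon {codon!r} does not encode {aa!r}")
        codons.append(codon)
    return "".join(codons)


def translate(dna):
    """Translate DNA back to protein, used to confirm the round trip."""
    if len(dna) % 3:
        raise ValueError("DNA length is not a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))
```

Translating the result back and comparing it with the input protein is a cheap guard against table or indexing mistakes.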

GC content optimization

  • Calculate GC content in sliding windows (e.g., 50 nucleotides)

  • Identify windows outside acceptable range (e.g., 30-70%)

  • Swap synonymous codons to bring GC content within range

  • Re-verify after each swap: the encoded protein must be unchanged, and overlapping windows must still be within range
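
The window check and synonymous-swap loop above can be sketched as a greedy search. This is one possible approach under stated assumptions, not a prescribed algorithm: the penalty function, the greedy strategy, and all names are illustrative, and the defaults (50-nt windows, 30-70% GC) are the example values from the text. It rescans every window for every trial swap, so it is only suitable for short sequences or as a reference for a faster implementation.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_penalty(dna, window=50, low=0.30, high=0.70):
    """Sum, over ALL sliding windows, of how far each window's GC lies outside [low, high]."""
    span = min(window, len(dna))  # sequences shorter than one window form a single window
    total = 0.0
    for start in range(len(dna) - span + 1):
        gc = gc_fraction(dna[start:start + span])
        total += max(low - gc, 0.0, gc - high)
    return total


def rebalance_gc(dna, window=50, low=0.30, high=0.70):
    """Greedily swap synonymous codons until every window is within range."""
    if not dna or len(dna) % 3:
        raise ValueError("DNA must be non-empty with length a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    penalty = gc_penalty(dna, window, low, high)
    while penalty > 0:
        best = None
        for idx, current in enumerate(codons):
            for alt in SYNONYMS[CODON_TO_AA[current]]:
                if alt == current:
                    continue
                trial = "".join(codons[:idx] + [alt] + codons[idx + 1:])
                trial_penalty = gc_penalty(trial, window, low, high)
                # Re-verify after each candidate swap: keep only strict improvements.
                if trial_penalty < penalty and (best is None or trial_penalty < best[0]):
                    best = (trial_penalty, idx, alt)
        if best is None:
            raise ValueError("no single synonymous swap improves GC content")
        penalty, idx, alt = best
        codons[idx] = alt
    return "".join(codons)
```

Because only synonymous codons are ever substituted, the protein is preserved by construction; comparing translate(result) with the original protein is still a worthwhile final check. The loop terminates because the penalty strictly decreases with every accepted swap.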

Length verification

  • Confirm DNA sequence meets length constraints (e.g., ≤3000 nt)

  • If too long, review design choices (linker lengths, protein selections)

Phase 6: Output Generation

Objective: Create the required output file(s).

Write output immediately after assembly

  • Do not delay output file creation

  • Write to the exact path specified in requirements

Include appropriate formatting

  • Follow any specified format (plain text, FASTA, etc.)

  • Include headers or metadata if required

Verify output file exists

  • Confirm the file was created successfully

  • Verify file contents match the designed sequence
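
Writing and then re-reading the file can be sketched as follows, assuming FASTA output is wanted. The function names, the header text, and the 70-character line width are illustrative choices; if the task specifies plain text or another format, the writer should be adapted accordingly.

```python
from pathlib import Path


def write_fasta(path, header, sequence, width=70):
    """Write a single-record FASTA file, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    path.write_text(">" + header + "\n" + "\n".join(lines) + "\n")
    return path


def verify_fasta(path, expected_sequence):
    """Return True only if the file exists and its sequence lines match exactly."""
    path = Path(path)
    if not path.is_file():
        return False
    written = "".join(
        line.strip()
        for line in path.read_text().splitlines()
        if not line.startswith(">")
    )
    return written == expected_sequence
```

Reading the file back, rather than trusting the write call, covers both checks in this phase: the file exists and its contents match the designed sequence.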

Verification Checkpoints

After Phase 1:

  • All input files read completely (no truncation)

  • All external sequences retrieved

  • Sequence catalog is complete

After Phase 2:

  • All required proteins identified

  • Wavelength/binding requirements verified

  • Selection rationale documented

After Phase 3:

  • N-terminal methionines handled correctly

  • All sequences validated

After Phase 4:

  • Protein order matches requirements

  • Linkers meet length constraints

  • Complete fusion sequence assembled

After Phase 5:

  • GC content within range in ALL windows

  • DNA length within constraints

After Phase 6:

  • Output file exists at specified path

  • File contents are correct

Common Pitfalls

Incomplete file reading

  • GenBank files may be large; ensure complete parsing

  • Extract CDS translations, not just raw sequences

Approximate wavelength matching

  • Use exact values, not "close enough" matches

  • Verify both excitation AND emission, not just one

Forgetting N-terminal methionines

  • Internal proteins in fusions should have Met removed

  • Only the first protein retains its N-terminal Met

Ignoring GC content windows

  • Check ALL sliding windows, not just overall GC%

  • Optimize problematic regions with synonymous codons

Delayed output generation

  • Create output file as soon as sequence is ready

  • Do not continue gathering information after design is complete

Information gathering loops

  • Set a clear stopping point for research

  • Once that point is reached, progress to execution even if some information is still incomplete

  • A partial solution is better than no solution

Output-First Strategy

If time or resources are constrained:

  • Create the output file early, even with placeholders

  • Update the file as each component is determined

  • Ensure a valid (if imperfect) output exists at task end

This ensures the primary deliverable exists, which can be refined with additional information.
