BulkTrajBlend trajectory interpolation

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "bulktrajblend-trajectory-interpolation" with this command: npx skills add starlitnightly/omicverse/starlitnightly-omicverse-bulktrajblend-trajectory-interpolation

Overview

Invoke this skill when users need to bridge gaps in single-cell developmental trajectories using matched bulk RNA-seq. It follows t_bulktrajblend.ipynb, showing how BulkTrajBlend deconvolves dentate gyrus bulk samples, identifies overlapping communities with a GNN, and interpolates "interrupted" cell states.

Instructions

  • Prepare libraries and inputs

  • Import omicverse as ov, scanpy as sc, scvelo as scv, and helper functions such as from omicverse.utils import mde; then run ov.plot_set().

  • Load the reference scRNA-seq AnnData (scv.datasets.dentategyrus()) and the raw bulk counts with ov.utils.read(...), followed by ov.bulk.Matrix_ID_mapping(...) for gene ID harmonisation.

  • Configure BulkTrajBlend

  • Instantiate ov.bulk2single.BulkTrajBlend(bulk_seq=bulk_df, single_seq=adata, bulk_group=['dg_d_1','dg_d_2','dg_d_3'], celltype_key='clusters').

  • Explain that bulk_group names correspond to raw bulk columns and the method expects unscaled counts.
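
The expectation of unscaled counts can be checked before the object is constructed. The sketch below is a heuristic only: looks_like_raw_counts is a hypothetical helper, not an omicverse function, and it assumes that raw counts are non-negative whole numbers while scaled or log-transformed values are not.

```python
from numbers import Real
from typing import Iterable


def looks_like_raw_counts(values: Iterable[Real]) -> bool:
    """Heuristic: raw counts are non-negative and integer-valued.

    Hypothetical helper, not part of omicverse. Scaled or log-transformed
    matrices usually contain fractions or negative values.
    """
    for v in values:
        if v < 0 or float(v) != int(v):
            return False
    return True


# Toy columns standing in for bulk_df['dg_d_1'] and a scaled variant.
raw_column = [0, 17, 254, 3.0]
scaled_column = [-0.8, 0.0, 1.3]
print(looks_like_raw_counts(raw_column))
print(looks_like_raw_counts(scaled_column))
```

With a pandas DataFrame, the same check could be applied per sample, for example all(looks_like_raw_counts(bulk_df[g]) for g in bulk_group).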

  • Set beta-VAE expectations

  • Call bulktb.vae_configure(cell_target_num=100) (or pass a dictionary) to define expected cell counts per cluster. Mention that omitting the argument triggers TAPE-based estimation.
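
When a dictionary is preferred over a single number, it can be built from the reference labels. This is a minimal sketch: build_cell_targets and its overrides argument are hypothetical names, and how the dictionary is handed to vae_configure should be checked against the installed omicverse version.

```python
from typing import Dict, Iterable, Optional


def build_cell_targets(
    cluster_labels: Iterable[str],
    default: int = 100,
    overrides: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Map every distinct cluster label to an expected cell count."""
    targets = {label: default for label in sorted(set(cluster_labels))}
    for label, count in (overrides or {}).items():
        if label not in targets:
            raise KeyError(f"Unknown cluster '{label}'")
        targets[label] = count
    return targets


# Toy labels standing in for adata.obs['clusters'].
labels = ["Granule mature", "OPC", "Granule mature", "Astrocytes"]
targets = build_cell_targets(labels, default=100, overrides={"OPC": 300})
print(targets)
# Assumption: the result is then passed on, e.g.
# bulktb.vae_configure(cell_target_num=targets)
```

Raising on an unknown label catches typos in cluster names before any training time is spent.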

  • Train or load the beta-VAE

  • Use bulktb.vae_train(batch_size=512, learning_rate=1e-4, hidden_size=256, epoch_num=3500, vae_save_dir='...', vae_save_name='dg_btb_vae', generate_save_dir='...', generate_save_name='dg_btb').

  • Highlight that training can be skipped by loading a saved checkpoint with bulktb.vae_load('.../dg_btb_vae.pth'), and that cells must then be regenerated with a fixed random seed for reproducibility.
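
The train-or-load decision can be wrapped in one helper. The sketch below uses only the vae_train and vae_load calls quoted above; train_or_load_vae and seed_everything are hypothetical helpers, and the checkpoint location (vae_save_dir joined with vae_save_name plus ".pth") is an assumption inferred from the example path. Whether seeding these generators is sufficient for identical generated cells has not been verified.

```python
import os
import random


def seed_everything(seed: int = 0) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def train_or_load_vae(bulktb, vae_save_dir, generate_save_dir,
                      vae_save_name="dg_btb_vae",
                      generate_save_name="dg_btb", seed=0):
    """Load the checkpoint if it exists, otherwise train; report which."""
    seed_everything(seed)
    # Assumption: vae_train writes <vae_save_dir>/<vae_save_name>.pth
    checkpoint = os.path.join(vae_save_dir, vae_save_name + ".pth")
    if os.path.exists(checkpoint):
        bulktb.vae_load(checkpoint)
        return "loaded"
    bulktb.vae_train(batch_size=512, learning_rate=1e-4, hidden_size=256,
                     epoch_num=3500, vae_save_dir=vae_save_dir,
                     vae_save_name=vae_save_name,
                     generate_save_dir=generate_save_dir,
                     generate_save_name=generate_save_name)
    return "trained"
```

Returning "loaded" or "trained" makes it easy to log which path a run took.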

  • Generate synthetic cells

  • Produce filtered AnnData via bulktb.vae_generate(leiden_size=25) and inspect compositions with ov.bulk2single.bulk2single_plot_cellprop(...).

  • Save outputs to disk for reuse (adata.write_h5ad).
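
To see why leiden_size affects rare populations, it helps to look at a size filter in isolation. The following is an illustration of the general idea, assuming the filter drops Leiden clusters with fewer cells than the threshold. It is not the library's implementation, and keep_large_clusters is a hypothetical name.

```python
from collections import Counter
from typing import List, Sequence


def keep_large_clusters(cluster_ids: Sequence[str], min_size: int) -> List[bool]:
    """Keep-mask: True where the cell's cluster has at least min_size cells."""
    sizes = Counter(cluster_ids)
    return [sizes[c] >= min_size for c in cluster_ids]


# Toy Leiden labels for seven generated cells.
leiden = ["0", "0", "0", "1", "1", "2", "0"]
mask = keep_large_clusters(leiden, min_size=2)
kept = [c for c, keep in zip(leiden, mask) if keep]
print(Counter(kept))  # cluster "2" has a single cell and is filtered out
```

Lowering min_size to 1 in this toy example keeps the single-cell cluster, which mirrors how a smaller leiden_size retains rare states at the cost of more noise.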

  • Configure and train the GNN

  • Call bulktb.gnn_configure(max_epochs=2000, use_rep='X', neighbor_rep='X_pca', gpu=0, ...) to set hyperparameters.

  • Train using bulktb.gnn_train(); reload checkpoints with bulktb.gnn_load('save_model/gnn.pth').

  • Generate overlapping community assignments through bulktb.gnn_generate().
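
"Overlapping" means that a cell may belong to more than one community. The sketch below illustrates that idea with made-up membership scores and a threshold. It does not reproduce what gnn_generate does internally, and overlapping_assignments is a hypothetical helper.

```python
from typing import Dict, List


def overlapping_assignments(
    memberships: Dict[str, List[float]], threshold: float = 0.5
) -> Dict[str, List[int]]:
    """Assign each cell to every community whose score reaches the threshold."""
    return {
        cell: [k for k, score in enumerate(scores) if score >= threshold]
        for cell, scores in memberships.items()
    }


# Toy membership scores for three cells over three communities.
scores = {
    "cell_a": [0.9, 0.1, 0.0],  # a single community
    "cell_b": [0.6, 0.7, 0.1],  # overlaps communities 0 and 1
    "cell_c": [0.2, 0.1, 0.3],  # below the threshold everywhere
}
communities = overlapping_assignments(scores, threshold=0.5)
print(communities)
```

Cells such as cell_b, which sit in two communities at once, are the candidates for bridging otherwise disconnected states.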

  • Visualise community structure

  • Create MDE embeddings: bulktb.nocd_obj.adata.obsm['X_mde'] = mde(bulktb.nocd_obj.adata.obsm['X_pca']).

  • Plot clusters vs. discovered communities using sc.pl.embedding(..., color=['clusters','nocd_n'], palette=ov.utils.pyomic_palette()), then repeat the plot on a filtered subset that excludes synthetic labels containing hyphens.
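
The hyphen filter amounts to a boolean mask over the nocd_n labels. A minimal sketch on plain Python lists follows; hyphen_free_mask is a hypothetical helper and the labels are toy values.

```python
from typing import List, Sequence


def hyphen_free_mask(labels: Sequence[str]) -> List[bool]:
    """True for labels without a hyphen, False for hyphenated ones."""
    return ["-" not in label for label in labels]


# Toy values standing in for bulktb.nocd_obj.adata.obs['nocd_n'].
nocd_labels = ["OPC", "Astrocytes-OPC", "Granule mature", "OPC-Microglia"]
mask = hyphen_free_mask(nocd_labels)
print([lab for lab, keep in zip(nocd_labels, mask) if keep])
```

On an AnnData object the equivalent mask would typically be ~adata.obs['nocd_n'].str.contains('-'). Note that any original cluster name that legitimately contains a hyphen is dropped as well, so check the label set before filtering.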

  • Interpolate missing states

  • Run bulktb.interpolation('OPC') (replace with target lineage) to synthesise continuity, then preprocess the interpolated AnnData (HVG selection, scaling, PCA).

  • Compute embeddings with mde, visualise with ov.pl.embedding, and compare to the original atlas.
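
The preprocessing of the interpolated AnnData can be sketched with standard scanpy calls. The parameter values (n_comps, max_value) and the helper name preprocess_interpolated are assumptions for illustration, not values taken from the notebook. sc.pp.highly_variable_genes with its default flavour expects log-normalised data, so confirm the state of the matrix first.

```python
def preprocess_interpolated(adata, n_comps=50, max_value=10):
    """HVG selection, scaling and PCA; returns the processed AnnData copy."""
    import scanpy as sc  # imported lazily so the sketch loads without scanpy

    # Flag highly variable genes, then keep only those columns.
    sc.pp.highly_variable_genes(adata)
    adata = adata[:, adata.var["highly_variable"]].copy()

    # Scale each gene to unit variance, clipping extreme values.
    sc.pp.scale(adata, max_value=max_value)

    # Store principal components in adata.obsm['X_pca'].
    sc.tl.pca(adata, n_comps=n_comps)
    return adata


# Usage, assuming bulktb has been trained as described above:
# adata_interp = preprocess_interpolated(bulktb.interpolation('OPC'))
```

The resulting X_pca matrix is what mde takes as input in the embedding step.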

  • Analyse trajectories

  • Initialise ov.single.pyVIA on both the original and the interpolated data to derive pseudotime, then call get_pseudotime, ov.pp.neighbors, ov.utils.cal_paga, and ov.utils.plot_paga to validate the topology.
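
Besides comparing the two PAGA plots by eye, the graphs can be compared as edge sets. The sketch below assumes the connectivities have already been extracted into dictionaries keyed by cluster pairs. The weights are invented for illustration, and strong_edges is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def strong_edges(connectivities: Dict[Edge, float], threshold: float) -> Set[Edge]:
    """Undirected edges whose connectivity weight reaches the threshold."""
    return {
        tuple(sorted(edge))
        for edge, weight in connectivities.items()
        if weight >= threshold
    }


# Toy PAGA-style weights; the numbers are made up for illustration.
original = {("nIPC", "Neuroblast"): 0.8, ("OPC", "OL"): 0.05}
interpolated = {("nIPC", "Neuroblast"): 0.7, ("OPC", "OL"): 0.6}

gained = strong_edges(interpolated, 0.3) - strong_edges(original, 0.3)
print(gained)
```

An edge that appears only after interpolation, as OPC to OL does in this toy data, is the kind of restored link the topology check is meant to surface.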

  • Defensive validation

    # Before BulkTrajBlend: verify bulk_group columns exist
    for g in bulk_group:
        assert g in bulk_df.columns, f"Bulk group '{g}' not in bulk data columns"

    # Verify celltype_key exists in reference
    assert celltype_key in adata.obs.columns, f"Cell type column '{celltype_key}' not in reference AnnData"

    # Verify gene name overlap
    shared = set(bulk_df.index) & set(adata.var_names)
    assert len(shared) > 100, f"Only {len(shared)} shared genes; harmonize gene IDs first"

  • Troubleshooting tips

  • If the VAE collapses (high reconstruction loss), lower learning_rate or reduce hidden_size.

  • Ensure the generated dataset passed to gnn_train is the same one used when the checkpoint is reloaded; regenerating cells changes the graph and can break checkpoint loading.

  • Sparse clusters may need adjusted cell_target_num thresholds or a smaller leiden_size filter to retain rare populations.

Examples

  • "Train BulkTrajBlend on dentate gyrus bulk samples, then interpolate missing OPC states in the trajectory."

  • "Load saved beta-VAE and GNN weights to regenerate overlapping communities and plot cluster vs. nocd labels."

  • "Run VIA on interpolated cells and compare PAGA graphs with the original scRNA-seq trajectory."

References

  • Tutorial notebook: t_bulktrajblend.ipynb

  • Example datasets and checkpoints: omicverse_guide/docs/Tutorials-bulk2single/data/

  • Quick copy/paste commands: reference.md

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
