Spatial Transcriptomics with OmicVerse

This skill covers spatial analysis workflows organized into three stages: Preprocessing, Deconvolution, and Downstream Analysis. Each stage includes the critical function calls, parameter guidance, and common pitfalls.

Defensive Validation: Always Check Spatial Coordinates First

Before ANY spatial operation, verify that spatial coordinates exist and are numeric:

Required check before spatial analysis

assert 'spatial' in adata.obsm,
"Missing adata.obsm['spatial']. Load with ov.io.spatial.read_visium() or set manually."

Cast to float64 to prevent coordinate precision issues during rotation/cropping

adata.obsm['spatial'] = adata.obsm['spatial'].astype('float64')

Stage 1: Preprocessing

Crop, Rotate, and Align Coordinates

Load Visium data and manipulate spatial coordinates for region selection and alignment:

import scanpy as sc, omicverse as ov ov.plot_set()

adata = sc.datasets.visium_sge(sample_id="V1_Breast_Cancer_Block_A_Section_1") library_id = list(adata.uns['spatial'].keys())[0]

Cast coordinates before manipulation

adata.obsm['spatial'] = adata.obsm['spatial'].astype('float64')

Crop to region of interest

adata_crop = ov.space.crop_space_visium(adata, crop_loc=(0, 0), crop_area=(1000, 1000), library_id=library_id, scale=1)

Rotate and auto-align

adata_rot = ov.space.rotate_space_visium(adata, angle=45, library_id=library_id) ov.space.map_spatial_auto(adata_rot, method='phase')

For manual refinement: ov.space.map_spatial_manual(adata_rot, ...)

Visium HD Cell Segmentation

Segment Visium HD bins into cells using cellpose:

adata = ov.space.read_visium_10x(path="binned_outputs/square_002um/", source_image_path="tissue_image.btf") ov.pp.filter_genes(adata, min_cells=3) ov.pp.filter_cells(adata, min_counts=1)

H&E-based segmentation

adata = ov.space.visium_10x_hd_cellpose_he(adata, mpp=0.3, gpu=True, buffer=150)

Expand labels to neighboring bins

ov.space.visium_10x_hd_cellpose_expand(adata, labels_key='labels_he', expanded_labels_key='labels_he_expanded', max_bin_distance=4)

Gene-expression-driven seeds

ov.space.visium_10x_hd_cellpose_gex(adata, obs_key="n_counts_adjusted", mpp=0.3, sigma=5)

Merge labels and aggregate to cell-level

ov.space.salvage_secondary_labels(adata, primary_label='labels_he_expanded', secondary_label='labels_gex', labels_key='labels_joint') cdata = ov.space.bin2cell(adata, labels_key='labels_joint')

Stage 2: Deconvolution

Critical API Reference: Method Selection

CORRECT — Tangram: method passed to deconvolution() call

decov_obj = ov.space.Deconvolution(adata_sc=sc_adata, adata_sp=sp_adata, celltype_key='Subset', result_dir='result/tangram') decov_obj.preprocess_sc(max_cells=5000) decov_obj.preprocess_sp() decov_obj.deconvolution(method='tangram', num_epochs=1000)

CORRECT — cell2location: method passed at INIT time

cell2_obj = ov.space.Deconvolution(adata_sc=sc_adata, adata_sp=sp_adata, celltype_key='Subset', result_dir='result/c2l', method='cell2location') cell2_obj.deconvolution(max_epochs=30000) cell2_obj.save_model('result/c2l/model')

Note: For cell2location, the method parameter is set at initialization, not at the deconvolution() call. For Tangram, it's passed to deconvolution() .

Starfysh Archetypal Deconvolution

from omicverse.external.starfysh import AA, utils, plot_utils

visium_args = utils.prepare_data(adata_path="data/counts.h5ad", signature_path="data/signatures.csv", min_cells=10, filter_hvg=True, n_top_genes=3000) adata, adata_normed = visium_args.get_adata() aa_model = AA.ArchetypalAnalysis(adata_orig=adata_normed) aa_model.fit(k=12, n_init=10) visium_args = utils.refine_anchors(visium_args, aa_model, add_marker=True) model, history = utils.run_starfysh(visium_args, poe=False, n_repeat=5, lr=5e-3, max_epochs=500)

Stage 3: Downstream Analysis

Spatial Clustering

GraphST + mclust

ov.utils.cluster(adata, use_rep='graphst|original|X_pca', method='mclust', n_components=7)

STAGATE

ov.utils.cluster(adata, use_rep='STAGATE', method='mclust', n_components=7)

Merge small clusters

ov.space.merge_cluster(adata, groupby='mclust', resolution=0.5)

Algorithm choice: GraphST and STAGATE require precalculated latent spaces. For standard clustering without spatial-aware embeddings, use Leiden/Louvain directly.

Multi-Slice Integration (STAligner)

import anndata as ad Batch_list = [ov.read(p) for p in slice_paths] adata_concat = ad.concat(Batch_list, label='slice_name', keys=section_ids) STAligner_obj = ov.space.pySTAligner(adata=adata_concat, batch_key='slice_name', hidden_dims=[256, 64], use_gpu=True) STAligner_obj.train_STAligner_subgraph(nepochs=800, lr=1e-3) STAligner_obj.train() adata_aligned = STAligner_obj.predicted() sc.pp.neighbors(adata_aligned, use_rep='STAligner')

Spatial Trajectories

SpaceFlow — pseudo-spatial maps:

sf_obj = ov.space.pySpaceFlow(adata) sf_obj.train(spatial_regularization_strength=0.1, num_epochs=300, patience=50) sf_obj.cal_pSM(n_neighbors=20, resolution=1.0)

STT — transition dynamics:

STT_obj = ov.space.STT(adata, spatial_loc='xy_loc', region='Region', n_neighbors=20) STT_obj.stage_estimate() STT_obj.train(n_states=9, n_iter=15, weight_connectivities=0.5)

Cell Communication (COMMOT + FlowSig)

df_cellchat = ov.external.commot.pp.ligand_receptor_database(species='human', database='cellchat') df_cellchat = ov.external.commot.pp.filter_lr_database(df_cellchat, adata, min_expr_frac=0.05) ov.external.commot.tl.spatial_communication(adata, lr_database=df_cellchat, distance_threshold=500, result_prefix='cellchat')

FlowSig network

adata.layers['normalized'] = adata.X.copy() ov.external.flowsig.tl.construct_intercellular_flow_network( adata, commot_output_key='commot-cellchat', flowsig_output_key='flowsig-cellchat', edge_threshold=0.7)

Structural Layers (GASTON) and Slice Alignment (SLAT)

GASTON — iso-depth estimation:

gas_obj = ov.space.GASTON(adata) A = gas_obj.prepare_inputs(n_pcs=50) gas_obj.load_rescale(A) gas_obj.train(hidden_dims=[64, 32], dropout=0.1, max_epochs=2000) gaston_isodepth, gaston_labels = gas_obj.cal_iso_depth(n_layers=5)

SLAT — cross-slice alignment:

from omicverse.external.scSLAT.model import Cal_Spatial_Net, load_anndatas, run_SLAT, spatial_match Cal_Spatial_Net(adata1, k_cutoff=20, model='KNN') Cal_Spatial_Net(adata2, k_cutoff=20, model='KNN') edges, features = load_anndatas([adata1, adata2], feature='DPCA', check_order=False) embeddings, *_ = run_SLAT(features, edges, LGCN_layer=5) best, index, distance = spatial_match(embeddings, adatas=[adata1, adata2])

Troubleshooting

ValueError: spatial coordinates out of bounds after rotation: Cast adata.obsm['spatial'] to float64 BEFORE calling rotate_space_visium . Integer coordinates lose precision during trigonometric rotation.
Cellpose segmentation fails with memory error: For large .btf images, use backend='tifffile' to memory-map the image. Reduce buffer parameter if GPU memory is insufficient.
Gene ID overlap failure in Tangram/cell2location: Harmonise identifiers (ENSEMBL vs gene symbols) between adata_sc and adata_sp before calling preprocess_sc /preprocess_sp . Drop non-overlapping genes.
mclust clustering error: Requires rpy2 and the R mclust package. If R bindings are unavailable, switch to method='louvain' or method='leiden' .
STAligner/SpaceFlow embeddings collapse to a single point: Verify adata.obsm['spatial'] exists and coordinates are scaled appropriately. Tune learning rate (try lr=5e-4 ) and regularisation strength.
FlowSig returns empty network: Build spatial neighbor graphs before Moran's I filtering. Increase bootstraps or lower edge_threshold (try 0.5) if the network is too sparse.
GASTON RuntimeError in training: Provide a writable out_dir path. PyTorch nondeterminism may cause variation between runs—set torch.manual_seed() for reproducibility.
SLAT alignment has many low-quality matches: Regenerate spatial graphs with a higher k_cutoff value. Inspect low_quality_index flags and filter cells with high distance scores.
STT pathway enrichment fails: gseapy needs network access for gene set downloads. Cache gene sets locally with ov.utils.geneset_prepare() and pass the dictionary directly.

Dependencies

Core: omicverse , scanpy , anndata , squidpy , numpy , matplotlib
Segmentation: cellpose , opencv-python /tifffile , optional GPU PyTorch
Deconvolution: tangram-sc , cell2location , pytorch-lightning ; Starfysh needs torch , scikit-learn
Downstream: scikit-learn , commot , flowsig , gseapy , torch-backed modules (STAligner, SpaceFlow, GASTON, SLAT)

Examples

"Crop and rotate my Visium slide, then run cellpose segmentation on the HD data and aggregate to cell-level AnnData."
"Deconvolve my lymph node spatial data with Tangram and cell2location, compare proportions, and plot cell-type maps."
"Integrate three DLPFC slices with STAligner, cluster with STAGATE, and infer communication with COMMOT+FlowSig."

References

Quick copy/paste commands: reference.md

spatial-transcriptomics-tutorials-with-omicverse

Safety Notice

Copy this and send it to your AI assistant to learn