rhino-sdk

Plan and execute federated analytics workflows with the Rhino Health Python SDK. Use when the user wants to run survival analysis, metrics, data harmonization, model training, or any multi-step SDK workflow. Takes high-level research goals and produces phased execution plans with runnable code. Also handles SDK questions, debugging rhino_health errors, and metric selection. Triggers on: rhino-health, rhino_health, RhinoSession, FCP, federated analytics, OMOP, FHIR, harmonization, or any of the 40+ federated metrics.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install the skill with: `npx skills add naverazy-rhino/rhino-sdk-skills/naverazy-rhino-rhino-sdk-skills-rhino-sdk`

Rhino Health SDK — Workflow Planner & Code Expert

Plan-first skill for the rhino-health Python SDK (v2.1.x). Takes high-level research and analytics goals, decomposes them into phased execution plans, and generates complete runnable Python code.

Context Loading

Before responding, read ALL reference files — planning requires the full SDK picture:

  1. API Reference (references/sdk_reference.md): Endpoint classes, methods, enums, CreateInput summaries, dataclass fields, import paths.

  2. Patterns & Gotchas (references/patterns_and_gotchas.md): Auth patterns, resource lookup, metrics execution, filtering, code objects, async, and pitfalls.

  3. Metrics Reference (references/metrics_reference.md): All 40+ federated metric classes with parameters, import paths, and decision guide.

  4. Example Index (references/examples/INDEX.md): Mapping of use cases to working example files with key methods and difficulty levels.

For SDK questions that don't require planning, you may selectively load only the relevant files.

Request Routing

Determine what the user needs and follow the appropriate workflow:

| User intent | Action |
|---|---|
| High-level goal, multi-step workflow, "plan", "design", "how should I approach" | Full planning workflow (Sections 3-6) |
| "Write code", "generate a script", single-task code generation | Code generation with validation (Section 6) |
| "How do I...", SDK concept question | Answer from reference files (Section 9) |
| Error, traceback, "why is this failing" | Error diagnosis (Section 8) |
| "Which metric for...", metric configuration | Metric selection (Section 7) |
| "Show me an example", "sample code" | Example matching from references/examples/INDEX.md |

Planning Process

Follow these four steps for any multi-step goal:

Step 1: Analyze the Goal

Extract from the user's request:

  • Data: What data sources? Do datasets already exist, or need ingestion/creation?
  • Analysis: What computation? Metrics, custom code, harmonization, or a combination?
  • Output: What does the user want? Numbers, transformed datasets, trained models, exported files?
  • Constraints: Filters (age > 50, gender = F), specific sites, time ranges, target data models (OMOP/FHIR)?

If any of these are unclear, ask the user before producing the plan.

Step 2: Select Workflow Templates

Match the goal to one or more composable SDK pipeline templates:

Template A: Federated Analytics

Run statistical metrics across one or more sites without moving data.

Auth → Project → Datasets → Metric Config → Execute → Results
| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | Always first |
| Get project | `session.project.get_project_by_name()` | Check for None |
| Get dataset | `project.get_dataset_by_name()` or list all | One per site |
| Configure metric | `Mean(variable=...)`, `Cox(...)`, etc. | Add filters/group_by as needed |
| Execute per-site | `session.dataset.get_dataset_metric(uid, config)` | Single site |
| Execute aggregated | `session.project.aggregate_dataset_metric(uids, config)` | Cross-site, List[str] of UIDs |
| Execute joined | `session.project.joined_dataset_metric(config, query, filter)` | Federated join with shared identifiers |

Use when: descriptive stats, survival analysis, hypothesis tests, or any metric-based analysis.
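The Template A steps above can be condensed into a small helper. This is a sketch built only from the method names in the table: `session` is assumed to come from `rh.login()`, and the metric config (e.g. `Mean(variable=...)`) is built separately by the caller.

```python
def aggregate_metric(session, project_name, dataset_names, metric_config):
    """Template A sketch: resolve datasets by name, then run one
    aggregated (cross-site) metric."""
    project = session.project.get_project_by_name(project_name)
    if project is None:
        raise ValueError(f"Project '{project_name}' not found")
    dataset_uids = []
    for name in dataset_names:
        dataset = project.get_dataset_by_name(name)
        if dataset is None:
            raise ValueError(f"Dataset '{name}' not found")
        # aggregate_dataset_metric wants List[str], not List[Dataset]
        dataset_uids.append(str(dataset.uid))
    return session.project.aggregate_dataset_metric(dataset_uids, metric_config)
```

For a single site, swap the final call for `session.dataset.get_dataset_metric(dataset_uids[0], metric_config)`.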

Template B: Code Object Execution

Run custom containerized or Python code across federated sites.

Auth → Project → Data Schema → Code Object Create → Build → Run → Wait → Output Datasets
| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get/create project | `session.project.get_project_by_name()` | |
| Get/create schema | `session.data_schema.create_data_schema()` | Only if new data format |
| Create code object | `session.code_object.create_code_object()` | GENERALIZED_COMPUTE or PYTHON_CODE |
| Wait for build | `code_object.wait_for_build()` | Only for GENERALIZED_COMPUTE |
| Run | `session.code_object.run_code_object()` | `input_dataset_uids=[[uid]]` double-nested |
| Wait for completion | `code_run.wait_for_completion()` | |
| Access outputs | `result.output_dataset_uids.root[0].root[0].root[0]` | Triply nested |

Use when: custom computation — train/test splits, feature engineering, model training, any logic that metrics alone cannot express.
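The Template B lifecycle can be wrapped as below. This is a sketch, not the definitive API: `run_input` is assumed to be a prepared `CodeObjectRunInput` (with the double-nested `input_dataset_uids=[[...]]`), and `wait_for_completion()` is assumed to return the finished run.

```python
def execute_code_object(session, code_object, run_input, needs_build=True):
    """Template B sketch: optionally wait for the container build, run,
    wait for completion, then pull the first output dataset UID."""
    if needs_build:  # GENERALIZED_COMPUTE only; PYTHON_CODE skips the build
        code_object.wait_for_build()
    code_run = session.code_object.run_code_object(run_input)
    result = code_run.wait_for_completion()
    # output_dataset_uids is a triply nested RootModel
    return result.output_dataset_uids.root[0].root[0].root[0]
```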

Template C: Data Harmonization

Transform source data into a target data model (OMOP, FHIR, custom).

Auth → Project → Vocabulary → Semantic Mapping → Syntactic Mapping → Config → Run → Output
| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get project | `session.project.get_project_by_name()` | |
| Create semantic mapping | `session.semantic_mapping.create_semantic_mapping()` | Optional; for vocabulary lookups |
| Wait for indexing | `semantic_mapping.wait_for_completion()` | Can be slow (minutes) |
| Create syntactic mapping | `session.syntactic_mapping.create_syntactic_mapping()` | Defines column transformations |
| Generate/set config | `session.syntactic_mapping.generate_config()` | LLM-based auto-generation or manual |
| Run harmonization | `session.syntactic_mapping.run_data_harmonization()` | Preferred path |
| Wait for completion | `code_run.wait_for_completion()` | |
| Access outputs | `result.output_dataset_uids.root[0].root[0].root[0]` | Triply nested |

Key harmonization types: TransformationType.SPECIFIC_VALUE, SOURCE_DATA_VALUE, ROW_PYTHON, TABLE_PYTHON, SEMANTIC_MAPPING, VLOOKUP, DATE, SECURE_UUID.

Target models: SyntacticMappingDataModel.OMOP, .FHIR, .CUSTOM.

Use when: source data needs transformation before analysis — different column names, value encodings, or target standards like OMOP/FHIR.
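A minimal sketch of the run-and-collect tail of Template C. Two assumptions to verify against references/sdk_reference.md: that `run_data_harmonization()` accepts the syntactic mapping as its argument, and that the harmonization run exposes the same triply nested output shape as code runs. The `timeout_seconds` kwarg follows the SDK's `wait_for_completion()` pattern; since indexing and transformation can take minutes, give it generous headroom.

```python
def harmonize(session, syntactic_mapping, timeout_seconds=1800):
    """Template C sketch: run harmonization, wait, and return the
    output dataset UID. The run_data_harmonization() argument is
    an assumption, not a confirmed signature."""
    code_run = session.syntactic_mapping.run_data_harmonization(syntactic_mapping)
    result = code_run.wait_for_completion(timeout_seconds=timeout_seconds)
    return result.output_dataset_uids.root[0].root[0].root[0]
```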

Template D: SQL Data Ingestion

Pull data from an on-prem database into the Rhino platform.

Auth → Project → Connection Details → SQL Query → Import as Dataset → Verify
| Step | SDK Method | Notes |
|---|---|---|
| Authenticate | `rh.login()` | |
| Get project | `session.project.get_project_by_name()` | |
| Define connection | `ConnectionDetails(server_type=..., server_url=..., ...)` | PostgreSQL, MySQL, etc. |
| Run metrics on query | `session.sql_query.run_sql_query(SQLQueryInput(...))` | Does NOT return raw data |
| Import as dataset | `session.sql_query.import_dataset_from_sql_query(SQLQueryImportInput(...))` | Creates a Dataset from query results |
| Wait | `sql_query.wait_for_completion()` | |

Use when: data lives in a relational database and needs to be brought into the platform as a Dataset.
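The import-and-wait tail of Template D can be sketched as follows. `import_input` is assumed to be a prepared `SQLQueryImportInput` carrying the `ConnectionDetails`, query, and target dataset name; the `timeout_seconds` kwarg is an assumption based on the SDK's wait pattern.

```python
def import_sql_as_dataset(session, import_input, timeout_seconds=600):
    """Template D sketch: import query results as a platform Dataset,
    then block until the import finishes."""
    sql_query = session.sql_query.import_dataset_from_sql_query(import_input)
    sql_query.wait_for_completion(timeout_seconds=timeout_seconds)
    return sql_query
```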

Template E: Model Training + Inference

Train a federated model, then run inference on new data. This extends Template B with an inference phase:

  1. Train phase: Code Object with training logic → produces model artifacts
  2. Inference phase: session.code_run.run_inference() using the trained model
| Step | SDK Method | Notes |
|---|---|---|
| Train (Template B) | `create_code_object` → `run_code_object` → `wait_for_completion` | Full code object lifecycle |
| Run inference | `session.code_run.run_inference(code_run_uid, validation_dataset_uids, ...)` | Uses trained model |
| Get model params | `session.code_run.get_model_params(code_run_uid)` | Download model weights |

Use when: federated ML model training and validation.
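The inference phase can be sketched with the signatures from the table above; any extra keyword arguments (elided as `...` there) are omitted here.

```python
def validate_trained_model(session, train_run_uid, validation_dataset_uids):
    """Template E sketch: reuse a finished training run for inference
    on held-out datasets, then download the trained weights."""
    inference_run = session.code_run.run_inference(
        train_run_uid, validation_dataset_uids)
    inference_run.wait_for_completion()
    model_params = session.code_run.get_model_params(train_run_uid)
    return inference_run, model_params
```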

Template F: Multi-Pipeline Composition

Chain 2+ templates when a single template cannot satisfy the goal:

| Goal pattern | Composition |
|---|---|
| Harmonize then analyze | Template C → Template A |
| Ingest from SQL then analyze | Template D → Template A |
| Harmonize then train model | Template C → Template E |
| Ingest, harmonize, analyze, train | Template D → Template C → Template A → Template E |
| Custom preprocessing then analytics | Template B → Template A |

Chaining rule: the output datasets of one phase become the input datasets of the next. Use result.output_dataset_uids.root[0].root[0].root[0] to extract UIDs and pass them forward.
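The chaining rule can be made reusable with a small extractor. One assumption to flag: the source only guarantees the `.root[0].root[0].root[0]` path for a single output, so treating all three RootModel levels as lists (to collect multiple outputs) should be verified against §14 of patterns_and_gotchas.md.

```python
def extract_output_uids(result):
    """Composition sketch: flatten a finished run's triply nested
    output_dataset_uids RootModel into a plain list of UID strings,
    ready to feed into the next phase."""
    return [uid
            for outer in result.output_dataset_uids.root
            for inner in outer.root
            for uid in inner.root]
```

Phase N+1 then receives e.g. `input_dataset_uids=[extract_output_uids(result)]` (Template B's double-nested form).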

Step 3: Compose the Plan

  1. Authentication is always Phase 0 — shared across all phases. Include project and workgroup discovery.
  2. One template per phase — if the goal requires Templates C → A → B, that is three phases plus Phase 0.
  3. Chain outputs to inputs — explicitly state which output from Phase N feeds into Phase N+1.
  4. Add checkpoints — after each phase, include a verification step (print status, check dataset count, verify output exists).
  5. Surface prerequisites — list what must already exist vs. what will be created.
  6. Note alternatives — if there are multiple valid approaches, briefly state why you chose one.

Step 4: Generate Implementation

After presenting the plan, generate the complete runnable code following ALL validation rules in Section 6.

Plan Output Format

Structure every planning response as:

```markdown
## Goal
[1-2 sentence restatement]

## Prerequisites
- **Must exist:** [project, datasets, schemas, workgroup access]
- **Created by this plan:** [new code objects, schemas, harmonized datasets]

## Plan

### Phase 0: Setup
- Authenticate and discover project/workgroup/datasets
- Checkpoint: print project name and dataset count

### Phase 1: [Name] — Template [X]
- Step 1.1: [description] — `session.X.method()`
- Step 1.2: [description] — `session.Y.method()`
- Checkpoint: [how to verify]

### Phase 2: [Name] — Template [Y]
- Depends on: Phase 1 output datasets
- Step 2.1: ...
- Checkpoint: [how to verify]

## Alternatives Considered
[Other approaches and why this plan is preferred]

## Implementation
[Complete, runnable Python script]
```

Decision Guidance

When the goal is ambiguous, use this table:

| User signal | Template | Reasoning |
|---|---|---|
| "analyze", "measure", "statistics", "compare" | A (Analytics) | Metric-based, no custom code needed |
| "run code", "custom analysis", "process data", "split", "transform" | B (Code Object) | Needs logic beyond built-in metrics |
| "harmonize", "OMOP", "FHIR", "map columns", "standardize" | C (Harmonization) | Data transformation to target model |
| "SQL", "database", "import from DB", "ingest" | D (SQL Ingestion) | Data lives in a relational database |
| "train model", "predict", "inference", "ML" | E (Model Train) | Federated model training + validation |
| Multiple of the above | F (Composition) | Chain templates in dependency order |

Validation Checklist

Apply every item to ALL generated code — plans and standalone scripts alike.

Endpoint Accessors

| Operation | Correct accessor |
|---|---|
| Project-level operations, aggregate/joined metrics | `session.project` |
| Dataset-level operations, per-site metrics | `session.dataset` |
| Code objects, builds, runs, harmonization | `session.code_object` |
| Run status, inference results | `session.code_run` |
| SQL queries | `session.sql_query` |
| Semantic mappings, vocabularies | `session.semantic_mapping` |
| Syntactic mappings, harmonization config | `session.syntactic_mapping` |
| Data schemas | `session.data_schema` |

Environment

  • Default rh.login() connects to production. For dev/QA/staging, pass rhino_api_url: rh.login(..., rhino_api_url=ApiEnvironment.DEV1_AWS_URL)
  • Import: from rhino_health.lib.constants import ApiEnvironment
  • If user mentions dev1/dev2/QA/staging environment, ALWAYS add rhino_api_url parameter

Import Paths

| Wrong | Correct |
|---|---|
| `from rhino_health.metrics import X` | `from rhino_health.lib.metrics import X` |
| `from rhino_health.endpoints.X import Y` | `from rhino_health.lib.endpoints.X.X_dataclass import Y` |

Metric Calls

  • aggregate_dataset_metric takes List[str] of UIDs: [str(d.uid) for d in datasets]
  • get_dataset_metric takes a single dataset_uid: str
  • joined_dataset_metric takes query_datasets and optional filter_datasets as List[str]
  • Metric config objects require data_column (not column or field)
  • FilterVariable uses keys: data_column, filter_column, filter_value, filter_type

CreateInput Alias Fields

| Field name | Alias (use this) |
|---|---|
| `project_uid` | `project` |
| `workgroup_uid` | `workgroup` |

Nested Structures & RootModels

  • CodeObjectRunInput.input_dataset_uids is List[List[str]]: [[uid1, uid2]]
  • output_dataset_uids is triply nested RootModel: access via .root[0].root[0].root[0]
  • DataSchema.schema_fields is a SchemaFields RootModel: access list via .root, names via .field_names
  • group_by format: {"groupings": [{"data_column": "col"}]}
  • data_filters list: [FilterVariable(data_column="col", filter_column="col", filter_value="val", filter_type=FilterType.EQUALS)]
  • Enum display: use .value for clean strings (e.g. status.value → 'Approved')

Async Operations

  • Call wait_for_build() after creating Generalized Compute code objects
  • Call wait_for_completion() after run_code_object(), run_data_harmonization(), run_sql_query()

None Checks

Every get_*_by_name() call must be followed by a None check:

```python
dataset = project.get_dataset_by_name("Name")
if dataset is None:
    raise ValueError("Dataset not found")
```

Code Template

Every generated script must follow this structure:

```python
import rhino_health as rh
from getpass import getpass
# ... additional imports ...

# For non-production environments, add rhino_api_url:
# from rhino_health.lib.constants import ApiEnvironment
# session = rh.login(username="my_email@example.com", password=getpass(),
#                    rhino_api_url=ApiEnvironment.DEV1_AWS_URL)

session = rh.login(username="my_email@example.com", password=getpass())

PROJECT_NAME = "My Project"
# ... constants ...

project = session.project.get_project_by_name(PROJECT_NAME)
if project is None:
    raise ValueError(f"Project '{PROJECT_NAME}' not found")

# ... core logic ...
print(result)
```

Metric Selection Tree

Map natural language to the right metric class:

| User asks about... | Metric class | Category |
|---|---|---|
| Counts, frequencies | `Count` | Basic |
| Averages, means | `Mean` | Basic |
| Spread, variability | `StandardDeviation`, `Variance` | Basic |
| Totals, sums | `Sum` | Basic |
| Percentiles, medians, quartiles | `Percentile`, `NPercentile` | Quantile |
| Survival time, time-to-event | `KaplanMeier` | Survival |
| Hazard ratios, covariates + survival | `Cox` | Survival |
| ROC curves, AUC | `RocAuc` | ROC/AUC |
| ROC with confidence intervals | `RocAucWithCI` | ROC/AUC |
| Correlation between variables | `Pearson`, `Spearman` | Statistics |
| Inter-rater reliability | `ICC` | Statistics |
| Compare two group means | `TTest` | Statistics |
| Compare 3+ group means | `OneWayANOVA` | Statistics |
| Categorical association | `ChiSquare` | Statistics |
| 2x2 contingency table | `TwoByTwoTable` | Epidemiology |
| Odds ratio | `OddsRatio` | Epidemiology |
| Risk ratio / relative risk | `RiskRatio` | Epidemiology |
| Risk difference | `RiskDifference` | Epidemiology |
| Incidence rates | `Incidence` | Epidemiology |
All metrics: from rhino_health.lib.metrics import ClassName

Execution modes

| Scope | Method |
|---|---|
| Single site | `session.dataset.get_dataset_metric(dataset_uid, config)` |
| Aggregated across sites | `session.project.aggregate_dataset_metric(dataset_uids, config)` (List[str] UIDs) |
| Federated join | `session.project.joined_dataset_metric(config, query_datasets, filter_datasets)` |

Filtering example

```python
from rhino_health.lib.metrics import Mean, FilterType, FilterVariable

config = Mean(
    variable="Height",
    data_filters=[
        FilterVariable(
            data_column="Gender",
            filter_column="Gender",
            filter_value="Female",
            filter_type=FilterType.EQUALS,
        )
    ],
    group_by={"groupings": ["Gender"]},
)
```

Error-to-Fix Reference

When the user encounters an error, diagnose using this table:

| Error pattern | Root cause | Fix |
|---|---|---|
| `NotAuthenticatedError` / HTTP 401 | Token expired, wrong creds, or MFA | Re-login; pass `otp_code` if MFA enabled |
| HTTP 401 with correct credentials | Wrong environment URL | Add `rhino_api_url=ApiEnvironment.DEV1_AWS_URL` (or QA/staging). Default is production |
| `AttributeError: 'NoneType'` | `get_*_by_name()` returned None | Add None check after every `get_*_by_name()` |
| `ValidationError` (pydantic) | Wrong field names (alias confusion) | Use aliases: `project` not `project_uid`, `workgroup` not `workgroup_uid` |
| `TypeError` in metric config | String where FilterVariable expected | Use `FilterVariable(data_column=..., filter_column=..., filter_value=..., filter_type=...)` |
| `ImportError` / `ModuleNotFoundError` | Wrong import path | `from rhino_health.lib.metrics import X` (NOT `rhino_health.metrics`) |
| `TypeError: aggregate_dataset_metric()` | List[Dataset] instead of List[str] | Convert: `[str(d.uid) for d in datasets]` |
| `IndexError` on `output_dataset_uids` | Accessing as flat list | Use `.root[0].root[0].root[0]` (triply nested RootModel) |
| `TypeError` / `AttributeError` on `schema_fields` | SchemaFields is a RootModel, not a list | Use `schema.schema_fields.root` for the list, `.field_names` for names |
| `TimeoutError` / operation hangs | Default timeout too low | Increase `timeout_seconds` in `wait_for_completion()` |
| `TypeError: input_dataset_uids` | List[str] instead of List[List[str]] | Must be double-nested: `[[uid1, uid2]]` |
| `KeyError` / None in metric results | Wrong `data_column` name | Verify column name matches dataset schema (case-sensitive) |
| Enum shows full path (e.g. `Status.APPROVED`) | Printing enum object directly | Use `.value` for clean string: `status.value` → `'Approved'` |
| `ValidationError` on enum field (e.g. `indexing_status`) | SDK/API version mismatch (backend added new value) | Use `session.get()` raw API escape hatch (§17 in patterns_and_gotchas.md), or `pip install --upgrade rhino-health` |

Diagnostic process: identify exception class → locate failing SDK call → cross-reference correct signature in references/sdk_reference.md → check for compound errors.

Question Routing

For non-planning SDK questions, locate the right context section:

| Question type | Source file | Section |
|---|---|---|
| Authentication, login, MFA | patterns_and_gotchas.md | §1 |
| Finding projects/datasets by name | patterns_and_gotchas.md | §2 |
| Creating/updating resources (upsert) | patterns_and_gotchas.md | §3 |
| Running per-site or aggregated metrics | patterns_and_gotchas.md | §4 |
| Filtering data | patterns_and_gotchas.md | §5 |
| Group-by analysis | patterns_and_gotchas.md | §6 |
| Federated joins | patterns_and_gotchas.md | §7 |
| Code objects (create, build, run) | patterns_and_gotchas.md | §8 |
| Async operations / waiting | patterns_and_gotchas.md | §9 |
| Correct import paths | patterns_and_gotchas.md | §11 |
| Environment URL (dev1, QA, staging) | patterns_and_gotchas.md | §13 |
| RootModel access (SchemaFields, output UIDs) | patterns_and_gotchas.md | §14 |
| Semantic mapping entries / data | patterns_and_gotchas.md | §15 |
| Session persistence / SSO | patterns_and_gotchas.md | §16 |
| SDK crash on valid API data, ValidationError on enum | patterns_and_gotchas.md | §17 |
| Raw API calls, `session.get()`, bypassing Pydantic | patterns_and_gotchas.md | §17 |
| Vocabularies, vocabulary types | sdk_reference.md | §SemanticMappingEndpoints, §Key Enums |
| Data schema fields, column info | sdk_reference.md | §DataSchema, §SchemaFields |
| Specific endpoint methods | sdk_reference.md | §[EndpointName]Endpoints |
| Enums and constants | sdk_reference.md | §Key Enums |
| API environment URLs | sdk_reference.md | §ApiEnvironment |
| Metric configuration | metrics_reference.md | §[Category] |
| "Which metric for...?" | metrics_reference.md | §Quick Decision Guide |

Working Examples

Match the user's goal to verified working examples from references/examples/INDEX.md:

| Template | Example files |
|---|---|
| A (Analytics) | eda.py, cox.py, metrics_examples.py, roc_analysis.py, aggregate_quantile.py, federated_join.py |
| B (Code Object) | train_test_split.py, runtime_external_files.py |
| C (Harmonization) | fhir_pipeline.py |
| D (SQL Ingestion) | sql_data_ingestion.py |
| E (Model Training) | train_test_split.py (training portion) |
| F (Composition) | fhir_pipeline.py (harmonization + code object + export) |

Read the relevant example file before generating code to follow its proven patterns.


Related Skills

Related by shared tags or category signals: rhino-sdk-example, rhino-sdk-write, rhino-sdk-metrics, rhino-sdk-plan. No summaries are provided by the upstream source; all four are repository-sourced and flagged "Needs Review".