clinical-data-cleaner

Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detecting outliers in lab results, or converting raw CRF data to CDISC format. Cleans and standardizes clinical trial data for regulatory compliance with audit trails.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "clinical-data-cleaner" with this command: npx skills add renhaosu2024/clinical-data-cleaner

Clinical Data Cleaner

Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.

Quick Start

from scripts.main import ClinicalDataCleaner

# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')

# Clean data with default settings
cleaned = cleaner.clean(raw_data)

# Save with audit trail
cleaner.save_report('output.csv')

Core Capabilities

1. SDTM Domain Validation

cleaner = ClinicalDataCleaner(domain='DM')  # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)

Required Fields:

  • DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
  • LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
  • VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC

2. Missing Value Handling

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median'  # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)

3. Outlier Detection

cleaner = ClinicalDataCleaner(
    domain='LB',
    outlier_method='domain',  # iqr, zscore, domain
    outlier_action='flag'     # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)

Clinical Thresholds:

ParameterRangeUnit
Glucose50-500mg/dL
Hemoglobin5-20g/dL
Systolic BP70-220mmHg

4. Date Standardization

standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00

5. Complete Pipeline

cleaner = ClinicalDataCleaner(
    domain='DM',
    missing_strategy='median',
    outlier_method='iqr',
    outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')

Output Files:

  • output.csv - Cleaned SDTM data
  • output.report.json - Audit trail for regulatory submission

CLI Usage

# Clean demographics
python scripts/main.py \
  --input dm_raw.csv \
  --domain DM \
  --output dm_clean.csv \
  --missing-strategy median \
  --outlier-method iqr \
  --outlier-action flag

# Clean lab data with clinical thresholds
python scripts/main.py \
  --input lb_raw.csv \
  --domain LB \
  --output lb_clean.csv \
  --outlier-method domain

Common Patterns

See references/common-patterns.md for detailed examples:

  • Regulatory Submission Preparation
  • Interim Analysis Data Preparation
  • Database Migration Cleanup
  • External Lab Data Integration

Troubleshooting

See references/troubleshooting.md for solutions to:

  • Validation failures
  • Date parsing errors
  • Memory errors with large datasets
  • Outlier detection issues

Quality Checklist

Pre-Cleaning:

  • IACUC approval obtained (animal studies)
  • Sample size adequately powered
  • Randomization method documented

Post-Cleaning:

  • Validate against CDISC SDTM IG
  • Review all cleaning actions in audit trail
  • Test import to analysis software

References

  • references/sdtm_ig_guide.md - CDISC SDTM Implementation Guide
  • references/domain_specs.json - Domain-specific field requirements
  • references/outlier_thresholds.json - Clinical outlier thresholds
  • references/common-patterns.md - Detailed usage patterns
  • references/troubleshooting.md - Problem-solving guide

Skill ID: 189 | Version: 2.0 | License: MIT

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Semmtech Dev

Semmtech development and deployment tools

Registry SourceRecently Updated
00Profile unavailable
Coding

fleet

Multi-agent fleet management for OpenClaw, Claude Code, Codex, Cursor, Windsurf, Gemini CLI, and custom agent stacks. One CLI to monitor, manage, and orchest...

Registry SourceRecently Updated
Coding

Mini Coder Max

Autonomous coding agent that systematically plans, implements, reviews, and delivers high-quality code. Handles tasks of any complexity by following a struct...

Registry SourceRecently Updated
Coding

AANA Guardrail Skill Bundled Helper Variant

Provides a local Python helper to send minimal review payloads to a trusted AANA HTTP bridge for policy guidance inside the skill package.

Registry SourceRecently Updated