data-quality

Audience: Data engineers building quality gates for pipelines.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "data-quality" with this command: npx skills add majesticlabs-dev/majestic-marketplace/majesticlabs-dev-majestic-marketplace-data-quality

Data Quality

Goal: Measure, monitor, and report on data quality dimensions.

Related skills:

  • data-profiler: for comprehensive data profiling

  • anomaly-detector: for outlier detection

Scripts

Execute quality functions from scripts/quality_metrics.py:

    from scripts.quality_metrics import (
        QualityDimension,
        QualityMetric,
        QualityScorecard,
        calculate_completeness,
        calculate_uniqueness,
        check_freshness,
        check_volume,
        detect_distribution_drift,
        generate_scorecard,
        generate_html_report,
    )

Usage Examples

Quality Checks

    from scripts.quality_metrics import calculate_completeness, calculate_uniqueness

    # Completeness check (df is a pandas DataFrame loaded elsewhere)
    completeness = calculate_completeness(df, required_cols=['id', 'email', 'status'])
    print(f"Completeness: {completeness.score}% - {'PASS' if completeness.passed else 'FAIL'}")

    # Uniqueness check
    uniqueness = calculate_uniqueness(df, key_cols=['id'])
    print(f"Uniqueness: {uniqueness.score}%")
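The listing does not show how scripts/quality_metrics.py computes these scores. As a rough illustration only, the two checks above could be approximated with plain pandas as follows. The helper names `completeness_pct` and `uniqueness_pct` and the exact scoring formulas are assumptions made for this sketch, not the skill's actual implementation.

```python
import pandas as pd


def completeness_pct(df: pd.DataFrame, required_cols: list[str]) -> float:
    """Hypothetical helper: percentage of non-null cells in the required columns."""
    cells = df[required_cols]
    if cells.size == 0:
        return 100.0
    return float(cells.notna().to_numpy().mean() * 100)


def uniqueness_pct(df: pd.DataFrame, key_cols: list[str]) -> float:
    """Hypothetical helper: percentage of rows whose key has not appeared earlier."""
    if df.empty:
        return 100.0
    duplicates = df.duplicated(subset=key_cols).sum()
    return float((1 - duplicates / len(df)) * 100)


df = pd.DataFrame(
    {
        "id": [1, 2, 2, 4],
        "email": ["a@example.com", None, "c@example.com", "d@example.com"],
        "status": ["active", "active", None, "inactive"],
    }
)

# 10 of the 12 required cells are filled
print(f"Completeness: {completeness_pct(df, ['id', 'email', 'status']):.1f}%")
# 3 of the 4 rows carry an id that has not been seen before
print(f"Uniqueness: {uniqueness_pct(df, ['id']):.1f}%")
```

Scoring completeness per cell rather than per row is one possible design choice; a per-row definition (a row counts only if every required column is filled) would give a stricter number on the same data.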

Freshness Check

    from scripts.quality_metrics import check_freshness

    freshness = check_freshness(df, timestamp_col='updated_at', max_age_hours=24)
    if not freshness.passed:
        print(f"Data is stale: {freshness.details['age_hours']} hours old")
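To make the `age_hours` figure concrete, here is a minimal sketch of one way such an age could be derived. It assumes freshness is measured from the newest value in the timestamp column, which the listing does not confirm; the helper `age_in_hours` is hypothetical.

```python
import pandas as pd


def age_in_hours(timestamps: pd.Series, now: pd.Timestamp) -> float:
    """Hypothetical helper: hours elapsed between the newest timestamp and `now`."""
    newest = pd.to_datetime(timestamps, utc=True).max()
    return (now - newest).total_seconds() / 3600


df = pd.DataFrame({"updated_at": ["2024-01-01T00:00:00Z", "2024-01-01T18:00:00Z"]})
now = pd.Timestamp("2024-01-02T12:00:00Z")  # fixed so the example is reproducible

age = age_in_hours(df["updated_at"], now)
is_fresh = age <= 24  # same threshold as max_age_hours=24 above
print(f"{age:.1f} hours old, fresh={is_fresh}")
```

In a real pipeline `now` would typically be `pd.Timestamp.now(tz="UTC")`; passing it in as a parameter keeps the check testable.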

Generate Scorecard

    from scripts.quality_metrics import generate_scorecard, generate_html_report

    scorecard = generate_scorecard(
        df,
        name="users_table",
        required_cols=['id', 'email'],
        key_cols=['id'],
    )

    print(f"Overall Score: {scorecard.overall_score:.1f}%")
    print(f"Status: {'PASSED' if scorecard.passed else 'FAILED'}")

    # Generate HTML report
    html = generate_html_report(scorecard)
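Since the skill targets quality gates, a scorecard would usually be turned into a hard stop for the pipeline. The sketch below shows one possible shape for that. `ScorecardStub`, `QualityGateError`, and `enforce_gate` are hypothetical names introduced here; the stub mirrors only the `overall_score` and `passed` attributes used above, so the code runs without the skill's scripts.

```python
from dataclasses import dataclass


@dataclass
class ScorecardStub:
    """Hypothetical stand-in exposing only the attributes read in the example above."""
    overall_score: float
    passed: bool


class QualityGateError(RuntimeError):
    """Raised to stop a pipeline run when a scorecard does not clear the gate."""


def enforce_gate(scorecard, table: str, min_score: float = 0.0) -> None:
    """Raise if the scorecard failed or scored below `min_score`."""
    if not scorecard.passed or scorecard.overall_score < min_score:
        raise QualityGateError(
            f"{table}: quality gate failed "
            f"(score={scorecard.overall_score:.1f}%, passed={scorecard.passed})"
        )


# A passing scorecard lets the run continue
enforce_gate(ScorecardStub(overall_score=97.5, passed=True), "users_table", min_score=95)

# A failing scorecard aborts the step
try:
    enforce_gate(ScorecardStub(overall_score=82.0, passed=False), "users_table")
except QualityGateError as err:
    print(err)
```

Raising an exception, rather than returning a boolean, means most orchestrators will mark the task as failed without extra wiring.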

Distribution Drift

    from scripts.quality_metrics import detect_distribution_drift

    drift = detect_distribution_drift(baseline_df['revenue'], current_df['revenue'])
    if drift['drifted']:
        print(f"Distribution drift detected: {drift['test']} p-value={drift['p_value']:.4f}")
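The listing does not say which statistical test `detect_distribution_drift` runs. A common choice for numeric columns is the two-sample Kolmogorov-Smirnov test from scipy, sketched below under that assumption. The function `ks_drift` and the `alpha=0.05` threshold are illustrative; only the `test`, `p_value`, and `drifted` keys are taken from the example above.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_drift(baseline, current, alpha: float = 0.05) -> dict:
    """Hypothetical helper: flag drift when the KS test rejects 'same distribution'."""
    result = ks_2samp(baseline, current)
    return {
        "test": "ks_2samp",
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "drifted": bool(result.pvalue < alpha),
    }


rng = np.random.default_rng(seed=0)
baseline = rng.normal(loc=100.0, scale=10.0, size=1_000)
shifted = rng.normal(loc=120.0, scale=10.0, size=1_000)  # mean moved by two standard deviations

same = ks_drift(baseline, baseline)
moved = ks_drift(baseline, shifted)
print(f"identical samples: drifted={same['drifted']}")
print(f"shifted samples:   drifted={moved['drifted']} p-value={moved['p_value']:.4f}")
```

With large samples the KS test becomes sensitive to very small shifts, so a gate built this way may also want a minimum effect size on `statistic` before alerting.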

Quality Dimensions

| Dimension | What It Measures |
|---|---|
| Completeness | Missing values, required fields |
| Uniqueness | Duplicates in key columns |
| Validity | Format, range, pattern compliance |
| Accuracy | Correctness vs source of truth |
| Consistency | Cross-field logical rules |
| Timeliness | Data freshness, staleness |
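Validity and Consistency have no usage example in this listing. The sketch below illustrates what checks in those two dimensions might look like in plain pandas. The column names, the deliberately loose email pattern, and the 0 to 120 age range are all assumptions chosen for the example, not rules shipped with the skill.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "email": ["a@example.com", "not-an-email", "c@example.com"],
        "age": [34, 29, 151],
        "created_at": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10"]),
        "updated_at": pd.to_datetime(["2024-01-02", "2024-01-06", "2024-01-10"]),
    }
)

# Validity (pattern): something@something.tld, with no whitespace or extra "@"
valid_email = df["email"].str.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Validity (range): ages outside 0..120 are treated as invalid
valid_age = df["age"].between(0, 120)

# Consistency (cross-field rule): a row cannot be updated before it was created
consistent_dates = df["updated_at"] >= df["created_at"]

report = {
    "valid_email_pct": valid_email.mean() * 100,
    "valid_age_pct": valid_age.mean() * 100,
    "consistent_dates_pct": consistent_dates.mean() * 100,
}
for check, pct in report.items():
    print(f"{check}: {pct:.1f}%")
```

Each check yields a boolean Series, so the failing rows can be pulled out directly, for example with `df[~valid_age]`, when a report needs to show offending records.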

Dependencies

    pandas
    scipy  # For distribution drift detection
