excel-data-analyzer

Analyze Excel files to identify data structure, quality issues, format inconsistencies, and statistical patterns. Generate comprehensive markdown reports with actionable insights for data cleaning and improvement.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "excel-data-analyzer" with this command: npx skills add mineru98/skills-store/mineru98-skills-store-excel-data-analyzer

Excel Data Analyzer

Overview

Analyze Excel files to identify data structure, quality issues, format inconsistencies, and statistical patterns. Generate comprehensive markdown reports with actionable insights for data cleaning and improvement.

Quick Start

Analyze any Excel file with a single command:

cd /path/to/skill/scripts bun install # First time only bun run analyze_excel.ts /path/to/data.xlsx

Output: Markdown report (data_analysis.md ) with complete analysis.

Core Capabilities

  1. Data Structure Detection

Automatically identifies:

  • Column names and data types (integer, float, string, date, email, boolean, mixed)

  • Row and column counts per sheet

  • Distinct value counts

  • Sample values for quick inspection

  1. Data Quality Analysis

Detects quality issues:

  • Missing values: Percentage and count of nulls per column

  • High null columns: Flags columns with >50% missing data

  • Mixed data types: Identifies columns with inconsistent types

  • Format issues: Detects leading/trailing whitespace, inconsistent casing, numeric strings

  1. Statistical Summaries

Generates statistics for numeric columns:

  • Min, max, mean, median, standard deviation

  • Outlier detection: Values beyond 3 standard deviations

  • Value distribution: Top 10 most frequent values with counts

For text columns:

  • Min/max/average length

  • Value frequency distribution

  1. Quality Scoring

Assigns quality scores (0-100) based on:

  • Missing headers: -10 points

  • High null percentage columns: -15 points

  • Format inconsistencies: -10 points

  • Duplicate column names: -15 points

  1. Multi-Sheet Support

Analyzes all sheets in workbook:

  • Per-sheet quality scores

  • Sheet-by-sheet column analysis

  • Overall workbook quality score

Usage

Basic Analysis

bun run analyze_excel.ts data.xlsx

Generates: data_analysis.md

Custom Output Path

bun run analyze_excel.ts data.xlsx --output reports/audit.md

First-Time Setup

Before running analysis scripts:

cd /path/to/excel-data-analyzer/scripts bun install

This installs required dependencies (xlsx library).

Workflow

When a user provides an Excel file for analysis:

  • Run the analysis script on the provided file

  • Read the generated report to understand findings

  • Summarize key issues for the user:

  • Overall quality score

  • Most critical issues (missing values, format problems)

  • Columns requiring attention

  • Provide recommendations based on analysis:

  • Which columns to investigate

  • Suggested cleaning strategies

  • Priority of fixes (high/medium/low)

Report Structure

Generated markdown reports include:

Executive Summary

  • File metadata (name, size, sheets)

  • Overall quality score

  • High-level findings

Per-Sheet Analysis

  • Dimensions (rows × columns)

  • Quality score

  • Detected issues list

  • Column analysis table (type, distinct values, missing %, issues)

Detailed Column Information

For each column:

  • Data type classification

  • Missing value statistics

  • Sample values

  • Format issues (if any)

  • Statistical summaries (numeric columns)

  • Value distributions

Common Data Issues

High Priority Issues

Mixed data types:

  • Column contains numbers, strings, and dates

  • Prevents proper analysis

  • Example: 123 , "abc" , 2023-01-15

High missing percentage (>50%):

  • Column has insufficient data

  • Consider dropping or imputing

Duplicate column names:

  • Creates ambiguity in analysis

  • Requires renaming

Medium Priority Issues

Numeric strings:

  • Numbers stored as text: "123" instead of 123

  • Prevents calculations

Format inconsistencies:

  • Leading/trailing whitespace: " value "

  • Inconsistent casing: "john" , "JOHN" , "John"

  • Mixed date formats: "2023-01-15" , "01/15/2023"

Outliers:

  • Values beyond 3 standard deviations

  • May indicate errors or special cases

  • Requires investigation

Low Priority Issues

Missing headers:

  • Empty column names

  • Generates systematic names (Column_1, Column_2)

Text length variations:

  • Wide range in string lengths

  • May indicate data entry inconsistencies

Advanced Patterns

For detailed information on data quality patterns and detection methods, see:

references/analysis-patterns.md - Comprehensive guide covering:

  • Data type issues (mixed types, numeric strings, date formats)

  • Missing data patterns (high missing %, sparse data, placeholders)

  • Format inconsistencies (whitespace, casing, delimiters)

  • Statistical anomalies (outliers, skewed distributions)

  • Structural issues (duplicate names, empty rows/columns)

  • Domain-specific patterns (emails, phone numbers, dates)

  • Encoding issues (character encoding, Unicode)

Consult this reference when encountering unusual patterns or needing deeper analysis strategies.

Output Interpretation

Quality Score Ranges

  • 90-100: Excellent - minimal issues

  • 70-89: Good - minor format issues

  • 50-69: Fair - significant quality concerns

  • Below 50: Poor - major data problems

Prioritizing Fixes

  • First: Address structural issues (duplicate columns, missing headers)

  • Second: Fix high missing value columns (>50%)

  • Third: Resolve mixed data types

  • Fourth: Clean format inconsistencies

  • Fifth: Investigate outliers

Performance

Optimized for large files:

  • Bun runtime: Fast JavaScript execution

  • Streaming support: Memory-efficient for large datasets

  • xlsx library: Industry-standard Excel parsing

Typical performance:

  • Small files (<1MB): <1 second

  • Medium files (1-100MB): 1-10 seconds

  • Large files (>100MB): 10-60 seconds

Limitations

  • Only generates analysis reports (does not perform data cleaning)

  • Text-based analysis (does not interpret business context)

  • Statistical methods assume numeric data for quantitative analysis

  • Outlier detection uses simple 3-sigma rule (not robust methods)

Resources

scripts/

analyze_excel.ts - Main analysis script (Bun/TypeScript)

  • Parses Excel files using xlsx library

  • Detects data types and quality issues

  • Generates statistical summaries

  • Produces markdown reports

package.json - Bun dependencies

  • xlsx: Excel file parsing

references/

analysis-patterns.md - Comprehensive guide to data quality patterns

  • Detailed detection methods

  • Impact assessments

  • Recommendations for each issue type

assets/

report-template.md - Markdown report template structure

  • Shows expected output format

  • Reference for understanding report sections

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

奇门遁甲

提供基于数据分析与认知科学的八字和奇门遁甲排盘及解盘,辅助理性传统命理决策参考。

Registry SourceRecently Updated
General

No Login Video

Turn a 2-minute screen recording or phone video into 1080p ready-to-share MP4 just by typing what you need. Whether it's editing and exporting videos instant...

Registry SourceRecently Updated
General

Editor Baby

Skip the learning curve of professional editing software. Describe what you want — trim the best moments, add soft background music, and create a short highl...

Registry SourceRecently Updated
General

Secretary Memory

OpenClaw 秘书式多分区记忆系统 v3.0。仿生现代秘书的笔记本分类法,支持:(1) 多分区并发搜索 + 每分区3条上下文召回,(2) 会话自动摘要,(3) 偏好自动提取 + 用户关系图谱,(4) 记忆冲突主动检测,(5) 定时 consolidation + 会话结束 hook,(6) 精细化恢复/回溯,...

Registry SourceRecently Updated