q_descriptive-analysis

Descriptive Analysis Skill

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.

To install, copy this command and send it to your AI assistant:

npx skills add tyrealq/q-skills/tyrealq-q-skills-q-descriptive-analysis


Generate comprehensive exploratory descriptive analysis of tabular datasets with grouped statistics, frequency tables, entity extraction, and publication-ready markdown summaries.

Workflow

  1. Requirements Gathering (Interview)

Before analysis, ask the user 5-10 questions covering:

  • Research objective - Exploratory vs. confirmatory analysis

  • Grouping variables - Categorical variables to stratify by

  • Continuous variables - Metrics to calculate descriptives for

  • Text fields requiring extraction - Columns with embedded entities

  • Temporal variable - Date/time column and desired granularity

  • Classification schemes - Any custom tier/category definitions

  • Output preferences - CSV tables, MD summary, visualizations

  2. Data Preparation

Create derived variables as needed:

```python
# Tier classification (customize thresholds)
def classify_tier(value, tiers):
    for tier_name, (min_val, max_val) in tiers.items():
        if min_val <= value <= max_val:
            return tier_name
    return 'Other'

# Example tier structure
TIERS = {
    'Small': (0, 1000),
    'Medium': (1001, 10000),
    'Large': (10001, float('inf')),
}

# Temporal grouping
df['month'] = df['date_col'].dt.to_period('M').astype(str)
```
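For illustration, here is the tier classification and temporal grouping applied to a toy DataFrame (the `views` and `date_col` column names are hypothetical placeholders):

```python
import pandas as pd

# Hypothetical tier scheme mirroring the example structure above
TIERS = {'Small': (0, 1000), 'Medium': (1001, 10000), 'Large': (10001, float('inf'))}

def classify_tier(value, tiers):
    # Return the first tier whose [min, max] range contains the value
    for tier_name, (min_val, max_val) in tiers.items():
        if min_val <= value <= max_val:
            return tier_name
    return 'Other'

df = pd.DataFrame({
    'views': [500, 5000, 50000],
    'date_col': pd.to_datetime(['2024-01-15', '2024-01-20', '2024-02-01']),
})
df['tier'] = df['views'].apply(lambda v: classify_tier(v, TIERS))
df['month'] = df['date_col'].dt.to_period('M').astype(str)
print(df[['views', 'tier', 'month']])
```

Note that the tier boundaries are inclusive on both ends, so adjacent tiers must not overlap (here they step from 1000 to 1001, and so on).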

  3. Analysis Structure

Generate tables in this order:

| File | Pattern | Contents |
|------|---------|----------|
| 01 | sample_overview.csv | N, date range, unique counts |
| 02-07 | {groupvar}_distribution.csv | Frequency for each grouping variable |
| 08 | continuous_overall.csv | Mean, SD, Median, Min, Max |
| 08a-f | continuous_by_{groupvar}.csv | Descriptives stratified by group |
| 09-10 | categorical_distribution.csv | Key categorical variables |
| 11-15 | entity_{fieldname}.csv | Extracted entity frequencies |
| 16 | temporal_trends.csv | Metrics over time |

  4. Descriptive Statistics Function

```python
import pandas as pd

def descriptive_stats(series, name='Variable'):
    return {
        'Variable': name,
        'N': series.count(),
        'Mean': series.mean(),
        'SD': series.std(),
        'Min': series.min(),
        'Q1': series.quantile(0.25),
        'Median': series.median(),
        'Q3': series.quantile(0.75),
        'Max': series.max(),
    }

def grouped_descriptives(df, var, group_var, group_col_name):
    results = []
    for group in df[group_var].dropna().unique():
        group_data = df[df[group_var] == group][var].dropna()
        if len(group_data) > 0:
            stats = descriptive_stats(group_data, var)
            stats[group_col_name] = group
            results.append(stats)
    return pd.DataFrame(results)
```
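A self-contained check of these helpers on a toy frame (the `tier` and `views` columns are hypothetical):

```python
import pandas as pd

def descriptive_stats(series, name='Variable'):
    return {'Variable': name, 'N': series.count(), 'Mean': series.mean(),
            'SD': series.std(), 'Min': series.min(), 'Q1': series.quantile(0.25),
            'Median': series.median(), 'Q3': series.quantile(0.75), 'Max': series.max()}

def grouped_descriptives(df, var, group_var, group_col_name):
    # One descriptive row per non-null group level
    results = []
    for group in df[group_var].dropna().unique():
        group_data = df[df[group_var] == group][var].dropna()
        if len(group_data) > 0:
            stats = descriptive_stats(group_data, var)
            stats[group_col_name] = group
            results.append(stats)
    return pd.DataFrame(results)

# Toy data: two groups of two observations each
df = pd.DataFrame({'tier': ['A', 'A', 'B', 'B'], 'views': [10, 30, 100, 300]})
out = grouped_descriptives(df, 'views', 'tier', 'Tier')
print(out[['Tier', 'N', 'Mean', 'Median']])
```

Each row of the result is the full nine-statistic summary for one group, which maps directly onto the `08a-f` stratified CSV files.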

  5. Entity Extraction

For text fields with embedded entities (timestamps, names, etc.):

```python
import re
import pandas as pd

def extract_entities(text):
    """Extract entities from bracketed text like '[00:01:23] entity_name'."""
    if pd.isna(text) or text == '':
        return []
    entities = []
    # A [hh:mm:ss]-style stamp, then capture text up to the next ';' or '['
    pattern = r'\[[\d:]+\]\s*([^;\[\]]+)'
    matches = re.findall(pattern, str(text))
    for match in matches:
        entity = match.strip().lower()
        if entity and len(entity) > 1:
            entities.append(entity)
    return entities

def entity_frequency(df, col):
    all_entities = []
    for text in df[col].dropna():
        all_entities.extend(extract_entities(text))
    return pd.Series(all_entities).value_counts()
```
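To sanity-check the pattern, a self-contained snippet (the sample string and entity names are hypothetical):

```python
import re
import pandas as pd

def extract_entities(text):
    """Extract entities from bracketed text like '[00:01:23] entity_name'."""
    if pd.isna(text) or text == '':
        return []
    entities = []
    # Escaped brackets: match the timestamp, capture up to the next ';' or '['
    pattern = r'\[[\d:]+\]\s*([^;\[\]]+)'
    for match in re.findall(pattern, str(text)):
        entity = match.strip().lower()
        if entity and len(entity) > 1:  # drop empty and single-character noise
            entities.append(entity)
    return entities

sample = '[00:01:23] Alice; [00:02:10] Bob'
print(extract_entities(sample))
```

Lower-casing during extraction is what makes the later `value_counts()` aggregation deduplicate case variants like "Alice" and "alice".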

  6. Output Directory Structure

```
TABLE/
├── 01_sample_overview.csv
├── 02_groupvar1_distribution.csv
├── ...
├── 08_continuous_overall.csv
├── 08a_continuous_by_groupvar1.csv
├── ...
└── DESCRIPTIVE_SUMMARY.md
```
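A minimal sketch of materializing this layout, assuming the generated tables are pandas DataFrames held in a dict keyed by file name (`write_tables` and the sample frames are illustrative, not part of the skill):

```python
import tempfile
from pathlib import Path

import pandas as pd

def write_tables(tables, out_dir='TABLE'):
    """Write each DataFrame to out_dir/<name>, creating the directory if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, frame in tables.items():
        frame.to_csv(out / name, index=False)
    # Return the written CSV names; numeric prefixes keep them in analysis order
    return sorted(p.name for p in out.glob('*.csv'))

tables = {
    '01_sample_overview.csv': pd.DataFrame({'Metric': ['N'], 'Value': [380]}),
    '08_continuous_overall.csv': pd.DataFrame({'Variable': ['views'], 'Mean': [27192.59]}),
}
written = write_tables(tables, out_dir=tempfile.mkdtemp())
print(written)
```

The zero-padded numeric prefixes mean a plain lexicographic sort reproduces the intended table order.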

  7. MD Summary Generator

Create comprehensive markdown summary including:

  • Sample Overview - Dataset dimensions and date range

  • Distribution Tables - Top values for each grouping variable

  • Continuous Descriptives - Overall + by each grouping variable

  • Entity Summaries - Unique counts and top entities

  • Temporal Trends - Key metrics over time

  • Output Files Reference - Links to all CSV tables

Summary should use markdown tables with proper formatting:

| Variable | N | Mean | SD | Median |
|----------|---|------|----|--------|
| views | 380 | 27192.59 | 133894.14 | 657.00 |

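One way to emit such a table is a small hand-rolled formatter, sketched below; `md_table` is a hypothetical helper (pandas' `DataFrame.to_markdown` would also work but pulls in the `tabulate` dependency):

```python
def md_table(rows, columns):
    """Render a list of dicts as a GitHub-flavored markdown table."""
    lines = ['| ' + ' | '.join(columns) + ' |',
             '| ' + ' | '.join('---' for _ in columns) + ' |']
    for row in rows:
        lines.append('| ' + ' | '.join(str(row[c]) for c in columns) + ' |')
    return '\n'.join(lines)

rows = [{'Variable': 'views', 'N': 380, 'Mean': 27192.59,
         'SD': 133894.14, 'Median': 657.0}]
print(md_table(rows, ['Variable', 'N', 'Mean', 'SD', 'Median']))
```

Rounding and fixed decimal places (e.g. `f'{value:.2f}'`) are left to the caller, since appropriate precision varies by variable.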
  8. Key Design Principles

  • Descriptive only - No inferential statistics unless requested

  • Flexible grouping - Support any number of grouping variables

  • Top-N limits - Show top 5-10 for large category sets

  • Clean entity extraction - Normalize case, deduplicate

  • Dual output - CSV for validation, MD for interpretation

  • Video/channel counts - When applicable, report both unit types

  • Milestone annotations - Add context to temporal distributions

  9. Verification Checklist

  • All CSV files generated with > 0 rows

  • No empty/null columns

  • Sum of frequencies matches total N

  • Grouped descriptives align with overall

  • Entity extraction capturing expected patterns

  • MD summary coherent and complete

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals (no summaries provided by the upstream source; all are flagged "Needs Review"):

  • q-exploratory-analysis (Research)

  • q-educator (General)

  • q-infographics (General)

  • q-topic-finetuning (General)