rangebar-eval-metrics

Range Bar Evaluation Metrics

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "rangebar-eval-metrics" with this command: npx skills add terrylica/cc-skills/terrylica-cc-skills-rangebar-eval-metrics

Machine-readable reference + computation scripts for state-of-the-art metrics evaluating range bar (price-based sampling) data.

When to Use This Skill

Use this skill when:

  • Evaluating ML model performance on range bar data

  • Computing Sharpe ratios with non-IID bar sequences

  • Running Walk-Forward Optimization metric analysis

  • Calculating PSR, DSR, or MinTRL statistical tests

  • Generating evaluation reports from fold results

Quick Start

    # Compute metrics from predictions + actuals
    python scripts/compute_metrics.py --predictions preds.npy --actuals actuals.npy --timestamps ts.npy

    # Generate full evaluation report
    python scripts/generate_report.py --results folds.jsonl --output report.md

Metric Tiers

| Tier | Purpose | Metrics | Compute |
| --- | --- | --- | --- |
| Primary (5) | Research decisions | weekly_sharpe, hit_rate, cumulative_pnl, n_bars, positive_sharpe_rate | Per-fold + aggregate |
| Secondary/Risk (5) | Additional context | max_drawdown, bar_sharpe, return_per_bar, profit_factor, cv_fold_returns | Per-fold |
| ML Quality (3) | Prediction health | ic, prediction_autocorr, is_collapsed | Per-fold |
| Diagnostic (5) | Final validation | psr, dsr, autocorr_lag1, effective_n, binomial_pvalue | Aggregate only |
| Extended Risk (5) | Deep risk analysis | var_95, cvar_95, omega_ratio, sortino_ratio, ulcer_index | Per-fold (optional) |

Why Range Bars Need Special Treatment

Range bars violate standard IID assumptions:

  • Variable duration: Bars form based on price movement, not time

  • Autocorrelation: High-volatility periods cluster bars → temporal correlation

  • Non-constant information: More bars during volatility = more information per day
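One rough way to see how much the clustering described above shrinks the usable sample is an AR(1) effective sample size, n_eff = n(1 - rho)/(1 + rho), where rho is the lag-1 autocorrelation. The sketch below is an illustration under that assumption; the function bodies are hypothetical and are not necessarily how the skill's scripts compute the `autocorr_lag1` and `effective_n` diagnostics.

```python
import numpy as np


def autocorr_lag1(x) -> float:
    """Lag-1 Pearson autocorrelation of a 1-D series."""
    a = np.asarray(x, dtype=float)
    # Assumption: degenerate input (too short or constant) is treated as uncorrelated.
    if a.size < 3 or np.std(a[:-1]) == 0 or np.std(a[1:]) == 0:
        return 0.0
    return float(np.corrcoef(a[:-1], a[1:])[0, 1])


def effective_n(x) -> float:
    """AR(1) approximation of the number of independent observations."""
    n = len(x)
    # Assumption: negative autocorrelation is clipped to zero so n_eff never exceeds n.
    rho = float(np.clip(autocorr_lag1(x), 0.0, 1.0))
    return n * (1.0 - rho) / (1.0 + rho)
```

With rho = 0.5, for example, 300 bars carry roughly the information of 100 independent observations.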

Canonical solution: Daily aggregation via _group_by_day() before Sharpe calculation.
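The body of `_group_by_day()` is not shown in this document. A minimal sketch of what such a helper might look like, assuming timestamps are Unix seconds and days are UTC calendar days (both assumptions, as is the function name used here):

```python
import numpy as np


def group_by_day_sketch(pnl, timestamps) -> np.ndarray:
    """Sum per-bar PnL into one value per UTC calendar day (hypothetical helper)."""
    pnl = np.asarray(pnl, dtype=float)
    # Assumption: timestamps are Unix seconds, so integer division yields a day index.
    days = np.asarray(timestamps, dtype=np.int64) // 86_400
    # np.unique sorts the day indices and maps every bar to the position of its day.
    _, inverse = np.unique(days, return_inverse=True)
    # bincount with weights adds up the PnL of all bars that share a day.
    return np.bincount(inverse, weights=pnl)
```

Days without any bars are simply absent from the output rather than filled with zeros; whether that matches the skill's own helper should be checked against the upstream scripts.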

References

Core Reference Files

| Topic | Reference File |
| --- | --- |
| Sharpe Ratio Calculations | sharpe-formulas.md |
| Risk Metrics (VaR, Omega, Ulcer) | risk-metrics.md |
| ML Prediction Quality (IC, Autocorr) | ml-prediction-quality.md |
| Crypto Market Considerations | crypto-markets.md |
| Temporal Aggregation Rules | temporal-aggregation.md |
| JSON Schema for Metrics | metrics-schema.md |
| Anti-Patterns (Transaction Costs) | anti-patterns.md |
| SOTA 2025-2026 (SHAP, BOCPD, etc.) | sota-2025-2026.md |
| Worked Examples (BTC, EUR/USD) | worked-examples.md |
| Structured Logging (NDJSON) | structured-logging.md |

Related Skills

| Skill | Relationship |
| --- | --- |
| adaptive-wfo-epoch | Uses weekly_sharpe, psr, dsr for WFE calculation |

Dependencies

    pip install -r requirements.txt
    # Or (quote the specifiers so the shell does not treat ">" as a redirect):
    pip install "numpy>=1.24" "pandas>=2.0" "scipy>=1.10"

Key Formulas

Daily-Aggregated Sharpe (Primary Metric)

    import numpy as np

    def weekly_sharpe(pnl: np.ndarray, timestamps: np.ndarray) -> float:
        """Sharpe with daily aggregation for range bars."""
        daily_pnl = _group_by_day(pnl, timestamps)  # Sum PnL per calendar day
        if len(daily_pnl) < 2 or np.std(daily_pnl) == 0:
            return 0.0
        daily_sharpe = np.mean(daily_pnl) / np.std(daily_pnl)
        # For crypto (7-day week): sqrt(7). For equities: sqrt(5)
        return daily_sharpe * np.sqrt(7)  # Crypto default

Information Coefficient (Prediction Quality)

    import numpy as np
    from scipy.stats import spearmanr

    def information_coefficient(predictions: np.ndarray, actuals: np.ndarray) -> float:
        """Spearman rank IC: captures rank-order (monotonic) alignment, not magnitude."""
        ic, _ = spearmanr(predictions, actuals)
        return ic  # Range: [-1, 1]. >0.02 acceptable, >0.05 good, >0.10 excellent

Probabilistic Sharpe Ratio (Statistical Validation)

    from scipy.stats import norm

    def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float:
        """P(true Sharpe > benchmark)."""
        return norm.cdf((sharpe - benchmark) / se)
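The `psr` function above takes the standard error `se` as an input but does not show where it comes from. One common choice is the skew- and kurtosis-adjusted standard error used in Bailey & López de Prado's PSR (building on Mertens, 2002). The sketch below assumes that formula and a per-period, non-annualized Sharpe ratio; `sharpe_se` is a hypothetical name, not a function confirmed to exist in the skill's scripts.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew


def sharpe_se(returns) -> float:
    """Standard error of a per-period Sharpe ratio, adjusted for skew and kurtosis."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std(ddof=1)   # per-period Sharpe (not annualized)
    g3 = skew(r)                    # sample skewness
    g4 = kurtosis(r, fisher=False)  # raw kurtosis (3.0 for a normal distribution)
    var_sr = (1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr**2) / (n - 1)
    return float(np.sqrt(var_sr))


# Usage with synthetic daily PnL (illustrative data only).
rng = np.random.default_rng(0)
daily = rng.normal(0.001, 0.01, size=250)
daily_sharpe = daily.mean() / daily.std(ddof=1)
psr_value = norm.cdf((daily_sharpe - 0.0) / sharpe_se(daily))
```

The Sharpe ratio and its standard error must be on the same scale: if `sharpe` is weekly or annualized, `se` needs the same scaling factor before the two are combined.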

Annualization Factors

| Market | Daily → Weekly | Daily → Annual | Rationale |
| --- | --- | --- | --- |
| Crypto (24/7) | sqrt(7) = 2.65 | sqrt(365) = 19.1 | 7 trading days/week |
| Equity | sqrt(5) = 2.24 | sqrt(252) = 15.9 | 5 trading days/week |

NEVER use sqrt(252) for crypto markets.

CRITICAL: Session Filter Changes Annualization

| View | Filter | Weekly factor | Rationale |
| --- | --- | --- | --- |
| Session-filtered (London-NY) | Weekdays 08:00-16:00 | sqrt(5) (days_per_week = 5) | Trading like equities |
| All-bars (unfiltered) | None | sqrt(7) (days_per_week = 7) | Full 24/7 crypto |

Using sqrt(7) for session-filtered data overstates Sharpe by ~18%!

See crypto-markets.md for detailed rationale.
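The ~18% figure follows from the ratio of the two weekly factors. A quick arithmetic check (variable names are illustrative):

```python
import math

factor_all_bars = math.sqrt(7)  # 24/7 crypto, 7 days per week
factor_session = math.sqrt(5)   # session-filtered, 5 days per week

# Relative overstatement when sqrt(7) is applied to 5-day data.
overstatement = factor_all_bars / factor_session - 1.0
print(f"{overstatement:.1%}")  # 18.3%
```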

Dual-View Metrics

For comprehensive analysis, compute metrics with BOTH views:

  • Session-filtered (London 08:00 to NY 16:00): Primary strategy evaluation

  • All-bars: Regime detection, data quality diagnostics

Academic References

| Concept | Citation |
| --- | --- |
| Deflated Sharpe Ratio | Bailey & López de Prado (2014) |
| Sharpe SE with Non-Normality | Mertens (2002) |
| Statistics of Sharpe Ratios | Lo (2002) |
| Omega Ratio | Keating & Shadwick (2002) |
| Ulcer Index | Peter Martin (1987) |

Decision Framework

Go Criteria (Research)

    go_criteria:
      - positive_sharpe_rate > 0.55
      - mean_weekly_sharpe > 0
      - cv_fold_returns < 1.5
      - mean_hit_rate > 0.50

Publication Criteria

    publication_criteria:
      - binomial_pvalue < 0.05
      - psr > 0.85
      - dsr > 0.50  # If n_trials > 1
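The two checklists above can be applied mechanically to an aggregate metrics dictionary. The sketch below assumes the dictionary keys match the names in the criteria; the function names and the dictionary layout are hypothetical and may differ from what `scripts/compute_metrics.py` actually emits.

```python
from typing import Mapping


def meets_go_criteria(m: Mapping[str, float]) -> bool:
    """Research go/no-go check using the thresholds listed above."""
    return (
        m["positive_sharpe_rate"] > 0.55
        and m["mean_weekly_sharpe"] > 0
        and m["cv_fold_returns"] < 1.5
        and m["mean_hit_rate"] > 0.50
    )


def meets_publication_criteria(m: Mapping[str, float], n_trials: int = 1) -> bool:
    """Publication check; the DSR threshold only applies when n_trials > 1."""
    ok = m["binomial_pvalue"] < 0.05 and m["psr"] > 0.85
    if n_trials > 1:
        ok = ok and m["dsr"] > 0.50
    return ok
```

Because every comparison with NaN evaluates to False, a metric that could not be computed fails its criterion instead of passing silently.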

Scripts

| Script | Purpose |
| --- | --- |
| scripts/compute_metrics.py | Compute all metrics from predictions/actuals |
| scripts/generate_report.py | Generate Markdown report from fold results |
| scripts/validate_schema.py | Validate metrics JSON against schema |

Remediations (2026-01-19 Multi-Agent Audit)

The following fixes were applied based on a 12-subagent adversarial audit:

| Issue | Root Cause | Fix | Source |
| --- | --- | --- | --- |
| weekly_sharpe=0 | Constant predictions | Model collapse detection + architecture fix | model-expert |
| IC=None | Zero variance predictions | Return 1.0 for constant (semantically correct) | model-expert |
| prediction_autocorr=NaN | Division by zero | Guard for std < 1e-10, return 1.0 | model-expert |
| Ulcer Index divide-by-zero | Peak equity = 0 | Guard with np.where(peak > 1e-10, ...) | risk-analyst |
| Omega/Profit Factor unreliable | Too few samples | min_days parameter (default: 5) | robustness-analyst |
| BiLSTM mean collapse | Architecture too small | hidden_size: 16→48, dropout: 0.5→0.3 | model-expert |
| profit_factor=1.0 (n_bars=0) | Early return wrong value | Return NaN when no data to compute ratio | risk-analyst |
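Three of the guards in the table can be sketched as follows. Only the thresholds and return values quoted in the table come from this document; the function bodies, the minimum-length check, and the handling of a loss-free series are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-10  # threshold quoted in the remediation table


def prediction_autocorr(predictions) -> float:
    """Lag-1 autocorrelation of predictions; 1.0 when they are (near) constant."""
    p = np.asarray(predictions, dtype=float)
    if p.size < 3:
        return float("nan")  # assumption: too short to estimate
    head, tail = p[:-1], p[1:]
    if np.std(head) < EPS or np.std(tail) < EPS:
        return 1.0  # table: guard for std < 1e-10, return 1.0
    return float(np.corrcoef(head, tail)[0, 1])


def profit_factor(pnl) -> float:
    """Gross gains divided by gross losses; NaN when there is no data."""
    x = np.asarray(pnl, dtype=float)
    if x.size == 0:
        return float("nan")  # table: return NaN when n_bars is 0
    gains = x[x > 0].sum()
    losses = -x[x < 0].sum()
    if losses == 0:
        # Assumption: no losses gives inf if there were gains, NaN otherwise.
        return float("inf") if gains > 0 else float("nan")
    return float(gains / losses)


def ulcer_index(equity) -> float:
    """Root-mean-square fractional drawdown from the running peak."""
    e = np.asarray(equity, dtype=float)
    if e.size == 0:
        return float("nan")
    peak = np.maximum.accumulate(e)
    safe_peak = np.where(peak > EPS, peak, 1.0)  # avoids dividing by a zero peak
    drawdown = np.where(peak > EPS, (e - peak) / safe_peak, 0.0)
    return float(np.sqrt(np.mean(drawdown**2)))
```

The Ulcer Index here is expressed as a fraction; multiply by 100 if the percentage convention is wanted.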

Model Collapse Detection

    # ALWAYS check for model collapse after prediction
    pred_std = np.std(predictions)
    if pred_std < 1e-6:
        logger.warning(
            f"Constant predictions detected (std={pred_std:.2e}). "
            "Model collapsed to mean - check architecture."
        )

Recommended BiLSTM Architecture

    # BEFORE (causes collapse on range bars)
    HIDDEN_SIZE = 16
    DROPOUT = 0.5

    # AFTER (prevents collapse)
    HIDDEN_SIZE = 48  # Triple capacity
    DROPOUT = 0.3     # Less aggressive regularization

See reference docs for complete implementation details.

Troubleshooting

| Issue | Cause | Solution |
| --- | --- | --- |
| weekly_sharpe is 0 | Constant predictions | Check for model collapse, increase hidden_size |
| IC returns None | Zero variance in predictions | Model collapsed - check architecture |
| prediction_autocorr is NaN | Division by zero | Guard for std < 1e-10 in autocorr calculation |
| Ulcer Index divide error | Peak equity is zero | Add guard: np.where(peak > 1e-10, ...) |
| profit_factor = 1.0 | No bars processed | Return NaN when n_bars is 0 |
| Sharpe inflated 18% | Wrong annualization for data | Use sqrt(5) for session-filtered, sqrt(7) for 24/7 |
| PSR/DSR not computed | Missing scipy | Install: pip install scipy |
| Timestamps not parsed | Wrong format | Ensure Unix timestamps, not datetime strings |
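For the last row, datetime strings can be converted to Unix timestamps before saving `ts.npy`. A minimal sketch, assuming ISO 8601 strings in UTC without an offset suffix and assuming the scripts expect seconds rather than milliseconds (the unit is not stated here); `to_unix_seconds` is a hypothetical helper:

```python
import numpy as np


def to_unix_seconds(datetime_strings) -> np.ndarray:
    """Parse ISO 8601 strings (assumed UTC, no offset) into Unix seconds."""
    # datetime64[s] stores whole seconds since 1970-01-01T00:00:00.
    return np.array(datetime_strings, dtype="datetime64[s]").astype(np.int64)


ts = to_unix_seconds(["1970-01-02T00:00:00", "2024-01-01T00:00:00"])
# np.save("ts.npy", ts) would then write the file used in Quick Start.
```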
