prompt-repetition

Install skill "prompt-repetition" with this command: npx skills add akillness/skills-template/akillness-skills-template-prompt-repetition

Prompt Repetition

Problem Being Solved

LLMs are trained as Causal Language Models, where each token attends only to previous tokens. This leads to:

  • Context-Question Problem: The question is unknown when processing context

  • Options-First MCQ Problem: Cannot fully understand the question context when viewing answer choices

  • Position/Index Problem: Attention weights weaken for specific position information in long lists

Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some benefits of bidirectional attention.

When to use this skill

  • When using lightweight models: claude-haiku, gemini-flash, gpt-4o-mini, etc.

  • Options-First MCQ: Multiple choice where answer choices appear before the question

  • Context + Question: Searching for specific information in long contexts

  • Index/Position Tasks: Position-based queries in inventories or lists

  • NPC Dialogue: Maintaining consistency for game AI characters

  • Non-Reasoning Tasks: Tasks that do not use Chain-of-Thought

How It Works

Limitations of Causal Attention

[Context] → [Question]
     ↓
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear

How Prompt Repetition Solves This

[First Pass]              [Second Pass]
Context → Question   →    Context' → Question'
                             ↑           ↑
                     Can reference entire first pass

In the second repetition, the model reprocesses information across the entire first prompt and strengthens attention weights on key concepts, resulting in improved performance.

Note: This does not change the model architecture to bidirectional; it is a prompt engineering technique to mitigate the limitations of causal models.
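The effect of repetition on what each token can see can be illustrated with a toy causal mask. This is a minimal sketch, not the model's actual attention code: each segment is treated as a single token, and `causal_mask` is a hypothetical helper introduced only for illustration.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Toy sequence: one "token" per segment (an assumption made for brevity).
single = ["context", "question"]
repeated = single * 2  # context, question, context', question'

mask_single = causal_mask(len(single))
mask_repeated = causal_mask(len(repeated))

# Single pass: the context (index 0) cannot attend to the question (index 1).
context_sees_question_single = mask_single[0][1]

# Repeated prompt: the second context copy (index 2) can attend to the
# question from the first copy (index 1).
context_sees_question_repeated = mask_repeated[2][1]

print(context_sees_question_single, context_sees_question_repeated)  # False True
```

The mask itself is unchanged; the second copy simply sits later in the sequence, so everything in the first copy falls inside its visible prefix.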

Research Results (Google Research 2025)

Metric                              Result
Significant improvement (p < 0.1)   47 / 70 benchmark-model combinations
Performance degradation             0
Neutral                             23
Improvement rate                    67%

Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)

Tested Models

  • Gemini 2.0 Flash / Flash Lite

  • GPT-4o / GPT-4o-mini

  • Claude 3.7 Sonnet / Claude 3 Haiku

  • Deepseek V3

Tested Benchmarks

  • ARC (Challenge) - Scientific reasoning

  • OpenBookQA - Open-domain QA

  • GSM8K - Math problems

  • MMLU-Pro - Multitask language understanding

  • MATH - Mathematical problem solving

  • NameIndex / MiddleMatch - Custom position tasks

Application Procedure

Step 1: Verify Auto-Apply Target Models

Provider   Auto-apply models      Excluded models
Claude     haiku series           opus, sonnet
Gemini     flash, flash-lite      pro, ultra
OpenAI     gpt-4o-mini, gpt-low   gpt-4o, gpt-4

Step 2: Determine Repetition Count by Task Type

Task Type            Keyword Pattern               Repetitions        Expected Improvement
Options-First MCQ    A. B. C. D. choices first     2×                 +15-40%p
Index/Position       slot, position, index, N-th   3×                 +50-76%p
Context + Question   General question              2×                 +5-15%p
With CoT             step by step, think through   0× (not applied)   ~0%
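The table above can be read as a small decision procedure. The sketch below is one possible heuristic, assuming simple keyword matching; `classify_task`, the regular expressions, and the task-type labels are hypothetical names chosen for illustration and will misfire on some prompts (for example, a sentence ending in "Plan A. ").

```python
import re

COT_HINTS = ("step by step", "think through")
# Keywords from the table: slot, position, index, N-th
POSITION_RE = re.compile(r"\b(slot|position|index)\b|\b\d+(st|nd|rd|th)\b")
# An answer choice labelled "A." at the start of the text or after whitespace
OPTION_RE = re.compile(r"(^|\s)A\.\s")


def classify_task(prompt: str) -> tuple[str, int]:
    """Return (task_type, repetitions) following the table's rows."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in COT_HINTS):
        return ("with_cot", 0)
    if POSITION_RE.search(lowered):
        return ("index_position", 3)
    option = OPTION_RE.search(prompt)
    question = prompt.find("?")
    # Options-first: the first choice appears before the first question mark.
    if option and (question == -1 or option.start() < question):
        return ("options_first_mcq", 2)
    return ("context_question", 2)


print(classify_task("A. Paris B. London\n\nWhich city is the capital of France?"))
print(classify_task("What item is in slot 25?"))
print(classify_task("Let's solve this step by step."))
```

CoT is checked first so that a position keyword inside a step-by-step prompt still results in no repetition.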

Step 3: Check Token Limits

# Check context before auto-apply
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
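A worked example may make the clamp easier to follow. This sketch wraps the same rule in a hypothetical function, `clamp_repetitions`, that takes a token count directly rather than a token list; the prompt sizes are made up for illustration.

```python
def clamp_repetitions(prompt_tokens: int, context_window: int,
                      requested: int, safety_ratio: float = 0.8) -> int:
    """Reduce the repetition count so the repeated prompt fits the budget."""
    max_context = context_window * safety_ratio  # 80% safety margin by default
    if prompt_tokens * requested > max_context:
        # Largest whole number of copies that fits, but never below 1
        return max(1, int(max_context / prompt_tokens))
    return requested


# 128,000-token window -> budget of 102,400 tokens
print(clamp_repetitions(1_000, 128_000, requested=3))   # 3: fits as requested
print(clamp_repetitions(40_000, 128_000, requested=3))  # 2: 120,000 > 102,400
print(clamp_repetitions(60_000, 128_000, requested=3))  # 1: only one copy fits
```

A result of 1 means the prompt is sent unchanged, which matches the `times <= 1` early return in Step 4.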

Step 4: Prompt Transformation

def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times.

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)

Practical Examples

Example 1: Options-First MCQ (Greatest Effect)

Before:

A. Paris B. London C. Berlin D. Madrid

Which city is the capital of France? Reply with one letter.

After (repetition ×2 applied):

A. Paris B. London C. Berlin D. Madrid

Which city is the capital of France? Reply with one letter.

A. Paris B. London C. Berlin D. Madrid

Which city is the capital of France? Reply with one letter.

Expected output:

A

Accuracy: original 78% → after repetition 93% (+15%p)

Example 2: Index/Position Tasks (Maximum Effect)

Before:

Inventory:

  1. Iron Sword
  2. Leather Armor
  3. Health Potion (x5)
  4. Magic Staff
  ...
  25. Dragon Scale
  ...
  Ancient Map

What item is in slot 25?

After (repetition ×3 applied): Prompt repeated 3 times

Expected output:

Dragon Scale

Accuracy: original 21% → after repetition 97% (+76%p)

Example 3: Tool Call Prompt Handling

Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.

Before:

Use the calculator tool to compute 234 * 567. What is the result?

After (repetition ×2):

Use the calculator tool to compute 234 * 567. What is the result?

Use the calculator tool to compute 234 * 567. What is the result?

Research results show that full repetition including tool call sections is also effective.

Production-Ready Implementation

Auto-Apply Transformer

"""prompt_repetition_transformer.py"""
from dataclasses import dataclass, field
from typing import Optional, Callable, List
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]


@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    applied_marker: str = "<!-- prompt-repetition-applied -->"


class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False

        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False

        # Skip when CoT pattern detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False

        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine repetition count based on task type"""
        prompt_lower = prompt.lower()

        # Position/Index pattern detected → 3×
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions

        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to prompt"""
        if not self.should_apply(model, prompt):
            return prompt

        repetitions = self.determine_repetitions(prompt, model)

        # Check context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break

        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)

        # Reduce repetitions if token limit exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1

        if repetitions <= 1:
            return prompt

        # Apply repetition + add marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped

How to Measure Effectiveness (Verification)

A/B Testing Method

def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()

    results = {"baseline": [], "repeated": []}

    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)

        # With Repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)

    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)

    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
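An accuracy difference on a small prompt set can be noise. Because both variants answer the same prompts, a paired test is a natural fit; the sketch below uses an exact McNemar test as one reasonable choice. That choice is an assumption here, not something this document specifies, and `mcnemar_exact` is a hypothetical helper that takes the per-prompt correct/incorrect lists collected by an A/B run.

```python
from math import comb


def mcnemar_exact(baseline: list[bool], repeated: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correct/incorrect outcomes."""
    if len(baseline) != len(repeated):
        raise ValueError("paired result lists must have equal length")
    # Discordant pairs: exactly one of the two variants answered correctly.
    only_baseline = sum(a and not b for a, b in zip(baseline, repeated))
    only_repeated = sum(b and not a for a, b in zip(baseline, repeated))
    n = only_baseline + only_repeated
    if n == 0:
        return 1.0  # no prompt changed outcome, so there is no evidence either way
    k = min(only_baseline, only_repeated)
    # Binomial tail with p = 0.5 over the discordant pairs, doubled for two sides
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Five prompts fixed by repetition, none broken: p = 2 * (1/32) = 0.0625
baseline_ok = [False] * 5 + [True] * 5
repeated_ok = [True] * 10
print(mcnemar_exact(baseline_ok, repeated_ok))  # 0.0625
```

Only prompts whose outcome changed between variants influence the result, so a large test set with few flips can still be inconclusive.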

Key Metrics

Metric        Measurement Method
Accuracy      Compare correct answer rates
Consistency   Variance across 10 runs of same prompt
Token cost    Input token increase rate
Latency       Compare p50, p99 latency
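The non-accuracy metrics can be computed with the standard library. This is a sketch under stated assumptions: since variance is not directly defined for free-text answers, `consistency` uses the share of runs agreeing with the most common answer as a stand-in, and all three function names are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def consistency(responses: list[str]) -> float:
    """Share of runs that agree with the most common response."""
    counts = Counter(responses)
    return counts.most_common(1)[0][1] / len(responses)


def token_increase_rate(baseline_tokens: int, repeated_tokens: int) -> float:
    """Relative growth in input tokens (1.0 means +100%)."""
    return (repeated_tokens - baseline_tokens) / baseline_tokens


def latency_percentiles(latencies_ms: list[float]) -> tuple[float, float]:
    """Return (p50, p99); needs at least two measurements."""
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[98]  # 50th and 99th of the 99 cut points


print(consistency(["A"] * 8 + ["B"] * 2))  # 0.8
print(token_increase_rate(500, 1000))      # 1.0
print(latency_percentiles([100.0, 200.0, 300.0]))
```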

When NOT to Use

Case                              Reason
Using CoT                         Reasoning process already provides context
Reasoning models (opus, sonnet)   Already optimized; minimal effect
Very long prompts                 Risk of exceeding context limit
Already repeated                  Duplicate application wastes tokens

Cost-Accuracy Analysis

Metric                    Baseline   With Repetition   Change
Input tokens              500/req    1000/req          +100%
Output tokens             100/req    100/req           0%
Latency (p50)             450ms      460ms             +2%
Latency (p99)             1200ms     1250ms            +4%
Accuracy                  78%        89%               +11%p (+14% relative)
Cost per correct answer   $0.019     $0.020            +5%

Key insight: The prefill phase is highly parallelized on GPU, so doubling input tokens has minimal impact on latency.
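Cost per correct answer is the request cost divided by accuracy, so the accuracy gain partly offsets the extra input tokens. The sketch below shows the arithmetic with the token counts and accuracies from the table. The prices are hypothetical: they assume output tokens cost 20 times as much as input tokens, a ratio picked only for illustration. The result moves considerably with that ratio.

```python
def cost_per_correct(input_tokens: int, output_tokens: int, accuracy: float,
                     price_in: float, price_out: float) -> float:
    """Expected spend per correctly answered request."""
    request_cost = input_tokens * price_in + output_tokens * price_out
    return request_cost / accuracy


# Hypothetical relative prices (not from the source): output = 20 x input
PRICE_IN = 1.0
PRICE_OUT = 20.0

baseline = cost_per_correct(500, 100, 0.78, PRICE_IN, PRICE_OUT)
repeated = cost_per_correct(1000, 100, 0.89, PRICE_IN, PRICE_OUT)
change = repeated / baseline - 1

print(f"Cost per correct answer change: {change:+.1%}")  # +5.2%
```

With input and output priced closer together, doubling the input weighs more heavily and the increase is larger.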

Multi-Agent Integration

Auto-Apply Strategy Per Agent

Agent                 Model         Repetition Applied   Applied At
Claude Orchestrator   opus/sonnet   Optional             -
Claude Executor       haiku         Auto                 skill_loader.py
Gemini Analyst        flash         Auto                 On MCP call
OpenAI                gpt-4o-mini   Auto                 skill_loader.py

Preventing Duplicate Application

To prevent duplicate application in multi-agent pipelines:

  • Use markers: Detect already-applied prompts with <!-- prompt-repetition-applied --> marker

  • Pass metadata: Pass x-prompt-repetition-applied: true header between agents

  • Orchestrator management: Claude Orchestrator tracks whether repetition is applied when calling sub-agents
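The marker and metadata checks can be combined in a single guard. This is a minimal sketch assuming inter-agent metadata travels as a plain string dictionary; `already_applied` and `repeat_once` are hypothetical helpers, and only the marker string and header name come from the list above.

```python
MARKER = "<!-- prompt-repetition-applied -->"
HEADER = "x-prompt-repetition-applied"


def already_applied(prompt: str, headers: dict[str, str]) -> bool:
    """True if either the in-prompt marker or the metadata header is set."""
    return MARKER in prompt or headers.get(HEADER, "").lower() == "true"


def repeat_once(prompt: str, headers: dict[str, str],
                times: int = 2) -> tuple[str, dict[str, str]]:
    """Repeat the prompt unless an upstream agent has already done so."""
    if already_applied(prompt, headers) or times <= 1:
        return prompt, headers
    repeated = "\n\n".join([prompt] * times)
    # Set both signals so a downstream agent can detect either one
    return f"{MARKER}\n{repeated}", {**headers, HEADER: "true"}


first, meta = repeat_once("What item is in slot 25?", {})
second, _ = repeat_once(first, {})  # marker detected, returned unchanged
print(second == first, meta[HEADER])  # True true
```

Setting both signals covers pipelines where the prompt text is rewritten between agents and the marker may be lost.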

Application Pattern

[Claude Sonnet] Planning (no repetition needed)
     ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
     ↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)

skill_loader.py Integration Guide

Recommended Implementation

# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):  # existing parameters elided
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)

        return prompt

Constraints

Required Rules

  • Lightweight models first: Most effective for haiku, flash, mini series

  • Limit repetitions: 2× for general tasks, max 3× for position tasks

  • Context monitoring: Be cautious of context overflow due to repetition

  • Check markers: Mandatory marker check to prevent duplicate application

Prohibited Rules

  • No padding substitution: Increasing length with . etc. has no effect (per research)

  • Do not combine with CoT: The reasoning process already restates the context, so repetition adds little

  • Do not force-apply to reasoning models: Already optimized

  • No duplicate application: Consecutive application without markers wastes tokens

Quick Reference

=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmark-model combinations)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (Prefill parallelization)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->

References

  • Prompt Repetition Improves Non-Reasoning LLMs (Leviathan et al., 2025)

  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022)

  • Re-Reading Improves Reasoning in LLMs (Xu et al., 2024)
