
Translation Quality Assessment

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running them.

Copy the command below and send it to your AI assistant to install the skill:

npx skills add findinfinitelabs/chuuk/findinfinitelabs-chuuk-translation-quality-assessment

Translation Quality Assessment

Overview

A specialized skill for assessing translation quality between Chuukese and English, incorporating cultural context validation, linguistic accuracy checking, and automated quality metrics. Designed to ensure high-quality translations that preserve both linguistic meaning and cultural nuances.

Supported Models: Helsinki-NLP OPUS-MT, fine-tuned Chuukese models, hybrid translation systems

Capabilities

  • Cultural Context Validation: Ensure translations maintain cultural appropriateness and traditional concepts

  • Linguistic Accuracy Assessment: Check grammatical correctness and meaning preservation

  • Automated Quality Metrics: BLEU, chrF, ROUGE, and custom Chuukese-specific scoring

  • Helsinki-NLP Model Evaluation: Model-specific metrics for OPUS-MT fine-tuned models

  • Consistency Checking: Verify terminology consistency across translations

  • Fluency Evaluation: Assess naturalness and readability of translations

  • Back-Translation Validation: Round-trip translation quality assessment (see the sketch after this list)

  • Low-Resource Language Metrics: chrF preferred for morphologically rich languages
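
Back-translation validation has no dedicated component in the sections below. The following is a minimal sketch of the idea, assuming a caller-supplied translate(text, direction) function (for example, a wrapper around the HelsinkiModelEvaluator.translate method shown later); the name round_trip_chrf is illustrative, not part of the skill.

import sacrebleu

def round_trip_chrf(chuukese_text, translate):
    """Sketch: translate chk -> en -> chk and score the round trip with chrF (0-1)."""
    english = translate(chuukese_text, direction="chk_to_en")
    back_translated = translate(english, direction="en_to_chk")
    chrf = sacrebleu.sentence_chrf(back_translated, [chuukese_text])
    return chrf.score / 100.0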

Core Components

  1. Cultural Context Validator

class ChuukeseCulturalValidator:

def __init__(self):
    self.cultural_mappings = {
        # Family relationships with cultural significance
        'family_terms': {
            'semei': {'english': 'older brother', 'cultural_note': 'implies respect and responsibility'},
            'jinej': {'english': 'older sister', 'cultural_note': 'implies respect and responsibility'},
            'pwis': {'english': 'grandchild', 'cultural_note': 'special bond in Chuukese culture'}
        },

        # Traditional concepts that require cultural explanation
        'traditional_concepts': {
            'emon': {'english': 'traditional house', 'cultural_note': 'communal living structure'},
            'chomw': {'english': 'to help/cooperate', 'cultural_note': 'fundamental community value'},
            'nous': {'english': 'traditional gift exchange', 'cultural_note': 'important social practice'}
        },
        
        # Respect and formality indicators
        'respect_markers': {
            'oupwe': {'english': 'please (formal)', 'cultural_note': 'high respect level'},
            'kose mochen': {'english': 'thank you (formal)', 'cultural_note': 'deep gratitude expression'},
            'tipeew': {'english': 'excuse me (formal)', 'cultural_note': 'polite interruption'}
        }
    }

def validate_cultural_preservation(self, chuukese_text, english_translation):
    """Validate that cultural concepts are properly translated"""
    validation_results = {
        'cultural_terms_found': [],
        'proper_translations': [],
        'missing_context': [],
        'cultural_accuracy_score': 0.0
    }
    
    total_cultural_terms = 0
    correctly_translated = 0
    
    for category, terms in self.cultural_mappings.items():
        for chuukese_term, translation_info in terms.items():
            if chuukese_term in chuukese_text.lower():
                total_cultural_terms += 1
                validation_results['cultural_terms_found'].append({
                    'term': chuukese_term,
                    'category': category,
                    'expected_translation': translation_info['english'],
                    'cultural_note': translation_info['cultural_note']
                })
                
                # Check if English translation contains expected terms
                if translation_info['english'] in english_translation.lower():
                    correctly_translated += 1
                    validation_results['proper_translations'].append(chuukese_term)
                else:
                    validation_results['missing_context'].append({
                        'term': chuukese_term,
                        'expected': translation_info['english'],
                        'suggestion': f"Consider translating as '{translation_info['english']}' with note: {translation_info['cultural_note']}"
                    })
    
    if total_cultural_terms > 0:
        validation_results['cultural_accuracy_score'] = correctly_translated / total_cultural_terms
    
    return validation_results
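
A brief usage sketch of the validator above; the sentence and its translation are invented for illustration.

validator = ChuukeseCulturalValidator()

results = validator.validate_cultural_preservation(
    "Kose mochen, semei",
    "Thank you, older brother"
)

# Fraction of detected cultural terms whose expected English rendering appears verbatim
print(results['cultural_accuracy_score'])
for issue in results['missing_context']:
    print(issue['suggestion'])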

2. Linguistic Accuracy Checker

import re

class ChuukeseLinguisticChecker:

def __init__(self):
    # Common Chuukese grammatical patterns
    self.grammar_patterns = {
        'verb_patterns': {
            'present': r'(ko|ka|ke)\s+\w+',    # Present tense markers
            'past': r'(a|aa)\s+\w+',           # Past tense markers
            'future': r'(pwe|pwene)\s+\w+'     # Future tense markers
        },

        'noun_patterns': {
            'plural': r'\w+(kan|kin)$',      # Plural endings
            'possessive': r'\w+(y|i|ey)$'    # Possessive forms
        },
        
        'sentence_structure': {
            'basic_word_order': 'VSO',       # Verb-Subject-Object typical order
            'question_markers': ['ya', 'ese', 'iwe', 'mei']
        }
    }

def check_grammatical_accuracy(self, chuukese_text, english_translation):
    """Check if translation preserves grammatical structures"""
    accuracy_report = {
        'tense_consistency': True,
        'structure_preservation': True,
        'grammatical_errors': [],
        'suggestions': []
    }
    
    # Check tense consistency
    chuukese_tenses = self.detect_tenses(chuukese_text)
    english_tenses = self.detect_english_tenses(english_translation)
    
    if not self.tenses_match(chuukese_tenses, english_tenses):
        accuracy_report['tense_consistency'] = False
        accuracy_report['grammatical_errors'].append('Tense mismatch between source and translation')
    
    # Check question structure preservation
    if any(marker in chuukese_text.lower() for marker in self.grammar_patterns['sentence_structure']['question_markers']):
        if not english_translation.strip().endswith('?'):
            accuracy_report['structure_preservation'] = False
            accuracy_report['grammatical_errors'].append('Question structure not preserved in translation')
    
    return accuracy_report

def detect_tenses(self, text):
    """Detect tenses in Chuukese text"""
    detected_tenses = []
    for tense, pattern in self.grammar_patterns['verb_patterns'].items():
        if re.search(pattern, text, re.IGNORECASE):
            detected_tenses.append(tense)
    return detected_tenses
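
check_grammatical_accuracy above calls detect_english_tenses and tenses_match, which are not defined in this document. The following are minimal sketches, included only so the checker reads as self-contained; the keyword-based English tense detection is a deliberate simplification, not the skill's actual logic.

def detect_english_tenses(self, text):
    """Sketch: rough English tense detection via keyword and suffix patterns."""
    detected = []
    lowered = text.lower()
    if re.search(r'\b(will|shall|going to)\b', lowered):
        detected.append('future')
    if re.search(r'\b(was|were|did|had)\b|\b\w+ed\b', lowered):
        detected.append('past')
    if re.search(r'\b(is|are|am|do|does)\b|\b\w+ing\b', lowered):
        detected.append('present')
    return detected

def tenses_match(self, chuukese_tenses, english_tenses):
    """Sketch: treat tenses as consistent if every Chuukese tense also appears in English."""
    return all(tense in english_tenses for tense in chuukese_tenses)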

3. Automated Quality Metrics

import math
from collections import Counter
from typing import List, Dict, Optional

import sacrebleu

class TranslationQualityMetrics:
    """
    Quality metrics for translation evaluation.
    Optimized for low-resource languages like Chuukese.
    """

def __init__(self):
    self.reference_translations = {}  # Load from reference corpus

def calculate_bleu_score(self, candidate: str, reference: str) -> float:
    """
    Calculate BLEU score using sacrebleu for consistency.
    Note: chrF is preferred for morphologically rich languages.
    """
    try:
        bleu = sacrebleu.sentence_bleu(candidate, [reference])
        return bleu.score / 100.0  # Normalize to 0-1
    except Exception:
        return self._fallback_bleu(candidate, reference)

def _fallback_bleu(self, candidate: str, reference: str) -> float:
    """Fallback BLEU calculation if sacrebleu unavailable"""
    candidate_tokens = candidate.lower().split()
    reference_tokens = reference.lower().split()
    
    precisions = []
    for n in range(1, 5):
        candidate_ngrams = self.get_ngrams(candidate_tokens, n)
        reference_ngrams = self.get_ngrams(reference_tokens, n)
        
        if len(candidate_ngrams) == 0:
            precision = 0
        else:
            matches = sum(min(candidate_ngrams[ngram], reference_ngrams.get(ngram, 0)) 
                        for ngram in candidate_ngrams)
            precision = matches / len(candidate_ngrams)
        
        precisions.append(precision)
    
    candidate_length = len(candidate_tokens)
    reference_length = len(reference_tokens)
    
    if candidate_length > reference_length:
        bp = 1
    else:
        bp = math.exp(1 - reference_length / max(candidate_length, 1))
    
    if min(precisions) > 0:
        bleu = bp * math.exp(sum(math.log(p) for p in precisions if p > 0) / 4)
    else:
        bleu = 0
    
    return bleu

def calculate_chrf_score(self, candidate: str, reference: str) -> float:
    """
    Calculate chrF score - PREFERRED for low-resource and morphologically rich languages.
    chrF is character-based and handles Chuukese accents better than BLEU.
    """
    try:
        chrf = sacrebleu.sentence_chrf(candidate, [reference])
        return chrf.score / 100.0  # Normalize to 0-1
    except Exception:
        return self._fallback_chrf(candidate, reference)

def _fallback_chrf(self, candidate: str, reference: str) -> float:
    """Simple character F-score fallback"""
    candidate_chars = set(candidate.lower())
    reference_chars = set(reference.lower())
    
    if not candidate_chars or not reference_chars:
        return 0.0
    
    intersection = candidate_chars & reference_chars
    precision = len(intersection) / len(candidate_chars)
    recall = len(intersection) / len(reference_chars)
    
    if precision + recall == 0:
        return 0.0
    
    return 2 * precision * recall / (precision + recall)

def get_ngrams(self, tokens: List[str], n: int) -> Counter:
    """Extract n-grams from token list"""
    ngrams = Counter()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        ngrams[ngram] += 1
    return ngrams

def calculate_cultural_preservation_score(self, cultural_validation_results: Dict) -> float:
    """Calculate score based on cultural context preservation"""
    base_score = cultural_validation_results.get('cultural_accuracy_score', 0.0)
    
    missing_context_penalty = len(cultural_validation_results.get('missing_context', [])) * 0.1
    proper_translations_bonus = len(cultural_validation_results.get('proper_translations', [])) * 0.05
    
    final_score = max(0.0, min(1.0, base_score - missing_context_penalty + proper_translations_bonus))
    
    return final_score
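
A brief usage sketch of the metrics class above, reusing the example sentence pair from the usage section; the exact numbers depend on whether sacrebleu is installed or the fallbacks are used.

metrics = TranslationQualityMetrics()

candidate = "We will help those Chuukese people"
reference = "We shall assist the people of Chuuk"

# chrF is the recommended headline metric for Chuukese; BLEU is reported alongside it.
print("chrF:", metrics.calculate_chrf_score(candidate, reference))
print("BLEU:", metrics.calculate_bleu_score(candidate, reference))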

import torch

class HelsinkiModelEvaluator:
    """
    Specialized evaluator for Helsinki-NLP OPUS-MT models.
    Handles fine-tuned Chuukese translation models.
    """

def __init__(self, model_path: str = None):
    self.model_path = model_path
    self.metrics = TranslationQualityMetrics()
    self.device = "cuda" if torch.cuda.is_available() else "cpu"

def evaluate_model(
    self, 
    test_pairs: List[Dict[str, str]],
    direction: str = "chk_to_en"
) -> Dict[str, float]:
    """
    Evaluate Helsinki-NLP model on test set.
    
    Args:
        test_pairs: List of {'source': str, 'target': str}
        direction: 'chk_to_en' or 'en_to_chk'
    
    Returns:
        Dictionary with aggregate metrics
    """
    bleu_scores = []
    chrf_scores = []
    
    for pair in test_pairs:
        source = pair['source']
        target = pair['target']
        
        # Get model prediction
        prediction = self.translate(source, direction)
        
        # Calculate metrics
        bleu = self.metrics.calculate_bleu_score(prediction, target)
        chrf = self.metrics.calculate_chrf_score(prediction, target)
        
        bleu_scores.append(bleu)
        chrf_scores.append(chrf)
    
    return {
        'bleu_mean': sum(bleu_scores) / len(bleu_scores),
        'chrf_mean': sum(chrf_scores) / len(chrf_scores),
        'bleu_scores': bleu_scores,
        'chrf_scores': chrf_scores,
        'samples_evaluated': len(test_pairs)
    }

def compare_models(
    self, 
    base_model_path: str,
    finetuned_model_path: str,
    test_pairs: List[Dict[str, str]]
) -> Dict[str, Dict[str, float]]:
    """
    Compare base vs fine-tuned model performance.
    
    Returns improvement metrics for the fine-tuned model.
    """
    base_results = self._evaluate_with_model(base_model_path, test_pairs)
    finetuned_results = self._evaluate_with_model(finetuned_model_path, test_pairs)
    
    return {
        'base_model': base_results,
        'finetuned_model': finetuned_results,
        'improvement': {
            'bleu_improvement': finetuned_results['bleu_mean'] - base_results['bleu_mean'],
            'chrf_improvement': finetuned_results['chrf_mean'] - base_results['chrf_mean'],
            'bleu_percent_improvement': (
                (finetuned_results['bleu_mean'] - base_results['bleu_mean']) / 
                max(base_results['bleu_mean'], 0.01) * 100
            ),
            'chrf_percent_improvement': (
                (finetuned_results['chrf_mean'] - base_results['chrf_mean']) / 
                max(base_results['chrf_mean'], 0.01) * 100
            )
        }
    }

def translate(self, text: str, direction: str = "chk_to_en") -> str:
    """Translate text using loaded model"""
    # Implementation uses HelsinkiChuukeseTranslator
    pass

def _evaluate_with_model(
    self, 
    model_path: str, 
    test_pairs: List[Dict[str, str]]
) -> Dict[str, float]:
    """Evaluate specific model"""
    # Load model and evaluate
    pass
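
translate and _evaluate_with_model are left as stubs above, and the HelsinkiChuukeseTranslator they rely on is not shown in this document. As a rough sketch only (not the skill's actual implementation), a direct MarianMT-based translate could look like the following, assuming self.model_path points to a standard OPUS-MT checkpoint directory.

from transformers import MarianMTModel, MarianTokenizer

def translate(self, text: str, direction: str = "chk_to_en") -> str:
    """Sketch: translate with a MarianMT checkpoint loaded from self.model_path.

    In a real setup the direction argument would select between separate
    chk->en and en->chk checkpoints; this sketch loads a single model and
    reloads it on every call, which is simple but inefficient.
    """
    tokenizer = MarianTokenizer.from_pretrained(self.model_path)
    model = MarianMTModel.from_pretrained(self.model_path).to(self.device)
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(self.device)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)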

4. Comprehensive Quality Assessment Pipeline

class TranslationQualityAssessment:

def __init__(self, reference_corpus_path=None):
    self.cultural_validator = ChuukeseCulturalValidator()
    self.linguistic_checker = ChuukeseLinguisticChecker()
    self.metrics_calculator = TranslationQualityMetrics()

    if reference_corpus_path:
        self.load_reference_corpus(reference_corpus_path)

def assess_translation_quality(self, chuukese_text, english_translation, reference_translation=None):
    """Comprehensive translation quality assessment"""
    assessment_report = {
        'overall_quality_score': 0.0,
        'cultural_validation': {},
        'linguistic_accuracy': {},
        'automated_metrics': {},
        'recommendations': []
    }
    
    # Cultural context validation
    cultural_results = self.cultural_validator.validate_cultural_preservation(
        chuukese_text, english_translation
    )
    assessment_report['cultural_validation'] = cultural_results
    
    # Linguistic accuracy check
    linguistic_results = self.linguistic_checker.check_grammatical_accuracy(
        chuukese_text, english_translation
    )
    assessment_report['linguistic_accuracy'] = linguistic_results
    
    # Automated metrics
    metrics = {}
    if reference_translation:
        metrics['bleu_score'] = self.metrics_calculator.calculate_bleu_score(
            english_translation, reference_translation
        )
    
    metrics['cultural_preservation_score'] = self.metrics_calculator.calculate_cultural_preservation_score(
        cultural_results
    )
    
    assessment_report['automated_metrics'] = metrics
    
    # Calculate overall quality score
    overall_score = self.calculate_overall_score(cultural_results, linguistic_results, metrics)
    assessment_report['overall_quality_score'] = overall_score
    
    # Generate recommendations
    recommendations = self.generate_recommendations(cultural_results, linguistic_results, metrics)
    assessment_report['recommendations'] = recommendations
    
    return assessment_report

def calculate_overall_score(self, cultural_results, linguistic_results, metrics):
    """Calculate weighted overall quality score"""
    cultural_score = metrics.get('cultural_preservation_score', 0.0)
    linguistic_score = 1.0 if linguistic_results.get('tense_consistency', False) and linguistic_results.get('structure_preservation', False) else 0.5
    bleu_score = metrics.get('bleu_score', 0.0)
    
    # Weighted average (cultural context heavily weighted for Chuukese)
    overall_score = (cultural_score * 0.4 + linguistic_score * 0.4 + bleu_score * 0.2)
    
    return round(overall_score, 3)

def generate_recommendations(self, cultural_results, linguistic_results, metrics):
    """Generate actionable recommendations for translation improvement"""
    recommendations = []
    
    # Cultural recommendations
    if cultural_results.get('cultural_accuracy_score', 0) < 0.8:
        recommendations.append("Consider adding cultural context or explanations for traditional terms")
    
    for missing in cultural_results.get('missing_context', []):
        recommendations.append(f"Improve translation of '{missing['term']}': {missing['suggestion']}")
    
    # Linguistic recommendations
    if not linguistic_results.get('tense_consistency', True):
        recommendations.append("Review tense consistency between source and target languages")
    
    if not linguistic_results.get('structure_preservation', True):
        recommendations.append("Preserve sentence structure and question forms from source language")
    
    # Metric-based recommendations
    if metrics.get('bleu_score', 0) < 0.3:
        recommendations.append("Consider improving lexical similarity with reference translations")
    
    return recommendations
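
load_reference_corpus is called in __init__ but never defined in this document. A minimal sketch, assuming the corpus is a JSON object mapping Chuukese source strings to English reference translations (the actual file format is not specified here):

import json

def load_reference_corpus(self, reference_corpus_path):
    """Sketch: load {chuukese_source: english_reference} pairs from a JSON file."""
    with open(reference_corpus_path, 'r', encoding='utf-8') as f:
        self.metrics_calculator.reference_translations = json.load(f)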

Usage Examples

Basic Translation Assessment

# Initialize assessment system
assessor = TranslationQualityAssessment("reference_corpus.json")

# Assess a translation
chuukese_text = "Kopwe pwan chomong ngonuk ekkewe chon Chuuk"
english_translation = "We will help those Chuukese people"
reference_translation = "We shall assist the people of Chuuk"

assessment = assessor.assess_translation_quality(
    chuukese_text, english_translation, reference_translation
)

print(f"Overall Quality Score: {assessment['overall_quality_score']}")
for recommendation in assessment['recommendations']:
    print(f"- {recommendation}")

Batch Translation Evaluation

def evaluate_translation_batch(translation_pairs):
    """Evaluate multiple translations and generate summary report"""
    results = []
    total_score = 0

    for pair in translation_pairs:
        assessment = assessor.assess_translation_quality(
            pair['chuukese'],
            pair['english'],
            pair.get('reference')
        )
        results.append(assessment)
        total_score += assessment['overall_quality_score']

    average_score = total_score / len(translation_pairs)

    summary_report = {
        'average_quality_score': average_score,
        'total_assessments': len(translation_pairs),
        'individual_results': results,
        'common_issues': identify_common_issues(results)
    }

    return summary_report

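identify_common_issues is referenced above but not defined in this document. One possible sketch, counting recurring recommendation strings across assessments, is shown below purely to make the batch example self-contained; the top_n parameter is illustrative.

from collections import Counter

def identify_common_issues(results, top_n=5):
    """Sketch: tally recommendations that recur across individual assessments."""
    counts = Counter(
        recommendation
        for assessment in results
        for recommendation in assessment['recommendations']
    )
    return [issue for issue, _ in counts.most_common(top_n)]
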
Best Practices

Quality Assessment

  • Multiple validation layers: Combine automated metrics with cultural validation

  • Reference corpus usage: Maintain high-quality reference translations for comparison

  • Community validation: Involve native speakers in quality assessment

  • Continuous improvement: Update assessment criteria based on feedback

Cultural Sensitivity

  • Context preservation: Ensure cultural concepts are properly explained

  • Respect levels: Validate appropriate formality and respect markers

  • Traditional knowledge: Incorporate understanding of Chuukese customs

  • Community standards: Align with community expectations for translation quality

Technical Implementation

  • Comprehensive metrics: Use multiple quality indicators

  • Actionable feedback: Provide specific, implementable recommendations

  • Scalable assessment: Design for batch processing and automation

  • Continuous learning: Adapt assessment criteria based on new insights

Dependencies

  • re : Regular expression pattern matching

  • math : Mathematical operations for scoring

  • collections : Counter for n-gram analysis

  • json : Reference corpus data handling

  • nltk : Natural language processing utilities

  • sacrebleu : BLEU and chrF scoring (the metrics fall back to built-in approximations if it is unavailable)

  • torch : Device selection for Helsinki-NLP model evaluation

Validation Criteria

A successful implementation should:

  • ✅ Accurately assess cultural context preservation

  • ✅ Validate linguistic accuracy and grammatical consistency

  • ✅ Provide meaningful quality scores and metrics

  • ✅ Generate actionable recommendations for improvement

  • ✅ Handle both individual and batch assessments

  • ✅ Integrate with existing translation workflows

  • ✅ Support continuous quality monitoring and improvement

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • document-ocr-processing (General): no summary provided by upstream source. Repository Source, Needs Review

  • css-styling-standards (General): no summary provided by upstream source. Repository Source, Needs Review

  • bible-epub-processing (General): no summary provided by upstream source. Repository Source, Needs Review

  • large-document-processing (General): no summary provided by upstream source. Repository Source, Needs Review