LangExtract - Structured Information Extraction
Expert assistance for extracting structured, source-grounded information from unstructured text using large language models.
When to Use This Skill
Use this skill when you need to:
-
Extract structured entities from unstructured text (medical notes, reports, documents)
-
Maintain precise source grounding (map extracted data to original text locations)
-
Process long documents beyond LLM token limits
-
Visualize extraction results with interactive HTML highlighting
-
Extract clinical information from medical records
-
Structure radiology or pathology reports
-
Extract medications, diagnoses, or symptoms from clinical notes
-
Analyze literary texts for characters, emotions, relationships
-
Build domain-specific extraction pipelines
-
Work with Gemini, OpenAI, or local models (Ollama)
-
Generate schema-compliant outputs without fine-tuning
Overview
LangExtract is a Python library by Google for extracting structured information from unstructured text using large language models. It emphasizes:
-
Source Grounding: Every extraction maps to its exact location in source text
-
Structured Outputs: Schema-compliant results with controlled generation
-
Long Document Processing: Intelligent chunking and multi-pass extraction
-
Interactive Visualization: Self-contained HTML for reviewing extractions in context
-
Flexible LLM Support: Works with Gemini, OpenAI, and local models
-
Few-Shot Learning: Requires only quality examples, no expensive fine-tuning
Key Resources:
-
Examples: https://github.com/google/langextract/tree/main/examples
-
Documentation: https://github.com/google/langextract/tree/main/docs/examples
Installation
Prerequisites
-
Python 3.8 or higher
-
API key for Gemini (AI Studio), OpenAI, or local Ollama setup
Basic Installation
Install from PyPI (recommended)
pip install langextract
Install with OpenAI support
pip install langextract[openai]
Install with development tools
pip install langextract[dev]
Install from Source
git clone https://github.com/google/langextract.git cd langextract pip install -e .
For development with testing
pip install -e ".[test]"
Docker Installation
Build Docker image
docker build -t langextract .
Run with API key
docker run --rm
-e LANGEXTRACT_API_KEY="your-api-key"
langextract python your_script.py
API Key Setup
Gemini (Google AI Studio):
export LANGEXTRACT_API_KEY="your-gemini-api-key"
Get keys from: https://ai.google.dev/
OpenAI:
export OPENAI_API_KEY="your-openai-api-key"
Vertex AI (Enterprise):
Use service account authentication
Set project in language_model_params
.env File (Development):
Create .env file
echo "LANGEXTRACT_API_KEY=your-key-here" > .env
Quick Start
Basic Extraction Example
import langextract as lx import textwrap
1. Define extraction task
prompt = textwrap.dedent("""
Extract all medications mentioned in the clinical note.
Include medication name, dosage, and frequency.
Use exact text from the document.""")
2. Provide examples (few-shot learning)
examples = [ lx.data.ExampleData( text="Patient prescribed Lisinopril 10mg daily for hypertension.", extractions=[ lx.data.Extraction( extraction_class="medication", extraction_text="Lisinopril 10mg daily", attributes={ "name": "Lisinopril", "dosage": "10mg", "frequency": "daily", "indication": "hypertension" } ) ] ) ]
3. Input text to extract from
input_text = """ Patient continues on Metformin 500mg twice daily for diabetes management. Started on Amlodipine 5mg once daily for blood pressure control. Discontinued Aspirin 81mg due to side effects. """
4. Run extraction
result = lx.extract( text_or_documents=input_text, prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp" )
5. Access results
for extraction in result.extractions: print(f"Medication: {extraction.extraction_text}") print(f" Name: {extraction.attributes.get('name')}") print(f" Dosage: {extraction.attributes.get('dosage')}") print(f" Frequency: {extraction.attributes.get('frequency')}") print(f" Location: {extraction.start_char}-{extraction.end_char}") print()
6. Save and visualize
lx.io.save_annotated_documents( [result], output_name="medications.jsonl", output_dir="." )
html_content = lx.visualize("medications.jsonl") with open("medications.html", "w") as f: f.write(html_content)
Literary Text Example
import langextract as lx
prompt = """Extract characters, emotions, and relationships in order of appearance. Use exact text for extractions. Do not paraphrase or overlap entities."""
examples = [ lx.data.ExampleData( text="ROMEO entered the garden, filled with wonder at JULIET's beauty.", extractions=[ lx.data.Extraction( extraction_class="character", extraction_text="ROMEO", attributes={"emotional_state": "wonder"} ), lx.data.Extraction( extraction_class="character", extraction_text="JULIET", attributes={} ), lx.data.Extraction( extraction_class="relationship", extraction_text="ROMEO ... JULIET's beauty", attributes={ "subject": "ROMEO", "relation": "admires", "object": "JULIET" } ) ] ) ]
text = """Act 2, Scene 2: The Capulet's orchard. ROMEO appears beneath JULIET's balcony, gazing upward with longing. JULIET steps onto the balcony, unaware of ROMEO's presence below."""
result = lx.extract( text_or_documents=text, prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp" )
Core Concepts
- Extraction Classes
Define categories of entities to extract:
Single class
extraction_class="medication"
Multiple classes via examples
examples = [ lx.data.ExampleData( text="...", extractions=[ lx.data.Extraction(extraction_class="diagnosis", ...), lx.data.Extraction(extraction_class="symptom", ...), lx.data.Extraction(extraction_class="medication", ...) ] ) ]
- Source Grounding
Every extraction includes precise text location:
extraction = result.extractions[0] print(f"Text: {extraction.extraction_text}") print(f"Start: {extraction.start_char}") print(f"End: {extraction.end_char}")
Extract from original document
original_text = input_text[extraction.start_char:extraction.end_char]
- Attributes
Add structured metadata to extractions:
lx.data.Extraction( extraction_class="medication", extraction_text="Lisinopril 10mg daily", attributes={ "name": "Lisinopril", "dosage": "10mg", "frequency": "daily", "route": "oral", "indication": "hypertension" } )
- Few-Shot Learning
Provide 1-5 quality examples instead of fine-tuning:
Minimal examples (1-2) for simple tasks
examples = [example1]
More examples (3-5) for complex schemas
examples = [example1, example2, example3, example4, example5]
- Long Document Processing
Automatic chunking for documents beyond token limits:
result = lx.extract( text_or_documents=long_document, # Any length prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp", extraction_passes=3, # Multiple passes for better recall max_workers=20, # Parallel processing max_char_buffer=1000 # Chunk overlap for continuity )
Configuration
Model Selection
Gemini models (recommended)
model_id="gemini-2.0-flash-exp" # Fast, cost-effective model_id="gemini-2.0-flash-thinking-exp" # Complex reasoning model_id="gemini-1.5-pro" # Legacy
OpenAI models
model_id="gpt-4o" # GPT-4 Optimized model_id="gpt-4o-mini" # Smaller, faster
Local models via Ollama
model_id="gemma2:2b" # Local inference model_url="http://localhost:11434"
Scaling Parameters
result = lx.extract( text_or_documents=documents, prompt_description=prompt, examples=examples,
# Multi-pass extraction for better recall
extraction_passes=3,
# Parallel processing
max_workers=20,
# Chunk size tuning
max_char_buffer=1000,
# Model configuration
model_id="gemini-2.0-flash-exp"
)
Backend Configuration
Vertex AI:
result = lx.extract( text_or_documents=text, prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp", language_model_params={ "vertexai": True, "project": "your-gcp-project-id", "location": "us-central1" } )
Batch Processing:
language_model_params={ "batch": { "enabled": True } }
OpenAI Configuration:
result = lx.extract( text_or_documents=text, prompt_description=prompt, examples=examples, model_id="gpt-4o", fence_output=True, # Required for OpenAI use_schema_constraints=False # Disable Gemini-specific features )
Local Ollama:
result = lx.extract( text_or_documents=text, prompt_description=prompt, examples=examples, model_id="gemma2:2b", model_url="http://localhost:11434", use_schema_constraints=False )
Environment Variables
API Keys
LANGEXTRACT_API_KEY="gemini-api-key" OPENAI_API_KEY="openai-api-key"
Vertex AI
GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
Model configuration
LANGEXTRACT_MODEL_ID="gemini-2.0-flash-exp" LANGEXTRACT_MODEL_URL="http://localhost:11434"
Common Patterns
Pattern 1: Clinical Note Extraction
import langextract as lx
prompt = """Extract diagnoses, symptoms, and medications from clinical notes. Include ICD-10 codes when available. Use exact medical terminology."""
examples = [ lx.data.ExampleData( text="Patient presents with Type 2 Diabetes Mellitus (E11.9). Started on Metformin 500mg BID. Reports fatigue and increased thirst.", extractions=[ lx.data.Extraction( extraction_class="diagnosis", extraction_text="Type 2 Diabetes Mellitus (E11.9)", attributes={"condition": "Type 2 Diabetes Mellitus", "icd10": "E11.9"} ), lx.data.Extraction( extraction_class="medication", extraction_text="Metformin 500mg BID", attributes={"name": "Metformin", "dosage": "500mg", "frequency": "BID"} ), lx.data.Extraction( extraction_class="symptom", extraction_text="fatigue", attributes={"symptom": "fatigue"} ), lx.data.Extraction( extraction_class="symptom", extraction_text="increased thirst", attributes={"symptom": "polydipsia"} ) ] ) ]
Process multiple clinical notes
clinical_notes = [ "Note 1: Patient presents with...", "Note 2: Follow-up visit for...", "Note 3: New onset chest pain..." ]
results = lx.extract( text_or_documents=clinical_notes, prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp", extraction_passes=2, max_workers=10 )
Save structured output
lx.io.save_annotated_documents( results, output_name="clinical_extractions.jsonl", output_dir="./output" )
Pattern 2: Radiology Report Structuring
prompt = """Extract findings, impressions, and recommendations from radiology reports. Include anatomical location, abnormality type, and severity."""
examples = [ lx.data.ExampleData( text="FINDINGS: 3.2cm mass in right upper lobe. IMPRESSION: Suspicious for malignancy. RECOMMENDATION: Biopsy recommended.", extractions=[ lx.data.Extraction( extraction_class="finding", extraction_text="3.2cm mass in right upper lobe", attributes={ "location": "right upper lobe", "type": "mass", "size": "3.2cm" } ), lx.data.Extraction( extraction_class="impression", extraction_text="Suspicious for malignancy", attributes={"diagnosis": "possible malignancy", "certainty": "suspicious"} ), lx.data.Extraction( extraction_class="recommendation", extraction_text="Biopsy recommended", attributes={"action": "biopsy"} ) ] ) ]
Pattern 3: Multi-Document Processing
import langextract as lx from pathlib import Path
Load multiple documents
documents = [] for file_path in Path("./documents").glob("*.txt"): with open(file_path, "r") as f: documents.append(f.read())
Extract from all documents
results = lx.extract( text_or_documents=documents, prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp", extraction_passes=3, max_workers=20 )
Results is a list of AnnotatedDocument objects
for i, result in enumerate(results): print(f"\nDocument {i+1}: {len(result.extractions)} extractions") for extraction in result.extractions: print(f" - {extraction.extraction_class}: {extraction.extraction_text}")
Pattern 4: Interactive Visualization
Generate interactive HTML
html_content = lx.visualize("extractions.jsonl")
Save to file
with open("interactive_results.html", "w") as f: f.write(html_content)
Open in browser (optional)
import webbrowser webbrowser.open("interactive_results.html")
Pattern 5: Custom Provider Plugin
See examples/custom_provider_plugin/ for full implementation
from langextract.providers import ProviderPlugin
class CustomProvider(ProviderPlugin): def extract(self, text, prompt, examples, **kwargs): # Custom extraction logic return extractions
def supports_schema_constraints(self):
return False
Register custom provider
lx.register_provider("custom", CustomProvider())
Use custom provider
result = lx.extract( text_or_documents=text, prompt_description=prompt, examples=examples, model_id="custom", provider="custom" )
API Reference
Core Functions
lx.extract()
Main extraction function.
result = lx.extract( text_or_documents, # str or list of str prompt_description, # str: extraction instructions examples, # list of ExampleData model_id="gemini-2.0-flash-exp", # str: model identifier extraction_passes=1, # int: number of passes max_workers=None, # int: parallel workers max_char_buffer=1000, # int: chunk overlap language_model_params=None, # dict: model config fence_output=False, # bool: required for OpenAI use_schema_constraints=True, # bool: use schema enforcement model_url=None, # str: custom model endpoint api_key=None # str: API key (prefer env var) )
Returns: AnnotatedDocument or list[AnnotatedDocument]
lx.visualize()
Generate interactive HTML visualization.
html_content = lx.visualize( jsonl_file_path, # str: path to JSONL file title="Extraction Results", # str: HTML page title show_attributes=True # bool: display attributes )
Returns: str (HTML content)
lx.io.save_annotated_documents()
Save results to JSONL format.
lx.io.save_annotated_documents( annotated_documents, # list of AnnotatedDocument output_name, # str: filename (e.g., "results.jsonl") output_dir="." # str: output directory )
Data Classes
ExampleData
Few-shot example definition.
example = lx.data.ExampleData( text="Example text here", extractions=[ lx.data.Extraction(...) ] )
Extraction
Single extraction definition.
extraction = lx.data.Extraction( extraction_class="medication", # str: entity type extraction_text="Aspirin 81mg", # str: exact text attributes={ # dict: metadata "name": "Aspirin", "dosage": "81mg" }, start_char=0, # int: start position (auto-set) end_char=13 # int: end position (auto-set) )
AnnotatedDocument
Extraction results for a document.
result.text # str: original text result.extractions # list of Extraction result.metadata # dict: additional info
Best Practices
Extraction Design
Write Clear Prompts: Be specific about what to extract and how
Good
prompt = "Extract medications with dosage, frequency, and route of administration. Use exact medical terminology."
Avoid
prompt = "Extract medications."
Provide Quality Examples: 1-5 well-crafted examples beat many poor ones
Include edge cases in examples
examples = [ normal_case_example, edge_case_example, complex_case_example ]
Use Exact Text: Extract verbatim from source for accurate grounding
Good
extraction_text="Lisinopril 10mg daily"
Avoid paraphrasing
extraction_text="10mg lisinopril taken once per day"
Define Attributes Clearly: Structure metadata consistently
attributes={ "name": "Lisinopril", # Drug name "dosage": "10mg", # Amount "frequency": "daily", # How often "route": "oral" # How taken }
Performance Optimization
Multi-Pass for Long Documents: Improves recall
extraction_passes=3 # 2-3 passes recommended for thorough extraction
Parallel Processing: Speed up batch operations
max_workers=20 # Adjust based on API rate limits
Chunk Size Tuning: Balance accuracy and context
max_char_buffer=1000 # Larger for context, smaller for speed
Model Selection: Choose based on task complexity
Simple extraction
model_id="gemini-2.0-flash-exp"
Complex reasoning
model_id="gemini-2.0-flash-thinking-exp"
Production Deployment
API Key Security: Never hardcode keys
Good: Use environment variables
import os api_key = os.getenv("LANGEXTRACT_API_KEY")
Avoid: Hardcoding
api_key = "AIza..." # Never do this
Error Handling: Handle API failures gracefully
try: result = lx.extract(...) except Exception as e: logger.error(f"Extraction failed: {e}") # Implement retry logic or fallback
Cost Management: Monitor API usage
Use cheaper models for bulk processing
model_id="gemini-2.0-flash-exp" # vs "gemini-1.5-pro"
Batch processing for cost efficiency
language_model_params={"batch": {"enabled": True}}
Validation: Verify extraction quality
for extraction in result.extractions: # Validate extraction is within document bounds assert 0 <= extraction.start_char < len(result.text) assert extraction.end_char <= len(result.text)
# Verify text matches
extracted = result.text[extraction.start_char:extraction.end_char]
assert extracted == extraction.extraction_text
Common Pitfalls
Overlapping Extractions
-
Issue: Extractions overlap or duplicate
-
Solution: Specify in prompt "Do not overlap entities"
Paraphrasing Instead of Exact Text
-
Issue: Extracted text doesn't match original
-
Solution: Prompt "Use exact text from document. Do not paraphrase."
Insufficient Examples
-
Issue: Poor extraction quality
-
Solution: Provide 3-5 diverse examples covering edge cases
Model Limitations
-
Issue: Schema constraints not supported on all models
-
Solution: Set use_schema_constraints=False for OpenAI/Ollama
Troubleshooting
Common Issues
Issue 1: API Authentication Failed
Symptoms:
-
AuthenticationError: Invalid API key
-
Permission denied errors
Solution:
Verify API key is set
echo $LANGEXTRACT_API_KEY
Set API key
export LANGEXTRACT_API_KEY="your-key-here"
For OpenAI
export OPENAI_API_KEY="your-openai-key"
Verify key works
python -c "import os; print(os.getenv('LANGEXTRACT_API_KEY'))"
Issue 2: Schema Constraints Error
Symptoms:
-
Schema constraints not supported error
-
Malformed output with OpenAI or Ollama
Solution:
Disable schema constraints for non-Gemini models
result = lx.extract( text_or_documents=text, prompt_description=prompt, examples=examples, model_id="gpt-4o", use_schema_constraints=False, # Disable for OpenAI fence_output=True # Enable for OpenAI )
Issue 3: Token Limit Exceeded
Symptoms:
-
Token limit exceeded error
-
Truncated results
Solution:
Use multi-pass extraction
result = lx.extract( text_or_documents=long_text, prompt_description=prompt, examples=examples, extraction_passes=3, # Multiple passes max_char_buffer=1000, # Adjust chunk size max_workers=10 # Parallel processing )
Issue 4: Poor Extraction Quality
Symptoms:
-
Missing entities
-
Incorrect extractions
-
Paraphrased text
Solution:
Improve prompt specificity
prompt = """Extract medications with exact dosage and frequency. Use exact text from document. Do not paraphrase. Include generic and brand names. Extract discontinued medications as well."""
Add more diverse examples
examples = [ normal_case, edge_case_1, edge_case_2, complex_case ]
Increase extraction passes
extraction_passes=3
Try more capable model
model_id="gemini-2.0-flash-thinking-exp"
Issue 5: Ollama Connection Failed
Symptoms:
-
Connection refused to localhost:11434
-
Ollama model not found
Solution:
Start Ollama server
ollama serve
Pull required model
ollama pull gemma2:2b
Verify Ollama is running
curl http://localhost:11434/api/tags
Use in langextract
python -c " import langextract as lx result = lx.extract( text_or_documents='test', prompt_description='Extract entities', examples=[], model_id='gemma2:2b', model_url='http://localhost:11434', use_schema_constraints=False ) "
Debugging Tips
Enable Verbose Logging
import logging logging.basicConfig(level=logging.DEBUG)
Inspect Intermediate Results
Save each pass separately
for i, result in enumerate(results): lx.io.save_annotated_documents( [result], output_name=f"pass_{i}.jsonl", output_dir="./debug" )
Validate Examples
Check examples match expected format
for example in examples: for extraction in example.extractions: # Verify text is in example text assert extraction.extraction_text in example.text print(f"✓ {extraction.extraction_class}: {extraction.extraction_text}")
Test with Simple Input First
Start with minimal test
test_result = lx.extract( text_or_documents="Patient on Aspirin 81mg daily.", prompt_description="Extract medications.", examples=[simple_example], model_id="gemini-2.0-flash-exp" ) print(f"Extractions: {len(test_result.extractions)}")
Advanced Topics
Custom Extraction Schemas
Define complex nested structures:
examples = [ lx.data.ExampleData( text="Patient presents with chest pain. ECG shows ST elevation. Diagnosed with STEMI.", extractions=[ lx.data.Extraction( extraction_class="clinical_event", extraction_text="Patient presents with chest pain. ECG shows ST elevation. Diagnosed with STEMI.", attributes={ "symptom": "chest pain", "diagnostic_test": "ECG", "finding": "ST elevation", "diagnosis": "STEMI", "severity": "severe", "timeline": [ {"event": "symptom_onset", "description": "chest pain"}, {"event": "diagnostic", "description": "ECG shows ST elevation"}, {"event": "diagnosis", "description": "STEMI"} ] } ) ] ) ]
Batch Processing with Progress Tracking
from tqdm import tqdm import langextract as lx
documents = load_documents() # List of documents results = []
for i, doc in enumerate(tqdm(documents)): try: result = lx.extract( text_or_documents=doc, prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp" ) results.append(result)
# Save incrementally
if (i + 1) % 100 == 0:
lx.io.save_annotated_documents(
results,
output_name=f"batch_{i+1}.jsonl",
output_dir="./batches"
)
results = [] # Clear for next batch
except Exception as e:
print(f"Failed on document {i}: {e}")
continue
Integration with Data Pipelines
import langextract as lx import pandas as pd
Load data
df = pd.read_csv("clinical_notes.csv")
Extract from each note
extractions_data = []
for idx, row in df.iterrows(): result = lx.extract( text_or_documents=row['note_text'], prompt_description=prompt, examples=examples, model_id="gemini-2.0-flash-exp" )
for extraction in result.extractions:
extractions_data.append({
'patient_id': row['patient_id'],
'note_date': row['note_date'],
'extraction_class': extraction.extraction_class,
'extraction_text': extraction.extraction_text,
**extraction.attributes
})
Create structured DataFrame
extractions_df = pd.DataFrame(extractions_data) extractions_df.to_csv("structured_extractions.csv", index=False)
Performance Benchmarking
import time import langextract as lx
def benchmark_extraction(documents, model_id, passes=1): start = time.time()
results = lx.extract(
text_or_documents=documents,
prompt_description=prompt,
examples=examples,
model_id=model_id,
extraction_passes=passes,
max_workers=20
)
elapsed = time.time() - start
total_extractions = sum(len(r.extractions) for r in results)
print(f"Model: {model_id}")
print(f"Passes: {passes}")
print(f"Documents: {len(documents)}")
print(f"Total extractions: {total_extractions}")
print(f"Time: {elapsed:.2f}s")
print(f"Throughput: {len(documents)/elapsed:.2f} docs/sec")
print()
Compare models
benchmark_extraction(docs, "gemini-2.0-flash-exp", passes=1) benchmark_extraction(docs, "gemini-2.0-flash-exp", passes=3) benchmark_extraction(docs, "gpt-4o", passes=1)
Examples
Example Projects
The repository includes several example implementations:
Custom Provider Plugin (examples/custom_provider_plugin/ )
-
How to create custom extraction backends
-
Integration with proprietary models
Jupyter Notebooks (examples/notebooks/ )
-
Interactive extraction workflows
-
Visualization and analysis
Ollama Integration (examples/ollama/ )
-
Local model usage
-
Privacy-preserving extraction
Medical Use Case
See examples/clinical_extraction.py for a complete medical extraction pipeline.
Literary Analysis
See examples/literary_extraction.py for character and relationship extraction from novels.
Testing
Running Tests
Install test dependencies
pip install -e ".[test]"
Run all tests
pytest tests
Run with coverage
pytest tests --cov=langextract
Run specific test
pytest tests/test_extraction.py
Run integration tests
pytest tests/integration/
Integration Testing with Ollama
Install tox
pip install tox
Run Ollama integration tests
tox -e ollama-integration
Writing Tests
import langextract as lx
def test_basic_extraction(): prompt = "Extract names." examples = [ lx.data.ExampleData( text="John Smith visited the clinic.", extractions=[ lx.data.Extraction( extraction_class="name", extraction_text="John Smith" ) ] ) ]
result = lx.extract(
text_or_documents="Mary Johnson was the doctor.",
prompt_description=prompt,
examples=examples,
model_id="gemini-2.0-flash-exp"
)
assert len(result.extractions) >= 1
assert result.extractions[0].extraction_class == "name"
Resources
Official Documentation
-
GitHub Repository: https://github.com/google/langextract
-
Examples Directory: https://github.com/google/langextract/tree/main/examples
-
Documentation: https://github.com/google/langextract/tree/main/docs/examples
Model Documentation
-
Gemini API: https://ai.google.dev/
-
Vertex AI: https://cloud.google.com/vertex-ai
-
OpenAI API: https://platform.openai.com/
-
Ollama: https://ollama.ai/
Related Tools
-
Google AI Studio: Web interface for Gemini models
-
Vertex AI Workbench: Enterprise AI development
-
LangChain: LLM application framework
-
Instructor: Structured outputs library
Use Case Examples
-
Clinical information extraction
-
Legal document analysis
-
Scientific literature mining
-
Customer feedback structuring
-
Contract entity extraction
Contributing
Contributions welcome! See the official repository for guidelines: https://github.com/google/langextract
Development Setup
git clone https://github.com/google/langextract.git cd langextract pip install -e ".[dev]" pre-commit install
Running CI Locally
Full test matrix
tox
Specific Python version
tox -e py310
Code formatting
black langextract/ isort langextract/
Linting
flake8 langextract/ mypy langextract/
Version Information
Last Updated: 2025-12-25 Skill Version: 1.0.0 LangExtract Version: Latest (check PyPI)
This skill provides comprehensive guidance for LangExtract based on official documentation and examples. For the latest updates, refer to the GitHub repository.