Elasticsearch Expert
Expert guidance for Elasticsearch, search optimization, ELK stack, and distributed search systems.
Core Concepts
- Full-text search and inverted indexes
- Document-oriented storage
- RESTful API
- Distributed architecture with sharding
- ELK stack (Elasticsearch, Logstash, Kibana)
- Aggregations and analytics
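To make the first concept concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy illustration and not how Lucene stores its index: the whitespace tokenizer and the names `tokenize`, `build_inverted_index`, and `search_all_terms` are assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return the IDs of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Elasticsearch Guide",
    2: "Complete guide to Elasticsearch",
    3: "Kibana dashboards",
}
index = build_inverted_index(docs)
print(sorted(search_all_terms(index, "elasticsearch guide")))  # [1, 2]
```

Looking up a term is a dictionary access rather than a scan over every document, which is the property that makes full-text search fast at scale.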
Index Management
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

# Create index with mapping
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "content": {"type": "text"},
            "author": {"type": "keyword"},
            "created_at": {"type": "date"},
            "views": {"type": "integer"},
        }
    }
}
# body= is the 7.x-style call; newer clients prefer keyword arguments such as mappings=
es.indices.create(index='articles', body=mapping)

# Index document
doc = {
    "title": "Elasticsearch Guide",
    "content": "Complete guide to Elasticsearch",
    "author": "John Doe",
    "created_at": "2024-01-01",
    "views": 100,
}
es.index(index='articles', id=1, body=doc)

# Bulk indexing
from elasticsearch.helpers import bulk

# documents is assumed to be an iterable of dicts shaped like doc above
documents = [doc]
actions = [
    {"_index": "articles", "_id": i, "_source": document}
    for i, document in enumerate(documents)
]
bulk(es, actions)
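For larger datasets it can help to stream actions instead of building the whole list in memory, since `helpers.bulk` accepts any iterable. The sketch below shows one way to do that; `generate_actions`, its `id_field` parameter, and `sample_docs` are hypothetical names introduced for illustration, and the call that needs a live cluster is left as a comment.

```python
def generate_actions(documents, index_name="articles", id_field=None):
    """Yield one bulk action per document without materializing a list."""
    for position, doc in enumerate(documents):
        # Use a field of the document as its ID when asked, else its position.
        doc_id = doc[id_field] if id_field else position
        yield {"_index": index_name, "_id": doc_id, "_source": doc}


sample_docs = [
    {"title": "Elasticsearch Guide", "views": 100},
    {"title": "Kibana Basics", "views": 42},
]
actions = list(generate_actions(sample_docs))
print(actions[1]["_id"], actions[1]["_source"]["title"])  # 1 Kibana Basics

# With a live cluster the generator can be passed straight to the helper:
#   from elasticsearch.helpers import bulk
#   bulk(es, generate_actions(sample_docs))
```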
Search Queries
# Full-text search
query = {
    "query": {
        "match": {"content": "elasticsearch guide"}
    }
}
results = es.search(index='articles', body=query)

# Boolean query
bool_query = {
    "query": {
        "bool": {
            "must": [{"match": {"content": "elasticsearch"}}],
            "filter": [{"range": {"views": {"gte": 100}}}],
            # keyword fields match exactly, so use the stored value
            "should": [{"term": {"author": "John Doe"}}],
            # assumes a status field, which the mapping above does not define
            "must_not": [{"term": {"status": "draft"}}],
        }
    }
}

# Multi-match query
multi_match = {
    "query": {
        "multi_match": {
            "query": "elasticsearch guide",
            "fields": ["title^2", "content"],  # Boost title
            "type": "best_fields",
        }
    }
}

# Fuzzy search
fuzzy = {
    "query": {
        "fuzzy": {
            "title": {"value": "elasticseerch", "fuzziness": "AUTO"}
        }
    }
}
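Bool queries are often assembled from optional user input. The following sketch builds one programmatically, keeping the scored full-text clause in `must` and the exact conditions in `filter` and `must_not`, which run in filter context and do not affect relevance scores. The function `build_article_query` and its parameters are hypothetical; the field names follow the examples above.

```python
def build_article_query(text, min_views=None, exclude_status=None):
    """Build a bool query body: scored match in must, exact checks in filter."""
    bool_clause = {"must": [{"match": {"content": text}}]}

    if min_views is not None:
        # Range checks do not need scoring, so they belong in filter context.
        bool_clause["filter"] = [{"range": {"views": {"gte": min_views}}}]

    if exclude_status is not None:
        bool_clause["must_not"] = [{"term": {"status": exclude_status}}]

    return {"query": {"bool": bool_clause}}


body = build_article_query("elasticsearch", min_views=100, exclude_status="draft")
print(sorted(body["query"]["bool"]))  # ['filter', 'must', 'must_not']
```

The resulting dictionary can be passed to `es.search` the same way as the hand-written queries.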
Aggregations
# Aggregation query
agg_query = {
    "aggs": {
        "authors": {
            "terms": {"field": "author", "size": 10}
        },
        "avg_views": {
            "avg": {"field": "views"}
        },
        "views_histogram": {
            "histogram": {"field": "views", "interval": 100}
        },
        "date_histogram": {
            "date_histogram": {"field": "created_at", "calendar_interval": "month"}
        },
    }
}
result = es.search(index='articles', body=agg_query)
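Aggregation results come back under the `aggregations` key of the response: bucket aggregations such as `terms` return a `buckets` list, and metric aggregations such as `avg` return a `value`. The sketch below reads both. `summarize_aggregations` is a hypothetical helper, and `sample_response` is made-up data shaped like a real response so the example runs without a cluster.

```python
def summarize_aggregations(response):
    """Extract per-author counts and the average view count from a response."""
    aggs = response["aggregations"]
    # Each terms bucket carries the field value in "key" and its hit count.
    authors = {bucket["key"]: bucket["doc_count"] for bucket in aggs["authors"]["buckets"]}
    return {"authors": authors, "avg_views": aggs["avg_views"]["value"]}


# Hypothetical response fragment, trimmed to the fields used here.
sample_response = {
    "aggregations": {
        "authors": {
            "buckets": [
                {"key": "John Doe", "doc_count": 3},
                {"key": "Jane Roe", "doc_count": 1},
            ]
        },
        "avg_views": {"value": 125.0},
    }
}

summary = summarize_aggregations(sample_response)
print(summary["authors"]["John Doe"], summary["avg_views"])  # 3 125.0
```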
Best Practices
- Design mappings carefully
- Use appropriate analyzers
- Implement proper sharding strategy
- Monitor cluster health
- Use bulk operations
- Implement pagination with search_after
- Cache frequently used queries
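The `search_after` recommendation can be sketched as a loop that feeds the `sort` values of the last hit on each page into the next request. In the example below, `iter_hits` and `fake_search_page` are hypothetical: `search_page` stands for any callable that takes a request body and returns a search response (for instance a thin wrapper around `es.search`), and the in-memory fake only imitates that behavior so the loop can run without a cluster. The `article_id` tiebreaker field is also an assumption; the sort needs some field that is unique per document.

```python
def iter_hits(search_page, query, sort, page_size=100):
    """Yield every hit, requesting pages with search_after instead of from."""
    body = {"query": query, "sort": sort, "size": page_size}
    while True:
        hits = search_page(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Resume after the sort values of the last hit on this page.
        body = {**body, "search_after": hits[-1]["sort"]}


def fake_search_page(body):
    """In-memory stand-in for a search call, pre-sorted by views then ID."""
    data = [
        {"_id": str(i), "sort": [views, str(i)]}
        for i, views in enumerate([10, 20, 20, 30, 50])
    ]
    after = body.get("search_after")
    remaining = [hit for hit in data if after is None or hit["sort"] > after]
    return {"hits": {"hits": remaining[: body["size"]]}}


sort = [{"views": "asc"}, {"article_id": "asc"}]  # article_id: assumed unique field
ids = [hit["_id"] for hit in iter_hits(fake_search_page, {"match_all": {}}, sort, page_size=2)]
print(ids)  # ['0', '1', '2', '3', '4']
```

Because each request carries only the last sort values, the cost of a page does not grow with how deep into the result set it is.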
Anti-Patterns
- ❌ Deep pagination with from/size
- ❌ Wildcard queries without prefix
- ❌ No replica shards
- ❌ Over-sharding
- ❌ Not using filters for exact matches
- ❌ Ignoring cluster yellow/red status
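Two of these anti-patterns can be spotted by inspecting a request body before it is sent. The sketch below is a hypothetical lint helper, not part of any Elasticsearch client: `find_query_smells` flags `from` + `size` values beyond 10,000 (the default `index.max_result_window`) and `wildcard` patterns that start with `*` or `?`.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def find_query_smells(body):
    """Return warnings for deep from/size pagination and leading wildcards."""
    warnings = []
    # size defaults to 10 and from to 0 when the request omits them.
    if body.get("from", 0) + body.get("size", 10) > MAX_RESULT_WINDOW:
        warnings.append("from + size exceeds the default max_result_window")

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "wildcard" and isinstance(value, dict):
                    for field, spec in value.items():
                        # Accept both {"field": {"value": "..."}} and {"field": "..."}.
                        pattern = spec.get("value", "") if isinstance(spec, dict) else spec
                        if str(pattern).startswith(("*", "?")):
                            warnings.append(f"leading wildcard on field '{field}'")
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(body.get("query", {}))
    return warnings


risky = {
    "from": 9995,
    "size": 10,
    "query": {"bool": {"must": [{"wildcard": {"title": {"value": "*search"}}}]}},
}
print(find_query_smells(risky))
```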
Resources
- Elasticsearch Guide: https://www.elastic.co/guide/
- ELK Stack: https://www.elastic.co/elk-stack