Senior Data Engineer
Senior data engineering skill for building production-grade AI, ML, and data systems.
Quick Start
Main Capabilities
Core Tool 1: Pipeline Orchestrator
python scripts/pipeline_orchestrator.py --input data/ --output results/
Core Tool 2: Data Quality Validator
python scripts/data_quality_validator.py --input data/ --output reports/ --config config.yaml
Core Tool 3: ETL Performance Optimizer
python scripts/etl_performance_optimizer.py --input data/ --output optimized/ --config config.yaml
Core Expertise
This skill covers:
- Advanced production patterns and architectures
- Scalable system design and implementation
- Performance optimization at scale
- MLOps and DataOps best practices
- Real-time processing and inference
- Distributed computing frameworks
- Model deployment and monitoring
- Security and compliance
- Cost optimization
- Team leadership and mentoring
Tech Stack
- Languages: Python, SQL, R, Scala, Go
- ML Frameworks: PyTorch, TensorFlow, Scikit-learn, XGBoost
- Data Tools: Spark, Airflow, dbt, Kafka, Databricks
- LLM Frameworks: LangChain, LlamaIndex, DSPy
- Deployment: Docker, Kubernetes, AWS/GCP/Azure
- Monitoring: MLflow, Weights & Biases, Prometheus
- Databases: PostgreSQL, BigQuery, Snowflake, Pinecone
Reference Documentation
- Data Pipeline Architecture
Comprehensive guide available in references/data_pipeline_architecture.md covering:
- Advanced patterns and best practices
- Production implementation strategies
- Performance optimization techniques
- Scalability considerations
- Security and compliance
- Real-world case studies
- Data Modeling Patterns
Complete workflow documentation in references/data_modeling_patterns.md including:
- Step-by-step processes
- Architecture design patterns
- Tool integration guides
- Performance tuning strategies
- Troubleshooting procedures
- DataOps Best Practices
Technical reference guide in references/dataops_best_practices.md with:
- System design principles
- Implementation examples
- Configuration best practices
- Deployment strategies
- Monitoring and observability
Production Patterns
Pattern 1: Scalable Data Processing
Enterprise-scale data processing with distributed computing:
- Horizontal scaling architecture
- Fault-tolerant design
- Real-time and batch processing
- Data quality validation
- Performance monitoring
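The data quality validation step in this pattern can be sketched as a set of small rule functions run over a batch of records, collecting every failure instead of stopping at the first. This is a minimal stdlib sketch, not the implementation of scripts/data_quality_validator.py; the rule names (`not_null`, `in_range`, `unique`), the `validate` helper, and the record layout are hypothetical.

```python
from typing import Any, Callable

Record = dict[str, Any]
Check = Callable[[list[Record]], list[str]]  # a check returns human-readable failures


def not_null(column: str) -> Check:
    """Hypothetical rule: flag rows where `column` is missing or None."""
    def check(rows: list[Record]) -> list[str]:
        return [f"row {i}: {column} is null"
                for i, row in enumerate(rows) if row.get(column) is None]
    return check


def in_range(column: str, low: float, high: float) -> Check:
    """Hypothetical rule: flag non-null values outside [low, high]."""
    def check(rows: list[Record]) -> list[str]:
        return [f"row {i}: {column}={row[column]!r} outside [{low}, {high}]"
                for i, row in enumerate(rows)
                if row.get(column) is not None and not low <= row[column] <= high]
    return check


def unique(column: str) -> Check:
    """Hypothetical rule: flag repeated values, e.g. for a primary key."""
    def check(rows: list[Record]) -> list[str]:
        seen: set[Any] = set()
        errors: list[str] = []
        for i, row in enumerate(rows):
            value = row.get(column)
            if value in seen:
                errors.append(f"row {i}: duplicate {column}={value!r}")
            seen.add(value)
        return errors
    return check


def validate(rows: list[Record], checks: list[Check]) -> list[str]:
    """Run every check and collect all failures rather than stopping at the first."""
    return [error for check in checks for error in check(rows)]


rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": -5.0},
]
errors = validate(rows, [not_null("amount"), in_range("amount", 0, 10_000), unique("order_id")])
for error in errors:
    print(error)
```

Collecting all failures per batch makes it possible to write a single report per run and decide afterwards whether to quarantine the batch or only the offending rows.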
Pattern 2: ML Model Deployment
Production ML system with high availability:
- Model serving with low latency
- A/B testing infrastructure
- Feature store integration
- Model monitoring and drift detection
- Automated retraining pipelines
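For drift detection, one widely used technique is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against a reference sample: PSI = Σ (current_share − reference_share) × ln(current_share / reference_share). The stdlib sketch below assumes equal-width bins derived from the reference sample and uses the common rule-of-thumb thresholds of 0.1 and 0.25; neither choice is prescribed by this skill, and the function names are illustrative.

```python
import bisect
import math
from typing import Sequence


def _bin_shares(values: Sequence[float], edges: list[float], eps: float) -> list[float]:
    """Share of `values` per bin; out-of-range values are clamped into the edge bins."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), n_bins - 1)] += 1
    # A small floor avoids log(0) and division by zero for empty bins.
    return [max(c / len(values), eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index with equal-width bins taken from the reference sample."""
    lo, hi = min(reference), max(reference)
    if lo == hi:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    ref = _bin_shares(reference, edges, eps)
    cur = _bin_shares(current, edges, eps)
    return sum((c - r) * math.log(c / r) for c, r in zip(cur, ref))


def drift_level(score: float) -> str:
    # Rule-of-thumb thresholds; tune per feature in practice.
    if score < 0.1:
        return "stable"
    if score < 0.25:
        return "moderate"
    return "significant"
```

A "significant" score is a reasonable trigger for an alert or for the automated retraining pipeline, though many teams also require the drift to persist across several windows before acting.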
Pattern 3: Real-Time Inference
High-throughput inference system:
- Batching and caching strategies
- Load balancing
- Auto-scaling
- Latency optimization
- Cost optimization
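Batching and caching combine naturally: serve repeated inputs from a cache and send only the misses to the model, grouped into fixed-size batches. This is a minimal sketch assuming a model exposed as a function that takes a list of inputs and returns one output per input; `cached_batched_predict` and its parameters are hypothetical names.

```python
from typing import Callable, Hashable, Sequence, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def cached_batched_predict(
    inputs: Sequence[K],
    predict_batch: Callable[[list[K]], list[V]],
    cache: dict[K, V],
    batch_size: int = 32,
) -> list[V]:
    """Answer cache hits directly and run cache misses through the model in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Deduplicate misses while preserving first-seen order (dicts keep insertion order).
    misses = [x for x in dict.fromkeys(inputs) if x not in cache]
    for start in range(0, len(misses), batch_size):
        batch = misses[start:start + batch_size]
        outputs = predict_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("model returned a different number of outputs than inputs")
        cache.update(zip(batch, outputs))
    # Preserve the caller's order, including duplicates.
    return [cache[x] for x in inputs]
```

The plain `dict` cache here grows without bound; a production service would typically cap it with an LRU or TTL policy and size `batch_size` against the model's latency budget.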
Best Practices
Development
- Test-driven development
- Code reviews and pair programming
- Documentation as code
- Version control everything
- Continuous integration
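In data pipelines, test-driven development usually means pinning down a transformation's behavior with a small in-memory fixture before wiring it into a job, so the test can run on every CI build without a cluster. A minimal sketch with a hypothetical `latest_per_key` deduplication step and a pytest-style test function (pytest collects functions named `test_*`; the function also runs as plain Python).

```python
from typing import Any


def latest_per_key(records: list[dict[str, Any]], key: str, ts: str) -> list[dict[str, Any]]:
    """Keep only the most recent record per key; on equal timestamps the later record wins."""
    latest: dict[Any, dict[str, Any]] = {}
    for record in records:
        current = latest.get(record[key])
        if current is None or record[ts] >= current[ts]:
            latest[record[key]] = record
    # Sort by key so the output order is deterministic for tests and downstream diffs.
    return sorted(latest.values(), key=lambda r: r[key])


def test_latest_per_key_keeps_newest_record() -> None:
    records = [
        {"id": "a", "updated_at": 1, "status": "new"},
        {"id": "a", "updated_at": 3, "status": "shipped"},
        {"id": "b", "updated_at": 2, "status": "new"},
        {"id": "a", "updated_at": 2, "status": "paid"},  # arrives late, must not win
    ]
    result = latest_per_key(records, key="id", ts="updated_at")
    assert [r["status"] for r in result] == ["shipped", "new"]
```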
Production
- Monitor everything critical
- Automate deployments
- Feature flags for releases
- Canary deployments
- Comprehensive logging
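Feature flags and canary deployments both rely on routing a stable fraction of users to the new code path. A common approach, sketched here under the assumption that every request carries a user ID, hashes the flag name and user ID into one of 10,000 buckets: the same user always gets the same decision, and raising the percentage only adds users to the rollout. The function names and bucket count are illustrative, not part of any specific feature-flag product.

```python
import hashlib

BUCKETS = 10_000  # 0.01% rollout granularity (an illustrative choice)


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, BUCKETS)."""
    # Including the flag name gives each flag an independent cohort of users.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_enabled(flag: str, user_id: str, rollout_percent: float) -> bool:
    """True if this user falls inside the first `rollout_percent` of buckets."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return rollout_bucket(flag, user_id) < rollout_percent * BUCKETS / 100
```

A canary can then start at a small percentage, be watched against the targets below, and be widened step by step; users already in the canary stay in it at every step.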
Team Leadership
- Mentor junior engineers
- Drive technical decisions
- Establish coding standards
- Foster learning culture
- Cross-functional collaboration
Performance Targets
Latency:
- P50: < 50 ms
- P95: < 100 ms
- P99: < 200 ms
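Checking the latency targets means computing percentiles from observed request latencies. This sketch uses the nearest-rank method on a raw list of samples; monitoring systems such as Prometheus usually estimate percentiles from histogram buckets instead, so their numbers can differ slightly. `LATENCY_TARGETS_MS` restates the targets above, and the helper names are hypothetical.

```python
import math
from typing import Sequence

# Percentile -> target in milliseconds; each observed value must be strictly below its target.
LATENCY_TARGETS_MS = {50: 50.0, 95: 100.0, 99: 200.0}


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_report(samples_ms: Sequence[float]) -> dict[str, tuple[float, bool]]:
    """Return {"p50": (observed_ms, meets_target), ...} for each configured percentile."""
    report = {}
    for p, target in LATENCY_TARGETS_MS.items():
        observed = percentile(samples_ms, p)
        report[f"p{p}"] = (observed, observed < target)
    return report
```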
Throughput:
- Requests/second: > 1,000
- Concurrent users: > 10,000
Availability:
- Uptime: 99.9%
- Error rate: < 0.1%
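The availability targets translate into concrete budgets. A 99.9% uptime target leaves 0.1% of the time for downtime, which over an assumed 30-day window is 0.001 × 30 × 24 × 60 = 43.2 minutes; an error rate below 0.1% means fewer than 1 failed request per 1,000. The helper names and the 30-day window are assumptions for illustration.

```python
def downtime_budget_minutes(uptime_target: float, window_days: float = 30) -> float:
    """Allowed downtime in minutes for an uptime target given as a fraction (0.999 = 99.9%)."""
    if not 0 < uptime_target <= 1:
        raise ValueError("uptime_target must be a fraction in (0, 1]")
    return (1 - uptime_target) * window_days * 24 * 60


def error_rate_ok(failed: int, total: int, max_rate: float = 0.001) -> bool:
    """True if the observed error rate is strictly below `max_rate` (default 0.1%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total < max_rate
```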
Security & Compliance
- Authentication & authorization
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Regular security audits
- Vulnerability management
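For PII handling, one common building block is keyed hashing with HMAC-SHA256: an identifier is replaced by a stable token, so joins across tables still work while the raw value is not stored downstream. This is pseudonymization rather than anonymization; under GDPR, pseudonymized data is still personal data, because anyone holding the key can re-link records. The function names and the decision to lowercase emails before hashing are assumptions in this sketch.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Stable keyed token for `value`: the same key and value always give the same hex digest."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_email(email: str, key: bytes) -> str:
    # Assumption: treat emails as case-insensitive and ignore surrounding whitespace,
    # so "Alice@Example.com " and "alice@example.com" map to the same token.
    return pseudonymize(email.strip().lower(), key)


# In practice, load the key from a secrets manager and rotate it deliberately;
# never hard-code it in pipeline code.
```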
Common Commands
Pipeline orchestration
python scripts/pipeline_orchestrator.py --input data/ --output results/ --verbose
Data quality validation
python scripts/data_quality_validator.py --input data/ --output reports/ --config config.yaml
ETL optimization
python scripts/etl_performance_optimizer.py --input data/ --output optimized/ --config config.yaml
Resources
- Advanced Patterns: references/data_pipeline_architecture.md
- Implementation Guide: references/data_modeling_patterns.md
- Technical Reference: references/dataops_best_practices.md
- Automation Scripts: scripts/ directory
Senior-Level Responsibilities
As a senior data engineer, you are expected to:
Technical Leadership
- Drive architectural decisions
- Mentor team members
- Establish best practices
- Ensure code quality
Strategic Thinking
- Align with business goals
- Evaluate trade-offs
- Plan for scale
- Manage technical debt
Collaboration
- Work across teams
- Communicate effectively
- Build consensus
- Share knowledge
Innovation
- Stay current with research
- Experiment with new approaches
- Contribute to community
- Drive continuous improvement
Production Excellence
- Ensure high availability
- Monitor proactively
- Optimize performance
- Respond to incidents