Data Analytics Diagram Generator
Quick Start: Define data sources → Declare ingestion/ETL icons → Connect to storage/warehouse → Add BI/visualization → Wrap in ```plantuml fence.
⚠️ IMPORTANT: Always use ```plantuml or ```puml code fence. NEVER use ```text — it will NOT render as a diagram.
Critical Rules
- Every diagram starts with `@startuml` and ends with `@enduml`
- Use `left to right direction` for data pipelines (Source → Ingest → Transform → Store → Visualize)
- Use `mxgraph.aws4.*` stencil syntax for analytics, database, and storage icons
- Default colors are applied automatically; you do NOT need to specify `fillColor` or `strokeColor`
- Use `rectangle "Zone" { ... }` or `package "Layer" { ... }` for grouping pipeline stages
- Directed flows use `-->`; async/streaming flows use `..>` (dashed)
Full stencil reference: See stencils/README.md for 9500+ available icons.
Mxgraph Stencil Syntax
mxgraph.aws4.<icon> "Label" as <alias>
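A minimal sketch that combines the critical rules with this declaration syntax. The zone names, labels, and aliases (`fh`, `raw`, `etl`) are illustrative assumptions; the stencil names are taken from the tables in this document. Lines starting with `'` are standard PlantUML comments.

```plantuml
@startuml
' Pipelines read left to right
left to right direction

' rectangle groups one pipeline stage
rectangle "Ingest" {
  mxgraph.aws4.kinesis_data_firehose "Firehose" as fh
}

' package groups a layer in the same way
package "Storage Layer" {
  mxgraph.aws4.s3 "Raw Zone\n(S3)" as raw
}

mxgraph.aws4.glue "Glue\nETL" as etl

' Dashed arrow: streaming delivery
fh ..> raw
' Solid arrow: batch flow
raw --> etl
@enduml
```

No `fillColor` or `strokeColor` appears here because default colors are applied automatically.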
Analytics & ETL Stencils
| Category | Stencils | Purpose |
|---|---|---|
| Query Engine | athena, athena_data_source_connectors | Serverless SQL on S3 data |
| ETL | glue, glue_crawlers, glue_data_catalog, aws_glue_data_quality, aws_glue_for_ray | Data integration & cataloging |
| Streaming | kinesis, kinesis_data_streams, kinesis_data_firehose, kinesis_data_analytics, kinesis_video_streams | Real-time data streaming |
| MapReduce | emr, emr_engine, emr_engine_mapr_m3, emr_engine_mapr_m5 | Big data processing (Spark, Hive) |
| Data Warehouse | redshift, redshift_ra3, redshift_streaming_ingestion, redshift_ml | Columnar analytics warehouse |
| Search | opensearch_service_data_node, opensearch_ingestion, cloudsearch | Full-text search & log analytics |
| BI | quicksight | Dashboards & visualizations |
| Data Lake | lake_formation, s3, glacier, glacier_deep_archive | Governed data lake storage |
| Catalog | datazone_custom_asset_type, data_exchange | Data governance & sharing |
| Streaming Kafka | msk, msk_connect | Managed Kafka streaming |
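As a rough illustration of how the streaming and search stencils in this table might be wired together, the sketch below shows one possible real-time path. The labels and aliases are hypothetical, and the topology is only an assumed example, not a prescribed architecture.

```plantuml
@startuml
left to right direction

mxgraph.aws4.kinesis_data_streams "Clickstream" as events
mxgraph.aws4.kinesis_data_analytics "Windowed\nAggregates" as agg
mxgraph.aws4.kinesis_data_firehose "Firehose" as fh
mxgraph.aws4.opensearch_service_data_node "OpenSearch" as search

' Every hop is a streaming flow, so every arrow is dashed
events ..> agg
agg ..> fh
fh ..> search : "JSON documents"
@enduml
```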
Database Stencils
| Category | Stencils | Purpose |
|---|---|---|
| Relational | aurora, aurora_instance, rds, rds_instance, rds_mysql_instance, rds_postgresql_instance | Transactional databases |
| NoSQL | dynamodb, dynamodb_table, dynamodb_global_secondary_index, dynamodb_stream | Key-value & document store |
| Graph | neptune | Graph database |
| In-Memory | elasticache, elasticache_for_redis, elasticache_for_memcached | Cache & session store |
| Document | documentdb, documentdb_with_mongodb_compatibility | Document database |
| Ledger | quantum_ledger_database | Immutable transaction log |
| Wide-Column | keyspaces | Cassandra-compatible |
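The database stencils are declared the same way as the analytics ones. The sketch below assumes three operational stores feeding a warehouse through a batch ETL job; the package name, labels, and aliases are made up for illustration.

```plantuml
@startuml
left to right direction

package "Operational Stores" {
  mxgraph.aws4.aurora "Orders\n(Aurora)" as orders
  mxgraph.aws4.dynamodb "Sessions\n(DynamoDB)" as sessions
  mxgraph.aws4.documentdb "Catalog\n(DocumentDB)" as catalog
}

mxgraph.aws4.glue "Glue\nETL" as etl
mxgraph.aws4.redshift "Redshift" as rs

' Batch extracts from each store use solid arrows
orders --> etl
sessions --> etl
catalog --> etl
etl --> rs : "nightly load"
@enduml
```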
Connection Types
| Syntax | Meaning | Use Case |
|---|---|---|
| `A --> B` | Solid arrow | Batch data flow / API call |
| `A ..> B` | Dashed arrow | Streaming / async / CDC |
| `A -- B` | Solid line | Bidirectional sync |
| `A --> B : "label"` | Labeled connection | Describe data format or volume |
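A short sketch that uses each connection type from the table once. The scenario (two replicated tables, a change stream, and a warehouse) and all labels and aliases are assumptions chosen only to give each arrow a plausible meaning.

```plantuml
@startuml
left to right direction

mxgraph.aws4.dynamodb_table "Orders\n(Region A)" as a
mxgraph.aws4.dynamodb_table "Orders\n(Region B)" as b
mxgraph.aws4.kinesis_data_streams "Change Stream" as stream
mxgraph.aws4.redshift "Redshift" as rs

' Solid line: bidirectional sync between the two tables
a -- b
' Dashed arrow with a label: CDC events
a ..> stream : "CDC events"
' Dashed arrow: streaming flow
stream ..> rs
' Labeled solid arrow: batch flow with a description
b --> rs : "daily export"
@enduml
```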
Quick Example
```plantuml
@startuml
left to right direction
mxgraph.aws4.s3 "Data Lake\n(S3)" as s3
mxgraph.aws4.glue "Glue\nETL" as glue
mxgraph.aws4.redshift "Redshift" as rs
mxgraph.aws4.quicksight "QuickSight" as qs
s3 --> glue
glue --> rs
rs --> qs
@enduml
```
Data Analytics Architecture Types
| Type | Purpose | Key Stencils | Example |
|---|---|---|---|
| Data Lake | Centralized raw data store | s3, lake_formation, glue, athena | data-lake.md |
| Real-time Streaming | Event stream processing | kinesis, msk, lambda_function, opensearch_service | real-time-streaming.md |
| Data Warehouse | Star-schema analytics | redshift, glue, quicksight | data-warehouse.md |
| ETL Pipeline | Extract-transform-load | glue, glue_crawlers, glue_data_catalog, s3 | etl-pipeline.md |
| Log Analytics | Centralized logging | kinesis_data_firehose, opensearch_service, lambda_function | log-analytics.md |
| ML Feature Store | Feature engineering pipeline | glue, s3, athena, emr | ml-feature-pipeline.md |
| CDC Pipeline | Database change capture | dynamodb_stream, kinesis, lambda_function, redshift | cdc-pipeline.md |
| Multi-source BI | Cross-database reporting | aurora, dynamodb, redshift, quicksight | multi-source-bi.md |
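To make one row of this table concrete, here is a hedged sketch of the Data Lake type built from its key stencils. It is not the content of data-lake.md; the raw/curated split, labels, and aliases are assumptions for illustration.

```plantuml
@startuml
left to right direction

package "Data Lake" {
  mxgraph.aws4.s3 "Raw\n(S3)" as raw
  mxgraph.aws4.s3 "Curated\n(S3)" as curated
  mxgraph.aws4.lake_formation "Lake Formation" as lf
}

mxgraph.aws4.glue "Glue\nETL" as etl
mxgraph.aws4.athena "Athena" as sql

' Batch flow from raw files to curated tables
raw --> etl : "raw files"
etl --> curated : "Parquet"
' Governance applies to the curated zone
lf --> curated : "access policies"
' Query engine reads the curated data
curated --> sql
@enduml
```

The other rows follow the same pattern: declare the key stencils, group them by stage, then connect them with the arrow type that matches the flow.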