Data Visualization
Overview
This skill focuses on creating effective data visualizations that communicate insights clearly. It covers various visualization libraries, chart selection, and design principles for impactful data presentation.
Instructions
- Understand the Data
-
Analyze data structure and types
-
Identify key metrics and dimensions
-
Determine the story to tell
-
Consider the target audience
- Select Appropriate Visualization
-
Match chart type to data relationship
-
Consider data volume and complexity
-
Plan for interactivity needs
-
Account for accessibility
- Design for Clarity
-
Choose effective color schemes
-
Label axes and data clearly
-
Remove chart junk
-
Highlight key insights
- Implement and Iterate
-
Build visualization with chosen tool
-
Test with real data
-
Gather feedback
-
Refine based on usage
Best Practices
-
Right Chart for Data: Match visualization to data type
-
Less is More: Remove unnecessary elements
-
Consistent Styling: Use coherent color schemes
-
Accessible Design: Consider colorblind users
-
Clear Labels: Descriptive titles and axis labels
-
Context Matters: Include reference points
-
Interactive When Helpful: Add tooltips and filters
Examples
Example 1: Python with Matplotlib/Seaborn
import matplotlib.pyplot as plt import seaborn as sns import pandas as pd import numpy as np
Set style for professional look
plt.style.use('seaborn-v0_8-whitegrid') sns.set_palette("husl")
Create figure with subplots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
Example 1: Line chart for time series
df_sales = pd.DataFrame({ 'date': pd.date_range('2024-01-01', periods=12, freq='M'), 'revenue': [100, 120, 115, 140, 155, 170, 165, 180, 195, 210, 225, 250], 'target': [110, 115, 120, 130, 145, 160, 175, 185, 200, 215, 230, 245] })
ax1 = axes[0, 0] ax1.plot(df_sales['date'], df_sales['revenue'], marker='o', linewidth=2, label='Actual') ax1.plot(df_sales['date'], df_sales['target'], linestyle='--', linewidth=2, label='Target') ax1.fill_between(df_sales['date'], df_sales['revenue'], df_sales['target'], alpha=0.3, where=(df_sales['revenue'] >= df_sales['target']), color='green') ax1.fill_between(df_sales['date'], df_sales['revenue'], df_sales['target'], alpha=0.3, where=(df_sales['revenue'] < df_sales['target']), color='red') ax1.set_title('Monthly Revenue vs Target', fontsize=14, fontweight='bold') ax1.set_xlabel('Month') ax1.set_ylabel('Revenue ($K)') ax1.legend() ax1.tick_params(axis='x', rotation=45)
Example 2: Bar chart for comparison
df_products = pd.DataFrame({ 'product': ['Product A', 'Product B', 'Product C', 'Product D', 'Product E'], 'sales': [45, 32, 28, 22, 18] })
ax2 = axes[0, 1] colors = sns.color_palette("Blues_r", len(df_products)) bars = ax2.barh(df_products['product'], df_products['sales'], color=colors) ax2.bar_label(bars, padding=3, fmt='$%.0fK') ax2.set_title('Sales by Product', fontsize=14, fontweight='bold') ax2.set_xlabel('Sales ($K)') ax2.invert_yaxis()
Example 3: Scatter plot with regression
np.random.seed(42) df_scatter = pd.DataFrame({ 'ad_spend': np.random.uniform(10, 100, 50), 'conversions': lambda x: x['ad_spend'] * 2.5 + np.random.normal(0, 15, 50) }.class.call(pd.DataFrame({'ad_spend': np.random.uniform(10, 100, 50)}))) df_scatter['conversions'] = df_scatter['ad_spend'] * 2.5 + np.random.normal(0, 15, 50)
ax3 = axes[1, 0] sns.regplot(data=df_scatter, x='ad_spend', y='conversions', ax=ax3, scatter_kws={'alpha': 0.6}, line_kws={'color': 'red'}) ax3.set_title('Ad Spend vs Conversions', fontsize=14, fontweight='bold') ax3.set_xlabel('Ad Spend ($K)') ax3.set_ylabel('Conversions')
Example 4: Pie/Donut chart for composition
df_channels = pd.DataFrame({ 'channel': ['Organic', 'Paid Search', 'Social', 'Email', 'Direct'], 'traffic': [35, 25, 20, 12, 8] })
ax4 = axes[1, 1] wedges, texts, autotexts = ax4.pie( df_channels['traffic'], labels=df_channels['channel'], autopct='%1.1f%%', pctdistance=0.75, wedgeprops=dict(width=0.5) ) ax4.set_title('Traffic by Channel', fontsize=14, fontweight='bold')
plt.tight_layout() plt.savefig('dashboard.png', dpi=150, bbox_inches='tight') plt.show()
Example 2: Interactive Visualization with Plotly
import plotly.express as px import plotly.graph_objects as go from plotly.subplots import make_subplots import pandas as pd
Create interactive time series
df = pd.DataFrame({ 'date': pd.date_range('2024-01-01', periods=365, freq='D'), 'value': (pd.Series(range(365)) * 0.1 + np.sin(pd.Series(range(365)) * 0.1) * 20 + np.random.normal(0, 5, 365)).cumsum() })
fig = go.Figure()
fig.add_trace(go.Scatter( x=df['date'], y=df['value'], mode='lines', name='Daily Value', line=dict(color='#1f77b4', width=1.5), hovertemplate='%{x|%B %d, %Y}<br>Value: %{y:.2f}<extra></extra>' ))
Add moving average
df['ma_7'] = df['value'].rolling(7).mean() fig.add_trace(go.Scatter( x=df['date'], y=df['ma_7'], mode='lines', name='7-day MA', line=dict(color='#ff7f0e', width=2, dash='dash') ))
fig.update_layout( title='Daily Performance with Moving Average', xaxis_title='Date', yaxis_title='Value', hovermode='x unified', template='plotly_white', xaxis=dict( rangeselector=dict( buttons=list([ dict(count=7, label="1w", step="day", stepmode="backward"), dict(count=1, label="1m", step="month", stepmode="backward"), dict(count=3, label="3m", step="month", stepmode="backward"), dict(step="all") ]) ), rangeslider=dict(visible=True) ) )
fig.write_html('interactive_chart.html') fig.show()
Example 3: Chart Type Selection Guide
Chart Selection by Data Type
Comparison
- Bar Chart: Compare values across categories
- Grouped Bar: Compare multiple series across categories
- Bullet Chart: Show performance against target
Distribution
- Histogram: Show frequency distribution
- Box Plot: Show distribution summary statistics
- Violin Plot: Show distribution shape
Composition
- Pie/Donut Chart: Show parts of a whole (< 6 categories)
- Stacked Bar: Show composition across categories
- Treemap: Show hierarchical composition
Relationship
- Scatter Plot: Show correlation between two variables
- Bubble Chart: Add third dimension via size
- Heatmap: Show correlation matrix
Time Series
- Line Chart: Show trends over time
- Area Chart: Show cumulative trends
- Candlestick: Show OHLC financial data
Geographic
- Choropleth: Show values by region
- Point Map: Show locations with values
- Flow Map: Show movement between locations
Example 4: Dashboard Layout Principles
Streamlit Dashboard Example
import streamlit as st import pandas as pd import plotly.express as px
st.set_page_config(page_title="Sales Dashboard", layout="wide")
Header
st.title("Sales Performance Dashboard") st.markdown("---")
KPI Row
col1, col2, col3, col4 = st.columns(4) with col1: st.metric("Total Revenue", "$1.2M", "+12%") with col2: st.metric("Orders", "8,543", "+8%") with col3: st.metric("Avg Order Value", "$140", "+3%") with col4: st.metric("Conversion Rate", "3.2%", "-0.5%")
st.markdown("---")
Filters
with st.sidebar: st.header("Filters") date_range = st.date_input("Date Range", []) region = st.multiselect("Region", ["North", "South", "East", "West"]) category = st.selectbox("Category", ["All", "Electronics", "Clothing", "Home"])
Main Charts
left_col, right_col = st.columns([2, 1])
with left_col: st.subheader("Revenue Trend") # Line chart here
with right_col: st.subheader("Sales by Region") # Pie chart here
Detail Table
st.subheader("Recent Orders")