GeoPandas Geospatial Data Analysis
Python library for geospatial vector data - extends pandas with spatial operations.
When to Use
-
Working with geographic/spatial data (shapefiles, GeoJSON, GeoPackage)
-
Spatial analysis (buffer, intersection, spatial joins)
-
Coordinate transformations and projections
-
Creating choropleth maps
-
Processing geographic boundaries, points, lines, polygons
Quick Start
import geopandas as gpd
Read spatial data
gdf = gpd.read_file("data.geojson")
Basic exploration
print(gdf.head()) print(gdf.crs) # Coordinate Reference System print(gdf.geometry.geom_type)
Simple plot
gdf.plot()
Reproject to different CRS
gdf_projected = gdf.to_crs("EPSG:3857")
Calculate area (use projected CRS)
gdf_projected['area'] = gdf_projected.geometry.area
Save to file
gdf.to_file("output.gpkg")
Reading/Writing Data
Read various formats
gdf = gpd.read_file("data.shp") # Shapefile gdf = gpd.read_file("data.geojson") # GeoJSON gdf = gpd.read_file("data.gpkg") # GeoPackage
Read with spatial filter (faster for large files)
gdf = gpd.read_file("data.gpkg", bbox=(xmin, ymin, xmax, ymax))
Write to file
gdf.to_file("output.gpkg") gdf.to_file("output.geojson", driver="GeoJSON")
PostGIS database
from sqlalchemy import create_engine engine = create_engine("postgresql://user:pass@localhost/db") gdf = gpd.read_postgis("SELECT * FROM table", con=engine, geom_col='geom')
Coordinate Reference Systems
Check CRS
print(gdf.crs)
Set CRS (when metadata missing)
gdf = gdf.set_crs("EPSG:4326")
Reproject (transforms coordinates)
gdf_projected = gdf.to_crs("EPSG:3857") # Web Mercator gdf_projected = gdf.to_crs("EPSG:32633") # UTM zone 33N
Common CRS codes:
EPSG:4326 - WGS84 (lat/lon)
EPSG:3857 - Web Mercator
EPSG:326XX - UTM zones
Geometric Operations
Buffer (expand/shrink geometries)
buffered = gdf.geometry.buffer(100) # 100 units buffer
Centroid
centroids = gdf.geometry.centroid
Simplify (reduce vertices)
simplified = gdf.geometry.simplify(tolerance=5, preserve_topology=True)
Convex hull
hull = gdf.geometry.convex_hull
Boundary
boundary = gdf.geometry.boundary
Area and length (use projected CRS!)
gdf['area'] = gdf.geometry.area gdf['length'] = gdf.geometry.length
Spatial Analysis
Spatial Joins
Join based on spatial relationship
joined = gpd.sjoin(gdf1, gdf2, predicate='intersects') joined = gpd.sjoin(gdf1, gdf2, predicate='within') joined = gpd.sjoin(gdf1, gdf2, predicate='contains')
Nearest neighbor join
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)
Overlay Operations
Intersection
intersection = gpd.overlay(gdf1, gdf2, how='intersection')
Union
union = gpd.overlay(gdf1, gdf2, how='union')
Difference
difference = gpd.overlay(gdf1, gdf2, how='difference')
Dissolve (Aggregate by Attribute)
Merge geometries by attribute
dissolved = gdf.dissolve(by='region', aggfunc='sum')
Clip
Clip data to boundary
clipped = gpd.clip(gdf, boundary_gdf)
Visualization
import matplotlib.pyplot as plt
Basic plot
gdf.plot()
Choropleth map
gdf.plot(column='population', cmap='YlOrRd', legend=True)
Multi-layer map
fig, ax = plt.subplots(figsize=(10, 10)) gdf1.plot(ax=ax, color='blue', alpha=0.5) gdf2.plot(ax=ax, color='red', alpha=0.5) plt.savefig('map.png', dpi=300, bbox_inches='tight')
Interactive map (requires folium)
gdf.explore(column='population', legend=True)
Common Workflows
Spatial Join and Aggregate
Join points to polygons
points_in_polygons = gpd.sjoin(points_gdf, polygons_gdf, predicate='within')
Aggregate by polygon
aggregated = points_in_polygons.groupby('index_right').agg({ 'value': 'sum', 'count': 'size' })
Merge back to polygons
result = polygons_gdf.merge(aggregated, left_index=True, right_index=True)
Buffer Analysis
Create buffers around points
gdf_projected = points_gdf.to_crs("EPSG:3857") # Project first! gdf_projected['buffer'] = gdf_projected.geometry.buffer(1000) # 1km buffer gdf_projected = gdf_projected.set_geometry('buffer')
Find features within buffer
within_buffer = gpd.sjoin(other_gdf, gdf_projected, predicate='within')
Best Practices
-
Always check CRS before spatial operations
-
Use projected CRS for area/distance calculations
-
Match CRS before spatial joins or overlays
-
Validate geometries with .is_valid before operations
-
Use GeoPackage format over Shapefile (modern, better)
-
Use .copy() when modifying geometry to avoid side effects
-
Filter during read with bbox for large files
vs Alternatives
Tool Best For
GeoPandas Vector data analysis, spatial operations
Rasterio Raster data (satellite imagery, DEMs)
Shapely Low-level geometry operations
Folium Interactive web maps
Resources