GeoMaster
Comprehensive geospatial science skill covering GIS, remote sensing, spatial analysis, and ML for Earth observation across 70+ topics with 500+ code examples in 8 programming languages.
Installation
Core Python stack (conda recommended)
conda install -c conda-forge gdal rasterio fiona shapely pyproj geopandas
Remote sensing & ML
uv pip install rsgislib torchgeo earthengine-api uv pip install scikit-learn xgboost torch-geometric
Network & visualization
uv pip install osmnx networkx folium keplergl uv pip install cartopy contextily mapclassify
Big data & cloud
uv pip install xarray rioxarray dask-geopandas uv pip install pystac-client planetary-computer
Point clouds
uv pip install laspy pylas open3d pdal
Databases
conda install -c conda-forge postgis spatialite
Quick Start
NDVI from Sentinel-2
import rasterio import numpy as np
with rasterio.open('sentinel2.tif') as src: red = src.read(4).astype(float) # B04 nir = src.read(8).astype(float) # B08 ndvi = (nir - red) / (nir + red + 1e-8) ndvi = np.nan_to_num(ndvi, nan=0)
profile = src.profile
profile.update(count=1, dtype=rasterio.float32)
with rasterio.open('ndvi.tif', 'w', **profile) as dst:
dst.write(ndvi.astype(rasterio.float32), 1)
Spatial Analysis with GeoPandas
import geopandas as gpd
Load and ensure same CRS
zones = gpd.read_file('zones.geojson') points = gpd.read_file('points.geojson')
if zones.crs != points.crs: points = points.to_crs(zones.crs)
Spatial join and statistics
joined = gpd.sjoin(points, zones, how='inner', predicate='within') stats = joined.groupby('zone_id').agg({ 'value': ['count', 'mean', 'std', 'min', 'max'] }).round(2)
Google Earth Engine Time Series
import ee import pandas as pd
ee.Initialize(project='your-project') roi = ee.Geometry.Point([-122.4, 37.7]).buffer(10000)
s2 = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED') .filterBounds(roi) .filterDate('2020-01-01', '2023-12-31') .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))
def add_ndvi(img): return img.addBands(img.normalizedDifference(['B8', 'B4']).rename('NDVI'))
s2_ndvi = s2.map(add_ndvi)
def extract_series(image): stats = image.reduceRegion(ee.Reducer.mean(), roi.centroid(), scale=10, maxPixels=1e9) return ee.Feature(None, {'date': image.date().format('YYYY-MM-dd'), 'ndvi': stats.get('NDVI')})
series = s2_ndvi.map(extract_series).getInfo() df = pd.DataFrame([f['properties'] for f in series['features']]) df['date'] = pd.to_datetime(df['date'])
Core Concepts
Data Types
Type Examples Libraries
Vector Shapefile, GeoJSON, GeoPackage GeoPandas, Fiona, GDAL
Raster GeoTIFF, NetCDF, COG Rasterio, Xarray, GDAL
Point Cloud LAS, LAZ Laspy, PDAL, Open3D
Coordinate Systems
-
EPSG:4326 (WGS 84) - Geographic, lat/lon, use for storage
-
EPSG:3857 (Web Mercator) - Web maps only (don't use for area/distance!)
-
EPSG:326xx/327xx (UTM) - Metric calculations, <1% distortion per zone
-
Use gdf.estimate_utm_crs() for automatic UTM detection
Always check CRS before operations
assert gdf1.crs == gdf2.crs, "CRS mismatch!"
For area/distance calculations, use projected CRS
gdf_metric = gdf.to_crs(gdf.estimate_utm_crs()) area_sqm = gdf_metric.geometry.area
OGC Standards
-
WMS: Web Map Service - raster maps
-
WFS: Web Feature Service - vector data
-
WCS: Web Coverage Service - raster coverage
-
STAC: Spatiotemporal Asset Catalog - modern metadata
Common Operations
Spectral Indices
def calculate_indices(image_path): """NDVI, EVI, SAVI, NDWI from Sentinel-2.""" with rasterio.open(image_path) as src: B02, B03, B04, B08, B11 = [src.read(i).astype(float) for i in [1,2,3,4,5]]
ndvi = (B08 - B04) / (B08 + B04 + 1e-8)
evi = 2.5 * (B08 - B04) / (B08 + 6*B04 - 7.5*B02 + 1)
savi = ((B08 - B04) / (B08 + B04 + 0.5)) * 1.5
ndwi = (B03 - B08) / (B03 + B08 + 1e-8)
return {'NDVI': ndvi, 'EVI': evi, 'SAVI': savi, 'NDWI': ndwi}
Vector Operations
Buffer (use projected CRS!)
gdf_proj = gdf.to_crs(gdf.estimate_utm_crs()) gdf['buffer_1km'] = gdf_proj.geometry.buffer(1000)
Spatial relationships
intersects = gdf[gdf.geometry.intersects(other_geometry)] contains = gdf[gdf.geometry.contains(point_geometry)]
Geometric operations
gdf['centroid'] = gdf.geometry.centroid gdf['simplified'] = gdf.geometry.simplify(tolerance=0.001)
Overlay operations
intersection = gpd.overlay(gdf1, gdf2, how='intersection') union = gpd.overlay(gdf1, gdf2, how='union')
Terrain Analysis
def terrain_metrics(dem_path): """Calculate slope, aspect, hillshade from DEM.""" with rasterio.open(dem_path) as src: dem = src.read(1)
dy, dx = np.gradient(dem)
slope = np.arctan(np.sqrt(dx**2 + dy**2)) * 180 / np.pi
aspect = (90 - np.arctan2(-dy, dx) * 180 / np.pi) % 360
# Hillshade
az_rad, alt_rad = np.radians(315), np.radians(45)
hillshade = (np.sin(alt_rad) * np.sin(np.radians(slope)) +
np.cos(alt_rad) * np.cos(np.radians(slope)) *
np.cos(np.radians(aspect) - az_rad))
return slope, aspect, hillshade
Network Analysis
import osmnx as ox import networkx as nx
Download and analyze street network
G = ox.graph_from_place('San Francisco, CA', network_type='drive') G = ox.add_edge_speeds(G).add_edge_travel_times(G)
Shortest path
orig = ox.distance.nearest_nodes(G, -122.4, 37.7) dest = ox.distance.nearest_nodes(G, -122.3, 37.8) route = nx.shortest_path(G, orig, dest, weight='travel_time')
Image Classification
from sklearn.ensemble import RandomForestClassifier import rasterio from rasterio.features import rasterize
def classify_imagery(raster_path, training_gdf, output_path): """Train RF and classify imagery.""" with rasterio.open(raster_path) as src: image = src.read() profile = src.profile transform = src.transform
# Extract training data
X_train, y_train = [], []
for _, row in training_gdf.iterrows():
mask = rasterize([(row.geometry, 1)],
out_shape=(profile['height'], profile['width']),
transform=transform, fill=0, dtype=np.uint8)
pixels = image[:, mask > 0].T
X_train.extend(pixels)
y_train.extend([row['class_id']] * len(pixels))
# Train and predict
rf = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1)
rf.fit(X_train, y_train)
prediction = rf.predict(image.reshape(image.shape[0], -1).T)
prediction = prediction.reshape(profile['height'], profile['width'])
profile.update(dtype=rasterio.uint8, count=1)
with rasterio.open(output_path, 'w', **profile) as dst:
dst.write(prediction.astype(rasterio.uint8), 1)
return rf
Modern Cloud-Native Workflows
STAC + Planetary Computer
import pystac_client import planetary_computer import odc.stac
Search Sentinel-2 via STAC
catalog = pystac_client.Client.open( "https://planetarycomputer.microsoft.com/api/stac/v1", modifier=planetary_computer.sign_inplace, )
search = catalog.search( collections=["sentinel-2-l2a"], bbox=[-122.5, 37.7, -122.3, 37.9], datetime="2023-01-01/2023-12-31", query={"eo:cloud_cover": {"lt": 20}}, )
Load as xarray (cloud-native!)
data = odc.stac.load( list(search.get_items())[:5], bands=["B02", "B03", "B04", "B08"], crs="EPSG:32610", resolution=10, )
Calculate NDVI on xarray
ndvi = (data.B08 - data.B04) / (data.B08 + data.B04)
Cloud-Optimized GeoTIFF (COG)
import rasterio from rasterio.session import AWSSession
Read COG directly from cloud (partial reads)
session = AWSSession(aws_access_key_id=..., aws_secret_access_key=...) with rasterio.open('s3://bucket/path.tif', session=session) as src: # Read only window of interest window = ((1000, 2000), (1000, 2000)) subset = src.read(1, window=window)
Write COG
with rasterio.open('output.tif', 'w', **profile, tiled=True, blockxsize=256, blockysize=256, compress='DEFLATE', predictor=2) as dst: dst.write(data)
Validate COG
from rio_cogeo.cogeo import cog_validate cog_validate('output.tif')
Performance Tips
1. Spatial indexing (10-100x faster queries)
gdf.sindex # Auto-created by GeoPandas
2. Chunk large rasters
with rasterio.open('large.tif') as src: for i, window in src.block_windows(1): block = src.read(1, window=window)
3. Dask for big data
import dask.array as da dask_array = da.from_rasterio('large.tif', chunks=(1, 1024, 1024))
4. Use Arrow for I/O
gdf.to_file('output.gpkg', use_arrow=True)
5. GDAL caching
from osgeo import gdal gdal.SetCacheMax(2**30) # 1GB cache
6. Parallel processing
rf = RandomForestClassifier(n_jobs=-1) # All cores
Best Practices
-
Always check CRS before spatial operations
-
Use projected CRS for area/distance calculations
-
Validate geometries: gdf = gdf[gdf.is_valid]
-
Handle missing data: gdf['geometry'] = gdf['geometry'].fillna(None)
-
Use efficient formats: GeoPackage > Shapefile, Parquet for large data
-
Apply cloud masking to optical imagery
-
Preserve lineage for reproducible research
-
Use appropriate resolution for your analysis scale
Detailed Documentation
-
Coordinate Systems - CRS fundamentals, UTM, transformations
-
Core Libraries - GDAL, Rasterio, GeoPandas, Shapely
-
Remote Sensing - Satellite missions, spectral indices, SAR
-
Machine Learning - Deep learning, CNNs, GNNs for RS
-
GIS Software - QGIS, ArcGIS, GRASS integration
-
Scientific Domains - Marine, hydrology, agriculture, forestry
-
Advanced GIS - 3D GIS, spatiotemporal, topology
-
Big Data - Distributed processing, GPU acceleration
-
Industry Applications - Urban planning, disaster management
-
Programming Languages - Python, R, Julia, JS, C++, Java, Go, Rust
-
Data Sources - Satellite catalogs, APIs
-
Troubleshooting - Common issues, debugging, error reference
-
Code Examples - 500+ examples
GeoMaster covers everything from basic GIS operations to advanced remote sensing and machine learning.