worldclim-extract

Version Compatibility

Reference examples tested with: Python 3.10+, rasterio 1.4+, pandas 2.0+

Before using code patterns, verify installed versions match. If versions differ:

pip show rasterio pandas openpyxl

If code throws ImportError, install missing packages:

pip install rasterio pandas openpyxl

Overview

WorldClim provides global climate data as GeoTIFF raster files. Each .tif file is a grid covering the entire Earth, where each grid cell stores a climate value (e.g., temperature in °C or precipitation in mm). This skill automates the process of extracting climate values for specific geographic coordinates.

How It Works

Input: Excel or CSV file containing sample coordinates (longitude, latitude)
Data: WorldClim 2.1 bioclimatic GeoTIFF files (19 BIO variables, 1970-2000 average)
Process: For each coordinate, find the corresponding grid cell and read its value
Output: Original data plus extracted climate columns appended

Grid Resolution

Resolution	Cell Size	Approx. Area	File Size
`10m`	0.167°	~18.5 km²	~48 MB zip
`5m`	0.083°	~9.3 km²	~170 MB zip
`2.5m`	0.042°	~4.6 km²	~650 MB zip

Default: 10m — sufficient for most ecological/population genetics studies.

Quick Start

Using the CLI Script

A reusable Python script is provided at {baseDir}/extract_worldclim.py:

# Extract BIO1 (annual mean temp) and BIO12 (annual precipitation) — default
python3 {baseDir}/extract_worldclim.py \
  -i samples.xlsx \
  -o samples_with_climate.xlsx

# Extract all 19 bioclimatic variables
python3 {baseDir}/extract_worldclim.py \
  -i samples.xlsx \
  -o samples_all_bio.xlsx \
  --bios 1-19

# Extract specific variables with custom column names
python3 {baseDir}/extract_worldclim.py \
  -i coords.csv \
  -o result.xlsx \
  --bios 1,5,6,12,13 \
  --res 2.5m \
  --lon longitude \
  --lat latitude

Using Python Directly

For custom integration or programmatic use:

import pandas as pd
import rasterio

def extract_bio(tif_path, lon, lat):
    """Extract a single value from a GeoTIFF at given coordinates."""
    with rasterio.open(tif_path) as src:
        value = next(src.sample([(lon, lat)]))[0]
    return value

# Read sample coordinates
df = pd.read_excel("samples.xlsx")
coords = list(zip(df["经度"], df["纬度"]))

# Extract BIO1 (Annual Mean Temperature)
with rasterio.open("wc2.1_10m_bio_1.tif") as src:
    df["年均温度_C"] = [v[0] for v in src.sample(coords)]

# Extract BIO12 (Annual Precipitation)
with rasterio.open("wc2.1_10m_bio_12.tif") as src:
    df["年降水量_mm"] = [v[0] for v in src.sample(coords)]

df.to_excel("samples_with_climate.xlsx", index=False)

WorldClim Data Download

Automatic (script handles it)

The CLI script auto-downloads data on first run to the --cache directory (default: ./worldclim_data).

Manual Download

If automatic download fails (e.g., network issues):

# 10m resolution (~48 MB)
curl -O https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_10m_bio.zip
unzip wc2.1_10m_bio.zip -d ./worldclim_data/

# 2.5m resolution (~650 MB)
curl -O https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_2.5m_bio.zip
unzip wc2.1_2.5m_bio.zip -d ./worldclim_data/

BIO Variable Reference

BIO	Name	Unit	Description
BIO1	Annual Mean Temperature	°C	年均温度
BIO2	Mean Diurnal Range	°C	昼夜温差月均值
BIO3	Isothermality	%	等温性 (BIO2/BIO7 × 100)
BIO4	Temperature Seasonality	SD × 100	温度季节性
BIO5	Max Temp of Warmest Month	°C	最暖月最高温
BIO6	Min Temp of Coldest Month	°C	最冷月最低温
BIO7	Temperature Annual Range	°C	年温度范围 (BIO5−BIO6)
BIO8	Mean Temp of Wettest Quarter	°C	最湿季均温
BIO9	Mean Temp of Driest Quarter	°C	最干季均温
BIO10	Mean Temp of Warmest Quarter	°C	最暖季均温
BIO11	Mean Temp of Coldest Quarter	°C	最冷季均温
BIO12	Annual Precipitation	mm	年降水量
BIO13	Precipitation of Wettest Month	mm	最湿月降水量
BIO14	Precipitation of Driest Month	mm	最干月降水量
BIO15	Precipitation Seasonality	CV	降水季节性
BIO16	Precipitation of Wettest Quarter	mm	最湿季降水量
BIO17	Precipitation of Driest Quarter	mm	最干季降水量
BIO18	Precipitation of Warmest Quarter	mm	最暖季降水量
BIO19	Precipitation of Coldest Quarter	mm	最冷季降水量

Data Source: WorldClim 2.1 (1970-2000, 30-year average)

Input Format Requirements

Required Columns

Longitude column: Decimal degrees, range [-180, 180]. Default column name: 经度 (override with --lon)
Latitude column: Decimal degrees, range [-90, 90]. Default column name: 纬度 (override with --lat)

Supported Input Formats

.xlsx — Excel workbook (recommended, handles Chinese headers well)
.csv — Comma-separated values

Common Issues

Issue	Cause	Solution
Coordinates read as text	Hidden special characters (e.g., `\xa0` non-breaking space)	Script auto-cleans with `pd.to_numeric(errors='coerce')`; check for NA after conversion
Negative longitudes rejected	Using East/West format instead of decimal	Convert to decimal: 东经 117° → 117.0; 西经 117° → -117.0
Missing extracted values	Coordinate falls in ocean or outside raster bounds	Check coordinate validity; WorldClim covers land globally

Output Format

The output file contains all original columns plus extracted BIO columns:

名称    经度        纬度        年均温度_C    年降水量_mm
NFAL10  117.214052  31.270421   16.15        1325.0
NFBJ1   116.591445  40.032115   11.88        542.0

Using R (terra) for Cross-Validation

If you need to validate results with R:

library(terra)

# Read raster stack
bio <- rast(list.files("./worldclim_data", pattern = "\\.tif$", full.names = TRUE))

# Read and clean coordinates
pts <- readxl::read_excel("samples.xlsx")
pts$经度 <- as.numeric(gsub("\\s+", "", pts$经度))  # Remove hidden spaces
pts$纬度 <- as.numeric(pts$纬度)
pts <- pts[!is.na(pts$经度) & !is.na(pts$纬度), ]

# Extract
v <- vect(pts, geom = c("经度", "纬度"), crs = "EPSG:4326")
result <- extract(bio, v)
write.csv(cbind(pts, result[, -1]), "output.csv", row.names = FALSE)

Note: R's as.numeric() is stricter than Python's pandas and may fail on hidden whitespace. Always clean coordinates before conversion.

Decision Tree

Need to extract climate data for sample coordinates?
├── Have coordinates in Excel/CSV?
│   └── Use the CLI script: python3 extract_worldclim.py -i input.xlsx -o output.xlsx
├── Need only temperature and precipitation?
│   └── Default: --bios 1,12 (no need to specify)
├── Need all 19 bioclimatic variables?
│   └── Use: --bios 1-19
├── Need higher spatial resolution?
│   ├── ~9 km cells → --res 5m
│   └── ~4.6 km cells → --res 2.5m
└── Need to integrate into a Python pipeline?
    └── Use the direct Python code pattern with rasterio.sample()

Related Skills

bio-geo-data — For general geospatial data operations
bio-read-sequences — For biological sequence file parsing
bio-batch-processing — For processing multiple files in batch