GeoRegression

A geospatial framework for performing non-linear regression, designed to effectively model complex spatial relationships.

This Python package offers a robust framework for regression modeling on geospatial data, addressing the challenge of spatial non-stationarity by integrating spatial information directly into the modeling process. Built on this framework are two advanced methods: the SpatioTemporal Random Forest (STRF) and the SpatioTemporal Stacking Tree (STST), which leverage spatial and temporal patterns to enhance predictive accuracy.

Illustration for STRF and STST

Installation
Quick Start
- Data Preparation
- Shared Parameter Explanation
- SpatioTemporal Random Forest (STRF)
- SpatioTemporal Stacking Tree (STST)
- GWR / GTWR
- Prediction
- SpatioTemporal Dimension
- Posterior Inspection Tools
  - Feature Importance Analysis
  - SpatioTemporal (Local) Accumulated Local Effects (STALE) Plots
Citation

Installation

Python 3.7 or later is required.

pip install georegression

Quick Start

The full example can be found in the Examples folder.

Data Preparation

Use the provided function to generate sample data that exhibits spatial non-stationarity.

import numpy as np
from georegression.simulation.simulation_for_fitting import generate_sample, f_square, coef_strong

X, y, points = generate_sample(500, f_square, coef_strong, random_seed=1, plot=True)
X_plus = np.concatenate([X, points], axis=1)
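
The internals of generate_sample are not shown here, so the following is only a rough, standard-library illustration of what "spatial non-stationarity" means: the strength of the relationship between a feature and the response changes with location. The function name make_nonstationary_sample and the coefficient surface 1 + 4u are hypothetical and are not part of the package.

```python
import random


def make_nonstationary_sample(n, seed=0):
    """Return (X, y, points) where the effect of x on y drifts across space.

    Hypothetical stand-in for illustration only; not the package's generator.
    """
    rng = random.Random(seed)
    X, y, points = [], [], []
    for _ in range(n):
        u, v = rng.random(), rng.random()  # location in the unit square
        x = rng.uniform(-1.0, 1.0)         # single explanatory feature
        coef = 1.0 + 4.0 * u               # assumed coefficient surface, grows with u
        X.append([x])
        points.append([u, v])
        y.append(coef * x ** 2)            # non-linear response, location-dependent strength
    return X, y, points


X_demo, y_demo, points_demo = make_nonstationary_sample(200, seed=1)
```

A single global model fitted to such data has to average over the drifting coefficient, which is the situation the local weighting in this package is meant to address.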

Shared Parameter Explanation

Several parameters are shared across different model implementations and are used to construct weight matrices for both spatial and spatiotemporal dimensions:

kernel_type: Determines the kernel function used for spatial weighting. Accepts standard kernel types:
- 'bisquare': A commonly used kernel that provides smooth distance-based weighting
- 'gaussian': Gaussian kernel for distance-based weighting
neighbour_count: Controls the adaptive kernel bandwidth for spatial weighting:
- Must be a decimal between 0 and 1 (e.g., 0.3)
- The kernel bandwidth at each location is set to the distance that covers this fraction of its nearest neighbours
bandwidth: Optional parameter for fixed kernel bandwidth:
- If an integer value is provided, uses this fixed distance as the kernel bandwidth
- This bandwidth will be applied uniformly to all data points
- When specified, this takes precedence over neighbour_count
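
As a minimal sketch of how these parameters interact, the snippet below uses the textbook forms of the bisquare and Gaussian kernels and derives an adaptive bandwidth from a neighbour fraction. The function names and the rounding rule for the neighbour count are assumptions made for illustration; the package's own implementation may differ in detail.

```python
import math


def bisquare(distance, bandwidth):
    """Bisquare kernel: weight falls smoothly to zero at the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def gaussian(distance, bandwidth):
    """Gaussian kernel: weight decays with distance but never reaches zero."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def adaptive_bandwidth(distances, neighbour_count):
    """Distance to the k-th nearest neighbour, with k = ceil(neighbour_count * n).

    The rounding rule is an assumption made for this sketch.
    """
    ordered = sorted(distances)
    k = max(1, math.ceil(neighbour_count * len(ordered)))
    return ordered[k - 1]


# Distances from one target location to ten source locations.
distances = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Adaptive case: the bandwidth depends on how dense the neighbourhood is.
local_bandwidth = adaptive_bandwidth(distances, neighbour_count=0.5)
adaptive_weights = [bisquare(d, local_bandwidth) for d in distances]

# Fixed case: the same bandwidth is used for every location.
fixed_weights = [gaussian(d, 3.0) for d in distances]
```

With an adaptive bandwidth, locations in sparse areas reach further to collect the same share of neighbours, whereas a fixed bandwidth can leave them with very few weighted observations.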

SpatioTemporal Random Forest (STRF)

The WeightModel class provides the basic weighted framework for regression.
In the weighted framework, each local model is fitted without the y value of its target location, so its prediction at that location can be used directly as the prediction of the whole model.

from sklearn.ensemble import RandomForestRegressor
from georegression.weight_model import WeightModel

distance_measure = "euclidean"
kernel_type = "bisquare"

grf_neighbour_count=0.3
grf_n_estimators=50
model = WeightModel(
    RandomForestRegressor(n_estimators=grf_n_estimators),
    distance_measure,
    kernel_type,
    neighbour_count=grf_neighbour_count,
)
model.fit(X_plus, y, [points])
print('STRF R2 Score: ', model.llocv_score_)

# --- Alternative ---

from sklearn.metrics import r2_score
y_predict = model.local_predict_
score = r2_score(y, y_predict)
print(score)
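
The leave-local-out idea behind the score above can be sketched without the package. In the snippet below, a kernel-weighted mean stands in for the local estimator (the package fits a full regressor such as a random forest instead), and every name is hypothetical. The point is only that the target's own y value is excluded, so scoring the local predictions behaves like a cross-validated score.

```python
import math


def bisquare(distance, bandwidth):
    """Textbook bisquare kernel, zero at and beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def leave_local_out_predictions(y, points, bandwidth):
    """Predict each location from its weighted neighbours, never from itself."""
    predictions = []
    for i, target in enumerate(points):
        numerator = denominator = 0.0
        for j, source in enumerate(points):
            if j == i:
                continue  # the target's own y value is never used
            distance = math.hypot(target[0] - source[0], target[1] - source[1])
            weight = bisquare(distance, bandwidth)
            numerator += weight * y[j]
            denominator += weight
        predictions.append(numerator / denominator if denominator > 0 else float("nan"))
    return predictions


def r2(y_true, y_pred):
    """Coefficient of determination, written out for a self-contained example."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


points_demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
y_demo = [0.0, 1.0, 2.0, 3.0, 4.0]

local_predictions = leave_local_out_predictions(y_demo, points_demo, bandwidth=1.5)
score_demo = r2(y_demo, local_predictions)
```

In this toy setting the interior locations are recovered from their two neighbours, while the two end points have only one neighbour within the bandwidth and are predicted less accurately.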

SpatioTemporal Stacking Tree (STST)

The StackingWeightModel class provides the weighted stacking framework for regression.
In the weighted stacking framework, each local model is fitted without the y value of its target location, so its prediction at that location can be used directly as the prediction of the whole model.

from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import ExtraTreesRegressor
from georegression.stacking_model import StackingWeightModel

distance_measure = "euclidean"
kernel_type = "bisquare"

stacking_neighbour_count=0.3
stacking_neighbour_leave_out_rate=0.1
model = StackingWeightModel(
    DecisionTreeRegressor(splitter="random", max_depth=X.shape[1]),
    # Or use the ExtraTreesRegressor for better predicting performance.
    # ExtraTreesRegressor(n_estimators=10, max_depth=X.shape[1]), 
    distance_measure,
    kernel_type,
    neighbour_count=stacking_neighbour_count,
    neighbour_leave_out_rate=stacking_neighbour_leave_out_rate,
)
model.fit(X_plus, y, [points])
print('STST R2 Score: ', model.llocv_stacking_)

# --- Alternative ---

from sklearn.metrics import r2_score
y_predict = model.stacking_predict_
score = r2_score(y, y_predict)
print(score)

GWR / GTWR

from sklearn.linear_model import LinearRegression
from georegression.weight_model import WeightModel

distance_measure = "euclidean"
kernel_type = "bisquare"

gwr_neighbour_count=0.2
model = WeightModel(
    LinearRegression(),
    distance_measure,
    kernel_type,
    neighbour_count=gwr_neighbour_count,
)
model.fit(X_plus, y, [points])

print('GWR R2 Score: ', model.llocv_score_)

# --- Alternative ---

from sklearn.metrics import r2_score
y_predict = model.local_predict_
score = r2_score(y, y_predict)
print(score)

Prediction

Although the weighted framework already yields a prediction at every training location, two methods are provided for predicting on new data:
- predict_by_fit: Fits a new local model for each prediction location using the training data, then predicts with it.
- predict_by_weight: Predicts with the existing local estimators and combines their predictions using a weight matrix computed with the training locations as sources and the prediction locations as targets.

X_test, y_test, points_test = generate_sample(500, f_square, coef_strong, random_seed=2, plot=False)
X_test_plus = np.concatenate([X_test, points_test], axis=1)

y_predict = model.predict_by_fit(X_plus, y, [points], X_test_plus, [points_test])

# For weight model:
# y_predict = model.predict_by_fit(X_test_plus, [points_test])

# For predict by weight:
# y_predict = model.predict_by_weight(X_test_plus, [points_test])
score = r2_score(y_test, y_predict)
print(score)
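
To illustrate the predict_by_weight idea, the sketch below combines the outputs of already-fitted local predictors as a normalised weighted average, with weights taken from the distance between each training location and the new location. The function name, the use of plain callables as local predictors, and the normalisation are assumptions for illustration, not the package's code.

```python
import math


def bisquare(distance, bandwidth):
    """Textbook bisquare kernel, zero at and beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def predict_by_weight_sketch(local_predictors, train_points, x_new, point_new, bandwidth):
    """Weighted average of local predictions for a single new observation."""
    numerator = denominator = 0.0
    for predictor, location in zip(local_predictors, train_points):
        distance = math.hypot(location[0] - point_new[0], location[1] - point_new[1])
        weight = bisquare(distance, bandwidth)
        if weight == 0.0:
            continue  # local models outside the bandwidth do not contribute
        numerator += weight * predictor(x_new)
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no training location lies within the bandwidth")
    return numerator / denominator


# Three toy local models whose slope differs by location.
train_points_demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
local_predictors_demo = [
    lambda x: 1.0 * x,
    lambda x: 2.0 * x,
    lambda x: 3.0 * x,
]

# The new location sits halfway between the first two training locations.
y_new = predict_by_weight_sketch(
    local_predictors_demo, train_points_demo, x_new=2.0, point_new=(0.5, 0.0), bandwidth=1.0
)
```

Because no model is refitted, this route is cheaper than fitting a new local model per prediction location, at the cost of relying on estimators that were trained around other locations.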

SpatioTemporal Dimension

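To use more than one dimension of spatial information (for example, space and time), add the new dimension to the input data and pass it as an additional entry in the coordinate list given to fit. As in the example below, distance_measure and kernel_type are then given as lists with one entry per dimension.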

times = np.random.randint(0, 10, size=(X.shape[0], 1))
X_plus = np.concatenate([X, points, times], axis=1)

distance_measure = ["euclidean", 'euclidean']
kernel_type = ["bisquare", 'bisquare']

grf_neighbour_count = 0.3

grf_n_estimators=50
model = WeightModel(
    RandomForestRegressor(n_estimators=grf_n_estimators),
    distance_measure,
    kernel_type,
    neighbour_count=grf_neighbour_count,
)
model.fit(X_plus, y, [points, times])
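
How the per-dimension weights are merged is not described in this section. One common convention, shown here purely as an assumption, is to compute a kernel weight for each dimension separately and multiply them, so that an observation must be close in both space and time to receive a large weight. The function names and bandwidth values are hypothetical.

```python
def bisquare(distance, bandwidth):
    """Textbook bisquare kernel, zero at and beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def spatiotemporal_weight(space_distance, time_distance, space_bandwidth, time_bandwidth):
    """Assumed combination rule: product of the spatial and temporal kernel weights."""
    return bisquare(space_distance, space_bandwidth) * bisquare(time_distance, time_bandwidth)


# Same place and same time: full weight.
w_same = spatiotemporal_weight(0.0, 0.0, space_bandwidth=2.0, time_bandwidth=4.0)

# Halfway to both bandwidths: each kernel gives 0.5625, and the product is smaller still.
w_half = spatiotemporal_weight(1.0, 2.0, space_bandwidth=2.0, time_bandwidth=4.0)

# Close in space but too far away in time: no weight at all.
w_stale = spatiotemporal_weight(0.1, 5.0, space_bandwidth=2.0, time_bandwidth=4.0)
```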

Posterior Inspection Tools

GeoRegression provides tools for interpreting and analysing a model after fitting. Two of them are shown here:

Feature Importance Analysis

You can analyze both global and local feature importance to understand how different features contribute to predictions across space:

from georegression.weight_model import WeightModel
from sklearn.ensemble import RandomForestRegressor

# Fit the model
model = WeightModel(
    RandomForestRegressor(n_estimators=50),
    distance_measure="euclidean",
    kernel_type="bisquare",
    neighbour_count=0.02
)
model.fit(X, y, [points])

# Get global feature importance
importance_global = model.importance_score_global()
print("Global Importance Score: ", importance_global)

# Get local feature importance
importance_local = model.importance_score_local()

# Visualize local importance for each feature
import matplotlib.pyplot as plt

for i in range(importance_local.shape[1]):
    plt.figure()
    scatter = plt.scatter(
        points[:, 0], points[:, 1], 
        c=importance_local[:, i], 
        cmap="viridis"
    )
    plt.colorbar(scatter)
    plt.title(f"Local Importance of Feature {i}")
    plt.show()

Example visualization of local feature importance:

Local importance visualization showing spatial variation in feature influence

SpatioTemporal (Local) Accumulated Local Effects (STALE) Plots

STALE plots help understand how features affect predictions locally:

from georegression.local_ale import weighted_ale
from georegression.visualize.ale import plot_ale

# For a specific location (local_index)
feature_index = 0  # Feature to analyze
local_index = 0    # Location to analyze

# Get local estimator and data
estimator = model.local_estimator_list[local_index]
neighbour_mask = model.neighbour_matrix_[local_index]
neighbour_weight = model.weight_matrix_[local_index][neighbour_mask]
X_local = model.X[neighbour_mask]

# Calculate ALE
ale_result = weighted_ale(
    X_local, 
    feature_index, 
    estimator.predict, 
    neighbour_weight
)
fval, ale = ale_result

# Plot ALE with weighted observations
x_neighbour = X[model.neighbour_matrix_[local_index], feature_index]
y_neighbour = y[model.neighbour_matrix_[local_index]]
weight_neighbour = model.weight_matrix_[local_index, model.neighbour_matrix_[local_index]]

fig = plot_ale(fval, ale, x_neighbour)
plt.show()

Example STALE plot:

STALE plot showing the local accumulated effects of a feature at a specific location
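
To make the idea of a weighted accumulated local effect concrete, here is a small first-order ALE sketch in plain Python. It is not the package's weighted_ale: the binning rule, the centring step, and the name weighted_ale_sketch are assumptions for illustration. Within each bin of the chosen feature, it averages the change in prediction between the bin edges using the observation weights, then accumulates those averages.

```python
def weighted_ale_sketch(X, feature_index, predict, weights, n_bins=4):
    """Return (edges, ale): bin edges and centred accumulated local effects."""
    values = sorted(row[feature_index] for row in X)
    # Equal-frequency bin edges taken from the sorted feature values.
    edges = sorted({values[(k * (len(values) - 1)) // n_bins] for k in range(n_bins + 1)})

    effects, bin_weights = [], []
    for lower, upper in zip(edges[:-1], edges[1:]):
        numerator = denominator = 0.0
        for row, weight in zip(X, weights):
            value = row[feature_index]
            # Bins are (lower, upper]; the first bin also includes its lower edge.
            in_bin = lower < value <= upper or (value == lower and lower == edges[0])
            if not in_bin:
                continue
            at_upper = list(row)
            at_upper[feature_index] = upper
            at_lower = list(row)
            at_lower[feature_index] = lower
            numerator += weight * (predict([at_upper])[0] - predict([at_lower])[0])
            denominator += weight
        effects.append(numerator / denominator if denominator > 0 else 0.0)
        bin_weights.append(denominator)

    # Accumulate the per-bin effects along the feature axis.
    ale = [0.0]
    for effect in effects:
        ale.append(ale[-1] + effect)

    # Centre the curve on its weighted mean (bin midpoints, weighted by bin weight).
    total = sum(bin_weights)
    if total > 0:
        mean = sum(
            0.5 * (ale[k] + ale[k + 1]) * bin_weights[k] for k in range(len(effects))
        ) / total
        ale = [a - mean for a in ale]
    return edges, ale


def linear_predict(rows):
    """Toy model with a known effect of 2 per unit of feature 0."""
    return [2.0 * r[0] + 5.0 * r[1] for r in rows]


X_demo = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
weights_demo = [0.2, 1.0, 0.5, 1.0, 0.2]

edges_demo, ale_demo = weighted_ale_sketch(X_demo, 0, linear_predict, weights_demo)
```

For the toy linear model, the curve rises by 2 per unit of the feature regardless of the second feature, which is the behaviour an ALE curve is expected to recover.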

These tools provide insights into:
- How different features influence predictions globally and locally
- How feature effects vary across space
- The strength and nature of spatial relationships in your data

Citation

If you find this package useful in your research, please consider citing:

Luo, Y., & Su, S. (2025). SpatioTemporal Random Forest and SpatioTemporal Stacking Tree: A novel spatially explicit ensemble learning approach to modeling non-linearity in spatiotemporal non-stationarity. International Journal of Applied Earth Observation and Geoinformation, 136, 104315. https://doi.org/10.1016/j.jag.2024.104315

@article{luo_spatiotemporal_2025,
	title = {{SpatioTemporal} {Random} {Forest} and {SpatioTemporal} {Stacking} {Tree}: {A} novel spatially explicit ensemble learning approach to modeling non-linearity in spatiotemporal non-stationarity},
	volume = {136},
	issn = {1569-8432},
	shorttitle = {{SpatioTemporal} {Random} {Forest} and {SpatioTemporal} {Stacking} {Tree}},
	url = {https://www.sciencedirect.com/science/article/pii/S1569843224006733},
	doi = {10.1016/j.jag.2024.104315},
	urldate = {2024-12-30},
	journal = {International Journal of Applied Earth Observation and Geoinformation},
	author = {Luo, Yun and Su, Shiliang},
	month = feb,
	year = {2025},
	keywords = {Ensemble learning, Machine learning, Nonlinearity, Spatially explicit modeling, Spatiotemporal non-stationarity, Spatiotemporal random forest, Spatiotemporal stacking tree},
	pages = {104315},
}
