Geospatial risk assessment and machine learning for ecological restoration planning
Greater Manchester contains over 1,500 registered brownfield sites: former industrial land, much of it contaminated by decades of manufacturing and extraction. These sites pose environmental risks through potential groundwater pollution and the spread of contamination to nearby watercourses.
Brownfield restoration using nature-based solutions such as mycoforestry (fungal remediation) could help heal these damaged ecosystems. The challenge is deciding which sites should be prioritised for ecological assessment and restoration interventions.
This project combines satellite imagery analysis, geospatial modelling, and machine learning to answer that question.
Site size and terrain flatness were the most important features in the machine learning model of restoration suitability. The model favours moderately sized (0.1–10 hectare), flat brownfield sites as offering the best combination of ecological risk and restoration feasibility.
Salford M5 emerged as the highest-risk area, with three of the top 10 priority sites concentrated near the River Irwell — reflecting the intersection of flat terrain, permeable soils, and proximity to watercourses.
This project integrates multiple geospatial platforms and programming languages to build a comprehensive risk assessment pipeline:
Imported the UK Brownfield Land Register and filtered it to Greater Manchester (bounding box: 2.7°W to 1.95°W, 53.35°N to 53.65°N). Integrated environmental layers from ESA WorldCover, HydroSHEDS, OpenLandMap and SRTM.
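The bounding-box filter amounts to a simple coordinate test. The sketch below assumes each register record carries a single WGS84 longitude/latitude point; the `Site` record and its field names are hypothetical, not the register's actual schema.

```python
"""Keep only brownfield records whose point falls inside the Greater Manchester bounding box."""
from dataclasses import dataclass

# Bounding box from the text, in decimal degrees (negative longitude = west of Greenwich).
MIN_LON, MAX_LON = -2.7, -1.95
MIN_LAT, MAX_LAT = 53.35, 53.65


@dataclass
class Site:
    """Hypothetical minimal record; the real register has many more fields."""
    site_ref: str
    lon: float
    lat: float


def in_bbox(site: Site) -> bool:
    """True if the site's point lies inside (or on the edge of) the bounding box."""
    return MIN_LON <= site.lon <= MAX_LON and MIN_LAT <= site.lat <= MAX_LAT


def filter_to_bbox(sites: list[Site]) -> list[Site]:
    return [s for s in sites if in_bbox(s)]


if __name__ == "__main__":
    sample = [
        Site("A", -2.29, 53.48),  # inside the box (Salford / central Manchester)
        Site("B", -1.47, 53.38),  # outside: east of the box (Sheffield)
    ]
    print([s.site_ref for s in filter_to_bbox(sample)])
```

With GeoPandas, the coordinate-based indexer `gdf.cx[-2.7:-1.95, 53.35:53.65]` does the same job on a GeoDataFrame, provided it is in a longitude/latitude CRS.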
Calculated composite risk scores (0–1 scale) for each site based on three environmental factors:
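// Inputs (assumed defined earlier as ee.Image bands): riverDistance in metres,
// soilTexture as a USDA texture class (1-12), slope in degrees (e.g. from ee.Terrain.slope).
// Water proximity risk (contamination spread potential): 1 at the river, 0 beyond 5 km
var water_risk = riverDistance.divide(5000).multiply(-1).add(1).clamp(0, 1);
// Soil permeability risk (groundwater pollution): texture class rescaled to 0-1
var soil_risk = soilTexture.divide(12).clamp(0, 1);
// Terrain flatness (industrial land proxy): 1 on flat ground, 0 at 30 degrees or steeper
var slope_risk = slope.divide(30).multiply(-1).add(1).clamp(0, 1);
// Composite score (equal weighting), on a 0-1 scale
var total_risk = water_risk.add(soil_risk).add(slope_risk).divide(3);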
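To check individual site scores by hand, the same arithmetic can be written per site outside Earth Engine. This is a sketch that mirrors the formula above; `composite_risk` is a hypothetical helper, and it assumes river distance in metres, a USDA texture class from 1 to 12, and slope in degrees.

```python
"""Per-site version of the Earth Engine risk formula, for checking values by hand."""


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def composite_risk(river_distance_m: float, texture_class: int, slope_deg: float) -> dict[str, float]:
    # Water proximity: 1 at the river bank, falling linearly to 0 at 5 km.
    water = clamp(1 - river_distance_m / 5000)
    # Soil: texture class (assumed 1-12) rescaled so class 12 scores 1.
    soil = clamp(texture_class / 12)
    # Flatness: 1 on flat ground, falling linearly to 0 at 30 degrees.
    slope = clamp(1 - slope_deg / 30)
    total = (water + soil + slope) / 3  # equal weighting
    return {"water_risk": water, "soil_risk": soil, "slope_risk": slope, "total_risk": total}


print(composite_risk(1000, 9, 1.5))
```

For a site 1 km from a river, with texture class 9 and a 1.5° slope, the components are 0.8, 0.75 and 0.95, giving a composite score of about 0.83.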
Exported results to CSV for statistical analysis (1,583 sites with calculated risk scores).
Categorised sites into Low/Medium/High risk groups and generated publication-quality visualisations using ggplot2:
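The write-up does not state the score thresholds behind the Low/Medium/High groups. The sketch below uses hypothetical equal-width cut points at 1/3 and 2/3 purely for illustration, with `risk_category` as a hypothetical helper name.

```python
"""Bin composite risk scores (0-1) into Low/Medium/High categories."""
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points: the actual thresholds used in the project are not stated.
THRESHOLDS = (1 / 3, 2 / 3)
LABELS = ("Low", "Medium", "High")


def risk_category(score: float, thresholds: tuple[float, float] = THRESHOLDS) -> str:
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # bisect_right counts how many cut points the score has reached or passed,
    # so a score exactly on a cut point goes into the higher bin.
    return LABELS[bisect_right(thresholds, score)]


if __name__ == "__main__":
    scores = [0.12, 0.45, 0.58, 0.81]  # illustrative values, not project data
    print(Counter(risk_category(s) for s in scores))
```

In pandas the equivalent is `pd.cut` with explicit `bins` and `labels`; note that its intervals are closed on the right by default, so scores that sit exactly on a cut point land in the lower bin there.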
Distribution of contamination risk scores across 1,582 sites
Site size vs. contamination risk — showing no strong correlation between area and risk score
Created a polished, print-quality map with professional cartographic elements (legend, scale bar, north arrow, title block). Sites colour-coded by risk category for stakeholder presentations.
Final map showing brownfield sites colour-coded by environmental risk
Built a web-based interactive map allowing users to:
The map exports as a standalone HTML file (no server required), which makes it practical to share with non-technical stakeholders.
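Web maps like this one are commonly fed from GeoJSON. The sketch below shows one way to package scored sites as a GeoJSON FeatureCollection that could then be passed to `folium.GeoJson` or loaded directly by Leaflet. The input record layout, the `sites_to_geojson` helper and the colour scheme are all hypothetical.

```python
"""Package scored sites as a GeoJSON FeatureCollection for a web map."""
import json

# Hypothetical colour scheme for the three risk categories.
CATEGORY_COLOURS = {"Low": "#1a9850", "Medium": "#fee08b", "High": "#d73027"}


def sites_to_geojson(sites: list[dict]) -> dict:
    features = []
    for s in sites:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude] (RFC 7946).
            "geometry": {"type": "Point", "coordinates": [s["lon"], s["lat"]]},
            "properties": {
                "site_ref": s["site_ref"],
                "total_risk": round(s["total_risk"], 3),
                "risk_category": s["risk_category"],
                "colour": CATEGORY_COLOURS[s["risk_category"]],
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    demo = [{"site_ref": "A", "lon": -2.29, "lat": 53.48, "total_risk": 0.8123, "risk_category": "High"}]
    print(json.dumps(sites_to_geojson(demo), indent=2))
```

Storing the colour as a feature property keeps the styling decision in the data, so the HTML map and any other viewer show the same categories.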
Trained a Random Forest classifier to predict restoration suitability based on site characteristics. Key findings:
Random Forest feature importance analysis
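The composite risk score combines three factors: proximity to watercourses (spread of contamination), soil permeability (groundwater pollution) and terrain flatness (used as a proxy for former industrial land).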
Equal weighting was chosen because of a lack of empirical data on the relative importance of each pathway. Future work could use expert elicitation or historical contamination records to refine the weights.
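Refining the weights only requires turning the equal-weighted mean into a weighted mean. The sketch below is a minimal version, assuming non-negative weights that are normalised so the score stays on the 0–1 scale. Equal weights reproduce the original formula, and any non-equal weights shown are hypothetical examples, not elicited values.

```python
"""Weighted composite risk score; equal weights reproduce the original formula."""
from collections.abc import Mapping

EQUAL_WEIGHTS = {"water_risk": 1 / 3, "soil_risk": 1 / 3, "slope_risk": 1 / 3}


def weighted_risk(components: Mapping[str, float], weights: Mapping[str, float] = EQUAL_WEIGHTS) -> float:
    if set(components) != set(weights):
        raise ValueError("components and weights must cover the same factors")
    total_w = sum(weights.values())
    if total_w <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and not all zero")
    # Normalise so the result stays on the 0-1 scale whatever the raw weights sum to.
    return sum(components[k] * weights[k] for k in components) / total_w


if __name__ == "__main__":
    site = {"water_risk": 0.8, "soil_risk": 0.75, "slope_risk": 0.95}
    print(weighted_risk(site))  # equal weights
    # Hypothetical expert weights that emphasise the watercourse pathway.
    print(weighted_risk(site, {"water_risk": 0.5, "soil_risk": 0.3, "slope_risk": 0.2}))
```

For this example site, the equal-weighted score is about 0.83 and the water-heavy weighting gives about 0.815. Because of the normalisation, weights such as 5/3/2 are equivalent to 0.5/0.3/0.2.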
The Random Forest classifier achieved 100% accuracy on the test data. However, this reflects the use of a synthetic target variable rather than real restoration outcomes. The target was defined using deterministic rules:
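# df: one row per site (pandas DataFrame; column names assumed to match the variables).
# Element-wise pandas methods are needed: `x in [...]` and chained `a <= x <= b` fail on a Series.
suitable = (
    df['risk_category'].isin(['Medium', 'High'])
    & df['hectares'].between(0.1, 10)  # inclusive at both ends by default
    & (df['slope_risk'] > 0.8)
)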
The model learned these rules perfectly because the target is a deterministic function of the input features, so near-perfect accuracy is expected by construction and is not evidence of predictive skill. With access to actual restoration project records (success or failure outcomes, intervention costs, ecological recovery metrics), this approach could be extended to build a genuine predictive model for real-world decision-making.
Perfect classification on test set (317 sites) reflects synthetic target variable
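The write-up does not state how the data were split. Assuming scikit-learn's `train_test_split` with `test_size=0.2` was used, the 317-site test set is consistent with both site counts quoted here, because a fractional `test_size` is rounded up. `held_out_count` is a hypothetical helper that reproduces that rounding.

```python
"""Relate the reported 317-site test set to the number of sites, assuming an 80/20 split."""
import math


def held_out_count(n_samples: int, test_size: float = 0.2) -> int:
    # For a fractional test_size, scikit-learn's train_test_split rounds the
    # number of test samples up: ceil(test_size * n_samples).
    return math.ceil(test_size * n_samples)


if __name__ == "__main__":
    for n in (1582, 1583):  # the two site counts quoted in this write-up
        n_test = held_out_count(n)
        print(f"{n} sites -> {n_test} test, {n - n_test} train")
```

Both 1,582 and 1,583 sites give 317 test sites under this assumption, so the test-set size alone cannot show which count was used for modelling.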
This analysis prioritises sites based on contamination risk (likelihood of environmental harm), not restoration potential (likelihood of ecological success). The two are related but not identical.
Future extensions could address this by:
The methodology developed here is transferable to other UK cities with industrial heritage (Sheffield, Birmingham, Newcastle) and could inform national-scale brownfield restoration planning.
Platforms: Google Earth Engine, QGIS, Jupyter
Languages: JavaScript (GEE), Python, R
Python Libraries: GeoPandas, Folium, scikit-learn, pandas, matplotlib, seaborn
R Packages: sf, ggplot2, dplyr, tidyr
Data Sources: UK Brownfield Register, ESA WorldCover, HydroSHEDS, OpenLandMap, SRTM
Open to collaboration on environmental data science projects and actively seeking opportunities in geospatial analysis and ecological restoration