Greater Manchester Brownfield Restoration Potential

Geospatial risk assessment and machine learning for ecological restoration planning

The Challenge

Greater Manchester contains over 1,500 registered brownfield sites — former industrial land contaminated by decades of manufacturing and extraction. These sites pose environmental risks through potential groundwater pollution and contamination spread to nearby watercourses.

However, brownfield restoration using nature-based solutions like mycoforestry (fungal remediation) offers transformative potential for healing these damaged ecosystems. The challenge: which sites should be prioritised for ecological assessment and restoration interventions?

This project combines satellite imagery analysis, geospatial modelling, and machine learning to answer that question.

Key Findings

• 1,585 active brownfield sites analysed
• 746 high-risk sites identified
• 772 ha of high-risk land
• 47% of sites scored high priority

Primary Finding

Site size and terrain flatness are the strongest predictors of restoration suitability. The machine learning model found that moderately sized (0.1–10 hectares), flat brownfield sites offer the best combination of ecological risk and restoration feasibility.

Salford M5 emerged as the highest-risk area, with three of the top 10 priority sites concentrated near the River Irwell — reflecting the intersection of flat terrain, permeable soils, and proximity to watercourses.

Methodology

This project integrates multiple geospatial platforms and programming languages to build a comprehensive risk assessment pipeline:

  1. Data Collection (Google Earth Engine)

    Imported the UK brownfield land register and filtered to Greater Manchester (bounding box: -2.7 to -1.95°E, 53.35 to 53.65°N). Integrated satellite-derived environmental layers:

    • ESA WorldCover (land cover, 10m resolution)
    • WWF HydroSHEDS (river networks)
    • OpenLandMap soil texture (permeability classes)
    • SRTM elevation and derived slope
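
The bounding-box filter is simple enough to check outside Earth Engine. Below is a minimal sketch in plain Python, assuming each site is available as a (longitude, latitude) pair in decimal degrees. The function name `in_greater_manchester` and the sample coordinates are illustrative and are not taken from the project code.

```python
# Bounding box quoted above, in degrees: west, south, east, north.
GM_BBOX = (-2.7, 53.35, -1.95, 53.65)


def in_greater_manchester(lon, lat, bbox=GM_BBOX):
    """Return True if the point lies inside the bounding box (edges included)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


# Illustrative coordinates (approximate city centres, not register entries).
print(in_greater_manchester(-2.24, 53.48))  # central Manchester
print(in_greater_manchester(-0.13, 51.51))  # central London
```

A rectangular box also captures land just outside the county boundary, so clipping to a boundary polygon would be the stricter alternative.
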
  2. Risk Score Calculation (JavaScript in GEE)

    Calculated composite risk scores (0–1 scale) for each site based on three environmental factors:

    // Water proximity risk (contamination spread potential)
    var water_risk = riverDistance.divide(5000).multiply(-1).add(1).clamp(0,1);
    
    // Soil permeability risk (groundwater pollution)
    var soil_risk = soilTexture.divide(12);
    
    // Terrain flatness (industrial land proxy)
    var slope_risk = slope.divide(30).multiply(-1).add(1).clamp(0,1);
    
    // Composite score (equal weighting)
    var total_risk = water_risk.add(soil_risk).add(slope_risk).divide(3);

    Exported results to CSV for statistical analysis (1,583 sites with calculated risk scores).
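
To make the arithmetic concrete, the same three formulas can be applied to a single site in plain Python. This is a sketch under stated assumptions: river distance is taken to be in metres, slope in degrees, and soil texture as a class number from 1 to 12. The input values are invented for illustration and do not describe a real site.

```python
def clamp01(x):
    """Clamp a value to the range 0..1, mirroring .clamp(0, 1) in the GEE script."""
    return max(0.0, min(1.0, x))


def risk_scores(river_distance_m, soil_texture_class, slope_deg):
    """Per-site version of the scoring formulas (units are assumptions)."""
    water_risk = clamp01(1 - river_distance_m / 5000)  # closer to a river = higher risk
    soil_risk = soil_texture_class / 12                # higher texture class = higher risk
    slope_risk = clamp01(1 - slope_deg / 30)           # flatter terrain = higher risk
    total_risk = (water_risk + soil_risk + slope_risk) / 3  # equal weighting
    return {
        "water_risk": water_risk,
        "soil_risk": soil_risk,
        "slope_risk": slope_risk,
        "total_risk": total_risk,
    }


# Hypothetical site: 500 m from a river, texture class 9, 3 degree slope.
scores = risk_scores(river_distance_m=500, soil_texture_class=9, slope_deg=3)
print({name: round(value, 2) for name, value in scores.items()})
```

Note that any site more than 5,000 m from a river gets a water risk of 0, and any slope of 30 degrees or steeper gets a slope risk of 0, because of the clamp.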

  3. Statistical Analysis (R)

    Categorised sites into Low/Medium/High risk groups and generated publication-quality visualisations using ggplot2:

    • Risk score distribution histogram
    • Category breakdown bar chart (746 high, 764 medium, 72 low)
    • Site size vs. risk scatter plot
    • Top 10 highest-risk sites ranking

    [Figure: Risk Distribution. Distribution of contamination risk scores across 1,582 sites.]

    [Figure: Site Size vs Risk. Site size plotted against contamination risk, showing no strong correlation between area and risk score.]

  4. Cartography (QGIS)

    Created a polished, print-quality map with professional cartographic elements (legend, scale bar, north arrow, title block). Sites colour-coded by risk category for stakeholder presentations.

    [Figure: QGIS Map. Final map showing brownfield sites colour-coded by environmental risk.]

  5. Interactive Mapping (Python + Folium)

    Built a web-based interactive map allowing users to:

    • Click markers to view site details (address, risk scores, size)
    • Toggle High/Medium/Low risk categories on/off
    • Measure distances between sites
    • Recenter the map view

    Exports as standalone HTML (no server required) — practical for sharing with non-technical stakeholders.

  6. Machine Learning (Python + scikit-learn)

    Trained a Random Forest classifier to predict restoration suitability based on site characteristics. Key findings:

    • Site size (hectares) is the dominant predictor (75% feature importance)
    • Terrain flatness contributes 13% (proxy for industrial land use)
    • Water proximity and soil permeability have minimal predictive power once size and terrain are accounted for

    [Figure: Feature Importance. Random Forest feature importance analysis.]

Technical Approach

Risk Score Design Rationale

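The composite risk score combines three contamination pathways: water proximity (potential for contamination to spread to watercourses), soil permeability (potential for groundwater pollution), and terrain flatness (a proxy for former industrial land).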

Equal weighting was chosen due to lack of empirical data on relative pathway importance. Future work could use expert elicitation or historical contamination records to refine weights.
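
If refined weights became available, the composite score would generalise to a weighted mean. The sketch below shows that equal weights reproduce the current score. The alternative weights `(2, 1, 1)` are purely hypothetical and are not derived from any data.

```python
def composite_risk(water_risk, soil_risk, slope_risk, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of the three pathway scores; equal weights by default."""
    w_water, w_soil, w_slope = weights
    total_weight = w_water + w_soil + w_slope
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = (
        w_water * water_risk + w_soil * soil_risk + w_slope * slope_risk
    )
    return weighted_sum / total_weight


# Equal weighting matches the current formula: (0.9 + 0.75 + 0.9) / 3.
print(round(composite_risk(0.9, 0.75, 0.9), 2))

# Hypothetical re-weighting that doubles the influence of water proximity.
print(round(composite_risk(0.9, 0.75, 0.9, weights=(2, 1, 1)), 2))
```

Dividing by the sum of the weights keeps the result on the same 0–1 scale regardless of the weights chosen.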

Machine Learning Model

The Random Forest classifier achieved 100% accuracy on test data — however, this reflects the use of a synthetic target variable rather than real restoration outcomes. The target was defined using deterministic rules:

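# Assumes `df` is a pandas DataFrame with one row per site
df['suitable'] = (
    df['risk_category'].isin(['Medium', 'High']) &
    df['hectares'].between(0.1, 10) &  # inclusive at both ends
    (df['slope_risk'] > 0.8)
)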

The model learned these rules perfectly because they're predictive by design. With access to actual restoration project records (success/failure outcomes, intervention costs, ecological recovery metrics), this approach could be extended to build a genuine predictive model for real-world decision-making.
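
The effect can be reproduced on randomly generated data. The sketch below is an illustration under loud assumptions, not the project's training code: the site attributes are random, the Low/Medium/High cut-offs (0.4 and 0.7) are hypothetical, and the feature list and 20% hold-out split are guesses. Only the target rule is taken from the text above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1582  # number of synthetic rows; every value below is random, not real data

df = pd.DataFrame({
    "hectares": 10 ** rng.uniform(-2, 2, n),  # roughly 0.01 to 100 ha
    "water_risk": rng.uniform(0, 1, n),
    "soil_risk": rng.uniform(0, 1, n),
    "slope_risk": rng.uniform(0.5, 1, n),
})
df["total_risk"] = df[["water_risk", "soil_risk", "slope_risk"]].mean(axis=1)

# Hypothetical category cut-offs (the original thresholds are not stated).
df["risk_category"] = pd.cut(
    df["total_risk"], bins=[0, 0.4, 0.7, 1.0],
    labels=["Low", "Medium", "High"], include_lowest=True,
)

# Synthetic target: the deterministic rule described above.
df["suitable"] = (
    df["risk_category"].isin(["Medium", "High"])
    & df["hectares"].between(0.1, 10)
    & (df["slope_risk"] > 0.8)
)

features = ["hectares", "water_risk", "soil_risk", "slope_risk"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["suitable"],
    test_size=0.2, random_state=42, stratify=df["suitable"],
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
importances = dict(zip(features, model.feature_importances_))
print(f"Test accuracy: {accuracy:.3f}")
print({name: round(float(value), 3) for name, value in importances.items()})
```

A high score here shows only that the forest can recover a rule written in terms of its own inputs. It says nothing about how well the rule matches real restoration outcomes.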

[Figure: Confusion Matrix. Perfect classification on the test set (317 sites) reflects the synthetic target variable.]

Limitations & Future Work

Key Limitation

This analysis prioritises sites based on contamination risk (likelihood of environmental harm), not restoration potential (likelihood of ecological success). The two are related but not identical.

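Future extensions could address this by incorporating actual restoration project records, such as success/failure outcomes, intervention costs, and ecological recovery metrics, so that suitability is learned from observed results rather than predefined rules.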

The methodology developed here is transferable to other UK cities with industrial heritage (Sheffield, Birmingham, Newcastle) and could inform national-scale brownfield restoration planning.

Tools & Technologies

Platforms: Google Earth Engine, QGIS, Jupyter

Languages: JavaScript (GEE), Python, R

Python Libraries: GeoPandas, Folium, scikit-learn, pandas, matplotlib, seaborn

R Packages: sf, ggplot2, dplyr, tidyr

Data Sources: UK Brownfield Register, ESA WorldCover, HydroSHEDS, OpenLandMap, SRTM

Get In Touch

Open to collaboration on environmental data science projects and actively seeking opportunities in geospatial analysis and ecological restoration.