This webpage contains all materials for the Methodology and Statistics master course Processing Complex Data (PCD). The materials on this website are CC-BY-4.0 licensed.

Lecturer
Javier Garcia-Bernardo
Assistant Professor of Social Data Science
Department of Methodology & Statistics
Utrecht University

Geospatial Projects

Canonical course conventions live in project_guidelines.md. That file is the source of truth for the four required workflow files (week1_explore.qmd, week2_operationalize_clean.qmd, week3_model.qmd, week4_storytelling.qmd), the data/model_data.rds -> data/model_results.rds pipeline, the raw-data policy, quality-check requirements, decision logs, and contribution tracking. Read it before starting and treat anything below as project-specific guidance on top of those conventions.

Tutorial framing

Geospatial data are complex because observations are tied to coordinate systems, geometric boundaries, raster surfaces, and spatial dependence rather than arriving as independent rows in a single analysis-ready table.

Students should learn three main things about these data:

  1. How spatial data are represented through vector geometries, raster grids, coordinate reference systems, spatial identifiers, and formats or services such as GeoJSON, Shapefiles, GeoTIFF, WFS, and WMS.
  2. How to turn raw spatial sources into an analysis-ready table through spatial joins, raster-to-polygon aggregation, transformations, and documented choices about what land use and population composition mean.
  3. How spatial dependence affects modeling, visualization, assumptions, and interpretation, including when residual autocorrelation or SAR/CAR-style models matter.
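As an illustration of the second point, a spatial join can be sketched in plain Python with only the standard library. This is a minimal sketch, not the course's prescribed method: the two square polygons, the three points, the `region_id` property, and the helper names `point_in_ring` and `spatial_join` are all hypothetical, and the test ignores polygon holes and points that fall exactly on a boundary.

```python
import json

# Hypothetical toy data: two square regions stored as a GeoJSON FeatureCollection.
POLYGONS = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"region_id": "A"},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}},
  {"type": "Feature", "properties": {"region_id": "B"},
   "geometry": {"type": "Polygon",
                "coordinates": [[[2, 0], [4, 0], [4, 2], [2, 2], [2, 0]]]}}
]}
""")

# Hypothetical point observations: (point_id, x, y) in the same CRS as the polygons.
POINTS = [("p1", 1.0, 1.0), ("p2", 3.0, 0.5), ("p3", 5.0, 5.0)]


def point_in_ring(x, y, ring):
    """Ray casting: count crossings of a horizontal ray running to the right."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, collection):
    """Attach the region_id of the first polygon that contains each point."""
    rows = []
    for point_id, x, y in points:
        region = None
        for feature in collection["features"]:
            exterior = feature["geometry"]["coordinates"][0]  # holes are ignored
            if point_in_ring(x, y, exterior):
                region = feature["properties"]["region_id"]
                break
        rows.append({"point_id": point_id, "region_id": region})
    return rows


joined = spatial_join(POINTS, POLYGONS)
for row in joined:
    print(row)
```

Points outside every polygon keep `region_id` set to `None`, which makes unmatched observations visible in the analysis table instead of dropping them silently.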

Peer-teaching checklist

Dimension | This project teaches
Data structure | Vector geometries, raster grids, tabular attributes linked by spatial identifiers, and spatial adjacency graphs.
Storage system | File-based spatial datasets and geospatial web services from PDOK/CBS.
File formats | GeoJSON, Shapefile, GeoTIFF, and tabular exports such as CSV.
Encoding | JSON for GeoJSON, binary spatial files for Shapefile components and GeoTIFF, and CRS metadata.
Model | Linear or GLM baseline, residual spatial autocorrelation check, and SAR/CAR-style extension if needed.
Key aspects to explain | CRS, vector vs. raster data, WFS/WMS, spatial joins, raster-to-polygon aggregation, spatial dependence (Moran’s I, SAR/CAR when needed), and the Modifiable Areal Unit Problem (MAUP).
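The zoning side of the Modifiable Areal Unit Problem can be shown with a small worked example. The sketch below assumes a hypothetical 4 x 4 grid of cell values and two hypothetical zonings (west/east halves and north/south halves); the function name `aggregate` is illustrative only.

```python
# Hypothetical 4 x 4 surface: low values in the west, high values in the east.
GRID = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]


def aggregate(grid, zone_of):
    """Average the cell values within each zone; zone_of maps (row, col) to a label."""
    totals = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            zone = zone_of(r, c)
            total, count = totals.get(zone, (0, 0))
            totals[zone] = (total + value, count + 1)
    return {zone: total / count for zone, (total, count) in totals.items()}


# Same cells, two different zonings.
by_columns = aggregate(GRID, lambda r, c: "west" if c < 2 else "east")
by_rows = aggregate(GRID, lambda r, c: "north" if r < 2 else "south")

print(by_columns)  # {'west': 1.0, 'east': 9.0}
print(by_rows)     # {'north': 5.0, 'south': 5.0}
```

The underlying cells are identical, yet one zoning shows a strong contrast between zones and the other shows none. This is why the choice of areal units belongs in the decision log.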

Resources

Data sources

Feel free to use different sources if you want.

Knowledge sources

Week-by-week

Week 1:

Start from raw spatial files or web services, identify the data-generating process, and explain the vector/raster or point/polygon structure before doing any modeling.
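A first look at a raw vector file might resemble the sketch below, which summarizes a GeoJSON FeatureCollection with the standard library only. The embedded features, their property names, and the helpers `iter_positions` and `summarize` are hypothetical. The sketch assumes every geometry has a `coordinates` member, so a GeometryCollection would need extra handling. GeoJSON (RFC 7946) stores positions as longitude, latitude in WGS 84; other sources may use a different CRS, which should be checked before combining layers.

```python
import json
from collections import Counter

# Hypothetical raw file content; in practice this would be read from disk or a service.
RAW = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "station"},
   "geometry": {"type": "Point", "coordinates": [5.12, 52.09]}},
  {"type": "Feature", "properties": {"region_id": "R1", "population": 1000},
   "geometry": {"type": "Polygon",
                "coordinates": [[[5.0, 52.0], [5.2, 52.0], [5.2, 52.2],
                                 [5.0, 52.2], [5.0, 52.0]]]}}
]}
"""


def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def summarize(collection):
    """Count geometry types, collect property names, and compute the bounding box."""
    types = Counter()
    keys = set()
    xs, ys = [], []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        types[geometry["type"]] += 1
        keys.update(feature.get("properties") or {})
        for x, y in iter_positions(geometry["coordinates"]):
            xs.append(x)
            ys.append(y)
    return {
        "n_features": len(collection["features"]),
        "geometry_types": dict(types),
        "property_keys": sorted(keys),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


summary = summarize(json.loads(RAW))
for key, value in summary.items():
    print(key, value)
```

A summary like this helps answer the week 1 questions: which geometry types are present, which attributes travel with them, and whether the extent matches the study area.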

Prepare for the roundtable of week 2:

Week 2:

Operationalize the research question by turning raw geometry-linked files into one analysis table, and document why the data were stored in that format.
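One way to build such a table is raster-to-polygon aggregation: assign each raster cell to the polygon that contains its centre and average the values per polygon. The sketch below is a simplified illustration under stated assumptions. The 2 x 4 raster, its origin and cell size, the two zones, and the names `point_in_ring` and `zonal_mean` are hypothetical, and assigning cells by centre is only one of several possible rules (area weighting is another).

```python
# Hypothetical 2 x 4 raster; row 0 is the northernmost row.
RASTER = [
    [1.0, 2.0, 10.0, 20.0],
    [3.0, 4.0, 30.0, 40.0],
]
X_ORIGIN, Y_ORIGIN, CELL_SIZE = 0.0, 2.0, 1.0  # top-left corner and cell size

# Hypothetical zones as closed exterior rings in the same CRS as the raster.
ZONES = {
    "west": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "east": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}


def point_in_ring(x, y, ring):
    """Ray casting: count crossings of a horizontal ray running to the right."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(raster, zones):
    """Return one row per zone with the mean of the cells whose centre it contains."""
    sums = {zone_id: 0.0 for zone_id in zones}
    counts = {zone_id: 0 for zone_id in zones}
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            x = X_ORIGIN + (col + 0.5) * CELL_SIZE  # cell centre, x
            y = Y_ORIGIN - (row + 0.5) * CELL_SIZE  # cell centre, y (rows run south)
            for zone_id, ring in zones.items():
                if point_in_ring(x, y, ring):
                    sums[zone_id] += value
                    counts[zone_id] += 1
                    break
    return [
        {
            "zone_id": zone_id,
            "n_cells": counts[zone_id],
            "mean_value": sums[zone_id] / counts[zone_id] if counts[zone_id] else None,
        }
        for zone_id in zones
    ]


table = zonal_mean(RASTER, ZONES)
for record in table:
    print(record)
```

Keeping `n_cells` in the output makes it easy to spot zones that are covered by few or no cells, which is a useful quality check before modeling.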

Prepare for the roundtable of week 3:

Week 3:

Fit models, explain preprocessing decisions, and show one sensitivity check to spatial choices.
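The sketch below illustrates one possible version of this workflow: a simple linear baseline, Moran's I on its residuals, and a sensitivity check that swaps rook contiguity for queen contiguity. Everything in it is an assumption for illustration: the 3 x 3 grid of regions, the values of `X` and `Y`, binary (unstandardized) weights, and the function names. It gives only the point estimate of Moran's I, without a significance test.

```python
# Hypothetical 3 x 3 grid of regions with one predictor and one outcome per cell.
X = [[2, 1, 3], [1, 2, 3], [3, 2, 1]]
Y = [[3, 2, 7], [1, 4, 7], [5, 4, 3]]
CELLS = [(r, c) for r in range(3) for c in range(3)]


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def grid_neighbours(n_rows, n_cols, queen=False):
    """Rook neighbours share an edge; queen neighbours share an edge or a corner."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return {
        (r, c): [
            (r + dr, c + dc)
            for dr, dc in offsets
            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
        ]
        for r in range(n_rows)
        for c in range(n_cols)
    }


def morans_i(values, neighbours):
    """Moran's I with binary weights: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values.values()) / n
    z = {key: value - mean for key, value in values.items()}
    cross = sum(z[i] * z[j] for i, nbrs in neighbours.items() for j in nbrs)
    s0 = sum(len(nbrs) for nbrs in neighbours.values())
    return (n / s0) * cross / sum(d * d for d in z.values())


intercept, slope = fit_ols([X[r][c] for r, c in CELLS], [Y[r][c] for r, c in CELLS])
residuals = {(r, c): Y[r][c] - (intercept + slope * X[r][c]) for r, c in CELLS}

rook = grid_neighbours(3, 3)
queen = grid_neighbours(3, 3, queen=True)
i_rook = morans_i(residuals, rook)
i_queen = morans_i(residuals, queen)
print("Moran's I of residuals, rook: ", round(i_rook, 3))
print("Moran's I of residuals, queen:", round(i_queen, 3))
```

If the two neighbour definitions lead to clearly different values, the conclusion about residual autocorrelation depends on a spatial choice, and that dependence is worth reporting alongside the model.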

Prepare for the roundtable of week 4:

Week 4:

Visualize the results and tell a story.