
This webpage contains all materials for the Methodology and Statistics master course Processing Complex Data (PCD). The materials on this website are CC-BY-4.0 licensed.

Lecturer
Javier Garcia-Bernardo
Assistant Professor of Social Data Science
Department of Methodology & Statistics
Utrecht University

Network Projects

Canonical course conventions live in project_guidelines.md. That file is the source of truth for the four required workflow files (week1_explore.qmd, week2_operationalize_clean.qmd, week3_model.qmd, week4_storytelling.qmd), the data/model_data.rds -> data/model_results.rds pipeline, the raw-data policy, quality-check requirements, decision logs, and contribution tracking. Read it before starting and treat anything below as project-specific guidance on top of those conventions.

Tutorial framing

Network data are complex because observations are connected through ties, direction, weights, missing nodes, and dependence between relations rather than arriving as independent rows in a single analysis-ready table.

Students should learn three main things about these data:

  1. How networks are represented through nodes, edges, edge lists, adjacency matrices, sparse matrices, GraphML, and choices about direction, weight, time, and isolates.
  2. How to turn raw graph files into a clean network object while documenting what counts as a node, what counts as a tie, and which representation best matches the research question.
  3. How network dependence affects standard statistical assumptions, and how network statistics, reference models, permutation tests, or clustering can support claims about homophily, polarization, centrality, or other network structures.
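
The representations named in point 1 can be compared on a toy example. The sketch below uses plain Python purely for illustration (an assumption; the course pipeline itself stores .rds files). The edge list, node names, and helper functions are all hypothetical. It shows the same small network as a CSV edge list, a dense adjacency matrix, and a sparse neighbour map, and why isolates need a separate node list.

```python
import csv
import io

# Hypothetical toy edge list; a real project would read a downloaded file.
RAW_EDGES = """source,target,weight
a,b,2
b,c,1
a,c,5
"""

# Nodes are listed separately so the isolate "d" is not lost:
# an edge list alone cannot represent a node without ties.
NODES = ["a", "b", "c", "d"]


def read_edges(text):
    """Parse a CSV edge list into (source, target, weight) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["source"], row["target"], float(row["weight"])) for row in reader]


def dense_adjacency(nodes, edges, directed=True):
    """Build a dense n x n matrix as nested lists; absent ties are stored as 0."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        matrix[index[source]][index[target]] = weight
        if not directed:
            # Undirected ties are mirrored across the diagonal.
            matrix[index[target]][index[source]] = weight
    return matrix


def sparse_adjacency(nodes, edges, directed=True):
    """Store only existing ties as node -> {neighbour: weight}."""
    adjacency = {node: {} for node in nodes}
    for source, target, weight in edges:
        adjacency[source][target] = weight
        if not directed:
            adjacency[target][source] = weight
    return adjacency


edges = read_edges(RAW_EDGES)
dense = dense_adjacency(NODES, edges, directed=False)
sparse = sparse_adjacency(NODES, edges, directed=False)

# The dense form always stores n * n cells; the sparse form stores only ties.
dense_cells = len(NODES) ** 2
sparse_cells = sum(len(neighbours) for neighbours in sparse.values())
```

Switching `directed` to `True` gives a non-symmetric matrix from the same file, which is one reason the direction decision should be documented rather than left implicit.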

Peer-teaching checklist

| Dimension | This project teaches |
| --- | --- |
| Data structure | Graphs with nodes, edges, node attributes, edge attributes, edge lists, adjacency matrices, and sparse matrix representations. |
| Storage system | File-based network repositories and downloaded graph files. |
| File formats | CSV edge lists, GraphML, and compressed repository downloads where relevant. |
| Encoding | Text CSV and XML-based GraphML. |
| Model | Assortativity or homophily statistic, permutation test, clustering, or a small network summary model. |
| Key aspects to explain | What counts as a node or tie, directed vs. undirected graphs, weighted vs. unweighted ties, isolates, sparse vs. dense matrices, network visualization, and why network dependence violates ordinary i.i.d. assumptions. |
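
Because GraphML is XML, it can be read with a generic XML parser. The sketch below, in plain Python, parses a tiny hand-written GraphML string. The document content, the `group` attribute, and the `parse_graphml` helper are hypothetical; only the namespace URI and the `edgedefault`, `key`, `node`, `edge`, and `data` elements come from the GraphML format itself. It is meant to show where direction, node attributes, and isolates live in the file, not to replace a full graph library.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-node GraphML document with one node attribute.
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="group" attr.type="string"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0"><data key="d0">x</data></node>
    <node id="n1"><data key="d0">y</data></node>
    <node id="n2"><data key="d0">x</data></node>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>
"""

# GraphML elements live in this XML namespace, so lookups must use it.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def parse_graphml(text):
    """Return (directed, node attributes, edges) from a GraphML string."""
    root = ET.fromstring(text)
    # Map key ids such as "d0" to readable attribute names such as "group".
    key_names = {key.get("id"): key.get("attr.name") for key in root.findall("g:key", NS)}
    graph = root.find("g:graph", NS)
    directed = graph.get("edgedefault") == "directed"
    nodes = {}
    for node in graph.findall("g:node", NS):
        nodes[node.get("id")] = {
            key_names[data.get("key")]: data.text for data in node.findall("g:data", NS)
        }
    edges = [(edge.get("source"), edge.get("target")) for edge in graph.findall("g:edge", NS)]
    return directed, nodes, edges


directed, nodes, edges = parse_graphml(GRAPHML)

# Nodes that never appear in an edge are isolates; GraphML keeps them,
# whereas a bare CSV edge list would silently drop them.
connected = {endpoint for edge in edges for endpoint in edge}
isolates = sorted(set(nodes) - connected)
```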

Resources

Data sources

Knowledge sources

Week-by-week

Week 1:

Begin with the raw repository files and explain what the network is, who generated it and for what purpose, and which storage formats it comes in.

Prepare for roundtable in week 2:

Week 2:

Operationalize the research question by turning raw graph files into a clean file with explicit decisions about direction, weights, and isolates.
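
One way to keep the decisions about direction, weights, and isolates explicit is to make each one a named argument and record its effect. The sketch below is a plain-Python illustration under stated assumptions: `clean_network`, its arguments, the toy data, and the rule of summing duplicate ties are all hypothetical choices, not course requirements.

```python
from collections import defaultdict


def clean_network(nodes, raw_edges, directed=False, weighted=True, keep_isolates=True):
    """Clean raw (source, target, weight) ties and log every decision taken."""
    log = []
    weights = defaultdict(float)
    self_loops = 0
    for source, target, weight in raw_edges:
        if source == target:
            # Assumption: self-loops are not meaningful ties here.
            self_loops += 1
            continue
        # Undirected ties get a canonical order so (a, b) and (b, a) merge.
        pair = (source, target) if directed else tuple(sorted((source, target)))
        # Assumption: duplicate ties are merged by summing their weights.
        weights[pair] += weight
    merged = len(raw_edges) - self_loops - len(weights)
    log.append(f"dropped {self_loops} self-loop(s)")
    log.append(f"merged {merged} duplicate tie(s) by summing weights")

    if not weighted:
        weights = {pair: 1.0 for pair in weights}
        log.append("binarized weights: any tie counts as 1")

    connected = {node for pair in weights for node in pair}
    isolates = sorted(set(nodes) - connected)
    kept_nodes = sorted(set(nodes) | connected) if keep_isolates else sorted(connected)
    log.append(f"{'kept' if keep_isolates else 'dropped'} {len(isolates)} isolate(s): {isolates}")
    return kept_nodes, dict(weights), log


# Hypothetical raw data: a reciprocated tie, a self-loop, and an isolate "d".
NODES = ["a", "b", "c", "d"]
RAW_EDGES = [("a", "b", 1.0), ("b", "a", 2.0), ("c", "c", 1.0), ("b", "c", 1.0)]

undirected = clean_network(NODES, RAW_EDGES, directed=False)
directed = clean_network(NODES, RAW_EDGES, directed=True)
```

The same raw file gives two ties when read as undirected and three when read as directed, so the returned log lines are the kind of entry a decision log could record alongside the cleaned file.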

Prepare for roundtable in week 3:

Week 3:

Use a network-appropriate statistic and an appropriate model, and check sensitivity to preprocessing choices.
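
As one possible instance of a network-appropriate statistic with a reference model, the sketch below computes the share of ties that connect nodes in the same group and compares it with a permutation distribution. It is a minimal plain-Python illustration, not a prescribed method: the toy network, the group labels, and the function names are hypothetical, and the reference model (keep the ties fixed, shuffle group labels across nodes) is only one of several reasonable choices.

```python
import random


def same_group_share(edges, group):
    """Share of ties that connect two nodes with the same group label."""
    same = sum(1 for u, v in edges if group[u] == group[v])
    return same / len(edges)


def permutation_test(edges, group, n_permutations=2000, seed=1):
    """Compare the observed share with shares under shuffled group labels.

    The network structure stays fixed; only the labels move between nodes,
    so the dependence created by the ties is preserved in every permutation.
    """
    rng = random.Random(seed)
    observed = same_group_share(edges, group)
    nodes = list(group)
    labels = [group[node] for node in nodes]
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        shuffled = dict(zip(nodes, labels))
        if same_group_share(edges, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical toy network: two triangles joined by a single bridge (c, d).
GROUP = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}
EDGES = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "d")]

observed, p_value = permutation_test(EDGES, GROUP)
```

Rerunning the same function on alternative cleaned edge lists (for example directed versus undirected, or with and without weak ties) is one simple way to check how sensitive the conclusion is to preprocessing choices.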

Prepare for roundtable in week 4:

Week 4:

Visualize the network and tell a story with the results.