This webpage contains all materials for the Methodology and Statistics master course Processing Complex Data (PCD). The materials on this website are CC-BY-4.0 licensed.

Lecturer
Javier Garcia-Bernardo
Assistant Professor of Social Data Science
Department of Methodology & Statistics
Utrecht University

Time Series Project: Scientific Data Standards and Temporal Signals

Canonical course conventions live in project_guidelines.md. That file is the source of truth for the four required workflow files (week1_explore.qmd, week2_operationalize_clean.qmd, week3_model.qmd, week4_storytelling.qmd), the data/model_data.rds -> data/model_results.rds pipeline, the raw-data policy, quality-check requirements, decision logs, and contribution tracking. Read it before starting and treat anything below as project-specific guidance on top of those conventions.

Tutorial framing

Scientific time-series data are complex because observations are ordered, repeated, metadata-dependent, and often stored in domain-specific standards designed for reproducibility rather than immediate analysis as a flat table.

Students should learn three main things about these data:

  1. How scientific time-series data are represented through samples, events, timestamps, participant metadata, task metadata, calibration or acquisition settings, and standards such as BIDS-style folder structures, NIfTI, EDF/ASC, TSV sidecars, JSON metadata, HDF5, or NetCDF.
  2. How to turn a raw temporal scientific object into an analysis-ready panel or time-series table by defining the signal, unit of analysis, time window, alignment rule, missing-data rule, and feature extraction choices.
  3. How temporal dependence, sampling rate, smoothing, aggregation, lag construction, and scientific metadata affect modeling, visualization, assumptions, and the claims that can be made from the data.
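
The second point can be made concrete with a minimal sketch in Python, using only the standard library. It cuts one fixed window per event out of a sampled signal and reduces each window to a mean amplitude. The function name `extract_epochs`, the `max_missing_frac` threshold, and the toy signal are all hypothetical choices made for illustration; they are not taken from the course conventions or from NSD.

```python
from statistics import mean


def extract_epochs(samples, sampling_rate_hz, event_onsets_s, window_s,
                   max_missing_frac=0.2):
    """Return one row per usable event window.

    samples: list of floats, with None marking a missing sample.
    Alignment rule (an assumption): each onset snaps to the nearest sample.
    Missing-data rule (an assumption): drop windows with too many gaps.
    """
    n_window = int(round(window_s * sampling_rate_hz))
    rows = []
    for event_id, onset_s in enumerate(event_onsets_s):
        start = int(round(onset_s * sampling_rate_hz))
        window = samples[start:start + n_window]
        if len(window) < n_window:
            continue  # window runs past the end of the recording
        valid = [x for x in window if x is not None]
        missing_frac = 1 - len(valid) / n_window
        if missing_frac > max_missing_frac:
            continue  # too many missing samples to trust the summary
        rows.append({
            "event": event_id,
            "onset_s": onset_s,
            "n_valid": len(valid),
            "mean_amplitude": mean(valid),  # feature extraction choice
        })
    return rows


# Toy signal sampled at 1 Hz; values are made up.
signal = [0.0, 1.0, 2.0, None, 4.0, 5.0, None, None, 8.0, 9.0]
rows = extract_epochs(signal, sampling_rate_hz=1.0,
                      event_onsets_s=[0.0, 4.0, 6.0, 8.0],
                      window_s=3.0, max_missing_frac=0.4)
for row in rows:
    print(row)
```

Each argument corresponds to one decision from the list: the window length, the alignment rule, the missing-data rule, and the extracted feature. Changing any of them changes which events reach the analysis table, which is why these choices belong in the decision log.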

Peer-teaching checklist

| Dimension | This project teaches |
|---|---|
| Data structure | Time-indexed samples or events, multivariate time series, participant/task metadata, and possibly spatiotemporal arrays. |
| Storage system | Scientific repository or instructor-provided raw dataset organized through a scientific data standard. |
| File formats | One chosen standard such as BIDS with NIfTI/TSV/JSON sidecars, EDF/ASC eye-tracking exports, HDF5, NetCDF, or comparable domain files. |
| Encoding | Text metadata or event files, JSON sidecars, and binary scientific signal formats. |
| Model | Group comparison of extracted temporal features, linear or mixed model, lagged regression, simple classifier, or time-window comparison. |
| Key aspects to explain | Temporal order, sampling rate, alignment, smoothing, aggregation windows, missing segments, lag construction, scientific metadata, and sensitivity to preprocessing choices. |
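
To illustrate why smoothing and lag construction belong among the key aspects to explain, the following Python sketch (standard library only) applies a moving average to simulated white noise and compares the lag-1 autocorrelation before and after. The helper names, the window width of 5, and the simulated series are assumptions made for this example; no real fMRI data are involved.

```python
import random


def moving_average(x, width):
    """Average over consecutive windows; output is width - 1 samples shorter."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]


def lag1_autocorrelation(x):
    """Sample autocorrelation between each value and the next one."""
    m = sum(x) / len(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noise = [rng.gauss(0, 1) for _ in range(2000)]  # independent samples
smoothed = moving_average(noise, width=5)

r_raw = lag1_autocorrelation(noise)
r_smooth = lag1_autocorrelation(smoothed)
print(f"lag-1 autocorrelation, raw:      {r_raw:.2f}")
print(f"lag-1 autocorrelation, smoothed: {r_smooth:.2f}")
```

The raw series is independent by construction, so its lag-1 autocorrelation is close to zero. After smoothing, neighbouring values share four of their five inputs, so strong temporal dependence appears that was created entirely by the preprocessing step. A lagged regression fitted on the smoothed series would pick up this dependence.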

Resources

Data source

The practical is built around fMRI data. European fMRI datasets are difficult to share publicly: anything that reveals the detailed structure of an individual brain — including raw fMRI volumes — is typically considered individually identifiable under the GDPR and cannot be released openly. The practical therefore uses an American dataset that is shareable.

Primary dataset: Natural Scenes Dataset (NSD) — a high-resolution 7T fMRI dataset of individuals viewing thousands of natural images, with raw BIDS files, prepared NIfTI files, repeated scan sessions, visual ROI masks, behavioral/task event files, and extensive documentation. Access is public through AWS Open Data after signing the NSD data access agreement.

Candidate research question

Good fMRI research has moved well beyond simple summaries — current work uses complex models of neural responsivity, not toy questions. Students do not need to invent a new contribution. Instead they can replicate one of two well-established demonstrations, both supported directly by NSD:

  1. Response amplitudes in a visual ROI vary across scan sessions for one participant (the session-drift / repeated-measures phenomenon documented in the reference above).
  2. Animate versus inanimate object categories produce distinguishable responses in many brain areas.

Either question keeps the project at a defensible size, foregrounds the BIDS/NIfTI raw object, and gives students something real to learn rather than a manufactured small question.
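
For the second question, one simple way to ask whether two categories produce distinguishable responses is a label-permutation test on extracted response amplitudes. The Python sketch below is a hedged illustration only: the function `permutation_p_value` and the amplitude values are hypothetical, and the test assumes trials are exchangeable, which temporal dependence between neighbouring trials can violate.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign category labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            n_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (n_extreme + 1) / (n_permutations + 1)


# Made-up response amplitudes in one ROI for one participant.
animate = [1.9, 2.1, 2.3, 2.0, 2.2, 2.4]
inanimate = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0]

p_value = permutation_p_value(animate, inanimate)
print(f"difference in means: {mean(animate) - mean(inanimate):.2f}")
print(f"permutation p-value: {p_value:.4f}")
```

Shuffling labels within a scan session rather than across all trials would be one way to respect the repeated-session structure; whether that is needed depends on how the panel was built.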

Alternative: NSD eye-tracking data

If a group has a strong eye-tracking reason to deviate, NSD also includes eye-tracking data on AWS, which keeps the dataset and provenance story consistent:

This is the fallback path, not the default. The main practical is fMRI.

Knowledge sources

Teaching angle

Week-by-week

Week 1

Start from the raw scientific files, identify the data-generating process, and explain why the data are stored in a standard rather than in one analysis-ready table.
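
As a small illustration of what "stored in a standard" means in practice, the Python sketch below reads the key-value entities out of a BIDS-style filename and uses the `RepetitionTime` field of a JSON sidecar to put volumes on a time axis. The filename, the sidecar values, and the number of volumes are invented for the example and are not NSD's actual files or settings; the onset calculation also assumes that the first stored volume starts at time zero.

```python
import json


def parse_bids_filename(filename):
    """Split a BIDS-style name into entities, suffix, and extension."""
    stem, _, extension = filename.partition(".")
    *entity_parts, suffix = stem.split("_")
    # Each entity is a key-value pair such as "sub-01".
    entities = dict(part.split("-", 1) for part in entity_parts)
    return entities, suffix, "." + extension


# Hypothetical filename following the BIDS naming pattern.
example_name = "sub-01_ses-02_task-example_run-03_bold.nii.gz"
entities, suffix, extension = parse_bids_filename(example_name)
print(entities, suffix, extension)

# Hypothetical sidecar content; real sidecars carry many more fields.
sidecar_text = '{"RepetitionTime": 2.0, "TaskName": "example"}'
sidecar = json.loads(sidecar_text)

# Sampling information lives in the metadata, not in the image file name.
n_volumes = 4  # assumed here; normally read from the image header
volume_onsets_s = [i * sidecar["RepetitionTime"] for i in range(n_volumes)]
print(volume_onsets_s)
```

The point of the sketch is that participant, session, task, and run are encoded in the path, while timing lives in a sidecar. A single flat table would have to repeat all of this on every row.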

Prepare for roundtable in week 2:

Week 2

Operationalize the research question by turning the raw scientific files into one analysis-ready time-series or panel object.

Prepare for roundtable in week 3:

Week 3

Fit a simple within-subject model on the panel from Week 2, evaluate it, and show one sensitivity check to a processing choice that is actually present in your pipeline. The specific model depends on which of the two candidate RQs the group chose:
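
For the session-drift question, one possible shape of such a model and sensitivity check is sketched below in Python (standard library only). It fits an ordinary least-squares slope of response amplitude on session number, once with trials aggregated by the mean and once by the median. The data, the helper names, and the choice of mean versus median as the processing decision are all hypothetical; a real project would check a choice that actually appears in its own pipeline.

```python
from statistics import mean, median


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def session_slope(trials_by_session, aggregate):
    """Summarize each session with `aggregate`, then fit the drift slope."""
    sessions = sorted(trials_by_session)
    summaries = [aggregate(trials_by_session[s]) for s in sessions]
    return ols_slope(sessions, summaries)


# Made-up trial amplitudes for one participant; session 3 has one extreme trial.
trials_by_session = {
    1: [2.0, 2.2, 1.8],
    2: [1.8, 1.9, 1.7],
    3: [1.6, 1.5, 4.0],
    4: [1.4, 1.3, 1.5],
}

slope_mean = session_slope(trials_by_session, mean)
slope_median = session_slope(trials_by_session, median)
print(f"slope with mean aggregation:   {slope_mean:.3f}")
print(f"slope with median aggregation: {slope_median:.3f}")
```

Both slopes are negative in this toy example, but the mean-based slope is pulled toward zero by the single extreme trial. Reporting the two estimates side by side is one way to show how much the conclusion depends on the aggregation choice.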

Common prompts for both RQs:

Prepare for roundtable in week 4:

Week 4

Visualize and tell a story about the within-subject result while making the data standard, preprocessing, and model assumptions explicit.