Practical Machine Learning for Surveys, Panels, and Experiments

Lecturer: Marco Steenbergen

Modality: In person

Week 2: 17-21 August 2026

 

Workshop Contents and Objectives

Machine learning (ML) is changing how social scientists approach familiar research problems. Issues such as missing data, model specification, measurement error, heterogeneity, and complex longitudinal patterns are not new. However, modern ML methods offer new ways to diagnose, correct, and exploit these challenges. This course shows how ML can directly improve the quality, robustness, and creativity of empirical social science research. Rather than focusing on abstract model families, we anchor each method in concrete tasks researchers face every day: cleaning messy data, building better measures, designing credible causal analyses, and extracting structure from complex datasets.

Through hands-on sessions, participants will learn how to combine traditional statistical thinking with modern ML workflows, including advanced tree-based algorithms, causal ML, and deep learning for tabular and survey data. By the end of the week, students will know not only how these methods work, but also when they meaningfully expand what social scientists can learn from their data.

 

Detailed lecture plan (daily schedule)

Day 1 – Foundations for modern ML workflows
  • Theme: How machine learning fits into social science research.
  • Content: (1) Predictive versus inferential goals; (2) train/test logic, resampling, and cross-validation; (3) regularization, tuning, and model selection; (4) CART, random forests, and gradient boosting; (5) hands-on with tidymodels.
  • Outcome: Participants understand workflow logic and can train/evaluate basic ensemble models.
Day 2 – ML for missing data and data quality
  • Theme: Missingness as a prediction problem.
  • Content: (1) Predictive imputation versus model-based imputation; (2) MissForest, BART imputation, MICE with ML engines; (3) diagnostics for imputation quality; (4) detecting low-quality or inattentive responses in surveys.
  • Outcome: Students can use ML to diagnose and fix incomplete or low-quality data.
Day 3 – ML for causal inference I
  • Theme: Using ML to strengthen causal identification and reduce bias.
  • Content: (1) Why ML helps; (2) propensity estimation with ML; (3) post-double-selection and partialling-out; (4) double machine learning; (5) hands-on with causal learners.
  • Outcome: Students can combine traditional causal inference with ML-based nuisance models.
Day 4 – ML for causal inference II
  • Theme: When treatment effects vary—and how to learn from that.
  • Content: (1) CATE estimation; (2) R-learners, T-learners, and X-learners; (3) causal forests and policy trees; (4) stability and interpretability in heterogeneous treatment effects.
  • Outcome: Students can estimate and interpret heterogeneous causal effects and extract actionable policy rules.
Day 5 – Deep learning for social science data
  • Theme: Neural networks for nonlinear patterns, latent structure, and measurement.
  • Content: (1) Introduction to deep learning (layers, activation, optimization, and overfitting); (2) feedforward networks for tabular data; (3) autoencoders for dimensionality reduction and measurement; (4) using deep nets to detect latent patterns, compress scales, and diagnose anomalies.
  • Outcome: Participants understand deep-learning models and can train them on real social science data.

Class materials

Recommended: Kuhn, Max and Julia Silge. 2022. Tidy Modeling with R: A Framework for Modeling in the Tidyverse. O’Reilly. ISBN: 978-1492096481

 

Prerequisites

Prior knowledge of regression and R is highly recommended.

Marco Steenbergen

University of Zürich, Switzerland

He is a professor of political methodology at the University of Zurich. His methodological interests span choice models, machine learning, measurement, and multilevel analysis. His substantive interests cover voting behavior and digital democracy, in particular online deliberative processes. Originally from the Netherlands, he previously taught at Carnegie Mellon University, the University of North Carolina at Chapel Hill, and the University of Bern. He has published extensively and is co-author of the award-winning book The Ambivalent Partisan (OUP 2012).