Working with Large Online Datasets for Social Science Research (using R)
Instructor: Lukasz Walasek
Modality: In person
Week 1: 10-14 August 2026
Workshop contents and objectives
The rapid growth of digital communication, online social platforms, and publicly available web data sources has created new opportunities for social scientists to study behavior, attitudes, and social phenomena on an unprecedented scale. Online data, whether accessed through APIs, publicly available datasets, social media archives, repositories, or web-scraping tools, can provide unique insights into complex topics such as wellbeing, risk perception, health communication, consumer behavior, and misinformation.
This course introduces participants to the foundational concepts, tools, and research practices needed to design and conduct social science projects using large online datasets. It prioritizes practical knowledge, with an emphasis on research design, ethical considerations, discovering online data sources, and introductory methods for accessing, wrangling, and analyzing rich quantitative and qualitative datasets.
By the end of this course, participants will be able to:
- Critically evaluate how online data sources can be applied effectively in social science research.
- Identify, access, and extract valuable data from various online sources.
- Implement data access using APIs and web scraping tools, focusing on ethical, robust, and sustainable data analysis pipelines.
- Apply best practices for data wrangling, documentation, and reproducible research workflows.
Workshop design
Each day of the course will consist of two parts:
- The morning session is dedicated to foundational concepts and approaches. We'll cover research design, data sources, and ethics, complemented by live demonstrations in R.
- The afternoon session will feature guided exercises and individual project work. Every participant will have the opportunity to apply their new knowledge to construct their own research pipeline using large online datasets.
During the course, participants are also welcome to work on their own projects. The instructor will happily assist with any individual project that requires skills and knowledge covered during the course.
Materials (lecture slides, sample datasets, handouts, exercises with solutions, annotated R scripts) will be made openly available via an online repository to all participants.
Detailed lecture plan (daily schedule)
Day 1 – Opportunities and Challenges of Large Online Data in Social Science
Morning:
Afternoon:
Day 2 – Obtaining Data Using APIs
Morning:
Afternoon:
Day 3 – Working with Large Online Data: Wrangling, Cleaning, and Preparation
Morning:
Afternoon:
Day 4 – When APIs Aren’t Enough: Web Scraping
Morning:
Afternoon:
Day 5 – Combining It All Together: Complete Analysis Pipeline with Large Online Data
Morning:
Afternoon:
Class materials
All materials will be provided online.
Prerequisites
The course is suitable for beginners wanting to explore online data in an applied, conceptual, and practical way.
Participants are expected to have basic computer and statistical analysis skills. Basic familiarity with R is necessary to participate in practical exercises and activities.
Recommended readings or preliminary material
- Altman, S., Behrman, B., & Wickham, H. (2021). Data Wrangling. https://dcl-wrangle.stanford.edu/
- Bradley, A., & James, R. J. E. (2019). Web scraping using R. Advances in Methods and Practices in Psychological Science, 2(3), 264-270.
- Wickham, H. (2022). rvest: Easily Harvest (Scrape) Web Pages. https://rvest.tidyverse.org/
Lukasz Walasek
University of Warwick, UK
Dr Lukasz Walasek is an associate professor at the Department of Psychology, University of Warwick, UK. He completed a PhD and an MSc in Psychology at the University of Essex and a BSc in Psychosocial Sciences at the University of East Anglia. Dr Walasek teaches “Behavioural Change: Nudging and Persuasion” on the MSc in Behavioural Economic Science and the MSc in Behavioural Data Science.
In his research, Dr Walasek applies insights from data science to study how people make everyday decisions and judgments. His most recent work uses data mining and natural language processing to study topics such as implicit bias, self-control, gambling-related harm, food choice, the effects of inequality on consumption, and the dynamics of political polarization.