Large Language Models for Social Science Research

Lecturer: Giovanni Colavizza

Modality: In person

Week 1: 10-14 August 2026

Workshop Contents and Objectives

The course aims to bring students and researchers in the social sciences up to speed with the technology of Large Language Models (LLMs). LLMs are a powerful machine learning technique able to represent and generate text, as well as data in other modalities. LLMs can also be used for automating several research tasks with little tuning. The course provides a theoretical foundation on LLMs as well as several applications relevant to the social sciences, which will be implemented in practical sessions.

The practical sessions will showcase examples from recent social science literature that uses LLMs in a variety of novel ways. These include large-scale detection of online media polarization, thematic clustering and labeling of textual sources, simulation of survey participants, and the use of LLMs as personal research assistants for searching and summarizing information or for data analysis.

Workshop design

The course combines lectures (morning sessions) with laboratories consisting of live coding and practicals (afternoon sessions). Students will also be provided with a small number of credits for using a commercial LLM during the course.

Detailed lecture plan (daily schedule)

Day 1

Introduction to Large Language Models

Laboratory: System setup, the HuggingFace library, using the OpenAI API.

Day 2

LLMs for content representation

Laboratory: Thematic clustering of textual sources.

Day 3

LLMs for classification and regression

Laboratory: Large-scale, zero-shot classification of textual sources.

Day 4

LLMs for content generation

Laboratory: Prompt engineering and simulating survey participants.

Day 5

LLMs as research assistants

Laboratory: Semantic search, retrieval-augmented generation, data analysis.

Course feedback and Q&A.

Class materials

All materials will be provided online in the course’s GitHub repository.

Prerequisites

A practical knowledge of the Python programming language is recommended.

Recommended readings or preliminary material

What our participants appreciated most

"I really loved this workshop and for not having lot of prior knowledge in Python or LLM in a technical manner I could follow most of the content because our instructor prepared detailed and well commented Python scripts. I enjoyed the discussions and the practices. In addition, the class members helped each other!"

"Amazing workshop, maybe very heavy in the content quantity but the Professor was very good at explaining things and we have all the materials (detailed slides, links and notes) to digest and be ready to use it quite autonomously."

Giovanni Colavizza

University of Copenhagen and University of Bologna

Colavizza is a Professor of Digital and Computational Humanities at the Department of Communication, University of Copenhagen, and Associate Professor of Computer Science at the Department of Classical and Italian Philology at the University of Bologna. Colavizza is the CTO of Odoma, a Swiss-based company providing AI solutions for the cultural and creative sectors. Colavizza held previous appointments at The Alan Turing Institute, the University of Amsterdam, and EPFL. Colavizza’s primary domain of expertise is machine learning applications in the Social Sciences and Humanities.