Content Analysis and Natural Language Processing
Lecturer: Thomas Hills
Modality: In person
Week 2: 17-21 August 2026
Workshop contents and objectives
This workshop gives participants a hands-on and theoretical understanding of new methods for deriving quantitative data from natural language. It asks how qualitative content analysis can be extended using the many digital tools now available, and how existing tools can help us ask new qualitative and quantitative questions when we treat text as data. The approach scales from words to documents to large text corpora.
Some of the issues this approach addresses include the following:
- Understanding the speech of political leaders.
- Detecting historical and cultural change in well-being, views towards immigrants, trust, etc.
- Predicting views of brands: What does it mean to be a luxury brand? What associations do people have with different products?
- Using language to predict personality or changes across an individual’s lifespan: How did the writing of Darwin, Mozart, and Van Gogh change over their lifetimes?
The course will begin by providing participants with an understanding of what natural language processing offers content analysis. Automation can allow interesting content questions to be answered in very short periods of time (sometimes minutes), saving weeks or months of research time. It can also introduce new questions that lead to innovative research programs.
Each day will present published research and then unpack the methodology to show how the research was done, providing code and data for students to replicate these approaches with their own data, including analysis and visualization of results.
On completion of the course, participants will be able to recognize and implement many common approaches to content analysis using natural language processing. They will also take the first steps towards formulating and addressing problems of their own, and will receive detailed information about how to follow up and learn more in their area of interest.
This workshop is suitable for experienced and early-career researchers alike, as much of it takes a hands-on approach in which you use the tools to formulate and address your own research questions.
Workshop design
The course will alternate between lectures and interactive programming using pre-written code in R.
Detailed lecture plan (daily schedule)
Day 1: Intro to content analysis and natural language processing, off-the-shelf tools, and the art of the simple approach
Day 2: Word features (sentiment analysis and feature analysis)
Day 3: Word and document semantics and similarity (categorizing documents and words)
Day 4: Topics (what are my documents about and how can I organize them?)
Day 5: Advanced topics and short presentations from students
Class materials
All materials including code, readings, and slides will be provided online.
Prerequisites
Students taking this workshop should have some experience with R and RStudio. Free or inexpensive online introductory courses in R (e.g., DataCamp) are sufficient preparation for this course. A general introductory book on statistics in R will also work (e.g., Dalgaard, P. 2008. Introductory Statistics with R, which is where I started). The course will primarily use R, but I will provide all the code, so it can also be a way to improve your R skills.
Recommended readings or preliminary material
- Hills, T., Proto, E., & Sgroi, D. (2019). Historical analysis of national subjective wellbeing using millions of digitized books. Nature Human Behaviour, 1-5. https://warwick.ac.uk/fac/sci/psych/people/thills/thills/2019_hillsproto...
- Li, Y., & Hills, T. T. (2021). Language patterns of outgroup prejudice. Cognition, 215, 104813.
- Hills, T., & Miani, A. (2025). A short primer on historical natural language processing. https://warwick.ac.uk/fac/sci/psych/people/thills/thills/23_hillsmiani_n...
What our participants appreciated most
"Thomas is a great instructor with a thorough knowledge of the subject. The workshop gave us sufficient time to work on our own data and apply the theoretical content presented. Thomas answered our questions in detail and gave us lots of advice and tips for performing analyses in R."
"Professor Hills was absolutely amazing; he has a great amount of knowledge, and I really appreciate the amount of information and examples we were given. I finished this workshop with more questions than I had at the beginning, which is absolutely amazing. The workshop triggered me to think about my work on a higher level."
Thomas Hills
University of Warwick
Thomas Hills is Professor of Psychology at the University of Warwick, concentrating on how humans represent and navigate information in the mind and society, including topics such as conspiracy beliefs, aging memory, and cultural evolution. He directs the Behavioral and Data Science MSc and has held fellowships with the Alan Turing Institute and the Royal Society. His publications include work in psychology, communications, education, and economics, and focus on issues associated with large-scale analysis of data.