Large Language Models in Practice: A Hands-On Journey from Data Collection to Insight Discovery
Monday, 27 January 2025, 1.00pm to 5.00pm
Organiser: Cambridge Digital Humanities
Location: Milstein Seminar Room, Cambridge University Library
Convenor: Jacob Forward, CDH Methods Fellow 2024–25
In this methods workshop, Jacob will offer hands-on experience of a full research pipeline, from data collection and cleaning to deploying large language models (LLMs) to uncover new insights from textual sources.
The session will cover:
- An overview of how artificial neural networks operate and how LLMs use them to capture patterns in language.
- How to web-scrape text to create a dataset of the primary sources you want to explore.
- Using LLMs to help generate and debug the code needed to clean your dataset and convert it into an appropriate file type.
- Best practices for working with AI to produce code.
- Exploring our sources by deploying LLMs in a process known as Retrieval-Augmented Generation (RAG).
- The merits of ‘fine-tuning’ versus RAG.
If you have no coding experience, Jacob hopes to show you just how much you are capable of; if you have a technical background, you can look forward to pushing the boundaries of your skills.
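For a flavour of the RAG idea listed above, here is a minimal Python sketch of its retrieval step. It is not the workshop's own code: the function names (`tokenize`, `retrieve`, `build_prompt`) and the three sample passages are invented for illustration, and simple word-count similarity stands in for the learned embeddings a real RAG system would typically use. The sketch stops at building the prompt and does not call any LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic words only (digits are dropped).
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=2):
    # Rank passages by similarity to the question and keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    # Place the retrieved passages in front of the question, so that a
    # model answering the prompt can ground its reply in the sources.
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only these sources:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sample passages, standing in for a scraped dataset.
passages = [
    "The library opened its new reading room in 1934.",
    "Grain prices rose sharply after the poor harvest.",
    "The harvest failed in several counties that year.",
]

if __name__ == "__main__":
    print(build_prompt("What happened to the harvest?", passages))
```

The resulting prompt would then be sent to an LLM of your choice. Fine-tuning, by contrast, changes the model's weights rather than the text it is given.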