Researchers have located surprisingly sophisticated concepts in large language models, such as truthfulness and emotional valence. What makes a good concept representation, and what methods reliably give us control over how models use these concepts? I will present recent progress in evaluating concept representations, from directions (e.g., from sparse autoencoders or steering vectors) to affine and convex hulls. Then, I will turn to the question of how to locate and control relevant features.
How can we move from observing what a model does to understanding why it does it? In this talk, I argue that causality is the key to uncovering the mechanisms underlying model predictions. First, I examine a “macro” view of model analysis, showing how econometric tools—such as regression discontinuity or difference-in-differences—can isolate the causal impact of specific design choices, like tokeniser and training data selection, on a model’s outputs.
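As a rough illustration of the difference-in-differences idea mentioned above, the sketch below computes the textbook 2x2 estimate: the change in the treated group minus the change in the control group. The function name `diff_in_diff` and all scores are hypothetical and chosen only for illustration; they are not taken from the talk, and the estimate is only meaningful under the usual parallel-trends assumption.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Return the 2x2 difference-in-differences estimate.

    Each argument is a list of outcome measurements for one group in one
    period. The estimate is (treated change) minus (control change).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical benchmark scores (made up for this sketch):
# "treated" models had a design choice changed between periods,
# "control" models did not.
treated_pre = [0.50, 0.52, 0.48]
treated_post = [0.60, 0.62, 0.58]
control_pre = [0.40, 0.42, 0.38]
control_post = [0.44, 0.46, 0.42]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"Estimated effect: {effect:.3f}")
```

Here the treated group improves by about 0.10 and the control group by about 0.04, so the estimated effect of the change is about 0.06.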
In this talk, I will present CodeScaler, a novel framework designed to overcome the scalability bottlenecks of Reinforcement Learning from Verifiable Rewards (RLVR) in code generation. While traditional RLVR relies heavily on the availability of high-quality unit tests—which are often scarce or unreliable—CodeScaler introduces an execution-free reward model that scales both training and test-time inference.
Professor Wei-Yin Loh; University of Wisconsin–Madison, Department of Statistics
Classification and regression tree models are unmatched for their interpretability, a feature that is lacking in "black-box" models, such as tree ensembles and those constructed by deep learning and gradient boosting. Yet tree models have been falling out of favor in recent years. One reason is that the prediction accuracy of tree models tends to be lower than that of black-box models, particularly random forests. Consequently, the latter have largely supplanted trees for prediction tasks.
We warmly invite you to the C2D3 Computational Biology Annual Symposium 2026. This event is open to everyone in the Computational Biology Community.
https://www.c2d3.cam.ac.uk/events/comp-bio-2026
Early Career Researcher: Abstract Submission
We are inviting Early Career Researchers to present their research during the symposium. Talks should be 17 minutes each, and a short Q&A will follow. Abstract submission - Deadline 9am 1st April 2026.
Registrations
Registration is essential. A waitlist will open if capacity is reached. Registrations - Deadline 9am Monday 4th May 2026.
This free event is open only to members of the University of Cambridge (and affiliated institutes). Please be aware that we are unable to offer consultations outside clinic hours.
If you would like to participate, please sign up, as we will not be able to offer a consultation otherwise. Please sign up through the following link: https://forms.gle/Tbk2JKH6Sm3CbA8SA. Sign-up is possible from May 7 midday (12pm) until May 11 midday or until we reach full capacity, whichever is earlier. If you sign up successfully, we will confirm your appointment by May 13 midday.
As AI systems become capable enough to matter, I think the question of whether we actually understand them becomes urgent in a new way. This talk works through four candidate answers — understanding as explanation, as mechanism, as control, and as process — and argues that each one, on its own, isn't enough.
Michael Sparks - Software Sustainability Institute
The Research Software Quality Toolkit (RSQKit; https://everse.software/RSQKit/), developed by the EVERSE project, lists curated best practices for improving the quality of research software. It is intended for researchers, research software engineers, and those who run research infrastructures involving software or who are engaged in research software policy and funding.
Lotem Peled-Cohen (Technion - Israel Institute of Technology)
This talk presents my PhD research, supervised by Prof. Roi Reichart, exploring the intersection of Large Language Models (LLMs) and Alzheimer’s and related dementias. I begin by presenting our survey and perspective paper, in which we map the field’s current state and identify critical research gaps, such as data scarcity and the need for LLM-based simulation.
Scientific discovery emerges not from isolated reasoning, but from the intersection of diverse epistemic traditions. This talk proposes that the modern AI ecosystem, a structured network of heterogeneous reasoning agents spanning approximate and rigorous inference, constitutes a new form of collaborative intelligence for scientific inquiry. Drawing on Simon's conception of reasoning as adaptive search, we argue that such ecosystems do not merely accelerate known reasoning pathways, but create conditions under which genuinely novel representations may emerge.
Luke Gilbert, PhD, Associate Professor of Urology, University of California, San Francisco
Abstract: The ability to precisely manipulate endogenous gene expression enables exploration of gene function and establishment of causal relationships. This lecture will discuss CRISPR tools for turning genes on and off from a research and therapeutics perspective. I will also describe our CRISPRi approach for large-scale mapping of genetic interactions (GI) in the context of environmental perturbations.
Monotonicity is a common and often necessary assumption in biomedical research. In multiplex assays, biomarker expression is expected to have a monotonic association with disease outcome; similarly, in dose-finding studies, the probability of a response or toxicity outcome is expected to increase with dose.
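One standard way to impose a monotonicity constraint of this kind is isotonic regression, fitted with the pool-adjacent-violators algorithm. The sketch below is a minimal version under stated assumptions: the function name `pava` and the response rates are hypothetical, and the abstract does not say that the talk uses this particular method.

```python
def pava(y, w=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    y: observed values ordered by dose (or biomarker level).
    w: optional positive weights, e.g. the number of patients per dose.
    Returns a list of fitted values of the same length as y.
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block holds [weighted mean, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for block_mean, _, count in blocks:
        fitted.extend([block_mean] * count)
    return fitted


# Hypothetical observed response rates at four increasing doses.
observed = [0.10, 0.30, 0.20, 0.50]
fitted = pava(observed)
print(fitted)
```

In this example the second and third doses violate the expected ordering, so the algorithm pools them and assigns both their average, leaving the first and last values unchanged.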
In this talk we will explore a zero-player game based on an information isolation constraint. The dynamics of the game emerge from a “no-barber” selection principle that prohibits external structure. The aim is for the game to avoid impredicative-style inconsistencies. Motivated by the selection principle, we will derive a “selected” trajectory in the game that consists of a second-order constrained maximum entropy production along the information geometry.
Kirsty Pringle - Software Sustainability Institute; EPCC, University of Edinburgh
Research Software Engineers (RSEs) collaborate with researchers to develop and maintain software, helping to embed best practices that improve reliability and reduce inefficiencies in research workflows.
As awareness grows of the environmental impact of computational research, a new specialism - Green RSE - is beginning to emerge.
Green RSEs integrate sustainability into software development, ensuring environmental considerations are addressed alongside performance and usability.
Abstract: Neural networks have shown remarkable performance across data domains, especially in regimes of increasing compute budgets. However, fundamental insights into how neural networks process information, share representations and traverse loss landscapes remain uncertain. In this work, we quantify the functional impact of distribution matching, facilitated by knowledge sharing mechanisms such as knowledge distillation, under student-teacher optimisation strategies.
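For readers unfamiliar with knowledge distillation, the sketch below shows the commonly used soft-target loss: the KL divergence between temperature-softened teacher and student output distributions, scaled by the squared temperature. It is a generic textbook formulation, not necessarily the objective studied in this work; the function names, logits, and temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, times T squared."""
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class problem.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]

loss = distillation_loss(student, teacher)
same = distillation_loss(teacher, teacher)
print(f"loss = {loss:.4f}, loss when student matches teacher = {same:.4f}")
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise, which is the sense in which distillation performs distribution matching.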
This free event is open only to members of the University of Cambridge (and affiliated institutes). Please be aware that we are unable to offer consultations outside clinic hours.
If you would like to participate, please sign up, as we will not be able to offer a consultation otherwise. Please sign up through the following link: https://forms.gle/5dHfs6vJrrvTbqst5. Sign-up is possible from May 21 midday (12pm) until May 25 midday or until we reach full capacity, whichever is earlier. If you sign up successfully, we will confirm your appointment by May 27 midday.