TL;DR: Postdoctoral Researcher, Natural Language Processing, Language Variation and Diversity, Evaluating LLMs, Neural Style Representations, Paraphrasing, Tokenization for Language Variation
I am a Postdoctoral Researcher at the NLP and Society Lab at Utrecht University with Dong Nguyen. I study language variation (e.g., lexical variation, spelling variation, … clustered around different social groups) via natural language processing (NLP) and machine learning methods. Previously, I studied computer science and mathematics at RWTH Aachen University. I also spent time at places like University of Michigan, Vrije Universiteit Amsterdam, UNSW Sydney and University of Auckland. Find my thesis “Say the Same but Differently: Measuring Stylistic Variation and Paraphrasing Across Speakers” here.
Generally, I am interested in how to evaluate, measure and represent language variation. Questions I am interested in include: How to measure language variation in a dataset? How to curate training datasets including language diversity? How to evaluate representation of language variation in LLMs? How to represent language variation in LLMs? …
Check out our our style embedding model on the huggingface model hub (see also pic below). This is a model trained in Content-Independent Style Representations, see the video here. Evaluated on STEL. Consider contributing!

Talks: SCALE at Johns Hopkins University (2022); Bocconi University (2021); Complexity Science Hub, Vienna (2020)
Interested in doing a Bachelor’s or Master’s Thesis with me? see: Student Projects