TL;DR: PhD Candidate, Natural Language Processing, Language Variation and Diversity, Evaluating LLMs, Neural Style Representations, Paraphrasing in Dialog, Tokenization for Language Variation
I am a PhD Candidate at the NLP and Society Lab at Utrecht University with Dong Nguyen. I study language variation (e.g., lexical variation, spelling variation, … clustered around different social groups) via natural language processing (NLP) and machine learning methods. Previously, I studied computer science and mathematics at RWTH Aachen University. I also spent time at places like University of Michigan, Vrije Universiteit Amsterdam, UNSW Sydney and University of Auckland. Currently wrapping up my thesis with the working title “Say the Same but Differently: Measuring Stylistic Variation and Paraphrasing Across Speakers”.
Generally, I am interested in how to evaluate, measure and represent language variation. Questions I am interested in include: How to measure language variation in a dataset? How to curate training datasets including language diversity? How to evaluate representation of language variation in LLMs? How to represent language variation in LLMs? …
Check out our our style embedding model on the huggingface model hub (see also pic below). This is a model trained in Content-Independent Style Representations, see the video here. Evaluated on STEL. Consider contributing!
Talks: SCALE at Johns Hopkins University (2022); Bocconi University (2021); Complexity Science Hub, Vienna (2020)
My PhD project is funded by EMMA and NWO. This is a collaboration with Dong Nguyen (UU), Kees van Deemter (UU), Tijs van den Broek and Bianca Beersma in the #Bridging Project.
Interested in doing a Bachelor’s or Master’s Thesis with me? see: Student Projects