StyleSurvey

Papers and works from the Style Survey

We welcome contributions via GitHub issues or pull requests that add relevant style papers not mentioned in our survey. Note that this list includes all references from our survey, including works that are not directly related to style or style representations.

This page lists the references from the Style Survey paper as a sortable, searchable table.

To filter the table, start typing in the search box below; click any column header to sort.

Title Authors Link Year Venue
Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace Ahmed Abbasi and Hsinchun Chen link 2008 ACM Transactions on Information Systems (TOIS), 26(2):1–29
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg link 2017 ICLR
Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts? Cristina Aggazzotti, Nicholas Andrews, and Elizabeth Allyn Smith link 2024 Transactions of the Association for Computational Linguistics, 12:875–891
Content Anonymization for Privacy in Long-form Audio Cristina Aggazzotti, Ashi Garg, Zexin Cai, and Nicholas Andrews link 2025 arXiv preprint arXiv:2510.12780
The impact of automatic speech transcription on speaker attribution Cristina Aggazzotti, Matthew Wiesner, Elizabeth Allyn Smith, and Nicholas Andrews link 2025 Transactions of the Association for Computational Linguistics, in press
Neurobiber: Fast and Interpretable Stylistic Feature Extraction Kenan Alkiek, Anna Wegmann, Jian Zhu, and David Jurgens link 2025 arXiv preprint arXiv:2502.18590
SmolLM2: When Smol Goes Big — Data-Centric Training of a Fully Open Small Language Model Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, et al. link 2025 COLM
Masks and mimicry: Strategic obfuscation and impersonation attacks on authorship verification Kenneth Alperin, Rohan Leekha, Adaku Uchendu, et al. link 2025 Proceedings of the 5th International Conference on NLP for Digital Humanities, pages 102–116
Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution Milad Alshomary, Narutatsu Ri, Marianna Apidianaki, Ajay Patel, Smaranda Muresan, and Kathleen McKeown link 2025 COLING, pages 1124–1135
Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers Milad Alshomary, Nikhil Reddy Varimalla, Vishal Anand, Smaranda Muresan, and Kathleen McKeown link 2025 EMNLP, pages 10290–10303
The topic confusion task: A novel evaluation scenario for authorship attribution Malik Altakrori, Jackie Chi Kit Cheung, and Benjamin C. M. Fung link 2021 Findings of EMNLP 2021, pages 4242–4256
Learning invariant representations of social media users Nicholas Andrews and Marcus Bishop link 2019 EMNLP-IJCNLP, pages 1684–1695
(Dis)improved?! How Simplified Language Affects Large Language Model Performance across Languages Miriam Anschütz, Anastasiya Damaratskaya, Chaeeun Joy Lee, Arthur Schmalz, Edoardo Mosca, and Georg Groh link 2025 GEM² Workshop, pages 847–861
A light in the dark web: Linking dark web aliases to real internet identities Ehsan Arabnezhad, Massimo La Morgia, Alessandro Mei, Eugenio Nerio Nemmi, and Julinda Stefa link 2020 ICDCS, pages 311–321
Computational forensic authorship analysis: Promises and pitfalls Shlomo Argamon link 2018 Language and Law/Linguagem e Direito, 5(2):7–37
Overview of the International Authorship Identification Competition at PAN-2011 Shlomo Argamon and Patrick Juola link 2011 CLEF 2011
Efficient Large Scale Language Modeling with Mixtures of Experts Mikel Artetxe, Shruti Bhosale, Naman Goyal, et al. link 2022 EMNLP, pages 11699–11732
The Routledge Handbook of Sociolinguistics Around the World, 2nd edition Martin J. Ball, Rajend Mesthrie, and Chiara Meluzzi link 2023 Routledge
The Language That Drives Engagement: A Systematic Large-scale Analysis of Headline Experiments Akshina Banerjee and Oleg Urminsky link 2025 Marketing Science, 44(3):566–592
Keep it Private: Unsupervised privatization of online text Calvin Bao and Marine Carpuat link 2024 NAACL, pages 8678–8693
Measuring what Matters: Construct Validity in Large Language Model Benchmarks Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, et al. link 2025 NeurIPS
Probing classifiers: Promises, shortcomings, and advances Yonatan Belinkov link 2022 Computational Linguistics, 48(1):207–219
What do neural machine translation models learn about morphology? Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass link 2017 ACL, pages 861–872
Language style as audience design Allan Bell link 1984 Language in Society, 13(2):145–204
Overview of PAN 2025: Generative AI Detection, Multilingual Text Detoxification, Multi-author Writing Style Analysis, and Generative Plagiarism Detection Janek Bevendorff, Daryna Dementieva, Maik Fröbe, et al. link 2025 Advances in Information Retrieval, pages 434–441
The two paradigms of LLM detection: Authorship attribution vs. authorship verification Janek Bevendorff, Matti Wiegmann, Emmelie Richter, Martin Potthast, and Benno Stein link 2025 Findings of ACL 2025, pages 3762–3787
Variation across Speech and Writing Douglas Biber link 1988 Cambridge University Press
Register, Genre, and Style, 2nd edition Douglas Biber and Susan Conrad link 2019 Cambridge University Press
Natural Language Processing with Python Steven Bird, Ewan Klein, and Edward Loper link 2019 O'Reilly Media
Centering the speech community Steven Bird and Dean Yibarbuk link 2024 EACL, pages 826–839
ETS corpus of non-native written English LDC2014T06 Daniel Blanchard, Joel Tetreault, Derrick Higgins, Aoife Cahill, and Martin Chodorow link 2014 Linguistic Data Consortium
The language of intergroup distinctiveness Richard Y. Bourhis and Howard Giles link 1977 Language, Ethnicity and Intergroup Relations, pages 119–135
Rethinking the Authorship Verification Experimental Setups Florin Brad, Andrei Manolache, Elena Burceanu, Antonio Barbalau, Radu Tudor Ionescu, and Marius Popescu link 2022 EMNLP, pages 5634–5643
Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity Michael Brennan, Sadia Afroz, and Rachel Greenstadt link 2012 ACM TISSEC, 15:1–22
'Delta': a measure of stylistic difference and a guide to likely authorship John Burrows link 2002 Literary and Linguistic Computing, 17(3):267–287
How the communication style of chatbots influences consumers' satisfaction, trust, and engagement in the context of service failure Na Cai, Shuhong Gao, and Jinzhe Yan link 2024 Humanities and Social Sciences Communications, 11(1):687
Accent, (ING), and the social logic of listener perceptions Kathryn Campbell-Kibler link 2007 American Speech, 82(1):32–64
The nature of sociolinguistic perception Kathryn Campbell-Kibler link 2009 Language Variation and Change, 21(1):135–156
The sociolinguistic variant as a carrier of social meaning Kathryn Campbell-Kibler link 2011 Language Variation and Change, 22(3):423–441
The elements of style Kathryn Campbell-Kibler, Penelope Eckert, Norma Mendoza-Denton, and Emma Moore link 2006 NWAV Poster Session
Expertise style transfer: A new task towards better communication between experts and laymen Yixin Cao, Ruihao Shui, Liangming Pan, Min-Yen Kan, Zhiyuan Liu, and Tat-Seng Chua link 2020 ACL, pages 1061–1071
On the diversity of synthetic data and its impact on training large language models Hao Chen, Abdul Waheed, Xiang Li, Yidong Wang, Jindong Wang, Bhiksha Raj, and Marah I. Abdin link 2024 arXiv preprint arXiv:2410.15226
HumT DumT: Measuring and controlling human-like language in LLMs Myra Cheng, Sunny Yu, and Dan Jurafsky link 2025 ACL, pages 25983–26008
CLUB: a contrastive log-ratio upper bound of mutual information Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, and Lawrence Carin link 2020 ICML
Improving disentangled text representation learning with information-theoretic guidance Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, and Lawrence Carin link 2020 ACL, pages 7530–7541
Evaluating synthetic data generation from user generated text Jenny Chim, Julia Ive, and Maria Liakata link 2025 Computational Linguistics, 51(1):191–233
When Variants Lack Semantic Equivalence: Adverbial Subclause Word Order Tanya Karoli Christensen and Torben Juel Jensen link 2022 Cambridge University Press, pages 171–206
Conventionality and contrast: Pragmatic principles with lexical consequences Eve V. Clark link 1992 Frames, Fields, and Contrasts, pages 171–188
Dimensions of abusive language on Twitter Isobelle Clarke and Jack Grieve link 2017 First Workshop on Abusive Language Online, pages 1–11
Detecting collaborations in text comparing the authors' rhetorical language choices in the Federalist Papers Jeff Collins, David Kaufer, Pantelis Vlachos, Brian Butler, and Suguru Ishizaki link 2004 Computers and the Humanities, 38:15–36
Author identification, idiolect, and linguistic uniqueness Malcolm Coulthard link 2004 Applied Linguistics, 25(4):431–447
Style: Language Variation and Identity Nikolas Coupland link 2007 Cambridge University Press
Txtng: The gr8 db8 David Crystal link 2008 Oxford University Press
A Dictionary of Linguistics and Phonetics, 6th edition David Crystal link 2011 Blackwell Publishing
Investigating English Style David Crystal and Derek Davy link 1969 Routledge
Learning stylometric representations for authorship analysis Steven H. H. Ding, Benjamin C. M. Fung, Farkhund Iqbal, and William K. Cheung link 2019 IEEE Transactions on Cybernetics, 49(1):107–121
Speaker recognition based on idiolectal differences between speakers George R. Doddington link 2001 Eurospeech 2001, pages 2521–2524
Automatically constructing a corpus of sentential paraphrases William B. Dolan and Chris Brockett link 2005 IWP2005
Triplet loss in siamese network for object tracking Xingping Dong and Jianbing Shen link 2018 ECCV 2018, pages 472–488
Refocusing on relevance: Personalization in NLG Shiran Dudy, Steven Bedrick, and Bonnie Webber link 2021 EMNLP, pages 5190–5202
HotFlip: White-Box Adversarial Examples for Text Classification Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou link 2018 ACL, pages 31–36
Jocks and Burnouts: Social Categories and Identity in the High School Penelope Eckert link 1989 Teachers College Press
Variation and the indexical field Penelope Eckert link 2008 Journal of Sociolinguistics, 12(4):453–476
Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation Penelope Eckert link 2012 Annual Review of Anthropology, 41(1):87–100
Stylometry with R: A Package for Computational Text Analysis Maciej Eder, Jan Rybicki, and Mike Kestemont link 2016 The R Journal, 8(1):107–121
Analyzing the Persuasive Effect of Style in News Editorial Argumentation Roxanne El Baff, Henning Wachsmuth, Khalid Al Khatib, and Benno Stein link 2020 ACL, pages 3154–3160
Adversarial removal of demographic attributes from text data Yanai Elazar and Yoav Goldberg link 2018 EMNLP, pages 11–21
Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text Ahmed M. Elkhatat, Khaled Elsaid, and Saeed Almeer link 2023 International Journal for Educational Integrity, 19(1):1–16
MMTEB: Massive Multilingual Text Embedding Benchmark Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, et al. link 2024 ICLR
Variety, style-shifting, and ideology Susan M. Ervin-Tripp link 2001 Style and Sociolinguistic Variation, pages 44–56
Olmo 3 Allyson Ettinger, Amanda Bertsch, Bailey Kuehl, et al. link 2025 Technical Report
Leveraging Measurement Theory for Natural Language Processing Research Qixiang Fang link 2024 Dissertation, Utrecht University
Linguistic bias in ChatGPT: Language models reinforce dialect discrimination Eve Fleisig, Genevieve Smith, Madeline Bossi, Ishita Rustagi, Xavier Yin, and Dan Klein link 2024 EMNLP, pages 13541–13564
Survey of the state of the art in natural language generation: Core tasks, applications and evaluation Albert Gatt and Emiel Krahmer link 2018 JAIR, 61:65–170
GLTR: Statistical detection and visualization of generated text Sebastian Gehrmann, Hendrik Strobelt, and Alexander Rush link 2019 ACL System Demonstrations, pages 111–116
Accommodation theory: Communication, context, and consequence Howard Giles, Nikolas Coupland, and Justine Coupland link 1991 Contexts of accommodation, 1:1–68
Speech Style and Social Evaluation Howard Giles and Peter F. Powesland link 1975 Academic Press
Assessing BERT's syntactic abilities Yoav Goldberg link 2019 arXiv preprint arXiv:1901.05287
Coh-Metrix: Analysis of text on cohesion and language Arthur C. Graesser, Danielle S. McNamara, Max M. Louwerse, and Zhiqiang Cai link 2004 Behavior Research Methods, Instruments, & Computers, 36(2):193–202
The Idea of Progress in Forensic Authorship Analysis Tim Grant link 2022 Cambridge University Press
Quantitative authorship attribution: An evaluation of techniques Jack Grieve link 2007 Literary and Linguistic Computing, 22(3):251–270
Register variation explains stylometric authorship analysis Jack Grieve link 2023 Corpus Linguistics and Linguistic Theory, 19(1):47–77
The sociolinguistic foundations of language modeling Jack Grieve, Sara Bartl, Matteo Fuoli, et al. link 2025 Frontiers in Artificial Intelligence, 7:1472411
Variation among blogs: A multi-dimensional analysis Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova link 2011 Genres on the Web, pages 303–322
Benchmarking Linguistic Diversity of Large Language Models Yanzhu Guo, Guokan Shang, and Chloé Clavel link 2025 arXiv preprint arXiv:2412.10271
The curious decline of linguistic diversity: Training language models on synthetic text Yanzhu Guo, Guokan Shang, Michalis Vazirgiannis, and Chloé Clavel link 2024 Findings of NAACL 2024, pages 3589–3604
Annotation artifacts in natural language inference data Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith link 2018 NAACL, pages 107–112
Towards style alignment in cross-cultural translation Shreya Havaldar, Adam Stein, Eric Wong, and Lyle Ungar link 2025 ACL, pages 32213–32230
Representation learning of writing style Julien Hay, Bich-Lien Doan, Fabrice Popineau, and Ouassim Ait Elhara link 2020 W-NUT 2020, pages 232–243
Measuring Mathematical Problem Solving With the MATH Dataset Dan Hendrycks, Collin Burns, Saurav Kadavath, et al. link 2021 NeurIPS
Looking for the inner music: Probing LLMs' understanding of literary style Rebecca M. M. Hicke and David Mimno link 2025 Computational Humanities Research, 1:e3
AI generates covertly racist decisions about people based on their dialect Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, and Sharese King link 2024 Nature, 633:147–154
Intonation and referee design phenomena in the narrative speech of Black/biracial men Nicole Holliday link 2021 Journal of English Linguistics, 49(3):283–304
The analysis of literary style – a review David I. Holmes link 1985 Journal of the Royal Statistical Society: Series A, 148(4):328–341
ParaGuide: Guided diffusion paraphrasers for plug-and-play textual style transfer Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, and Kathleen McKeown link 2024 AAAI, pages 18216–18224
TinyStyler: Efficient few-shot text style transfer with authorship embeddings Zachary Horvitz, Ajay Patel, Kanishk Singh, Chris Callison-Burch, Kathleen McKeown, and Zhou Yu link 2024 Findings of EMNLP 2024, pages 13376–13390
N-gram feature selection for authorship identification John Houvardas and Efstathios Stamatatos link 2006 AIMSA'06, pages 77–86
Demographic factors improve classification performance Dirk Hovy link 2015 ACL-IJCNLP, pages 752–762
"You Sound Just Like Your Father" Commercial Machine Translation Systems Include Stylistic Biases Dirk Hovy, Federico Bianchi, and Tommaso Fornaciari link 2020 ACL, pages 1686–1690
The social impact of natural language processing Dirk Hovy and Shannon L. Spruit link 2016 ACL, pages 591–598
Tagging Performance Correlates with Author Age Dirk Hovy and Anders Søgaard link 2015 ACL-IJCNLP, pages 483–488
The importance of modeling social factors of language: Theory and practice Dirk Hovy and Diyi Yang link 2021 NAACL, pages 588–602
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges Baixiang Huang, Canyu Chen, and Kai Shu link 2025 ACM SIGKDD Explorations Newsletter, 26(2):21–43
Sparse autoencoders find highly interpretable features in language models Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey link 2024 ICLR
"Style" as distinctiveness: the culture and ideology of linguistic differentiation Judith T. Irvine link 2001 Style and Sociolinguistic Variation, pages 21–43
The Million Authors Corpus: A Cross-Lingual and Cross-Domain Wikipedia Dataset for Authorship Verification Abraham Israeli, Shuai Liu, Jonathan May, and David Jurgens link 2025 Findings of ACL 2025, pages 25997–26017
Style versus Content: A distinction without a (learnable) difference? Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive, and John Kelleher link 2020 COLING, pages 2169–2180
Evaluating Style-Personalized Text Generation: Challenges and Directions Anubhav Jangra, Bahareh Sarrafzadeh, Adrian de Wynter, Silviu Cucerzan, and Sujay Kumar Jauhar link 2025 arXiv preprint arXiv:2508.06374
Shakespearizing Modern Language Using Copy-Enriched Sequence to Sequence Models Harsh Jhamtani, Varun Gangal, Eduard Hovy, and Eric Nyberg link 2017 Workshop on Stylistic Variation, pages 10–19
Mistral 7B Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, et al. link 2023 arXiv preprint arXiv:2310.06825
Deep learning for text style transfer: A survey Di Jin, Zhijing Jin, Zhiting Hu, Olga Vechtomova, and Rada Mihalcea link 2022 Computational Linguistics, 48(1):155–205
Disentangled representation learning for non-parallel text style transfer Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova link 2019 ACL, pages 424–434
Authorship attribution Patrick Juola link 2006 Foundations and Trends in Information Retrieval, 1(3):233–334
JGAAP 4.0–A revised authorship attribution tool Patrick Juola, John Noecker Jr., Mike Ryan, and Sandy Speer link 2009 Digital Humanities
(male, bachelor) and (female, Ph.D) have different connotations: Parallelly annotated stylistic language dataset with multiple personas Dongyeop Kang, Varun Gangal, and Eduard Hovy link 2019 EMNLP-IJCNLP, pages 1696–1706
Style is NOT a single variable: Case Studies for Cross-Stylistic Language Understanding Dongyeop Kang and Eduard Hovy link 2021 ACL-IJCNLP, pages 2376–2387
Function words in authorship attribution. From black magic to theory? Mike Kestemont link 2014 CLFL, pages 59–66
A deep metric learning approach to account linking Aleem Khan, Elizabeth Fleming, Noah Schofield, Marcus Bishop, and Nicholas Andrews link 2021 NAACL, pages 5275–5287
Learning to generate text in arbitrary writing styles Aleem Khan, Andrew Wang, Sophia Hager, and Nicholas Andrews link 2023 arXiv:2312.17242
Supervised contrastive learning Prannay Khosla, Piotr Teterwak, Chen Wang, et al. link 2020 NeurIPS, 33:18661–18673
Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains Junghwan Kim, Haotian Zhang, and David Jurgens link 2025 EMNLP, pages 34855–34880
Working in Language and Law: A German Perspective Hannes Kniffka link 2007 Palgrave Macmillan UK
What's in an embedding? Analyzing word embeddings through multilingual evaluation Arne Köhn link 2015 EMNLP, pages 2067–2073
Automatically categorizing written texts by author gender Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni link 2002 Literary and Linguistic Computing, 17(4):401–412
Stylometric detection of AI-generated text in Twitter timelines Tharindu Kumarage, Joshua Garland, Amrita Bhattacharjee, et al. link 2023 arXiv:2303.03697
Sociolinguistic Patterns William Labov link 1972 University of Pennsylvania Press
The Social Stratification of English in New York City, 2nd edition William Labov link 2006 Cambridge University Press
Tulu 3: Pushing Frontiers in Open Language Model Post-Training Nathan Lambert, Jacob Morrison, Valentina Pyatkin, et al. link 2025 arXiv preprint arXiv:2411.15124
Where does the sociolinguistic variable stop? Beatriz R. Lavandera link 1978 Language in Society, 7(2):171–192
LFTK: Handcrafted Features in Computational Linguistics Bruce W. Lee and Jason Lee link 2023 BEA 2023, pages 1–19
Diverse Demonstrations Improve In-context Compositional Generalization Itay Levy, Ben Bogin, and Jonathan Berant link 2023 ACL, pages 1401–1422
TextBugger: Generating Adversarial Text Against Real-world Applications Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang link 2019 NDSS
Towards robust and privacy-preserving text representations Yitong Li, Timothy Baldwin, and Trevor Cohn link 2018 ACL, pages 25–30
Textbooks Are All You Need II: phi-1.5 technical report Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, et al. link 2023 arXiv preprint arXiv:2309.05463
GPT detectors are biased against non-native English writers Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou link 2023 Patterns, 4(7):100779
Let's Verify Step by Step Hunter Lightman, Vineet Kosaraju, Yuri Burda, et al. link 2023 -
Style over Substance: Distilled Language Models Reason Via Stylistic Replication Philip Lippmann and Jie Yang link 2025 arXiv
Anonymisation models for text data: State of the art, challenges and future directions Pierre Lison, Ildikó Pilán, David Sanchez, Montserrat Batet, and Lilja Øvrelid link 2021 ACL-IJCNLP, pages 4188–4203
EncT5: A framework for fine-tuning T5 as non-autoregressive models Frederick Liu, Terry Huang, Shihang Lyu, et al. link 2022 arXiv:2110.08426
A Survey of Personalized Large Language Models: Progress and Future Directions Jiahong Liu, Zexuan Qiu, Zhongyang Li, et al. link 2025 arXiv preprint arXiv:2502.11528
RECAP: Retrieval-enhanced context-aware prefix encoder for personalized dialogue response generation Shuai Liu, Hyundong Cho, Marjorie Freedman, Xuezhe Ma, and Jonathan May link 2023 ACL, pages 8404–8419
More than words: The influence of affective content and linguistic style matches in online reviews on conversion rates Stephan Ludwig, Ko de Ruyter, Max Friedman, et al. link 2013 Journal of Marketing, 77(1):87–103
Politeness transfer: A tag and generate approach Aman Madaan, Amrith Setlur, Tanmay Parekh, et al. link 2020 ACL, pages 1869–1881
Jointly learning author and annotated character n-gram embeddings: A case study in literary text Suraj Maharjan, Deepthi Mave, Prasha Shrestha, Manuel Montes, Fabio A. González, and Thamar Solorio link 2019 RANLP 2019, pages 684–692
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, and Navdeep Jaitly link 2024 arXiv preprint arXiv:2401.16380
Counterfactual augmentation for robust authorship representation learning Hieu Man and Thien Huu Nguyen link 2024 SIGIR '24, pages 2347–2351
Language technologies as if people mattered: Centering communities in language technology development Nina Markl, Lauren Hall-Lew, and Catherine Lai link 2024 LREC-COLING 2024, pages 10085–10099
UMAP: Uniform manifold approximation and projection for dimension reduction Leland McInnes, John Healy, and James Melville link 2020 arXiv:1802.03426
Introducing Sociolinguistics Miriam Meyerhoff link 2006 Routledge
Linguistic profiling of a neural language model Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta, and Giulia Venturi link 2020 COLING, pages 745–756
Stranger than paradigms: Word embedding benchmarks don't align with morphology Timothee Mickus and Maria Copot link 2024 SCiL 2024, pages 173–189
Investigating topic influence in authorship attribution George K Mikros and Eleni K Argiri link 2007 SIGIR'07 Workshop
The signature stylometric system Peter Millican link 2003 -
State of what art? A call for multi-prompt LLM evaluation Moran Mizrahi, Guy Kaplan, Dan Malkin, Rotem Dror, Dafna Shahaf, and Gabriel Stanovsky link 2024 TACL, 12:933–949
Inference in an authorship problem: A comparative study of discrimination methods applied to the authorship of the disputed Federalist Papers Frederick Mosteller and David L. Wallace link 1963 JASA, 58(302):275–309
MTEB: Massive Text Embedding Benchmark Niklas Muennighoff, Nouamane Tazi, Loic Magne, and Nils Reimers link 2023 EACL, pages 2014–2037
s1: Simple test-time scaling Niklas Muennighoff, Zitong Yang, Weijia Shi, et al. link 2025 arXiv preprint arXiv:2501.19393
Does your style engage? Linguistic styles of influencers and digital consumer engagement on YouTube Ana Cristina Munaro, Renato Hübner Barcelos, et al. link 2024 Computers in Human Behavior, 156(C)
Surveying stylometry techniques and applications Tempestt Neal, Kalaivani Sundararajan, Aneez Fatima, et al. link 2017 ACM Computing Surveys, 50(6):86
Collaborative growth: When large language models meet sociolinguistics Dong Nguyen link 2025 Language and Linguistics Compass, 19(2):e70010
Computational sociolinguistics: A Survey Dong Nguyen, A. Seza Doğruöz, Carolyn P. Rosé, and Franciska de Jong link 2016 Computational Linguistics, 42(3):537–593
"How old do you think I am?" A study of language and age in Twitter Dong Nguyen, Rilana Gravel, Dolf Trieschnigg, and Theo Meder link 2013 ICWSM, pages 439–448
Do word embeddings capture spelling variation? Dong Nguyen and Jack Grieve link 2020 COLING, pages 870–881
We Need to Measure Data Diversity in NLP – Better and Broader Dong Nguyen and Esther Ploeger link 2025 arXiv preprint arXiv:2505.20264
On learning and representing social meaning in NLP: a sociolinguistic perspective Dong Nguyen, Laura Rosseel, and Jack Grieve link 2021 NAACL, pages 603–612
The Multi-Dimensional Analysis Tagger Andrea Nini link 2019 Multi-Dimensional Analysis: Research Methods and Current Issues
A Theory of Linguistic Individuality for Authorship Analysis Andrea Nini link 2023 Cambridge University Press
A study of style in machine translation: Controlling the formality of machine translation output Xing Niu, Marianna Martindale, and Marine Carpuat link 2017 EMNLP, pages 2814–2819
Multi-task neural models for translating between styles within and across languages Xing Niu, Sudha Rao, and Marine Carpuat link 2018 COLING, pages 1008–1021
2 OLMo 2 Furious Team OLMo, Pete Walsh, Luca Soldaini, et al. link 2025 arXiv preprint arXiv:2501.00656
Linguistic style and crowdfunding success among social and commercial entrepreneurs Annaleena Parhankangas and Maija Renko link 2017 Journal of Business Venturing, 32(2):215–236
Learning interpretable style embeddings via prompting LLMs Ajay Patel, Delip Rao, Ansh Kothary, Kathleen McKeown, and Chris Callison-Burch link 2023 Findings of EMNLP 2023, pages 15270–15290
StyleDistance: Stronger content-independent style embeddings with synthetic parallel examples Ajay Patel, Jiacheng Zhu, Justin Qiu, et al. link 2025 NAACL, pages 8662–8685
Language independent authorship attribution using character level language models Fuchun Peng, Dale Schuurmans, Shaojun Wang, and Vlado Keselj link 2003 EACL, pages 267–274
The Development and Psychometric Properties of LIWC2015 James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn link 2015 University of Texas at Austin
JSAN–The Integrated JStylo and Anonymouth Package Drexel University PSAL link 2013 Drexel University
Mind the style of text! adversarial and backdoor attacks based on text style transfer Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, and Maosong Sun link 2021 EMNLP, pages 4569–4580
mStyleDistance: Multilingual style embeddings and their evaluation Justin Qiu, Jiacheng Zhu, Ajay Patel, Marianna Apidianaki, and Chris Callison-Burch link 2025 Findings of ACL 2025, pages 16917–16931
Personalized machine translation: Preserving original author traits Ella Rabinovich, Raj Nath Patel, Shachar Mirkin, Lucia Specia, and Shuly Wintner link 2017 EACL, pages 1074–1084
Overview of the author profiling task at PAN 2013 Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, and Giacomo Inches link 2013 CLEF
Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer Sudha Rao and Joel Tetreault link 2018 NAACL, pages 129–140
A recipe for arbitrary text style transfer with large language models Emily Reif, Daphne Ippolito, Ann Yuan, Andy Coenen, Chris Callison-Burch, and Jason Wei link 2022 ACL, pages 837–848
Addressee- and topic-influenced style shift: A quantitative sociolinguistic study John R. Rickford and Faye McNair-Knox link 1994 Sociolinguistic Perspectives on Register, pages 235–276
Few-shot detection of machine-generated text using style representations Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, and Nicholas Andrews link 2024 ICLR
Learning universal authorship representations Rafael Rivera Soto, Olivia Elizabeth Miano, Juanita Ordonez, et al. link 2021 EMNLP, pages 913–919
My LLM might Mimic AAE - But When Should It? Sandra Camille Sandoval, Christabel Acquaye, Kwesi Adu Cobbina, Mohammad Nayeem Teli, and Hal Daumé III link 2025 NAACL, pages 5277–5302
Topic-regularized authorship representation learning Jitkapat Sawatphol, Nonthakit Chaiwong, Can Udomcharoenchaikit, and Sarana Nutanong link 2022 EMNLP, pages 1076–1082
Addressing Topic Leakage in Cross-Topic Evaluation for Authorship Verification Jitkapat Sawatphol, Can Udomcharoenchaikit, and Sarana Nutanong link 2024 TACL, 12:1363–1377
MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data Vageesh Kumar Saxena, Benjamin Ashpole, Gijs Van Dijck, and Gerasimos Spanakis link 2025 Findings of ACL 2025, pages 4334–4373
Frequent-words analysis for forensic speaker comparison Eleni-Konstantina Sergidou, Nelleke Scheijen, Jeannette Leegwater, Tina Cambier-Langeveld, and Wauter Bosma link 2023 Speech Communication, 150:1–8
The power of words: Driving online consumer engagement in Fintech R.V. ShabbirHusain, Atul Arun Pathak, Shabana Chandrasekaran, and Balamurugan Annamalai link 2023 International Journal of Bank Marketing, 42(2):331–355
Style transfer from non-parallel text by cross-alignment Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola link 2017 NIPS'17, pages 6833–6844
Does string-based neural MT learn source syntax? Xing Shi, Inkit Padhi, and Kevin Knight link 2016 EMNLP, pages 1526–1534
Personalized author obfuscation with large language models Mohammad Shokri, Sarah Ita Levitan, and Rivka Levitan link 2025 arXiv preprint arXiv:2505.12090
A survey of modern authorship attribution methods Efstathios Stamatatos link 2009 JASIST, 60(3):538–556
Masking topic-related information to enhance authorship attribution Efstathios Stamatatos link 2017 JASIST, 69(3):461–473
Multi-label style change detection by solving a binary classification problem Eivind Strøm link 2021 CLEF 2021, pages 2146–2157
Dialect-robust evaluation of generated text Jiao Sun, Thibault Sellam, Elizabeth Clark, et al. link 2023 ACL, pages 6010–6028
Idiosyncrasies in large language models Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, and Zhuang Liu link 2025 arXiv:2502.12150
Unsupervised neural text simplification Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, and Karthik Sankaranarayanan link 2019 ACL, pages 2058–2068
What do you learn from context? Probing for sentence structure in contextualized word representations Ian Tenney, Patrick Xia, Berlin Chen, et al. link 2018 -
Writing Style Author Embedding Evaluation Enzo Terreau, Antoine Gourru, and Julien Velcin link 2021 Evaluation and Comparison of NLP Systems Workshop, pages 84–93
Stayal | multilingual style transfer Karishma Thakrar, Katrina Lawrence, and Kyle Howard link 2025 arXiv:2501.11639
RedDust: A large reusable dataset of Reddit user traits Anna Tigunova, Paramita Mirza, Andrew Yates, and Gerhard Weikum link 2020 LREC, pages 6118–6126
HANSEN: Human and AI spoken text benchmark for authorship analysis Nafis Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca Giannotti, and Dongwon Lee link 2023 Findings of EMNLP 2023, pages 13706–13724
Research Methods: The Essential Knowledge Base William M. K. Trochim, James P. Donnelly, and Kanika Arora link 2015 Cengage Learning
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles Kimberly Le Truong, Riccardo Fogliato, Hoda Heidari, and Zhiwei Steven Wu link 2025 arXiv preprint arXiv:2507.22168
Authorship attribution for neural text generation Adaku Uchendu, Thai Le, Kai Shu, and Dongwon Lee link 2020 EMNLP, pages 8384–8395
Paraphrase types elicit prompt engineering capabilities Jan Philip Wahle, Terry Ruas, Yang Xu, and Bela Gipp link 2024 EMNLP, pages 11004–11033
Can authorship representation learning capture stylistic features? Andrew Wang, Cristina Aggazzotti, Rebecca Kotula, Rafael Rivera Soto, Marcus Bishop, and Nicholas Andrews link 2023 TACL, 11:1416–1431
Feature vector difference based neural network and logistic regression models for authorship verification Janith Weerasinghe and Rachel Greenstadt link 2020 PAN at CLEF 2020, 2695
Does it capture STEL? A modular, similarity-based linguistic style evaluation framework Anna Wegmann and Dong Nguyen link 2021 EMNLP, pages 7109–7130
Tokenization is sensitive to language variation Anna Wegmann, Dong Nguyen, and David Jurgens link 2025 Findings of ACL 2025, pages 10958–10983
Same Author or Just Same Topic? Towards Content-Independent Style Representations Anna Wegmann, Marijn Schraagen, and Dong Nguyen link 2022 RepL4NLP Workshop, pages 249–268
Constraints on the agentless passive E. Judith Weiner and William Labov link 1983 Journal of Linguistics, 19(1):29–58
Disentangling style factors from speaker representations Jennifer Williams and Simon King link 2019 Interspeech, pages 3945–3949
Style over substance: Evaluation biases for large language models Minghao Wu and Alham Fikri Aji link 2025 COLING, pages 297–312
Out-of-distribution generalization in natural language processing: Past, present, and future Linyi Yang, Yaoxian Song, Xuan Ren, et al. link 2023 EMNLP, pages 4533–4559
A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, and Dawei Song link 2023 ACM Computing Surveys, 56(3):64:1–64:37
Personalized Text Generation with Contrastive Activation Steering Jinghao Zhang, Yuting Liu, Wenjie Wang, et al. link 2025 ACL, pages 7128–7141
How Well Do Text Embedding Models Understand Syntax? Yan Zhang, Zhaopeng Feng, Zhiyang Teng, Zuozhu Liu, and Haizhou Li link 2023 Findings of EMNLP 2023, pages 9717–9728
Personalization of Large Language Models: A Survey Zhehao Zhang, Ryan A. Rossi, Branislav Kveton, et al. link 2025 Transactions on Machine Learning Research
Unmasking style sensitivity: A causal analysis of bias evaluation instability in large language models Jiaxu Zhao, Meng Fang, Kun Zhang, and Mykola Pechenizkiy link 2025 ACL, pages 16314–16338
Disentangled sequence to sequence learning for compositional generalization Hao Zheng and Mirella Lapata link 2022 ACL, pages 4256–4268
Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles Jian Zhu and David Jurgens link 2021 EMNLP, pages 279–297
StyleFlow: Disentangle latent representations via normalizing flow for unsupervised text style transfer Kangchen Zhu, Zhiliang Tian, Jingyu Wei, et al. link 2024 LREC-COLING 2024, pages 15384–15397
Trans self-identification and the language of neoliberal selfhood: Agency, power, and the limits of monologic discourse Lal Zimman link 2019 IJSL, 2019(256):147–175
An ensemble-rich multi-aspect approach for robust style change detection Dimitrina Zlatkova, Daniel Kopev, Kristiyan Mitov, et al. link 2018 PAN at CLEF 2018
Style change detection with feed-forward neural networks Chaoyuan Zuo, Yu Zhao, and Ritwik Banerjee link 2019 PAN at CLEF 2019, 93