We welcome contributions via GitHub issues or pull requests that add relevant style papers not mentioned in our survey. Note that this list includes all references from our survey, including works that may not be directly related to style or style representations.
This page lists the references from the Style Survey paper as a sortable, searchable table.
To filter the table, start typing in the search box below; click any column header to sort.
| Title | Authors | Link | Year | Venue |
|---|---|---|---|---|
| Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace | Ahmed Abbasi and Hsinchun Chen | link | 2008 | ACM Transactions on Information Systems (TOIS), 26(2):1–29 |
| Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks | Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg | link | 2017 | ICLR |
| Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts? | Cristina Aggazzotti, Nicholas Andrews, and Elizabeth Allyn Smith | link | 2024 | Transactions of the Association for Computational Linguistics, 12:875–891 |
| Content Anonymization for Privacy in Long-form Audio | Cristina Aggazzotti, Ashi Garg, Zexin Cai, and Nicholas Andrews | link | 2025 | arXiv preprint arXiv:2510.12780 |
| The impact of automatic speech transcription on speaker attribution | Cristina Aggazzotti, Matthew Wiesner, Elizabeth Allyn Smith, and Nicholas Andrews | link | 2025 | Transactions of the Association for Computational Linguistics, in press |
| Neurobiber: Fast and Interpretable Stylistic Feature Extraction | Kenan Alkiek, Anna Wegmann, Jian Zhu, and David Jurgens | link | 2025 | arXiv preprint arXiv:2502.18590 |
| SmolLM2: When Smol Goes Big — Data-Centric Training of a Fully Open Small Language Model | Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, et al. | link | 2025 | COLM |
| Masks and mimicry: Strategic obfuscation and impersonation attacks on authorship verification | Kenneth Alperin, Rohan Leekha, Adaku Uchendu, et al. | link | 2025 | Proceedings of the 5th International Conference on NLP for Digital Humanities, pages 102–116 |
| Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution | Milad Alshomary, Narutatsu Ri, Marianna Apidianaki, Ajay Patel, Smaranda Muresan, and Kathleen McKeown | link | 2025 | COLING, pages 1124–1135 |
| Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers | Milad Alshomary, Nikhil Reddy Varimalla, Vishal Anand, Smaranda Muresan, and Kathleen McKeown | link | 2025 | EMNLP, pages 10290–10303 |
| The topic confusion task: A novel evaluation scenario for authorship attribution | Malik Altakrori, Jackie Chi Kit Cheung, and Benjamin CM Fung | link | 2021 | Findings of EMNLP 2021, pages 4242–4256 |
| Learning invariant representations of social media users | Nicholas Andrews and Marcus Bishop | link | 2019 | EMNLP-IJCNLP, pages 1684–1695 |
| (Dis)improved?! How Simplified Language Affects Large Language Model Performance across Languages | Miriam Anschütz, Anastasiya Damaratskaya, Chaeeun Joy Lee, Arthur Schmalz, Edoardo Mosca, and Georg Groh | link | 2025 | GEM² Workshop, pages 847–861 |
| A light in the dark web: Linking dark web aliases to real internet identities | Ehsan Arabnezhad, Massimo La Morgia, Alessandro Mei, Eugenio Nerio Nemmi, and Julinda Stefa | link | 2020 | ICDCS, pages 311–321 |
| Computational forensic authorship analysis: Promises and pitfalls | Shlomo Argamon | link | 2018 | Language and Law/Linguagem e Direito, 5(2):7–37 |
| Overview of the International Authorship Identification Competition at PAN-2011 | Shlomo Argamon and Patrick Juola | link | 2011 | CLEF 2011 |
| Efficient Large Scale Language Modeling with Mixtures of Experts | Mikel Artetxe, Shruti Bhosale, Naman Goyal, et al. | link | 2022 | EMNLP, pages 11699–11732 |
| The Routledge Handbook of Sociolinguistics Around the World, 2nd edition | Martin J. Ball, Rajend Mesthrie, and Chiara Meluzzi | link | 2023 | Routledge |
| The Language That Drives Engagement: A Systematic Large-scale Analysis of Headline Experiments | Akshina Banerjee and Oleg Urminsky | link | 2025 | Marketing Science, 44(3):566–592 |
| Keep it Private: Unsupervised privatization of online text | Calvin Bao and Marine Carpuat | link | 2024 | NAACL, pages 8678–8693 |
| Measuring what Matters: Construct Validity in Large Language Model Benchmarks | Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou, et al. | link | 2025 | NeurIPS |
| Probing classifiers: Promises, shortcomings, and advances | Yonatan Belinkov | link | 2022 | Computational Linguistics, 48(1):207–219 |
| What do neural machine translation models learn about morphology? | Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, and James Glass | link | 2017 | ACL, pages 861–872 |
| Language style as audience design | Allan Bell | link | 1984 | Language in Society, 13(2):145–204 |
| Overview of PAN 2025: Generative AI Detection, Multilingual Text Detoxification, Multi-author Writing Style Analysis, and Generative Plagiarism Detection | Janek Bevendorff, Daryna Dementieva, Maik Fröbe, et al. | link | 2025 | Advances in Information Retrieval, pages 434–441 |
| The two paradigms of LLM detection: Authorship attribution vs. authorship verification | Janek Bevendorff, Matti Wiegmann, Emmelie Richter, Martin Potthast, and Benno Stein | link | 2025 | Findings of ACL 2025, pages 3762–3787 |
| Variation across Speech and Writing | Douglas Biber | link | 1988 | Cambridge University Press |
| Register, Genre, and Style, 2nd edition | Douglas Biber and Susan Conrad | link | 2019 | Cambridge University Press |
| Natural Language Processing with Python | Steven Bird, Ewan Klein, and Edward Loper | link | 2019 | O'Reilly Media |
| Centering the speech community | Steven Bird and Dean Yibarbuk | link | 2024 | EACL, pages 826–839 |
| ETS corpus of non-native written English LDC2014T06 | Daniel Blanchard, Joel Tetreault, Derrick Higgins, Aoife Cahill, and Martin Chodorow | link | 2014 | Linguistic Data Consortium |
| The language of intergroup distinctiveness | Richard Y. Bourhis and Howard Giles | link | 1977 | Language, Ethnicity and Intergroup Relations, pages 119–135 |
| Rethinking the Authorship Verification Experimental Setups | Florin Brad, Andrei Manolache, Elena Burceanu, Antonio Barbalau, Radu Tudor Ionescu, and Marius Popescu | link | 2022 | EMNLP, pages 5634–5643 |
| Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity | Michael Brennan, Sadia Afroz, and Rachel Greenstadt | link | 2012 | ACM TISSEC, 15:1–22 |
| 'Delta': a measure of stylistic difference and a guide to likely authorship | John Burrows | link | 2002 | Literary and Linguistic Computing, 17(3):267–287 |
| How the communication style of chatbots influences consumers' satisfaction, trust, and engagement in the context of service failure | Na Cai, Shuhong Gao, and Jinzhe Yan | link | 2024 | Humanities and Social Sciences Communications, 11(1):687 |
| Accent, (ING), and the social logic of listener perceptions | Kathryn Campbell-Kibler | link | 2007 | American Speech, 82(1):32–64 |
| The nature of sociolinguistic perception | Kathryn Campbell-Kibler | link | 2009 | Language Variation and Change, 21(1):135–156 |
| The sociolinguistic variant as a carrier of social meaning | Kathryn Campbell-Kibler | link | 2011 | Language Variation and Change, 22(3):423–441 |
| The elements of style | Kathryn Campbell-Kibler, Penelope Eckert, Norma Mendoza-Denton, and Emma Moore | link | 2006 | NWAV Poster Session |
| Expertise style transfer: A new task towards better communication between experts and laymen | Yixin Cao, Ruihao Shui, Liangming Pan, Min-Yen Kan, Zhiyuan Liu, and Tat-Seng Chua | link | 2020 | ACL, pages 1061–1071 |
| On the diversity of synthetic data and its impact on training large language models | Hao Chen, Abdul Waheed, Xiang Li, Yidong Wang, Jindong Wang, Bhiksha Raj, and Marah I. Abdin | link | 2024 | arXiv preprint arXiv:2410.15226 |
| HumT DumT: Measuring and controlling human-like language in LLMs | Myra Cheng, Sunny Yu, and Dan Jurafsky | link | 2025 | ACL, pages 25983–26008 |
| CLUB: a contrastive log-ratio upper bound of mutual information | Pengyu Cheng, Weituo Hao, Shuyang Dai, Jiachang Liu, Zhe Gan, and Lawrence Carin | link | 2020 | ICML |
| Improving disentangled text representation learning with information-theoretic guidance | Pengyu Cheng, Martin Renqiang Min, Dinghan Shen, Christopher Malon, Yizhe Zhang, Yitong Li, and Lawrence Carin | link | 2020 | ACL, pages 7530–7541 |
| Evaluating synthetic data generation from user generated text | Jenny Chim, Julia Ive, and Maria Liakata | link | 2025 | Computational Linguistics, 51(1):191–233 |
| When Variants Lack Semantic Equivalence: Adverbial Subclause Word Order | Tanya Karoli Christensen and Torben Juel Jensen | link | 2022 | Cambridge University Press, pages 171–206 |
| Conventionality and contrast: Pragmatic principles with lexical consequences | Eve V. Clark | link | 1992 | Frames, Fields, and Contrasts, pages 171–188 |
| Dimensions of abusive language on Twitter | Isobelle Clarke and Jack Grieve | link | 2017 | First Workshop on Abusive Language Online, pages 1–11 |
| Detecting collaborations in text comparing the authors' rhetorical language choices in the Federalist Papers | Jeff Collins, David Kaufer, Pantelis Vlachos, Brian Butler, and Suguru Ishizaki | link | 2004 | Computers and the Humanities, 38:15–36 |
| Author identification, idiolect, and linguistic uniqueness | Malcolm Coulthard | link | 2004 | Applied Linguistics, 25(4):431–447 |
| Style: Language Variation and Identity | Nikolas Coupland | link | 2007 | Cambridge University Press |
| Txtng: The gr8 db8 | David Crystal | link | 2008 | Oxford University Press |
| A Dictionary of Linguistics and Phonetics, 6th edition | David Crystal | link | 2011 | Blackwell Publishing |
| Investigating English Style | David Crystal and Derek Davy | link | 1969 | Routledge |
| Learning stylometric representations for authorship analysis | Steven H. H. Ding, Benjamin C. M. Fung, Farkhund Iqbal, and William K. Cheung | link | 2019 | IEEE Transactions on Cybernetics, 49(1):107–121 |
| Speaker recognition based on idiolectal differences between speakers | George R. Doddington | link | 2001 | Eurospeech 2001, pages 2521–2524 |
| Automatically constructing a corpus of sentential paraphrases | William B. Dolan and Chris Brockett | link | 2005 | IWP2005 |
| Triplet loss in siamese network for object tracking | Xingping Dong and Jianbing Shen | link | 2018 | ECCV 2018, pages 472–488 |
| Refocusing on relevance: Personalization in NLG | Shiran Dudy, Steven Bedrick, and Bonnie Webber | link | 2021 | EMNLP, pages 5190–5202 |
| HotFlip: White-Box Adversarial Examples for Text Classification | Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou | link | 2018 | ACL, pages 31–36 |
| Jocks and Burnouts: Social Categories and Identity in the High School | Penelope Eckert | link | 1989 | Teachers College Press |
| Variation and the indexical field | Penelope Eckert | link | 2008 | Journal of Sociolinguistics, 12(4):453–476 |
| Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation | Penelope Eckert | link | 2012 | Annual Review of Anthropology, 41(1):87–100 |
| Stylometry with R: A Package for Computational Text Analysis | Maciej Eder, Jan Rybicki, and Mike Kestemont | link | 2016 | The R Journal, 8(1):107–121 |
| Analyzing the Persuasive Effect of Style in News Editorial Argumentation | Roxanne El Baff, Henning Wachsmuth, Khalid Al Khatib, and Benno Stein | link | 2020 | ACL, pages 3154–3160 |
| Adversarial removal of demographic attributes from text data | Yanai Elazar and Yoav Goldberg | link | 2018 | EMNLP, pages 11–21 |
| Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text | Ahmed M. Elkhatat, Khaled Elsaid, and Saeed Almeer | link | 2023 | International Journal for Educational Integrity, 19(1):1–16 |
| MMTEB: Massive Multilingual Text Embedding Benchmark | Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, et al. | link | 2024 | ICLR |
| Variety, style-shifting, and ideology | Susan M. Ervin-Tripp | link | 2001 | Style and Sociolinguistic Variation, pages 44–56 |
| Olmo 3 | Allyson Ettinger, Amanda Bertsch, Bailey Kuehl, et al. | link | 2025 | Technical Report |
| Leveraging Measurement Theory for Natural Language Processing Research | Qixiang Fang | link | 2024 | Dissertation, Utrecht University |
| Linguistic bias in ChatGPT: Language models reinforce dialect discrimination | Eve Fleisig, Genevieve Smith, Madeline Bossi, Ishita Rustagi, Xavier Yin, and Dan Klein | link | 2024 | EMNLP, pages 13541–13564 |
| Survey of the state of the art in natural language generation: Core tasks, applications and evaluation | Albert Gatt and Emiel Krahmer | link | 2018 | JAIR, 61:65–170 |
| GLTR: Statistical detection and visualization of generated text | Sebastian Gehrmann, Hendrik Strobelt, and Alexander Rush | link | 2019 | ACL System Demonstrations, pages 111–116 |
| Accommodation theory: Communication, context, and consequence | Howard Giles, Nikolas Coupland, and Justine Coupland | link | 1991 | Contexts of accommodation, 1:1–68 |
| Speech Style and Social Evaluation | Howard Giles and Peter F. Powesland | link | 1975 | Academic Press |
| Assessing BERT's syntactic abilities | Yoav Goldberg | link | 2019 | arXiv preprint arXiv:1901.05287 |
| Coh-Metrix: Analysis of text on cohesion and language | Arthur C. Graesser, Danielle S. McNamara, Max M. Louwerse, and Zhiqiang Cai | link | 2004 | Behavior Research Methods, Instruments, & Computers, 36(2):193–202 |
| The Idea of Progress in Forensic Authorship Analysis | Tim Grant | link | 2022 | Cambridge University Press |
| Quantitative authorship attribution: An evaluation of techniques | Jack Grieve | link | 2007 | Literary and Linguistic Computing, 22(3):251–270 |
| Register variation explains stylometric authorship analysis | Jack Grieve | link | 2023 | Corpus Linguistics and Linguistic Theory, 19(1):47–77 |
| The sociolinguistic foundations of language modeling | Jack Grieve, Sara Bartl, Matteo Fuoli, et al. | link | 2025 | Frontiers in Artificial Intelligence, 7:1472411 |
| Variation among blogs: A multi-dimensional analysis | Jack Grieve, Douglas Biber, Eric Friginal, and Tatiana Nekrasova | link | 2011 | Genres on the Web, pages 303–322 |
| Benchmarking Linguistic Diversity of Large Language Models | Yanzhu Guo, Guokan Shang, and Chloé Clavel | link | 2025 | arXiv preprint arXiv:2412.10271 |
| The curious decline of linguistic diversity: Training language models on synthetic text | Yanzhu Guo, Guokan Shang, Michalis Vazirgiannis, and Chloé Clavel | link | 2024 | Findings of NAACL 2024, pages 3589–3604 |
| Annotation artifacts in natural language inference data | Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith | link | 2018 | NAACL, pages 107–112 |
| Towards style alignment in cross-cultural translation | Shreya Havaldar, Adam Stein, Eric Wong, and Lyle Ungar | link | 2025 | ACL, pages 32213–32230 |
| Representation learning of writing style | Julien Hay, Bich-Lien Doan, Fabrice Popineau, and Ouassim Ait Elhara | link | 2020 | W-NUT 2020, pages 232–243 |
| Measuring Mathematical Problem Solving With the MATH Dataset | Dan Hendrycks, Collin Burns, Saurav Kadavath, et al. | link | 2021 | NeurIPS |
| Looking for the inner music: Probing LLMs' understanding of literary style | Rebecca M. M. Hicke and David Mimno | link | 2025 | Computational Humanities Research, 1:e3 |
| AI generates covertly racist decisions about people based on their dialect | Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, and Sharese King | link | 2024 | Nature, 633:147–154 |
| Intonation and referee design phenomena in the narrative speech of Black/biracial men | Nicole Holliday | link | 2021 | Journal of English Linguistics, 49(3):283–304 |
| The analysis of literary style – a review | David I. Holmes | link | 1985 | Journal of the Royal Statistical Society: Series A, 148(4):328–341 |
| ParaGuide: Guided diffusion paraphrasers for plug-and-play textual style transfer | Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, and Kathleen McKeown | link | 2024 | AAAI, pages 18216–18224 |
| TinyStyler: Efficient few-shot text style transfer with authorship embeddings | Zachary Horvitz, Ajay Patel, Kanishk Singh, Chris Callison-Burch, Kathleen McKeown, and Zhou Yu | link | 2024 | Findings of EMNLP 2024, pages 13376–13390 |
| N-gram feature selection for authorship identification | John Houvardas and Efstathios Stamatatos | link | 2006 | AIMSA'06, pages 77–86 |
| Demographic factors improve classification performance | Dirk Hovy | link | 2015 | ACL-IJCNLP, pages 752–762 |
| "You Sound Just Like Your Father" Commercial Machine Translation Systems Include Stylistic Biases | Dirk Hovy, Federico Bianchi, and Tommaso Fornaciari | link | 2020 | ACL, pages 1686–1690 |
| The social impact of natural language processing | Dirk Hovy and Shannon L. Spruit | link | 2016 | ACL, pages 591–598 |
| Tagging Performance Correlates with Author Age | Dirk Hovy and Anders Søgaard | link | 2015 | ACL-IJCNLP, pages 483–488 |
| The importance of modeling social factors of language: Theory and practice | Dirk Hovy and Diyi Yang | link | 2021 | NAACL, pages 588–602 |
| Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges | Baixiang Huang, Canyu Chen, and Kai Shu | link | 2025 | ACM SIGKDD Explorations Newsletter, 26(2):21–43 |
| Sparse autoencoders find highly interpretable features in language models | Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey | link | 2024 | ICLR |
| "Style" as distinctiveness: the culture and ideology of linguistic differentiation | Judith T. Irvine | link | 2001 | Style and Sociolinguistic Variation, pages 21–43 |
| The Million Authors Corpus: A Cross-Lingual and Cross-Domain Wikipedia Dataset for Authorship Verification | Abraham Israeli, Shuai Liu, Jonathan May, and David Jurgens | link | 2025 | Findings of ACL 2025, pages 25997–26017 |
| Style versus Content: A distinction without a (learnable) difference? | Somayeh Jafaritazehjani, Gwénolé Lecorvé, Damien Lolive, and John Kelleher | link | 2020 | COLING, pages 2169–2180 |
| Evaluating Style-Personalized Text Generation: Challenges and Directions | Anubhav Jangra, Bahareh Sarrafzadeh, Adrian de Wynter, Silviu Cucerzan, and Sujay Kumar Jauhar | link | 2025 | arXiv preprint arXiv:2508.06374 |
| Shakespearizing Modern Language Using Copy-Enriched Sequence to Sequence Models | Harsh Jhamtani, Varun Gangal, Eduard Hovy, and Eric Nyberg | link | 2017 | Workshop on Stylistic Variation, pages 10–19 |
| Mistral 7B | Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, et al. | link | 2023 | arXiv preprint arXiv:2310.06825 |
| Deep learning for text style transfer: A survey | Di Jin, Zhijing Jin, Zhiting Hu, Olga Vechtomova, and Rada Mihalcea | link | 2022 | Computational Linguistics, 48(1):155–205 |
| Disentangled representation learning for non-parallel text style transfer | Vineet John, Lili Mou, Hareesh Bahuleyan, and Olga Vechtomova | link | 2019 | ACL, pages 424–434 |
| Authorship attribution | Patrick Juola | link | 2006 | Foundations and Trends in Information Retrieval, 1(3):233–334 |
| JGAAP 4.0–A revised authorship attribution tool | Patrick Juola, John Noecker Jr., Mike Ryan, and Sandy Speer | link | 2009 | Digital Humanities |
| (male, bachelor) and (female, Ph.D) have different connotations: Parallelly annotated stylistic language dataset with multiple personas | Dongyeop Kang, Varun Gangal, and Eduard Hovy | link | 2019 | EMNLP-IJCNLP, pages 1696–1706 |
| Style is NOT a single variable: Case Studies for Cross-Stylistic Language Understanding | Dongyeop Kang and Eduard Hovy | link | 2021 | ACL-IJCNLP, pages 2376–2387 |
| Function words in authorship attribution. From black magic to theory? | Mike Kestemont | link | 2014 | CLFL, pages 59–66 |
| A deep metric learning approach to account linking | Aleem Khan, Elizabeth Fleming, Noah Schofield, Marcus Bishop, and Nicholas Andrews | link | 2021 | NAACL, pages 5275–5287 |
| Learning to generate text in arbitrary writing styles | Aleem Khan, Andrew Wang, Sophia Hager, and Nicholas Andrews | link | 2023 | arXiv:2312.17242 |
| Supervised contrastive learning | Prannay Khosla, Piotr Teterwak, Chen Wang, et al. | link | 2020 | NeurIPS, 33:18661–18673 |
| Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains | Junghwan Kim, Haotian Zhang, and David Jurgens | link | 2025 | EMNLP, pages 34855–34880 |
| Working in Language and Law: A German Perspective | Hannes Kniffka | link | 2007 | Palgrave Macmillan UK |
| What's in an embedding? Analyzing word embeddings through multilingual evaluation | Arne Köhn | link | 2015 | EMNLP, pages 2067–2073 |
| Automatically categorizing written texts by author gender | Moshe Koppel, Shlomo Argamon, and Anat Rachel Shimoni | link | 2002 | Literary and Linguistic Computing, 17(4):401–412 |
| Stylometric detection of AI-generated text in Twitter timelines | Tharindu Kumarage, Joshua Garland, Amrita Bhattacharjee, et al. | link | 2023 | arXiv:2303.03697 |
| Sociolinguistic Patterns | William Labov | link | 1972 | University of Pennsylvania Press |
| The Social Stratification of English in New York City, 2nd edition | William Labov | link | 2006 | Cambridge University Press |
| Tulu 3: Pushing Frontiers in Open Language Model Post-Training | Nathan Lambert, Jacob Morrison, Valentina Pyatkin, et al. | link | 2025 | arXiv preprint arXiv:2411.15124 |
| Where does the sociolinguistic variable stop? | Beatriz R. Lavandera | link | 1978 | Language in Society, 7(2):171–192 |
| LFTK: Handcrafted Features in Computational Linguistics | Bruce W. Lee and Jason Lee | link | 2023 | BEA 2023, pages 1–19 |
| Diverse Demonstrations Improve In-context Compositional Generalization | Itay Levy, Ben Bogin, and Jonathan Berant | link | 2023 | ACL, pages 1401–1422 |
| TextBugger: Generating Adversarial Text Against Real-world Applications | Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang | link | 2019 | NDSS |
| Towards robust and privacy-preserving text representations | Yitong Li, Timothy Baldwin, and Trevor Cohn | link | 2018 | ACL, pages 25–30 |
| Textbooks Are All You Need II: phi-1.5 technical report | Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, et al. | link | 2023 | arXiv preprint arXiv:2309.05463 |
| GPT detectors are biased against non-native English writers | Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou | link | 2023 | Patterns, 4(7):100779 |
| Let's Verify Step by Step | Hunter Lightman, Vineet Kosaraju, Yuri Burda, et al. | link | 2023 | - |
| Style over Substance: Distilled Language Models Reason Via Stylistic Replication | Philip Lippmann and Jie Yang | link | 2025 | arXiv |
| Anonymisation models for text data: State of the art, challenges and future directions | Pierre Lison, Ildikó Pilán, David Sanchez, Montserrat Batet, and Lilja Øvrelid | link | 2021 | ACL-IJCNLP, pages 4188–4203 |
| EncT5: A framework for fine-tuning T5 as non-autoregressive models | Frederick Liu, Terry Huang, Shihang Lyu, et al. | link | 2022 | arXiv:2110.08426 |
| A Survey of Personalized Large Language Models: Progress and Future Directions | Jiahong Liu, Zexuan Qiu, Zhongyang Li, et al. | link | 2025 | arXiv preprint arXiv:2502.11528 |
| RECAP: Retrieval-enhanced context-aware prefix encoder for personalized dialogue response generation | Shuai Liu, Hyundong Cho, Marjorie Freedman, Xuezhe Ma, and Jonathan May | link | 2023 | ACL, pages 8404–8419 |
| More than words: The influence of affective content and linguistic style matches in online reviews on conversion rates | Stephan Ludwig, Ko de Ruyter, Max Friedman, et al. | link | 2013 | Journal of Marketing, 77(1):87–103 |
| Politeness transfer: A tag and generate approach | Aman Madaan, Amrith Setlur, Tanmay Parekh, et al. | link | 2020 | ACL, pages 1869–1881 |
| Jointly learning author and annotated character n-gram embeddings: A case study in literary text | Suraj Maharjan, Deepthi Mave, Prasha Shrestha, Manuel Montes, Fabio A. González, and Thamar Solorio | link | 2019 | RANLP 2019, pages 684–692 |
| Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling | Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, and Navdeep Jaitly | link | 2024 | arXiv preprint arXiv:2401.16380 |
| Counterfactual augmentation for robust authorship representation learning | Hieu Man and Thien Huu Nguyen | link | 2024 | SIGIR '24, pages 2347–2351 |
| Language technologies as if people mattered: Centering communities in language technology development | Nina Markl, Lauren Hall-Lew, and Catherine Lai | link | 2024 | LREC-COLING 2024, pages 10085–10099 |
| UMAP: Uniform manifold approximation and projection for dimension reduction | Leland McInnes, John Healy, and James Melville | link | 2020 | arXiv:1802.03426 |
| Introducing Sociolinguistics | Miriam Meyerhoff | link | 2006 | Routledge |
| Linguistic profiling of a neural language model | Alessio Miaschi, Dominique Brunato, Felice Dell'Orletta, and Giulia Venturi | link | 2020 | COLING, pages 745–756 |
| Stranger than paradigms: Word embedding benchmarks don't align with morphology | Timothee Mickus and Maria Copot | link | 2024 | SCiL 2024, pages 173–189 |
| Investigating topic influence in authorship attribution | George K Mikros and Eleni K Argiri | link | 2007 | SIGIR'07 Workshop |
| The signature stylometric system | Peter Millican | link | 2003 | - |
| State of what art? A call for multi-prompt LLM evaluation | Moran Mizrahi, Guy Kaplan, Dan Malkin, Rotem Dror, Dafna Shahaf, and Gabriel Stanovsky | link | 2024 | TACL, 12:933–949 |
| Inference in an authorship problem: A comparative study of discrimination methods applied to the authorship of the disputed Federalist Papers | Frederick Mosteller and David L. Wallace | link | 1963 | JASA, 58(302):275–309 |
| MTEB: Massive Text Embedding Benchmark | Niklas Muennighoff, Nouamane Tazi, Loic Magne, and Nils Reimers | link | 2023 | EACL, pages 2014–2037 |
| s1: Simple test-time scaling | Niklas Muennighoff, Zitong Yang, Weijia Shi, et al. | link | 2025 | arXiv preprint arXiv:2501.19393 |
| Does your style engage? Linguistic styles of influencers and digital consumer engagement on YouTube | Ana Cristina Munaro, Renato Hübner Barcelos, et al. | link | 2024 | Computers in Human Behavior, 156(C) |
| Surveying stylometry techniques and applications | Tempestt Neal, Kalaivani Sundararajan, Aneez Fatima, et al. | link | 2017 | ACM Computing Surveys, 50(6):86 |
| Collaborative growth: When large language models meet sociolinguistics | Dong Nguyen | link | 2025 | Language and Linguistics Compass, 19(2):e70010 |
| Computational sociolinguistics: A Survey | Dong Nguyen, A. Seza Doğruöz, Carolyn P. Rosé, and Franciska de Jong | link | 2016 | Computational Linguistics, 42(3):537–593 |
| "How old do you think I am?" A study of language and age in Twitter | Dong Nguyen, Rilana Gravel, Dolf Trieschnigg, and Theo Meder | link | 2013 | ICWSM, pages 439–448 |
| Do word embeddings capture spelling variation? | Dong Nguyen and Jack Grieve | link | 2020 | COLING, pages 870–881 |
| We Need to Measure Data Diversity in NLP – Better and Broader | Dong Nguyen and Esther Ploeger | link | 2025 | arXiv preprint arXiv:2505.20264 |
| On learning and representing social meaning in NLP: a sociolinguistic perspective | Dong Nguyen, Laura Rosseel, and Jack Grieve | link | 2021 | NAACL, pages 603–612 |
| The Multi-Dimensional Analysis Tagger | Andrea Nini | link | 2019 | Multi-Dimensional Analysis: Research Methods and Current Issues |
| A Theory of Linguistic Individuality for Authorship Analysis | Andrea Nini | link | 2023 | Cambridge University Press |
| A study of style in machine translation: Controlling the formality of machine translation output | Xing Niu, Marianna Martindale, and Marine Carpuat | link | 2017 | EMNLP, pages 2814–2819 |
| Multi-task neural models for translating between styles within and across languages | Xing Niu, Sudha Rao, and Marine Carpuat | link | 2018 | COLING, pages 1008–1021 |
| 2 OLMo 2 Furious | Team OLMo, Pete Walsh, Luca Soldaini, et al. | link | 2025 | arXiv preprint arXiv:2501.00656 |
| Linguistic style and crowdfunding success among social and commercial entrepreneurs | Annaleena Parhankangas and Maija Renko | link | 2017 | Journal of Business Venturing, 32(2):215–236 |
| Learning interpretable style embeddings via prompting LLMs | Ajay Patel, Delip Rao, Ansh Kothary, Kathleen McKeown, and Chris Callison-Burch | link | 2023 | Findings of EMNLP 2023, pages 15270–15290 |
| StyleDistance: Stronger content-independent style embeddings with synthetic parallel examples | Ajay Patel, Jiacheng Zhu, Justin Qiu, et al. | link | 2025 | NAACL, pages 8662–8685 |
| Language independent authorship attribution using character level language models | Fuchun Peng, Dale Schuurmans, Shaojun Wang, and Vlado Keselj | link | 2003 | EACL, pages 267–274 |
| The Development and Psychometric Properties of LIWC2015 | James W. Pennebaker, Ryan L. Boyd, Kayla Jordan, and Kate Blackburn | link | 2015 | University of Texas at Austin |
| JSAN–The Integrated JStylo and Anonymouth Package | Drexel University PSAL | link | 2013 | Drexel University |
| Mind the style of text! Adversarial and backdoor attacks based on text style transfer | Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, and Maosong Sun | link | 2021 | EMNLP, pages 4569–4580 |
| mStyleDistance: Multilingual style embeddings and their evaluation | Justin Qiu, Jiacheng Zhu, Ajay Patel, Marianna Apidianaki, and Chris Callison-Burch | link | 2025 | Findings of ACL 2025, pages 16917–16931 |
| Personalized machine translation: Preserving original author traits | Ella Rabinovich, Raj Nath Patel, Shachar Mirkin, Lucia Specia, and Shuly Wintner | link | 2017 | EACL, pages 1074–1084 |
| Overview of the author profiling task at PAN 2013 | Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, and Giacomo Inches | link | 2013 | CLEF |
| Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer | Sudha Rao and Joel Tetreault | link | 2018 | NAACL, pages 129–140 |
| A recipe for arbitrary text style transfer with large language models | Emily Reif, Daphne Ippolito, Ann Yuan, Andy Coenen, Chris Callison-Burch, and Jason Wei | link | 2022 | ACL, pages 837–848 |
| Addressee- and topic-influenced style shift: A quantitative sociolinguistic study | John R. Rickford and Faye McNair-Knox | link | 1994 | Sociolinguistic Perspectives on Register, pages 235–276 |
| Few-shot detection of machine-generated text using style representations | Rafael Rivera Soto, Kailin Koch, Aleem Khan, Barry Chen, Marcus Bishop, and Nicholas Andrews | link | 2024 | ICLR |
| Learning universal authorship representations | Rafael Rivera Soto, Olivia Elizabeth Miano, Juanita Ordonez, et al. | link | 2021 | EMNLP, pages 913–919 |
| My LLM might Mimic AAE - But When Should It? | Sandra Camille Sandoval, Christabel Acquaye, Kwesi Adu Cobbina, Mohammad Nayeem Teli, and Hal Daumé III | link | 2025 | NAACL, pages 5277–5302 |
| Topic-regularized authorship representation learning | Jitkapat Sawatphol, Nonthakit Chaiwong, Can Udomcharoenchaikit, and Sarana Nutanong | link | 2022 | EMNLP, pages 1076–1082 |
| Addressing Topic Leakage in Cross-Topic Evaluation for Authorship Verification | Jitkapat Sawatphol, Can Udomcharoenchaikit, and Sarana Nutanong | link | 2024 | TACL, 12:1363–1377 |
| MATCHED: Multimodal Authorship-Attribution To Combat Human Trafficking in Escort-Advertisement Data | Vageesh Kumar Saxena, Benjamin Ashpole, Gijs Van Dijck, and Gerasimos Spanakis | link | 2025 | Findings of ACL 2025, pages 4334–4373 |
| Frequent-words analysis for forensic speaker comparison | Eleni-Konstantina Sergidou, Nelleke Scheijen, Jeannette Leegwater, Tina Cambier-Langeveld, and Wauter Bosma | link | 2023 | Speech Communication, 150:1–8 |
| The power of words: Driving online consumer engagement in Fintech | R.V. ShabbirHusain, Atul Arun Pathak, Shabana Chandrasekaran, and Balamurugan Annamalai | link | 2023 | International Journal of Bank Marketing, 42(2):331–355 |
| Style transfer from non-parallel text by cross-alignment | Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola | link | 2017 | NIPS'17, pages 6833–6844 |
| Does string-based neural MT learn source syntax? | Xing Shi, Inkit Padhi, and Kevin Knight | link | 2016 | EMNLP, pages 1526–1534 |
| Personalized author obfuscation with large language models | Mohammad Shokri, Sarah Ita Levitan, and Rivka Levitan | link | 2025 | arXiv preprint arXiv:2505.12090 |
| A survey of modern authorship attribution methods | Efstathios Stamatatos | link | 2009 | JASIST, 60(3):538–556 |
| Masking topic-related information to enhance authorship attribution | Efstathios Stamatatos | link | 2017 | JASIST, 69(3):461–473 |
| Multi-label style change detection by solving a binary classification problem | Eivind Strøm | link | 2021 | CLEF 2021, pages 2146–2157 |
| Dialect-robust evaluation of generated text | Jiao Sun, Thibault Sellam, Elizabeth Clark, et al. | link | 2023 | ACL, pages 6010–6028 |
| Idiosyncrasies in large language models | Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, and Zhuang Liu | link | 2025 | arXiv:2502.12150 |
| Unsupervised neural text simplification | Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, and Karthik Sankaranarayanan | link | 2019 | ACL, pages 2058–2068 |
| What do you learn from context? Probing for sentence structure in contextualized word representations | Ian Tenney, Patrick Xia, Berlin Chen, et al. | link | 2018 | - |
| Writing Style Author Embedding Evaluation | Enzo Terreau, Antoine Gourru, and Julien Velcin | link | 2021 | Evaluation and Comparison of NLP Systems Workshop, pages 84–93 |
| Stayal \| multilingual style transfer | Karishma Thakrar, Katrina Lawrence, and Kyle Howard | link | 2025 | arXiv:2501.11639 |
| Reddust: A large reusable dataset of reddit user traits | Anna Tigunova, Paramita Mirza, Andrew Yates, and Gerhard Weikum | link | 2020 | LREC, pages 6118–6126 |
| HANSEN: Human and AI spoken text benchmark for authorship analysis | Nafis Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca Giannotti, and Dongwon Lee | link | 2023 | Findings of EMNLP 2023, pages 13706–13724 |
| Research Methods: The Essential Knowledge Base | William M. K. Trochim, James P. Donnelly, and Kanika Arora | link | 2015 | Cengage Learning |
| Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles | Kimberly Le Truong, Riccardo Fogliato, Hoda Heidari, and Zhiwei Steven Wu | link | 2025 | arXiv preprint arXiv:2507.22168 |
| Authorship attribution for neural text generation | Adaku Uchendu, Thai Le, Kai Shu, and Dongwon Lee | link | 2020 | EMNLP, pages 8384–8395 |
| Paraphrase types elicit prompt engineering capabilities | Jan Philip Wahle, Terry Ruas, Yang Xu, and Bela Gipp | link | 2024 | EMNLP, pages 11004–11033 |
| Can authorship representation learning capture stylistic features? | Andrew Wang, Cristina Aggazzotti, Rebecca Kotula, Rafael Rivera Soto, Marcus Bishop, and Nicholas Andrews | link | 2023 | TACL, 11:1416–1431 |
| Feature vector difference based neural network and logistic regression models for authorship verification | Janith Weerasinghe and Rachel Greenstadt | link | 2020 | PAN at CLEF 2020, 2695 |
| Does it capture STEL? A modular, similarity-based linguistic style evaluation framework | Anna Wegmann and Dong Nguyen | link | 2021 | EMNLP, pages 7109–7130 |
| Tokenization is sensitive to language variation | Anna Wegmann, Dong Nguyen, and David Jurgens | link | 2025 | Findings of ACL 2025, pages 10958–10983 |
| Same Author or Just Same Topic? Towards Content-Independent Style Representations | Anna Wegmann, Marijn Schraagen, and Dong Nguyen | link | 2022 | RepL4NLP Workshop, pages 249–268 |
| Constraints on the agentless passive | E. Judith Weiner and William Labov | link | 1983 | Journal of Linguistics, 19(1):29–58 |
| Disentangling style factors from speaker representations | Jennifer Williams and Simon King | link | 2019 | Interspeech, pages 3945–3949 |
| Style over substance: Evaluation biases for large language models | Minghao Wu and Alham Fikri Aji | link | 2025 | COLING, pages 297–312 |
| Out-of-distribution generalization in natural language processing: Past, present, and future | Linyi Yang, Yaoxian Song, Xuan Ren, et al. | link | 2023 | EMNLP, pages 4533–4559 |
| A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models | Hanqing Zhang, Haolin Song, Shaoyu Li, Ming Zhou, and Dawei Song | link | 2023 | ACM Computing Surveys, 56(3):64:1–64:37 |
| Personalized Text Generation with Contrastive Activation Steering | Jinghao Zhang, Yuting Liu, Wenjie Wang, et al. | link | 2025 | ACL, pages 7128–7141 |
| How Well Do Text Embedding Models Understand Syntax? | Yan Zhang, Zhaopeng Feng, Zhiyang Teng, Zuozhu Liu, and Haizhou Li | link | 2023 | Findings of EMNLP 2023, pages 9717–9728 |
| Personalization of Large Language Models: A Survey | Zhehao Zhang, Ryan A. Rossi, Branislav Kveton, et al. | link | 2025 | Transactions on Machine Learning Research |
| Unmasking style sensitivity: A causal analysis of bias evaluation instability in large language models | Jiaxu Zhao, Meng Fang, Kun Zhang, and Mykola Pechenizkiy | link | 2025 | ACL, pages 16314–16338 |
| Disentangled sequence to sequence learning for compositional generalization | Hao Zheng and Mirella Lapata | link | 2022 | ACL, pages 4256–4268 |
| Idiosyncratic but not Arbitrary: Learning Idiolects in Online Registers Reveals Distinctive yet Consistent Individual Styles | Jian Zhu and David Jurgens | link | 2021 | EMNLP, pages 279–297 |
| StyleFlow: Disentangle latent representations via normalizing flow for unsupervised text style transfer | Kangchen Zhu, Zhiliang Tian, Jingyu Wei, et al. | link | 2024 | LREC-COLING 2024, pages 15384–15397 |
| Trans self-identification and the language of neoliberal selfhood: Agency, power, and the limits of monologic discourse | Lal Zimman | link | 2019 | IJSL, 2019(256):147–175 |
| An ensemble-rich multi-aspect approach for robust style change detection | Dimitrina Zlatkova, Daniel Kopev, Kristiyan Mitov, et al. | link | 2018 | PAN at CLEF-2018 |
| Style change detection with feed-forward neural networks | Chaoyuan Zuo, Yu Zhao, and Ritwik Banerjee | link | 2019 | PAN at CLEF 2019, 93 |