Methods

diversify-text uses a pluggable method architecture. Each method is a DiversificationMethod subclass that generates paraphrases using a different model or algorithm.

Overview

| Method | Model Size | Speed | Performance | Description |
| --- | --- | --- | --- | --- |
| tinystyler | ~800M params | TBD | TBD | Few-shot style transfer using authorship embeddings |
| prompting | ~1.7B params (default) | TBD | TBD | Prompt-based paraphrasing using a causal LM |

TinyStyler

TinyStyler is a T5-based model that performs few-shot text style transfer by conditioning on authorship-embedding representations.

Given a source text and a set of style example sentences, TinyStyler generates a paraphrase that preserves the content while shifting toward the demonstrated writing style. diversify-text cycles through different style groups from a configurable style bank to produce multiple stylistically diverse outputs.
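The cycling behaviour described above can be sketched as follows. The miniature style bank here is purely hypothetical (the real bank lives in DEFAULT_STYLE_BANK), and assign_styles is an illustrative helper, not the library's actual code:

```python
from itertools import cycle

# Hypothetical miniature style bank: each named style maps to a group
# of example sentences demonstrating that style.
style_bank = {
    "informal": ["gonna grab food lol", "omg that's wild"],
    "formal": ["I shall attend shortly.", "This matter is of considerable import."],
}

def assign_styles(n_outputs, bank):
    # Cycle through the style groups so each requested output is
    # conditioned on the next style in round-robin order.
    names = cycle(bank)
    return [next(names) for _ in range(n_outputs)]

print(assign_styles(3, style_bank))  # ['informal', 'formal', 'informal']
```

With more requested outputs than styles, styles repeat in order, so every output still receives a concrete set of style examples.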

Note

TinyStyler is based on CISR style embeddings, which have been shown to work well for social-media-like settings and formality transfer. The model may not perform as expected when reproducing other styles.

Default style bank. The built-in bank contains named styles drawn from the CORE corpus, the TinyStyler repository, and (for the formality dimension) the STEL demo. See diversify_text.method.tinystyler.styles.DEFAULT_STYLE_BANK for the full list of available styles.

Citation:

@inproceedings{horvitz-etal-2024-tinystyler,
    title = "{T}iny{S}tyler: Efficient Few-Shot Text Style Transfer with Authorship Embeddings",
    author = "Horvitz, Zachary  and
      Patel, Ajay  and
      Singh, Kanishk  and
      Callison-Burch, Chris  and
      McKeown, Kathleen  and
      Yu, Zhou",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.781",
    pages = "13376--13390",
}

Prompting

The prompting method generates paraphrases by sending input texts to a local HuggingFace causal language model together with a prompt template. The default model is SmolLM3-3B, chosen based on insights from The Synthetic Data Playbook.

from diversify_text import diversify

results = diversify("The cat sat on the mat.", methods=["prompting"])

Choosing a model. Any HuggingFace causal LM can be used. Pass the model identifier to the constructor:

from diversify_text import Diversifier
from diversify_text.method.prompting import PromptingMethod

method = PromptingMethod(model="mistralai/Mistral-7B-Instruct-v0.3")
results = Diversifier(methods=[method]).diversify("The cat sat on the mat.")

Instruct-tuned models are recommended. A chat template is applied automatically when the tokenizer provides one.

Note

Thinking/reasoning models (e.g. SmolLM3-3B) are detected automatically and have their thinking mode turned off (enable_thinking=False) during generation. Thinking tokens add overhead without improving paraphrase quality in this setting.

Inference backend. The method currently uses the transformers library for inference.

Note

vLLM support, batched inference, and streaming from large files are planned for a future release.

Default prompt bank. The built-in bank contains multiple prompt templates covering different rewriting styles (paraphrasing, simplification, dialogue, tables, and more). When no explicit selection is made, the templates listed in DEFAULT_PROMPTS are used. See Prompt Templates for the full list of available templates.

Customising the prompt bank. Like TinyStyler’s style bank, you can provide a custom prompt bank or select specific prompts via method_kwargs. Each prompt template must contain the placeholder [DOCUMENT SEGMENT]:

custom_bank = {
    "simple": "Rewrite the following text in simpler words: [DOCUMENT SEGMENT]",
    "formal": "Rewrite the following text in a formal academic tone: [DOCUMENT SEGMENT]",
}

results = diversify(
    "The cat sat on the mat.",
    methods=["prompting"],
    method_kwargs={"prompting": {"prompt_bank": custom_bank}},
)
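Internally, each input text presumably replaces the placeholder before the prompt is sent to the model. A minimal illustrative sketch (not the library's actual code; its validation may differ):

```python
def fill_prompt(template: str, text: str) -> str:
    # Reject templates missing the required placeholder.
    if "[DOCUMENT SEGMENT]" not in template:
        raise ValueError("template must contain [DOCUMENT SEGMENT]")
    return template.replace("[DOCUMENT SEGMENT]", text)

template = "Rewrite the following text in simpler words: [DOCUMENT SEGMENT]"
print(fill_prompt(template, "The cat sat on the mat."))
# Rewrite the following text in simpler words: The cat sat on the mat.
```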

You can also select specific prompts by key name:

results = diversify(
    "The cat sat on the mat.",
    methods=["prompting"],
    method_kwargs={"prompting": {"prompt_keys": ["wikipedia_paraphrase"]}},
)

Zero-shot humanize rewriting

The prompt bank includes humanize prompts based on Zhang et al. (2024) that rewrite machine-generated text to appear more human-written. These prompts instruct the model to introduce informal elements such as typos, slang, hashtags, and varied casing:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    methods=["prompting"],
    method_kwargs={"prompting": {"prompt_keys": ["humanize_llm-as-coauthor"]}},
)

A stricter variant, humanize_llm-as-coauthor_original, uses the original five modifications from the paper and explicitly forbids emojis.

Few-shot style transfer with prompting

The prompting method can also perform few-shot style transfer by combining style examples from the shared style bank with a few-shot prompt template. When styles is provided without explicit prompt_keys, the method automatically uses the style_transfer template from EXAMPLE_BASED_PROMPT_BANK:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    methods=["prompting"],
    method_kwargs={
        "prompting": {
            "styles": ["informal_tinystyler"],
        }
    },
)
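The few-shot prompt that reaches the model combines the style examples with the text to rewrite, roughly along these lines. This is illustrative only; the wording of the actual style_transfer template will differ:

```python
def build_style_prompt(examples, text):
    # Join the style examples into a few-shot block, then append the
    # text to rewrite.
    shots = "\n".join(f"- {e}" for e in examples)
    return (
        "Rewrite the text below in the style of these examples:\n"
        f"{shots}\n\n"
        f"Text: {text}"
    )

print(build_style_prompt(["gonna grab food lol"], "The experiment was conducted."))
```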

You can select a different few-shot template via prompt_keys. For example, humanize_transfer combines humanization instructions with the style examples:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    methods=["prompting"],
    method_kwargs={
        "prompting": {
            "styles": ["informal_tinystyler"],
            "prompt_keys": ["humanize_transfer"],
        }
    },
)

Development

To see the exact prompts sent to the model, enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)
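Root-level DEBUG logging can be noisy. Assuming the library logs under its package name (an assumption; adjust the name if it uses a different logger), you can scope the debug output:

```python
import logging

logging.basicConfig()  # attach a handler to the root logger
# "diversify_text" as the logger name is an assumption.
logging.getLogger("diversify_text").setLevel(logging.DEBUG)
```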

Adding a new method

See Creating a custom method in the Usage Guide for instructions on implementing your own DiversificationMethod.