Quickstart
Installation
pip install diversify-text
Basic usage
from diversify_text import diversify
results = diversify("The experiment was conducted in a controlled lab setting.")
[{
"original": "The experiment was conducted in a controlled lab setting.",
"paraphrases": [
"the experiment was in a controlled lab setting so it didnt suck...",
"Well it was a controlled lab setting that the experiment was conducted in.",
"Did you know that the experiment was conducted in a controlled lab setting? It was a re-test.",
"I heard the experiment was conducted in a controlled lab setting.",
"I mean, this experiment was conducted in a controlled lab setting, so that was a good thing.",
]
}]
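The return value is a list with one dictionary per input text. A minimal sketch of consuming that structure (the result is mocked as a literal here so the snippet runs without the library installed):

```python
# Mocked result in the shape shown above: one dict per input text,
# with the original sentence and its generated paraphrases.
results = [{
    "original": "The experiment was conducted in a controlled lab setting.",
    "paraphrases": [
        "I heard the experiment was conducted in a controlled lab setting.",
    ],
}]

for entry in results:
    print(entry["original"])
    for paraphrase in entry["paraphrases"]:
        print("  -", paraphrase)
```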
Semantic filter
Enable the semantic filter to score each paraphrase with the Mutual Implication Score model and automatically select the best candidate above a minimum score:
results = diversify(
"The experiment was conducted in a controlled lab setting.",
semantic_filter=True,
)
[{
"original": "The experiment was conducted in a controlled lab setting.",
"paraphrases": [
"the experiment was in a controlled lab setting so it didnt suck...",
"Well it was a controlled lab setting that the experiment was conducted in.",
"Can you explain the experiment? It was conducted in a controlled lab setting.",
"I heard the experiment was conducted in a controlled lab setting.",
"I mean, this experiment was conducted in a controlled lab setting, so that was a good thing.",
]
}]
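Conceptually, a semantic filter scores every candidate against the original and discards those below a threshold. The sketch below illustrates that idea with a toy word-overlap scorer standing in for the real Mutual Implication Score model; the function names and threshold are illustrative, not part of the diversify_text API:

```python
def keep_best(original, candidates, score_fn, min_score=0.5):
    """Return candidates whose score against `original` is at least
    min_score, highest-scoring first. `score_fn` is a placeholder for
    a semantic similarity model."""
    scored = [(score_fn(original, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= min_score]
    return [c for _, c in sorted(scored, reverse=True)]

def overlap_score(a, b):
    # Toy score: Jaccard overlap of the word sets, in [0, 1].
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

kept = keep_best(
    "The experiment was conducted in a controlled lab setting.",
    ["The experiment ran in a controlled lab.", "Cats are great."],
    overlap_score,
    min_score=0.2,
)
# The unrelated sentence falls below the threshold and is dropped.
```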
Caching
The diversify() function automatically caches loaded models between calls.
The generation model and the semantic filter are cached independently, so
toggling semantic_filter does not reload the generation model and vice
versa. Call clear_cache() to release cached model references when you are done.
On CUDA devices, memory may remain reserved by the underlying framework’s caching
allocator and be reused in future calls rather than immediately returned to the OS/driver:
from diversify_text import clear_cache
clear_cache()
Using the class directly
You can also instantiate a Diversifier yourself for full control over the
model lifecycle:
from diversify_text import Diversifier

div = Diversifier(device="cuda", methods=["tinystyler"])

texts_1 = ["The experiment was conducted in a controlled lab setting."]
texts_2 = ["The results were analyzed by two independent raters."]

batch_1 = div.diversify(texts_1, n=5)
batch_2 = div.diversify(texts_2, n=5)
Prompting method
Besides the default TinyStyler method, you can use the prompting method, which generates paraphrases with a causal language model (default: SmolLM3-3B):
results = diversify(
"The experiment was conducted in a controlled lab setting.",
methods=["prompting"],
)
Select specific prompt styles via prompt_keys:
results = diversify(
"The experiment was conducted in a controlled lab setting.",
methods=["prompting"],
method_kwargs={
"prompting": {
"prompt_keys": ["simple_kew", "complex_kew", "caps_reif"]
}
},
)
See Prompt Templates for the full list of available prompt templates.
Citation
If you use diversify_text in your research, we would appreciate a citation (the entry below is currently a placeholder):
@inproceedings{wegmann2026diversify,
  title  = {diversify_text: An Amazing Library for Text Diversification},
  author = {Wegmann, Anna and Others},
  url    = {https://github.com/AnnaWegmann/diversify_text},
  year   = {2026},
}