Quickstart
Installation
pip install diversify-text
Basic usage
from diversify_text import diversify
results = diversify("The experiment was conducted in a controlled lab setting.")
[{
"original": "The experiment was conducted in a controlled lab setting.",
"paraphrases": [
"the experiment was in a controlled lab setting so it didnt suck...",
"Well it was a controlled lab setting that the experiment was conducted in.",
"Did you know that the experiment was conducted in a controlled lab setting? It was a re-test.",
"I heard the experiment was conducted in a controlled lab setting.",
"I mean, this experiment was conducted in a controlled lab setting, so that was a good thing.",
]
}]
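The return value is a list with one dictionary per input text. A minimal sketch of consuming that structure (the result is mocked as a literal here so the snippet runs without the library installed):

```python
# Mocked result in the shape shown above: one dict per input text,
# with the original sentence and its generated paraphrases.
results = [{
    "original": "The experiment was conducted in a controlled lab setting.",
    "paraphrases": [
        "I heard the experiment was conducted in a controlled lab setting.",
    ],
}]

for entry in results:
    print(entry["original"])
    for paraphrase in entry["paraphrases"]:
        print("  -", paraphrase)
```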
Semantic filter
Enable the semantic filter to score each paraphrase with the Mutual Implication Score model and automatically select the best candidate above a minimum score:
results = diversify(
"The experiment was conducted in a controlled lab setting.",
semantic_filter=True,
)
[{
"original": "The experiment was conducted in a controlled lab setting.",
"paraphrases": [
"the experiment was in a controlled lab setting so it didnt suck...",
"Well it was a controlled lab setting that the experiment was conducted in.",
"Can you explain the experiment? It was conducted in a controlled lab setting.",
"I heard the experiment was conducted in a controlled lab setting.",
"I mean, this experiment was conducted in a controlled lab setting, so that was a good thing.",
]
}]
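Conceptually, a semantic filter scores every candidate against the original and discards those below a threshold. The sketch below illustrates that idea with a toy word-overlap scorer standing in for the real Mutual Implication Score model; the function names and threshold are illustrative, not part of the diversify_text API:

```python
def keep_best(original, candidates, score_fn, min_score=0.5):
    """Return candidates whose score against `original` is at least
    min_score, highest-scoring first. `score_fn` is a placeholder for
    a semantic similarity model."""
    scored = [(score_fn(original, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s >= min_score]
    return [c for _, c in sorted(scored, reverse=True)]

def overlap_score(a, b):
    # Toy score: Jaccard overlap of the word sets, in [0, 1].
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

kept = keep_best(
    "The experiment was conducted in a controlled lab setting.",
    ["The experiment ran in a controlled lab.", "Cats are great."],
    overlap_score,
    min_score=0.2,
)
# The unrelated sentence falls below the threshold and is dropped.
```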
Caching
The diversify() function automatically caches loaded models between calls.
The generation model and the semantic filter are cached independently, so
toggling semantic_filter does not reload the generation model and vice
versa. Call clear_cache() to release cached model references when you are done.
On CUDA devices, memory may remain reserved by the underlying framework’s caching
allocator and be reused in future calls rather than immediately returned to the OS/driver:
from diversify_text import clear_cache
clear_cache()
Using the class directly
You can also instantiate a Diversifier yourself for full control over the
model lifecycle:
from diversify_text import Diversifier

div = Diversifier(device="cuda", methods=["tinystyler"])

texts_1 = ["The experiment was conducted in a controlled lab setting."]
texts_2 = ["The results were analyzed by two independent raters."]

batch_1 = div.diversify(texts_1, n=5)
batch_2 = div.diversify(texts_2, n=5)
Prompting method
Besides the default TinyStyler method, you can use the prompting method, which generates paraphrases with a causal language model (default: SmolLM3-3B):
results = diversify(
"The experiment was conducted in a controlled lab setting.",
methods=["prompting"],
)
Select specific prompt styles via prompt_keys:
results = diversify(
"The experiment was conducted in a controlled lab setting.",
methods=["prompting"],
method_kwargs={
"prompting": {
"prompt_keys": ["simple_kew", "complex_kew", "caps_reif"]
}
},
)
See Prompt Templates for the full list of available prompt templates.
Citation
If you use diversify_text in your research, we would appreciate a citation (the entry below is currently a placeholder):
@inproceedings{wegmann2026diversify,
  title  = {diversify_text: An Amazing Library for Text Diversification},
  author = {Wegmann, Anna and Others},
  url    = {https://github.com/AnnaWegmann/diversify_text},
  year   = {2026},
}