Quickstart
==========

Installation
------------

.. code-block:: bash

   pip install diversify-text

Basic usage
-----------

.. code-block:: python

   from diversify_text import diversify

   results = diversify("The experiment was conducted in a controlled lab setting.")

Example output:

.. code-block:: python

   [{
       "original": "The experiment was conducted in a controlled lab setting.",
       "paraphrases": [
           "the experiment was in a controlled lab setting so it didnt suck...",
           "Well it was a controlled lab setting that the experiment was conducted in.",
           "Did you know that the experiment was conducted in a controlled lab setting? It was a re-test.",
           "I heard the experiment was conducted in a controlled lab setting.",
           "I mean, this experiment was conducted in a controlled lab setting, so that was a good thing.",
       ]
   }]

Semantic filter
---------------

Enable the semantic filter to score each paraphrase with the `Mutual Implication Score`_ model and automatically select the best candidate above a minimum score:

.. code-block:: python

   results = diversify(
       "The experiment was conducted in a controlled lab setting.",
       semantic_filter=True,
   )

Example output:

.. code-block:: python

   [{
       "original": "The experiment was conducted in a controlled lab setting.",
       "paraphrases": [
           "the experiment was in a controlled lab setting so it didnt suck...",
           "Well it was a controlled lab setting that the experiment was conducted in.",
           "Can you explain the experiment? It was conducted in a controlled lab setting.",
           "I heard the experiment was conducted in a controlled lab setting.",
           "I mean, this experiment was conducted in a controlled lab setting, so that was a good thing.",
       ]
   }]

Caching
-------

The ``diversify()`` function automatically caches loaded models between calls. The generation model and the semantic filter are cached independently, so toggling ``semantic_filter`` does not reload the generation model, and vice versa. Call ``clear_cache()`` to release cached model references when you are done.
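The per-component caching described above can be sketched as follows. This is a minimal illustration of the pattern only, not the library's actual implementation; the ``get_component`` helper and the component names are hypothetical:

.. code-block:: python

   # Illustrative sketch of independent per-component caching:
   # each component is stored under its own key, so loading or
   # clearing one never affects the other.
   _cache = {}

   def get_component(name, loader):
       """Return the cached component, loading it on first use."""
       if name not in _cache:
           _cache[name] = loader()
       return _cache[name]

   def clear_cache():
       """Drop all cached component references."""
       _cache.clear()

   gen = get_component("generator", lambda: "generation-model")
   flt = get_component("semantic_filter", lambda: "filter-model")

   # A second request for the generator returns the cached object
   # instead of calling the loader again:
   assert get_component("generator", lambda: "reloaded") is gen

Because each component lives under its own cache key, requesting the semantic filter never forces the generation model to be reloaded.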
On CUDA devices, memory may remain reserved by the underlying framework's caching allocator and be reused by future calls rather than being returned to the OS/driver immediately:

.. code-block:: python

   from diversify_text import clear_cache

   clear_cache()

Using the class directly
------------------------

You can also instantiate a ``Diversifier`` yourself for full control over the model lifecycle:

.. code-block:: python

   from diversify_text import Diversifier

   div = Diversifier(device="cuda", methods=["tinystyler"])
   batch_1 = div.diversify(texts_1, n=5)
   batch_2 = div.diversify(texts_2, n=5)

Prompting method
----------------

Besides the default TinyStyler method, you can use the prompting method, which generates paraphrases with a causal language model (default: `SmolLM3-3B`_):

.. code-block:: python

   results = diversify(
       "The experiment was conducted in a controlled lab setting.",
       methods=["prompting"],
   )

Select specific prompt styles via ``prompt_keys``:

.. code-block:: python

   results = diversify(
       "The experiment was conducted in a controlled lab setting.",
       methods=["prompting"],
       method_kwargs={
           "prompting": {
               "prompt_keys": ["simple_kew", "complex_kew", "caps_reif"]
           }
       },
   )

See :doc:`prompts` for the full list of available prompt templates.

Citation
--------

If you use ``diversify_text`` in your research, we would appreciate a citation (the entry below is currently a placeholder):

.. code-block:: bibtex

   @inproceedings{wegmann2026diversify,
       title  = {diversify_text: An Amazing Library for Text Diversification},
       author = {Wegmann, Anna and Others},
       url    = {https://github.com/AnnaWegmann/diversify_text},
       year   = {2026},
   }