mistral_common.tokens.tokenizers.utils

chunks(lst, chunk_size)

Chunk a list into smaller lists of a given size.

Parameters:

Name        Type       Description              Default
lst         List[str]  The list to chunk.       required
chunk_size  int        The size of each chunk.  required

Returns:

Type                 Description
Iterator[List[str]]  An iterator over the chunks.

Examples:

>>> all_chunks = list(chunks([1, 2, 3, 4, 5], 2))
Source code in src/mistral_common/tokens/tokenizers/utils.py
def chunks(lst: List[str], chunk_size: int) -> Iterator[List[str]]:
    r"""Chunk a list into smaller lists of a given size.

    Args:
        lst: The list to chunk.
        chunk_size: The size of each chunk.

    Returns:
        An iterator over the chunks.

    Examples:
        >>> all_chunks = list(chunks([1, 2, 3, 4, 5], 2))
    """
    for i in range(0, len(lst), chunk_size):
        yield lst[i : i + chunk_size]
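
A short usage sketch (not part of the library docs): although the annotation says List[str], the implementation slices any list, so the docstring's integer example works the same way.

>>> from mistral_common.tokens.tokenizers.utils import chunks
>>> list(chunks(["a", "b", "c", "d", "e"], 2))
[['a', 'b'], ['c', 'd'], ['e']]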

download_tokenizer_from_hf_hub(model_id, **kwargs)

Download the configuration file of an official Mistral tokenizer from the Hugging Face Hub.

See here for a list of our OSS models.

Note

You need to install the huggingface_hub package to use this method.

Please run pip install mistral-common[hf-hub] to install it.

Parameters:

Name      Type  Description                                                                Default
model_id  str   The Hugging Face model ID.                                                 required
kwargs    Any   Additional keyword arguments to pass to huggingface_hub.hf_hub_download.   {}

Returns:

Type  Description
str   The downloaded tokenizer local path for the given model ID.

Source code in src/mistral_common/tokens/tokenizers/utils.py
def download_tokenizer_from_hf_hub(model_id: str, **kwargs: Any) -> str:
    r"""Download the configuration file of an official Mistral tokenizer from the Hugging Face Hub.

    See [here](../../../../models.md#list-of-open-models) for a list of our OSS models.

    Note:
        You need to install the `huggingface_hub` package to use this method.

        Please run `pip install mistral-common[hf-hub]` to install it.

    Args:
        model_id: The Hugging Face model ID.
        kwargs: Additional keyword arguments to pass to `huggingface_hub.hf_hub_download`.

    Returns:
        The downloaded tokenizer local path for the given model ID.
    """
    if not _hub_installed:
        raise ImportError(
            "Please install the `huggingface_hub` package to use this method.\n"
            "Run `pip install mistral-common[hf-hub]` to install it."
        )

    if model_id not in MODEL_HF_ID_TO_TOKENIZER_FILE:
        raise ValueError(f"Unrecognized model ID: {model_id}")

    tokenizer_file = MODEL_HF_ID_TO_TOKENIZER_FILE[model_id]
    tokenizer_path = huggingface_hub.hf_hub_download(repo_id=model_id, filename=tokenizer_file, **kwargs)
    return tokenizer_path
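
A usage sketch under the note's assumptions: the hf-hub extra is installed and the model ID is one of the recognized official IDs (the ID below is only illustrative). Loading the downloaded file with MistralTokenizer.from_file is one possible follow-up, not part of this function.

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.tokenizers.utils import download_tokenizer_from_hf_hub

# Requires `pip install mistral-common[hf-hub]`.
# Illustrative model ID; it must be a key of MODEL_HF_ID_TO_TOKENIZER_FILE.
tokenizer_path = download_tokenizer_from_hf_hub("mistralai/Mistral-7B-Instruct-v0.3")
tokenizer = MistralTokenizer.from_file(tokenizer_path)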