README.md

mistral.rs PyO3 Bindings: `mistralrs`

mistralrs is a Python package which provides an API for mistral.rs. We build mistralrs with the maturin build manager.

Installation from PyPi

Install Rust: https://rustup.rs/

Example on Ubuntu:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

mistralrs depends on the openssl library.

Example on Ubuntu:

sudo apt install libssl-dev
sudo apt install pkg-config

Install it!

CUDA

pip install mistralrs-cuda -v
Metal

pip install mistralrs-metal -v
Apple Accelerate

pip install mistralrs-accelerate -v
Intel MKL

pip install mistralrs-mkl -v
Without accelerators

pip install mistralrs -v

All installations will install the mistralrs package. The suffix on the package installed by pip only controls the feature activation.

Installation from source

Install required packages
- openssl (Example on Ubuntu: sudo apt install libssl-dev)
- pkg-config (Example on Ubuntu: sudo apt install pkg-config)

Install Rust: https://rustup.rs/

Example on Ubuntu:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

Set HF token correctly (skip if already set or your model is not gated, or if you want to use the token_source parameters in Python or the command line.)

Example on Ubuntu:
```
mkdir ~/.cache/huggingface
touch ~/.cache/huggingface/token
echo <HF_TOKEN_HERE> > ~/.cache/huggingface/token
```

Download the code

git clone https://github.com/EricLBuehler/mistral.rs.git
cd mistral.rs

cd into the correct directory for building mistralrs: cd mistralrs-pyo3
Install maturin, our Rust + Python build system: Maturin requires a Python virtual environment such as venv or conda to be active. The mistralrs package will be installed into that environment.
```
pip install maturin[patchelf]
```
Install mistralrs Install mistralrs by executing the following in this directory where features such as cuda or flash-attn may be specified with the --features argument just like they would be for cargo run.

The base build command is:
```
maturin develop -r
```
- To build for CUDA:
```
maturin develop -r --features cuda
```
- To build for CUDA with flash attention:
```
maturin develop -r --features "cuda flash-attn"
```
- To build for Metal:
```
maturin develop -r --features metal
```
- To build for Accelerate:
```
maturin develop -r --features accelerate
```
- To build for MKL:
```
maturin develop -r --features mkl
```

Please find API docs here and the type stubs here, which are another great form of documentation.

We also provide a cookbook here!

Example

from mistralrs import Runner, Which, ChatCompletionRequest

runner = Runner(
    which=Which.GGUF(
        tok_model_id="mistralai/Mistral-7B-Instruct-v0.1",
        quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        tokenizer_json=None,
        repeat_last_n=64,
    )
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

mistral.rs PyO3 Bindings: `mistralrs`

Installation from PyPi

Installation from source

Example

Files

README.md

Latest commit

History

README.md

File metadata and controls

mistral.rs PyO3 Bindings: mistralrs

Installation from PyPi

Installation from source

Example

mistral.rs PyO3 Bindings: `mistralrs`