[go: up one dir, main page]

Skip to content

Latest commit

 

History

History
151 lines (112 loc) · 3.8 KB

README.md

File metadata and controls

151 lines (112 loc) · 3.8 KB

mistral.rs PyO3 Bindings: mistralrs

mistralrs is a Python package which provides an API for mistral.rs. We build mistralrs with the maturin build manager.

Installation from PyPi

  1. Install Rust: https://rustup.rs/

    Example on Ubuntu:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
  2. mistralrs depends on the openssl library.

Example on Ubuntu:

sudo apt install libssl-dev
sudo apt install pkg-config
  1. Install it!
  • CUDA

    pip install mistralrs-cuda -v

  • Metal

    pip install mistralrs-metal -v

  • Apple Accelerate

    pip install mistralrs-accelerate -v

  • Intel MKL

    pip install mistralrs-mkl -v

  • Without accelerators

    pip install mistralrs -v

All installations will install the mistralrs package. The suffix on the package installed by pip only controls the feature activation.

Installation from source

  1. Install required packages

    • openssl (Example on Ubuntu: sudo apt install libssl-dev)
    • pkg-config (Example on Ubuntu: sudo apt install pkg-config)
  2. Install Rust: https://rustup.rs/

    Example on Ubuntu:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
  3. Set HF token correctly (skip if already set or your model is not gated, or if you want to use the token_source parameters in Python or the command line.)

    Example on Ubuntu:

    mkdir ~/.cache/huggingface
    touch ~/.cache/huggingface/token
    echo <HF_TOKEN_HERE> > ~/.cache/huggingface/token
  4. Download the code

    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs
  5. cd into the correct directory for building mistralrs: cd mistralrs-pyo3

  6. Install maturin, our Rust + Python build system: Maturin requires a Python virtual environment such as venv or conda to be active. The mistralrs package will be installed into that environment.

    pip install maturin[patchelf]
    
  7. Install mistralrs Install mistralrs by executing the following in this directory where features such as cuda or flash-attn may be specified with the --features argument just like they would be for cargo run.

    The base build command is:

    maturin develop -r
    • To build for CUDA:
    maturin develop -r --features cuda
    • To build for CUDA with flash attention:
    maturin develop -r --features "cuda flash-attn"
    • To build for Metal:
    maturin develop -r --features metal
    • To build for Accelerate:
    maturin develop -r --features accelerate
    • To build for MKL:
    maturin develop -r --features mkl

Please find API docs here and the type stubs here, which are another great form of documentation.

We also provide a cookbook here!

Example

from mistralrs import Runner, Which, ChatCompletionRequest

runner = Runner(
    which=Which.GGUF(
        tok_model_id="mistralai/Mistral-7B-Instruct-v0.1",
        quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        tokenizer_json=None,
        repeat_last_n=64,
    )
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)