When using a local LLM, OpenHands may have limited functionality. It is highly recommended that you use GPUs to serve local models for an optimal experience.
This guide explains how to serve a local Devstral LLM using LM Studio and have OpenHands connect to it.
We recommend running Devstral on a recent GPU with at least 16GB of VRAM, or on a Mac with Apple Silicon (M1, M2, etc.) with at least 32GB of RAM.
Download and install the LM Studio desktop app from lmstudio.ai.
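Inside LM Studio, download the Devstral Small model and start the local server; by default it listens on port 1234, which is the port OpenHands will connect to below. As a quick optional sanity check, you can ask the server which models it exposes. This is a minimal sketch, assuming the default port and that LM Studio's OpenAI-compatible server is running:
# List the models LM Studio is currently serving
curl http://localhost:1234/v1/models
# The response should include Devstral's Model API identifier, e.g. "mistralai/devstral-small-2505"
Next, pull the OpenHands runtime image and start OpenHands with Docker: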
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.49-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.49-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.49
Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.49
Starting OpenHands...
Running OpenHands as root
14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
INFO: Started server process [8]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
Open http://localhost:3000 in your browser. Once OpenHands loads, you'll need to configure it to use the local LLM server you just started.
When started for the first time, OpenHands will prompt you to set up the LLM provider.
Enable the “Advanced” switch at the top of the page to show all the available settings.
Set the following values:
- Custom Model: openai/mistralai/devstral-small-2505 (the Model API identifier from LM Studio, prefixed with "openai/")
- Base URL: http://host.docker.internal:1234/v1
- API Key: local-llm
Click “Save Settings” to save the configuration.
That’s it! You can now start using OpenHands with the local LLM server.
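If OpenHands cannot reach the model after saving, you can test the same endpoint directly from your host. This is a minimal sketch, assuming the default LM Studio port and the model identifier shown above; note that the raw request uses the identifier without the openai/ prefix, which is only needed inside OpenHands:
# Send a minimal chat completion request to LM Studio
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistralai/devstral-small-2505", "messages": [{"role": "user", "content": "Hello"}]}'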
If you encounter any issues, let us know on Slack or Discord.
This section describes how to run local LLMs with OpenHands using alternative backends such as Ollama, SGLang, or vLLM, without relying on LM Studio. To serve the model with Ollama, start the server with a large context window and pull Devstral:
# ⚠️ WARNING: OpenHands requires a large context size to work properly.
# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 32768.
# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly.
OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve &
ollama pull devstral:latest
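Before pointing OpenHands at Ollama, you can confirm that the model is available through Ollama's OpenAI-compatible API. A quick check, assuming the default port used above:
# List the models Ollama exposes on its OpenAI-compatible endpoint
curl http://localhost:11434/v1/models
# "devstral:latest" should appear in the response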
If you prefer SGLang or vLLM instead of Ollama, first download the model checkpoints. For Devstral Small 2505:
huggingface-cli download mistralai/Devstral-Small-2505 --local-dir mistralai/Devstral-Small-2505
Then launch an OpenAI-compatible server with SGLang:
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
--model mistralai/Devstral-Small-2505 \
--served-model-name Devstral-Small-2505 \
--port 8000 \
--tp 2 --dp 1 \
--host 0.0.0.0 \
--api-key mykey --context-length 131072
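Once the server is up, you can verify that it accepts the API key and reports the served model name. A quick check, assuming the port and key passed to the launch command above:
# The served model name should come back as "Devstral-Small-2505"
curl http://localhost:8000/v1/models -H "Authorization: Bearer mykey"
Alternatively, you can serve the model with vLLM: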
vllm serve mistralai/Devstral-Small-2505 \
--host 0.0.0.0 --port 8000 \
--api-key mykey \
--tensor-parallel-size 2 \
--served-model-name Devstral-Small-2505 \
--enable-prefix-caching
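As an end-to-end smoke test, you can send a single chat completion request to the server. This sketch works for both the SGLang and vLLM commands above, since both expose an OpenAI-compatible API on port 8000 with the key mykey:
# Minimal chat completion request against the local server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mykey" \
  -d '{"model": "Devstral-Small-2505", "messages": [{"role": "user", "content": "Say hello"}]}'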
If you are interested in further improved inference speed, you can also try Snowflake's version of vLLM, ArcticInference, which can achieve up to a 2x speedup in some cases. Install it with:
pip install git+https://github.com/snowflakedb/ArcticInference.git
Then serve the model with suffix speculative decoding enabled:
vllm serve mistralai/Devstral-Small-2505 \
--host 0.0.0.0 --port 8000 \
--api-key mykey \
--tensor-parallel-size 2 \
--served-model-name Devstral-Small-2505 \
--speculative-config '{"method": "suffix"}'
Next, start OpenHands. You can either:
- Run OpenHands using the official docker run command (the same one shown earlier in this guide), or
- Follow the instructions in Development.md to build OpenHands from source and start it with make run.
Once OpenHands is running, open the Settings page in the UI and go to the LLM tab.
Set the following values:
- Custom Model: openai/<served-model-name> (e.g., openai/devstral if you're using Ollama, or openai/Devstral-Small-2505 for SGLang or vLLM)
- Base URL: http://host.docker.internal:<port>/v1 (use port 11434 for Ollama, or 8000 for SGLang and vLLM)
- API Key: a dummy value such as local-llm if your server does not require one (as with Ollama), or the key you passed when launching the server (mykey for the SGLang and vLLM commands above)