# Models
HomeClaw supports cloud LLMs (OpenAI, Gemini, DeepSeek, etc., via LiteLLM) and local LLMs (llama.cpp, GGUF). You can use one or both; the main model and the embedding model are configured separately, and cloud and local models can be combined to balance capability and cost. Multimodal input (images, audio, video) works with both cloud models (e.g. Gemini, GPT-4o) and local models (e.g. Qwen2-VL with mmproj); both have been tested and work well.
## Cloud models
- In `config/core.yml`, under `cloud_models`, add entries with `id`, `path` (the LiteLLM model name, e.g. `openai/gpt-4o`, `gemini/gemini-2.5-flash`), `host`, `port`, and `api_key_name` (e.g. `OPENAI_API_KEY`, `GEMINI_API_KEY`).
- API key: you can set the API key in either of two ways:
  - Environment variable (recommended): set the variable named by `api_key_name` where Core runs (e.g. `export GEMINI_API_KEY=...`). This keeps keys out of config files.
  - In `core.yml`: under the same cloud model entry, set `api_key: "your-key"` (e.g. for convenience or local testing). For production or shared repos, prefer environment variables.
- Set `main_llm` or `embedding_llm` to e.g. `cloud_models/OpenAI-GPT4o` or `cloud_models/Gemini-2.5-Flash`.
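Putting the steps above together, a `cloud_models` entry in `config/core.yml` might look like the sketch below. The exact schema (list vs. mapping, field order) should be checked against the `config/core.yml` shipped in the repo; the `host`/`port` values here are illustrative.

```yaml
cloud_models:
  - id: Gemini-2.5-Flash
    path: gemini/gemini-2.5-flash   # LiteLLM model name
    host: 127.0.0.1                 # illustrative
    port: 5010                      # illustrative
    api_key_name: GEMINI_API_KEY    # env var holding the key (recommended)
    # api_key: "your-key"          # alternative: inline key; avoid in shared repos

main_llm: cloud_models/Gemini-2.5-Flash
```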
Supported providers include OpenAI, Google Gemini, DeepSeek, Anthropic, Groq, Mistral, xAI, OpenRouter, and more; see the LiteLLM docs for the full list.
## Local models
- Run GGUF models via a llama.cpp server. Place model files in a `models/` directory (or the path set by `model_path` in `config/core.yml`).
- In `config/core.yml`, under `local_models`, add entries with `id`, `path` (relative to `model_path`), `host`, and `port`. Set `main_llm` and `embedding_llm` to e.g. `local_models/<id>`.
- Copy llama.cpp's binary distribution into `llama.cpp-master/<platform>/` for your device type (`mac/`, `win_cuda/`, `linux_cpu/`, etc.; see `llama.cpp-master/README.md` in the repo). These binaries are used for both the main and embedding local models. Then start the llama.cpp server(s) for each model.
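A `local_models` sketch for the embedding model, using the file name and port from the tested configurations below (the exact schema should be checked against the repo's `config/core.yml`; the `host` value is illustrative):

```yaml
model_path: models/                  # directory holding the GGUF files

local_models:
  - id: embedding_text_model
    path: Qwen3-Embedding-0.6B-Q8_0.gguf   # relative to model_path
    host: 127.0.0.1                        # illustrative
    port: 5066

embedding_llm: local_models/embedding_text_model
```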
## Use cloud and local together
You can use a cloud model for chat and a local model for embedding (or the other way around), combining cloud capability with local cost savings. Set `main_llm` and `embedding_llm` to the appropriate `cloud_models/<id>` or `local_models/<id>`. Switch at runtime via the CLI (`llm cloud` for cloud, `llm set` for local), by editing `config/core.yml` and restarting Core, or from the Companion app (Manage Core → LLM).
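For example, a split setup with a cloud main model and a local embedding model might look like this in `config/core.yml` (entry IDs are the ones used elsewhere in this page; adjust to your own entries):

```yaml
main_llm: cloud_models/Gemini-2.5-Flash            # cloud model for chat
embedding_llm: local_models/embedding_text_model   # local model for embedding
```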
## Multimodal (images, audio, video)
- Cloud: Gemini, GPT-4o, and other providers support images (and often audio/video). Set `main_llm` to e.g. `cloud_models/Gemini-2.5-Flash` in `config/core.yml`. Gemini has been tested and works well for multimodal.
- Local: use a vision-capable model (e.g. Qwen2-VL, LLaVA) with an mmproj file, configured in `config/core.yml` under `local_models`. Set `supported_media: [image]` (or `[image, audio, video]` if the model supports it).
- The Companion app and WebChat can send images and files; Core converts them to the format the model expects (e.g. a data URL for vision APIs).
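A vision-capable `local_models` entry, using the Qwen3VL-4B files and port listed in the tested configurations below (schema details should be checked against the repo's `config/core.yml`; `host` is illustrative):

```yaml
local_models:
  - id: main_vl_model_4B
    path: Qwen3VL-4B-Instruct-Q4_K_M.gguf          # the model itself
    mmproj: mmproj-Qwen3VL-4B-Instruct-F16.gguf    # vision projector
    host: 127.0.0.1                                # illustrative
    port: 5023
    supported_media: [image]   # or [image, audio, video] if the model supports it
```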
## Tested configurations
These are example configurations we have tested. You can use them as a starting point or mix local and cloud.
### Local models (tested)
| Use | Model | Notes |
|---|---|---|
| Main (chat + vision) | Qwen3VL-4B (small 4B model with vision via mmproj) | `local_models/main_vl_model_4B` in `core.yml`; `path: Qwen3VL-4B-Instruct-Q4_K_M.gguf`, `mmproj: mmproj-Qwen3VL-4B-Instruct-F16.gguf`; port 5023. `supported_media: [image]` for Companion/WebChat images. |
| Embedding | Qwen3-Embedding-0.6B | `local_models/embedding_text_model`; `path: Qwen3-Embedding-0.6B-Q8_0.gguf`; port 5066. |
| Other local options | LLaVA 1.5 7B, Qwen3VL-8B, etc. | Add entries under `local_models` with `path`, optional `mmproj`, `host`, `port`, `supported_media`. See `config/core.yml` in the repo for more examples. |
### Cloud models (tested)
| Use | Model | API key |
|---|---|---|
| Main (chat + vision) | Gemini 2.5 Flash | Set `GEMINI_API_KEY` in the environment, or `api_key` in the `Gemini-2.5-Flash` entry in `config/core.yml`. |
| Other cloud options | OpenAI GPT-4o, Anthropic Claude, DeepSeek, Groq, Mistral, xAI, OpenRouter, Cohere, Perplexity, Ollama (no key) | Each has an `api_key_name` (e.g. `OPENAI_API_KEY`). Set that env var or `api_key` in the model entry in `core.yml`. |
### Mix mode (tested)
- `main_llm_mode: mix`: the router picks local (e.g. `main_vl_model_4B`) or cloud (e.g. `Gemini-2.5-Flash`) per request.
- `main_llm_local: local_models/main_vl_model_4B`
- `main_llm_cloud: cloud_models/Gemini-2.5-Flash`
- `embedding_llm: local_models/embedding_text_model` (or a cloud embedding if you prefer).
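As a `config/core.yml` fragment, the tested mix-mode settings above would read (key names taken from the list above; verify against the repo's `config/core.yml`):

```yaml
main_llm_mode: mix                                  # router picks local or cloud per request
main_llm_local: local_models/main_vl_model_4B
main_llm_cloud: cloud_models/Gemini-2.5-Flash
embedding_llm: local_models/embedding_text_model    # or a cloud embedding
```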
API keys for cloud models: set via environment variable (recommended) or `api_key` in `config/core.yml` per model.