Models

HomeClaw supports cloud LLMs (OpenAI, Gemini, DeepSeek, etc., via LiteLLM) and local LLMs (GGUF models served by llama.cpp). You can use either or both: the main model and the embedding model are configured separately, so cloud and local models can be combined to balance capability and cost. Multimodal input (images, audio, video) works with both cloud models (e.g. Gemini, GPT-4o) and local vision models (e.g. Qwen2-VL with an mmproj file); both paths have been tested.


Cloud models

  • In config/core.yml, under cloud_models, add entries with id, path (the LiteLLM model name, e.g. openai/gpt-4o or gemini/gemini-2.5-flash), host, port, and api_key_name (e.g. OPENAI_API_KEY, GEMINI_API_KEY). A sketch of such an entry follows this list.
  • Provide the API key in either of two ways:
      • Environment variable (recommended): set the variable named by api_key_name where Core runs (e.g. export GEMINI_API_KEY=...). This keeps keys out of config files.
      • In core.yml: set api_key: "your-key" on the same cloud model entry (convenient for local testing). Prefer environment variables for production and shared repos.
  • Set main_llm or embedding_llm to cloud_models/<id>, e.g. cloud_models/OpenAI-GPT4o or cloud_models/Gemini-2.5-Flash.
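A minimal sketch of such an entry, assuming the keys listed above (the exact YAML layout may differ between HomeClaw versions; check config/core.yml in the repo, and treat the host/port values as placeholders):

```yaml
# config/core.yml -- illustrative cloud model entry (sketch)
cloud_models:
  - id: OpenAI-GPT4o
    path: openai/gpt-4o           # LiteLLM model name
    host: localhost               # placeholder
    port: 5051                    # placeholder
    api_key_name: OPENAI_API_KEY  # env var Core reads the key from
    # api_key: "sk-..."           # alternative: inline key; avoid in shared repos

main_llm: cloud_models/OpenAI-GPT4o
```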

Supported providers include OpenAI, Google Gemini, DeepSeek, Anthropic, Groq, Mistral, xAI, OpenRouter, and more. See LiteLLM docs.


Local models

  • Run GGUF models via a llama.cpp server. Place model files in a models/ directory (or the path set by model_path in config/core.yml).
  • In config/core.yml, under local_models, add entries with id, path (relative to model_path), host, and port. Set main_llm and/or embedding_llm to local_models/<id>; see the sketch after this list.
  • Copy llama.cpp's binary distribution into llama.cpp-master/<platform>/ for your device type (mac/, win_cuda/, linux_cpu/, etc.; see llama.cpp-master/README.md in the repo). The same binaries serve both the main and embedding local models. Then start a llama.cpp server for each model.
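A sketch of a local model entry, using the keys listed above (the id and filename are placeholders; the port must match the llama.cpp server you start for that model):

```yaml
# config/core.yml -- illustrative local model entry (sketch)
model_path: models/              # directory holding the GGUF files
local_models:
  - id: my_chat_model            # hypothetical id
    path: MyModel-Q4_K_M.gguf    # relative to model_path; placeholder filename
    host: localhost
    port: 5023                   # port of this model's llama.cpp server

main_llm: local_models/my_chat_model
```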

Use cloud and local together

You can use a cloud model for chat and a local model for embedding, or the other way around: set main_llm and embedding_llm to the appropriate cloud_models/<id> or local_models/<id>, as in the sketch below. Switch at runtime via the CLI (llm cloud for cloud, llm set for local), by editing config/core.yml and restarting Core, or from the Companion app (Manage Core → LLM).
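For example, a cloud main model paired with a local embedding model (ids taken from the tested configurations below):

```yaml
# config/core.yml -- cloud main model, local embedding model (sketch)
main_llm: cloud_models/Gemini-2.5-Flash
embedding_llm: local_models/embedding_text_model
```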


Multimodal (images, audio, video)

  • Cloud: Gemini, GPT-4o, and other providers support images (and often audio and video). Set main_llm to e.g. cloud_models/Gemini-2.5-Flash in config/core.yml.
  • Local: use a vision-capable model (e.g. Qwen2-VL, LLaVA) with an mmproj file, configured under local_models in config/core.yml. Set supported_media: [image] (or [image, audio, video] if the model supports them); see the sketch after this list.
  • The Companion app and WebChat can send images and files; Core converts them to the format the model expects (e.g. a data URL for vision APIs).
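A sketch of a vision-capable local entry, assuming the keys above (the id, filenames, and port are hypothetical; the mmproj file is the vision projector distributed alongside the model):

```yaml
# config/core.yml -- illustrative vision-capable local model (sketch)
local_models:
  - id: vision_model                             # hypothetical id
    path: Qwen2-VL-7B-Instruct-Q4_K_M.gguf       # placeholder filename
    mmproj: mmproj-Qwen2-VL-7B-Instruct-F16.gguf # placeholder filename
    host: localhost
    port: 5024                                   # placeholder
    supported_media: [image]                     # add audio/video only if the model supports them
```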

Tested configurations

These are example configurations we have tested. You can use them as a starting point or mix local and cloud.

Local models (tested)

| Use | Model | Notes |
| --- | --- | --- |
| Main (chat + vision) | Qwen3VL-4B (small 4B model with vision via mmproj) | local_models/main_vl_model_4B in core.yml; path: Qwen3VL-4B-Instruct-Q4_K_M.gguf, mmproj: mmproj-Qwen3VL-4B-Instruct-F16.gguf; port 5023. supported_media: [image] for Companion/WebChat images. |
| Embedding | Qwen3-Embedding-0.6B | local_models/embedding_text_model; path: Qwen3-Embedding-0.6B-Q8_0.gguf; port 5066. |
| Other local options | LLaVA 1.5 7B, Qwen3VL-8B, etc. | Add entries under local_models with path, optional mmproj, host, port, supported_media. See config/core.yml in the repo for more examples. |
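Expressed as config/core.yml entries, this tested setup corresponds roughly to the sketch below (assembled from the table above; the host values are assumptions):

```yaml
# config/core.yml -- tested local setup (sketch)
local_models:
  - id: main_vl_model_4B
    path: Qwen3VL-4B-Instruct-Q4_K_M.gguf
    mmproj: mmproj-Qwen3VL-4B-Instruct-F16.gguf
    host: localhost              # assumption
    port: 5023
    supported_media: [image]
  - id: embedding_text_model
    path: Qwen3-Embedding-0.6B-Q8_0.gguf
    host: localhost              # assumption
    port: 5066

main_llm: local_models/main_vl_model_4B
embedding_llm: local_models/embedding_text_model
```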

Cloud models (tested)

| Use | Model | API key |
| --- | --- | --- |
| Main (chat + vision) | Gemini 2.5 Flash | Set GEMINI_API_KEY in the environment, or api_key in the Gemini-2.5-Flash entry in config/core.yml. |
| Other cloud options | OpenAI GPT-4o, Anthropic Claude, DeepSeek, Groq, Mistral, xAI, OpenRouter, Cohere, Perplexity, Ollama (no key) | Each has an api_key_name (e.g. OPENAI_API_KEY). Set that env var or api_key in the model entry in core.yml. |
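The tested Gemini entry, sketched with the cloud model keys described earlier (the path is the LiteLLM model name; host and port are placeholders):

```yaml
# config/core.yml -- tested cloud setup (sketch)
cloud_models:
  - id: Gemini-2.5-Flash
    path: gemini/gemini-2.5-flash  # LiteLLM model name
    host: localhost                # placeholder
    port: 5050                     # placeholder
    api_key_name: GEMINI_API_KEY   # or set api_key inline (avoid in shared repos)

main_llm: cloud_models/Gemini-2.5-Flash
```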

Mix mode (tested)

  • Set main_llm_mode: mix so the router picks local (e.g. main_vl_model_4B) or cloud (e.g. Gemini-2.5-Flash) per request.
  • main_llm_local: local_models/main_vl_model_4B
  • main_llm_cloud: cloud_models/Gemini-2.5-Flash
  • embedding_llm: local_models/embedding_text_model (or a cloud embedding model if you prefer). These settings are combined in the sketch below.
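Put together (a sketch using exactly the settings listed above):

```yaml
# config/core.yml -- mix mode (sketch)
main_llm_mode: mix
main_llm_local: local_models/main_vl_model_4B
main_llm_cloud: cloud_models/Gemini-2.5-Flash
embedding_llm: local_models/embedding_text_model
```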

API keys for cloud models: set via environment variable (recommended) or api_key in config/core.yml per model.