Instructions to use vanta-research/mox-small-1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries
PEFT
How to use vanta-research/mox-small-1 with PEFT:
```
Task type is invalid.
```

How to use vanta-research/mox-small-1 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="vanta-research/mox-small-1",
	filename="mox-small-1-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use vanta-research/mox-small-1 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf vanta-research/mox-small-1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf vanta-research/mox-small-1:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf vanta-research/mox-small-1:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf vanta-research/mox-small-1:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf vanta-research/mox-small-1:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf vanta-research/mox-small-1:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf vanta-research/mox-small-1:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf vanta-research/mox-small-1:Q4_K_M

Use Docker

docker model run hf.co/vanta-research/mox-small-1:Q4_K_M

LM Studio
Jan
Ollama
How to use vanta-research/mox-small-1 with Ollama:
```
ollama run hf.co/vanta-research/mox-small-1:Q4_K_M
```

Unsloth Studio new

How to use vanta-research/mox-small-1 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for vanta-research/mox-small-1 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for vanta-research/mox-small-1 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for vanta-research/mox-small-1 to start chatting

Pi new

How to use vanta-research/mox-small-1 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf vanta-research/mox-small-1:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "vanta-research/mox-small-1:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use vanta-research/mox-small-1 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf vanta-research/mox-small-1:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default vanta-research/mox-small-1:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use vanta-research/mox-small-1 with Docker Model Runner:
```
docker model run hf.co/vanta-research/mox-small-1:Q4_K_M
```

Lemonade

How to use vanta-research/mox-small-1 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull vanta-research/mox-small-1:Q4_K_M

Run and chat with the model

lemonade run user.mox-small-1-Q4_K_M

List all available models

lemonade list

VANTA Research

Independent AI safety research lab specializing in cognitive fit, alignment, and human-AI collaboration

Mox-Small-1

A direct, opinionated AI assistant fine-tuned for authentic engagement and genuine helpfulness.

Mox-Small-1 is a persona-tuned language model developed by VANTA Research, built on the Olmo3.1 32B Instruct architecture. Like its sibling Mox-Tiny-1, this model prioritizes clarity, honesty, and usefulness over agreeableness, but with enhanced reasoning and depth thanks to its larger base.

Mox-Small-1 will:

Give direct opinions instead of hedging
Push back on flawed premises (respectfully but firmly)
Admit uncertainty transparently
Engage with genuine curiosity and humor

Key Characteristics

Trait	Description
Direct & Opinionated	Clear answers, no endless "on the other hand" equivocation
Constructively Disagreeable	Challenges weak arguments without being combative
Epistemically Calibrated	Distinguishes confident knowledge from uncertainty
Warm with Humor	Playful but professional, with levity where appropriate
Intellectually Curious	Dives deep into interesting questions

Training Data

Fine-tuned on ~18,000 curated conversations across 17 datasets, including:

Direct Opinions (~1k examples)
Constructive Disagreement (~1.6k examples)
Epistemic Confidence (~1.5k examples)
Humor & Levity (~1.5k examples)
Wonder & Puzzlement (~1.7k examples) (Same datasets as Mox-Tiny-1; identical persona/tone.)

Training Duration: ~3 days

Intended Use

Thinking partnership (complex problem-solving)
Honest feedback (direct opinions, not validation)
Technical discussions (programming, architecture, debugging)
Intellectual exploration (philosophy, science, open-ended questions)

Technical Details

Property	Value
Base Model	Olmo3.1 32B Instruct
Fine-tuning Method	QLoRA
Context Length	64K
Precision	BF16 (full), Q4_K_M (quantized)
License	Apache 2.0

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("vanta-research/mox-small-1")
tokenizer = AutoTokenizer.from_pretrained("vanta-research/mox-small-1")

Limitations

This model was finetuned on an English-only dataset. Personality traits may occasionally conflict, and base model limitations/biases apply (knowledge cutoff, potential hallucinations)

VANTA Research encourages developers to indepedently conclude production readiness prior to downstream deployment.

Citation

@misc{mox-small-1-2026,
  author = {VANTA Research},
  title = {Mox-Small-1: A Direct, Opinionated AI Assistant},
  year = {2026},
  publisher = {VANTA Research}
}