Ollama-Forge for Security Research: Local Models, Refusal Ablation, and Reproducible Pipelines

Introduction

Security work often involves prompts and data you cannot send to commercial APIs: malware descriptions, exploit drafts, jailbreak and prompt-injection tests, or sensitive internal docs. Running models locally gives you control and keeps that data on your machine. ollama-forge is a CLI (PyPI · GitHub) that makes it straightforward to fetch open-weight models, convert them to Ollama, remove refusal behavior when you need it for defensive or red-team research, run security evals (e.g. ASR, refusal rate) against your local model, and lock down exact setups for reproducibility.

This post is aimed at security researchers: what the tool does and how you can use it in your workflow.

Why local + ollama-forge

No data leaves your box — Prompts and outputs stay on your machine; no vendor logging or ToS concerns for sensitive or “harmful” content.
Refusal ablation (abliterate) — Many base models refuse on security-relevant tasks (e.g. “explain this payload”, “rewrite as a test case”). Abliterate ablates a learned “refusal direction” in the weights so you can use the same model for analysis, red-team prompts, or tool-aided workflows without hitting constant refusals.
Reproducible setups — Recipes and plan let you record exact model source, quant, and parameters so experiments and tooling can be repeated.
Security evaluation — Run prompt sets against your model and get attack success rate (ASR), refusal rate, and per-category KPIs; optional Streamlit UI and SQLite history for trends.
One pipeline — Fetch from Hugging Face, convert local GGUF, add adapters, run refusal-ablated models, or evaluate security from a single CLI instead of scattered scripts.

Quick start and environment

Install: pip install ollama-forge or uv tool install ollama-forge (PyPI). From source: uv sync then uv run ollama-forge. Verify:

ollama-forge check

check reports Ollama, Hugging Face, optional deps, and llama.cpp. Use ollama-forge doctor for diagnosis; doctor --fix applies safe fixes (e.g. uv sync, optional setup-llama-cpp).

Easiest path to a running local model:

ollama-forge start --name my-model
ollama run my-model

Fetch and convert: open weights only on your machine

Fetch a GGUF from Hugging Face and create an Ollama model:

ollama-forge fetch TheBloke/Llama-2-7B-GGUF --name my-model
ollama run my-model

Use --quant Q4_K_M (or another type) when the repo has multiple GGUF files; use --revision for a specific branch or tag. That gives you a fixed artifact for experiments.

Convert a local GGUF (e.g. from an air-gapped or internal build):

ollama-forge convert --gguf /path/to/model.gguf --name my-model

Optional: --quantize Q4_K_M to requantize (requires llama.cpp’s quantize on PATH) so you can shrink models for lower-memory labs or VMs.

Abliterate: refusal removal for security-relevant prompts

Many instruction-tuned models refuse on prompts that look “harmful” (exploits, payloads, malware snippets, prompt-injection examples). For defensive research (e.g. classifying malware, generating test cases, analyzing CVE text) or red-team work (jailbreak analysis, prompt injection, safety evaluation), you often need the model to engage with that content instead of refusing.

Abliterate computes a “refusal direction” from pairs of harmful vs. harmless instructions and ablates it in the model weights—no extra training. You get a model that still follows instructions but is much less likely to refuse on security-relevant queries.

One-shot run (compute direction → apply → convert to GGUF → create Ollama model):

uv sync --extra abliterate
ollama-forge abliterate run --model google/gemma-3-4b-it --name gemma-abliterated

Artifacts are saved under ./abliterate-<name>/ by default. The pipeline uses full abliteration by default (all layers; use --skip-begin-layers 1 --skip-end-layers 1 only if you want to preserve the first/last layer for coherence).

Chat and serve (correct tokenization): To chat with the Hugging Face tokenizer (avoids GGUF tokenizer issues on models like Gemma 3):

ollama-forge abliterate chat --name gemma-abliterated

If you already have abliterate serve running with the same model, abliterate chat --name <name> will connect to it instead of loading the model again—one process, no duplicate memory. To serve the model for agents and the Ollama CLI:

ollama-forge abliterate serve --name gemma-abliterated --port 11435

Set OLLAMA_HOST=http://127.0.0.1:11435 and use ollama run gemma-abliterated; the serve implements the Ollama API (chat, generate, stream, tools, images, format, think, logprobs) and strips role labels and echoed user input from responses.

Softer ablation: Use --strength 0.5 or 0.7 on small models to reduce refusals while keeping coherence.
Custom lists: Pass --harmful and --harmless with your own instruction lists if your threat model or research questions differ from the defaults.
Output location: Use --output-dir DIR to keep all artifacts in a custom directory (e.g. for lab or audit trails).

Auto and plan: flexible sources and dry-runs

Auto detects the source and runs the right flow: recipe file, GGUF path, HF repo id, base model name, or adapter. Handy when you switch between “official” GGUF, internal builds, or adapter-tuned models:

ollama-forge auto ./recipe.yaml
ollama-forge auto TheBloke/Llama-2-7B-GGUF --name my-model
ollama-forge auto llama3.2 --name my-assistant --system "You are a security analyst."

Plan previews operations without executing (dry-run). Use it to document or review exactly which GGUF, quant, and Modelfile would be used:

ollama-forge plan quickstart --name my-model
ollama-forge plan auto TheBloke/Mistral-7B-GGUF --name mistral

That helps with reproducibility and with checking that sensitive or internal sources are not accidentally pulled from the wrong repo.

Recipes and build: one-file reproducible setups

Define model source, name, system prompt, and parameters in a single YAML or JSON file, then build:

ollama-forge build recipe.yaml

Recipes can reference Hugging Face GGUF repos, local GGUF paths, base models plus adapters, or adapter repos. For security work, that gives you a version-controlled, shareable definition of “the model we used for this experiment” without relying on long one-off command lines.

Adapters (LoRA): domain-specific security models

Use or train adapters for security-focused tasks (e.g. malware family labels, CVE summarization, policy classification) and run them locally:

ollama-forge adapters search "llama security"
ollama-forge adapters recommend --base llama3.2 --limit 5
ollama-forge fetch-adapter <repo_id> --base llama3.2 --name my-tuned

Or use a local adapter directory:

ollama-forge retrain --base llama3.2 --adapter ./my-lora --name my-tuned

Adapters are merged into a Modelfile so Ollama runs the tuned model like any other—no cloud, same control as base models.

Downsizing and Modelfile

Downsizing (distillation): Run a smaller student model for faster iteration or resource-constrained labs:

ollama-forge downsize --teacher meta-llama/Llama-3.1-8B --student TinyLlama/TinyLlama-1.1B-Chat-v1.0 --name my-downsized

Refresh-template: If you use abliterated or custom models with tool-calling or a specific Chat API, you can align the chat template with a reference model:

ollama-forge refresh-template --name my-abliterated --base google/gemma-3-4b-it --template-only

Useful when your security tooling expects a fixed message format or tool schema.

Security evaluation: prompt sets and KPIs

Run prompt sets against Ollama or abliterate serve and get attack success rate (ASR), refusal rate, and per-category breakdown—no extra infra, just your existing model endpoint:

ollama-forge security-eval run path/to/prompts.txt --model gemma-abliterated --output-csv results.csv

Use .txt (one prompt per line) or .jsonl with prompt, category, and optional context (for indirect prompt injection). Optionally run the same lists you use for abliterate (e.g. after abliterate download-lists --output-dir ./eval_lists, run security-eval run ./eval_lists/harmful.txt). Add --save-history to store runs in SQLite for trend plots.

Streamlit UI: Install uv sync --extra security-eval-ui, then ollama-forge security-eval ui to run evals from a browser, view tables and ASR-by-category charts, and see run history with ASR-over-time. See the Security Eval wiki for formats and options.

Summary for security researchers

Goal	Command / flow
Run a local model quickly	`start` or `quickstart`
Use any source (recipe, GGUF, HF, adapter)	`auto <source> --name <name>`
Preview actions without executing	`plan quickstart \| auto \| ...`
GGUF from Hugging Face	`fetch <repo> --name <name>`
Local GGUF (e.g. air-gapped)	`convert --gguf <path> --name <name>`
Refusal ablation for red-team/defensive work	`abliterate run --model <hf> --name <name>`; chat: `abliterate chat --name <name>`; serve: `abliterate serve --name <name>` (then `OLLAMA_HOST=... ollama run <name>`)
Evaluate model security (ASR, refusal rate)	`security-eval run <prompt_set> --model <name> --output-csv out.csv`; UI: `security-eval ui` (after `uv sync --extra security-eval-ui`)
Reproducible model definition	`build recipe.yaml`
Domain adapters (LoRA)	`adapters search`, `fetch-adapter`, `retrain`
Smaller/faster model	`downsize --teacher <hf> --student <hf> --name <name>`
Environment check/fix	`check`, `doctor [--fix]`

ollama-forge gives security researchers a single pipeline to run open-weight models locally, remove refusals where needed for analysis or red-team work, evaluate security with prompt sets and KPIs, and lock down setups for reproducibility. For full options and guides, see the wiki: Command Reference, Abliterate, Security Eval.