Instructions to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with llama-cpp-python:

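```python
# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Llama.from_pretrained downloads the GGUF file from the Hugging Face Hub
# (it needs the huggingface-hub package installed).
llm = Llama.from_pretrained(
    repo_id="HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced",
    filename="Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-IQ2_M.gguf",
    n_ctx=4096,  # llama-cpp-python defaults to a 512-token context window
)

# Image inputs ("image_url" content parts) need a multimodal chat_handler and the
# model's vision projector (mmproj) file; this minimal setup loads only the text
# model, so it sends a plain text prompt.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "Describe the Statue of Liberty in one sentence.",
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```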

Notebooks
Google Colab
Kaggle

llama.cpp

How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with llama.cpp:

Install from Homebrew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
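Once `llama-server` is running (via any of the install routes above), any HTTP client can talk to its OpenAI-compatible API. The sketch below uses only the Python standard library. It assumes the server's default address `http://localhost:8080/v1` (the same base URL the Pi and Hermes configurations further down use), and the helper names `build_chat_payload` and `chat` are hypothetical, not part of llama.cpp.

```python
import json
import urllib.request

# Assumed default llama-server address (matches the Pi/Hermes configs on this page).
BASE_URL = "http://localhost:8080/v1"
# Model name sent in the request; a single-model llama-server is assumed to
# answer with whatever model it was started with.
MODEL = "HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M"


def build_chat_payload(prompt: str, model: str = MODEL, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style /chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: a 26B model on local hardware can be slow to answer.
    with urllib.request.urlopen(request, timeout=300) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]
```

Usage, with the server up: `print(chat("Explain what a GGUF file is in one sentence."))`. Keeping the payload builder separate from the network call makes the request body easy to inspect or test without a running server.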

Use Docker

docker model run hf.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

LM Studio
Jan

vLLM

How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
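The curl example fetches the image from a public URL. To send a local file instead, OpenAI-compatible servers such as vLLM generally also accept a base64 `data:` URL in the `image_url` field. This is a minimal standard-library sketch assuming the vLLM server above is listening on its default `http://localhost:8000/v1` and that the served model accepts image input, as the curl example implies; `image_to_data_url`, `build_vision_payload`, and `describe_local_image` are hypothetical helper names.

```python
import base64
import json
import mimetypes
import urllib.request

VLLM_BASE_URL = "http://localhost:8000/v1"  # default vLLM port, as in the curl example
MODEL = "HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced"


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data: URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_payload(prompt: str, image_url: str, model: str = MODEL) -> dict:
    """Same message shape as the curl example: one text part plus one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def describe_local_image(path: str, prompt: str = "Describe this image in one sentence.",
                         base_url: str = VLLM_BASE_URL) -> str:
    """Read a local image, send it inline, and return the model's reply."""
    mime_type = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        data_url = image_to_data_url(f.read(), mime_type)
    body = json.dumps(build_vision_payload(prompt, data_url)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)["choices"][0]["message"]["content"]
```

Usage: `print(describe_local_image("photo.jpg"))`, where the file name is a placeholder.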

Use Docker

docker model run hf.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

Ollama
How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with Ollama:
```
ollama run hf.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
```
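`ollama run` opens an interactive prompt; Ollama also serves a local REST API (by default on port 11434) that scripts can call. Below is a minimal standard-library sketch against its `/api/chat` endpoint, assuming the model tag from the command above has already been pulled; the helper names are hypothetical. Setting `"stream": False` asks for a single JSON response instead of a stream of chunks.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local API address
MODEL = "hf.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M"


def build_ollama_payload(prompt: str, model: str = MODEL) -> dict:
    """Request body for Ollama's /api/chat with streaming turned off."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def ollama_chat(prompt: str, url: str = OLLAMA_CHAT_URL) -> str:
    """Send one message and return the assistant's reply text."""
    body = json.dumps(build_ollama_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        # Non-streaming responses carry the reply under "message".
        return json.load(response)["message"]["content"]
```

Usage, with Ollama running: `print(ollama_chat("Say hello in five words."))`.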

Unsloth Studio

How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced to start chatting

Use Hugging Face Spaces for Unsloth Studio

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced to start chatting

Pi

How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

Configure the model in Pi

```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the provider to ~/.pi/agent/models.json:

```
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M"
        }
      ]
    }
  }
}
```
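If `~/.pi/agent/models.json` already defines other providers, pasting the snippet over the file would drop them. The sketch below merges the `llama-cpp` entry into whatever is already there. It assumes the file is plain JSON with a top-level `"providers"` object as shown above; `merge_provider` and `update_models_file` are hypothetical helpers, not part of Pi.

```python
import json
from pathlib import Path
from typing import Optional

# The provider entry from the snippet above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [
        {"id": "HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M"}
    ],
}


def merge_provider(config: dict, name: str = "llama-cpp",
                   provider: dict = LLAMA_CPP_PROVIDER) -> dict:
    """Return a copy of config with providers[name] set; other providers are kept."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider
    merged["providers"] = providers
    return merged


def update_models_file(path: Optional[Path] = None) -> None:
    """Read models.json (if present), merge the entry in, and write it back."""
    if path is None:
        path = Path.home() / ".pi" / "agent" / "models.json"
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(config), indent=2) + "\n", encoding="utf-8")
```

Call `update_models_file()` once after installing Pi: an existing `llama-cpp` entry is replaced, and other providers and top-level keys are left untouched.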

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
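Before starting Hermes, it can help to confirm that the server at `model.base_url` is reachable and to see which model name it reports. Below is a small sketch using the OpenAI-style `GET /v1/models` endpoint that llama-server exposes. The helper names are hypothetical, and this page does not say whether Hermes requires `model.default` to match the reported ID exactly.

```python
import json
import urllib.request

HERMES_BASE_URL = "http://127.0.0.1:8080/v1"  # the model.base_url set above


def model_ids(models_response: dict) -> list[str]:
    """Pull the "id" fields out of an OpenAI-style GET /models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str = HERMES_BASE_URL) -> list[str]:
    """Query the running server; raises urllib.error.URLError if it is not reachable."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as response:
        return model_ids(json.load(response))
```

Usage, with the server running: `print(fetch_model_ids())`.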

Run Hermes

hermes

Docker Model Runner
How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with Docker Model Runner:
```
docker model run hf.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M
```

Lemonade

How to use HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced:Q4_K_M

Run and chat with the model

lemonade run user.Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced-Q4_K_M

List all available models

lemonade list

One LLM to rule them all (PERFECT!)

by Pink-Elephant - opened 18 days ago

Discussion

Pink-Elephant

18 days ago

I have been playing with this model for several days now. At least for my use cases, which are general stuff and NSFW storytelling/roleplay, I find that I literally don't need any other models anymore. First off, Gemma 4 is just amazing. Gemma 3 was very good, but this one is head and shoulders above it. And once again, HauHau's technique to uncensor models is perfect. It doesn't require any custom prompting and just works. The output doesn't feel toned down either. The "Abliterated" techniques can't come close. And just as HauHau suggests, the "Balanced" release responds perfectly to my use cases.

As for the NSFW content, I've never seen any other model come close to how good this one is. Qwen can't touch it, nor can anything else. It almost always just "gets" what you want, and at most just needs a little shove in the right direction sometimes. It doesn't flood the output with adverbs/adjectives, but rather creates truly good responses that will elaborate on what you want, creatively, and without going off on some weird tangent. It can be as creative, or as dirty, as you want it to be. Also, it can progress a story without constant micro-direction. I've never seen anything this good before.

"Thinking" mode isn't on by default, but it isn't generally needed either. You can easily enable it, however. In LM Studio, go to the "Inference" tab of the model's settings, and under "Reasoning Parsing", enter "<|channel>thought" (no quotes) for "Start String" and "<channel|>" for "End String". Then add the line "{%- set enable_thinking = true %}" to the top of the Jinja template.

Now you can easily toggle "Thinking" off or on by changing enable_thinking to false or true. You don't even need to reload the model after. This method doesn't add a toggle switch to the user interface as many people would like, but it's a proper and safe method that doesn't use any trickery.
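The two settings above amount to (1) a one-line edit at the top of the chat template and (2) splitting the raw output at the start and end strings so the reasoning can be shown apart from the answer. The Python sketch below illustrates both on plain strings. It is not LM Studio's code: `set_thinking` and `split_reasoning` are hypothetical helpers, and it assumes, as the post implies, that the model's Jinja template reads an `enable_thinking` variable.

```python
import re

# Start/end strings from the "Reasoning Parsing" settings described above.
START_STRING = "<|channel>thought"
END_STRING = "<channel|>"

# Matches an existing toggle line at the very start of a template.
_TOGGLE_LINE = re.compile(
    r"\A\{%-?\s*set\s+enable_thinking\s*=\s*(?:true|false)\s*-?%\}[ \t]*\n?",
    re.IGNORECASE,
)


def set_thinking(template: str, enabled: bool) -> str:
    """Put '{%- set enable_thinking = true/false %}' at the top of a template string."""
    line = "{%- set enable_thinking = " + ("true" if enabled else "false") + " %}\n"
    # Drop an existing toggle line (if any) so the template keeps exactly one.
    return line + _TOGGLE_LINE.sub("", template, count=1)


def split_reasoning(text: str, start: str = START_STRING,
                    end: str = END_STRING) -> tuple[str, str]:
    """Split raw model output into (thinking, answer).

    With no start string, everything is the answer. If the end string is
    missing (e.g. generation stopped mid-thought), the rest counts as thinking.
    """
    s = text.find(start)
    if s == -1:
        return "", text.strip()
    body_start = s + len(start)
    e = text.find(end, body_start)
    if e == -1:
        return text[body_start:].strip(), text[:s].strip()
    thinking = text[body_start:e].strip()
    answer = (text[:s] + text[e + len(end):]).strip()
    return thinking, answer
```

Flipping `enabled` rewrites the same top line rather than stacking a second one, which mirrors changing `true` to `false` by hand.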

It's worth trying out "Thinking" mode with Gemma 4, as it's far better than it is with Qwen. Qwen will ramble on forever during thinking and sometimes gets stuck in an infinite loop. Gemma 4's thinking is sensible and doesn't take very long either.

Andyx1976

17 days ago

edited 17 days ago

I mean, its own model card details directly contradict the no-refusals claim. If you have to re-ask something, that is a refusal. So if that 0/465 refusal claim (which can NOT be compared with the usual benchmark that everyone else runs) isn't just copy-pasted into every HauhauCS model card, then that benchmark cannot be realistic. It makes no sense in combination with the details in the lower parts. ONE of the two must be wrong, by simple logic.

And of course there are none of the usual benchmarks to back up the various bold "100%"/"perfect"/"just as good as the original" claims.
Most of what you praise is just Gemma 4 in general.

I recently saw a deep dive and dissection of one of the HauhauCS Qwen models (27B), which made the same bold perfection claims. It (and the method used) was shown to be good in some aspects, in most really, but it's not magically "perfect" or "just like the original".

And I was looking for something like that because everything around the HauhauCS models' uncensoring AND benchmarking, if any, is quite secretive and, tbh, quite marketing-shouty without any of the regular benchmarks. If it is as perfect as claimed, why not just run them?

Pink-Elephant

17 days ago

I've never had to re-ask it anything, and I've tested with some pretty wild and intense stuff. He does state "edge-case prompts", so maybe I'm just not being extreme enough. As I said, it's been perfect for my use cases. I'm just providing some actual real world feedback, not any in-depth analysis.

He does say that an Aggressive variant is coming, but I worry it could be too accepting of given situations, without the checks and balances you'd expect in real-life responses. (I just don't know.) Since I love the behavior I've experienced with this Balanced variant, I personally don't expect anything better from the Aggressive variant.

And yes, most of my praise is for Gemma 4 in general, which is why I led off with "Gemma 4 is just amazing". However, I have found HauHau's uncensor technique to be extraordinarily good versus other techniques, as they often require custom prompting, fail to uncensor correctly, and/or have watered down responses. HauHau's just works. If he wants to keep his method secret that's his business.

I appreciate your views, and you certainly have valid points. However, most people just want a model that's good for their needs, and don't care about anything else. I've provided my use case experience, and I think many people will appreciate that over charts of numbers and technical data.
