One LLM to rule them all (PERFECT!)
I have been playing with this model for several days now. At least for my use cases, which are general stuff and NSFW storytelling/roleplay, I find that I literally don't need any other models anymore. First off, Gemma 4 is just amazing. Gemma 3 was very good, but this one is head and shoulders above it. And once again, HauHau's technique to uncensor models is perfect. It doesn't require any custom prompting, and just works. The output doesn't feel toned down either. The "Abliterated" techniques can't come close. And just as HauHau suggests, the "Balanced" release responds perfectly to my use cases.
As for the NSFW content, I've never seen any other model come close to how good this one is. Qwen can't touch it, nor can anything else. It almost always just "gets" what you want, and at most just needs a little shove in the right direction sometimes. It doesn't flood the output with adverbs/adjectives, but rather creates truly good responses that will elaborate on what you want, creatively, and without going off on some weird tangent. It can be as creative, or as dirty, as you want it to be. Also, it can progress a story without constant micro-direction. I've never seen anything this good before.
"Thinking" mode isn't on by default, but it isn't generally needed either. You can easily add it, however. In "LM Studio", go to the "Inference" tab of the model's settings, and under "Reasoning Parsing", enter "<|channel>thought" (no quotes) for "Start String", and "<channel|>" (also no quotes) for "End String". Then add the line "{%- set enable_thinking = true %}" to the top of the jinja template.
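For anyone curious what those two strings are for, here is a rough Python sketch of how a front end could separate the thinking block from the final answer using the start and end strings above. This is only my illustration, not LM Studio's actual code; the function name `split_reasoning` and the sample text are made up.

```python
def split_reasoning(text, start="<|channel>thought", end="<channel|>"):
    """Split raw model output into (reasoning, answer).

    Hypothetical helper: the marker strings are the ones quoted in the post,
    everything else is an assumption made for illustration.
    """
    begin = text.find(start)
    if begin == -1:
        # No thinking block at all: the whole text is the answer.
        return "", text.strip()
    body_start = begin + len(start)
    close = text.find(end, body_start)
    if close == -1:
        # Start marker without an end marker: treat the rest as reasoning.
        return text[body_start:].strip(), ""
    reasoning = text[body_start:close].strip()
    # The answer is whatever surrounds the thinking block.
    answer = (text[:begin] + text[close + len(end):]).strip()
    return reasoning, answer


raw = "<|channel>thought\nUser wants a haiku.<channel|>Here is a haiku."
reasoning, answer = split_reasoning(raw)
```

With the sample above, `reasoning` holds the text between the two markers and `answer` holds what comes after the end marker.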
Now you can easily toggle "Thinking" off or on by changing enable_thinking to false or true. You don't even need to reload the model after. This method doesn't add a toggle switch to the user interface as many people would like, but it's a proper and safe method that doesn't use any trickery.
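If you keep a copy of the template in a text file, the toggle can be scripted. The sketch below is just one way to do it, assuming the flag line is written the way it is quoted above; `set_thinking` is a hypothetical helper and `{{ bos_token }}` only stands in for the rest of a real template.

```python
import re

# Matches a line such as: {%- set enable_thinking = true %}
FLAG_RE = re.compile(r"\{%-?\s*set\s+enable_thinking\s*=\s*(true|false)\s*-?%\}")


def set_thinking(template, enabled):
    """Return the template text with enable_thinking set to the given value.

    If the flag line is already present it is rewritten in place,
    otherwise it is added as the first line of the template.
    """
    line = "{%- set enable_thinking = " + ("true" if enabled else "false") + " %}"
    if FLAG_RE.search(template):
        return FLAG_RE.sub(line, template, count=1)
    return line + "\n" + template


template = "{{ bos_token }}"               # placeholder for the real template
with_thinking = set_thinking(template, True)
without_thinking = set_thinking(with_thinking, False)
```

The result still has to be pasted back into the template field by hand; the script only edits text and does not talk to LM Studio.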
It's worth trying out "Thinking" mode with Gemma 4, as it's far better than it is with Qwen. Qwen will ramble on forever during thinking, and sometimes get stuck in an infinite loop. Gemma 4's thinking is sensible, and doesn't take very long either.
I mean, its own model card details directly contradict the no-refusals claim. If you have to re-ask something, that is a refusal. So if that 0/465 refusal claim (which can NOT be compared with the usual benchmark that everyone else runs) is not just copy/pasted into every hauhaucs model card, then that benchmark cannot be realistic. It makes no sense in combination with the details further down the card. ONE of the two must be wrong, by simple logic.
And of course there are none of the usual benchmarks to back up the various bold "100%", "perfect", and "just as good as the original" claims.
Most of what you praise is just Gemma-4 in general.
I recently saw a deep dive and dissection of one of the hauhaucs Qwen models (27B), which came with the same bold perfection claims. It (and the method used) was shown to be good in some aspects, in most really, but it's not magically "perfect" or "just like the original".
And I was looking for something like that because everything around the uncensoring of the hauhaucs models AND their benchmarking, if any, is quite secretive, and tbh quite marketing-shouty, without any of the regular benchmarks. If it is as perfect as claimed, why not just run them?
I've never had to re-ask it anything, and I've tested with some pretty wild and intense stuff. He does state "edge-case prompts", so maybe I'm just not being extreme enough. As I said, it's been perfect for my use cases. I'm just providing some actual real world feedback, not any in-depth analysis.
He does say that an Aggressive variant is coming, but I worry that it could be too accepting of given situations, without the checks and balances you'd expect in real-life responses. (I just don't know.) Since I love the behavior I've experienced with this Balanced variant, I personally don't expect anything better from the Aggressive variant.
And yes, most of my praise is for Gemma 4 in general, which is why I led off with "Gemma 4 is just amazing". However, I have found HauHau's uncensor technique to be extraordinarily good versus other techniques, which often require custom prompting, fail to uncensor correctly, and/or give watered-down responses. HauHau's just works. If he wants to keep his method secret, that's his business.
I appreciate your views, and you certainly have valid points. However, most people just want a model that's good for their needs, and don't care about anything else. I've provided my use case experience, and I think many people will appreciate that over charts of numbers and technical data.