Update to Models
Hi, I can see that all the .gguf files have been updated. Do I have to update mine?
Yes. If you update to the newest LM Studio, you're going to want to pull the new .gguf; it'll work better, especially with long context.
Is anyone else having issues with lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF in LM Studio?
I am seeing this for all the 3.1 8B versions.
I get the following error when I try to load in LM Studio:
"llama.cpp error: 'done_getting_tensors: wrong number of tensors; expected 292, got 291'"
I am using LM Studio Version 0.2.28 which is reporting as the current version.
Thank you very much for your guidance. Working perfectly with the new version.
Hey there, I am still having issues with the "expected 292, got 291" error. I have upgraded to LM Studio 0.2.29, but it still won't load the model.
I'm on Arch Linux. The normal Llama 3 model works just fine, but 3.1 does not. Any recommendations?
@SuohLaevatein sounds like you need to update the model: delete the one you have locally and download it again.
I've tried doing so, but no matter which version of the model I choose, and whether I download or re-download it, there is sadly no difference. Are there maybe some presets, or is something cached that is preventing the new models from loading?
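On the caching question: one way to confirm that the file on disk really is the re-uploaded one, and not a cached copy, is to read the tensor count straight from the GGUF header and compare it with the numbers in the error message. The following is a minimal sketch, assuming the GGUF version 2 or later header layout (4-byte magic, 32-bit version, then 64-bit tensor and metadata counts, all little-endian). `read_gguf_header` is a hypothetical helper, and which count belongs to the old or the new upload is not stated in this thread.

```python
import struct


def read_gguf_header(path):
    """Return version, tensor count and metadata count from a GGUF file header.

    Assumes the GGUF v2+ layout: b"GGUF", uint32 version,
    uint64 tensor_count, uint64 metadata_kv_count (little-endian).
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Pointing it at the .gguf that LM Studio actually loads (the folder location varies by install) before and after a re-download shows whether the file changed at all; if the count is identical, the old file is probably still in use.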
When running from LM Studio (latest version 0.2.29) with the updated model this worked for me.
When running from Ollama (latest release version 0.3.0) with the updated model, I was still getting this error.
Ollama Logs
2024-07-30 13:44:40 llm_load_tensors: ggml ctx size = 0.27 MiB
2024-07-30 13:44:40 llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
2024-07-30 13:44:40 llama_load_model_from_file: exception loading model
2024-07-30 13:44:40 terminate called after throwing an instance of 'std::runtime_error'
2024-07-30 13:44:40 what(): done_getting_tensors: wrong number of tensors; expected 292, got 291
2024-07-30 13:44:41 time=2024-07-30T17:44:41.027Z level=ERROR source=sched.go:443 msg="error loading llama server" error="llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291\nllama_load_model_from_file: exception loading model"
The fix that worked for me was updating to the prerelease version of Ollama 0.3.1 that was just released. I no longer get this error now.
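For anyone comparing logs across LM Studio and Ollama versions, the mismatch can be pulled out of lines like the ones above with a small regular expression. This is only a sketch: `parse_tensor_mismatch` is a hypothetical helper, and the pattern is based solely on the message text quoted in this thread.

```python
import re

# Matches the llama.cpp message quoted in this thread.
TENSOR_MISMATCH = re.compile(
    r"wrong number of tensors; expected (\d+), got (\d+)"
)


def parse_tensor_mismatch(line):
    """Return (expected, got) as ints, or None if the line has no mismatch."""
    match = TENSOR_MISMATCH.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Running it over each log line and collecting the non-None results makes it easy to see whether the numbers changed after updating the runtime or the model file.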
I tried everything, but it's not answering the same questions correctly in LM Studio as it does in llama.cpp. I'm using the same Q4_K_M.gguf by bartowski with this template:
"<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
So, something is wrong with the LM Studio templates. I tried all of them: Llama 3 and Llama 3 v2. I also tried this system prompt instead of the default. In llama.cpp it always answers my reasoning question correctly at temperature 0 (just like LMSYS Chatbot Arena).
Can you check if this thread resolves it for you? https://x.com/LMStudioAI/status/1818646952252244389
I get the same tensor mismatch error when using the LM Studio beta (I can't send a screenshot).
Does this still support function calling? Normally Llama 3.1 8B can call functions, but this GGUF does not call any.
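To rule out template differences between the two runtimes, the prompt can be assembled by hand and sent as raw text to both. Below is a minimal sketch of the template quoted above; `build_llama31_prompt` is a hypothetical helper, and the blank line after each header (the `\n\n`) is an assumption taken from Meta's documented Llama 3.1 format, since the post as shown here does not preserve blank lines.

```python
def build_llama31_prompt(system_prompt, prompt, today="26 Jul 2024"):
    """Assemble a single-turn Llama 3.1 prompt from the template in this thread.

    The "\n\n" after each <|end_header_id|> is an assumption (see lead-in).
    """
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "Cutting Knowledge Date: December 2023\n"
        f"Today Date: {today}\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{prompt}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


if __name__ == "__main__":
    print(build_llama31_prompt("You are a helpful assistant.", "What is 2 + 2?"))
```

If the string produced here gives the correct answer in one runtime and not the other, the difference is more likely in how each one applies its own template or adds special tokens than in the model file itself.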