Meta-Llama-3.1-8B-Instruct-GGUF

Instructions to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
	filename="Meta-Llama-3.1-8B-Instruct-IQ4_XS.gguf",
)

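response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The result is an OpenAI-style dict; print only the assistant's reply
print(response["choices"][0]["message"]["content"])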
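
Each call to `create_chat_completion` is independent, so a follow-up question only has context if the earlier turns are passed in again. Below is a minimal sketch of keeping that history. The helper name `chat_turn` and its `complete` callback are hypothetical. The callback stands in for the model call, which lets the logic be exercised without downloading a GGUF.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              complete: Callable[[List[Message]], Dict[str, Any]]) -> str:
    """Send the full history plus a new user message, then record both turns."""
    messages = history + [{"role": "user", "content": user_text}]
    response = complete(messages)  # expected to return an OpenAI-style dict
    reply = response["choices"][0]["message"]["content"]
    # Extend the stored history only after the call has succeeded.
    history.extend([messages[-1], {"role": "assistant", "content": reply}])
    return reply
```

With the model loaded above, one possible use is `chat_turn(history, "And of Germany?", lambda m: llm.create_chat_completion(messages=m))`, where `history` starts as an empty list or a list holding a system message.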

Notebooks
Google Colab
Kaggle

llama.cpp

How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M
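
Whichever install route you use, `llama-server` exposes an OpenAI-compatible HTTP API. By default it listens on port 8080, which is the same address the Pi and Hermes sections point at. Below is a standard-library-only sketch of a client for it. The helper names are hypothetical, and `BASE_URL` assumes the default host and port.

```python
import json
import urllib.request

# Assumption: llama-server's default address; adjust if you pass --host/--port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply (needs a running server)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `ask(BASE_URL, "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M", "What is the capital of France?")` should return the model's answer as a string.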

Use Docker

docker model run hf.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

LM Studio
Jan

vLLM

How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
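
If the request body also sets `"stream": true`, OpenAI-compatible servers such as vLLM reply with Server-Sent Events: one `data: {...}` line per chunk, ending with `data: [DONE]`. Below is a minimal sketch that reassembles the text from such lines. The sample lines are made up. Real streams may carry extra fields or blank keep-alive lines, and the parser skips those.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate choices[*].delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, keep-alives
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Made-up lines in the shape of a streamed chat completion.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Paris"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "."}}]}',
    "data: [DONE]",
]
print(collect_stream(sample_lines))  # Paris.
```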

Use Docker

docker model run hf.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Ollama
How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with Ollama:
```
ollama run hf.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M
```

Unsloth Studio

How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF to start chatting

Pi

How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
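
Hand-editing `~/.pi/agent/models.json` can clobber providers that are already configured. Below is a minimal sketch that merges the `llama-cpp` entry shown above into an existing file. The JSON structure is copied from the snippet above. The helper name `add_llama_cpp_provider` and its merge behavior are assumptions, not part of Pi's own tooling: it replaces only the `llama-cpp` key and keeps every other provider.

```python
import json
from pathlib import Path

MODEL_ID = "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M"


def add_llama_cpp_provider(config_path: Path, model_id: str = MODEL_ID) -> dict:
    """Insert or replace the "llama-cpp" provider, keeping other providers intact."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    providers = config.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": "http://localhost:8080/v1",
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Typical use would be `add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")`.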

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with Docker Model Runner:
```
docker model run hf.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M
```

Lemonade

How to use lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Meta-Llama-3.1-8B-Instruct-GGUF-Q4_K_M

List all available models

lemonade list

Update to Models

by Kaligraphy247 - opened Jul 29, 2024

Discussion

Kaligraphy247

Jul 29, 2024

Hi, I can see that all .gguf files have been updated. Do I have to update mine?

bartowski

Jul 29, 2024

If you update to the newest LM Studio, you're going to want to pull the new .gguf, yes. It'll work better, especially with long context.

SWVAI9

Jul 29, 2024

Is anyone else having issues with lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF in LM Studio?
I am seeing this with all the 3.1 8B versions.
I get the following error when I try to load in LM Studio:
"llama.cpp error: 'done_getting_tensors: wrong number of tensors; expected 292, got 291'"
I am using LM Studio version 0.2.28, which reports itself as the current version.

bartowski

Jul 29, 2024

@SWVAI9 you need to grab the newest version from the website:

https://lmstudio.ai/

0.2.29 is available for download there

SWVAI9

Jul 30, 2024

Thank you very much for your guidance. Working perfectly with the new version.

SuohLaevatein

Jul 30, 2024

Hey there, I am still getting the "expected 292, got 291" error. I have upgraded to LM Studio 0.2.29, but it still won't load the model.
I'm on Arch Linux; the regular Llama 3 model works just fine, but 3.1 does not. Any recommendations?

bartowski

Jul 30, 2024

@SuohLaevatein it sounds like you need to update the model: delete the one you have locally and download it again.

SuohLaevatein

Jul 30, 2024

I've tried that, but no matter which version of the model I choose, downloading or re-downloading makes no difference, sadly. Are there presets, or is something cached that is preventing new models from loading?
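
One way to check which file is actually on disk is to read the tensor count directly from the GGUF header. Below is a sketch based on the published GGUF layout: a 4-byte `GGUF` magic, then a little-endian uint32 version, then the tensor and metadata counts. The counts are stored as uint32 in version 1 and as uint64 from version 2 on. The function name is hypothetical, and big-endian GGUF files are not handled. As far as we understand llama.cpp's loader, "expected" in this error is the number of tensors in the file and "got" is the number the runtime created. If so, a file reporting 292 here points at an outdated runtime rather than a bad download.

```python
import struct
from typing import Tuple


def read_gguf_counts(path: str) -> Tuple[int, int, int]:
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file header."""
    with open(path, "rb") as f:
        header = f.read(24)  # magic + version + two uint64 counts
    if header[:4] != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={header[:4]!r})")
    if len(header) < 8:
        raise ValueError("truncated GGUF header")
    (version,) = struct.unpack_from("<I", header, 4)
    # Version 1 stored the counts as uint32; later versions use uint64.
    fmt = "<II" if version == 1 else "<QQ"
    if len(header) < 8 + struct.calcsize(fmt):
        raise ValueError("truncated GGUF header")
    tensor_count, kv_count = struct.unpack_from(fmt, header, 8)
    return version, tensor_count, kv_count
```

For example, `print(read_gguf_counts("Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"))`, where the path is a placeholder for wherever your app stored the download.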

kkirkfield

Jul 30, 2024 (edited)

When running from LM Studio (latest version 0.2.29) with the updated model, this worked for me.

When running from Ollama (latest release version 0.3.0) with the updated model, I was still getting this error.

Ollama Logs

2024-07-30 13:44:40 llm_load_tensors: ggml ctx size =    0.27 MiB
2024-07-30 13:44:40 llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
2024-07-30 13:44:40 llama_load_model_from_file: exception loading model
2024-07-30 13:44:40 terminate called after throwing an instance of 'std::runtime_error'
2024-07-30 13:44:40   what():  done_getting_tensors: wrong number of tensors; expected 292, got 291
2024-07-30 13:44:41 time=2024-07-30T17:44:41.027Z level=ERROR source=sched.go:443 msg="error loading llama server" error="llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291\nllama_load_model_from_file: exception loading model"

The fix that worked for me was updating to the prerelease version of Ollama 0.3.1 that was just released. I no longer get this error now.

urtuuuu

Jul 30, 2024 (edited)

I tried everything, but it doesn't answer the same questions correctly in LM Studio as it does in llama.cpp. I'm using the same Q4_K_M.gguf by bartowski with this template:
"<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
So something is wrong with LM Studio's templates. I tried all of the presets (Llama 3, Llama 3 v2), and also this system prompt instead of the default. In llama.cpp it always answers my reasoning question correctly at temperature 0 (just like LMSYS Chatbot Arena).
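
When two front ends disagree at temperature 0, a useful first check is whether they send byte-identical prompts. Below is a sketch that renders the template quoted above for a single turn, so the output can be diffed against whatever the other tool logs. The function name is hypothetical. It assumes the `\n\n` after each `<|end_header_id|>` and after the date lines that Meta's published Llama 3.1 format uses; the line breaks in the post may have been flattened by the forum.

```python
def render_llama31_prompt(system_prompt: str, prompt: str,
                          today: str = "26 Jul 2024") -> str:
    """Render a single-turn Llama 3.1 prompt that ends at the assistant header."""
    def header(role: str) -> str:
        # Assumption: a blank line ("\n\n") follows each header, as in Meta's format.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

    system_block = (
        "Cutting Knowledge Date: December 2023\n"
        f"Today Date: {today}\n\n"
        f"{system_prompt}"
    )
    return (
        "<|begin_of_text|>"
        + header("system") + system_block + "<|eot_id|>"
        + header("user") + prompt + "<|eot_id|>"
        + header("assistant")
    )
```

Printing `repr(render_llama31_prompt(...))` makes whitespace differences visible, which is usually where two templates diverge.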

bartowski

Jul 31, 2024

@SuohLaevatein

can you check if this thread resolves it for you? https://x.com/LMStudioAI/status/1818646952252244389

Clausss

Aug 1, 2024

I get the same tensor-count mismatch error when using the LM Studio beta (I can't send a screenshot).

poormansblackburne

Aug 21, 2024

Does this still support function calling? Llama 3.1 8B can normally call functions, but this GGUF does not call them.
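
If you want to test this through an OpenAI-compatible server (for example `llama-server` from the instructions above), tool definitions go in the request's `tools` array using the OpenAI function schema. Below is a minimal sketch of building such a request and reading any `tool_calls` back. The `get_weather` tool is made up. Whether the server actually returns tool calls depends on the runtime and how it applies the chat template; recent `llama-server` builds, for instance, document a `--jinja` flag for this. Treat the sketch as a test harness, not as confirmation that this GGUF supports function calling.

```python
import json
from typing import Any, Dict, List, Tuple


def build_tool_request(model: str, prompt: str) -> Dict[str, Any]:
    """Chat request body that offers one hypothetical tool to the model."""
    get_weather = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [get_weather],
    }


def tool_calls(response: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Return (name, parsed arguments) pairs from an OpenAI-style response, if any."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # In the OpenAI format, "arguments" is a JSON-encoded string.
    return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]
```

An empty list from `tool_calls` means the model answered in plain text instead of calling the tool.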
