---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: peft
tags:
- base_model:adapter:meta-llama/Llama-3.1-8B-Instruct
- grpo
- tinylora
- belief-shift
- trl
- transformers
license: llama3.1
---

# Semantic-Perinucleus-v1

**TinyLoRA** adapter on [`meta-llama/Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), trained with **GRPO** to increase semantic alignment of short answers with a target “alt” belief (cosine reward via `sentence-transformers/all-MiniLM-L6-v2`).

## Training

- **Method:** Group Relative Policy Optimization ([TRL `GRPOTrainer`](https://huggingface.co/docs/trl/main/en/grpo_trainer))
- **Adapter:** [TinyLoRA](https://huggingface.co/docs/peft/main/en/package_reference/tinylora) with 13 trainable parameters (`u=13`, `weight_tying=1.0`, `r=2`, targets `q_proj`, `v_proj`)
- **Reward:** Cosine similarity between the `sentence-transformers/all-MiniLM-L6-v2` embeddings of each sampled completion and the target `Answer_Alt` string
- **Base model:** Llama 3.1 8B Instruct (frozen; only adapter weights trained)

## Usage

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.1-8B-Instruct"
adapter = "/Semantic-Perinucleus-v1"  # adapter repo ID or local path

# Load the tokenizer and the frozen base model in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the TinyLoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```

You need a Hugging Face access token with access to the gated Llama 3.1 model.

## Limitations

- Extremely small adapter; effects on downstream answers are subtle. Evaluate on your task before relying on it.
- Intended for research on semantic / preference nudges, not factual guarantees.

## Citation

If you use this adapter, cite the base Llama model and, if relevant, [Learning to Reason in 13 Parameters](https://arxiv.org/abs/2602.04118) (TinyLoRA) and TRL GRPO.
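
## Reward sketch

The cosine reward listed under Training can be illustrated with a minimal sketch. This is not the training code used for this adapter. The names `make_cosine_reward`, `toy_embed`, and `VOCAB` are hypothetical, and the bag-of-words embedder is only a stand-in so the example runs offline. The sketch also assumes that completions arrive as plain strings and that a dataset column named `Answer_Alt` is forwarded to the reward callable as a keyword argument, which is how TRL's `GRPOTrainer` is understood to pass extra columns.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_cosine_reward(embed: Callable[[str], Vector]):
    """Build a reward callable that scores each completion against its target."""

    def cosine_reward(completions, Answer_Alt, **kwargs):
        # Assumption: completions are plain strings, one target per completion.
        return [cosine(embed(c), embed(t)) for c, t in zip(completions, Answer_Alt)]

    return cosine_reward


# Hypothetical stand-in embedder (word counts over a tiny vocabulary).
VOCAB = ["the", "sky", "is", "green", "blue"]


def toy_embed(text: str) -> Vector:
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


reward_fn = make_cosine_reward(toy_embed)
rewards = reward_fn(
    completions=["the sky is green", "the sky is blue"],
    Answer_Alt=["the sky is green", "the sky is green"],
)
print(rewards)  # [1.0, 0.75]
```

To approximate the setup described in this card, `toy_embed` could be swapped for the `encode` method of a `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")` instance, with the returned values cast to `float` if the trainer requires plain Python numbers.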