Spanish Document Sorter for LEMoE

This model is a fine-tuned version of dccuchile/bert-base-spanish-wwm-cased (BETO) optimized for binary classification of Spanish text. It specifically distinguishes between requests related to document or receipt retrieval (Label 1) and general or informational queries (Label 0).

The model was trained to handle both formal administrative Spanish and informal language containing typos, abbreviations, and phonetic spellings typical of quick text messaging.

Classification Labels

Label 0: General queries, informational requests, or non-document tasks.
Label 1: Document requests, invoices, receipts, certificates, or explicit text filtering commands.

Training Data Examples

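Label 1: "filtra todos los rresbidos d orange d lucas d er trimestre pasao" (informal, roughly: "filter all of Lucas's Orange receipts from last quarter")
Label 1: "Se precisa el documento de reconocimiento explícito de deuda otorgado por Don Roberto Gil." (formal: "The document of explicit acknowledgment of debt granted by Mr. Roberto Gil is required.")
Label 0: "que contraindicaciones tne tomar creatina de la marka myprotein pa los rriñones" (informal, roughly: "what contraindications does taking MyProtein-brand creatine have for the kidneys")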

Intended Uses

This model is intended to serve as a routing layer or intent classifier in automation systems, detecting when a user is looking for a specific file, ticket, or document. It is used in the papperles plugin for the LEMoE system (lemoe.link).
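As a rough illustration of the routing-layer role described above, the sketch below dispatches a message to one of two handlers based on the predicted label. The `route` function, the `handlers` mapping, and the confidence `threshold` are hypothetical names introduced for this example; they are not part of the model or of the LEMoE plugin.

```python
from typing import Callable, Dict

DOCUMENT_REQUEST = 1  # Label 1 in this model card
GENERAL_QUERY = 0     # Label 0 in this model card


def route(
    text: str,
    label_id: int,
    score: float,
    handlers: Dict[int, Callable[[str], str]],
    threshold: float = 0.5,  # hypothetical cutoff, not a value from the model card
) -> str:
    """Dispatch to the document handler only for a confident Label 1 prediction.

    Anything else (Label 0, or a low-confidence Label 1) falls back to the
    general-query handler.
    """
    if label_id == DOCUMENT_REQUEST and score >= threshold:
        return handlers[DOCUMENT_REQUEST](text)
    return handlers[GENERAL_QUERY](text)
```

Falling back to the general handler on low confidence is one possible design choice; a system where missing a document request is costlier might lower the threshold instead.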

Training Hyperparameters

The following hyperparameters were used during training:

Learning Rate: 2e-5
Train Batch Size: 16
Number of Epochs: 5
Weight Decay: 0.01
Warmup Steps: 2
Mixed Precision (FP16): Enabled
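The values listed above can be written as keyword arguments for the Transformers `TrainingArguments` class. This is a minimal sketch rather than the original training script: it assumes the listed batch size is per device, and `build_training_args` and `output_dir` are names introduced here for illustration.

```python
# Hyperparameters from the list above, keyed by TrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumption: the listed batch size is per device
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "warmup_steps": 2,
    "fp16": True,  # mixed precision; generally needs a CUDA-capable GPU
}


def build_training_args(output_dir: str):
    """Build a TrainingArguments object from the values above.

    The import is done lazily so the dictionary can be inspected without
    having transformers installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```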

How to Use

You can use this model directly with a Hugging Face Transformers pipeline.
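A minimal sketch of such a pipeline call follows. It assumes the repository name `lemoelink/LEMoEppc` shown on this model page and assumes the checkpoint reports the default label names `LABEL_0` and `LABEL_1`; if the model defines custom label names, `label_to_id` would need adjusting. Both helper functions are illustrative names, not part of the model.

```python
from typing import Dict, List

MODEL_ID = "lemoelink/LEMoEppc"  # repository name as listed on this model page


def label_to_id(label: str) -> int:
    """Map a pipeline label string to 0 or 1.

    Assumption: the checkpoint uses the default names "LABEL_0" / "LABEL_1".
    """
    suffix = label.rsplit("_", 1)[-1]
    if suffix not in ("0", "1"):
        raise ValueError(f"unexpected label name: {label!r}")
    return int(suffix)


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict[str, object]]:
    """Run the text-classification pipeline and attach an integer label id."""
    # Imported lazily; the first call downloads the model weights.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    results = classifier(texts)  # one {"label", "score"} dict per input text
    return [
        {"text": text, "label_id": label_to_id(result["label"]), "score": result["score"]}
        for text, result in zip(texts, results)
    ]
```

Calling `classify(["Se precisa el documento de reconocimiento explícito de deuda otorgado por Don Roberto Gil."])` returns one dictionary per input; for a document request like this one, the expected `label_id` is 1.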


Model Details

Model ID: lemoelink/LEMoEppc
Base model: dccuchile/bert-base-spanish-wwm-cased
Model size: 0.1B params
Tensor type: F32
Weights format: Safetensors
