Spanish Document Sorter for LEMoE

This model is a fine-tuned version of dccuchile/bert-base-spanish-wwm-cased (BETO) optimized for binary classification of Spanish text. It specifically distinguishes between requests related to document or receipt retrieval (Label 1) and general or informational queries (Label 0).

The model was trained to handle both formal administrative Spanish and informal language containing typos, abbreviations, and phonetic spellings typical of quick text messaging.

Classification Labels

  • Label 0: General queries, informational requests, or non-document tasks.
  • Label 1: Document requests, invoices, receipts, certificates, or explicit text filtering commands.

Training Data Examples

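  • Label 1: "filtra todos los rresbidos d orange d lucas d er trimestre pasao" (roughly: "filter all of Lucas's Orange receipts from last quarter")
  • Label 1: "Se precisa el documento de reconocimiento explícito de deuda otorgado por Don Roberto Gil." ("The explicit acknowledgment-of-debt document granted by Don Roberto Gil is required.")
  • Label 0: "que contraindicaciones tne tomar creatina de la marka myprotein pa los rriñones" (roughly: "what are the contraindications of taking MyProtein-brand creatine for the kidneys")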

Intended Uses

This model is intended as a routing layer or intent classifier in automation systems, detecting when a user is looking for a specific file, ticket, or document. It is used in the papperles plugin of the LEMoE system (lemoe.link).
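As a routing layer, the classifier would sit in front of two handlers and pick one per message. The sketch below is an illustration under stated assumptions, not the papperles plugin's code: route, search_documents, and answer_general are hypothetical names, the label strings are assumed to be the transformers defaults LABEL_0 and LABEL_1 (check the model's config for the real id2label mapping), and the classifier is passed in as a callable so the routing logic can be exercised without downloading the model.

```python
from typing import Callable

# A classifier maps one text to a prediction dict such as
# {"label": "LABEL_1", "score": 0.97}; the LABEL_<id> naming is an assumption.
Classifier = Callable[[str], dict]


def route(
    text: str,
    classify: Classifier,
    search_documents: Callable[[str], str],
    answer_general: Callable[[str], str],
    threshold: float = 0.5,
) -> str:
    """Hypothetical router: send Label 1 predictions scoring at least
    `threshold` to the document handler, everything else to the general one."""
    prediction = classify(text)
    is_document = prediction["label"].endswith("_1") and prediction["score"] >= threshold
    handler = search_documents if is_document else answer_general
    return handler(text)
```

With the real model, classify could be lambda t: pipe(t)[0], where pipe is a text-classification pipeline (it returns a one-element list for a single string). Raising threshold above 0.5 makes the router more conservative about treating a message as a document request.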

Training Hyperparameters

The following hyperparameters were used during training:

  • Learning Rate: 2e-5
  • Train Batch Size: 16
  • Number of Epochs: 5
  • Weight Decay: 0.01
  • Warmup Steps: 2
  • Mixed Precision (FP16): Enabled
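The hyperparameters above map onto fields of the Hugging Face transformers TrainingArguments class roughly as follows. This is a reconstruction, not the author's training script: the output directory and build_training_args are hypothetical, "Train Batch Size" is assumed to mean the per-device batch size on a single GPU, and every setting not listed in this card (scheduler, evaluation strategy, seed) is left at the library default, which may differ from what was actually used.

```python
# Values copied from the "Training Hyperparameters" list above;
# keys are TrainingArguments field names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes a single device
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "warmup_steps": 2,
    "fp16": True,  # mixed precision; generally requires a GPU
}


def build_training_args(output_dir: str = "lemoe-ppc-output"):
    """Hypothetical builder. transformers is imported lazily so the settings
    dict can be inspected without the library or a GPU being available."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Keeping the values in a plain dict makes them easy to log or diff against another run before any training starts.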

How to Use

You can use this model directly with a Hugging Face transformers pipeline.
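A minimal sketch using the transformers pipeline API with the repository ID lemoelink/LEMoEppc. It assumes transformers and torch are installed; the helper names load_classifier and classify are illustrative, and the output label names (LABEL_0 and LABEL_1 by default) should be checked against the model's config. The two queries are taken from the training examples above.

```python
from functools import lru_cache

MODEL_ID = "lemoelink/LEMoEppc"

# One receipt request (expected Label 1) and one general question (expected Label 0).
QUERIES = [
    "filtra todos los rresbidos d orange d lucas d er trimestre pasao",
    "que contraindicaciones tne tomar creatina de la marka myprotein pa los rriñones",
]


@lru_cache(maxsize=1)
def load_classifier():
    """Load the pipeline once and reuse it on later calls."""
    from transformers import pipeline  # lazy import; downloads the model on first use

    return pipeline("text-classification", model=MODEL_ID)


def classify(queries: list[str]) -> list[dict]:
    """Return one {"label": ..., "score": ...} dict per input text."""
    return load_classifier()(queries)
```

Calling classify(QUERIES) downloads the weights on first use and returns the predictions in input order.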

Model Details

  • Repository: lemoelink/LEMoEppc
  • Base model: dccuchile/bert-base-spanish-wwm-cased
  • Weights format: Safetensors
  • Model size: 0.1B parameters
  • Tensor type: F32