Spanish Document Sorter for LEMoE
This model is a fine-tuned version of dccuchile/bert-base-spanish-wwm-cased (BETO) optimized for binary classification of Spanish text. It specifically distinguishes between requests related to document or receipt retrieval (Label 1) and general or informational queries (Label 0).
The model was trained to handle both formal administrative Spanish and informal language containing typos, abbreviations, and phonetic spellings typical of quick text messaging.
Classification Labels
- Label 0: General queries, informational requests, or non-document tasks.
- Label 1: Document requests, invoices, receipts, certificates, or explicit text filtering commands.
Training Data Examples
- Label 1: "filtra todos los rresbidos d orange d lucas d er trimestre pasao"
- Label 1: "Se precisa el documento de reconocimiento explícito de deuda otorgado por Don Roberto Gil."
- Label 0: "que contraindicaciones tne tomar creatina de la marka myprotein pa los rriñones"
Intended Uses
This model is intended as a routing layer or intent classifier in automation systems, detecting when a user is looking for a specific file, ticket, or document. It is used in the papperles plugin for the LEMoE system (lemoe.link).
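As a rough illustration of the routing-layer use, the sketch below maps one classifier prediction to a destination. The names route_request and DOCUMENT_LABEL, the handler strings, and the 0.5 threshold are hypothetical. The prediction format assumes the default label names of a Transformers text-classification pipeline (LABEL_0, LABEL_1), which this card does not confirm.

```python
from typing import Any, Mapping

# Assumption: the model exposes the default Transformers label names,
# so Label 1 (document request) appears as "LABEL_1".
DOCUMENT_LABEL = "LABEL_1"


def route_request(prediction: Mapping[str, Any], threshold: float = 0.5) -> str:
    """Return a hypothetical handler name for one classifier prediction.

    `prediction` is expected to look like {"label": "LABEL_1", "score": 0.97}.
    Document predictions scoring below `threshold` fall back to the
    general handler.
    """
    label = str(prediction["label"])
    score = float(prediction["score"])
    if label == DOCUMENT_LABEL and score >= threshold:
        return "document_retrieval"
    return "general_query"
```

Raising the threshold sends borderline document predictions to the general handler, which may be preferable when a wrong document lookup is more costly than a missed one.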
Training Hyperparameters
The following hyperparameters were used during training:
- Learning Rate: 2e-5
- Train Batch Size: 16
- Number of Epochs: 5
- Weight Decay: 0.01
- Warmup Steps: 2
- Mixed Precision (FP16): Enabled
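These values map fairly directly onto the parameters of transformers.TrainingArguments. The following is a sketch, not the author's training script. Reading "Train Batch Size" as per_device_train_batch_size is an assumption, as is the output_dir value. Optimizer, scheduler, and seed are left at library defaults because the card does not state them.

```python
# Hyperparameters as listed in this card, keyed by the corresponding
# transformers.TrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumption: "Train Batch Size" is per device
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "warmup_steps": 2,
    "fp16": True,  # mixed precision; requires a supported GPU
}


def build_training_arguments(output_dir: str = "beto-document-sorter"):
    """Build TrainingArguments from the values above.

    `output_dir` is a hypothetical path. transformers is imported lazily so
    the dictionary can be inspected without the library installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```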
How to Use
You can use this model directly with a Hugging Face Transformers text-classification pipeline.
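A minimal sketch of such a call, assuming the model is published under the repository id lemoelink/LEMoEppc named in this card's model tree. The helper names load_classifier and classify are hypothetical, and the label strings returned depend on the model's configuration.

```python
MODEL_ID = "lemoelink/LEMoEppc"  # assumption: repository id taken from the model tree


def load_classifier(model_id: str = MODEL_ID):
    """Create a text-classification pipeline (downloads the model on first use)."""
    # Lazy import: requires transformers plus a backend such as PyTorch.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(classifier, texts):
    """Pair each input text with the top label and score the classifier returns."""
    texts = list(texts)
    predictions = classifier(texts)  # one {"label", "score"} dict per text
    return [(text, pred["label"], pred["score"]) for text, pred in zip(texts, predictions)]
```

For example, classify(load_classifier(), ["filtra todos los rresbidos d orange d lucas d er trimestre pasao"]) would be expected to return the Label 1 class for this training example.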
Model tree for lemoelink/LEMoEppc
Base model
dccuchile/bert-base-spanish-wwm-cased