XLM-R large fine-tuned on Portuguese Universal Dependencies and on English and Portuguese semantic role labeling
Model description
This model is xlm-roberta-large fine-tuned first on the Universal Dependencies Portuguese dataset, then on the CoNLL-formatted OntoNotes v5.0, and finally on the PropBank.Br data. It is part of a project that produced the following models:
- liaad/srl-pt_bertimbau-base
- liaad/srl-pt_bertimbau-large
- liaad/srl-pt_xlmr-base
- liaad/srl-pt_xlmr-large
- liaad/srl-pt_mbert-base
- liaad/srl-en_xlmr-base
- liaad/srl-en_xlmr-large
- liaad/srl-en_mbert-base
- liaad/srl-enpt_xlmr-base
- liaad/srl-enpt_xlmr-large
- liaad/srl-enpt_mbert-base
- liaad/ud_srl-pt_bertimbau-large
- liaad/ud_srl-pt_xlmr-large
- liaad/ud_srl-enpt_xlmr-large
For more information, please see the accompanying article (see the BibTeX entry and citation info below) and the project's GitHub repository.
Intended uses & limitations
How to use
To use the transformers portion of this model:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("liaad/ud_srl-enpt_xlmr-large")
model = AutoModel.from_pretrained("liaad/ud_srl-enpt_xlmr-large")
To use the full SRL model (transformers portion + a decoding layer), refer to the project's GitHub repository.
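As a minimal sketch of what the transformers portion alone provides, the snippet below returns one contextual vector per subword token. The helper name `embed_tokens` and the example sentence are hypothetical, and the sketch assumes PyTorch is installed. These vectors are encoder outputs only, not semantic role labels; the decoding layer from the project's repository is still needed for SRL predictions.

```python
def embed_tokens(sentence, model_id="liaad/ud_srl-enpt_xlmr-large"):
    """Return (subword tokens, per-token hidden states) for one sentence.

    Hypothetical helper for illustration; it is not part of the project's code.
    """
    # Imported inside the function so that defining the helper does not
    # require the libraries to be loaded or the model to be downloaded.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    # last_hidden_state has shape (batch, sequence_length, hidden_size);
    # index 0 selects the single sentence in the batch.
    return tokens, outputs.last_hidden_state[0]
```

Calling `embed_tokens("A Maria comprou um livro.")` downloads the checkpoint on first use and returns the subword tokens together with a tensor holding one row per token.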
Limitations and bias
- This model does not include a TensorFlow version. The "type_vocab_size" in this model was changed (from 1 to 2), so it cannot be easily converted to TensorFlow.
- The model was trained for only 10 epochs on the Universal Dependencies dataset.
- The model was trained for only 5 epochs on the CoNLL-formatted OntoNotes v5.0.
- The English data was preprocessed to match the Portuguese data, so some role attributions differ and some roles were removed from the data.
Training procedure
The model was trained on the Universal Dependencies Portuguese dataset, then on the CoNLL-formatted OntoNotes v5.0, and then on Portuguese semantic role labeling data (PropBank.Br) using 10-fold cross-validation. The 10 resulting models were tested on the folds as well as on a smaller opinion dataset, "Buscapé". For more information, please see the accompanying article (see the BibTeX entry and citation info below) and the project's GitHub repository.
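The 10-fold cross-validation step can be sketched as follows. This is a generic illustration, not the project's actual split: the function name `k_fold_splits`, the shuffling, and the seed are assumptions, and the real fold assignment is defined in the project's repository.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Generic sketch: every item appears in exactly one test fold, and each
    of the k models is trained on the remaining k - 1 folds.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example with a hypothetical dataset of 25 sentences.
for fold_number, (train, test) in enumerate(k_fold_splits(25), start=1):
    print(fold_number, len(train), len(test))
```

Each of the 10 iterations corresponds to one trained model, evaluated on its own held-out fold.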
Eval results
| Model Name | F1 CV PropBank.Br (in domain) | F1 Buscapé (out of domain) |
|---|---|---|
| srl-pt_bertimbau-base | 76.30 | 73.33 |
| srl-pt_bertimbau-large | 77.42 | 74.85 |
| srl-pt_xlmr-base | 75.22 | 72.82 |
| srl-pt_xlmr-large | 77.59 | 73.84 |
| srl-pt_mbert-base | 72.76 | 66.89 |
| srl-en_xlmr-base | 66.59 | 65.24 |
| srl-en_xlmr-large | 67.60 | 64.94 |
| srl-en_mbert-base | 63.07 | 58.56 |
| srl-enpt_xlmr-base | 76.50 | 73.74 |
| srl-enpt_xlmr-large | 78.22 | 74.55 |
| srl-enpt_mbert-base | 74.88 | 69.19 |
| ud_srl-pt_bertimbau-large | 77.53 | 74.49 |
| ud_srl-pt_xlmr-large | 77.69 | 74.91 |
| ud_srl-enpt_xlmr-large | 77.97 | 75.05 |
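To compare models on the two test sets, the scores above can be loaded into a small script. The sketch below copies the F1 values from the table; the names `RESULTS`, `best_by`, and `gaps` are illustrative only.

```python
# (F1 CV PropBank.Br, F1 Buscapé) per model, copied from the table above.
RESULTS = {
    "srl-pt_bertimbau-base": (76.30, 73.33),
    "srl-pt_bertimbau-large": (77.42, 74.85),
    "srl-pt_xlmr-base": (75.22, 72.82),
    "srl-pt_xlmr-large": (77.59, 73.84),
    "srl-pt_mbert-base": (72.76, 66.89),
    "srl-en_xlmr-base": (66.59, 65.24),
    "srl-en_xlmr-large": (67.60, 64.94),
    "srl-en_mbert-base": (63.07, 58.56),
    "srl-enpt_xlmr-base": (76.50, 73.74),
    "srl-enpt_xlmr-large": (78.22, 74.55),
    "srl-enpt_mbert-base": (74.88, 69.19),
    "ud_srl-pt_bertimbau-large": (77.53, 74.49),
    "ud_srl-pt_xlmr-large": (77.69, 74.91),
    "ud_srl-enpt_xlmr-large": (77.97, 75.05),
}


def best_by(column):
    """Return the model name with the highest score in the given column
    (0 = in domain, 1 = out of domain)."""
    return max(RESULTS, key=lambda name: RESULTS[name][column])


# In-domain minus out-of-domain F1 for each model.
gaps = {name: round(cv - buscape, 2) for name, (cv, buscape) in RESULTS.items()}

print("Best in domain:", best_by(0))
print("Best out of domain:", best_by(1))
print("Gap for this model:", gaps["ud_srl-enpt_xlmr-large"])
```

With these numbers, this model has the highest out-of-domain score, while srl-enpt_xlmr-large has the highest in-domain score.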
BibTeX entry and citation info
@misc{oliveira2021transformers,
title={Transformers and Transfer Learning for Improving Portuguese Semantic Role Labeling},
author={Sofia Oliveira and Daniel Loureiro and Alípio Jorge},
year={2021},
eprint={2101.01213},
archivePrefix={arXiv},
primaryClass={cs.CL}
}