Towards a FAIR Dataset for Spanish Non-Functional Requirements
- María Isabel Limaylla Lunarejo
- Nelly Condori Fernandez (1)
- Miguel R. Luaces (2)

(1) Universidade de Santiago de Compostela
(2) Universidade da Coruña
- Manuel Lagos Rodríguez (ed.)
- Álvaro Leitao Rodríguez (ed.)
- Tirso Varela Rodeiro (ed.)
- Javier Pereira Loureiro (coord.)
- Manuel Francisco González Penedo (coord.)
Publisher: Servizo de Publicacións, Universidade da Coruña
Year of publication: 2023
Conference: XoveTIC (6th edition, 2023, A Coruña)
Type: Conference contribution
Abstract
Supervised Machine Learning (ML) algorithms have improved the performance of automatic non-functional requirement (NFR) classification in the Requirements Engineering domain. However, the lack of public datasets, the handling of imbalanced datasets, and reproducibility remain open concerns in ML experiments. We conducted a quasi-experiment to generate a dataset of NFRs in Spanish, following the FAIR Principles. We collected 109 requirements from an open-access repository of the University of A Coruña and labeled them according to the categories and subcategories of the ISO/IEC 25010 quality model. Using Fleiss’ kappa, we obtained substantial agreement (0.78) at the category level and moderate agreement (0.48) at the subcategory level.
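The following is a minimal sketch of how a Fleiss’ kappa value like those reported in the abstract is computed. It implements the standard Fleiss’ kappa formula for items that are each rated by the same number of raters. The helper name `fleiss_kappa` and the toy ratings are illustrative assumptions, not the authors' code or data. The toy ratings represent three hypothetical raters assigning four hypothetical requirements to three ISO/IEC 25010 characteristics.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same n raters into k categories.

    counts[i][j] is the number of raters who assigned item i to category j.
    (Hypothetical helper for illustration; not the authors' implementation.)
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one item")
    n = sum(counts[0])
    k = len(counts[0])
    if n < 2 or any(len(row) != k or sum(row) != n for row in counts):
        raise ValueError("every item needs the same k categories and n >= 2 raters")

    # p_j: share of all N*n assignments that went to category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # P_i: proportion of agreeing (ordered) rater pairs on item i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N              # mean observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (P_bar - P_e) / (1 - P_e)


if __name__ == "__main__":
    # Hypothetical toy data: 3 raters, 4 requirements;
    # columns = Performance efficiency, Security, Usability
    ratings = [
        [3, 0, 0],
        [0, 3, 0],
        [0, 2, 1],
        [1, 0, 2],
    ]
    print(round(fleiss_kappa(ratings), 3))  # prints 0.489
```

With these toy counts, the function returns 23/47, which is about 0.489. On the commonly used Landis and Koch scale, this value falls in the "moderate" band (0.41 to 0.60). The same scale labels 0.61 to 0.80 as "substantial." The input check enforces that every row sums to the same rater count, which the formula assumes.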