Revista Cubana de Ciencias Informáticas
On-line version ISSN 2227-1899
Abstract
CHAVEZ CARDENAS, María del Carmen. Improvements in the classification of protein-protein interactions of Arabidopsis Thaliana sequences using unbalanced database techniques. Rev cuba cienc informat [online]. 2019, vol.13, n.3, pp. 91-106. ISSN 2227-1899.
A challenge for the Machine Learning community is correct classification on unbalanced data sets. Bioinformatics problems commonly involve large case bases, most of which are unbalanced, with the minority class almost always being the main research interest. Several machine learning methods have been developed to address the problem of unbalanced classes. Some techniques operate at the level of the algorithms, while others focus on the data. Among the data-level methods are those that try to balance the sets, either by reducing the class with more samples or by expanding the smaller ones, known as under-sampling and over-sampling respectively. This work attempts to improve the classification of protein-protein interactions for the plant Arabidopsis thaliana, using data obtained by the Department of Plant Systems Biology at the University of Ghent, which exhibit class imbalance. The experiments apply a collection of techniques from different studies oriented toward editing the training sets, in order to try to improve the classification of the protein-protein interactions.
Keywords: Classification; unbalanced data sets; Machine Learning; Protein-Protein Interactions.
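As an illustration of the two data-level strategies named in the abstract, the following is a minimal sketch of random under-sampling and random over-sampling in plain Python. It is not the method used in the article, whose specific editing techniques are not listed in the abstract; the function names `random_undersample` and `random_oversample` and the `seed` parameter are hypothetical and chosen only for this example.

```python
import random


def _group_by_class(samples, labels):
    """Group samples into a dict keyed by class label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_undersample(samples, labels, seed=0):
    """Randomly discard samples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    pairs = []
    for y, xs in groups.items():
        # Sample without replacement down to the minority-class size.
        pairs.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    pairs = []
    for y, xs in groups.items():
        # Keep all originals, then draw duplicates with replacement.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]
```

Under-sampling discards majority-class information, while over-sampling by duplication can encourage overfitting to repeated minority samples; either resampling step is normally applied to the training set only, so that the evaluation data keep the original class distribution.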