SciELO - Scientific Electronic Library Online


Ingeniería Electrónica, Automática y Comunicaciones

Online version ISSN 1815-5928

Abstract

SONORA-MENGANA, Alexander et al. Evaluation of data balancing techniques. Application to CAD of lung nodules using the LUNA16 framework. EAC [online]. 2018, vol.39, n.3, pp.57-67. ISSN 1815-5928.

Due to the high incidence of lung cancer, computer-aided detection (CAD) systems may play an increasingly important role in screening. Classification in CAD systems has to deal with highly imbalanced datasets composed of actual nodules and non-nodule structures. Applying data balancing techniques aids the training of the classifiers and makes the generation of classification rules more effective. The purpose of this paper is to compare the performance of different data balancing techniques applied to the classification of lung nodules. According to the reviewed literature, this is the first time that different data balancing methods have been evaluated on the problem of lung nodule detection using a large dataset and at low false positive rates. A web-based framework was used to evaluate the different methods applied to a classical CAD system (ETROCAD) presented in the LUNA16 Challenge, by calculating a score of average sensitivity at different numbers of false positives per scan. In our experiments, data balancing using SMOTE and SMOTE-TL led to the best results, with scores of 0.760 and 0.759, respectively, compared with 0.748 without data balancing. Although the impact on the overall score may seem marginal, adequate data balancing resulted in the correct classification of 36 additional candidate nodules at 4 FP/scan. At the time of writing, the SMOTE-based ETROCAD system had the best score among all the classical systems using handcrafted features on the LUNA16 website.
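The two ingredients of the evaluation above can be illustrated with a minimal, stdlib-only Python sketch. It is not the ETROCAD pipeline or the official LUNA16 evaluation script. `smote` shows the basic SMOTE idea: synthetic minority (nodule) samples are interpolated between a minority sample and one of its k nearest minority neighbours. `average_sensitivity` computes a simplified form of the LUNA16 score: the mean sensitivity at 1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per scan. Both function names are hypothetical. The scoring function assumes that every true candidate hits a distinct nodule and reads sensitivity off a step-wise curve. The official script handles candidate-to-nodule matching and curve sampling in more detail. In practice, a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`, `imblearn.combine.SMOTETomek`) would normally be used for balancing.

```python
import math
import random

# Operating points used by the LUNA16 score (false positives per scan).
LUNA16_FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling of the minority (nodule) class (sketch).

    minority: list of feature vectors (lists of floats).
    Each synthetic vector lies on the segment joining a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def average_sensitivity(candidates, n_scans, n_nodules, fp_rates=LUNA16_FP_RATES):
    """Mean sensitivity at fixed false-positive-per-scan operating points.

    candidates: list of (score, is_nodule) pairs. Assumption: every
    is_nodule=True candidate hits a distinct nodule (no duplicate hits).
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    tp = fp = 0
    curve = [(0.0, 0.0)]  # (FP per scan, sensitivity) for each threshold
    for idx, (score, is_nodule) in enumerate(ranked):
        if is_nodule:
            tp += 1
        else:
            fp += 1
        # record a point only once every candidate with this score is counted
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            curve.append((fp / n_scans, tp / n_nodules))
    # best sensitivity reachable without exceeding each FP/scan budget
    sens = [max(s for f, s in curve if f <= rate) for rate in fp_rates]
    return sum(sens) / len(sens)
```

For example, take one scan with two nodules and the candidates `[(0.9, True), (0.8, False), (0.7, True), (0.1, False)]`. The sensitivities at the seven operating points are 0.5, 0.5, 0.5, 1, 1, 1, 1, which gives a score of 5.5/7 ≈ 0.786. Because the score averages seven operating points, recovering extra nodules at a single rate such as 4 FP/scan moves the overall score only slightly.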

Keywords: Data balance; Computer Aided Detection; Near-Miss; CNN; Random Under-sample; Tomek links; Self-Organizing Map; Random Over-sample; ADASYN; SMOTE; LUNA16.
