SciELO - Scientific Electronic Library Online

 

MediSur

On-line version ISSN 1727-897X

Abstract

PIRCHIO, Rosana. Classification of breast cancer with analysis techniques of the principal component-Kernel PCA, support vector machine algorithms and logistic regression. Medisur [online]. 2022, vol.20, n.2, pp. 199-209.  Epub 30-Apr-2022. ISSN 1727-897X.

Background:

there are many computational tools for managing images and data sets; reducing the size of a data set makes the information easier to manage.

Objective:

to reduce the size of the data set for better information management.

Methods:

the Breast Cancer Wisconsin data set (biopsy information on cell nuclei) and Python in a Jupyter environment were used. Principal Component Analysis (PCA) and Kernel PCA (kPCA) were implemented to reduce the dimension to 2, 4, and 6 components. Cross-validation was used to select the best hyperparameters for the logistic regression and support vector machine algorithms. Classification was carried out with the original training set, the training set (PCA and kPCA), and the training set (data transformed from PCA and kPCA). Accuracy, precision, completeness, recovery, and area under the curve were analyzed.
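The workflow described in the methods can be sketched roughly as follows. This is a minimal illustration, not the author's code: it assumes scikit-learn, the copy of the Wisconsin diagnostic data set bundled with that library, an 80/20 stratified split, standardization before the reduction step, an RBF kernel for kPCA, and small hypothetical hyperparameter grids.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: the scikit-learn copy of the Wisconsin diagnostic data set.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def fit_with_cv(reducer, classifier, param_grid):
    """Scale, reduce the dimension, then tune the classifier by 5-fold CV."""
    pipe = Pipeline(
        [("scale", StandardScaler()), ("reduce", reducer), ("clf", classifier)]
    )
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    return search.fit(X_train, y_train)


# Hypothetical grids; the abstract does not list the values that were searched.
svm_grid = {"clf__kernel": ["linear", "rbf"], "clf__C": [1, 10, 100]}
lr_grid = {"clf__C": [1, 10, 100]}

results = {}
for n in (2, 4, 6):
    results[("PCA", "SVM", n)] = fit_with_cv(PCA(n_components=n), SVC(), svm_grid)
    results[("kPCA", "LR", n)] = fit_with_cv(
        KernelPCA(n_components=n, kernel="rbf"),
        LogisticRegression(solver="newton-cg", max_iter=1000),
        lr_grid,
    )

# Report the selected hyperparameters and the held-out accuracy per setup.
for key, search in results.items():
    print(key, search.best_params_, round(search.score(X_test, y_test), 3))
```

Placing the reduction step inside the pipeline means PCA or kPCA is refit on each cross-validation fold, so the validation folds do not leak into the projection.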

Results:

PCA with six components explained almost 90% of the variance. The best hyperparameters found for the support vector machine were a linear kernel and C = 100; for logistic regression they were C = 100, the newton-cg solver, and the L2 penalty. For the support vector machine, the best metric values were obtained with PCA using 2 and 4 components (0.99, 0.99, 1, 0.99, 0.99); for the training set with original data they were 0.96, 0.95, 0.99, 0.97, 0.95. For logistic regression, the best results were obtained with kPCA using 6 components, where all metrics were equal to 1; for the training set with original data, these values were 0.96, 0.95, 0.99, 0.97, 0.95.
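The share of variance retained by the first six principal components can be checked with a short sketch like the one below. It assumes scikit-learn, the copy of the Wisconsin diagnostic data set bundled with it, and standardized features fitted on the full data set; the author's exact preprocessing is not stated in the abstract, so the figures may differ slightly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: features are standardized before PCA.
X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=6).fit(X_scaled)

# Cumulative fraction of the total variance explained by the first n components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
for n, share in enumerate(cumulative, start=1):
    print(f"{n} components: {share:.3f}")
```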

Conclusions:

the metrics improved when PCA and kPCA were used.

Keywords: machine learning; artificial intelligence; data management.
