Using K-means algorithm for regression curve in big data system for business environment

Naoui, Mohammed Anouar; Lejdel, Brahim; Ayad, Mouloud

Revista Cubana de Ciencias Informáticas

On-line version ISSN 2227-1899

Abstract

NAOUI, Mohammed Anouar; LEJDEL, Brahim and AYAD, Mouloud. Using K-means algorithm for regression curve in big data system for business environment. Rev cuba cienc informat [online]. 2020, vol.14, n.2, pp.34-48. Epub 01-Jun-2020. ISSN 2227-1899.

Predictive analysis is quickly becoming a decisive advantage across a wide range of business activities. It comprises methods and technologies that let organizations identify models or patterns in data. Big data brings enormous benefits to business processes, but its properties, such as volume, velocity, variety, variation and veracity, render existing data analysis techniques insufficient. Big data analysis requires fusing regression techniques from data mining with those of machine learning. Big data regression is an important field for many researchers, and several aspects, methods, and techniques have been proposed. In this context, we propose regression curve models for big data systems. Our proposal is based on a cooperative MapReduce architecture. We present Map and Reduce algorithms for curve regression: in the Map phase, the data are transformed into a linear model; in the Reduce phase, a k-means algorithm clusters the results of the Map phase. K-means is one of the most popular partitional clustering algorithms; it is simple, statistical and considerably scalable, and its asymptotic running time is linear in any variable of the problem. This approach combines the advantages of regression and clustering methods in big data: the regression method extracts mathematical models, and the k-means algorithm selects the best mathematical models as clusters.

Keywords: Cooperation MapReduce algorithm; Big Data; Regression Curve; k-means algorithm; Business environmental scanning.
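
The two phases described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the curve family (an exponential `y = a * exp(b * x)`, linearized by taking logarithms), the function names `map_fit`, `kmeans`, `reduce_cluster` and `run_demo`, the farthest-point initialization of the centers, and the synthetic data. Each simulated Map task fits a linearized model to its partition and emits the coefficients; the Reduce step clusters those coefficient pairs with k-means, so each cluster center stands for one candidate model.

```python
import numpy as np

def map_fit(x, y):
    """Map step (assumed form): linearize y = a * exp(b * x) by taking logs,
    then fit ln(y) = ln(a) + b * x by ordinary least squares.
    Returns the pair [slope b, intercept ln(a)]."""
    slope, intercept = np.polyfit(x, np.log(y), 1)
    return np.array([slope, intercept])

def assign(points, centers):
    """Index of the nearest center for every point (Euclidean distance)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic farthest-point
    initialization (an illustrative choice, not taken from the paper)."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)

def reduce_cluster(models, k):
    """Reduce step: cluster the (slope, intercept) pairs emitted by the mappers."""
    return kmeans(np.vstack(models), k)

def run_demo():
    """Six synthetic partitions drawn from two hypothetical regimes (a, b)."""
    rng = np.random.default_rng(42)
    x = np.linspace(0.0, 5.0, 200)
    true_params = [(2.0, 0.5)] * 3 + [(5.0, -0.3)] * 3
    models = []
    for a, b in true_params:
        noise = rng.normal(0.0, 0.01, size=x.shape)
        models.append(map_fit(x, a * np.exp(b * x + noise)))
    return reduce_cluster(models, k=2)
```

Calling `run_demo()` returns the cluster centers and the label of each partition. In a real MapReduce deployment, `map_fit` would run once per input split and `reduce_cluster` would receive the emitted coefficient pairs; how the paper chooses the best model among the clusters is not stated in the abstract, so this sketch stops at the clustering.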
