<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2227-1899</journal-id>
<journal-title><![CDATA[Revista Cubana de Ciencias Informáticas]]></journal-title>
<abbrev-journal-title><![CDATA[Rev cuba cienc informat]]></abbrev-journal-title>
<issn>2227-1899</issn>
<publisher>
<publisher-name><![CDATA[Editorial Ediciones Futuro]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2227-18992020000200034</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Using K-means algorithm for regression curve in big data system for business environment]]></article-title>
<article-title xml:lang="es"><![CDATA[Usando el algoritmo K-means para la curva de regresión en un gran sistema de datos para el entorno empresarial]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Naoui]]></surname>
<given-names><![CDATA[Mohammed Anouar]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Lejdel]]></surname>
<given-names><![CDATA[Brahim]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Ayad]]></surname>
<given-names><![CDATA[Mouloud]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[University of Bouira, Computer Science Department, Faculty of Sciences and Applied Sciences, LIMPAF Laboratory]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
</aff>
<aff id="Af2">
<institution><![CDATA[University of El-Oued, Computer Science Department]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
</aff>
<aff id="Af3">
<institution><![CDATA[University of Bouira, Faculty of Sciences and Applied Sciences, LPM3E Laboratory]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<volume>14</volume>
<numero>2</numero>
<fpage>34</fpage>
<lpage>48</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2227-18992020000200034&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_abstract&amp;pid=S2227-18992020000200034&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_pdf&amp;pid=S2227-18992020000200034&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[ABSTRACT Predictive analysis is quickly becoming a decisive advantage across a wide range of business activities. It comprises methods and technologies that let organizations identify models or patterns in data. Big data brings enormous benefits to business processes, but its properties (volume, velocity, variety, variation and veracity) render existing data analysis techniques insufficient. Big data analysis requires fusing the regression techniques of data mining with those of machine learning. Big data regression is an active research field in which several aspects, methods, and techniques have been proposed. In this context, we propose regression curve models for big data systems. Our proposal is based on a cooperative MapReduce architecture. We provide Map and Reduce algorithms for curve regression: in the Map phase, the data are transformed into a linear model; in the Reduce phase, a k-means algorithm clusters the results of the Map phase. K-means is one of the most popular partitional clustering algorithms; it is simple, statistical and considerably scalable, and its asymptotic running time is linear in any variable of the problem. This approach combines the advantages of regression and clustering methods in big data: the regression method extracts mathematical models, and the k-means algorithm selects the best mathematical models as clusters.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[RESUMEN El análisis predictivo se convierte rápidamente en una ventaja decisiva para la gama de actividades comerciales deseadas. Implica métodos y tecnologías para que las organizaciones identifiquen modelos o patrones de datos. Los grandes datos aportan enormes beneficios al proceso empresarial. Las grandes propiedades de los datos, como el volumen, la velocidad, la variedad, la variación y la veracidad, hacen que las técnicas existentes de análisis de datos no sean suficientes. El análisis de grandes datos requiere la fusión de las técnicas de regresión para la minería de datos con las de aprendizaje automático. La regresión de grandes datos es un campo importante para muchos investigadores, varios aspectos, métodos y técnicas propuestas. En este contexto, sugerimos modelos de curvas de regresión para grandes sistemas de datos. Nuestra propuesta se basa en la arquitectura cooperativa de MapReduce. Ofrecemos algoritmos Map y Reduce para la regresión de la curva, en la fase Map; la transformación de datos en el modelo lineal, en la fase reduce proponemos un algoritmo k-means para agrupar los resultados de la fase Map. El algoritmo K-means es uno de los algoritmos de clustering de particiones más populares; es simple, estadístico y considerablemente escalable. Además, tiene un tiempo de ejecución asintótica lineal en relación con cualquier variable del problema. Este enfoque combina la ventaja de los métodos de regresión y agrupación en grandes datos. El método de regresión extrae modelos matemáticos, y en la agrupación, el algoritmo k-means selecciona el mejor modelo matemático como agrupaciones.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Cooperation MapReduce algorithm]]></kwd>
<kwd lng="en"><![CDATA[Big Data]]></kwd>
<kwd lng="en"><![CDATA[Regression Curve]]></kwd>
<kwd lng="en"><![CDATA[k-means algorithm]]></kwd>
<kwd lng="en"><![CDATA[Business environmental scanning]]></kwd>
<kwd lng="es"><![CDATA[Algoritmo de cooperación MapReduce]]></kwd>
<kwd lng="es"><![CDATA[Big Data]]></kwd>
<kwd lng="es"><![CDATA[Curva de Regresión]]></kwd>
<kwd lng="es"><![CDATA[algoritmo k-means]]></kwd>
<kwd lng="es"><![CDATA[exploración del entorno empresarial]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bollobás]]></surname>
<given-names><![CDATA[Béla]]></given-names>
</name>
</person-group>
<source><![CDATA[Linear analysis.]]></source>
<year>1990</year>
<page-range>10p</page-range><publisher-loc><![CDATA[Cambridge ]]></publisher-loc>
<publisher-name><![CDATA[Cambridge University Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cover]]></surname>
<given-names><![CDATA[T. M]]></given-names>
</name>
</person-group>
<source><![CDATA[Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition.]]></source>
<year>1965</year>
<volume>3</volume>
<page-range>326-34</page-range><publisher-name><![CDATA[IEEE transactions on electronic computers]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dean]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Ghemawat]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[MapReduce: a flexible data processing tool]]></article-title>
<source><![CDATA[Communications of the ACM]]></source>
<year>2010</year>
<volume>53</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>72-7</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Golberg]]></surname>
<given-names><![CDATA[Michael A]]></given-names>
</name>
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[Hokwon A]]></given-names>
</name>
</person-group>
<source><![CDATA[Introduction to regression analysis]]></source>
<year>2004</year>
<page-range>3p</page-range><publisher-name><![CDATA[WIT press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Pei]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Kamber]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Data mining: concepts and techniques]]></source>
<year>2011</year>
<publisher-name><![CDATA[Elsevier]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jun]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[S. J]]></given-names>
</name>
<name>
<surname><![CDATA[Ryu]]></surname>
<given-names><![CDATA[J. B]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A Divided Regression Analysis for Big Data]]></article-title>
<source><![CDATA[Statistics]]></source>
<year>2015</year>
<volume>9</volume>
<numero>5</numero>
<issue>5</issue>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Krishna]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
</person-group>
<source><![CDATA[Open source implementation of MapReduce]]></source>
<year>2010</year>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Leveraging for big data regression]]></article-title>
<source><![CDATA[Wiley Interdisciplinary Reviews: Computational Statistics]]></source>
<year>2015</year>
<volume>7</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>70-6</page-range></nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Naoui]]></surname>
<given-names><![CDATA[M. A]]></given-names>
</name>
<name>
<surname><![CDATA[Mcheick]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Kazar]]></surname>
<given-names><![CDATA[O]]></given-names>
</name>
</person-group>
<source><![CDATA[Mobile Agent approach based on mobile strategic environmental Scanning using Android and JADELEAP systems.]]></source>
<year>2014</year>
<page-range>1-7</page-range></nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Neyshabouri]]></surname>
<given-names><![CDATA[M. M]]></given-names>
</name>
<name>
<surname><![CDATA[Demir]]></surname>
<given-names><![CDATA[O]]></given-names>
</name>
<name>
<surname><![CDATA[Delibalta]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Kozat]]></surname>
<given-names><![CDATA[S. S]]></given-names>
</name>
</person-group>
<source><![CDATA[Highly efficient nonlinear regression for big data with lexicographical splitting]]></source>
<year>2016</year>
<page-range>1-8</page-range><publisher-name><![CDATA[Signal, Image and Video Processing]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Oancea]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
</person-group>
<source><![CDATA[Linear Regression with R and Hadoop. International Conference]]></source>
<year>2015</year>
<page-range>p1007</page-range><publisher-name><![CDATA[CKS Challenges of the Knowledge Soc]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shafer]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Rixner]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Cox]]></surname>
<given-names><![CDATA[A. L]]></given-names>
</name>
</person-group>
<source><![CDATA[The hadoop distributed filesystem: Balancing portability and performance.]]></source>
<year>2010</year>
<page-range>122-33</page-range><publisher-name><![CDATA[ISPASS]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Martha]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
<name>
<surname><![CDATA[Xu]]></surname>
<given-names><![CDATA[Xiaowei]]></given-names>
</name>
</person-group>
<source><![CDATA[h-MapReduce: A Framework for Workload Balancing in MapReduce]]></source>
<year>2013</year>
<page-range>637-44</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Xiong]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Jin]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
</person-group>
<source><![CDATA[Random Bits Regression: a Strong General Predictor for Big Data]]></source>
<year>2015</year>
<publisher-name><![CDATA[arXiv preprint arXiv]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Willems]]></surname>
<given-names><![CDATA[F. M]]></given-names>
</name>
<name>
<surname><![CDATA[Shtarkov]]></surname>
<given-names><![CDATA[Y. M]]></given-names>
</name>
<name>
<surname><![CDATA[Tjalkens]]></surname>
<given-names><![CDATA[T. J]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Context weighting for general finite-context sources.]]></article-title>
<source><![CDATA[IEEE transactions on information theory]]></source>
<year>1996</year>
<volume>42</volume>
<numero>5</numero>
<issue>5</issue>
<page-range>1514-20</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
