<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2227-1899</journal-id>
<journal-title><![CDATA[Revista Cubana de Ciencias Informáticas]]></journal-title>
<abbrev-journal-title><![CDATA[Rev cuba cienc informat]]></abbrev-journal-title>
<issn>2227-1899</issn>
<publisher>
<publisher-name><![CDATA[Editorial Ediciones Futuro]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2227-18992021000500199</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Arquitectura distribuida de alta disponibilidad para la detección de fraude]]></article-title>
<article-title xml:lang="en"><![CDATA[High-availability distributed architecture for fraud detection]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[García Núñez]]></surname>
<given-names><![CDATA[Alejandro]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Olmedo Flores]]></surname>
<given-names><![CDATA[Jorge Luis]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Empresa de Telecomunicaciones de Cuba]]></institution>
<addr-line><![CDATA[La Habana]]></addr-line>
</aff>
<aff id="Af2">
<institution><![CDATA[Calle F No. 658]]></institution>
<addr-line><![CDATA[La Habana]]></addr-line>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>00</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>00</month>
<year>2021</year>
</pub-date>
<volume>15</volume>
<numero>4</numero>
<fpage>199</fpage>
<lpage>224</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2227-18992021000500199&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_abstract&amp;pid=S2227-18992021000500199&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_pdf&amp;pid=S2227-18992021000500199&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[La detección temprana, rápida y eficaz del fraude en el sector de las telecomunicaciones se ha convertido en la punta de lanza para enfrentar las más complejas y diversas vías en las que pueden producirse los ataques y el fraude. Para su detección se emplean diferentes técnicas, herramientas y algoritmos, como el aprendizaje automático, una rama de la Inteligencia Artificial que permite a las computadoras aprender a partir de los datos. Para aprovechar al máximo las ventajas del aprendizaje automático, se configuran arquitecturas de hardware y software robustas. Estas se configuran de forma distribuida, lo que permite que un conjunto de equipos trabaje como uno solo de forma transparente, aumentando el rendimiento y la capacidad de procesamiento. El objetivo del presente trabajo es desarrollar una arquitectura distribuida de alta disponibilidad mediante la plataforma de datos Hortonworks que permita aplicar técnicas de aprendizaje automático en la detección de fraude. Se instalaron y configuraron componentes de Apache incluidos en la plataforma, como Spark, HBase y Hadoop, los cuales permiten analizar el tráfico en grandes volúmenes de datos. Se muestra un ejemplo del resultado de aplicar el algoritmo de aprendizaje automático K-means, empleando la librería PySpark, para la creación de clusters. 
La instalación y configuración de la plataforma de datos Hortonworks dio como resultado una arquitectura de alta disponibilidad, flexible, escalable y tolerante a fallos, que permite emplear el aprendizaje automático en la detección de fraude.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[Early, fast and effective fraud detection in the telecommunications sector has become the spearhead for countering the increasingly complex and diverse ways in which attacks and fraud can occur. Detection relies on a range of techniques, tools and algorithms, including machine learning, the branch of Artificial Intelligence that enables computers to learn from data. To take full advantage of machine learning, robust hardware and software architectures are needed. These are deployed in a distributed manner, so that a set of computers works transparently as a single system, increasing performance and processing capacity. The objective of this work is to develop a highly available distributed architecture, based on the Hortonworks Data Platform, that supports the application of machine learning techniques to fraud detection. The platform's Apache components, such as Spark, HBase and Hadoop, were installed and configured to analyze traffic across large volumes of data. As an example, the result of applying the K-means clustering algorithm with the PySpark library is shown. The installation and configuration of the Hortonworks Data Platform resulted in an architecture that is highly available, flexible, scalable and fault tolerant, and that supports the use of machine learning for fraud detection.]]></p></abstract>
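<![CDATA[
The abstract reports that the Hortonworks Data Platform was configured for high availability but does not reproduce the configuration itself. As a hedged illustration only, the fragment below shows the standard Apache Hadoop properties for HDFS NameNode high availability using the Quorum Journal Manager and ZooKeeper-based automatic failover. The nameservice ID `mycluster`, the host names and the choice of fencing method are hypothetical placeholders, not values taken from this work; in an HDP cluster such properties are typically written by Ambari rather than edited by hand.

```xml
<!-- hdfs-site.xml (hypothetical nameservice "mycluster", placeholder hosts) -->
<configuration>
  <!-- Logical name that clients use instead of a single NameNode host -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two NameNodes (one active, one standby) behind the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
  <!-- Shared edit log kept on a quorum of three JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
  </property>
  <!-- Lets HDFS clients locate whichever NameNode is currently active -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- A fencing method must be set; with QJM a no-op such as shell(/bin/true) is acceptable -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
  <!-- Failover is triggered automatically by the ZKFailoverController -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>

<!-- core-site.xml -->
<configuration>
  <!-- Clients address the nameservice, not an individual NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- ZooKeeper ensemble used to elect the active NameNode -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
</configuration>
```

Three JournalNodes and three ZooKeeper servers are the usual minimum, because a majority quorum of three still works when any one node fails.
]]>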
<kwd-group>
<kwd lng="es"><![CDATA[Detección de fraude]]></kwd>
<kwd lng="es"><![CDATA[Aprendizaje automático]]></kwd>
<kwd lng="es"><![CDATA[Arquitectura distribuida]]></kwd>
<kwd lng="en"><![CDATA[Fraud detection]]></kwd>
<kwd lng="en"><![CDATA[Distributed architecture]]></kwd>
<kwd lng="en"><![CDATA[Machine learning]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Talal Almansouri]]></surname>
<given-names><![CDATA[Hatim]]></given-names>
</name>
<name>
<surname><![CDATA[Masmoudi]]></surname>
<given-names><![CDATA[Youssef]]></given-names>
</name>
</person-group>
<source><![CDATA[Hadoop distributed file system for big data analysis]]></source>
<year>2019</year>
<page-range>1-5</page-range><publisher-name><![CDATA[2019 4th World Conference on Complex Systems (WCCS)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="">
<collab>Apache HBase Team</collab>
<source><![CDATA[Apache HBase Reference Guide]]></source>
<year>2021</year>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Artho]]></surname>
<given-names><![CDATA[Cyrille]]></given-names>
</name>
<name>
<surname><![CDATA[Gros]]></surname>
<given-names><![CDATA[Quentin]]></given-names>
</name>
<name>
<surname><![CDATA[Rousset]]></surname>
<given-names><![CDATA[Guillaume]]></given-names>
</name>
<name>
<surname><![CDATA[Banzai]]></surname>
<given-names><![CDATA[Kazuaki]]></given-names>
</name>
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[Lei]]></given-names>
</name>
<name>
<surname><![CDATA[Kitamura]]></surname>
<given-names><![CDATA[Takashi]]></given-names>
</name>
<name>
<surname><![CDATA[Hagiya]]></surname>
<given-names><![CDATA[Masami]]></given-names>
</name>
<name>
<surname><![CDATA[Tanabe]]></surname>
<given-names><![CDATA[Yoshinori]]></given-names>
</name>
<name>
<surname><![CDATA[Yamamoto]]></surname>
<given-names><![CDATA[Mitsuharu]]></given-names>
</name>
</person-group>
<source><![CDATA[Model-based API testing of Apache ZooKeeper]]></source>
<year>2017</year>
<page-range>288-98</page-range><publisher-name><![CDATA[2017 IEEE International Conference on Software Testing, Verification and Validation (ICST)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Singh Bhathal]]></surname>
<given-names><![CDATA[Gurjit]]></given-names>
</name>
<name>
<surname><![CDATA[Singh Dhiman]]></surname>
<given-names><![CDATA[Amardeep]]></given-names>
</name>
</person-group>
<source><![CDATA[Big data solution: Improvised distributions framework of Hadoop]]></source>
<year>2018</year>
<page-range>35-8</page-range><publisher-name><![CDATA[2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="">
<collab>CFCA</collab>
<source><![CDATA[Global Fraud Loss Survey]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="">
<collab>Cisco</collab>
<source><![CDATA[Mobile Visual Networking Index (VNI) Infographic]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="">
<collab>Cloudera, Inc.</collab>
<source><![CDATA[Hortonworks Data Platform: Apache Spark Component Guide]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="">
<collab>Cloudera, Inc.</collab>
<source><![CDATA[Hortonworks Data Platform: Apache Hadoop High Availability]]></source>
<year>2018</year>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="">
<collab>Cloudera, Inc.</collab>
<source><![CDATA[Hortonworks Data Platform: Apache Spark Component Guide]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="">
<collab>Cloudera, Inc</collab>
<source><![CDATA[Planning for the HDP Cluster]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hilas]]></surname>
<given-names><![CDATA[Constantinos S]]></given-names>
</name>
<name>
<surname><![CDATA[Mastorocostas]]></surname>
<given-names><![CDATA[Paris A]]></given-names>
</name>
<name>
<surname><![CDATA[Rekanos]]></surname>
<given-names><![CDATA[Ioannis T]]></given-names>
</name>
</person-group>
<source><![CDATA[Clustering of telecommunications user profiles for fraud detection and security enhancement in large corporate networks: a case study]]></source>
<year>2015</year>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hiraman]]></surname>
<given-names><![CDATA[Bhole Rahul]]></given-names>
</name>
<name>
<surname><![CDATA[Viresh M]]></surname>
<given-names><![CDATA[Chapte]]></given-names>
</name>
<name>
<surname><![CDATA[Abhijeet C]]></surname>
<given-names><![CDATA[Karve]]></given-names>
</name>
</person-group>
<source><![CDATA[A study of Apache Kafka in big data stream processing]]></source>
<year>2018</year>
<page-range>1-3</page-range><publisher-name><![CDATA[2018 International Conference on Information, Communication, Engineering and Technology]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tanenbaum]]></surname>
<given-names><![CDATA[Andrew S]]></given-names>
</name>
<name>
<surname><![CDATA[van Steen]]></surname>
<given-names><![CDATA[Maarten]]></given-names>
</name>
</person-group>
<source><![CDATA[Distributed systems]]></source>
<year>2018</year>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Simon]]></surname>
<given-names><![CDATA[Herbert A]]></given-names>
</name>
<name>
<surname><![CDATA[Veloso]]></surname>
<given-names><![CDATA[Manuela]]></given-names>
</name>
</person-group>
<source><![CDATA[Aprendizaje Automático]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Minukhin]]></surname>
<given-names><![CDATA[Sergii]]></given-names>
</name>
<name>
<surname><![CDATA[Brynza]]></surname>
<given-names><![CDATA[Natalia]]></given-names>
</name>
<name>
<surname><![CDATA[Sitnikov]]></surname>
<given-names><![CDATA[Dmytro]]></given-names>
</name>
</person-group>
<source><![CDATA[Analyzing performance of Apache Spark MLlib with multinode clusters on Azure HDInsight: Spark-perf case study]]></source>
<year>2021</year>
<page-range>114-34</page-range><publisher-name><![CDATA[Springer International Publishing]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pandagale]]></surname>
<given-names><![CDATA[Ashwini A.]]></given-names>
</name>
<name>
<surname><![CDATA[Surve]]></surname>
<given-names><![CDATA[Anil R]]></given-names>
</name>
</person-group>
<source><![CDATA[Hadoop-HBase for finding association rules using Apriori MapReduce algorithm]]></source>
<year>2016</year>
<page-range>795-8</page-range><publisher-name><![CDATA[IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Perwej]]></surname>
<given-names><![CDATA[Yusuf]]></given-names>
</name>
<name>
<surname><![CDATA[Kerim]]></surname>
<given-names><![CDATA[Bedine]]></given-names>
</name>
<name>
<surname><![CDATA[Adrees]]></surname>
<given-names><![CDATA[Mohammed]]></given-names>
</name>
<name>
<surname><![CDATA[Sheta]]></surname>
<given-names><![CDATA[Osama]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[An empirical exploration of the YARN in big data]]></article-title>
<source><![CDATA[International Journal of Applied Information Systems (IJAIS)]]></source>
<year>2017</year>
<volume>12</volume>
<page-range>19-29</page-range></nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Quoc]]></surname>
<given-names><![CDATA[Do Le]]></given-names>
</name>
<name>
<surname><![CDATA[Gregor]]></surname>
<given-names><![CDATA[Franz]]></given-names>
</name>
<name>
<surname><![CDATA[Singh]]></surname>
<given-names><![CDATA[Jatinder]]></given-names>
</name>
<name>
<surname><![CDATA[Fetzer]]></surname>
<given-names><![CDATA[Christof]]></given-names>
</name>
</person-group>
<source><![CDATA[SGX-PySpark: Secure distributed data analytics]]></source>
<year>2019</year>
<page-range>3563-4</page-range><publisher-name><![CDATA[ACM Digital Library]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shaikh]]></surname>
<given-names><![CDATA[Eman]]></given-names>
</name>
<name>
<surname><![CDATA[Mohiuddin]]></surname>
<given-names><![CDATA[Iman]]></given-names>
</name>
<name>
<surname><![CDATA[Alufaisan]]></surname>
<given-names><![CDATA[Yasmeen]]></given-names>
</name>
<name>
<surname><![CDATA[Nahvi]]></surname>
<given-names><![CDATA[Irum]]></given-names>
</name>
</person-group>
<source><![CDATA[Apache Spark: A big data processing engine]]></source>
<year>2019</year>
<page-range>1-6</page-range><publisher-name><![CDATA[IEEE Middle East and North Africa COMMunications Conference (MENACOMM)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mitchell]]></surname>
<given-names><![CDATA[Tom]]></given-names>
</name>
</person-group>
<source><![CDATA[Machine learning]]></source>
<year>1997</year>
<publisher-name><![CDATA[McGraw-Hill]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[van der Aalst]]></surname>
<given-names><![CDATA[Wil]]></given-names>
</name>
</person-group>
<source><![CDATA[Process mining: Data science in action]]></source>
<year>2016</year>
</nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[Weiqing]]></given-names>
</name>
<name>
<surname><![CDATA[Tang]]></surname>
<given-names><![CDATA[Mingjie]]></given-names>
</name>
<name>
<surname><![CDATA[Yu]]></surname>
<given-names><![CDATA[Yongyang]]></given-names>
</name>
<name>
<surname><![CDATA[Liang]]></surname>
<given-names><![CDATA[Yanbo]]></given-names>
</name>
<name>
<surname><![CDATA[Saha]]></surname>
<given-names><![CDATA[Bikas]]></given-names>
</name>
</person-group>
<source><![CDATA[SHC: Distributed query processing for non-relational data store]]></source>
<year>2018</year>
<page-range>1465-76</page-range><publisher-name><![CDATA[2018 IEEE 34th International Conference on Data Engineering (ICDE)]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
