<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2227-1899</journal-id>
<journal-title><![CDATA[Revista Cubana de Ciencias Informáticas]]></journal-title>
<abbrev-journal-title><![CDATA[Rev cuba cienc informat]]></abbrev-journal-title>
<issn>2227-1899</issn>
<publisher>
<publisher-name><![CDATA[Editorial Ediciones Futuro]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2227-18992020000400134</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Integración de rasgos y aprendizaje semi-supervisado para la clasificación funcional de enzimas utilizando K-medias de Spark]]></article-title>
<article-title xml:lang="en"><![CDATA[Feature integration and semi-supervised learning for functional enzyme classification by using Spark K-means]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[González Valle]]></surname>
<given-names><![CDATA[Yadelis]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Galpert]]></surname>
<given-names><![CDATA[Deborah]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
<xref ref-type="aff" rid="Aaf"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Molina-Ruiz]]></surname>
<given-names><![CDATA[Reinaldo]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Aguero-Chapin]]></surname>
<given-names><![CDATA[Guillermin]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Universidad Central "Marta Abreu" de Las Villas (UCLV), Centro de Investigaciones Informáticas]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Cuba</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Universidad Central "Marta Abreu" de Las Villas (UCLV), Departamento de Ciencia de la Computación]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Cuba</country>
</aff>
<aff id="Af3">
<institution><![CDATA[Universidad Central "Marta Abreu" de Las Villas (UCLV), Centro de Bioactivos Químicos (CBQ)]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Cuba</country>
</aff>
<aff id="Af4">
<institution><![CDATA[CIIMAR | Interdisciplinary Centre of Marine and Environmental Research of the University of Porto]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Portugal</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2020</year>
</pub-date>
<volume>14</volume>
<numero>4</numero>
<fpage>134</fpage>
<lpage>161</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2227-18992020000400134&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_abstract&amp;pid=S2227-18992020000400134&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_pdf&amp;pid=S2227-18992020000400134&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[RESUMEN La clasificación funcional de las enzimas constituye un campo de gran interés para la bioinformática desde hace varios años. Dicha clasificación debe tener en cuenta la escasa información de algunas clases, el desbalance entre ellas y el número creciente de enzimas a clasificar. En este artículo investigamos el uso de algoritmos de agrupamiento semi-supervisados y no supervisados para agrupar secuencias similares de enzimas, a partir de la integración de descriptores de proteínas libres de alineamiento basados en el método de k-mers con diferentes valores de k. Se implementaron en Spark cuatro algoritmos que agrupan las enzimas de acuerdo a su función enzimática. Estos están basados en transformaciones a métodos existentes como el Combinatorio Lógico Global, el K-medias y el Ensamblado de Agrupamientos. La calidad del agrupamiento se midió usando como medida interna el índice de silueta y como medida externa la medida-F. En la experimentación, se tomaron como referencia 58 secuencias funcionalmente caracterizadas de 501 enzimas de la familia Glicosil Hidrolasa-70 (GH-70) (con un alto valor para la biotecnología y que a su vez pueden ocasionar pérdidas millonarias en la producción de azúcar) de la base de datos CAZy, con el objetivo de comparar los resultados de los métodos de agrupamiento implementados. Se obtuvieron valores moderados del índice de silueta como medida interna, pero mejores que los obtenidos con el método K-medias. 
Se alcanzó el mejor valor de la medida-F, 0.9, con el método del Ensamblado de Agrupamientos combinado con el aprendizaje semi-supervisado.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[ABSTRACT The functional classification of enzymes has been a field of great interest in bioinformatics for several years. This classification must take into account the scarce information available for some classes, the imbalance between classes, and the growing number of enzymes to be classified. In this article we investigate the use of semi-supervised and unsupervised clustering algorithms to group similar enzyme sequences by integrating alignment-free protein descriptors based on the k-mers method with different values of k. Four algorithms that group enzymes according to their enzymatic function were implemented in Spark. They are based on transformations of existing methods: Global Logical Combinatorial, K-means, and Ensemble Clustering. Clustering quality was measured using the silhouette index as an internal measure and the F-measure as an external measure. In the experiments, 58 functionally characterized sequences out of 501 enzymes of the Glycoside Hydrolase-70 (GH-70) family from the CAZy database were taken as reference in order to compare the results of the implemented clustering methods; this family has high biotechnological value and can also cause losses in the millions in sugar production. The silhouette index values obtained were moderate, but better than those obtained with the K-means method. The best F-measure value, 0.9, was achieved by the Ensemble Clustering method combined with semi-supervised learning.]]></p></abstract>
<kwd-group>
<kwd lng="es"><![CDATA[Agrupamiento de enzimas]]></kwd>
<kwd lng="es"><![CDATA[aprendizaje semi-supervisado]]></kwd>
<kwd lng="es"><![CDATA[aprendizaje no supervisado]]></kwd>
<kwd lng="es"><![CDATA[centroides K-medias]]></kwd>
<kwd lng="en"><![CDATA[Enzyme clustering]]></kwd>
<kwd lng="en"><![CDATA[K-means centroids]]></kwd>
<kwd lng="en"><![CDATA[unsupervised learning]]></kwd>
<kwd lng="en"><![CDATA[semi-supervised learning]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Abdallah]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Yousef]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[GrpClassifierEC: a novel classification approach based on the ensemble clustering space]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ong]]></surname>
<given-names><![CDATA[S. A. K]]></given-names>
</name>
<name>
<surname><![CDATA[Lin]]></surname>
<given-names><![CDATA[H. H]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[Y. Z]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[Z. R]]></given-names>
</name>
<name>
<surname><![CDATA[Cao]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Efficacy of different protein descriptors in predicting protein functional families]]></article-title>
<source><![CDATA[BMC Bioinformatics]]></source>
<year>2007</year>
<volume>8</volume>
<numero>300</numero>
<issue>300</issue>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Arai]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Barakbah]]></surname>
<given-names><![CDATA[A. R]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Hierarchical K-means: an algorithm for centroids initialization for K-means.]]></article-title>
<source><![CDATA[Reports of the Faculty of Science and Engineering, Saga Univ.]]></source>
<year>2007</year>
<volume>36</volume>
<numero>1</numero>
<issue>1</issue>
</nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Assefi]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Behravesh]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Tafti]]></surname>
<given-names><![CDATA[A. P]]></given-names>
</name>
</person-group>
<source><![CDATA[Big Data Machine Learning using Apache Spark MLlib.]]></source>
<year>2017</year>
<publisher-name><![CDATA[Conference: IEEE Big Data.]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Baeza-Yates]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Frakes]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
</person-group>
<source><![CDATA[Information Retrieval: Data Structures and Algorithms]]></source>
<year>1992</year>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Basu]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Banerjee]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Mooney]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<source><![CDATA[Semi-supervised Clustering by Seeding.]]></source>
<year>2002</year>
<page-range>27-34</page-range><publisher-name><![CDATA[Proceedings of the 19th International Conference on Machine Learning]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Brun]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Sima]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Hua]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Lowey]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Model-based evaluation of clustering validation measures.]]></source>
<year>2007</year>
<publisher-name><![CDATA[Pattern Recognition.]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chapelle]]></surname>
<given-names><![CDATA[O]]></given-names>
</name>
<name>
<surname><![CDATA[Schölkopf]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Zien]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Semi-Supervised Learning.]]></source>
<year>2006</year>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Davies]]></surname>
<given-names><![CDATA[G. J]]></given-names>
</name>
<name>
<surname><![CDATA[Sinnott]]></surname>
<given-names><![CDATA[M. L]]></given-names>
</name>
</person-group>
<source><![CDATA[The sequence-based classifications of carbohydrate-active enzymes. Sorting the diverse.]]></source>
<year>2008</year>
<page-range>27-32</page-range><publisher-name><![CDATA[Regulars Biochemical Journal Classic Papers]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Fraga Vidal]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Martínez]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Moulis]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Escalier]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Morel]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Remaud-Simeon]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Monsan]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A novel dextransucrase is produced by Leuconostoc citreum strain B/110-1-2: An isolate used for the industrial production of dextran and dextran derivatives.]]></article-title>
<source><![CDATA[Journal of Industrial Microbiology and Biotechnology]]></source>
<year>2011</year>
<volume>38</volume>
<numero>9</numero>
<issue>9</issue>
<page-range>1499-506</page-range></nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Galpert]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<source><![CDATA[Contribuciones al enfoque de comparación par a par en la detección de genes ortólogos]]></source>
<year>2016</year>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[González-Almagro]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Luengo]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Cano]]></surname>
<given-names><![CDATA[J.-R]]></given-names>
</name>
<name>
<surname><![CDATA[García]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[DILS: constrained clustering through Dual Iterative Local Search.]]></source>
<year>2020</year>
<publisher-name><![CDATA[Computers and Operations Research]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gunasinghe]]></surname>
<given-names><![CDATA[U]]></given-names>
</name>
<name>
<surname><![CDATA[Alahakoon]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Bedingfield]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[Extraction of high quality k-words for alignment-free sequence comparison]]></source>
<year>2014</year>
<volume>358</volume>
<page-range>31-51</page-range><publisher-name><![CDATA[Journal of Theoretical Biology]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Halkidi]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Batistakis]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Vazirgiannis]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Clustering validity checking methods: part II]]></source>
<year>2002</year>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Kamber]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Data mining: concepts and techniques.]]></source>
<year>2001</year>
<publisher-loc><![CDATA[San Francisco ]]></publisher-loc>
<publisher-name><![CDATA[Morgan Kaufmann]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Höppner]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Klawonn]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Kruse]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Runkler]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
</person-group>
<source><![CDATA[Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition]]></source>
<year>1999</year>
</nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jain]]></surname>
<given-names><![CDATA[A. K]]></given-names>
</name>
<name>
<surname><![CDATA[Dubes]]></surname>
<given-names><![CDATA[R. C]]></given-names>
</name>
</person-group>
<source><![CDATA[Algorithms for Clustering Data.]]></source>
<year>1988</year>
<publisher-name><![CDATA[Prentice Hall, Englewood Cliffs]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jain]]></surname>
<given-names><![CDATA[A. K]]></given-names>
</name>
<name>
<surname><![CDATA[Murty]]></surname>
<given-names><![CDATA[M. N]]></given-names>
</name>
<name>
<surname><![CDATA[Flynn]]></surname>
<given-names><![CDATA[P. J]]></given-names>
</name>
</person-group>
<source><![CDATA[Data clustering: a review.]]></source>
<year>1999</year>
<publisher-name><![CDATA[ACM Computing Surveys.]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Khan]]></surname>
<given-names><![CDATA[S. S]]></given-names>
</name>
<name>
<surname><![CDATA[Ahmad]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Cluster center initialization algorithm for K-means clustering.]]></source>
<year>2004</year>
<publisher-name><![CDATA[Elsevier B.V]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Koutroumbas]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Theodoridis]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[Pattern Recognition.]]></source>
<year>2008</year>
<publisher-name><![CDATA[Academic Press]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kruse]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Döring]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Lesot]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Fundamentals of Fuzzy Clustering]]></source>
<year>2007</year>
<publisher-name><![CDATA[in Advances in Fuzzy Clustering and its Applications (J. Valente)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B22">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lange]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Law]]></surname>
<given-names><![CDATA[M. H. C]]></given-names>
</name>
<name>
<surname><![CDATA[Jain]]></surname>
<given-names><![CDATA[A. K]]></given-names>
</name>
<name>
<surname><![CDATA[Buhmann]]></surname>
<given-names><![CDATA[J. M]]></given-names>
</name>
</person-group>
<source><![CDATA[Learning With Constrained and Unlabelled Data.]]></source>
<year>2005</year>
<page-range>731-8</page-range><publisher-name><![CDATA[Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B23">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Umarov]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Xie]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Gao]]></surname>
<given-names><![CDATA[M. F]]></given-names>
</name>
<name>
<surname><![CDATA[Xin]]></surname>
<given-names><![CDATA[L. L]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[DEEPre: sequence-based enzyme EC number prediction by deep learning.]]></article-title>
<source><![CDATA[Bioinformatics]]></source>
<year>2018</year>
<volume>34</volume>
<numero>5</numero>
<issue>5</issue>
<page-range>760-9</page-range></nlm-citation>
</ref>
<ref id="B24">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lombard]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
<name>
<surname><![CDATA[Ramulu]]></surname>
<given-names><![CDATA[H. G]]></given-names>
</name>
<name>
<surname><![CDATA[Drula]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[Coutinho]]></surname>
<given-names><![CDATA[P. M]]></given-names>
</name>
<name>
<surname><![CDATA[Henrissat]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
</person-group>
<source><![CDATA[The carbohydrate-active enzymes database ( CAZy ) in 2013.]]></source>
<year>2014</year>
</nlm-citation>
</ref>
<ref id="B25">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Melsted]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Pritchard]]></surname>
<given-names><![CDATA[J. K]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Efficient counting of k-mers in DNA sequences using a bloom filter]]></article-title>
<source><![CDATA[BMC Bioinformatics]]></source>
<year>2011</year>
<volume>12</volume>
<numero>333</numero>
<issue>333</issue>
<page-range>1-7</page-range></nlm-citation>
</ref>
<ref id="B26">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Meng]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Gangoiti]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Bai]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Pijning]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Leeuwen]]></surname>
<given-names><![CDATA[S. S. Van]]></given-names>
</name>
<name>
<surname><![CDATA[Dijkhuizen]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Structure-function relationships of family GH70 glucansucrase and 4,6-alpha-glucanotransferase enzymes, and their evolutionary relationships with family GH13 enzymes.]]></article-title>
<source><![CDATA[Cellular and Molecular Life Sciences]]></source>
<year>2016</year>
<volume>73</volume>
<numero>14</numero>
<issue>14</issue>
<page-range>2681-706</page-range></nlm-citation>
</ref>
<ref id="B27">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rosell]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Nada]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Kann]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
<name>
<surname><![CDATA[Litton]]></surname>
<given-names><![CDATA[J.-E]]></given-names>
</name>
</person-group>
<source><![CDATA[Comparing comparisons: Document clustering evaluation using two manual classifications.]]></source>
<year>2004</year>
<publisher-name><![CDATA[Proceedings of the International Conference on Natural Language Processing (ICON 2004)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B28">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ruiz-Shulcloper]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Capítulo 10.- clasificación no supervisada: Algoritmos de estructuración de espacios cartesianos.]]></source>
<year>2019</year>
<publisher-name><![CDATA[En Reconocimiento lógico combinatorio de patrones: teoría y aplicaciones]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B29">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ruiz-Shulcloper]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Sánchez-Díaz]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
</person-group>
<source><![CDATA[A clustering method for very large mixed data sets.]]></source>
<year>2001</year>
<publisher-name><![CDATA[In Proceedings 2001 IEEE International Conference on Data Mining. IEEE.]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B30">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shen]]></surname>
<given-names><![CDATA[H.-B]]></given-names>
</name>
<name>
<surname><![CDATA[Chou]]></surname>
<given-names><![CDATA[K.-C]]></given-names>
</name>
</person-group>
<source><![CDATA[EzyPred: A top-down approach for predicting enzyme functional classes and subclasses.]]></source>
<year>2007</year>
<volume>364</volume>
<page-range>53-9</page-range><publisher-name><![CDATA[Biochemical and Biophysical Research Communications]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B31">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Steinbach]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Karypis]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Kumar]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
</person-group>
<source><![CDATA[A Comparison of Document Clustering Techniques.]]></source>
<year>2000</year>
<publisher-name><![CDATA[Proceedings of 6th ACM SIGKDD World Text Mining Conference]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B32">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Triguero]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[García]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Herrera]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study.]]></article-title>
<source><![CDATA[Knowledge and Information Systems]]></source>
<year>2015</year>
<volume>42</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>245-84</page-range></nlm-citation>
</ref>
<ref id="B33">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Vinga]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Alignment-free methods in computational biology.]]></article-title>
<source><![CDATA[Briefings in Bioinformatics]]></source>
<year>2014</year>
<volume>15</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>341-2</page-range></nlm-citation>
</ref>
<ref id="B34">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Vinga]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Almeida]]></surname>
<given-names><![CDATA[J. S]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Alignment-free sequence comparison]]></article-title>
<source><![CDATA[Bioinformatics]]></source>
<year>2003</year>
<volume>19</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>513-23</page-range></nlm-citation>
</ref>
<ref id="B35">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhu]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
</person-group>
<source><![CDATA[Semi-Supervised Learning Literature Survey]]></source>
<year>2005</year>
<publisher-name><![CDATA[University of Wisconsin-Madison, Department of Computer Sciences]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B36">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Xiong]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[K-means clustering versus validation measures: a data distribution perspective.]]></source>
<year>2006</year>
<publisher-name><![CDATA[In A. Press. (Ed.), Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B37">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zielezinski]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Girgis]]></surname>
<given-names><![CDATA[H. Z]]></given-names>
</name>
<name>
<surname><![CDATA[Bernard]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Leimeister]]></surname>
<given-names><![CDATA[C.-A]]></given-names>
</name>
<name>
<surname><![CDATA[Tang]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
<name>
<surname><![CDATA[Dencker]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Lau]]></surname>
<given-names><![CDATA[A. K]]></given-names>
</name>
<name>
<surname><![CDATA[Röhling]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Choi]]></surname>
<given-names><![CDATA[J. J]]></given-names>
</name>
<name>
<surname><![CDATA[Waterman]]></surname>
<given-names><![CDATA[M. S]]></given-names>
</name>
<name>
<surname><![CDATA[Comin]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[S.-H]]></given-names>
</name>
<name>
<surname><![CDATA[Vinga]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Almeida]]></surname>
<given-names><![CDATA[J. S]]></given-names>
</name>
<name>
<surname><![CDATA[Chan]]></surname>
<given-names><![CDATA[C. X]]></given-names>
</name>
<name>
<surname><![CDATA[James]]></surname>
<given-names><![CDATA[B. T]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Morgenstern]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Karlowski]]></surname>
<given-names><![CDATA[W. M]]></given-names>
</name>
</person-group>
<source><![CDATA[Benchmarking of alignment-free sequence comparison methods.]]></source>
<year>2019</year>
<publisher-name><![CDATA[Genome Biology]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B38">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zielezinski]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Vinga]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Almeida]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Karlowski]]></surname>
<given-names><![CDATA[W. M]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Alignment-free sequence comparison: benefits, applications, and tools]]></article-title>
<source><![CDATA[Genome Biol]]></source>
<year>2017</year>
<volume>18</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>186</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
