<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2307-2113</journal-id>
<journal-title><![CDATA[Revista Cubana de Información en Ciencias de la Salud]]></journal-title>
<abbrev-journal-title><![CDATA[Rev. cuba. inf. cienc. salud]]></abbrev-journal-title>
<issn>2307-2113</issn>
<publisher>
<publisher-name><![CDATA[Centro Nacional de Información de Ciencias Médicas, Editorial Ciencias Médicas]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2307-21132021000400015</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Caracterización de un corpus extraído de historias clínicas electrónicas de maternas a través de técnicas de procesamiento de lenguaje natural]]></article-title>
<article-title xml:lang="en"><![CDATA[Characterization of a corpus extracted from maternal electronic health records through natural language processing techniques]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Durango Barrera]]></surname>
<given-names><![CDATA[María Camila]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Torres Silva]]></surname>
<given-names><![CDATA[Ever Augusto]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Florez-Arango]]></surname>
<given-names><![CDATA[José Fernando]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Orozco-Duque]]></surname>
<given-names><![CDATA[Andrés]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Instituto Tecnológico Metropolitano]]></institution>
<addr-line><![CDATA[Medellín]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Universidad A&M]]></institution>
<addr-line><![CDATA[Texas]]></addr-line>
<country>Estados Unidos de América</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2021</year>
</pub-date>
<volume>32</volume>
<numero>4</numero>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2307-21132021000400015&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_abstract&amp;pid=S2307-21132021000400015&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_pdf&amp;pid=S2307-21132021000400015&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[RESUMEN Este artículo tuvo como propósito caracterizar el texto libre disponible en una historia clínica electrónica de una institución orientada a la atención de pacientes en embarazo. La historia clínica electrónica, más que ser un repositorio de datos, se ha convertido en un sistema de soporte a la toma de decisiones clínicas. Sin embargo, debido al alto volumen de información y a que parte de la información clave de las historias clínicas electrónicas está en forma de texto libre, utilizar todo el potencial que ofrece la información de la historia clínica electrónica para mejorar la toma de decisiones clínicas requiere el apoyo de métodos de minería de texto y procesamiento de lenguaje natural. Particularmente, en el área de Ginecología y Obstetricia, la implementación de métodos del procesamiento de lenguaje natural podría ayudar a agilizar la identificación de factores asociados al riesgo materno. A pesar de esto, en la literatura no se registran trabajos que integren técnicas de procesamiento de lenguaje natural en las historias clínicas electrónicas asociadas al seguimiento materno en idioma español. En este trabajo se obtuvieron 659 789 tokens mediante los métodos de minería de texto, un diccionario con palabras únicas dado por 7 334 tokens y se estudiaron los n-grams más frecuentes. Se generó una caracterización con una arquitectura de red neuronal CBOW (continuous bag of words) para la incrustación de palabras.
Utilizando algoritmos de clustering se obtuvo evidencia que indica que palabras cercanas en el espacio de incrustación de 300 dimensiones pueden llegar a representar asociaciones referentes a tipos de pacientes, o agrupar palabras similares, incluyendo palabras escritas con errores ortográficos. El corpus generado y los resultados encontrados sientan las bases para trabajos futuros en la detección de entidades (síntomas, signos, diagnósticos, tratamientos), la corrección de errores ortográficos y las relaciones semánticas entre palabras para generar resúmenes de historias clínicas o asistir el seguimiento de las maternas mediante la revisión automatizada de la historia clínica electrónica.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[ABSTRACT The purpose of this article was to characterize the free text available in the electronic health record (EHR) of an institution dedicated to the care of pregnant patients. More than a data repository, the EHR has become a clinical decision support system (CDSS). However, because of the high volume of information, and because part of the key information in EHRs is stored as free text, exploiting the full potential of EHR information to improve clinical decision-making requires the support of text mining and natural language processing (NLP) methods. In gynecology and obstetrics in particular, NLP methods could help speed up the identification of factors associated with maternal risk. Despite this, the literature reports no studies that apply NLP techniques to Spanish-language EHRs associated with maternal follow-up. To address this knowledge gap, a corpus was generated and characterized from the EHRs of a gynecology and obstetrics service that treats high-risk maternal patients. NLP and text mining methods applied to the data yielded 659 789 tokens and a dictionary of 7 334 unique words. The data were characterized by identifying the most frequent words and n-grams, and a word embedding in a 300-dimensional vector space was computed using a CBOW (continuous bag of words) neural network architecture. Clustering algorithms applied to the word embedding provided evidence that words assigned to the same group can represent associations related to types of patients, or can group similar words, including misspelled words.
The generated corpus and the results lay the foundations for future work on entity detection (symptoms, signs, diagnoses, treatments), spelling correction, and semantic relationships between words, in order to generate summaries of medical records or to assist the follow-up of pregnant patients through automated review of the electronic health record.]]></p></abstract>
<kwd-group>
<kwd lng="es"><![CDATA[Procesamiento de lenguaje natural]]></kwd>
<kwd lng="es"><![CDATA[historia clínica electrónica]]></kwd>
<kwd lng="es"><![CDATA[aprendizaje de máquina]]></kwd>
<kwd lng="es"><![CDATA[word embedding]]></kwd>
<kwd lng="es"><![CDATA[redes neuronales artificiales]]></kwd>
<kwd lng="en"><![CDATA[Natural language processing]]></kwd>
<kwd lng="en"><![CDATA[electronic health record]]></kwd>
<kwd lng="en"><![CDATA[machine learning]]></kwd>
<kwd lng="en"><![CDATA[word embedding]]></kwd>
<kwd lng="en"><![CDATA[artificial neural networks]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Wei]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
<name>
<surname><![CDATA[Guo]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Tang]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Textual analysis and visualization of research trends in data mining for electronic health records]]></article-title>
<source><![CDATA[Health Policy Technol]]></source>
<year>2017</year>
<volume>6</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>389-400</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[González Bernaldo de Quirós]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Otero]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Luna]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Terminology Services: Standard Terminologies to Control Health Vocabulary]]></article-title>
<source><![CDATA[Yearb Med Inform]]></source>
<year>2018</year>
<volume>27</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>227-33</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Resnik]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Niv]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Nossal]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Kapit]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Toren]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<source><![CDATA[Communication of Clinically Relevant Information in Electronic Health Records: A Comparison between Structured Data and Unrestricted Physician Language]]></source>
<year>2008</year>
<publisher-name><![CDATA[Perspect Health Inf Manag]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Peng]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Long]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Pan]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Jiang]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Niu]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Attentive dual embedding for understanding medical concepts in electronic health records]]></article-title>
<source><![CDATA[Proc Int Jt Conf Neural Networks]]></source>
<year>2019</year>
<volume>2019</volume>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Giamouzi]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Discover research from City, University of London]]></article-title>
<source><![CDATA[City]]></source>
<year>2008</year>
<volume>34</volume>
<numero>2019</numero>
<issue>2019</issue>
<page-range>51-79</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Neuraz]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Looten]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
<name>
<surname><![CDATA[Rance]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Garcelon]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[Llanos]]></surname>
<given-names><![CDATA[LC]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Do you need embeddings trained on a massive specialized corpus for your clinical natural language processing task?]]></article-title>
<source><![CDATA[Stud Health Technol Inform]]></source>
<year>2019</year>
<volume>264</volume>
<page-range>1558-9</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Khattak]]></surname>
<given-names><![CDATA[FK]]></given-names>
</name>
<name>
<surname><![CDATA[Jeblee]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Pou-Prom]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Abdalla]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Meaney]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Rudzicz]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A survey of word embeddings for clinical text]]></article-title>
<source><![CDATA[J Biomed Informatics X]]></source>
<year>2019</year>
<volume>4</volume>
</nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Khan]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
<name>
<surname><![CDATA[Daud]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Alotaibi]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Aljohani]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[Arafat]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Deep recurrent neural networks with word embeddings for Urdu named entity recognition]]></article-title>
<source><![CDATA[ETRI J]]></source>
<year>2020</year>
<volume>42</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>90-100</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ruas]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[Ferreira]]></surname>
<given-names><![CDATA[CHP]]></given-names>
</name>
<name>
<surname><![CDATA[Grosky]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
<name>
<surname><![CDATA[de França]]></surname>
<given-names><![CDATA[FO]]></given-names>
</name>
<name>
<surname><![CDATA[de Medeiros]]></surname>
<given-names><![CDATA[DMR]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Enhanced word embeddings using multi-semantic representation through lexical chains]]></article-title>
<source><![CDATA[Inf Sci]]></source>
<year>2020</year>
<volume>532</volume>
<page-range>16-32</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
<name>
<surname><![CDATA[Xiong]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Gao]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[]]></source>
<year>2010</year>
<conf-name><![CDATA[IEEE International Conference on Data Mining (ICDM)]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Arrieta Rodríguez]]></surname>
<given-names><![CDATA[EL]]></given-names>
</name>
<name>
<surname><![CDATA[Martínez Santos]]></surname>
<given-names><![CDATA[JC]]></given-names>
</name>
</person-group>
<source><![CDATA[Predicción temprana de morbilidad materna extrema usando aprendizaje automático]]></source>
<year>2017</year>
<publisher-loc><![CDATA[Cartagena de Indias]]></publisher-loc>
<publisher-name><![CDATA[Universidad Tecnológica de Bolívar]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mohamed]]></surname>
<given-names><![CDATA[EH]]></given-names>
</name>
<name>
<surname><![CDATA[Shokry]]></surname>
<given-names><![CDATA[EM]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[QSST: A Quranic semantic search tool based on word embedding]]></article-title>
<source><![CDATA[J King Saud Univ - Comput Inf Sci]]></source>
<year>2020</year>
<numero>40</numero>
<issue>40</issue>
</nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[McDonald]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Ramscar]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Testing the distributional hypothesis: The influence of context on judgements of semantic similarity]]></source>
<year>2021</year>
<publisher-name><![CDATA[University of Edinburgh, Institute for Communicating and Collaborative Systems]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zakrzewska]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Cluster analysis in personalized e-learning systems]]></article-title>
<source><![CDATA[Stud Comput Intell]]></source>
<year>2009</year>
<volume>252</volume>
<page-range>29-50</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Berzal]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
</person-group>
<source><![CDATA[Clustering jerárquico: métodos de agrupamiento]]></source>
<year>2020</year>
<publisher-name><![CDATA[Universidad de Granada]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[García-Alonso]]></surname>
<given-names><![CDATA[CR]]></given-names>
</name>
<name>
<surname><![CDATA[Pérez-Naranjo]]></surname>
<given-names><![CDATA[LM]]></given-names>
</name>
<name>
<surname><![CDATA[Fernández-Caballero]]></surname>
<given-names><![CDATA[JC]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Multiobjective evolutionary algorithms to identify highly autocorrelated areas: The case of spatial distribution in financially compromised farms]]></article-title>
<source><![CDATA[Ann Oper Res]]></source>
<year>2014</year>
<volume>219</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>187-202</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Vilà]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Rubio]]></surname>
<given-names><![CDATA[MJ]]></given-names>
</name>
<name>
<surname><![CDATA[Berlanga]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
<name>
<surname><![CDATA[Torrado]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Cómo aplicar un cluster jerárquico en SPSS]]></article-title>
<source><![CDATA[REIRE Rev d'Innovació i Recer en Educ]]></source>
<year>2014</year>
<volume>7</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>113-27</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
