<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2227-1899</journal-id>
<journal-title><![CDATA[Revista Cubana de Ciencias Informáticas]]></journal-title>
<abbrev-journal-title><![CDATA[Rev cuba cienc informat]]></abbrev-journal-title>
<issn>2227-1899</issn>
<publisher>
<publisher-name><![CDATA[Editorial Ediciones Futuro]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2227-18992022000100077</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Identificación de idioma hablado en señales cortas aplicando transferencia de aprendizaje]]></article-title>
<article-title xml:lang="en"><![CDATA[Spoken language identification for short utterances with transfer learning]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Montalvo Bereau]]></surname>
<given-names><![CDATA[Ana]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Reyes Díaz]]></surname>
<given-names><![CDATA[Flavio]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Hernández Sierra]]></surname>
<given-names><![CDATA[Gabriel]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Calvo de Lara]]></surname>
<given-names><![CDATA[José Ramón]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Centro de Aplicación de Tecnologías de Avanzada]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Cuba</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>03</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>03</month>
<year>2022</year>
</pub-date>
<volume>16</volume>
<numero>1</numero>
<fpage>77</fpage>
<lpage>91</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2227-18992022000100077&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_abstract&amp;pid=S2227-18992022000100077&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_pdf&amp;pid=S2227-18992022000100077&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[RESUMEN En el presente trabajo se abordó el reconocimiento automático del idioma hablado en señales de corta duración, empleando una red neuronal convolucional pre-entrenada sobre un conjunto de imágenes. Partiendo del conocimiento transferido del dominio de imágenes reales a la clasificación de tareas sobre audio, se evaluó el impacto del aprendizaje multitarea tomando el reconocimiento de idioma como tarea principal y el reconocimiento del locutor como tarea auxiliar. Los experimentos se llevaron a cabo sobre un subconjunto del corpus Voxforge, y con una cantidad de señal significativamente menor a las empleadas por sistemas análogos de referencia. La evaluación se realizó sobre espectrogramas conformados con 3 segundos de señal. Los resultados arrojan que el reconocimiento del idioma hablado se beneficia del aprendizaje multitarea al usar como tarea auxiliar la identidad del locutor.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[ABSTRACT This work addresses spoken language recognition in short utterances using a convolutional neural network pre-trained on a set of images. Starting from knowledge transferred from the domain of real images to audio classification tasks, we assess the impact of multitask learning, taking language recognition as the main task and speaker recognition as the auxiliary task. The experiments were carried out on a subset of the Voxforge corpus, with significantly less signal than that used by comparable reference systems. The evaluation was performed on spectrograms built from 3 seconds of signal. The results show that spoken language recognition benefits from multitask learning when speaker identity is used as the auxiliary task.]]></p></abstract>
<kwd-group>
<kwd lng="es"><![CDATA[Reconocimiento automático del idioma hablado]]></kwd>
<kwd lng="es"><![CDATA[aprendizaje profundo]]></kwd>
<kwd lng="es"><![CDATA[transferencia de aprendizaje]]></kwd>
<kwd lng="es"><![CDATA[aprendizaje multitarea]]></kwd>
<kwd lng="en"><![CDATA[Spoken language recognition]]></kwd>
<kwd lng="en"><![CDATA[deep learning]]></kwd>
<kwd lng="en"><![CDATA[transfer learning]]></kwd>
<kwd lng="en"><![CDATA[multitask learning]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Abdullah]]></surname>
<given-names><![CDATA[Badr M]]></given-names>
</name>
<name>
<surname><![CDATA[Avgustinova]]></surname>
<given-names><![CDATA[Tania]]></given-names>
</name>
<name>
<surname><![CDATA[Mobius]]></surname>
<given-names><![CDATA[Bernd]]></given-names>
</name>
<name>
<surname><![CDATA[Klakow]]></surname>
<given-names><![CDATA[Dietrich]]></given-names>
</name>
</person-group>
<source><![CDATA[Cross-domain adaptation of spoken language identification for related languages: The curious case of slavic languages]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bartz]]></surname>
<given-names><![CDATA[Christian]]></given-names>
</name>
<name>
<surname><![CDATA[Herold]]></surname>
<given-names><![CDATA[Tom]]></given-names>
</name>
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[Haojin]]></given-names>
</name>
<name>
<surname><![CDATA[Meinel]]></surname>
<given-names><![CDATA[Christoph]]></given-names>
</name>
</person-group>
<source><![CDATA[Language identification using deep convolutional recurrent neural networks]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Yoshua]]></given-names>
</name>
</person-group>
<source><![CDATA[Deep learning of representations for unsupervised and transfer learning]]></source>
<year>2012</year>
<page-range>17-36</page-range><publisher-name><![CDATA[Proceedings of ICML workshop on unsupervised and transfer learning]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Choi]]></surname>
<given-names><![CDATA[Keunwoo]]></given-names>
</name>
<name>
<surname><![CDATA[Fazekas]]></surname>
<given-names><![CDATA[Gyorgy]]></given-names>
</name>
<name>
<surname><![CDATA[Sandler]]></surname>
<given-names><![CDATA[Mark]]></given-names>
</name>
<name>
<surname><![CDATA[Cho]]></surname>
<given-names><![CDATA[Kyunghyun]]></given-names>
</name>
</person-group>
<source><![CDATA[Transfer learning for music classification and regression tasks]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Deng]]></surname>
<given-names><![CDATA[Jia]]></given-names>
</name>
<name>
<surname><![CDATA[Dong]]></surname>
<given-names><![CDATA[Wei]]></given-names>
</name>
<name>
<surname><![CDATA[Socher]]></surname>
<given-names><![CDATA[Richard]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[Li-Jia]]></given-names>
</name>
<name>
<surname><![CDATA[Kai]]></surname>
<given-names><![CDATA[Li]]></given-names>
</name>
<name>
<surname><![CDATA[Fei-Fei]]></surname>
<given-names><![CDATA[Li]]></given-names>
</name>
</person-group>
<source><![CDATA[Imagenet: A large-scale hierarchical image database]]></source>
<year>2009</year>
<page-range>248-55</page-range><publisher-name><![CDATA[2009 IEEE Conference on Computer Vision and Pattern Recognition]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jin]]></surname>
<given-names><![CDATA[Ma]]></given-names>
</name>
<name>
<surname><![CDATA[Song]]></surname>
<given-names><![CDATA[Yan]]></given-names>
</name>
<name>
<surname><![CDATA[McLoughlin]]></surname>
<given-names><![CDATA[Ian Vince]]></given-names>
</name>
<name>
<surname><![CDATA[Guo]]></surname>
<given-names><![CDATA[Wu]]></given-names>
</name>
<name>
<surname><![CDATA[Dai]]></surname>
<given-names><![CDATA[Li-Rong]]></given-names>
</name>
</person-group>
<source><![CDATA[End-to-end language identification using high-order utterance representation with bilinear pooling]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zheng]]></surname>
<given-names><![CDATA[Li]]></given-names>
</name>
<name>
<surname><![CDATA[Miao]]></surname>
<given-names><![CDATA[Zhao]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Yiming]]></surname>
<given-names><![CDATA[Zhi]]></given-names>
</name>
<name>
<surname><![CDATA[Lin]]></surname>
<given-names><![CDATA[Li]]></given-names>
</name>
<name>
<surname><![CDATA[Hong]]></surname>
<given-names><![CDATA[Q]]></given-names>
</name>
</person-group>
<source><![CDATA[The xmuspeech system for the ap19-olr challenge]]></source>
<year>2020</year>
<publisher-name><![CDATA[INTERSPEECH]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[MacLean]]></surname>
<given-names><![CDATA[Ken]]></given-names>
</name>
</person-group>
<source><![CDATA[Voxforge]]></source>
<year>2009</year>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[McFee]]></surname>
<given-names><![CDATA[Brian]]></given-names>
</name>
<name>
<surname><![CDATA[Raffel]]></surname>
<given-names><![CDATA[Colin]]></given-names>
</name>
<name>
<surname><![CDATA[Liang]]></surname>
<given-names><![CDATA[Dawen]]></given-names>
</name>
<name>
<surname><![CDATA[McVicar]]></surname>
<given-names><![CDATA[Matt]]></given-names>
</name>
<name>
<surname><![CDATA[Battenberg]]></surname>
<given-names><![CDATA[Eric]]></given-names>
</name>
<name>
<surname><![CDATA[Ni]]></surname>
<given-names><![CDATA[Oriol]]></given-names>
</name>
</person-group>
<source><![CDATA[librosa: Audio and music signal analysis in python]]></source>
<year>2015</year>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mendes]]></surname>
<given-names><![CDATA[Carlos]]></given-names>
</name>
<name>
<surname><![CDATA[Abad]]></surname>
<given-names><![CDATA[Alberto]]></given-names>
</name>
<name>
<surname><![CDATA[Ne]]></surname>
<given-names><![CDATA[João Paulo]]></given-names>
</name>
<name>
<surname><![CDATA[Trancoso]]></surname>
<given-names><![CDATA[Isabel]]></given-names>
</name>
</person-group>
<source><![CDATA[Recognition of Latin American Spanish Using Multi-Task Learning]]></source>
<year>2019</year>
<page-range>2135-9</page-range><publisher-name><![CDATA[Proc. Interspeech]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Xiaoxiao]]></surname>
<given-names><![CDATA[Miao]]></given-names>
</name>
<name>
<surname><![CDATA[Mcloughlin]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Shengyu]]></surname>
<given-names><![CDATA[Yao]]></given-names>
</name>
<name>
<surname><![CDATA[Yonghong]]></surname>
<given-names><![CDATA[Yan]]></given-names>
</name>
</person-group>
<source><![CDATA[Improved conditional generative adversarial net classification for spoken language recognition]]></source>
<year>2018</year>
<page-range>98-104</page-range><publisher-name><![CDATA[2018 IEEE Spoken Language Technology Workshop (SLT)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Montavon]]></surname>
<given-names><![CDATA[Gregoire]]></given-names>
</name>
</person-group>
<source><![CDATA[Deep learning for spoken language identification]]></source>
<year>2009</year>
<page-range>1-4</page-range><publisher-name><![CDATA[NIPS Workshop on deep learning for speech recognition and related applications]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Padi]]></surname>
<given-names><![CDATA[Bharat]]></given-names>
</name>
<name>
<surname><![CDATA[Ramoji]]></surname>
<given-names><![CDATA[Shreyas]]></given-names>
</name>
<name>
<surname><![CDATA[Yeruva]]></surname>
<given-names><![CDATA[Vaishnavi]]></given-names>
</name>
<name>
<surname><![CDATA[Kumar]]></surname>
<given-names><![CDATA[Satish]]></given-names>
</name>
<name>
<surname><![CDATA[Ganapathy]]></surname>
<given-names><![CDATA[Sriram]]></given-names>
</name>
</person-group>
<source><![CDATA[The leap language recognition system for lre 2017 challenge - improvements and error analysis]]></source>
<year>2018</year>
<publisher-name><![CDATA[Odyssey]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhiyuan]]></surname>
<given-names><![CDATA[Peng]]></given-names>
</name>
<name>
<surname><![CDATA[Siyuan]]></surname>
<given-names><![CDATA[Feng]]></given-names>
</name>
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[Tan]]></given-names>
</name>
</person-group>
<source><![CDATA[Adversarial multi-task deep features and unsupervised back-end adaptation for language recognition]]></source>
<year>2019</year>
<page-range>5961-5</page-range><publisher-name><![CDATA[ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Povey]]></surname>
<given-names><![CDATA[Daniel]]></given-names>
</name>
<name>
<surname><![CDATA[Ghoshal]]></surname>
<given-names><![CDATA[Arnab]]></given-names>
</name>
<name>
<surname><![CDATA[Boulianne]]></surname>
<given-names><![CDATA[Gilles]]></given-names>
</name>
<name>
<surname><![CDATA[Burget]]></surname>
<given-names><![CDATA[Lukas]]></given-names>
</name>
<name>
<surname><![CDATA[Glembek]]></surname>
<given-names><![CDATA[Ondrej]]></given-names>
</name>
<name>
<surname><![CDATA[Goel]]></surname>
<given-names><![CDATA[Nagendra]]></given-names>
</name>
<name>
<surname><![CDATA[Hannemann]]></surname>
<given-names><![CDATA[Mirko]]></given-names>
</name>
<name>
<surname><![CDATA[Motlicek]]></surname>
<given-names><![CDATA[Petr]]></given-names>
</name>
<name>
<surname><![CDATA[Qian]]></surname>
<given-names><![CDATA[Yanmin]]></given-names>
</name>
<name>
<surname><![CDATA[Schwarz]]></surname>
<given-names><![CDATA[Petr]]></given-names>
</name>
</person-group>
<source><![CDATA[The kaldi speech recognition toolkit]]></source>
<year>2011</year>
<publisher-name><![CDATA[Technical report, IEEE Signal Processing Society]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Revay]]></surname>
<given-names><![CDATA[Shauna]]></given-names>
</name>
<name>
<surname><![CDATA[Teschke]]></surname>
<given-names><![CDATA[Matthew]]></given-names>
</name>
</person-group>
<source><![CDATA[Multiclass language identification using deep learning on spectral images of audio signals]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sandler]]></surname>
<given-names><![CDATA[Mark]]></given-names>
</name>
<name>
<surname><![CDATA[Howard]]></surname>
<given-names><![CDATA[Andrew]]></given-names>
</name>
<name>
<surname><![CDATA[Zhu]]></surname>
<given-names><![CDATA[Menglong]]></given-names>
</name>
<name>
<surname><![CDATA[Zhmoginov]]></surname>
<given-names><![CDATA[Andrey]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[Liang-Chieh]]></given-names>
</name>
</person-group>
<source><![CDATA[Mobilenetv2: Inverted residuals and linear bottlenecks]]></source>
<year>2019</year>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shon]]></surname>
<given-names><![CDATA[Suwon]]></given-names>
</name>
<name>
<surname><![CDATA[Ali]]></surname>
<given-names><![CDATA[Ahmed]]></given-names>
</name>
<name>
<surname><![CDATA[Glass]]></surname>
<given-names><![CDATA[James]]></given-names>
</name>
</person-group>
<source><![CDATA[Convolutional neural networks and language embeddings for end-to-end dialect recognition]]></source>
<year>2018</year>
</nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[van der Merwe]]></surname>
<given-names><![CDATA[Ruan]]></given-names>
</name>
</person-group>
<source><![CDATA[Triplet entropy loss: Improving the generalisation of short speech language identification systems]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[Ying]]></given-names>
</name>
<name>
<surname><![CDATA[Pezeshki]]></surname>
<given-names><![CDATA[Mohammad]]></given-names>
</name>
<name>
<surname><![CDATA[Brakel]]></surname>
<given-names><![CDATA[Philemon]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[Saizheng]]></given-names>
</name>
<name>
<surname><![CDATA[Laurent]]></surname>
<given-names><![CDATA[Cesar]]></given-names>
</name>
<name>
<surname><![CDATA[Bengio]]></surname>
<given-names><![CDATA[Yoshua]]></given-names>
</name>
<name>
<surname><![CDATA[Courville]]></surname>
<given-names><![CDATA[Aaron]]></given-names>
</name>
</person-group>
<source><![CDATA[Towards end-to-end speech recognition with deep convolutional neural networks]]></source>
<year>2017</year>
</nlm-citation>
</ref>
<ref id="B21">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Miao]]></surname>
<given-names><![CDATA[Zhao]]></given-names>
</name>
<name>
<surname><![CDATA[Rongjin]]></surname>
<given-names><![CDATA[Li]]></given-names>
</name>
<name>
<surname><![CDATA[Shijiang]]></surname>
<given-names><![CDATA[Yan]]></given-names>
</name>
<name>
<surname><![CDATA[Zheng]]></surname>
<given-names><![CDATA[Li]]></given-names>
</name>
<name>
<surname><![CDATA[Hao]]></surname>
<given-names><![CDATA[Lu]]></given-names>
</name>
<name>
<surname><![CDATA[Shipeng]]></surname>
<given-names><![CDATA[Xia]]></given-names>
</name>
<name>
<surname><![CDATA[Qingyang]]></surname>
<given-names><![CDATA[Hong]]></given-names>
</name>
<name>
<surname><![CDATA[Lin]]></surname>
<given-names><![CDATA[Li]]></given-names>
</name>
</person-group>
<source><![CDATA[Phone-aware multi-task learning and length expanding for short-duration language recognition]]></source>
<year>2019</year>
<page-range>433</page-range><publisher-name><![CDATA[2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
