<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>2227-1899</journal-id>
<journal-title><![CDATA[Revista Cubana de Ciencias Informáticas]]></journal-title>
<abbrev-journal-title><![CDATA[Rev cuba cienc informat]]></abbrev-journal-title>
<issn>2227-1899</issn>
<publisher>
<publisher-name><![CDATA[Editorial Ediciones Futuro]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S2227-18992016000100016</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Variability compensation for speaker verification with short utterances]]></article-title>
<article-title xml:lang="es"><![CDATA[Compensación de la variabilidad para la verificación de locutores con señales cortas]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Reyes-Díaz]]></surname>
<given-names><![CDATA[Flavio J.]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Hernández-Sierra]]></surname>
<given-names><![CDATA[Gabriel]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Calvo de Lara]]></surname>
<given-names><![CDATA[José R.]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Advanced Technologies Application Center (CENATAV)]]></institution>
<addr-line><![CDATA[Playa, Havana]]></addr-line>
<country>Cuba</country>
</aff>
<pub-date pub-type="pub">
<day>01</day>
<month>03</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="epub">
<day>01</day>
<month>03</month>
<year>2016</year>
</pub-date>
<volume>10</volume>
<numero>1</numero>
<fpage>194</fpage>
<lpage>204</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_arttext&amp;pid=S2227-18992016000100016&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_abstract&amp;pid=S2227-18992016000100016&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.sld.cu/scielo.php?script=sci_pdf&amp;pid=S2227-18992016000100016&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[ABSTRACT Nowadays, applying automatic speaker recognition in real scenarios, where short-duration signals are commonly used for forensic or biometric speaker verification, represents an attractive challenge. In this paper we analyze the behavior of the within-class and between-class scatter matrices, showing the importance of reducing within-class scatter to cope with speaker recognition from short-duration utterances. In addition, we propose two duration compensation methods for short-duration utterances in the i-vector framework. Both were evaluated through speaker verification experiments on the NIST-SRE 2008 dataset. The proposed methods showed improvements under duration-matched enrollment-test conditions.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[RESUMEN En la actualidad representa un desafío atractivo la aplicación del reconocimiento automático de locutores en escenarios reales, debido a que es muy común el uso de señales de corta duración para la verificación biométrica y forense de locutores. En esta investigación realizamos un análisis del comportamiento de las matrices de dispersión dentro de las clases y entre clases, mostrando la importancia de reducir la dispersión dentro de las clases para hacer frente al reconocimiento de locutores a partir de expresiones de corta duración. Además, se propusieron dos métodos de compensación de la duración sobre el enfoque i-vector. Ambos métodos fueron evaluados a través de experimentos de verificación del locutor utilizando la base de voces NIST-SRE 2008.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[short utterance]]></kwd>
<kwd lng="en"><![CDATA[variability compensation]]></kwd>
<kwd lng="en"><![CDATA[speaker verification]]></kwd>
<kwd lng="en"><![CDATA[i-vector]]></kwd>
<kwd lng="es"><![CDATA[señales cortas]]></kwd>
<kwd lng="es"><![CDATA[compensación de la variabilidad]]></kwd>
<kwd lng="es"><![CDATA[verificación de locutores]]></kwd>
<kwd lng="es"><![CDATA[i-vector]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p align="right"><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><B>REVIEW ARTICLE</B></font></p>     <p>&nbsp;</p>     <p><font size="4"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Variability compensation for speaker verification with short utterances</font></strong></font></p>     <p>&nbsp;</p>     <p><font size="3"><strong><font face="Verdana, Arial, Helvetica, sans-serif">Compensaci&oacute;n de la variabilidad para la verificaci&oacute;n de locutores con se&ntilde;ales cortas</font></strong></font></p>     <p>&nbsp;</p>     <p>&nbsp;</p>     <P><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong>Flavio J. Reyes-D&iacute;az<sup>1*</sup>, Gabriel Hern&aacute;ndez-Sierra<sup>1</sup>, Jos&eacute; R. Calvo de Lara<sup>1</sup></strong></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><sup>1 </sup>Advanced Technologies Application Center (CENATAV). 7a.A # 21406 e/ 214 y 216, Playa, Havana, C.P. 12200, Cuba. Email: <a href="mailto:freyes%2Cgsierra%2Cjcalvo@cenatav.co.cu">freyes,gsierra,jcalvo@cenatav.co.cu</a></font></p>     <P><font face="Verdana, Arial, Helvetica, sans-serif"><span class="class"><font size="2">*Corresponding author: <a href="mailto:freyes@cenatav.co.cu">freyes@cenatav.co.cu</a></font></span> </font>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p>&nbsp;</p> <hr>     <P><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>ABSTRACT</b></font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Nowadays, applying automatic speaker recognition in real scenarios, where short-duration signals are commonly used for forensic or biometric speaker verification, represents an attractive challenge. In this paper we analyze the behavior of the within-class and between-class scatter matrices, showing the importance of reducing within-class scatter to cope with speaker recognition from short-duration utterances. In addition, we propose two duration compensation methods for short-duration utterances in the i-vector framework. Both were evaluated through speaker verification experiments on the NIST-SRE 2008 dataset. The proposed methods showed improvements under duration-matched enrollment-test conditions.</font></p>     <p>  <font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>Key words<span lang=EN-GB>: </span></b>short utterance, variability compensation, speaker verification, i-vector</font></p> <hr>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>RESUMEN</b> </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">En la actualidad representa un desaf&iacute;o atractivo la aplicaci&oacute;n del reconocimiento autom&aacute;tico de locutores en escenarios reales, debido a que es muy com&uacute;n el uso de se&ntilde;ales de corta duraci&oacute;n para la verificaci&oacute;n biom&eacute;trica y forense de locutores. En esta investigaci&oacute;n realizamos un an&aacute;lisis del comportamiento de las matrices de dispersi&oacute;n dentro de las clases y entre clases, mostrando la importancia de reducir la dispersi&oacute;n dentro de las clases para hacer frente al reconocimiento de locutores a partir de expresiones de corta duraci&oacute;n.  
&nbsp;Adem&aacute;s, se propusieron dos m&eacute;todos de compensaci&oacute;n de la duraci&oacute;n sobre el enfoque i-vector.  Ambos m&eacute;todos fueron evaluados a trav&eacute;s de experimentos de verificaci&oacute;n del locutor utilizando la base de voces NIST-SRE 2008.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>Palabras clave<span lang=EN-GB>: </span></b>se&ntilde;ales cortas, compensaci&oacute;n de la variabilidad, verificaci&oacute;n de locutores, i-vector.</font></p> <hr>     <p>&nbsp;</p>     <p>&nbsp;</p>     ]]></body>
<body><![CDATA[<p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b>INTRODUCTION</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Currently, the need to process speech signals acquired in real, uncontrolled environments is growing. This imposes new challenges on speaker recognition systems, such as handling variability factors like speech duration and emotional state, as well as acoustic distortions, noise, and reverberation.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The well-known i-vector speaker representation (DEHAK et al., 2011) does not take speech duration into account. As a result, the performance of speaker recognition using a cosine similarity measure (CSM) or a probabilistic linear discriminant analysis (PLDA) model (PRINCE and ELDER, 2007; KENNY, 2010) decreases quickly as the enrollment or test utterance duration decreases, as shown in (KANAGASUNDARAM et al., 2011; SARKAR et al., 2012; KANAGASUNDARAM et al., 2012).</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This problem is very common in biometric and forensic voice identification. For that reason, some works (KANAGASUNDARAM et al., 2011; MANDASARI et al., 2011; SARKAR et al., 2012) rely on multi-condition training techniques to compensate for short-duration variability in the i-vector framework, but they neither include nor evaluate the condition in which full (not short) utterances are compared against full samples.  
In (SARKAR et al., 2012), only full utterances are evaluated, showing that speaker recognition performance degrades when multi-condition training is used, compared with the results obtained without it.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Previous works such as (MANDASARI et al., 2011; HASAN et al., 2013) use utterance duration as a quality measure to reduce the effect of short duration in speaker verification. More recently, the authors of (HAUTAM&Auml;KI et al., 2013) proposed a strategy to compensate for the variability due to short utterances, replacing the Baum-Welch algorithm with the minimax algorithm (MERHAV and LEE, 1993) to estimate the zeroth-order sufficient statistics. In (KENNY et al., 2013), the authors proposed using uncertainty propagation to incorporate duration variability into the i-vector. The authors of (KANAGASUNDARAM et al., 2013) proposed a new session variability compensation technique to mitigate the effect of short-duration samples.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The main goal of our research is to analyze in depth how the variability between enrollment and test samples caused by their different durations affects speaker recognition performance. As a result of this study, we propose a new method (SUN-LDA2) that incorporates duration variability information into the within-class scatter estimation to improve short-utterance compensation for speaker verification in the i-vector framework. In addition, we propose a new approach (IV-DVC) to compensate for duration variability. This method is based on the divide-and-conquer technique, compensating for channel variability and duration variability in different spaces. 
To support the proposed approaches, we report speaker verification experiments on NIST SRE evaluation data with different utterance durations.</font></p>     <p><strong><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Variability compensation methods in the i-vector framework </font></strong></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">State-of-the-art speaker recognition systems are based on the i-vector representation of speaker utterances, which is defined by the posterior distribution of the hidden variables conditioned on the Baum-Welch statistics extracted from the utterance. The i-vector is computed from a single variability space, called the total variability space (T) (DEHAK et al., 2011), which contains both speaker and session variability. The speaker- and session-dependent supervector <em>M</em> is represented by </font></p>     <p align="center"><img src="/img/revistas/rcci/v10n1/fo0116116.jpg" alt="fo01" width="151" height="26"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <em>m</em> is the supervector obtained by concatenating the UBM means, which contains speaker- and session-independent information, <em>T</em> is a low-rank rectangular matrix, and <em>w</em> is a random vector following a standard normal distribution <em>N</em>(0<em>, I</em>); <em>w</em> represents a speaker in the speaker verification system and is called the intermediate vector, or i-vector. In equation 1, we assume that the vector <em>M</em> follows a normal distribution with mean <em>m</em> and covariance <em>T T&prime;</em>.</font> </p>     ]]></body>
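<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">As an illustration of equation 1, the following Python/NumPy sketch (not the authors' implementation) samples supervectors <em>M</em> = <em>m</em> + <em>Tw</em> to check that their covariance approaches <em>T T&prime;</em>, and computes an i-vector as the posterior mean of <em>w</em> given the zeroth- and first-order Baum-Welch statistics, using the standard closed form of the total variability model (DEHAK et al., 2011). It assumes diagonal UBM covariances and precomputed statistics; the function names, toy sizes and synthetic statistics are hypothetical.</font></p>
<pre>
```python
import numpy as np


def sample_supervectors(m, T, n, seed=0):
    """Draw n supervectors M = m + T w with w ~ N(0, I), as in equation 1."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, T.shape[1]))  # one latent vector w per row
    return m + W @ T.T


def extract_ivector(N, F, m, T, Sigma):
    """Posterior mean of w given Baum-Welch statistics (diagonal UBM covariances).

    N: (C,) zeroth-order statistics, F: (C, D) first-order statistics,
    m: (C*D,) UBM mean supervector, T: (C*D, R) total variability matrix,
    Sigma: (C*D,) stacked diagonal UBM covariances.
    """
    C, D = F.shape
    F_tilde = (F - N[:, None] * m.reshape(C, D)).reshape(-1)  # centre on UBM means
    N_sv = np.repeat(N, D)                 # N_c repeated for each feature dimension
    TtSinv = T.T / Sigma                   # T' Sigma^-1 (Sigma is diagonal)
    precision = np.eye(T.shape[1]) + (TtSinv * N_sv) @ T   # I + T' Sigma^-1 N T
    return np.linalg.solve(precision, TtSinv @ F_tilde)


# Hypothetical toy sizes: C = 4 Gaussians, D = 3 features, R = 2 (the paper uses R = 400).
C, D, R = 4, 3, 2
rng = np.random.default_rng(1)
m = rng.standard_normal(C * D)       # stand-in for the concatenated UBM means
T = rng.standard_normal((C * D, R))  # low-rank rectangular matrix
Sigma = np.ones(C * D)               # unit diagonal covariances

# Equation 1: the sample covariance of M approaches T T'.
M = sample_supervectors(m, T, n=200_000)
print(np.abs(np.cov(M, rowvar=False) - T @ T.T).max())  # small

# Noise-free statistics generated from a known w: with 1000 frames per
# Gaussian the posterior mean is very close to w_true.
w_true = np.array([0.5, -1.0])
N = np.full(C, 1000.0)
F = N[:, None] * (m + T @ w_true).reshape(C, D)
print(extract_ivector(N, F, m, T, Sigma))
```
</pre>
<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">With many frames per Gaussian the posterior mean approaches the generating <em>w</em>; with few frames, as in a short utterance, the identity term of the precision pulls the estimate toward the prior mean, which is one way to see why short utterances yield less reliable i-vectors.</font></p> ]]></body>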
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Session variability is known to be an important cause of performance degradation, so compensating for it is a mandatory part of modern speaker recognition systems. Several compensation methods originating in other fields have been applied to improve recognition accuracy; one of them is Linear Discriminant Analysis (LDA) (RAO, 1948). LDA is a dimensionality reduction technique currently used in the i-vector framework for inter-session variability compensation in speaker verification, as initially proposed by Dehak et al. (DEHAK et al., 2011). The principal goal of LDA is to maximize the between-class variance (<em>S<sub>b</sub></em>) while simultaneously minimizing the within-class variance (<em>S<sub>w</sub></em>) of a speaker population:</font></p>     <p align="center"><img src="/img/revistas/rcci/v10n1/fo0216116.jpg" alt="fo02" width="207" height="56"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <em>L</em> is the number of speakers, <em>x<sub>l</sub></em> is the mean of the i-vectors of speaker <em>l</em>, and <em>x</em>&macr; is the global mean vector of the speaker population, and</font> </p>     <p align="center"><img src="/img/revistas/rcci/v10n1/fo0316116.jpg" alt="fo03" width="268" height="52"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <em>n<sub>l</sub></em> is the number of utterances of speaker <em>l</em> and </font><img src="/img/revistas/rcci/v10n1/fo0416116.jpg" alt="fo04" width="18" height="24"> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">is the i-vector of the <em>i</em>-th utterance of speaker <em>l</em>.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The projection matrix <em>A</em> is then formed by the subset of <em>J</em> eigenvectors associated with the largest eigenvalues, which are obtained 
by optimizing the Fisher objective function:</font></p>     <p align="center"><img src="/img/revistas/rcci/v10n1/fo0516116.jpg" alt="fo05" width="145" height="43"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <em>v</em> is a direction in the i-vector space.</font></p>     <p><strong><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Impact of short utterances on within-class scatter </font></strong></p>     <p> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">The variability introduced by short utterances in speaker verification is an issue of growing importance, because it affects the speaker-discriminative information contained in the i-vectors. LDA is one of the most common techniques used to reduce the variability introduced by short utterances, and several authors have modified it to cope with the variability introduced by signal duration. In (KANAGASUNDARAM et al., 2013), the authors proposed a duration compensation method, source- and utterance-duration-normalized LDA (SUN-LDA), which incorporates duration variability information into the estimation of the between-class scatter matrix (eq. 9 in (KANAGASUNDARAM et al., 2013)):</font></p>     ]]></body>
<body><![CDATA[<p align="center"><img src="/img/revistas/rcci/v10n1/fo0616116.jpg" alt="fo06" width="242" height="35"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <em>&alpha;<sub>full</sub></em> and <em>&alpha;<sub>short</sub></em> are the variability weights for full and short utterances, respectively, and </font><img src="/img/revistas/rcci/v10n1/fo0716116.jpg" alt="fo07" width="32" height="24"> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">and</font> <img src="/img/revistas/rcci/v10n1/fo0816116.jpg" alt="fo08" width="38" height="24"> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">are the between-class scatter matrices of full and short utterances, respectively. The authors did not include duration variability information in the estimation of the within-class scatter, on the assumption that doing so would degrade speaker verification performance.</font> </p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">We consider that both variabilities (<em>S<sub>b</sub></em> and <em>S<sub>w</sub></em>) are of great importance for handling duration variability and improving speaker verification performance, because i-vectors obtained from short utterances contain less speaker-discriminative information, which implies a greater within-class scatter. In addition, this scatter shifts the class centers, distorting the actual distance between classes (speakers) and causing greater overlap between them. 
To account for both variabilities, we propose a modification of the method in (KANAGASUNDARAM et al., 2013) and a new method based on the divide-and-conquer paradigm.</font></p>     <p><strong><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Estimation of within-class scatter with duration variability</font> </strong></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">To reduce the variability due to utterance duration, we propose a new method called SUN-LDA2. This method uses the information of both full and short utterances to estimate the within-class scatter matrix<em> S<sub>w</sub></em> by:</font></p>     <p align="center">   <img src="/img/revistas/rcci/v10n1/fo0916116.jpg" alt="fo09" width="168" height="32"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where</font> <img src="/img/revistas/rcci/v10n1/fo1016116.jpg" alt="fo10" width="102" height="32"> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">are the within-class scatter matrices of full and short utterances, respectively, both estimated using equation 3. Unlike (KANAGASUNDARAM et al., 2013), we do not use variability weights when computing <em>S<sub>b</sub></em>; the full and short scatter matrices are weighted equally, because in our case these weights did not contribute relevant information. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The between-class scatter matrix <em>S<sub>b</sub></em> is calculated using eq. 5 without the weights. 
Finally, as in classic LDA, the projection matrix is formed by the eigenvectors corresponding to the largest eigenvalues.</font></p>     <p><strong><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Duration variability compensation using &ldquo;divide and conquer&rdquo; </font></strong></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In real conditions, utterances of the same speaker are usually affected by many types of variability, some intrinsic, such as duration or emotion, and some extrinsic, such as channel or noise. Mixing these two types of variability in a single covariance matrix may make the estimation of the scatter required to mitigate both inefficient. Examples of the problem: </font></p> <ul>       ]]></body>
<body><![CDATA[<li><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Session variability is present in all utterances; therefore, the estimate of duration variability also includes session information.</font></li>       <li><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The covariance matrices contain information about both session and duration variability through the sums in<em> S<sub>b</sub></em> (eq. 5) and <em>S<sub>w</sub></em> (eq. 6).</font></li>       <li><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Multi-condition training techniques (MCLAREN and VAN LEEUWEN, 2011) use databases with different variability conditions to obtain the covariance matrices used to compute<em> S<sub>b</sub></em> (eq. 2) and <em>S<sub>w</sub></em> (eq. 3).</font></li>     </ul>     <p>Given this drawback, an interesting alternative is to address each source of variability independently, following the divide-and-conquer paradigm. We therefore propose a new duration variability compensation method for the i-vector framework, named IV-DVC. The idea behind the method is to compensate for the different variabilities in separate spaces: the i-vectors are first projected into a new space where session variability is mitigated, which allows a correct estimate of the duration variability and thus reduces the effect of short utterances on the i-vectors.</p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This method reduces the effect of session variability using a projection matrix (<em>A</em>) obtained with the LDA algorithm, as described in the section on variability compensation in the i-vector framework, using a development set without duration variability, in which all samples have full length. Then, to mitigate duration variability, a projection matrix (<em>B</em>) is computed with the LDA algorithm. 
To estimate the between-class <img src="/img/revistas/rcci/v10n1/fo1116116.jpg" alt="fo11" width="191" height="25"> scatter matrices, eqs. 7 and 8 were applied to a development set containing long and short utterances.</font></p>     <p align="center"><img src="/img/revistas/rcci/v10n1/fo1216116.jpg" alt="fo12" width="326" height="112"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where the parameters have the same definitions as in equations 2 and 3.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The CSM between two i-vectors <em>x</em><sub>1</sub> and <em>x</em><sub>2</sub>, when the variability is compensated with the IV-DVC method, is: </font></p>     <p align="center"><img src="/img/revistas/rcci/v10n1/fo1316116.jpg" alt="fo13" width="294" height="47"></p>     ]]></body>
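<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The LDA variants described above can be summarized in a short Python/NumPy sketch. It computes the scatter matrices of equations 2 and 3 (assuming the 1/<em>n<sub>l</sub></em> normalization of the standard LDA formulation in (DEHAK et al., 2011)), solves the Fisher criterion as a generalized eigenproblem, builds SUN-LDA2 from unweighted sums of the full and short scatter matrices, and chains the two IV-DVC projections <em>A</em> and <em>B</em> before cosine scoring. Since the exact form of equations 7 and 8 and of the IV-DVC cosine is not spelled out in the text, the sketch assumes that equations 7 and 8 are equations 2 and 3 applied to the <em>A</em>-projected long and short development i-vectors, and that the score is the cosine between <em>B&prime;A&prime;x</em><sub>1</sub> and <em>B&prime;A&prime;x</em><sub>2</sub>. All function names, the small ridge term and the synthetic data are hypothetical.</font></p>
<pre>
```python
import numpy as np


def scatter_matrices(X, labels):
    """Between-class (eq. 2) and within-class (eq. 3) scatter of the rows of X."""
    mu = X.mean(axis=0)                        # global mean
    d = X.shape[1]
    S_b, S_w = np.zeros((d, d)), np.zeros((d, d))
    for spk in np.unique(labels):
        X_l = X[labels == spk]
        mu_l = X_l.mean(axis=0)                # speaker mean
        diff = (mu_l - mu)[:, None]
        S_b += diff @ diff.T
        centred = X_l - mu_l
        S_w += centred.T @ centred / len(X_l)  # assumed 1/n_l normalization
    return S_b, S_w


def lda(S_b, S_w, J, reg=1e-6):
    """J directions maximizing the Fisher criterion: S_b v = lambda S_w v."""
    d = S_w.shape[0]
    L = np.linalg.cholesky(S_w + reg * np.eye(d))     # S_w = L L' (small ridge)
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ S_b @ L_inv.T)  # equivalent symmetric problem
    top = np.argsort(evals)[::-1][:J]                 # largest eigenvalues first
    return np.linalg.solve(L.T, U[:, top])            # map back to the i-vector space


def sun_lda2(X_full, y_full, X_short, y_short, J):
    """SUN-LDA2 sketch: unweighted sums of full and short scatters (eqs. 5 and 6)."""
    Sb_f, Sw_f = scatter_matrices(X_full, y_full)
    Sb_s, Sw_s = scatter_matrices(X_short, y_short)
    return lda(Sb_f + Sb_s, Sw_f + Sw_s, J)


def iv_dvc(X_full, y_full, X_short, y_short, J_session, J_duration):
    """IV-DVC sketch: session LDA (A) on full data, then duration LDA (B)."""
    A = lda(*scatter_matrices(X_full, y_full), J_session)
    X_pooled = np.vstack([X_full, X_short]) @ A       # project long and short i-vectors
    y_pooled = np.concatenate([y_full, y_short])
    B = lda(*scatter_matrices(X_pooled, y_pooled), J_duration)
    return A, B


def csm(x1, x2, A, B):
    """Cosine similarity after both projections (assumed IV-DVC scoring)."""
    p1, p2 = B.T @ (A.T @ x1), B.T @ (A.T @ x2)
    return float(p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2)))


# Hypothetical toy development set: 12 speakers, 20 utterances each, 8-dim vectors.
rng = np.random.default_rng(0)
n_spk, dim, n_utt = 12, 8, 20
centres = rng.normal(0.0, 3.0, (n_spk, dim))
y_full = np.repeat(np.arange(n_spk), n_utt)
X_full = centres[y_full] + rng.normal(0.0, 1.0, (n_spk * n_utt, dim))
X_short = X_full + rng.normal(0.0, 2.0, X_full.shape)  # extra "duration" noise
y_short = y_full.copy()

A, B = iv_dvc(X_full, y_full, X_short, y_short, J_session=6, J_duration=4)
P = sun_lda2(X_full, y_full, X_short, y_short, J=6)
print(A.shape, B.shape, P.shape)  # (8, 6) (6, 4) (8, 6)
print(csm(X_full[0], X_short[1], A, B))
```
</pre>
<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Whitening with the Cholesky factor of <em>S<sub>w</sub></em> turns the generalized eigenproblem into a symmetric one, so the returned directions satisfy <em>A&prime;S<sub>w</sub>A</em> = <em>I</em> up to the ridge term; the second stage of IV-DVC only sees data in which session variability has already been reduced by <em>A</em>.</font></p> ]]></body>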
<body><![CDATA[<p><strong><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Analysis of between-class and within-class scatter matrices</font> </strong></p> <font size="2" face="Verdana, Arial, Helvetica, sans-serif">To better analyze the behavior of both variabilities, we introduced a visualization tool. For this purpose, several sessions of five speakers were selected, and for each session, i-vectors were extracted from utterances of different durations (3, 5, 10, 15, and 20 seconds, and full duration). The LDA projection matrix was trained with three settings: the first used the method proposed in (KANAGASUNDARAM et al., 2013), the second used our proposed SUN-LDA2, and the third used the IV-DVC method to reduce duration variability. To visualize the between-class and within-class behavior, we use the first two dimensions of the projected i-vectors after Principal Component Analysis (PCA).</font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a href="/img/revistas/rcci/v10n1/f0116116.jpg" target="_blank">Figure 1</a> shows the distribution of the first two dimensions of the i-vectors using the SUN-LDA (<a href="/img/revistas/rcci/v10n1/f0116116.jpg" target="_blank">Figure 1.a</a>), SUN-LDA2 (<a href="/img/revistas/rcci/v10n1/f0116116.jpg" target="_blank">Figure 1.b</a>), and IV-DVC (Figure 1.c) methods.</font> </p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The proposed SUN-LDA2 method yields a new distribution of between-class and within-class scatter, showing a greater reduction of the within-class scatter than SUN-LDA, although this compensation also implies an unwanted reduction of the between-class scatter. Duration compensation with the IV-DVC method yields a class distribution whose between-class and within-class scatter behave similarly to those of SUN-LDA2. 
Nevertheless, the speaker verification results reported in the Results and Discussion section show that within-class scatter is more important than between-class scatter for compensating short utterances.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This analysis and the corresponding experimental results confirm our initial hypothesis: including the variability due to short-duration utterances in the estimation of the within-class scatter, far from degrading speaker verification performance, improved it.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><strong>Experimental set-up </strong></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In all the experiments presented in this section, the front end uses 19 linear frequency cepstral coefficients (LFCCs) (DAVIS and MERMELSTEIN, 1980) plus energy, with delta and delta-delta coefficients, giving a 60-dimensional feature vector. We used the male telephone sessions of the NIST SRE 2008 dataset as the speaker verification evaluation set. To obtain simulated short utterances, each evaluation sample was truncated to its first 5, 10, 15, and 20 seconds. We used the NIST SRE-04 and SRE-05 telephone sessions as training data to obtain the gender-dependent universal background model (UBM) (REYNOLDS et al., 2000) with 512 Gaussian components, the 400-dimensional total variability matrix T, and the LDA compensation matrix. For multi-condition training with short durations, each training sample was truncated to its first 3, 5, 10, 15, and 20 seconds.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">PLDA classification represents the state of the art in speaker recognition. Therefore, we also ran experiments combining our duration compensation proposals with a PLDA classifier. 
A separate PLDA model was trained for each compensation method, using the i-vectors obtained after duration compensation of the simulated short utterances from the NIST SRE-04 and SRE-05 telephone sessions.</font></p>     <p>&nbsp;</p>     <p><strong><font size="3" face="Verdana, Arial, Helvetica, sans-serif">RESULTS AND DISCUSSION </font></strong></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a href="/img/revistas/rcci/v10n1/t0116116.jpg" target="_blank">Table 1</a> shows the speaker verification equal error rate (EER) obtained with four baseline methods and the two proposed methods for LDA-based variability compensation of short-duration utterances. The baselines were evaluated using the cosine similarity measure (CSM) between i-vectors: IV-LDA, which uses only full-duration utterances to estimate the session compensation matrix (DEHAK et al., 2011); <em>IV&minus;LDA<sub>var</sub></em>, which uses multi-condition training with six sets of utterances of different durations, including full length, to estimate the compensation matrix (KANAGASUNDARAM et al., 2011; SARKAR et al., 2012; MANDASARI et al., 2011); SUN-LDA, proposed in (KANAGASUNDARAM et al., 2013), but evaluating only the duration variability; and within-class covariance normalization (WCCN[LDA]), proposed in (DEHAK et al., 2011). In addition, the same baseline methods were evaluated with the PLDA model.</font></p>     ]]></body>
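<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Table 1 reports performance as the equal error rate (EER), the operating point at which the false rejection rate of target trials equals the false acceptance rate of non-target trials. The following Python/NumPy sketch shows one simple way to estimate it from CSM or PLDA trial scores by sweeping every observed score as a decision threshold; the function name <em>compute_eer</em> and the toy scores are hypothetical, and the procedure is not necessarily the one used to produce Table 1.</font></p>
<pre>
```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate from target and non-target trial scores.

    A trial is accepted when its score reaches the threshold. Every observed
    score is tried as a threshold, and the EER is the mean of the false
    rejection and false acceptance rates where the two are closest.
    """
    tgt = np.sort(np.asarray(target_scores, dtype=float))
    non = np.sort(np.asarray(nontarget_scores, dtype=float))
    thresholds = np.unique(np.concatenate([tgt, non]))
    # Fraction of target scores strictly below each threshold (rejected targets).
    frr = np.searchsorted(tgt, thresholds, side="left") / tgt.size
    # Fraction of non-target scores at or above each threshold (accepted impostors).
    far = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    best = np.argmin(np.abs(frr - far))
    return float((frr[best] + far[best]) / 2.0)


# Hypothetical toy scores, e.g. cosine similarities of target and impostor trials.
print(compute_eer([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.5]))  # 0.25
```
</pre>
<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Finer estimates, for example interpolating between neighboring thresholds, are also common; the plain sweep is enough to compare compensation methods evaluated on the same trial list.</font></p> ]]></body>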
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The main result of all experiments was obtained with the proposed IV-DVC method, which achieves the best results in most evaluations (7 of 10) with matched short-duration conditions between target and test. IV-DVC obtained an average improvement of 1% using CSM and 3.1% using PLDA scoring with respect to the second-best compensation method, WCCN[LDA]. This shows the importance of using the divide-and-conquer paradigm to compensate for different variabilities of the same utterance in different spaces. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The proposed SUN-LDA2 method showed that the within-class scatter information of short utterances is necessary for duration variability compensation, contrary to the assumption made in (KANAGASUNDARAM et al., 2013). Hence, it is very important to incorporate all variabilities between the i-vectors, in our case duration variability, into the estimation of the within-class scatter. As shown in <a href="/img/revistas/rcci/v10n1/t0116116.jpg" target="_blank">Table 1</a>, SUN-LDA2 achieves an average improvement of 2.7% over SUN-LDA using CSM and similar performance with PLDA scoring.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">From the scatter analysis and the experimental results of the proposed duration compensation methods, we conclude that reducing within-class scatter is very important in speaker verification, because it minimizes the overlap between classes and thereby improves classifier performance.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">On the other hand, including datasets with different short durations (multi-condition training) to 
obtain the LDA matrices improves the EER. However, in the full-utterance evaluation, performance is worse than in the short-utterance evaluations. This is because the variability compensation matrix is fitted to short-duration data, so the compensation is biased toward the short-duration condition. Finally, comparing both sections of the table, PLDA scoring generally performs better than CSM.</font> </p>     <p>&nbsp;</p>     <p align="left"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><B>CONCLUSIONS</B></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Both the graphical analysis and the experimental results confirmed the importance of including duration variability information in the estimation of the within-class scatter matrix. This inclusion improved the EER of the two proposed methods for short-utterance speaker verification, SUN-LDA2 and IV-DVC.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">An important contribution of this work is IV-DVC, a new duration variability compensation method based on the &ldquo;divide and conquer&rdquo; paradigm, which separates session and duration compensation in order to mitigate each variability independently. IV-DVC outperforms the baseline methods in evaluations with short-duration utterances, achieving with PLDA a relative improvement of 3.1% over the best reference system, WCCN[LDA]. In addition, IV-DVC is the most robust of all evaluated methods, since it shows the lowest variance across duration conditions. 
</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">As future work, we will continue this research to obtain a variability compensation method that can handle short-duration utterances without losing performance on long-duration utterances.</font></p>     <p>&nbsp;</p>     ]]></body>
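<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The scatter argument in the discussion (reducing within-class scatter reduces the overlap between speakers) can be illustrated with a small numerical sketch. The Python code below was written for this illustration only. It computes the traces of the within-class scatter matrix S_w and the between-class scatter matrix S_b for labelled vectors, and uses the trace ratio tr(S_b)/tr(S_w) as a rough separability measure. The 2-D vectors and speaker labels are hypothetical stand-ins for i-vectors, not data from the experiments, and the code is not an implementation of SUN-LDA2 or IV-DVC. It only shows that adding noisier short-duration vectors of the same speakers inflates the within-class scatter and lowers the ratio.</font></p>
<pre>
```python
"""Toy illustration of within-class vs between-class scatter.

Assumption: the 2-D vectors below are synthetic stand-ins for i-vectors,
invented for illustration; they are not data from the experiments.
"""
from collections import defaultdict


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter_traces(vectors, labels):
    """Return (tr(S_w), tr(S_b)) for labelled vectors.

    S_w = sum_s sum_{i in s} (w_i - mu_s)(w_i - mu_s)^T
    S_b = sum_s n_s (mu_s - mu)(mu_s - mu)^T
    The trace of each outer product is a squared distance, so only
    squared distances are needed here.
    """
    by_speaker = defaultdict(list)
    for v, speaker in zip(vectors, labels):
        by_speaker[speaker].append(v)
    mu = mean(vectors)  # global mean over all speakers
    tr_sw = 0.0
    tr_sb = 0.0
    for members in by_speaker.values():
        mu_s = mean(members)  # per-speaker mean
        tr_sw += sum(sq_dist(v, mu_s) for v in members)
        tr_sb += len(members) * sq_dist(mu_s, mu)
    return tr_sw, tr_sb


def fisher_ratio(vectors, labels):
    """Trace ratio tr(S_b) / tr(S_w); larger means less class overlap."""
    tr_sw, tr_sb = scatter_traces(vectors, labels)
    return tr_sb / tr_sw


# Hypothetical "full-duration" vectors: tight clusters around each speaker mean.
full = [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1],
        [-1.0, 0.0], [-1.1, 0.2], [-0.9, -0.2]]
full_labels = ["A", "A", "A", "B", "B", "B"]

# Hypothetical "short-duration" vectors of the same speakers: noisier estimates.
short = [[0.4, 0.6], [1.6, -0.5], [-0.3, -0.7], [-1.7, 0.5]]
short_labels = ["A", "A", "B", "B"]

j_full = fisher_ratio(full, full_labels)
j_mixed = fisher_ratio(full + short, full_labels + short_labels)
print(f"full only:    J = {j_full:.2f}")
print(f"full + short: J = {j_mixed:.2f}")
```
</pre>
<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">With these toy values the ratio drops by more than a factor of ten once the short-duration vectors are pooled in, because they mainly add within-speaker spread rather than moving the speaker means. A projection estimated from a within-class scatter that already contains such vectors can, in principle, learn to attenuate those directions. This is the intuition behind including duration variability in S_w.</font></p>]]></body>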
<body><![CDATA[<p align="left"><font face="Verdana, Arial, Helvetica, sans-serif" size="3"><B>REFERENCES</B></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">DAVIS, S. B. and MERMELSTEIN, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. <em>Acoustics, Speech and Signal Processing, IEEE Transactions on</em>, 28(4):357&ndash;366.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">DEHAK, N., KENNY, P., DEHAK, R., DUMOUCHEL, P., and OUELLET, P. (2011). Front-end factor analysis for speaker verification. <em>Audio, Speech, and Language Processing, IEEE Transactions on</em>, 19(4):788&ndash;798.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">HASAN, T., SAEIDI, R., HANSEN, J. H., and VAN LEEUWEN, D. A. (2013). Duration mismatch compensation for i-vector based speaker recognition systems. In <em>Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on</em>, pages 7663&ndash;7667.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">HAUTAM&Auml;KI, V., CHENG, Y. C., RAJAN, P., and LEE, C. H. (2013). Minimax i-vector extractor for short duration speaker verification. In <em>Proceedings of the 14th Annual Conference of the International Speech Communication Association</em>, pages 3708&ndash;3712. ISCA.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">KANAGASUNDARAM, A., DEAN, D., GONZALEZ-DOMINGUEZ, J., SRIDHARAN, S., RAMOS, D., and GONZALEZ-RODRIGUEZ, J. (2013). Improving short utterance based i-vector speaker recognition using source and utterance-duration normalization techniques. In <em>Proceedings of the 14th Annual Conference of the International Speech Communication Association</em>, pages 2465&ndash;2469. ISCA.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">KANAGASUNDARAM, A., VOGT, R., DEAN, D. B., and SRIDHARAN, S. (2012). PLDA based speaker recognition on short utterances. In <em>The Speaker and Language Recognition Workshop (Odyssey 2012)</em>. ISCA.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">KANAGASUNDARAM, A., VOGT, R., DEAN, D. B., SRIDHARAN, S., and MASON, M. W. (2011). I-vector based speaker recognition on short utterances. In <em>Proceedings of the 12th Annual Conference of the International Speech Communication Association</em>, pages 2341&ndash;2344.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">KENNY, P. (2010). Bayesian speaker verification with heavy tailed priors. In <em>Speaker and Language Recognition Workshop (Odyssey)</em>.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">KENNY, P., STAFYLAKIS, T., OUELLET, P., ALAM, M. J., and DUMOUCHEL, P. (2013). PLDA for speaker verification with utterances of arbitrary duration. In <em>Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on</em>, pages 7649&ndash;7653. IEEE.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">MANDASARI, M. I., MCLAREN, M., and VAN LEEUWEN, D. A. (2011). Evaluation of i-vector speaker recognition systems for forensic application. In <em>Proceedings of the 12th Annual Conference of the International Speech Communication Association</em>, pages 21&ndash;24.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">MCLAREN, M. and VAN LEEUWEN, D. (2011). Improved speaker recognition when using i-vectors from multiple speech sources. In <em>Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on</em>, pages 5460&ndash;5463. IEEE.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">MERHAV, N. and LEE, C.-H. (1993). A minimax classification approach with application to robust speech recognition. <em>Speech and Audio Processing, IEEE Transactions on</em>, 1(1):90&ndash;100.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">PRINCE, S. and ELDER, J. (2007). Probabilistic linear discriminant analysis for inferences about identity. In <em>Computer Vision, ICCV 2007</em>.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">RAO, C. R. (1948). The utilization of multiple measurements in problems of biological classification. <em>Journal of the Royal Statistical Society. Series B (Methodological)</em>, 10(2):159&ndash;203.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">REYNOLDS, D., QUATIERI, T., and DUNN, R. (2000). Speaker verification using adapted Gaussian mixture models. <em>Digital Signal Processing</em>, 10(1-3):19&ndash;41.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">SARKAR, A. K., MATROUF, D., BOUSQUET, P. M., and BONASTRE, J. F. (2012). Study of the effect of i-vector modeling on short and mismatch utterance duration for speaker verification. In <em>Proceedings of the 13th Annual Conference of the International Speech Communication Association</em>.</font></p>     <p>&nbsp;</p>     <p>&nbsp;</p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Received: 22/09/2015      <br> Accepted: 16/12/2015 </font></p>      ]]></body><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[DAVIS]]></surname>
<given-names><![CDATA[S. B]]></given-names>
</name>
<name>
<surname><![CDATA[MERMELSTEIN]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences.]]></article-title>
<source><![CDATA[Acoustics, Speech and Signal Processing, IEEE Transactions on]]></source>
<year>1980</year>
<volume>28</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>357-366</page-range><publisher-name><![CDATA[IEEE Transactions]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[DEHAK]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[KENNY]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[DEHAK]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[DUMOUCHEL]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[OUELLET]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<source><![CDATA[Front-end factor analysis for speaker verification.]]></source>
<year>2011</year>
<volume>19</volume>
<page-range>788-798</page-range></nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[HASAN]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[SAEIDI]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[HANSEN]]></surname>
<given-names><![CDATA[J. H]]></given-names>
</name>
<name>
<surname><![CDATA[VAN LEEUWEN]]></surname>
<given-names><![CDATA[D. A]]></given-names>
</name>
</person-group>
<source><![CDATA[Duration mismatch compensation for i-vector based speaker recognition systems.]]></source>
<year>2013</year>
<page-range>7663-7667</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[HAUTAMAKI]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
<name>
<surname><![CDATA[CHENG]]></surname>
<given-names><![CDATA[Y. C]]></given-names>
</name>
<name>
<surname><![CDATA[RAJAN]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[LEE]]></surname>
<given-names><![CDATA[C. H]]></given-names>
</name>
</person-group>
<source><![CDATA[Minimax i-vector extractor for short duration speaker verification.]]></source>
<year>2013</year>
<page-range>3708-3712</page-range><publisher-name><![CDATA[ISCA]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[KANAGASUNDARAM]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[DEAN]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[GONZALEZ-DOMINGUEZ]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[SRIDHARAN]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[RAMOS]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[GONZALEZ-RODRIGUEZ]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Improving short utterance based i-vector speaker recognition using source and utterance-duration normalization techniques.]]></source>
<year>2013</year>
<page-range>2465-2469</page-range><publisher-name><![CDATA[International Speech Communication Association (ISCA)]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[KANAGASUNDARAM]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[VOGT]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[DEAN]]></surname>
<given-names><![CDATA[D. B]]></given-names>
</name>
<name>
<surname><![CDATA[SRIDHARAN]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<source><![CDATA[PLDA based speaker recognition on short utterances.]]></source>
<year>2012</year>
<publisher-name><![CDATA[ISCA]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[KANAGASUNDARAM]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[VOGT]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[DEAN]]></surname>
<given-names><![CDATA[D. B]]></given-names>
</name>
<name>
<surname><![CDATA[SRIDHARAN]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[MASON]]></surname>
<given-names><![CDATA[M. W]]></given-names>
</name>
</person-group>
<source><![CDATA[I-vector based speaker recognition on short utterances.]]></source>
<year>2011</year>
<page-range>2341-2344</page-range><publisher-name><![CDATA[International Speech Communication Association]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[KENNY]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<source><![CDATA[Bayesian speaker verification with heavy tailed priors.]]></source>
<year>2010</year>
</nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[KENNY]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[STAFYLAKIS]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[OUELLET]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[ALAM]]></surname>
<given-names><![CDATA[M. J.]]></given-names>
</name>
<name>
<surname><![CDATA[DUMOUCHEL]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<source><![CDATA[PLDA for speaker verification with utterances of arbitrary duration.]]></source>
<year>2013</year>
<page-range>7649-7653</page-range><publisher-name><![CDATA[IEEE International Conference]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[MANDASARI]]></surname>
<given-names><![CDATA[M. I.]]></given-names>
</name>
<name>
<surname><![CDATA[MCLAREN]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[VAN LEEUWEN]]></surname>
<given-names><![CDATA[D. A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Evaluation of i-vector speaker recognition systems for forensic application.]]></source>
<year>2011</year>
<page-range>21-24</page-range><publisher-name><![CDATA[International Speech Communication Association]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[MCLAREN]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[VAN LEEUWEN]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<source><![CDATA[Improved speaker recognition when using i-vectors from multiple speech sources.]]></source>
<year>2011</year>
<page-range>5460-5463</page-range><publisher-name><![CDATA[IEEE International Conference]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[MERHAV]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[LEE]]></surname>
<given-names><![CDATA[C.-H.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A minimax classification approach with application to robust speech recognition]]></article-title>
<source><![CDATA[Speech and Audio Processing, IEEE Transactions on]]></source>
<year>1993</year>
<volume>1</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>90-100</page-range></nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[PRINCE]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[ELDER]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Probabilistic linear discriminant analysis for inferences about identity.]]></source>
<year>2007</year>
<publisher-name><![CDATA[Computer Vision, ICCV 2007]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[RAO]]></surname>
<given-names><![CDATA[C. R]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The utilization of multiple measurements in problems of biological classification.]]></article-title>
<source><![CDATA[Journal of the Royal Statistical Society. Series B (Methodological)]]></source>
<year>1948</year>
<volume>10</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>159-203</page-range></nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[REYNOLDS]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[QUATIERI]]></surname>
<given-names><![CDATA[T]]></given-names>
</name>
<name>
<surname><![CDATA[DUNN]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Speaker verification using adapted gaussian mixture models.]]></article-title>
<source><![CDATA[Digital Signal Processing]]></source>
<year>2000</year>
<volume>10</volume>
<numero>1-3</numero>
<issue>1-3</issue>
<page-range>19-41</page-range></nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[SARKAR]]></surname>
<given-names><![CDATA[A. K]]></given-names>
</name>
<name>
<surname><![CDATA[MATROUF]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[BOUSQUET]]></surname>
<given-names><![CDATA[P. M]]></given-names>
</name>
<name>
<surname><![CDATA[BONASTRE]]></surname>
<given-names><![CDATA[J. F.]]></given-names>
</name>
</person-group>
<source><![CDATA[Study of the effect of i-vector modeling on short and mismatch utterance duration for speaker verification.]]></source>
<year>2012</year>
<publisher-name><![CDATA[International Speech Communication Association]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
