SciELO - Scientific Electronic Library Online

 
vol.38 número3Estudio sobre la estrategia de guiado L1 para el seguimiento de caminos rectos y curvos en UAVEvaluación de QoE en servicios IP basada en parámetros de QoS índice de autoresíndice de materiabúsqueda de artículos
Home Pagelista alfabética de revistas  

Servicios Personalizados

Articulo

Indicadores

  • No hay articulos citadosCitado por SciELO

Links relacionados

  • No hay articulos similaresSimilares en SciELO

Compartir


Ingeniería Electrónica, Automática y Comunicaciones

versión On-line ISSN 1815-5928

Resumen

RICO-SULAYES, Antonio. Reducing Vector Space Dimensionality in Automatic Classification for Authorship Attribution. EAC [online]. 2017, vol.38, n.3, pp. 26-35. ISSN 1815-5928.

For automatic classification, the implications of having too many classificatory features are twofold. On the one hand, some features may not be helpful in discriminating classes and should be removed from the classification. On the other hand, redundant features may produce negative effects as their number grows therefore their detrimental impact must be minimized or limited. In text classification tasks, where word and word-derived features are commonly employed, the number of distinctive features extracted from text samples can grow quickly. For the specific context of authorship attribution, a number of features traditionally used, such as n-grams or word sequences, can produce long lists of distinctive features, a great majority of which have very few instances. Previous research has shown that in authorship attribution feature reduction can supersede the performance of noise tolerant algorithms to solve the issues associated with the abundance of classificatory features. However, there has been no attempt to explore the motivation of this solution. This article shows how even in the small collections of data characteristically used in authorship attribution, the frequency rank of common elements remains stable as their instances accumulate and novel, uncommon words are constantly found. Given this general vocabulary property, present even in very small text collections, the application of techniques to reduce vector space dimensionality is especially beneficial across the various experimental settings typical of authorship attribution. The implications of this may be helpful for other automatic classification tasks with similar conditions.

Palabras clave : Vector space modelling; Classifying features; Feature reduction.

        · resumen en Español     · texto en Inglés     · Inglés ( pdf )

 

Creative Commons License All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License