SciELO - Scientific Electronic Library Online


Revista Cubana de Ciencias Informáticas

On-line version ISSN 2227-1899


FLORES RIERA, Leduan; MARINO MOLERIO, Alejandro; MOJENA ROMAN, Luis and HIDALGO DELGADO, Yusniel. Component for automatic metadata extraction from textual corpus in PDF. Rev cuba cienc informat [online]. 2017, vol.11, n.4, pp.85-98. ISSN 2227-1899.

Digital libraries are responsible for managing stored digital resources and perform three fundamental processes: the selection, treatment and exploitation of resources. One of the functions of treatment is metadata extraction, which facilitates the use of resources by enabling the search, access and retrieval of information. Metadata extraction is time-consuming and, when performed manually, carries the risk of introducing human errors. These problems can be reduced by using automated tools to support the process. In this article, we describe a web component for the automatic extraction of bibliographic metadata from PDF files. The component is based on three fundamental processes arranged as a data flow in a pipes and filters architecture, where the output of one process constitutes the input to the next. To validate whether the component reduces extraction time, an experiment is designed around a case study. Furthermore, a set of quality tests is applied. These tests verify whether the component functions correctly, whether the implemented functions execute correctly, whether the results obtained are the desired ones, and whether users show a high level of acceptance of the metadata extraction component.
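
The pipes and filters arrangement mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the three processes, so the stage names used here (`extract_text`, `locate_fields`, `build_metadata`), the dictionary-based data format, and the "first line is the title, second line is the authors" heuristic are all assumptions made for illustration, not the authors' implementation. Plain text stands in for the content of a PDF so that the example needs no external library.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

# A filter takes the data produced by the previous stage and returns new data.
Filter = Callable[[Dict[str, Any]], Dict[str, Any]]


def extract_text(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stage 1: stand-in for PDF-to-text conversion.

    Splits the raw text into non-empty, whitespace-trimmed lines.
    """
    lines = [ln.strip() for ln in doc["raw"].splitlines() if ln.strip()]
    return {**doc, "lines": lines}


def locate_fields(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stage 2: naive heuristic (an assumption, not the paper's).

    Treats the first line as the title and the second as the author list.
    """
    lines = doc["lines"]
    title = lines[0] if lines else ""
    authors_line = lines[1] if len(lines) > 1 else ""
    return {**doc, "title": title, "authors_line": authors_line}


def build_metadata(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stage 3: assemble the final bibliographic record."""
    authors = [a.strip() for a in doc["authors_line"].split(";") if a.strip()]
    return {"title": doc["title"], "authors": authors}


def run_pipeline(doc: Dict[str, Any], filters: List[Filter]) -> Dict[str, Any]:
    """Feed the output of each filter into the next one, in order."""
    return reduce(lambda data, f: f(data), filters, doc)


PIPELINE: List[Filter] = [extract_text, locate_fields, build_metadata]

if __name__ == "__main__":
    sample = {"raw": "  A Title \n\nDoe, J.; Roe, R.\nBody text"}
    print(run_pipeline(sample, PIPELINE))
```

Because each filter only depends on the shape of its input, a stage can be replaced (for example, swapping the heuristic in `locate_fields` for a trained classifier) without touching the others, which is the main appeal of this architecture.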

Keywords: Scientific articles; Metadata extraction; Metadata; PDF Documents; Semantic Web.



All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.