SciELO - Scientific Electronic Library Online



Revista Cubana de Ciencias Informáticas

On-line version ISSN 2227-1899


FLORES RIERA, Leduan; MARINO MOLERIO, Alejandro; MOJENA ROMAN, Luis and HIDALGO DELGADO, Yusniel. Component for automatic metadata extraction from textual corpus in PDF. Rev cuba cienc informat [online]. 2017, vol.11, n.4, pp.85-98. ISSN 2227-1899.

Digital libraries manage stored digital resources and perform three fundamental processes: the selection, treatment and exploitation of resources. One function of treatment is metadata extraction, which facilitates the use of resources by enabling the search, access and retrieval of information. Metadata extraction is time-consuming, and performing it manually risks introducing human errors. These problems can be reduced by using automated tools to support the process. In this article, we describe a web component for the automatic extraction of bibliographic metadata from PDF files. The component is based on three fundamental processes that follow a data flow representing a pipes and filters architecture, where the output of one process constitutes the input to the next. To validate whether the metadata extraction component reduces extraction time, an experiment is designed using a case study. In addition, a set of quality tests is applied. These tests verify whether the component functions correctly, whether the implemented functions execute correctly, whether the results obtained are the desired ones, and whether users show a high level of acceptance of the metadata extraction component.

Keywords: Scientific articles; Metadata extraction; Metadata; PDF Documents; Semantic Web.
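
The data flow described in the abstract, where the output of one process is the input to the next, is commonly called a pipes and filters architecture. The following minimal Python sketch illustrates that pattern only. The abstract does not name the component's three processes, so the stage names (`normalize_text`, `locate_fields`, `build_record`), the heuristics inside them, and the use of already-extracted plain text instead of a real PDF are all assumptions made for illustration, not the authors' implementation.

```python
import re
from functools import reduce
from typing import Any, Callable, Dict, List, Sequence

# A filter is any function that takes the previous stage's output
# and returns the input for the next stage.
Filter = Callable[[Any], Any]


def normalize_text(raw: str) -> List[str]:
    """Hypothetical stage 1: split raw text into non-empty, stripped lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def locate_fields(lines: List[str]) -> Dict[str, str]:
    """Hypothetical stage 2: naive heuristics (assumed, not from the article).

    The first line is taken as the title, the second as the author list,
    and the first line starting with 'Keywords:' as the keyword list.
    """
    fields = {
        "title": lines[0] if lines else "",
        "authors": lines[1] if len(lines) > 1 else "",
    }
    for line in lines:
        match = re.match(r"keywords\s*:\s*(.+)", line, flags=re.IGNORECASE)
        if match:
            fields["keywords"] = match.group(1)
            break
    return fields


def build_record(fields: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical stage 3: turn raw field strings into a structured record."""
    def split_list(value: str) -> List[str]:
        return [item.strip() for item in value.split(";") if item.strip()]

    return {
        "title": fields.get("title", ""),
        "authors": split_list(fields.get("authors", "")),
        "keywords": split_list(fields.get("keywords", "")),
    }


def run_pipeline(data: Any, filters: Sequence[Filter]) -> Any:
    """Feed data through each filter in order (the 'pipe')."""
    return reduce(lambda acc, stage: stage(acc), filters, data)


PIPELINE = (normalize_text, locate_fields, build_record)
```

Because each stage depends only on the shape of its input, a stage can be replaced (for example, swapping the heuristics in `locate_fields` for a trained model) without touching the others. Calling `run_pipeline(text, PIPELINE)` on the text of a document's first page would return a dictionary with `title`, `authors` and `keywords` entries.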



Creative Commons License: All the content of this journal, except where otherwise noted, is licensed under a Creative Commons License