Component for the automatic extraction of bibliographic metadata from textual corpora in PDF format

Flores Riera, Leduan; Mariño Molerio, Alejandro; Mojena Román, Luis; Hidalgo Delgado, Yusniel

Revista Cubana de Ciencias Informáticas

On-line version ISSN 2227-1899

Abstract

FLORES RIERA, Leduan; MARIÑO MOLERIO, Alejandro; MOJENA ROMÁN, Luis and HIDALGO DELGADO, Yusniel. Component for automatic metadata extraction from textual corpus in PDF. Rev cuba cienc informat [online]. 2017, vol.11, n.4, pp. 85-98. ISSN 2227-1899.

Digital libraries manage stored digital resources and perform three fundamental processes: the selection, treatment, and exploitation of resources. One function of treatment is metadata extraction, which facilitates the use of resources by enabling the search, access, and retrieval of information. Metadata extraction is time-consuming, and performing it manually risks introducing human errors. Automated tools that support the process can reduce these problems. In this article, we describe a web component for the automatic extraction of bibliographic metadata from PDF files. The component is based on three fundamental processes whose data flow follows a pipes-and-filters architecture, where the output of one process is the input to the next. To validate whether the component reduces extraction time, an experiment is designed around a case study. In addition, a set of quality tests is applied to verify that the component functions correctly, that the implemented functions execute correctly, that the results obtained are the desired ones, and that users show a high level of acceptance of the metadata extraction component.

Keywords: Scientific articles; Metadata extraction; Metadata; PDF Documents; Semantic Web.
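
The pipes-and-filters data flow mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the three processes, so the stage names (`extract_text`, `locate_fields`, `build_record`), the positional heuristics, and the input format are hypothetical. The sketch also works on plain text rather than real PDF files. It only shows how the output of one filter becomes the input to the next and is not the authors' implementation.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

# A filter receives the data produced so far and returns data for the next filter.
Filter = Callable[[Dict[str, Any]], Dict[str, Any]]


def extract_text(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stage 1: stand-in for PDF-to-text conversion.

    Assumption: the document text is already available under the "raw" key.
    """
    lines = [ln.strip() for ln in data["raw"].splitlines() if ln.strip()]
    return {**data, "lines": lines}


def locate_fields(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stage 2: naive heuristic, first line is the title,
    second line holds the authors."""
    lines = data["lines"]
    title = lines[0] if lines else ""
    authors_line = lines[1] if len(lines) > 1 else ""
    return {**data, "title": title, "authors_line": authors_line}


def build_record(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stage 3: assemble the final metadata record."""
    authors = [a.strip() for a in data["authors_line"].split(";") if a.strip()]
    return {"title": data["title"], "authors": authors}


def run_pipeline(filters: List[Filter], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Feed the output of each filter into the next one, in order."""
    return reduce(lambda acc, f: f(acc), filters, initial)


sample = (
    "Component for metadata extraction\n"
    "Flores Riera, Leduan; Hidalgo Delgado, Yusniel\n"
    "\n"
    "Digital libraries manage stored digital resources.\n"
)
record = run_pipeline([extract_text, locate_fields, build_record], {"raw": sample})
print(record)
```

Because each filter depends only on the data it receives, a stage could be replaced (for example, with a real PDF parser in the first position) without changing the others.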
