Revista Cubana de Ciencias Informáticas

versión On-line ISSN 2227-1899

Rev cuba cienc informat vol.12 no.2 La Habana abr.-jun. 2018

 

ARTÍCULO ORIGINAL

 

Human-Computer Interaction as a basis for assessing Geographic Information Retrieval Systems

 

Interacción Persona-Computador como base para la evaluación de Sistemas de Recuperación de Información Geográfica

 

 

Manuel Enrique Puebla Martínez1, José Manuel Perea Ortega2*, Alfredo Simón Cuevas3

1Universidad de las Ciencias Informáticas. Carretera a San Antonio de los Baños, Km. 2 ½. Torrens, municipio de La Lisa. La Habana, Cuba. mpuebla@uci.cu
2Universidad de Extremadura, Avda. de Elvas, s/n, Badajoz, España. jmperea@unex.es
3Instituto Superior Politécnico José Antonio Echeverría, Cujae. Calle 114, No. 11901. e/ Ciclovía y Rotonda, Marianao, La Habana, Cuba. asimon@ceis.cujae.edu.cu

*Autor para la correspondencia: jmperea@unex.es

 

 


ABSTRACT

In recent years, research related to Geographic Information Retrieval (GIR) systems, as a specific field of Information Retrieval (IR), has continued to attract the attention of the research community through several evaluation forums. However, these forums provide test collections composed of text documents and queries that are designed to evaluate non-interactive systems. This framework reduces the possibilities of carrying out a more thorough evaluation of these systems because it does not consider several important features, such as the diversity provided by different information sources or the human-computer interaction. The aim of this paper is to describe a new approach to evaluate interactive Geographic Information Retrieval systems, whose main novelty is to consider the user's knowledge generated by the human-computer interaction as well as the spatial information provided by different data sources. The proposed method will require generating a set of tests from three main data sources (Geonames, Wikipedia, and OpenStreetMap), as well as a set of queries, each consisting of a tuple of three components: the object type, the spatial relationship and the geographic object. As a result, the proposed evaluation approach integrates the two most commonly used strategies to evaluate IR systems, which focus on the system and on the end user, by applying several user satisfaction techniques and usability tests. As a main conclusion, we point out that the evaluation process of Geographic Information Retrieval systems should consider the user's knowledge generated by the human-computer interaction as well as the spatial information provided by different and heterogeneous data sources.

Key words: Evaluation of Geographic Information Retrieval, Geographic Information Retrieval, Interactive Information Retrieval, Human-Computer Interaction Information Retrieval.


RESUMEN

En los últimos años, dentro del área de Recuperación de Información, el área de investigación relacionada con los Sistemas de Recuperación de Información Geográfica ha seguido atrayendo la atención de la comunidad investigadora mediante la celebración de varios foros de evaluación. Sin embargo, estos foros proporcionan colecciones de prueba compuestas de documentos de texto y consultas que están listas para evaluar sistemas no interactivos. Este marco de evaluación reduce las posibilidades de llevar a cabo una evaluación más completa de estos sistemas debido a que no se están considerando varias características como la diversidad proporcionada por diferentes fuentes de información o la interacción hombre-computadora. El objetivo de este trabajo es describir un nuevo enfoque para evaluar sistemas interactivos de Recuperación de Información Geográfica, cuya principal novedad es considerar el conocimiento del usuario generado por la interacción hombre-computadora así como la información espacial proporcionada por diferentes fuentes de datos. El método propuesto requerirá la generación de una colección de pruebas a partir de tres fuentes de datos principales (Geonames, Wikipedia y OpenStreetMap), así como un conjunto de consultas que constarán de una tupla de tres componentes: el tipo de objeto, la relación espacial y el objeto geográfico. Como resultado, el enfoque de evaluación propuesto integra las dos estrategias más utilizadas para evaluar los sistemas de IR, que se centran en el sistema y en el usuario final, aplicando varias técnicas de satisfacción de usuario y pruebas de usabilidad. Como conclusión principal, señalamos que el proceso de evaluación de los sistemas de Recuperación de Información Geográfica debe considerar el conocimiento del usuario generado por la interacción hombre-computadora, así como la información espacial proporcionada por fuentes de datos diferentes y heterogéneas.

Palabras clave: evaluación de la recuperación de información geográfica, recuperación de información geográfica, recuperación de información interactiva, recuperación de información basada en la interacción humano-computador.


 

 

INTRODUCTION

Whilst the evaluation methods of classical Information Retrieval (IR) systems have been widely studied since the middle of the last century, the analysis of the user's impact and of their interactions with information systems is not yet well established (Kelly & Sugimoto, 2013). The difference between classical IR and Interactive Information Retrieval (IIR) lies in how relevant documents are retrieved: while IR systems are concerned with whether relevant documents are retrieved, IIR systems focus on whether people can use the system to retrieve relevant documents. Furthermore, Kelly (2009) points out that the main change in the study of interactive systems with real users is the non-applicability of the Cranfield model, because that model requires defining in advance the relevance level of the documents for a specific query.

Within the IR field, the research area related to Geographic Information Retrieval (GIR) systems has continued to attract the attention of the research community in recent years through several assessment forums, such as GeoCLEF (Mandl et al., 2009) and NTCIR-GeoTime (Gey et al., 2011). However, none of these forums provides a valid set of tests to evaluate GIR systems based on Human-Computer interaction (HC-GIR systems), since they were focused on non-interactive GIR systems. For this reason, the aim of this paper is to describe an approach to evaluate HC-GIR systems, which also includes generating a new set of tests compatible with the main features of these systems. In this context, we would like to point out several differences between HC-GIR systems and non-interactive GIR systems that should be considered when the sets of tests provided by the aforementioned forums are used to evaluate an HC-GIR system:

  • HC-GIR systems are not only focused on retrieving information from a corpus of text documents, as classical GIR systems are; they should also take advantage of other information sources, such as cartographic data sources or even the user's knowledge.

  • HC-GIR systems are not focused on classifying text documents into two levels of relevance (relevant or not relevant), as in classical GIR evaluation forums; rather, they focus on retrieving geographic objects that can be classified into multiple levels of relevance (multidimensional relevance).

  • HC-GIR systems allow their data sources to be improved through the user-system interaction, facilitating the retrieval of geographic information. Users from a specific geographic area usually know their own geography better and, therefore, a retrieved geographical object could have different levels of relevance for the same query at different times, or even for different users depending on their geographic location.

  • HC-GIR systems are usually focused on generating new geographic knowledge through the user's interaction with the system and the use of data sources generated by users, also known as User Generated Content (UGC). All this human knowledge certainly helps HC-GIR systems to improve information retrieval.

In this paper, a new approach to evaluate HC-GIR systems is described, which includes generating a new set of tests for these systems. Figure 1 shows the overview of a generic HC-GIR system, whose knowledge base is a geographic domain ontology enriched by automatic and semi-automatic mechanisms that gather information through the user-system interaction. An automated process extracts information from different data sources and integrates it into the geographic ontology. The integration is completed by a filtering process, supported by a geographic ontology editor that allows user-system interaction. Finally, as shown at the top of Figure 1, the system includes a visual query editor, which allows the user to classify the relevance of the geographical objects retrieved by the system, and the actual Geographic Information System (GIS) that wraps around the system.

The HC-GIR system shown in Figure 1 prioritizes the retrieval of geographical objects that exist in the real world over non-existing ones, although it would be perfectly compatible with the retrieval of the latter. Note that if an object is geographic, then it must have a related spatial location in the real world, but this does not imply the physical presence of the object in our geography. The overall success of the information retrieval lies in using data sources that cover the user's needs, along with a successful analysis of those needs.

Figure 1. Overview of the generic HC-GIR system.

Although the common meaning of the term corpus refers to a collection of documents, in this paper we adopt the definitions proposed by Kelly (2009), who defines a collection as a set of topics, a corpus and relevance judgments, while a corpus is the set of documents or information objects to which users have access.
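
For clarity, Kelly's distinction can be written down as a simple data structure. The following Python sketch is our own illustration; the field names are illustrative choices, not part of any existing tool:

    from dataclasses import dataclass

    @dataclass
    class Collection:
        """A collection in Kelly's (2009) sense, as adopted in this paper."""
        topics: list      # information needs (queries)
        corpus: list      # documents or information objects users can access
        judgments: dict   # relevance judgments per (topic, object) pair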

Recently, the main challenges and developments in the GIR field over the years have been analyzed (Purves, 2014). The conclusions are not very encouraging, pointing out the scarce use of geographical knowledge bases, which has led to no significant improvements. One of the challenges relates to the methods used to evaluate the success of GIR systems. Another relevant conclusion is that GeoCLEF was not a successful evaluation forum, because the participating teams focused on performing simple adjustments to the query strategy or, sometimes, to the relevance ranking formula applied in the system. According to Purves (2014), the evaluation of geographic relevance, especially at the local level, is an increasingly challenging and essential research field nowadays. Purves also points out the need to develop effective interfaces that help users find what they want. Another major challenge in GIR is disambiguating unknown or small geographic areas with a low level of detail, as discussed in Purves (2014) and Palacio et al. (2015).

Furthermore, Borlund (2013) points out the need for research on multidimensional and dynamic relevance rankings for GIR, which justifies the importance and novelty of this paper. Palacio et al. (2015) propose a new approach to develop a set of tests, including the queries and the relevance judgments, by using User Generated Content (UGC). They conclude that using UGC is promising for evaluating GIR systems, especially for queries related to geographical areas with a low level of detail.

User satisfaction is becoming increasingly important in IR systems, which should be designed to meet the user's requirements. What are these needs and requirements in a GIR system? Perea-Ortega (2010) points out that the main goal of evaluating a GIR system is to measure the user's satisfaction with the system's response to the information need. According to Kelly & Sugimoto (2013), some IIR papers have evaluated a single system instead of performing several experiments whose objective is to examine the effects of an independent variable (e.g., the system) on one or more dependent variables (e.g., performance and usability), which requires comparing at least two elements. Traditional usability tests are examples of this type of evaluation; they are normally carried out with a single version of the system, with the aim of identifying potential usability problems.

Regarding the existing evaluation forums for IIR, TREC is one of the most relevant. Over the years, it has provided several tracks for the IIR task: the Interactive Track (TRECs 03-11), the HARD Track (TRECs 12-14), and ciQA (TRECs 15-16). Recently, some tracks have focused on scenarios that involve spatial and temporal information. For instance, the Contextual Suggestion Track involves complex information needs that are highly dependent on context and user interests; these contexts include latitude and longitude coordinates, as well as a temporal component. According to Kelly (2009), these tracks provided different sets of tests, but none of them succeeded in establishing a generic collection that allows teams to make feasible comparisons between IIR systems. In that context, the author suggests four basic types of measures for IIR evaluation: context, interaction, performance and usability. Finally, another related evaluation forum is NTCIR-GeoTime (Gey et al., 2011), where GIR is evaluated for Asian languages and English, including temporal aspects of the retrieval process. Its findings were similar to those obtained in the GeoCLEF track (Mandl et al., 2009).

The aim of this research is to describe a new approach to evaluate interactive GIR systems, whose main novelty is to consider the user’s knowledge generated by the human-computer interaction as well as the spatial information provided by different data sources. The rest of the paper is organized as follows: Section 2 describes the proposed method; the results of the proposed method are presented in Section 3; Section 4 discusses the proposed approach and, finally, the conclusions are expounded in Section 5.

 

PROPOSED METHOD

The aim of the proposed method is to evaluate the HC-GIR model shown in Figure 1 by using the two existing evaluation models in GIR: the CLEF non-interactive model and the TREC interactive model. The proposed approach will require generating a new set of tests from three main data sources: Geonames, Wikipedia, and OpenStreetMap. According to Palacio et al. (2015), these sources can be considered User Generated Content (UGC), so we can state that our evaluation method makes use of user knowledge to some extent. The relevance judgments will then be generated by taking into account these corpora, along with the topics or queries that must be defined beforehand. Finally, the results of each query will be classified into three levels: "relevant", "not very relevant" and "not relevant".
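
As a minimal illustration of how the records of such a test set could be represented, consider the following Python sketch; the type names and the numeric grades are our own assumptions, not part of any existing tool:

    from dataclasses import dataclass
    from enum import Enum

    class Relevance(Enum):
        """The three relevance levels proposed for the query results."""
        RELEVANT = 2
        NOT_VERY_RELEVANT = 1
        NOT_RELEVANT = 0

    @dataclass
    class GeoObject:
        """A geographic object gathered from one of the three sources."""
        name: str
        source: str       # "geonames", "wikipedia" or "openstreetmap"
        latitude: float
        longitude: float

    @dataclass
    class Judgment:
        """A relevance judgment linking a query to a geographic object."""
        query_id: str
        obj: GeoObject
        relevance: Relevance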

Regarding the queries that will be used to evaluate the system, each will consist of a tuple of three components: the object type, the spatial relationship, and the geographic object. Each query should be related to a set of valid results (in this case, geographical objects), sorted by a relevance score according to the related geographical area and the information available in the corpora. In this way, all queries should refer to existing spatial objects in the geographical area, thus facilitating the elaboration of the relevance judgments. Furthermore, a geographic ontology should support the evaluation of the results for a specific visual query.
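
Such a query could be encoded as follows; this sketch uses the example query from the Results section, and the class and field names are our own illustrative choices:

    from dataclasses import dataclass

    @dataclass
    class GeoQuery:
        """A query as the tuple of three components described above."""
        object_type: str        # e.g. "Hospital"
        spatial_relation: str   # e.g. "in", "near", "belongs"
        geographic_object: str  # e.g. "Cuban East"

    # "Hospitals in the east of Cuba", used later in the Results section:
    query = GeoQuery("Hospital", "in", "Cuban East")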

Similar to the strategy proposed by Kelly (2009), a usability study will be performed in our evaluation approach, where the results are compared to predetermined population parameters defined from the results of similar studies. This usability study is based on a well-known experimental design called the "Solomon four-group" design (McCambridge et al., 2011). Besides, the "precision" measure is also calculated for the proposed approach in order to perform a feasible comparison with the systems evaluated during the GeoCLEF campaign and in the interactive TREC tracks. The comparisons will be performed at two different usability levels from the user's point of view: usability when submitting the information need, and usability when visualizing and understanding the query results.
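
A minimal sketch of the traditional precision computation over the three-level judgments is shown below. Whether the intermediate level ("not very relevant") counts as a hit is a parameter of our own, since the text does not fix that choice:

    def precision(retrieved_ids, judgments, positive=("relevant",)):
        """Traditional precision: |relevant retrieved| / |retrieved|.

        retrieved_ids: ranked list of object identifiers for one query.
        judgments: maps each identifier to "relevant",
        "not very relevant" or "not relevant".
        """
        if not retrieved_ids:
            return 0.0
        hits = sum(1 for oid in retrieved_ids
                   if judgments.get(oid) in positive)
        return hits / len(retrieved_ids)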

Another issue of the proposed approach is the evaluation of the results provided by the HC-GIR system. Tackling this task requires human experts, who are the basis for applying the "expert criteria" (Delphi) method to evaluate the experts' consensus about the quality of the proposal. Regarding user satisfaction, the Iadov technique (López-Rodríguez & González-Maura, 2002) will be applied. This technique uses a questionnaire of five questions, three of them closed questions intercalated within the questionnaire; the user's satisfaction level is derived from the relationship established between the answers to these three questions, a relationship the user is unaware of.
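
As commonly described in the Iadov literature, each user ends up in one of five individual satisfaction categories, from which a group satisfaction index in [-1, 1] is computed. The following sketch assumes the standard weighting of those categories; this is background knowledge about the technique, not a detail given in this paper:

    # Usual weights of the five Iadov categories (an assumption based on
    # the standard formulation of the technique).
    IADOV_WEIGHTS = {
        "clearly satisfied": 1.0,
        "more satisfied than dissatisfied": 0.5,
        "undefined or contradictory": 0.0,
        "more dissatisfied than satisfied": -0.5,
        "clearly dissatisfied": -1.0,
    }

    def group_satisfaction_index(user_categories):
        """Group satisfaction index: mean of the individual weights."""
        if not user_categories:
            raise ValueError("no users evaluated")
        return sum(IADOV_WEIGHTS[c] for c in user_categories) / len(user_categories)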

An analysis by experts and standard users of the context in which the information retrieval occurs will be necessary for a deeper analysis of the results. This will include measures used to characterize the individuals, such as age, intelligence, creativity, memory or cognitive style, and others used to characterize the search situation, such as familiarity with the topics, the current geographic location or the time spent searching. Likewise, it will be necessary to measure and analyze the level of interaction between users and the system. The interaction measures used are the following: number of queries, number of search results viewed, number of objects and geographical concepts viewed, number of geographic objects defined by the user as relevant, and query length. For example, if the user has to visualize many concepts or geographic objects to meet the information need, this is a negative indicator; if the user visualizes few objects, this is a positive result, because the goal is to help users find what they are looking for in the shortest possible time.
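
For instance, the interaction measures above could be logged per user session with a structure like the following sketch (illustrative names, not an existing API):

    from dataclasses import dataclass, field

    @dataclass
    class InteractionLog:
        """Per-session counters for the interaction measures listed above."""
        queries: int = 0
        results_viewed: int = 0
        concepts_and_objects_viewed: int = 0
        objects_marked_relevant: int = 0
        query_lengths: list = field(default_factory=list)  # in words

        def record_query(self, query_text: str) -> None:
            """Register one submitted query and its length."""
            self.queries += 1
            self.query_lengths.append(len(query_text.split()))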

To sum up, the proposed approach to evaluate HC‐GIR systems consists of the following steps:

  1. Generate the set of tests from the three main data sources proposed: Geonames, Wikipedia and OpenStreetMap.

  2. Select several human experts from related fields to test the system, acting as evaluators and end users. Apply the Delphi method among them to obtain a consensus on the quality of the system.

  3. Select several standard users to interact with the system and retrieve geographical information. Then, apply the Iadov technique and the Questionnaire for User Interaction Satisfaction (QUIS) (Naeini & Mostowfi, 2015) to measure the users' satisfaction with the retrieved results.

  4. Define and calculate the traditional "precision" measure used in common non-interactive GIR systems and compare the results achieved with those obtained by other systems that participated in evaluation campaigns such as GeoCLEF. Furthermore, the two variants of the precision measure described in Kelly (2009) (interactive TREC precision and interactive user precision) will also be calculated in order to compare the results achieved with those obtained by the participants in the TREC Interactive Track.

  5. Run usability tests and compare the results with those obtained by other systems that participated in previous evaluation campaigns. The USE (Usefulness, Satisfaction, and Ease of use) questionnaire defined by Lund (2001) and the SUMI (Software Usability Measurement Inventory) questionnaire will be used to measure usability.

 

RESULTS

In order to make the evaluation feasible and obtain initial results, we reduced the generated corpora to the specific geographical area of Cuba.

Assuming that the HC-GIR system is supported by an ontology whose conceptualization is partially shown in Figure 2, a relevant result for the query "Hospitals in the east of Cuba" could be "Celia Sánchez Manduley Hospital", because it is a direct instance of the concept "Hospital" and it is geographically located in "Cuban East".

However, another result with a lower degree of relevance could be "René Vallejo Polyclinic", because the concept "Polyclinic" (the concept to which the instance "René Vallejo" belongs) is semantically related to the concept "Hospital" by a hypernym relationship. Moreover, "René Vallejo Polyclinic" is spatially related to "Cuban East" by means of the topological relationship "belongs". Finally, the town "Manzanillo" can be considered as belonging to "Cuban East" because the spatial relationship "belongs" is transitive.
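
A minimal sketch of how these graded relevance judgments could be derived from the ontology's relations follows. The facts encode only the example above, and the grading scheme (2/1/0) is an illustrative assumption of ours:

    # Hypernym relation between concepts in the example ontology.
    HYPERNYMS = {"Polyclinic": "Hospital"}

    # Direct "belongs" facts; the relation is transitive, so an object
    # located in Manzanillo also belongs to "Cuban East".
    BELONGS = {
        "Celia Sánchez Manduley Hospital": "Manzanillo",
        "René Vallejo Polyclinic": "Manzanillo",
        "Manzanillo": "Cuban East",
    }

    def belongs_transitively(obj, region):
        """Follow the 'belongs' chain upwards (transitive closure)."""
        while obj in BELONGS:
            obj = BELONGS[obj]
            if obj == region:
                return True
        return False

    def graded_relevance(instance, concept, query_type, query_region):
        """2 = direct instance of the queried type, 1 = instance of a
        hypernym-related concept, 0 = not relevant."""
        if not belongs_transitively(instance, query_region):
            return 0
        if concept == query_type:
            return 2
        return 1 if HYPERNYMS.get(concept) == query_type else 0

    # "Hospitals in the east of Cuba":
    assert graded_relevance("Celia Sánchez Manduley Hospital", "Hospital",
                            "Hospital", "Cuban East") == 2
    assert graded_relevance("René Vallejo Polyclinic", "Polyclinic",
                            "Hospital", "Cuban East") == 1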

These results show the importance of using a geographic ontology to support the evaluation of the results for a specific visual query in the HC‐GIR system. Moreover, the semantic and spatial relationships established within the ontology help to define a more accurate degree of relevance for the results provided by the HC‐GIR system regarding a specific query.

 

DISCUSSION

In our opinion, a controversial issue of the proposed approach is how to compare the precision score obtained for a specific topic or query by using different sets of tests. Could we perform a feasible comparison? The common formula to calculate the precision of non-interactive GIR systems is the ratio between relevant documents retrieved and documents retrieved. Theoretically, if the sets of tests are generated correctly and the geographical objects related to each query are sorted by a reliable relevance ranking, this formula should not be affected by working with different sets of tests. From our point of view, a GIR system should return spatial objects of interest, but most systems are designed to fit a set of tests rather than focusing on the user's needs. This is what happened in the GeoCLEF campaign, for example, where GIR systems returned textual documents because the set of tests required it. However, we believe that spatial objects should be a priority in the geographic domain and that sets of tests should be adapted to the user's needs.
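
For reference, the precision formula mentioned above reads, in standard notation:

    P = \frac{|\{\text{relevant objects}\} \cap \{\text{retrieved objects}\}|}{|\{\text{retrieved objects}\}|}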

Furthermore, it should be noted that the data sources used to generate the set of tests in the first step of the proposed evaluation approach are not static like the GeoCLEF ones, i.e., their content can change over time. Although the geographic data stored in the proposed sources are not highly variable, they can be accessed in real time. Moreover, since the end users can also be seen as data sources in our model, the geographical features of the data could change during the evaluation process.

According to Purves (2014), the absence of a widely recognized set of tests for GIR is one of the reasons why evaluations of GIR systems are often omitted. The design and release of a set of tests to evaluate these systems requires a huge effort from human annotators. In this sense, we believe that evaluating the results for each query would be less exhausting by following our approach, because spatial objects (most of them recognizable in the real world) are usually easier to relate to information needs represented by a visual query. This approach is conceptually different from traditional GIR evaluations, where textual documents (most of them geographically unrecognized in the real world) are related to an information need represented by a textual query. Therefore, we consider that defining the level of similarity between spatial objects is easier than between text documents and spatial objects.

Another issue to discuss is query generation. There are different alternatives, each with its advantages and drawbacks. Once the query is defined, both GIR alternatives (textual and visual) will be able to perform the evaluation regarding precision in the non-interactive variant. One of the limitations when comparing interactive and non-interactive IR systems has been the incapability of IIR systems to generate a query, since one of their objectives is to avoid the negative consequences of requiring a query from the user, although the user's involvement during the retrieval process is an exclusive feature of IIR systems.

However, the HC-GIR system takes advantage of the new knowledge generated by the user, which can be considered an improvement towards successful information retrieval. This is another difference from non-interactive GIR systems, since those systems do not use human knowledge during the retrieval process.

Finally, according to the literature reviewed, our evaluation proposal could be considered the first one that tries to compare non-interactive GIR systems with HC-GIR systems. Nevertheless, from our point of view, in order to make the common sets of tests used to evaluate non-interactive GIR systems more reliable and more widely demanded by the research community, two main issues should be considered:

  1. Remove the corpus from the set of tests and allow the GIR system to choose the data sources to search.

  2. Generate the relevance judgments for each query depending on the current geographical area of the user.

This new variant would further facilitate the ranking of the results. The only requirement would be to know the geographic area used in the set of tests well enough to generate successful relevance judgments, i.e., the HC-GIR system would be evaluated positively when its results are close to reality. How to deal with this is currently one of the challenges to be solved by the GIR research community.
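
One conceivable way to make the judgments depend on the user's current area is to demote the relevance of distant objects. The following sketch is purely illustrative: the 50 km radius and the one-level demotion are our own assumptions, not part of the proposal:

    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points, in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * asin(sqrt(a))

    def localized_relevance(base_level, obj_pos, user_pos, radius_km=50):
        """Demote a three-level judgment (2/1/0) by one level when the
        object lies outside the user's area."""
        if haversine_km(*obj_pos, *user_pos) <= radius_km:
            return base_level
        return max(base_level - 1, 0)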

 

CONCLUSIONS

This paper describes a novel approach to evaluate Human-Computer Geographic Information Retrieval (HC-GIR) systems. The proposal focuses on integrating the main findings of the most relevant evaluation forums in the Information Retrieval (IR) and Interactive Information Retrieval (IIR) fields, such as TREC, CLEF and NTCIR. In addition, the proposed approach tries to lay the foundations for a feasible comparison between HC-GIR systems and traditional GIR systems such as those presented at GeoCLEF. A brief discussion of the main differences between HC-GIR and IIR systems is also presented in this paper.

Nowadays, there is a lack of consensus within the GIR community on how to evaluate HC-GIR systems, mainly due to the current challenges discussed in this paper. In the literature reviewed, no evaluation forums were found where HC-GIR systems can be analyzed and evaluated. Only the proposal described in Bucher et al. (2005) can be considered an evaluation approach related to the one presented in this paper, although the evaluation measures applied to the end users were not explained in detail. In this sense, our proposed evaluation approach integrates the two most commonly used strategies to evaluate IR systems, which focus on the system and on the end user, by applying several user satisfaction techniques and usability tests.

 

ACKNOWLEDGMENTS

This paper has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER) and REDES project (TIN2015‐65136‐C2‐1‐R) from the Spanish Government.

 

REFERENCES

BORLUND, P., 2013. Interactive Information Retrieval: An Introduction. Journal of Information Science Theory and Practice, 1 (3), pp.12-32. Korea Institute of Science and Technology Information.

BUCHER, B. et al., 2005. Geographic IR systems: requirements and evaluation. ICC 05: Proceedings of the 22nd International Cartographic Conference, pp.11–16.

GEY, F. C., LARSON, R. R., MACHADO, J. & YOSHIOKA, M., 2011. NTCIR9-GeoTime Overview - Evaluating Geographic and Temporal Search: Round 2, in Noriko Kando; Daisuke Ishikawa & Miho Sugimoto, ed., ‘NTCIR’, National Institute of Informatics (NII).

KELLY, D., 2009. Methods for Evaluating Interactive Information Retrieval Systems with Users. Foundations and Trends® in Information Retrieval, 3(1-2), pp. 1-224.

KELLY, D. & SUGIMOTO, C. R., 2013. A systematic review of interactive information retrieval evaluation studies, 1967-2006. JASIST 64 (4), pp. 745-770.

LÓPEZ-RODRÍGUEZ, A. & GONZÁLEZ-MAURA, V., 2002. La técnica de Iadov. Una aplicación para el estudio de la satisfacción de los alumnos por las clases de educación física. Revista Digital-Buenos Aires, Año 8, No. 47. Available at: http://www.efdeportes.com/efd47/iadov.htm

LUND, A., 2001. Measuring Usability with the USE Questionnaire. Usability Interface, 8(2), pp.3–6.

MANDL, T., CARVALHO, P., DI NUNZIO, G. M., GEY, F. C., LARSON, R. R., SANTOS, D. & WOMSER-HACKER, C., 2009. GeoCLEF 2008: The CLEF 2008 Cross-Language Geographic Information Retrieval Track Overview. In: C. Peters et al., eds. CLEF 2008. Springer, pp. 808-821.

MCCAMBRIDGE, J., BUTOR-BHAVSAR, K., WITTON, J., & ELBOURNE, D., 2011. Can Research Assessments Themselves Cause Bias in Behaviour Change Trials? A Systematic Review of Evidence from Solomon 4-Group Studies. PLoS ONE, 6(10), e25223. http://doi.org/10.1371/journal.pone.0025223

NAEINI, H.S. & MOSTOWFI, S., 2015. Using QUIS as a measurement tool for user satisfaction evaluation (case study: vending machine). International Journal of Information Science, 5(1), 14–23. doi:10.5923/j.ijis.20150501.03

PALACIO, D., DERUNGS, C. & PURVES, R.S., 2015. Development and evaluation of a geographic information retrieval system using fine-grained toponyms. Journal of Spatial Information Science, 11(2015).

PEREA‐ORTEGA, J.M., 2010. Recuperación de Información Geográfica basada en múltiples formulaciones y motores de búsqueda. PhD Thesis. University of Jaén.

PURVES, R., 2014. Geographic Information Retrieval: Are We Making Progress?, pp.1–6. Available at: http://spatial.ucsb.edu/wp-content/uploads/smss2014‐Position‐Purves.pdf.

 

 

Recibido: 20/06/2017
Aceptado: 13/03/2018
