INTRODUCTION
Maps and cartographic technology are practically everywhere. Although the ways and means of making and distributing maps have been revolutionized by recent computing and technological advances (Bartel, 2022), the art and science of mapmaking date back centuries. This is because humans are inherently spatial beings who, in order to live in the world, must first relate to it in some way (Restrepo, 2015).
Today, there are tools and computer programs for geography that allow geographic and alphanumeric data to be visualized in an integrated manner. They also allow information to be managed as layers in different formats and spatial analyses to be developed for specific purposes (Mariani, 2012).
DEVELOPMENT OF THE TOPIC
Geographic Information Systems: Concepts and Main Tools
One example of such tools is Geographic Information Systems (GIS). According to Goodchild & Kemp (1990), they are information systems made up of hardware, software and procedures to capture, manage, manipulate, analyze, model and represent georeferenced data, with the objective of solving management and planning problems.
GIS are powerful tools for collecting, storing and managing spatial data using maps. GIS methods and tools are widely used in engineering applications, especially in the design, planning and management of transportation and logistics systems, facility design, the location and allocation of resources (particularly agricultural resources) and the identification of potential high-demand regions. Other fields that depend on collecting, analyzing and visualizing spatial data include telecommunications, agriculture and mining (Balaman, 2019).
A GIS makes it possible to manage large amounts of data and information. GIS is used in both the public and private sectors, for tasks ranging from the relatively simple, such as mapping the route a tractor must follow to carry freshly picked produce from the field, to the more complex, such as determining the most efficient planting process. Online and mobile maps, navigation and location-based services are also personalizing and democratizing GIS by bringing maps and cartography to the masses (Dastrup, 2020).
Comparison between QGIS and ArcGIS
Today there are many GIS applications, both commercial and open source, with large user bases. ArcGIS and QGIS are two of the most widely used (Maurya et al., 2015).
According to Khan & Mohiuddin (2018), QGIS is a fully capable and easily accessible GIS tool; unlike ArcGIS, it is open source and free. QGIS can be installed on different platforms, for example Windows, macOS, Linux (Ubuntu) and Unix, whereas ArcGIS is supported only on Windows.
Maurya et al. (2015) state that many open-source GIS packages are available worldwide, but QGIS is the most popular among them. Likewise, they note that ArcGIS is the best-known proprietary GIS software, which is why the two are frequently compared. ArcGIS has an excellent graphical user interface compared with any open-source software. However, it is supported only on Windows, whereas QGIS can be installed on several platforms, such as Linux. ArcGIS has well-regarded documentation, while QGIS has extensive, well-written documentation with numerous helpful, easy-to-follow tutorials and introductory videos.
According to Flenniken et al. (2020), QGIS is an increasingly useful tool for visualization and spatial analysis, which makes it a viable alternative to more expensive software packages. QGIS is open-source software that benefits from contributions by experts and users around the world. Its open-source nature, together with the growing availability of free data from online sources, gives inexperienced users an opportunity to learn GIS technology and incorporate it into their research.
Benduch (2017) suggests that QGIS could be an acceptable alternative to the proprietary ArcGIS software in GIS teaching, given the financial constraints of educational institutions and taking into account the complexity of its functionalities.
After reviewing the literature and the opinions of authors such as Khan & Mohiuddin (2018), Maurya et al. (2015), Flenniken et al. (2020) and Benduch (2017), it can be stated that QGIS and ArcGIS are both high-quality, stable GIS software and sound choices when selecting a GIS. While ArcGIS stands out in many respects and is backed by a large company, its high cost is a barrier for many users. The great advantage of QGIS is that it is an economical option, since no license is needed to use all of its tools. In addition, as free open-source software it is available to any user, with functionality comparable to that of a paid GIS.
Data Types in a GIS: Spatial and Attribute Data
QGIS, like other GIS, supports the visualization and analysis of geographic data (Khan & Mohiuddin, 2018). It also offers facilities for data capture, management and querying, and it lets users visualize and understand the relationships between geographic data in the form of maps, reports and charts. The basic data types available define the typical data on a map. There are two main types of data: attribute data and spatial data (Lithmee, 2019).
Spatial Data
Spatial data describe geometric information, such as a feature's location in space or its position relative to other features. They are also multidimensional and encompass several distinct but correlated kinds of data (Howari & Ghrefat, 2021).
Spatial data comprise geographic information about the Earth and its features. A pair of latitude and longitude coordinates defines a specific location on Earth. Depending on the storage technique, spatial data are of two types, raster and vector (Janipella et al., 2019):
Raster data are made up of grid cells identified by row and column. The entire geographic area is divided into a grid of individual cells that together form an image. Satellite images, photographs and scanned images are examples of raster data (Figure 1, right).
Vector data consist of points, polylines and polygons. Features such as wells and houses are represented by points; roads, rivers and streams by polylines; and towns and cities by polygons (Figure 1, left).
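The difference between the two storage techniques can be sketched in a few lines of Python. In this minimal example a raster is a grid of cell values addressed by row and column, and vector features are coordinate lists. The grid origin, the 10 m cell size, the land-cover codes and all feature coordinates are hypothetical values chosen for illustration; real formats such as GeoTIFF or shapefiles carry much more metadata (coordinate reference system, nodata values, and so on).

```python
import math

# Hypothetical raster: 3 rows x 4 columns of land-cover codes.
# Row 0 is the northernmost row, as in most raster formats.
raster = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]
ORIGIN_X, ORIGIN_Y = 500000.0, 4100030.0  # hypothetical upper-left corner (metres)
CELL_SIZE = 10.0                          # hypothetical cell size (metres)

def cell_for(x, y):
    """Return the (row, column) of the raster cell containing map coordinate (x, y)."""
    col = math.floor((x - ORIGIN_X) / CELL_SIZE)
    row = math.floor((ORIGIN_Y - y) / CELL_SIZE)  # rows count downwards from the top
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise ValueError("coordinate falls outside the raster")
    return row, col

# Hypothetical vector features: a point, a polyline and a polygon ring.
well = (500012.0, 4100025.0)
stream = [(500000.0, 4100000.0), (500020.0, 4100015.0)]
town = [(500000.0, 4100000.0), (500040.0, 4100000.0),
        (500040.0, 4100030.0), (500000.0, 4100030.0)]

def polyline_length(coords):
    """Sum of the straight-line lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(ring):
    """Planar area of a simple polygon using the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

print(cell_for(*well))  # (0, 1)
```

With these values the well falls in row 0, column 1 of the grid (land-cover code 1), the stream is 25 m long and the town polygon covers 1200 m².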
Attribute Data
Attribute data are descriptions or measurements of the geographic features on a map. They are detailed data combined with the geospatial data, and they help extract meaningful information from a map. Each feature has properties that can be described; a building, for example, has a year of construction, a number of floors and other characteristics or attributes. Attributes can also be facts that are known but not visible, such as the year of construction, and they can even record the absence of a characteristic (Lithmee, 2019).
These data hold relevant information about the spatial data. Queries operate on the attribute data, which are attached to the geospatial data. The attribute data types are as follows (Janipella et al., 2019):
Nominal data: describe categories, such as land-use types or soil types.
Ordinal data: differentiate data by rank. For example, cities can be grouped into large, medium and small according to population size.
Interval data: have known, equal intervals between values, as in temperature readings. For example, a reading of 30 °C is 10 °C warmer than one of 20 °C, although it is not 1.5 times as warm.
Ratio data: like interval data, but based on a meaningful absolute zero. Population density is an example of ratio data, because a density of 0 means no population at all.
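The four levels of measurement determine which operations are meaningful on an attribute. The Python sketch below stores hypothetical attribute records for three towns and applies to each field only the operations its level supports. The field names, values and the population thresholds used for the ordinal size classes are assumptions made for illustration.

```python
# Hypothetical attribute records; each field illustrates one level of measurement.
towns = [
    {"name": "A", "land_use": "urban", "population": 250_000, "area_km2": 50.0, "temp_c": 30.0},
    {"name": "B", "land_use": "agricultural", "population": 40_000, "area_km2": 20.0, "temp_c": 20.0},
    {"name": "C", "land_use": "urban", "population": 8_000, "area_km2": 4.0, "temp_c": 25.0},
]

def size_class(population):
    """Ordinal classification; the thresholds are illustrative assumptions."""
    if population >= 100_000:
        return "large"
    if population >= 20_000:
        return "medium"
    return "small"

ORDER = {"small": 0, "medium": 1, "large": 2}  # ranking is meaningful, differences are not

for t in towns:
    t["size"] = size_class(t["population"])         # ordinal field
    t["density"] = t["population"] / t["area_km2"]  # ratio field (absolute zero)

# Nominal: only equality tests make sense.
urban = [t["name"] for t in towns if t["land_use"] == "urban"]

# Ordinal: sorting by rank makes sense.
by_size = sorted(towns, key=lambda t: ORDER[t["size"]], reverse=True)

# Interval: differences make sense, ratios do not (Celsius has no absolute zero).
temp_diff = towns[0]["temp_c"] - towns[1]["temp_c"]

# Ratio: ratios make sense because a density of 0 means "no population".
density_ratio = towns[0]["density"] / towns[1]["density"]

print(urban, temp_diff, density_ratio)  # ['A', 'C'] 10.0 2.5
```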
Databases are used to store and organize geographic data. Among other data sources, GIS use spatial databases, which give users access to up-to-date information without needing local copies of the spatial files: anyone with access to the spatial database has access to the spatial information stored in it. Databases reduce the possibility of errors in handling information because they help eliminate redundancies and inconsistencies. They also help structure and organize the data efficiently and functionally (Nur et al., 2018).
Databases
A database is a list or collection of data, usually assembled so that the data can be accessed quickly. Especially today, a database can range from something as simple as an address book containing names and addresses to something as sophisticated as a database of thousands of customer records (Maliakal, 2019).
Databases are collections of related data organized so that the data can be easily accessed, managed and updated. They can be software- or hardware-based, but they share a single purpose: storing data (Cai & Vasilakos, 2017).
Database Management System
A database management system (DBMS) is software designed to help maintain and use large collections of data, and the need for and use of these systems are growing rapidly. Beyond file management, DBMSs offer several functions: they support concurrency, control security, maintain data integrity, provide backup and recovery, control redundancy, enable data independence and provide a non-procedural query language (Singh, 2015).
DBMSs were designed to facilitate the storage and retrieval of large collections of data. Their functions include protecting and securing the data, keeping the stored data consistent and making the data available to multiple users at the same time (Rawat & Purnama, 2021).
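Two of these functions, maintaining data integrity and keeping stored data consistent, can be observed in a few lines of code. PostgreSQL itself cannot be embedded in a short self-contained example, so this sketch uses Python's built-in `sqlite3` module as a stand-in DBMS. The `plots` table, its columns and the `transfer_area` helper are hypothetical. A `CHECK` constraint rejects invalid areas, and a transaction ensures that a two-step change is applied completely or not at all.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE plots (
           id      INTEGER PRIMARY KEY,
           crop    TEXT NOT NULL,
           area_ha REAL NOT NULL CHECK (area_ha > 0)  -- integrity rule enforced by the DBMS
       )"""
)
conn.executemany("INSERT INTO plots (crop, area_ha) VALUES (?, ?)",
                 [("rice", 6.5), ("maize", 3.0)])
conn.commit()

def transfer_area(conn, src, dst, ha):
    """Move `ha` hectares from plot `src` to plot `dst` as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE plots SET area_ha = area_ha + ? WHERE id = ?", (ha, dst))
            conn.execute("UPDATE plots SET area_ha = area_ha - ? WHERE id = ?", (ha, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected an update
        return False

print(transfer_area(conn, 2, 1, 5.0))  # False: plot 2 would end up with -2.0 ha
print(conn.execute("SELECT area_ha FROM plots ORDER BY id").fetchall())  # [(6.5,), (3.0,)]
```

Although the first `UPDATE` succeeded, the rollback restores plot 1 to 6.5 ha, so the database never exposes a half-finished change.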
Comparison between Relational and NoSQL Database Management Systems
There are several types of DBMS: relational, distributed, NoSQL, object-oriented and graph-oriented, reflecting the variety of workloads they are required to handle. Among those most used with QGIS are relational database management systems (RDBMSs), such as PostgreSQL, and NoSQL database management systems, such as MongoDB (Nayak et al., 2013).
According to Guillot-Jiménez & García (2016), NoSQL database management systems are an excellent option for applications that require efficient management of semi-structured and unstructured data and high availability for end users. For geospatial data management, document-oriented and graph-oriented systems stand out over key/value and column-oriented systems. Nevertheless, relational systems still dominate geospatial data management overall.
Guo & Onstein (2020) state that NoSQL databases such as MongoDB are very useful for geospatial work: they load data quickly, have good execution times and query performance, and offer abundant geospatial functions and indexing methods. Depending on the application scenario, graph, key-value and wide-column databases also have their own advantages. However, existing NoSQL databases do not handle the calculation of geometric surfaces or the processing of volumes.
According to Nasution (2021), an RDBMS is software used to store, manage, query and retrieve data stored in a relational database. It provides an interface between users, applications and the database, as well as administrative functions for managing data storage, access and performance.
Twumasi (2002) states that one of the most logically well-established database models is the relational model, which organizes data in tables that are manipulated by an RDBMS. GIS have traditionally used relational databases for data storage. Despite its limited ability to capture the behavior of objects, relational technology rests on well-established mathematical foundations in the field of databases.
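In the relational model, related facts live in separate tables that are linked through shared keys, and a query combines them with a join. The following sketch again uses Python's `sqlite3` module as a stand-in for an RDBMS such as PostgreSQL. The `municipalities` and `plots` tables and their rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: each plot references the municipality it belongs to.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE municipalities (
        muni_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE plots (
        plot_id INTEGER PRIMARY KEY,
        muni_id INTEGER NOT NULL REFERENCES municipalities (muni_id),
        crop    TEXT NOT NULL,
        area_ha REAL NOT NULL
    );
    INSERT INTO municipalities VALUES (1, 'North'), (2, 'South');
    INSERT INTO plots VALUES (10, 1, 'rice', 6.0), (11, 1, 'maize', 2.5), (12, 2, 'rice', 4.0);
    """
)

# A join combines rows from both tables through the shared key muni_id.
rice_plots = conn.execute(
    """SELECT p.plot_id, m.name, p.area_ha
       FROM plots AS p
       JOIN municipalities AS m ON m.muni_id = p.muni_id
       WHERE p.crop = 'rice'
       ORDER BY p.plot_id"""
).fetchall()
print(rice_plots)  # [(10, 'North', 6.0), (12, 'South', 4.0)]
```

Storing the municipality name once, rather than repeating it on every plot, is the kind of redundancy control the relational model is designed for.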
According to Yue & Tan (2017), DBMSs based on relational models of geographic data have long dominated the commercial market and play an important role in the modern information society. Relational models are based on set theory and relational algebra, with rigorous mathematical foundations and formal analysis capabilities.
Hammersley (2018) holds that relational DBMSs have been at the forefront of GIS, including QGIS, for many years. In a scalable performance test comparing two relational DBMSs with a document DBMS and a graph DBMS, the relational systems were generally superior in terms of join statements, extraction of meaningful statistics, partial record updates and referential integrity.
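Two of the criteria named in that comparison, referential integrity and partial record updates, can be illustrated with the same `sqlite3` stand-in. One difference from PostgreSQL matters here: SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, whereas PostgreSQL enforces them by default. The schema, rows and the `insert_plot` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch; off by default
conn.executescript(
    """
    CREATE TABLE municipalities (muni_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE plots (
        plot_id INTEGER PRIMARY KEY,
        muni_id INTEGER NOT NULL REFERENCES municipalities (muni_id),
        crop    TEXT NOT NULL,
        area_ha REAL NOT NULL
    );
    INSERT INTO municipalities VALUES (1, 'North');
    INSERT INTO plots VALUES (10, 1, 'rice', 6.0);
    """
)

def insert_plot(plot_id, muni_id, crop, area_ha):
    """Insert a plot; return False if it refers to a municipality that does not exist."""
    try:
        with conn:
            conn.execute("INSERT INTO plots VALUES (?, ?, ?, ?)",
                         (plot_id, muni_id, crop, area_ha))
        return True
    except sqlite3.IntegrityError:  # foreign-key violation
        return False

ok = insert_plot(11, 1, "maize", 2.5)       # municipality 1 exists
orphan = insert_plot(12, 99, "beans", 1.0)  # municipality 99 does not

# Partial update: change one column of one record, leaving the others untouched.
with conn:
    conn.execute("UPDATE plots SET crop = 'sorghum' WHERE plot_id = 11")

print(ok, orphan)  # True False
```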
After reviewing the literature, including Guillot-Jiménez & García (2016), Guo & Onstein (2020), Hammersley (2018), Twumasi (2002) and Yue & Tan (2017), it can be stated that both relational DBMSs and NoSQL systems are recommended for starting GIS projects, each with its own advantages and disadvantages. However, relational DBMSs provide greater support and a wider variety of tools because they have been on the market longer. Relational systems are considered the most popular among organizations worldwide, as they provide a reliable method of storing and retrieving large amounts of data while combining good system performance with ease of implementation. They are also useful for handling and retrieving geographic data.
PostgreSQL
Some RDBMSs have proprietary, commercial licenses, among them Microsoft SQL Server, Oracle and DB2, whose cost is in some cases affordable only for large organizations. However, there is also free, open-source software that is as powerful and effective as the paid options (Garrido et al., 2021).
According to Ismail-Hossain et al. (2019) and Wodyk & Skublewska-Paszkowska (2020), SQLite, MySQL and PostgreSQL are among the most popular open-source DBMSs worldwide. Each has its own characteristics and limitations and excels in particular scenarios: SQLite excels in small applications, while PostgreSQL stands out for features such as parallel processing and concurrency. PostgreSQL also guarantees data reliability and integrity and is well suited to scientific and research projects.
PostgreSQL has a rich set of data types and can be extended with user-defined types and operators. Its administration is based on users and privileges, and it is highly stable and reliable. Error messages can be shown in Spanish, and text containing accented characters or the letter 'ñ' is sorted correctly. It can be extended with external libraries to support encryption, phonetic similarity searches and other functions. It implements multi-version concurrency control (MVCC), which significantly reduces locking contention and improves transaction handling in multi-user systems, and it supports table inheritance, in which a new table is defined from a previously defined one (Ginestà & Mora, 2012).
Spatial Databases
On its own, PostgreSQL, like other DBMSs, does not provide full support for storing the physical location and shape of geographic objects in its tables; for this, a spatial database is needed (Hess, 2022).
Spatial databases can represent simple geometric objects such as points, lines and polygons, and some handle complex structures such as 3D objects, topological coverages, linear networks and triangulated irregular networks (TINs). Such databases need additional functionality to process spatial data types efficiently, and developers have often added geometry or feature data types for this purpose (Salleh et al., 2021).
Spatial databases are extensions of classical databases that contain special objects, procedures and mechanisms to store and manipulate geospatial data (Maina et al., 2019).
PostGIS extension of PostgreSQL
PostGIS is the spatial extension for the PostgreSQL DBMS; it stores spatial objects in the standard Well-Known Binary (WKB) format. It automatically inherits the features of an enterprise database and implements open GIS standards within the database engine (Belciu et al., 2014).
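To show what the WKB format looks like at the byte level, the sketch below encodes and decodes a 2D point with Python's `struct` module. Following the OGC Simple Features layout, a WKB point is one byte for byte order (1 = little-endian, 0 = big-endian), a 4-byte unsigned integer geometry type (1 = Point) and two 8-byte IEEE doubles for x and y. This sketch covers only plain 2D points; it ignores the SRID and Z/M extensions of the extended WKB (EWKB) used by PostGIS. Inside PostGIS, functions such as `ST_AsBinary` and `ST_GeomFromWKB` perform these conversions.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in the OGC WKB specification

def point_to_wkb(x, y, little_endian=True):
    """Encode a 2D point as WKB bytes."""
    order = "<" if little_endian else ">"
    byte_order = 1 if little_endian else 0  # 1 = NDR (little-endian), 0 = XDR (big-endian)
    return struct.pack(f"{order}BIdd", byte_order, WKB_POINT, x, y)

def wkb_to_point(data):
    """Decode WKB bytes for a 2D point back to (x, y)."""
    order = "<" if data[0] == 1 else ">"
    _, geom_type, x, y = struct.unpack(f"{order}BIdd", data)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    return x, y

wkb = point_to_wkb(1.0, 2.0)
print(len(wkb), wkb.hex())  # 21 0101000000000000000000f03f0000000000000040
```

The 21 bytes break down as 1 (byte order) + 4 (type) + 8 (x) + 8 (y), which is why binary storage is compact compared with a text representation.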
The main geometry types supported by PostGIS include point, linestring, polygon, multipoint, multilinestring and multipolygon (Leslie & Ramsey, 2022). PostGIS also offers the ability to bulk-load data from shapefiles. It is released under the GNU GPL license, which means it is available as open-source software (Hsu & Obe, 2021).
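The geometry types listed above also have a standard text form, Well-Known Text (WKT), which is what PostGIS functions such as `ST_GeomFromText` accept. The helper functions below are a minimal, hypothetical sketch that formats Python coordinate lists as WKT for four of the types; the multi-line and multi-polygon types follow the same nesting pattern. The helpers close polygon rings automatically but do not otherwise validate geometries (for example, they do not check for self-intersections).

```python
def _coords(points):
    """Format (x, y) pairs as 'x y, x y, ...'."""
    return ", ".join(f"{x} {y}" for x, y in points)

def point_wkt(x, y):
    return f"POINT ({x} {y})"

def linestring_wkt(points):
    return f"LINESTRING ({_coords(points)})"

def polygon_wkt(ring):
    # WKT polygon rings must be closed: the first and last positions are equal.
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]
    return f"POLYGON (({_coords(ring)}))"

def multipoint_wkt(points):
    return "MULTIPOINT (" + ", ".join(f"({x} {y})" for x, y in points) + ")"

print(point_wkt(1, 2))                            # POINT (1 2)
print(polygon_wkt([(0, 0), (4, 0), (4, 3)]))      # POLYGON ((0 0, 4 0, 4 3, 0 0))
```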
SQL Queries
PostgreSQL, like other DBMSs, allows users to enter queries to retrieve or manipulate the data needed for specific job functions. The main purpose of a query is to retrieve information, but queries also let users change the data in some way, such as adding or deleting information (Baik et al., 2019).
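Retrieval and manipulation use different SQL statements. In the sketch below, `INSERT`, `UPDATE` and `DELETE` change the data and `SELECT` retrieves it. Python's `sqlite3` module stands in for PostgreSQL, and the `wells` table and its values are hypothetical. Values are passed as `?` parameters rather than pasted into the SQL text, which is the usual way to avoid SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wells (id INTEGER PRIMARY KEY, name TEXT, depth_m REAL)")

# Manipulation: add information.
conn.executemany("INSERT INTO wells (name, depth_m) VALUES (?, ?)",
                 [("W1", 35.0), ("W2", 80.0), ("W3", 12.5)])

# Manipulation: change information.
conn.execute("UPDATE wells SET depth_m = ? WHERE name = ?", (40.0, "W1"))

# Manipulation: delete information.
conn.execute("DELETE FROM wells WHERE depth_m < ?", (20,))
conn.commit()

# Retrieval: which wells are deeper than 30 m?
deep = conn.execute(
    "SELECT name, depth_m FROM wells WHERE depth_m > ? ORDER BY depth_m", (30,)
).fetchall()
print(deep)  # [('W1', 40.0), ('W2', 80.0)]
```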
Retrieval is carried out through queries against the store where the structured information is kept, using an appropriate query language. Key elements that support the search and increase its relevance and precision, such as indexes and keywords, must be taken into account, as well as problems that may occur during the process (Kul et al., 2018).
Queries have other important functions, including filtering data, presenting results clearly, compiling information and adding criteria (Uma et al., 2019).
SQL, Structured Query Language
PostgreSQL uses the Structured Query Language (SQL) to perform queries (Shin et al., 2018). When SQL statements are used efficiently, they allow the creation of highly scalable, flexible and manageable systems (Pino et al., 2018).
SQL uses clauses to structure queries. A clause is a language element of a query statement; the main ones are select, from, where, order by and having (Pratt, 2009).
Select indicates which fields of the attribute table the user wants to see.
From denotes the attribute table in which the information resides.
Where denotes user-defined criteria for attribute information that must be met in order to be included in the output set.
Order by denotes the sequence in which the output set will be displayed.
Having indicates the predicate used to filter the groups produced by a group by clause, for example keeping only crops whose total production exceeds a threshold.
The select and from clauses are required in a query that reads from a table, while where is an optional clause used to limit the result set. Order by is an optional clause used to present the results in an interpretable order, and having is an optional clause used to filter grouped results (Shin et al., 2018).
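The clauses can be combined in a single statement. The sketch below runs one query that uses all of them, plus the `group by` clause that `having` depends on, against a hypothetical `plots` table. SQLite, via Python's `sqlite3` module, stands in for PostgreSQL; the standard SQL in this query behaves the same way in both. `where` removes individual rows before grouping, and `having` removes whole groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plots (crop TEXT, area_ha REAL, production_t REAL)")
conn.executemany(
    "INSERT INTO plots VALUES (?, ?, ?)",
    [("rice", 6.0, 30.0), ("rice", 2.0, 8.0), ("maize", 3.0, 9.0),
     ("maize", 1.0, 2.0), ("beans", 0.5, 0.4), ("beans", 2.0, 3.0)],
)

query = """
    SELECT crop, SUM(production_t) AS total_t   -- fields to show
    FROM plots                                  -- table holding the data
    WHERE area_ha >= 1                          -- filter individual rows
    GROUP BY crop                               -- one output row per crop
    HAVING SUM(production_t) > 10               -- filter the groups
    ORDER BY total_t DESC                       -- sequence of the output
"""
result = conn.execute(query).fetchall()
print(result)  # [('rice', 38.0), ('maize', 11.0)]
```

The 0.5 ha bean plot is dropped by `where`, and the remaining bean total (3.0 t) is then dropped by `having`.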
SQL query | Natural-language question |
---|---|
SELECT * FROM crops ORDER BY crops.area DESC | Show the crops, from largest to smallest planted area |
SELECT * FROM crops WHERE crops.production > 20 | Show the crops whose production is greater than 20 t |
SELECT * FROM crops WHERE crops.name LIKE 'rice' ORDER BY crops.production DESC | Show rice productions from highest to lowest |
Source: prepared by the authors.
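The queries in the table can be checked against a small in-memory table. The `crops` table, its columns (`name`, `area` in hectares, `production` in tonnes) and its rows are hypothetical, and SQLite, via Python's `sqlite3` module, is used only as a convenient stand-in for PostgreSQL. For this sample data the statements return the same rows in both systems; note that SQLite's `LIKE` is case-insensitive for ASCII text while PostgreSQL's is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crops (name TEXT, area REAL, production REAL)")  # ha, t
conn.executemany(
    "INSERT INTO crops VALUES (?, ?, ?)",
    [("rice", 12.0, 45.0), ("maize", 20.0, 18.0), ("rice", 5.0, 22.0), ("beans", 3.0, 4.0)],
)

# "Show the crops, from largest to smallest planted area"
by_area = conn.execute("SELECT * FROM crops ORDER BY crops.area DESC").fetchall()

# "Show the crops whose production is greater than 20 t" (row order not guaranteed)
over_20 = conn.execute("SELECT * FROM crops WHERE crops.production > 20").fetchall()

# "Show rice productions from highest to lowest"
rice = conn.execute(
    "SELECT * FROM crops WHERE crops.name LIKE 'rice' ORDER BY crops.production DESC"
).fetchall()
print(rice)  # [('rice', 12.0, 45.0), ('rice', 5.0, 22.0)]
```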
SQL queries correspond to questions asked by users, and the structure of SQL statements resembles that of natural language (Taipalus et al., 2018).
According to Shin et al. (2018), SQL queries are essentially natural-language questions about the world and how people relate to it. These questions can be simple, with a local focus, or more complex, with a more global perspective.
GIS Queries
The presence of a DBMS such as PostgreSQL within a GIS such as QGIS allows queries to be made to it. A query is a call to the management system, which returns a set of elements taken from the information contained in the database; in other words, it returns a subset of the total data. This is analogous to pointing at a location on a printed map and asking "what is here?": the answer is the data corresponding to that point, a particular fraction of all the contents of the map (Olaya, 2014).
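The "what is here?" question can be sketched as a point-in-polygon test: given a clicked location, return the attributes of every polygon that contains it. The example below uses the classic ray-casting algorithm, which counts how many polygon edges a horizontal ray from the point crosses (an odd count means inside). The `layer` contents and the `what_is_here` helper are hypothetical. A spatial database would answer the same question with indexed geometry functions, and points lying exactly on a shared boundary are assigned here to only one polygon by the algorithm's convention.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical land-use layer: each feature has attributes and a polygon ring.
layer = [
    {"id": 1, "land_use": "rice", "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"id": 2, "land_use": "forest", "ring": [(10, 0), (20, 0), (20, 10), (10, 10)]},
]

def what_is_here(x, y, features):
    """Return the attributes of every feature whose polygon contains the clicked point."""
    return [
        {k: v for k, v in f.items() if k != "ring"}
        for f in features
        if point_in_polygon(x, y, f["ring"])
    ]

print(what_is_here(4, 5, layer))  # [{'id': 1, 'land_use': 'rice'}]
```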
One of the analyses that can be performed on a geographic information layer in QGIS is querying it. A query is an operation in which the geographic data are asked a simple question, generally based on simple formal concepts. Although this type of analysis does not involve complex analytical concepts, it is one of the key elements of a GIS, since it is a basic part of everyday use (Pratt, 2009).
Types of Queries in GIS
A review of the literature shows that Olaya (2014), Shin et al. (2018) and Pratt (2009) each classify the types of queries in a GIS. Despite differences in wording, these authors describe the same three fundamental types: selection, attribute queries and spatial queries.
Selection
Selection is the simplest way to find and query spatial data. Selection highlights the features of interest, both on screen and in the attribute table, for later visualization or analysis. To do this, points, lines and polygons are selected simply by using the cursor to "point and click" on the feature of interest or by dragging a box around the features. Alternatively, features can be selected with a graphical object, such as a circle, line or polygon, which highlights all features found within it (Shin et al., 2018).
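Selecting by dragging a box reduces to a simple containment test on coordinates. The sketch below returns the IDs of point features that fall inside the rectangle defined by the two corners of the drag. The `wells` layer, its coordinates and the `select_by_box` helper are hypothetical; a GIS would typically also consult a spatial index so that it does not test every feature.

```python
# Hypothetical point layer (for example, wells) with map coordinates.
wells = {
    "W1": (2.0, 3.0),
    "W2": (7.5, 1.0),
    "W3": (4.0, 8.0),
    "W4": (9.0, 9.5),
}

def select_by_box(features, corner_a, corner_b):
    """Return the IDs of point features inside the box dragged from corner_a to corner_b."""
    (ax, ay), (bx, by) = corner_a, corner_b
    xmin, xmax = min(ax, bx), max(ax, bx)  # the user may drag in any direction
    ymin, ymax = min(ay, by), max(ay, by)
    return {fid for fid, (x, y) in features.items()
            if xmin <= x <= xmax and ymin <= y <= ymax}

print(sorted(select_by_box(wells, (8.0, 5.0), (1.0, 0.0))))  # ['W1', 'W2']
```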
Advanced options for selecting subsets of data from a larger dataset include creating a new selection, selecting from the current selection, adding to the current selection and removing from the current selection (Olaya, 2014).
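These four options correspond to ordinary set operations on feature IDs. In the sketch below, `current` is the existing selection and `hits` are the features matched by the new query. The mode names and the `update_selection` helper are hypothetical labels, not the exact names used in any particular GIS interface.

```python
def update_selection(current, hits, mode):
    """Combine the current selection with the features matched by a new query."""
    if mode == "new":
        return set(hits)           # replace the selection
    if mode == "add":
        return current | hits      # union: add to the current selection
    if mode == "remove":
        return current - hits      # difference: remove from the current selection
    if mode == "within":
        return current & hits      # intersection: select from the current selection
    raise ValueError(f"unknown selection mode: {mode}")

current = {"W1", "W2"}
hits = {"W2", "W3"}
print(sorted(update_selection(current, hits, "add")))  # ['W1', 'W2', 'W3']
```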
Query by Attribute
An attribute query works on the attribute data: it selects information by formulating logical questions. Attribute queries effectively take the form of a question; the easiest way to conceptualize one is as a sentence, built with a special syntax, that selects a subset of the data (Shnizai, 2011).
Map features and their associated data can be retrieved by querying attribute information in data tables. For example, search and query tools allow a user to display all productive plots whose area is greater than 5 ha, or the rice plots with the highest yield (Shin et al., 2018).
Attributes can be numeric values, text strings, Boolean values (true or false) or dates. This type of query is similar to a query made to any database; however, the results, that is, the features related to the selected records, are highlighted on the map as well as in the table (Pratt, 2009).
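The two example questions above can be sketched as predicates evaluated over attribute records. The sketch uses plain Python so that numeric, text and Boolean attributes can be combined in one readable condition; in PostgreSQL the first question would be a `WHERE area_ha > 5` clause. The `plots` records, their field names and the `attribute_query` helper are hypothetical. The query returns feature IDs because, in a GIS, the same IDs are then used to highlight the matching rows in the attribute table and the matching features on the map.

```python
# Hypothetical plot layer: attribute records keyed by feature ID.
plots = [
    {"id": 1, "crop": "rice",  "area_ha": 7.2, "yield_t_ha": 5.1, "irrigated": True},
    {"id": 2, "crop": "maize", "area_ha": 3.0, "yield_t_ha": 4.0, "irrigated": False},
    {"id": 3, "crop": "rice",  "area_ha": 5.5, "yield_t_ha": 6.3, "irrigated": True},
    {"id": 4, "crop": "rice",  "area_ha": 2.1, "yield_t_ha": 4.8, "irrigated": True},
]

def attribute_query(records, predicate):
    """Return the IDs of the records that satisfy the predicate."""
    return [r["id"] for r in records if predicate(r)]

# "Show all productive plots whose area is greater than 5 ha."
large = attribute_query(plots, lambda r: r["area_ha"] > 5)

# "Show the rice plot with the highest yield."
best_rice = max((r for r in plots if r["crop"] == "rice"), key=lambda r: r["yield_t_ha"])["id"]

# Text and Boolean attributes combined in one condition.
irrigated_rice = attribute_query(plots, lambda r: r["crop"] == "rice" and r["irrigated"])

print(large, best_rice, irrigated_rice)  # [1, 3] 3 [1, 3, 4]
```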
Spatial Query
Spatial queries are queries to a spatial database that can be answered solely from geometric information, that is, the spatial position and extent of the objects involved. Also known as "geography queries", they highlight particular features by examining their position relative to other features (Shin et al., 2018).
Spatial query functions allow users to find, display and/or isolate attribute records linked to map features located within a defined area of interest. A spatial query selects features based on location or spatial relationships, which requires processing spatial information. Spatial queries are useful for finding how or where two or more data sets are spatially related (Shnizai, 2011).
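A common first step in relating two data sets spatially is to compare the bounding boxes of their features. PostGIS's `&&` operator, for instance, tests whether two geometries' bounding boxes intersect. The sketch below pairs hypothetical plots with hypothetical flood zones on that basis. It is deliberately coarse: overlapping boxes do not guarantee that the geometries themselves intersect, so a real query would refine these candidate pairs with an exact geometric test.

```python
def bbox(ring):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a coordinate list."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if two bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical layers: agricultural plots and flood zones.
plots = {
    "P1": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "P2": [(10, 10), (14, 10), (14, 13), (10, 13)],
}
flood_zones = {
    "F1": [(3, 3), (8, 3), (8, 6), (3, 6)],
}

# Candidate pairs: every plot whose box overlaps a flood zone's box.
candidates = [
    (p, f)
    for p, p_ring in plots.items()
    for f, f_ring in flood_zones.items()
    if boxes_overlap(bbox(p_ring), bbox(f_ring))
]
print(candidates)  # [('P1', 'F1')]
```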
Spatial Relations
The spatial relationship is the central component of a spatial query. It is a relationship between spatial features with respect to their spatial locations and spatial arrangements. There are three types of spatial relationships in GIS: proximity relationships, topological relationships and direction relationships (Dan et al., 2020).
Topological relationships, such as adjacency, containment and intersection, are not affected by bicontinuous transformations such as stretching, shifting, rotating or bending the spatial features involved.
Proximity relationships refer to spatial relationships based on distances between entities.
Direction relationships are based on the angular separation of a feature in relation to another feature in a coordinate system.
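Each of the three relationship types can be computed from coordinates alone. The sketch below gives one small function per type for planar (projected) coordinates: a segment-crossing test for a topological relationship, a point-to-segment distance for proximity, and an azimuth for direction. The function names and sample coordinates are hypothetical. The crossing test reports only proper crossings (it ignores segments that merely touch or overlap collinearly), and the azimuth is measured clockwise from grid north, not from true north.

```python
import math

def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def segments_cross(a1, a2, b1, b2):
    """Topological: do segments a1-a2 and b1-b2 properly cross each other?"""
    return (_orient(a1, a2, b1) * _orient(a1, a2, b2) < 0 and
            _orient(b1, b2, a1) * _orient(b1, b2, a2) < 0)

def distance_to_segment(p, a, b):
    """Proximity: shortest Euclidean distance from point p to segment a-b."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def azimuth_deg(origin, target):
    """Direction: bearing from origin to target in degrees, clockwise from grid north."""
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360

road, river = ((0, 0), (10, 10)), ((0, 10), (10, 0))
print(segments_cross(*road, *river))                 # True
print(distance_to_segment((5, 3), (0, 0), (10, 0)))  # 3.0
print(round(azimuth_deg((0, 0), (5, 0)), 6))         # 90.0
```

For example, selecting the wells within 100 m of a road would apply `distance_to_segment` to each well and each road segment and keep the wells with a result of 100 or less.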
Queries can be made on the attribute component of the data, on the spatial component or on both. In every case the two components are linked, so the result of the query affects both: the selection is visible in both the attribute table and the visual representation of the spatial component, regardless of which component the selection criteria were applied to.
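The link between the two components can be sketched by giving every feature a single ID that both the attribute table and the map view use. In the minimal example below, an attribute criterion and a spatial criterion each produce a set of IDs, their intersection is the combined selection, and that one selection drives both views. The `features` records, the representative points and the `in_area` helper are hypothetical simplifications; real layers store full geometries rather than a single point.

```python
# Hypothetical layer: each feature has attributes and a representative point.
features = [
    {"id": 1, "crop": "rice",  "area_ha": 7.0, "xy": (2.0, 2.0)},
    {"id": 2, "crop": "rice",  "area_ha": 3.0, "xy": (3.0, 1.0)},
    {"id": 3, "crop": "maize", "area_ha": 9.0, "xy": (2.5, 2.5)},
    {"id": 4, "crop": "rice",  "area_ha": 8.0, "xy": (9.0, 9.0)},
]

def in_area(xy, xmin, ymin, xmax, ymax):
    """True if the point lies inside the rectangular area of interest."""
    x, y = xy
    return xmin <= x <= xmax and ymin <= y <= ymax

# Attribute criterion and spatial criterion evaluated on the same records.
attribute_hits = {f["id"] for f in features if f["crop"] == "rice"}
spatial_hits = {f["id"] for f in features if in_area(f["xy"], 0, 0, 5, 5)}
selection = attribute_hits & spatial_hits

# Because both components share the feature ID, one selection drives both views.
highlighted_rows = [f for f in features if f["id"] in selection]          # attribute table
highlighted_points = [f["xy"] for f in features if f["id"] in selection]  # map

print(sorted(selection), highlighted_points)  # [1, 2] [(2.0, 2.0), (3.0, 1.0)]
```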
CONCLUSIONS
GIS are powerful tools for collecting, storing and managing spatial data. There are also open-source GIS tools that are very popular worldwide because they are easy to access and competitive; QGIS is one of them. QGIS can use relational databases to store and manage geographic data, and DBMSs such as PostgreSQL reduce the possibility of errors in handling information because they help eliminate redundancies and inconsistencies. QGIS together with PostgreSQL and its PostGIS extension for spatial databases constitutes a powerful tool for processing georeferenced data. It was also concluded that retrieving or manipulating information requires querying this database system, and that such queries are essentially natural-language questions formulated by users working with QGIS. These questions are expressed in SQL and fall into three main types: selection, attribute queries and spatial queries. Each of the three returns valid results, whether used separately or in combination.