Revista Ciencias Técnicas Agropecuarias

On-line version ISSN 2071-0054

Rev Cie Téc Agr vol.31 no.1 San José de las Lajas Jan.-Apr. 2022  Epub Nov 12, 2021

 

VIEW POINTS

Principal Component Analysis, an Effective Tool in Agricultural Technical Sciences

Lucía Fernández-Chuairey (I, *), ORCID 0000-0003-2439-1176; Lazara Rangel-Montes de Oca (I), ORCID 0000-0003-1640-1987; Mario Varela-Nualles (II), ORCID 0000-0002-5953-7561; José Antonio Pino-Roque (I), ORCID 0000-0001-9728-6700; Jany del Pozo-Fernández (I), ORCID 0000-0001-9138-1563; Nelson Ulises Lim-Chamg (I), ORCID 0000-0002-7000-802X

I: Universidad Agraria de La Habana (UNAH), San José de las Lajas, Mayabeque, Cuba.

II: Instituto Nacional de Ciencias Agrícolas (INCA), San José de las Lajas, Mayabeque, Cuba.

ABSTRACT

Currently there is a wide range of multivariate techniques used in different areas of research. The present work focuses on the Principal Components method and aims to establish, on mathematical-statistical bases, a set of methodological criteria for processing and interpreting the results obtained with this technique. An example from post-harvest studies of pineapple (variety Cayena Lisa) is developed. A sequence of steps is proposed: preliminary analysis of the correlation between variables; determination of the number of components to retain (a compromise between the different criteria); weight of the variables in each component; biological interpretation; and graphs that validate the results with respect to components and individuals. The variables studied were weight loss in g (PP), firmness, color index (IC), soluble solids content (SSC) and pH. The variables were grouped into two components that explain 88.36% of the variation in the data. A positive relationship was observed among PP, SSC and pH, and a negative relationship between firmness and these variables. The highest PP and pH are reached from the sixth day onward and the highest firmness in the first two days, aspects to take into account when making timely decisions on storage, transportation and marketing. It is concluded that multivariate techniques, and principal component analysis in particular, constitute an efficient and non-destructive way of monitoring the quality of fruits in storage.

Key words: Principal Components; Agricultural Engineering; Multivariate Methods

INTRODUCTION

Historically, the agricultural sector has needed statistical-mathematical methodologies that respond to current problems in scientific research. Recently, Fernández et al. (2018; 2019) established criteria and evaluations, on mathematical-statistical bases, for the analysis and application of models that describe agrarian processes (based mainly on univariate and bivariate statistics).

Similarly, the literature reports on the use of multivariate methods, which are used to study phenomena that involve the measurement of several variables and which are applied depending on the characteristics of the research. Among the most used multivariate statistical techniques are Multiple Regression, Principal Component Analysis (PCA), Factor Analysis (FA), Discriminant Analysis (DA), Numerical Taxonomy (cluster analysis) and Multidimensional Scaling, among others. These have been addressed by Lozares & López (1991); Robaina et al. (2001); Hair & Anderson (2004); Bouza & Sistachs (2006); González et al. (2008); Miranda (2011); Coronados et al. (2017); Quindemil & Rumbaut (2019); Gozá et al. (2020); Varela (2021), among other authors.

The objective of this work is to establish, on mathematical-statistical bases, a set of methodological criteria for processing and interpreting the results obtained with the Principal Components method. The analysis focuses on post-harvest studies of pineapple (variety Cayena Lisa).

DEVELOPMENT OF THE TOPIC

Theoretical Foundation

Various criteria have been given for the definition of multivariate statistical techniques. A general definition was proposed by Hair & Anderson (2004), who argue that “multivariate analysis refers to all statistical methods that simultaneously analyze multiple measures of each individual or object under investigation”, and emphasize that any simultaneous analysis of more than two variables can loosely be considered a multivariate analysis.

These methods comprise a set of statistical techniques for analyzing data that correspond to measurements of p variables observed in n individuals, allowing the study of their interrelations. The literature describes various multivariate methods and classifies them mainly according to the purposes pursued in the research. In this sense, Varela (2021) groups them into descriptive or decisional methods and states that one of the most widespread multivariate techniques at present is Principal Component Analysis (PCA), in which the variables are quantitative, since it works with the Pearson correlation coefficient, designed to measure linear association between variables of this type. There is also a Principal Component Analysis option for categorical variables, which will be addressed in a future work.

Miranda (2011) states that the objective of PCA is to reduce the number of variables involved in the analysis of a process under study. The method consists of obtaining new variables (called components, Y1, ..., Yp) that are uncorrelated with each other and that keep a logical order, where the first component is the one with the greatest influence on the phenomenon under study, and so on; that is:

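\[
\mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + \dots + \mathrm{Var}(Y_p) = \text{Total Variance} = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_p)
\]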

such that:

\[
\mathrm{Var}(Y_1) > \mathrm{Var}(Y_2) > \dots > \mathrm{Var}(Y_p)
\]
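The reason the total variance is preserved can be sketched as follows. The symbols \(\Sigma\) (covariance or correlation matrix of the original variables), \(a_j\) (its unit eigenvectors) and \(\lambda_j\) (its eigenvalues) are notation assumed here for illustration; they do not appear in the original text.

```latex
\[
Y_j = a_j^{\top} X, \qquad \Sigma a_j = \lambda_j a_j, \qquad a_j^{\top} a_k = \delta_{jk}
\]
\[
\mathrm{Var}(Y_j) = a_j^{\top} \Sigma\, a_j = \lambda_j, \qquad
\mathrm{Cov}(Y_j, Y_k) = a_j^{\top} \Sigma\, a_k = 0 \quad (j \neq k)
\]
\[
\sum_{j=1}^{p} \mathrm{Var}(Y_j) = \sum_{j=1}^{p} \lambda_j = \mathrm{tr}(\Sigma) = \sum_{i=1}^{p} \mathrm{Var}(X_i)
\]
```

Under this notation, ordering the eigenvalues from largest to smallest gives the ordering of the component variances, and the components are uncorrelated because the eigenvectors are orthogonal.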

How can the information contained in a data set be described by a smaller set of new variables or components? When is it effective to apply the Principal Components method?

Principal Component Analysis is more effective the more marked the initial correlation structure between the variables. In this respect, Miranda (2011) notes that, when there is no association between the variables, it makes no sense to carry out this type of analysis.

This procedure is used above all in exploratory data analysis and for descriptive purposes. It simplifies subsequent studies by working with fewer variables than the original set, and it clarifies the relationships among the observed variables and their weights. At the same time, graphic representations make it possible to observe the formation of groups of individuals according to their behavior.

The application of this method starts from the data matrix of n individuals by p variables, with n ≥ p, to which the following sequence of steps is applied:

  • Construction of the components. When the quantitative variables are on the same measurement scale, the variance-covariance matrix is used; when they are on different scales, the correlation matrix (standardization) is used.

  • Selection of the number of components to retain. Under the percentage criterion, enough principal components are included to reach an acceptable percentage of the variance (usually above 70%); under the eigenvalue criterion, components with eigenvalues greater than or equal to 1 are retained, among other criteria. Practical experience suggests working toward a compromise between the different criteria.

  • Analysis of the variables. Relationship or weight of variables in each component.

  • Biological sense of the components from their relationship with the initial variables.

  • Graphic analysis (individuals), formation of possible groups.
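The sequence of steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure the authors ran in their statistical software: the data matrix `X` is hypothetical, the variable names are illustrative, and the 70% and eigenvalue ≥ 1 cutoffs are the ones mentioned in the list.

```python
import numpy as np

# Hypothetical data matrix: n = 7 individuals (rows) by p = 3 variables
# (columns). These numbers are illustrative only, not the study's data.
X = np.array([
    [0.0, 9.5, 3.4],
    [1.2, 9.1, 3.5],
    [2.1, 8.0, 3.5],
    [3.9, 7.2, 3.7],
    [5.2, 6.1, 3.8],
    [7.4, 5.5, 4.0],
    [9.8, 4.0, 4.1],
])
n, p = X.shape

# Construction of the components: the variables are on different scales,
# so the data are standardized and the correlation matrix is used.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)

# Eigen-decomposition of the symmetric matrix R, sorted so that the
# component with the largest variance comes first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Selection of the number of components under the two criteria.
percent = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(percent)
k_eigen = int(np.sum(eigenvalues >= 1.0))           # eigenvalue criterion
k_percent = int(np.argmax(cumulative >= 70.0) + 1)  # percentage criterion

# Weights of the variables in each retained component (one column each).
weights = eigenvectors[:, :k_eigen]

# Component values (scores) of the individuals, used for graphic analysis.
scores = Z @ eigenvectors

print("eigenvalues:", np.round(eigenvalues, 3))
print("cumulative %:", np.round(cumulative, 2))
print("components retained:", k_eigen, "(eigenvalue),", k_percent, "(percentage)")
```

With the correlation matrix, the eigenvalues add up to p, which is why an eigenvalue of 1 marks a component that explains as much variance as one standardized original variable.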

At present, there are valuable results regarding the use of these techniques, as shown in the work of Mesa et al. (2018) on monoclonal antibody fermentation and in the investigations of biopharmaceutical purification processes carried out by Gozá et al. (2020). Their use is also reported in problems associated with causality in Biomedical Sciences, which included the determination of risk factors and prognoses (Sagaró & Zamora, 2020), as well as in studies of the functional dynamic mechanical systems of internal combustion engines (Aliaga et al., 2021), among other applications.

Example of Application of PCA in Post-Harvest Studies of Pineapple (Variety Cayena Lisa)

Pineapple is one of the most important commercial fruit crops in the world; it is known as the queen of fruits for its excellent taste and its contribution to nutrition and health (Hernández et al., 2021). Hence, research on its characterization, nutritional composition, growth, quality and post-harvest behavior, among other aspects, is currently intensifying, as shown in the works of Rangel et al. (2018) and Lorente et al. (2021), among others.

Luchsinger (2017) considers that one of the impacts of post-harvest studies lies in maintaining the quality of products until their consumption, hence the importance of investigating the different indicators. The study was carried out in areas of the various-crops company located in the Havana-Matanzas Plain, with average annual temperatures between 25 and 32 ºC and high environmental humidity. Weight loss (PP) was determined by weighing the fruits on an electronic scale on days 1, 2, 3, 5, 6, 8 and 10 after harvest. The indicators evaluated were PP, firmness, color index (IC), soluble solids content (SSC) and pH. The aim is to analyze the behavior of these 5 variables over the different days (7 individuals, one per measurement day).

The data were processed using statistical software (Statgraphics Centurion, 2012). A preliminary analysis showed a marked correlation structure among this group of variables: a positive, direct relationship between PP and pH (r = 0.84) and between pH and SSC (r = 0.62), and a negative, inverse relationship between PP and firmness (r = -0.80) and between IC and firmness (r = -0.65). This suggests a study using principal component analysis.
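As a reminder of what lies behind these r values, the following sketch computes the Pearson coefficient in plain Python and then screens the four correlations reported above. The two short series and the 0.6 cutoff for a "marked" correlation are assumptions made for illustration; the article states no numeric cutoff, and the raw measurements are not given in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical readings, illustrative only (not the study's data):
# firmness falls exactly linearly as weight loss rises.
weight_loss = [0.0, 1.0, 2.0, 3.0, 4.0]
firmness = [10.0, 8.0, 6.0, 4.0, 2.0]
print(pearson_r(weight_loss, firmness))

# Correlations reported in the text, screened with an assumed cutoff.
reported = {
    ("PP", "pH"): 0.84,
    ("pH", "SSC"): 0.62,
    ("PP", "firmness"): -0.80,
    ("IC", "firmness"): -0.65,
}
ASSUMED_CUTOFF = 0.6  # hypothetical threshold, not taken from the article
marked = [pair for pair, r in reported.items() if abs(r) >= ASSUMED_CUTOFF]
print(marked)
```

The sign of r gives the direction of the relationship and its absolute value the strength, which is why the screening uses `abs(r)`.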

Construction and Selection of the Number of Components

Table 1 shows the selection of two components (eigenvalues above one). The first two components explain 88.36% of the total variability, which indicates that, from the 5 initial variables, two components can be extracted to explain the association between variables and observations.

TABLE 1 Number of Principal Components according to the eigenvalue and percentage criteria

Component   Eigenvalue   Percentage of variance   Cumulative percentage
1           2.71         55.43                    55.43
2           1.64         32.93                    88.36
3           0.51         10.38                    98.74
4           0.05         1.02                     99.76
5           0.01         0.23                     100
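The two selection criteria can be checked directly against the values in Table 1. The sketch below is a minimal illustration; the function names are hypothetical, and the 70% target is the usual threshold mentioned earlier in the text.

```python
from itertools import accumulate

# Values copied from Table 1.
eigenvalues = [2.71, 1.64, 0.51, 0.05, 0.01]
percent_variance = [55.43, 32.93, 10.38, 1.02, 0.23]

def by_eigenvalue(values, cutoff=1.0):
    """Number of components whose eigenvalue is at least `cutoff`."""
    return sum(1 for v in values if v >= cutoff)

def by_cumulative_percentage(percentages, target=70.0):
    """Smallest number of components whose cumulative percentage reaches `target`."""
    for k, total in enumerate(accumulate(percentages), start=1):
        if total >= target:
            return k
    return len(percentages)

print(by_eigenvalue(eigenvalues))                  # eigenvalue criterion
print(by_cumulative_percentage(percent_variance))  # percentage criterion
```

Here both criteria agree on two components; when they disagree, the text recommends a compromise between them.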

Relationship or Weight of Variables in Each Component

Component 1 is characterized mainly by the variables weight loss, pH and firmness (Table 2), while component 2 is characterized by the soluble solids content and the color index.

TABLE 2 Component weights

Variable                       Component 1   Component 2
Weight loss (PP)               0.562         -0.048
Firmness                       -0.516        0.386
Color index (IC)               0.120         -0.728
Soluble solids content (SSC)   0.329         0.484
pH                             0.541         0.287

In Component 1, the weights of weight loss and pH are positive and that of firmness is negative, so as the value of Component 1 increases, weight loss and pH increase and the firmness of the fruits decreases. In the second component, as its value increases, the soluble solids content increases and the color index decreases.
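One simple way to read Table 2 is to assign each variable to the component on which its absolute weight is largest. This rule is an assumption made here for illustration, not a procedure stated by the authors, but applied to the published weights it reproduces the characterization given in the text.

```python
# Weights copied from Table 2: (Component 1, Component 2).
weights = {
    "weight loss (PP)": (0.562, -0.048),
    "firmness": (-0.516, 0.386),
    "color index (IC)": (0.120, -0.728),
    "soluble solids content (SSC)": (0.329, 0.484),
    "pH": (0.541, 0.287),
}

def dominant_component(pair):
    """Return 1 or 2, whichever component carries the larger absolute weight."""
    c1, c2 = pair
    return 1 if abs(c1) >= abs(c2) else 2

groups = {1: [], 2: []}
for variable, pair in weights.items():
    groups[dominant_component(pair)].append(variable)

# The sign tells the direction: a negative weight means the variable
# decreases as the component value increases.
for component, variables in groups.items():
    for variable in variables:
        sign = "+" if weights[variable][component - 1] > 0 else "-"
        print(f"Component {component}: {variable} ({sign})")

# Sum of squared weights per component; close to 1 for unit-length vectors.
norms = [sum(pair[i] ** 2 for pair in weights.values()) for i in (0, 1)]
print([round(v, 3) for v in norms])
```

The squared weights of each component add up to approximately 1, which is consistent with the weights being unit-length eigenvectors.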

Formation of possible groups. Biological sense of the components from their relationship with the initial variables

FIGURE 1 Graphic analysis of individuals and group formation. Principal Component values for each row (day).

Day   Component 1   Component 2
1     -2.422        1.681
2     -0.524        0.738
3     -0.840        -0.802
5     -0.817        -1.593
6     0.463         -1.478
8     1.67          0.528
10    2.47          0.926

Considering the graphic representation (Figure 1), it can be argued that there are basically three groups in post-harvest. The first group is characterized by the greatest weight loss and pH, which occur from the sixth day onward. From the physical point of view, weight loss, which is associated with the water content of the fruit, indirectly decreases the concentration of hydrogen ions, so that the pH rises as the fruit approaches senescence or putrefaction. This makes the fruit less suitable for consumption fresh, hence the importance of timely decision-making for commercialization and industrialization.
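The Component 1 values listed with Figure 1 make this reading easy to verify. The sketch below splits the days by the sign of their Component 1 value; this sign rule is an assumption chosen for illustration, since the article forms its groups visually from the plot and does not state a numeric rule.

```python
# Component values per day, copied from the listing under Figure 1:
# day -> (Component 1, Component 2).
scores = {
    1: (-2.422, 1.681),
    2: (-0.524, 0.738),
    3: (-0.840, -0.802),
    5: (-0.817, -1.593),
    6: (0.463, -1.478),
    8: (1.67, 0.528),
    10: (2.47, 0.926),
}

# Component 1 grows with weight loss and pH and falls with firmness, so
# days with a positive value sit on the high weight loss, high pH side.
positive_c1 = [day for day, (c1, _) in scores.items() if c1 > 0]
lowest_c1_day = min(scores, key=lambda day: scores[day][0])

print("Days with positive Component 1:", positive_c1)
print("Day with the lowest Component 1:", lowest_c1_day)

# Component values are centered, so each column should sum to about zero.
column_sums = [sum(pair[i] for pair in scores.values()) for i in (0, 1)]
print([round(v, 3) for v in column_sums])
```

The positive values begin on day 6, in line with the statement that the greatest weight loss and pH occur from the sixth day onward, and day 1 lies at the opposite extreme.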

In contrast, the third group is formed by the first day, when the greatest firmness is achieved, with the lowest weight loss and pH. This response is due to the nature of the product: once the exchange of ethylene with the surrounding environment begins, respiration increases and the ripening process accelerates, a recurring phenomenon in previous investigations with this and other agricultural products (Thompson, 1998). Likewise, a gradual response is reflected in the soluble solids content, which tends to influence acceptance by consumers and marketers. The color index, which allows the state of maturity to be discerned with the naked eye, reaches its lowest value on the first day after harvest, as reflected in component 2.

The characterization of pineapple quality represented by these groups is a valuable tool: it avoids the need for exhaustive control of these properties during commercialization, transport or storage, and can even make up for a lack of instrumentation for their determination. This largely makes it a non-destructive tool for monitoring the quality of the fruit in storage, which satisfies one of the main purposes of this research. It would support timely decision-making on storage, transport and commercialization within this time range, and it reaffirms the criterion that quality is sought from the field onward and is modulated post-harvest.

FIGURE 2 Biplot graph. 

Finally, the biplot (Figure 2) allowed the joint analysis of variables and individuals. It shows the positive relationship among SSC, pH and weight loss, and the negative relationship of firmness with these variables. Days 8 and 10 correspond to the highest values of SSC, pH and weight loss and the lowest values of firmness, in contrast to day 1. Similarly, projecting the individuals perpendicularly onto the firmness axis shows that the greatest firmness is reached in the first two days.
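The perpendicular reading of a biplot can be approximated numerically by projecting each day's component values onto the direction given by a variable's weights. The sketch below does this for firmness using the published values. It is a rough illustration under assumptions: biplot software may rescale scores and weights before drawing, and only the first two components are used.

```python
from math import hypot

# Firmness weights on Components 1 and 2, from Table 2.
firmness_direction = (-0.516, 0.386)

# Component values per day, from the listing under Figure 1.
scores = {
    1: (-2.422, 1.681),
    2: (-0.524, 0.738),
    3: (-0.840, -0.802),
    5: (-0.817, -1.593),
    6: (0.463, -1.478),
    8: (1.67, 0.528),
    10: (2.47, 0.926),
}

def projection(point, direction):
    """Signed length of the orthogonal projection of `point` onto `direction`."""
    dot = point[0] * direction[0] + point[1] * direction[1]
    return dot / hypot(*direction)

# Rank the days from highest to lowest projected firmness.
ranked = sorted(
    scores,
    key=lambda day: projection(scores[day], firmness_direction),
    reverse=True,
)
print("Days ordered by projected firmness:", ranked)
```

Under these assumptions the two highest projections belong to days 1 and 2, consistent with the reading of the biplot described above.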

CONCLUSIONS

  • It is concluded that the use of multivariate techniques, applied on sound methodological bases and with emphasis on the interpretation of the results, increases the quality of scientific research in agricultural and related processes.

  • The use of principal component analysis is an alternative analysis tool in post-harvest studies and constitutes an efficient and non-destructive way to monitor the quality of fruits in storage.

REFERENCES

ALIAGA, N.R.; DE LA TORRE, S.F.; RODRÍGUEZ, S.A.A.; GUILLÉN, G.J.: “Análisis de componentes principales en los motores de combustión interna Hyundai 1.7 MW”, Revista Ingeniería Energética, 42(1), 2021, ISSN: 1815-5901.

BOUZA, C.N.; SISTACHS, V.: Estadística, teoría básica y ejercicios, Ed. Editorial Félix Varela, La Habana, Cuba, 2006, ISBN: 959-258-373-0.

CORONADOS, Y.; VILTRES, V.; SISTACH, V.: “Aplicación de técnicas estadísticas multivariantes en el análisis de datos”, Revista Cubana de Medicina Física y Rehabilitación, 9(2): 1-12, INFOMED, 2017.

FERNÁNDEZ, C.L.; GUERRA, B.C.W.; DE CALZADILLA, P.J.; CHANG, L.N.U.: “Desarrollo de la modelación estadístico-matemática en las ciencias agrarias. Retos y perspectivas”, Investigación Operacional, 38(5): 462-467, 2018, ISSN: 2224-5405.

FERNÁNDEZ, C.L.; RANGEL, M. de O.L.; GUERRA, B.C.W.; DEL POZO, F.J.: “Modelación Estadístico-Matemática en Procesos Agrarios. Una aplicación en la Ingeniería Agrícola”, Revista Ciencias Técnicas Agropecuarias, 28(2): 72-79, 2019, ISSN: 1010-2760, e-ISSN: 2071-0054.

GONZÁLEZ, Á.L.; SOLANO, H.L.; TILANO, J.: “Análisis multivariado aplicando componentes principales al caso de los desplazados”, Ingeniería y desarrollo, (23): 119-142, 2008, ISSN: 0122-3461.

GOZÁ, L.O.; FERNÁNDEZ, A.M.; RODRÍGUEZ, G.R.H.; OJITO, M.E.: “Aplicación del Análisis de Componentes Principales en el proceso de purificación de un biofármaco”, Vaccimonitor, 29(1): 5-13, 2020, ISSN: 1025-028X.

HAIR, J.F.; ANDERSON, R.E.: Multivariate data analysis, Ed. Pearson Prentice Hall, 5a ed., Madrid, España, 2004, ISBN: 84-8322-035-0.

HERNÁNDEZ, R.G.; ORTEGA, I.E.; ORTEGA, I.I.H.: “Composición nutricional y compuestos fitoquímicos de la piña (Ananas comosus) y su potencial emergente para el desarrollo de alimentos funcionales”, Boletín de Ciencias Agropecuarias del ICAP, 9(14): 24-28, 2021, ISSN: 2448-5357.

LORENTE, G.Y.; RODRÍGUEZ, H.D.; CAMACHO, R.L.; CARVAJAL, O.C.C.; DE ÁVILA, G.R.; GONZÁLEZ, O.J.; RODRÍGUEZ, S.R.: “Efecto de la aplicación de Biobras-16 sobre el crecimiento y calidad de frutos de piña ‘MD-2’”, Revista de Cultivos Tropicales, 42(2), 2021, ISSN: 0258-5936.

LOZARES, C.C.; LÓPEZ, R.P.: “El análisis multivariado: definición, criterios y clasificación”, 1991.

LUCHSINGER, L.: Impacto de la postcosecha en la calidad de frutos de exportación, [en línea], Perú, Redagrícola, 2017, Disponible en: https://www.redagricola.com/pe/impacto-de-la-postcosecha-en-la-calidad-de-frutas-de-exportacion, [Consulta: 9 de julio de 2021].

MESA, R.L.; GOZÁ, L.O.; URANGA, M.M.; TOLEDO, R.A.; GÁLVEZ, T.Y.: “Aplicación del Análisis de Componentes Principales en el proceso de fermentación de un anticuerpo monoclonal”, Vaccimonitor, 27(1): 8-15, 2018, ISSN: 1025-028X, e-ISSN: 1025-0298.

MIRANDA, I.: Estadística Aplicada a la Sanidad Vegetal, Inst. Centro Nacional de Sanidad Agropecuaria (CENSA), folleto, San José de las Lajas, Mayabeque, Cuba, 173 p., 2011.

QUINDEMIL, T.E.M.; RUMBAUT, L.F.: “Análisis de componentes principales para obtener indicadores reducidos de medición en la búsqueda de información”, Revista Cubana de Información en Ciencias de la Salud, 30(3), 2019, ISSN: 2307-2113.

RANGEL, M. de O.L.; MONZÓN, M.L.L.; GARCIA, C.J.; GARCIA, P.A.: “Técnicas matemáticas para inferir cambios poscosecha en las propiedades de productos agrícolas”, Revista Ciencias Técnicas Agropecuarias, 27(4): 42-54, 2018, ISSN: 1010-2760, e-ISSN: 2071-0054.

ROBAINA, C.G.R.; MEDINA, P.; MANUEL, J.; MORALES, R.J.M.; ROBAINA, C.R.E.: “Análisis multivariado de factores de riesgo de prematuridad en Matanzas”, Revista Cubana de obstetricia y ginecología, 27(1): 62-69, 2001, ISSN: 0138-600X.

SAGARÓ, D.C.N.M.; ZAMORA, M.L.: “Técnicas estadísticas multivariadas para el estudio de la causalidad en Medicina”, Revista Ciencias Médicas, 24(2), 2020, ISSN: 1561-3194.

STATGRAPHICS CENTURION: Statgraphics Centurion, Version 16.1.17, Statpoint Technologies, Inc., 2012.

THOMPSON, K.A.: Tecnología post-cosecha de frutas y hortalizas, Ed. Kinesis Ltda., Colombia, 268 p., 1998.

VARELA, M.: Análisis multivariado, [en línea], Ediciones INCA, 2021, Disponible en: http://ediciones.inca.edu.cu/files/folletos/analisismultivariado .pdf, [Consulta: 30 de abril de 2021].

Received: May 20, 2021; Accepted: November 12, 2021

*Author for correspondence: Lucía Fernández-Chuairey, e-mail: lucia@unah.edu.cu

Lucía Fernández-Chuairey, Full Professor, Universidad Agraria de La Habana (UNAH), Departamento de Matemática y Física, e-mail: lucia@unah.edu.cu

Lazara Rangel-Montes de Oca, Assistant Professor, UNAH, Departamento de Ingeniería Agrícola, e-mail: lazarar@unah.edu.cu

Mario Varela-Nualles, Senior Researcher, Instituto Nacional de Ciencias Agrícolas (INCA), e-mail: varela@inca.edu.cu

José Antonio Pino-Roque, Associate Professor, UNAH, Departamento de Matemática y Física, e-mail: pino@unah.edu.cu

Jany del Pozo-Fernández, Instructor, Universidad Agraria de La Habana (UNAH), Facultad de Medicina Veterinaria, e-mail: janydelpozo@gmail.com

Nelson Ulises Lim-Chamg, Associate Professor, UNAH, Departamento de Matemática y Física, e-mail: limc@unah.edu.cu

The authors of this work declare no conflict of interests.

AUTHOR CONTRIBUTIONS: Conceptualization: L. Fernández. Data curation: L. Fernández, L.R. Montes de Oca. Formal analysis: L. Fernández, J.A. Pino, J. del Pozo, N.U. Lim. Investigation: L. Fernández, L.R. Montes de Oca, M. Varela, J.A. Pino, J. del Pozo, N.U. Lim. Methodology: Resources: L. Fernández, L.R. Montes de Oca. Roles/Writing, original draft: L. Fernández. Writing, review & editing: L. Fernández, L.R. Montes de Oca, M. Varela, J.A. Pino, J. del Pozo, N.U. Lim.

Creative Commons License