INTRODUCTION
Historically, the agricultural sector has needed statistical-mathematical methodologies that respond to current problems in scientific research. Recently, Fernández et al. (2018; 2019) established criteria and evaluations, on mathematical-statistical bases, for the analysis and application of models that describe agrarian processes (based mainly on univariate and bivariate statistics).
Similarly, the literature reports the use of multivariate methods, which are used to study phenomena that involve the measurement of several variables and which are applied according to the characteristics of the research. Among the most widely used multivariate statistical techniques are Multiple Regression, Principal Component Analysis (PCA), Factor Analysis (FA), Discriminant Analysis (DA), Numerical Taxonomy (cluster analysis) and Multidimensional Scaling, among others. These have been addressed by Lozares & López (1991); Robaina et al. (2001); Hair & Anderson (2004); Bouza & Sistachs (2006); González et al. (2008); Miranda (2011); Coronados et al. (2017); Quindemil & Rumbaut (2019); Gozá et al. (2020); Varela (2021), among other authors.
The objective of this work is to establish, on mathematical-statistical bases, a set of methodological criteria for the processing and interpretation of results obtained with the Principal Components method. The analysis focuses on post-harvest studies of pineapple (variety Cayena Lisa).
DEVELOPMENT OF THE TOPIC
Theoretical Foundations
Various criteria have been given on the definition of multivariate statistical techniques. A general definition was proposed by Hair & Anderson (2004), who argue that “Multivariate analysis refers to all statistical methods that simultaneously analyze multiple measures of each individual or object under investigation and emphasize that any simultaneous analysis of more than two variables can be considered approximately as a multivariate analysis”.
These methods comprise a set of statistical techniques for analyzing data that correspond to measurements of p variables observed in n individuals, allowing the study of their interrelations. The literature describes various multivariate methods and classifies them mainly according to the purposes pursued in the research. In this sense, Varela (2021) groups them into descriptive and decisional methods and states that one of the most widespread multivariate techniques at present is Principal Component Analysis (PCA). PCA applies to quantitative variables, since it works with the Pearson correlation coefficient, which is designed to measure linear association between variables of this type. There is also a Principal Component Analysis option for categorical variables, which will be addressed in a future work.
Miranda (2011) states that the objective of PCA is to reduce the number of variables involved in the analysis of a process under study. The method consists of obtaining new variables (called components, Yp) that are uncorrelated with each other and that keep a logical order, where the first component is the one with the greatest influence on the phenomenon under study, and so on, that is:
such that:
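In standard PCA notation (assumed here; the symbols $a_{ij}$ and $X_j$ are introduced for illustration and are not taken from the source), the components are linear combinations of the $p$ original variables, ordered by decreasing variance and mutually uncorrelated:

$$
Y_i = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p, \qquad i = 1, \dots, p
$$

$$
\operatorname{Var}(Y_1) \ge \operatorname{Var}(Y_2) \ge \cdots \ge \operatorname{Var}(Y_p), \qquad \operatorname{Cov}(Y_i, Y_j) = 0 \ \text{ for } i \ne j
$$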
How can the information contained in a data set be described by a smaller set of new variables or components? When is it effective to apply the Principal Components method?
Principal Component Analysis is more effective to the extent that there is initially a marked correlation structure among the variables. In this respect, Miranda (2011) corroborates that, when there is no association between the variables, it makes no sense to carry out this type of analysis.
This procedure is used above all in exploratory data analysis and for descriptive purposes. It simplifies subsequent studies by working with fewer variables than the original set, elucidates the relationships among the observed variables and their weights and, through graphic representations, allows the formation of groups of individuals to be observed according to their behavior.
The application of this method starts from the data matrix of n individuals and p variables, with n ≥ p, to which the following sequence of steps is applied:
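The idea of checking for a correlation structure before applying PCA can be illustrated with a minimal sketch. The function name `max_abs_offdiag_corr` and the data are hypothetical: the arrays are synthetic and are not the pineapple measurements.

```python
import numpy as np


def max_abs_offdiag_corr(data):
    """Largest absolute Pearson correlation between two different columns."""
    corr = np.corrcoef(data, rowvar=False)           # p x p correlation matrix
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diagonal)))


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
correlated = np.column_stack([base, 2.0 * base + rng.normal(scale=0.1, size=50)])
independent = rng.normal(size=(50, 2))

strong = max_abs_offdiag_corr(correlated)    # close to 1: PCA is worthwhile
weak = max_abs_offdiag_corr(independent)     # much smaller: little to summarize
```

A single summary number such as this is only a first screen; inspecting the full correlation matrix remains the more informative check.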
Construction of the components. When the quantitative variables are on the same measurement scale, the variance-covariance matrix is used; when they are on different scales, the correlation matrix (standardization) is used.
Selection of the number of components to retain. Common criteria include the percentage criterion (retain enough principal components to reach an acceptable percentage of the variance, regularly above 70%) and the eigenvalue criterion (retain components with eigenvalues greater than or equal to 1), among others. Practical experience suggests seeking a compromise between the different criteria.
Analysis of the variables: relationship or weight of the variables in each component.
Biological sense of the components, based on their relationship with the initial variables.
Graphic analysis (individuals): formation of possible groups.
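The computational part of these steps can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used by the statistical package in the study: it always standardizes (correlation-matrix PCA), the function name `pca_steps` and its return keys are hypothetical, and the data are synthetic.

```python
import numpy as np


def pca_steps(data, min_cumulative_pct=70.0):
    """Correlation-matrix PCA for an (n individuals x p variables) array."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    if n < p:
        raise ValueError("expected n >= p")

    # Construction: standardize, then decompose the correlation matrix.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # ascending order
    order = np.argsort(eigenvalues)[::-1]              # reorder: largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Selection: eigenvalue criterion (>= 1) and percentage criterion.
    pct = 100.0 * eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(pct)
    k_eigenvalue = int(np.sum(eigenvalues >= 1.0))
    k_percentage = min(p, int(np.searchsorted(cumulative, min_cumulative_pct)) + 1)

    # Weights of the variables (columns = components; signs are arbitrary)
    # and scores of the individuals, used for the graphic analysis.
    scores = z @ eigenvectors
    return {
        "eigenvalues": eigenvalues,
        "pct_variance": pct,
        "cumulative_pct": cumulative,
        "k_eigenvalue": k_eigenvalue,
        "k_percentage": k_percentage,
        "weights": eigenvectors,
        "scores": scores,
    }


# Synthetic example: 30 individuals, 4 variables driven by two latent factors.
rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=30), rng.normal(size=30)
demo = np.column_stack([f1, -f1, f2, f2]) + rng.normal(scale=0.3, size=(30, 4))
result = pca_steps(demo)
```

The biological interpretation of the components is not something code can supply; it depends on reading the weights against knowledge of the variables.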
At present, there are valuable results regarding the use of these techniques, as shown in the work of Mesa et al. (2018) on monoclonal antibody fermentation and in the investigations of biopharmaceutical purification processes carried out by Gozá et al. (2020). Their use is also reported in problems associated with causality in Biomedical Sciences, including the determination of risk and prognostic factors (Sagaro & Zamora, 2020), and in studies of functional dynamic mechanical systems of internal combustion engines (Aliaga et al., 2021), among other applications.
Example of Application of PCA in Post-Harvest Studies of Pineapple (Variety Cayena Lisa)
Pineapple is one of the most important commercial fruit crops in the world. It is known as the queen of fruits for its excellent taste and its contribution to nutrition and health (Hernández et al., 2021). Hence, research on its characterization, nutritional composition, growth, quality and post-harvest behavior, among other aspects, is currently intensifying, as shown in the works of Rangel et al. (2018) and Lorente et al. (2021), among others.
Luchsinger (2017) considers that one of the impacts of post-harvest studies lies in maintaining the quality of products until their consumption, hence the importance of investigating the different indicators. The study was carried out in areas of the various-crops company located in the Havana-Matanzas Plain, with an average annual temperature between 25 and 32 ºC and high environmental humidity. Weight loss (PP) was determined by weighing the fruits on an electronic scale on days 1, 2, 3, 5, 6, 8 and 10 after harvest. The indicators evaluated were PP, firmness, color index (IC), soluble solids content (SSC) and pH. The aim is to analyze the behavior of these 5 variables over the different days (7 individuals).
The data were processed with the statistical software Statgraphics Centurion (2012). A preliminary analysis showed a marked correlation structure among this group of variables, with positive (direct) relationships between PP and pH (r = 0.84) and between pH and SSC (r = 0.62), and negative (inverse) relationships between PP and firmness (r = -0.80) and between IC and firmness (r = -0.65). This suggests a study using principal component analysis.
Construction and Selection of the Number of Components
Table 1 shows that two components were selected (eigenvalues above one). The first two components explain 88.36% of the total variability. This indicates that, from the 5 initial variables, two components can be extracted that explain the association between the variables and the observations.
Component | Eigenvalue | Percentage of variance | Cumulative percentage |
---|---|---|---|
1 | | 55.43 | 55.43 |
2 | | 32.93 | 88.36 |
3 | 0.51 | 10.38 | 98.74 |
4 | 0.05 | 1.02 | 99.76 |
5 | 0.01 | 0.23 | 100 |
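The two selection criteria can be checked against the percentages in Table 1 with a short calculation. It rests on one assumption that the text does not state explicitly: that the analysis used the correlation matrix, so the eigenvalues add up to the number of variables (5). Under that assumption the implied eigenvalue of component 3 is about 0.52, close to the 0.51 shown in the table. The variable names are illustrative.

```python
# Percentages of variance as reported in Table 1.
pct_variance = [55.43, 32.93, 10.38, 1.02, 0.23]
n_variables = 5

# Assumption: correlation-matrix PCA, so the eigenvalues sum to n_variables
# and each eigenvalue equals (percentage / 100) * n_variables.
implied_eigenvalues = [pct * n_variables / 100.0 for pct in pct_variance]

# Cumulative percentage of variance, rounded as in the table.
cumulative = []
running = 0.0
for pct in pct_variance:
    running += pct
    cumulative.append(round(running, 2))

# Eigenvalue criterion: components with eigenvalue >= 1.
k_eigenvalue = sum(1 for value in implied_eigenvalues if value >= 1.0)

# Percentage criterion: first component count reaching at least 70%.
k_percentage = next(i + 1 for i, c in enumerate(cumulative) if c >= 70.0)
```

Both `k_eigenvalue` and `k_percentage` come out as 2, which agrees with the selection of two components described in the text.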
Relationship or Weight of Variables in Each Component
The weight of the variables in component 1 is fundamentally characterized by weight loss, pH and firmness (Table 2), while component 2 is characterized by the soluble solids content and the color index.
Component 1 | Component 2 | |
---|---|---|
Weight loss | -0.048 | |
Firmness | 0.386 | |
(IC) | 0.120 | |
(SSC) | 0.329 | |
pH | 0.287 |
For Component 1, which has positive weights for weight loss and pH, as the value of the component increases, weight loss and pH increase and the firmness of the fruits decreases. For the second component, as its value increases, the soluble solids content increases and the color index decreases.
Formation of possible groups. Biological sense of the components from their relationship with the initial variables
Day | Component 1 | Component 2 |
---|---|---|
1 | -2.422 | 1.681 |
2 | -0.524 | 0.738 |
3 | -0.840 | -0.802 |
5 | -0.817 | -1.593 |
6 | 0.463 | -1.478 |
8 | 1.67 | 0.528 |
10 | 2.47 | 0.926 |
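The scores in this table can be used to check two properties of principal components: that they are centered and uncorrelated, and that the variance of each component's scores matches its eigenvalue. The sketch below uses the tabulated scores as given. The comparison values assume a correlation-matrix PCA with 5 variables, so that an eigenvalue equals its percentage of variance times 5 / 100; that assumption is not stated in the text.

```python
from statistics import mean, variance

# Scores of the seven days on the first two components, as tabulated.
days = [1, 2, 3, 5, 6, 8, 10]
comp1 = [-2.422, -0.524, -0.840, -0.817, 0.463, 1.67, 2.47]
comp2 = [1.681, 0.738, -0.802, -1.593, -1.478, 0.528, 0.926]

# Centering: the mean score of each component should be (close to) zero.
mean1, mean2 = mean(comp1), mean(comp2)

# Sample variances (divisor n - 1) of the scores.
var1, var2 = variance(comp1), variance(comp2)

# Sample covariance between the two components: should be close to zero.
n = len(days)
cov12 = sum((a - mean1) * (b - mean2) for a, b in zip(comp1, comp2)) / (n - 1)

# Eigenvalues implied by the reported percentages of variance (55.43%, 32.93%),
# ASSUMING correlation-matrix PCA on 5 variables.
implied1 = 55.43 * 5 / 100
implied2 = 32.93 * 5 / 100
```

Under that assumption, `var1` and `var2` agree with `implied1` and `implied2` to within 0.01, and `cov12` is negligible, so the tabulated scores are consistent with the reported percentages of variance.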
Considering the graphic representation (Figure 1), it can be argued that there are basically three groups in post-harvest. The first group is characterized by the greatest weight loss and pH, and occurs from the sixth day onward. From the physical point of view, weight loss, which is associated with the water content of the fruit, indirectly decreases the concentration of hydrogen ions, so that the pH rises as the fruit approaches senescence or putrefaction. This makes it unsuitable for consumption as fresh fruit, hence the importance of timely decision-making for commercialization and industrialization.
In contrast, the third group, formed by the first day, shows the greatest firmness with the lowest weight loss and pH. This response is due to the nature of the product: once the exchange of ethylene with the surrounding environment begins, respiration increases and the ripening process accelerates, a recurring phenomenon in previous investigations with this and other agricultural products (Thompson, 1998). Likewise, a gradual response is observed in the soluble solids content, which tends to influence acceptance by consumers and marketers. The same holds for the color index, which allows the state of maturity to be discerned with the naked eye; its lowest value corresponds to the first day after harvest, as reflected in component 2.
The characterization of pineapple quality obtained through these groups constitutes a valuable tool that avoids exhaustive control of these properties during commercialization, transport or storage, and can even make up for a lack of instrumentation for their determination. This largely makes it a non-destructive tool for monitoring the quality of the fruit in storage, which satisfies one of the main purposes of this research. It would also support timely decision-making on storage, transport and commercialization within this time range, and reaffirms the criterion that quality is sought from the field and is modulated post-harvest.
Finally, the biplot (Figure 2) allowed the joint analysis of variables and individuals. It shows the positive relationship among SSC, pH and weight loss, and the negative relationship of firmness with these variables; days 8 and 10 correspond to the highest values of SSC, pH and weight loss and the lowest values of firmness, in contrast to day 1. Similarly, projecting the individuals perpendicularly onto the firmness axis shows that the greatest firmness is reached in the first two days.
CONCLUSIONS
It is concluded that the use of multivariate techniques, on methodological bases and with emphasis on the interpretation of the results, increases the quality of scientific research in agricultural and related processes.
The use of principal component analysis is an alternative analysis tool in post-harvest studies and constitutes an efficient and non-destructive way to monitor the quality of fruits in storage.