Agricultural extension and technological innovation programs make it possible to transform production systems by taking into account the different factors that influence agricultural production. These programs are now usually accompanied by surveys that address quantitative and qualitative aspects of the systems in which results are introduced.
According to the ^{National Office of Statistics and Information (2020)}, Cuban sheep production is concentrated in the eastern and central regions, with 55.7 and 31.4 % of total heads, respectively. Extensive grazing predominates in the breeding systems of this species, with natural pastures of poor nutritional value and low productive yields as the basic diet (^{Herrera et al. 2020}). Sheep raising in Ciego de Ávila province is very important for farmers and their families, as it serves mainly for meat consumption and as an alternative source of income through animal sales. It is characterized by integral herds that include all categories of animals, with up to 20 or between 20 and 40 sheep (^{Borroto et al. 2011}), and it requires technologies that contribute to the sustainable development of the breeding of this species.
In sheep production systems under tropical conditions, technological options that contribute to animal welfare should be used. These conditions are increasingly affected by high temperatures and relative humidity, which cause heat stress and can affect feed intake, weight gain and reproductive performance, as well as physiological and biochemical parameters (^{Macías-Cruz et al. 2018} and ^{Vicente et al. 2020}). In this sense, the transformation of the microclimate in silvopastoral systems plays an important role in regulating solar radiation (^{López-Vigoa et al. 2017}) and promoting thermal well-being (^{Sousa et al. 2015}). However, in the region, grazing systems for sheep may or may not be associated with different tree species. These systems have not been characterized in terms of their components, which would allow the interpretation of the factors that can affect sheep production and would serve as a basis for designing improvement strategies.
Production systems are characterized with data collected through surveys. Most questions (items) have qualitative answers, so the methods used must be appropriate for this type of variable. According to ^{Navarro et al. (2010)}, social research mainly involves data sets that reflect some quality or category and that may contain a mixture of different types of variables, many of which are measured in categories, ordered or not.
Multivariate techniques make it possible to analyze jointly the variables measured, so that the different questions in the surveys receive a comprehensive response. In Cuba, the statistical model for measuring impact (^{Torres et al. 2008}, ^{2013}), based on the combination of principal component analysis (PCA) with cluster analysis, has recently been used to characterize forage feeding in dairy farms in Florencia municipality (^{Martínez-Melo et al. 2020}). It has been used to determine the incidence of livestock practices on herd productivity (^{Benítez et al. 2016}) and to analyze the efficiency of milk production on farms of the cooperative and farmer sector (^{Alonso et al. 2020}). It has also been applied to measure the impact of biomass banks (^{Gudiño et al. 2020}), with quantitative variables.
Nonlinear principal component analysis, or categorical principal component analysis (CATPCA), is the analogous multivariate method for analyzing qualitative variables. Like PCA, it seeks to maximize the variance accounted for by the first principal components. It does so by transforming qualitative variables into quantitative ones through optimal scaling, first described by Gifi (1990, cit. by ^{Linting 2007}), which maximizes the correlations among all variables, allows linear relationships among the transformed variables, preserves the measurement level of each variable (nominal, multiple nominal, ordinal or interval) and reduces the dimensionality of the system.
The objective of this study was to show the use of CATPCA with qualitative variables, measured in the study of sheep production systems in Ciego de Ávila province.
Materials and Methods
The study was carried out in Ciego de Ávila province, located in the central region of the country, with a surface area of 6,971.64 km^{2} and a land area of 6,194.90 km^{2}. It is bordered to the north by the Bahamas channel; the Bahías de Los Perros and Buena Vista lie on its insular platform, bordered by keys that form part of the Sabana-Camagüey archipelago, including Cayo Coco and Cayo Guillermo, with 776.74 km^{2}. To the south, it is bordered by the Caribbean Sea, where there is a vast platform occupied by the Ana María Gulf, numerous keys with 776.74 km^{2} and a line of keys that is part of the Jardines de la Reina archipelago. The main economic activities of the province are agriculture, livestock, forestry and tourism.
According to ^{Sorí et al. (2017)}, Ciego de Ávila is characterized by very hot summers and short winters. Over the year, temperature generally ranges from 18 °C to 33 °C and rarely drops below 14 °C or rises above 36 °C, with winds from the east to northeast from Cayo Coco to Júcaro. Accumulated mean monthly precipitation ranges from 20 to 230 mm, depending on the season: rainy (May-October) or dry (November-April). Relative humidity fluctuates from 72 to 85 % during the year. The Júcaro-Morón plain occupies most of the territory and is made up of flat, gently rolling and hilly plains. The soil types are related to topography, with a predominance of red ferralitic soils, which are deep and well drained.
According to ^{Serrano et al. (2020)}, there are 53 403 sheep heads in Ciego de Ávila province. The study sample was composed of 296 sheep farmers, 74 from the state sector and 222 from the private sector. A total of 22 qualitative variables were recorded through a survey applied to sheep farmers of the province, distributed across its three regions and ten municipalities (figure 1).
Table 1 shows the variables, their number of levels, their categories and their measurement scale.
Variable | Levels | Categories | Scale |
---|---|---|---|
Municipalities | 10 | * | Nominal |
Region | 3 | North, center and south | Nominal |
Educational level | 5 | Primary, secondary, high school, technical and university | Nominal |
Gender | 2 | Male or female | Nominal |
More than one job | 2 | Yes or no | Nominal |
Sector (private or state) | 2 | State or private | Nominal |
Training | 2 | Yes or no | Nominal |
Land tenure | 2 | Yes or no | Nominal |
EGAME contract | 2 | Yes or no | Nominal |
Production objective | 3 | Sale to enterprises, self-consumption and sale to others | Nominal |
Relevance degree | 3 | Great importance, medium importance and low importance | Nominal |
Sire rotation | 2 | Yes or no | Nominal |
Selection criteria | 2 | Phenotypical and reproductive characteristics | Nominal |
Castration | 2 | Yes or no | Nominal |
Registration | 2 | Yes or no | Nominal |
Facilities | 4 | Very rudimentary, rudimentary, modern or no facilities | Nominal |
Trees | 6 | In grazing areas, in living fences, in protein banks, integrated with fruit trees, or no trees | Nominal |
Choraspast (classification according to grazing hours) | 2 | Continuous grazing or semi-confinement | Nominal |
Ctipopast (classification according to grazing type) | 5 | Extensive, rotational, semi-transhumant, integrated with crops and in agroforestry systems | Nominal |
Grasses | 2 | Natural and improved | Nominal |
Forages | 2 | Yes or no | Nominal |
Supplementation | 4 | Vitamins and minerals, protein, byproducts and no supplementation | Nominal |
*Chambas, Bolivia, Morón, Florencia, Ciro Redondo, Majagua, Ciego de Ávila, Baraguá, Venezuela and 1^{ro} de Enero
Mathematical description of the CATPCA. The description follows ^{Morales (2004)} and ^{Navarro et al. (2010)}. The starting point is the data matrix H_{n×m}, which contains the observed scores of n cases on the m variables contained in the survey. Each variable can be denoted as the j-th column of H, h_{j}, a vector of size n × 1, with j = 1, ..., m. If the variables h_{j} do not have a numerical measurement level, the relationship among them is expected to be non-linear, so a non-linear transformation must be applied. The transformation assigns an optimally scaled value, called a categorical quantification, to each category. H is then replaced by a matrix Q_{n×m}, which contains the transformed variables q_{j} = φ_{j}(h_{j}). In the Q matrix, the observed scores of the cases are replaced by the categorical quantifications.
The CATPCA model is the same as that of classical PCA, but it captures possible non-linearities in the relationships among variables through their transformations. The objective of the CATPCA is achieved by minimizing the so-called loss function, which accommodates weights according to multiple nominal transformations. The scores of the cases on the principal components are called object scores in CATPCA. These components, multiplied by a set of optimal weights called component saturations, approximate the original data as closely as possible.
If X_{n×p} is the matrix of component (object) scores, where p is the number of components, and A_{m×p} is the matrix of component saturations, with its j-th row denoted by a_{j}, the loss function (stress), which is minimized to reduce the difference between the original data and the principal components, is expressed as:
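The expression itself is not reproduced in this text. A form consistent with the definitions above and with the restrictions described next, as it usually appears in the CATPCA literature (an assumption here, not a transcription of the original formula), is:

```latex
L(Q, A, X) = \frac{1}{n} \sum_{j=1}^{m}
  \operatorname{tr}\!\left[ \left( q_j a_j' - X \right)' \left( q_j a_j' - X \right) \right]
```

Here q_{j} is the n × 1 transformed variable, a_{j} is the p × 1 vector of saturations of variable j, and q_{j}a'_{j} is therefore an n × p matrix that is compared with the object scores X.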
This loss function is subject to a set of restrictions. First, the transformed variables are standardized, so that q'_{j}q_{j} = n. This restriction is necessary to resolve the indeterminacy between q_{j} and a_{j} in the product q_{j}a'_{j}. This normalization implies that q_{j} contains z-scores and guarantees that the component saturations in a_{j} are correlations between variables and components. To avoid the trivial solution A = 0 and X = 0, the object scores are restricted by requiring that X'X = nI, where I is the identity matrix. The object scores must also be centered. Therefore, 1'X = 0, where 1 represents a vector of ones.
The two previous restrictions imply that the columns of X (components) are orthonormal z-scores (their mean is zero and their standard deviation is one) and that they are uncorrelated. For nonlinear levels (nominal and ordinal), q_{j} = φ_{j}(h_{j}) denotes a transformation according to the measurement level selected for variable j.
The loss function is minimized by alternating least squares, which cyclically updates one of the parameter sets X, Q and A while the other two are held fixed. According to ^{Young (1972)} and ^{Portillo and Mar (2007)}, this alternating least squares methodology transforms any qualitative variable into a quantitative one through optimal scaling.
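The alternating scheme can be illustrated with a minimal sketch for nominal variables. It is not the SPSS CATPCA implementation: the function names, the starting values, the stopping rule and the toy data are all assumptions made for illustration, and only the single nominal transformation is covered.

```python
import numpy as np

def standardize(v):
    """Center v and rescale it so that v'v = n (the restriction q_j'q_j = n)."""
    v = v - v.mean()
    return v * np.sqrt(v.size / (v @ v))

def loss(X, Q, A):
    """(1/n) * sum_j ||X - q_j a_j'||^2 for the current X, Q and A."""
    n, m = Q.shape
    return sum(np.sum((X - np.outer(Q[:, j], A[j])) ** 2) for j in range(m)) / n

def update_scores(Q, A):
    """Best X for fixed Q and A under X'X = nI (orthogonal Procrustes step)."""
    U, _, Vt = np.linalg.svd(Q @ A, full_matrices=False)
    return np.sqrt(Q.shape[0]) * (U @ Vt)

def catpca_sketch(codes, p=2, max_iter=500, tol=1e-6):
    """ALS for nominal variables given as integer codes (n cases x m variables)."""
    n, m = codes.shape
    # Indicator matrices G_j (cases x categories), one per variable.
    G = [(codes[:, [j]] == np.unique(codes[:, j])).astype(float) for j in range(m)]
    # Arbitrary start: standardized codes for Q, leading singular vectors for X.
    Q = np.column_stack([standardize(codes[:, j].astype(float)) for j in range(m)])
    X = np.sqrt(n) * np.linalg.svd(Q, full_matrices=False)[0][:, :p]
    A = Q.T @ X / n
    history = [loss(X, Q, A)]
    for _ in range(max_iter):
        for j in range(m):  # update Q with X and A fixed
            centroids = (G[j].T @ X) / G[j].sum(axis=0)[:, None]
            q = G[j] @ (centroids @ A[j])  # category centroids projected on a_j
            if q @ q > 1e-12:              # skip degenerate updates
                Q[:, j] = standardize(q)
        A = Q.T @ X / n          # update A with X and Q fixed
        X = update_scores(Q, A)  # update X with Q and A fixed
        A = Q.T @ X / n
        history.append(loss(X, Q, A))
        if history[-2] - history[-1] < tol:
            break
    return X, Q, A, history

# Hypothetical toy data (not the survey): six 3-category items, two latent traits.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
codes = np.column_stack([
    np.digitize(z[:, j % 2] + 0.5 * rng.normal(size=200), [-0.5, 0.5])
    for j in range(6)
])
X, Q, A, history = catpca_sketch(codes, p=2)
```

In this sketch the rows of `A` are the saturations a_{j}, and each step can only lower the loss or leave it unchanged, so `history` is non-increasing. Under the restrictions described above, the final loss equals m·p minus the sum of squared saturations.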
CATPCA is relatively free of basic assumptions. Data can be measured on any scale: multiple nominal, nominal, ordinal or interval. The technique represents both linear and non-linear relationships well. The most important requirement is the existence of association and/or covariation among variables.
Cronbach's coefficient was used to measure survey reliability (^{Domínguez-Lara and Merino-Soto 2015}) using the formula:
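The formula is not reproduced in this text. The usual textbook form of Cronbach's alpha, given here as an assumption about what was intended rather than as a transcription, is:

```latex
\alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_t^{2}} \right)
```

In this form, k is the number of items, s_{i}^{2} is the variance of item i and s_{t}^{2} is the variance of the total score.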
Statistical processing used the Crosstabs procedure to check the association among variables through the contingency coefficient, based on χ^{2}, and its significance. Optimal scaling was applied with a nominal scaling level, since the variables have a small number of categories (^{Navarro et al. 2010}), using the CATPCA procedure in the IBM-SPSS program, version 22 (2013).
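As an illustration of the association measure, the sketch below computes Pearson's χ^{2} and the contingency coefficient C = sqrt(χ^{2} / (χ^{2} + n)) for one pair of variables. The 2 × 2 counts are hypothetical and the function name is an assumption; the significance level would additionally require the χ^{2} distribution with the stated degrees of freedom, which is not computed here.

```python
from math import sqrt

def contingency_coefficient(table):
    """Return (chi-square, degrees of freedom, contingency coefficient C)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, sqrt(chi2 / (chi2 + n))

# Hypothetical counts, NOT survey data: rows and columns are two yes/no items.
toy_table = [[30, 10],
             [20, 40]]
chi2, dof, c = contingency_coefficient(toy_table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, C = {c:.3f}")
```

For a 2 × 2 table the result has one degree of freedom, so the statistic would be compared with the 5 % critical value of 3.841.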
Results and Discussion
To establish the associations among the different variables, contingency coefficients based on χ^{2}, which is appropriate for nominal variables, were determined. Table 2 shows, for each variable, the percentage of its coefficients that were significant (P < 0.05, P < 0.01 or P < 0.001).
Variable | Significant coefficients (%) |
---|---|
Municipality | 71 |
Region | 52 |
Educational level | 95 |
Gender | 33 |
More than one job | 76 |
Sector (private or state) | 86 |
Training | 76 |
Land tenure | 81 |
EGAME contract | 86 |
Production objective | 90 |
Relevance degree | 67 |
Sire rotation | 90 |
Selection criteria | 24 |
Castration | 76 |
Registration | 81 |
Facilities | 95 |
Trees | 86 |
Choraspast | 76 |
Ctipopast | 81 |
Grasses | 76 |
Forages | 71 |
Supplementation | 71 |
Most variables had percentages of significant relationships of 71 % or higher. Only region, gender, relevance degree and selection criteria had percentages below 70 %, so they could be eliminated. However, they were retained in the first analysis.
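The screening described above can be reproduced from Table 2 with a short sketch. The 70 % threshold comes from the text. Reading each percentage as the share of the 21 other variables with a significant coefficient is an assumption, although every value in the table is consistent with it.

```python
# Percentages transcribed from Table 2.
significant_pct = {
    "Municipality": 71, "Region": 52, "Educational level": 95, "Gender": 33,
    "More than one job": 76, "Sector": 86, "Training": 76, "Land tenure": 81,
    "EGAME contract": 86, "Production objective": 90, "Relevance degree": 67,
    "Sire rotation": 90, "Selection criteria": 24, "Castration": 76,
    "Registration": 81, "Facilities": 95, "Trees": 86, "Choraspast": 76,
    "Ctipopast": 81, "Grasses": 76, "Forages": 71, "Supplementation": 71,
}

THRESHOLD = 70  # variables below this percentage are candidates for removal
candidates = [name for name, pct in significant_pct.items() if pct < THRESHOLD]

# Assumed reading: each variable is paired with the 21 remaining variables.
partners = len(significant_pct) - 1
pairs = {name: round(pct * partners / 100) for name, pct in significant_pct.items()}

print(candidates)
print(pairs["Educational level"], "of", partners, "coefficients significant")
```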
The first step in CATPCA is to choose the normalization method. Variable principal normalization was used, whose objective is to optimize the association among variables. The coordinates of the variables in the space of cases are the component saturations (correlations with the principal components or dimensions, that is, with the object scores).
Table 3 shows the iteration history of the solution with all variables: the variance accounted for and the loss at the first and last iteration, for a pre-set convergence level, which is 0.0001 in this case.
Iteration number | Variance accounted for: total | Variance accounted for: increase | Loss: total | Loss: centroid coordinates | Loss: restriction of centroid to vector coordinates |
---|---|---|---|---|---|
0^{a} | 12.632 | 0.002 | 97.368 | 94.804 | 2.564 |
58^{b} | 13.931 | 0.000009 | 96.069 | 94.454 | 1.615 |
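The two rows of Table 3 can be cross-checked with simple arithmetic. With standardized transformed variables and object scores satisfying X'X = nI, the loss equals m·p minus the total variance accounted for, where m = 22 variables and p = 5 dimensions in this first solution. That the software reports its totals in exactly this way is an assumption, but the figures agree with it.

```python
m, p = 22, 5  # variables in the survey; dimensions in the first solution

# (variance accounted for, total loss, centroid loss, restriction loss), Table 3
iterations = {
    0: (12.632, 97.368, 94.804, 2.564),
    58: (13.931, 96.069, 94.454, 1.615),
}

def check_row(vaf, total_loss, centroid, restriction, tol=1e-3):
    """Check the two identities for one row of the iteration history."""
    fits_bound = abs(vaf + total_loss - m * p) < tol             # VAF + loss = m * p
    loss_splits = abs(centroid + restriction - total_loss) < tol  # loss components add up
    return fits_bound, loss_splits

checks = {it: check_row(*row) for it, row in iterations.items()}
vaf_gain = round(iterations[58][0] - iterations[0][0], 3)  # gain over 58 iterations
print(checks, vaf_gain)
```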
The iterative algorithm stopped when the difference in total fit between the last two iterations fell below the pre-set convergence value, which occurred at iteration 58. The variance accounted for was 13.93, with an increase of 0.000009 and a loss of 96.069, for a five-dimensional model. CATPCA, like its counterpart PCA for numerical variables, can generate as many dimensions as there are variables included. However, its fundamental objective is dimension reduction, so the summary of the model fitted with these five dimensions is shown in table 4.
Dimension | Cronbach's alpha | Variance accounted for: total (eigenvalue) | Variance accounted for: % of variance |
---|---|---|---|
1 | 0.864 | 5.708 | 25.946 |
2 | 0.699 | 3.004 | 13.656 |
3 | 0.596 | 2.318 | 10.537 |
4 | 0.374 | 1.557 | 7.075 |
5 | 0.268 | 1.343 | 6.107 |
Total | 0.972 | 13.931 | 63.321 |
The total percentage of variance explained by the first five dimensions is 63.32 %, which can be considered adequate. ^{Vázquez et al. (2017)} found a value of 61.409 in a study at the Empresa Pecuaria Valle del Perú, in San José de las Lajas municipality, which included quantitative variables. The table also shows the overall value of Cronbach's alpha coefficient (0.972), which indicates high internal consistency of the data and a highly reliable scale.
Despite the wide use of this coefficient, ^{Ventura-León and Caycho-Rodríguez (2017)} have criticized it, stating that it is affected by the number of questions, the number of alternative answers and the proportion of the variance of the test. They propose the Omega coefficient (ω) instead, as reported by ^{Domínguez-Lara and Merino-Soto (2015)}. ^{Domínguez-Lara (2016)} later proposed the coefficient H, which functions as an estimate of the reliability of the survey and is interpreted as the percentage of variability of the latent variable explained by the indicators. This author concludes that H is a complementary measure, which can be helpful in analytical processes aimed at reporting psychometric properties of assessment instruments. According to this researcher, although some methodological developments remain pending, it is an interesting alternative in the analytical framework of structural equation models. However, in the present study, Cronbach's coefficient is considered usable, since the original variables will not be replaced by the selected factors.
The five selected dimensions have eigenvalues greater than one (table 4). These values are equivalent to those of classical PCA and indicate the percentage of information retained in each dimension. The latent root criterion helps to select the factors with eigenvalues greater than one and positive Cronbach coefficients in each dimension. Although the last two dimensions have Cronbach values close to zero and eigenvalues close to one, their inclusion will be decided from the matrix of weights or saturations.
The saturation matrix is a correlation matrix, with dimensions in columns and transformed initial variables in rows. Each coefficient in the matrix measures the relationship between a variable and a dimension and is interpreted as a correlation coefficient, which takes values between -1 and 1. Variables with high saturations in a dimension (regardless of sign) are strongly associated with that dimension. In absolute value, the maximum weight is one and corresponds to a variable whose variability is fully explained by the dimension, while a value of zero indicates that the variable has no relation to the dimension. Finally, each dimension is given a label according to the highest coefficients it contains (table 5).
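The columns of Table 4 are linked by simple arithmetic, sketched below. The percentage of variance is the eigenvalue divided by the number of variables (m = 22). The alpha expression used here, m(λ − 1) / ((m − 1)λ), is the eigenvalue-based form usually associated with CATPCA output; that this is the formula behind the table is an assumption, although it reproduces the tabulated values to within rounding.

```python
M = 22  # number of variables in the first analysis

# Eigenvalues transcribed from Table 4 (dimensions 1 to 5).
eigenvalues = [5.708, 3.004, 2.318, 1.557, 1.343]
table_alpha = [0.864, 0.699, 0.596, 0.374, 0.268]

def percent_variance(eigenvalue, m=M):
    """Share of total variance accounted for by one dimension, in percent."""
    return 100 * eigenvalue / m

def alpha_from_eigenvalue(eigenvalue, m=M):
    """Eigenvalue-based Cronbach's alpha (assumed form, see lead-in)."""
    return m * (eigenvalue - 1) / ((m - 1) * eigenvalue)

for dim, eig in enumerate(eigenvalues, start=1):
    print(dim, round(percent_variance(eig), 2), round(alpha_from_eigenvalue(eig), 3))

total = sum(eigenvalues)
total_alpha = alpha_from_eigenvalue(total)
total_pct = percent_variance(total)
```

The same expression shows why alpha is positive only for eigenvalues greater than one, which links the two selection criteria discussed for the dimensions.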
Variables (items) | Dimension 1 | Dimension 2 | Dimension 3 | Dimension 4 | Dimension 5 |
---|---|---|---|---|---|
Municipality | 0.01 | 0.85 | 0.07 | -0.39 | 0.27 |
Region | 0.02 | 0.82 | 0.07 | -0.39 | 0.28 |
Educational level | 0.71 | -0.08 | 0.03 | -0.12 | 0.23 |
Gender | 0.15 | -0.13 | -0.15 | -0.04 | 0.13 |
More than one job | -0.53 | -0.02 | -0.56 | -0.15 | 0.32 |
Sector (private or state) | 0.74 | -0.18 | -0.01 | -0.20 | 0.26 |
Training | 0.42 | -0.34 | -0.32 | -0.30 | 0.14 |
Land tenure | 0.64 | 0.20 | 0.59 | -0.14 | -0.28 |
EGAME contract | -0.65 | 0.04 | -0.29 | -0.41 | -0.41 |
Production objective | 0.66 | -0.07 | 0.23 | 0.40 | 0.41 |
Relevance degree | 0.16 | 0.46 | 0.23 | 0.17 | -0.24 |
Sire rotation | 0.43 | -0.71 | -0.06 | -0.02 | -0.10 |
Selection criteria | -0.15 | -0.18 | 0.11 | 0.31 | -0.10 |
Castration | 0.48 | -0.07 | -0.28 | 0.08 | -0.28 |
Registration | 0.76 | -0.26 | -0.03 | -0.27 | 0.25 |
Facilities | 0.49 | 0.38 | -0.31 | 0.12 | 0.02 |
Trees | -0.01 | 0.39 | -0.22 | 0.63 | 0.20 |
Choraspast | -0.41 | -0.33 | 0.41 | -0.20 | 0.25 |
Ctipopast | -0.65 | -0.14 | -0.58 | 0.09 | 0.28 |
Grasses | -0.56 | -0.37 | 0.49 | -0.08 | 0.23 |
Forages | 0.64 | 0.07 | -0.40 | -0.24 | -0.12 |
Supplementation | -0.57 | -0.07 | 0.43 | 0.01 | 0.22 |
To analyze these results in detail, the first decision concerns the cut-off: the lower limit for positive saturations, the upper limit for negative saturations, or both, used to select the variables that contribute most to explaining each dimension. As this value indicates the correlation of each dimension with the variables, it is logical to examine both the variables that have low saturations in every dimension and the dimensions that have low saturations for most variables.
The variables gender, relevance degree and selection criteria had the lowest saturations in all dimensions, and they also had the lowest percentages of relationship with the rest of the variables (table 2). Municipality and region have similar coefficients, which seems to indicate that both describe farm location, so only one of them is needed. Finally, dimension five has the Cronbach coefficient closest to zero. For these reasons, these variables and dimension five were eliminated. In this regard, ^{Morales (2004)} stated that it should not be forgotten that the fundamental objective of the method is to reduce information. Results are shown in table 6.
Dimension | Cronbach's alpha | Variance accounted for: total (eigenvalue) | Variance accounted for: % of variance |
---|---|---|---|
1 | 0.873 | 5.704 | 31.690 |
2 | 0.609 | 2.353 | 13.073 |
3 | 0.589 | 2.255 | 12.528 |
4 | 0.318 | 1.428 | 7.936 |
Total | 0.969 | 11.741 | 65.227 |
This model, with the variables eliminated and four dimensions, accounts for 65.23 % of the total variance, which is higher than the five-dimensional model (63.32 %). The first dimension explains more than 30 % of the variability, while the second, third and fourth together explain a further 33.5 %. Saturations for the four-dimensional model are presented in table 7.
Variables | Dimension 1 | Dimension 2 | Dimension 3 | Dimension 4 |
---|---|---|---|---|
Municipality | 0.171 | 0.729 | -0.270 | 0.047 |
Educational level | 0.717 | -0.071 | 0.100 | -0.151 |
More than one job | -0.541 | -0.455 | -0.344 | 0.227 |
Sector (private or state) | 0.738 | -0.250 | 0.172 | -0.104 |
Training | 0.411 | -0.575 | 0.022 | -0.039 |
Land tenure | 0.649 | 0.461 | 0.339 | 0.386 |
EGAME contract | -0.655 | -0.260 | -0.195 | 0.578 |
Production objective | 0.648 | 0.206 | 0.178 | -0.592 |
Sire rotation | 0.426 | -0.532 | 0.363 | -0.011 |
Castration | 0.476 | -0.237 | -0.205 | 0.101 |
Registration | 0.756 | -0.308 | 0.205 | -0.061 |
Facilities | 0.522 | 0.121 | -0.450 | -0.031 |
Trees | -0.076 | 0.323 | -0.450 | -0.523 |
Choraspast | -0.403 | -0.108 | 0.614 | -0.088 |
Ctipopast | -0.663 | -0.442 | -0.350 | -0.348 |
Grasses | -0.549 | 0.004 | 0.659 | -0.150 |
Forages | 0.627 | -0.320 | -0.315 | 0.221 |
Supplementation | -0.575 | 0.175 | 0.431 | -0.129 |
Dimension one represents educational level, more than one job, sector (private or state), land tenure, EGAME contract, production objective, registration, facilities, ctipopast, forages and supplementation, selected for saturation values above 0.50 in absolute value. These variables characterize sheep production systems in the province, and it is important to note that those with negative signs are related to land tenure and supplementation. This may be explained by the type of grazing system, since semi-transhumant grazing is practiced in 30 % of systems and extensive grazing in 68 %. Farmers with land predominate (69 %), although 50 % use it for other purposes, such as agricultural crops and livestock. Likewise, supplementation and forages are absent in 70.6 and 87.2 % of the production systems, respectively.
Dimension two relates municipality, training and sire rotation, the latter two with negative signs, which seems to indicate differences among municipalities in these variables. This is related to the inclusion of new farmers who have not been trained. In addition, 50 % of the municipalities benefited from a training project developed during 2018. In this sense, sire rotation is affected in 69.6 % of farms, due to a lack of technical criteria and training.
Dimension three relates the variables choraspast (classification of the system according to grazing hours) and the presence of natural and improved pastures, both positively related to the dimension. Continuous grazing systems and the use of natural grasses predominate, in 96.3 and 94.3 % of the cases, respectively.
Dimension four contains the variable presence of trees, with a negative relationship, because there is no dependence between the different systems studied and the incorporation of trees. There is a lack of knowledge about the importance of including silvopastoral systems in all their variants, and 71.3 % of systems do not use this natural resource.
Figure 2 shows the saturations of the variables in the first two dimensions. Outside this selection, the variables grasses and classification according to grazing hours had their highest saturations in dimension three, and trees in dimension four.
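The grouping of variables by dimension described above can be reproduced from Table 7 with a short sketch. The rule used here, assigning each variable to the dimension where its absolute saturation is largest and keeping it only if that value exceeds 0.50, is an assumption inferred from the groupings reported, not a rule stated explicitly in the text.

```python
# Saturations transcribed from Table 7 (columns: dimensions 1 to 4).
saturations = {
    "Municipality": (0.171, 0.729, -0.270, 0.047),
    "Educational level": (0.717, -0.071, 0.100, -0.151),
    "More than one job": (-0.541, -0.455, -0.344, 0.227),
    "Sector": (0.738, -0.250, 0.172, -0.104),
    "Training": (0.411, -0.575, 0.022, -0.039),
    "Land tenure": (0.649, 0.461, 0.339, 0.386),
    "EGAME contract": (-0.655, -0.260, -0.195, 0.578),
    "Production objective": (0.648, 0.206, 0.178, -0.592),
    "Sire rotation": (0.426, -0.532, 0.363, -0.011),
    "Castration": (0.476, -0.237, -0.205, 0.101),
    "Registration": (0.756, -0.308, 0.205, -0.061),
    "Facilities": (0.522, 0.121, -0.450, -0.031),
    "Trees": (-0.076, 0.323, -0.450, -0.523),
    "Choraspast": (-0.403, -0.108, 0.614, -0.088),
    "Ctipopast": (-0.663, -0.442, -0.350, -0.348),
    "Grasses": (-0.549, 0.004, 0.659, -0.150),
    "Forages": (0.627, -0.320, -0.315, 0.221),
    "Supplementation": (-0.575, 0.175, 0.431, -0.129),
}

CUTOFF = 0.50  # cut-off for absolute saturations named in the text

def assign_dimensions(table, cutoff):
    """Group variables by the dimension of their largest absolute saturation."""
    groups = {}
    for name, row in table.items():
        best = max(range(len(row)), key=lambda d: abs(row[d]))
        if abs(row[best]) > cutoff:  # variables below the cut-off stay unassigned
            groups.setdefault(best + 1, []).append(name)
    return groups

groups = assign_dimensions(saturations, CUTOFF)
for dim in sorted(groups):
    print(dim, groups[dim])
```

Under this rule castration is left unassigned, because its largest saturation (0.476) is below the cut-off.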
Conclusions
Cronbach's coefficient was correctly applied to measure the reliability of the survey used, since the original variables were not replaced by the selected factors.
The application of multivariate analysis using CATPCA made it possible to identify the categorical variables that explained the greatest variance in the sheep production system of Ciego de Ávila province.
The variables highlighted by CATPCA, which fundamentally explain the sheep production system, were: educational level, more than one job, sector, land tenure, EGAME contract, production objective, registration, facilities, classification according to grazing type, forages, supplementation, municipality, training, sire rotation, grazing hours and presence of natural and improved grasses.