Proposal of a mixed linear and a mixed generalized model for the analysis of an experiment in rumen microbiology

Herrera Villafranca, Magaly; Galindo Blanco, Juana; Padilla Corrales, C.; Guerra Bustillo †, Caridad W.; Medina Mesa, Yolaine; Sarduy García, Lucia

Cuban Journal of Agricultural Science

On-line version ISSN 2079-3480

Cuban J. Agric. Sci. vol.54 no.2 Mayabeque Apr.-June 2020 Epub June 01, 2020

BIOMATHEMATICS

Magaly Herrera Villafranca¹^* (ORCID: 0000-0002-2641-1815), Juana Galindo Blanco¹ (ORCID: 0000-0001-8639-4693), C. Padilla Corrales¹ (ORCID: 0000-0002-6794-8694), Caridad W. Guerra Bustillo †², Yolaine Medina Mesa¹, Lucia Sarduy García¹

^¹Instituto de Ciencia Animal, Apartado Postal 24, San José de las Lajas, Mayabeque, Cuba.

^²Universidad Agraria de La Habana (UNAH) Fructuoso Rodríguez Pérez, Carretera Tapaste y Autopista Nacional, km 23 1/2. San José de Las Lajas, Mayabeque, Cuba.

Abstract

The objective of this study was to propose the mixed generalized linear model and the mixed linear model for the analysis of an experiment in rumen microbiology. Data came from a study carried out in the Department of Biophysiological Sciences of the Institute of Animal Science, which evaluated the effect of different origins and/or varieties of Moringa oleifera on the ruminal microbial population. A completely randomized design with a 6 x 3 factorial arrangement was applied, associated with a simple analysis of variance model. Eighteen treatments were established, resulting from the combination of six origins or varieties of Moringa oleifera and three times, with six repetitions each. The theoretical assumptions of the analysis of variance (variance homogeneity and normality of errors) were verified for the original variables. When they were not fulfilled, the mixed generalized linear model was used as an alternative analysis; when they were fulfilled, the mixed linear model was used. The models were fitted with the GLIMMIX and MIXED procedures of SAS, respectively. In both models, treatment, hour and the treatment x hour interaction were considered as fixed effects, and repetition nested within hour was considered as random. Results showed that the mean square of the error was lower when mixed procedures were used. Standard errors also decreased, which contributes to greater precision in results. From this perspective, these models are proposed for the analysis of variables related to counting experiments in the ruminal microbial population.

Key words: GLIMMIX; analysis of variance assumptions; nested effect

Parametric analysis of variance, developed by Fisher in the 1920s, is the most widely used statistical method in data analysis. However, its use requires compliance with theoretical assumptions: errors must be normally and independently distributed, their variances must be homogeneous, and the model must fit the data adequately. When any of these assumptions fails, the use of other analysis methods is suggested, such as mixed linear (MIXED) and mixed generalized linear (GLIMMIX) models.

Mixed models, according to ^{Dicovskiy and Pedroza (2017)}, are a proposal for advanced statistical modeling that improves the quality of the analysis of fixed and random factors by modeling random variability and error correlation. They are very useful for the analysis of unbalanced data or of data with some type of hierarchical structure. Therefore, they allow estimating the variability among groups and that of effects nested within groups.

^{Nelder and Wedderburn (1972)} grouped different statistical models under the name of generalized linear models (MLGnz), which constitute an extension of the classical general linear models (MLG). These models can be applied to normal, binomial, Poisson and gamma distributions, among others (^{Mandujano et al. 2016}, ^{Díaz et al. 2017} and ^{Monterubbianesi 2017}).

^{Wang et al. (2015)} state that data measured in agricultural research often do not satisfy the premises of general linear models. Mixed generalized linear models provide an analysis that does not necessarily require a normal distribution of the variables, because these can be fitted to a distribution of the exponential family.

These models have been widely used in social sciences, psychology, and medical sciences. In agriculture, however, they have had little application, even though it is often difficult to use the general linear model (MLG) in analysis of variance and regression because the analyzed variables do not meet the assumptions of normality, variance homogeneity and independence of errors. In such situations, these models can be proposed as an alternative analysis.

Therefore, the objective was to propose the mixed generalized linear model and the mixed linear model for the analysis of an experiment in rumen microbiology.

Materials and Methods

For this research, data from an experiment carried out in the Department of Biophysiological Sciences of the Institute of Animal Science were used. The study aimed to evaluate the effect of different varieties of Moringa oleifera and of Cynodon nlemfuensis (star grass) on the ruminal microbial population, for which the variables total bacteria and isovaleric acid were measured. The experiment followed a completely randomized design with a 6 x 3 factorial arrangement. The factors were the six grasses (treatments) and the three hours, with six repetitions each. Measurements were not performed on the same experimental unit. The statistical models used were the following:

Mixed generalized linear model:

$$E(y) = \mu = g^{-1}(X\beta)$$

Where:

E(y)	- expected value of the response variable (total bacteria count and isovaleric acid)
Xβ	- linear predictor (linear combination of the unknown parameters β)
g	- link function, which relates the mean μ to the linear predictor; the response is assumed to follow a distribution of the exponential family.

Mixed linear model:

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$$

Where:

y_ijk	- response variable for the k-th repetition of the i-th grass at the j-th hour (k = 1, ..., 6)
μ	- general mean for all observations
α_i	- fixed effect of the i-th grass (i = 1, ..., 6)
β_j	- fixed effect of the j-th hour (j = 1, ..., 3)
(α β)_ij	- fixed effect of the interaction between the i-th grass and the j-th hour (18 combinations)
e_ijk	- random error associated with each observation
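The index ranges of the model can be checked with a short enumeration of the design. The sketch below is only an illustration in Python (the article itself used SAS); the grass names are taken from the result tables, and the variable names are assumptions made for this example.

```python
from itertools import product

# Grass names as they appear in the article's result tables.
GRASSES = ["Star grass", "Superganius", "Tunera", "Camerún", "Paraguaya", "Planin"]
HOURS = [1, 2, 3]          # j = 1, ..., 3
REPETITIONS = range(1, 7)  # k = 1, ..., 6


def build_design():
    """Return one (grass, hour, repetition) tuple per experimental unit."""
    return list(product(GRASSES, HOURS, REPETITIONS))


design = build_design()
# Each distinct (grass, hour) pair is one treatment of the 6 x 3 factorial.
treatments = sorted({(grass, hour) for grass, hour, _ in design})
print(len(treatments), len(design))  # prints: 18 108
```

The 18 grass-by-hour combinations correspond to the interaction term (α β)_ij, and the 108 units to the observations y_ijk.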

The theoretical assumptions of the analysis of variance were verified for the original variables. For variance homogeneity of treatments, the ^{Levene (1960)} test was used. Normality of errors was evaluated using the ^{Shapiro and Wilk (1965)} test. The variable total bacteria did not comply with either assumption, and transformation did not improve compliance. The original isovaleric acid variable met both assumptions, so data transformation was not necessary.

For the variable that did not meet the theoretical assumptions of the analysis of variance, the mixed generalized linear model was applied as an alternative analysis, using the GLIMMIX procedure. When the assumptions were fulfilled, the mixed linear model was used, with the MIXED procedure, both from SAS. In the statistical analyses, treatments, hours and the treatment x hour interaction were considered as fixed effects. Repetition nested within hour was considered as a random effect. For the total bacteria variable, normal, Poisson, lognormal, and gamma distributions were tested; the gamma distribution with log link function gave the best fit.
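As a brief illustration of what a gamma distribution with log link implies, the standard textbook form is written below. It is not taken from the article, and the dispersion symbol φ is an assumption of this sketch.

```latex
\log(\mu) = X\beta
\quad\Longrightarrow\quad
\mu = \exp(X\beta),
\qquad
\operatorname{Var}(y) = \phi\,\mu^{2}
```

Under this form the variance grows with the mean, which is one reason a gamma model can accommodate positive responses whose variances are not homogeneous across treatments.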

Toeplitz (Toep), variance components (VC), compound symmetry (CS), first-order autoregressive (AR(1)) and unstructured (UN) variance-covariance structures were tested. To select the one with the best fit to the data, the Akaike (AIC), corrected Akaike (AICC) and Bayesian (BIC) information criteria were used, and the structure with the smallest values was chosen. For mean comparison, the fixed range test was used (^{Kramer 1956}). Data were analyzed with the ^{SAS (2013)} statistical package, version 9.3.
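For readers unfamiliar with these criteria, the following Python sketch computes them from a maximized log-likelihood using the common textbook definitions. It is an illustration only: the function name and the input values are hypothetical, and the exact sample size and parameter count that SAS uses in each procedure may differ from this simplified form.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Textbook AIC, AICC and BIC from a maximized log-likelihood."""
    aic = -2.0 * log_lik + 2.0 * n_params
    # Small-sample correction added to AIC.
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}


# Hypothetical values, for illustration only (not from the article).
example = information_criteria(log_lik=-120.0, n_params=3, n_obs=108)
print(example)
```

All three criteria penalize the number of parameters, so among structures fitted to the same data the smallest value indicates the preferred balance between fit and complexity.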

Results and Discussion

Table 1 shows the analysis of the theoretical assumptions of normality of errors and variance homogeneity for the analyzed variables. For total bacteria, probability values in both tests were lower than 0.05, so these assumptions were not fulfilled. For isovaleric acid, both values were higher than 0.05, which shows the fulfillment of the basic hypotheses that support the analysis of variance.

Table 1 Fulfillment of ANAVA theoretical assumptions, for total bacteria and isovaleric acid variables

Variables	ANAVA theoretical assumptions	Statistical tests	P Value
Total bacteria, 10¹¹CFU/mL	Variance homogeneity	Levene	0.0266
Total bacteria, 10¹¹CFU/mL	Normality of errors	Shapiro-Wilk	0.0303
Isovaleric acid, mmol/L	Variance homogeneity	Levene	0.3513
Isovaleric acid, mmol/L	Normality of errors	Shapiro-Wilk	0.2033

CFU: colony forming units
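The decision rule described in Materials and Methods can be sketched with the P values of Table 1. The code is a minimal Python illustration; the function name and the 0.05 threshold as a parameter are assumptions of this example, not part of the original SAS analysis.

```python
ALPHA = 0.05

# P values reported in Table 1.
TABLE_1 = {
    "Total bacteria": {"Levene": 0.0266, "Shapiro-Wilk": 0.0303},
    "Isovaleric acid": {"Levene": 0.3513, "Shapiro-Wilk": 0.2033},
}


def choose_procedure(p_values, alpha=ALPHA):
    """Return 'MIXED' if every assumption test passes, else 'GLIMMIX'."""
    assumptions_hold = all(p > alpha for p in p_values.values())
    return "MIXED" if assumptions_hold else "GLIMMIX"


choices = {name: choose_procedure(p) for name, p in TABLE_1.items()}
print(choices)
```

With the values of Table 1, total bacteria is assigned to GLIMMIX and isovaleric acid to MIXED, which matches the procedures used in the study.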

^{Steel and Torrie (1996)} and ^{Peña (1994)} point out that departures from normality of errors have little influence on ANAVA when comparing means, since this technique is robust to such deviations. However, they argue that lack of normality can affect other assumptions, such as variance homogeneity, especially when the numbers of observations per group are very different. In addition, when variance components are analyzed, lack of normality can affect the analysis result.

According to ^{Gutiérrez and de la Vara (2012)}, variance homogeneity is an assumption that relates the residuals of the treatments and offers an overview of the possible equality between them. It can be analyzed with the Levene, Bartlett and Hartley tests, among others. However, the Levene test is the most robust in the absence of normality.

When analyzing the variables under study, it was observed that total bacteria did not meet the assumption of variance homogeneity of residuals. ^{Peña et al. (2015)} state that, given the nature of this type of variable, the use of classical statistical methods is not recommended because, in some cases, the homogeneity assumption is not met.

The fulfillment of the theoretical assumptions of classical statistical methods must be verified before starting the statistical analysis in this type of research, since the appropriate statistical method is selected according to the results. The use of mixed models also avoids inconveniences that may affect the expected results: mixed generalized linear models do not require normality or variance homogeneity of the response, so the failure of these assumptions is no longer a problem for data analysis.

Table 2 shows the analysis of variance-covariance structures carried out to select the best fit model, for which the information criteria were considered. For the total bacteria variable, the lowest values were obtained with variance components (VC), and for isovaleric acid, with the first-order autoregressive structure (AR(1)). The compound symmetry (CS) and unstructured (UN) structures did not achieve convergence for either variable, and neither did the Toeplitz structure for isovaleric acid, so no results are reported for them. This selection agrees with ^{Gómez (2019)}, who states that the structure with the best fit to the data is the one with the lowest values in the information criteria.

Table 2 Variance-covariance structure for total bacteria and isovaleric acid

Variables	Covariance structures	AIC	AICC	BIC
Total bacteria, 10¹¹ CFU/mL	Toep	775.93	815.11	807.98
	VC	742.77	752.77	760.58
	CS	-	-	-
	AR(1)	744.77	755.90	763.47
	UN	-	-	-
Isovaleric acid, mmol/L	Toep	-	-	-
	VC	250.50	260.20	268.30
	CS	-	-	-
	AR(1)	249.10	259.80	267.80
	UN	-	-	-

CFU: colony forming units
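The selection in Table 2 can be reproduced by taking, for each variable, the structure with the smallest value on every criterion. The following Python sketch uses only the values reported for the structures that converged; the function name and data layout are assumptions of this example.

```python
# (AIC, AICC, BIC) for the structures that converged, copied from Table 2.
TABLE_2 = {
    "Total bacteria": {
        "Toep": (775.93, 815.11, 807.98),
        "VC": (742.77, 752.77, 760.58),
        "AR(1)": (744.77, 755.90, 763.47),
    },
    "Isovaleric acid": {
        "VC": (250.50, 260.20, 268.30),
        "AR(1)": (249.10, 259.80, 267.80),
    },
}


def best_structure(criteria_by_structure):
    """Return the structure that is smallest on all three criteria, or None if they disagree."""
    winners = set()
    for idx in range(3):  # 0 = AIC, 1 = AICC, 2 = BIC
        winners.add(min(criteria_by_structure, key=lambda s: criteria_by_structure[s][idx]))
    return winners.pop() if len(winners) == 1 else None


best = {variable: best_structure(values) for variable, values in TABLE_2.items()}
print(best)
```

For both variables the three criteria agree, giving VC for total bacteria and AR(1) for isovaleric acid. Note that for isovaleric acid the differences between VC and AR(1) are small.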

^{Valdivieso (2013)} states that, when modeling covariance structures, the sample variances and covariances of the observed variables are used to estimate the model parameters and their errors. ^{Liscano and Ortiz (2017)} report that, when a structure that fits the data is suspected, its use leads to more efficient inference and estimation.

The analysis of variance results show that the mean square of the error was lower when mixed procedures were used (table 3). This could be because, when nested effects are included in the analysis, treatment variability decreases and better estimates are obtained. ^{Hernández et al. (2003)} point out that, when data have a nested structure and are grouped into experimental units of different order, each with specific properties according to the grouping level considered, this effect must be removed so that it does not affect the estimation of results.

Table 3 Mean square of the error and type I error probability for the interaction in both analyses

Variables	Statistical analysis	Mean square of the error	Type I error probability (P value)
Total bacteria, 10¹¹ CFU/mL	ANAVA	0.3712	<0.0001
Total bacteria, 10¹¹ CFU/mL	GLIMMIX	0.2719	<0.0001
Isovaleric acid, mmol/L	ANAVA	0.4951	0.4046
Isovaleric acid, mmol/L	MIXED	0.3824	0.2122

CFU: colony forming units
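To put the values of Table 3 in perspective, the relative reduction of the mean square of the error can be computed directly. This is a descriptive Python sketch with hypothetical names; since the GLIMMIX fit uses a gamma distribution with log link, its mean square is not necessarily on the same scale as that of the classical analysis, so the percentage for total bacteria should be read with caution.

```python
# Mean squares of the error copied from Table 3.
TABLE_3 = {
    "Total bacteria": {"classical": 0.3712, "mixed": 0.2719},
    "Isovaleric acid": {"classical": 0.4951, "mixed": 0.3824},
}


def relative_reduction(classical, mixed):
    """Percentage by which the mixed-model value is below the classical one."""
    return 100.0 * (classical - mixed) / classical


reductions = {name: round(relative_reduction(**mse), 1) for name, mse in TABLE_3.items()}
print(reductions)
```

With these figures, the reduction is about 26.8 % for total bacteria and about 22.8 % for isovaleric acid.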

Mixed generalized linear models and generalized additive mixed models are used for modelling nested data and spatial and temporal correlation structures in count data or binomial data. Additive mixed-effect models and mixed-effect models are useful for nested data (also called panel data or hierarchical data), repeated measurements, and temporally and spatially correlated data (^{Zuur et al. 2009}).

Table 4 shows the interaction results for the classical analysis of variance and the mixed generalized linear model. In both cases, the interaction was significant. However, the standard error was lower with the latter. The analysis also showed that, in some cases, the mixed generalized linear model was more conservative in finding similar groups.

Table 4 Results of the statistical analysis with both methods, for total bacteria variable

Variable	Statistical analysis	Treatment	Hour 1	Hour 2	Hour 3	SE and Signif.
Total viable bacteria, 10¹¹ CFU/mL	ANAVA	Star grass	2.80^abcde (18.71)	2.29^abcdef (11.71)	1.18^f (4.71)	±0.31 P<0.0001
		Superganius	1.96^bcdef (8.04)	1.70^cdef (5.54)	2.49^abcdef (16.54)
		Tunera	3.04^abcd (26.21)	2.57^abcdef (16.71)	2.22^abcdef (10.04)
		Camerún	3.64^a (43.21)	3.17^abc (24.71)	1.46^ef (7.04)
		Paraguaya	2.51^abcdef (13.04)	3.41^ab (31.71)	1.59^ef (7.21)
		Planin	2.59^abcdef (17.21)	3.09^abcd (23.21)	2.84^abcde (19.71)
	GLIMMIX	Star grass	2.93^abcde (18.71)	2.43^bcdef (11.71)	1.55^f (4.71)	±0.24 P<0.0001
		Superganius	2.08^cdef (8.04)	1.71^ef (5.54)	2.81^abcde (16.55)
		Tunera	3.27^abc (26.20)	2.82^abcde (16.33)	2.31^bcdef (10.04)
		Camerún	3.77^a (43.23)	3.21^abc (24.71)	1.95^def (7.03)
		Paraguaya	2.57^abcdef (13.04)	3.46^ab (31.72)	1.98^def (7.21)
		Planin	2.85^abcde (17.21)	3.14^abcd (23.20)	2.98^abcd (19.72)

CFU: colony forming units

When comparing both models, some of the treatment means that correspond to the mixed generalized linear model were slightly higher. This could be related to the link function, selected according to the distribution followed by the variable, because the means are estimated on the scale of this link function.
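The role of the link function can be illustrated with the mixed generalized linear model rows of Table 4. The article does not state what the values in parentheses are; the Python sketch below assumes they are means on the original scale and checks whether the first value of each cell is close to their natural logarithm, as a log link would imply. The selected cells and the names used are choices made for this example.

```python
import math

# (link-scale mean, value in parentheses) for some cells of the
# mixed generalized linear model rows of Table 4.
PAIRS = [
    (2.93, 18.71),  # Star grass, hour 1
    (1.55, 4.71),   # Star grass, hour 3
    (2.08, 8.04),   # Superganius, hour 1
    (3.77, 43.23),  # Camerún, hour 1
    (3.46, 31.72),  # Paraguaya, hour 2
]


def max_log_gap(pairs):
    """Largest |log(original) - link_scale| over the given pairs."""
    return max(abs(math.log(original) - link) for link, original in pairs)


gap = max_log_gap(PAIRS)
print(gap < 0.01)  # prints: True
```

For these cells the two values agree to within rounding, which is consistent with means estimated on the log scale and back-transformed with the inverse link. Not every cell of the table agrees this closely (for example, 2.43 against 11.71), so this reading remains an assumption.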

For the isovaleric acid variable, the interaction between the main effects was not significant. Therefore, the main effects were reported (tables 5 and 6). For the effect of varieties, the standard error of the mixed procedure was slightly lower than that of the classical analysis of variance, although neither analysis found significant differences among treatments (table 5).

Table 5 Results of the statistical analysis with both methods for isovaleric acid, according to treatments

Statistical analysis	Variable	Star grass	Superganius	Tunera	Camerún	Paraguaya	Planin	SE and Signif.
ANAVA	Isovaleric acid mmol/L	2.01	1.89	1.45	1.89	1.60	1.83	±0.17 P=0.0693
MIXED	Isovaleric acid mmol/L	2.01	1.89	1.45	1.89	1.60	1.83	±0.15 P=0.0825

Table 6 reports the effect of hours. In both methods, standard errors presented similar results, and no significant differences were found among times. Therefore, this type of analysis can be proposed for research related to rumen microbiology experiments, as long as an adequate statistical analysis is carried out, justifying the use of these methods.

Table 6 Results of the statistical analysis with both methods for isovaleric acid, according to hours

Statistical analysis	Variable	Hour 1	Hour 2	Hour 3	SE and Signif.
ANAVA	Isovaleric acid mmol/L	1.73	1.87	1.73	±0.12 P=0.6046
MIXED	Isovaleric acid mmol/L	1.73	1.87	1.73	±0.12 P=0.5469

According to ^{Gómez et al. (2012)} and ^{Dicovskiy and Pedroza (2017)}, mixed models are a proposal for advanced statistical modeling, which allow improving the quality of the analysis of fixed and random factors, by modeling random variability and error correlation. These models are very useful in the analysis of unbalanced data, or of data with some type of hierarchical or grouping structure.

From the results of this research, it is concluded that mixed models improve the accuracy and precision of analysis results. The smallest mean square of the error was obtained when using mixed procedures, and standard errors decreased with respect to the classical analysis of variance. From this perspective, these models are proposed for the analysis of variables related to counting experiments in the rumen microbial population.

References

Díaz, E.J., Bermúdez, D. & Pineda, W. 2017. Estimación de un modelo lineal generalizado mixto para datos de conteo con exceso de ceros. Diploma Thesis. Facultad de Estadística, Universidad Santo Tomás, Bogotá, Colombia

Dicovskiy, L.M. & Pedroza, M.E. 2017. "General and Mixed linear models in the characterization of the qualification variable, agroindustrial engineering, uni-north". Nexo Revista Científica, 30(2): 84-95, ISSN: 1995-9516

Gómez, S., Torres, V., García, Y. & Navarro, J.A. 2012. "Statistical procedures most used in the analysis of measures repeated in time in the agricultural sector". Cuban Journal of Agricultural Science, 46(1): 1-7, ISSN: 2079-3480

Gómez, S. 2019. Contribución estadística para el análisis de medidas repetidas en el tiempo en el sector agropecuario. PhD Thesis. Departamento de Biomatemática, Instituto de Ciencia Animal, Mayabeque, Cuba

Gutiérrez, H. & de la Vara, R. 2012. Análisis y diseño de experimentos. 3rd Ed. Ed. Mc Graw-Hill Latinoamericana Editores S.A de C.V, México D.F., México, ISBN: 978-607-15-0725-9

Hernández, M.V., Colmenares, F. & Martínez, R. 2003. "Modelos jerárquicos “por piezas” en el análisis de la relación entre discontinuidad conductual y discontinuidad en procesos subyacentes". Revista Anales de Psicología, 19(1): 159-171, ISSN: 0212-9728

Kramer, C.Y. 1956. "Extension of Multiple Range Tests to Group Means with Unequal Numbers of Replications". Biometrics, 12(3): 307-310, ISSN: 0006-341X, DOI: 10.2307/3001469

Levene, H. 1960. Robust tests for the equality of variance. Contributions to Probability and Statistics. 1st Ed. Ed. Stanford University Press, Palo Alto, California, USA, p. 278-292.

Liscano, J.M. & Ortiz, A.F. 2017. Modelos mixtos para datos composicionales: Una aplicación con resultados electorales en Colombia. Diploma Thesis. Facultad de Estadística, Universidad Santo Tomás, Bogotá, Colombia

Mandujano, S., Kéry, M. & Royle, J.A. 2016. "Applied hierarchical modeling in ecology: analysis of distribution, abundance and species richness in R and BUGS". Revista Mexicana de Biodiversidad, 88(2): 485-486, ISSN: 2007-8706, DOI: http://dx.doi.org/10.1016/j.rmb.2017.03.028

Monterubbianesi, M.G. 2017. Evaluación de alternativas para el análisis estadístico y de aspectos del diseño en ensayos de larga duración para estudios agronómicos. PhD Thesis. Departament de Producció Vegetal i Ciència Forestal, Universitat de Lleida, Catalunya, España, p. 193

Nelder, J.A. & Wedderburn, R.W.M. 1972. "Generalized linear models". Journal of the Royal Statistical Society: Series A (General), 135(3): 370-384, ISSN: 1467-985X, DOI: https://doi.org/10.2307/2344614

Peña, S. 1994. Estadística. Modelos y métodos: 2. Modelos lineales y series temporales. 4th Ed. Ed. Alianza, S.A., Madrid, España, p. 745, ISBN: 84-206-8110-5

Peña, J.A., Rosales, Y. & Giampaolo, O. 2015. "Estudio del crecimiento bacteriano. Enfoque de análisis de datos con medidas repetidas". Revista de la Facultad de Farmacia, 57(2): 8-17, ISSN: 2244-8845

SAS Institute Inc. 2013. Statistical Analysis Software SAS/STAT®, version 9.1.3, Cary, N.C., USA, Available: <http://www.sas.com/en_us/software/analytics/stat.html#>.

Shapiro, S.S. & Wilk, M.B. 1965. "An analysis of variance test for normality (complete samples)". Biometrika, 52(2): 591-611, ISSN: 1464-3510, DOI: http://dx.doi.org/10.2307/2333709

Steel, R.G. & Torrie, J.H. 1996. Bioestadística: principios y procedimientos. 2nd Ed. Ed. McGraw-Hill Interamericana SA., México D.F., México, p. 622, ISBN: 968-451-495-6

Valdivieso, C.E. 2013. "Efecto de los métodos de estimación en las modelaciones de estructuras de covarianzas sobre un modelo estructural de evaluación del servicio de clases". Revista Comunicaciones en Estadística, 6(1): 21-43, ISSN: 2027-3355, DOI: https://doi.org/10.15332/s2027-3355.2018.0001.03

Wang, T., He, P., Ahn, K.W., Wang, X., Ghosh, S. & Laud, P. 2015. "A re-formulation of generalized linear mixed models to fit family data in genetic association studies". Frontiers in Genetics, 6: 120, ISSN: 1664-8021, DOI: https://doi.org/10.3389/fgene.2015.00120

Zuur, A., Ieno, E., Walker, N., Saveliev, A. & Smith, G. 2009. Mixed Effects Models and Extensions in Ecology with R. Ed. Springer Science & Business Media, New York, USA, ISBN: 978-0-387-87457-9, DOI: https://doi.org/10.1007/978-0-387-87458-6

Received: June 12, 2019; Accepted: January 06, 2020

^*Email: mvillafranca@ica.co.cu
