Guillaume Souchay
2014-Dec-18 13:49 UTC
[R] Fitting Structural Equation Model with sem package and Summary issues
Hi all, I am trying to analyse bird data to investigate carry-over effect using structural equation model. I failed to run properly a big model with several latent variables with both L -> M block and M -> L block. Rather than trying again and again with the huge model, I am now looking to a subset of the model. Due to previous error message (singularity in the matrix), I scaled all the variables. Here is a subset of the data:> dataE[1:15,]Fledgling_date_t Total_Output_t Breed_nb.clutch_t.1 Breed_Egg_t.1 Breed_Total_Output_t.1 1 1.09397971 1.19657515 0.4696909 -0.69784742 1.2558119 2 0.62564592 0.37786584 0.4696909 0.02046473 -0.1762543 3 1.51548013 1.19657515 -1.0568046 0.89840181 -1.8947338 4 0.15731212 1.60592981 -1.0568046 -1.49597204 -1.3219073 5 0.48514578 -0.44084348 0.4696909 -0.69784742 0.6829854 6 1.93698054 -0.03148882 0.4696909 0.02046473 0.9693987 7 -1.66918968 -0.85019813 -1.0568046 0.97821428 -0.4626676 8 0.01681198 0.78722049 0.4696909 -0.53822250 1.2558119 9 0.34464564 1.60592981 -1.0568046 -0.13916019 -1.8947338 10 1.23447985 1.19657515 -1.0568046 1.93596382 -0.7490808 11 -0.12368816 -0.85019813 0.4696909 1.93596382 0.1101589 12 -0.17052154 0.78722049 -1.0568046 -0.45841004 -0.1762543 13 -1.52868954 -0.44084348 -1.0568046 1.37727658 -0.4626676 14 0.15731212 -1.25955279 -1.0568046 1.61671397 -0.4626676 15 -0.17052154 -0.85019813 -1.0568046 0.97821428 -1.6083205 Library(sem) # the covariance matrix for scaled data S.covE <- readMoments(diag=T,names=c("Fledgling_date_t","Total_Output_t","Breed_nb.clutch_t.1","Breed_Egg_t.1","Breed_Total_Output_t.1")) 1.0000000 0.350170246 1.0000000 -0.075832501 -0.099929893 1.0000000 -0.15439341 -0.091334987 -0.131698418 1.0000000 -0.191457491 -0.227843749 0.510666663 -0.386711653 1.0000000 # specification of the model - I also provided a diagram of the model in the attached PDF. modelE <- specifyModel() EndBreed -> Fledgling_date_t, lambda1, NA EndBreed -> Total_Output_t, lambda1, NA Fledgling_date_t <-> Fledgling_date_t, delta1, NA Total_Output_t <-> Total_Output_t, delta2, NA Fledgling_date_t <-> Total_Output_t, theta1, NA EndBreed -> BreedSucc, gamma1, NA EndBreed <-> EndBreed, phi1, NA BreedSucc -> Breed_Egg_t.1, lamdae, NA BreedSucc -> Breed_Total_Output_t.1, lamdae, NA BreedSucc -> Breed_nb.clutch_t.1, lamdae, NA Breed_nb.clutch_t.1 <-> Breed_nb.clutch_t.1, eps1, NA Breed_Egg_t.1 <-> Breed_Egg_t.1, eps2, NA Breed_Total_Output_t.1 <-> Breed_Total_Output_t.1, eps3, NA Breed_nb.clutch_t.1 <-> Breed_Egg_t.1, psie12, NA Breed_Egg_t.1 <-> Breed_Total_Output_t.1, psie23, NA Breed_nb.clutch_t.1 <-> Breed_Total_Output_t.1, psie13, NA BreedSucc <-> BreedSucc, zetae, NA # estimation of the model semE <- sem(modelE,S.covE,N=39,debug=T) To this point, everything seemed fine, the parameter were estimated after 129 iterations with all data. However, the problem arised when I asked for a summary of the model:> summary(semE)Error in summary.objectiveML(semE) : coefficient covariances cannot be computed But the model seemed to work well :> semEModel Chisquare = 0.9876903 Df = 1 lambda1 delta1 delta2 theta1 gamma1 phi1 lamdae eps1 eps2 0.8251654 0.3302009 0.3418300 -0.3138143 0.4122545 0.9752364 -0.4671335 0.8020365 0.7857964 eps3 psie12 psie23 psie13 zetae 0.7461566 -0.3377820 -0.6207350 0.2847632 0.8828395 Iterations = 75> semE$convergence[1] TRUE I also tried with using SpecifyEquations() instead of SpecifyModel() : # specification of the model using specifyEquations modelEe <- specifyEquations() Fledgling_date_t = lambda1*EndBreed Total_Output_t = lambda1*EndBreed c(Fledgling_date_t,Total_Output_t) = theta1 Breed_nb.clutch_t.1 = lamdae*BreedSucc Breed_Egg_t.1 = lamdae*BreedSucc Breed_Total_Output_t.1 = lamdae*BreedSucc c(Breed_nb.clutch_t.1,Breed_Egg_t.1) = psi12 c(Breed_nb.clutch_t.1,Breed_Total_Output_t.1) = psi13 c(Breed_Egg_t.1,Breed_Total_Output_t.1) = psi23 BreedSucc = gamma1*EndBreed v(EndBreed) = phi1 v(BreedSucc) = zeta1 v(Fledgling_date_t) = delta1 v(Total_Output_t) = delta2 v(Breed_nb.clutch_t.1) = eps1 v(Breed_Egg_t.1) = eps2 v(Breed_Total_Output_t.1) = eps3 # estimation of the model semEe <- sem(modelEe,covE,N=39,debug=T)> semEeModel Chisquare = 0.9876903 Df = 1 lambda1 theta1 lamdae psi12 psi13 psi23 gamma1 phi1 zeta1 0.8220182 -0.3346606 0.5034442 -0.3550646 0.2674806 -0.6380177 -0.3694630 1.0135693 0.8326144 delta1 delta2 eps1 eps2 eps3 0.3093554 0.3209828 0.7847537 0.7685137 0.7288741 Iterations = 79> summary(semEe)Error in summary.objectiveML(semEe) : coefficient covariances cannot be computed I also tried to set one loading to 1 instead of setting equality among loadings, but the results were the same. Could it be possible that the low number of data (N=39 but no NA inside) may be the cause of the error? In the model, the df is 1, thus all the parameters should be identifiable. Hoping you will have enough information to help a bit. Thanks in advance. Cheers, Guillaume -- Guillaume SOUCHAY, Ph.D Post-doctoral fellow in population dynamics --- "There is no true model" Anderson & Burhnam 1999 --- ?
John Fox
2014-Dec-19 16:27 UTC
[R] Fitting Structural Equation Model with sem package and Summary issues
Dear Guillaume, Please see comments interspersed below:> -----Original Message----- > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of > Guillaume Souchay > Sent: Thursday, December 18, 2014 8:50 AM > To: r-help at r-project.org > Subject: [R] Fitting Structural Equation Model with sem package and > Summary issues > > Hi all, > > I am trying to analyse bird data to investigate carry-over effect > using structural equation model. > I failed to run properly a big model with several latent variables > with both L -> M block and M -> L block. > Rather than trying again and again with the huge model, I am now > looking to a subset of the model. > > Due to previous error message (singularity in the matrix), I scaled > all the variables. > Here is a subset of the data: > > dataE[1:15,] > Fledgling_date_t Total_Output_t Breed_nb.clutch_t.1 Breed_Egg_t.1 > Breed_Total_Output_t.1 > 1 1.09397971 1.19657515 0.4696909 -0.69784742 > 1.2558119 > 2 0.62564592 0.37786584 0.4696909 0.02046473 > -0.1762543 > 3 1.51548013 1.19657515 -1.0568046 0.89840181 > -1.8947338 > 4 0.15731212 1.60592981 -1.0568046 -1.49597204 > -1.3219073 > 5 0.48514578 -0.44084348 0.4696909 -0.69784742 > 0.6829854 > 6 1.93698054 -0.03148882 0.4696909 0.02046473 > 0.9693987 > 7 -1.66918968 -0.85019813 -1.0568046 0.97821428 > -0.4626676 > 8 0.01681198 0.78722049 0.4696909 -0.53822250 > 1.2558119 > 9 0.34464564 1.60592981 -1.0568046 -0.13916019 > -1.8947338 > 10 1.23447985 1.19657515 -1.0568046 1.93596382 > -0.7490808 > 11 -0.12368816 -0.85019813 0.4696909 1.93596382 > 0.1101589 > 12 -0.17052154 0.78722049 -1.0568046 -0.45841004 > -0.1762543 > 13 -1.52868954 -0.44084348 -1.0568046 1.37727658 > -0.4626676 > 14 0.15731212 -1.25955279 -1.0568046 1.61671397 > -0.4626676 > 15 -0.17052154 -0.85019813 -1.0568046 0.97821428 > -1.6083205 > > Library(sem) > # the covariance matrix for scaled data > S.covE <- > readMoments(diag=T,names=c("Fledgling_date_t","Total_Output_t","Breed_nb > .clutch_t.1","Breed_Egg_t.1","Breed_Total_Output_t.1")) > 1.0000000 > 0.350170246 1.0000000 > -0.075832501 -0.099929893 1.0000000 > -0.15439341 -0.091334987 -0.131698418 1.0000000 > -0.191457491 -0.227843749 0.510666663 -0.386711653 1.0000000 > > # specification of the model - I also provided a diagram of the model > in the attached PDF. > modelE <- specifyModel() > EndBreed -> Fledgling_date_t, lambda1, NA > EndBreed -> Total_Output_t, lambda1, NA > Fledgling_date_t <-> Fledgling_date_t, delta1, NA > Total_Output_t <-> Total_Output_t, delta2, NA > Fledgling_date_t <-> Total_Output_t, theta1, NA > EndBreed -> BreedSucc, gamma1, NA > EndBreed <-> EndBreed, phi1, NA > BreedSucc -> Breed_Egg_t.1, lamdae, NA > BreedSucc -> Breed_Total_Output_t.1, lamdae, NA > BreedSucc -> Breed_nb.clutch_t.1, lamdae, NA > Breed_nb.clutch_t.1 <-> Breed_nb.clutch_t.1, eps1, NA > Breed_Egg_t.1 <-> Breed_Egg_t.1, eps2, NA > Breed_Total_Output_t.1 <-> Breed_Total_Output_t.1, eps3, NA > Breed_nb.clutch_t.1 <-> Breed_Egg_t.1, psie12, NA > Breed_Egg_t.1 <-> Breed_Total_Output_t.1, psie23, NA > Breed_nb.clutch_t.1 <-> Breed_Total_Output_t.1, psie13, NA > BreedSucc <-> BreedSucc, zetae, NA > > # estimation of the model > semE <- sem(modelE,S.covE,N=39,debug=T) > > To this point, everything seemed fine, the parameter were estimated > after 129 iterations with all data. > However, the problem arised when I asked for a summary of the model: > > > summary(semE) > Error in summary.objectiveML(semE) : > coefficient covariances cannot be computed > > But the model seemed to work well : > > > semE > > Model Chisquare = 0.9876903 Df = 1 > > lambda1 delta1 delta2 theta1 gamma1 phi1 > lamdae eps1 eps2 > 0.8251654 0.3302009 0.3418300 -0.3138143 0.4122545 0.9752364 > -0.4671335 0.8020365 0.7857964 > eps3 psie12 psie23 psie13 zetae > 0.7461566 -0.3377820 -0.6207350 0.2847632 0.8828395 > > Iterations = 75 > > semE$convergence > [1] TRUEWhat's remarkable is that you got estimates at all. The model is underidentified because even with the equality constraints on the parameters there are no normalizing constraints setting scales for the latent variables. As well, allowing all measurement error variables to be correlated between (among) the indicators of each latent variable underidentifies the model.> > I also tried with using SpecifyEquations() instead of SpecifyModel() : > # specification of the model using specifyEquations > modelEe <- specifyEquations() > Fledgling_date_t = lambda1*EndBreed > Total_Output_t = lambda1*EndBreed > c(Fledgling_date_t,Total_Output_t) = theta1 > Breed_nb.clutch_t.1 = lamdae*BreedSucc > Breed_Egg_t.1 = lamdae*BreedSucc > Breed_Total_Output_t.1 = lamdae*BreedSucc > c(Breed_nb.clutch_t.1,Breed_Egg_t.1) = psi12 > c(Breed_nb.clutch_t.1,Breed_Total_Output_t.1) = psi13 > c(Breed_Egg_t.1,Breed_Total_Output_t.1) = psi23 > BreedSucc = gamma1*EndBreed > v(EndBreed) = phi1 > v(BreedSucc) = zeta1 > v(Fledgling_date_t) = delta1 > v(Total_Output_t) = delta2 > v(Breed_nb.clutch_t.1) = eps1 > v(Breed_Egg_t.1) = eps2 > v(Breed_Total_Output_t.1) = eps3 > > # estimation of the model > semEe <- sem(modelEe,covE,N=39,debug=T) > > > semEe > > Model Chisquare = 0.9876903 Df = 1 > > lambda1 theta1 lamdae psi12 psi13 psi23 > gamma1 phi1 zeta1 > 0.8220182 -0.3346606 0.5034442 -0.3550646 0.2674806 -0.6380177 > -0.3694630 1.0135693 0.8326144 > delta1 delta2 eps1 eps2 eps3 > 0.3093554 0.3209828 0.7847537 0.7685137 0.7288741 > > Iterations = 79 > > summary(semEe) > Error in summary.objectiveML(semEe) : > coefficient covariances cannot be computed >This is the same model. That you got identical chisquares but different parameter estimates points to underidentification. Again, I'm surprised that there was an apparent solution not that the information matrix isn't positive-definite.> I also tried to set one loading to 1 instead of setting equality among > loadings, but the results were the same.That addresses one of the sources of underidentification but not the other. Actually, to be consistent with your prior specification you should have set *all* of the factor loadings to 1, which would both constrain them equal and set a metric for the latent variables -- but the resulting model would still (I believe) be underidentified by virtue of the correlations among the measurement errors.> > Could it be possible that the low number of data (N=39 but no NA > inside) may be the cause of the error? > In the model, the df is 1, thus all the parameters should be > identifiable.No, the small N probably makes it inadvisable to fit a SEM to these data but the input covariance matrix isn't poorly conditioned. Also, positive df doesn't guarantee an identified model -- it's a necessary but not sufficient condition for identification -- and it's very easy to specify a model with positive df that's still underidentified. Indeed, you easily succeeded in doing that.> > Hoping you will have enough information to help a bit.I fiddled with your model a bit and produced the following (allowing different factor loadings for different indicators and using as a normalizing constraint that the variances of the latent variables are 1): -------------- snip -------------> modelEe.modified <- specifyEquations()1: Fledgling_date_t = lambda1*EndBreed 2: Total_Output_t = lambda2*EndBreed 3: Breed_nb.clutch_t.1 = lamdae1*BreedSucc 4: Breed_Egg_t.1 = lamdae2*BreedSucc 5: Breed_Total_Output_t.1 = lamdae3*BreedSucc 6: BreedSucc = gamma1*EndBreed 7: V(EndBreed) = 1 8: V(BreedSucc) = 1 9: Read 8 items NOTE: adding 5 variances to the model> semEe.modified <- sem(modelEe.modified, S.covE, N=39) > summary(semEe.modified)Model Chisquare = 1.998071 Df = 4 Pr(>Chisq) = 0.7361137 AIC = 23.99807 BIC = -12.65618 Normalized Residuals Min. 1st Qu. Median Mean 3rd Qu. Max. -1.14900 -0.36600 0.00000 -0.18790 0.08115 0.23380 R-square for Endogenous Variables Fledgling_date_t Total_Output_t BreedSucc Breed_nb.clutch_t.1 Breed_Egg_t.1 Breed_Total_Output_t.1 0.3331 0.3681 0.0554 0.0805 0.0559 2.8772 Parameter Estimates Estimate Std Error z value Pr(>|z|) lambda1 0.5771328 0.2495813 2.3124037 2.075545e-02 Fledgling_date_t <--- EndBreed lambda2 0.6067412 0.2570540 2.3603648 1.825697e-02 Total_Output_t <--- EndBreed lamdae1 0.2757486 0.2421902 1.1385623 2.548858e-01 Breed_nb.clutch_t.1 <--- BreedSucc lamdae2 -0.2297048 0.2136600 -1.0750952 2.823321e-01 Breed_Egg_t.1 <--- BreedSucc lamdae3 1.6485964 1.3517870 1.2195682 2.226286e-01 Breed_Total_Output_t.1 <--- BreedSucc gamma1 -0.2421404 0.2698327 -0.8973722 3.695203e-01 BreedSucc <--- EndBreed V[Fledgling_date_t] 0.6669177 0.2778114 2.4006126 1.636765e-02 Fledgling_date_t <--> Fledgling_date_t V[Total_Output_t] 0.6318651 0.2944414 2.1459791 3.187465e-02 Total_Output_t <--> Total_Output_t V[Breed_nb.clutch_t.1] 0.9195045 0.2423910 3.7934763 1.485528e-04 Breed_nb.clutch_t.1 <--> Breed_nb.clutch_t.1 V[Breed_Egg_t.1] 0.9441415 0.2306422 4.0935328 4.248499e-05 Breed_Egg_t.1 <--> Breed_Egg_t.1 V[Breed_Total_Output_t.1] -1.8772242 4.4379049 -0.4229979 6.722968e-01 Breed_Total_Output_t.1 <--> Breed_Total_Output_t.1 Iterations = 34 -------------- snip ------------- Of course, if you really *believe* that the various measurement errors are correlated then these estimates are biased. Some more comments: (1) You don't have to supply error-variances for variables; specifyEquations() and specifyModel() will do that automatically by default (as you can see in my input/output above). (2) If you have the original data set, as you do, it's better to use that rather than a moment matrix as input to sem(). Best, John ----------------------------------------------- John Fox, Professor McMaster University Hamilton, Ontario, Canada http://socserv.socsci.mcmaster.ca/jfox/> > Thanks in advance. > > Cheers, > > Guillaume > > -- > > Guillaume SOUCHAY, Ph.D > > Post-doctoral fellow in population dynamics > > --- > > "There is no true model" Anderson & Burhnam 1999 > > --- > > ? > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.