Jenn Barrett
2012-Apr-03 05:23 UTC
[R] Imputing missing values using "LSmeans" (i.e., population marginal means) - advice in R?
Hi folks,

I have a dataset that consists of counts over a ~30-year period at multiple (>200) sites. Only one count is conducted at each site in each year; however, not all sites are surveyed in all years. I need to impute the missing values because I need an estimate of the total population size (i.e., the sum of counts across all sites) in each year as input to another model.

    > head(newdat, 40)
       SITE YEAR COUNT
    1     1 1975 12620
    2     1 1976 13499
    3     1 1977 45575
    4     1 1978 21919
    5     1 1979 33423
    ...
    37    2 1975 40000
    38    2 1978 40322
    39    2 1979 70000
    40    2 1980 16244

A statistician suggested that I use LSmeans to do this; however, I do not have SAS, nor do I know much about SAS. I have spent DAYS reading about these "LSmeans", and while (I think) I understand what they are, I have absolutely no idea a) how to calculate them in R and b) how to use them to impute my missing values in R. I've searched the mailing lists, the internet and the literature and have not found any documentation on how to do this - I'm lost.

I've looked at popMeans, but have no clue how to use it with predict() - if that is even the route to go. Any advice would be much appreciated. Note that YEAR will be treated as a factor and not as a linear variable (i.e., the relationship between COUNT and YEAR is not linear; rather, there are highs and lows about every 10 or so years).

One thought I did have was to set up a loop to calculate the least-squares estimate of each missing cell as:

    Y_ij = (I*Y_i. + J*Y_.j - Y..) / [(I-1)(J-1)]

where I = number of treatments and J = number of blocks (so I = sites and J = years), Y_i. is the observed total for treatment i, Y_.j is the observed total for block j, and Y.. is the observed grand total. I found this formula in some stats lecture handouts from UC Davis on unbalanced data and LSmeans... but does it yield the same thing as using the LSmeans estimates? Does it make any sense? Thoughts?

Many thanks in advance.
Jenn
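For a single missing cell, the formula above is the classic Yates missing-value estimate for a two-way layout with no interaction, and it should coincide with the least-squares prediction from the additive model COUNT ~ SITE + YEAR fitted to the observed cells only. The sketch below checks this on a small simulated table; the data, the position of the gap and all object names are hypothetical, not taken from the dataset above.

```r
# Sketch: compare the single-missing-value formula with an additive lm() fit.
# Toy data are simulated (hypothetical), not the real survey counts.
set.seed(1)
I <- 5   # number of "treatments" (sites)
J <- 4   # number of "blocks" (years)
d <- expand.grid(SITE = factor(1:I), YEAR = factor(1975 + 0:(J - 1)))
d$COUNT <- round(1000 + 50 * as.integer(d$SITE) + 200 * as.integer(d$YEAR) +
                 rnorm(nrow(d), sd = 30))

# Knock out exactly one cell: site 2 in 1977
miss <- d$SITE == "2" & d$YEAR == "1977"
obs  <- d[!miss, ]

# Totals over the observed cells only
Yi <- sum(obs$COUNT[obs$SITE == "2"])     # total for the site with the gap
Yj <- sum(obs$COUNT[obs$YEAR == "1977"])  # total for the year with the gap
Y  <- sum(obs$COUNT)                      # grand total
yates <- (I * Yi + J * Yj - Y) / ((I - 1) * (J - 1))

# Least-squares prediction from the additive two-way model on observed data
fit  <- lm(COUNT ~ SITE + YEAR, data = obs)
pred <- unname(predict(fit, newdata = d[miss, ]))

# The two estimates should agree up to floating-point error
print(c(yates = yates, lm = pred))
```

With many gaps, as in a 30-year, 200-site survey, a single pass of the formula per cell (each using totals that skip the other gaps) is generally not the joint least-squares solution; the formula would have to be iterated to convergence, or the model fitted directly with lm().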
Liaw, Andy
2012-Apr-05 15:40 UTC
[R] Imputing missing values using "LSmeans" (i.e., population marginal means) - advice in R?
Don't know how you searched, but perhaps this might help: https://stat.ethz.ch/pipermail/r-help/2007-March/128064.html
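The model-based route asked about in the original question (fit a model, then predict() the missing cells and sum by year) can be sketched in base R as below. Assumptions: the data frame has the SITE, YEAR and COUNT columns shown in the head() output, an additive model COUNT ~ SITE + YEAR with no site-by-year interaction is adequate, and the simulated counts, the helper key() and the other object names are hypothetical. The LS-means are computed by hand as the equal-weight average over sites of each year's predicted cell means; this does not use the popMeans interface.

```r
# Sketch: impute missing SITE x YEAR counts from an additive two-way model,
# then sum observed + imputed counts within each year.
# 'newdat' is simulated stand-in data (hypothetical), not the real survey.
set.seed(42)
sites <- 1:8
years <- 1975:1984
full <- expand.grid(SITE = sites, YEAR = years)
full$COUNT <- round(20000 + 3000 * full$SITE + 8000 * sin(full$YEAR / 1.6) +
                      rnorm(nrow(full), sd = 2000))
# Drop a quarter of the site-years so the design is unbalanced
newdat <- full[(full$SITE + full$YEAR) %% 4 != 0, ]

# Both SITE and YEAR enter as factors (no linear trend in YEAR assumed)
newdat$SITE <- factor(newdat$SITE, levels = sites)
newdat$YEAR <- factor(newdat$YEAR, levels = years)
fit <- lm(COUNT ~ SITE + YEAR, data = newdat)

# Complete grid: every site in every year, observed counts where they exist
grid <- expand.grid(SITE = factor(sites, levels = sites),
                    YEAR = factor(years, levels = years))
key <- function(d) paste(d$SITE, d$YEAR)   # hypothetical helper: "site year"
grid$COUNT <- newdat$COUNT[match(key(grid), key(newdat))]  # NA = not surveyed
grid$PRED  <- predict(fit, newdata = grid)

# Impute: keep observed counts, fill the gaps with model predictions
grid$FILLED <- ifelse(is.na(grid$COUNT), grid$PRED, grid$COUNT)
totals <- tapply(grid$FILLED, grid$YEAR, sum)

# LS-means (population marginal means) for YEAR: equal-weight average
# of the predicted cell means over all sites
lsm_year <- tapply(grid$PRED, grid$YEAR, mean)

print(round(cbind(total = totals, lsmean = lsm_year)))
```

Keeping the observed counts and filling only the gaps gives the imputed yearly totals the question asks for; length(sites) * lsm_year would instead replace every cell, observed or not, by its model prediction.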