thr3ads.net - R help - [R] Imputing missing values using "LSmeans" (i.e., population marginal means)

Jenn Barrett

2012-Apr-03 05:23 UTC

[R] Imputing missing values using "LSmeans" (i.e., population marginal means) - advice in R?

Hi folks,

I have a dataset that consists of counts over a ~30 year period at multiple
(>200) sites. Only one count is conducted at each site in each year; however,
not all sites are surveyed in all years. I need to impute the missing values
because I need an estimate of the total population size (i.e., sum of counts
across all sites) in each year as input to another model.

> head(newdat,40)
   SITE YEAR COUNT
1     1 1975 12620
2     1 1976 13499
3     1 1977 45575
4     1 1978 21919
5     1 1979 33423
...
37    2 1975 40000
38    2 1978 40322
39    2 1979 70000
40    2 1980 16244


A statistician suggested that I use LSmeans to do this; however, I do not have
SAS, nor do I know much about it. I have spent DAYS reading about these
"LSmeans" and, while (I think) I understand what they are, I have absolutely
no idea how to a) calculate them in R or b) use them to impute my missing
values. Again, I've searched the mailing lists, the internet and the
literature and have not found any documentation on how to do this - I'm lost.

I've looked at popMeans, but have no clue how to use this with predict() -
if this is even the route to go. Any advice would be much appreciated. Note that
YEAR will be treated as a factor and not a linear variable (i.e., the
relationship between COUNT and YEAR is not linear - rather there are highs and
lows about every 10 or so years).
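
One way this could be approached in base R, offered as a sketch and not a definitive answer: fit an additive model with SITE and YEAR both as factors, then call predict() on the site-year combinations that were not surveyed and sum by year. The toy data frame below is invented (only the column names SITE, YEAR and COUNT come from the post), and the plain linear model on the raw counts is an assumption; count data may be better served by a log scale or a Poisson GLM, and nothing stops a linear model from predicting negative counts.

```r
# Toy data, invented for illustration: 3 sites x 4 years, two cells not surveyed
# (site 2 in 1977 and site 3 in 1976).
newdat <- data.frame(
  SITE  = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  YEAR  = c(1975, 1976, 1977, 1978, 1975, 1976, 1978, 1975, 1977, 1978),
  COUNT = c(120, 135, 160, 150, 400, 410, 430, 60, 95, 90)
)
newdat$SITE <- factor(newdat$SITE)
newdat$YEAR <- factor(newdat$YEAR)   # YEAR as a factor, not a linear trend

# Additive model (an assumption): site effect + year effect.
fit <- lm(COUNT ~ SITE + YEAR, data = newdat)

# Full SITE x YEAR grid with the observed counts merged in (NA where missing).
grid <- expand.grid(SITE = levels(newdat$SITE), YEAR = levels(newdat$YEAR))
full <- merge(grid, newdat, all.x = TRUE)

# Fill only the missing cells with model predictions; observed counts are kept.
miss <- is.na(full$COUNT)
full$COUNT[miss] <- predict(fit, newdata = full[miss, ])

# Yearly totals across all sites.
totals <- aggregate(COUNT ~ YEAR, data = full, FUN = sum)
print(totals)
```

Years in which every site was surveyed keep their observed totals; only the unsurveyed cells are model-based, so the uncertainty of those predictions is not carried into the totals by this sketch.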

One thought I did have was to just set up a loop to calculate the least-squares
estimates as:

Yij = (IYi + JYj - Y)/[(I-1)(J-1)]
where  I = number of treatments and J = number of blocks (so I = sites and J =
years). I found this formula in some stats lecture handouts by UC Davis on
unbalanced data and LSMeans...but does it yield the same thing as using the
LSmeans estimates? Does it make any sense? Thoughts?
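
A small check of how the formula relates to a model fit, under stated assumptions: Yi, Yj and Y are read here as the totals of the observed counts for the site, for the year, and overall (the usual reading of the textbook missing-plot formula; the handout itself is not quoted in this thread). With exactly one missing cell in an otherwise complete table, that value should agree with the prediction from an additive lm fitted to the observed cells. The numbers are invented.

```r
# Toy complete 3-site x 4-year table (invented), with one cell then set to NA.
counts <- matrix(c(120, 135, 160, 150,
                   400, 410, 445, 430,
                    60,  70,  95,  90),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(SITE = 1:3, YEAR = 1975:1978))
counts[2, 3] <- NA   # pretend site 2 was not surveyed in 1977

I  <- nrow(counts)                     # number of sites ("treatments")
J  <- ncol(counts)                     # number of years ("blocks")
Yi <- sum(counts[2, ], na.rm = TRUE)   # observed total for site 2 (assumed meaning)
Yj <- sum(counts[, 3], na.rm = TRUE)   # observed total for 1977 (assumed meaning)
Y  <- sum(counts, na.rm = TRUE)        # grand total of observed counts

formula_est <- (I * Yi + J * Yj - Y) / ((I - 1) * (J - 1))

# The same cell predicted from an additive model fitted to the observed cells.
long <- as.data.frame(as.table(counts), responseName = "COUNT")
fit  <- lm(COUNT ~ SITE + YEAR, data = long)   # lm drops the NA row by default
lm_est <- unname(predict(fit, newdata = data.frame(SITE = "2", YEAR = "1977")))

print(all.equal(formula_est, lm_est))
```

With many missing cells, as in the real data, the formula would have to be applied iteratively, so fitting the model once and predicting is probably the simpler route.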

Many thanks in advance.

Jenn

Liaw, Andy

2012-Apr-05 15:40 UTC

Don't know how you searched, but perhaps this might help:

https://stat.ethz.ch/pipermail/r-help/2007-March/128064.html 
