thr3ads.net - R help - [R] Question about data used to fit the mixed model [Jul 2006]


Nantachai Kantanantha

2006-Jul-30 04:33 UTC

[R] Question about data used to fit the mixed model

Hi everyone,

I would like to ask a question regarding the data used to fit a mixed
model.

For the response variable used to fit a mixed model (either via "spm" or
"lme"), must we have several observations per subject
(i.e. Yij,  i = 1,...,M,  j = 1,...,ni), or can there be just one observation
per subject (i.e. Yi,  i = 1,...,M)? Since we have to specify the groups for
the random-effect components, if we have only one observation per subject, then
each group will have only one observation.

Thank you very much for your help.
Sincerely yours,

Nantachai

Doran, Harold

2006-Jul-30 12:27 UTC

You can have one observation per subject with multiple subjects nested in a
group. If you only have 1 observation per group, then there is no multilevel
structure to your data.

For example, 30 students in a classroom or 20 employees in an office division
are appropriate data structures. On the other hand, one observation per school
in each of 30 schools has no grouping structure.
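
A minimal sketch of the first kind of structure, using simulated data rather than anything from this thread. The variable names (classroom, score), the number of classrooms and the variances are all assumptions made for illustration; lme() is from the nlme package.

```r
library(nlme)

set.seed(42)
n_class <- 10   # hypothetical number of classrooms (groups)
n_stud  <- 30   # one observation per student, 30 students per classroom

# One row per student; classroom is the grouping factor
d <- data.frame(classroom = factor(rep(seq_len(n_class), each = n_stud)))

# Simulated classroom effects plus per-student noise (assumed variances)
class_effect <- rnorm(n_class, sd = 2)
d$score <- 50 + class_effect[as.integer(d$classroom)] + rnorm(nrow(d), sd = 5)

# Random intercept for classroom: several observations per group
fit <- lme(score ~ 1, random = ~ 1 | classroom, data = d)
VarCorr(fit)
```

Here each subject contributes a single row, but every group has 30 rows, so the between-classroom variance can be separated from the residual variance.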

If you look at the structure of some of the data sets in the mlmRev package, or
of other data sets in the nlme package, it might help you see exactly how the
data might be laid out.

Look at the egsingle or the star data in the mlmRev package to see examples of
longitudinal models where each student has multiple test scores. In egsingle,
each student is properly nested in a single school, whereas in the star data
students are crossed with teachers and schools.

Use str(star) to see the data structure. Or you can use head(star) to see the
first six rows and how the data are laid out.

I hope this helps,
Harold




Douglas Bates

2006-Jul-31 22:28 UTC

On 7/29/06, Nantachai Kantanantha <kantanantha at hotmail.com> wrote:
> Hi everyone,
>
> I would like to ask a question regarding to the data used to fit the mixed
> model.
>
> I wonder that, for the response variable data used to fit the mixed model
> (either via "spm" or "lme"), we must have several observations per subject
> (i.e. Yij,  i = 1,..,M,  j = 1,.., ni) or it can be just one observation per
> subject (i.e. Yi,  i = 1,...,M). Since we have to specify the groups for
> random effect components, if we have only one observation per subject, then
> each group will have only one observation.

As Harold Doran mentioned in his earlier reply, if you only have one
observation in each group you can't estimate the parameters in a mixed
model because the random effect for a group is completely confounded
with the per-observation noise term for the observation.  The model
would be of the form

y = X\beta + Z b + \epsilon

for which you would estimate the variance of the components of b and
the variance of the components of \epsilon.  However, with only one
observation per group the number of components in b and in \epsilon
would be the same and, by suitably reordering the observations, the
matrix Z could be made to be an identity matrix.  Thus the model
reduces to

y = X\beta + (b + \epsilon)

and the elements of b are confounded with those of \epsilon.
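
The confounding can be checked numerically with a small base-R sketch. Everything here is an illustrative assumption (simulated data, the helper name loglik, the chosen variances): with Z equal to the identity, the marginal covariance of y is (sigma_b^2 + sigma^2) I, so the Gaussian log-likelihood depends only on the sum of the two variances.

```r
set.seed(1)
M <- 30                      # number of groups, one observation each
x <- rnorm(M)
X <- cbind(1, x)             # fixed-effects design matrix
Z <- diag(M)                 # one observation per group => Z is the identity
y <- 2 + 0.5 * x + rnorm(M) + rnorm(M)   # random effect + noise, simulated

# Marginal Gaussian log-likelihood of y ~ N(X beta, sigma_b2 * Z Z' + sigma2 * I)
loglik <- function(beta, sigma_b2, sigma2) {
  V <- sigma_b2 * tcrossprod(Z) + sigma2 * diag(M)
  r <- y - X %*% beta
  logdet <- determinant(V, logarithm = TRUE)$modulus
  as.numeric(-0.5 * (M * log(2 * pi) + logdet + crossprod(r, solve(V, r))))
}

beta <- c(2, 0.5)
ll_a <- loglik(beta, sigma_b2 = 1.5, sigma2 = 0.5)
ll_b <- loglik(beta, sigma_b2 = 0.5, sigma2 = 1.5)
all.equal(ll_a, ll_b)        # same total variance, same likelihood
```

Any two splits of the same total variance give the same likelihood, so the data cannot tell them apart.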

A different version of this question is to ask whether some of the
groups can have only a single observation while others have more than
one observation.  The answer to that is a qualified "yes".

An example of data with different numbers of observations per group is
the star data that Harold mentioned.  The "student" identifier in this
data set is named "id".  If we table the number of observations per
student then table that result we get a table of the number of
students with 1, 2, 3 or 4 observations.
> data("star", package = 'mlmRev')
> table(table(star$id))

   1    2    3    4
4314 2455 1744 3085
> length(unique(star$id))
[1] 11598
> 4314/11598
[1] 0.3719607

This shows that more than a third of the students have data from only
a single year.
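
The "table of a table" idiom used above can be tried on a toy vector without installing mlmRev. The id values below are made up purely for illustration.

```r
# Hypothetical subject identifiers, one entry per observation
id <- c("a", "b", "b", "c", "c", "c", "d")

obs_per_id <- table(id)      # number of observations for each subject
table(obs_per_id)            # number of subjects with 1, 2, 3 observations

# Proportion of subjects with only a single observation
mean(obs_per_id == 1)
```

The inner table() counts rows per subject; the outer table() counts how many subjects share each count.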

It is possible to include such students in a mixed model with a random
effect for student.  It is even possible to include such students in a
mixed model with a random intercept and a random slope with respect to
time for student.  However, such students contribute very little
information to the model fit and the "estimates" (actually
"predictors") of the random effects for such students are artificially
small because they are confounded with the per-observation noise term.
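
One way to see the size of this effect is a sketch for the special case of a random-intercept model with known mean and known variances (assumptions made here for illustration, not something stated in the thread). In that case the predictor for a group with n observations is the group's mean deviation multiplied by sigma_b^2 / (sigma_b^2 + sigma^2 / n).

```r
sigma_b2 <- 1   # assumed between-group variance
sigma2   <- 1   # assumed per-observation noise variance
n <- 1:4        # observations per student, as in the star data

# Shrinkage factor applied to each group's mean deviation
shrink <- sigma_b2 / (sigma_b2 + sigma2 / n)
round(shrink, 3)
```

Under these assumed variances the factor is smallest for n = 1 and grows with n, so the predictors for single-observation students are pulled hardest toward zero.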

So while it can be attractive when designing an experiment or
planning an observational study to have many groups and few
observations per group, such experiments or studies provide very
sparse information.  Using a mixed model on such data doesn't
magically add information to the data.  Mixed models are statistical
models, not magic.
