thr3ads.net - R help - [R] Nested structure data simulation [May 2019]


Boris Steipe

2019-May-19 13:26 UTC

[R] Nested structure data simulation

Fair enough - there are additional assumptions needed, which I make as follows:
  - each class has the same size
  - each teacher teaches the same number of classes
  - the number of boys and girls is random within a class
  - 60% of pupils are girls (just to illustrate that the split does not have to be equal)
  

To make the dependencies explicit, I define the counts per level and derive
everything else from them, so that they can't be inconsistent.

nS <- 10        # Schools
nTpS <- 5       # Teachers per School
nCpT <- 2       # Classes per teacher
nPpC <- 20      # Pupils per class
stopifnot(nS * nTpS * nCpT * nPpC == 2000)   # Validate: 2000 pupils in total


mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS*nCpT*nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS*nS), each = nCpT*nPpC)),
                    Class   = paste0("c", rep(1:(nCpT*nTpS*nS), each = nPpC)),
                    Gender  = sample(c("boy", "girl"),
                                     (nS*nTpS*nCpT*nPpC),
                                     prob = c(0.4, 0.6),
                                     replace = TRUE),
                    Mark    = numeric(nS*nTpS*nCpT*nPpC),
                    stringsAsFactors = FALSE)
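A quick way to confirm that mySim has the nested layout requested further down in the thread (each class with one teacher, each teacher in one school, rows 21 to 40 = c2/t1/s1, rows 201 to 220 = c11/t6/s2, and so on) is sketched below. It rebuilds the frame with the same parameters so that it runs on its own. The Pupil column name, the seed and the output file name mySim.txt are illustrative assumptions, not part of the code above.

```r
# Same design parameters as above
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
N <- nS * nTpS * nCpT * nPpC

set.seed(1)  # arbitrary seed, only for reproducibility
mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS * nCpT * nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS * nS), each = nCpT * nPpC)),
                    Class   = paste0("c", rep(1:(nCpT * nTpS * nS), each = nPpC)),
                    Gender  = sample(c("boy", "girl"), N, prob = c(0.4, 0.6), replace = TRUE),
                    Mark    = numeric(N),
                    stringsAsFactors = FALSE)

# Number of distinct values of `inner` within each level of `outer`
nDistinct <- function(inner, outer) tapply(inner, outer, function(x) length(unique(x)))

# Nesting: each class has exactly one teacher, each teacher exactly one school
stopifnot(all(nDistinct(mySim$Teacher, mySim$Class) == 1),
          all(nDistinct(mySim$School, mySim$Teacher) == 1))

# Balance: 20 pupils per class, 2 classes per teacher, 5 teachers per school
stopifnot(all(table(mySim$Class) == nPpC),
          all(nDistinct(mySim$Class, mySim$Teacher) == nCpT),
          all(nDistinct(mySim$Teacher, mySim$School) == nTpS))

# Pupil identifier p1 ... p2000 as the first column (column name is an assumption)
mySim <- data.frame(Pupil = paste0("p", seq_len(N)), mySim, stringsAsFactors = FALSE)

# Write a tab-separated .txt file (here into a temporary directory)
outFile <- file.path(tempdir(), "mySim.txt")
write.table(mySim, outFile, sep = "\t", quote = FALSE, row.names = FALSE)
head(mySim)
```

Because every label is derived from the same four counts with `rep(..., each = ...)`, changing one count (say, pupils per class) keeps all levels consistent automatically.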
                    

Then you fill mySim$Mark with values generated from your linear mixed model ...

mySim$Mark[i] <- simMarks(mySim[i, ])  # simMarks() stands for your own generator ... or something equivalent.
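One possible generator of that kind, sketched under stated assumptions: marks follow a random-intercept model with a fixed gender effect, one intercept per school and one per class (classes nested in schools), and are then rounded to the 0.5 grid and clipped to 1 ... 6. All parameter values below are made up for illustration, and the rounding and clipping mean the marks are only approximately Gaussian. The draw is vectorized, so no per-row loop is needed.

```r
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
nClass <- nS * nTpS * nCpT
N <- nClass * nPpC

set.seed(42)  # arbitrary seed
mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS * nCpT * nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS * nS), each = nCpT * nPpC)),
                    Class   = paste0("c", rep(1:nClass, each = nPpC)),
                    Gender  = sample(c("boy", "girl"), N, prob = c(0.4, 0.6), replace = TRUE),
                    Mark    = numeric(N),
                    stringsAsFactors = FALSE)

# Hypothetical "true" parameters (illustrative values only)
beta0    <- 3.5   # mean mark for boys
betaGirl <- 0.3   # fixed effect: girls vs boys
sdSchool <- 0.4   # SD of school random intercepts
sdClass  <- 0.5   # SD of class random intercepts (classes nested in schools)
sdResid  <- 0.8   # SD of pupil-level residuals

# Draw one intercept per school and per class, then look it up for every pupil
uSchool <- rnorm(nS, 0, sdSchool)[match(mySim$School, unique(mySim$School))]
uClass  <- rnorm(nClass, 0, sdClass)[match(mySim$Class, unique(mySim$Class))]

y <- beta0 + betaGirl * (mySim$Gender == "girl") + uSchool + uClass + rnorm(N, 0, sdResid)

# Round to the nearest 0.5 and keep marks between 1 and 6
mySim$Mark <- pmin(pmax(round(y * 2) / 2, 1), 6)
table(mySim$Mark)
```

Fitting `Mark ~ Gender + (1 | School/Class)` to such data should give estimates near these values, which is a useful check that the simulation and the model agree.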


All good?

Cheers,
Boris


> On 2019-05-19, at 08:05, varin sacha <varinsacha at yahoo.fr> wrote:
> 
> Many thanks to all of you for your responses.
> 
> So, I will try to be clearer with a larger example. The end of my mail is the most important part for understanding what I am trying to do. I am trying to simulate data to fit a linear mixed model (nested, not crossed). More precisely, I would love to get, at the end of the process, a table (.txt) with columns and rows. Column 1 and the rows will be the 2000 pupils, and the other columns the different variables: Column 2 = classes; Column 3 = teachers; Column 4 = schools; Column 5 = gender (boy or girl); Column 6 = mark in French
> 
> Pupils are nested in classes, and classes are nested in schools. The teachers are also part of the process.
> 
> I want to simulate a dataset with n=2000 pupils, 100 classes, 50 teachers and 10 schools.
> - Pupils 1 to 2000 (p1, p2, p3, p4, ..., p2000)
> - Classes 1 to 100 (c1, c2, c3, c4, ..., c100)
> - Teachers 1 to 50 (t1, t2, t3, t4, ..., t50)
> - Schools 1 to 10 (s1, s2, s3, s4, ..., s10)
> 
> The nested structure is as follows:
>
> -- School 1 with teachers 1 to 5 (t1, t2, t3, t4 and t5), classes 1 to 10 (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10) and pupils 1 to 200 (p1, p2, p3, p4, ..., p200).
>
> -- School 2 with teachers 6 to 10, classes 11 to 20 and pupils 201 to 400
>
> -- and so on
>
> The table (.txt) I would love to get at the end is the following:
>
>     Class  Teacher  School  gender  Mark
> 1   c1     t1       s1      boy     5
> 2   c1     t1       s1      boy     5.5
> 3   c1     t1       s1      girl    4.5
> 4   c1     t1       s1      girl    6
> 5   c1     t1       s1      boy     3.5
> 6   ...    ...      ...     ...     ...
> 
> The first 20 rows with c1, t1 and s1, gender (randomly selected) and mark (randomly selected) from 1 to 6
> The rows 21 to 40 with c2 with t1 with s1
> The rows 41 to 60 with c3 with t2 with s1
> The rows 61 to 80 with c4 with t2 with s1
> The rows 81 to 100 with c5 with t3 with s1
> The rows 101 to 120 with c6 with t3 with s1
> The rows 121 to 140 with c7 with t4 with s1
> The rows 141 to 160 with c8 with t4 with s1
> The rows 161 to 180 with c9 with t5 with s1
> The rows 181 to 200 with c10 with t5 with s1
> 
> The rows 201 to 220 with c11 with t6 with s2
> The rows 221 to 240 with c12 with t6 with s2
> 
> And so on...
> 
> Is it possible to do that? Or am I dreaming?
> 
> 
> On Sunday, 19 May 2019, 10:45:43 UTC+2, Linus Chen <linus.l.chen at gmail.com> wrote:
> 
> Dear varin sacha,
> 
> I think it will help us help you if you give a clearer description of
> what exactly you want.
>
> I assume the situation is that you know what data structure you
> want, but do not know how to create such a structure conveniently.
> And that is where others can help you.
> So, please, describe the wanted data structure more thoroughly,
> ideally with an example.
> 
> Thanks,
> Lei
> 
> On Sat, May 18, 2019 at 10:04 PM varin sacha via R-help
> <r-help at r-project.org> wrote:
>> 
>> Dear Boris,
>> 
>> Yes, top-down, no problem. Many thanks, but did you not forget "teacher" in your code? As a reminder, teacher has to be nested with classes. I mean the 50 pupils belonging to C1 must be with teacher 1 (T1), the 50 pupils belonging to C2 with T2, the 50 pupils belonging to C3 with T3, and so on.
>> 
>> Best,
>> 
>> 
>> On Saturday, 18 May 2019, 16:52:48 UTC+2, Boris Steipe <boris.steipe at utoronto.ca> wrote:
>> 
>> Can you build your data top-down?
>>
>> schools <- paste("s", 1:6, sep="")
>>
>> classes <- character()
>> for (school in schools) {
>>   classes <- c(classes, paste(school, paste("c", 1:5, sep=""), sep = "."))
>> }
>>
>> pupils <- character()
>> for (class in classes) {
>>   pupils <- c(pupils, paste(class, paste("p", 1:10, sep=""), sep = "."))
>> }
>>
>> B.
>>
>>> On 2019-05-18, at 09:57, varin sacha via R-help <r-help at r-project.org> wrote:
>>> 
>>> Dear R-Experts,
>>> 
>>> In a data simulation, I would like a balanced distribution with a nested structure for classroom and teacher (not for school). I mean 50 pupils belonging to C1, 50 other pupils belonging to C2, 50 other pupils belonging to C3, and so on. Then I want the 50 pupils belonging to C1 with T1, the 50 pupils belonging to C2 with T2, the 50 pupils belonging to C3 with T3, and so on. The schools don't have to be nested; I just want a balanced distribution, I mean 60 pupils in S1, 60 other pupils in S2, and so on.
>>> Here is a reproducible example.
>>> Many thanks for your help.
>>> 
>>> ##############
>>> set.seed(123)
>>> # Random generation of the columns
>>> pupils <- 1:300
>>> classroom <- sample(c("C1","C2","C3","C4","C5","C6"), 300, replace = TRUE)
>>> teacher <- sample(c("T1","T2","T3","T4","T5","T6"), 300, replace = TRUE)
>>> school <- sample(c("S1","S2","S3","S4","S5"), 300, replace = TRUE)
>>> ##############
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
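For the original 300-pupil request at the bottom of this quoted chain (six classes of 50, teacher k always with class k, five schools of 60 not nested in classes), a balanced assignment can be sketched by shuffling fixed label vectors instead of sampling every label independently: with `sample(..., replace = TRUE)` the group sizes vary at random and teachers are not tied to classes. This is one possible approach, not one proposed in the thread.

```r
set.seed(123)
nPupils <- 300
pupils  <- 1:nPupils

# Exactly 50 pupils per class: shuffle a vector holding 50 copies of each label
classroom <- sample(rep(paste0("C", 1:6), each = nPupils / 6))

# Teacher k always teaches class Ck, so the teacher follows from the class
teacher <- sub("^C", "T", classroom)

# Exactly 60 pupils per school, shuffled independently of the class
school <- sample(rep(paste0("S", 1:5), each = nPupils / 5))

dat <- data.frame(pupils, classroom, teacher, school, stringsAsFactors = FALSE)
table(dat$classroom, dat$teacher)   # diagonal: each class has a single teacher
table(dat$school)                   # 60 pupils in every school
```

Because schools are shuffled independently, a class will usually be spread over several schools, which matches the "not nested for school" requirement.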

varin sacha

2019-May-19 15:14 UTC


[R] Nested structure data simulation

Dear Boris,

Great!!!! But what about Mark in your R code? Don't we have to specify in the
R code that marks range from 1 to 6 (1; 1.5; 2; 2.5; 3; 3.5; 4; 4.5; 5; 5.5;
6)?

By the way, to fit a linear mixed model I use the lme4 package, and the lmer
function works with the variables as in this example:

library(lme4)
mm <- lmer(Mark ~ Gender + (1 | School/Class), data = Dataset)

With your R code, how should I write the lmer call to make it work?

Best,
S.
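With Boris's mySim, the formula can use its column names directly; only `data = Dataset` becomes `data = mySim`. A sketch follows, assuming lme4 is installed (the fit is skipped otherwise). The marks are simulated with made-up effect sizes purely so that there is variance to fit. Because the class labels c1 ... c100 are unique across schools, `(1 | School/Class)` here fits the same model as `(1 | School) + (1 | Class)`.

```r
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
nClass <- nS * nTpS * nCpT
N <- nClass * nPpC

set.seed(2019)  # arbitrary seed
mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS * nCpT * nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS * nS), each = nCpT * nPpC)),
                    Class   = paste0("c", rep(1:nClass, each = nPpC)),
                    Gender  = sample(c("boy", "girl"), N, prob = c(0.4, 0.6), replace = TRUE),
                    stringsAsFactors = FALSE)

# Illustrative marks: school and class intercepts plus a gender shift (made-up values)
y <- 3.5 + 0.3 * (mySim$Gender == "girl") +
     rnorm(nS, 0, 0.4)[match(mySim$School, unique(mySim$School))] +
     rnorm(nClass, 0, 0.5)[match(mySim$Class, unique(mySim$Class))] +
     rnorm(N, 0, 0.8)
mySim$Mark <- pmin(pmax(round(y * 2) / 2, 1), 6)

# Grouping variables as factors; names must match the columns exactly (case matters)
for (v in c("School", "Teacher", "Class", "Gender")) mySim[[v]] <- factor(mySim[[v]])

if (requireNamespace("lme4", quietly = TRUE)) {
  mm <- lme4::lmer(Mark ~ Gender + (1 | School/Class), data = mySim)
  print(lme4::fixef(mm))    # intercept and the Gendergirl effect
  print(lme4::VarCorr(mm))  # SDs for Class:School, School and the residual
}
```

If teachers should get their own intercept as well, `(1 | School/Teacher/Class)` is one way to write it, although with only two classes per teacher the teacher and class variances may be hard to separate.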

Linus Chen

2019-May-19 18:59 UTC


[R] Nested structure data simulation

Dear varin sacha

On Sun, May 19, 2019 at 5:14 PM varin sacha via R-help
<r-help at r-project.org> wrote:
> Dear Boris,
>
> Great!!!! But what about Mark in your R code? Don't we have to specify
> in the R code that marks range from 1 to 6 (1; 1.5; 2; 2.5; 3; 3.5; 4;
> 4.5; 5; 5.5; 6)?

I think Boris is just setting up a framework for you. It is up to you
to decide the actual values. :)
You may want to create a MyData object with the method Boris has
shown, but fill the Mark field with random numbers.

Cheers,
Lei
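Linus's suggestion of filling Mark with plain random numbers, restricted to the eleven allowed values 1, 1.5, ..., 6, could look like the sketch below (the seed and the name randomMarks are arbitrary). Marks drawn this way carry no school, class or gender effect, so a mixed model fitted to them should find variance components close to zero.

```r
set.seed(7)  # arbitrary seed
allowedMarks <- seq(1, 6, by = 0.5)   # 1, 1.5, 2, ..., 6: eleven possible marks
randomMarks  <- sample(allowedMarks, 2000, replace = TRUE)  # one mark per pupil
table(randomMarks)
```

In Boris's frame this would be `mySim$Mark <- sample(allowedMarks, nrow(mySim), replace = TRUE)`.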

Boris Steipe

2019-May-19 21:12 UTC


[R] Nested structure data simulation

My mental model for such a simulation is that you create data from a known
distribution, then use your model to check that you can recover the known
parameters from the data. Thus, how the marks are generated should reflect
whatever you assume influences them. Here is a toy model to illustrate this,
expanding on my code sample:


# a function to generate marks from parameters
rMarks <- function(n, m, s) {
  # a normal distribution limited to between 1 and 6, in 0.5 intervals, with
  # mean m and standard deviation s
  marks <- rnorm(n, m, s)
  marks <- round(marks * 2) / 2
  marks[marks < 1] <- 1
  marks[marks > 6] <- 6
  return(marks)
}

# Teachers in two categories: 70% of teachers (tNormal) grade everyone according to
# a marks distribution with m = 3.5 and sd = 1; the others grade girls with
# m = 4.5 and sd = 0.7 and boys with m = 3.0 and sd = 1.2

# define who are the "normal teachers"
x <- paste0("t", 1:(nS * nTpS))
tNormal <- sample(x, round(nS * nTpS * 0.7), replace = FALSE)

# this is rather pedestrian code, but as explicit as I can make it ...
for (i in 1:nrow(mySim)) {
  if (mySim$Teacher[i] %in% tNormal) {
    m <- 3.5
    s <- 1.0
  } else {
    if (mySim$Gender[i] == "girl") {
      m <- 4.5
      s <- 0.7
    } else {
      m <- 3.0
      s <- 1.2 
    }
  }
  mySim$Mark[i] <- rMarks(1, m, s)
}

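# Validate
table(mySim$Mark)
brks <- seq(0.75, 6.25, by = 0.5)   # one bin per possible mark, so both histograms align
hist(mySim$Mark[mySim$Teacher %in% tNormal],
     breaks = brks,
     col = "#0000BB44")
hist(mySim$Mark[ ! mySim$Teacher %in% tNormal],
     breaks = brks,
     add = TRUE,
     col = "#BB000044")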

Then the challenge is to recover the parameters from your analysis. 
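A sketch of that recovery step for this toy model, using only base R. Since `rnorm()` recycles vectors of means and SDs, rMarks() as written above already works on all pupils at once, so the loop can be replaced by a single call that is equivalent in distribution. The mean and SD of the marks are then compared per teacher type and gender. Because marks are rounded to 0.5 and clipped to 1 ... 6, the recovered values should be close to, but not exactly, the generating ones (3.5/1.0, 4.5/0.7, 3.0/1.2). The seed and the group labels are arbitrary.

```r
nS <- 10; nTpS <- 5; nCpT <- 2; nPpC <- 20
N <- nS * nTpS * nCpT * nPpC

set.seed(3)  # arbitrary seed
mySim <- data.frame(School  = paste0("s", rep(1:nS, each = nTpS * nCpT * nPpC)),
                    Teacher = paste0("t", rep(1:(nTpS * nS), each = nCpT * nPpC)),
                    Class   = paste0("c", rep(1:(nCpT * nTpS * nS), each = nPpC)),
                    Gender  = sample(c("boy", "girl"), N, prob = c(0.4, 0.6), replace = TRUE),
                    stringsAsFactors = FALSE)

# rMarks() as above; it accepts vectors of means m and SDs s
rMarks <- function(n, m, s) {
  marks <- rnorm(n, m, s)
  marks <- round(marks * 2) / 2
  marks[marks < 1] <- 1
  marks[marks > 6] <- 6
  return(marks)
}

tNormal  <- sample(paste0("t", 1:(nS * nTpS)), round(nS * nTpS * 0.7))
isNormal <- mySim$Teacher %in% tNormal
isGirl   <- mySim$Gender == "girl"

# Vectorized version of the if/else loop: one (m, s) pair per pupil
m <- ifelse(isNormal, 3.5, ifelse(isGirl, 4.5, 3.0))
s <- ifelse(isNormal, 1.0, ifelse(isGirl, 0.7, 1.2))
mySim$Mark <- rMarks(N, m, s)

# Recovery: empirical mean and SD of the marks in each generating group
group   <- ifelse(isNormal, "normal", ifelse(isGirl, "other:girl", "other:boy"))
grpMean <- tapply(mySim$Mark, group, mean)
grpSD   <- tapply(mySim$Mark, group, sd)
round(cbind(mean = grpMean, sd = grpSD), 2)
```

In a real analysis the group labels are unknown, which is exactly what the mixed model has to recover; this check only confirms that the generator behaves as intended.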


Cheers,
Boris


