thr3ads.net - R help - [R] problem applying the same function twice [Mar 2015]

Curtis Burkhalter

2015-Mar-10 21:35 UTC

[R] problem applying the same function twice

Thanks Sarah, one of my column names was missing a letter, so it was
throwing things off. It works super fast now and is exactly what I needed.
My actual data set has about 6 other ancillary response data columns.
Is there a way to combine the 'full' data set I just created with the
original, in case I need any of the other response variables? E.g.:

FULL:
site    year    sample
1       1       10
1       1       12
1       1       NA

Original:
site    year    sample    color    shape
1       1       10        blue     diamond
1       1       12        green    pyramid

Combined:
site    year    sample    color    shape
1       1       10        blue     diamond
1       1       12        green    pyramid
1       1       NA        NA       NA

Thanks

On Tue, Mar 10, 2015 at 3:12 PM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:
> Yeah, that's tiny:
>
> > fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
> > dim(fullout)
> [1] 14049     3
>
>
> Almost certainly the problem is that your expand.grid result doesn't
> have the same column names as your actual data file, so merge() is
> trying to make an enormous result. Note how when I made outgrid in the
> example I named the columns.
>
> Make sure that the names are identical!
>
>
> On Tue, Mar 10, 2015 at 4:57 PM, Curtis Burkhalter
> <curtisburkhalter at gmail.com> wrote:
> > Sarah,
> >
> > I have 669 sites and each site has 7 years of data, so if I'm thinking
> > correctly then there should be 4683 possible combinations of site x year.
> > For each year though I need 3 sampling periods so that there is something
> > like the following:
> >
> > site 1      year1      sample 1
> > site 1      year1      sample 2
> > site 1      year1      sample 3
> > site 2      year1      sample 1
> > site 2      year1      sample 2
> > site 2      year1      sample 3.....
> > site 669   year7      sample 1
> > site 669   year7     sample 2
> > site 669   year7     sample 3.
> >
> > I have my max memory allocation set to the amount of RAM (8GB) on my
> > laptop, but it still 'times out' due to memory problems.
> >
> > On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee <sarah.goslee at gmail.com>
> > wrote:
> >>
> >> You said your data only had 14000 rows, which really isn't many.
> >>
> >> How many possible combinations do you have, and how many do you need to
> >> add?
> >>
> >> On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
> >> <curtisburkhalter at gmail.com> wrote:
> >> > Sarah,
> >> >
> >> > This strategy works great for this small dataset, but when I attempt
> >> > your method with my data set I reach the maximum allowable memory
> >> > allocation and the operation just stalls and then stops completely
> >> > before it is finished.
> >> > Do you know of a way around this?
> >> >
> >> > Thanks
> >> >
> >> > On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee <sarah.goslee at gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I didn't work through your code, because it looked overly complicated.
> >> >> Here's a more general approach that does what you appear to want:
> >> >>
> >> >> # use dput() to provide reproducible data please!
> >> >> comAn <- structure(list(animals = c("bird", "bird", "bird", "bird",
> >> >> "bird", "bird", "dog", "dog", "dog", "dog", "dog", "dog", "cat", "cat",
> >> >> "cat", "cat"), animalYears = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
> >> >> 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass = c(29L, 48L, 36L,
> >> >> 20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L, 46L, 33L, 48L, 21L
> >> >> )), .Names = c("animals", "animalYears", "animalMass"), class =
> >> >> "data.frame", row.names = c("1",
> >> >> "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
> >> >> "14", "15", "16"))
> >> >>
> >> >>
> >> >> # add reps to comAn
> >> >> # assumes comAn is already sorted on animals, animalYears
> >> >> comAn$reps <- unlist(sapply(rle(do.call("paste",
> >> >> comAn[,1:2]))$lengths, seq_len))
> >> >>
> >> >> # create full set of combinations
> >> >> outgrid <- expand.grid(animals=unique(comAn$animals),
> >> >> animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
> >> >> stringsAsFactors=FALSE)
> >> >>
> >> >> # combine with comAn
> >> >> comAn.full <- merge(outgrid, comAn, all.x=TRUE)
> >> >>
> >> >> > comAn.full
> >> >>    animals animalYears reps animalMass
> >> >> 1     bird           1    1         29
> >> >> 2     bird           1    2         48
> >> >> 3     bird           1    3         36
> >> >> 4     bird           2    1         20
> >> >> 5     bird           2    2         34
> >> >> 6     bird           2    3         34
> >> >> 7      cat           1    1         46
> >> >> 8      cat           1    2         33
> >> >> 9      cat           1    3         48
> >> >> 10     cat           2    1         21
> >> >> 11     cat           2    2         NA
> >> >> 12     cat           2    3         NA
> >> >> 13     dog           1    1         21
> >> >> 14     dog           1    2         28
> >> >> 15     dog           1    3         25
> >> >> 16     dog           2    1         35
> >> >> 17     dog           2    2         18
> >> >> 18     dog           2    3         11
> >> >> >
> >> >>
> >> >> On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter
> >> >> <curtisburkhalter at gmail.com> wrote:
> >> >> > Hey everyone,
> >> >> >
> >> >> > I've written a function that adds NAs to a dataframe where data is
> >> >> > missing and it seems to work great if I only need to run it once, but if
> >> >> > I run it two times in a row I run into problems. I've created a workable
> >> >> > example to explain what I mean and why I would do this.
> >> >> >
> >> >> > In my dataframe there are areas where I need to add two rows of NAs (b/c
> >> >> > I need to have 3 animal x year combos and for cat in year 2 I only have
> >> >> > one) so I thought that I'd just run my code twice using the function in
> >> >> > the code below. Everything works great when I run it the first time, but
> >> >> > when I run it again it says that the value returned to the list 'x' is of
> >> >> > length 0. I don't understand why the function works the first time around
> >> >> > and adds an NA to the 'animalMass' column, but won't do it again. I've
> >> >> > used (print(str(dataframe)) to see if there is a change in class or type
> >> >> > when the function runs through the original dataframe and there is for
> >> >> > 'animalYears', but I just convert it back before rerunning the function
> >> >> > for second time.
> >> >> >
> >> >> > Any thoughts on this would be greatly appreciated b/c my actual data
> >> >> > dataframe I have to input into WinBUGS is 14000x12, so it's not a trivial
> >> >> > thing to just add in an NA here or there.
> >> >> >
> >> >> >>comAn
> >> >> >    animals animalYears animalMass
> >> >> > 1     bird           1         29
> >> >> > 2     bird           1         48
> >> >> > 3     bird           1         36
> >> >> > 4     bird           2         20
> >> >> > 5     bird           2         34
> >> >> > 6     bird           2         34
> >> >> > 7      dog           1         21
> >> >> > 8      dog           1         28
> >> >> > 9      dog           1         25
> >> >> > 10     dog           2         35
> >> >> > 11     dog           2         18
> >> >> > 12     dog           2         11
> >> >> > 13     cat           1         46
> >> >> > 14     cat           1         33
> >> >> > 15     cat           1         48
> >> >> > 16     cat           2         21
> >> >> >
> >> >> > So every animal has 3 measurements per year, except for the cat in year
> >> >> > two which has only 1. I run the code below and get:
> >> >> >
> >> >> > #combs defines the different combinations of
> >> >> > #animals and animalYears
> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> > #counts defines how long the different combinations are
> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> > #missing defines the combs that have length less than one and puts it in
> >> >> > #the data frame missing
> >> >> > missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])
> >> >> >
> >> >> > genRows<-function(dat){
> >> >> >         vals<-strsplit(dat[1],':')[[1]]
> >> >> >                 #not sure why dat[2] is being converted to a string
> >> >> >         newRows<-2-as.numeric(dat[2])
> >> >> >         newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >                           animalYears=rep(vals[2],newRows),
> >> >> >                           animalMass=rep(NA,newRows))
> >> >> >         return(newDf)
> >> >> >         }
> >> >> >
> >> >> >
> >> >> > x<-apply(missing,1,genRows)
> >> >> > comAn=rbind(comAn,
> >> >> >         do.call(rbind,x))
> >> >> >
> >> >> >> comAn
> >> >> >    animals animalYears animalMass
> >> >> > 1     bird           1         29
> >> >> > 2     bird           1         48
> >> >> > 3     bird           1         36
> >> >> > 4     bird           2         20
> >> >> > 5     bird           2         34
> >> >> > 6     bird           2         34
> >> >> > 7      dog           1         21
> >> >> > 8      dog           1         28
> >> >> > 9      dog           1         25
> >> >> > 10     dog           2         35
> >> >> > 11     dog           2         18
> >> >> > 12     dog           2         11
> >> >> > 13     cat           1         46
> >> >> > 14     cat           1         33
> >> >> > 15     cat           1         48
> >> >> > 16     cat           2         21
> >> >> > 17     cat           2       <NA>
> >> >> >
> >> >> > So far so good, but then I adjust the code so that it reads (**notice
> >> >> > the change in the specification in 'missing' to counts<3**):
> >> >> >
> >> >> > #combs defines the different combinations of
> >> >> > #animals and animalYears
> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> > #counts defines how long the different combinations are
> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> > #missing defines the combs that have length less than one and puts it in
> >> >> > #the data frame missing
> >> >> > missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])
> >> >> >
> >> >> > genRows<-function(dat){
> >> >> >         vals<-strsplit(dat[1],':')[[1]]
> >> >> >                 #not sure why dat[2] is being converted to a string
> >> >> >         newRows<-2-as.numeric(dat[2])
> >> >> >         newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >                           animalYears=rep(vals[2],newRows),
> >> >> >                           animalMass=rep(NA,newRows))
> >> >> >         return(newDf)
> >> >> >         }
> >> >> >
> >> >> >
> >> >> > x<-apply(missing,1,genRows)
> >> >> > comAn=rbind(comAn,
> >> >> >         do.call(rbind,x))
> >> >> >
> >> >> > The result for 'x' then reads:
> >> >> >
> >> >> >> x
> >> >> > [[1]]
> >> >> > [1] animals     animalYears animalMass
> >> >> > <0 rows> (or 0-length row.names)
> >> >> >
> >> >> > Any thoughts on why it might be doing this instead of adding an
> >> >> > additional row to get the result:
> >> >> >
> >> >> >> comAn
> >> >> >    animals animalYears animalMass
> >> >> > 1     bird           1         29
> >> >> > 2     bird           1         48
> >> >> > 3     bird           1         36
> >> >> > 4     bird           2         20
> >> >> > 5     bird           2         34
> >> >> > 6     bird           2         34
> >> >> > 7      dog           1         21
> >> >> > 8      dog           1         28
> >> >> > 9      dog           1         25
> >> >> > 10     dog           2         35
> >> >> > 11     dog           2         18
> >> >> > 12     dog           2         11
> >> >> > 13     cat           1         46
> >> >> > 14     cat           1         33
> >> >> > 15     cat           1         48
> >> >> > 16     cat           2         21
> >> >> > 17     cat           2       <NA>
> >> >> > 18     cat           2       <NA>
> >> >> >
> >> >> > Thanks
> >> >> > --
> >> >> > Curtis Burkhalter
> >> >
> >> >
>
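
The quoted question about the second run never gets a direct answer in the thread, so here is a minimal sketch of what appears to be going on. It assumes the only relevant part of genRows() is the line newRows<-2-as.numeric(dat[2]): the target of 2 is hard-coded, so once cat in year 2 already has two rows, the function asks for zero new rows. The helper rows_needed() and its target argument are hypothetical names used only for illustration.

```r
# Hypothetical helper: how many filler rows a group needs to reach `target` rows
rows_needed <- function(count, target) target - count

first_run  <- rows_needed(1, target = 2)  # one observed row: one filler row
second_run <- rows_needed(2, target = 2)  # already two rows: zero filler rows

# rep(x, 0) gives a zero-length vector, so the data frame has zero rows,
# which matches the "<0 rows>" printout in the question
empty <- data.frame(animals     = rep("cat", second_run),
                    animalYears = rep("2", second_run),
                    animalMass  = rep(NA, second_run))
nrow(empty)

# Raising the target along with the counts<3 filter would request one row
rows_needed(2, target = 3)
```

Changing the filter to counts<3 without changing the hard-coded 2 is therefore enough to explain the empty result, though the expand.grid() and merge() approach quoted above avoids the issue entirely.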


-- 
Curtis Burkhalter

https://sites.google.com/site/curtisburkhalter/


Sarah Goslee

2015-Mar-10 23:57 UTC

I think you're kind of missing the way this works:

the data frame created by expand.grid() should ONLY have site, year,
sample (with the exact names used in the data itself).
Then the merged data frame will have the full site,year,sample
combinations, along with ALL the data variables. Your animal example
only had one measured variable, but the same method will work with any
number.
Reading ?merge might help you understand.

Sarah
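
As a rough illustration of the point above, here is a small sketch using made-up data. The data frame obs and its color and shape columns are stand-ins for the real data set, and sample is assumed to be a within-year index (1 to 3) rather than a measured value. The grid holds only the key columns; merge() brings along every other column from the data.

```r
# Toy stand-in for the real data: two ancillary response columns
obs <- data.frame(
  site   = c(1L, 1L),
  year   = c(1L, 1L),
  sample = c(1L, 2L),
  color  = c("blue", "green"),
  shape  = c("diamond", "pyramid"),
  stringsAsFactors = FALSE
)

# Key-only grid, using exactly the key names that appear in obs
grid <- expand.grid(site = 1L, year = 1L, sample = 1:3)

# all.x = TRUE keeps every grid row; rows with no match in obs
# get NA in all of the data columns
full <- merge(grid, obs, all.x = TRUE)
full
```

The result has one row per site, year, sample combination and all five columns, so there is no need for a second step to recombine with the original.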


Curtis Burkhalter

2015-Mar-11 00:46 UTC

Sarah,

I realized what I was saying after I pressed send on the email. It makes
perfect sense now, thanks so much for your help and patience.
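
For anyone who hits the same memory problem, a quick check of the column names before merging may help. The sketch below uses made-up data and a deliberately misspelled key (smple); the names obs, bad_grid and mass are illustrative only. By default merge() joins on the column names the two data frames share, so a misspelled key silently drops out of the join and the result grows.

```r
# Toy data: 2 sites, 1 year, 3 samples per site
obs <- data.frame(site = rep(1:2, each = 3), year = 1L,
                  sample = rep(1:3, 2), mass = 1:6)

# Grid with one key name deliberately misspelled
bad_grid <- expand.grid(site = 1:2, year = 1L, smple = 1:3)

# Key names in the grid that do not exist in the data
missing_keys <- setdiff(names(bad_grid), names(obs))
missing_keys

# The join uses only site and year, so each of the 6 grid rows matches
# all 3 samples at its site: 6 x 3 = 18 rows instead of 6
n_bad <- nrow(merge(bad_grid, obs, all.x = TRUE))
n_bad
```

With 669 sites and 7 years the same many-to-many matching multiplies a far larger table, which is consistent with the stalled merge described earlier in the thread.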
On Mar 10, 2015 5:57 PM, "Sarah Goslee" <sarah.goslee at
gmail.com> wrote:
> I think you're kind of missing the way this works:
>
> the data frame created by expand.grid() should ONLY have site, year,
> sample (with the exact names used in the data itself).
> Then the merged data frame will have the full site,year,sample
> combinations, along with ALL the data variables. Your animal example
> only had one measured variable, but the same method will work with any
> number.
> Reading ?merge might help you understand.
>
> Sarah
>
> On Tue, Mar 10, 2015 at 5:35 PM, Curtis Burkhalter
> <curtisburkhalter at gmail.com> wrote:
> >
> > Thanks Sarah, one of my column names was missing a letter so it was
> throwing
> > things off. It works super fast now and is exactly what I needed. My
> actual
> > data set  has about 6 other ancillary response data data columns, is
> there a
> > way to combine the 'full' data set I just created with the
original in
> case
> > I need any of the other response variables. E.g.
> >
> > FULL:                                          Original:
> > Combined:
> > site    year     sample                    site    year     sample
>  color
> > shape                  site    year     sample     color     shape
> > 1        1         10                           1        1         10
> > blue       diamond              1        1         10            blue
> > diamond
> > 1         1        12                           1         1        12
> > green     pyramid               1         1        12            green
> > pyramid
> > 1         1        NA
> > 1         1        NA           NA        NA
> >
> > Thanks
> >
> > On Tue, Mar 10, 2015 at 3:12 PM, Sarah Goslee <sarah.goslee at
gmail.com>
> > wrote:
> >>
> >> Yeah, that's tiny:
> >>
> >> > fullout <- expand.grid(site=1:669, year=1:7, sample=1:3)
> >> > dim(fullout)
> >> [1] 14049     3
> >>
> >>
> >> Almost certainly the problem is that your expand.grid result
doesn't
> >> have the same column names as your actual data file, so merge() is
> >> trying to make an enormous result. Note how when I made outgrid in
the
> >> example I named the columns.
> >>
> >> Make sure that the names are identical!
> >>
> >>
> >> On Tue, Mar 10, 2015 at 4:57 PM, Curtis Burkhalter
> >> <curtisburkhalter at gmail.com> wrote:
> >> > Sarah,
> >> >
> >> > I have 669 sites and each site has 7 years of data, so if
I'm thinking
> >> > correctly then there should be 4683 possible combinations of
site x
> >> > year.
> >> > For each year though I need 3 sampling periods so that there
is
> >> > something
> >> > like the following:
> >> >
> >> > site 1      year1      sample 1
> >> > site 1      year1      sample 2
> >> > site 1      year1      sample 3
> >> > site 2      year1      sample 1
> >> > site 2      year1      sample 2
> >> > site 2      year1      sample 3.....
> >> > site 669   year7      sample 1
> >> > site 669   year7     sample 2
> >> > site 669   year7     sample 3.
> >> >
> >> > I have my max memory allocation set to the amount of RAM
(8GB) on my
> >> > laptop,
> >> > but it still 'times out' due to memory problems.
> >> >
> >> > On Tue, Mar 10, 2015 at 2:50 PM, Sarah Goslee
<sarah.goslee at gmail.com
> >
> >> > wrote:
> >> >>
> >> >> You said your data only had 14000 rows, which really
isn't many.
> >> >>
> >> >> How many possible combinations do you have, and how many
do you need
> to
> >> >> add?
> >> >>
> >> >> On Tue, Mar 10, 2015 at 4:35 PM, Curtis Burkhalter
> >> >> <curtisburkhalter at gmail.com> wrote:
> >> >> > Sarah,
> >> >> >
> >> >> > This strategy works great for this small dataset,
but when I
> attempt
> >> >> > your
> >> >> > method with my data set I reach the maximum
allowable memory
> >> >> > allocation
> >> >> > and
> >> >> > the operation just stalls and then stops completely
before it is
> >> >> > finished.
> >> >> > Do you know of a way around this?
> >> >> >
> >> >> > Thanks
> >> >> >
> >> >> > On Tue, Mar 10, 2015 at 2:04 PM, Sarah Goslee
> >> >> > <sarah.goslee at gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I didn't work through your code, because it
looked overly
> >> >> >> complicated.
> >> >> >> Here's a more general approach that does
what you appear to want:
> >> >> >>
> >> >> >> # use dput() to provide reproducible data
please!
> >> >> >> comAn <- structure(list(animals =
c("bird", "bird", "bird",
> "bird",
> >> >> >> "bird",
> >> >> >> "bird", "dog",
"dog", "dog", "dog", "dog",
"dog", "cat", "cat",
> >> >> >> "cat", "cat"), animalYears =
c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
> >> >> >> 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L), animalMass =
c(29L, 48L, 36L,
> >> >> >> 20L, 34L, 34L, 21L, 28L, 25L, 35L, 18L, 11L,
46L, 33L, 48L, 21L
> >> >> >> )), .Names = c("animals",
"animalYears", "animalMass"), class > >> >>
>> "data.frame", row.names = c("1",
> >> >> >> "2", "3", "4",
"5", "6", "7", "8", "9",
"10", "11", "12", "13",
> >> >> >> "14", "15", "16"))
> >> >> >>
> >> >> >>
> >> >> >> # add reps to comAn
> >> >> >> # assumes comAn is already sorted on animals, animalYears
> >> >> >> comAn$reps <- unlist(sapply(rle(do.call("paste",
> >> >> >> comAn[,1:2]))$lengths, seq_len))
> >> >> >>
> >> >> >> # create full set of combinations
> >> >> >> outgrid <- expand.grid(animals=unique(comAn$animals),
> >> >> >> animalYears=unique(comAn$animalYears), reps=unique(comAn$reps),
> >> >> >> stringsAsFactors=FALSE)
> >> >> >>
> >> >> >> # combine with comAn
> >> >> >> comAn.full <- merge(outgrid, comAn, all.x=TRUE)
> >> >> >>
> >> >> >> > comAn.full
> >> >> >>    animals animalYears reps animalMass
> >> >> >> 1     bird           1    1         29
> >> >> >> 2     bird           1    2         48
> >> >> >> 3     bird           1    3         36
> >> >> >> 4     bird           2    1         20
> >> >> >> 5     bird           2    2         34
> >> >> >> 6     bird           2    3         34
> >> >> >> 7      cat           1    1         46
> >> >> >> 8      cat           1    2         33
> >> >> >> 9      cat           1    3         48
> >> >> >> 10     cat           2    1         21
> >> >> >> 11     cat           2    2         NA
> >> >> >> 12     cat           2    3         NA
> >> >> >> 13     dog           1    1         21
> >> >> >> 14     dog           1    2         28
> >> >> >> 15     dog           1    3         25
> >> >> >> 16     dog           2    1         35
> >> >> >> 17     dog           2    2         18
> >> >> >> 18     dog           2    3         11
> >> >> >> >
> >> >> >>
> >> >> >> On Tue, Mar 10, 2015 at 3:43 PM, Curtis Burkhalter
> >> >> >> <curtisburkhalter at gmail.com> wrote:
> >> >> >> > Hey everyone,
> >> >> >> >
> >> >> >> > I've written a function that adds NAs to a dataframe where data is
> >> >> >> > missing, and it seems to work great if I only need to run it once,
> >> >> >> > but if I run it two times in a row I run into problems. I've
> >> >> >> > created a workable example to explain what I mean and why I would
> >> >> >> > do this.
> >> >> >> >
> >> >> >> > In my dataframe there are areas where I need to add two rows of NAs
> >> >> >> > (b/c I need 3 rows for each animal x year combo, and for cat in
> >> >> >> > year 2 I only have one), so I thought that I'd just run my code
> >> >> >> > twice using the function in the code below. Everything works great
> >> >> >> > when I run it the first time, but when I run it again it says that
> >> >> >> > the value returned to the list 'x' is of length 0. I don't
> >> >> >> > understand why the function works the first time around and adds an
> >> >> >> > NA to the 'animalMass' column, but won't do it again. I've used
> >> >> >> > print(str(dataframe)) to see if there is a change in class or type
> >> >> >> > when the function runs through the original dataframe, and there is
> >> >> >> > for 'animalYears', but I just convert it back before rerunning the
> >> >> >> > function for the second time.
> >> >> >> >
> >> >> >> > Any thoughts on this would be greatly appreciated b/c my actual
> >> >> >> > dataframe, which I have to input into WinBUGS, is 14000x12, so it's
> >> >> >> > not a trivial thing to just add in an NA here or there.
> >> >> >> >
> >> >> >> >>comAn
> >> >> >> >    animals animalYears animalMass
> >> >> >> > 1     bird           1         29
> >> >> >> > 2     bird           1         48
> >> >> >> > 3     bird           1         36
> >> >> >> > 4     bird           2         20
> >> >> >> > 5     bird           2         34
> >> >> >> > 6     bird           2         34
> >> >> >> > 7      dog           1         21
> >> >> >> > 8      dog           1         28
> >> >> >> > 9      dog           1         25
> >> >> >> > 10     dog           2         35
> >> >> >> > 11     dog           2         18
> >> >> >> > 12     dog           2         11
> >> >> >> > 13     cat           1         46
> >> >> >> > 14     cat           1         33
> >> >> >> > 15     cat           1         48
> >> >> >> > 16     cat           2         21
> >> >> >> >
> >> >> >> > So every animal has 3 measurements per year, except for the cat in
> >> >> >> > year two, which has only 1. I run the code below and get:
> >> >> >> >
> >> >> >> > #combs defines the different combinations of
> >> >> >> > #animals and animalYears
> >> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> >> > #counts defines how long the different combinations are
> >> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> >> > #missing defines the combs that have fewer than two rows and puts
> >> >> >> > #them in the data frame missing
> >> >> >> > missing<-data.frame(vals=combs[counts<2],count=counts[counts<2])
> >> >> >> >
> >> >> >> > genRows<-function(dat){
> >> >> >> >         vals<-strsplit(dat[1],':')[[1]]
> >> >> >> >         #not sure why dat[2] is being converted to a string
> >> >> >> >         newRows<-2-as.numeric(dat[2])
> >> >> >> >         newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >> >                           animalYears=rep(vals[2],newRows),
> >> >> >> >                           animalMass=rep(NA,newRows))
> >> >> >> >         return(newDf)
> >> >> >> >         }
> >> >> >> >
> >> >> >> >
> >> >> >> > x<-apply(missing,1,genRows)
> >> >> >> > comAn=rbind(comAn,
> >> >> >> >         do.call(rbind,x))
> >> >> >> >
> >> >> >> >> comAn
> >> >> >> >    animals animalYears animalMass
> >> >> >> > 1     bird           1         29
> >> >> >> > 2     bird           1         48
> >> >> >> > 3     bird           1         36
> >> >> >> > 4     bird           2         20
> >> >> >> > 5     bird           2         34
> >> >> >> > 6     bird           2         34
> >> >> >> > 7      dog           1         21
> >> >> >> > 8      dog           1         28
> >> >> >> > 9      dog           1         25
> >> >> >> > 10     dog           2         35
> >> >> >> > 11     dog           2         18
> >> >> >> > 12     dog           2         11
> >> >> >> > 13     cat           1         46
> >> >> >> > 14     cat           1         33
> >> >> >> > 15     cat           1         48
> >> >> >> > 16     cat           2         21
> >> >> >> > 17     cat           2       <NA>
> >> >> >> >
> >> >> >> > So far so good, but then I adjust the code so that it reads
> >> >> >> > (**notice the change in the specification in 'missing' to
> >> >> >> > counts<3**):
> >> >> >> >
> >> >> >> > #combs defines the different combinations of
> >> >> >> > #animals and animalYears
> >> >> >> > combs<-paste(comAn$animals,comAn$animalYears,sep=':')
> >> >> >> > #counts defines how long the different combinations are
> >> >> >> > counts<-ave(1:nrow(comAn),combs,FUN=length)
> >> >> >> > #missing defines the combs that have fewer than three rows and puts
> >> >> >> > #them in the data frame missing
> >> >> >> > missing<-data.frame(vals=combs[counts<3],count=counts[counts<3])
> >> >> >> >
> >> >> >> > genRows<-function(dat){
> >> >> >> >         vals<-strsplit(dat[1],':')[[1]]
> >> >> >> >         #not sure why dat[2] is being converted to a string
> >> >> >> >         newRows<-2-as.numeric(dat[2])
> >> >> >> >         newDf<-data.frame(animals=rep(vals[1],newRows),
> >> >> >> >                           animalYears=rep(vals[2],newRows),
> >> >> >> >                           animalMass=rep(NA,newRows))
> >> >> >> >         return(newDf)
> >> >> >> >         }
> >> >> >> >
> >> >> >> >
> >> >> >> > x<-apply(missing,1,genRows)
> >> >> >> > comAn=rbind(comAn,
> >> >> >> >         do.call(rbind,x))
> >> >> >> >
> >> >> >> > The result for 'x' then reads:
> >> >> >> >
> >> >> >> >> x
> >> >> >> > [[1]]
> >> >> >> > [1] animals     animalYears animalMass
> >> >> >> > <0 rows> (or 0-length row.names)
> >> >> >> >
> >> >> >> > Any thoughts on why it might be doing this instead of adding an
> >> >> >> > additional row to get the result:
> >> >> >> >
> >> >> >> >> comAn
> >> >> >> >    animals animalYears animalMass
> >> >> >> > 1     bird           1         29
> >> >> >> > 2     bird           1         48
> >> >> >> > 3     bird           1         36
> >> >> >> > 4     bird           2         20
> >> >> >> > 5     bird           2         34
> >> >> >> > 6     bird           2         34
> >> >> >> > 7      dog           1         21
> >> >> >> > 8      dog           1         28
> >> >> >> > 9      dog           1         25
> >> >> >> > 10     dog           2         35
> >> >> >> > 11     dog           2         18
> >> >> >> > 12     dog           2         11
> >> >> >> > 13     cat           1         46
> >> >> >> > 14     cat           1         33
> >> >> >> > 15     cat           1         48
> >> >> >> > 16     cat           2         21
> >> >> >> > 17     cat           2       <NA>
> >> >> >> > 18     cat           2       <NA>
> >> >> >> >
> >> >> >> > Thanks
> >> >> >> > --
> >> >> >> > Curtis Burkhalter
> >> >> >
> >> >> >
>
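
A note on the original question, for readers wondering why the second pass returned zero rows. In genRows(), the number of rows to add is hard-coded as 2 - count, so once the cat/year-2 group already has two rows the function asks for 2 - 2 = 0 new rows, which most likely explains the empty data frame in 'x'. The string conversion mentioned in the code comment comes from apply(), which turns the data frame into a character matrix before calling the function on each row. The sketch below is one possible way to pad every group to a chosen size in a single pass; padGroups() and its 'target' argument are hypothetical names introduced here, not code from the thread.

```r
# Rebuild the example data from the thread (16 rows; cat in year 2 has one row).
comAn <- data.frame(
  animals     = rep(c("bird", "dog", "cat"), times = c(6, 6, 4)),
  animalYears = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2),
  animalMass  = c(29, 48, 36, 20, 34, 34, 21, 28, 25, 35, 18, 11, 46, 33, 48, 21),
  stringsAsFactors = FALSE
)

# padGroups() is a hypothetical helper, not code from the thread.
# 'target' is the number of rows each animal x year group should end up with.
padGroups <- function(dat, target = 3) {
  # One key per row, e.g. "cat:2", then the number of rows per key.
  key    <- paste(dat$animals, dat$animalYears, sep = ":")
  counts <- lengths(split(key, key))

  # Groups that are short of the target, and by how many rows.
  short <- counts[counts < target]
  if (length(short) == 0) return(dat)
  need  <- unname(target - short)

  # Split each key back into its animal and year parts.
  parts <- strsplit(names(short), ":", fixed = TRUE)
  extra <- data.frame(
    animals     = rep(vapply(parts, `[`, character(1), 1), times = need),
    animalYears = rep(as.numeric(vapply(parts, `[`, character(1), 2)), times = need),
    animalMass  = NA_real_,
    stringsAsFactors = FALSE
  )
  rbind(dat, extra)
}

padded <- padGroups(comAn)
```

With the example data this adds two NA rows for cat in year 2 and leaves the other groups alone; calling padGroups() on the result again changes nothing, because the number of rows to add is computed from the target rather than fixed in advance.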
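
On the memory problem reported earlier in the thread: by default merge() joins on the column names the two data frames share, so a misspelled key column silently drops out of the join and rows are matched many-to-many. The toy example below shows the effect on a small grid, along with a quick check of the shared names before merging. The data and the misspelled name 'yer' are made up for illustration and are not taken from the poster's data set.

```r
# Full grid of key combinations, with the intended column names (12 rows).
grid <- expand.grid(site = 1:3, year = 1:2, sample = 1:2)

# Made-up observations with a deliberately misspelled key column ("yer").
obs <- data.frame(site   = c(1L, 1L, 2L),
                  yer    = c(1L, 2L, 1L),
                  sample = c(1L, 1L, 2L),
                  mass   = c(10, 12, 9))

# merge() will join on these names only; "year" is missing from the list.
shared <- intersect(names(grid), names(obs))

# Joining on site and sample alone matches each grid row to every year's record.
bad <- merge(grid, obs, all.x = TRUE)

# After fixing the name, all three keys are used and the grid size is preserved.
names(obs)[names(obs) == "yer"] <- "year"
good <- merge(grid, obs, all.x = TRUE)

c(grid = nrow(grid), bad = nrow(bad), good = nrow(good))
```

Here the mismatched merge returns more rows than the grid itself, while the corrected one returns exactly one row per grid combination. Printing 'shared' before merging a large data set is a cheap way to confirm that the join uses the keys you expect.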
