I have a list of observations of individuals. I would like to make a
list of individuals, with a data frame of observations for each individual.
The following code usually works, but not always:
----------------------------------------------------------------------
# Make a list of empty data frames
animals = list()
indivs = levels(Z$individual_id)
donotprint <- sapply(indivs, function(i){
    animals[[i]] = data.frame()
})

# Add observations of each animal to the appropriate frame
donotprint <- apply(Z, 1, function(r){
    ind = r[["individual_id"]]
    # Use different names to confirm that the partial matching
    # is being done on the left
    bind = ind

    animals[[bind]]$sighting_number <<-
        c(animals[[ind, exact=TRUE]]$sighting_number,
          r[["sighting_number"]])
    animals[[bind]]$date <<-
        c(animals[[ind, exact=TRUE]]$date, r[["date"]])
    animals[[bind]]$age <<-
        c(animals[[ind, exact=TRUE]]$age, r[["age_num"]])
})
----------------------------------------------------------------------
The problem is partial matching. When it gives the wrong answer, it
gives partial match warnings. Adding "exact=TRUE" to the left, the way
that I added it to the right, simply produces an argument error.
Changing to single brackets produces other errors.
I read the help, and the Language Definition (not the whole thing), but
could not find clear documentation of what single brackets with
character variable arguments are supposed to do in lists, nor of how
partial matching is handled on the left side of an assignment, nor of
whether R is supposed to do partial-match indexing when an exact match
is available (I would have thought not, and it's documented that it's
not supposed to for function arguments).
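As a hedged illustration of the subsetting behaviour asked about here: the
list `x` and its element names are made up for the example, and the
behaviour sketched is that of current R releases, where `[[` defaults to
exact=TRUE, not necessarily that of R 2.6.2.

```r
# Hypothetical list with one element whose name extends "animal_1"
x <- list(animal_10 = "ten")

exact_get   <- x[["animal_1"]]                 # exact matching: no such element
partial_get <- x[["animal_1", exact = FALSE]]  # partial matching requested explicitly
dollar_get  <- x$animal_1                      # `$` partially matches names on lists
single_get  <- x["animal_10"]                  # single brackets return a sub-list

# Assignment creates a new element instead of matching "animal_10"
x[["animal_1"]] <- "one"
names_after <- names(x)
```

Under these assumptions only the explicit `exact = FALSE` call and `$`
reach the "animal_10" element through the shorter name; the assignment
leaves "animal_10" untouched and appends "animal_1".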
I am interested in how the subsetting is supposed to work, but even more
in what might be the best way to code this sort of thing in R.
I am using R 2.6.2 on Mandriva linux.
Thanks for any help,
JD
I have not seen you describe the value of doing partial matching in
this application, so pardon this perhaps non-responsive reply:
Wouldn't it have been much, much simpler to have used the subset
function (which returns a dataframe object) at the first assignment to
donotprint?
Something along the lines of (untested):
donotprint <- sapply(indivs, function(i){
    animals[[i]] = subset(Z, individual_id == i,
                          select = c(sighting_number, date, age_num))
})  # reconsider naming variable "date"
--
David Winsemius
On Jan 30, 2009, at 7:46 AM, Jonathan Dushoff wrote:
> I have a list of observations of individuals. I would like to make
> a list of individuals, with a data frame of observations for each
> individual.
> [...]
David:
Thank you for your very valuable response. In fact, I was trying to
_avoid_ partial matching, not accomplish it. Subset is a _much_ better
way of doing what I was trying to do.
Humorously, however, your code also reproduces the mistake that brought
me here, AFAICT. I think my code behaved weirdly because of my use of =
instead of <<- inside sapply.
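A small sketch of that difference, using made-up names "a" and "b" in
place of real individual ids (lapply is used here only to keep the
example quiet; the scoping is the same inside sapply):

```r
animals <- list()

# `=` (like `<-`) inside a function assigns to a local copy of `animals`
invisible(lapply(c("a", "b"), function(i) {
  animals[[i]] = data.frame()
}))
n_local <- length(animals)   # the outer list is unchanged

# `<<-` assigns to the `animals` found in the enclosing environment
invisible(lapply(c("a", "b"), function(i) {
  animals[[i]] <<- data.frame()
}))
n_super <- length(animals)   # the outer list now has both elements
```

With `=`, each call modifies its own copy and discards it on return, so
the outer list never receives the empty data frames.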
With subset, we can avoid that choice altogether. My new code, which
appears to work, is:
==
animals <- sapply(unique(Z$id), function(i){
    subset(Z, id==i, select=c(sighting_number, date, age_num))
}, simplify=FALSE)
==
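For comparison, a sketch of the same grouping on made-up data. The toy
data frame `Z` and its values are invented for the example, and split()
is a base R alternative that was not suggested in the thread.

```r
# Hypothetical observations; "a1" and "a10" would collide under partial matching
Z <- data.frame(
  id = c("a1", "a10", "a1", "a2"),
  sighting_number = 1:4,
  date = as.Date(c("2008-05-01", "2008-05-02", "2008-06-01", "2008-06-03")),
  age_num = c(3, 5, 3, 2),
  stringsAsFactors = FALSE
)

# The sapply/subset approach; a character vector of ids gives a named list
ids <- unique(Z$id)
animals <- sapply(ids, function(i) {
  subset(Z, id == i, select = c(sighting_number, date, age_num))
}, simplify = FALSE)

# split() builds one data frame per id in a single call
animals_split <- split(Z[, c("sighting_number", "date", "age_num")], Z$id)
```

Note that sapply() only names its result automatically when it is given
a character vector, so ids stored as a factor would need as.character()
first; split() names the pieces by id in either case.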
Why should I consider renaming the date column?