I have a list of observations of individuals. I would like to make a list of individuals, with a data frame of observations for each individual.

The following code usually works, but not always:

----------------------------------------------------------------------

# Make a list of empty data frames
animals = list()
indivs = levels(Z$individual_id)
donotprint <- sapply(indivs, function(i){
    animals[[i]] = data.frame()
})

# Add observations of each animal to the appropriate frame
donotprint <- apply(Z, 1, function(r){
    ind = r[["individual_id"]]
    bind = ind # Use different names to confirm that the partial matching is being done on the left

    animals[[bind]]$sighting_number <<-
        c(animals[[ind, exact=TRUE]]$sighting_number, r[["sighting_number"]])
    animals[[bind]]$date <<-
        c(animals[[ind, exact=TRUE]]$date, r[["date"]])
    animals[[bind]]$age <<-
        c(animals[[ind, exact=TRUE]]$age, r[["age_num"]])
})

----------------------------------------------------------------------

The problem is partial matching. When it gives the wrong answer, it gives partial match warnings. Adding "exact=TRUE" to the left, the way that I added it to the right, simply produces an argument error. Changing to single brackets produces other errors.

I read the help, and the Language Definition (not the whole thing), but could not find clear documentation of what single brackets with character variable arguments are supposed to do in lists, nor of how partial matching is handled on the left side of an assignment, nor of whether R is supposed to do partial-match indexing when an exact match is available (I would have thought not, and it's documented that it's not supposed to for function arguments).

I am interested in how the subsetting is supposed to work, but even more in what might be the best way to code this sort of thing in R.

I am using R 2.6.2 on Mandriva Linux.

Thanks for any help,

JD
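For readers following the subsetting questions above, here is a minimal sketch of how character indexing of a list behaves in current versions of R. The list `animals` and its element names are made up for illustration, and the defaults in R 2.6.2 may differ from what is shown here (the warnings described in the post suggest that they did, but that is an inference, not something confirmed in this thread).

```r
# Toy list (hypothetical IDs); "A1" is a prefix of "A10"
animals <- list(A1 = "first", A10 = "second", B7 = "third")

# An exact match is used even when partial matching is allowed
exact_wins <- animals[["A1", exact = FALSE]]     # "first"

# A unique prefix matches only when partial matching is requested
default_lookup <- animals[["B"]]                 # NULL (exact = TRUE is the default)
partial_lookup <- animals[["B", exact = FALSE]]  # "third"

# An ambiguous prefix ("A" fits both A1 and A10) matches nothing
ambiguous_lookup <- animals[["A", exact = FALSE]]  # NULL

# `$` on a list allows partial matching
dollar_lookup <- animals$B                       # "third"

# Single brackets return a sub-list and do not partial match
sub_list <- animals["B7"]                        # list of length 1

# Assignment does not partial match: this appends a new element "B"
animals2 <- animals
animals2[["B"]] <- "new"
names(animals2)                                  # "A1" "A10" "B7" "B"
```

The last step is the relevant one for the question about the left side of an assignment: extraction may find `B7` through a prefix, but assignment to `"B"` creates a separate element rather than updating `B7`.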
I have not seen you describe the value of doing partial matching in this application, so pardon this perhaps non-responsive reply:

Wouldn't it have been much, much simpler to have used the subset function (which returns a dataframe object) at the first assignment to donotprint? Something along the lines of (untested):

    donotprint <- sapply(indivs, function(i){
        animals[[i]] = subset(Z, individual_id == i,
                              select = c(sighting_number, date, age_num) ) }  # reconsider naming variable "date"
    )

--
David Winsemius

On Jan 30, 2009, at 7:46 AM, Jonathan Dushoff wrote:

> I have a list of observations of individuals. I would like to make
> a list of individuals, with a data frame of observations for each
> individual.
> [...]
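The suggestion above still assigns to `animals` inside the anonymous function. As a hedged sketch with made-up data (the data frame `Z` below is a hypothetical stand-in; only the column names come from the thread), the following shows that such an assignment changes a local copy only, and that returning the subset from the function sidesteps the question of `=` versus `<<-`.

```r
# Hypothetical stand-in for Z; column names follow the thread
Z <- data.frame(
  individual_id   = factor(c("a1", "a10", "a1")),
  sighting_number = 1:3,
  date            = c("2009-01-01", "2009-01-02", "2009-01-03"),
  age_num         = c(2, 5, 3)
)
indivs <- levels(Z$individual_id)

# `=` (like `<-`) inside the function modifies a local copy of `animals`
animals <- list()
invisible(sapply(indivs, function(i) {
  animals[[i]] = subset(Z, individual_id == i)
}, simplify = FALSE))
n_after_local <- length(animals)   # still 0: the outer list is untouched

# Returning the subset lets sapply() build the named list itself
animals <- sapply(indivs, function(i) {
  subset(Z, individual_id == i,
         select = c(sighting_number, date, age_num))
}, simplify = FALSE)

names(animals)          # "a1" "a10"
nrow(animals[["a1"]])   # 2
```

Because `indivs` is a character vector, `sapply()` uses its values as the names of the result, so each data frame can be looked up by individual ID.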
David:

Thank you for your very valuable response.

In fact, I was trying to _avoid_ partial matching, not accomplish it.

Subset is a _much_ better way of doing what I was trying to do.

Humorously, however, your code also reproduces the mistake that brought me here, AFAICT. I think my code behaved weirdly because of my use of = instead of <<- inside sapply. With subset, we can avoid that choice altogether. My new code, which appears to work, is:

==

animals <- sapply(unique(Z$id), function(i){
    subset(Z, id==i, select=c(sighting_number, date, age_num))
}, simplify=FALSE)

==

Why should I consider renaming the date column?
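As a possible alternative to the `sapply()`/`subset()` combination above, base R's `split()` groups the rows of a data frame by a factor in one call. This is a sketch on made-up data, not the code used in the thread; the data frame `Z` and its values are hypothetical, with column names taken from the post. One difference worth checking in real data: `sapply()` takes names from its input only when that input is a character vector, whereas `split()` names the result by group level in either case.

```r
# Hypothetical stand-in for Z, using the `id` column name from the reply above
Z <- data.frame(
  id              = c("a1", "a10", "a1"),
  sighting_number = 1:3,
  date            = c("2009-01-01", "2009-01-02", "2009-01-03"),
  age_num         = c(2, 5, 3)
)

# One data frame per individual, named by ID
animals <- split(Z[, c("sighting_number", "date", "age_num")], Z$id)

names(animals)                      # "a1" "a10"
animals[["a1"]]$sighting_number     # 1 3
```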