thr3ads.net - R help - [R] Subsetting without partial matches [Jan 2009]

Jonathan Dushoff

2009-Jan-30 12:46 UTC

[R] Subsetting without partial matches

I have a list of observations of individuals.  I would like to make a 
list of individuals, with a data frame of observations for each individual.

The following code usually works, but not always:

----------------------------------------------------------------------

# Make a list of empty data frames
animals = list()
indivs = levels(Z$individual_id)
donotprint <- sapply(indivs, function(i){
   animals[[i]] = data.frame()
})

# Add observations of each animal to the appropriate frame
donotprint <- apply(Z, 1, function(r){
   ind = r[["individual_id"]]
   # Use different names to confirm that the partial matching
   # is being done on the left
   bind = ind

   animals[[bind]]$sighting_number <<-
      c(animals[[ind, exact=TRUE]]$sighting_number, r[["sighting_number"]])
   animals[[bind]]$date<<-
      c(animals[[ind, exact=TRUE]]$date, r[["date"]])
   animals[[bind]]$age <<-
      c(animals[[ind, exact=TRUE]]$age, r[["age_num"]])
})

----------------------------------------------------------------------

The problem is partial matching.  When it gives the wrong answer, it 
gives partial match warnings.  Adding "exact=TRUE" to the left, the way 
that I added it to the right, simply produces an argument error.  
Changing to single brackets produces other errors.

I read the help, and the Language Definition (not the whole thing), but 
could not find clear documentation of what single brackets with 
character variable arguments are supposed to do in lists, nor of how 
partial matching is handled on the left side of an assignment, nor of 
whether R is supposed to do partial-match indexing when an exact match 
is available (I would have thought not, and it's documented that it's 
not supposed to for function arguments).
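
For reference, here is a minimal sketch of how name matching behaves on a toy list in current R releases (the list and its names are made up for illustration, and R 2.6.2 may differ in details). It only covers extraction, not the left side of an assignment.

```r
# Hypothetical list whose names share a prefix, as individual ids might
animals <- list(A10 = 1:3, B2 = 4:6)

# `[[` matches names exactly by default, so "A1" finds nothing
exact_default <- animals[["A1"]]

# Partial matching has to be requested explicitly with exact = FALSE
partial_on <- animals[["A1", exact = FALSE]]

# `$` does partial matching on list names
dollar <- animals$A1

# Single brackets with a character index return a sub-list, not the element
single <- animals["A10"]

print(exact_default)
print(partial_on)
print(dollar)
str(single)
```

In this sketch `[[` with the default settings does not fall back to a partial match, whereas `$` does, which is one reason to prefer `[[` when ids are prefixes of one another.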

I am interested in how the subsetting is supposed to work, but even more 
in what might be the best way to code this sort of thing in R.

I am using R 2.6.2 on Mandriva Linux.

Thanks for any help,

JD

David Winsemius

2009-Jan-30 16:23 UTC


[R] Subsetting without partial matches

I have not seen you describe the value of doing partial matching in  
this application, so pardon this perhaps non-responsive reply:  
Wouldn't it have been much, much simpler to have used the subset  
function (which returns a dataframe object) at the first assignment to  
donotprint?

Something along the lines of (untested):

donotprint <- sapply(indivs, function(i){
   # reconsider naming variable "date"
   animals[[i]] = subset(Z, individual_id == i,
      select = c(sighting_number, date, age_num))
})


-- 
David Winsemius


Jonathan Dushoff

2009-Jan-31 08:02 UTC


[R] Subsetting without partial matches

David:

Thank you for your very valuable response.  In fact, I was trying to 
_avoid_ partial matching, not accomplish it.  Subset is a _much_ better 
way of doing what I was trying to do.

Humorously, however, your code also reproduces the mistake that brought 
me here, AFAICT.  I think my code behaved weirdly because of my use of = 
instead of <<- inside sapply.
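
A minimal sketch of that scoping difference, using made-up ids and values rather than the real data: assigning with = (or <-) inside the function only changes a local copy of the list, while <<- changes the list in the enclosing environment.

```r
animals <- list()
ids <- c("a1", "a10")   # hypothetical individual ids

# `=` inside the function creates and modifies a local copy of `animals`
invisible(sapply(ids, function(i) {
  animals[[i]] = nchar(i)
}))
n_after_local <- length(animals)   # the outer list has not been touched

# `<<-` assigns into the `animals` found in the enclosing environment
invisible(sapply(ids, function(i) {
  animals[[i]] <<- nchar(i)
}))
names_after_super <- names(animals)

print(n_after_local)
print(names_after_super)
```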

With subset, we can avoid that choice altogether.  My new code, which 
appears to work, is:

==
animals <- sapply(unique(Z$id), function(i){
   subset(Z, id==i, select=c(sighting_number, date, age_num))
}, simplify=FALSE)

==
Why should I consider renaming the date column?
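
For anyone wanting to try the subset-per-individual pattern, here is a small sketch on invented data (the values in Z are assumptions for illustration; only the column names come from the thread). It also shows split(), a different base R function that is not used in the thread, as one possible alternative.

```r
# Toy data: ids are deliberately prefixes of one another
Z <- data.frame(
  id              = c("a1", "a10", "a1", "a10", "a1"),
  sighting_number = 1:5,
  date            = c("2009-01-01", "2009-01-02", "2009-01-03",
                      "2009-01-04", "2009-01-05"),
  age_num         = c(2, 5, 2, 5, 3),
  stringsAsFactors = FALSE
)

# One data frame per individual; simplify = FALSE keeps the result a list.
# The list is named by id here because unique(Z$id) is a character vector.
animals <- sapply(unique(Z$id), function(i) {
  subset(Z, id == i, select = c(sighting_number, date, age_num))
}, simplify = FALSE)

# Alternative sketch: split() groups the rows by id in a single call
by_split <- split(Z[c("sighting_number", "date", "age_num")], Z$id)

print(names(animals))
print(animals[["a1"]])
print(by_split[["a10"]])
```

If the id column is a factor rather than a character vector, sapply() will not name the result by id automatically, so wrapping the ids in as.character() may be needed.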

