I'm writing R code to calculate Hierarchical Social Entropy, a diversity index that Tucker Balch proposed. One article on this was published in Autonomous Robots in 2000. You can find that and others through his web page at Georgia Tech.

http://www.cc.gatech.edu/~tucker/index2.html

While I work on this, I realize (again) that I'm a C programmer masquerading in R, and it's really tricky working with R lists. Here are things that surprise me; I wonder what your experience/advice is.

I need to calculate overlapping U-diametric clusters of a given radius. (Again, I apologize this looks so much like C.)

  ## Returns a list of all U-diametric clusters of a given radius
  ## Given an R distance matrix
  ## Clusters may overlap. Clusters may be identical (redundant)
  getUDClusters <- function(distmat, radius){
    mem <- list()

    nItems <- dim(distmat)[1]
    for (i in 1:nItems){
      mem[[i]] <- c(i)
    }

    for (m in 1:nItems){
      for (n in 1:nItems){
        if (m != n & (distmat[m, n] <= radius)){
          ## item is within radius, so add to collection m
          mem[[m]] <- sort(c(mem[[m]], n))
        }
      }
    }

    return(mem)
  }

That generates the list, like this:

  [[1]]
  [1]  1  3  4  5  6  7  8  9 10

  [[2]]
  [1]  2  3  4 10

  [[3]]
  [1]  1  2  3  4  5  6  7  8 10

  [[4]]
  [1]  1  2  3  4 10

  [[5]]
  [1]  1  3  5  6  7  8  9 10

  [[6]]
  [1]  1  3  5  6  7  8  9 10

  [[7]]
  [1]  1  3  5  6  7  8  9 10

  [[8]]
  [1]  1  3  5  6  7  8  9 10

  [[9]]
  [1]  1  5  6  7  8  9 10

  [[10]]
  [1]  1  2  3  4  5  6  7  8  9 10

The next task is to eliminate the redundant elements. unique() does not apply to lists, so I have to scan one by one.
  cluslist <- getUDClusters(distmat, radius)

  ## find redundant (same) clusters
  redundantCluster <- c()
  for (m in 1:(length(cluslist) - 1)) {
    for (n in (m + 1):length(cluslist)){
      if (length(cluslist[[m]]) == length(cluslist[[n]])){
        if (all(cluslist[[m]] == cluslist[[n]])){
          redundantCluster <- c(redundantCluster, n)
        }
      }
    }
  }

  ## make sure they are sorted in reverse order
  if (length(redundantCluster) > 0) {
    redundantCluster <- unique(sort(redundantCluster, decreasing=TRUE))

    ## remove redundant clusters (must do in reverse order to preserve
    ## the indexing of cluslist)
    for (i in redundantCluster) cluslist[[i]] <- NULL
  }

Question: am I deleting the list elements properly?

I do not find explicit documentation for R on how to remove elements from lists, but trial and error tells me

  myList[[5]] <- NULL

will remove the 5th element and then "close up" the hole caused by deletion of that element. That shuffles the index values, so I have to be careful in dropping elements. I must work from the back of the list to the front.

Is there an easier or faster way to remove the redundant clusters?

Now, the next question. After eliminating the redundant sets from the list, I need to calculate the total number of items present in the whole list, figure how many are in each subset--each list item--and do some calculations.

I expected this would iterate over the members of the list--one step for each subcollection

  for (i in cluslist){
  }

but it does not. It iterates over the items within the subsets of the list "cluslist." I mean, if cluslist has 5 sets, each with 10 elements, this for loop takes 50 steps, one for each individual item.

I find this does what I want:

  for (i in 1:length(cluslist))

But I found out the hard way :)

Oh, one more quirk that fooled me. Why does unique() applied to a distance matrix throw away the 0's???? I think that's really bad!
  > x <- rnorm(5)
  > myDist <- dist(x,diag=T,upper=T)
  > myDist
            1         2         3         4         5
  1 0.0000000 1.2929976 1.6658710 2.6648003 0.5494918
  2 1.2929976 0.0000000 0.3728735 1.3718027 0.7435058
  3 1.6658710 0.3728735 0.0000000 0.9989292 1.1163793
  4 2.6648003 1.3718027 0.9989292 0.0000000 2.1153085
  5 0.5494918 0.7435058 1.1163793 2.1153085 0.0000000
  > unique(myDist)
   [1] 1.2929976 1.6658710 2.6648003 0.5494918 0.3728735 1.3718027 0.7435058
   [8] 0.9989292 1.1163793 2.1153085

-- 
Paul E. Johnson                  email: pauljohn at ku.edu
Dept. of Political Science       http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas             Office: (785) 864-9086
Lawrence, Kansas 66044-3177      FAX: (785) 864-5700
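The deletion behavior asked about above can be checked on a small made-up list (the five entries here are invented, not the real cluslist); the duplicated() one-liner at the end is an alternative that works in later versions of R:

```r
## mem[[i]] <- NULL closes the gap, so redundant positions must be
## removed back-to-front to keep the remaining indices valid.
cluslist  <- list(c(1, 3, 4), c(2, 3), c(1, 3, 4), c(2, 3), 5)
redundant <- c(3, 4)                   # positions duplicating earlier entries

for (i in sort(redundant, decreasing = TRUE)) {
  cluslist[[i]] <- NULL                # later positions go first
}
length(cluslist)                       # 3 elements remain

## In later versions of R, duplicated() handles lists directly, which
## replaces the index bookkeeping entirely:
cluslist2 <- list(c(1, 3, 4), c(2, 3), c(1, 3, 4), c(2, 3), 5)
cluslist2 <- cluslist2[!duplicated(cluslist2)]
```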
Berton Gunter
2005-Apr-05 18:49 UTC
[R] lists: removing elements, iterating over elements,
The following strategy may or may not work, depending on whether the numbers in your lists are integers or could be the result of floating point computations (so that 2 might be 1.999999... etc.). As I understand it, you wish to reduce an arbitrary list to one with unique members, where each member is a list/set, of course. If so, one way to do it is to convert the members to a vector of strings, find the unique strings, then convert the result back to a list. Something like (warning: not fully tested):

  > test <- list(a=1:3, b=1:4, c=1:3, d=1:5, e=1:3)
  > l1 <- lapply(test, paste, collapse='+')
  > l2 <- unique(unlist(l1))
  > l2
  [1] "1+2+3"     "1+2+3+4"   "1+2+3+4+5"
  > lapply(strsplit(l2, split='+', fixed=TRUE), as.numeric)
  [[1]]
  [1] 1 2 3

  [[2]]
  [1] 1 2 3 4

  [[3]]
  [1] 1 2 3 4 5

The basic idea is to get your list into a form where the efficiency of unique() can be brought to bear. There may of course be better ways to do this.

HTH

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process."  - George E. P. Box
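A runnable version of the round trip sketched above, assuming integer set members so the string encoding is exact (the test list is the same made-up one):

```r
## Encode each set as a string, deduplicate the strings, decode back.
test <- list(a = 1:3, b = 1:4, c = 1:3, d = 1:5, e = 1:3)

keys   <- unlist(lapply(test, paste, collapse = "+"))  # one string per set
uniq   <- unique(keys)                                 # unique() works on character vectors
result <- lapply(strsplit(uniq, split = "+", fixed = TRUE), as.numeric)

length(result)   # 3 distinct sets survive
```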
> From: Paul Johnson
> [...]
>     nItems <- dim(distmat)[1]
>     for ( i in 1:nItems ){
>       mem[[i]] <- c(i)
>     }

This loop can be replaced with mem <- as.list(1:nItems) ...

>     for ( m in 1:nItems ){
>       for ( n in 1:nItems ){
>         if (m != n & (distmat[m,n] <= radius)){
>           ## item is within radius, so add to collection m
>           mem[[m]] <- sort(c( mem[[m]],n))
>         }
>       }
>     }

If I understood the code correctly, this should do the same:

  neighbors <- which(distmat <= radius, arr.ind=TRUE)
  neighbors <- neighbors[neighbors[, 1] != neighbors[, 2], ]
  mem <- split(neighbors[, 2], neighbors[, 1])

What I'm not sure of is whether you intend to include the i-th item in the i-th list (since the distance is presumably 0). Your code seems to indicate no, as you have m != n in the if() condition. The second line above removes such results. However, the list you posted seems to indicate that you do have such elements in your lists. If such results cannot be in the list, then the list should already be unique, no?

For deleting an element of a list, see R FAQ 7.1.

HTH,
Andy
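The vectorized sketch above can be checked on a small made-up example (the 4-point data and radius are invented). One difference from the loop version is visible: an isolated point gets no entry at all in the split() result, whereas the loop seeds mem[[i]] with i:

```r
## Toy distance matrix: points 1-3 are mutually close, point 4 is far away.
distmat <- as.matrix(dist(c(0, 1, 1.5, 10)))
radius  <- 2

neighbors <- which(distmat <= radius, arr.ind = TRUE)              # all pairs within radius
neighbors <- neighbors[neighbors[, 1] != neighbors[, 2], , drop = FALSE]  # drop the diagonal
mem <- split(neighbors[, 2], neighbors[, 1])                       # group neighbors by point

mem[["1"]]   # points 2 and 3 lie within radius 2 of point 1
```

Note that mem has only three entries here; point 4 has no neighbors and therefore no component.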
Huntsinger, Reid
2005-Apr-05 19:39 UTC
[R] lists: removing elements, iterating over elements,
To get the neighborhoods of radius r of each point in your data set, given distances calculated already in the matrix d, you could do (but note below)

  A <- (d <= r)

Then the rows (or columns) of A are indicator vectors for the neighborhoods. unique() will work on these vectors, via unique.array, to give the unique rows, which would be the unique neighborhood lists:

  unique(A)

Your question about why unique() applied to a distance matrix ignores zeros points to a possible problem: the object you get from dist() is not a matrix. The "upper" and "diag" options only control printing. If you check length() you'll see you have only n(n-1)/2 elements, the lower triangle of the distance matrix. (To answer the question: unique() sees only these; there is no method for objects of class dist.) So you need to do

  d <- as.matrix(distmat)

to get a matrix.

Reid Huntsinger
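A small worked example of the indicator-vector idea above (data and radius made up): each row of the logical matrix A marks one point's neighborhood, and unique() keeps the distinct rows.

```r
x <- c(0, 1, 1.5, 10)
d <- as.matrix(dist(x))   # dist() output alone is not a matrix; as.matrix() is required
r <- 2

A  <- (d <= r)            # A[i, j] is TRUE when point j lies within r of point i
uA <- unique(A)           # distinct neighborhood patterns, one per row
nrow(uA)                  # 2: points 1-3 share a neighborhood, point 4 stands alone
```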
Gabor Grothendieck
2005-Apr-05 20:17 UTC
[R] lists: removing elements, iterating over elements,
On Apr 5, 2005 1:36 PM, Paul Johnson <pauljohn at ku.edu> wrote:
> I'm writing R code to calculate Hierarchical Social Entropy, a diversity
> index that Tucker Balch proposed. [...]
> Is there an easier or faster way to remove the redundant clusters?
> [...]
> Oh, one more quirk that fooled me. Why does unique() applied to a
> distance matrix throw away the 0's???? I think that's really bad!

If L is our list of vectors, then the following gets the unique elements of L. I have assumed that the individual vectors are sorted (sort them first if not, via lapply(L, sort)) and that each element has a unique name (give it one if not, e.g. names(L) <- seq(L)). The first line binds them together into rows. This will recycle to make them the same length and give you a warning, but that's ok since you only need to know whether they are the same or not. unique() applied to a matrix finds the unique rows, and in the second line we use the row.names from that result to get back the original unsorted lists.

  mat <- unique(do.call("rbind", L))
  L[row.names(mat)]

Regarding why the diagonal elements of a distance matrix are not part of the result of applying unique() to it: there is no unique.dist method defined in R, so you are getting the default method, which does not know about distance matrices. Distance matrices don't store their diagonal, so unique() is just giving the unique stored elements. Even if unique() did have a dist method, unique() applied to a matrix gives unique rows, not unique elements, so I am not sure it should really do what you want here anyway. I think it's clearer just to convert it explicitly to a matrix and then a vector, so that the action of unique() is understood:

  unique(c(as.matrix(myDist)))
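Both points above can be checked on small made-up inputs: the rbind/unique trick on a toy list (the recycling warning is expected), and converting the dist object to a plain vector so unique() sees the zeros.

```r
## Toy list with one duplicate set; rbind recycles the shorter rows.
L <- list(a = 1:3, b = 1:4, c = 1:3)
mat <- unique(do.call("rbind", L))   # unique rows, rownames preserved (with a recycling warning)
out <- L[row.names(mat)]
names(out)                           # "a" "b": one copy of each distinct set

## Converting the dist object to a matrix and then a vector keeps the diagonal.
myDist <- dist(c(0, 1, 3))
vals <- unique(c(as.matrix(myDist))) # now includes the 0 from the diagonal
```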