I have a matrix, say:

23  1
12 12
 0  0
 0  1
 0  1
 0  2
23  2

I want to count the number of distinct rows and the number of distinct elements in the second column, and put these counts in a column. So at the end of the day I should have:

c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and
c(1, 1, 1, 2, 2, 2, 2) for the counts of how many times the elements in the second column occur.

Any help is greatly appreciated.

--
Thanks,
Jim.
Dear Jim,

17.03.2011 20:54, Jim Silverton wrote:
> I have a matrix say:
>
> 23 1
> 12 12
> 0 0
> 0 1
> 0 1
> 0 2
> 23 2
>
> I want to count of number of distinct rows and the number of disinct element
> in the second column and put these counts in a column. SO at the end of the
> day I should have:
>
> c(1, 1, 1, 2, 2, 1, 1) for the distinct rows...

Let's suppose my.data is your data frame, "var" is the 1st column and "var1" is the second.

1) Create a 3rd column for the first task:

    my.data$var2 <- 0

2) For each row, count the rows identical to it:

    for (i in 1:nrow(my.data)) {
      my.data$var2[i] <- nrow(subset(my.data, var == var[i] & var1 == var1[i]))
    }

After this, the output of "my.data$var2" is:

    [1] 1 1 1 2 2 1 1

> ... and c(1, 1, 1, 2, 2, 2, 2) for the counts of how many times the
> elements in the second column exists.

Here I'm a bit confused. Shouldn't the count for the first element "1" rather be 3, since the value 1 occurs three times in the second column? If this is what you are looking for, then the following should work:

1) Create a 4th column:

    my.data$var3 <- 0

2) For each row, count how often its second-column value occurs:

    for (i in 1:nrow(my.data)) {
      my.data$var3[i] <- sum(my.data$var1 == my.data$var1[i])
    }

After this, the output of "my.data$var3" is:

    [1] 3 1 1 3 3 2 2

HTH,
Kimmo
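The two loops above can also be written without an explicit loop. The following is only a minimal sketch, assuming the data frame and column names (my.data, var, var1) from this reply; the construction of my.data and the helper names key and v are illustrative additions, not part of the original post.

```r
# Assumed setup: the matrix from the question as a data frame,
# with the column names "var" and "var1" used in the reply.
my.data <- data.frame(var  = c(23, 12, 0, 0, 0, 0, 23),
                      var1 = c(1, 12, 0, 1, 1, 2, 2))

# Build one key per row from both columns, tabulate the keys,
# then look up each row's key to get the number of identical rows.
key <- paste(my.data$var, my.data$var1, sep = "_")
my.data$var2 <- as.vector(table(key)[key])

# Same idea for the second column alone: tabulate its values
# and look up the frequency of each row's value.
v <- as.character(my.data$var1)
my.data$var3 <- as.vector(table(v)[v])

my.data
```

Indexing the table by name returns the frequency for each row in the original row order, so var2 and var3 line up with the rows of my.data.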
On Thu, Mar 17, 2011 at 02:54:49PM -0400, Jim Silverton wrote:
> I have a matrix say:
>
> 23 1
> 12 12
> 0 0
> 0 1
> 0 1
> 0 2
> 23 2
>
> I want to count of number of distinct rows and the number of disinct element
> in the second column and put these counts in a column. SO at the end of the
> day I should have:
>
> c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and c(1, 1, 1, 2, 2, 2, 2) for
> the counts of how many times the elements in the second column exists. Any
> help is greatly appreciated.

Hi.

I understand the first as follows: for each row, compute the number of rows that are equal to the given one. If this is correct, then the following can be used.

    a <- cbind(
      c(23, 12, 0, 0, 0, 0, 23),
      c(1, 12, 0, 1, 1, 2, 2))
    u <- rep(1, times=nrow(a))
    ave(u, a[, 1], a[, 2], FUN=sum)

    [1] 1 1 1 2 2 1 1

I am not sure whether I understand the second correctly. Can you explain it in more detail? I would expect

    ave(u, a[, 2], FUN=sum)

    [1] 3 1 1 3 3 2 2

However, this is different from your expected output. Do you count only consecutive equal elements?

Hope this helps.

Petr Savicky.
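For what it is worth, counting only consecutive equal elements does reproduce the expected c(1, 1, 1, 2, 2, 2, 2). Whether that is what the original poster meant is an assumption, since the thread does not confirm it. A minimal sketch using base R's rle(), where the name run.count is chosen here for illustration:

```r
# The matrix from the question.
a <- cbind(c(23, 12, 0, 0, 0, 0, 23),
           c(1, 12, 0, 1, 1, 2, 2))

# rle() splits the second column into runs of consecutive equal values
# and reports the length of each run.
r <- rle(a[, 2])

# Repeat each run length once for every element in that run,
# so each row gets the length of the run it belongs to.
run.count <- rep(r$lengths, times = r$lengths)
run.count
# [1] 1 1 1 2 2 2 2
```

Under this reading the first "1" gets a count of 1 because it is not adjacent to the other two, which is why the result differs from the ave() version.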