mpward at illinois.edu
2010-Jul-24 01:20 UTC
[R] Trouble retrieving the second largest value from each row of a data.frame
I have a data frame with a couple million lines and want to retrieve the largest and second largest values in each row, along with the label of the column these values are in. For example

row 1
strongest=-11072
secondstrongest=-11707
strongestantenna=value120
secondstrongantenna=value60

Below is the code I am using and a truncated data.frame. Retrieving the largest value was easy, but I have been getting errors every way I have tried to retrieve the second largest value. I have not even tried to retrieve the labels for the values yet.

Any help would be appreciated
Mike

> data<-data.frame(value0,value60,value120,value180,value240,value300)
> data
   value0 value60 value120 value180 value240 value300
1  -13007  -11707   -11072   -12471   -12838   -13357
2  -12838  -13210   -11176   -11799   -13210   -13845
3  -12880  -11778   -11113   -12439   -13089   -13880
4  -12805  -11653   -11071   -12385   -11561   -13317
5  -12834  -13527   -11067   -11638   -13527   -13873
6  -11068  -11698   -12430   -12430   -12430   -12814
7  -12807  -14068   -11092   -11709   -11607   -13025
8  -12770  -11665   -11061   -12373   -11426   -12805
9  -12988  -11736   -11137   -12570   -13467   -13739
10 -11779  -12873   -12973   -12537   -12973   -11146
> #largest value in the row
> strongest<-apply(data,1,max)
>
>
> #second largest value in the row
> n<-function(data)(1/(min(1/(data[1,]-max(data[1,]))))+ (max(data[1,])))
> secondstrongest<-apply(data,1,n)
Error in data[1, ] : incorrect number of dimensions
>
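The error comes from how apply() works: it hands each row to the function as a plain named vector, not as a one-row data frame, so inside n the argument called data has only one dimension and data[1, ] fails. A minimal sketch of two working versions follows, using rows 1, 6 and 10 of the extract above; second_largest is a hypothetical name. The two only differ when a row's maximum is tied: the reciprocal trick skips to the next distinct value, while sort() returns the tied maximum again.

```r
# Rows 1, 6 and 10 of the extract posted above
x <- read.table(header = TRUE, text = "value0 value60 value120 value180 value240 value300
1  -13007 -11707 -11072 -12471 -12838 -13357
6  -11068 -11698 -12430 -12430 -12430 -12814
10 -11779 -12873 -12973 -12537 -12973 -11146")

# The original reciprocal trick, rewritten to use its argument (one row) directly
n <- function(row) 1 / min(1 / (row - max(row))) + max(row)

# A plainer alternative (hypothetical name): sort the row, take the 2nd element
second_largest <- function(row) sort(row, decreasing = TRUE)[2]

strongest <- apply(x, 1, max)
secondstrongest <- apply(x, 1, second_largest)
```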
jim holtman
2010-Jul-24 05:45 UTC
[R] Trouble retrieving the second largest value from each row of a data.frame
try this:

> x <- read.table(textConnection(" value0 value60 value120 value180 value240 value300
+ 1 -13007 -11707 -11072 -12471 -12838 -13357
+ 2 -12838 -13210 -11176 -11799 -13210 -13845
+ 3 -12880 -11778 -11113 -12439 -13089 -13880
+ 4 -12805 -11653 -11071 -12385 -11561 -13317
+ 5 -12834 -13527 -11067 -11638 -13527 -13873
+ 6 -11068 -11698 -12430 -12430 -12430 -12814
+ 7 -12807 -14068 -11092 -11709 -11607 -13025
+ 8 -12770 -11665 -11061 -12373 -11426 -12805
+ 9 -12988 -11736 -11137 -12570 -13467 -13739
+ 10 -11779 -12873 -12973 -12537 -12973 -11146"), header=TRUE)
> closeAllConnections()
> # generate the indices of 1st&2nd largest in each row
> indx <- apply(x, 1, function(z){
+     order(z, decreasing=TRUE)[1:2]
+ })
> # now print out the data for each row
> for (i in seq(ncol(indx))){
+     cat('row:', i,
+         '1st:', x[i, indx[1,i]], 'col:', colnames(x)[indx[1,i]],
+         '2nd:', x[i, indx[2,i]], 'col:', colnames(x)[indx[2,i]], '\n')
+ }
row: 1 1st: -11072 col: value120 2nd: -11707 col: value60
row: 2 1st: -11176 col: value120 2nd: -11799 col: value180
row: 3 1st: -11113 col: value120 2nd: -11778 col: value60
row: 4 1st: -11071 col: value120 2nd: -11561 col: value240
row: 5 1st: -11067 col: value120 2nd: -11638 col: value180
row: 6 1st: -11068 col: value0 2nd: -11698 col: value60
row: 7 1st: -11092 col: value120 2nd: -11607 col: value240
row: 8 1st: -11061 col: value120 2nd: -11426 col: value240
row: 9 1st: -11137 col: value120 2nd: -11736 col: value60
row: 10 1st: -11146 col: value300 2nd: -11779 col: value0
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
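If a table is wanted instead of printed lines, the same indx matrix can pull every value at once through matrix indexing (indexing a matrix with a two-column matrix of row and column positions). A minimal sketch, using rows 1, 6 and 10 of the extract and the column names from the original question; result is a hypothetical name.

```r
# Rows 1, 6 and 10 of the extract posted above
x <- read.table(header = TRUE, text = "value0 value60 value120 value180 value240 value300
1  -13007 -11707 -11072 -12471 -12838 -13357
6  -11068 -11698 -12430 -12430 -12430 -12814
10 -11779 -12873 -12973 -12537 -12973 -11146")

# 2 x nrow(x) matrix: column positions of the 1st and 2nd largest per row
indx <- apply(x, 1, function(z) order(z, decreasing = TRUE)[1:2])

m <- as.matrix(x)
rows <- seq_len(nrow(m))

# cbind(rows, cols) selects one cell per row in a single vectorized lookup
result <- data.frame(
  strongest           = m[cbind(rows, indx[1, ])],
  secondstrongest     = m[cbind(rows, indx[2, ])],
  strongestantenna    = colnames(m)[indx[1, ]],
  secondstrongantenna = colnames(m)[indx[2, ]],
  stringsAsFactors = FALSE)
result
```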
Joshua Wiley
2010-Jul-24 06:01 UTC
[R] Trouble retrieving the second largest value from each row of a data.frame
Hi,

Here is a little function that will do what you want and return a nice output:

#Function To calculate top two values and return
my.finder <- function(mydata) {
  my.fun <- function(data) {
    strongest <- which.max(data)
    secondstrongest <- which.max(data[-strongest])
    strongestantenna <- names(data)[strongest]
    secondstrongantenna <- names(data[-strongest])[secondstrongest]
    value <- matrix(c(data[strongest], data[secondstrongest],
                      strongestantenna, secondstrongantenna), ncol =4)
    return(value)
  }
  dat <- apply(mydata, 1, my.fun)
  dat <- t(dat)
  dat <- as.data.frame(dat, stringsAsFactors = FALSE)
  colnames(dat) <- c("strongest", "secondstrongest",
                     "strongestantenna", "secondstrongantenna")
  dat[ , "strongest"] <- as.numeric(dat[ , "strongest"])
  dat[ , "secondstrongest"] <- as.numeric(dat[ , "secondstrongest"])
  return(dat)
}


#Using your example data:

yourdata <- structure(list(value0 = c(-13007L, -12838L, -12880L, -12805L,
-12834L, -11068L, -12807L, -12770L, -12988L, -11779L), value60 = c(-11707L,
-13210L, -11778L, -11653L, -13527L, -11698L, -14068L, -11665L,
-11736L, -12873L), value120 = c(-11072L, -11176L, -11113L, -11071L,
-11067L, -12430L, -11092L, -11061L, -11137L, -12973L), value180 = c(-12471L,
-11799L, -12439L, -12385L, -11638L, -12430L, -11709L, -12373L,
-12570L, -12537L), value240 = c(-12838L, -13210L, -13089L, -11561L,
-13527L, -12430L, -11607L, -11426L, -13467L, -12973L), value300 = c(-13357L,
-13845L, -13880L, -13317L, -13873L, -12814L, -13025L, -12805L,
-13739L, -11146L)), .Names = c("value0", "value60", "value120",
"value180", "value240", "value300"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

my.finder(yourdata) #and what you want is in a nicely labeled data frame

#A potential problem is that it is not very efficient

#Here is a test using a matrix of 100,000 rows
#sampled from the same range as your data
#with five columns (one fewer than your data)

data.test <- matrix(
  sample(seq(min(yourdata),max(yourdata)), size = 500000, replace = TRUE),
  ncol = 5)

system.time(my.finder(data.test))

#On my system I get

> system.time(my.finder(data.test))
   user  system elapsed
   2.89    0.00    2.89

Hope that helps,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
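One caveat about my.fun: secondstrongest is a position within data[-strongest], but data[secondstrongest] looks that position up in the full row. Whenever the runner-up column comes after the maximum's column, the lookup lands one column to the left, which is the maximum itself when the two columns are adjacent (row 6 returns -11068, the value0 maximum, as its second value). The antenna names come from data[-strongest] and are unaffected. A minimal sketch of the fix, indexing the reduced vector instead; top2_row is a hypothetical name and the test data is rows 1, 6 and 10 of the extract.

```r
# Rows 1, 6 and 10 of the extract posted above
x <- read.table(header = TRUE, text = "value0 value60 value120 value180 value240 value300
1  -13007 -11707 -11072 -12471 -12838 -13357
6  -11068 -11698 -12430 -12430 -12430 -12814
10 -11779 -12873 -12973 -12537 -12973 -11146")

# Hypothetical helper: top two values of one named row vector
top2_row <- function(row) {
  strongest <- which.max(row)   # position in the full row
  rest <- row[-strongest]       # the row with the maximum dropped
  second <- which.max(rest)     # position in 'rest', NOT in 'row'
  data.frame(strongest           = unname(row[strongest]),
             secondstrongest     = unname(rest[second]),  # index 'rest' here
             strongestantenna    = names(row)[strongest],
             secondstrongantenna = names(rest)[second],
             stringsAsFactors = FALSE)
}

m <- as.matrix(x)
res <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) top2_row(m[i, ])))
res
```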
David Winsemius
2010-Jul-24 12:40 UTC
[R] Trouble retrieving the second largest value from each row of a data.frame
Using Holtman's extract of your data with x as the name and the order function to generate an index to names of the dataframe:

> t(apply(x, 1, sort, decreasing=TRUE)[1:3, ])
     [,1]   [,2]   [,3]
1  -11072 -11707 -12471
2  -11176 -11799 -12838
3  -11113 -11778 -12439
4  -11071 -11561 -11653
5  -11067 -11638 -12834
6  -11068 -11698 -12430
7  -11092 -11607 -11709
8  -11061 -11426 -11665
9  -11137 -11736 -12570
10 -11146 -11779 -12537

Putting it all together:

matrix( paste( t(apply(x, 1, sort, decreasing=TRUE)[1:3, ]),
               names(x)[ t(apply(x, 1, order, decreasing=TRUE)[1:3, ]) ]),
        ncol=3)
      [,1]              [,2]              [,3]
 [1,] "-11072 value120" "-11707 value60"  "-12471 value180"
 [2,] "-11176 value120" "-11799 value180" "-12838 value0"
 [3,] "-11113 value120" "-11778 value60"  "-12439 value180"
 [4,] "-11071 value120" "-11561 value240" "-11653 value60"
 [5,] "-11067 value120" "-11638 value180" "-12834 value0"
 [6,] "-11068 value0"   "-11698 value60"  "-12430 value120"
 [7,] "-11092 value120" "-11607 value240" "-11709 value180"
 [8,] "-11061 value120" "-11426 value240" "-11665 value60"
 [9,] "-11137 value120" "-11736 value60"  "-12570 value180"
[10,] "-11146 value300" "-11779 value0"   "-12537 value180"

--
David.
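For the couple of million rows in the original question, every apply()-based answer in this thread calls an R function once per row. A sketch of an alternative built on base R's max.col(), which finds the column of each row's maximum in one vectorized call: find the maximum, mask it with -Inf, then call max.col() again for the runner-up. top2 is a hypothetical name, the test uses rows 1, 6 and 10 of the extract, and this sketch has not been timed against the other answers. ties.method = "first" makes tied maxima resolve to the leftmost column instead of a random one.

```r
# Rows 1, 6 and 10 of the extract posted above
x <- read.table(header = TRUE, text = "value0 value60 value120 value180 value240 value300
1  -13007 -11707 -11072 -12471 -12838 -13357
6  -11068 -11698 -12430 -12430 -12430 -12814
10 -11779 -12873 -12973 -12537 -12973 -11146")

# Hypothetical helper: top two values and their column names for every row
top2 <- function(df) {
  m <- as.matrix(df)
  rows <- seq_len(nrow(m))
  i1 <- max.col(m, ties.method = "first")       # column of each row maximum
  masked <- m
  masked[cbind(rows, i1)] <- -Inf               # hide the maximum ...
  i2 <- max.col(masked, ties.method = "first")  # ... and find the runner-up
  data.frame(strongest           = m[cbind(rows, i1)],
             secondstrongest     = m[cbind(rows, i2)],
             strongestantenna    = colnames(m)[i1],
             secondstrongantenna = colnames(m)[i2],
             stringsAsFactors = FALSE)
}

res <- top2(x)
res
```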
mpward at illinois.edu
2010-Jul-24 20:54 UTC
[R] Trouble retrieving the second largest value from each row of a data.frame
THANKS, but I have one issue and one question.

For some reason the "secondstrongest" values for rows 3 and 6 are incorrect (they are the strongest); the remaining 10 are correct??

These data are being used to track radio-tagged birds; they come from automated radio telemetry receivers. I will be applying the following formula:

diff <- ((strongest- secondstrongest)/100)
bearingdiff <-30-(-0.0624*(diff**2))-(2.8346*diff)

Then bearingdiff is added to strongestantenna (value0 = 0 degrees) if secondstrongantenna is greater (e.g. value0 and value60); if secondstrongantenna is smaller than strongestantenna, bearingdiff is subtracted from strongestantenna. The only exception is that if value0 (0 degrees) is strongest and value300 (360 degrees) is secondstrongantenna, then the bearing is 360-bearingdiff. Also, strongestantenna and secondstrongantenna have to be next to each other (e.g. value0 with value60, value240 with value300, value0 with value300) or the result should be NA. I have been trying to use a series of if/else statements to produce these bearings, but all I am producing is errors. Any suggestions would be appreciated.

Again, THANKS for your efforts.
Mike
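The bearing rules above can be written without if/else chains by working on whole vectors with ifelse(). A minimal sketch, assuming the antenna columns are named value0 to value300 and spaced 60 degrees apart; bearing_from_top2 is a hypothetical name. One assumption goes beyond the rules as stated: when value300 is strongest and value0 is second, value0 is treated as the next antenna up and the result is 300 + bearingdiff, by symmetry with the value0/value300 exception. The inputs would be the columns of a correct top-two table.

```r
# Hypothetical helper sketching the bearing rules described above.
# Assumes antenna names look like "value<degrees>" (value0 ... value300),
# 60 degrees apart; all arguments may be vectors of equal length.
bearing_from_top2 <- function(strongest, secondstrongest,
                              strongestantenna, secondstrongantenna) {
  diff <- (strongest - secondstrongest) / 100
  bearingdiff <- 30 - (-0.0624 * diff^2) - (2.8346 * diff)

  # Antenna angles in degrees, parsed from the column names
  a1 <- as.numeric(sub("^value", "", strongestantenna))
  a2 <- as.numeric(sub("^value", "", secondstrongantenna))

  # Signed step from the strongest to the second antenna, wrapped into
  # [-180, 180): +60 = next antenna up, -60 = next antenna down
  step <- ((a2 - a1 + 180) %% 360) - 180

  # Non-adjacent antennas give NA
  bearing <- ifelse(step == 60, a1 + bearingdiff,
             ifelse(step == -60, a1 - bearingdiff, NA_real_))

  # Wrap into [0, 360): value0 strongest, value300 second -> 360 - bearingdiff
  bearing %% 360
}

bearing_from_top2(-11072, -11707, "value120", "value60")  # about 105.48
```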