Tom Woolman
2018-Dec-17 17:33 UTC
[R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R
I have a data frame with 10 integer variables holding various attributes for each row, and I need to find the 5 highest variables in each row and write them to a new data frame. In addition to the 5 highest variable names, I also need the corresponding 5 highest values for each row.

A simple code example to generate a sample data frame for this is:

    set.seed(1)
    DF <- matrix(sample(1:9,9),ncol=10,nrow=9)
    DF <- as.data.frame.matrix(DF)

This would result in an example data frame like this:

    #   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    # 1  3  2  5  6  5  2  6  8  1   3
    # 2  1  4  7  8  7  7  3  4  2   9
    # 3  2  3  4  7  5  8  9  1  3   5
    # 4  3  8  3  4  5  6  7  4  6   5
    # 5  6  2  3  7  2  1  8  3  2   4
    # 6  8  2  4  8  3  2  9  7  6   5
    # 7  1  5  3  6  8  3  8  9  1   3
    # 8  9  3  5  8  4  9  7  8  1   2
    # 9  1  2  4  8  3  2  1  2  5   6

My ideal output would be something like this:

    #      V1   V2   V3   V4   V5
    # 1  V2:9 V7:8 V8:7 V4:6 V3:5
    # 2  V9:9 V3:8 V5:7 V7:6 V4:5
    # 3  V5:9 V3:8 V2:7 V9:6 V7:5
    # 4  V8:9 V4:8 V2:7 V5:6 V9:5
    # 5  V9:9 V1:8 V6:7 V3:6 V5:5
    # 6  V8:9 V1:8 V5:7 V9:6 V4:5
    # 7  V2:9 V8:8 V7:7 V5:6 V9:5
    # 8  V4:9 V7:8 V9:7 V2:6 V8:5
    # 9  V3:9 V7:8 V8:7 V4:6 V5:5
    # 10 V6:9 V8:8 V1:7 V9:6 V4:5

I was trying to use this code, but it doesn't seem to work:

    out <- t(apply(DF, 1, function(x){
      o <- head(order(-x), 5)
      paste0(names(x[o]), ':', x[o])
    }))
    as.data.frame(out)

Thanks everyone!
David L Carlson
2018-Dec-17 20:55 UTC
[R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R
There are some problems with your example. Your code does not produce anything like your example data frame because you draw only 9 values without replacement. Your code produces 10 columns, each with the same permutation of the values 1:9. Then your desired output does not make sense in terms of your example data. The first entry is V2:9 but 9 does not appear in row 1.

Using your posted example:

    DF <- structure(list(
        V1  = c(3L, 1L, 2L, 3L, 6L, 8L, 1L, 9L, 1L),
        V2  = c(2L, 4L, 3L, 8L, 2L, 2L, 5L, 3L, 2L),
        V3  = c(5L, 7L, 4L, 3L, 3L, 4L, 3L, 5L, 4L),
        V4  = c(6L, 8L, 7L, 4L, 7L, 8L, 6L, 8L, 8L),
        V5  = c(5L, 7L, 5L, 5L, 2L, 3L, 8L, 4L, 3L),
        V6  = c(2L, 7L, 8L, 6L, 1L, 2L, 3L, 9L, 2L),
        V7  = c(6L, 3L, 9L, 7L, 8L, 9L, 8L, 7L, 1L),
        V8  = c(8L, 4L, 1L, 4L, 3L, 7L, 9L, 8L, 2L),
        V9  = c(1L, 2L, 3L, 6L, 2L, 6L, 1L, 1L, 5L),
        V10 = c(3L, 9L, 5L, 5L, 4L, 5L, 3L, 2L, 6L)),
        class = "data.frame", row.names = c(NA, -9L))

Your code produces:

         V1    V2   V3    V4    V5
    1  V8:8  V4:6 V7:6  V3:5  V5:5
    2 V10:9  V4:8 V3:7  V5:7  V6:7
    3  V7:9  V6:8 V4:7  V5:5 V10:5
    4  V2:8  V7:7 V6:6  V9:6  V5:5
    5  V7:8  V4:7 V1:6 V10:4  V3:3
    6  V7:9  V1:8 V4:8  V8:7  V9:6
    7  V8:9  V5:8 V7:8  V4:6  V2:5
    8  V1:9  V6:9 V4:8  V8:8  V7:7
    9  V4:8 V10:6 V9:5  V3:4  V5:3

Which seems to be what you wanted.

---------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University
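Several rows in the output above contain tied values (row 1 has two 6s and two 5s), so it may help to see how the posted expression orders them. The snippet below is only a small sketch that applies the same `order(-x)` step to row 1 of the posted data, typed in by hand as a named vector; the variable name `top5` is introduced here for illustration.

```r
# Row 1 of the posted data frame, entered by hand as a named integer vector
x <- c(V1 = 3L, V2 = 2L, V3 = 5L, V4 = 6L, V5 = 5L,
       V6 = 2L, V7 = 6L, V8 = 8L, V9 = 1L, V10 = 3L)

# Indices of the 5 largest values; order() leaves ties in their
# original (left-to-right) column order
o <- head(order(-x), 5)

# Combine each column name with its value, as in the original code
top5 <- paste0(names(x)[o], ":", x[o])
print(top5)
# [1] "V8:8" "V4:6" "V7:6" "V3:5" "V5:5"
```

Because ties are left in column order, V4 is listed before V7 and V3 before V5, which matches row 1 of the output shown above.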
PIKAL Petr
2018-Dec-19 12:14 UTC
[R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R
Hi

The generated DF is not what you expect it to be:

    > set.seed(1)
    > DF <- matrix(sample(1:9,9),ncol=10,nrow=9)
    > DF
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    3    3    3    3    3    3    3    3    3     3
     [2,]    9    9    9    9    9    9    9    9    9     9
     [3,]    5    5    5    5    5    5    5    5    5     5
     [4,]    6    6    6    6    6    6    6    6    6     6
     [5,]    2    2    2    2    2    2    2    2    2     2
     [6,]    4    4    4    4    4    4    4    4    4     4
     [7,]    8    8    8    8    8    8    8    8    8     8
     [8,]    7    7    7    7    7    7    7    7    7     7
     [9,]    1    1    1    1    1    1    1    1    1     1

With a slight input modification:

    > set.seed(1)
    > DF <- matrix(sample(1:9,90, replace=T), ncol=10, nrow=9)
    > DF <- as.data.frame.matrix(DF)
    >
    > out <- t(apply(DF, 1, function(x){
    +   o <- head(order(-x), 5)
    +   paste0(names(x[o]), ':', x[o])
    + }))
    > as.data.frame(out)
        V1   V2    V3   V4    V5
    1 V5:8 V6:8 V10:7 V3:4  V4:4
    2 V4:8 V3:7  V8:6 V1:4  V9:4
    3 V3:9 V5:7  V1:6 V6:5  V9:5
    4 V1:9 V9:9  V2:7 V6:7 V10:7
    5 V5:8 V9:8  V6:7 V8:7  V3:6
    6 V1:9 V2:7 V10:7 V5:6  V4:5
    7 V1:9 V7:9  V5:8 V6:8  V8:8
    8 V9:9 V4:8  V2:7 V1:6  V5:5
    9 V2:9 V8:8  V4:7 V1:6  V5:5

your code seems to work.

Cheers
Petr
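For reuse, the approach discussed in this thread could be wrapped in a small function. The following is only a sketch, not code from any of the posters: the function name `top_n_by_row`, its argument `n`, and the `Top1`, `Top2`, ... column names are all hypothetical. It uses an explicit loop so that the result keeps one row per input row even when `n = 1`.

```r
# Hypothetical helper: for each row, return the n largest values as
# "column:value" strings, largest first (ties kept in column order).
top_n_by_row <- function(df, n = 5) {
  m <- as.matrix(df)
  stopifnot(n >= 1, n <= ncol(m))
  res <- matrix(NA_character_, nrow = nrow(m), ncol = n)
  for (i in seq_len(nrow(m))) {
    x <- m[i, ]
    o <- head(order(-x), n)          # indices of the n largest values
    res[i, ] <- paste0(colnames(m)[o], ":", x[o])
  }
  res <- as.data.frame(res, stringsAsFactors = FALSE)
  names(res) <- paste0("Top", seq_len(n))
  res
}

# Usage with a corrected data generation: 90 draws with replacement,
# so the columns are no longer identical copies of one permutation.
set.seed(1)
DF <- as.data.frame(matrix(sample(1:9, 90, replace = TRUE), ncol = 10, nrow = 9))
out <- top_n_by_row(DF, 5)
print(out)
```

The exact values printed depend on the R version, because the default sampling algorithm used by `sample()` changed in R 3.6.0, so the output may differ from the tables posted in this thread.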