Dimitri Liakhovitski
2013-Mar-14 00:10 UTC
[R] Modifying a data frame based on a vector that contains column numbers
Hello!

# I have a data frame:
mydf <- data.frame(c1=rep(NA,5), c2=rep(NA,5), c3=rep(NA,5))

# I have an index whose length is always the same as nrow(mydf):
myindex <- c(1,2,3,2,1)

# I need c1 to have 1s in rows 1 and 5 (based on the information in myindex)
# I need c2 to have 1s in rows 2 and 4 (also based on myindex)
# I need c3 to have 1 in row 3
# In other words, I am trying to achieve this result:
mygoal <- data.frame(c1=c(1,NA,NA,NA,1), c2=c(NA,1,NA,1,NA), c3=c(NA,NA,1,NA,NA))

I know how to do it with a loop that runs through rows of mydf.
However, in real life I have a huge data frame with tons of rows, dozens of
columns (instead of 3 in this example) - I am afraid it'll take forever.
Any hint on how to do it faster, maybe using subindexing somehow?

Thank you very much!

--
Dimitri Liakhovitski
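For reference, the row-by-row loop mentioned in the question might look roughly like the following. This is only a guess at the baseline being described, since the post does not show it; the name `slow` is made up for illustration.

```r
# Data from the question
mydf <- data.frame(c1 = rep(NA, 5), c2 = rep(NA, 5), c3 = rep(NA, 5))
myindex <- c(1, 2, 3, 2, 1)
mygoal <- data.frame(c1 = c(1, NA, NA, NA, 1),
                     c2 = c(NA, 1, NA, 1, NA),
                     c3 = c(NA, NA, 1, NA, NA))

# Assumed baseline: visit each row and set the cell named by myindex
slow <- mydf
for (r in seq_len(nrow(slow))) {
  slow[r, myindex[r]] <- 1
}
```

Each pass through the loop performs one data-frame assignment, so the number of assignments grows with the number of rows. That is the cost the replies try to avoid.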
William Dunlap
2013-Mar-14 01:28 UTC
Try looping over columns, as in

fDF <- function (x, column) {
    stopifnot(length(dim(x)) == 2,
              all(column > 0),
              all(column <= ncol(x)),
              length(column) == nrow(x))
    u <- unique(column)
    tmp <- split(seq_along(column), factor(column, levels = u))
    for (i in seq_along(tmp)) {
        x[ tmp[[i]], u[i] ] <- 1
    }
    x
}

> fDF(mydf, myindex)
  c1 c2 c3
1  1 NA NA
2 NA  1 NA
3 NA NA  1
4 NA  1 NA
5  1 NA NA

If you use a matrix instead of a data.frame then the following works
and is probably much quicker.

fMat <- function (x, column) {
    stopifnot(is.matrix(x),
              all(column > 0),
              all(column <= ncol(x)),
              length(column) == nrow(x))
    x[cbind(seq_len(nrow(x)), column)] <- 1
    x
}

Your problem may be better represented with sparse matrices (see the
Matrix package).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
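As a sketch of how the matrix approach could be applied to the data frame from the question: convert to a matrix, assign through a two-column (row, column) index matrix, and convert back. This assumes all columns can share one type, which holds here because every cell is either NA or 1; with mixed column types as.matrix() would coerce everything to a common type. The names `m` and `result` are illustrative only.

```r
# Data from the question
mydf <- data.frame(c1 = rep(NA, 5), c2 = rep(NA, 5), c3 = rep(NA, 5))
myindex <- c(1, 2, 3, 2, 1)
mygoal <- data.frame(c1 = c(1, NA, NA, NA, 1),
                     c2 = c(NA, 1, NA, 1, NA),
                     c3 = c(NA, NA, 1, NA, NA))

# All columns are NA, so this is a logical matrix with the same column names
m <- as.matrix(mydf)

# Each row of the index matrix is one (row, column) pair to overwrite
m[cbind(seq_len(nrow(m)), myindex)] <- 1

# Convert back if a data frame is needed downstream
result <- as.data.frame(m)
```

The assignment is a single vectorised operation, with no loop over rows or columns.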
arun
2013-Mar-14 01:36 UTC
Hi,
Try this:

mydf1 <- mydf
mydf1[] <- lapply(1:3, function(i) {mydf[which(i == myindex), i] <- 1; mydf[, i]})
mydf1
#  c1 c2 c3
#1  1 NA NA
#2 NA  1 NA
#3 NA NA  1
#4 NA  1 NA
#5  1 NA NA
identical(mydf1, mygoal)
#[1] TRUE

A.K.
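The call above hard-codes 1:3 for the three example columns. A variant that works for any number of columns might look like this; `fill_by_index` is a hypothetical helper name, not something from the thread or from base R, and the default `value = 1` is an assumption taken from the example.

```r
# Hypothetical helper: set df[r, index[r]] to `value` for every row r,
# working one column at a time
fill_by_index <- function(df, index, value = 1) {
  stopifnot(length(index) == nrow(df),
            all(index >= 1), all(index <= ncol(df)))
  df[] <- lapply(seq_along(df), function(i) {
    col <- df[[i]]
    col[index == i] <- value   # rows whose index points at column i
    col
  })
  df
}

# Data from the question
mydf <- data.frame(c1 = rep(NA, 5), c2 = rep(NA, 5), c3 = rep(NA, 5))
myindex <- c(1, 2, 3, 2, 1)
mygoal <- data.frame(c1 = c(1, NA, NA, NA, 1),
                     c2 = c(NA, 1, NA, 1, NA),
                     c3 = c(NA, NA, 1, NA, NA))

filled <- fill_by_index(mydf, myindex)
```

Using seq_along(df) means the loop runs once per column however many columns there are, so it should stay cheap with dozens of columns and many rows.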