thr3ads.net - R help - [R] Modifying a data frame based on a vector that contains column numbers [Mar 2013]


Dimitri Liakhovitski

2013-Mar-14 00:10 UTC

[R] Modifying a data frame based on a vector that contains column numbers

Hello!

# I have a data frame:
mydf<-data.frame(c1=rep(NA,5),c2=rep(NA,5),c3=rep(NA,5))

# I have an index whose length is always the same as nrow(mydf):
myindex<-c(1,2,3,2,1)

# I need c1 to have 1s in rows 1 and 5 (based on the information in myindex)
# I need c2 to have 1s in rows 2 and 4 (also based on myindex)
# I need c3 to have 1 in row 3
# In other words, I am trying to achieve this result:
mygoal<-data.frame(c1=c(1,NA,NA,NA,1),c2=c(NA,1,NA,1,NA),c3=c(NA,NA,1,NA,NA))

I know how to do it with a loop that runs through rows of mydf.
However, in real life I have a huge data frame with tons of rows, dozens of
columns (instead of 3 in this example) - I am afraid it'll take forever.
Any hint on how to do it faster, maybe using subindexing somehow?

Thank you very much!

-- 
Dimitri Liakhovitski


William Dunlap

2013-Mar-14 01:28 UTC


Try looping over columns, as in

fDF <- function (x, column)
{
    stopifnot(length(dim(x))==2, all(column > 0), all(column <= ncol(x)),
              length(column) == nrow(x))
    u <- unique(column)
    tmp <- split(seq_along(column), factor(column, levels = u))
    for (i in seq_along(tmp)) {
        x[ tmp[[i]],  u[i] ] <- 1
    }
    x
}
> fDF(mydf, myindex)
  c1 c2 c3
1  1 NA NA
2 NA  1 NA
3 NA NA  1
4 NA  1 NA
5  1 NA NA

If you use a matrix instead of a data.frame then the following works and is
probably much quicker.
fMat <- function (x, column) 
{
    stopifnot(is.matrix(x), all(column > 0), all(column <= ncol(x)),
              length(column) == nrow(x))
    x[cbind(seq_len(nrow(x)), column)] <- 1
    x
}
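
As a rough sketch of how the matrix-indexing idea could be applied to the original data frame: convert it to a matrix, assign through a two-column (row, column) index matrix, and convert back. The names m, cells and result are illustrative only, and the round trip through as.matrix() assumes all columns share one type, as they do in the toy example.

```r
# Same toy data as in the original question.
mydf    <- data.frame(c1 = rep(NA, 5), c2 = rep(NA, 5), c3 = rep(NA, 5))
myindex <- c(1, 2, 3, 2, 1)

# Convert to a matrix (assumption: every column has the same type).
m <- as.matrix(mydf)

# Each row of this two-column matrix names one (row, column) cell.
cells <- cbind(seq_len(nrow(m)), myindex)

# A single vectorized assignment sets all of those cells at once.
m[cells] <- 1

# Convert back if a data frame is needed downstream.
result <- as.data.frame(m)
```

The assignment touches exactly one cell per row, so no explicit loop over rows or columns is needed.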

Your problem may be better represented with sparse matrices (see the Matrix
package).
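
A minimal sketch of what the sparse representation might look like, assuming the Matrix package is installed. The names ncols and sp are illustrative, and the column count of 3 is taken from the toy example. Note one difference from the data frame version: unset cells are implicit zeros in a sparse matrix, not NA.

```r
library(Matrix)  # assumption: the Matrix package is available

myindex <- c(1, 2, 3, 2, 1)
ncols   <- 3  # assumed number of columns, matching mydf in the question

# One nonzero entry per row: row i gets a 1 in column myindex[i].
# Cells that are not listed are stored implicitly as 0 (not NA).
sp <- sparseMatrix(i = seq_along(myindex), j = myindex, x = 1,
                   dims = c(length(myindex), ncols))

# Dense view, for inspection on small inputs only.
dense <- as.matrix(sp)
```

With one nonzero per row, storage grows with the number of rows rather than rows times columns, which may matter when there are many columns.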

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


arun

2013-Mar-14 01:36 UTC


Hi,
Try this:
mydf1 <- mydf
mydf1[] <- lapply(1:3, function(i) {mydf[which(i == myindex), i] <- 1; mydf[, i]})
mydf1
#  c1 c2 c3
#1  1 NA NA
#2 NA  1 NA
#3 NA NA  1
#4 NA  1 NA
#5  1 NA NA


identical(mydf1,mygoal)
#[1] TRUE
A.K.



