Dimitri Liakhovitski
2009-Mar-27  18:03 UTC
[R] Efficiency: speeding up unlist that is currently running by row
Hello everyone!
I have a piece of code that works and does what I need but...:
# I have 3 slots:
nr.of.slots<-3
# My data frame is "new.a":
new.a<-data.frame(x=c("john","mary"),y=c("pete","john"),z=c("mary","pete"),stringsAsFactors=FALSE)
print(new.a)
# Creating all possible combinations of the rows of "new.a" with all
# possible combinations of "p1" and "p2" in 3 locations (3 new
# columns):
big.a<-cbind(new.a[rep(1:nrow(new.a),each=8),],expand.grid(paste("p",1:2,sep=""),paste("p",1:2,sep=""),paste("p",1:2,sep=""))[rep(1:8,nrow(new.a)),])
print(big.a)
# Making sure the last 3 columns are characters, not factors:
for(i in 1:nr.of.slots) { big.a[[(i+3)]]<-as.character(big.a[[(i+3)]]) }
str(big.a)
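A possible shortcut for the two steps above, offered only as a sketch: expand.grid() accepts a list of vectors and has a stringsAsFactors argument, so the "p1"/"p2" columns can be built as characters from the start and the conversion loop may not be needed. The names p, grid and big.a2 are made up for illustration; new.a and nr.of.slots are repeated so the snippet runs on its own.

```r
# Same toy data as in the post
new.a <- data.frame(x = c("john", "mary"), y = c("pete", "john"),
                    z = c("mary", "pete"), stringsAsFactors = FALSE)
nr.of.slots <- 3

# "p1" and "p2" (hypothetical helper vector)
p <- paste("p", 1:2, sep = "")

# One copy of p per slot; stringsAsFactors = FALSE keeps the columns character
grid <- expand.grid(rep(list(p), nr.of.slots), stringsAsFactors = FALSE)

# Repeat each row of new.a once per grid row, and the grid once per row of new.a
big.a2 <- cbind(new.a[rep(seq_len(nrow(new.a)), each = nrow(grid)), ],
                grid[rep(seq_len(nrow(grid)), nrow(new.a)), ])
str(big.a2)
```

Using nrow(grid) instead of the literal 8 should also keep the code valid if the number of slots changes.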
# Creating a final dataframe with as many columns as slots (i.e., 3);
# each cell contains a name of a person and "p1" or "p2":
output<-data.frame(matrix(nrow = nrow(big.a), ncol = nr.of.slots))
for(i in 1:nr.of.slots) {
	names(output)[i]<-paste("slot",i,sep=".")
}
# THIS IS THE SECTION OF THE CODE I HAVE A QUESTION ABOUT:
for(i in 1:nr.of.slots) {
	output[[i]]<-lapply(1:nrow(big.a),function(x){
		out<-unlist(c(big.a[x,i],big.a[x,i+nr.of.slots]))
		return(out)
	})
}
print(output)
# This is exactly the output I am looking for: Each cell of "output"
# contains just 2 words:
print(output[1,1])
str(output[1,1])
MY QUESTION:
The section of the code above in which I run unlist loops through
the rows one at a time. My problem is that my real data frame will
have over a million rows and more than 3 columns in output, so it's
very slow. Is it at all possible to speed it up somehow? For example,
by merging whole columns of the data frame pairwise instead of going
row by row?
Thank you very much for any advice!
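To make the "whole columns, pairwise" idea concrete, here is a minimal sketch under stated assumptions, not a definitive answer: mapply() with SIMPLIFY=FALSE walks two column vectors in parallel and returns one 2-element character vector per row, which avoids indexing the data frame row by row. The names p, grid and output2 are invented for this illustration, and the timing benefit on a million rows is an expectation rather than something measured here.

```r
# Rebuild the toy data from the post so the snippet runs on its own
new.a <- data.frame(x = c("john", "mary"), y = c("pete", "john"),
                    z = c("mary", "pete"), stringsAsFactors = FALSE)
nr.of.slots <- 3
p <- paste("p", 1:2, sep = "")
grid <- expand.grid(p, p, p, stringsAsFactors = FALSE)
big.a <- cbind(new.a[rep(1:nrow(new.a), each = 8), ],
               grid[rep(1:8, nrow(new.a)), ])

# Empty result with one column per slot
output2 <- data.frame(matrix(nrow = nrow(big.a), ncol = nr.of.slots))
names(output2) <- paste("slot", 1:nr.of.slots, sep = ".")

# Pair column i with column i + nr.of.slots, whole vectors at a time.
# USE.NAMES = FALSE stops mapply from naming the list after the first column.
for (i in 1:nr.of.slots) {
  output2[[i]] <- mapply(c, big.a[[i]], big.a[[i + nr.of.slots]],
                         SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

print(output2[1, 1])
str(output2[1, 1])
```

The loop still runs once per slot, but each pass works on complete columns extracted with [[ ]], so the per-row cost of big.a[x,i] on a data frame should disappear. If a list column is not strictly required, keeping the pairs in a two-column character matrix per slot (cbind of the two vectors) would be another option to consider.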
-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
