Dear All,

I have a data frame like the one below, but with many thousands of rows:

    structure(list(gene_id = structure(1:6, .Label = c("0610005C13Rik",
    "0610007P14Rik", "0610009B22Rik", "0610009L18Rik", "0610009O20Rik",
    "0610010B08Rik,OTTMUSG00000016609"), class = "factor"),
    log2.fold_change. = c(0.0114463, -0.0960262, 0.00805151, -0.179981,
    -0.0629098, 0.155979), p_value = c(1, 0.77915, 0.98265, 0.68665,
    0.85035, 0.72235), new.value = c("NA", "NA", "NA", "NA", "NA", "NA")),
    .Names = c("gene_id", "log2.fold_change.", "p_value", "new.value"),
    row.names = c(NA, 6L), class = "data.frame")

I want to check whether the value in the second column is positive or negative, then do a calculation and put the result in the last column. I can do this with a for loop like the one below, but it is not efficient. Is there a better way that uses vectorization instead of a loop? Many thanks!

    for (i in 1:nrow(dataframe)) {
      if (dataframe[i, 2] > 0) {
        dataframe[i, 4] <- 1 * (1/dataframe[i, 3])
      } else {
        dataframe[i, 4] <- -1 * (1/dataframe[i, 3])
      }
    }

-------------------------------------------------------
Stephen H.K. WONG, PhD.
Stanford University
ruipbarradas at sapo.pt
2016-Mar-21 18:50 UTC
[R] how to use vectorization instead of for loop
Hello,

I've renamed your data frame to 'dat'. Since ?ifelse is vectorized, try

    dat[, 4] <- ifelse(dat[, 2] > 0, 1 * (1/dat[,3]), -1 * (1/dat[,3]))

Oh, and why do you multiply by 1 and by -1? It would simply be 1/dat[,3] and -1/dat[,3].

Hope this helps,

Rui Barradas
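As a rough sketch of the ifelse suggestion, the snippet below rebuilds the six sample rows from the original post and fills the last column in one vectorized call. Two things here are assumptions for illustration and not part of the thread: the data frame is built with data.frame() rather than the posted structure() call, and new.value starts as a numeric NA rather than the character "NA".

```r
# Sample rows from the original post; new.value starts as a numeric NA
# (an assumption: the post stored the string "NA" instead).
dat <- data.frame(
  gene_id = c("0610005C13Rik", "0610007P14Rik", "0610009B22Rik",
              "0610009L18Rik", "0610009O20Rik",
              "0610010B08Rik,OTTMUSG00000016609"),
  log2.fold_change. = c(0.0114463, -0.0960262, 0.00805151,
                        -0.179981, -0.0629098, 0.155979),
  p_value = c(1, 0.77915, 0.98265, 0.68665, 0.85035, 0.72235),
  new.value = NA_real_
)

# One vectorized call replaces the whole loop:
# 1/p where the fold change is positive, -1/p otherwise.
dat[, 4] <- ifelse(dat[, 2] > 0, 1 / dat[, 3], -1 / dat[, 3])

print(dat[, c("log2.fold_change.", "p_value", "new.value")])
```

As in the loop, a fold change of exactly 0 falls into the "otherwise" branch and gets -1/p. Something like sign(dat[, 2]) / dat[, 3] would be shorter, but it returns 0 for those rows, so it is not a drop-in replacement.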
Thanks so much, Rui. The code can be so simple and fast.

By the way, ifelse is good for two conditions (in my case, either > 0 or < 0). I found that there are a lot of rows with the value "Inf", and I want to keep that value in the new column. How do I do that using ifelse?

Thanks.
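One possible way to handle the "Inf" rows is to nest a second ifelse inside the first, so that infinite values are copied across before the sign test is applied. This is only a sketch under assumptions the thread does not confirm: that the Inf values sit in the fold-change column, that they are stored as numeric Inf or -Inf rather than as text, and that "keep it" means copying them unchanged into the new column. The four rows below are made-up illustration data.

```r
# Hypothetical rows (not from the thread) with infinite fold changes.
dat <- data.frame(
  log2.fold_change. = c(0.0114463, -0.0960262, Inf, -Inf),
  p_value = c(1, 0.77915, 0.5, 0.25)
)

fc <- dat$log2.fold_change.

# Outer ifelse: copy Inf / -Inf through unchanged.
# Inner ifelse: the original two-way rule for finite values.
dat$new.value <- ifelse(is.infinite(fc), fc,
                        ifelse(fc > 0, 1 / dat$p_value, -1 / dat$p_value))

print(dat)
```

Each extra condition adds one more level of nesting. If the column was read in as character or factor, it would need converting with as.numeric() first, since is.infinite() does not recognize the string "Inf".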