Jakob Hedegaard
2010-Sep-08 18:17 UTC
[R] Replace NAs in one column with data from another column
Hi list,

I have a data frame (m) with 169221 rows and 10 columns and would like to make a new column containing the content of column 3, but with the NAs in column 3 replaced by the data in column 1 (from the same row as the NA in column 3). Column 1 has data in all rows.

My first attempt was:

for (i in 1:169221){
  if (is.na(m[i,3])==TRUE){
    m[i,11] <- as.character(m[i,1])}
  else{
    m[i,11] <- as.character(m[i,3])}
}

It works, but takes too long. I would appreciate alternative solutions.

Best regards,
Jakob
Dimitris Rizopoulos
2010-Sep-08 18:22 UTC
[R] Replace NAs in one column with data from another column
One way is the following:

m <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
m$z[sample(100, 20)] <- NA
m$z.new <- ifelse(is.na(m$z), m$x, m$z)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
jim holtman
2010-Sep-08 18:23 UTC
[R] Replace NAs in one column with data from another column
?ifelse

df$newCol <- ifelse(is.na(df$col3), df$col1, df$col3)

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
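A note on the two ifelse() suggestions: the original loop wraps every value in as.character(), which hints that the real columns may be factors. The thread never says so, so treat that as an assumption. If they are factors, ifelse() drops the factor attributes and returns integer codes instead of labels. The sketch below uses a made-up data frame (the column names id and name are hypothetical) to show the effect and one way around it.

```r
# Hypothetical data: the thread does not show the real columns, so these
# factor columns are an assumption made only for illustration.
m <- data.frame(id   = factor(c("a", "b", "c", "d")),
                name = factor(c("x", NA, "z", NA)))

# Applied directly to factors, ifelse() returns their integer codes.
codes <- ifelse(is.na(m$name), m$id, m$name)

# Converting to character first keeps the labels.
m$new <- ifelse(is.na(m$name), as.character(m$id), as.character(m$name))
print(m$new)
```

If the columns are plain numeric or character vectors, the ifelse() calls as posted need no conversion.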
Joshua Wiley
2010-Sep-08 18:24 UTC
[R] Replace NAs in one column with data from another column
Hi Jakob,

You can use is.na() to create an index of which rows in column 3 are missing data, and then select these from column 1. Here is a simple example:

dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
dat$new <- dat$V3
my.na <- is.na(dat$V3)
dat$new[my.na] <- dat$V1[my.na]

dat

This should be quite fast. I broke the steps up to be explicit, but you can readily simplify them.

HTH,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
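One possible way to simplify the steps described above, offered as a sketch and not as the form Josh had in mind, is base R's replace(), which copies a vector and overwrites the selected positions in a single call. It reuses the same example data.

```r
dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))

# Copy V3 and overwrite its NA positions with the matching V1 values.
dat$new <- replace(dat$V3, is.na(dat$V3), dat$V1[is.na(dat$V3)])
print(dat)
```

This calls is.na() twice, so the explicit version with a stored index does slightly less work. The one-liner is mainly a readability choice.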
David Winsemius
2010-Sep-08 19:02 UTC
[R] Replace NAs in one column with data from another column
I was about to post something similar except I was going to avoid the "$" operator thinking, incorrectly as it turned out, that it would be faster. I also include the Holtman/Rizopoulos suggestion of ifelse(). I was also surprised that ifelse is the winning strategy:

dat[4] <- dat[3]; idx <-is.na(dat[, 3])
dat[is.na(dat[, 3]), 4] <- dat[is.na(dat[, 3]), 1]

> benchmark(meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
+   meth.dlr.sign = {dat$new <- dat$V3
+     my.na <- is.na(dat$V3)
+     dat$new[my.na] <- dat$V1[my.na]},
+   meth.index = {dat[4] <- dat[3]; idx <-is.na(dat[, 3])
+     dat[idx, 4] <- dat[idx, 1]},
+   meth.forloop = {for (i in 1:nrow(dat)){
+     if (is.na(dat[i,3])==TRUE){
+       dat[i,4] <- dat[i,1]}
+     else{
+       dat[i,4] <- dat[i,3]} }
+   },
+   replications=5000, columns = c("test", "replications", "elapsed",
+     "relative", "user.self") )
           test replications elapsed  relative user.self
2 meth.dlr.sign         5000   0.502  1.081897     0.501
4  meth.forloop         5000   6.419 13.834052     6.409
1   meth.ifelse         5000   0.464  1.000000     0.463
3    meth.index         5000   2.908  6.267241     2.904

--
David.

--
David Winsemius, MD
West Hartford, CT
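The benchmark above runs each method 5000 times on a five-row data frame, so the timings may mostly reflect per-call overhead and not how the methods scale. A rough sketch of a single timing at the size from the original post follows. It uses only base R's system.time(); the simulated data and the 20% share of NAs are assumptions, and no particular outcome is implied.

```r
set.seed(1)
n <- 169221                        # row count from the original post
big <- data.frame(V1 = seq_len(n), V3 = rnorm(n))
big$V3[sample(n, n %/% 5)] <- NA   # assumption: roughly 20% missing

# ifelse() on whole columns
t.ifelse <- system.time(
  new.ifelse <- ifelse(is.na(big$V3), big$V1, big$V3)
)

# logical index on plain vectors
t.index <- system.time({
  new.index <- big$V3
  na.idx <- is.na(new.index)
  new.index[na.idx] <- big$V1[na.idx]
})

# Elapsed seconds for one run of each; values vary by machine.
print(c(ifelse = t.ifelse[["elapsed"]], index = t.index[["elapsed"]]))
```

A single run is noisy, so repeating the measurement (for example with rbenchmark, as in the post above) would give a steadier comparison.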
Joshua Wiley
2010-Sep-08 19:56 UTC
[R] Replace NAs in one column with data from another column
On Wed, Sep 8, 2010 at 12:02 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> I was also surprised that ifelse is the winning strategy:

That surprises me too. What I find really curious is the (relatively) large difference between the dlr.sign and index methods. Some of the difference is gained back if dat[, 4] <- dat[, 3] is used over dat[4] <- dat[3]. But that variant (with the inventive name index2) still lags noticeably behind dlr.sign on my old clunker:

# after failed attempts with benchmark::benchmark()
# I decided this is what you used
> library(rbenchmark)
> dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))
> rbenchmark::benchmark(meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
+   meth.dlr.sign = {dat$new <- dat$V3
+     my.na <- is.na(dat$V3)
+     dat$new[my.na] <- dat$V1[my.na]},
+   meth.index = {dat[4] <- dat[3]; idx <-is.na(dat[, 3])
+     dat[idx, 4] <- dat[idx, 1]},
+   meth.index2 = {dat[, 4] <- dat[, 3]; idx <-is.na(dat[, 3])
+     dat[idx, 4] <- dat[idx, 1]},
+   meth.forloop = {for (i in 1:nrow(dat)){
+     if(is.na(dat[i,2])==TRUE){
+       dat[i, 3] <- dat[i, 1]
+     } else { dat[i,3] <- dat[i,2]}}
+   },
+   replications=5000, columns = c("test", "replications", "elapsed",
+     "relative", "user.self"))
           test replications elapsed  relative user.self
2 meth.dlr.sign         5000   1.337  1.206679     1.216
5  meth.forloop         5000  16.941 15.289711    14.997
1   meth.ifelse         5000   1.108  1.000000     1.061
3    meth.index         5000   8.868  8.003610     7.164
4   meth.index2         5000   6.099  5.504513     5.136

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
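Before reading much into the timings, it can be worth confirming that the approaches in this thread return the same values. The sketch below does that on the five-row example data. The names res.ifelse, res.dlr, res.index and d2 are introduced here for illustration, and the "[" variant uses columns 1 and 2 because that is where V1 and V3 sit in this small data frame.

```r
dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))

# ifelse() approach
res.ifelse <- ifelse(is.na(dat$V3), dat$V1, dat$V3)

# "$" approach with a stored logical index
res.dlr <- dat$V3
my.na <- is.na(dat$V3)
res.dlr[my.na] <- dat$V1[my.na]

# "[" approach on a copy of the data frame
d2 <- dat
d2[, 3] <- d2[, 2]
idx <- is.na(d2[, 2])
d2[idx, 3] <- d2[idx, 1]
res.index <- d2[, 3]

# All three should agree element by element.
stopifnot(isTRUE(all.equal(res.ifelse, res.dlr)),
          isTRUE(all.equal(res.dlr, res.index)))
```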