thr3ads.net - R help - [R] Replace NAs in one column with data from another column [Sep 2010]

Jakob Hedegaard

2010-Sep-08 18:17 UTC

[R] Replace NAs in one column with data from another column

Hi list,

I have a data frame (m) with 169221 rows and 10 columns and would like to make a
new column containing the content of column 3 but replace the NAs in column 3
with the data in column 1 (from the same row as the NA in column 3). Column 1
has data in all rows.

My first attempt was:

for (i in 1:169221) {
  if (is.na(m[i, 3]) == TRUE) {
    m[i, 11] <- as.character(m[i, 1])
  } else {
    m[i, 11] <- as.character(m[i, 3])
  }
}

It works, but it takes far too long.
I would appreciate alternative solutions.

Best regards, Jakob

Dimitris Rizopoulos

2010-Sep-08 18:22 UTC

One way is the following:

m <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
m$z[sample(100, 20)] <- NA

m$z.new <- ifelse(is.na(m$z), m$x, m$z)
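
A note on column types, offered as a hedged sketch rather than part of the original reply: the loop in the question wraps each value in as.character(), which suggests (this is an assumption) that the real columns are factors. ifelse() does not keep factor attributes, so it returns the underlying integer codes unless the columns are converted first. The data frame and the names id, label and label.new below are hypothetical.

```r
# Hypothetical data: columns 1 and 3 are factors, column 3 has NAs
m <- data.frame(id    = factor(c("a", "b", "c", "d")),
                other = 1:4,
                label = factor(c("x", NA, "y", NA)))

# Pitfall: on factors, ifelse() yields integer codes, not the labels
codes <- ifelse(is.na(m$label), m$id, m$label)

# Converting to character first keeps the actual values
m$label.new <- ifelse(is.na(m$label),
                      as.character(m$id),
                      as.character(m$label))
m$label.new
```

If the columns are already numeric or character, the conversion is unnecessary and the one-liner above can be used as is.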


I hope it helps.

Best,
Dimitris


-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

jim holtman

2010-Sep-08 18:23 UTC

?ifelse

df$newCol <- ifelse(is.na(df$col3), df$col1, df$col3)

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Joshua Wiley

2010-Sep-08 18:24 UTC

Hi Jakob,

You can use is.na() to create an index of which rows in column 3 are
missing data, and then select these from column 1.  Here is a simple
example:

dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4,  NA))
dat$new <- dat$V3
my.na <- is.na(dat$V3)
dat$new[my.na] <- dat$V1[my.na]

dat

This should be quite fast.  I broke the steps up to be explicit, but
you can readily simplify them.
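
One possible simplification of the steps above, as a sketch and not necessarily what Josh had in mind: base R's replace() does the copy and the indexed assignment in a single call. The variable name na.rows is only illustrative.

```r
dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4, NA))

# Logical index of the rows where V3 is missing
na.rows <- is.na(dat$V3)

# Copy V3 and overwrite its missing entries with the matching V1 values
dat$new <- replace(dat$V3, na.rows, dat$V1[na.rows])

dat
```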

HTH,

Josh

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

David Winsemius

2010-Sep-08 19:02 UTC

On Sep 8, 2010, at 2:24 PM, Joshua Wiley wrote:
> Hi Jakob,
>
> You can use is.na() to create an index of which rows in column 3 are
> missing data, and then select these from column 1.  Here is a simple
> example:
>
> dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4,  NA))
> dat$new <- dat$V3
> my.na <- is.na(dat$V3)
> dat$new[my.na] <- dat$V1[my.na]
>
> dat
>
> This should be quite fast.  I broke the steps up to be explicit, but
> you can readily simplify them.

I was about to post something similar, except I was going to avoid the "$"
operator, thinking (incorrectly, as it turned out) that it would be faster.
I also included the Holtman/Rizopoulos suggestion of ifelse(), and I was
surprised that ifelse() is the winning strategy:

dat[4] <- dat[3]; idx <- is.na(dat[, 3])
dat[idx, 4] <- dat[idx, 1]

> benchmark(meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
+  meth.dlr.sign = {dat$new <- dat$V3
+  my.na <- is.na(dat$V3)
+  dat$new[my.na] <- dat$V1[my.na]},
+  meth.index = {dat[4] <- dat[3]; idx <- is.na(dat[, 3])
+  dat[idx, 4] <- dat[idx, 1]},
+  meth.forloop = {for (i in 1:nrow(dat)){
+  if (is.na(dat[i,3])==TRUE){
+  dat[i,4] <- dat[i,1]}
+  else{
+  dat[i,4] <- dat[i,3]} }
+  },
+  replications = 5000, columns = c("test", "replications", "elapsed",
+      "relative", "user.self"))
            test replications elapsed  relative user.self
2 meth.dlr.sign         5000   0.502  1.081897     0.501
4  meth.forloop         5000   6.419 13.834052     6.409
1   meth.ifelse         5000   0.464  1.000000     0.463
3    meth.index         5000   2.908  6.267241     2.904
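
The timings above come from a 5-row data frame repeated 5000 times, so they may largely reflect per-call overhead. As a rough sketch, assuming synthetic numeric data with the row count from the original post (the names big, res.ifelse and res.index are hypothetical), the two vectorised approaches can also be timed once at full size with base R's system.time(). No timings are quoted here because they depend on the machine.

```r
set.seed(1)
n <- 169221  # row count taken from the original question

# Synthetic stand-in for the real data; about a fifth of V3 is set to NA
big <- data.frame(V1 = seq_len(n), V3 = as.numeric(seq_len(n)))
big$V3[sample(n, n %/% 5)] <- NA

# Approach 1: ifelse()
t.ifelse <- system.time(
  res.ifelse <- ifelse(is.na(big$V3), big$V1, big$V3)
)

# Approach 2: copy the column, then fill the NA positions by logical index
t.index <- system.time({
  res.index <- big$V3
  na.rows <- is.na(big$V3)
  res.index[na.rows] <- big$V1[na.rows]
})

# Both approaches should give the same vector
identical(res.ifelse, res.index)

# Elapsed seconds for a single run of each approach
c(ifelse = t.ifelse[["elapsed"]], index = t.index[["elapsed"]])
```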

-- 
David.

David Winsemius, MD
West Hartford, CT

Joshua Wiley

2010-Sep-08 19:56 UTC

On Wed, Sep 8, 2010 at 12:02 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> I was about to post something similar except I was going to avoid the "$"
> operator thinking, incorrectly as it turned out, that it would be faster. I
> also include the Holtman/Rizopoulos suggestion of ifelse(). I was also
> surprised that ifelse is the winning strategy:

That surprises me too.  What I find really curious is the (relatively)
large difference between the dlr.sign and index methods.  Some of the
difference is gained back if dat[, 4] <- dat[, 3] is used instead of
dat[4] <- dat[3] (a variant I gave the inventive name index2).  But on my
old clunker it still lags noticeably behind dlr.sign:

# after failed attempts with benchmark::benchmark()
# I decided this is what you used
> library(rbenchmark)
> dat <- data.frame(V1 = 1:5, V3 = c(1, NA, 3, 4,  NA))
> rbenchmark::benchmark(meth.ifelse = {dat$z.new <- ifelse(is.na(dat$V3), dat$V1, dat$V3)},
+           meth.dlr.sign = {dat$new <- dat$V3
+                            my.na <- is.na(dat$V3)
+                            dat$new[my.na] <- dat$V1[my.na]},
+           meth.index = {dat[4] <- dat[3]; idx <- is.na(dat[, 3])
+                        dat[idx, 4] <- dat[idx, 1]},
+           meth.index2 = {dat[, 4] <- dat[, 3]; idx <- is.na(dat[, 3])
+                        dat[idx, 4] <- dat[idx, 1]},
+           meth.forloop = {for (i in 1:nrow(dat)){
+             if(is.na(dat[i,2])==TRUE){
+               dat[i, 3] <- dat[i, 1]
+             } else { dat[i,3] <- dat[i,2]}}
+                         },
+           replications = 5000, columns = c("test", "replications", "elapsed",
+                                "relative", "user.self"))
           test replications elapsed  relative user.self
2 meth.dlr.sign         5000   1.337  1.206679     1.216
5  meth.forloop         5000  16.941 15.289711    14.997
1   meth.ifelse         5000   1.108  1.000000     1.061
3    meth.index         5000   8.868  8.003610     7.164
4   meth.index2         5000   6.099  5.504513     5.136

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
