Hi,
I have two dataframes:
The first, df1, contains some missing data:
  cola colb colc cold cole
1   NA    5    9   NA   17
2   NA    6   NA   14   NA
3    3   NA   11   15   19
4    4    8   12   NA   20
The second, df2, contains the following:
  cola colb colc cold cole
1  1.4  0.8 0.02  1.6  0.6
I want every missing value in df1$cola to be replaced by the value of
df2$cola, then the missing values in df1$colb to be replaced by the
corresponding value in df2$colb, and so on.
I can get this to work column by column with single input lines, but as my
original dataset is a lot larger I want to create a loop and can't work
out how.
The single line command is:
df1$cola[is.na(df1$cola)]<-df2$cola
I've tried a replace function within a loop but get error messages:
list<-colnames(df1)
for (i in list) {
r<-replace(df1$i,df1$i[is.na(df1$i)],df2$i)
}
with error messages of:
Warning messages:
1: In is.na(mymat$snp) :
is.na() applied to non-(list or vector) of type 'NULL'
Can anyone help me with this?
Thanks
--
View this message in context:
http://r.789695.n4.nabble.com/Help-with-loop-tp4636140.html
Sent from the R help mailing list archive at Nabble.com.
Hello,
A one-liner could be
df1 <- read.table(text="
cola colb colc cold cole
1 NA 5 9 NA 17
2 NA 6 NA 14 NA
3 3 NA 11 15 19
4 4 8 12 NA 20
", header=TRUE)
df2 <- read.table(text="
cola colb colc cold cole
1 1.4 0.8 0.02 1.6 0.6
", header=TRUE)
# For each column, fill its NAs with the matching df2 value and return the column
sapply(names(df1), function(nm) {
  df1[[nm]][is.na(df1[[nm]])] <- df2[[nm]]
  df1[[nm]]
})
Avoid loops, use *apply.
Hope this helps,
Rui Barradas
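Note that sapply() returns a matrix here rather than a data frame. A minimal sketch of a variant that keeps the data frame, assuming the same df1 and df2 as in the example above (the name "filled" is only illustrative), is to assign the list returned by lapply() back into the columns with filled[]:

```r
df1 <- read.table(text = "
cola colb colc cold cole
1 NA 5 9 NA 17
2 NA 6 NA 14 NA
3 3 NA 11 15 19
4 4 8 12 NA 20
", header = TRUE)
df2 <- read.table(text = "
cola colb colc cold cole
1 1.4 0.8 0.02 1.6 0.6
", header = TRUE)

# Work on a copy so the original df1 stays untouched
filled <- df1
# filled[] <- ... replaces the columns but keeps the data frame structure
filled[] <- lapply(names(df1), function(nm) {
  col <- df1[[nm]]
  col[is.na(col)] <- df2[[nm]]  # fill NAs with the single value from df2
  col
})
filled
```

Columns are matched by name, so this should not depend on df1 and df2 having the same column order.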
On 11-07-2012 15:11, paulalou wrote:
> Hi,
> [...]
I think I just learned this myself: don't use the $ extension inside the brackets:
df1$cola[is.na(df1$cola)] <- df2$cola
Instead, index with brackets within the brackets:
df1["cola"][is.na(df1["cola"])] <- df2["cola"]
Then the "cola"s can be substituted.
Maybe this will help.
--
Charles Stangor
Professor and Associate Chair
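For reference, a minimal sketch of the loop asked for in the question, assuming the df1 and df2 shown there (rebuilt here with data.frame() so the snippet runs on its own). Inside the loop, df1$i looks for a column literally named "i", which does not exist, so it returns NULL; that is where the is.na() warning comes from. Double brackets, df1[[nm]], use the value of the loop variable and return the column as a plain vector, whereas single brackets return a one-column data frame.

```r
df1 <- data.frame(
  cola = c(NA, NA, 3, 4),
  colb = c(5, 6, NA, 8),
  colc = c(9, NA, 11, 12),
  cold = c(NA, 14, 15, NA),
  cole = c(17, NA, 19, 20)
)
df2 <- data.frame(cola = 1.4, colb = 0.8, colc = 0.02, cold = 1.6, cole = 0.6)

for (nm in colnames(df1)) {
  miss <- is.na(df1[[nm]])      # logical vector marking the NAs in this column
  df1[[nm]][miss] <- df2[[nm]]  # replace them with the single value from df2
}
df1
```

The loop variable is called nm rather than list, since list is also the name of a base R function.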
Hi,
Try this:
func1<-function(x,y,z)
 {ifelse(is.na(y[[x]]),z[[x]],y[[x]])}
dat3<-data.frame(lapply(colnames(df1),function(x) func1(x,df1,df2)))
colnames(dat3)<-colnames(df1)
dat3
  cola colb  colc cold cole
1  1.4  5.0  9.00  1.6 17.0
2  1.4  6.0  0.02 14.0  0.6
3  3.0  0.8 11.00 15.0 19.0
4  4.0  8.0 12.00  1.6 20.0
#or
sapply(colnames(df1),function(x) func1(x,df1,df2))
A.K.
Hello there,
I'm an R beginner and got plunged into this. My attempts so far seem hopeless, so I won't even show them.
I want to write a loop that prints all erroneous values. My definition of erroneous: if the current count (partridge counts in a hunting district) differs from last year's count by more than 50 percent and the absolute values differ by more than 5 animals, I want R to print these values.
I have a grouping variable, district "D", the year "Y" and the counts "C".
Example table:
D Y    C
a 2005 10
a 2006 0
a 2007 9
b 2005 1
b 2006 0
b 2007 1
c 2005 5
c 2006 NA
c 2007 4
Although the difference in a and b is 100 percent, I would doubt a's population breakdown, whereas district b is credible.
To complicate things, I want the loop to skip missing values and instead look at the year after.
Any help is very much appreciated!
Thanks,
Katrin
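One possible sketch of such a check, using the example table from the message. Several things here are assumptions rather than part of the question: the percent change is taken relative to the previous non-missing count, a change away from a previous count of 0 is treated as exceeding 50 percent, and the names dat, flag_district and flagged are hypothetical. Missing counts are dropped first, so each count is compared with the last available year.

```r
# Example data from the message
dat <- data.frame(
  D = rep(c("a", "b", "c"), each = 3),
  Y = rep(2005:2007, times = 3),
  C = c(10, 0, 9, 1, 0, 1, 5, NA, 4),
  stringsAsFactors = FALSE
)

# Return the rows of one district whose count looks erroneous
flag_district <- function(d) {
  d <- d[!is.na(d$C), ]            # skip years with missing counts
  d <- d[order(d$Y), ]             # make sure years are in order
  if (nrow(d) < 2) return(d[0, ])  # nothing to compare against
  prev <- c(NA, d$C[-nrow(d)])     # previous non-missing count
  abs_diff <- abs(d$C - prev)
  rel_diff <- abs_diff / prev      # Inf when prev is 0 and the count changed
  bad <- !is.na(abs_diff) & abs_diff > 5 &
         !is.na(rel_diff) & rel_diff > 0.5
  d[bad, ]
}

# Apply the check per district and stack the flagged rows
flagged <- do.call(rbind, lapply(split(dat, dat$D), flag_district))
print(flagged)
```

With the example data this flags only district a (the drop to 0 in 2006 and the jump back to 9 in 2007); district b changes by a single animal and district c by one animal across the missing year, so neither passes the 5-animal threshold.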