Hi Steve,
Here is a suggestion using your original df1:
# Create a copy -- you can avoid this
newdf1 <- df1
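# Process: apply() first coerces columns 2:4 to a character matrix, so
# as.numeric() sees the printed values; "." becomes NA (with a coercion warning)
newdf1[,2:4] <- apply(newdf1[,2:4], 2, as.numeric)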
# Removing df1
rm(df1)
# Result
newdf1
# str()
str(newdf1)
# 'data.frame': 18 obs. of 4 variables:
# $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
# $ v1 : num 10 22 44 521 5 ...
# $ v2 : num 5 54 214 14 73 0.4 1 4 NA 4 ...
# $ v3 : num NA NA 2 4 1 4 NA 5 4 1 ...
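
If you would rather not go through a matrix, a variant with lapply() might also work. This is only a sketch under my own assumptions: the small df1 below is a made-up stand-in for your data, and to_numeric() and num_cols are names I invented for illustration. It turns "." into NA before converting, so no coercion warning should appear.

```r
# Hypothetical stand-in for a few rows of df1 (factors, "." = missing)
df1 <- data.frame(
  site = factor(c("A", "A", "B", "B")),
  v1   = factor(c("10", "22", ".", "51")),
  v2   = factor(c("5", "0.4", "4", ".")),
  v3   = factor(c(".", "2", "5", "4"))
)

# Columns to convert (assumed names; adjust to your data)
num_cols <- c("v1", "v2", "v3")

# Helper (hypothetical name): factor -> character -> numeric, "." -> NA
to_numeric <- function(x) {
  x <- as.character(x)   # use the labels, not the internal level codes
  x[x == "."] <- NA      # mark missing values before the conversion
  as.numeric(x)
}

newdf1 <- df1
newdf1[num_cols] <- lapply(newdf1[num_cols], to_numeric)
str(newdf1)
```

Because lapply() works column by column, "site" is left untouched and the result is still a data frame.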
HTH,
Jorge
On Wed, Sep 16, 2009 at 1:50 PM, Steve Hong wrote:
> Dear all,
>
> I have a partial data set with four columns. The first column, "site", is a
> factor with three levels (A, B, and C). The second to fourth columns
> (v1 ~ v3) are my observations, in which "." indicates a missing value.
> To replace "." with NA, I used two steps. First, I replaced "." with NA,
> and then I changed each variable from factor to numeric using
> "df1$v1 <- as.numeric(df1$v1)". The second step was OK when I had a small
> number of variables, but it is painful when I have a lot of variables.
>
> My question is: is there a more efficient way to convert this kind of
> large-scale data? In short, I am looking for an alternative to STEP 2,
> or to the whole procedure if there is one.
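
On the whole procedure: if the data come from a text file, it may be possible to skip both steps by declaring "." as the missing-value code at import time. A minimal sketch, assuming whitespace-separated input; the inline txt string is a made-up stand-in for your file, and in practice you would pass the file name instead of text = txt.

```r
# Hypothetical inline data standing in for the real file
txt <- "site v1 v2 v3
A 10 5 .
A 22 54 .
B . 4 5
C 40 . 7"

# na.strings = "." makes read.table() treat "." as NA, so v1..v3 are
# read as numeric columns directly instead of factors
df1 <- read.table(text = txt, header = TRUE, na.strings = ".",
                  stringsAsFactors = TRUE)
str(df1)
```

With this, "site" is still read as a factor and v1 to v3 need no further conversion.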
>
> Any comment will be highly appreciated.
>
> Thank you in advance!!
>
> Steve
>
> P.S.: Below is an example of what I did.
>
> STEP 1
> > df1
> site v1 v2 v3
> 1 A 10 5 .
> 2 A 22 54 .
> 3 A 44 214 2
> 4 A 521 14 4
> 5 A 5 73 1
> 6 A 1654 0.4 4
> 7 B 16 1 .
> 8 B . 4 5
> 9 B . . 4
> 10 B . 4 1
> 11 B 51 . 2
> 12 B 5 . .
> 13 C 1 0.4 .
> 14 C 0 4 .
> 15 C 1 1 4
> 16 C 40 . 7
> 17 C 4 . 7
> 18 C 10 . 1
> > str(df1)
> 'data.frame': 18 obs. of 4 variables:
> $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
> $ v1 : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 1 1 1 ...
> $ v2 : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 1 6 ...
> $ v3 : Factor w/ 6 levels ".","1","2","4",..: 1 1 3 4 2 4 1 5 4 2 ...
> > df1[df1=="."] <- "NA"
> Warning messages:
> 1: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
> invalid factor level, NAs generated
> 2: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
> invalid factor level, NAs generated
> 3: In `[<-.factor`(`*tmp*`, thisvar, value = "NA") :
> invalid factor level, NAs generated
> > df1
> site v1 v2 v3
> 1 A 10 5 <NA>
> 2 A 22 54 <NA>
> 3 A 44 214 2
> 4 A 521 14 4
> 5 A 5 73 1
> 6 A 1654 0.4 4
> 7 B 16 1 <NA>
> 8 B <NA> 4 5
> 9 B <NA> <NA> 4
> 10 B <NA> 4 1
> 11 B 51 <NA> 2
> 12 B 5 <NA> <NA>
> 13 C 1 0.4 <NA>
> 14 C 0 4 <NA>
> 15 C 1 1 4
> 16 C 40 <NA> 7
> 17 C 4 <NA> 7
> 18 C 10 <NA> 1
> > str(df1)
> 'data.frame': 18 obs. of 4 variables:
> $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
> $ v1 : Factor w/ 13 levels ".","0","1","10",..: 4 7 10 13 11 6 5 NA NA NA ...
> $ v2 : Factor w/ 9 levels ".","0.4","1",..: 7 8 5 4 9 2 3 6 NA 6 ...
> $ v3 : Factor w/ 6 levels ".","1","2","4",..: NA NA 3 4 2 4 NA 5 4 2 ...
>
> STEP 2.
>
> > df1$v1 <- as.numeric(df1$v1)
> > df1$v2 <- as.numeric(df1$v2)
> > df1$v3 <- as.numeric(df1$v3)
> > df1
> site v1 v2 v3
> 1 A 4 7 NA
> 2 A 7 8 NA
> 3 A 10 5 3
> 4 A 13 4 4
> 5 A 11 9 2
> 6 A 6 2 4
> 7 B 5 3 NA
> 8 B NA 6 5
> 9 B NA NA 4
> 10 B NA 6 2
> 11 B 12 NA 3
> 12 B 11 NA NA
> 13 C 3 2 NA
> 14 C 2 6 NA
> 15 C 3 3 4
> 16 C 9 NA 6
> 17 C 8 NA 6
> 18 C 4 NA 2
> > str(df1)
> 'data.frame': 18 obs. of 4 variables:
> $ site: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
> $ v1 : num 4 7 10 13 11 6 5 NA NA NA ...
> $ v2 : num 7 8 5 4 9 2 3 6 NA 6 ...
> $ v3 : num NA NA 3 4 2 4 NA 5 4 2 ...
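
Note that the STEP 2 output above no longer holds the original values: v1 starts 4 7 10 13 where the data had 10 22 44 521. Calling as.numeric() on a factor returns the internal level codes, not the labels. A small illustration, using a made-up factor f:

```r
# Hypothetical factor holding numbers as labels
f <- factor(c("10", "22", "44", "521"))

codes  <- as.numeric(f)                # internal level codes (positions in levels(f))
values <- as.numeric(as.character(f))  # the numbers shown when f is printed
same   <- as.numeric(levels(f))[f]     # equivalent to 'values', converts each level once

print(codes)
print(values)
```

Converting through as.character() (or through levels(f)) is what recovers the numbers you actually see in the data frame.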
> >
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>