On Mon, 2006-12-18 at 10:58 -0800, downunder wrote:
> Hi all,
>
> I have to recode some values in a dataset. for example changing all zeros to
> "." or 999 would be also ok. does anybody know how to do this? thanks in
> advance. lars
R has its own missing value designator, NA. Most R functions handle NA
consistently, whereas "." or 999 would not be treated as missing: "." is
rejected in numerical operations, and 999 is treated as an ordinary number
that distorts the result, as the examples below show.
For example (see ?mean):
> mean(c(1, 2, 3, 0))
[1] 1.5
> mean(c(1, 2, 3, NA))
[1] NA
> mean(c(1, 2, 3, NA), na.rm = TRUE)
[1] 2
> mean(c(1, 2, 3, .), na.rm = TRUE)
Error in mean(c(1, 2, 3, .), na.rm = TRUE) :
object "." not found
> mean(c(1, 2, 3, 999), na.rm = TRUE)
[1] 251.25
See ?NA and ?is.na and take note of the assignment usage in the latter.
To provide some examples:
1. Vector
> Vec <- sample(0:5, 10, replace = TRUE)
> Vec
[1] 5 3 4 5 1 4 4 0 1 0
> is.na(Vec) <- Vec == 0
> Vec
[1] 5 3 4 5 1 4 4 NA 1 NA
2. Matrix
> Mat <- matrix(sample(0:5, 20, replace = TRUE), ncol = 4)
> Mat
     [,1] [,2] [,3] [,4]
[1,]    4    4    1    4
[2,]    3    1    1    3
[3,]    3    0    1    0
[4,]    2    2    0    5
[5,]    4    0    5    1
> is.na(Mat) <- Mat == 0
> Mat
     [,1] [,2] [,3] [,4]
[1,]    4    4    1    4
[2,]    3    1    1    3
[3,]    3   NA    1   NA
[4,]    2    2   NA    5
[5,]    4   NA    5    1
3. Dataframe
> iris.tmp <- iris[1:10, ]
> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
> iris.tmp$Sepal.Length[sample(10, 3)] <- 0
> iris.tmp$Sepal.Width[sample(10, 3)] <- 0
> iris.tmp$Petal.Length[sample(10, 3)] <- 0
> iris.tmp$Petal.Width[sample(10, 3)] <- 0
> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         0.0          0.0         0.2  setosa
2           4.9         0.0          1.4         0.2  setosa
3           4.7         0.0          1.3         0.0  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.0  setosa
6           5.4         3.9          0.0         0.0  setosa
7           0.0         3.4          1.4         0.3  setosa
8           0.0         3.4          0.0         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          0.0         3.1          1.5         0.1  setosa
> is.na(iris.tmp) <- iris.tmp == 0
> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1          NA           NA         0.2  setosa
2           4.9          NA          1.4         0.2  setosa
3           4.7          NA          1.3          NA  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4          NA  setosa
6           5.4         3.9           NA          NA  setosa
7            NA         3.4          1.4         0.3  setosa
8            NA         3.4           NA         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10           NA         3.1          1.5         0.1  setosa
> summary(iris.tmp)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.400   Min.   :2.900   Min.   :1.300   Min.   :0.1
 1st Qu.:4.650   1st Qu.:3.100   1st Qu.:1.400   1st Qu.:0.2
 Median :4.900   Median :3.400   Median :1.400   Median :0.2
 Mean   :4.871   Mean   :3.343   Mean   :1.414   Mean   :0.2
 3rd Qu.:5.050   3rd Qu.:3.500   3rd Qu.:1.450   3rd Qu.:0.2
 Max.   :5.400   Max.   :3.900   Max.   :1.500   Max.   :0.3
 NA's   :3.000   NA's   :3.000   NA's   :3.000   NA's   :3.0
       Species
 setosa    :10
 versicolor: 0
 virginica : 0
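The same recoding can also be written with plain logical indexing, which
some find easier to read than the is.na<- form. The following is only a
minimal sketch: the objects x and df are made up for illustration, and it
assumes the columns being recoded are numeric.

```r
# Hypothetical example data; 0 stands in for "missing".
x <- c(5, 3, 0, 1, 0)
x[x == 0] <- NA          # same effect as: is.na(x) <- x == 0

df <- data.frame(a = c(1, 0, 3), b = c(0, 2, 0))
df[df == 0] <- NA        # logical matrix index over all columns

mean(x, na.rm = TRUE)    # mean of the remaining values 5, 3, 1
colSums(is.na(df))       # number of NAs per column
```

If only some columns should be recoded, the same indexing can be applied
to a single column, e.g. df$a[df$a == 0] <- NA.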
If you want a more generic approach to replacing values based upon
logical conditions, there is also the replace() function:
> iris.tmp$Sepal.Length <- with(iris.tmp,
                                replace(Sepal.Length,
                                        Sepal.Length > 5.0, 999))
> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         999.0          NA           NA         0.2  setosa
2           4.9          NA          1.4         0.2  setosa
3           4.7          NA          1.3          NA  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4          NA  setosa
6         999.0         3.9           NA          NA  setosa
7            NA         3.4          1.4         0.3  setosa
8            NA         3.4           NA         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10           NA         3.1          1.5         0.1  setosa
See ?replace for more information. Note that replace() does not modify
its argument "in place"; it returns a modified copy, so you need to
assign the result.
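To illustrate the point about assignment, here is a small sketch that
goes in the other direction and turns a 999 code into NA. The vector v
and its values are invented for the example.

```r
# Hypothetical vector using 999 as a legacy missing-value code.
v <- c(4.9, 999, 4.7, 999, 5.0)

w <- replace(v, v == 999, NA)   # returns a modified copy
v                               # unchanged: still contains 999
w                               # 999 replaced by NA

v <- replace(v, v == 999, NA)   # assign back to update v itself
mean(v, na.rm = TRUE)           # 999 no longer inflates the mean
```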
Finally, if you are reading in data sets from ASCII files using one of
the read.table() family of functions, take note of the 'na.strings'
argument, which will define the incoming values that you want to
explicitly set to missing (NA) during the import process.
See ?read.table for more information.
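As a rough sketch of the import route, the following reads a small
made-up table in which "." and 999 mark missing values. The data and the
column names are hypothetical, and the 'text' argument used to supply the
data inline assumes a reasonably recent version of R; with a real file
you would pass the file name instead.

```r
# Hypothetical input; "." and "999" stand for missing values.
raw <- "id score weight
1 10 70.5
2 . 999
3 30 82.0"

dat <- read.table(text = raw, header = TRUE,
                  na.strings = c(".", "999"))
dat
sapply(dat, class)   # columns stay numeric despite the "." entry
```

Without na.strings, the "." would force the score column to be read as
character (or factor) data, and 999 would remain an ordinary number.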
HTH,
Marc Schwartz