I am a bit confused about the semantics of classes, [, and [[.
For at least some important built-in classes (factors and dates), both
the getter and the setter methods of [ operate on the class. For [[,
however, the getter method operates on the class while the setter
method operates on the underlying vector. Is this behavior
documented? (I haven't found any documentation of it.) Is it
intentional? (i.e. is it a bug or a feature?) There are also cases
where invalid assignments don't signal an error.
A simple example:
> fact <- factor(2,levels=2:4) # master copy
> f0 <- fact; f0; dput(f0)
[1] 2
Levels: 2 3 4
structure(1L, .Label = c("2", "3", "4"), class = "factor")
> f0 <- fact; f0[1] <- 3; f0; dput(f0) # use [ setter
[1] 3
Levels: 2 3 4
structure(2L, .Label = c("2", "3", "4"), class = "factor")
> f0 <- fact; f0[[1]] <- 3L; f0; dput(f0) # use [[ setter
[1] 4 # ? didn't convert 3 to factor
Levels: 2 3 4
structure(3L, .Label = c("2", "3", "4"), class = "factor") # modified underlying vector
> f0[1]
[1] 4
Levels: 2 3 4
# but result is a valid factor
> f0 <- fact; f0[[1]] <- 3; f0; dput(f0) # use [[ setter
[1] 4
Levels: 2 3 4
structure(3, .Label = c("2", "3", "4"), class = "factor") # didn't convert to 3L
> f0[1]
Error in class(y) <- oldClass(x) :
adding class "factor" to an invalid object
I suppose f0[1] and f0[[1]] fail here because the underlying vector
must be integer and not numeric? If so, why didn't assigning to
f0[[1]] cause an error? And why didn't printing f0 cause the same
error?
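The error message above points at `class<-`, so one way to probe my guess is to take a double vector that carries a levels attribute and try to put class "factor" on it directly. This is only a sketch: the names x and res are made up for illustration, and I am assuming the check inside `class<-` depends on the storage type alone.

```r
fact <- factor(2, levels = 2:4)
typeof(unclass(fact))          # storage type of a valid factor

# A double vector with a levels attribute but no class yet (hypothetical name x)
x <- structure(3, levels = c("2", "3", "4"))

# Try to add class "factor"; capture the error message instead of stopping
res <- tryCatch({ class(x) <- "factor"; "ok" },
                error = function(e) conditionMessage(e))
res
```

If res holds an error message rather than "ok", then the type check lives in `class<-`, which would explain why f0[1] fails (it re-applies the class) while the [[ setter, which never re-applies it, does not.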
Here are some more examples. Consider
fac <- factor(c("b","a","c"),levels=c("b","c","a"))
f <- fac; f[1] <- "c"; dput(f)
# structure(c(2L, 3L, 2L), .Label = c("b", "c", "a"), class = "factor")
#### OK, implicit conversion of "c" to factor(c) was performed
f <- fac; f[1] <- 25; dput(f)
# Warning message:
# In `[<-.factor`(`*tmp*`, 1, value = 25) :
# invalid factor level, NAs generated
# structure(c(NA, 3L, 2L), .Label = c("b", "c", "a"), class = "factor")
#### OK, warning given for invalid value, which becomes an NA
#### Same thing happens for f[1]<-"foo"
So far, so good. Now compare to what happens with fac[[...]] <- ...
f <- fac; f[[1]] <- 25; dput(f)
# structure(c(25, 3, 2), .Label = c("b", "c", "a"), class = "factor")
#### No error given, but invalid factor generated
f <- fac; f[[1]] <- "c"; dput(f)
# structure(c("c", "3", "2"), .Label = c("b", "c", "a"), class = "factor")
#### No conversion performed; no error given; invalid factor generated
f
# [1] <NA> <NA> <NA>
# Levels: b c a
#### Prints as though it were factor(c(NA,NA,NA)) with no warning/error
f[]
# Error in class(y) <- oldClass(x) :
# adding class "factor" to an invalid object
#### But f[] gives an error
#### Same error with f[1] and f[[1]]
Another interesting case is f[1] <- list(NULL) -- which correctly
gives an error -- versus f[[1]] <- list(), which gives no error but
results in an f which is not a factor at all:
f <- fac; f[[1]]<-list(); class(f); dput(f)
# [1] "list"
# list(list(), 3L, 2L)
I can see that being able to modify the underlying vector of a classed
object directly would be very valuable functionality, but there is an
asymmetry here: f[[1]]<- modifies the underlying vector, but f[[1]]
accesses the classed vector. Presumably you need to do
unclass(f)[[1]] to see the underlying value. But on the other hand,
unclass doesn't have a setter (`unclass<-`), so you can't say
unclass(f)[[1]] <- ...
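For what it's worth, here is a minimal sketch of the workaround I have in mind for the missing `unclass<-`: strip the class, modify the integer codes, then restore the class by hand. The names code1 and x are only illustrative, and I am assuming unclass keeps the levels attribute so that restoring the class is enough.

```r
fac <- factor(c("b", "a", "c"), levels = c("b", "c", "a"))

# Read the underlying integer code of the first element
code1 <- unclass(fac)[[1]]

# Write to the underlying vector: unclass, modify, re-class
x <- unclass(fac)        # integer codes, still carrying the levels attribute
x[[1]] <- 2L             # set the first code to level 2, i.e. "c"
class(x) <- "factor"     # restore the class; succeeds because x is still integer
f <- x
as.character(f)
```

This keeps the integer storage intact, so the result is still a valid factor, but it is clearly clumsier than a direct setter would be.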
I have not been able to find documentation of all this in the R
Language Definition or in the man page for [/[[, but perhaps I'm
looking in the wrong place?
-s