Douglas Grove <dgrove at fhcrc.org> writes:
> Hi,
>
> I'm trying to understand a behaviour that I have encountered
> and can't fathom.
>
>
> Here's some code I will use to illustrate the behaviour:
>
> # start with some data frame "a" having some named columns
> a <- data.frame(a=rep(1,3),c=rep(2,3),d=rep(3,3),e=rep(4,3))
>
> # create a subset of the original data frame, but include a
> # name "b" that is not present in my original data frame
> b <- a[,c("a","b","c")]
>
>
> ## Up until now no errors are issued, but the following commands
> ## will give the error shown:
>
> b[1,] ## "Error in x[[j]] : subscript out of bounds"
> b[1,2] ## "Error in "names<-.default"(*tmp*, value = cols) :
> ## names attribute must be the same length as the vector"
>
>
> Can anyone explain to me the meaning of these error messages in terms
> of what R is actually doing? These error messages had me baffled and
> it took me hours to track down that the source of the error was an
> incorrect column name in my data frame subsetting.
Looks like a (semi-)bug. Indexing outside of the data frame creates a
"column" which is really the single value NULL, e.g.
> dput(a[,4:5])
structure(list(e = c(4, 4, 4), "NA" = NULL), .Names = c("e",
NA), row.names = c("1", "2", "3"), class =
"data.frame")
This prints because format.data.frame, which is called inside
print.data.frame, recycles the NULL and gives you
> a[,4:5]
  e   NA
1 4 NULL
2 4 NULL
3 4 NULL
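To see the corruption directly, here is a minimal sketch that rebuilds the object by hand from the dput() output above (so it runs even if your R version refuses a[,4:5] itself) and compares each column's length with the number of rows. The names `bad`, `col_lengths` and `is_bad_col` are illustrative only. The second "column" has length 0 instead of 3, which is consistent with the length-mismatch complaints in the error messages quoted above.

```r
# Rebuild, by hand, the object printed by dput(a[,4:5]) above: a data frame
# whose second "column" is NULL. structure() sets the attributes directly,
# bypassing the consistency checks that data.frame() would perform.
bad <- structure(list(c(4, 4, 4), NULL),
                 names = c("e", NA),
                 row.names = c("1", "2", "3"),
                 class = "data.frame")

# In a well-formed data frame every column has nrow() elements.
col_lengths <- unname(lengths(unclass(bad)))
print(col_lengths)   # [1] 3 0

# Flag the columns whose length does not match the row count.
is_bad_col <- col_lengths != nrow(bad)
print(is_bad_col)
```

A check like `all(lengths(unclass(df)) == nrow(df))` is a cheap sanity test to run on a data frame you suspect was built through an out-of-range index.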
However, it confuses the h*ck out of "[.data.frame"
> (a[,4:5])[2]
Error in "[.data.frame"((a[, 4:5]), 2) : undefined columns selected
> (a[,4:5])[,2]
NULL
> (a[,4:5])[,1]
[1] 4 4 4
and also the examples you found. However, the main issue is that you
have managed to construct a corrupt data frame. So indexing outside
the array should probably either give an error or return a column of
NA.
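Until out-of-range indexing reliably errors, one defensive habit is to check the requested names against names(df) before subsetting, so a typo like "b" is reported at once. This is a minimal sketch assuming columns are selected by character name; `select_cols` is a hypothetical helper, not part of base R.

```r
# Hypothetical helper (not in base R): subset columns by name, but stop with
# an explicit message naming any requested columns that do not exist,
# instead of letting "[" build a data frame with a NULL column.
select_cols <- function(df, cols) {
  stopifnot(is.character(cols))            # sketch only handles names
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

a <- data.frame(a = rep(1, 3), c = rep(2, 3), d = rep(3, 3), e = rep(4, 3))

ok  <- select_cols(a, c("a", "c"))                  # valid names: normal subset
res <- tryCatch(select_cols(a, c("a", "b", "c")),   # "b" is not a column
                error = conditionMessage)
print(res)
```

Using `drop = FALSE` keeps the result a data frame even when a single column is requested, so the helper always returns the same type.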
--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907