The StackOverflow post https://stackoverflow.com/a/69767361/2554330 discusses a dataframe which has a named numeric column of length 1488 that has 744 names. I don't think this is ever legal, but am I wrong about that?

The `dat.rds` file mentioned in the post is temporarily available online in case anyone else wants to examine it.

Assuming that the file contains a badly formed object, I wonder if readRDS() should do some sanity checks as it reads.

Duncan Murdoch
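A minimal R-level sketch of the kind of sanity check proposed here, assuming it is acceptable to check the object after it has been loaded rather than inside the C deserialization code. `check_names_length()` and `readRDS_checked()` are hypothetical names, not base R functions, and they only cover one kind of corruption: a "names" attribute whose length differs from the length of the vector it belongs to.

```r
## Hypothetical helper, not part of base R: recursively list every vector
## whose "names" attribute has a different length than the vector itself.
check_names_length <- function(x, path = "x") {
  problems <- character(0)
  nm <- attr(x, "names", exact = TRUE)
  if (!is.null(nm) && length(nm) != length(x)) {
    problems <- sprintf("%s: length %d but %d names",
                        path, length(x), length(nm))
  }
  ## Data frames are lists, so this also walks their columns.
  if (is.list(x)) {
    for (i in seq_along(x)) {
      problems <- c(problems,
                    check_names_length(x[[i]], sprintf("%s[[%d]]", path, i)))
    }
  }
  problems
}

## Hypothetical wrapper: read as usual, then refuse to return a bad object.
readRDS_checked <- function(file, ...) {
  obj <- readRDS(file, ...)
  problems <- check_names_length(obj)
  if (length(problems) > 0L) {
    stop("invalid object read from ", file, ":\n",
         paste(problems, collapse = "\n"))
  }
  obj
}
```

On the object from the post, this should flag the `score` column (`x[[4]]`, 1488 elements but 744 names) before anything like dput() gets to touch it. A check done inside the deserializer itself could catch such problems without a second pass over the data.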
>>>>> Duncan Murdoch
>>>>>     on Mon, 1 Nov 2021 06:36:17 -0400 writes:

> The StackOverflow post
> https://stackoverflow.com/a/69767361/2554330 discusses a
> dataframe which has a named numeric column of length 1488
> that has 744 names. I don't think this is ever legal, but
> am I wrong about that?

> The `dat.rds` file mentioned in the post is temporarily
> available online in case anyone else wants to examine it.

> Assuming that the file contains a badly formed object, I
> wonder if readRDS() should do some sanity checks as it
> reads.

> Duncan Murdoch

Good question.
In the meantime, I've also added a bit on the SO page above, e.g.:

---------------------------------------------------------------------------
d <- readRDS("<.....>dat.rds")
str(d)
## 'data.frame':  1488 obs. of  4 variables:
##  $ facet_var: chr  "AUT" "AUT" "AUT" "AUT" ...
##  $ date     : Date, format: "2020-04-26" "2020-04-27" ...
##  $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
##  $ score    : Named num  2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
##   ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" "new_confirmed10" "new_confirmed10" ...

ds <- d$score
c(length(ds), length(names(ds)))
## 1488  744

dput(ds) # ->
##  *** caught segfault ***
## address (nil), cause 'memory not mapped'
---------------------------------------------------------------------------

This "proves" that dat.rds really contains an invalid object, since a
simple dput(.) directly gives a segmentation fault.

I think we are aware that, using C code and, say, .Call(..), one can
create all kinds of invalid objects "easily", and it's clear that it's
not feasible to check the validity of such objects "everywhere".

Your proposal to have (at least *some*) validity checks in our
deserialization code, as used by readRDS(), seems good, but maybe we
should think of more cases, and/or do such validity checks already
during serialization (i.e., in saveRDS() here)?
Such questions are really for those who understand more than I do
about (de)serialization in R, its performance bottlenecks, etc.
Given the speed impact, we should probably make such checks *optional*,
but have them *on* by default, e.g., at least for saveRDS()?

Martin
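An "optional but on by default" check at save time could look roughly like the following R-level wrapper around saveRDS(). This is a minimal sketch restricted to data frames; `df_problems()`, `saveRDS_checked()` and the option name `saveRDS.check` are all hypothetical, and a real implementation would presumably live in the C serialization code rather than in an extra R-level pass.

```r
## Hypothetical sketch: neither df_problems(), saveRDS_checked() nor the
## option "saveRDS.check" exists in base R.
df_problems <- function(df) {
  n <- nrow(df)
  out <- character(0)
  for (i in seq_along(df)) {
    col <- .subset2(df, i)            # column i, without [[.data.frame dispatch
    label <- names(df)[i]
    ## Every column should have as many rows as the data frame.
    if (NROW(col) != n) {
      out <- c(out, sprintf("column '%s' has %d rows, expected %d",
                            label, NROW(col), n))
    }
    ## A named column needs exactly one name per element.
    cn <- attr(col, "names", exact = TRUE)
    if (!is.null(cn) && length(cn) != length(col)) {
      out <- c(out, sprintf("column '%s' has %d elements but %d names",
                            label, length(col), length(cn)))
    }
  }
  out
}

## The check runs by default; the option lets users switch it off when the
## extra pass over the data is too costly.
saveRDS_checked <- function(object, file = "", ...,
                            check = getOption("saveRDS.check", TRUE)) {
  if (isTRUE(check) && is.data.frame(object)) {
    problems <- df_problems(object)
    if (length(problems) > 0L) {
      stop("refusing to save invalid data frame:\n",
           paste(problems, collapse = "\n"))
    }
  }
  saveRDS(object, file = file, ...)
}
```

With `options(saveRDS.check = FALSE)`, or `check = FALSE` in a single call, the wrapper behaves like plain saveRDS(), which is one way to trade safety for speed as suggested above.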
> On 1 Nov 2021, at 11:36 , Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> The StackOverflow post https://stackoverflow.com/a/69767361/2554330 discusses a dataframe which has a named numeric column of length 1488 that has 744 names. I don't think this is ever legal, but am I wrong about that?
>

It is certainly not easy to create such objects at the R level, e.g.:

> x <- 1:10
> names(x) <- 1:10
> length(names(x)) <- 5
> x
   1    2    3    4    5 <NA> <NA> <NA> <NA> <NA>
   1    2    3    4    5    6    7    8    9   10
> names(x)
 [1] "1" "2" "3" "4" "5" NA  NA  NA  NA  NA

or even

> x <- 1:10
> attributes(x)$foo <- 1:5
> x
 [1]  1  2  3  4  5  6  7  8  9 10
attr(,"foo")
[1] 1 2 3 4 5
> names(attributes(x)) <- "names"
> x
   1    2    3    4    5 <NA> <NA> <NA> <NA> <NA>
   1    2    3    4    5    6    7    8    9   10
> dput(x)
structure(1:10, .Names = c("1", "2", "3", "4", "5", NA, NA, NA, NA, NA))

Of course, at the C level, everything is possible...

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com