Hi All,

There is a thread about the use of save(), load(), saveRDS(), and readRDS(). It led me to think about a question regarding them.

In my personal work, I prefer using saveRDS() and readRDS() as I don't like the risk of overwriting anything in the global environment. I also like the freedom to name an object when reading it from a file.

However, for teaching, I have to teach save() and load() because, in my discipline, it is common for researchers to share their datasets on the internet in the format written by save(), so students need to know how to use load() and what happens when they use it. Actually, I can't recall encountering datasets shared in the .rds format. I have been wondering why save() is usually used in that case.

That discussion led me to read the help pages again, and I noticed the following warning on the help page of saveRDS():

"Files produced by saveRDS (or serialize to a file connection) are not suitable as an interchange format between machines, for example to download from a website. The files produced by save have a header identifying the file type and so are better protected against erroneous use."

When will the problem mentioned in the warning occur? That is, when will a file saved by saveRDS() not be read correctly? Saved in Linux and then read in Windows? Is it possible to create a reproducible error?

Regards,
Shu Fai
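The overwriting risk described above can be sketched as follows. This is only an illustration: the object names (dat, survey) and the temporary files are made up for the example and are not taken from any shared dataset.

```r
# Hypothetical object and file names, for illustration only.
rdata_file <- tempfile(fileext = ".RData")
rds_file <- tempfile(fileext = ".rds")

dat <- data.frame(x = 1:3)
save(dat, file = rdata_file)   # stores the object together with its name
saveRDS(dat, file = rds_file)  # stores the object only, without a name

# Simulate a later session in which `dat` already holds something else.
dat <- "my unsaved work"

# load() restores objects under their saved names, silently replacing `dat`.
restored_names <- load(rdata_file)
overwritten <- is.data.frame(dat)

# readRDS() returns the object, so the caller chooses the name.
dat <- "my unsaved work"
survey <- readRDS(rds_file)
untouched <- identical(dat, "my unsaved work")
```

load() returns (invisibly) the names of the objects it restored, which is one way to show students what a downloaded file has just put into the workspace.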
On Thu, 28 Sep 2023 23:46:45 +0800 Shu Fai Cheung <shufai.cheung at gmail.com> wrote:

> In my personal work, I prefer using saveRDS() and readRDS() as I
> don't like the risk of overwriting anything in the global
> environment.

There's the load(file, e <- new.env()) idiom, but that's potentially a lot to type. Confusingly, ?save also says:

>> For saving single R objects, 'saveRDS()' is mostly preferable to
>> 'save()', notably because of the _functional_ nature of 'readRDS()',
>> as opposed to 'load()'.

> The files produced by save have a header
> identifying the file type and so are better protected against
> erroneous use."

This header is also mentioned elsewhere in ?saveRDS:

>> 'save' writes a single line header (typically '"RDXs\n"')

The difference between the save() header and the serialize() header is that the save() header is designed to be read independently of the machine running the code: it's exactly 5 bytes; some precisely defined combinations of those 5 bytes identify how the rest of the file should be interpreted (nowadays, it's likely either "XDR format version 2" or "XDR format version 3"), and any other combination causes an error. The serialize() header does contain enough information to describe the format (there's the first byte choosing between ASCII/XDR/native binary and a number of encoded integers describing the format version and the version of R you need to parse it), but it's stored in terms of serialized objects, so if you cannot for some reason decode them properly, you won't be able to read the header. A little bit of Catch-22.

> When will the problem mentioned in the warning occur? That is, when
> will a file saved by saveRDS() not be read correctly?

One example I can offer is when a dataset is saved using serialize(xdr = FALSE) (which is not reachable using saveRDS()). The resulting file format would be dependent on the native byte order of the CPU in your computer.
(Nowadays it's really hard to encounter a CPU that doesn't use little-endian byte order, so this is doubly unlikely to happen in practice.) Both save() and saveRDS() set xdr = TRUE and convert the data to "network byte order" (big-endian) when saving, and back to the native byte order when loading.

The warning is relatively fresh (May 2021). Perhaps Prof. Brian D. Ripley (who made that change) will be able to explain it better.

-- 
Best regards,
Ivan
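The points in the reply above (the load() idiom, the 5-byte save() header, and the byte-order choice in serialize()) can be inspected directly. This is a rough sketch, not a reproduction of the failure: the object x and the temporary file are made up, compress = FALSE is used only so that the header bytes are readable without decompressing, and the exact header seen depends on the serialization version your R uses by default.

```r
x <- c(1.5, 2, 3)  # hypothetical example object
rdata_file <- tempfile(fileext = ".RData")

# Write uncompressed so the header can be read straight from the file.
save(x, file = rdata_file, compress = FALSE)

# The save() header is the first five bytes of the (uncompressed) file.
header <- rawToChar(readBin(rdata_file, what = "raw", n = 5))

# The load(file, e <- new.env()) idiom, spelled out in two steps:
# objects land in `e` instead of the calling environment.
e <- new.env()
load(rdata_file, envir = e)
loaded_names <- ls(e)

# serialize() to a raw vector with and without XDR conversion.
bytes_xdr <- serialize(x, connection = NULL, xdr = TRUE)
bytes_native <- serialize(x, connection = NULL, xdr = FALSE)

# The first byte of each stream marks the format that was chosen.
marker_xdr <- rawToChar(bytes_xdr[1])
marker_native <- rawToChar(bytes_native[1])

# On the machine that wrote it, the native stream reads back fine;
# the concern is a reader with a different byte order.
round_trip_ok <- identical(unserialize(bytes_native), x)
```

Printing header, marker_xdr and marker_native shows which format each stream declares. Producing an actual read failure would presumably need two machines with different native byte orders, which this sketch does not attempt.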
One more function to consider using and teaching is the attach function. If you use `attach` with the name of a file that was created using `save`, it creates a new, empty environment, `load`s the contents of the file into that environment, and attaches the environment to the search path (by default in position 2). This means that the objects are all available to use, but will not overwrite any objects of the same name in your workspace.

The command `ls(2)` quickly shows the names of the objects that were read in. You can use simple assignment to copy and optionally rename any of the objects into your workspace, or just leave them in the attached environment (just recognize what will happen if you have multiple objects with the same name: the copy in your workspace is found first). Once you have copied or used the objects of interest, you can simply `detach` the environment.

If you are going to teach the use of `attach`, I would suggest emphasizing the 2nd paragraph under the heading "Good practice" on the help page for attach.

-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
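The attach() workflow described above might look like this in practice. It is a minimal sketch: the object name scores and the temporary file are hypothetical, and R will print a note that `scores` is masked by the workspace copy, which is the behaviour being illustrated.

```r
# Hypothetical data file created with save(), for illustration only.
rdata_file <- tempfile(fileext = ".RData")
scores <- c(90, 85)
save(scores, file = rdata_file)

# An object of the same name already lives in the workspace.
scores <- "workspace copy"

# attach() loads the file into a new environment at position 2.
attach(rdata_file)
attached_names <- ls(2)               # names of the objects read in

# Copy (and rename) the attached object; the workspace copy is untouched.
my_scores <- get("scores", pos = 2)
still_mine <- identical(scores, "workspace copy")

# Remove the attached environment from the search path when done.
detach(pos = 2)
```

Using get() with pos = 2 makes it explicit that the attached copy is wanted; a bare `scores` would find the workspace object first.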
Hello,

I am very sad to let you know that my husband Jim died on 18th September. I apologise for not letting you know earlier but I had trouble finding the password for his phone.

Kind regards,
Juel