Marine Rohmer
2015-Jan-09 08:32 UTC
[R] R package creation: Confused about how to handle text data
Dear R team,

I am currently creating my first R package and I am confused about how to handle text data. I understand that datasets have to be placed in the ./data subdirectory, saved in .rda, .txt or .csv format, and can then be loaded with the data() function.

However, for .txt and .csv files, data() reads the file with read.table() and loads it into a data.frame. That is not what I want. I just want to access my text file (.txt or .csv) and load it with my own reading function, which I developed in the R source code of the package (it is a special reading function that performs checks specific to what the package expects).

After reading plenty of topics on the web, I found the system.file() function useful and used it in this way in my test file:

current_dir=system.file(package="MYPACKAGE")
myTextFile=paste(current_dir,"/data/myTextFile.csv",sep="")
# Then I use my own function to load it:
myOwnReadingFunction(myTextFile)

This worked fine and "R CMD check" passed... until I followed the advice in the R CMD check log:
"Note: significantly better compression could be obtained by using R CMD build --resave-data"
Running "R CMD build --resave-data" before "R CMD check" compresses my .csv files into .csv.bz2 and .csv.xz files, so the code in my test file (see above) no longer works.

Of course, I guess I could change it to, for example:
myTextFile=paste(current_dir,"/data/myTextFile.csv.bz2",sep="")

But I really wonder whether this is the right way to handle text files in an R package. It seems a little "tricky" to me. Is this good practice? Isn't there another way to simply access text files without loading them? Has anyone already been in the same situation?

Bonus question: why doesn't "R CMD build --resave-data" use the same compression for my two .csv files (.bz2 for one, .xz for the other)?

Thank you in advance for your advice,
Kind regards,
Marine
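On the bonus question, one plausible explanation (an assumption, not checked against the R CMD build sources) is that --resave-data tries several compressors on each file and keeps whichever output is smallest, so the winner depends on each file's content. The sketch below illustrates that idea with base R's memCompress(); the function name smallest_compression() and the two payloads are made up for illustration.

```r
# Sketch: compress one text payload with each of R's in-memory compressors
# and report which method gives the smallest output. Assumption: --resave-data
# chooses per file in a similar way; this is NOT the actual R CMD build code.
smallest_compression <- function(lines) {
  payload <- charToRaw(paste(lines, collapse = "\n"))
  sizes <- vapply(c("gzip", "bzip2", "xz"),
                  function(type) length(memCompress(payload, type = type)),
                  integer(1))
  list(best = names(sizes)[which.min(sizes)], sizes = sizes)
}

# Two differently structured CSV payloads (made-up example data).
set.seed(1)
repetitive <- c("id,label", paste(1:2000, "same_label", sep = ","))
noisy <- c("id,value", paste(1:2000, round(runif(2000), 6), sep = ","))

res_repetitive <- smallest_compression(repetitive)
res_noisy <- smallest_compression(noisy)
print(res_repetitive$sizes)
print(res_noisy$sizes)
```

If build does work this way, two CSV files with different structure can legitimately end up with different extensions, which is one more reason not to hard-code file names under ./data.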
Uwe Ligges
2015-Jan-10 15:42 UTC
You do not want to access these data files through the standard mechanisms, and the standard mechanisms are what the ./data folder is for. See Writing R Extensions: for your case it suggests the ./inst/extdata directory (installed as extdata). The data are then left unchanged and you can access them with your own functions.

Best,
Uwe Ligges
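A minimal sketch of the layout described above, assuming a package called "MyPackage" with a file inst/extdata/myFile.csv (both names hypothetical). The helper extdata_path() is also hypothetical; it only wraps system.file() so that a missing file fails loudly instead of returning "".

```r
# Hypothetical source layout:
#   MyPackage/
#     DESCRIPTION
#     R/read_data.R
#     inst/extdata/myFile.csv   (installed as <library>/MyPackage/extdata/myFile.csv)
#
# Hypothetical helper: return the installed path of a file shipped in
# inst/extdata. With mustWork = TRUE, system.file() raises an error when
# nothing matches, instead of silently returning "".
extdata_path <- function(file, package = "MyPackage") {
  system.file("extdata", file, package = package, mustWork = TRUE)
}

# Inside the package one would then call, e.g.:
# myOwnReadingFunction(extdata_path("myFile.csv"))
```

Because files under inst/ are copied to the installed package as they are, --resave-data does not rename them, so the path stays stable.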
Marine Rohmer
2015-Jan-12 08:42 UTC
Hi Uwe and Hadley,

Thank you very much for your answers. The ./inst/extdata folder works fine; I access my text files this way:

myFile=system.file("extdata", "myFile.csv", package="MyPackage")

I had read Writing R Extensions so many times without really understanding section 1.1.6, so thank you for your help; now it seems obvious!

Kind regards,
Marine
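The path returned by system.file() is then handed to the package's own reader. The sketch below is a hypothetical stand-in for "myOwnReadingFunction" (the name read_my_format() and its column check are assumptions). To stay self-contained it uses temporary files rather than an installed package. It also shows that read.csv() can read a bzip2-compressed copy directly, because file() detects gzip, bzip2 and xz compression when opening a file for reading.

```r
# Hypothetical reader standing in for "myOwnReadingFunction"; the name and
# the checks are illustrative assumptions, not the package's actual code.
read_my_format <- function(path, required_cols = c("id", "value")) {
  if (!file.exists(path)) {
    stop("data file not found: ", path)
  }
  # read.csv() opens the path with file(), which also reads gzip, bzip2
  # and xz compressed files transparently.
  df <- utils::read.csv(path, stringsAsFactors = FALSE)
  missing <- setdiff(required_cols, names(df))
  if (length(missing) > 0) {
    stop("missing required column(s): ", paste(missing, collapse = ", "))
  }
  df
}

# Self-contained demo with temporary files. In the package, the path would
# come from system.file("extdata", "myFile.csv", package = "MyPackage").
plain <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(id = 1:2, value = c(3.5, 4.25)), plain,
                 row.names = FALSE)

# A bzip2-compressed copy, similar in spirit to what --resave-data produces.
compressed <- paste0(plain, ".bz2")
con <- bzfile(compressed, "w")
writeLines(readLines(plain), con)
close(con)

a <- read_my_format(plain)
b <- read_my_format(compressed)
```

Keeping the checks in one reader and the file location in system.file() keeps the two concerns separate: the reader never needs to know where the package was installed.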