Barry Rowlingson
2016-Mar-16 17:17 UTC
[R] How to reach the column names in a huge .RData file without loading it
You *might* be able to get them from the raw file...

First, I don't quite know what "colnames" of an .RData file means. "colnames" are the column names of a matrix (or data frame), so I'll assume your .RData file contains exactly one data frame and you want the column names of it.

So let's create one of those:

    mydataframe = data.frame(mylongnamehere=runif(3), anotherlongname=runif(3), z=runif(3), y=runif(3), aasdkjhasdkjhaskdj=runif(3))
    save(mydataframe, file="./test.RData")

Now I'm going to use some Unix utilities to see if there are any identifiable strings in the file. .RData files are by default compressed using `gzip`, so I'll `gunzip` the file and pipe the result into `strings`:

    $ gunzip -c test.RData | strings -t d
          0 RDX2
         35 mydataframe
        230 names
        251 mylongnamehere
        273 anotherlongname
        314 aasdkjhasdkjhaskdj
        347 row.names
        389 class
        410 data.frame

That has found the object name (mydataframe) and most of the column names. The exceptions are the short ones (z and y), which are too short for `strings` to recognise. But if your names are long enough (4 or more chars, I think) they'll show up.

Of course you'll have to filter them out from all the other string output, but they should all appear shortly after the word "names", since the colnames of a data frame are the "names" attribute of the data.

If you don't have a Unix or Mac machine handy you can get these utilities on Windows via Cygwin but that's another story...

Barry

On Wed, Mar 16, 2016 at 3:59 PM, Lida Zeighami <lid.zigh at gmail.com> wrote:
> Hi,
> I have a huge .RData file and I need just to get the colnames of it. so is
> there any way to reach the column names without loading or reading the
> whole file?
> Since the file is so big and I need to repeat this process several times,
> so it takes so long to load the file first and then take the colnames!
>
> Thanks
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
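For anyone who would rather stay inside R, the same scraping idea can be sketched without `gunzip` or `strings`. This is only a sketch under stated assumptions: `printable_runs` is a made-up helper name, the 4-character minimum simply mirrors the cut-off mentioned above, and the demo reads a tiny file in one go. Because the "names" attribute sits after the data, a genuinely huge file would still have to be streamed through in chunks, which this example does not attempt.

```r
# Hypothetical helper (not part of R): return every run of printable ASCII
# bytes that is at least `min_len` long, roughly what `strings` reports.
printable_runs <- function(bytes, min_len = 4) {
  codes <- as.integer(bytes)
  runs <- rle(codes >= 32L & codes <= 126L)  # TRUE runs are printable stretches
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  keep <- which(runs$values & runs$lengths >= min_len)
  vapply(keep, function(i) rawToChar(bytes[starts[i]:ends[i]]), character(1))
}

# Small stand-in for the real file; fixed values keep the demo repeatable.
mydataframe <- data.frame(mylongnamehere = c(1.5, 2.5, 3.5),
                          anotherlongname = c(0.5, 1, 2),
                          z = c(4.5, 5.5, 6.5))
rdata_file <- tempfile(fileext = ".RData")
save(mydataframe, file = rdata_file)

# gzfile() decompresses on the fly, so the bytes are scanned without
# rebuilding the data frame. Only the first 1e6 bytes are read here.
con <- gzfile(rdata_file, "rb")
bytes <- readBin(con, what = "raw", n = 1e6)
close(con)

found <- printable_runs(bytes)
print(found)
```

As with `strings`, the one-letter column `z` is not reported, and the result still has to be filtered by hand for whatever follows "names".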
Richard M. Heiberger
2016-Mar-16 17:38 UTC
[R] How to reach the column names in a huge .RData file without loading it
Barry's solution works with Windows without Cygwin. You do need Rtools, available from the Windows page on CRAN.

Rtools does not have "gunzip", but that is just an abbreviation for "gzip -d".

    x:\HOME\rmh\HH-R.package>path
    PATH=c:\Progra~2\Rtools\bin;c:\Progra~2\Rtools\gcc-4.6.3\bin;c:\progra~1\R\R-3.2.3\bin\x64;c:\Progra~1\MikTeX~1.9\miktex\bin\x64;c:\windows;c:\windows\system32

    x:\HOME\rmh\HH-R.package>gzip -d -c c:\Users\rmh.DESKTOP-60G4CCO\test.RData | strings -t d
          0 RDX2
         35 mydataframe
        230 names
        251 mylongnamehere
        273 anotherlongname
        314 aasdkjhasdkjhaskdj
        347 row.names
        389 class
        410 data.frame
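If the `gzip -d -c ... | strings -t d` pipeline is going to be run repeatedly, it may be convenient to call it from R and first check that both tools are actually on the PATH. The following is a rough sketch, not a tested recipe for every setup: `scrape_strings` is a hypothetical name, and it assumes `gzip` and `strings` are reachable from R's command interpreter (on Windows, for instance through the Rtools bin directory shown above).

```r
# Hypothetical wrapper (name is illustrative): run `gzip -d -c FILE | strings -t d`
# and return its output lines as a character vector.
scrape_strings <- function(path) {
  cmd <- paste("gzip -d -c", shQuote(path), "| strings -t d")
  # The pipe needs a command interpreter: shell() on Windows, system() elsewhere.
  run <- if (.Platform$OS.type == "windows") shell else system
  run(cmd, intern = TRUE)
}

# Sys.which() returns "" for tools it cannot find on the PATH.
have_tools <- all(nzchar(Sys.which(c("gzip", "strings"))))

# Tiny stand-in file so the sketch can be tried safely.
mydataframe <- data.frame(mylongnamehere = c(1.5, 2.5, 3.5))
rdata_file <- tempfile(fileext = ".RData")
save(mydataframe, file = rdata_file)

# Only attempt the external pipeline when both tools were found.
out <- if (have_tools) scrape_strings(rdata_file) else character(0)
print(out)
```

When either tool is missing, `out` is simply empty, which is a cue to fix the PATH rather than a sign that the file has no names in it.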
Jan Kim
2016-Mar-16 17:40 UTC
[R] How to reach the column names in a huge .RData file without loading it
Barry: that's an interesting hack.

I do feel compelled to make two comments, though, regarding the general issue rather than the scraping idea:

(1) If your situation is that the image (.RData file) is the only copy of the data, you'll need to rescue the data from it as soon as possible anyway. Something like

    load(".RData");
    write.csv(mydataframe, file = "mydata.csv");

should do the trick. It will be slow, but you'll need to do it just once, so you might as well enjoy your coffee while you wait. From that point on, work with the mydata.csv file for getting at the colnames (and anything else as well).

(2) If there's any chance / risk that scraping data off images is not a one-off, the time to prevent that from catching on is now. If data is of any value at all, it should be handled in a sane, portable, textual format. For tabular data, csv is normally adequate or at least good enough, but .RData images are never a good idea.

Best regards, Jan

P.S.: I've seen .RData images containing many months' worth of interactive work, and multiple variants of data frames in variables with more or less similar names, so the set of strings scraped off these will be rather more bewildering than in Barry's clean example.

--
Jan T. Kim
email: jttkim at gmail.com
WWW: http://www.jtkim.dreamhosters.com/
hierarchical systems are for files, not for humans
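Once the data has been exported to CSV as suggested in (1), getting the column names no longer requires reading the whole table. A minimal sketch of that follows, with some assumptions of mine: the small data frame and the temporary file stand in for the real `mydataframe` and `mydata.csv`, and `row.names = FALSE` is added so that the header contains only the column names (with the default, `write.csv` also writes a row-name column with an empty header).

```r
# Small stand-in for the real data frame rescued from the .RData image.
mydataframe <- data.frame(mylongnamehere = c(1.5, 2.5, 3.5),
                          anotherlongname = c(0.5, 1, 2),
                          z = c(4.5, 5.5, 6.5))

# One-off export; a temporary file plays the role of "mydata.csv".
csv_file <- tempfile(fileext = ".csv")
write.csv(mydataframe, file = csv_file, row.names = FALSE)

# Later, and as often as needed: read the header plus a single data row.
header_only <- names(read.csv(csv_file, nrows = 1))
print(header_only)
```

The cost of the last step does not depend on how many rows the file has, since `nrows = 1` stops reading after the first data row.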
Duncan Murdoch
2016-Mar-16 19:18 UTC
[R] How to reach the column names in a huge .RData file without loading it
On 16/03/2016 1:40 PM, Jan Kim wrote:
> (2) If there's any chance / risk that scraping data off images is not
> a one-off, the time to prevent that from catching on is now. If data is
> of any value at all, it should be handled in a sane, portable, textual
> format. For tabular data, csv is normally adequate or at least good
> enough, but .RData images are never a good idea.

I agree with the sentiment, but not with the choice of .csv as a "sane, portable, textual format". CSV has no type information included, so strings that contain only digits can turn into numbers (and get rounded in the process), things that look like dates can get converted to different formats, etc.

The .RData format has the disadvantage of being hard to use outside R, but at least it is usable in R.

I don't know what I'd recommend if I wanted a portable textual format. JSON is close, but it can't handle the full range of data that R can handle (e.g. no Inf). dput() on a dataframe is text, but nothing but R can read it.

Duncan Murdoch
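The point about missing type information is easy to reproduce. The sketch below is illustrative only: the `id` column with leading zeros is an invented example, and `colClasses` is shown as one way for the reader of the file to supply the types that the CSV itself does not carry. The `dput()` round trip at the end shows the text-but-R-only alternative mentioned above.

```r
# Invented example: identifiers that consist only of digits.
ids <- data.frame(id = c("007", "042"), stringsAsFactors = FALSE)

csv_file <- tempfile(fileext = ".csv")
write.csv(ids, file = csv_file, row.names = FALSE)

# read.csv() guesses column types from the text, so the strings become integers.
guessed <- read.csv(csv_file)
str(guessed$id)

# The reader has to supply the type to get the original strings back.
forced <- read.csv(csv_file, colClasses = "character")
str(forced$id)

# dput()/dget() keep the type in the text itself, but only R can read the result.
dput_file <- tempfile(fileext = ".R")
dput(ids, file = dput_file)
restored <- dget(dput_file)
str(restored$id)
```

The guessed column comes back as integers, so the leading zeros are gone, while the `colClasses` and `dput()` versions keep the strings intact.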