andrewH
2014-Jan-29 04:43 UTC
[R] Diagnostic and helper functions for defective & hard-to-import files
Hi Folks! I have been writing a small set of utilities for dealing with files that are hard to open correctly for one reason or another, especially because they are too big for memory, non-rectangular, or contain odd characters or unexpected codings, or all of these things together. Today it suddenly hit me that this has probably been done, done better, and upgraded to package form a dozen times already. There were pointers to a couple functions useful in this regard in the Core Import/Export document. But my effort to come up with search terms that were productive of such packages was unsuccessful. I would be grateful if someone would point me toward such a package or packages if they exist. Warmest regards, andrewH
David Winsemius
2014-Jan-29 04:56 UTC
[R] Diagnostic and helper functions for defective & hard-to-import files
On Jan 28, 2014, at 8:43 PM, andrewH wrote:

> Hi Folks!
> I have been writing a small set of utilities for dealing with files that are
> hard to open correctly for one reason or another, especially because they
> are too big for memory, non-rectangular, or contain odd characters or
> unexpected codings, or all of these things together. Today it suddenly hit
> me that this has probably been done, done better, and upgraded to package
> form a dozen times already. There were pointers to a couple functions useful
> in this regard in the Core Import/Export document. But my effort to come up
> with search terms that were productive of such packages was unsuccessful.

I don't know of a package to do that. You know the quote from that Russian author whose name I am forgetting (in "Anna Karenina" perhaps) about happy families being all the same but unhappy families being impossible to classify. I think it applies to datasets as well. There are too many different dataset pathologies to allow a neat packaging approach.

My approach has been to study the options in read.table very carefully and, if that is insufficient, to look at either readLines or scan as options. It is very useful to be able to use `count.fields` with different settings of the `quote` and `comment.char` parameters. Wrapping it in table() can deliver a very compact, useful result. And don't forget to search the Archives if you have a regular but non-rectangular arrangement.

> I would be grateful if someone would point me toward such a package or
> packages if they exist.

--
David Winsemius
Alameda, CA, USA
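A minimal sketch of the `count.fields` plus table() idea described above, using base R only. The file contents, the temporary file, and the variable names (`tmp`, `n_raw`, `n_dq`) are made up for illustration; the expected count of 3 fields is an assumption about this toy file, not a general rule.

```r
# Write a small, deliberately ragged CSV to a temporary file (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,comment",
  "1,Ann,ok",
  "2,Bob,\"has, an embedded comma\"",
  "3,O'Neil,apostrophe in a name",
  "4,Cy",
  "5,Di,x # y"
), tmp)

# Fields per line with quote handling and comment handling both switched off.
n_raw <- count.fields(tmp, sep = ",", quote = "", comment.char = "")

# Fields per line when only double quotes are treated as quote characters.
n_dq <- count.fields(tmp, sep = ",", quote = "\"", comment.char = "")

# table() condenses each vector into "how many lines have k fields".
print(table(n_raw))
print(table(n_dq))

# Lines whose field count differs from the 3 columns this toy file should have.
print(which(n_dq != 3))
```

Comparing the two tables shows which irregularities come from quoting and which are genuinely short or long lines. Leaving the apostrophe out of `quote` is a deliberate choice here, since a name like O'Neil would otherwise open a quoted string that never closes.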
Duncan Murdoch
2014-Jan-29 11:53 UTC
[R] Diagnostic and helper functions for defective & hard-to-import files
On 14-01-28 11:43 PM, andrewH wrote:

> Hi Folks!
> I have been writing a small set of utilities for dealing with files that are
> hard to open correctly for one reason or another, especially because they
> are too big for memory, non-rectangular, or contain odd characters or
> unexpected codings, or all of these things together. Today it suddenly hit
> me that this has probably been done, done better, and upgraded to package
> form a dozen times already. There were pointers to a couple functions useful
> in this regard in the Core Import/Export document. But my effort to come up
> with search terms that were productive of such packages was unsuccessful.
>
> I would be grateful if someone would point me toward such a package or
> packages if they exist.

The hexView package is useful for figuring out what's there after you have trouble reading something, or figuring out how to read an unknown file type. The showNonASCII and showNonASCIIfile functions in the tools package are also helpful. I don't know of other examples.

Duncan Murdoch
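A rough sketch of how the tools functions mentioned above might be used on a file with a stray non-ASCII character. The sample text and variable names are hypothetical. The raw-byte peek at the end uses base R's readBin as a stand-in for the kind of inspection hexView offers; it is not hexView's own interface.

```r
# Create a small UTF-8 text file with one non-ASCII line (made-up content).
tmp <- tempfile(fileext = ".txt")
x <- c("plain ascii line", "caf\u00e9 au lait", "another plain line")
writeLines(enc2utf8(x), tmp, useBytes = TRUE)

# Print the offending lines, with non-ASCII bytes shown as <xx> escapes.
tools::showNonASCIIfile(tmp)

# The same check on a character vector that is already in memory.
lines <- readLines(tmp, warn = FALSE)
tools::showNonASCII(lines)

# Line numbers that fail conversion to ASCII, computed directly with iconv.
flagged <- which(is.na(iconv(lines, "latin1", "ASCII")))
print(flagged)

# Base-R substitute for a hex viewer: look at the first bytes of the file.
print(readBin(tmp, what = "raw", n = 32))
```

The iconv test treats the bytes as latin1 so that any byte outside the ASCII range is flagged regardless of the file's real encoding.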