Dear all,

I am stuck reading a file which has hundreds of rows and a variable number of columns per row.

The tab-delimited data file looks something like:

Some_Text    1    3    123    1534    -119    1010    178
Some_Taxt    1    3    133    1434    -219    1010    178
Some_Tsxt    1    3    244    1334    -319    1010    178
Some_Tfxt    1    3    153    1234    -419    1010    178
Some_Trxt    1    3    163    1234    -519    1010    178

When I try reading it using:

rawData=read.table("Datafile.dat", fill=FALSE, sep="\t", header=FALSE);

I get something like

Some_Text    1    3    123    1534    -119    1010    178
Some_Taxt    1    3    133    1434    -219    1010    178
1010
Some_Tsxt    1    3    244    1334    -319    1010    178
Some_Tfxt    1    3    153    1234    -419    1010    178
-419
Some_Trxt    1    3    163    1234    -519    1010    178

I am not sure what this is. It also appears to be cutting off some of the columns, which may be the problem. My current maximum number of columns is around 250, but this is not fixed. When importing, the table width seems to stop at 146 columns.

Has anyone seen this before?

Many thanks,

Ingo
Can you try changing the extension of your file (make it a .txt, or open it in Excel and save it as a .csv file) and then reading it in?

Dimitri

On Tue, Feb 15, 2011 at 9:12 AM, Ingo Reinhold <ingor at kth.se> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
Also look at the "flush" and "fill" arguments of read.table to see if that helps.

On Tue, Feb 15, 2011 at 9:12 AM, Ingo Reinhold <ingor at kth.se> wrote:
> [...]

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
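A minimal sketch of how "fill" behaves on ragged rows, assuming a tab-separated file. The temporary file and its contents are made up for illustration and are not the poster's data. According to the read.table documentation, the number of columns is determined from the first five lines of input unless a longer col.names is supplied, so rows further down that are wider than those five lines do not fit the table. This may also be where the 146-column ceiling reported in this thread comes from, although that has not been checked against the actual file.

```r
# Hypothetical ragged, tab-separated file (not the poster's data):
# lines 1-5 have 3 fields each, line 6 has 6 fields.
path <- tempfile(fileext = ".dat")
rows <- c(rep("a\t1\t2", 5), "b\t1\t2\t3\t4\t5")
writeLines(rows, path)

# Without col.names, read.table() sizes the table from the first five lines,
# so the result has only 3 columns and the wide sixth line does not fit.
wrapped <- read.table(path, sep = "\t", header = FALSE, fill = TRUE)

# Count the fields on every line, then pass that many column names explicitly
# so the table is as wide as the widest line; short lines are padded with NA.
n_col <- max(count.fields(path, sep = "\t"))
full <- read.table(path, sep = "\t", header = FALSE, fill = TRUE,
                   col.names = paste0("V", seq_len(n_col)))

dim(wrapped)
dim(full)
```

With flush=TRUE instead, fields beyond the detected width are discarded rather than kept, so the rows look tidy but data is lost; supplying col.names keeps every field.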
Your example does not seem to accurately mirror your data, since there are no variable column counts in it.

If the data is not confidential, it might be a good idea to upload a sample of it somewhere where list readers can get it and examine the actual file layout. Mediafire (http://www.mediafire.com/) is a convenient place.

--- On Tue, 2/15/11, Ingo Reinhold <ingor at kth.se> wrote:

> From: Ingo Reinhold <ingor at kth.se>
> Subject: [R] Variable length datafile import problem
> To: "r-help at r-project.org" <r-help at r-project.org>
> Received: Tuesday, February 15, 2011, 9:12 AM
>
> [...]
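One way to check how ragged a file really is before posting a sample is to count the fields on each line. The sketch below uses count.fields from the utils package on a small made-up file; the file contents are an assumption for illustration only.

```r
# Hypothetical stand-in for the real file: three lines with 3, 5 and 4 fields.
path <- tempfile(fileext = ".dat")
writeLines(c("x\t1\t2", "y\t1\t2\t3\t4", "z\t1\t2\t3"), path)

# Number of tab-separated fields on each line of the file.
n_fields <- count.fields(path, sep = "\t")

range(n_fields)      # narrowest and widest line
table(n_fields)      # how many lines have each width
which.max(n_fields)  # line number of the (first) widest line
```

If the text column can contain quote characters or "#", passing quote = "" and comment.char = "" to count.fields (and to read.table) avoids miscounting those lines.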
Hi John,

it seems there is no easy way. I'll just precondition the file with AWK as described here:
http://www.mail-archive.com/r-help at stat.math.ethz.ch/msg53401.html

There are some remarks in that thread that R is not supposed to read overly large files for "political" reasons. Maybe that's it.

Many thanks again for the effort.

Ingo
________________________________________
From: John Kane [jrkrideau at yahoo.ca]
Sent: Thursday, February 17, 2011 11:54 AM
To: Ingo Reinhold
Subject: RE: [R] Variable length datafile import problem

Generally most of the gurus are on this list. Hopefully someone will take an interest in the problem.

I suspect that there may be some kind of weird value in the file that is upsetting the import. Given the results I got when I removed the data past BD and then past AL, it seems that the problem might be within this range. You could try removing half the data between those columns and see what happens, then repeat if something turns up. It's tedious, but unless someone with a better grasp of variable length data import can help, it's the best I can suggest.

BTW you only replied to me. You should make sure to cc the list, otherwise readers won't realise that I am being of no help.

If you still have the problem by Saturday, e-mail me or post to the list and I'll try to spend some more time messing about with the problem.

Sorry to be of so little help.

--- On Thu, 2/17/11, Ingo Reinhold <ingor at kth.se> wrote:

> From: Ingo Reinhold <ingor at kth.se>
> Subject: RE: [R] Variable length datafile import problem
> To: "John Kane" <jrkrideau at yahoo.ca>
> Received: Thursday, February 17, 2011, 5:36 AM
> Hi John,
>
> as it seems we're hitting the wall here, can you maybe recommend another mailing list with "gurus" (as you put it) that may be able to help?
>
> Regards,
>
> Ingo
> ________________________________________
> From: John Kane [jrkrideau at yahoo.ca]
> Sent: Thursday, February 17, 2011 11:25 AM
> To: Ingo Reinhold
> Subject: RE: [R] Variable length datafile import problem
>
> Hi Ingo,
>
> I've had a bit of time to examine the file and I must say that, at the moment, I have no idea what is going on. I tried the old cut-the-file-into-pieces trick and just came up with even more anomalous results.
>
> My first attempt was to remove all the data past column AL in an OOo Calc spreadsheet. This created a rectangular dataset. It imported into R with no problem, with 38 columns as expected.
>
> Then I went back to the original data file (test.dat) and removed all the data past column BD in an OOo Calc spreadsheet.
>
> This imported a file with only 38 columns.
>
> Something very funny is happening but at the moment I have no
>
> --- On Wed, 2/16/11, Ingo Reinhold <ingor at kth.se> wrote:
>
> > From: Ingo Reinhold <ingor at kth.se>
> > Subject: RE: [R] Variable length datafile import problem
> > To: "John Kane" <jrkrideau at yahoo.ca>
> > Received: Wednesday, February 16, 2011, 1:59 AM
> > Hi John,
> >
> > V1 should be just a character. However, I figured something out myself. The import looks OK in terms of columns when adding the flush=TRUE option.
> >
> > I am still very confused about the dimensions that the imported data shows. Loading my data file into something like OOspreadsheet shows me a maximum of about 245 columns, which does not correspond to the 146 generated by R. Any idea where this saturation comes from?
> >
> > Thanks,
> >
> > Ingo
> > ________________________________________
> > From: John Kane [jrkrideau at yahoo.ca]
> > Sent: Wednesday, February 16, 2011 1:57 AM
> > To: Ingo Reinhold
> > Subject: RE: [R] Variable length datafile import problem
> >
> > Is rawData$V1 intended to be factor or character?
> >
> > str(rawData) gives
> >  $ V1 : Factor w/ 54 levels "-232.0","-234.0",..: 41 41 41 41 41 41 41 41 41 41 ...
> >
> > If you were not expecting a factor you might try options(stringsAsFactors = FALSE) before importing the data.
> >
> > --- On Tue, 2/15/11, Ingo Reinhold <ingor at kth.se> wrote:
> >
> > > From: Ingo Reinhold <ingor at kth.se>
> > > Subject: RE: [R] Variable length datafile import problem
> > > To: "John Kane" <jrkrideau at yahoo.ca>
> > > Received: Tuesday, February 15, 2011, 3:35 PM
> > > Dear all,
> > >
> > > I have changed the file ending with no change in the result. I don't think that this should matter.
> > >
> > > http://dl.dropbox.com/u/2414056/Test.dat
> > > is a test file which represents the structure I am trying to read. So far I have used
> > >
> > > rawData=read.table("Test.txt", fill=TRUE, sep="\t", header=FALSE);
> > >
> > > When then looking at rawData$V1 this gives me a distorted view of my original first column.
> > >
> > > Thanks,
> > >
> > > Ingo
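
As a possible alternative to preprocessing with AWK, a ragged file can be read line by line and split on tabs inside R, keeping each row as its own vector instead of forcing a rectangular table. This is only a sketch under assumptions: the sample lines below are invented in the style of the example at the top of the thread, and it assumes the first field of each row is a text label and the rest are numeric.

```r
# Hypothetical two-line file in the style of the thread's example;
# the second line is longer than the first.
path <- tempfile(fileext = ".dat")
writeLines(c("Some_Text\t1\t3\t123",
             "Some_Taxt\t1\t3\t133\t1434\t-219"), path)

# Read raw lines and split each one on literal tab characters.
lines  <- readLines(path)
fields <- strsplit(lines, "\t", fixed = TRUE)

# First field is the label; the remaining fields are converted to numbers.
labels <- vapply(fields, `[`, character(1), 1)
values <- lapply(fields, function(f) as.numeric(f[-1]))
names(values) <- labels

# Number of numeric values per row; rows keep their own lengths.
lengths(values)
```

A list like this avoids both the column-count guess and the factor conversion of the first column, at the cost of not having a data frame; rows can be padded with NA afterwards if a rectangular table is needed.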