Hello,

I am having problems correctly reading a huge .prn file of almost 450,000 rows and 29 columns. The variables consist of characters, dates, times, and numeric values. I use

    read.table("file.prn", header=F, sep="\t", na.strings="*")

where the missing values are declared as "*". R reads the file without complaint, but when I ask for the dimensions of the data frame I get the right number of rows but only 1 column:

    dim(file)
    [1] 422344 1

It seems to read each whole row as a single column. When I ask for the first 3 lines, for example, the output shows that R is reading everything as a factor, and I get something like this:

    data12L[1:3,]
    ID DATE Time RRR VEl Leng Weig Sub var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15
    [2] 54678611 39356 0.1572569 RW 89 2014 21400 V11A11 4500 7200 4700 5000 * * * * * * * * * * * 0 527 594 567 * * * * * * * * * * *
    [3] 54678612 39356 0.1583333 RW 81 1716 33000 T11O3 7100 9100 5700 5600 5500 * * * * * * * * * * 0 397 605 133 133 * * * * * * * * * *
    422344 Levels: ID DATE Time RRR VEl Leng Weig Sub var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 ..

Is there any solution? Any suggestion? And what is going on with the "*"? Is there any suggestion for this as well?

Thanks for your time!

Ismini
I would guess that your separator is not really a tab like you think it is. Take a small subset of the data, bring it up in a text editor, check the contents, and then try to read it. Always start small to see if it is working the way you think it should.

Also, the file seems to have a header, so why are you ignoring it? Ignoring it may make your numeric columns look like factors, which is probably not what you want.

On Wed, Oct 29, 2008 at 12:19 PM, <jass at in.gr> wrote:
> [...]
> I use read.table("file.prn", header=F, sep="\t", na.strings="*"), where the missing values are declared as "*".
> The R engine is reading it like it, but when I am asking for the dimensions of the data frame I get the right number of rows but only 1 column...
> dim(file)
> [1] 422344 1
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
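To make the "start small and check the separator" advice concrete, here is a minimal sketch in R. It assumes the real file is separated by spaces rather than tabs, which is only a guess from the printed output. The sample lines are shortened from the post and written to a temporary file, so the path and the reduced column set are made up for illustration.

```r
# Hypothetical stand-in for file.prn: three space-separated lines,
# shortened from the post and written to a temporary file.
path <- tempfile(fileext = ".prn")
writeLines(c(
  "ID DATE Time RRR VEl var1 var2",
  "54678611 39356 0.1572569 RW 89 4500 *",
  "54678612 39356 0.1583333 RW 81 7100 5500"
), path)

# Look at the raw text of the first few lines before parsing anything.
first_lines <- readLines(path, n = 3)
print(first_lines)

# Does any of these lines contain a tab character at all?
has_tab <- any(grepl("\t", first_lines, fixed = TRUE))

# Count how many fields each line splits into under two candidate separators.
fields_tab   <- count.fields(path, sep = "\t")  # one field per line: no tabs
fields_space <- count.fields(path, sep = "")    # "" means any run of whitespace
print(fields_tab)
print(fields_space)
```

If `fields_tab` is 1 for every line while `fields_space` gives the same count of several fields on every line, the file is most likely whitespace separated. On the real file, `readLines("file.prn", n = 5)` would be the place to start.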
On Wed, Oct 29, 2008 at 06:19:51PM +0200, jass at in.gr wrote:
> I am having problems in reading appropriately a huge .prn file of almost
> 450.000 rows and 29 columns. The variables are consisted of characters,
> dates, time, numeric values. I use read.table("file.prn", header=F,
> sep="\t", na.strings="*"), where the missing values are declared as "*". The
> R engine is reading it like it, but when I am asking for the dimensions of
> the data frame I get the right number of rows but only 1 column...
> dim(file)
> [1] 422344 1

The most likely explanation is that your file is not tab separated.

> And what is going on with the "*"? Is there any suggestion for this as well???

That should work fine as soon as you figure out the correct value for sep.

BTW: your output looks like you want to use header=T.

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel
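As an illustration of the points in this thread, the following sketch reads a small made-up sample with `header=TRUE` and `na.strings="*"`. It assumes the correct separator turns out to be whitespace (`sep=""`), which the thread does not confirm. The temporary file and the reduced set of columns are hypothetical.

```r
# Hypothetical stand-in for file.prn: header plus two data rows,
# shortened from the post and assumed to be separated by spaces.
path <- tempfile(fileext = ".prn")
writeLines(c(
  "ID DATE Time RRR VEl var1 var2",
  "54678611 39356 0.1572569 RW 89 4500 *",
  "54678612 39356 0.1583333 RW 81 7100 5500"
), path)

# sep = "" splits on any run of whitespace; header = TRUE takes the column
# names from the first line; na.strings = "*" turns the asterisks into NA.
dat <- read.table(path, header = TRUE, sep = "", na.strings = "*",
                  stringsAsFactors = FALSE)

dim(dat)  # rows and columns as parsed
str(dat)  # column types: numeric columns should no longer be factors
```

With the header consumed as column names, columns that hold only numbers and "*" should come out numeric with NA for the missing entries, and the data frame should have one column per field instead of a single factor column.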