Dear All,

I have a tab-delimited text file called "sample.txt", as follows:

    ID_REF  382  GC_Score   Theta      R          B_Allele_Freq  Log_R_Ratio
    200003  BB   0.9101527  0.9734979  0.8788951  1              0
    200006  AB   0.6003323  0.4385073  2.033364   0.4850979      0.01553433

I have explored various options of read.table, one of which was:

    read.table("sample.txt", na.strings = "NA", as.is = TRUE)

However, everything that it reads in becomes character.

Could you please help me with this?

Best regards,
Chee
R. Michael Weylandt
2012-Jan-24 03:26 UTC
[R] Help: read a proportion of high through-put data
It's pretty hard to answer this without the file in hand, but I'd guess something like the following is at play:

Columns of a data.frame have to have a single type, so if R sees anything in a column that it thinks is a character, it will coerce the whole column to character. Since you have not told R that the first row is a header, it is probably treating each header name as the first element of its column and therefore reading every column as character. read.table() and read.csv() sometimes correct for this automatically: if the first line has one fewer field than the other lines, that suggests column names and row names around rectangular data, and the first line is taken as a header. That doesn't seem to be happening here.

What happens if you try

    read.table("sample.txt", header = TRUE)

An alternative route, if those names cannot be read in as headers for some reason, would be to coerce the columns manually: once the header row is out of the data, apply as.numeric() to each column that should be numeric. Note that as.numeric() works on a single column, so wrapping the whole read.table() call in it will not work.

Michael
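A minimal sketch of the behaviour described in this reply, using an inline copy of the three lines quoted in the question instead of the real sample.txt. The variable names (sample_lines, no_header, with_header, fixed) are illustrative only, and as.is = TRUE is kept from the original call so that the genotype column stays character whatever the R version.

```r
# Inline copy of the three tab-separated lines quoted in the question
# (an assumption: the real sample.txt is presumably much longer).
sample_lines <- c(
  "ID_REF\t382\tGC_Score\tTheta\tR\tB_Allele_Freq\tLog_R_Ratio",
  "200003\tBB\t0.9101527\t0.9734979\t0.8788951\t1\t0",
  "200006\tAB\t0.6003323\t0.4385073\t2.033364\t0.4850979\t0.01553433"
)

# Without header = TRUE the names become the first data row, so every
# column contains at least one string and is read as character.
no_header <- read.table(text = sample_lines, sep = "\t", as.is = TRUE)
print(sapply(no_header, class))

# With header = TRUE the first line supplies the column names and the
# remaining rows are type-converted column by column.
with_header <- read.table(text = sample_lines, sep = "\t",
                          header = TRUE, as.is = TRUE)
print(sapply(with_header, class))

# Manual route: move the first row into the names, then coerce each
# column that should be numeric. The genotype column ("382") is skipped.
fixed <- no_header[-1, ]
names(fixed) <- unname(unlist(no_header[1, ]))
numeric_cols <- names(fixed) != "382"
fixed[numeric_cols] <- lapply(fixed[numeric_cols], as.numeric)
print(sapply(fixed, class))
```

Passing header = TRUE is the simpler of the two routes; the manual coercion is only worth the trouble when the header line cannot be parsed as one.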
R. Michael Weylandt
2012-Jan-24 04:33 UTC
[R] Help: read a proportion of high through-put data
Ok, it seems to have worked on my machine as well, except for some levels you didn't mention before.

If you are having trouble with the header names, I'll take a stab at it: by default, R requires them to be syntactically valid names (i.e., they can't start with a number or contain a dollar sign or hyphen) and will modify them as needed. Generally this is helpful for interactive use (if you want to refer to the columns by name directly). If you wish to suppress this behavior, add the check.names = FALSE argument to read.table() and it will keep them as is.

If you ever do need a non-syntactic name again, you can get at it by surrounding it in backquotes, e.g.,

    `3s` <- 4
    3s                  # throws an error
    identical(`3s`, 4)  # works

Michael

On Mon, Jan 23, 2012 at 11:28 PM, chee chen <chee.chen at yahoo.com> wrote:
> Hi, Michael,
> Please ignore my previous email with the attachment, since I guess I
> resolved it with your suggestions (with "header=TRUE"), except some minor
> issues with the names of the header.
> Regards,
> Chee
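To illustrate the point about header names, here is a small sketch run on an inline copy of the three lines from the original question. The names sample_lines, checked and unchecked are illustrative only. It shows how the column headed 382 is renamed by default, and how check.names = FALSE plus backquotes keeps the original name usable.

```r
# Inline copy of the three tab-separated lines from the original question
# (an assumption: the real sample.txt is presumably much longer).
sample_lines <- c(
  "ID_REF\t382\tGC_Score\tTheta\tR\tB_Allele_Freq\tLog_R_Ratio",
  "200003\tBB\t0.9101527\t0.9734979\t0.8788951\t1\t0",
  "200006\tAB\t0.6003323\t0.4385073\t2.033364\t0.4850979\t0.01553433"
)

# Default behaviour: header names are made syntactically valid,
# so the column headed 382 is renamed.
checked <- read.table(text = sample_lines, sep = "\t",
                      header = TRUE, as.is = TRUE)
print(names(checked))

# check.names = FALSE keeps the header exactly as it appears in the file.
unchecked <- read.table(text = sample_lines, sep = "\t",
                        header = TRUE, as.is = TRUE, check.names = FALSE)
print(names(unchecked))

# A non-syntactic column name needs backquotes with $,
# or can be given as an ordinary string with [[ ]].
print(unchecked$`382`)
print(unchecked[["382"]])
```

Whether to keep the default renaming is a matter of taste: the modified names are easier to type interactively, while check.names = FALSE preserves the labels exactly as they appear in the file.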