Dear all,

I need to read an ASCII file with different length lines.

This is what is contained in the file gene.txt:

1st line   ID     description   snp_id    genotype
2nd line   10003  Low           rs152240
3rd line   10003  Moderate      rs189011  TC
4th line   10004  Conservative  rs152240  GC
5th line   10004  Bad           rs154354
6th line   10013  Bad           rs152240
7th line   10019  Conservative  rs152240  AC
etc...

This is what I would like to obtain in R:

ID     description   snp_id    genotype
10003  Low           rs152240  NA
10003  Moderate      rs189011  TC
10004  Conservative  rs152240  GC
10004  Bad           rs154354  NA
10013  Bad           rs152240  NA
10019  Conservative  rs152240  AC

Read.table() doesn't work in these situations because of the irregular pattern of data. Have you got any suggestion?

Thanks a lot!
Cristian

==========================================
Cristian Pattaro
==========================================
Unit of Epidemiology & Medical Statistics
Department of Medicine and Public Health
University of Verona
cristian@biometria.univr.it
http://biometria.univr.it
==========================================
You can use fill=TRUE and na.strings="" in read.table(). E.g.,

> try.dat <- read.table("clipboard", colClasses=rep("character", 6),
+                       header=TRUE, fill=TRUE, na.strings="")
> try.dat
  X1st line    ID  description   snp_id genotype
1  2nd line 10003          Low rs152240     <NA>
2  3rd line 10003     Moderate rs189011       TC
3  4th line 10004 Conservative rs152240       GC
4  5th line 10004          Bad rs154354     <NA>
5  6th line 10013          Bad rs152240     <NA>
6  7th line 10019 Conservative rs152240       AC

HTH,
Andy

> From: Cristian Pattaro
>
> Read.table() doesn't work in these situations because of the
> irregular pattern of data. Have you got any suggestion?
On Fri, 23 Jul 2004, Cristian Pattaro wrote:

> Read.table() doesn't work in these situations because of the irregular
> pattern of data. Have you got any suggestion?

Read the manual, for it does! In particular, look at the argument

    fill: logical. If 'TRUE' then in case the rows have unequal length,
          blank fields are implicitly added. See Details.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
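As an illustration of the fill argument quoted from the manual, here is a minimal sketch that reads the sample data from an inline string instead of gene.txt. It assumes the "1st line", "2nd line" labels in the original post only annotate the listing and are not in the file. The object names txt and gene, and the use of the text= argument, are choices made for this example only.

```r
# Sample data from the original post, without the "Nth line" labels
# (assumption: those labels are not part of gene.txt).
txt <- "ID description snp_id genotype
10003 Low rs152240
10003 Moderate rs189011 TC
10004 Conservative rs152240 GC
10004 Bad rs154354
10013 Bad rs152240
10019 Conservative rs152240 AC"

# fill = TRUE pads short rows with blank fields;
# na.strings = "" turns those blank fields into NA.
gene <- read.table(text = txt, header = TRUE, fill = TRUE,
                   na.strings = "", stringsAsFactors = FALSE)
print(gene)
```

One point worth keeping in mind: according to ?read.table, the number of columns is determined from the first five lines of input (or from col.names if that is longer), so if the longest row appears later in the file it may be safer to supply col.names explicitly.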
Have you considered using fill=TRUE in read.table()? See ?read.table

If that does not work, there is always scan(). In the worst case you can also use readChar() or readLines()/strsplit(), but that should not be necessary.

Cheers

Henrik Bengtsson
Lund University

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cristian Pattaro
> Sent: Friday, July 23, 2004 2:06 PM
> To: R Help
> Subject: [R] Reading ASCII files
>
> Read.table() doesn't work in these situations because of the
> irregular pattern of data. Have you got any suggestion?
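For completeness, the readLines()/strsplit() fallback mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: a temporary file stands in for gene.txt, the "Nth line" labels are assumed not to be in the file, no row is longer than the header, and the names tmp, fields, rows and gene2 are made up for this example.

```r
# Write a few sample rows to a temporary file standing in for gene.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID description snp_id genotype",
             "10003 Low rs152240",
             "10003 Moderate rs189011 TC",
             "10004 Bad rs154354"), tmp)

# Read the raw lines and split each one on runs of whitespace.
lines  <- readLines(tmp)
fields <- strsplit(lines, "[[:space:]]+")

# Pad every data row with NA up to the width of the header row
# (assumes no row has more fields than the header).
n    <- length(fields[[1]])
rows <- lapply(fields[-1],
               function(x) c(x, rep(NA_character_, n - length(x))))

# Bind the rows into a data frame and restore names and types.
gene2 <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(gene2) <- fields[[1]]
gene2$ID <- as.integer(gene2$ID)
print(gene2)
```

This does by hand what fill=TRUE does inside read.table(), which is why the simpler call is usually preferable.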
Did you read the R Data Import/Export manual or check the mail archives?

You could try to save the input file as comma or pipe separated. Alternatively, you can try this hack if all fields within a record are separated by a single space.

> a <- read.delim(file="tmp.txt", sep=" ", na.strings="")
> a
     ID  description   snp_id genotype
1 10003          Low rs152240     <NA>
2 10003     Moderate rs189011       TC
3 10004 Conservative rs152240       GC
4 10004          Bad rs154354     <NA>
5 10013          Bad rs152240     <NA>
6 10019 Conservative rs152240       AC

On Fri, 2004-07-23 at 13:06, Cristian Pattaro wrote:
> Read.table() doesn't work in these situations because of the irregular
> pattern of data. Have you got any suggestion?
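The other suggestion, saving the file as comma or pipe separated, might look like the sketch below. The pipe-separated layout is an assumption about how such an export would be written (with an empty trailing field where the genotype is missing), and the names tmp and gene3 are hypothetical.

```r
# A hypothetical pipe-separated export of the same data: the missing
# genotype appears as an empty field after the last separator.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID|description|snp_id|genotype",
             "10003|Low|rs152240|",
             "10003|Moderate|rs189011|TC",
             "10004|Bad|rs154354|"), tmp)

# With an explicit separator every row has four fields, so fill is not
# needed; na.strings = "" maps the empty fields to NA.
gene3 <- read.table(tmp, header = TRUE, sep = "|", na.strings = "",
                    stringsAsFactors = FALSE)
print(gene3)
```

With an explicit separator the position of each missing value is unambiguous, which is not the case for whitespace-separated input when a field other than the last one is empty.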