Hi,

I have a data file, certain lines of which are character fields. I would like to skip these rows and read the data file as a numeric data frame. I know that I can skip lines at the beginning with read.table and scan, but is there a way to skip a specified sequence of lines (e.g., 1, 2, 10, 11, 19, 20, 28, 29, etc.)?

If I read the entire data file and then delete the character fields, the values are still kept as factors, with each value denoted by its level. Since I have continuous variables, there are as many levels as there are values. I am unable to coerce this to "numeric" mode. Is there a way to do this so that I can then manipulate the numeric data frame?

Thanks for any help.

Best,
Ravi.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------
Read the file in as lines of text (readLines), 'grep' through the character vector to find and delete the lines you do not want, and then use read.table(textConnection(yourvector)) to read the corrected data in.

On 4/9/08, Ravi Varadhan <rvaradhan at jhmi.edu> wrote:
> I have a data file, certain lines of which are character fields. I would
> like to skip these rows, and read the data file as a numeric data frame. I
> know that I can skip lines at the beginning with read.table and scan, but is
> there a way to skip a specified sequence of lines (e.g., 1, 2, 10, 11, 19,
> 20, 28, 29, etc.) ?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
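For what it's worth, the readLines / grep / textConnection route suggested above might look roughly like the sketch below. The sample file, the variable names, and the pattern "[A-Za-z]" are all assumptions made up for illustration; the pattern in particular would need adjusting if the numeric rows can contain letters (for example NA, Inf, or 1e5).

```r
# Hypothetical sample standing in for the real file: two character
# lines precede each block of numeric rows.
raw <- c("Subject A", "x y",
         "1.5 2.0", "2.5 3.0",
         "Subject B", "x y",
         "3.5 4.0", "4.5 5.0")
path <- tempfile(fileext = ".txt")
writeLines(raw, path)

# Read everything as plain text first.
lines <- readLines(path)

# Assumed rule: a line is unwanted if it contains any letter.
bad <- grep("[A-Za-z]", lines)

# Guard against the empty case: lines[-integer(0)] would drop everything.
kept <- if (length(bad) > 0) lines[-bad] else lines

# Hand the cleaned character vector to read.table via a text connection.
con <- textConnection(kept)
dat <- read.table(con)
close(con)

str(dat)
```

Since only numeric rows ever reach read.table, the columns should come back as numeric rather than as factors, so no conversion step is needed afterwards.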
Hi Ravi,

One thing I tend to do, when using read.table, is to specify the option colClasses = "character". This forces everything to be read as character. From there, as.numeric works fine on each column, and you don't have to deal with factors and reconverting them.

Hope this helps,
Abhijit
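A minimal sketch of the colClasses idea, under some assumptions: the sample file is invented, every line (including the character ones) is assumed to have the same number of fields so that read.table does not complain, and unwanted rows are identified simply as those that fail to convert. That last rule would also drop rows with genuine missing values, so it may not suit every file.

```r
# Hypothetical sample: "x y" lines are the character rows to discard.
raw <- c("x y", "1.5 2.0", "2.5 3.0", "x y", "3.5 4.0")
path <- tempfile(fileext = ".txt")
writeLines(raw, path)

# Read every column as character, so no factors are created.
dat <- read.table(path, colClasses = "character")

# Convert column by column; non-numeric strings become NA (with a
# warning, suppressed here because it is expected).
num <- as.data.frame(suppressWarnings(lapply(dat, as.numeric)))

# Assumed rule: a row is a character row if any field failed to convert.
clean <- num[complete.cases(num), ]

str(clean)
```

Note that as.numeric is applied per column with lapply, since it does not accept a whole data frame.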
On Wed, 9 Apr 2008, Ravi Varadhan wrote:

> I have a data file, certain lines of which are character fields. I would
> like to skip these rows, and read the data file as a numeric data frame. I
> know that I can skip lines at the beginning with read.table and scan, but is
> there a way to skip a specified sequence of lines (e.g., 1, 2, 10, 11, 19,
> 20, 28, 29, etc.) ?

Not within scan, but you can do it within the connection that scan reads. If the file is small, just read it all with readLines, select the lines you want (mydata[-c(1,2,10,11...)]) and use that as the input to a textConnection. If it is large, read a line at a time, discard it when it is one to be skipped, and otherwise write it to an anonymous file() connection. Then use read.table on the anonymous connection. Or use perl/awk within a pipe() connection.

> If I read the entire data file, and then delete the character fields, the
> values are still kept as factors, with each value denoted by its level.
> Since, I have continuous variables, there are as many levels as there are
> values. I am unable to coerce this to "numeric" mode. Is there a way to do
> this so that I can then manipulate the numeric data frame?

Why does FAQ Q7.10 not apply?

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
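To make the small-file suggestion above concrete, here is a rough sketch that drops lines by position rather than by content. It assumes the skipped lines really do follow the regular pattern in the question (two character lines at the start of every block of nine); the data are generated purely for illustration. The second half shows the kind of factor-to-numeric conversion that, as far as I recall, FAQ 7.10 describes.

```r
# Hypothetical data: 3 blocks of 9 lines, the first 2 of each being text.
block <- function(k) {
  c(paste("Block", k), "x y",
    sprintf("%d %.1f", 1:7, k + (1:7) / 10))
}
mydata <- unlist(lapply(1:3, block))

# Lines 1, 2, 10, 11, 19, 20, ... (assumed period of 9).
skip <- sort(c(seq(1, length(mydata), by = 9),
               seq(2, length(mydata), by = 9)))

# Drop them by index and read the remainder through a text connection.
con <- textConnection(mydata[-skip])
dat <- read.table(con)
close(con)
str(dat)

# If the values have already ended up as a factor, go through the
# labels rather than the internal level codes.
f <- factor(c("3.5", "10", "2.25"))
codes <- as.numeric(f)                  # level codes, not the data
vals  <- as.numeric(as.character(f))    # the original values
vals2 <- as.numeric(levels(f))[f]       # same result, via the levels
```

The second conversion form converts each distinct level once and then indexes, which may matter for long vectors with few distinct values; with continuous data, where nearly every value is its own level, the two should be much the same.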
On Wed, Apr 9, 2008 at 7:37 PM, Ravi Varadhan <rvaradhan at jhmi.edu> wrote:

> I have a data file, certain lines of which are character fields. I would
> like to skip these rows, and read the data file as a numeric data frame. I
> know that I can skip lines at the beginning with read.table and scan, but is
> there a way to skip a specified sequence of lines (e.g., 1, 2, 10, 11, 19,
> 20, 28, 29, etc.) ?
>
> If I read the entire data file, and then delete the character fields, the
> values are still kept as factors, with each value denoted by its level.
> Since, I have continuous variables, there are as many levels as there are
> values. I am unable to coerce this to "numeric" mode. Is there a way to do
> this so that I can then manipulate the numeric data frame?

Read the entire data file into the data frame mydata, and then delete the character fields. Afterwards, run

mydata <- edit(mydata)

and, inside edit, coerce the columns that you want to numeric.

Paul