Hi,
I have tab-delimited text files containing numerical data, like below, but with many more columns.

As you can see, the first few lines are heading and file data. I need to skip these lines. The line two above where the numbers start is what I want to use as my header row. I then want to ignore the next line (containing units) and start importing data.

The header row repeats. I want to ignore the blank rows and text in the data and then continue reading. Is there an easy way to do this?

thanks
Beyar

-------------------------
main data file - file 1
by mr x
etc


Time out1
Sec mm
0.82495117 -0.020977303
1.3554688 -0.059330709
1.826416 -0.021419302
2.3295898 -0.051521059
2.8347168 -0.020661414


Time out1
Sec mm
3.8679199 -0.000439643
4.3322754 -0.063477799
4.8015137 -0.024581354
5.3286133 -0.067487299
5.8212891 -0.011978489

-----------------------------------------------

--
View this message in context: http://www.nabble.com/Ignore-text-when-reading-data-tp21718709p21718709.html
Sent from the R help mailing list archive at Nabble.com.

# replace this bit with your file name
myfile <- textConnection(
"Time out1
Sec mm
0.82495117 -0.020977303
1.3554688 -0.059330709
1.826416 -0.021419302
2.3295898 -0.051521059
2.8347168 -0.020661414


Time out1
Sec mm
3.8679199 -0.000439643
4.3322754 -0.063477799
4.8015137 -0.024581354
5.3286133 -0.067487299
5.8212891 -0.011978489")

# which lines not to read
notread <- c(which(r==""),grep("Time ",r),grep("Sec ",r))

# read the data as text
mydata <- r[setdiff(1:length(r),notread)]

# make it into a dataframe (I think this can be done prettier, but whatever)
z <- paste(mydata, collapse="\n")
read.table(textConnection(z))


greetings
Remko

-------------------------------------------------
Remko Duursma
Post-Doctoral Fellow

Centre for Plant and Food Science
University of Western Sydney
Hawkesbury Campus
Richmond NSW 2753

Dept of Biological Science
Macquarie University
North Ryde NSW 2109
Australia

Mobile: +61 (0)422 096908

On Thu, Jan 29, 2009 at 11:45 AM, beyar <bxx at mailinator.com> wrote:
> The header row repeats. I want to ignore the blank rows and text in the
> data and then continue reading. Is there an easy way to do this?

Sorry, forgot this line after the textConnection bit:

r <- readLines(myfile)

Remko Duursma

On Thu, Jan 29, 2009 at 2:28 PM, Remko Duursma <remkoduursma at gmail.com> wrote:
> # which lines not to read
> notread <- c(which(r==""),grep("Time ",r),grep("Sec ",r))
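
Putting Remko's two messages together, the approach might be sketched as below. This is a sketch under assumptions, not Remko's exact code: the patterns are anchored as "^Time" and "^Sec" (so they would also match when the separator is a tab rather than a space), read.table(text = ...) replaces the paste/textConnection step, and the column names are hard-coded for this two-column sample. For a real file, readLines("yourfile.txt") (file name hypothetical) would take the place of the inlined text.

```r
# Sample data from the thread, inlined so the sketch runs on its own.
txt <- "Time out1
Sec mm
0.82495117 -0.020977303
1.3554688 -0.059330709
1.826416 -0.021419302
2.3295898 -0.051521059
2.8347168 -0.020661414


Time out1
Sec mm
3.8679199 -0.000439643
4.3322754 -0.063477799
4.8015137 -0.024581354
5.3286133 -0.067487299
5.8212891 -0.011978489"

# Read all lines as text (this is the step the first message left out).
con <- textConnection(txt)
r <- readLines(con)
close(con)

# Lines not to read: blanks, repeated header lines, unit lines.
# Assumption: header and unit lines start with "Time" and "Sec".
notread <- c(which(r == ""), grep("^Time", r), grep("^Sec", r))

# Keep only the remaining (numeric) lines.
mydata <- r[setdiff(seq_along(r), notread)]

# Parse the kept lines; col.names are hard-coded for this sample only.
dat <- read.table(text = mydata, col.names = c("Time", "out1"))
```

Unlike Jim Holtman's reply, this version does not pick the header up from the file, so the column names have to be supplied by hand.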

Try this:

> x <- readLines(textConnection("main data file - file 1
+ by mr x
+ etc
+
+
+ Time out1
+ Sec mm
+ 0.82495117 -0.020977303
+ 1.3554688 -0.059330709
+ 1.826416 -0.021419302
+ 2.3295898 -0.051521059
+ 2.8347168 -0.020661414
+
+
+ Time out1
+ Sec mm
+ 3.8679199 -0.000439643
+ 4.3322754 -0.063477799
+ 4.8015137 -0.024581354
+ 5.3286133 -0.067487299
+ 5.8212891 -0.011978489"))
> closeAllConnections()
> # remove blanks
> x <- x[x != ""]
> # get the lines with numbers
> indx.num <- grep("^[-0-9]", x)
> header <- x[indx.num[1] - 2]
> input <- read.table(textConnection(x[indx.num]))
> names(input) <- strsplit(header, "\\s+")[[1]]
> input
        Time         out1
1  0.8249512 -0.020977303
2  1.3554688 -0.059330709
3  1.8264160 -0.021419302
4  2.3295898 -0.051521059
5  2.8347168 -0.020661414
6  3.8679199 -0.000439643
7  4.3322754 -0.063477799
8  4.8015137 -0.024581354
9  5.3286133 -0.067487299
10 5.8212891 -0.011978489
>

On Wed, Jan 28, 2009 at 7:45 PM, beyar <bxx at mailinator.com> wrote:
> The header row repeats. I want to ignore the blank rows and text in the
> data and then continue reading. Is there an easy way to do this?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Thanks to all for the solutions, especially to Jim H for this one, which worked perfectly. (I only had to change the separator on the header to \t, as there are spaces in the header names.)

----------------
> names(input) <- strsplit(header, "\\s+")[[1]]
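
The change Beyar describes might look like the sketch below. The sample lines and the header names "Time elapsed" and "out 1" are made up for illustration (the thread does not show the real headers); the only differences from Jim's code are the tab separator in strsplit and read.table, and read.table(text = ...) instead of a text connection.

```r
# Hypothetical tab-delimited sample with spaces inside the header names.
x <- c("main data file - file 1",
       "by mr x",
       "",
       "Time elapsed\tout 1",
       "Sec\tmm",
       "0.82495117\t-0.020977303",
       "1.3554688\t-0.059330709",
       "",
       "Time elapsed\tout 1",
       "Sec\tmm",
       "3.8679199\t-0.000439643")

# remove blanks
x <- x[x != ""]

# get the lines with numbers (lines starting with a digit or a minus sign)
indx.num <- grep("^[-0-9]", x)

# the header is two lines above the first numeric line
header <- x[indx.num[1] - 2]

# parse the numeric lines, splitting fields on tabs only
input <- read.table(text = x[indx.num], sep = "\t")

# split the header on tabs so that spaces stay inside the column names
names(input) <- strsplit(header, "\t", fixed = TRUE)[[1]]
```

Splitting on "\\s+" here would break "Time elapsed" into two names and leave more names than columns, which is presumably why the tab separator was needed.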