Hi,
I'm having difficulty importing my textfile that looks something like this:

#begin text file
Timepoint 1
ObjectNumber Volume SurfaceArea
1 5.3 9.7
2 4.9 8.3
3 5.0 9.1
4 3.5 7.8

Timepoint 2
ObjectNumber Volume SurfaceArea
1 5.1 9.0
2 4.7 8.9
3 4.3 8.3
4 4.2 7.9

... #goes on to Timepoint 80

How would I import this data into a list containing data.frame for each timepoint?
I'd like my data to be organized like this:

> myList
[[1]]
  ObjectNumber Volume SurfaceArea
1            1    5.3         9.7
2            2    4.9         8.3
3            3    5.0         9.1
4            4    3.5         7.8

[[2]]
  ObjectNumber Volume SurfaceArea
1            1    5.1         9.0
2            2    4.7         8.9
3            3    4.3         8.3
4            4    4.2         7.9

-Daniel
--
View this message in context: http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26045031.html
Sent from the R help mailing list archive at Nabble.com.
try this:

> # read in the file
> x <- readLines(textConnection("#begin text file
+ Timepoint 1
+ ObjectNumber Volume SurfaceArea
+ 1 5.3 9.7
+ 2 4.9 8.3
+ 3 5.0 9.1
+ 4 3.5 7.8
+
+ Timepoint 2
+ ObjectNumber Volume SurfaceArea
+ 1 5.1 9.0
+ 2 4.7 8.9
+ 3 4.3 8.3
+ 4 4.2 7.9"))
> # delete blank lines
> blanks <- grep("^\\s*$", x)
> if (length(blanks) > 0) x <- x[-blanks]
> # determine where "Timepoint" occurs in the text vector
> tp <- grep("^Timepoint", x)
> # append length+1 to the result
> tp <- c(tp, length(x) + 1)
> input <- textConnection(x)
> # skip to the first "Timepoint" if not first line
> if (tp[1] != 1) readLines(input, n=tp[1] - 1)
[1] "#begin text file"
> result <- list()
> # repeat for each Timepoint
> for (numLines in diff(tp)){ # repeat for number of lines to read
+     id <- readLines(input, n=1)
+     result[[id]] <- read.table(input, header=TRUE, nrows=numLines - 2)
+ }
> result
$`Timepoint 1`
  ObjectNumber Volume SurfaceArea
1            1    5.3         9.7
2            2    4.9         8.3
3            3    5.0         9.1
4            4    3.5         7.8

$`Timepoint 2`
  ObjectNumber Volume SurfaceArea
1            1    5.1         9.0
2            2    4.7         8.9
3            3    4.3         8.3
4            4    4.2         7.9

> closeAllConnections()

On Sat, Oct 24, 2009 at 11:31 PM, delnatan <delnatan at gmail.com> wrote:
> How would I import this data into a list containing data.frame for each
> timepoint?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
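The same idea (find the "Timepoint" lines, then read each block as its own table) can also be sketched without managing a connection by hand. This is only a sketch under assumptions: read_timepoints is a hypothetical helper name that does not appear in the thread, and the sketch assumes every block consists of a "Timepoint <n>" line, then one header line, then the data rows.

```r
# Hypothetical helper (not from the thread): turn the lines of a
# Timepoint-delimited file into a list of data frames, one per block.
read_timepoints <- function(lines) {
  # drop blank lines and comment lines such as "#begin text file"
  lines <- lines[!grepl("^\\s*(#.*)?$", lines)]
  # block id: increases by one at every "Timepoint <n>" line
  block <- cumsum(grepl("^Timepoint", lines))
  # ignore anything that precedes the first Timepoint line
  keep <- block > 0
  pieces <- split(lines[keep], block[keep])
  result <- lapply(pieces, function(p) {
    # p[1] is the Timepoint line, p[2] the header, the rest are data rows
    read.table(text = p[-1], header = TRUE)
  })
  # name each element after its own "Timepoint <n>" line
  names(result) <- unname(vapply(pieces, `[`, character(1), 1))
  result
}

# Usage with the first rows of the sample data from the question
txt <- c("#begin text file",
         "Timepoint 1",
         "ObjectNumber Volume SurfaceArea",
         "1 5.3 9.7",
         "2 4.9 8.3",
         "",
         "Timepoint 2",
         "ObjectNumber Volume SurfaceArea",
         "1 5.1 9.0",
         "2 4.7 8.9")
myList <- read_timepoints(txt)
```

For a real file, read_timepoints(readLines("myfile")) would be the equivalent call, where "myfile" stands in for the actual path. Because the blocks are split by a running count rather than by a fixed number of rows, blocks of different lengths should be handled the same way.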
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of delnatan
> Sent: Saturday, October 24, 2009 8:32 PM
> To: r-help at r-project.org
> Subject: [R] Importing data from text file with mixed format
>
> How would I import this data into a list containing
> data.frame for each timepoint?

The following function reads that text file into a single data.frame with a Timepoint column, a format I usually find more convenient. You can use split(data, data$Timepoint) to get the format you asked for. With the one-data-frame format you can also use the cast and melt functions from the reshape package to rearrange it.

readMyData <- function (file) {
    # read every line in the file
    lines <- readLines(file)
    # drop empty lines
    lines <- grep("^[[:space:]]*$", lines, value=TRUE, invert=TRUE)
    # find and check header lines
    isHeaderLine <- regexpr("^ObjectNumber", lines) > 0
    if (sum(isHeaderLine)==0)
        stop("No header lines of form 'ObjectNumber ...'")
    if (length(u <- unique(lines[isHeaderLine]))>1)
        stop("Header lines vary: ", paste(sQuote(head(u)), collapse=", "))
    col.names <- strsplit(lines[which(isHeaderLine)[1]], "[[:space:]]+")[[1]]
    # after making column names from header lines, drop header lines
    lines <- lines[!isHeaderLine]
    # process Timepoint lines
    isTimepointLine <- regexpr("^Timepoint", lines) > 0
    if (sum(isTimepointLine)==0)
        stop("No lines of form 'Timepoint <number>'")
    timepoints <- sub("^Timepoint[[:space:]]*", "", lines[isTimepointLine])
    timepoints <- as.integer(timepoints)
    if (any(is.na(timepoints)))
        stop("Non-integer found in a Timepoint line: ",
             sQuote(lines[isTimepointLine][which(is.na(timepoints))[1]]))
    nRowsPerTimepoint <- diff(c(which(isTimepointLine),length(isTimepointLine)+1)) - 1
    # drop Timepoint lines.  Remaining lines should be data lines
    lines <- lines[!isTimepointLine]
    # An error in read.table means there were lines we should have dropped
    result <- read.table(header=FALSE, row.names=NULL, col.names=col.names,
                         textConnection(lines))
    # Add Timepoint column
    result$Timepoint <- rep(timepoints, nRowsPerTimepoint)
    result
}

E.g.,

> data <- readMyData("c:/temp/t.txt")
> data
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2
> split(data, data$Timepoint)
$`1`
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1

$`2`
  ObjectNumber Volume SurfaceArea Timepoint
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2

> library(reshape)
> mdata <- melt(data, id=c("ObjectNumber","Timepoint"))
> cast(mdata, Timepoint~variable, fun.aggregate=c, subset=variable=="SurfaceArea")
  Timepoint SurfaceArea_X1 SurfaceArea_X2 SurfaceArea_X3 SurfaceArea_X4
1         1            9.7            8.3            9.1            7.8
2         2            9.0            8.9            8.3            7.9
> cast(mdata, ObjectNumber~variable, fun.aggregate=c, subset=variable=="SurfaceArea")
  ObjectNumber SurfaceArea_X1 SurfaceArea_X2
1            1            9.7            9.0
2            2            8.3            8.9
3            3            9.1            8.3
4            4            7.8            7.9

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
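The split() output above still carries the Timepoint column and the original row numbers (5 to 8 in the second piece). To get closer to the layout asked for in the question, each piece can be tidied after splitting. The following is a sketch in base R, not part of Bill's function; the data frame is typed in by hand from the values shown in the thread so that the example runs on its own.

```r
# Long-format data as returned by readMyData (values copied from the thread)
data <- data.frame(
  ObjectNumber = rep(1:4, 2),
  Volume       = c(5.3, 4.9, 5.0, 3.5, 5.1, 4.7, 4.3, 4.2),
  SurfaceArea  = c(9.7, 8.3, 9.1, 7.8, 9.0, 8.9, 8.3, 7.9),
  Timepoint    = rep(1:2, each = 4)
)

# One data frame per timepoint, without the now-redundant Timepoint column
myList <- lapply(split(data, data$Timepoint), function(d) {
  d$Timepoint <- NULL   # the list position already identifies the timepoint
  rownames(d) <- NULL   # restart row names at 1 within each piece
  d
})

# Drop the "1", "2", ... names so the elements print as [[1]], [[2]], ...
# Only sensible when the timepoints really are 1, 2, 3, ... with no gaps.
myList <- unname(myList)
```

Keeping the names (skipping the last step) is probably the safer choice if some timepoints might be missing from the file, since the names then record which timepoint each element belongs to.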
Gabor Grothendieck
2009-Oct-25 22:14 UTC
[R] Importing data from text file with mixed format
This solution uses strapply in gsubfn. It assumes the timepoints are 1, 2, 3, ... (although later we remove this restriction just in case). The first line reads in myfile. The second line reads the numeric rows into matrix s. The third line reads in the column names. The fourth line converts to data frame and adds the Timepoint column numbering the first data set 1, the second 2, etc. (also known as long form) and the fifth line is the result.

library(gsubfn)
L <- readLines("myfile")
s <- strapply(L, "^([0-9]+) +([0-9.]+) +([0-9.]+) *$", c, simplify = rbind)
colnames(s) <- c(read.table("myfile", FALSE, nrow = 1, skip = 1, as.is = TRUE))
DF <- transform(s, Timepoint = cumsum(ObjectNumber == 1))
split(DF[-4], DF[4])

If the timepoints are not necessarily 1, 2, 3, ... then replace the last line with this (which extracts the timepoints and assigns them):

Timepoint <- c(strapply(L, "Timepoint *([0-9]+)", as.numeric, simplify = rbind))
DF$Timepoint <- Timepoint[DF$Timepoint]
split(DF[-4], DF[4])
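For readers without gsubfn installed, the same regular-expression idea can be sketched in base R with regexec and regmatches. This is an illustration under assumptions rather than a drop-in replacement for the code above: the lines are typed in by hand instead of being read from "myfile", and the sketch assumes that ObjectNumber restarts at 1 at the start of every block.

```r
# Sample lines (first rows of the data from the question)
L <- c("Timepoint 1",
       "ObjectNumber Volume SurfaceArea",
       "1 5.3 9.7",
       "2 4.9 8.3",
       "",
       "Timepoint 2",
       "ObjectNumber Volume SurfaceArea",
       "1 5.1 9.0",
       "2 4.7 8.9")

# Data rows: three whitespace-separated numeric fields
pat <- "^ *([0-9]+) +([0-9.]+) +([0-9.]+) *$"
m <- regmatches(L, regexec(pat, L))
# Non-matching lines give character(0); keep full match plus 3 captures
m <- m[lengths(m) == 4]
# Drop the whole-match element and convert the captures to numbers
s <- t(vapply(m, function(x) as.numeric(x[-1]), numeric(3)))

# Column names taken from the first header line
hdr <- strsplit(trimws(grep("^ObjectNumber", L, value = TRUE)[1]),
                "[[:space:]]+")[[1]]
DF <- as.data.frame(s)
names(DF) <- hdr

# A new block starts whenever ObjectNumber restarts at 1
DF$Timepoint <- cumsum(DF$ObjectNumber == 1)
result <- split(DF[-4], DF$Timepoint)
```

Unlike a plain character matrix, the columns here are converted to numeric before the data frame is built, so Volume and SurfaceArea can be used in arithmetic directly.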