Hi all, I have a question that I don't have a good answer for (note the word "good"; I have an answer, but I consider it not "good"). Take the following data in a single tab-delimited text file: <text> A Labels Value SE 2.5% 97.5% R90 0.231787 1.148044 0.035074 1.531779 R0 0.500861 0.604406 0.185336 1.353552 B Labels Value SE 2.5% 97.5% (Intercept) 1.367514 0.036431 1.287975 1.451964 </text> (Note: the <text> tags are not present and are added here only to show blank lines.) I would like to read the data into a single data.frame which looks like Labels Value SE 2.5% 97.5% A.R90 0.231787 1.148044 0.035074 1.531779 A.R0 0.500861 0.604406 0.185336 1.353552 B.(Intercept) 1.367514 0.036431 1.287975 1.451964 A few rules: 1. the number of rows in "A" and "B" will vary from 1 to ???. Here "A" has 1 row (excluding header) and B has 2 rows (excluding header). 2. the number of columns in "A" and "B" will always be the same. 4. the headers for "A" and "B" will always be the same. 3. there is always an empty line at the beginning of the file and in between "A" and "B". My solution involves scan and indexing though it is error prone and not flexible if more or less than 5 columns are present in the data. While the number of columns is always the same from "A" to "B" it may change that "A" and "B" have more or fewer columns. I hope this makes sense. Thanks, --sundar
Hi, Sundar: I got something that looks like it might be what you want copying your 8 lines to clipboard and using the following: DF <- read.table("clipboard", colClasses=character(0), fill=TRUE) breaks <- which(DF$V2=="") nrows <- diff(c(breaks, dim(DF)[1]+1)) files <- as.character(DF[breaks,1]) DF2 <- cbind(rep(files, nrows), DF) DF. <- DF2[-c(breaks, breaks+1),] DFnames <- as.matrix(DF[breaks[1]+1, ]) names(DF.) <- c("Files", DFnames) ################# Result: DF Files Labels Value SE 2.5% 97.5% 3 A R90 0.231787 1.148044 0.035074 1.531779 4 A R0 0.500861 0.604406 0.185336 1.353552 7 B (Intercept) 1.367514 0.036431 1.287975 1.451964 This uses the "fill" argument in data.frame in R that Andy Liaw mentioned earlier today. (Thus, this solution won't work in S-Plus 6.2, where data.frame does not have this argument.) Is this satisfactory? Spencer Graves Sundar Dorai-Raj wrote:> Hi all, > I have a question that I don't have a good answer for (note the word > "good"; I have an answer, but I consider it not "good"). Take the > following data in a single tab-delimited text file: > > <text> > > A > Labels Value SE 2.5% 97.5% > R90 0.231787 1.148044 0.035074 1.531779 > R0 0.500861 0.604406 0.185336 1.353552 > > B > Labels Value SE 2.5% 97.5% > (Intercept) 1.367514 0.036431 1.287975 1.451964 > </text> > > (Note: the <text> tags are not present and are added here only to show > blank lines.) > > I would like to read the data into a single data.frame which looks like > > Labels Value SE 2.5% 97.5% > A.R90 0.231787 1.148044 0.035074 1.531779 > A.R0 0.500861 0.604406 0.185336 1.353552 > B.(Intercept) 1.367514 0.036431 1.287975 1.451964 > > A few rules: > > 1. the number of rows in "A" and "B" will vary from 1 to ???. Here "A" > has 1 row (excluding header) and B has 2 rows (excluding header). > 2. the number of columns in "A" and "B" will always be the same. > 4. the headers for "A" and "B" will always be the same. > 3. there is always an empty line at the beginning of the file and in > between "A" and "B". > > My solution involves scan and indexing though it is error prone and > not flexible if more or less than 5 columns are present in the data. > While the number of columns is always the same from "A" to "B" it may > change that "A" and "B" have more or fewer columns. > > I hope this makes sense. > > Thanks, > > --sundar > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html-- Spencer Graves, PhD, Senior Development Engineer O: (408)938-4420; mobile: (408)655-4567
Sundar Dorai-Raj <sundar.dorai-raj <at> pdf.com> writes: : : Hi all, : I have a question that I don't have a good answer for (note the word : "good"; I have an answer, but I consider it not "good"). Take the : following data in a single tab-delimited text file: : : <text> : : A : Labels Value SE 2.5% 97.5% : R90 0.231787 1.148044 0.035074 1.531779 : R0 0.500861 0.604406 0.185336 1.353552 : : B : Labels Value SE 2.5% 97.5% : (Intercept) 1.367514 0.036431 1.287975 1.451964 : </text> : : (Note: the <text> tags are not present and are added here only to show : blank lines.) : : I would like to read the data into a single data.frame which looks like : : Labels Value SE 2.5% 97.5% : A.R90 0.231787 1.148044 0.035074 1.531779 : A.R0 0.500861 0.604406 0.185336 1.353552 : B.(Intercept) 1.367514 0.036431 1.287975 1.451964 : : A few rules: : : 1. the number of rows in "A" and "B" will vary from 1 to ???. Here "A" : has 1 row (excluding header) and B has 2 rows (excluding header). : 2. the number of columns in "A" and "B" will always be the same. : 4. the headers for "A" and "B" will always be the same. : 3. there is always an empty line at the beginning of the file and in : between "A" and "B". : Read the lines into vector z, one line per element. Define a grouping variable, g, which is 1 for the lines starting at the first blank line and 2 for the lines starting at the 2nd. Define a function f which accepts such a group of lines and creates the appropriate data frame from them. tapply the lines, grouped by g, and bind the rows of the data frame produced from each group together into one large data frame. z <- readLines("file.dat") g <- cumsum(nchar(z) == 0) f <- function(x) { x[-(1:3)] <- paste(trim(x[2]), x[-(1:3)], sep = ".") read.table(textConnection(x[-(1:2)]), header = TRUE) } do.call("rbind", tapply(z, cumsum(nchar(z) == 0), f)) Note: if the blank lines or the A and B lines contain whitespace trim this off first. That is, insert these two lines after the readLines statement: trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x) z <- trim(z)