thr3ads.net - R help - [R] stacking imported data [Nov 2004]

Sundar Dorai-Raj

2004-Nov-01 23:35 UTC

[R] stacking imported data

Hi all,
   I have a question that I don't have a good answer for (note the word 
"good"; I have an answer, but I consider it not "good"). Take the
following data in a single tab-delimited text file:

<text>

A
Labels	Value	SE	2.5%	97.5%
R90	0.231787	1.148044	0.035074	1.531779
R0	0.500861	0.604406	0.185336	1.353552

B
Labels	Value	SE	2.5%	97.5%
(Intercept)	1.367514	0.036431	1.287975	1.451964
</text>

(Note: the <text> tags are not present and are added here only to show 
blank lines.)

I would like to read the data into a single data.frame which looks like

Labels	Value	SE	2.5%	97.5%
A.R90	0.231787	1.148044	0.035074	1.531779
A.R0	0.500861	0.604406	0.185336	1.353552
B.(Intercept)	1.367514	0.036431	1.287975	1.451964

A few rules:

1. the number of rows in "A" and "B" will vary from 1 to ???. Here "A"
has 2 rows (excluding header) and "B" has 1 row (excluding header).
2. the number of columns in "A" and "B" will always be the same.
3. the headers for "A" and "B" will always be the same.
4. there is always an empty line at the beginning of the file and in 
between "A" and "B".

My solution involves scan and indexing, though it is error prone and not 
flexible if more or fewer than 5 columns are present in the data. While 
the number of columns is always the same in "A" and "B", both may have
more or fewer columns than shown here.

I hope this makes sense.

Thanks,

--sundar

Spencer Graves

2004-Nov-02 01:00 UTC

Hi, Sundar: 

      I got something that looks like it might be what you want by copying 
your 8 lines to the clipboard and using the following: 

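# Read the copied lines; fill=TRUE pads the short "A"/"B" rows with blanks
DF <- read.table("clipboard",  colClasses=character(0), fill=TRUE)
# Rows whose second field is empty are the group labels ("A", "B")
breaks <- which(DF$V2=="")
# Number of rows in each group, counting its label row and header row
nrows <- diff(c(breaks, dim(DF)[1]+1))

files <- as.character(DF[breaks,1])

# Prepend the group label to every row
DF2 <- cbind(rep(files, nrows), DF)

# Drop the label rows and the header rows that follow them
DF. <- DF2[-c(breaks, breaks+1),]
# Take the column names from the first header row
DFnames <- as.matrix(DF[breaks[1]+1, ])
names(DF.) <- c("Files", DFnames)
#################
Result: 
DF.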

  Files      Labels    Value       SE     2.5%    97.5%
3     A         R90 0.231787 1.148044 0.035074 1.531779
4     A          R0 0.500861 0.604406 0.185336 1.353552
7     B (Intercept) 1.367514 0.036431 1.287975 1.451964

      This uses the "fill" argument of read.table in R that Andy Liaw 
mentioned earlier today.  (Thus, this solution won't work in S-Plus 6.2, 
where read.table does not have this argument.) 

      Is this satisfactory? 
      Spencer Graves
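
For readers without a clipboard connection, the effect of "fill" can be checked on the sample data directly. This is only a sketch under assumptions that are not in the post above: the data are supplied through the `text` argument of read.table instead of "clipboard", `sep = "\t"` is set because the file is described as tab-delimited, and `colClasses = "character"` is used so that every field stays a string.

```r
# Sample data from the original post: tab-delimited, with a leading blank line
# and a blank line between the "A" and "B" blocks.
txt <- c("",
         "A",
         "Labels\tValue\tSE\t2.5%\t97.5%",
         "R90\t0.231787\t1.148044\t0.035074\t1.531779",
         "R0\t0.500861\t0.604406\t0.185336\t1.353552",
         "",
         "B",
         "Labels\tValue\tSE\t2.5%\t97.5%",
         "(Intercept)\t1.367514\t0.036431\t1.287975\t1.451964")

# fill = TRUE pads the one-field label lines ("A", "B") with empty strings;
# blank lines are dropped because blank.lines.skip defaults to TRUE.
DF <- read.table(text = txt, sep = "\t", colClasses = "character", fill = TRUE)

dim(DF)                  # rows and columns actually read
which(DF$V2 == "")       # positions of the label rows
```

The rows returned by `which()` are the ones the solution above calls `breaks`; everything else follows from those positions.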

Gabor Grothendieck

2004-Nov-02 01:31 UTC

Read the lines into vector z, one line per element.

Define a grouping variable, g, which is 1 for the lines
starting at the first blank line and 2 for the lines
starting at the 2nd.  Define a function f which accepts such
a group of lines and creates the appropriate data frame from
them.  tapply the lines, grouped by g, and bind the rows of
the data frame produced from each group together into one
large data frame.  

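z <- readLines("file.dat")

# strip leading and trailing whitespace from each string
trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

# g goes up by 1 at each blank line, so it numbers the groups 1, 2, ...
g <- cumsum(nchar(z) == 0)
# x is one group of lines: blank line, label, header, then the data lines
f <- function(x) {
	# prefix each data line with the group label, e.g. "A.R90"
	x[-(1:3)] <- paste(trim(x[2]), x[-(1:3)], sep = ".")
	# drop the blank and label lines and parse the rest as a table
	read.table(textConnection(x[-(1:2)]), header = TRUE)
}
do.call("rbind", tapply(z, g, f))


Note: if the blank lines or the A and B lines contain
whitespace, trim it off first.  That is, insert this
line after the definition of trim:

z <- trim(z)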
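
The same idea can be tried without a file on disk. What follows is a sketch, not the code posted above. It makes several assumptions of its own:

- The sample lines are typed in as a character vector.
- `parse_block` is a hypothetical helper name.
- `lapply(split(z, g), ...)` stands in for `tapply`.
- `sep = "\t"` and `check.names = FALSE` are added so that the headers "2.5%" and "97.5%" survive unchanged.

```r
# Sample data from the original post (tab-delimited, blank line before each block).
z <- c("",
       "A",
       "Labels\tValue\tSE\t2.5%\t97.5%",
       "R90\t0.231787\t1.148044\t0.035074\t1.531779",
       "R0\t0.500861\t0.604406\t0.185336\t1.353552",
       "",
       "B",
       "Labels\tValue\tSE\t2.5%\t97.5%",
       "(Intercept)\t1.367514\t0.036431\t1.287975\t1.451964")

# Block number: increases by one at every blank line.
g <- cumsum(nchar(z) == 0)

# Parse one block. x[1] is the blank line, x[2] the label,
# x[3] the header, and the remaining elements are data lines.
parse_block <- function(x) {
  # Prefix the first field of every data line with the block label.
  x[-(1:3)] <- paste(x[2], x[-(1:3)], sep = ".")
  # Drop the blank and label lines; keep header names exactly as written.
  read.table(text = x[-(1:2)], header = TRUE, sep = "\t",
             check.names = FALSE, stringsAsFactors = FALSE)
}

# One data frame per block, stacked into a single data frame.
res <- do.call("rbind", lapply(split(z, g), parse_block))
rownames(res) <- NULL
res
```

Because the header is read from each block, nothing here depends on there being exactly five columns, which was the limitation mentioned in the question.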
