Ian Bentley
2010-Jun-11 18:32 UTC
[R] Transforming simulation data which is spread across many files into a barplot
I'm an R newbie, and I'm just trying to use some of its graphing capabilities, but I'm a bit stuck - basically in massaging the already available data into a format R likes.

I have a simulation environment which produces logs, which represent a number of different things. I then run a python script on this data, which puts it in a nicer format. Essentially, the python script reduces the number of files by two orders of magnitude.

What I'm left with is a number of files, each of which has two columns of data in it. The files look something like this:

--1000.log--
Sent Received
405.0 3832.0
176.0 1742.0
176.0 1766.0
176.0 1240.0
356.0 3396.0
...

This file - called 1000.log - represents a data point at 1000. What I'd like to do is use a loop to read in 50 or so of these files, and then produce a stacked barplot. Ideally, the stacked barplot would have 1 bar per file, and two stacks per bar. The first stack would be the mean of the sent column, and the second would be the mean of the received column.

I've used a loop to read files in R before, something like this

---
for (i in 1:50){
    tmpFile <- paste(base, i*100, ".log", sep="")
    tmp <- read.table(tmpFile)
}
---

But I really don't know how to handle massaging this data into the matrix I need.

I hope this makes sense; I find it a little hard to describe.

Can anyone give me some help jumping into this one?

Thanks

--
Ian Bentley
M.Sc. Candidate
Queen's University
Kingston, Ontario
Hadley Wickham
2010-Jun-11 18:52 UTC
[R] Transforming simulation data which is spread across many files into a barplot
On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley <ian.bentley at gmail.com> wrote:
> I've used a loop to read files in R before, something like this ---
>
> for (i in 1:50){
>    tmpFile <- paste(base, i*100, ".log", sep="")
>    tmp <- read.table(tmpFile)
> }

# Load data
library(plyr)

paths <- dir(base, pattern = "\\.log$", full.names = TRUE)
names(paths) <- basename(paths)

# ldply() takes a vector or list and returns one data frame;
# the names of paths end up in the .id column.
# header = TRUE because each file starts with "Sent Received".
df <- ldply(paths, read.table, header = TRUE)

# Compute averages:
avg <- ddply(df, ".id", summarise,
  sent = mean(Sent),
  received = mean(Received))

You can read more about plyr at http://had.co.nz/plyr.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
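The averages still need to be reshaped into a matrix before barplot() will stack them. A minimal sketch, assuming avg has the columns .id, sent and received as in the summarise call above. The numbers below are made up purely for illustration, and to_bar_matrix is a hypothetical helper name, not part of plyr:

```r
# Made-up stand-in for the avg data frame (one row per log file).
avg <- data.frame(
  .id = c("1000.log", "100.log", "200.log"),
  sent = c(257.8, 250, 300),
  received = c(2395.2, 2400, 2900),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order rows by the number in the file name,
# then return a 2 x n matrix (rows = stacks, columns = bars).
to_bar_matrix <- function(avg) {
  n <- as.numeric(sub("\\.log$", "", avg$.id))
  avg <- avg[order(n), ]
  mat <- t(as.matrix(avg[, c("sent", "received")]))
  colnames(mat) <- sub("\\.log$", "", avg$.id)
  mat
}

mat <- to_bar_matrix(avg)

# barplot() stacks the rows of a matrix within each bar by default.
barplot(mat, legend.text = rownames(mat), xlab = "Data point", ylab = "Mean")
```

Sorting on the numeric part of the file name matters because a plain alphabetical sort would put 1000.log before 200.log.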
Gabor Grothendieck
2010-Jun-11 19:47 UTC
[R] Transforming simulation data which is spread across many files into a barplot
Try this:

base <- "file" # replace as appropriate
N <- 50
filenames <- paste(base, seq_len(N)*100, ".log", sep = "")
# header = TRUE skips the "Sent Received" line so the columns stay numeric
mat <- sapply(filenames, function(fn)
  colMeans(read.table(fn, header = TRUE, col.names = c("Sent", "Received")))
)
barplot(mat)

On Fri, Jun 11, 2010 at 2:32 PM, Ian Bentley <ian.bentley at gmail.com> wrote:
> This file - called 1000.log - represents a data point at 1000. What I'd like
> to do is to use a loop, to read in 50 or so of these files, and then produce
> a stacked barplot. Ideally, the stacked barplot would have 1 bar per file,
> and two stacks per bar. The first stack would be the mean of the sent, and
> the second would be the mean of the received.
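For anyone who wants to try the sapply() approach without the real logs, here is a self-contained sketch. It writes three small fake log files to a temporary directory, then builds the matrix and plots it. The file contents, the temporary directory and the read_means helper name are all assumptions for illustration:

```r
# Write three small fake logs (100.log, 200.log, 300.log) to a temp directory.
# The numbers are invented; real logs would come from the simulation.
log_dir <- file.path(tempdir(), "fake_logs")
dir.create(log_dir, showWarnings = FALSE)
points <- c(100, 200, 300)
for (p in points) {
  fake <- data.frame(Sent = c(1, 2, 3) * p, Received = c(10, 20, 30) * p)
  write.table(fake, file.path(log_dir, paste0(p, ".log")),
              row.names = FALSE, quote = FALSE)
}

# Hypothetical helper: read one log (with its header line) and
# return the mean of each column.
read_means <- function(path) {
  colMeans(read.table(path, header = TRUE))
}

filenames <- file.path(log_dir, paste0(points, ".log"))
mat <- sapply(filenames, read_means)   # 2 x 3 matrix: rows Sent, Received
colnames(mat) <- points                # label bars by data point

# One bar per file, with Sent and Received stacked.
barplot(mat, legend.text = rownames(mat))
```

Because sapply() returns one column per file and barplot() stacks the rows of a matrix within each bar, no further reshaping should be needed.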