Ian Bentley
2010-Jun-11 18:32 UTC
[R] Transforming simulation data which is spread across many files into a barplot
I'm an R newbie, and I'm just trying to use some of its graphing
capabilities, but I'm a bit stuck - basically in massaging the already
available data into a format R likes.
I have a simulation environment which produces logs, which represent a
number of different things. I then run a Python script on this data to
put it in a nicer format. Essentially, the Python script reduces the
number of files by two orders of magnitude.
What I'm left with is a number of files, each of which has two columns of
data in it.
The files look something like this:
--1000.log--
Sent Received
405.0 3832.0
176.0 1742.0
176.0 1766.0
176.0 1240.0
356.0 3396.0
...
This file - called 1000.log - represents a data point at 1000. What I'd like
to do is to use a loop, to read in 50 or so of these files, and then produce
a stacked barplot. Ideally, the stacked barplot would have 1 bar per file,
and two stacks per bar. The first stack would be the mean of the sent, and
the second would be the mean of the received.
I've used a loop to read files in R before, something like this ---
for (i in 1:50) {
  tmpFile <- paste(base, i*100, ".log", sep="")
  tmp <- read.table(tmpFile)
}
--- But I really don't know how to handle massaging this data into the
matrix I need.
I hope this makes sense; I find it a little hard to describe.
Can anyone give me some help jumping into this one?
Thanks
--
Ian Bentley
M.Sc. Candidate
Queen's University
Kingston, Ontario
Hadley Wickham
2010-Jun-11 18:52 UTC
[R] Transforming simulation data which is spread across many files into a barplot
On Fri, Jun 11, 2010 at 1:32 PM, Ian Bentley <ian.bentley at gmail.com> wrote:
> I've used a loop to read files in R before, something like this ---
>
> for (i in 1:50){
>    tmpFile <- paste(base, i*100, ".log", sep="")
>    tmp <- read.table(tmpFile)
> }

# Load data
library(plyr)
paths <- dir(base, pattern = "\\.log$", full.names = TRUE)
names(paths) <- basename(paths)
# ldply reads each file and stacks the results; the file name ends up in ".id"
df <- ldply(paths, read.table, header = TRUE)

# Compute averages:
avg <- ddply(df, ".id", summarise, sent = mean(Sent), received = mean(Received))

You can read more about plyr at http://had.co.nz/plyr.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
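The plyr approach above ends with a table of per-file averages, while barplot() wants a matrix with one column per bar and one row per stack. The following is a minimal base R sketch of that last step, not part of the original reply. It assumes each file has a "Sent Received" header line; the temporary directory, the two file names and the use of aggregate() in place of plyr are illustrative choices, and the file contents are just the sample rows from the question.

```r
# Illustrative data only: two small files built from the sample rows in the question.
log_dir <- file.path(tempdir(), "logs_sketch")   # hypothetical directory
dir.create(log_dir, showWarnings = FALSE)
writeLines(c("Sent Received", "405.0 3832.0", "176.0 1742.0", "176.0 1766.0"),
           file.path(log_dir, "100.log"))
writeLines(c("Sent Received", "176.0 1240.0", "356.0 3396.0"),
           file.path(log_dir, "200.log"))

paths <- dir(log_dir, pattern = "\\.log$", full.names = TRUE)
names(paths) <- basename(paths)

# Read every file and tag its rows with the file name
# (this plays the role of the ".id" column in the plyr version).
df <- do.call(rbind, lapply(names(paths), function(id) {
  cbind(.id = id, read.table(paths[[id]], header = TRUE))
}))

# One row of means per file.
avg <- aggregate(cbind(Sent, Received) ~ .id, data = df, FUN = mean)

# barplot() stacks the rows of a matrix within each bar, so transpose:
# rows become Sent/Received, columns become files.
mat <- t(as.matrix(avg[, c("Sent", "Received")]))
colnames(mat) <- avg$.id

barplot(mat, legend.text = TRUE, xlab = "File", ylab = "Mean count")
```

With real data, only the `log_dir` line and the two `writeLines()` calls would need to change; the rest works on whatever `.log` files the directory contains.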
Gabor Grothendieck
2010-Jun-11 19:47 UTC
[R] Transforming simulation data which is spread across many files into a barplot
Try this:
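base <- "file" # replace as appropriate
N <- 50
filenames <- paste(base, seq_len(N)*100, ".log", sep = "")
# header = TRUE skips the "Sent Received" line so both columns are read as numbers
mat <- sapply(filenames, function(fn)
  colMeans(read.table(fn, header = TRUE, col.names = c("Sent", "Received")))
)
# mat has one column per file; barplot stacks its two rows within each bar
barplot(mat)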
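
The snippet above labels each bar with its full file name and draws no legend. As a rough, self-contained sketch of one way to tidy that up, the example below labels the bars with the data point taken from the file name and adds a legend. It assumes each file starts with a "Sent Received" header line; the temporary file prefix, the three data points and the axis labels are made up for illustration, and the file contents reuse the sample rows from the question.

```r
# Illustrative data only: three small files built from the sample rows in the question.
base <- file.path(tempdir(), "sim")            # hypothetical file prefix
points <- c(100, 200, 300)                     # data points encoded in the file names
filenames <- paste(base, points, ".log", sep = "")
rows <- c("405.0 3832.0", "176.0 1742.0", "176.0 1766.0",
          "176.0 1240.0", "356.0 3396.0")
writeLines(c("Sent Received", rows[1:2]), filenames[1])
writeLines(c("Sent Received", rows[3:4]), filenames[2])
writeLines(c("Sent Received", rows[5]),   filenames[3])

# vapply() insists that every file yields exactly two numeric means,
# so a malformed file fails loudly instead of silently changing the result shape.
mat <- vapply(filenames,
              function(fn) colMeans(read.table(fn, header = TRUE)),
              FUN.VALUE = c(Sent = 0, Received = 0))

# Label the bars with the data point rather than the full file path.
colnames(mat) <- points

barplot(mat, legend.text = rownames(mat),
        xlab = "Data point", ylab = "Mean count")
```

Passing `beside = TRUE` to barplot() would draw the two means side by side instead of stacked, which can be easier to read when one series is much larger than the other.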