victor jimenez
2012-May-03 18:07 UTC
[Rd] loading multiple CSV files into a single data frame
Sometimes I have hundreds of CSV files scattered in a directory tree,
resulting from experiment runs. For instance, to give an example
from my field, I may want to collect the performance of a processor for
several design parameters such as "cache size" (possible values: 2, 4, 8
and 16) and "cache associativity" (possible values: direct-mapped, 4-way,
fully-associative). The results of all these experiments will be stored in
a directory tree like:
results
 |-- direct-mapped
 |       |-- 2 -- data.csv
 |       |-- 4 -- data.csv
 |       |-- 8 -- data.csv
 |       |-- 16 -- data.csv
 |-- 4-way
 |       |-- 2 -- data.csv
 |       |-- 4 -- data.csv
...
 |-- fully-associative
 |       |-- 2 -- data.csv
 |       |-- 4 -- data.csv
...
I am developing a package that would allow me to gather all those CSV files
into a single data frame. Currently, I just need to execute the following
statement:
dframe <- gather("results/@ASSOC@/@SIZE@/data.csv")
and this command returns a data frame containing the columns ASSOC, SIZE
and all the remaining columns inside the CSV files (in my case the
processor performance), effectively loading all the CSV files into a single
data frame. So, I would get something like:
ASSOC, SIZE, PERF
direct-mapped, 2, 1.4
direct-mapped, 4, 1.6
direct-mapped, 8, 1.7
direct-mapped, 16, 1.7
4-way, 2, 1.4
4-way, 4, 1.5
...
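To make the intended behaviour concrete, here is a minimal base-R sketch of the idea. It is not the code of the package itself: the name gather_sketch is hypothetical, and the sketch assumes that every @NAME@ placeholder occupies a whole path component, that paths use "/" as separator, and that all CSV files share the same columns.

```r
# Hypothetical illustration only; not the package's actual gather().
gather_sketch <- function(pattern) {
  parts <- strsplit(pattern, "/", fixed = TRUE)[[1]]

  # Path components written as @NAME@ are treated as variables.
  is_var <- grepl("^@[^@]+@$", parts)
  var_names <- gsub("@", "", parts[is_var], fixed = TRUE)

  # Replace each placeholder with "*" and let Sys.glob() find the files.
  files <- Sys.glob(paste(ifelse(is_var, "*", parts), collapse = "/"))
  if (length(files) == 0) stop("no files match: ", pattern)

  frames <- lapply(files, function(f) {
    # The path components at the placeholder positions are the values.
    values <- strsplit(f, "/", fixed = TRUE)[[1]][is_var]
    keys <- as.data.frame(as.list(setNames(values, var_names)),
                          stringsAsFactors = FALSE)
    data <- read.csv(f, stringsAsFactors = FALSE)
    # Repeat the one-row key frame once per row read from the file.
    cbind(keys[rep(1, nrow(data)), , drop = FALSE], data)
  })

  result <- do.call(rbind, frames)
  rownames(result) <- NULL

  # Values taken from the path are strings; convert numeric-looking ones.
  result[var_names] <- lapply(result[var_names], type.convert, as.is = TRUE)
  result
}
```

With the tree above, gather_sketch("results/@ASSOC@/@SIZE@/data.csv") would return one row per CSV row, with the ASSOC and SIZE columns first. Rows come back in the order Sys.glob() lists the files, which need not match the order shown in the example table.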
I would like to ask whether there is any similar functionality already
implemented in R. If so, there is no need to reinvent the wheel :)
If it is not implemented and the R community believes that this feature
would be useful, I would be glad to contribute my code.
Thank you,
Victor
P.S.: I was not sure whether to submit this question to R-devel or R-help,
but since it may lead to some programming discussion I decided to post it
to R-devel. Please let me know if it would be better to move it to the
other list.
Gabor Grothendieck
2012-May-03 18:54 UTC
[Rd] loading multiple CSV files into a single data frame
On Thu, May 3, 2012 at 2:07 PM, victor jimenez <betabandido at gmail.com> wrote:
> [...]
> I would like to ask whether there is any similar functionality already
> implemented in R. If so, there is no need to reinvent the wheel :)
> If it is not implemented and the R community believes that this feature
> would be useful, I would be glad to contribute my code.
If your csv files all have the same columns and represent time series,
then read.zoo in the zoo package can read multiple csv files at once
using a single read.zoo command, producing a single zoo object.

library(zoo)
?read.zoo
vignette("zoo-read")

Also see the other zoo vignettes and help files.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com