victor jimenez
2012-May-03 18:07 UTC
[Rd] loading multiple CSV files into a single data frame
Sometimes I have hundreds of CSV files scattered in a directory tree, resulting from experiments' executions. For instance, giving an example from my field, I may want to collect the performance of a processor for several design parameters such as "cache size" (possible values: 2, 4, 8 and 16) and "cache associativity" (possible values: direct-mapped, 4-way, fully-associative). The results of all these experiments will be stored in a directory tree like:

results
 |-- direct-mapped
 |       |-- 2 -- data.csv
 |       |-- 4 -- data.csv
 |       |-- 8 -- data.csv
 |       |-- 16 -- data.csv
 |-- 4-way
 |       |-- 2 -- data.csv
 |       |-- 4 -- data.csv
...
 |-- fully-associative
 |       |-- 2 -- data.csv
 |       |-- 4 -- data.csv
...

I am developing a package that would allow me to gather all those CSV into a single data frame. Currently, I just need to execute the following statement:

dframe <- gather("results/@ASSOC@/@SIZE@/data.csv")

and this command returns a data frame containing the columns ASSOC, SIZE and all the remaining columns inside the CSV files (in my case the processor performance), effectively loading all the CSV files into a single data frame. So, I would get something like:

ASSOC,          SIZE, PERF
direct-mapped,     2,  1.4
direct-mapped,     4,  1.6
direct-mapped,     8,  1.7
direct-mapped,    16,  1.7
4-way,             2,  1.4
4-way,             4,  1.5
...

I would like to ask whether there is any similar functionality already implemented in R. If so, there is no need to reinvent the wheel :)
If it is not implemented and the R community believes that this feature would be useful, I would be glad to contribute my code.

Thank you,
Victor

P.S: I was not sure whether to submit this question to R-devel or R-help, but since it may lead to some programming discussion I decided to post it to R-devel. Please, let me know if it is better to move it to the other list.
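For readers who want to see what such a gather() call might do, here is a minimal base-R sketch. It is not the poster's implementation, which is not shown in the thread. The function name gather_csv is hypothetical, and the sketch assumes that every @NAME@ placeholder fills a whole path component, that the pattern starts with at least one literal directory, and that paths use "/" as separator.

```r
# Hypothetical helper, written only to illustrate the idea in the post above.
gather_csv <- function(pattern) {
  pat    <- strsplit(pattern, "/", fixed = TRUE)[[1]]
  is_key <- grepl("^@.+@$", pat)                      # components like "@ASSOC@"
  keys   <- gsub("@", "", pat[is_key], fixed = TRUE)  # e.g. "ASSOC", "SIZE"

  # Everything before the first placeholder is the directory to search.
  root  <- paste(pat[seq_len(which(is_key)[1] - 1)], collapse = "/")
  files <- list.files(root, recursive = TRUE, full.names = TRUE)

  pieces <- lapply(files, function(f) {
    comp <- strsplit(f, "/", fixed = TRUE)[[1]]
    # Skip files whose path does not line up with the pattern.
    if (length(comp) != length(pat) || any(comp[!is_key] != pat[!is_key])) {
      return(NULL)
    }
    data <- read.csv(f)
    # One column per placeholder, repeated for every row of this file.
    meta <- lapply(setNames(comp[is_key], keys), rep, times = nrow(data))
    data.frame(meta, data, stringsAsFactors = FALSE)
  })

  do.call(rbind, Filter(Negate(is.null), pieces))
}
```

With the directory tree from the post, gather_csv("results/@ASSOC@/@SIZE@/data.csv") would return one row per CSV row, with ASSOC and SIZE in front. In this sketch the placeholder columns stay character vectors, so SIZE would need as.numeric() or type.convert() before being used as a number.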
Gabor Grothendieck
2012-May-03 18:54 UTC
[Rd] loading multiple CSV files into a single data frame
On Thu, May 3, 2012 at 2:07 PM, victor jimenez <betabandido at gmail.com> wrote:
> Sometimes I have hundreds of CSV files scattered in a directory tree,
> resulting from experiments' executions. For instance, giving an example
> from my field, I may want to collect the performance of a processor for
> several design parameters such as "cache size" (possible values: 2, 4, 8
> and 16) and "cache associativity" (possible values: direct-mapped, 4-way,
> fully-associative). The results of all these experiments will be stored in
> a directory tree like:
>
> results
>  |-- direct-mapped
>  |       |-- 2 -- data.csv
>  |       |-- 4 -- data.csv
>  |       |-- 8 -- data.csv
>  |       |-- 16 -- data.csv
>  |-- 4-way
>  |       |-- 2 -- data.csv
>  |       |-- 4 -- data.csv
> ...
>  |-- fully-associative
>  |       |-- 2 -- data.csv
>  |       |-- 4 -- data.csv
> ...
>
> I am developing a package that would allow me to gather all those CSV into
> a single data frame. Currently, I just need to execute the following
> statement:
>
> dframe <- gather("results/@ASSOC@/@SIZE@/data.csv")
>
> and this command returns a data frame containing the columns ASSOC, SIZE
> and all the remaining columns inside the CSV files (in my case the
> processor performance), effectively loading all the CSV files into a single
> data frame. So, I would get something like:
>
> ASSOC,          SIZE, PERF
> direct-mapped,     2,  1.4
> direct-mapped,     4,  1.6
> direct-mapped,     8,  1.7
> direct-mapped,    16,  1.7
> 4-way,             2,  1.4
> 4-way,             4,  1.5
> ...
>
> I would like to ask whether there is any similar functionality already
> implemented in R. If so, there is no need to reinvent the wheel :)
> If it is not implemented and the R community believes that this feature
> would be useful, I would be glad to contribute my code.
If your csv files all have the same columns and represent time series then read.zoo in the zoo package can read multiple csv files in at once using a single read.zoo command producing a single zoo object.

  library(zoo)
  ?read.zoo
  vignette("zoo-read")

Also see the other zoo vignettes and help files.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
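As a rough illustration of the read.zoo suggestion, the sketch below writes two small CSV files and reads both with one call. The file names, column names and values are made up for the example, and the data is assumed to be a time series indexed by date, which is the case this reply covers. The cache experiment in the original question has no time index, so this would not apply to it directly.

```r
library(zoo)

# Two made-up CSV files: a date index column plus one value column each.
demo_dir <- tempfile("zoo-demo")
dir.create(demo_dir)
dates <- as.Date("2012-05-01") + 0:2
write.csv(data.frame(date = dates, perf = c(1.4, 1.6, 1.7)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(date = dates, perf = c(1.4, 1.5, 1.5)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

# Passing a vector of file names makes read.zoo read each file
# and merge the results into a single zoo object.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
z <- read.zoo(files, header = TRUE, sep = ",", format = "%Y-%m-%d")
```

Here z holds one column per file, aligned on the shared dates. ?read.zoo and vignette("zoo-read") describe the remaining arguments.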