thr3ads.net - R help - [R] Writing a single output file [Dec 2010]

Amy Milano

2010-Dec-23 13:07 UTC

[R] Writing a single output file

Dear R helpers!

Let me first wish all of you "Merry Christmas and Very Happy New year
2011"

"Christmas day is a day of Joy and Charity,
May God make you rich in both" - Phillips Brooks

##
----------------------------------------------------------------------------------------------------------------------------

I have a process which generates a number of outputs. The R code for the same
is given below.

for(i in 1:n)
{
write.csv(output[i], file = paste("output", i, ".csv", sep = ""), row.names = FALSE)
}

Depending on value of 'n', I get different output files. 

Suppose n = 3, that means I am having three output csv files viz.
'output1.csv', 'output2.csv' and 'output3.csv'

output1.csv
date               yield_rate
12/23/2010        5.25
12/22/2010        5.19
.................................
.................................


output2.csv
date               yield_rate
12/23/2010        4.16
12/22/2010        4.59
.................................
.................................

output3.csv
date               yield_rate
12/23/2010        6.15
12/22/2010        6.41
.................................
.................................

Thus all the output files have same column names viz. Date and yield_rate. Also,
I do need these files individually too.

My further requirement is to have a single dataframe as given below.

Date             yield_rate1    yield_rate2    yield_rate3
12/23/2010       5.25           4.16           6.15
12/22/2010       5.19           4.59           6.41
..........................................................
..........................................................

where yield_rate1 = output1$yield_rate and so on.

One way is to simply create a dataframe as 

df = data.frame(Date = read.csv('output1.csv')$Date,
                yield_rate1 = read.csv('output1.csv')$yield_rate,
                yield_rate2 = read.csv('output2.csv')$yield_rate,
                yield_rate3 = read.csv('output3.csv')$yield_rate)

However, the problem arises when I do not know how many output files there are,
as n can be 5 or even 100.

So is it possible to write some loop or function which will let me read the
'n' files individually and then, keeping "Date" common, pick up only the
yield_rate data from each output file?

Thanking in advance for any guidance.

Regards

Amy




      

jim holtman

2010-Dec-23 13:39 UTC


This should get you close:
> # get file names
> setwd('/temp')
> fileNames <- list.files(pattern = "file.*.csv")
> fileNames
[1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"
> input <- do.call(rbind, lapply(fileNames, function(.name){
+     .data <- read.table(.name, header = TRUE, as.is = TRUE)
+     # add file name to the data
+     .data$file <- .name
+     .data
+ }))
> input
        date yield_rate      file
1 12/23/2010       5.25 file1.csv
2 12/22/2010       5.19 file1.csv
3 12/23/2010       5.25 file2.csv
4 12/22/2010       5.19 file2.csv
5 12/23/2010       5.25 file3.csv
6 12/22/2010       5.19 file3.csv
7 12/23/2010       5.25 file4.csv
8 12/22/2010       5.19 file4.csv
> require(reshape)
> in.melt <- melt(input, measure = 'yield_rate')
> cast(in.melt, date ~ file)
        date file1.csv file2.csv file3.csv file4.csv
1 12/22/2010      5.19      5.19      5.19      5.19
2 12/23/2010      5.25      5.25      5.25      5.25
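
For readers without the reshape package, roughly the same long-to-wide step can be sketched with base R's reshape(). The data frame below is built by hand for illustration (the yield values are the ones from the question, and the file names are assumed), and the resulting column names are whatever reshape() generates rather than the ones cast() prints above.

```r
# Long-format input, shaped like the 'input' object in the transcript
# (illustrative values typed in by hand, not read from real files).
input <- data.frame(
  date       = rep(c("12/23/2010", "12/22/2010"), times = 3),
  yield_rate = c(5.25, 5.19, 4.16, 4.59, 6.15, 6.41),
  file       = rep(c("file1.csv", "file2.csv", "file3.csv"), each = 2),
  stringsAsFactors = FALSE
)

# One row per date, one yield_rate column per file.
wide <- reshape(input,
                idvar     = "date",
                timevar   = "file",
                v.names   = "yield_rate",
                direction = "wide")
rownames(wide) <- NULL
print(wide)
```

The new columns are named by pasting the value column and the file name together (for example yield_rate.file1.csv); they can be renamed afterwards with names(wide) if shorter labels are wanted.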



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

Hadley Wickham

2010-Dec-23 16:28 UTC


>> input <- do.call(rbind, lapply(fileNames, function(.name){
> +     .data <- read.table(.name, header = TRUE, as.is = TRUE)
> +     # add file name to the data
> +     .data$file <- .name
> +     .data
> + }))
You can simplify this a little with plyr:

library(plyr)
fileNames <- list.files(pattern = "file.*.csv")
names(fileNames) <- fileNames

input <- ldply(fileNames, read.table, header = TRUE, as.is = TRUE)

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Gabor Grothendieck

2010-Dec-23 16:48 UTC


In the development version of zoo you can do all this in basically one
read.zoo command producing the required zoo series:

# chron's default date format is the same as in the output*.csv files
library(chron)

# pull in development version of read.zoo
library(zoo)
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/read.zoo.R?revision=813&root=zoo")

# this does it
z <- read.zoo(Sys.glob("output*.csv"), header = TRUE, FUN = as.chron)

as.data.frame(z) or data.frame(Time = time(z), coredata(z)) can be
used to convert z to a data frame with times as row names or a data
frame with times in column respectively (although you may wish to just
leave it as a zoo object so you can take advantage of zoo's other
facilities too).


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Adaikalavan Ramasamy

2010-Dec-25 20:16 UTC


There are many ways of doing this, and you have to think about the efficiency
and logistics of the different approaches.

If the data is not large, you can read all n files into a list and then 
combine. If data is very large, you may wish to read one file at a time, 
combining and then deleting it before reading the next file. You can use 
cbind() to combine if all the Date columns are the same, otherwise 
merge() is useful.

The simple brute force approach would be:

  fns <- list.files(pattern="^output")
  do.call( "cbind", lapply(fns, read.csv, row.names=1) )


The slightly more optimized and flexible option, though slightly less
elegant, could be something like this:

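  fns <- list.files(pattern="^output")
  out <- read.csv(fns[1], row.names=NULL)
  names(out)[2] <- fns[1]   # label the value column with its file name

  for(fn in fns[-1]){
    tmp <- read.csv(fn, row.names=NULL)
    names(tmp)[2] <- fn     # keeps merge() from producing duplicated names
    out <- merge(out, tmp, by=1, all=TRUE)
    rm(tmp); gc()
  }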

You have to see which option is best for your file sizes. Good luck.
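
As a rough, self-contained sketch of the merge() route for an unknown number of files, the example below writes three small files to a temporary directory and then combines them with Reduce(). The directory, the yield_rateN column labels, and the sorting of file names by their number are assumptions made for illustration; the values are the ones from the question.

```r
# Write three small example files to a temporary directory
# (values from the question; directory and file names are illustrative).
dir <- tempfile("yield_")
dir.create(dir)
dates <- c("12/23/2010", "12/22/2010")
rates <- list(c(5.25, 5.19), c(4.16, 4.59), c(6.15, 6.41))
for (i in seq_along(rates)) {
  write.csv(data.frame(date = dates, yield_rate = rates[[i]]),
            file = file.path(dir, paste("output", i, ".csv", sep = "")),
            row.names = FALSE)
}

# Find every output<N>.csv and put the files in numeric order,
# so that output10.csv comes after output9.csv.
fns <- list.files(dir, pattern = "^output[0-9]+\\.csv$", full.names = TRUE)
idx <- as.integer(gsub("[^0-9]", "", basename(fns)))
ord <- order(idx)
fns <- fns[ord]
idx <- idx[ord]

# Read each file, renaming the value column so that every file
# contributes a distinctly named column.
tables <- lapply(seq_along(fns), function(i) {
  d <- read.csv(fns[i], stringsAsFactors = FALSE)
  names(d)[names(d) == "yield_rate"] <- paste("yield_rate", idx[i], sep = "")
  d
})

# Merge all tables on the shared date column.
combined <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), tables)
print(combined)
```

Because all = TRUE is used, a date missing from one file shows up as NA in that file's column instead of being dropped. merge() also sorts the result on the date column, here as plain text, so converting it with as.Date(combined$date, "%m/%d/%Y") may be worthwhile before any date-based ordering.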

Regards, Adai




Amy Milano

2010-Dec-30 06:18 UTC


Dear sir,

At the outset, I sincerely apologize for replying a bit late, as I was out of
the office. Thank you for the guidance you gave in response to my earlier mail
regarding "Writing a single output file", where I was trying to read multiple
output files and create a single output data.frame. However, I think things
are not working, as I describe below -


# Your code

setwd('/temp')
fileNames <- list.files(pattern = "file.*.csv")

input <- do.call(rbind, lapply(fileNames, function(.name)
{
.data <- read.table(.name, header = TRUE, as.is = TRUE)
.data$file <- .name
.data
}))


# This produces following output containing only two columns and moreover date
and yield_rates are clubbed together.


 
 date.yield_rate      file
1   12/23/10,5.25 file1.csv
2   12/22/10,5.19 file1.csv
3   12/23/10,4.16 file2.csv
4   12/22/10,4.59 file2.csv
5   12/23/10,6.15 file3.csv
6   12/22/10,6.41 file3.csv
7   12/23/10,8.15 file4.csv
8   12/22/10,8.68 file4.csv


# and NOT the kind of output given below, where date and yield_rate are
# separate columns.
> input
        date yield_rate      file
1 12/23/2010       5.25 file1.csv
2 12/22/2010       5.19 file1.csv
3 12/23/2010       5.25 file2.csv
4 12/22/2010       5.19 file2.csv
5 12/23/2010       5.25 file3.csv
6 12/22/2010       5.19 file3.csv
7 12/23/2010       5.25 file4.csv
8 12/22/2010       5.19 file4.csv

So when I tried the following code to produce the required result, it threw an
error.

require(reshape)

> in.melt <- melt(input, measure = 'yield_rate')
Error: measure variables not found in data: yield_rate

# So I tried

in.melt <- melt(input, measure = 'date.yield_rate')

> cast(in.melt, date ~ file)
Error: Casting formula contains variables not found in molten data: date

# If I try to change it as

cast(in.melt, date.yield_rate ~ file)    # Gives following error.
Error: Casting formula contains variables not found in molten data: date.yield_rate

Sir, it will be a great help if you can guide me, and once again I sincerely
apologize for replying so late.

Regards

Amy
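
The single date.yield_rate column suggests the files are comma-separated: read.table() splits on whitespace by default, so each line is read as one field and the header "date,yield_rate" is turned into the name date.yield_rate. A minimal sketch of the difference, using a temporary file with made-up contents standing in for file1.csv:

```r
# A small comma-separated file standing in for file1.csv (contents illustrative).
f <- tempfile(fileext = ".csv")
writeLines(c("date,yield_rate", "12/23/10,5.25", "12/22/10,5.19"), f)

# Default separator is whitespace: each whole line becomes a single column.
bad <- read.table(f, header = TRUE, as.is = TRUE)
print(names(bad))

# Passing sep = "," (or using read.csv()) gives the two expected columns.
good <- read.table(f, header = TRUE, as.is = TRUE, sep = ",")
print(names(good))
```

With sep = "," added to the read.table() call inside lapply(), the input data frame should have separate date and yield_rate columns, which is what melt(input, measure = 'yield_rate') and cast(in.melt, date ~ file) expect.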

