thr3ads.net - R help - [R] how to combine data of several csv-files [Jul 2007]


Antje

2007-Jul-30 08:00 UTC

[R] how to combine data of several csv-files

Hello,

I'm looking for a solution for the following problem:

1) I have a folder with several csv files; each contains a set of
measurement values.
2) The measurements of each file belong to a position in a
two-dimensional matrix (let's say "B02.csv" belongs to position 2,2).
3) The size of the matrix is fixed.
4) I cannot guarantee that there is a csv file for each position.
5) Each position belongs to one category; this information is available
in a separate file (e.g. 2,2 and 2,3 may belong to category "c1"; 3,2 and
3,3 may belong to category "c2").
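A minimal sketch of point 2, assuming file names always consist of one row letter followed by a two-digit column number (as in "B02.csv"); `filePosition()` is a hypothetical helper, not part of the original code:

```r
# Hypothetical helper: map a file name such as "B02.csv" to its
# (row, column) position in the matrix.
filePosition <- function(filename) {
  stem <- sub("\\.csv$", "", basename(filename))  # "B02"
  row  <- match(substr(stem, 1, 1), LETTERS)        # "B"  -> 2
  col  <- as.integer(substr(stem, 2, nchar(stem)))  # "02" -> 2
  c(row = row, col = col)
}

filePosition("B02.csv")
```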

Now, I process each available file and get a vector of 6 values or NA back.

The aim is to calculate the element-wise mean and sd of all vectors coming
from the same category (e.g. if vec1 <- c(1,2,3,4,5,6) and vec2 <-
c(6,7,8,9,10,11) belong to the same category, I would like to get mean
<- c(3.5, 4.5, 5.5, 6.5, 7.5, 8.5)).
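One way to get that element-wise result in base R is to stack the vectors as rows of a matrix and summarise column by column. A sketch using the two example vectors above:

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(6, 7, 8, 9, 10, 11)

# Stack the vectors as rows; each column then holds the values
# for one element position.
m <- rbind(vec1, vec2)

colMeans(m)       # element-wise mean: 3.5 4.5 5.5 6.5 7.5 8.5
apply(m, 2, sd)   # element-wise standard deviation
```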

... but I'm not sure how to proceed. I end up with a list containing 
these vectors for each processed file and I don't know how to combine 
them easily...

Does anybody have a suggestion for me?

What I've got so far:

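# choose.dir() is only available on Windows
folder <- choose.dir(getwd(), "Choose folder containing csv files")
setwd(folder)

rowString <- LETTERS[1:8]; cols <- 12

# Expected file names: row letter + two-digit column number, e.g. "B02.csv"
mat <- outer(rowString, formatC(seq(2, length.out = cols), flag = "0", width = 2),
             paste, sep = "")
mat <- paste(mat, ".csv", sep = "")

# Layout file: one category label per position, ";"-separated, no header
layoutfilename <- file.choose()
layoutfile <- read.csv(layoutfilename, sep = ";", header = FALSE,
                       na.strings = "")

classmatrix <- sapply(layoutfile, as.character)
classes <- factor(classmatrix)

colnames(classmatrix) <- 1:cols
rownames(classmatrix) <- rowString

# calcHist() is a processing function not shown in this post; for each file
# it returns a numeric vector, or NA
ret <- sapply(mat, calcHist)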
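A possible way to join `ret` with `classmatrix`, sketched on invented toy data. It assumes each result name has the form row letter + two-digit column + ".csv" and that the column names of `classmatrix` are the plain column numbers ("1", "2", ...), as in the printout further down. Results that are NA, or whose position has no category, are dropped before grouping:

```r
# Toy stand-ins for 'ret' and 'classmatrix'; all values are invented.
ret <- list(A02.csv = NA,
            B02.csv = c(1, 2, 3),
            C02.csv = c(3, 4, 5),
            D02.csv = c(5, 6, 7),
            E02.csv = c(2, 2, 2),
            F02.csv = c(4, 4, 4),
            H02.csv = NA)              # no file for G02 in this example
classmatrix <- matrix(c(rep(NA, 8),
                        NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA),
                      nrow = 8,
                      dimnames = list(LETTERS[1:8], c("1", "2")))

# Category of each result, looked up by row letter and column number
# (a two-column character matrix indexes a matrix by its dimnames).
stem     <- sub("\\.csv$", "", names(ret))
category <- classmatrix[cbind(substr(stem, 1, 1),
                              as.character(as.integer(substr(stem, 2, 3))))]

# Keep results that are not NA and whose position has a category
ok <- !is.na(category) & !sapply(ret, function(v) all(is.na(v)))
groups <- split(ret[ok], category[ok])

# Stack each category's vectors as rows and summarise column-wise
means <- lapply(groups, function(g) colMeans(do.call(rbind, g)))
sds   <- lapply(groups, function(g) apply(do.call(rbind, g), 2, sd))
means
```

Looking categories up by name rather than by position means the order of `ret` does not have to match the order of `classmatrix`.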

Antje

2007-Jul-30 11:09 UTC


[R] how to combine data of several csv-files

Hello,

Sorry for the confusion, but I don't know a better way to explain it...
I have no problem reading in the files and processing them. I end up
with a list of results like this:

 > ret
$A02.csv
[1] NA

$B02.csv
[1] 89.130435  8.695652  2.173913  0.000000  0.000000  0.000000  9.892473

$C02.csv
[1] 86.842105 10.526316  2.631579  0.000000  0.000000  0.000000 10.026385

$D02.csv
[1] 85.000000 10.000000  5.000000  0.000000  0.000000  0.000000  4.474273

$E02.csv
[1] 70.786517 13.483146  7.865169  5.617978  2.247191  0.000000 12.125341

$F02.csv
[1] 70.83333 14.16667 10.00000  2.50000  2.50000  0.00000 17.26619

$G02.csv
[1] 64.772727 13.636364  7.954545 11.363636  2.272727  0.000000 12.735166

$H02.csv
[1] NA

$A03.csv
[1] NA

and I have a matrix with categories like this:


 > classmatrix
   1  2
A NA NA
B NA "cat1"
C NA "cat1"
D NA "cat1"
E NA "cat2"
F NA "cat2"
G NA "cat2"
H NA NA


Now I'm looking for a way to calculate the element-wise mean of all
results coming from the same category,

in this case the mean of the elements:

$B02.csv
$C02.csv
$D02.csv

(belonging to "cat1")

I just don't know how to combine the result list with the categories...

Is it clearer now? I could try to provide a simple example, but it
will take some time to prepare...

Thanks anyway!

Antje




8rino-Luca Pantani wrote:
> I'm unclear about what your problem is.
> Importing files into a data frame?
> Combining them into one data frame?
> Some (written) examples of the files would help people help you.
> 
> An example of how to get better help faster:
>  >>>>>>>>>>>>
> I have several csv files in the following form
> V1 V2
> 1   4
> 0.3   56
> ................
> V1   V2
> 2.5   25
> 4.5  45
> .....................
> 
> I would like to import them into a single data frame and then add a
> column identifying the source file, in order to get
> V1 V2 V3
> 1   4   file1
> 0.3   56   file1
> 2.5   25   file2
> 4.5  45   file2
> .....................
>  >>>>>>>>>>>>
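Something like the following would produce the combined data frame described in that example, assuming all files share the same columns. The temporary folder and the two files are created here only so that the sketch runs on its own:

```r
# Write two small example files to a temporary folder (invented data)
csvDir <- tempfile("csvdemo")
dir.create(csvDir)
write.csv(data.frame(V1 = c(1, 0.3), V2 = c(4, 56)),
          file.path(csvDir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(V1 = c(2.5, 4.5), V2 = c(25, 45)),
          file.path(csvDir, "file2.csv"), row.names = FALSE)

files <- list.files(csvDir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and tag its rows with the file name (without extension)
pieces <- lapply(files, function(f) {
  d <- read.csv(f)
  d$V3 <- sub("\\.csv$", "", basename(f))
  d
})
combined <- do.call(rbind, pieces)
combined
```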

Antje

2007-Jul-30 12:09 UTC


[R] how to combine data of several csv-files

Okay, I played around a bit, and now I have a test case for you:

v1 <- NA
v2 <- rnorm(6)
v3 <- rnorm(6)
v4 <- rnorm(6)
v5 <- rnorm(6)
v6 <- rnorm(6)
v7 <- rnorm(6)
v8 <- rnorm(6)
v8 <- NA

list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

 > list
[[1]]
[1] NA

[[2]]
[1] -0.6442149 -0.2047012 -1.1986041 -0.2097442 -0.7343465 -1.3888750

[[3]]
[1]  0.02354036 -1.36186952 -0.42197792  1.50445971 -1.76763996  0.53722404

[[4]]
[1] -1.40362589  0.13045724 -0.84651458  1.57005071  0.06961015  0.25269771

[[5]]
[1] -1.1829260  2.1411553 -0.1327081 -0.1053442 -0.8179396 -1.2342698

[[6]]
[1]  1.17099178  0.49248118 -0.18690065  1.50050976 -0.65552410 -0.01243247

[[7]]
[1] -0.046778203 -0.233788840  0.443908897 -1.649740180  0.003991354
-0.228020092

[[8]]
[1] NA

Now I need the mean (and sd) of element 1 of list[[2]], list[[3]] and
list[[4]] (because they belong to "cat1"):

= mean(c(-0.6442149, 0.02354036, -1.40362589))

and the same for elements 2 to 6 (so that I get a vector containing the
means for "cat1"), and the same for the vectors belonging to "cat2".
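A sketch for this test case, using `split()` to group the vectors by `categ` and `colMeans()` for the element-wise mean. The list is renamed `lst` here to avoid masking the base function `list()`, and `set.seed()` is added only for reproducibility, so the numbers differ from those above:

```r
set.seed(1)  # reproducible random data; the numbers differ from the post
lst <- list(NA, rnorm(6), rnorm(6), rnorm(6), rnorm(6), rnorm(6), rnorm(6), NA)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# Drop the entries without a category (here the two NA results)
keep <- !is.na(categ)

# One matrix per category: rows are the original vectors, columns the 6 positions
byCat <- lapply(split(lst[keep], categ[keep]),
                function(g) do.call(rbind, g))

catMeans <- sapply(byCat, colMeans)                  # 6 x 2, one column per category
catSds   <- sapply(byCat, function(m) apply(m, 2, sd))
catMeans
```

Note that `mean()` must be given a single vector: `mean(a, b, c)` silently treats `b` as the `trim` argument and returns just `a`.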

Does anybody understand now what I mean?

Antje

niederlein-rstat at yahoo.de

2007-Jul-30 12:39 UTC


[R] how to combine data of several csv-files

An embedded text with an undefined character set was scrubbed.
Name: not available
URL:
stat.ethz.ch/pipermail/r-help/attachments/20070730/60e9c4b2/attachment.pl

Antje

2007-Jul-30 14:18 UTC


[R] how to combine data of several csv-files

Hello,

Thank you for your help, but I think it's still not what I want... Printing
df.my gives me

df.my
   v1         v2          v3          v4         v5          v6           v7 v8
1 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
2 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
3 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
4 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
5 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
6 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA

Now I need to combine the columns according to their categories:

   v1    v2    v3    v4    v5    v6    v7    v8
   NA  cat1  cat1  cat1  cat2  cat2  cat2    NA

-->

mean(c(df.my$v2[1], df.my$v3[1], df.my$v4[1]))
mean(c(df.my$v2[2], df.my$v3[2], df.my$v4[2]))
mean(c(df.my$v2[3], df.my$v3[3], df.my$v4[3]))
mean(c(df.my$v2[4], df.my$v3[4], df.my$v4[4]))
mean(c(df.my$v2[5], df.my$v3[5], df.my$v4[5]))
mean(c(df.my$v2[6], df.my$v3[6], df.my$v4[6]))

and the same for v5, v6 and v7.
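If the data are already in a data frame like `df.my`, the same calculation can be written with `rowMeans()` over the columns of each category. This sketch regenerates `df.my` with random numbers and assumes `categ` lists the category of each column in order:

```r
set.seed(1)  # invented data standing in for df.my
df.my <- data.frame(v1 = NA, v2 = rnorm(6), v3 = rnorm(6), v4 = rnorm(6),
                    v5 = rnorm(6), v6 = rnorm(6), v7 = rnorm(6), v8 = NA)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# Column names grouped by category (split() drops the NA categories)
colsByCat <- split(names(df.my), categ)

# Row-wise summaries over each category's columns: one result column per category
catMeans <- sapply(colsByCat, function(cols) rowMeans(df.my[cols]))
catSds   <- sapply(colsByCat, function(cols) apply(df.my[cols], 1, sd))
catMeans
```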

Furthermore, I'm not sure how to avoid the list, because it is the result of
the processing I did before...
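If the list itself is the starting point, it can be turned into a data frame like `df.my`. A sketch, assuming every non-missing result has the same length (`n`, a hypothetical value here) and that a missing file is represented by a single NA:

```r
set.seed(1)
lst <- list(NA, rnorm(6), rnorm(6), NA)   # invented stand-in for the result list
n <- 6                                    # hypothetical length of each result

# Expand single-NA results to n NAs, then bind the vectors as columns
padded <- lapply(lst, function(v)
  if (length(v) == 1 && is.na(v)) rep(NA_real_, n) else v)
res.df <- as.data.frame(do.call(cbind, padded))
names(res.df) <- paste0("v", seq_along(lst))
res.df
```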

Ciao,
Antje


8rino-Luca Pantani wrote:
> I hope I understand you now.
> 
> Why not try the following and avoid lists, which I'm still not able to
> manage properly ;-)
> v1 <- NA
> v2 <- rnorm(6)
> v3 <- rnorm(6)
> v4 <- rnorm(6)
> v5 <- rnorm(6)
> v6 <- rnorm(6)
> v7 <- rnorm(6)
> v8 <- rnorm(6)
> v8 <- NA
> (df.my <- cbind.data.frame(v1, v2, v3, v4, v5, v6, v7, v8))
> (df.my2 <- reshape(df.my,
>                    varying = list(c("v1", "v2", "v3", "v4",
>                                     "v5", "v6", "v7", "v8")),
>                    idvar = "sequential",
>                    timevar = "cat",
>                    direction = "long"))
> aggregate(df.my2$v1, by = list(category = df.my2$cat), mean)
> aggregate(df.my2$v1, by = list(category = df.my2$cat),
>           function(x) sd(x, na.rm = TRUE))
> 
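The reshape() approach can also give the per-category result if the column index stored in `cat` is translated into a category before aggregating, and if the grouping includes the element position (`sequential`). A sketch on regenerated random data, assuming `categ` gives the category of each of the eight columns in order:

```r
set.seed(1)  # invented data with the same layout as df.my in the thread
df.my <- data.frame(v1 = NA, v2 = rnorm(6), v3 = rnorm(6), v4 = rnorm(6),
                    v5 = rnorm(6), v6 = rnorm(6), v7 = rnorm(6), v8 = NA)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

df.my2 <- reshape(df.my,
                  varying = list(names(df.my)),
                  idvar = "sequential",
                  timevar = "cat",
                  direction = "long")

# 'cat' holds the original column index (1..8); translate it into a category
df.my2$category <- categ[df.my2$cat]

# Group by category and element position; the formula interface drops rows
# with NA values or NA categories (na.action = na.omit by default)
catMeans <- aggregate(v1 ~ category + sequential, data = df.my2, FUN = mean)
catSds   <- aggregate(v1 ~ category + sequential, data = df.my2, FUN = sd)
catMeans
```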
