Hello,

I'm looking for a solution to the following problem:

1) I have a folder with several csv files; each contains a set of measurement values.
2) The measurements of each file belong to a position in a two-dimensional matrix (let's say "B02.csv" belongs to position 2,2).
3) The size of the matrix is fixed.
4) I cannot guarantee that there is a csv file for each position.
5) Each position belongs to one category; this information is available in a file (meaning 2,2 and 2,3 may belong to category "c1"; 3,2 and 3,3 may belong to category "c2").

Now, I process each available file and get back either a vector of 6 values or NA.
The aim is to calculate the mean and sd element-wise for vectors coming from the same category (meaning if vec1 <- c(1,2,3,4,5,6) and vec2 <- c(6,7,8,9,10,11) belong to the same category, I would like to get mean <- c(3.5, 4.5, 5.5, 6.5, 7.5, 8.5)), but I'm not sure how to proceed. I end up with a list containing these vectors for each processed file and I don't know how to combine them easily.

Does anybody have a suggestion for me?

What I've got so far:

folder <- choose.dir(getwd(), "Choose folder containing csv files")
setwd(folder)

# build the expected file names, e.g. "A02.csv"
rowString <- LETTERS[1:8]; cols <- 12
mat <- outer(rowString, formatC(seq(2, length = cols), flag = "0", width = 2), paste, sep = "")
mat <- paste(mat, ".csv", sep = "")

# read the layout file that assigns a category to each position
layoutfilename <- file.choose()
layoutfile <- read.csv(layoutfilename, sep = ";", header = FALSE, na.strings = "")

classmatrix <- sapply(layoutfile, as.character)
classes <- factor(classmatrix)
colnames(classmatrix) <- c(1:cols)
rownames(classmatrix) <- rowString

# calcHist is my processing function (not shown)
ret <- sapply(mat, calcHist)
Hello,

sorry for the confusion, but I don't know a better way to explain it.
I have no problem reading in the files and processing them. I end up with a list of results like this:

> ret
$A02.csv
[1] NA

$B02.csv
[1] 89.130435  8.695652  2.173913  0.000000  0.000000  0.000000  9.892473

$C02.csv
[1] 86.842105 10.526316  2.631579  0.000000  0.000000  0.000000 10.026385

$D02.csv
[1] 85.000000 10.000000  5.000000  0.000000  0.000000  0.000000  4.474273

$E02.csv
[1] 70.786517 13.483146  7.865169  5.617978  2.247191  0.000000 12.125341

$F02.csv
[1] 70.83333 14.16667 10.00000  2.50000  2.50000  0.00000 17.26619

$G02.csv
[1] 64.772727 13.636364  7.954545 11.363636  2.272727  0.000000 12.735166

$H02.csv
[1] NA

$A03.csv
[1] NA

and I have a matrix with categories like this:

> classmatrix
  1  2
A NA NA
B NA "cat1"
C NA "cat1"
D NA "cat1"
E NA "cat2"
F NA "cat2"
G NA "cat2"
H NA NA

Now, I'm looking for a way to calculate the element-wise mean for all results coming from the same category. In this case, that is the mean of the elements of

$B02.csv
$C02.csv
$D02.csv

(belonging to "cat1").

I just don't know how to combine the result list with the categories.

Is it clearer now? I will probably try to provide a simple example, but it will take some time to prepare.

Thanks anyway!
Antje

8rino-Luca Pantani wrote:
> I'm unclear as to what your problem is.
> Import files into a data frame?
> Combine them in one data frame?
> Some (written) examples of the files would help people to help you out.
>
> An example of how to get help better and faster:
>
> >>>>>>>>>>>>
> I have several csv files in the following form
> V1 V2
> 1 4
> 0.3 56
> ................
> V1 V2
> 2.5 25
> 4.5 45
> .....................
>
> I would like to import them into a single data frame, and then recode a
> column in order to get
> V1 V2 V3
> 1 4 file1
> 0.3 56 file1
> 2.5 25 file2
> 4.5 45 file2
> .....................
> >>>>>>>>>>>>
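One possible way to connect the result list to the category matrix described above is to parse each file name into a row letter and a column number and use those to index `classmatrix`. The following is only a sketch: the toy `ret` and `classmatrix` are hypothetical stand-ins for the real objects, and it assumes every file name has the form letter + two-digit column + ".csv" (as in "B02.csv" for position 2,2).

```r
# Hypothetical stand-ins for `ret` and `classmatrix` (toy values, not real data).
ret <- list("A02.csv" = NA,
            "B02.csv" = c(1, 2, 3),
            "C02.csv" = c(3, 4, 5),
            "E02.csv" = c(10, 20, 30))
classmatrix <- matrix(NA_character_, nrow = 8, ncol = 2,
                      dimnames = list(LETTERS[1:8], c("1", "2")))
classmatrix[c("B", "C", "D"), "2"] <- "cat1"
classmatrix[c("E", "F", "G"), "2"] <- "cat2"

# Assumption: names look like "B02.csv" -> row letter "B", column number 2.
rows <- substr(names(ret), 1, 1)
cols <- as.integer(substr(names(ret), 2, 3))

# Indexing a matrix with a two-column (row, column) matrix returns
# one element per pair, i.e. one category per file.
categ <- classmatrix[cbind(match(rows, rownames(classmatrix)), cols)]
names(categ) <- names(ret)
```

`categ` is then a character vector parallel to `ret`, with NA for positions that have no category, so it can serve as the grouping variable for the results.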
Okay, I played around a bit and now I have some kind of test case for you:

v1 <- NA
v2 <- rnorm(6)
v3 <- rnorm(6)
v4 <- rnorm(6)
v5 <- rnorm(6)
v6 <- rnorm(6)
v7 <- rnorm(6)
v8 <- rnorm(6)
v8 <- NA

list <- list(v1,v2,v3,v4,v5,v6,v7,v8)
categ <- c(NA,"cat1","cat1","cat1","cat2","cat2","cat2",NA)

> list
[[1]]
[1] NA

[[2]]
[1] -0.6442149 -0.2047012 -1.1986041 -0.2097442 -0.7343465 -1.3888750

[[3]]
[1]  0.02354036 -1.36186952 -0.42197792  1.50445971 -1.76763996  0.53722404

[[4]]
[1] -1.40362589  0.13045724 -0.84651458  1.57005071  0.06961015  0.25269771

[[5]]
[1] -1.1829260  2.1411553 -0.1327081 -0.1053442 -0.8179396 -1.2342698

[[6]]
[1]  1.17099178  0.49248118 -0.18690065  1.50050976 -0.65552410 -0.01243247

[[7]]
[1] -0.046778203 -0.233788840  0.443908897 -1.649740180  0.003991354 -0.228020092

[[8]]
[1] NA

Now, I need the means (and sd) of element 1 of list[[2]], list[[3]], list[[4]] (because they belong to "cat1"):

= mean(c(-0.6442149, 0.02354036, -1.40362589))

The same for element 2 up to element 6 (I would then get a vector containing the means for "cat1"), and the same for the vectors belonging to "cat2".

Does anybody now understand what I mean?

Antje
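The element-wise mean and sd per category asked for in this test case could be computed along the following lines. This is a minimal sketch, not a definitive answer: the name `results` (used instead of `list`, to avoid masking the base function) and the fixed seed are assumptions made for illustration, and the values are random, so no output is shown.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
results <- list(NA, rnorm(6), rnorm(6), rnorm(6),
                rnorm(6), rnorm(6), rnorm(6), NA)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# Keep only entries that have a category and are not just NA.
ok <- !is.na(categ) & !sapply(results, function(v) all(is.na(v)))

# Positions of the usable entries, grouped by category.
idx <- split(which(ok), categ[ok])

# Stack each group's vectors into a matrix: one row per file,
# one column per element of the result vector.
by_cat <- lapply(idx, function(i) do.call(rbind, results[i]))

cat_mean <- lapply(by_cat, colMeans)                     # element-wise means
cat_sd   <- lapply(by_cat, function(m) apply(m, 2, sd))  # element-wise sd
```

`cat_mean$cat1` then holds six means, the first of which is the mean of the first elements of entries 2, 3 and 4. Stacking the vectors into a matrix first keeps the per-element calculation a plain column operation.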
niederlein-rstat at yahoo.de
2007-Jul-30 12:39 UTC
[R] how to combine data of several csv-files
An embedded text with an undefined character set was scrubbed. Name: not available URL: https://stat.ethz.ch/pipermail/r-help/attachments/20070730/60e9c4b2/attachment.pl
Hello,

thank you for your help. But I guess it's still not what I want. Printing df.my gives me:

> df.my
  v1         v2          v3          v4         v5          v6           v7 v8
1 NA -0.6442149  0.02354036 -1.40362589 -1.1829260  1.17099178 -0.046778203 NA
2 NA -0.2047012 -1.36186952  0.13045724  2.1411553  0.49248118 -0.233788840 NA
3 NA -1.1986041 -0.42197792 -0.84651458 -0.1327081 -0.18690065  0.443908897 NA
4 NA -0.2097442  1.50445971  1.57005071 -0.1053442  1.50050976 -1.649740180 NA
5 NA -0.7343465 -1.76763996  0.06961015 -0.8179396 -0.65552410  0.003991354 NA
6 NA -1.3888750  0.53722404  0.25269771 -1.2342698 -0.01243247 -0.228020092 NA

Now, I have to combine the columns like this:

v1 v2   v3   v4   v5   v6   v7   v8
NA cat1 cat1 cat1 cat2 cat2 cat2 NA

-->

mean(c(df.my$v2[1], df.my$v3[1], df.my$v4[1]))
mean(c(df.my$v2[2], df.my$v3[2], df.my$v4[2]))
mean(c(df.my$v2[3], df.my$v3[3], df.my$v4[3]))
mean(c(df.my$v2[4], df.my$v3[4], df.my$v4[4]))
mean(c(df.my$v2[5], df.my$v3[5], df.my$v4[5]))
mean(c(df.my$v2[6], df.my$v3[6], df.my$v4[6]))

and the same for v5, v6 and v7.

Further, I'm not sure how to avoid the list, because it is the result of the processing I did before.

Ciao,
Antje

8rino-Luca Pantani wrote:
> I hope I see.
>
> Why not try the following, and avoid lists, which I'm still not able to
> manage properly ;-)
>
> v1 <- NA
> v2 <- rnorm(6)
> v3 <- rnorm(6)
> v4 <- rnorm(6)
> v5 <- rnorm(6)
> v6 <- rnorm(6)
> v7 <- rnorm(6)
> v8 <- rnorm(6)
> v8 <- NA
> (df.my <- cbind.data.frame(v1, v2, v3, v4, v5, v6, v7, v8))
> (df.my2 <- reshape(df.my,
>                    varying=list(c("v1","v2","v3", "v4","v5","v6","v7","v8")),
>                    idvar="sequential",
>                    timevar="cat",
>                    direction="long"
>                    ))
> aggregate(df.my2$v1, by=list(category=df.my2$cat), mean)
> aggregate(df.my2$v1, by=list(category=df.my2$cat),
>           function(x){sd(x, na.rm = TRUE)})
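If the results are kept in the data frame layout shown above (one column per file, one row per element), the per-category combination might be sketched as row-wise summaries over the columns of each category. The seed and the helper names `cats`, `means` and `sds` are assumptions for illustration only.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df.my <- data.frame(v1 = NA, v2 = rnorm(6), v3 = rnorm(6), v4 = rnorm(6),
                    v5 = rnorm(6), v6 = rnorm(6), v7 = rnorm(6), v8 = NA)
categ <- c(NA, "cat1", "cat1", "cat1", "cat2", "cat2", "cat2", NA)

# Distinct categories, ignoring columns without one.
cats <- unique(categ[!is.na(categ)])

# For each category, pick its columns and summarise across them row by row.
means <- sapply(cats, function(k)
  rowMeans(df.my[, which(categ == k), drop = FALSE]))
sds <- sapply(cats, function(k)
  apply(df.my[, which(categ == k), drop = FALSE], 1, sd))
```

Both `means` and `sds` are matrices with one row per element and one column per category, so `means[1, "cat1"]` corresponds to mean(c(df.my$v2[1], df.my$v3[1], df.my$v4[1])).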