Dear R-helpers,

I'm stuck with a little problem that surely has an easy solution, but I can't think of a way to solve it. I'd really appreciate any help you can offer me!

I'll provide a small example. Given a data frame (read from data.txt) that looks like this:

ID   freq  Var     Var_mean  Ratio_mean  Var_median  Ratio_median  Var_sum  Ratio_min  Var_max  Ratio_max  Var_min  Ratio_sum
134  5     2.140   0.447     4.784       0.272       7.881         2.237    0.957      0.833    2.568      0.187    11.437
30   4     1.743   0.450     3.873       0.358       4.869         1.799    0.968      0.915    1.904      0.169    10.329
137  4     1.401   0.304     4.614       0.310       4.514         1.215    1.153      0.480    2.921      0.114    12.268
9    3     0.065   0.023     2.849       0.021       3.108         0.069    0.950      0.030    2.137      0.017    3.799
35   3     0.192   0.067     2.849       0.025       7.756         0.202    0.950      0.153    1.258      0.025    7.756
95   3     1.365   0.360     3.792       0.335       4.073         1.080    1.264      0.484    2.820      0.261    5.237
120  3     5.171   1.891     2.735       0.542       9.532         5.672    0.912      4.970    1.040      0.160    32.408
123  3     0.721   0.226     3.182       0.155       4.661         0.679    1.061      0.372    1.939      0.153    4.704
133  3     3.242   0.918     3.531       0.921       3.519         2.754    1.177      1.551    2.090      0.282    11.512
136  3     1.058   0.371     2.850       0.337       3.141         1.113    0.950      0.449    2.358      0.328    3.225
140  3     5.935   1.877     3.162       0.327       18.167        5.630    1.054      5.114    1.160      0.189    31.332
141  3     0.974   0.325     2.997       0.160       6.092         0.975    0.999      0.661    1.473      0.154    6.330
147  3     3.207   0.798     4.018       0.951       3.373         2.394    1.339      1.148    2.793      0.296    10.840
2    2     3.859   2.020     1.911       2.020       1.911         4.039    0.955      3.714    1.039      0.326    11.851
8    2     0.017   0.009     1.900       0.009       1.900         0.018    0.950      0.014    1.266      0.005    3.799
10   2     0.096   0.060     1.603       0.060       1.603         0.120    0.802      0.060    1.599      0.060    1.607
26   2     7.308   0.504     14.500      0.504       14.500        1.008    7.250      0.813    8.990      0.195    37.459
46   2     9.070   7.542     1.203       7.542       1.203         15.085   0.601      7.576    1.197      7.509    1.208
49   2     9.485   5.035     1.884       5.035       1.884         10.070   0.942      9.406    1.008      0.664    14.289
63   2     21.308  13.956    1.527       13.956      1.527         27.912   0.763      14.148   1.506      13.764   1.548

I need to calculate the mean of every column whose name starts with Ratio_.
That is, I need the means of the columns Ratio_mean (3.4882), Ratio_median (5.2607), Ratio_min (1.29985), Ratio_max (2.1533) and Ratio_sum (11.1469). I would like to get all the results at once, in one column or one after the other, rather than running the code separately for every single result. I have almost 200 columns that start with Ratio_, so doing it by hand is really burdensome.

I tried a "for" loop:

for (x in 1:5) {
  x <- with(data, c(Ratio_mean, Ratio_median, Ratio_sum, Ratio_max, Ratio_min))
  Mean <- with(data mean(x))
  print(Mean)
}

But the result is:

[1] 4.66979
[1] 4.66979
[1] 4.66979
[1] 4.66979
[1] 4.66979

which is the mean of all the columns together. I would like to obtain the mean of each column, one per line:

[1] 3.4882
[1] 5.2607
[1] 1.29985
[1] 2.1533
[1] 11.1469

I think my problem is that I don't know how to assign the variable x correctly. Does anyone know whether that is possible, or an alternative way to get that result?

Thank you so much in advance!
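A minimal, self-contained sketch of why the loop above prints the same number five times, assuming only the five Ratio_ columns of the table are loaded into a data frame called mydata (a hypothetical name, chosen because data is also a base R function). c() concatenates the five columns into a single 100-element vector, and x is reassigned that same vector on every pass, so mean(x) is always the pooled mean. Looping over the column names and pulling one column at a time with [[ ]] gives one mean per column:

```r
# Only the five Ratio_ columns of the table above, in table order, so the
# example is self-contained. "mydata" is a hypothetical object name.
mydata <- read.table(header = TRUE, text = "
Ratio_mean Ratio_median Ratio_min Ratio_max Ratio_sum
 4.784  7.881 0.957 2.568 11.437
 3.873  4.869 0.968 1.904 10.329
 4.614  4.514 1.153 2.921 12.268
 2.849  3.108 0.950 2.137  3.799
 2.849  7.756 0.950 1.258  7.756
 3.792  4.073 1.264 2.820  5.237
 2.735  9.532 0.912 1.040 32.408
 3.182  4.661 1.061 1.939  4.704
 3.531  3.519 1.177 2.090 11.512
 2.850  3.141 0.950 2.358  3.225
 3.162 18.167 1.054 1.160 31.332
 2.997  6.092 0.999 1.473  6.330
 4.018  3.373 1.339 2.793 10.840
 1.911  1.911 0.955 1.039 11.851
 1.900  1.900 0.950 1.266  3.799
 1.603  1.603 0.802 1.599  1.607
14.500 14.500 7.250 8.990 37.459
 1.203  1.203 0.601 1.197  1.208
 1.884  1.884 0.942 1.008 14.289
 1.527  1.527 0.763 1.506  1.548
")

# What the original loop computes: one long vector of all 100 values,
# so mean(x) is the pooled mean, identical on every iteration.
x <- with(mydata, c(Ratio_mean, Ratio_median, Ratio_sum, Ratio_max, Ratio_min))
pooled <- mean(x)

# Corrected loop: iterate over the matching column NAMES instead.
ratio_names <- grep("^Ratio_", names(mydata), value = TRUE)
for (nm in ratio_names) {
  print(mean(mydata[[nm]]))
}

# The same per-column means collected into one named vector.
col_means <- sapply(mydata[ratio_names], mean)
print(col_means)
```

The means come out in table order (mean, median, min, max, sum), which is the order of the expected output listed above.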
Part of the answer is that you're overwriting your loop variable x and the variable Mean in each loop iteration. I have no idea what you think with(data, c()) is doing, either, and there's at least one comma missing. Also, data is the name of a function.

That said,

apply(mydata[ , grepl("^Ratio_", colnames(mydata))], 2, mean)

will take the mean of all columns whose name starts with Ratio_. You could also use

colMeans(mydata[ , grepl("^Ratio_", colnames(mydata))])

if mean isn't a surrogate for some more complex function.

Sarah

On Wed, Jun 13, 2012 at 9:41 AM, Nympha Nymphaea <nymphita at gmail.com> wrote:
> Dear R-helpers,
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
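A small sketch of the grepl() approach above, assuming a data frame named mydata (hypothetical) that holds the first three rows and five of the columns of the posted table. The Var_ columns and ID are present so that only the Ratio_ columns are selected. drop = FALSE keeps the result a data frame even if only one column matches. sapply() is shown for the case where the summary is not simply the mean, because it works column by column without first converting the data frame to a matrix, as apply() does:

```r
# First three rows and five columns of the posted table; "mydata" is a
# hypothetical object name.
mydata <- data.frame(
  ID           = c(134, 30, 137),
  Var_mean     = c(0.447, 0.450, 0.304),
  Ratio_mean   = c(4.784, 3.873, 4.614),
  Var_median   = c(0.272, 0.358, 0.310),
  Ratio_median = c(7.881, 4.869, 4.514)
)

# Logical vector: TRUE where the column name starts with "Ratio_"
is_ratio <- grepl("^Ratio_", colnames(mydata))

# colMeans() when the summary really is the mean
means <- colMeans(mydata[, is_ratio, drop = FALSE])

# sapply() for any other per-column function, e.g. median()
medians <- sapply(mydata[, is_ratio, drop = FALSE], median)

print(means)
print(medians)
```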
On 06/13/2012 11:41 PM, Nympha Nymphaea wrote:
> Dear R-helpers,
> [...]

Hi Nympha,
Try this:

nn <- read.table("nn.dat", header = TRUE)
for (variable in grep("Ratio_", names(nn), fixed = TRUE))
  print(mean(nn[, variable]))

Jim
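One detail of the grep() call above: with fixed = TRUE, "Ratio_" matches anywhere in a column name, not only at the start. A minimal sketch, using made-up toy values and a hypothetical column named Var_Ratio_x (invented only to show the difference), compares the substring match with an anchored "^Ratio_" pattern and with base R's startsWith(). It then collects the means into one named vector with vapply() instead of printing them one by one:

```r
# Toy data with made-up values; "Var_Ratio_x" is a hypothetical column
# that contains "Ratio_" but does not start with it.
nn <- data.frame(
  ID          = 1:3,
  Ratio_mean  = c(1, 2, 3),
  Var_Ratio_x = c(10, 20, 30),
  Ratio_sum   = c(4, 5, 6)
)

# fixed = TRUE: literal substring match anywhere in the name
anywhere <- grep("Ratio_", names(nn), fixed = TRUE)

# "^" anchors the regular expression to the start of the name
prefix <- grep("^Ratio_", names(nn))

# startsWith() (base R 3.3.0 and later) tests the prefix without a regex
prefix2 <- which(startsWith(names(nn), "Ratio_"))

print(anywhere)  # [1] 2 3 4
print(prefix)    # [1] 2 4

# Collect the means instead of printing them; vapply() checks that each
# result is a single number.
means <- vapply(nn[prefix], mean, numeric(1))
print(means)
```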