Jacqueline Oehri
2013-Aug-20 16:43 UTC
[R] How to apply a function to every element of a dataframe, when the function uses for each colummn and row different values to calculate with?
Dear R users,

I have a question about applying a function to each element of a dataframe.

1) I have a dataframe "d" like this (column names: names of land cover types; row names: coordinates; nr: row sums; nc: column sums). See the end of this mail for the structure of d, from dput(d). Here "d" has 14 rows and 6 columns:

    > d
           PL_7_1_7.txt PL_7_1_8.txt PUEH_4_0.txt PUEH_7_1_2.txt UEH_7_2_2.txt    nr
    821194            0            0            0              0             0    29
    821202            0            0            0              0             0     8
    821206            1            0            0              0             0     2
    827162            1            0            0              0             0     6
    827166            0            1            1              1             1    17
    827178            0            0            0              0             0     0
    827182            1            0            0              0             0     4
    827186            0            0            0              0             0    16
    827190            0            0            0              0             0    16
    827194            0            0            0              0             0    18
    827198            0            0            0              0             0    19
    827206            0            0            0              0             0    19
    833166            0            0            0              0             0     8
    nc               86          120          905            300           309 18733

I want to apply the following function "f" to each element xij of the dataframe "d" (xij is the element of "d" at row number "i" and column number "j"; x11 is therefore the element in the first row and the first column, which in the case of "d" is equal to 0):

    f = (x[i][j] - ((nr[i]*nc[j])/n))^2 / ((nr[i]*nc[j])/n)

Here n is the grand total, d[14,6]. In the end I would like a new dataframe "e" that contains the results of the function "f" as its elements instead of the original values. Do you have any hints on how to do that?

2) After this, I want to find, for EACH ROW in "e", the maximum value in the row and link the column name of this maximum value to the row name, so that in the end I know for each row name which column name "fits best to it", i.e. which column had the biggest value in that row.

For example, in dataframe "d", in the third row called "821206", the maximum value lies in the first column, which is named "PL_7_1_7.txt". In this example I would link the name "821206" to the name "PL_7_1_7.txt".

Do you have any suggestions on the best way to do this, or on where I should look for possible solutions? I'm really lost...
What I have tried so far is this:

    f.good <- function(x, nr, nc, n) {
      n  <- d[14,6]
      nr <- d[,6]
      nc <- d[14,]
      z1 <- (x-((nr*nc)/n))^2/((nr*nc)/n)
      return(z1)
    }

and then I wanted to use the "apply" function:

    apply(d, c(1,2), f.good)

but it never worked at all, and I think I'm far away from a solution!

Can somebody help me out and give me a hint about what to do? Does somebody know a clever way to achieve tasks 1) and 2)? I'm very glad about every input!

Thanks a lot already! Have a nice day!

Best wishes,
Jacqueline

    > dput(d)
    structure(list(PL_7_1_7.txt = c(0, 0, 1, 1, 0, 0, 1, 0, 0, 0,
    0, 0, 0, 86), PL_7_1_8.txt = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 120), PUEH_4_0.txt = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 905), PUEH_7_1_2.txt = c(0, 0, 0, 0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 300), UEH_7_2_2.txt = c(0, 0, 0, 0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 309), nr = c(29, 8, 2, 6, 17, 0, 4, 16, 16, 18, 19,
    19, 8, 18733)), .Names = c("PL_7_1_7.txt", "PL_7_1_8.txt", "PUEH_4_0.txt",
    "PUEH_7_1_2.txt", "UEH_7_2_2.txt", "nr"), row.names = c("821194",
    "821202", "821206", "827162", "827166", "827178", "827182", "827186",
    "827190", "827194", "827198", "827206", "833166", "nc"), class = "data.frame")
Wuming Gong
2013-Aug-20 21:06 UTC
[R] How to apply a function to every element of a dataframe, when the function uses for each colummn and row different values to calculate with?
Hi Jacqueline,

(1) Drop the margins from the cell matrix, then build the expected values with an outer product of the row sums and column sums:

    x  <- as.matrix(d[rownames(d) != 'nc', colnames(d) != 'nr'])
    nc <- unlist(d['nc', colnames(d) != 'nr'])   # column sums, without the grand total
    nr <- d[rownames(d) != 'nc', 'nr']           # row sums, without the grand total
    n  <- d['nc', 'nr']                          # grand total
    e  <- (x - nr %o% nc / n)^2 / (nr %o% nc / n)

(2) If I understand correctly, ?max.col is what you need.

Wuming
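The outer-product approach in this reply can be sketched end to end on the posted data. This is a minimal sketch, assuming the margins `nr` and `nc` and the grand total `n` are read from the last column and last row of `d` as in the original post. The names `is_cell_row`, `is_cell_col`, `expected` and `best` are illustrative and not from the thread. Rows with `nr == 0` are left as `NA`, because every entry of `e` in such a row is `NaN` (0/0).

```r
# Rebuild the example data from the dput() output in the original post.
d <- data.frame(
  PL_7_1_7.txt   = c(0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 86),
  PL_7_1_8.txt   = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 120),
  PUEH_4_0.txt   = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 905),
  PUEH_7_1_2.txt = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 300),
  UEH_7_2_2.txt  = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 309),
  nr             = c(29, 8, 2, 6, 17, 0, 4, 16, 16, 18, 19, 19, 8, 18733),
  row.names = c("821194", "821202", "821206", "827162", "827166", "827178",
                "827182", "827186", "827190", "827194", "827198", "827206",
                "833166", "nc")
)

# Separate the cell counts from the margins.
is_cell_row <- rownames(d) != "nc"
is_cell_col <- colnames(d) != "nr"
x  <- as.matrix(d[is_cell_row, is_cell_col])  # 13 x 5 matrix of counts
nr <- d[is_cell_row, "nr"]                    # row sums (length 13)
nc <- unlist(d["nc", is_cell_col])            # column sums (length 5, named)
n  <- d["nc", "nr"]                           # grand total

# expected[i, j] = nr[i] * nc[j] / n, computed for all cells at once.
expected <- outer(nr, nc) / n
e <- (x - expected)^2 / expected              # NaN wherever expected is 0

# Column name of the row-wise maximum; rows without a usable value stay NA.
best <- setNames(rep(NA_character_, nrow(e)), rownames(e))
ok <- nr > 0
best[ok] <- colnames(e)[max.col(e[ok, , drop = FALSE], ties.method = "first")]
print(best)
```

Because the arithmetic is vectorised over the whole matrix, no element-wise `apply` call is needed; `ties.method = "first"` is chosen here only to make the result deterministic, since `max.col` breaks ties at random by default.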
David Winsemius
2013-Aug-20 22:01 UTC
[R] How to apply a function to every element of a dataframe, when the function uses for each colummn and row different values to calculate with?
PLEASE do not crosspost to Rhelp and googlegroups. (removed that address.)

On Aug 20, 2013, at 9:43 AM, Jacqueline Oehri wrote:

> And i want to apply the following function "f" to each element xij
> of the dataframe "d":
>
> f = (x[i][j] -((nr[i]*nc[j])/n))^2/((nr[i]*nc[j])/n)

Looks like you are trying to reinvent the chisq.test function. These are snippets of that code with the continuity correction material removed:

    sr <- rowSums(x)
    sc <- colSums(x)
    E <- outer(sr, sc, "*")/n
    STATISTIC <- sum( ..see below.. )

You would probably remove the sum and go with

    fmat <- (abs(x - E) )^2/E

(I'm not sure why that abs is in the `chisq.test` code.)

> 2) After this, I wanted to filter out for EACH ROW in "e" the maximum
> value in the row & assign or link the respective columname of this
> maxiumum value to the respective rowname;
> so that in the end I will know for each rowname,
> which columname "fits best to it"

Just index the column names by the result of row-which.max:

    colnames(m) [ apply(m, 1, which.max) ]  # (be sure to remove the "nr" column)

Test to see if I'm missing anything:

    chisq.test   # to see the code
    # posting dput() on the corner of your matrix was a good idea:
    m <- d[!rownames(d)=="nc", !colnames(d)=="nr"]
    m <- data.matrix(m); n <- sum(m)
    sr <- rowSums(m)
    sc <- colSums(m)
    E <- outer(sr, sc, "*")/n
    fmat <- (abs(m - E) )^2/E
    colnames(m) [ apply(m, 1, which.max) ]
     [1] "PL_7_1_7.txt" "PL_7_1_7.txt" "PL_7_1_7.txt" "PL_7_1_7.txt"
     [5] "PL_7_1_8.txt" "PL_7_1_7.txt" "PL_7_1_7.txt" "PL_7_1_7.txt"
     [9] "PL_7_1_7.txt" "PL_7_1_7.txt" "PL_7_1_7.txt" "PL_7_1_7.txt"
    [13] "PL_7_1_7.txt"

The sum of the "predicteds" checks out:

    > sum(E, na.rm=TRUE)
    [1] 7
    > round(fmat, 3)
           PL_7_1_7.txt PL_7_1_8.txt PUEH_4_0.txt PUEH_7_1_2.txt UEH_7_2_2.txt
    821194          NaN          NaN          NaN            NaN           NaN
    821202          NaN          NaN          NaN            NaN           NaN
    821206        0.762        0.143        0.143          0.143         0.143
    827162        0.762        0.143        0.143          0.143         0.143
    827166        1.714        0.321        0.321          0.321         0.321
    827178          NaN          NaN          NaN            NaN           NaN
    827182        0.762        0.143        0.143          0.143         0.143
    827186          NaN          NaN          NaN            NaN           NaN
    827190          NaN          NaN          NaN            NaN           NaN
    827194          NaN          NaN          NaN            NaN           NaN
    827198          NaN          NaN          NaN            NaN           NaN
    827206          NaN          NaN          NaN            NaN           NaN
    833166          NaN          NaN          NaN            NaN           NaN

Obviously, with a more complete set of data than you offered, you would get fewer NaN rows caused by the zero denominators in your data. Note that which.max of c(0,0,0,0,0) is 1, so be aware that there is ambiguity when the row count is zero.

--
David

David Winsemius
Alameda, CA, USA
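The comparison with `chisq.test` can be checked directly: the squared Pearson residuals it returns should equal the per-cell values in `fmat`. The following is a minimal sketch under the same assumption as the reply, namely that the margins are recomputed from the visible cells (so `n` is 7 rather than 18733). The helper `best_column` is hypothetical and not from the thread; it returns `NA` for rows whose values are all `NaN` rather than defaulting to the first column. Note also that the reply's printed result indexes the raw counts `m`, whereas this sketch takes the row-wise maximum of `fmat`. As far as the `chisq.test` source shows, the `abs` is there because the continuity correction is subtracted from `abs(x - E)` before squaring.

```r
# Cell counts only (margins dropped), rebuilt from the posted data.
types <- c("PL_7_1_7.txt", "PL_7_1_8.txt", "PUEH_4_0.txt",
           "PUEH_7_1_2.txt", "UEH_7_2_2.txt")
sites <- c("821194", "821202", "821206", "827162", "827166", "827178",
           "827182", "827186", "827190", "827194", "827198", "827206",
           "833166")
m <- matrix(0, nrow = length(sites), ncol = length(types),
            dimnames = list(sites, types))
m[c("821206", "827162", "827182"), "PL_7_1_7.txt"] <- 1
m["827166", types[-1]] <- 1

# Expected counts and per-cell chi-square contributions.
n    <- sum(m)
E    <- outer(rowSums(m), colSums(m)) / n
fmat <- (m - E)^2 / E                 # NaN where the row sum is 0

# Cross-check against chisq.test on the rows that have any counts.
# suppressWarnings() hides the small-sample approximation warning.
keep <- rowSums(m) > 0
ct   <- suppressWarnings(chisq.test(m[keep, ]))
same <- isTRUE(all.equal(unname(ct$residuals^2), unname(fmat[keep, ])))

# Hypothetical helper: column name of the largest value in each row,
# or NA when the row holds no usable value.
best_column <- function(mat) {
  apply(mat, 1, function(row) {
    if (all(is.na(row))) NA_character_ else names(which.max(row))
  })
}
best <- best_column(fmat)
print(same)
print(best)
```

For row 827166 the largest contribution falls on PL_7_1_7.txt, where the count is 0 but the expected value is 12/7. A large value therefore marks the strongest deviation from independence, which is not necessarily the column the row "fits best"; comparing the sign of `m - E` is one way to tell the two cases apart.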