Chris Evans
2020-Sep-21 12:12 UTC
[R] Is there a simple way to analyse all the data using dplyr?
I am sure the answer is "yes" and I'm also sure the question may sound mad. Here's a reprex that I think captures what I'm doing:

library(dplyr)

n <- 500
gender <- sample(c("Man","Woman","Other"), n, replace = TRUE)
GPC_score <- rnorm(n)
scaleMeasures <- runif(n)
bind_cols(gender = gender,
          GPC_score = GPC_score,
          scaleMeasures = scaleMeasures) -> tibUse

### let's have the correlation between the two variables broken down by gender
tibUse %>%
  filter(gender != "Other") %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(cor = cor(cur_data())[1,2]) -> tmp1

### but I'd also like the correlation for the whole dataset, not by gender
### this is a kludge to achieve that which I am using partly because I can't
### find the equivalent of cur_data() for an ungrouped tibble/df
tibUse %>%
  mutate(gender = "All") %>%   # nasty kludge to get all the data!
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%         # ditto!
  summarise(cor = cor(cur_data())[1,2]) -> tmp2

bind_rows(tmp1,
          tmp2)

### gets me what I want:
# A tibble: 3 x 2
  gender    cor
  <chr>   <dbl>
1 Man    0.0225
2 Woman  0.0685
3 All    0.0444

In reality I have some functions that are more complex than cor(...)[1,2] (sorry about that particular kludge) that digest data frames, and I'd love to have a simpler way of doing this.

So two questions:
1) I am sure there is a term/function that works on an ungrouped tibble in dplyr as cur_data() does for a grouped tibble ... but I can't find it.
2) I suspect someone has automated a way to get the analysis of the complete data after the analyses of the groups within a single dplyr run ... it seems an obvious and common use case, but I can't find that either.

Sorry, I'm over 99% sure I'm being stupid and missing the obvious here ... but that's the recurrent problem I have with my wetware, and searchware doesn't seem to be fixing this!
TIA,

Chris

-- 
Small contribution in our coronavirus rigours:
https://www.coresystemtrust.org.uk/home/free-options-to-replace-paper-core-forms-during-the-coronavirus-pandemic/

Chris Evans <chris at psyctc.org> Visiting Professor, University of Sheffield <chris.evans at sheffield.ac.uk>
I do some consultation work for the University of Roehampton <chris.evans at roehampton.ac.uk> and other places
but <chris at psyctc.org> remains my main Email address. I have a work web site at:
https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
https://www.psyctc.org/pelerinage2016/semigrating-to-france/
https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/

If you want an Emeeting, I am trying to keep them to Thursdays and my diary is at:
https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.
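On question 1, a minimal sketch under a few assumptions (dplyr 1.0.0 or later; `corFirstTwo()` is a hypothetical stand-in for the more complex data-frame-digesting functions; `set.seed()` is only there to make the sketch reproducible): on an ungrouped tibble there is arguably nothing to look up, because the left-hand side of the pipe already is the whole data and can be handed straight to the function. For the grouped part, `group_modify()` passes each group's rows (without the grouping column) to the same function, and `bind_rows()` stacks the two results. As an aside, from dplyr 1.1.0 onwards `cur_data()` is deprecated in favour of `pick()`.

```r
library(dplyr)

set.seed(1)  # illustrative only, so the sketch is reproducible
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

# Hypothetical stand-in for a function that digests a whole data frame.
# It returns a one-row tibble so that group_modify() can use it.
corFirstTwo <- function(df) {
  tibble(cor = cor(df[[1]], df[[2]]))
}

# Grouped analysis: .x is each group's rows, with the `gender` column dropped
byGender <- tibUse %>%
  filter(gender != "Other") %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%
  group_modify(~ corFirstTwo(.x)) %>%
  ungroup()

# Ungrouped analysis: the piped tibble itself is "all the data",
# so no cur_data() equivalent is needed; just call the function on it
allData <- tibUse %>%
  select(GPC_score, scaleMeasures) %>%
  na.omit() %>%
  corFirstTwo() %>%
  mutate(gender = "All", .before = 1)

result <- bind_rows(byGender, allData)
result
```

Because `corFirstTwo()` takes a data frame and returns a data frame, the same function serves both the grouped and the whole-data calls; any function with that shape could be swapped in.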
Eric Berger
2020-Sep-21 13:03 UTC
[R] Is there a simple way to analyse all the data using dplyr?
Hi,

I am not sure if the request is about a 'simple way' or requires dplyr. Here's an approach without using dplyr that is just 2 lines (not counting creating the data or outputting the result).

n <- 500
myDf <- data.frame(gender = sample(c("Man","Woman","Other"), n, replace = TRUE),
                   GPC_score = rnorm(n), scaleMeasures = runif(n))
aL <- list(Man = "Man", Woman = "Woman", All = c("Man","Woman","Other"))
z <- sapply(seq_along(aL), function(i) { x <- myDf[myDf$gender %in% aL[[i]], ]; cor(x[,2], x[,3]) })
names(z) <- names(aL)
z

HTH,
Eric
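Eric's idea of a named list of analysis sets also fits into a dplyr pipeline. The sketch below is one possible translation, not Eric's code: it assumes dplyr 1.0.0 or later, filters `myDf` once per list element, and lets `bind_rows(.id = "gender")` turn the list names into a `gender` column. The sets may overlap, so "All" (or any other combination of levels) is just another list element.

```r
library(dplyr)

set.seed(1)  # illustrative only
n <- 500
myDf <- data.frame(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                   GPC_score = rnorm(n),
                   scaleMeasures = runif(n))

# Each element lists the gender values that belong to that analysis set
aL <- list(Man = "Man", Woman = "Woman", All = c("Man", "Woman", "Other"))

z <- lapply(aL, function(keep) {
  myDf %>%
    filter(gender %in% keep) %>%
    summarise(cor = cor(GPC_score, scaleMeasures))
}) %>%
  bind_rows(.id = "gender")  # list names become the `gender` column
z
```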
Chris Evans
2020-Sep-21 16:05 UTC
[R] Is there a simple way to analyse all the data using dplyr?
Thanks Eric,

That's very neat! It sort of fits my belief about base R and telegrams (that's not knocking it, I really do respect it; my wetware is just not good at it).

For many reasons, particularly the convenience of formatting and passing on results from the real function I'm applying, I am really keen to find tidyverse/dplyr answers/options. Any offers?!

TIA (all),

Chris
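On question 2, one way to get the whole-data row in a single dplyr pass is to stack a relabelled copy of the data onto itself before grouping. This is a minimal sketch, assuming dplyr 1.0.0 or later and the magrittr pipe `%>%` (the `.` placeholder used here does not work with the native `|>`). It mirrors the reprex in that "All" still includes the "Other" rows; the extra `n` column is added only so the row counts can be checked.

```r
library(dplyr)

set.seed(1)  # illustrative only
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

res <- tibUse %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  # stack a second copy of every row, relabelled "All", under the original rows
  bind_rows(., mutate(., gender = "All")) %>%
  # drops only the genuine "Other" rows; their copies remain inside "All"
  filter(gender != "Other") %>%
  group_by(gender) %>%
  summarise(cor = cor(GPC_score, scaleMeasures), n = n(), .groups = "drop")
res
```

The price is a temporary doubled copy of the data, which is usually negligible at this scale; the grouping levels come out sorted, so "All" appears first here.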