Chris Evans
2020-Sep-21 12:12 UTC
[R] Is there a simple way to analyse all the data using dplyr?
I am sure the answer is "yes" and I'm also sure the question may
sound mad. Here's a reprex that I think captures what I'm doing:
library(dplyr)
n <- 500
gender <- sample(c("Man", "Woman", "Other"), n, replace = TRUE)
GPC_score <- rnorm(n)
scaleMeasures <- runif(n)
tibble(gender = gender,
       GPC_score = GPC_score,
       scaleMeasures = scaleMeasures) -> tibUse
### let's have the correlation between the two variables broken down by gender
tibUse %>%
  filter(gender != "Other") %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(cor = cor(cur_data())[1,2]) -> tmp1
### but I'd also like the correlation for the whole dataset, not by gender;
### this is a kludge to achieve that, which I am using partly because I can't
### find the equivalent of cur_data() for an ungrouped tibble/df
tibUse %>%
  mutate(gender = "All") %>% # nasty kludge to get all the data!
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>% # ditto!
  summarise(cor = cor(cur_data())[1,2]) -> tmp2
bind_rows(tmp1,
          tmp2)
### gets me what I want:
# A tibble: 3 x 2
gender cor
<chr> <dbl>
1 Man 0.0225
2 Woman 0.0685
3 All 0.0444
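On question 1, a hedged observation: when there is no group_by(), summarise() treats the whole tibble as a single group, so the helpers that expose the current group's columns see all the rows. The sketch below assumes dplyr 1.1.0 or later, where pick() replaces the deprecated cur_data() (on dplyr 1.0.x, cur_data() should play the same role); set.seed() is added only so the sketch is reproducible.

```r
library(dplyr)

set.seed(1)  # added only for reproducibility
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

# No group_by(): summarise() sees one group made of every row,
# so pick(everything()) returns the whole NA-free two-column data frame.
tmpAll <- tibUse %>%
  select(GPC_score, scaleMeasures) %>%
  na.omit() %>%
  summarise(cor = cor(pick(everything()))[1, 2]) %>%
  mutate(gender = "All", .before = 1)  # label the row so it can sit under tmp1
```

Binding tmpAll under the grouped tmp1 would then give the three-row table without relabelling the data first.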
In reality I have some functions that are more complex than cor()[1,2] (sorry
about that particular kludge) that digest data frames, and I'd love to have a
simpler way of doing this.
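For functions that digest a whole data frame, one option (a sketch, assuming dplyr 1.1.0 or later) is to hand them pick() inside summarise() and have them return a one-row tibble: summarise() splices an unnamed data-frame result into ordinary columns. digestDf() below is a hypothetical stand-in for the real function, not anything from the original post.

```r
library(dplyr)

set.seed(1)
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

# Hypothetical stand-in for a function that digests a data frame
# and returns several results as a one-row tibble.
digestDf <- function(df) {
  df <- na.omit(df)
  tibble(n = nrow(df),
         cor = cor(df[[1]], df[[2]]))
}

tmp1 <- tibUse %>%
  filter(gender != "Other") %>%
  group_by(gender) %>%
  # Unnamed data-frame result: its columns n and cor become output columns
  summarise(digestDf(pick(GPC_score, scaleMeasures)))
```

Returning a tibble rather than a bare number keeps the results in columns that are easy to format and pass on.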
So two questions:
1) I am sure there is a term/function that works on an ungrouped tibble in dplyr
as cur_data() does for a grouped tibble ... but I can't find it.
2) I suspect someone has automated a way to get the analysis of the complete
data after the analyses of the groups within a single dplyr run ... it seems an
obvious and common use case, but I can't find that either.
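On question 2, one common dplyr idiom (not a dedicated function) is to stack the rows to be analysed by group on top of a copy of all the rows relabelled "All", then group and summarise once. A minimal sketch, assuming dplyr 1.1.0 or later for pick():

```r
library(dplyr)

set.seed(1)
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

result <- bind_rows(
  filter(tibUse, gender != "Other"),  # rows analysed per gender
  mutate(tibUse, gender = "All")      # every row again, as one extra group
) %>%
  # Factor levels fix the row order (groups are otherwise sorted alphabetically)
  mutate(gender = factor(gender, levels = c("Man", "Woman", "All"))) %>%
  select(gender, GPC_score, scaleMeasures) %>%
  na.omit() %>%
  group_by(gender) %>%
  summarise(cor = cor(pick(GPC_score, scaleMeasures))[1, 2])
```

The cost of this design is that every row is processed twice; for large data, computing the grouped and whole-data summaries separately and binding them may be preferable.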
Sorry, I'm over 99% sure I'm being stupid and missing the obvious here
... but that's the recurrent problem I have with my wetware, and searchware
doesn't seem to be fixing it!
TIA,
Chris
--
Small contribution in our coronavirus rigours:
https://www.coresystemtrust.org.uk/home/free-options-to-replace-paper-core-forms-during-the-coronavirus-pandemic/
Chris Evans <chris at psyctc.org> Visiting Professor, University of
Sheffield <chris.evans at sheffield.ac.uk>
I do some consultation work for the University of Roehampton <chris.evans at
roehampton.ac.uk> and other places
but <chris at psyctc.org> remains my main Email address. I have a work
web site at:
https://www.psyctc.org/psyctc/
and a site I manage for CORE and CORE system trust at:
http://www.coresystemtrust.org.uk/
I have "semigrated" to France, see:
https://www.psyctc.org/pelerinage2016/semigrating-to-france/
https://www.psyctc.org/pelerinage2016/register-to-get-updates-from-pelerinage2016/
If you want an Emeeting, I am trying to keep them to Thursdays and my diary is
at:
https://www.psyctc.org/pelerinage2016/ceworkdiary/
Beware: French time, generally an hour ahead of UK.
Eric Berger
2020-Sep-21 13:03 UTC
[R] Is there a simple way to analyse all the data using dplyr?
Hi,
I am not sure whether the request is for a 'simple way' or specifically requires
dplyr. Here's an approach without dplyr that is just 2 lines
(not counting creating the data or outputting the result).
n <- 500
myDf <- data.frame(
  gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
  GPC_score = rnorm(n),
  scaleMeasures = runif(n))
aL <- list(Man = "Man", Woman = "Woman", All = c("Man", "Woman", "Other"))
z <- sapply(seq_along(aL),
            function(i) { x <- myDf[myDf$gender %in% aL[[i]], ]; cor(x[, 2], x[, 3]) })
names(z) <- names(aL)
z
HTH,
Eric
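Eric's named list of level sets also carries over to dplyr: apply a filter-and-summarise step to each element, then let bind_rows() turn the list names into a label column via its .id argument. A sketch that recreates myDf and aL as above (set.seed() added only for reproducibility):

```r
library(dplyr)

set.seed(1)
n <- 500
myDf <- data.frame(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                   GPC_score = rnorm(n),
                   scaleMeasures = runif(n))
aL <- list(Man = "Man", Woman = "Woman", All = c("Man", "Woman", "Other"))

# One summary row per named set of genders; .id stores the list names
res <- lapply(aL, function(levs) {
  myDf %>%
    filter(gender %in% levs) %>%
    summarise(cor = cor(GPC_score, scaleMeasures))
}) %>%
  bind_rows(.id = "gender")
```

Because each group is defined by a set of levels, overlapping groups such as "All" need no relabelling of the data.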
Chris Evans
2020-Sep-21 16:05 UTC
[R] Is there a simple way to analyse all the data using dplyr?
Thanks Eric,

That's very neat! Sort of fits my belief about base R and telegrams (that's not knocking it; I really do respect it, my wetware is just not good at it). For many reasons, particularly the convenience of formatting and passing on results from the real function I'm applying, I am really keen to find tidyverse/dplyr answers/options. Any offers?!

TIA (all),

Chris
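For a tidyverse-style answer that avoids relabelling the data, the whole-data row can come from a small reusable wrapper. summariseWithAll() below is hypothetical, not part of dplyr: it summarises by the named grouping column, summarises the ungrouped data with the same expressions, labels that row "All" and binds the two. It is a sketch assuming dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper: per-group summaries plus one whole-data row.
# `group` is the grouping column name as a string; `...` are summarise() expressions.
summariseWithAll <- function(data, group, ..., allLabel = "All") {
  byGroup <- data %>%
    group_by(.data[[group]]) %>%
    summarise(..., .groups = "drop")
  overall <- summarise(data, ...)   # same expressions on the ungrouped data
  overall[[group]] <- allLabel
  bind_rows(byGroup, overall)       # columns are matched by name
}

set.seed(1)
n <- 500
tibUse <- tibble(gender = sample(c("Man", "Woman", "Other"), n, replace = TRUE),
                 GPC_score = rnorm(n),
                 scaleMeasures = runif(n))

res <- summariseWithAll(tibUse, "gender",
                        N = n(),
                        cor = cor(GPC_score, scaleMeasures))
```

To mirror the original, where "Other" is left out of the per-gender rows but kept in "All", filter(res, gender != "Other") can be applied to the result.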