thr3ads.net - R help - [R] (no subject) [Sep 2024]


Rui Barradas

2024-Sep-16 18:47 UTC

[R] (no subject)

At 15:23 on 16/09/2024, Francesca wrote:
> Sorry for posting unintelligible code. On my screen the dataset
> looked correct.
> 
> 
> I recreated my dataset, following your example:
>
> test <- data.frame(
>   cp1 = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
>           2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
>   cp2 = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
>           NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
>   role = c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3,
>            4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
>   groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
>               8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6)
> )
> 
> What I have done so far is the following, which works:
>   test %>%
>    group_by(groupid) %>%
>    mutate(across(starts_with("cp"), list(mean = mean)))
>
> The problem is with NA: every time the mean encounters an NA, it returns
> NA for all group members.
> I need the software to calculate the mean ignoring NA. So when the group is
> made of three people, it should be the mean of the three.
> If the group has two values and an NA, it should be the mean of the two.
>
> My code works: it creates a mean at each position for the three subjects,
> replacing each individual value with the group mean.
> But when an NA appears, the whole group gets NA.
>
> Perhaps there is a different way to obtain the same result.
> 
> 
> 
> On Mon, 16 Sept 2024 at 11:35, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>
>> At 08:28 on 16/09/2024, Francesca wrote:
>>> Dear Contributors,
>>> I hope someone has found a similar issue.
>>>
>>> I have this data set,
>>>
>>>
>>>
>>>     cp1  cp2  role  groupid
>>>  1   10   13     4        5
>>>  2    5   10     3        1
>>>  3    7    7     4        6
>>>  4   10    4     2        7
>>>  5    5    8     3        2
>>>  6    8    7     4        4
>>>  7    8    8     4        7
>>>  8   10   15     3        3
>>>  9   15   10     2        2
>>> 10    5    5     2        4
>>> 11   20   20     2        5
>>> 12    9   11     3        6
>>> 13   10   13     4        3
>>> 14   12    6     4        2
>>> 15    7    4     4        1
>>> 16   10    0     3        7
>>> 17   20   15     3        8
>>> 18   10    7     3        4
>>> 19    8   13     3        5
>>> 20   10    9     2        6
>>>
>>>
>>>
>>> I need to average by group, using the values of column groupid, and
>>> create a twin dataset in which each individual value is replaced by
>>> the mean of its group.
>>> So for example, for groupid 3, I calculate the mean (12+18)/2 and then
>>> put it in the new data frame, in the same positions, instead of 12 and
>>> 18.
>>> I found this solution, where db10_means is the output dataset and db10 is
>>> my initial data.
>>>
>>> db10_means<-db10 %>%
>>>     group_by(groupid) %>%
>>>     mutate(across(starts_with("cp"), list(mean = mean)))
>>>
>>> It works perfectly, except for NA values: it assigns NA to all
>>> group members, while in some cases the group is made of some NA and
>>> some values.
>>> So, when I have a group of two values and one NA, I would like
>>> those with a value to get the mean, and those with NA to keep the NA.
>>> Here the mean function does not have the na.rm=T option set, but it
>>> appears that this option cannot be used in this case. I am not
>>> even sure that this would be enough to solve my problem.
>>> Thanks for any help provided.
>>>
>> Hello,
>>
>> Your data is a mess; please don't post HTML, this is a plain-text-only
>> list. Anyway, I managed to create a data frame by copying the data to a
>> file named "rhelp.txt" and then running
>>
>>
>>
>> db10 <- scan(file = "rhelp.txt", what = character())
>> header <- db10[1:4]
>> db10 <- db10[-(1:4)] |> as.numeric()
>> db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
>>     as.data.frame() |>
>>     setNames(header)
>>
>> str(db10)
>> #> 'data.frame':    25 obs. of  4 variables:
>> #>  $ cp1    : num  1 5 3 7 10 5 2 4 8 10 ...
>> #>  $ cp2    : num  10 2 1 4 4 5 6 4 4 15 ...
>> #>  $ role   : num  13 5 3 6 2 8 8 7 7 3 ...
>> #>  $ groupid: num  4 10 7 4 7 3 7 8 8 3 ...
>>
>>
>> And here is the data in dput format.
>>
>>
>>
>> db10 <-
>>     structure(list(
>>       cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
>>               2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
>>       cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
>>               4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
>>       role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
>>                11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
>>       groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
>>                   20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
>>       class = "data.frame", row.names = c(NA, -25L))
>>
>>
>>
>> As for the problem, I am not sure if you want summarise instead of
>> mutate but here is a summarise solution.
>>
>>
>>
>> library(dplyr)
>>
>> db10 %>%
>>     group_by(groupid) %>%
>>     summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))
>>
>> # same result, summarise's new argument .by avoids the need to group_by
>> db10 %>%
>>     summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)),
>>               .by = groupid)
>>
>>
>>
>> Can you post the expected output too?
>>
>> Hope this helps,
>>
>> Rui Barradas
>>

Hello,

Something like this?


test <-
   structure(list(
     cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
             2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
     cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
             4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
     role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
              11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
     groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
                 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
     class = "data.frame", row.names = c(NA, -25L))

library(dplyr)

test %>%
   group_by(groupid) %>%
   mutate(across(starts_with("cp"), list(mean = ~ mean(.x, na.rm = TRUE))))
#> # A tibble: 25 × 6
#> # Groups:   groupid [11]
#>      cp1   cp2  role groupid cp1_mean cp2_mean
#>    <dbl> <dbl> <dbl>   <dbl>    <dbl>    <dbl>
#>  1     1    10    13       4     7        8
#>  2     5     2     5      10     5        2
#>  3     3     1     3       7     6.17     5.17
#>  4     7     4     6       4     7        8
#>  5    10     4     2       7     6.17     5.17
#>  6     5     5     8       3    10.7     13.3
#>  7     2     6     8       7     6.17     5.17
#>  8     4     4     7       8     5        4
#>  9     8     4     7       8     5        4
#> 10    10    15     3       3    10.7     13.3
#> # ℹ 15 more rows


Hope this helps,

Rui Barradas
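
A minimal sketch of the other half of the original request, where rows that are NA themselves keep NA while the remaining group members get the mean of the non-missing values. The data frame `toy` and the name `toy_means` are made up for illustration and are not Francesca's data; the columns are overwritten in place rather than added as `*_mean` columns, on the assumption that this is what the "twin dataset" means.

```r
library(dplyr)

# Hypothetical toy data: group 1 has one NA in cp1, group 2 is all NA in cp1
toy <- data.frame(
  cp1     = c(8, NA, 2, NA, NA),
  cp2     = c(18, 5, NA, 4, 6),
  groupid = c(1, 1, 1, 2, 2)
)

toy_means <- toy %>%
  group_by(groupid) %>%
  mutate(across(
    starts_with("cp"),
    # group mean ignoring NA; rows that are NA themselves stay NA
    ~ if_else(is.na(.x), NA_real_, mean(.x, na.rm = TRUE))
  )) %>%
  ungroup()

toy_means
```

Because the `is.na(.x)` branch wins wherever the value is missing, an all-NA group ends up as NA rather than NaN without any extra check.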



CALUM POLWART

2024-Sep-16 23:39 UTC


[R] (no subject)

Rui's solution is good.

Bert's suggestion is also good!

For Bert's suggestion, you'd make the list bit

list(mean = mean_narm)

But prior to that, define a function:

mean_narm <- function(x) {
  # mean ignoring NA; an all-NA input gives NaN, which is turned into NA
  m <- mean(x, na.rm = TRUE)
  if (is.nan(m)) {
    m <- NA
  }
  return(m)
}

That would do what you suggested in your reply to Bert.
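
The reason a NaN check is worth having: with `na.rm = TRUE`, `mean()` of a vector that contains only NA returns NaN, not NA. A small base R sketch of that behaviour, using `ave()` in place of the dplyr pipeline; the helper name `mean_or_na` and the toy vectors are illustrative assumptions, not anything posted in the thread.

```r
# Hypothetical helper: mean ignoring NA, but NA (not NaN) for an all-NA group
mean_or_na <- function(x) {
  m <- mean(x, na.rm = TRUE)
  if (is.nan(m)) NA_real_ else m
}

# Made-up data: group 1 has one NA, group 2 has only NA
cp1     <- c(8, NA, 2, NA, NA)
groupid <- c(1, 1, 1, 2, 2)

mean(c(NA, NA), na.rm = TRUE)   # NaN: nothing is left to average

# ave() applies the function per group and recycles the result to each row
group_mean <- ave(cp1, groupid, FUN = mean_or_na)
group_mean                      # 5 5 5 NA NA
```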

