thr3ads.net - R help - [R] (no subject) [Sep 2024]

Francesca

2024-Sep-16 14:23 UTC

[R] (no subject)

Sorry for posting unreadable code. On my screen the dataset
looked correct.


I recreated my dataset, following your example:

test <- data.frame(
  cp1     = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
              2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
  cp2     = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
              NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
  role    = c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3,
              4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
  groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
              8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6)
)

What I have done so far is the following, which works:
library(dplyr)

test %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

The problem is with NA: whenever a group contains an NA, mean() returns
NA for every member of that group.
I need the mean to be calculated ignoring NA. So when the group has
three values, I want the mean of the three.
If the group has two values and an NA, I want the mean of the two.

My code works: at each position it puts the mean of the three subjects
in the group, replacing each individual value with the group mean.
But when an NA appears, the whole group gets NA.

Perhaps there is a different way to obtain the same result.



On Mon, 16 Sept 2024 at 11:35, Rui Barradas <ruipbarradas at sapo.pt>
wrote:
> At 08:28 on 16/09/2024, Francesca wrote:
> > Dear Contributors,
> > I hope someone has found a similar issue.
> >
> > I have this data set,
> >
> >
> >
> >     cp1  cp2  role  groupid
> >  1   10   13     4        5
> >  2    5   10     3        1
> >  3    7    7     4        6
> >  4   10    4     2        7
> >  5    5    8     3        2
> >  6    8    7     4        4
> >  7    8    8     4        7
> >  8   10   15     3        3
> >  9   15   10     2        2
> > 10    5    5     2        4
> > 11   20   20     2        5
> > 12    9   11     3        6
> > 13   10   13     4        3
> > 14   12    6     4        2
> > 15    7    4     4        1
> > 16   10    0     3        7
> > 17   20   15     3        8
> > 18   10    7     3        4
> > 19    8   13     3        5
> > 20   10    9     2        6
> >
> >
> >
> > I need to average by group, using the values of column groupid, and
> > create a twin dataset in which the individual values are replaced by
> > the mean of their group.
> > So, for example, for groupid 3, I calculate the mean (12+18)/2 and then
> > put it in the new data frame, in the same positions, instead of 12 and
> > 18.
> > I found this solution, where db10_means is the output dataset and db10
> > is my initial data.
> >
> > db10_means<-db10 %>%
> >    group_by(groupid) %>%
> >    mutate(across(starts_with("cp"), list(mean = mean)))
> >
> > It works perfectly except for NA values: it assigns NA to all members
> > of a group, while in some cases the group is made of some NAs and some
> > values.
> > So, when I have a group of two values and one NA, I would like those
> > with a value to get the mean, and those with NA to keep the NA.
> > Here the mean function does not have the na.rm=T option set, but it
> > appears that this option cannot be used in this case. I am not
> > even sure that this would be enough to solve my problem.
> > Thanks for any help provided.
> >
> Hello,
>
> Your data is a mess; please don't post HTML, this is a plain-text-only
> list. Anyway, I managed to create a data frame by copying the data to a
> file named "rhelp.txt" and then running
>
>
>
> db10 <- scan(file = "rhelp.txt", what = character())
> header <- db10[1:4]
> db10 <- db10[-(1:4)] |> as.numeric()
> db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
>    as.data.frame() |>
>    setNames(header)
>
> str(db10)
> #> 'data.frame':    25 obs. of  4 variables:
> #>  $ cp1    : num  1 5 3 7 10 5 2 4 8 10 ...
> #>  $ cp2    : num  10 2 1 4 4 5 6 4 4 15 ...
> #>  $ role   : num  13 5 3 6 2 8 8 7 7 3 ...
> #>  $ groupid: num  4 10 7 4 7 3 7 8 8 3 ...
>
>
> And here is the data in dput format.
>
>
>
> db10 <-
>    structure(list(
>      cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
>              2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
>      cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
>              4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
>      role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
>               11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
>      groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
>                  20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
>      class = "data.frame", row.names = c(NA, -25L))
>
>
>
> As for the problem, I am not sure if you want summarise instead of
> mutate but here is a summarise solution.
>
>
>
> library(dplyr)
>
> db10 %>%
>    group_by(groupid) %>%
>    summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))
>
> # same result; summarise's newer .by argument avoids the need for group_by
> db10 %>%
>    summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)),
>              .by = groupid)
>
>
>
> Can you post the expected output too?
>
> Hope this helps,
>
> Rui Barradas
>
>

-- 

Francesca



Bert Gunter

2024-Sep-16 14:29 UTC


See the na.rm argument of ?mean

But what happens if all values are NA?

-- Bert
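
The question about all-NA groups can be checked directly in base R. The following is a minimal sketch, not part of the original replies; the vectors and the helper name mean_or_na are made up for illustration. It shows that mean() with na.rm = TRUE returns NaN (not NA) when nothing is left to average, and one possible way to get NA instead.

```r
x_some <- c(5, NA, 10)          # a group with one missing value
x_all  <- c(NA_real_, NA_real_) # a group where every value is missing

mean(x_some)                # NA: a single NA poisons the result
mean(x_some, na.rm = TRUE)  # 7.5: mean of the two observed values
mean(x_all, na.rm = TRUE)   # NaN: nothing left to average after dropping NAs

# Hypothetical helper: NA-ignoring mean that returns NA (not NaN)
# when the whole input is missing
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

mean_or_na(x_some)  # 7.5
mean_or_na(x_all)   # NA
```

Whether NaN or NA is the better result for an all-NA group depends on how the means are used afterwards; is.na() is TRUE for both, but is.nan() distinguishes them.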



John Kane

2024-Sep-16 14:40 UTC


Hi,

Thanks for the revised dataset.  As a safety measure, the R-help list does
not accept HTML, so it strips everything down to plain text, which is what
gives us the very garbled text. You need to always send in plain-text format.

The best way to supply sample data is the dput() function.  dput() gives
us an exact copy of your R data set. Here is a very simple example of how
to do it.

```
dat <- data.frame(xx = 1:10, yy = letters[1:10])

dput(dat)
```
This gives us
```
structure(list(xx = 1:10, yy = c("a", "b", "c", "d", "e", "f",
                                 "g", "h", "i", "j")),
          class = "data.frame", row.names = c(NA, -10L))
```
Paste it into your email and we can then copy it into R.
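
To see why dput() output is convenient for the reader, here is a small round-trip sketch (an illustration only; the variable names dput_text and dat_copy are made up). It captures the text that dput() prints, evaluates that text as R code the way a list reader would after pasting it, and checks that the rebuilt data frame matches the original.

```r
dat <- data.frame(xx = 1:10, yy = letters[1:10])

# Capture what dput() would print to the console, as lines of text
dput_text <- capture.output(dput(dat))

# A reader who pastes that text into R is effectively doing this:
dat_copy <- eval(parse(text = paste(dput_text, collapse = "\n")))

# Compare the rebuilt data frame with the original
all.equal(dat, dat_copy)
```

The same idea works for a subset of a larger dataset, for example dput(head(dat, 20)), when the full data is too long for an email.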






-- 
John Kane
Kingston ON Canada


Rui Barradas

2024-Sep-16 18:47 UTC

Hello,

Something like this?


test <-
   structure(list(
     cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
             2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
     cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
             4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
     role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
              11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
     groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
                 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
     class = "data.frame", row.names = c(NA, -25L))

library(dplyr)

test %>%
   group_by(groupid) %>%
   mutate(across(starts_with("cp"), list(mean = ~ mean(.x, na.rm = TRUE))))
#> # A tibble: 25 × 6
#> # Groups:   groupid [11]
#>      cp1   cp2  role groupid cp1_mean cp2_mean
#>    <dbl> <dbl> <dbl>   <dbl>    <dbl>    <dbl>
#>  1     1    10    13       4     7        8
#>  2     5     2     5      10     5        2
#>  3     3     1     3       7     6.17     5.17
#>  4     7     4     6       4     7        8
#>  5    10     4     2       7     6.17     5.17
#>  6     5     5     8       3    10.7     13.3
#>  7     2     6     8       7     6.17     5.17
#>  8     4     4     7       8     5        4
#>  9     8     4     7       8     5        4
#> 10    10    15     3       3    10.7     13.3
#> # ℹ 15 more rows
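
The data used above contains no missing values, so here is a sketch of the same idea on the NA-containing example from earlier in the thread (cp1, cp2 and groupid only). It is an illustration under assumptions, not the method from the replies: it uses base R's ave() instead of dplyr, and the helper mean_or_na and the column suffixes _mean and _mean_keepna are made-up names. Since the thread leaves open whether rows that were NA should receive the group mean or stay NA, both variants are computed.

```r
# Example data with NAs from earlier in the thread, as a plain data.frame()
test <- data.frame(
  cp1     = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
              2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
  cp2     = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
              NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
  groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
              8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6)
)

# Hypothetical helper: mean ignoring NA; NA (not NaN) if the whole group is NA
mean_or_na <- function(v) {
  if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
}

cp_cols <- grep("^cp", names(test), value = TRUE)
for (col in cp_cols) {
  # ave() returns a vector as long as its input, with each group's
  # result repeated at the positions of that group's members
  grp_mean <- ave(test[[col]], test$groupid, FUN = mean_or_na)

  # Variant 1: every group member gets the group mean
  test[[paste0(col, "_mean")]] <- grp_mean

  # Variant 2: rows whose own value was NA stay NA
  test[[paste0(col, "_mean_keepna")]] <-
    ifelse(is.na(test[[col]]), NA_real_, grp_mean)
}

# Inspect one group that contains a missing cp1 value
test[test$groupid == 4, ]
```

In the dplyr pipeline above, the second variant would correspond to wrapping the lambda's result in a similar ifelse(is.na(.x), NA, ...) test.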


Hope this helps,

Rui Barradas

