thr3ads.net - R help - [R] (no subject) [Sep 2024]


Francesca

2024-Sep-16 15:05 UTC

[R] (no subject)

All' Na Is Na.


On Mon, 16 Sep 2024 at 16:29, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> See the na.rm argument of ?mean
>
> But what happens if all values are NA?
>
> -- Bert
>
>
> On Mon, Sep 16, 2024 at 7:24 AM Francesca <francesca.pancotto at gmail.com> wrote:
> >
> > Sorry for posting code that was hard to read. On my screen the dataset
> > looked correct.
> >
> >
> > I recreated my dataset, following your example:
> >
> > # build the data frame column by column (24 rows, 8 groups of 3)
> > test <- data.frame(
> >   cp1     = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
> >               2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
> >   cp2     = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
> >               NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
> >   role    = c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3,
> >               4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
> >   groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
> >               8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6))
> >
> > What I have done so far is the following, that works:
> >  test %>%
> >   group_by(groupid) %>%
> >   mutate(across(starts_with("cp"), list(mean = mean)))
> >
> > But the problem is with NA: every time the mean encounters an NA, it
> > returns NA for all group members.
> > I need the software to calculate the mean ignoring NA. So when the group
> > is made of three people, the mean of the three.
> > If the group is two values and an NA, calculate the mean of the two.
> >
> > My code works: it creates a mean at each position for three subjects,
> > replacing each individual value with the group mean.
> > But when an NA appears, the whole group gets NA.
> >
> > Perhaps there is a different way to obtain the same result.
> >
> >
> >
> > On Mon, 16 Sept 2024 at 11:35, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >
> > > At 08:28 on 16/09/2024, Francesca wrote:
> > > > Dear Contributors,
> > > > I hope someone has found a similar issue.
> > > >
> > > > I have this data set,
> > > >
> > > >
> > > >
> > > >    cp1 cp2 role groupid
> > > > 1   10  13    4       5
> > > > 2    5  10    3       1
> > > > 3    7   7    4       6
> > > > 4   10   4    2       7
> > > > 5    5   8    3       2
> > > > 6    8   7    4       4
> > > > 7    8   8    4       7
> > > > 8   10  15    3       3
> > > > 9   15  10    2       2
> > > > 10   5   5    2       4
> > > > 11  20  20    2       5
> > > > 12   9  11    3       6
> > > > 13  10  13    4       3
> > > > 14  12   6    4       2
> > > > 15   7   4    4       1
> > > > 16  10   0    3       7
> > > > 17  20  15    3       8
> > > > 18  10   7    3       4
> > > > 19   8  13    3       5
> > > > 20  10   9    2       6
> > > >
> > > >
> > > >
> > > > I need to average within groups, using the values of column groupid,
> > > > and create a twin dataset in which each individual value is replaced
> > > > by the mean of its group.
> > > > So for example, for groupid 3, I calculate the mean (12+18)/2 and then,
> > > > in the new data frame, I put that mean in the same positions in place
> > > > of 12 and 18.
> > > > I found this solution, where db10_means is the output dataset and db10
> > > > is my initial data.
> > > >
> > > > db10_means<-db10 %>%
> > > >    group_by(groupid) %>%
> > > >    mutate(across(starts_with("cp"), list(mean = mean)))
> > > >
> > > > It works perfectly, except for NA values: it assigns NA to all group
> > > > members, while in some cases the group is made up of some NAs and
> > > > some values.
> > > > So, when I have a group of two values and one NA, I would like that
> > > > for those with a value, the mean is substituted; for those with NA,
> > > > the NA is replaced.
> > > > Here the mean function does not have the na.rm=T option set, and it
> > > > appears that this option cannot be used in this case. I am not
> > > > even sure that this would be enough to solve my problem.
> > > > Thanks for any help provided.
> > > >
> > > Hello,
> > >
> > > Your data is a mess; please don't post HTML, this is a plain-text-only
> > > list. Anyway, I managed to create a data frame by copying the data to a
> > > file named "rhelp.txt" and then running
> > >
> > >
> > >
> > > db10 <- scan(file = "rhelp.txt", what = character())
> > > header <- db10[1:4]
> > > db10 <- db10[-(1:4)] |> as.numeric()
> > > db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
> > >    as.data.frame() |>
> > >    setNames(header)
> > >
> > > str(db10)
> > > #> 'data.frame':    25 obs. of  4 variables:
> > > #>  $ cp1    : num  1 5 3 7 10 5 2 4 8 10 ...
> > > #>  $ cp2    : num  10 2 1 4 4 5 6 4 4 15 ...
> > > #>  $ role   : num  13 5 3 6 2 8 8 7 7 3 ...
> > > #>  $ groupid: num  4 10 7 4 7 3 7 8 8 3 ...
> > >
> > >
> > > And here is the data in dput format.
> > >
> > >
> > >
> > > db10 <-
> > >    structure(list(
> > >      cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
> > >              2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
> > >      cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
> > >              4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
> > >      role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
> > >               11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
> > >      groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
> > >                  20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
> > >      class = "data.frame", row.names = c(NA, -25L))
> > >
> > >
> > >
> > > As for the problem, I am not sure if you want summarise instead of
> > > mutate but here is a summarise solution.
> > >
> > >
> > >
> > > library(dplyr)
> > >
> > > db10 %>%
> > >    group_by(groupid) %>%
> > >    summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))
> > >
> > > # same result, summarise's new argument .by avoids the need to group_by
> > > db10 %>%
> > >    summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)), .by = groupid)
> > >
> > >
> > >
> > > Can you post the expected output too?
> > >
> > > Hope this helps,
> > >
> > > Rui Barradas
> > >
> > >
> > >
> >
> >
> > --
> >
> > Francesca
> >
> >
> > ----------------------------------
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]
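
Applying the na.rm hint to the mutate() call quoted above might look like the following sketch. It assumes dplyr 1.0.0 or later (for across()) and reuses the values of the `test` data frame from the thread; the name `test_means` is only illustrative. The lambda replaces the bare `mean`, so each group mean is computed from the non-missing values only.

```r
library(dplyr)  # assumed installed; across() needs dplyr >= 1.0.0

# Same values as the 'test' data set in the thread, one vector per column
test <- data.frame(
  cp1     = c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17,
              2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
  cp2     = c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19,
              NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
  role    = c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3,
              4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
  groupid = c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3,
              8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6))

# Group means that skip NA, written to new columns cp1_mean and cp2_mean
# at the same row positions as the original values
test_means <- test %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"),
                list(mean = ~ mean(.x, na.rm = TRUE)))) %>%
  ungroup()

# Group 2 has cp1 values 5, NA, 10, so its three rows all get (5 + 10) / 2
test_means$cp1_mean[test_means$groupid == 2]
```

Rows whose own value is NA also receive the group mean in the new columns; the original cp1 and cp2 columns are left unchanged. None of the groups in this data set is entirely NA, so the all-NA case does not arise here.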

Bert Gunter

2024-Sep-16 18:02 UTC


[R] (no subject)

It's NA *not* Na. Details matter.

Ah, but note:

> mean(c(NA,NA), na.rm = TRUE)
[1] NaN

So if that might happen, you'll have to write your own mean function,
say mymean(), to do what you want. I leave that (simple) pleasure to
you.

-- Bert
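
One possible shape for such a mymean(), as a minimal sketch: the function name comes from the message above, while the choice to return NA (rather than NaN) when every value is missing is an assumption based on the "all NA is NA" reply. The vectors `cp` and `g` are made-up illustration data, not values from the thread.

```r
# Hypothetical helper: mean that ignores NAs, but yields NA (not NaN)
# when there is no non-missing value left to average
mymean <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

mymean(c(2, NA, 4))  # mean of 2 and 4
mymean(c(NA, NA))    # NA instead of NaN

# Base R usage: ave() puts each group's result back at the original positions
cp <- c(5, NA, 10, NA, NA)   # illustrative values
g  <- c(1, 1, 1, 2, 2)       # illustrative group ids
group_means <- ave(cp, g, FUN = mymean)
group_means
```

With dplyr, the same helper could be passed in place of `mean`, for example `mutate(across(starts_with("cp"), list(mean = mymean)))` after `group_by(groupid)`.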

