thr3ads.net - R help - [R] (no subject) [Sep 2024]

Francesca

2024-Sep-16 07:28 UTC

[R] (no subject)

Dear Contributors,
I hope someone has found a similar issue.

I have this data set,



cp1  cp2  role  groupid
  1   10    13        4
  5    2     5       10
  3    1     3        7
  7    4     6        4
 10    4     2        7
  5    5     8        3
  2    6     8        7
  4    4     7        8
  8    4     7        8
 10   15     3        3
  9   15    10        2
  2   10     5        5
  2    4    11       20
 20    2     5       12
  9   11     3        6
 13   10    13        4
  3   14    12        6
  4    2    15        7
  4    4     1       16
 10    0     3        7
 17   20    15        3
  8   18    10        7
  3    4    19        8
 13    3     5       20
 10    9     2        6



I need to compute group averages, using the values of column groupid, and
create a twin dataset in which each individual value is replaced by the
mean of its group.
So for example, for groupid 3, I calculate the mean (12+18)/2 and then, in
the new data frame, I put that mean in the same positions in place of 12
and 18.
I found this solution, where db10_means is the output dataset, db10 is my
initial data.

db10_means<-db10 %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

It works perfectly, except for NA values, where it assigns NA to all
group members, while in some cases the group is made up of some NAs and
some values.
So, when I have a group of two values and one NA, I would like the members
with a value to get the mean, and the member with NA to keep the NA.
Here the mean function is called without the na.rm = TRUE option, but it
appears that this option cannot be added in this case. I am not
even sure that this would be enough to solve my problem.
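
A minimal sketch of the behaviour described above, assuming a hypothetical group of two values and one NA (the names x, toy and toy_na_rm are made up for illustration and are not part of db10). It suggests that na.rm can be passed through a lambda, but that na.rm alone also gives the mean to the row that held NA:

```r
library(dplyr)

# Hypothetical group: two values and one NA (not taken from the posted data)
x <- c(12, 18, NA)

mean(x)                # NA: a single NA makes the whole mean NA
mean(x, na.rm = TRUE)  # 15: the NA is dropped before averaging

# na.rm can be passed inside across() by wrapping mean() in a lambda
toy <- data.frame(cp1 = x, groupid = c(3, 3, 3))
toy_na_rm <- toy %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = ~ mean(.x, na.rm = TRUE)))) %>%
  ungroup()

toy_na_rm$cp1_mean     # 15 15 15: the row that held NA also receives the mean
```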
Thanks for any help provided.

-- 

Francesca


----------------------------------

	[[alternative HTML version deleted]]

Rolf Turner

2024-Sep-16 09:05 UTC


[R] Your data set manipulations

On Mon, 16 Sep 2024 09:28:14 +0200
Francesca <francesca.pancotto at gmail.com> wrote:
> Dear Contributors,
> I hope someone has found a similar issue.
I hope *not*!
> I have this data set,
You may have, but we haven't.  The data you provided have an
incomprehensible (to me at least) structure.  Please use dput()
to include your data in the message.
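
As a minimal sketch of what that looks like (the data frame df below is hypothetical, not the poster's data): dput() prints R code that recreates an object, so the printed text can be pasted into a plain-text message and run by the reader.

```r
# Hypothetical small data frame standing in for the poster's data
df <- data.frame(cp1 = c(1, 5, NA), groupid = c(4, 10, 4))

# dput() prints R code that rebuilds the object, NAs included;
# the printed text is what goes into the email
dput(df)

# Round trip: evaluating the dput() text gives the data frame back
df2 <- eval(parse(text = paste(capture.output(dput(df)), collapse = "\n")))
identical(df, df2)
```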
> cp1
> cp2
> role
> groupid
> 1
> 10
> 13
> 4
> 5
> 2
> 5
> 10
<SNIP>

<SNIP>
> 10
> 9
> 2
> 6
> 
> 
> 
> I need to compute group averages, using the values of column groupid,
> and create a twin dataset in which each individual value is replaced
> by the mean of its group.
> So for example, for groupid 3, I calculate the mean (12+18)/2 and then,
> in the new data frame, I put that mean in the same positions in place
> of 12 and 18.
> I found this solution, where db10_means is the output dataset, db10
> is my initial data.
> 
> db10_means<-db10 %>%
>   group_by(groupid) %>%
>   mutate(across(starts_with("cp"), list(mean = mean)))
What does "%>%" mean?
> It works perfectly, except for NA values,
I see no sign of there being any NAs in your data set.
> where it assigns NA to
> all group members, while in some cases the group is made up of
> some NAs and some values.
> So, when I have a group of two values and one NA, I would like the
> members with a value to get the mean, and the member with NA to keep
> the NA.
> Here the mean function is called without the na.rm = TRUE option, but
> it appears that this option cannot be added in this case. I am
> not even sure that this would be enough to solve my problem.
> Thanks for any help provided.
A more coherent message is required before I (at least) could possibly
give any help.

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
         +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619

Rui Barradas

2024-Sep-16 09:35 UTC


[R] (no subject)

At 08:28 on 16/09/2024, Francesca wrote:
> Dear Contributors,
> <SNIP>

Hello,

Your data is a mess; please don't post HTML, this is a plain-text-only
list. Anyway, I managed to create a data frame by copying the data to a
file named "rhelp.txt" and then running



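# read every whitespace-separated token as a character string
db10 <- scan(file = "rhelp.txt", what = character())
# the first four tokens are the column names
header <- db10[1:4]
# the remaining tokens are the values, stored row by row
db10 <- db10[-(1:4)] |> as.numeric()
db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
   as.data.frame() |>
   setNames(header)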

str(db10)
#> 'data.frame':    25 obs. of  4 variables:
#>  $ cp1    : num  1 5 3 7 10 5 2 4 8 10 ...
#>  $ cp2    : num  10 2 1 4 4 5 6 4 4 15 ...
#>  $ role   : num  13 5 3 6 2 8 8 7 7 3 ...
#>  $ groupid: num  4 10 7 4 7 3 7 8 8 3 ...


And here is the data in dput format.



db10 <-
   structure(list(
     cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2,
             2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
     cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10,
             4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
     role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5,
              11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
     groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5,
                 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
     class = "data.frame", row.names = c(NA, -25L))



As for the problem, I am not sure if you want summarise instead of 
mutate but here is a summarise solution.



library(dplyr)

db10 %>%
   group_by(groupid) %>%
   summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))

# same result, summarise's new argument .by avoids the need to group_by
db10 %>%
   summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)),
             .by = groupid)
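
If the goal is instead to keep one row per observation, the same idea can be written with mutate(). What follows is only a sketch under stated assumptions: the toy data frame and the names toy and toy_means are made up for illustration, and it assumes that rows holding NA should stay NA while the other rows get the group mean computed with na.rm = TRUE.

```r
library(dplyr)

# Hypothetical toy data (not the posted data): group 1 mixes values and NAs
toy <- data.frame(
  cp1     = c(12, 18, NA, 4, 6),
  cp2     = c(1, NA, 3, 10, 20),
  groupid = c(1, 1, 1, 2, 2)
)

toy_means <- toy %>%
  group_by(groupid) %>%
  # keep NA where the original value was NA, otherwise use the group mean
  mutate(across(starts_with("cp"),
                ~ ifelse(is.na(.x), NA_real_, mean(.x, na.rm = TRUE)))) %>%
  ungroup()

toy_means
```

Because across() is given a single function here rather than a named list, the cp columns are overwritten in place instead of gaining cp1_mean and cp2_mean companions.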



Can you post the expected output too?

Hope this helps,

Rui Barradas


