Dear Contributors,
I hope someone has found a similar issue.

I have this data set:

   cp1 cp2 role groupid
1   10  13    4       5
2    5  10    3       1
3    7   7    4       6
4   10   4    2       7
5    5   8    3       2
6    8   7    4       4
7    8   8    4       7
8   10  15    3       3
9   15  10    2       2
10   5   5    2       4
11  20  20    2       5
12   9  11    3       6
13  10  13    4       3
14  12   6    4       2
15   7   4    4       1
16  10   0    3       7
17  20  15    3       8
18  10   7    3       4
19   8  13    3       5
20  10   9    2       6

I need to average by group, using the values of column groupid, and create a twin dataset in which each individual value is replaced by the mean of its group.
So, for example, for groupid 3, I calculate the mean (12+18)/2 and then, in the new data frame, I put that mean in the same positions in place of 12 and 18.

I found this solution, where db10_means is the output dataset and db10 is my initial data:

    db10_means <- db10 %>%
      group_by(groupid) %>%
      mutate(across(starts_with("cp"), list(mean = mean)))

It works perfectly, except for NA values: when a group contains an NA, every member of the group gets NA, even though in some cases the group is made up of some NAs and some values.
So, when I have a group of two values and one NA, I would like the rows that have a value to get the mean and the rows that have NA to keep NA.
Here the mean function is called without the na.rm = TRUE option, but it appears that this option cannot be used in this case. I am not even sure that this would be enough to solve my problem.

Thanks for any help provided.

--
Francesca

[[alternative HTML version deleted]]
On Mon, 16 Sep 2024 09:28:14 +0200
Francesca <francesca.pancotto at gmail.com> wrote:

> Dear Contributors,
> I hope someone has found a similar issue.

I hope *not*!

> I have this data set,

You may have, but we haven't. The data you provided have an incomprehensible (to me at least) structure. Please use dput() to include your data in the message.

> cp1
> cp2
> role
> groupid
> 1
> 10
> 13
> 4
> 5
> 2
> 5
> 10

<SNIP>

> 10
> 9
> 2
> 6
>
> I need to to average of groups, using the values of column groupid,
> and create a twin dataset in which the mean of the group is replaced
> instead of individual values.
> So for example, groupid 3, I calculate the mean (12+18)/2 and then I
> replace in the new dataframe, but in the same positions, instead of
> 12 and 18, the values of the corresponding mean.
> I found this solution, where db10_means is the output dataset, db10
> is my initial data.
>
> db10_means<-db10 %>%
>   group_by(groupid) %>%
>   mutate(across(starts_with("cp"), list(mean = mean)))

What does "%>%" mean?

> It works perfectly, except that for NA values,

I see no sign of there being any NAs in your data set.

> where it replaces to
> all group members the NA, while in some cases, the group is made of
> some NA and some values.
> So, when I have a group of two values and one NA, I would like that
> for those with a value, the mean is replaced, for those with NA, the
> NA is replaced.
> Here the mean function has not the na.rm=T option associated, but it
> appears that this solution cannot be implemented in this case. I am
> not even sure that this would be enough to solve my problem.
> Thanks for any help provided.

A more coherent message is required before I (at least) could possibly give any help.

cheers,

Rolf

--
Honorary Research Fellow
Department of Statistics
University of Auckland
At 08:28 on 16/09/2024, Francesca wrote:
> Dear Contributors,
> I hope someone has found a similar issue.
>
> I have this data set,
>
> cp1
> cp2
> role
> groupid
> 1
> 10
> 13
> 4
> 5
> [...]
>
> I need to to average of groups, using the values of column groupid, and
> create a twin dataset in which the mean of the group is replaced instead of
> individual values.
> So for example, groupid 3, I calculate the mean (12+18)/2 and then I
> replace in the new dataframe, but in the same positions, instead of 12 and
> 18, the values of the corresponding mean.
> I found this solution, where db10_means is the output dataset, db10 is my
> initial data.
>
> db10_means<-db10 %>%
>   group_by(groupid) %>%
>   mutate(across(starts_with("cp"), list(mean = mean)))
>
> It works perfectly, except that for NA values, where it replaces to all
> group members the NA, while in some cases, the group is made of some NA and
> some values.
> So, when I have a group of two values and one NA, I would like that for
> those with a value, the mean is replaced, for those with NA, the NA is
> replaced.
> Here the mean function has not the na.rm=T option associated, but it
> appears that this solution cannot be implemented in this case. I am not
> even sure that this would be enough to solve my problem.
> Thanks for any help provided.
>

Hello,

Your data is a mess. Please don't post HTML; this is a plain-text-only list.
Anyway, I managed to create a data frame by copying the data to a file named "rhelp.txt" and then running

    db10 <- scan(file = "rhelp.txt", what = character())
    header <- db10[1:4]
    db10 <- db10[-(1:4)] |> as.numeric()
    db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
      as.data.frame() |>
      setNames(header)

    str(db10)
    #> 'data.frame': 25 obs. of 4 variables:
    #>  $ cp1    : num 1 5 3 7 10 5 2 4 8 10 ...
    #>  $ cp2    : num 10 2 1 4 4 5 6 4 4 15 ...
    #>  $ role   : num 13 5 3 6 2 8 8 7 7 3 ...
    #>  $ groupid: num 4 10 7 4 7 3 7 8 8 3 ...

And here is the data in dput format.

    db10 <- structure(list(
      cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2, 2, 20, 9, 13, 3,
              4, 4, 10, 17, 8, 3, 13, 10),
      cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10, 4, 2, 11, 10, 14,
              2, 4, 0, 20, 18, 4, 3, 9),
      role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5, 11, 5, 3, 13, 12,
               15, 1, 3, 15, 10, 19, 5, 2),
      groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5, 20, 12, 6, 4, 6,
                  7, 16, 7, 3, 7, 8, 20, 6)),
      class = "data.frame", row.names = c(NA, -25L))

As for the problem, I am not sure if you want summarise instead of mutate, but here is a summarise solution.

    library(dplyr)

    db10 %>%
      group_by(groupid) %>%
      summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))

    # same result, summarise's new argument .by avoids the need to group_by
    db10 %>%
      summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)),
                .by = groupid)

Can you post the expected output too?

Hope this helps,

Rui Barradas
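If the goal is the mutate() form from the original question (same number of rows, group mean in place of each value, NA left as NA), one possible sketch is below. It rests on two assumptions that are not confirmed anywhere in the thread: that the first number of each group of five in the posted data is a row label, giving 20 rows rather than 25, and that the original cp columns may be overwritten instead of adding cp1_mean and cp2_mean. The NA in row 1 is hypothetical, added only so there is something to demonstrate.

```r
library(dplyr)

# Posted data, read on the ASSUMPTION that the first number of each
# group of five is a row label (20 rows, 4 columns).
db10 <- data.frame(
  cp1     = c(10, 5, 7, 10, 5, 8, 8, 10, 15, 5,
              20, 9, 10, 12, 7, 10, 20, 10, 8, 10),
  cp2     = c(13, 10, 7, 4, 8, 7, 8, 15, 10, 5,
              20, 11, 13, 6, 4, 0, 15, 7, 13, 9),
  role    = c(4, 3, 4, 2, 3, 4, 4, 3, 2, 2,
              2, 3, 4, 4, 4, 3, 3, 3, 3, 2),
  groupid = c(5, 1, 6, 7, 2, 4, 7, 3, 2, 4,
              5, 6, 3, 2, 1, 7, 8, 4, 5, 6)
)

# Hypothetical NA, not in the posted data, added only for illustration.
db10$cp1[1] <- NA

db10_means <- db10 %>%
  group_by(groupid) %>%
  mutate(across(
    starts_with("cp"),
    # Rows that are NA stay NA; all other rows get the mean of the
    # non-missing values in their group.
    ~ if_else(is.na(.x), NA_real_, mean(.x, na.rm = TRUE))
  )) %>%
  ungroup()

print(db10_means, n = 20)
```

With the hypothetical NA, group 5 consists of rows 1, 11 and 19: row 1 keeps NA in cp1, while rows 11 and 19 both get (20 + 8) / 2 = 14. To keep the original values alongside the means, as in the original call, a `.names = "{.col}_mean"` argument can be passed to across().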