Petr Pikal
2024-Aug-29 05:53 UTC
[R] Fill NA values in columns with values of another column
Hallo Francesca If you had an object with correct setting, something like template> dput(res)structure(list(V1 = c("1", "2", "3", "4", "5", "6", "7", "8"), V2 = c(2, 7, 10, 4, 9, 5, 2, 6)), class = "data.frame", row.names c("1", "2", "3", "4", "5", "6", "7", "8")) you could merge it with your object where some values are missing> dput(daf)structure(list(X1 = c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, NA, NA, 5L, 9L, NA, NA, 10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, 2L, 4L, 6L, 10L, NA, NA, NA), X2 = c(8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L, 3L, 2L, 4L, 6L, 5L, 5L, 6L, 3L, 2L, 1L, 7L, 2L, 8L, 4L, 5L, 6L, 1L, 4L, 8L, 3L, 3L, 8L, 1L)), class = "data.frame", row.names c(NA, -32L))> merge(daf, res, by.x="X2", by.y="V1")X2 X1 V2 1 1 NA 2 2 1 NA 2 3 1 2 2 4 1 2 2 5 2 NA 7 6 2 NA 7 7 2 7 7 8 2 7 7 9 3 10 10 10 3 NA 10 11 3 10 10 12 3 NA 10 13 4 4 4 14 4 NA 4 15 4 4 4 16 4 NA 4 17 5 9 9 18 5 NA 9 19 5 NA 9 Cheers. Petr st 28. 8. 2024 v 0:45 odes?latel Francesca PANCOTTO via R-help < r-help at r-project.org> napsal:> Dear Contributors, > I have a problem with a database composed of many individuals for many > periods, for which I need to perform a manipulation of data as follows. > Here I report the procedure I need to do for the first 32 observations of > the first period. > > > cbind(VB1d[,1],s1id[,1]) > [,1] [,2] > [1,] 6 8 > [2,] 9 5 > [3,] NA 1 > [4,] 5 6 > [5,] NA 7 > [6,] NA 2 > [7,] 4 4 > [8,] 2 7 > [9,] 2 7 > [10,] NA 3 > [11,] NA 2 > [12,] NA 4 > [13,] 5 6 > [14,] 9 5 > [15,] NA 5 > [16,] NA 6 > [17,] 10 3 > [18,] 7 2 > [19,] 2 1 > [20,] NA 7 > [21,] 7 2 > [22,] NA 8 > [23,] NA 4 > [24,] NA 5 > [25,] NA 6 > [26,] 2 1 > [27,] 4 4 > [28,] 6 8 > [29,] 10 3 > [30,] NA 3 > [31,] NA 8 > [32,] NA 1 > > > In column s1id, I have numbers from 1 to 8, which are the id of 8 groups , > randomly mixed in the larger group of 32. > For each group, I want the value that is reported for only to group > members, to all the four group members. > > For example, value 8 in first row , second column, is group 8. The value > for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to > 8, I have 6. > But in row 22, the value 8 of the second variable, reports a value NA. > in each group is the same, only two values have the correct number, the > other two are NA. > I need that each group, identified by the values of the variable S1id, > correctly report the number of variable VB1d that is present for just two > group members. > > I hope my explanation is acceptable. > The task appears complex to me right now, especially because I will need to > multiply this procedure for x12x14 similar databases. > > Anyone has ever encountered a similar problem? > Thanks in advance for any help provided. > > ---------------------------------- > > Francesca Pancotto > > Associate Professor Political Economy > > University of Modena, Largo Santa Eufemia, 19, Modena > > Office Phone: +39 0522 523264 > > Web: *https://sites.google.com/view/francescapancotto/home > <https://sites.google.com/view/francescapancotto/home>* > > ---------------------------------- > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > https://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Bert Gunter
2024-Aug-29 21:29 UTC
[R] Fill NA values in columns with values of another column
Petr et.al: I think using merge is a very nice idea! (note that the email omitted the last rows of the result, though your code of course produced them) The only minor problem is that the order of the rows in the result is changed from the original. If the OP needs to preserve the original ordering, that can be easily done. Here is a complete implementation of your idea (I think). ## assume that dat is a data frame with the first two columns as in the OP's post, i.e. the first column is the Value with NA's and the second is the Group spl <- dat |> nrow() |> seq_len() |> split(dat[[2]]) |> unlist() ## for reordering dat[spl, 1:2] <- dat[, 1:2] |> na.omit() |> ## remove rows with NA's unique() |> ## remove duplicate rows merge(dat[, 1:2], by.x=2, by.y=2) |> _[, 2:1] ## and now merge() Note the final reordering of the first two columns because of the way merge() works. I suspect there may be a slicker way to do this using unsplit(), but I could not figure out how The result is:> datValue Group 1 6 8 2 9 5 3 2 1 4 5 6 5 2 7 6 7 2 7 4 4 8 2 7 9 2 7 10 10 3 11 7 2 12 4 4 13 5 6 14 9 5 15 9 5 16 5 6 17 10 3 18 7 2 19 2 1 20 2 7 21 7 2 22 6 8 23 4 4 24 9 5 25 5 6 26 2 1 27 4 4 28 6 8 29 10 3 30 10 3 31 6 8 32 2 1 Cheers, Bert On Wed, Aug 28, 2024 at 10:53?PM Petr Pikal <petr.pik at gmail.com> wrote:> Hallo Francesca > > If you had an object with correct setting, something like template > > > dput(res) > structure(list(V1 = c("1", "2", "3", "4", "5", "6", "7", "8"), > V2 = c(2, 7, 10, 4, 9, 5, 2, 6)), class = "data.frame", row.names > c("1", > "2", "3", "4", "5", "6", "7", "8")) > > you could merge it with your object where some values are missing > > > dput(daf) > structure(list(X1 = c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, > NA, NA, 5L, 9L, NA, NA, 10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, > 2L, 4L, 6L, 10L, NA, NA, NA), X2 = c(8L, 5L, 1L, 6L, 7L, 2L, > 4L, 7L, 7L, 3L, 2L, 4L, 6L, 5L, 5L, 6L, 3L, 2L, 1L, 7L, 2L, 8L, > 4L, 5L, 6L, 1L, 4L, 8L, 3L, 3L, 8L, 1L)), class = "data.frame", row.names > c(NA, > -32L)) > > > merge(daf, res, by.x="X2", by.y="V1") > X2 X1 V2 > 1 1 NA 2 > 2 1 NA 2 > 3 1 2 2 > 4 1 2 2 > 5 2 NA 7 > 6 2 NA 7 > 7 2 7 7 > 8 2 7 7 > 9 3 10 10 > 10 3 NA 10 > 11 3 10 10 > 12 3 NA 10 > 13 4 4 4 > 14 4 NA 4 > 15 4 4 4 > 16 4 NA 4 > 17 5 9 9 > 18 5 NA 9 > 19 5 NA 9 > > Cheers. > Petr > > > > > st 28. 8. 2024 v 0:45 odes?latel Francesca PANCOTTO via R-help < > r-help at r-project.org> napsal: > > > Dear Contributors, > > I have a problem with a database composed of many individuals for many > > periods, for which I need to perform a manipulation of data as follows. > > Here I report the procedure I need to do for the first 32 observations of > > the first period. > > > > > > cbind(VB1d[,1],s1id[,1]) > > [,1] [,2] > > [1,] 6 8 > > [2,] 9 5 > > [3,] NA 1 > > [4,] 5 6 > > [5,] NA 7 > > [6,] NA 2 > > [7,] 4 4 > > [8,] 2 7 > > [9,] 2 7 > > [10,] NA 3 > > [11,] NA 2 > > [12,] NA 4 > > [13,] 5 6 > > [14,] 9 5 > > [15,] NA 5 > > [16,] NA 6 > > [17,] 10 3 > > [18,] 7 2 > > [19,] 2 1 > > [20,] NA 7 > > [21,] 7 2 > > [22,] NA 8 > > [23,] NA 4 > > [24,] NA 5 > > [25,] NA 6 > > [26,] 2 1 > > [27,] 4 4 > > [28,] 6 8 > > [29,] 10 3 > > [30,] NA 3 > > [31,] NA 8 > > [32,] NA 1 > > > > > > In column s1id, I have numbers from 1 to 8, which are the id of 8 groups > , > > randomly mixed in the larger group of 32. > > For each group, I want the value that is reported for only to group > > members, to all the four group members. > > > > For example, value 8 in first row , second column, is group 8. The value > > for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to > > 8, I have 6. > > But in row 22, the value 8 of the second variable, reports a value NA. > > in each group is the same, only two values have the correct number, the > > other two are NA. > > I need that each group, identified by the values of the variable S1id, > > correctly report the number of variable VB1d that is present for just two > > group members. > > > > I hope my explanation is acceptable. > > The task appears complex to me right now, especially because I will need > to > > multiply this procedure for x12x14 similar databases. > > > > Anyone has ever encountered a similar problem? > > Thanks in advance for any help provided. > > > > ---------------------------------- > > > > Francesca Pancotto > > > > Associate Professor Political Economy > > > > University of Modena, Largo Santa Eufemia, 19, Modena > > > > Office Phone: +39 0522 523264 > > > > Web: *https://sites.google.com/view/francescapancotto/home > > <https://sites.google.com/view/francescapancotto/home>* > > > > ---------------------------------- > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > https://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > https://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]