Olu Ola
2017-May-09 19:14 UTC
[R] How to replace missing values by mean of subgroup of a group
Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column?for each food item by the mean price of the specific food item?for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated. | hhid | city | food | food price | | 101 | Paxton | rice | 10 | | 101 | Paxton | beans | 30 | | 101 | Paxton | flour | NA | | 101 | Paxton | eggs | 20 | | 102 | Paxton | rice | NA | | 102 | Paxton | beans | NA | | 102 | Paxton | flour | 34 | | 102 | Paxton | eggs | 21 | | 103 | Paxton | rice | 15 | | 103 | Paxton | beans | 28 | | 103 | Paxton | flour | 32 | | 103 | Paxton | eggs | NA | | 104 | Hull | rice | NA | | 104 | Hull | beans | 34 | | 104 | Hull | flour | NA | | 104 | Hull | eggs | 24 | | 105 | Hull | rice | 18 | | 105 | Hull | beans | 38 | | 105 | Hull | flour | 36 | | 105 | Hull | eggs | 26 | | 106 | Hull | rice | NA | | 106 | Hull | beans | NA | | 106 | Hull | flour | 40 | | 106 | Hull | eggs | NA | [[alternative HTML version deleted]]
Boris Steipe
2017-May-09 20:20 UTC
[R] How to replace missing values by mean of subgroup of a group
Pedestrian code, so you can analyze this easily. However entirely untested since I have no ambitionto recreate your input data as a data frame. This code assumes: - your data _is_ a data frame - the desired column is called food.price, not "food price" (cf. ?make.names ) # define a function that imputes NA values in the same city, for the same food imputeFoodPrice <- function(DF, i) { sel <- DF$city == DF$city[i] & DF$food == DF$food[i] imputed <- mean(DF$food.price[sel], na.rm = TRUE) if (is.nan(imputed)) { # careful, there might be no other match imputed <- NA } return(imputed) } # apply the function to replace NA values for (iMissing in which(is.na(myDF$food.price))) { myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing) } B.> On May 9, 2017, at 3:14 PM, Olu Ola via R-help <r-help at r-project.org> wrote: > > Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated. > > | hhid | city | food | food price | > | 101 | Paxton | rice | 10 | > | 101 | Paxton | beans | 30 | > | 101 | Paxton | flour | NA | > | 101 | Paxton | eggs | 20 | > | 102 | Paxton | rice | NA | > | 102 | Paxton | beans | NA | > | 102 | Paxton | flour | 34 | > | 102 | Paxton | eggs | 21 | > | 103 | Paxton | rice | 15 | > | 103 | Paxton | beans | 28 | > | 103 | Paxton | flour | 32 | > | 103 | Paxton | eggs | NA | > | 104 | Hull | rice | NA | > | 104 | Hull | beans | 34 | > | 104 | Hull | flour | NA | > | 104 | Hull | eggs | 24 | > | 105 | Hull | rice | 18 | > | 105 | Hull | beans | 38 | > | 105 | Hull | flour | 36 | > | 105 | Hull | eggs | 26 | > | 106 | Hull | rice | NA | > | 106 | Hull | beans | NA | > | 106 | Hull | flour | 40 | > | 106 | Hull | eggs | NA | > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Boris Steipe
2017-May-09 22:35 UTC
[R] How to replace missing values by mean of subgroup of a group
Great. I am CC'ing the list - this is important so that others who may come across this thread in the archives know that this question has been resolved. Cheers, B.> On May 9, 2017, at 5:18 PM, Olu Ola <oluola2011 at yahoo.com> wrote: > > Thank you!!! It worked. > > Regards, > Olu > > > On Tuesday, May 9, 2017 4:20 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote: > > > Pedestrian code, so you can analyze this easily. However entirely untested since I have no ambitionto recreate your input data as a data frame. This code assumes: > - your data _is_ a data frame > - the desired column is called food.price, not "food price" (cf. ?make.names ) > > # define a function that imputes NA values in the same city, for the same food > imputeFoodPrice <- function(DF, i) { > sel <- DF$city == DF$city[i] & DF$food == DF$food[i] > imputed <- mean(DF$food.price[sel], na.rm = TRUE) > if (is.nan(imputed)) { # careful, there might be no other match > imputed <- NA > } > return(imputed) > } > > > # apply the function to replace NA values > for (iMissing in which(is.na(myDF$food.price))) { > myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing) > } > > > B. > > > > > On May 9, 2017, at 3:14 PM, Olu Ola via R-help <r-help at r-project.org> wrote: > > > > Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated. > > > > | hhid | city | food | food price | > > | 101 | Paxton | rice | 10 | > > | 101 | Paxton | beans | 30 | > > | 101 | Paxton | flour | NA | > > | 101 | Paxton | eggs | 20 | > > | 102 | Paxton | rice | NA | > > | 102 | Paxton | beans | NA | > > | 102 | Paxton | flour | 34 | > > | 102 | Paxton | eggs | 21 | > > | 103 | Paxton | rice | 15 | > > | 103 | Paxton | beans | 28 | > > | 103 | Paxton | flour | 32 | > > | 103 | Paxton | eggs | NA | > > | 104 | Hull | rice | NA | > > | 104 | Hull | beans | 34 | > > | 104 | Hull | flour | NA | > > | 104 | Hull | eggs | 24 | > > | 105 | Hull | rice | 18 | > > | 105 | Hull | beans | 38 | > > | 105 | Hull | flour | 36 | > > | 105 | Hull | eggs | 26 | > > | 106 | Hull | rice | NA | > > | 106 | Hull | beans | NA | > > | 106 | Hull | flour | 40 | > > | 106 | Hull | eggs | NA | > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > >
Bert Gunter
2017-May-10 00:44 UTC
[R] How to replace missing values by mean of subgroup of a group
Of course, statistically, one should not do this. But that's another thread on another site. Cheers, Bert On May 9, 2017 3:36 PM, "Boris Steipe" <boris.steipe at utoronto.ca> wrote: Great. I am CC'ing the list - this is important so that others who may come across this thread in the archives know that this question has been resolved. Cheers, B.> On May 9, 2017, at 5:18 PM, Olu Ola <oluola2011 at yahoo.com> wrote: > > Thank you!!! It worked. > > Regards, > Olu > > > On Tuesday, May 9, 2017 4:20 PM, Boris Steipe <boris.steipe at utoronto.ca>wrote:> > > Pedestrian code, so you can analyze this easily. However entirelyuntested since I have no ambitionto recreate your input data as a data frame. This code assumes:> - your data _is_ a data frame > - the desired column is called food.price, not "food price" (cf.?make.names )> > # define a function that imputes NA values in the same city, for the samefood> imputeFoodPrice <- function(DF, i) { > sel <- DF$city == DF$city[i] & DF$food == DF$food[i] > imputed <- mean(DF$food.price[sel], na.rm = TRUE) > if (is.nan(imputed)) { # careful, there might be no other match > imputed <- NA > } > return(imputed) > } > > > # apply the function to replace NA values > for (iMissing in which(is.na(myDF$food.price))) { > myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing) > } > > > B. > > > > > On May 9, 2017, at 3:14 PM, Olu Ola via R-help <r-help at r-project.org>wrote:> > > > Hello,I have the following food data with some NA values in the foodprices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated.> > > > | hhid | city | food | food price | > > | 101 | Paxton | rice | 10 | > > | 101 | Paxton | beans | 30 | > > | 101 | Paxton | flour | NA | > > | 101 | Paxton | eggs | 20 | > > | 102 | Paxton | rice | NA | > > | 102 | Paxton | beans | NA | > > | 102 | Paxton | flour | 34 | > > | 102 | Paxton | eggs | 21 | > > | 103 | Paxton | rice | 15 | > > | 103 | Paxton | beans | 28 | > > | 103 | Paxton | flour | 32 | > > | 103 | Paxton | eggs | NA | > > | 104 | Hull | rice | NA | > > | 104 | Hull | beans | 34 | > > | 104 | Hull | flour | NA | > > | 104 | Hull | eggs | 24 | > > | 105 | Hull | rice | 18 | > > | 105 | Hull | beans | 38 | > > | 105 | Hull | flour | 36 | > > | 105 | Hull | eggs | 26 | > > | 106 | Hull | rice | NA | > > | 106 | Hull | beans | NA | > > | 106 | Hull | flour | 40 | > > | 106 | Hull | eggs | NA | > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > >______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
Boris Steipe
2017-May-10 00:59 UTC
[R] How to replace missing values by mean of subgroup of a group
Of course, and I neglected to point this out: "The other thread" would be how to properly impute missing values, "The other site" could be https://stats.stackexchange.com/ ... there is _lots_ of information available if you search for it. An applicable R Package is MICE and you can find an introduction here: https://datascienceplus.com/imputing-missing-data-with-r-mice-package/ (Besides the usual documentation.) B.> On May 9, 2017, at 8:44 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote: > > Of course, statistically, one should not do this. But that's another thread on another site. > > Cheers, > Bert > > > > On May 9, 2017 3:36 PM, "Boris Steipe" <boris.steipe at utoronto.ca> wrote: > Great. > > I am CC'ing the list - this is important so that others who may come across this thread in the archives know that this question has been resolved. > > Cheers, > > B. > > > > On May 9, 2017, at 5:18 PM, Olu Ola <oluola2011 at yahoo.com> wrote: > > > > Thank you!!! It worked. > > > > Regards, > > Olu > > > > > > On Tuesday, May 9, 2017 4:20 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote: > > > > > > Pedestrian code, so you can analyze this easily. However entirely untested since I have no ambitionto recreate your input data as a data frame. This code assumes: > > - your data _is_ a data frame > > - the desired column is called food.price, not "food price" (cf. ?make.names ) > > > > # define a function that imputes NA values in the same city, for the same food > > imputeFoodPrice <- function(DF, i) { > > sel <- DF$city == DF$city[i] & DF$food == DF$food[i] > > imputed <- mean(DF$food.price[sel], na.rm = TRUE) > > if (is.nan(imputed)) { # careful, there might be no other match > > imputed <- NA > > } > > return(imputed) > > } > > > > > > # apply the function to replace NA values > > for (iMissing in which(is.na(myDF$food.price))) { > > myDF$food.price[iMissing] <- imputeFoodPrice(myDF, iMissing) > > } > > > > > > B. > > > > > > > > > On May 9, 2017, at 3:14 PM, Olu Ola via R-help <r-help at r-project.org> wrote: > > > > > > Hello,I have the following food data with some NA values in the food prices. I will like to replace the NA values in the food price column for each food item by the mean price of the specific food item for each city. For example, the price of bean for the household with hhid 102 in the data set is missing. I will like to replace the missing value with the mean price of bean for the households living in Paxton city (that is households 101 and 103). the data set is presented below. Any help will be greatly appreciated. > > > > > > | hhid | city | food | food price | > > > | 101 | Paxton | rice | 10 | > > > | 101 | Paxton | beans | 30 | > > > | 101 | Paxton | flour | NA | > > > | 101 | Paxton | eggs | 20 | > > > | 102 | Paxton | rice | NA | > > > | 102 | Paxton | beans | NA | > > > | 102 | Paxton | flour | 34 | > > > | 102 | Paxton | eggs | 21 | > > > | 103 | Paxton | rice | 15 | > > > | 103 | Paxton | beans | 28 | > > > | 103 | Paxton | flour | 32 | > > > | 103 | Paxton | eggs | NA | > > > | 104 | Hull | rice | NA | > > > | 104 | Hull | beans | 34 | > > > | 104 | Hull | flour | NA | > > > | 104 | Hull | eggs | 24 | > > > | 105 | Hull | rice | 18 | > > > | 105 | Hull | beans | 38 | > > > | 105 | Hull | flour | 36 | > > > | 105 | Hull | eggs | 26 | > > > | 106 | Hull | rice | NA | > > > | 106 | Hull | beans | NA | > > > | 106 | Hull | flour | 40 | > > > | 106 | Hull | eggs | NA | > > > > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >