Hi all,

I have a data frame with three variables (plus an ID column). Some of the variables have missing values, represented by NA, and I want to replace each missing value with the mean of that variable. In this sample data, variables y and z have missing values. The means of y and z are 152.25 and 359.5, respectively. I want to replace the missing values with the respective mean, rounded to the nearest whole number.

DF1 <- read.table(header=TRUE, text='ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360')

mean x = 36.2
mean y = 152.25
mean z = 359.5

Desired output:
ID1  x   y   z
  1 25 122 352
  2 30 135 376
  3 40 152 350
  4 26 157 360
  5 60 195 360

Thank you in advance
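For reference, the means and rounded fill values quoted above can be checked in base R. This is a minimal sketch that assumes the DF1 definition from the question; colMeans() with na.rm = TRUE averages only the observed values in each column.

```r
# Reproduce the data frame from the question
DF1 <- read.table(header = TRUE, text = "ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360")

# Column means over the observed (non-NA) values only:
# x = 36.2, y = 152.25, z = 359.5
means <- colMeans(DF1[, c("x", "y", "z")], na.rm = TRUE)

# The values that would fill the gaps, rounded to whole numbers: 36, 152, 360
fill <- round(means)
print(fill)
```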
On 27/04/17 12:45, Val wrote:
> [...]

This is pretty basic. You really ought to learn a bit more about R if you are going to use R. That being said, try:

newDF1 <- as.data.frame(lapply(DF1, function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)  # fill each NA with the column mean
    x
}))
# Note: this fills with the unrounded mean (152.25, 359.5); wrap mean() in round() for whole numbers.

There may be sexier ways of accomplishing your goal, but this should work.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
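A minimal variant of the lapply() approach above that also rounds, as the question asks. Assigning into DF1[] instead of wrapping the result in as.data.frame() is only a stylistic choice here: it keeps DF1 a data frame with its original column and row names.

```r
DF1 <- read.table(header = TRUE, text = "ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360")

# Replace every column by a copy whose NAs hold the rounded column mean;
# DF1[] <- ... keeps the data frame structure instead of returning a plain list
DF1[] <- lapply(DF1, function(x) {
  x[is.na(x)] <- round(mean(x, na.rm = TRUE))
  x
})
print(DF1)
```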
Hi Val,

You could do this by nesting two for loops and defining a function that returns the mean of the column when the value is "NA".

df1 <- data.frame(x = c(25, 30, 40, 26, 60),
                  y = c(122, 135, NA, 157, 195),
                  z = c(352, 376, 350, NA, 360))
df2 <- df1[0, ]  # empty data frame with the same columns; the loop adds the rows

means <- sapply(df1, mean, na.rm = TRUE)  # column means of the observed values

# Return the fallback y (the column mean) if x is NA, otherwise x itself
return_mean_if_NA <- function(x, y) {
  if (is.na(x)) y else x
}

for (i in 1:ncol(df1)) {
  for (j in 1:nrow(df1)) {
    df2[j, i] <- return_mean_if_NA(df1[j, i], means[i])
  }
}

Hope this helps!

Regards,
Bo Lin

> On 27 Apr 2017, at 8:45 AM, Val <valkremk at gmail.com> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
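The nested loop above visits every cell and grows df2 from an empty df1[0, ]. A minimal sketch of a per-column loop that starts from a full copy instead; the cols vector is a hypothetical choice that leaves an ID column such as ID1 untouched.

```r
df1 <- data.frame(ID1 = 1:5,
                  x = c(25, 30, 40, 26, 60),
                  y = c(122, 135, NA, 157, 195),
                  z = c(352, 376, 350, NA, 360))

# Hypothetical: impute only these columns, leave the ID column alone
cols <- c("x", "y", "z")

df2 <- df1                       # full copy, so no rows need to be grown
for (col in cols) {
  na_idx <- is.na(df2[[col]])    # which cells of this column are missing
  df2[[col]][na_idx] <- round(mean(df2[[col]], na.rm = TRUE))
}
print(df2)
```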
Apologies, I re-read the question and realised you want the missing values replaced by the mean rounded to the nearest whole number. Here's the code in full.

df1 <- data.frame(x = c(25, 30, 40, 26, 60),
                  y = c(122, 135, NA, 157, 195),
                  z = c(352, 376, 350, NA, 360))

means <- sapply(df1, mean, na.rm = TRUE)

# Return the fallback y (the column mean) if x is NA, otherwise x itself
return_mean_if_NA <- function(x, y) {
  if (is.na(x)) y else x
}

df2 <- df1[0, ]
for (i in 1:ncol(df1)) {
  for (j in 1:nrow(df1)) {
    # round(..., 0) is applied to every cell; the observed values are already whole numbers
    df2[j, i] <- round(return_mean_if_NA(df1[j, i], means[i]), 0)
  }
}

HTH.

Regards,
Bo Lin

> On 27 Apr 2017, at 9:19 AM, Ng Bo Lin <ngbolin91 at gmail.com> wrote:
> [...]
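Since the rounding step matters here, note how R's round() treats exact halves: it rounds them to the even neighbour (IEC 60559), so round(359.5) gives 360 but round(2.5) gives 2. For the values in this question both conventions agree. The round_half_up() helper below is hypothetical, shown only in case conventional "half up" rounding of non-negative values is wanted.

```r
# Exact halves go to the even neighbour in R
print(round(c(152.25, 359.5)))   # 152 360
print(round(c(0.5, 1.5, 2.5)))   # 0 2 2

# Hypothetical helper: round halves up (intended for non-negative values)
round_half_up <- function(x) floor(x + 0.5)
print(round_half_up(c(0.5, 1.5, 2.5)))   # 1 2 3
```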
Hi,

Not sure if it is the sexiest, but the zoo package has several functions for replacing missing values, e.g.

library(zoo)  # provides na.aggregate()
as.data.frame(lapply(DF1, function(x) na.aggregate(x, FUN = function(y) round(mean(y)))))

Cheers
Petr

> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Rolf Turner
> Sent: Thursday, April 27, 2017 3:17 AM
> To: Val <valkremk at gmail.com>
> Cc: r-help at R-project.org (r-help at r-project.org) <r-help at r-project.org>
> Subject: Re: [R] [FORGED] missing and replace
> [...]
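zoo's na.aggregate() fills each NA with FUN applied to the non-missing values (over a single group by default). For readers without zoo, a rough base-R analogue of that idea is sketched below; fill_na_with() and its stat argument are hypothetical names, not part of zoo. The argument is deliberately not called FUN: lapply() has its own FUN argument and would capture a same-named extra argument.

```r
DF1 <- read.table(header = TRUE, text = "ID1 x y z
1 25 122 352
2 30 135 376
3 40 NA 350
4 26 157 NA
5 60 195 360")

# Hypothetical helper (not from zoo): replace NAs in x by stat() of the observed values
fill_na_with <- function(x, stat = mean) {
  replace(x, is.na(x), stat(x[!is.na(x)]))
}

# Rounded mean, as in the zoo one-liner above
by_mean <- as.data.frame(lapply(DF1, fill_na_with, stat = function(v) round(mean(v))))

# Any other summary can be swapped in, e.g. the median
by_median <- as.data.frame(lapply(DF1, fill_na_with, stat = median))
print(by_median)
```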
Dear All,

Replacing missing values with means is generally not a good idea:

"Perhaps the easiest way to impute is to replace each missing value with the mean of the observed values for that variable. Unfortunately, this strategy can severely distort the distribution for this variable, leading to complications with summary measures including, notably, underestimates of the standard deviation. Moreover, mean imputation distorts relationships between variables by 'pulling' estimates of the correlation toward zero."

That's from Gelman and Hill -- more here: http://www.stat.columbia.edu/~gelman/arm/missing.pdf

best,

Fraser

________________________________________
From: Val [valkremk at gmail.com]
Sent: Wednesday, April 26, 2017 8:45 PM
To: r-help at R-project.org (r-help at r-project.org)
Subject: [R] missing and replace
[...]
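To make the standard-deviation point in the quotation concrete, a small sketch using the observed y values from the question. It fills with the unrounded mean so the arithmetic is exact: the filled value sits exactly at the mean, so it adds nothing to the sum of squared deviations while the divisor grows.

```r
y_obs <- c(122, 135, 157, 195)               # observed y values from the example
y_imp <- c(122, 135, mean(y_obs), 157, 195)  # the NA replaced by the (unrounded) mean

print(sd(y_obs))
print(sd(y_imp))   # smaller than sd(y_obs)

# With n observed and k mean-filled values, the sum of squared deviations is
# unchanged while the divisor grows from n - 1 to n + k - 1, so
# sd(y_imp) = sd(y_obs) * sqrt((n - 1) / (n + k - 1)); here n = 4, k = 1.
ratio <- sd(y_imp) / sd(y_obs)
print(all.equal(ratio, sqrt(3 / 4)))
```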