thr3ads.net - R help - [R] Applying HP and BP filters to calculate potential GDP (de-trending) [Mar 2022]

Admire Tarisirayi Chirume

2021-Oct-18 12:38 UTC

[R] Replacing NA s with the average

Good day colleagues. Below is the CSV file I am using in my analysis.

household.id  hd17.perm  hd17employ  health.exp  total.food.exp  total.nfood.exp
1             2          yes         1654        23654           23655
2             2          yes         NA          NA              65984
3             6          no          2547        123311          52416
4             8          NA          2365        13648           12544
5             6          NA          1254        36549           12365
6             8          yes         1236        236541          26522
7             8          no          NA          13264           23698

I created a data frame from the above CSV file as follows:

wbpractice <- read.csv("world_practice.csv")

I am now cleaning the data and trying to replace all missing values with
the averages of the respective columns.

The dimensions of the actual dataset are:

dim(wbpractice)
[1] 31998    6

I ran the following script, but I got some error messages:

for(i in 1:ncol(wbpractice)){
    wbpractice[is.na(wbpractice[,i]), i] <- mean(wbpractice[,i], na.rm = TRUE)
}

Can anyone help me replace all NAs with average values in my data frame?



PIKAL Petr

2021-Oct-18 14:43 UTC

[R] Replacing NA s with the average

Hi.

Sometimes it is worth trying Google first. Searching for

R fill NA with average

led to

https://stackoverflow.com/questions/25835643/replace-missing-values-with-column-mean

and from that

library(zoo)
na.aggregate(DF)

will replace all numeric NA values with column averages.
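
Since the data frame in the question also has a non-numeric column (hd17employ), one cautious variant is to hand only the numeric columns to na.aggregate(). The sketch below assumes the zoo package is installed; the small data frame DF is made up for illustration and is not the poster's data.

```r
library(zoo)  # assumed to be installed; provides na.aggregate()

# Hypothetical toy data: two numeric columns and one character column
DF <- data.frame(
  a = c(1, NA, 3),
  b = c(10, 20, NA),
  g = c("yes", NA, "no"),
  stringsAsFactors = FALSE
)

# Restrict the replacement to the numeric columns
num_cols <- sapply(DF, is.numeric)
DF[num_cols] <- na.aggregate(DF[num_cols])  # default FUN is mean

DF
```

The NA in the character column g is left untouched, because a mean is not meaningful there.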

Cheers
Petr

Rui Barradas

2021-Oct-18 14:58 UTC

[R] Replacing NA s with the average

Hello,

Please don't post in HTML, post in plain text like the posting guide 
asks for. Your data is unreadable.

Here are two test data sets, one with all columns numeric, the other 
with some columns numeric.


wbpractice1 <- mtcars  # all columns are numeric
wbpractice2 <- iris    # not all columns are numeric

# Set a random 25% of the values in each column to NA
wbpractice1[] <- lapply(wbpractice1, \(x){
   is.na(x) <- sample(length(x), 0.25*length(x))
   x
})
# Same for iris, skipping column 5 (Species, a factor)
wbpractice2[-5] <- lapply(wbpractice2[-5], \(x){
   is.na(x) <- sample(length(x), 0.25*length(x))
   x
})


#---

If all columns are numeric, just lapply an anonymous function over them
that replaces the values where is.na() is TRUE with the column mean.


wbpractice1[] <- lapply(wbpractice1, \(x){
   x[is.na(x)] <- mean(x, na.rm = TRUE)
   x
})


But if some columns are not numeric, first determine which ones are
numeric, then apply the same code to that subset.


num_cols <- sapply(wbpractice2, is.numeric)
wbpractice2[num_cols] <- lapply(wbpractice2[num_cols], \(x){
   x[is.na(x)] <- mean(x, na.rm = TRUE)
   x
})
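
Applied to the seven rows shown in the original post, the same idea might look as follows. This is only a sketch: the data frame is typed in by hand from the post rather than read from world_practice.csv, and function(x) is used instead of \(x) so that it also runs on R versions before 4.1. It also shows a likely reason the original loop complained: mean() on the character column hd17employ returns NA with a warning.

```r
# The seven example rows from the original post, entered by hand
wbpractice <- data.frame(
  household.id    = 1:7,
  hd17.perm       = c(2, 2, 6, 8, 6, 8, 8),
  hd17employ      = c("yes", "yes", "no", NA, NA, "yes", "no"),
  health.exp      = c(1654, NA, 2547, 2365, 1254, 1236, NA),
  total.food.exp  = c(23654, NA, 123311, 13648, 36549, 236541, 13264),
  total.nfood.exp = c(23655, 65984, 52416, 12544, 12365, 26522, 23698),
  stringsAsFactors = FALSE
)

# mean() is not defined for character data: it warns and returns NA
suppressWarnings(mean(wbpractice$hd17employ, na.rm = TRUE))

# Replace NAs with the column mean in the numeric columns only
num_cols <- sapply(wbpractice, is.numeric)
wbpractice[num_cols] <- lapply(wbpractice[num_cols], function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

wbpractice$health.exp[2]       # 1811.2  (mean of the 5 observed values)
wbpractice$total.food.exp[2]   # 74494.5 (mean of the 6 observed values)
```

The two NAs in hd17employ remain, since averaging "yes" and "no" makes no sense; they would need a different treatment.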


And here are dplyr solutions.


library(dplyr)

wbpractice1 %>%
   mutate(across(everything(), ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

wbpractice2 %>%
   mutate(across(where(is.numeric), ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))



Hope this helps,

Rui Barradas



Richard O'Keefe

2021-Oct-19 01:22 UTC

[R] Replacing NA s with the average

It *sounds* as though you are trying to impute missing data.
There are better approaches than just plugging in means.
You might want to look into CALIBERrfimpute or missForest.
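
One reason plain mean substitution is often discouraged is that it shrinks the spread of the variable: the imputed points sit exactly at the mean, so the sample variance goes down. A small base R illustration, using the health.exp values from the original post (typed in by hand):

```r
health.exp <- c(1654, NA, 2547, 2365, 1254, 1236, NA)

observed <- health.exp[!is.na(health.exp)]
imputed  <- replace(health.exp, is.na(health.exp), mean(observed))

var(observed)  # variance of the 5 observed values
var(imputed)   # smaller: the 2 imputed points add nothing to the sum of squares

# With n_obs observed out of n values, the variance shrinks by (n_obs - 1) / (n - 1)
var(imputed) / var(observed)   # 4/6
```

Anything computed from the imputed column afterwards (standard errors, correlations) inherits this understated variability.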


Admire Tarisirayi Chirume

2022-Mar-03 07:43 UTC

[R] Applying HP and BP filters to calculate potential GDP (de-trending)

Good day, I hope this finds you well. Kindly assist me with applying the HP
and BP filters to the following data set.

Thank you in advance.

Admire
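
No data set came through with this message, so the following is only a sketch on simulated data. It implements the Hodrick-Prescott (HP) filter directly in base R by solving (I + lambda * D'D) trend = y, where D is the second-difference matrix. The series log_gdp, the helper name hp_filter, and the choice lambda = 1600 (the value conventionally used for quarterly data) are assumptions for illustration. The band-pass (BP) filter is not covered here.

```r
# Hodrick-Prescott filter: returns the smooth trend and the cyclical part
hp_filter <- function(y, lambda = 1600) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)   # (n - 2) x n second-difference matrix
  trend <- solve(diag(n) + lambda * crossprod(D), y)
  list(trend = as.numeric(trend), cycle = as.numeric(y - trend))
}

# Simulated quarterly log GDP (hypothetical data): linear growth plus a cycle
set.seed(1)
n <- 80
log_gdp <- 10 + 0.005 * seq_len(n) + 0.02 * sin(2 * pi * seq_len(n) / 20) +
  rnorm(n, sd = 0.005)

hp <- hp_filter(log_gdp, lambda = 1600)

potential  <- hp$trend   # estimate of (log) potential GDP
output_gap <- hp$cycle   # deviation of log GDP from its trend

head(cbind(log_gdp, potential, output_gap))
```

With real data, log_gdp would be replaced by the log of the observed GDP series, and lambda would be chosen to match the data frequency.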
