thr3ads.net - R help - [R] remove rows with infinite/nan values from a zoo dataset [Sep 2013]

arun

2013-Sep-03 06:47 UTC

[R] remove rows with infinite/nan values from a zoo dataset

Hi,
Please dput() the example dataset. When I read from the one shown below, it
looks a bit altered.

library(zoo)
dat1<- read.zoo(text="2009-07-15,#N/A N/A,#N/A N/A,18.96858
2009-07-16,20.30685,20.40664,#N/A N/A
2009-07-17,20.78813,20.03991,20.40664
2009-07-20,21.41278,21.41278,20.03991
2009-07-21,22.9963,22.98397,21.41278
2009-07-22,23.06443,23.01112,22.98397
2009-07-23,23.45905,24.72232,23.01112
2009-07-24,24.89291,25.56603,24.72232
2009-07-27,25.38929,24.80535,25.56603
2009-07-28,25.26712,25.65566,24.80535
2009-07-29,25.83884,24.98163,25.65566
2009-07-30,#N/A N/A,#N/A N/A,24.98163
2009-08-03,25.25553,25.93297,#N/A N/A
2009-08-04,26.02464,25.49159,25.93297
",sep=",",header=FALSE,FUN=as.Date,format="%Y-%m-%d",fill=TRUE)


dput(dat1)
structure(c(NA, 20.30685, 20.78813, 21.41278, 22.9963, 23.06443, 
23.45905, 24.89291, 25.38929, 25.26712, 25.83884, NA, 25.25553, 
26.02464, NA, 20.40664, 20.03991, 21.41278, 22.98397, 23.01112, 
24.72232, 25.56603, 24.80535, 25.65566, 24.98163, NA, 25.93297, 
25.49159, NA, NA, 20.40664, 20.03991, 21.41278, 22.98397, 23.01112, 
24.72232, 25.56603, 24.80535, 25.65566, NA, NA, 25.93297), .Dim = c(14L, 
3L), .Dimnames = list(NULL, c("V2", "V3", "V4")),
index = structure(c(14440,
14441, 14442, 14445, 14446, 14447, 14448, 14449, 14452, 14453, 
14454, 14455, 14459, 14460), class = "Date"), class = "zoo")
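A likely reason the data looks altered: read.zoo() passes extra arguments on to read.table(), whose default comment.char is "#", so each "#N/A N/A" and everything after it on the line is discarded, and fill=TRUE pads the row with NA. That would explain why 18.96858 on the first row is missing from the dput() output above. A minimal sketch, assuming "#N/A N/A" is the only missing-value marker in the real file, is to switch off comment handling and declare the marker in na.strings (only three rows of the example are used here):

```r
library(zoo)

# Three rows of the example data; "#N/A N/A" marks a missing value.
txt <- "2009-07-15,#N/A N/A,#N/A N/A,18.96858
2009-07-16,20.30685,20.40664,#N/A N/A
2009-07-17,20.78813,20.03991,20.40664"

# comment.char = "" stops read.table() from treating "#" as a comment;
# na.strings turns the marker into NA while keeping the other fields.
dat <- read.zoo(text = txt, sep = ",", header = FALSE, format = "%Y-%m-%d",
                na.strings = "#N/A N/A", comment.char = "")

coredata(dat)[1, 3]  # the 18.96858 on the first row is kept this time
sum(is.na(dat))      # 3 missing cells in these three rows
```

With the file read this way, fill=TRUE should no longer be needed, because every row keeps all of its fields.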


dat2<- dat1[!rowSums(is.na(dat1)),]
dat2
#                 V2       V3       V4
#2009-07-17 20.78813 20.03991 20.40664
#2009-07-20 21.41278 21.41278 20.03991
#2009-07-21 22.99630 22.98397 21.41278
#2009-07-22 23.06443 23.01112 22.98397
#2009-07-23 23.45905 24.72232 23.01112
#2009-07-24 24.89291 25.56603 24.72232
#2009-07-27 25.38929 24.80535 25.56603
#2009-07-28 25.26712 25.65566 24.80535
#2009-07-29 25.83884 24.98163 25.65566
#2009-08-04 26.02464 25.49159 25.93297
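The idiom above works because is.na() returns a logical matrix, rowSums() counts the NA cells in each row, and ! turns a count of 0 into TRUE. A small sketch on made-up values (the object z is hypothetical, not the poster's data) shows two equivalent spellings that some readers may find easier to scan:

```r
library(zoo)

# Hypothetical 4 x 2 zoo series with two incomplete rows.
z <- zoo(cbind(a = c(1, NA, 3, 4), b = c(10, 20, NA, 40)),
         order.by = as.Date("2009-07-15") + 0:3)

keep1 <- !rowSums(is.na(z))            # idiom used in the post
keep2 <- rowSums(is.na(z)) == 0        # same test, spelled out
keep3 <- complete.cases(coredata(z))   # base R helper on the raw matrix

z[keep2, ]                             # rows 1 and 4 remain
```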


dat2[1,2]<- Inf
dat2[5,3]<- -Inf


dat2[rowSums(is.finite(dat2))==ncol(dat2),]
#                 V2       V3       V4
#2009-07-20 21.41278 21.41278 20.03991
#2009-07-21 22.99630 22.98397 21.41278
#2009-07-22 23.06443 23.01112 22.98397
#2009-07-24 24.89291 25.56603 24.72232
#2009-07-27 25.38929 24.80535 25.56603
#2009-07-28 25.26712 25.65566 24.80535
#2009-07-29 25.83884 24.98163 25.65566
#2009-08-04 26.02464 25.49159 25.93297
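Since is.finite() is FALSE for NA and NaN as well as for Inf and -Inf, the same filter could in principle be applied once to the raw series, without a separate NA step. A sketch with made-up values (z is hypothetical):

```r
library(zoo)

is.finite(c(1, NA, NaN, Inf, -Inf))
# [1]  TRUE FALSE FALSE FALSE FALSE

# Hypothetical series mixing NA, NaN and infinite values.
z <- zoo(cbind(a = c(1, NA, 3, 4, 5), b = c(10, 20, Inf, NaN, 50)),
         order.by = as.Date("2009-07-15") + 0:4)

# Keep only the rows in which every cell is finite.
clean <- z[rowSums(is.finite(z)) == ncol(z), ]
clean
```

Only the first and last rows survive here, because each of the other rows contains an NA, Inf or NaN.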


A.K.

Hi There, 

I have a dataset with many rows and a few columns, as follows: 

2009-07-15	#N/A N/A	#N/A N/A	18.96858 
2009-07-16	20.30685	20.40664	#N/A N/A 
2009-07-17	20.78813	20.03991	20.40664 
2009-07-20	21.41278	21.41278	20.03991 
2009-07-21	22.9963	22.98397	21.41278 
2009-07-22	23.06443	23.01112	22.98397 
2009-07-23	23.45905	24.72232	23.01112 
2009-07-24	24.89291	25.56603	24.72232 
2009-07-27	25.38929	24.80535	25.56603 
2009-07-28	25.26712	25.65566	24.80535 
2009-07-29	25.83884	24.98163	25.65566 
2009-07-30	#N/A N/A	#N/A N/A	24.98163 
2009-08-03	25.25553	25.93297	#N/A N/A 
2009-08-04	26.02464	25.49159	25.93297 

The class of the dataset is "zoo". My question might be stupid, 
but could anyone suggest a way to remove the rows with #N/A values? 
I tried the "rapply" command, but it didn't work because of the data class.

Btw, how about the "Inf" values? 

Thank you in advance!

arun

2013-Sep-03 15:49 UTC

[R] remove rows with infinite/nan values from a zoo dataset

Hi,

No problem.

In my previous post, I showed how to dput() your example dataset. Please use
dput() in the future.
vec1<-
c(3.369247e-04,0.000000e+00,9.022183e-04,0.000000e+00,-1.105819e-04,-Inf,1.191271e-04,1.681718e-04,NaN,1.150126e-04,1.031037e-03,2.710993e-04)

indx<-seq(as.Date("2009-09-01"),as.Date("2009-09-17"),by=1)
indx1<-indx[-c(5:7,12:13)]
library(zoo)
z1<- zoo(vec1,order.by=indx1)
sum(z1,na.rm=TRUE) #without removing the Inf
#[1] -Inf


sum(z1[is.finite(z1)],na.rm=TRUE)
#[1] 0.002833009


#or just
sum(z1[is.finite(z1)])
#[1] 0.002833009
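
The reason na.rm=TRUE alone is not enough: it drops NA and NaN only, so a single -Inf still dominates the sum. If this comes up repeatedly, the filter can be wrapped in a small helper. finite_sum below is a hypothetical name, not a function from base R or zoo, and the sketch uses a plain numeric vector for brevity:

```r
# Hypothetical helper: sum only the finite entries of x,
# using the same is.finite() subsetting shown in the post.
finite_sum <- function(x) {
  sum(x[is.finite(x)])
}

v <- c(1.5, NaN, -Inf, 2.5, NA)
sum(v, na.rm = TRUE)   # -Inf: na.rm leaves the infinite value in
finite_sum(v)          # 4
```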
A.K.





Thank you for your reply A.K. 

Sorry for the confusion. The first question should have been about removing
#N/A N/A values when reading a csv file; the example in the original post
was copied directly from a csv spreadsheet, which I read with the code

prices=read.zoo("C:\\Users\\Desktop\\\\awc_au.csv",header=TRUE,sep=",",format="%Y-%m-%d")

The follow-up question is about removing Inf/NaN values from a zoo data set. 
After some calculation, the new zoo data set is as follows: 
   2009-09-01    2009-09-02    2009-09-03    2009-09-04    2009-09-08    2009-09-09
 3.369247e-04  0.000000e+00  9.022183e-04  0.000000e+00 -1.105819e-04          -Inf
   2009-09-10    2009-09-11    2009-09-14    2009-09-15    2009-09-16    2009-09-17
 1.191271e-04  1.681718e-04           NaN  1.150126e-04  1.031037e-03  2.710993e-04

I need to sum them up, so I used "sum(Z, na.rm=TRUE)", which removes the NaN
values but not the Inf/-Inf values.

Hope it is clear to you. 

Cheers, 
R.L 