Hi,

temp3 <- read.table(text="
ID CTIME WEIGHT
HM001 1223 24.0
HM001 1224 25.2
HM001 1225 23.1
HM001 1226 NA
HM001 1227 32.1
HM001 1228 32.4
HM001 1229 1323.2
HM001 1230 27.4
HM001 1231 22.4236 #changed here to test the previous solution
",sep="",header=TRUE,stringsAsFactors=FALSE)
tempnew <- na.omit(temp3)
grep("\\d{4}",temp3$WEIGHT)
#[1] 7 9 #not correct
temp3[,3][grep("^\\d{4}",temp3$WEIGHT)] <- NA #match values with 4 digits before the decimal point
tail(temp3)
#     ID CTIME  WEIGHT
#4 HM001  1226      NA
#5 HM001  1227 32.1000
#6 HM001  1228 32.4000
#7 HM001  1229      NA
#8 HM001  1230 27.4000
#9 HM001  1231 22.4236

#Based on the variance, you could set some limit, for example 50, and use:
tempnew$WEIGHT <- ifelse(tempnew$WEIGHT>50,NA,tempnew$WEIGHT)
A.K.

________________________________
From: <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 2:20 AM
Subject: Re: Thank you your help.

Thank you for your reply again. Your understanding is exactly right. I attached a picture that shows the dataset. 'weight' is a dependent variable, and CTIME means hour/minute. This data will be accumulated for years. As for the accepted variance range, it would be from 10 to 50. Actually, I am a Java programmer, so I am new to the R language. Can you give me some examples of how to use the grep function?

-----Original Message-----
From: "arun" <smartpink111 at yahoo.com>
To: "jamansymptom at naver.com" <jamansymptom at naver.com>
Cc:
Sent: 2013-01-28 15:27:12
Subject: Re: Thank you your help.

Hi,
Your original post said "...it was evaluated from 20kg -40kg. But By some errors, it is evaluated 2000 kg". So, my understanding was that you occasionally get readings of 2000, or 2000-4000, in place of 20-40 due to some misreading. If your dataset contains observed values, strange values and NA, and you want to replace the strange values with NA, could you mention the range of the strange values? If the strange values fall anywhere between 1000 and 9999, they should get replaced with the ?grep() solution. But if it depends upon something else, you need to specify. Also, regarding the variance, what is your accepted range of variance?
A.K.

----- Original Message -----
From: "jamansymptom at naver.com" <jamansymptom at naver.com>
To: smartpink111 at yahoo.com
Cc:
Sent: Monday, January 28, 2013 1:15 AM
Subject: Thank you your help.

Thank you for answering my question. It is not exactly what I want; I should have described the situation in detail. There is a sensor that reads data every minute. That data is accumulated and becomes part of the dataset. The dataset contains observed values, strange values and NA. That is, I cannot predict where or when a strange value will occur. I need a procedure like the one below.

1. using a method, set the range of variance
2. using a for(i) statement, check whether variance(weight) is in the range.
3. when the variance is out of range, set weight[i] to NA.

Thank you.
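
The three steps requested in the first message (set a range, check each weight, replace out-of-range values with NA) can probably be done without an explicit for(i) loop, since R comparisons are vectorized. What follows is only a minimal sketch and not code from the thread: the function name flag_out_of_range is hypothetical, and the default bounds of 10 and 50 are an assumption taken from the accepted range mentioned in the reply above.

```r
# Hypothetical helper (not from the thread): replace values outside
# [lower, upper] with NA. Existing NA values are left untouched.
flag_out_of_range <- function(x, lower = 10, upper = 50) {
  out_of_range <- !is.na(x) & (x < lower | x > upper)
  x[out_of_range] <- NA
  x
}

# WEIGHT column of the temp3 example used in the thread
weights <- c(24.0, 25.2, 23.1, NA, 32.1, 32.4, 1323.2, 27.4, 22.4236)
cleaned <- flag_out_of_range(weights)
print(cleaned)
```

Unlike the grep() pattern, this works on the numeric values directly, so it does not depend on how a number is printed.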
HI,
How do you want to combine the results? It looks like the 5 datasets are list elements.
If I take the first three list elements,

imput1_2_3 <- list(
imp1=structure(list(ID = rep("HM001", 15), CTIME = 1223:1237,
 WEIGHT = c(24.9, 25.2, 25.5, 25.24132, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.1377, 31.17251, 32.4, 33.7, 34.3)),
 .Names = c("ID", "CTIME", "WEIGHT"), class = "data.frame", row.names = as.character(1:15)),
imp2=structure(list(ID = rep("HM001", 15), CTIME = 1223:1237,
 WEIGHT = c(24.9, 25.2, 25.5, 25.54828, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 29.8977, 31.35045, 32.4, 33.7, 34.3)),
 .Names = c("ID", "CTIME", "WEIGHT"), class = "data.frame", row.names = as.character(1:15)),
imp3=structure(list(ID = rep("HM001", 15), CTIME = 1223:1237,
 WEIGHT = c(24.9, 25.2, 25.5, 25.46838, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.88185, 31.57952, 32.4, 33.7, 34.3)),
 .Names = c("ID", "CTIME", "WEIGHT"), class = "data.frame", row.names = as.character(1:15)))

#It could be combined by:
do.call(rbind, imput1_2_3)
#But if you do this, the total number of rows will be the sum of the number of rows of each dataset.

I guess you want something like this:
res <- Reduce(function(...) merge(...,by=c("ID","CTIME")),imput1_2_3)
names(res)[3:5] <- paste("WEIGHT","IMP",1:3,sep="")
res
#      ID CTIME WEIGHTIMP1 WEIGHTIMP2 WEIGHTIMP3
#1  HM001  1223   24.90000   24.90000   24.90000
#2  HM001  1224   25.20000   25.20000   25.20000
#3  HM001  1225   25.50000   25.50000   25.50000
#4  HM001  1226   25.24132   25.54828   25.46838
#5  HM001  1227   25.70000   25.70000   25.70000
#6  HM001  1228   27.10000   27.10000   27.10000
#7  HM001  1229   27.30000   27.30000   27.30000
#8  HM001  1230   27.40000   27.40000   27.40000
#9  HM001  1231   28.40000   28.40000   28.40000
#10 HM001  1232   29.20000   29.20000   29.20000
#11 HM001  1233   30.13770   29.89770   30.88185
#12 HM001  1234   31.17251   31.35045   31.57952
#13 HM001  1235   32.40000   32.40000   32.40000
#14 HM001  1236   33.70000   33.70000   33.70000
#15 HM001  1237   34.30000   34.30000   34.30000
A.K.

________________________________
From: <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 7:35 PM
Subject: Thank you your help and one more question.

I deeply appreciate your help. To answer your question, I am a software engineer, and I am developing a system that accumulates data to draw charts and tables. For higher performance, I have to deal with missing-value treatment, so I use the Amelia package. Below is the result of following your answer.
----------------------------------------------------------------
> temp2    #original data
      ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226     NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233 1221.0
12 HM001  1234     NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3
> temp2$WEIGHT <- ifelse(temp2$WEIGHT>50,NA,temp2$WEIGHT)
> temp2    #after eliminating the strange value
      ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226     NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233     NA
12 HM001  1234     NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3
--------------------------------------------------------------
I have one more question. Below are the code and the results.
--------------------------------------------------------------
> a.out2 <- amelia(temp2, m=5, ts="CTIME", cs="ID", polytime=1)
-- Imputation 1 --
 1  2  3  4
-- Imputation 2 --
 1  2  3
-- Imputation 3 --
 1  2  3  4
-- Imputation 4 --
 1  2  3
-- Imputation 5 --
 1  2  3
> a.out2$imputations
$imp1
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.24132
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.13770
12 HM001  1234 31.17251
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp2
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.54828
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 29.89770
12 HM001  1234 31.35045
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp3
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.46838
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.88185
12 HM001  1234 31.57952
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp4
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.86703
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.61241
12 HM001  1234 30.17042
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp5
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 26.05747
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 31.03894
12 HM001  1234 30.90960
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000
----------------------------------------
I got 5 datasets including imputed values. But what I want is not five datasets, only one dataset that combines those 5 imputed datasets. I want to combine $imp1, $imp2 ... $imp5 to get a final result set. This result set is also a (3 X 15) matrix. Would you help me once more, please?

-----Original Message-----
From: "arun" <smartpink111 at yahoo.com>
To: <jamansymptom at naver.com>
Cc: "R help" <r-help at r-project.org>
Sent: 2013-01-28 23:48:51
Subject: Re: Thank you your help.

Hi,
temp3 <- read.table(text="
[...]
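
One possible follow-up to the merged table res built with Reduce() and merge() in this message is to average the WEIGHTIMP columns row by row, which gives a single WEIGHT column again. This is only a sketch under assumptions: the three-row data frame below is a small stand-in for res (rows 1226, 1233 and 1234 from the thread), and the name combined is hypothetical.

```r
# Small stand-in for the merged table `res`; column names follow the thread.
res <- data.frame(
  ID = rep("HM001", 3),
  CTIME = c(1226L, 1233L, 1234L),
  WEIGHTIMP1 = c(25.24132, 30.13770, 31.17251),
  WEIGHTIMP2 = c(25.54828, 29.89770, 31.35045),
  WEIGHTIMP3 = c(25.46838, 30.88185, 31.57952),
  stringsAsFactors = FALSE
)

# Pick the imputation columns by name and average them per row.
imp_cols <- grep("^WEIGHTIMP", names(res))
combined <- data.frame(res[c("ID", "CTIME")], WEIGHT = rowMeans(res[imp_cols]))
print(combined)
```

Rows that were observed rather than imputed are identical in every imputation, so their mean is simply the observed value.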
HI,
I don't have the Amelia package installed.
If you want to get the mean value, you could use either ?aggregate(), or ?ddply() from library(plyr)

library(plyr)
imputNew <- do.call(rbind,imput1_2_3)
res1 <- ddply(imputNew,.(ID,CTIME),function(x) mean(x$WEIGHT))
names(res1)[3] <- "WEIGHT"
head(res1)
#     ID CTIME   WEIGHT
#1 HM001  1223 24.90000
#2 HM001  1224 25.20000
#3 HM001  1225 25.50000
#4 HM001  1226 25.41933
#5 HM001  1227 25.70000
#6 HM001  1228 27.10000

#or
res2 <- aggregate(.~ID+CTIME,data=imputNew,mean)
#or
res3 <- do.call(rbind,lapply(split(imputNew,imputNew$CTIME),function(x) {x$WEIGHT<-mean(x[,3]);head(x,1)}))
row.names(res3) <- 1:nrow(res3)
identical(res1,res2)
#[1] TRUE
identical(res1,res3)
#[1] TRUE
A.K.

________________________________
From: <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 9:47 PM
Subject: Re: Thank you your help and one more question.

Thank you for replying to my question. What I want is a matrix like the one below. I have 3 datasets named weightimp1, 2, 3, and to get the matrix below I have to combine them. I don't know how to combine the 3 datasets. It could be the mean of the 3 datasets. Or there might be a value (temp2$imputations$...) in the Amelia package. I would prefer to use a method from the Amelia package, but if none exists, can you recommend how to set it to the mean value?

#      ID CTIME   WEIGHT  (this represents the 3 datasets weightimp1, 2, 3)
#1  HM001  1223 24.90000
#2  HM001  1224 25.20000
#3  HM001  1225 25.50000
#4  HM001  1226 25.24132
#5  HM001  1227 25.70000
#6  HM001  1228 27.10000
#7  HM001  1229 27.30000
#8  HM001  1230 27.40000
#9  HM001  1231 28.40000
#10 HM001  1232 29.20000
#11 HM001  1233 30.13770
#12 HM001  1234 31.17251
#13 HM001  1235 32.40000
#14 HM001  1236 33.70000
#15 HM001  1237 34.30000

-----Original Message-----
From: "arun" <smartpink111 at yahoo.com>
To: <jamansymptom at naver.com>
Cc: "R help" <r-help at r-project.org>
Sent: 2013-01-29 11:25:38
Subject: Re: Thank you your help and one more question.

HI,
How do you want to combine the results? It looks like the 5 datasets are list elements.
[...]
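
For the question of turning several imputed datasets into one, another base-R route that avoids stacking the rows first might look like the sketch below. It is not from the thread and rests on a loud assumption: every imputed data frame has the same rows in the same order, which the sketch checks before averaging. The names imps and pooled are hypothetical, and the three-row data frames are stand-ins for the imputations shown in the thread.

```r
# Stand-ins for three imputed datasets (values for CTIME 1226-1228 from the thread).
imps <- list(
  imp1 = data.frame(ID = "HM001", CTIME = 1226:1228, WEIGHT = c(25.24132, 25.7, 27.1)),
  imp2 = data.frame(ID = "HM001", CTIME = 1226:1228, WEIGHT = c(25.54828, 25.7, 27.1)),
  imp3 = data.frame(ID = "HM001", CTIME = 1226:1228, WEIGHT = c(25.46838, 25.7, 27.1))
)

# Assumption check: the key columns must line up across all imputations.
keys <- c("ID", "CTIME")
same_keys <- vapply(imps, function(d) identical(d[keys], imps[[1]][keys]), logical(1))
stopifnot(all(same_keys))

# Element-wise sum of the WEIGHT vectors, divided by the number of imputations.
pooled <- imps[[1]]
pooled$WEIGHT <- Reduce(`+`, lapply(imps, function(d) d$WEIGHT)) / length(imps)
print(pooled)
```

If the rows could differ between imputations, grouping by ID and CTIME as in the aggregate() call from the thread is the safer choice.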
Hi,
I think I understand your mistake.

imput1_2_3 <- list(
imp1=structure(list(ID = rep("HM001", 15), CTIME = 1223:1237,
 WEIGHT = c(24.9, 25.2, 25.5, 25.24132, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.1377, 31.17251, 32.4, 33.7, 34.3)),
 .Names = c("ID", "CTIME", "WEIGHT"), class = "data.frame", row.names = as.character(1:15)),
imp2=structure(list(ID = rep("HM001", 15), CTIME = 1223:1237,
 WEIGHT = c(24.9, 25.2, 25.5, 25.54828, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 29.8977, 31.35045, 32.4, 33.7, 34.3)),
 .Names = c("ID", "CTIME", "WEIGHT"), class = "data.frame", row.names = as.character(1:15)),
imp3=structure(list(ID = rep("HM001", 15), CTIME = 1223:1237,
 WEIGHT = c(24.9, 25.2, 25.5, 25.46838, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.88185, 31.57952, 32.4, 33.7, 34.3)),
 .Names = c("ID", "CTIME", "WEIGHT"), class = "data.frame", row.names = as.character(1:15)))

imput <- list(imput1_2_3[1],imput1_2_3[2],imput1_2_3[3]) #what you tried. You should use [[ ]] instead of [ ]. Here, it is not necessary.
aggregate(.~ID+CTIME,data=imput,mean)
#Error in eval(expr, envir, enclos) : object 'ID' not found

#You don't need the above step.
class(imput1_2_3) #already a list
#[1] "list"
imput <- do.call(rbind,imput1_2_3)
aggregate(.~ID+CTIME,data=imput,mean)
#      ID CTIME   WEIGHT
#1  HM001  1223 24.90000
#2  HM001  1224 25.20000
#3  HM001  1225 25.50000
#4  HM001  1226 25.41933
#5  HM001  1227 25.70000
#6  HM001  1228 27.10000
#7  HM001  1229 27.30000
#8  HM001  1230 27.40000
#9  HM001  1231 28.40000
#10 HM001  1232 29.20000
#11 HM001  1233 30.30575
#12 HM001  1234 31.36749
#13 HM001  1235 32.40000
#14 HM001  1236 33.70000
#15 HM001  1237 34.30000
A.K.

________________________________
From: <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, January 29, 2013 12:04 AM
Subject: Re: Thank you your help and one more question.

I decided to use aggregate(), so I installed library(plyr). But while executing the statement 'res <- aggregate(.~ID+CTIME, data=imput,mean)', an error occurred. What should I do next?

> library(plyr)
> a.out2$imputations
$imp1
      ID       CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.20000
2  HM001 2.01212e+11   25.50000
3  HM001 2.01212e+11   25.60000
4  HM001 2.01212e+11   25.90065
5  HM001 2.01212e+11   26.60000
6  HM001 2.01212e+11   26.70000
7  HM001 2.01212e+11   27.10000
8  HM001 2.01212e+11   27.40000
9  HM001 2.01212e+11   27.50000
10 HM001 2.01212e+11   27.80000
11 HM001 2.01212e+11   28.20000
12 HM001 2.01212e+11   28.44605
13 HM001 2.01212e+11   28.70000
14 HM001 2.01212e+11   28.90000
15 HM001 2.01212e+11   29.10000

$imp2
      ID       CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.20000
2  HM001 2.01212e+11   25.50000
3  HM001 2.01212e+11   25.60000
4  HM001 2.01212e+11   25.87163
5  HM001 2.01212e+11   26.60000
6  HM001 2.01212e+11   26.70000
7  HM001 2.01212e+11   27.10000
8  HM001 2.01212e+11   27.40000
9  HM001 2.01212e+11   27.50000
10 HM001 2.01212e+11   27.80000
11 HM001 2.01212e+11   28.20000
12 HM001 2.01212e+11   28.68048
13 HM001 2.01212e+11   28.70000
14 HM001 2.01212e+11   28.90000
15 HM001 2.01212e+11   29.10000

> imput <- list(a.out2$imputations[1], a.out2$imputations[2])
> do.call(rbind, imput)
[[1]]
[[1]]$imp1
      ID       CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.20000
2  HM001 2.01212e+11   25.50000
3  HM001 2.01212e+11   25.60000
4  HM001 2.01212e+11   25.90065
5  HM001 2.01212e+11   26.60000
6  HM001 2.01212e+11   26.70000
7  HM001 2.01212e+11   27.10000
8  HM001 2.01212e+11   27.40000
9  HM001 2.01212e+11   27.50000
10 HM001 2.01212e+11   27.80000
11 HM001 2.01212e+11   28.20000
12 HM001 2.01212e+11   28.44605
13 HM001 2.01212e+11   28.70000
14 HM001 2.01212e+11   28.90000
15 HM001 2.01212e+11   29.10000

[[2]]
[[2]]$imp2
      ID       CTIME ACTIVE_KWH
1  HM001 2.01212e+11   24.20000
2  HM001 2.01212e+11   25.50000
3  HM001 2.01212e+11   25.60000
4  HM001 2.01212e+11   25.87163
5  HM001 2.01212e+11   26.60000
6  HM001 2.01212e+11   26.70000
7  HM001 2.01212e+11   27.10000
8  HM001 2.01212e+11   27.40000
9  HM001 2.01212e+11   27.50000
10 HM001 2.01212e+11   27.80000
11 HM001 2.01212e+11   28.20000
12 HM001 2.01212e+11   28.68048
13 HM001 2.01212e+11   28.70000
14 HM001 2.01212e+11   28.90000
15 HM001 2.01212e+11   29.10000

> res <- aggregate(.~ID+CTIME, data=imput,mean)
The following error occurred:
eval(expr, envir, enclos) : no element 'ID'      # I translated this line into English because it was written in my native language.

-----Original Message-----
From: "arun" <smartpink111 at yahoo.com>
To: <jamansymptom at naver.com>
Cc: "R help" <r-help at r-project.org>
Sent: 2013-01-29 12:20:10
Subject: Re: Thank you your help and one more question.

HI,
I don't have the Amelia package installed. If you want to get the mean value, you could use either ?aggregate(), or ?ddply() from library(plyr)
[...]
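
The difference between [ ] and [[ ]] mentioned in the reply above can be seen on a tiny list. This is a minimal sketch with made-up two-row data frames; the names imps, wrong, right, stacked and avg are hypothetical and are not from the thread.

```r
# Two small stand-in "imputations" stored in a named list.
imps <- list(
  imp1 = data.frame(ID = "HM001", CTIME = 1:2, WEIGHT = c(24.2, 25.5)),
  imp2 = data.frame(ID = "HM001", CTIME = 1:2, WEIGHT = c(24.2, 25.9))
)

class(imps[1])    # single bracket keeps the list wrapper
class(imps[[1]])  # double bracket extracts the data frame itself

# Single brackets build a list of one-element lists, so there is
# no ID column for aggregate() to find.
wrong <- list(imps[1], imps[2])

# Double brackets give a list of data frames that rbind can stack.
right <- list(imps[[1]], imps[[2]])
stacked <- do.call(rbind, right)
avg <- aggregate(WEIGHT ~ ID + CTIME, data = stacked, FUN = mean)
print(avg)
```

Since imps is already a list of data frames, do.call(rbind, imps) would give the same stacked rows without rebuilding the list at all.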
________________________________ From: ??? <jamansymptom>@naver.com> To: arun <smartpink111>@yahoo.com> Sent: Monday, January 28, 2013 9:47 PM Subject: Re: Thank you your help and one more question. Thank you for replying my question. What I want is the matrix like below. I have 3 data sets that named weightimp1, 2, 3. And, to get the matrix like below, I have to combine 3 data sets(named weightimp1, 2, 3). I don't know how to 3data sets combined. It could be mean of 3 data set. Or, there?might be a?value(temp2$imputations$...) in?Amelia package. I prefer to use Amelia package method, but if it?dosen't exist, can u recommend how to?set as a mean value?? #????? ID CTIME WEIGHT (It represents 3 data sets(weightimp1, 2, 3) #1? HM001? 1223?? 24.90000?? #2? HM001? 1224?? 25.20000? #3? HM001? 1225?? 25.50000?? #4? HM001? 1226?? 25.24132?? #5? HM001? 1227?? 25.70000?? #6? HM001? 1228?? 27.10000?? #7? HM001? 1229?? 27.30000?? #8? HM001? 1230?? 27.40000?? #9? HM001? 1231?? 28.40000?? #10 HM001? 1232?? 29.20000?? #11 HM001? 1233?? 30.13770?? #12 HM001? 1234?? 31.17251?? #13 HM001? 1235?? 32.40000?? #14 HM001? 1236?? 33.70000?? #15 HM001? 1237?? 34.30000?? -----Original Message----- From: "arun"<smartpink111>@yahoo.com> To: "???"<jamansymptom>@naver.com>; Cc: "R help"<r-help>@r-project.org>; Sent: 2013-01-29 (?) 11:25:38 Subject: Re: Thank you your help and one more question. HI, How do you want to combine the results? It looks like the 5 datasets are list elements. 
If I take the first three list elements,

imput1_2_3<-list(
 imp1=data.frame(ID=rep("HM001",15), CTIME=1223:1237,
  WEIGHT=c(24.9, 25.2, 25.5, 25.24132, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.1377, 31.17251, 32.4, 33.7, 34.3),
  stringsAsFactors=FALSE),
 imp2=data.frame(ID=rep("HM001",15), CTIME=1223:1237,
  WEIGHT=c(24.9, 25.2, 25.5, 25.54828, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 29.8977, 31.35045, 32.4, 33.7, 34.3),
  stringsAsFactors=FALSE),
 imp3=data.frame(ID=rep("HM001",15), CTIME=1223:1237,
  WEIGHT=c(24.9, 25.2, 25.5, 25.46838, 25.7, 27.1, 27.3, 27.4, 28.4, 29.2, 30.88185, 31.57952, 32.4, 33.7, 34.3),
  stringsAsFactors=FALSE))

#It could be combined by:
do.call(rbind, imput1_2_3)
#But if you do this, the total number of rows will be the sum of the number of rows of each dataset.

I guess you want something like this:
res<-Reduce(function(...) merge(...,by=c("ID","CTIME")),imput1_2_3)
names(res)[3:5]<- paste("WEIGHT","IMP",1:3,sep="")
res
#      ID CTIME WEIGHTIMP1 WEIGHTIMP2 WEIGHTIMP3
#1  HM001  1223   24.90000   24.90000   24.90000
#2  HM001  1224   25.20000   25.20000   25.20000
#3  HM001  1225   25.50000   25.50000   25.50000
#4  HM001  1226   25.24132   25.54828   25.46838
#5  HM001  1227   25.70000   25.70000   25.70000
#6  HM001  1228   27.10000   27.10000   27.10000
#7  HM001  1229   27.30000   27.30000   27.30000
#8  HM001  1230   27.40000   27.40000   27.40000
#9  HM001  1231   28.40000   28.40000   28.40000
#10 HM001  1232   29.20000   29.20000   29.20000
#11 HM001  1233   30.13770   29.89770   30.88185
#12 HM001  1234   31.17251   31.35045   31.57952
#13 HM001  1235   32.40000   32.40000   32.40000
#14 HM001  1236   33.70000   33.70000   33.70000
#15 HM001  1237   34.30000   34.30000   34.30000
A.K.

________________________________
From: ??? <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Monday, January 28, 2013 7:35 PM
Subject: Thank you your help and one more question.

I deeply appreciate your help. To answer your question, I am a software engineer, and I am developing a system that accumulates data to draw charts and tables. For higher performance, I have to deal with missing values, so I use the Amelia package. Below is the result of following your answer.

----------------------------------------------------------------
> temp2    # original data
      ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226     NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233 1221.0
12 HM001  1234     NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3
> temp2$WEIGHT<- ifelse(temp2$WEIGHT>50,NA,temp2$WEIGHT)
> temp2    # after eliminating the strange value
      ID CTIME WEIGHT
1  HM001  1223   24.9
2  HM001  1224   25.2
3  HM001  1225   25.5
4  HM001  1226     NA
5  HM001  1227   25.7
6  HM001  1228   27.1
7  HM001  1229   27.3
8  HM001  1230   27.4
9  HM001  1231   28.4
10 HM001  1232   29.2
11 HM001  1233     NA
12 HM001  1234     NA
13 HM001  1235   32.4
14 HM001  1236   33.7
15 HM001  1237   34.3
--------------------------------------------------------------
I have one more question. Below are the code and the results.
--------------------------------------------------------------
> a.out2<-amelia(temp2, m=5, ts="CTIME", cs="ID", polytime=1)
-- Imputation 1 --
 1  2  3  4
-- Imputation 2 --
 1  2  3
-- Imputation 3 --
 1  2  3  4
-- Imputation 4 --
 1  2  3
-- Imputation 5 --
 1  2  3
> a.out2$imputations
$imp1
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.24132
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.13770
12 HM001  1234 31.17251
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp2
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.54828
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 29.89770
12 HM001  1234 31.35045
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp3
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.46838
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.88185
12 HM001  1234 31.57952
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp4
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 25.86703
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 30.61241
12 HM001  1234 30.17042
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000

$imp5
      ID CTIME   WEIGHT
1  HM001  1223 24.90000
2  HM001  1224 25.20000
3  HM001  1225 25.50000
4  HM001  1226 26.05747
5  HM001  1227 25.70000
6  HM001  1228 27.10000
7  HM001  1229 27.30000
8  HM001  1230 27.40000
9  HM001  1231 28.40000
10 HM001  1232 29.20000
11 HM001  1233 31.03894
12 HM001  1234 30.90960
13 HM001  1235 32.40000
14 HM001  1236 33.70000
15 HM001  1237 34.30000
----------------------------------------
I got 5 datasets including imputed values. But what I want is not five datasets, only one data set that combines those 5 imputed datasets. I want to combine $imp1, $imp2 ... $imp5 to get a final result set. This result set is also a (3 X 15) matrix. Would you help me once more, please?
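The request in this thread, one data set built from several imputed ones, can be sketched in base R without plyr. This is a minimal illustration, not the Amelia package's own pooling method: the list `imps` is a hypothetical three-row miniature of the imputation list, and averaging is only the simple combination rule suggested in the replies. It also shows why the quoted aggregate() call probably failed: the `data` argument was a list of data frames, which has no `ID` column until the elements are stacked.

```r
# Hypothetical miniature of the imputation list: three data frames with
# identical ID/CTIME keys and differing imputed WEIGHT values at CTIME 1226.
imps <- list(
  imp1 = data.frame(ID = "HM001", CTIME = 1225:1227,
                    WEIGHT = c(25.5, 25.24132, 25.7), stringsAsFactors = FALSE),
  imp2 = data.frame(ID = "HM001", CTIME = 1225:1227,
                    WEIGHT = c(25.5, 25.54828, 25.7), stringsAsFactors = FALSE),
  imp3 = data.frame(ID = "HM001", CTIME = 1225:1227,
                    WEIGHT = c(25.5, 25.46838, 25.7), stringsAsFactors = FALSE)
)

# aggregate() needs one data frame, so stack the list elements first.
stacked <- do.call(rbind, imps)

# One row per ID/CTIME pair; WEIGHT becomes the mean across the imputations.
combined <- aggregate(WEIGHT ~ ID + CTIME, data = stacked, FUN = mean)
print(combined)
```

Observed values are identical in every imputation, so averaging leaves them unchanged and only the imputed cells are smoothed.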
HI,

temp<-read.table(text="
      ID        CTIME ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308   12.0   31.0
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 1206.3 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 1121.3
",sep="",header=TRUE,stringsAsFactors=F)

#Here, I assume that you consider <1000 as low values. You can change it accordingly.
temp[,3:4][temp[,3]<1000& !is.na(temp[,3]),]<-NA
temp
#      ID        CTIME ACTIVE_KWH REACTIVE_KWH
#1  HM001 201212121301     1201.9       1115.5
#2  HM001 201212121302     1202.2       1115.8
#3  HM001 201212121303     1202.8       1115.8
#4  HM001 201212121304         NA       1116.1
#5  HM001 201212121305     1203.9       1116.7
#6  HM001 201212121306         NA       1116.7
#7  HM001 201212121307         NA       1116.7
#8  HM001 201212121308         NA           NA
#9  HM001 201212121309     1206.0       1118.2
#10 HM001 201212121310     1206.3       1118.6
#11 HM001 201212121311     1206.5       1118.8
#12 HM001 201212121312         NA           NA
#13 HM001 201212121313     1207.3           NA
#14 HM001 201212121314     1207.9       1121.1
#15 HM001 201212121315     1208.4       1121.3

#Suppose your dataset is like this:
temp1<-read.table(text="
      ID        CTIME ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308   12.0   31.0
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310   21.0 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4   22.0
",sep="",header=TRUE,stringsAsFactors=F)

#Check each column separately; the NA test must use temp1 itself, not temp
temp1[,3][temp1[,3]<1000&!is.na(temp1[,3])]<-NA
temp1[,4][temp1[,4]<1000&!is.na(temp1[,4])]<-NA

Hope it helps.
A.K.

________________________________
From: ??? <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, January 29, 2013 3:36 AM
Subject: Re: I succeed to get result dataset.

Arun ~
I am having difficulty using R again. A dataset 'temp' contains NA and strange values (like row 8: 12.0 and 31.0, which are outside the normal range). **What I want is to set the strange values to NA.** Then I'll impute dataset 'temp' by myself. Since 'WIDTH' and 'HEIGHT' can never decrease, I defined a procedure like the one below.

> for(i in 2:m){
 ex$WIDTH[i]<- ifelse(ex$WIDTH[i]- ex$WIDTH[i-1]<0,NA, ex$WIDTH[i])
 ex$HEIGHT[i]<- ifelse(ex$HEIGHT[i]- ex$HEIGHT[i-1]<0,NA, ex$HEIGHT[i])
}

But the result is wrong. Do you have a better idea for a procedure that works?

There is a dataset named 'temp'.
      ID        CTIME ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308   12.0   31.0
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 1206.3 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 1121.3

> m<- 15
> for(i in 2:m){
 temp$ACTIVE_KWH[i]<- ifelse(temp$ACTIVE_KWH[i]- temp$ACTIVE_KWH[i-1]<0,NA, temp$ACTIVE_KWH[i])
 temp$REACTIVE_KWH[i]<- ifelse(temp$REACTIVE_KWH[i]- temp$REACTIVE_KWH[i-1]<0,NA, temp$REACTIVE_KWH[i])
}

**result of for statement**
      ID        CTIME ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301     1201.9       1115.5
2  HM001 201212121302     1202.2       1115.8
3  HM001 201212121303     1202.8       1115.8
4  HM001 201212121304         NA       1116.1
5  HM001 201212121305         NA       1116.7
6  HM001 201212121306         NA       1116.7
7  HM001 201212121307         NA       1116.7
8  HM001 201212121308         NA           NA
9  HM001 201212121309         NA           NA
10 HM001 201212121310         NA           NA
11 HM001 201212121311         NA           NA
12 HM001 201212121312         NA           NA
13 HM001 201212121313         NA           NA
14 HM001 201212121314         NA           NA
15 HM001 201212121315         NA           NA

**What I expect (row8 WIDTH=NA, HEIGHT=NA)**
      ID        CTIME  WIDTH HEIGHT
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308     NA     NA
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 1206.3 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 1121.3

-----Original Message-----
From: "arun"<smartpink111 at yahoo.com>
To: "???"<jamansymptom at naver.com>;
Cc:
Sent: 2013-01-29 15:23:56
Subject: Re: I succeed to get result dataset.

HI,
I am glad that it got fixed. You can ask for help. Thank you for the kind words. Good night!
Arun
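The fixed-threshold replacement used in this reply can be written so that existing NA values need no separate `!is.na()` test. The sketch below is only an illustration: `na_below` and the small `meter` data frame are hypothetical names, and the floor of 1000 is the same assumption made in the reply, not a property of the real data. It relies on which(), which returns only the positions where the comparison is TRUE and silently skips positions where it is NA.

```r
# Replace values below a plausibility floor with NA, one column at a time.
na_below <- function(x, floor_value) {
  x[which(x < floor_value)] <- NA   # which() skips comparisons that are NA
  x
}

# Hypothetical readings; 12.0, 21.0, 31.0 and 22.0 play the "strange" values.
meter <- data.frame(
  ACTIVE_KWH   = c(1201.9, NA, 12.0, 1206.0, 21.0),
  REACTIVE_KWH = c(1115.5, 1116.1, 31.0, NA, 22.0)
)

# Apply the same rule to every column, keeping the data frame structure.
meter[] <- lapply(meter, na_below, floor_value = 1000)
print(meter)
```

Because each column is tested against its own values, a mix-up between two data frames in the NA check cannot occur.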
Hi,
Sorry, I didn't check your code previously. I hope this works for you (especially the <0 condition).

Using the first dataset temp:
temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][c(FALSE,diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])< 0)]<-NA
temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)][c(FALSE,diff(temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)])< 0)]<-NA
temp
#      ID        CTIME ACTIVE_KWH REACTIVE_KWH
#1  HM001 201212121301     1201.9       1115.5
#2  HM001 201212121302     1202.2       1115.8
#3  HM001 201212121303     1202.8       1115.8
#4  HM001 201212121304         NA       1116.1
#5  HM001 201212121305     1203.9       1116.7
#6  HM001 201212121306         NA       1116.7
#7  HM001 201212121307         NA       1116.7
#8  HM001 201212121308         NA           NA
#9  HM001 201212121309     1206.0       1118.2
#10 HM001 201212121310     1206.3       1118.6
#11 HM001 201212121311     1206.5       1118.8
#12 HM001 201212121312         NA           NA
#13 HM001 201212121313     1207.3           NA
#14 HM001 201212121314     1207.9       1121.1
#15 HM001 201212121315     1208.4       1121.3

#Similarly with the second dataset:
temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)][c(FALSE,diff(temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)])< 0)]<-NA
temp1$REACTIVE_KWH[!is.na(temp1$REACTIVE_KWH)][c(FALSE,diff(temp1$REACTIVE_KWH[!is.na(temp1$REACTIVE_KWH)])< 0)]<-NA
A.K.

________________________________
From: ??? <jamansymptom at naver.com>
To: arun <smartpink111 at yahoo.com>
Sent: Tuesday, January 29, 2013 7:42 PM
Subject: I think you misunderstood my explantation.

Hi,
Assume that the first CTIME value is '201201010000'. ACTIVE_KWH is measured from '201201010000' up to the present. See the example row below.

1  HM001 201212121301 1201.9 1115.5

Row 1's ACTIVE_KWH is the accumulated value measured from '201201010000' to '201212121301'. When CTIME is '201212121301', ACTIVE_KWH is '1201.9'. And when CTIME is '201212121302', ACTIVE_KWH is '1202.2'. This means that 0.3 was measured during 1 minute. ACTIVE_KWH is an accumulated value, so it must increase as CTIME increases. You got it?
So, I have to define a strange value through a subtraction such as (temp$ACTIVE_KWH[i] - temp$ACTIVE_KWH[i-1]) > 50. The '50' can be changed.
---------------------------------------------------------------------
> for(i in 2:m){
 temp$ACTIVE_KWH[i]<- ifelse(temp$ACTIVE_KWH[i]- temp$ACTIVE_KWH[i-1]<0,NA, temp$ACTIVE_KWH[i])
}
----------------------------------------------------------------------
But in this case a critical error occurred. If temp$ACTIVE_KWH[3] is NA, all the later values (temp$ACTIVE_KWH[4], [5], [6]...) are set to NA as well. My last mail contains the detailed source code and result. Can you recommend a better idea that avoids filling the dataset with successive NAs?
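The cascading NA described in this thread comes from comparing against the previous element even when that element is NA: `NA < 0` is NA, and ifelse() returns NA for an NA test, so every later reading is wiped. One way around it, sketched here under stated assumptions, is to compare each reading with the last accepted reading instead. The function name `flag_readings`, the `max_jump` parameter (modelled on the "50" mentioned in the message) and the 2406.3 test value are all hypothetical; the sketch also assumes the first non-missing reading is trustworthy.

```r
# Flag readings of a cumulative meter that fall below the last accepted
# reading, or that rise by more than max_jump above it.
flag_readings <- function(x, max_jump = 50) {
  last_ok <- NA_real_
  for (i in seq_along(x)) {
    if (is.na(x[i])) next                     # keep existing gaps untouched
    step <- x[i] - last_ok                    # NA until a first reading is accepted
    if (!is.na(step) && (step < 0 || step > max_jump)) {
      x[i] <- NA                              # implausible step: mark as missing
    } else {
      last_ok <- x[i]                         # accept and remember this reading
    }
  }
  x
}

# ACTIVE_KWH-like values: 12.0 is a drop, 2406.3 is a made-up upward spike.
kwh <- c(1201.9, 1202.2, 1202.8, NA, 1203.9, NA, NA, 12.0, 1206.0, 2406.3, 1206.5)
cleaned <- flag_readings(kwh)
print(cleaned)
```

Because the comparison skips over gaps and rejected values, the reading after a bad value (1206.0 after 12.0) is kept rather than flagged. A long legitimate gap could exceed `max_jump`, so that limit would need tuning for real data.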
Hi, Your dataset had already some missing values.? So, I need to subset only those rows that are not missing. !is.na(temp$ACTIVE_KWH) # [1]? TRUE? TRUE? TRUE FALSE? TRUE FALSE FALSE? TRUE? TRUE? TRUE? TRUE FALSE #[13]? TRUE? TRUE? TRUE temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)] #[1] 1201.9 1202.2 1202.8 1203.9?? 12.0 1206.0 1206.3 1206.5 1207.3 1207.9 #[11] 1208.4 ?diff() will get the differences between successive values diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)]) ?#[1]???? 0.3???? 0.6???? 1.1 -1191.9? 1194.0???? 0.3???? 0.2???? 0.8???? 0.6 #[10]???? 0.5 #Here, the length is 1 less than the previous case as the first value is removed. ?diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])<0 # [1] FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE FALSE #Added `FALSE` at the beginning to make the length equal to subset data indx<- c(FALSE,diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])<0) indx #[1] FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE FALSE #Using this index, further subset the already subset data for differences of values <0 ?temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][indx] #[1] 12 temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][indx]<- NA #changed to NA #Similarly for REACTIVE_KWH Hope this helps. A.K. ________________________________ From: ??? <jamansymptom at naver.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, January 30, 2013 12:51 AM Subject: Re: I think you misunderstood my explantation. Oh, I forgot to ask about those code. Can u expain what dose that mean? Using the first dataset temp: temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][c(FALSE,diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])< 0)]<-NA temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)][c(FALSE,diff(temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)])< 0)]<-NA? -----Original Message----- From: "arun"<smartpink111 at yahoo.com> To: "???"<jamansymptom at naver.com>; Cc: "R help"<r-help at r-project.org>; Sent: 2013-01-30 (?) 10:37:18 Subject: Re: I think you misunderstood my explantation. 
Hi, Sorry, I didn't check your codes previously. I hope this works for you (especially the <0). Using the first dataset temp: temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)][c(FALSE,diff(temp$ACTIVE_KWH[!is.na(temp$ACTIVE_KWH)])< 0)]<-NA temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)][c(FALSE,diff(temp$REACTIVE_KWH[!is.na(temp$REACTIVE_KWH)])< 0)]<-NA temp #????? ID??????? CTIME ACTIVE_KWH REACTIVE_KWH #1? HM001 201212121301???? 1201.9?????? 1115.5 #2? HM001 201212121302???? 1202.2?????? 1115.8 #3? HM001 201212121303???? 1202.8?????? 1115.8 #4? HM001 201212121304???????? NA?????? 1116.1 #5? HM001 201212121305???? 1203.9?????? 1116.7 #6? HM001 201212121306???????? NA?????? 1116.7 #7? HM001 201212121307???????? NA?????? 1116.7 #8? HM001 201212121308???????? NA?????????? NA #9? HM001 201212121309???? 1206.0?????? 1118.2 #10 HM001 201212121310???? 1206.3?????? 1118.6 #11 HM001 201212121311???? 1206.5?????? 1118.8 #12 HM001 201212121312???????? NA?????????? NA #13 HM001 201212121313???? 1207.3?????????? NA #14 HM001 201212121314???? 1207.9?????? 1121.1 #15 HM001 201212121315???? 1208.4?????? 1121.3 temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)][c(FALSE,diff(temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)])< 0)]<-NA #Similarly with the second dataset: temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)][c(FALSE,diff(temp1$ACTIVE_KWH[!is.na(temp1$ACTIVE_KWH)])< 0)]<-NA temp1$REACTIVE_KWH[!is.na(temp1$REACTIVE_KWH)][c(FALSE,diff(temp1$REACTIVE_KWH[!is.na(temp1$REACTIVE_KWH)])< 0)]<-NA A.K. ________________________________ From: ??? <jamansymptom>@naver.com> To: arun <smartpink111>@yahoo.com> Sent: Tuesday, January 29, 2013 7:42 PM Subject: I think you misunderstood my explantation. Hi, Assume that first CTIME value is '201201010000'. It means?ACTIVE_KWH?measured from??'201201010000' to present. show example below row. 1? HM001 201212121301 1201.9 1115.5 1 row's? ACTIVE_KWH?? accumulated?value that measured from '201201010000' to '201212121301'. when CTIME is '201212121301',??ACTIVE_KWH? is '1201.9'.? 
And, when? CTIME is? '201212121302', ACTIVE_KWH? is?'1202.2'. It?means that?0.3 is measured?during 1 minute.? And??ACTIVE_KWH? is a accumulated value. Thus, ACTIVE_KWH? must increase, as CTIME? increases. You got it?? So, I have to define strange value?as subtraction?value like (?temp$ACTIVE_KWH[i] -??temp$ACTIVE_KWH[i-1]) > 50). '50' can be chagned. ---------------------------------------------------------------------> for(i in 2:m){?temp$ACTIVE_KWH[i]<- ifelse(temp$ACTIVE_KWH[i]- temp$ACTIVE_KWH[i-1]<0,NA, temp$ACTIVE_KWH[i]) } ---------------------------------------------------------------------- But, in this case, ?critical error occured.?If??temp$ACTIVE_KWH[3]?is NA, posterior data (temp$ACTIVE_KWH[4], [5], [6]...) ?is imputed as NA. Last mail contains Detailed source code and result. Can you recommend better idea to avoid imputed dataset as a successive NA. -----Original Message----- From: "arun"<smartpink111>@yahoo.com> To: "???"<jamansymptom>@naver.com>; Cc: "R help"<r-help>@r-project.org>; Sent: 2013-01-29 (?) 23:28:30 Subject: Re: I succeed to get result dataset. HI, temp<-read.table(text=" ?ID??????? CTIME?? ACTIVE_KWH REACTIVE_KWH 1? HM001 201212121301 1201.9 1115.5 2? HM001 201212121302 1202.2 1115.8 3? HM001 201212121303 1202.8 1115.8 4? HM001 201212121304???? NA 1116.1 5? HM001 201212121305 1203.9 1116.7 6? HM001 201212121306???? NA 1116.7 7? HM001 201212121307???? NA 1116.7 8? HM001 201212121308?? 12.0?? 31.0 9? HM001 201212121309 1206.0 1118.2 10 HM001 201212121310 1206.3 1118.6 11 HM001 201212121311 1206.5 1118.8 12 HM001 201212121312???? NA???? NA 13 HM001 201212121313 1207.3???? NA 14 HM001 201212121314 1207.9 1121.1 15 HM001 201212121315 1208.4 1121.3 ",sep="",header=TRUE,stringsAsFactors=F) #Here, I assume that you consider <1000 as low values, You can change it accordingly. ?temp[,3:4][temp[,3]<1000& !is.na(temp[,3]),]<-NA ?temp #????? ID??????? CTIME ACTIVE_KWH REACTIVE_KWH #1? HM001 201212121301???? 1201.9?????? 1115.5 #2? 
HM001 201212121302???? 1202.2?????? 1115.8 #3? HM001 201212121303???? 1202.8?????? 1115.8 #4? HM001 201212121304???????? NA?????? 1116.1 #5? HM001 201212121305???? 1203.9?????? 1116.7 #6? HM001 201212121306???????? NA?????? 1116.7 #7? HM001 201212121307???????? NA?????? 1116.7 #8? HM001 201212121308???????? NA?????????? NA #9? HM001 201212121309???? 1206.0?????? 1118.2 #10 HM001 201212121310???? 1206.3?????? 1118.6 #11 HM001 201212121311???? 1206.5?????? 1118.8 #12 HM001 201212121312???????? NA?????????? NA #13 HM001 201212121313???? 1207.3?????????? NA #14 HM001 201212121314???? 1207.9?????? 1121.1 #15 HM001 201212121315???? 1208.4?????? 1121.3 #Suppose your dataset is like this: temp1<-read.table(text=" ?ID??????? CTIME?? ACTIVE_KWH REACTIVE_KWH 1? HM001 201212121301 1201.9 1115.5 2? HM001 201212121302 1202.2 1115.8 3? HM001 201212121303 1202.8 1115.8 4? HM001 201212121304???? NA 1116.1 5? HM001 201212121305 1203.9 1116.7 6? HM001 201212121306???? NA 1116.7 7? HM001 201212121307???? NA 1116.7 8? HM001 201212121308?? 12.0?? 31.0 9? HM001 201212121309 1206.0 1118.2 10 HM001 201212121310 21.0 1118.6 11 HM001 201212121311 1206.5 1118.8 12 HM001 201212121312???? NA???? NA 13 HM001 201212121313 1207.3???? NA 14 HM001 201212121314 1207.9 1121.1 15 HM001 201212121315 1208.4 22.0 ",sep="",header=TRUE,stringsAsFactors=F) temp1[,3][temp1[,3]<1000&!is.na(temp[,3])]<-NA ?temp1[,4][temp1[,4]<1000&!is.na(temp[,4])]<-NA Hope it helps. A.K. ________________________________ From: ??? <jamansymptom>@naver.com> To: arun <smartpink111>@yahoo.com> Sent: Tuesday, January 29, 2013 3:36 AM Subject: Re: I succeed to get result dataset. Arun ~ I have a dfficuliting in using R again. A?Dataset?'temp'?contatins NA and strange value(like 8 row 12.0, 31.0 which is out of range of value). **What I want is to set strange value as NA.**? Then I'll impute dataset 'temp' by myself. 
Since it is impossible for 'WIDTH' and 'HEIGHT' to get smaller, I defined a procedure like the one below.

for(i in 2:m){
  ex$WIDTH[i] <- ifelse(ex$WIDTH[i] - ex$WIDTH[i-1] < 0, NA, ex$WIDTH[i])
  ex$HEIGHT[i] <- ifelse(ex$HEIGHT[i] - ex$HEIGHT[i-1] < 0, NA, ex$HEIGHT[i])
}

But the result is wrong. Do you have a better idea for a procedure that works well?

There is a dataset named 'temp'.

      ID        CTIME ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301     1201.9       1115.5
2  HM001 201212121302     1202.2       1115.8
3  HM001 201212121303     1202.8       1115.8
4  HM001 201212121304         NA       1116.1
5  HM001 201212121305     1203.9       1116.7
6  HM001 201212121306         NA       1116.7
7  HM001 201212121307         NA       1116.7
8  HM001 201212121308       12.0         31.0
9  HM001 201212121309     1206.0       1118.2
10 HM001 201212121310     1206.3       1118.6
11 HM001 201212121311     1206.5       1118.8
12 HM001 201212121312         NA           NA
13 HM001 201212121313     1207.3           NA
14 HM001 201212121314     1207.9       1121.1
15 HM001 201212121315     1208.4       1121.3

m <- 15
for(i in 2:m){
  temp$ACTIVE_KWH[i] <- ifelse(temp$ACTIVE_KWH[i] - temp$ACTIVE_KWH[i-1] < 0, NA, temp$ACTIVE_KWH[i])
  temp$REACTIVE_KWH[i] <- ifelse(temp$REACTIVE_KWH[i] - temp$REACTIVE_KWH[i-1] < 0, NA, temp$REACTIVE_KWH[i])
}

**result of for statement**

      ID        CTIME ACTIVE_KWH REACTIVE_KWH
1  HM001 201212121301     1201.9       1115.5
2  HM001 201212121302     1202.2       1115.8
3  HM001 201212121303     1202.8       1115.8
4  HM001 201212121304         NA       1116.1
5  HM001 201212121305         NA       1116.7
6  HM001 201212121306         NA       1116.7
7  HM001 201212121307         NA       1116.7
8  HM001 201212121308         NA           NA
9  HM001 201212121309         NA           NA
10 HM001 201212121310         NA           NA
11 HM001 201212121311         NA           NA
12 HM001 201212121312         NA           NA
13 HM001 201212121313         NA           NA
14 HM001 201212121314         NA           NA
15 HM001 201212121315         NA           NA

**What I expect (row8 WIDTH=NA, HEIGHT=NA)**

      ID        CTIME  WIDTH HEIGHT
1  HM001 201212121301 1201.9 1115.5
2  HM001 201212121302 1202.2 1115.8
3  HM001 201212121303 1202.8 1115.8
4  HM001 201212121304     NA 1116.1
5  HM001 201212121305 1203.9 1116.7
6  HM001 201212121306     NA 1116.7
7  HM001 201212121307     NA 1116.7
8  HM001 201212121308     NA     NA
9  HM001 201212121309 1206.0 1118.2
10 HM001 201212121310 1206.3 1118.6
11 HM001 201212121311 1206.5 1118.8
12 HM001 201212121312     NA     NA
13 HM001 201212121313 1207.3     NA
14 HM001 201212121314 1207.9 1121.1
15 HM001 201212121315 1208.4 1121.3

-----Original Message-----
From: "arun" <smartpink111 at yahoo.com>
To: "???" <jamansymptom at naver.com>;
Cc:
Sent: 2013-01-29 (?) 15:23:56
Subject: Re: I succeed to get result dataset.

Hi,

I am glad that it got fixed. You are welcome to ask for help. Thank you for the kind words. Good night!

Arun
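
The run of NAs in the loop result comes from NA arithmetic: once temp$ACTIVE_KWH[i-1] is NA, the difference temp$ACTIVE_KWH[i] - temp$ACTIVE_KWH[i-1] is NA, ifelse() returns NA for an NA test, and the NA is carried forward to every later row. One possible way around this, offered only as a sketch and not as the method used in this thread, is to compare each reading with the last accepted (non-NA) reading instead of the previous row. The function name flag_strange and its max_jump argument are hypothetical; the limit of 50 is the example value from the question, and the sketch assumes the first non-NA reading is valid.

```r
# Hypothetical helper (not from the thread): set a reading to NA when it is
# lower than, or more than `max_jump` above, the last accepted reading.
# Assumption: the first non-NA reading in `x` is valid.
flag_strange <- function(x, max_jump = 50) {
  last_ok <- NA_real_                  # last reading accepted as valid
  for (i in seq_along(x)) {
    if (is.na(x[i])) next              # keep existing NAs, leave last_ok alone
    if (!is.na(last_ok)) {
      step <- x[i] - last_ok
      if (step < 0 || step > max_jump) {
        x[i] <- NA                     # strange value: drop it, keep last_ok
        next
      }
    }
    last_ok <- x[i]
  }
  x
}

# The example data from the thread.
temp <- data.frame(
  ID = "HM001",
  CTIME = 201212121300 + 1:15,
  ACTIVE_KWH = c(1201.9, 1202.2, 1202.8, NA, 1203.9, NA, NA, 12.0,
                 1206.0, 1206.3, 1206.5, NA, 1207.3, 1207.9, 1208.4),
  REACTIVE_KWH = c(1115.5, 1115.8, 1115.8, 1116.1, 1116.7, 1116.7, 1116.7, 31.0,
                   1118.2, 1118.6, 1118.8, NA, NA, 1121.1, 1121.3)
)

cleaned <- temp
cleaned$ACTIVE_KWH   <- flag_strange(temp$ACTIVE_KWH)
cleaned$REACTIVE_KWH <- flag_strange(temp$REACTIVE_KWH)
cleaned
```

With this data only row 8 changes in each column, and the existing NAs stay where they were. Note that after a long gap of NAs the legitimate accumulated increase may exceed max_jump, so the limit may need to scale with the number of minutes since the last accepted reading.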