Hi Alexandra,

The error probably comes from the first iteration of i in 0:23. As indexing in R begins at 1, there is no element 0. Try using:

for(i in 1:24) {
...

and see what happens.

Jim

On Sat, Apr 11, 2015 at 7:06 AM, Alexandra Catena <amc5981 at gmail.com> wrote:
> Update:
>
> I have this so far. * The first column of windHW is the wind speed.
> The 5th column of the dataframe, spring, is the 5*sigma value of every
> hour. hourRow gives out all the rows of wind speed at a given hour.
>
> for (i in 0:23){
>   hourRow = which(windHW$hour==i,arr.ind=TRUE)
>   for (h in hourRow){
>     if (windHW[h,1]>=spring[spring$hour==i,5]){
>       windHW[h,1]<-NA}
>   }
> }
>
> This then gives the error:
>
> Error in if (windHW[h, 1] >= spring[spring$hour == i, 5]) { :
>   argument is of length zero
>
> *Note: The dataframe for each of the seasons have 24 rows
> corresponding to each hour of the day 0:23.
>
> Thanks,
> Alexandra
>
>
> On Fri, Apr 10, 2015 at 1:07 PM, Alexandra Catena <amc5981 at gmail.com>
> wrote:
> > Hello,
> >
> > I have a large dataframe (windHW) of wind speeds (ws) at each hour
> > from many days over a set of years. Some of these values are
> > obviously wrong (600 m/s) and I want to get rid of all the values that
> > are larger than 5*sigma for each hour. The 5*sigma (variable name
> > sigma5) values are located in different dataframes for each season,
> > with each dataframe titled as a season. For example, in the
> > dataframe, spring, the 5*sigma value is 79.6 m/s for hour 1.
> >
> > So my question is as follows: how can I get it so that the code will
> > be able to find all the wind speed values in the dataframe, windHW, of
> > a specific hour be higher than the 5*sigma value at that hour?
> > For example, I would like to find if any of the wind speed values at
> > hour 1 are higher than 79.6 m/s, and if so, then replace that value
> > with NA.
> >
> > I have something like this but I can't seem to figure out how to get
> > it for specific hours:
> >
> > windHW$ws[windHW$ws>=spring$sigma5] <- NA
> >
> > I imported the data using readLines and into the dataframe windHW. I
> > also have R version 3.1.1
> >
> > Any help would be appreciated!
> >
> > Thanks,
> > Alexandra
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
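
The per-hour comparison asked about in the original post can also be written without a loop, by looking up each row's threshold with match(). This is only a minimal sketch under assumptions: it supposes windHW has columns named ws and hour and spring has columns named hour and sigma5, and the sample values are made up for illustration (only the 79.6 m/s threshold and the 600 m/s outlier come from the post; the same threshold is reused for every hour to keep the example short).

```r
# Toy data: made-up values, for illustration only
windHW <- data.frame(ws   = c(5.2, 600, 12.1, 80.0, 3.3),
                     hour = c(0, 1, 1, 1, 2))
spring <- data.frame(hour = 0:23, sigma5 = 79.6)

# For every row of windHW, find the row of spring with the same hour
idx <- match(windHW$hour, spring$hour)

# Per-row threshold; NA where the hour has no entry in spring
limit <- spring$sigma5[idx]

# Flag values at or above the threshold; which() drops NA comparisons
too_high <- which(windHW$ws >= limit)
windHW$ws[too_high] <- NA

print(windHW)
```

Because the threshold vector has one element per row of windHW, the comparison is element by element, which is what the one-line attempt windHW$ws>=spring$sigma5 in the post was missing.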
Alexandra Catena
2015-Apr-10 23:24 UTC
[R] Finding values in a dataframe at a specified hour
Hi Jim,

Thanks for the response, but unfortunately it results in the same error. I think it is something wrong with the if statement. I tried it out manually for the first row and hour that it's testing and indeed, the wind speed is not higher than the 5*sigma value. Since it is not higher than the 5*sigma value, I would think it would just pass to the next loop, yet it doesn't. I will keep trying!

Thanks,
Alexandra

On Fri, Apr 10, 2015 at 3:43 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Alexandra,
> The error probably comes from the first iteration of i in 0:23. As indexing
> in R begins at 1, there is no element 0. Try using:
>
> for(i in 1:24) {
> ...
>
> and see what happens.
>
> Jim
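
The message "argument is of length zero" means that the condition handed to if() had no elements at all, which is different from a comparison that is simply FALSE. One way this can happen, sketched below, is when the threshold lookup selects no rows. The column name hr is hypothetical and chosen only to trigger the situation; whether something similar applies to the real spring dataframe cannot be told from the thread.

```r
# Hypothetical season table whose hour column is named "hr", not "hour"
spring <- data.frame(hr = 0:23, sigma5 = 79.6)

# $ on a missing column returns NULL, so the comparison is logical(0)
sel <- spring$hour == 0
length(sel)                      # 0

# Subsetting with it selects no rows, giving a zero-length threshold
limit <- spring[sel, "sigma5"]
length(limit)                    # 0

# if() on a zero-length condition raises the error from the thread
msg <- tryCatch(
  if (10 >= limit) "flag" else "keep",
  error = function(e) conditionMessage(e)
)
print(msg)                       # "argument is of length zero"
```

Checking length(spring[spring$hour == i, 5]) for the failing value of i, and str(spring) for the column names and types, would show whether the lookup is returning exactly one value.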
Hi Alexandra,

I answered too quickly. Your response made me look for a deeper error: The value of i doesn't matter, as it isn't being used as an index. However, the first value of i=0 may cause the error in the second loop, where h is used as an index.

for (i in 0:23){
  hourRow = which(windHW$hour==i,arr.ind=TRUE)
  for (h in hourRow){
    if (windHW[h+1,1]>=spring[spring$hour==i,5]){
      windHW[h+1,1]<-NA}
  }
}

Jim

On Sat, Apr 11, 2015 at 9:24 AM, Alexandra Catena <amc5981 at gmail.com> wrote:
> Hi Jim,
>
> Thanks for the response, but unfortunately it results in the same
> error. I think it is something wrong with the if statement. I tried
> it out manually for the first row and hour that it's testing and
> indeed, the wind speed is not higher than the 5*sigma value. Since it
> is not higher than the 5*sigma value, I would think it would just pass
> to the next loop, yet it doesn't. I will keep trying!
>
> Thanks,
> Alexandra
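
Whether the row index needs an offset can be checked on a small example. In the sketch below (toy data, made up for illustration; the column names ws, hour and sigma5 are assumptions), which() returns 1-based row positions even when the hour being matched is 0, so h indexes windHW directly. The loop also checks that exactly one non-missing threshold was found before calling if(), which avoids both the zero-length error and the separate error that a missing value would cause.

```r
# Toy data: made-up values, for illustration only
windHW <- data.frame(ws = c(600, 5, 7, 90), hour = c(0, 0, 1, 1))
spring <- data.frame(hour = 0:23, sigma5 = 79.6)

# Row positions for hour 0 are 1 and 2, not 0 and 1
rows0 <- which(windHW$hour == 0)

for (i in 0:23) {
  limit <- spring$sigma5[spring$hour == i]
  # Skip hours that do not have exactly one non-missing threshold
  if (length(limit) != 1 || is.na(limit)) next
  for (h in which(windHW$hour == i)) {
    # Skip wind speeds that are already missing
    if (!is.na(windHW$ws[h]) && windHW$ws[h] >= limit) {
      windHW$ws[h] <- NA
    }
  }
}

print(windHW)
```

With this data the first and last wind speeds are replaced by NA and the other two are left alone. Adding 1 to h would instead test and overwrite the row after each match.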