Hey guys. Learning R after gaining a background in Python this year,
and I'm translating my Python projects to R now. This is the first
time I'm posting to the mailing list.

Essentially, I have 92 data points in one double that I created from a
netcdf file. Those 92 data points represent a measurement from a
location over the course of three months. From there I've calculated
the median value of the data, which is 8.534281. My goal here is to
test each data point to see if it exceeds the median value, and, next,
calculate the longest streak of days in the 92-day span in which the
data exceeded the median value.

To achieve this in Python, I've created the following function, where
q is a vector of length 92 and q_JJA_median is, of course, the median
value of the dataset.
######################################
def consecutive(q, q_JJA_median):
    is_consecutive = 0
    n = 0
    array_exceed = []
    for i in range(len(q)):
        if q[i] > q_JJA_median:
            n+=1
            is_consecutive = 1
        else:
            if is_consecutive:
                array_exceed.append(n)
                n = 0
                is_consecutive = 0
    if q[i] > q_JJA_median:
        array_exceed.append(n)
    if len(array_exceed) == 0:
        array_exceed = 0
    if type(array_exceed) is int:
        array_exceed = [0]
    return array_exceed
#######################################

Here is my work thus far written for R:
#######################################
is_consecutive = 0
n = 0
array_exceed <- c()
for (i in q) {
  if (i > q_JJA_median) {
    n = n + 1
    is_consecutive = is_consecutive + 1
  }
  else {
    if (is_consecutive) {
      append(array_exceed, n)
      n <- 0
      is_consecutive <- 0
    }
  }
  if (i > q_JJA_median) {
    append(array_exceed,n)
  }
}
#######################################

My code written for R has been changed and manipulated many times, and
none of my attempts have been successful. I'm still new to the syntax
of the R language, so my problem very well could be a product of my
lack of experience.

Additionally, I read on a forum post that using the append function
can be slower than many other options. Is this true? If so, how can I
circumvent that issue?

Thank you!
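On the append question: in R, append(x, values) returns a new vector and leaves x untouched, so a bare append(array_exceed, n) inside the loop throws its result away. Assigning the result back, for example array_exceed <- c(array_exceed, n), is the usual fix. Growing a vector this way copies it on every step, which is where the "append is slow" advice comes from, though at 92 elements the cost should be negligible. The Python sketch below mimics that copy-returning behaviour next to list.append to show the difference; the helper name r_style_append is hypothetical and exists only for illustration.

```python
def r_style_append(x, value):
    """Hypothetical stand-in for R's append(): returns a new list, leaves x alone."""
    return x + [value]


array_exceed = []

# Result discarded, as in the R loop above: array_exceed stays empty.
r_style_append(array_exceed, 3)
discarded = list(array_exceed)

# Result assigned back, the analogue of array_exceed <- c(array_exceed, n).
array_exceed = r_style_append(array_exceed, 3)
assigned = list(array_exceed)

# Python's own list.append mutates in place, which is why the Python version works.
in_place = []
in_place.append(3)

print(discarded, assigned, in_place)  # [] [3] [3]
```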
Hi Nick,
I think you want to get the maximum run length:

jja<-runif(92,0,16)
med_jja<-median(jja)
med_jja
[1] 7.428935
# get a logical vector of values greater than the median
jja_plus_med<-jja > med_jja
# now get the length of runs
runs_jja_plus_med<-rle(jja_plus_med)
# finally find the maximum run length
max(runs_jja_plus_med$lengths)
[1] 5

Jim

On Mon, Jun 6, 2016 at 5:51 AM, Nick Tulli <nick.tulli.95 at gmail.com> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
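For comparison, the same run-length idea can be sketched in Python with itertools.groupby, which groups consecutive equal values much as rle does. One caveat: rle reports runs of both TRUE and FALSE, so the longest run overall may be a stretch below the median. The sketch therefore keeps only the runs where the value exceeds the threshold; in R the matching filter would be runs_jja_plus_med$lengths[runs_jja_plus_med$values]. The function name exceedance_runs and the sample data are made up for illustration.

```python
from itertools import groupby
from statistics import median


def exceedance_runs(q, threshold):
    """Lengths of consecutive runs where q > threshold (hypothetical helper)."""
    flags = (value > threshold for value in q)
    # groupby yields (key, group) for each stretch of identical flags,
    # so counting the items of a True group gives that run's length.
    return [sum(1 for _ in group) for above, group in groupby(flags) if above]


# Made-up sample data, not the poster's netCDF series.
q = [3, 8, 9, 2, 7, 9, 8, 1, 4, 6]
q_median = median(q)            # 6.5
runs = exceedance_runs(q, q_median)
longest = max(runs, default=0)  # default avoids an error when nothing exceeds
print(runs, longest)            # [2, 3] 3
```

Unlike the original loop, this needs no special case for a streak that runs to the end of the series, because groupby closes the final group on its own.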
Yes, see ?rle, as Jim indicated.

Just wanted to add that there is an rpy2 package that enables you to
use R within python, which may mean that you do not need to translate
your python code. Or at least not all of it.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sun, Jun 5, 2016 at 7:32 PM, Jim Lemon <drjimlemon at gmail.com> wrote:
> [...]