I'm trying to compare National Weather Service Rapid Update Forecast (RAP) data to GPS breadcrumbs collected by a really clever Apple phone app that lays down longitude, latitude, altitude, compass direction, and speed every six seconds. Below is a small subset of the GPS data from another flight.

I want to delete the rows where the balloon does not move (Speed column) for a full minute, assuming that it is sitting on the ground: beginning of the flight, changing passengers, or waiting for the chase crew at the end of the flight. For example, I want to eliminate the data for minute 30 but keep the data for minute 31 because the balloon starts to move again at second 17. Any suggestions? I've tried putzing around with multiple lags without success.

Minute Second Speed
29     47     0
29     53     0
29     59     0
30     5      0
30     11     0
30     17     0
30     23     0
30     29     0
30     35     0
30     41     0
30     47     0
30     53     0
30     59     0
31     5      0
31     11     0
31     17     0.402649
31     23     0.671081
31     29     1.588225
31     35     2.438261
31     41     2.706693
31     47     2.930386
31     53     3.310666
31     59     3.198819
32     5      3.422512

It would be even better if I could delete the rows where there were ten consecutive zero speed entries such as from minute 30 second 17 to minute 31 second 11.

Thanks,
Philip Heinrich
Hi Philip,
Not very elegant, but:

    phdf<-read.table(text="Minute Second Speed
    29 47 0
    29 53 0
    29 59 0
    30 5 0
    30 11 0
    30 17 0
    30 23 0
    30 29 0
    30 35 0
    30 41 0
    30 47 0
    30 53 0
    30 59 0
    31 5 0
    31 11 0
    31 17 0.402649
    31 23 0.671081
    31 29 1.588225
    31 35 2.438261
    31 41 2.706693
    31 47 2.930386
    31 53 3.310666
    31 59 3.198819
    32 5 3.422512",
     header=TRUE,stringsAsFactors=FALSE)
    # start by keeping every row
    keep<-rep(TRUE,length(phdf$Speed))
    # mark every minute whose speeds are all zero for deletion
    for(mini in unique(phdf$Minute))
     if(all(phdf$Speed[phdf$Minute == mini] == 0))
      keep[phdf$Minute == mini]<-FALSE
    # drop the marked rows
    phdf<-phdf[keep,]

Jim

On Sat, Aug 15, 2020 at 6:59 AM Philip <herd_dog at cox.net> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
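Jim's loop can also be written without an explicit loop. The following is only a sketch, assuming the same sample data and that grouping by Minute alone is adequate (true for this subset, but not for a flight that runs past the hour). The names all_zero and phdf_moving are made up for illustration.

```r
phdf <- read.table(text = "Minute Second Speed
29 47 0
29 53 0
29 59 0
30 5 0
30 11 0
30 17 0
30 23 0
30 29 0
30 35 0
30 41 0
30 47 0
30 53 0
30 59 0
31 5 0
31 11 0
31 17 0.402649
31 23 0.671081
31 29 1.588225
31 35 2.438261
31 41 2.706693
31 47 2.930386
31 53 3.310666
31 59 3.198819
32 5 3.422512", header = TRUE)

# One TRUE/FALSE per minute: TRUE when every speed in that minute is zero.
# The result is named by minute ("29", "30", ...).
all_zero <- tapply(phdf$Speed == 0, phdf$Minute, all)

# Look up each row's minute in that table; keep rows from minutes with movement.
keep <- !as.vector(all_zero[as.character(phdf$Minute)])
phdf_moving <- phdf[keep, ]
```

On the sample this keeps the 11 rows from minutes 31 and 32. Note that minute 29 goes too, since its three rows are all zero, which is also what the loop does.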
Well which is it?:

"I want to eliminate the data for minute 30 but keep the data for minute 31 because the balloon starts to move again at second 17."

or

"It would be even better if I could delete the rows where there were ten consecutive zero speed entries such as from minute 30 second 17 to minute 31 second 11."

If you want to delete, say, data with >= 10 consecutive 0's, ?rle is your friend:

    > rle(phdf$Speed)
    Run Length Encoding
      lengths: int [1:10] 15 1 1 1 1 1 1 1 1 1
      values : num [1:10] 0 0.402649 0.671081 1.588225 2.438261 2.706693 2.930386 3.310666 3.198819 3.422512

e.g.

    > z <- rle(phdf$Speed)$lengths
    ## which gives:
    > z
     [1] 15  1  1  1  1  1  1  1  1  1
    > rep(z>9,z)
     [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    [11]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    [21] FALSE FALSE FALSE FALSE
    ## Thus:
    > phdf[ rep(z<10, z), ]
       Minute Second    Speed
    16     31     17 0.402649
    17     31     23 0.671081
    18     31     29 1.588225
    19     31     35 2.438261
    20     31     41 2.706693
    21     31     47 2.930386
    22     31     53 3.310666
    23     31     59 3.198819
    24     32      5 3.422512

But of course, this probably isn't what you meant. So further clarification is needed.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Aug 14, 2020 at 1:59 PM Philip <herd_dog at cox.net> wrote:
> [...]
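Bert's rep(z<10, z) drops any run of ten or more identical speeds, which is safe here only because no non-zero speed repeats. A slightly more defensive sketch encodes the condition Speed == 0 instead, so that only long runs of zeros are dropped. The function name keep_moving and the min_run argument are hypothetical, and the sketch assumes the speed vector contains no NA values.

```r
# Returns TRUE for rows to keep: everything except zeros that sit in a run
# of at least `min_run` consecutive zeros. Assumes `speed` has no NAs.
keep_moving <- function(speed, min_run = 10) {
  runs <- rle(speed == 0)                  # runs of TRUE (stopped) / FALSE (moving)
  long_stop <- runs$values & runs$lengths >= min_run
  !rep(long_stop, runs$lengths)            # expand run-level flags back to rows
}

# The Speed column of the sample data: 15 zeros, then 9 non-zero readings
speed <- c(rep(0, 15), 0.402649, 0.671081, 1.588225, 2.438261,
           2.706693, 2.930386, 3.310666, 3.198819, 3.422512)
keep <- keep_moving(speed)
```

With a data frame this would be used as phdf[keep_moving(phdf$Speed), ]. A steady non-zero speed repeated ten or more times is left alone, because such rows fall in a FALSE run of the Speed == 0 condition.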
On 2020-08-14 13:58 -0700, Philip wrote:
| [...]

Dear Philip,

first I thought about solving this using some combination of unique, duplicated, table ... then I saw Jim's reply, and rewrote it a little:

    rap <- structure(list(
      Minute = c(29L, 29L, 29L,
        30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L,
        31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
        31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
        31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L,
        32L),
      Second = c(47L, 53L, 59L,
        5L, 11L, 17L, 23L, 29L, 35L, 41L, 47L, 53L, 59L,
        5L, 11L, 17L, 23L, 29L, 35L, 41L,
        43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L, 43L,
        47L, 53L,
        54L, 54L, 54L, 54L, 54L, 54L,
        59L, 5L),
      Speed = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0.402649, 0.671081, 1.588225, 2.438261, 2.706693,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        2.930386, 3.310666,
        0, 0, 0, 0, 0, 0,
        3.198819, 3.422512)),
      class = "data.frame", row.names = c(NA, -42L))
    minis <- unique(rap$Minute)
    # TRUE when every speed recorded in minute `mini` is zero
    FUN <- function(mini, rap) {
      all(rap$Speed[rap$Minute==mini]==0)
    }
    # keep the rows whose minute is not all zeros
    keep <-
      rap$Minute %in%
      minis[!simplify2array(
        parallel::mclapply(minis, FUN, rap))]
    rap[keep,]

As to Bert's reply, I am at a loss as to how to use the lengths list in rle(rap$Speed) for this ...

Best,
Rasmus
On 2020-08-14 15:56 -0700, Bert Gunter wrote:
| On Fri, Aug 14, 2020 at 3:21 PM Rasmus Liland wrote:
| |
| | As to Bert's reply, I am at a loss as to how to use the lengths list in
| | rle(rap$Speed) for this ...
|
| I showed how in my message for one interpretation of the query. I would
| need further clarification for other interpretations, but note that in my
| reply, rep(z>9, z) gives a logical vector that is TRUE for every row in
| which a speed of 0 appears in a run of 10 or more (as no other speed would
| be replicated) and FALSE otherwise (z = $lengths). I would agree that
| whether this is a good starting point for other interpretations remains to
| be seen.

Yes, the time cols might be omitted to complete this.

    z <- rle(rap$Speed)$lengths
    rap[rep(x=z<10, times=z),]

This is not bad.

/Rasmus

P.S. Adding this back to the list (this ends up in a searchable archive, and so on and so forth).
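For anyone else puzzled by how the lengths from rle() turn into a row filter, here is the mechanism on a toy vector. This is only an illustration: the vector x and the threshold of 3 are made up and are not flight data.

```r
x <- c(0, 0, 0, 1, 2, 0, 0)   # toy speeds

r <- rle(x)
r$lengths                      # 3 1 1 2  (one entry per run)
r$values                       # 0 1 2 0  (the value repeated in each run)

# One flag per run, then repeat each flag by its run length
# so there is one flag per original element
flag <- rep(r$lengths >= 3, times = r$lengths)
flag                           # TRUE TRUE TRUE FALSE FALSE FALSE FALSE

x[!flag]                       # 1 2 0 0  (the short trailing run of zeros survives)
```

The same expansion with a threshold of 10 and a data frame index is what rap[rep(x=z<10, times=z),] does.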
Your suggestion worked like a charm. Thank you.

    rle <- rle(TF$Speed)$lengths      #Counts number of repetitions
    phdf$rle <- rep(rle>9,rle)        #TRUE is ten or more zeros. FALSE if less than 10
    phdf2 <- TF[rep(rle<10,rle),]     #Move rows that are FALSE to new data file

From: Bert Gunter
Sent: Friday, August 14, 2020 2:43 PM
To: Philip
Cc: r-help
Subject: Re: [R] Hot Air Balloon Weather Briefings

[...]
Hi Philip,
My fault for assuming that what worked for the sample data would work for the entire data set. If you run the following code:

    # read the file into a data frame
    phdf<-read.csv("phdf.csv",stringsAsFactors=FALSE)
    print(dim(phdf))
    # create a logical variable for the subsetting step
    keep<-rep(TRUE,length(phdf$Speed))
    # this follows the conventional for(i in ...) syntax
    # mini (minute index) is the name of the variable
    # that is assigned the successive values of the
    # unique values in the Minute column of phdf
    # here I was lazy and only dealt with the sample data
    # what I should have done was to create a column
    # with minute values that don't repeat
    phdf$hourmin<-phdf$Hour.Z * 60 + phdf$Minute
    for(mini in unique(phdf$hourmin)) {
     # mark all minutes that are all zeros for deletion
     if(all(phdf$Speed[phdf$hourmin == mini] == 0))
      keep[phdf$hourmin == mini]<-FALSE
     # but now there is another condition
     # that I didn't notice (more laziness)
     # drop minutes containing ten consecutive zeros
     # I'll use a cheap trick for this
     # if the length of the run length encoding (rle)
     # of the Speed values in that minute is less than three,
     # then there can't be a broken run of zeros
     if(length(rle(phdf$Speed[phdf$hourmin == mini])$lengths)<3)
      keep[phdf$hourmin == mini]<-FALSE
    }
    # now drop any rows that keep has marked FALSE (note all caps!)
    phdf<-phdf[keep,]
    print(dim(phdf))

You will see that it has removed 67 rows. This is the same as if I had only applied the first "all zeros" condition, for there were no unique minutes with 11 observations that contained a single non-zero speed value. I can see that you are getting short runs of zero speeds near the end. I assume that this is due to the balloon slowly bumping along the ground. Often just looking at the data can suggest solutions to problems like this.
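Why the hourmin column matters can be seen on a four-row toy example. This is a sketch with made-up values: only the column names Hour.Z, Minute and Speed come from the thread, and by_key is a hypothetical name.

```r
# Two stopped readings at 15:05 and two moving readings at 16:05
toy <- data.frame(Hour.Z = c(15, 15, 16, 16),
                  Minute = c(5, 5, 5, 5),
                  Speed  = c(0, 0, 1.2, 2.4))

# Grouped by Minute alone, stopped and moving rows share one group,
# so the "all zeros" test fails and nothing would be dropped
all(toy$Speed[toy$Minute == 5] == 0)

# A key built from hour and minute separates the two groups
toy$hourmin <- toy$Hour.Z * 60 + toy$Minute
by_key <- tapply(toy$Speed == 0, toy$hourmin, all)
by_key
```

Here by_key is TRUE for key 905 (15:05, all stopped) and FALSE for key 965 (16:05, moving), so only the first pair of rows would be marked for deletion. This is presumably why a loop over Minute alone leaves a flight longer than an hour untouched.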
Coincidentally, I am about to email an old friend of mine whose sons have dabbled in sending balloons high into the air, and I will let them know that they are not alone in performing this unusual practice. If I haven't answered all your questions, feel free to let me know.

Jim

On Sun, Aug 16, 2020 at 4:39 AM Philip <herd_dog at cox.net> wrote:
>
> Thanks for getting back to me so quickly.
>
> I can get your code to run without errors but I'm not sure what it
> accomplishes since I still get 813 rows of data and 11 variables. The
> entire file for a January flight is attached. Also attached is a .jpg of a
> flight from a couple of years ago where my wife putzed back and forth across
> a road for over an hour by going up or down to catch different winds. Stuff
> like this is one of the charms of the sport for those of us who are easily
> amused.
>
> keep <- rep(TRUE,length(phdf$Speed))    #813 repetitions of TRUE
> for(mini in unique(phdf$Minute))
>     if(all(phdf$Speed[phdf$Minute==mini]==0))
>         keep[phdf$Minute==mini]<-False    #813 repetitions of TRUE in the keep data file.
>
> #Don't understand the assignment (<-) to FALSE
> phdf <- phdf[keep,]    #Still have 813 rows of data.
>
> As you may know, we lay out the balloon and then blow cold air into the
> envelope with a gas powered fan. When it is packed we "go hot". Just
> before we turn on the pilot light and hit the burner we will turn on the
> tracking software, which results in several minutes of no movement at the
> beginning of the flight. This is what is happening between rows 2 and 70 of
> the attached spreadsheet. You will also notice that the balloon slowly
> accelerates between rows 71 and 79 up to about 3.4 MPH just after liftoff.
> The no movement is what I want to eliminate.
>
> Can you let me know what I am missing?
>
> Two more things. I'm really not sure about the:
>
> for(mini in unique(phdf$Minute))
>
> .....line of code. I understand the unique function but not the "mini
> in..." part. Is mini a function or just a label? I understand stuff like:
>
> for(i in 1:5) print(1:i)
>
> .....from the R base documentation but am not really sure how it fits in here.
>
> And finally, I'm retired, so I have plenty of time and am determined to learn
> R. But I keep running into things like the "mini in unique" command. I
> have read four or five books and watched or read dozens of tutorials but
> there always seems to be another layer that eludes me. Any suggestions?
>
> Philip
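On the "mini in unique" question: in a for loop the word before `in` is just a variable name, and the expression after `in` supplies the values it takes, one per pass. A small sketch with made-up minute values (the vector minutes and the collector seen are hypothetical names):

```r
minutes <- c(29, 29, 30, 30, 30, 31)   # toy values, not flight data

seen <- c()
for (mini in unique(minutes)) {
  # `mini` is an ordinary variable; `i` or `m` would work just as well.
  # It takes the values 29, 30, 31 in turn, one per pass through the loop.
  seen <- c(seen, mini)
  cat("working on minute", mini, "\n")
}
seen
```

So for(mini in unique(phdf$Minute)) is the same construct as for(i in 1:5), only with the vector of distinct minutes in place of 1:5.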