I have a character string that represents a time duration. It has an hours-minutes-seconds structure (ish), but with letters denoting the units (H, M or S), no leading zeros, and no placeholder at all where one or other of the units is not required.

It looks like this:

t <- c("10H20M33S", "1H1M", "1M", "21M9S", "2H55S")
df <- data.frame(t)
df

# ideally should look like:
t2 <- c("10:20:33", "01:01:00", "00:01:00", "00:21:09", "02:00:55")
df2 <- data.frame(t2)
df2

I need to get it into hours, minutes and seconds, either in a time format or as a string with leading zeros and all three time units represented in each one, as in df2. The data, part of a very large dataset, are for onward use and processing in a GIS application. I've messed about with string handling statements in SQL to no avail, but wondered if R would be a better bet? I've had a look at some of the commands in stringr, but am unsure how to operationalise a solution using this package. Any advice is welcome.

--
View this message in context: http://r.789695.n4.nabble.com/A-problem-with-string-handling-to-make-a-time-duration-tp4706795.html
Sent from the R help mailing list archive at Nabble.com.
John Laing
2015-May-04 21:14 UTC
[R] A problem with string handling to make a time duration
Regular expressions are the tool for this problem. This pattern matches your input data:

t <- c("10H20M33S", "1H1M", "1M", "21M9S", "2H55S")
patt <- "^(([0-9]+)H)?(([0-9]+)M)?(([0-9]+)S)?$"
all(grepl(patt, t)) # TRUE

We can use the pattern to extract the hour/minute/second components:

hms <- lapply(c(h="\\2", m="\\4", s="\\6"), function(r) sub(patt, r, t))

And then just plug those components back into the desired format:

formatted <- gsub(" ", "0", sprintf("%2s:%2s:%2s", hms$h, hms$m, hms$s))

In the last line we need the gsub because zero-padding with %02s seems to be platform-dependent.

JL

On Mon, May 4, 2015 at 3:59 PM, gavinr <g.rudge at bham.ac.uk> wrote:
> [...]
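For anyone wanting to reuse the regular-expression approach above, here is a minimal sketch that wraps it in a function. The name to_hms is made up for illustration, and two things differ from the reply: the components are converted to integers so that %02d can do the zero-padding (avoiding the %02s question), and inputs that do not fit the pattern come back as NA rather than being passed through. Treat it as one possible arrangement, not the definitive one.

```r
# to_hms is a hypothetical helper name; the pattern is the one from the reply.
to_hms <- function(x) {
  x <- as.character(x)  # data.frame() may have stored the column as a factor
  patt <- "^(([0-9]+)H)?(([0-9]+)M)?(([0-9]+)S)?$"
  # The pattern also matches "", so empty strings are rejected explicitly
  ok <- grepl(patt, x) & nzchar(x)

  # Extract one capture group; missing components and bad inputs become 0
  part <- function(group) {
    v <- sub(patt, group, x)
    v[!ok | v == ""] <- "0"
    as.integer(v)
  }

  out <- sprintf("%02d:%02d:%02d", part("\\2"), part("\\4"), part("\\6"))
  out[!ok] <- NA_character_  # flag anything that did not fit the pattern
  out
}

df <- data.frame(t = c("10H20M33S", "1H1M", "1M", "21M9S", "2H55S"),
                 stringsAsFactors = FALSE)
df$t2 <- to_hms(df$t)
df
```

Returning NA for unexpected strings is a design choice that makes bad rows easy to find with is.na() before the data go on to the GIS step.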
Franklin Bretschneider
2015-May-05 10:26 UTC
[R] A problem with string handling to make a time duration
Hello gavinr,

This can be done easily with the substr function, e.g.

# say:
string = "12H15M45S"
# then pick:
h = substr(string, 1, 2)
m = substr(string, 4, 5)
# and join again:
newstr = paste(h, m, sep = ":")
# etcetera

Success and best regards,

Frank

--
Franklin Bretschneider
Dept of Biology
Utrecht University
bretschr at xs4all.nl
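The substr idea above relies on fixed character positions. As a rough sketch of how it might be extended to strings of varying width, the positions can be computed with regexpr first and then handed to substr. The helper name unit_value is hypothetical, and the lookahead pattern is an assumption about the data (digits immediately followed by the unit letter).

```r
# Hypothetical helper: the number directly in front of a unit letter, or 0.
unit_value <- function(x, unit) {
  # The lookahead keeps the unit letter out of the match (needs perl = TRUE)
  pos <- regexpr(paste0("[0-9]+(?=", unit, ")"), x, perl = TRUE)
  hit <- !is.na(pos) & pos > -1L
  out <- integer(length(x))  # stays 0 where the unit is absent
  # pos holds the start positions and "match.length" the widths, so
  # substr() works with computed rather than fixed positions
  last <- pos[hit] + attr(pos, "match.length")[hit] - 1L
  out[hit] <- as.integer(substr(x[hit], pos[hit], last))
  out
}

string <- c("12H15M45S", "1H1M", "21M9S")
h <- unit_value(string, "H")
m <- unit_value(string, "M")
s <- unit_value(string, "S")
newstr <- sprintf("%02d:%02d:%02d", h, m, s)
newstr
```

Because h, m and s come out as integers, the same pieces can also give a duration in seconds (h * 3600 + m * 60 + s) if a numeric value is more convenient than a string.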
Thanks guys. The first solution, with gsub / lapply, works perfectly. The solution using substrings would work if the times were in a consistent format, but without leading zeros, and with some parts of the string absent completely, it would need some extra logic. I need something that can be automated over a data set with a million or so time points in it.
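On the million-row point: the regex solution is already vectorised, so it can be applied to a whole column in one call. If many durations repeat, one further option is to convert each distinct string once and map the results back with match. The sketch below assumes this; fmt_hms is a hypothetical wrapper around the pattern from the first reply, and the data are simulated purely for illustration.

```r
# fmt_hms is a hypothetical name; the body follows the regex reply in this thread.
fmt_hms <- function(x) {
  patt <- "^(([0-9]+)H)?(([0-9]+)M)?(([0-9]+)S)?$"
  hms <- lapply(c(h = "\\2", m = "\\4", s = "\\6"),
                function(r) sub(patt, r, x))
  gsub(" ", "0", sprintf("%2s:%2s:%2s", hms$h, hms$m, hms$s))
}

set.seed(1)
# Simulated column of one million durations (made-up data)
big <- data.frame(
  t = sample(c("10H20M33S", "1H1M", "1M", "21M9S", "2H55S"),
             1e6, replace = TRUE),
  stringsAsFactors = FALSE
)

# Convert each distinct string once, then map the results back to every row
u <- unique(big$t)
big$t2 <- fmt_hms(u)[match(big$t, u)]
head(big)
```

How much this saves depends on how many distinct values the real data contain; with mostly unique durations, calling fmt_hms(big$t) directly is just as reasonable.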