Hi,

I have been working with a dataset for some time and I need some help parsing some of its data.

There is a column called "Duration" which has data like the following:

2 minutes => 120
2 min => 120
10 seconds => 10
2 hrs => 7200
2-3 minutes => 150 or 120
5 minutes (when i arrived => 300
Flyby approx 20 sec. => 20
felt like 10 mins but tim => 600

I need to convert them to numerics as given. Any help in this regard will be highly appreciated.

Thanks,
Susanta

On Tue, Oct 26, 2010 at 3:28 PM, Susanta Mohapatra <mohapatra.susanta at gmail.com> wrote:
> Hi,
>
> I am working with a dataset for sometime and I need some help in parsing
> some data.
>
> There is a column called "Duration" which has data like following:
>
> 2 minutes => 120
> 2 min => 120
> 10 seconds =>10
> 2 hrs =>7200
> 2-3 minutes => 150 or 120
> 5 minutes (when i arrived => 300
> Flyby approx 20 sec. => 20
> felt like 10 mins but tim => 600
>
> I need to convert them to numerics as given. Any help in this regard will be
> highly appreciated.

Assuming that "convert to numerics as given" means creating a list of numeric vectors, one per row, try this:

# sample input
x <- c("2 minutes => 120", "2 min => 120", "10 seconds =>10", "2 hrs =>7200",
  " 2-3 minutes => 150 or 120", "5 minutes (when i arrived => 300",
  "Flyby approx 20 sec. => 20", "felt like 10 mins but tim => 600")

library(gsubfn)
# extract every run of digits in each string and convert it to numeric
out <- strapply(x, "\\d+", as.numeric)

The result looks like this:

> str(out)
List of 8
 $ : num [1:2] 2 120
 $ : num [1:2] 2 120
 $ : num [1:2] 10 10
 $ : num [1:2] 2 7200
 $ : num [1:4] 2 3 150 120
 $ : num [1:2] 5 300
 $ : num [1:2] 20 20
 $ : num [1:2] 10 600

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
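
If gsubfn is not available, roughly the same digit extraction can probably be done in base R with gregexpr() and regmatches(). This is only a sketch of an alternative to the strapply() call above, not something posted in the thread; the name out_base is made up for illustration, and the input is a shortened copy of the sample data.

```r
# Shortened sample input in the same "value => result" shape as the thread.
x <- c("2 minutes => 120", " 2-3 minutes => 150 or 120",
       "Flyby approx 20 sec. => 20")

# gregexpr() locates every run of digits in each string; regmatches()
# pulls them out as character vectors; lapply() converts each to numeric.
# out_base is an illustrative name, not one used in the thread.
out_base <- lapply(regmatches(x, gregexpr("[0-9]+", x)), as.numeric)

str(out_base)
```

As with strapply(), the result is a list with one numeric vector per input string, so rows containing a range or an "or" simply yield longer vectors.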

On Tue, Oct 26, 2010 at 7:17 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> On Tue, Oct 26, 2010 at 3:28 PM, Susanta Mohapatra
> <mohapatra.susanta at gmail.com> wrote:
>> Hi,
>>
>> I am working with a dataset for sometime and I need some help in parsing
>> some data.
>>
>> There is a column called "Duration" which has data like following:
>>
>> 2 minutes => 120
>> 2 min => 120
>> 10 seconds =>10
>> 2 hrs =>7200
>> 2-3 minutes => 150 or 120
>> 5 minutes (when i arrived => 300
>> Flyby approx 20 sec. => 20
>> felt like 10 mins but tim => 600
>>
>> I need to convert them to numerics as given. Any help in this regard will be
>> highly appreciated.
>
> Assuming that "convert to numerics as given" means creating a list of
> numeric vectors, one per row.
>

or, if => was supposed to mean that what follows it is the desired result, then try this:

f <- function(n1, n2, units) {
  # n2 is "" for a single value, or "-<number>" for a range such as "2-3",
  # so negating as.numeric(n2) gives the positive upper bound of the range
  u <- substr(units, 1, 3)
  if (n2 == "" && u == "sec") n1
  else if (n2 == "" && u == "min") paste(60 * as.numeric(n1))
  else if (n2 == "" && u == "hrs") paste(3600 * as.numeric(n1))
  else if (n2 != "" && u == "sec") paste(n1, "or", -as.numeric(n2))
  else if (n2 != "" && u == "min")
    paste(60 * as.numeric(n1), "or", -60 * as.numeric(n2))
  else if (n2 != "" && u == "hrs")
    paste(3600 * as.numeric(n1), "or", -3600 * as.numeric(n2))
  else NA
}

xx <- c("2 minutes ", "2 min ", "10 seconds ", "2 hrs ", " 2-3 minutes ",
  "5 minutes (when i arrived ", "Flyby approx 20 sec. ",
  "felt like 10 mins but tim ")

library(gsubfn)
# the three capture groups are passed to f as n1, n2 and units
out2 <- strapply(xx, "(\\d+)(-\\d+)? (\\S+)", f)

The output looks like this:

> str(out2)
List of 8
 $ : chr "120"
 $ : chr "120"
 $ : chr "10"
 $ : chr "7200"
 $ : chr "120 or 180"
 $ : chr "300"
 $ : chr "20"
 $ : chr "600"

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
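
If a plain numeric column is wanted rather than character strings, something along the following lines might work in base R. It is a sketch under assumptions that are not in the thread: the helper name to_seconds is hypothetical, a range such as "2-3 minutes" is reported as its midpoint (which gives the 150 from the original post), and units are recognised only by their first three letters ("sec", "min", "hrs", "hou").

```r
# Hypothetical helper (not from the thread): convert free-text durations
# to seconds, returning NA when no "<number> <unit>" pattern is found.
to_seconds <- function(s) {
  # Capture: first number, optional "-<second number>", then the unit word.
  pat <- "([0-9]+)(-([0-9]+))? *([A-Za-z]+)"
  m <- regmatches(s, regexec(pat, s))
  # Seconds per unit, keyed by the first three letters of the unit word.
  mult <- c(sec = 1, min = 60, hrs = 3600, hou = 3600)
  vapply(m, function(g) {
    if (length(g) == 0) return(NA_real_)          # no match at all
    unit <- tolower(substr(g[5], 1, 3))
    if (!unit %in% names(mult)) return(NA_real_)  # unrecognised unit
    lo <- as.numeric(g[2])
    # Use the upper bound only when the optional range group matched.
    hi <- if (!is.na(g[4]) && nzchar(g[4])) as.numeric(g[4]) else lo
    # Assumption: a range is summarised by its midpoint.
    unname(mult[unit]) * (lo + hi) / 2
  }, numeric(1))
}

xx <- c("2 minutes ", "2 min ", "10 seconds ", "2 hrs ", " 2-3 minutes ",
  "5 minutes (when i arrived ", "Flyby approx 20 sec. ",
  "felt like 10 mins but tim ")
to_seconds(xx)
```

Because the function is vectorised over its input, it could be applied directly to the Duration column; rows it cannot interpret come back as NA and can be inspected by hand.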