Alexandre Karev
2013-Apr-28 11:48 UTC
[R] Multiple assignment to several columns in dataset
Hello!

I have a time stamp field ('time') in a dataset ('dt') with values like "18:10", "19:43", .... I need to split the time field into hours and minutes and add both as new columns to the dataset. We are able to do it in bash+awk, but would like to stay within the R codebase as much as possible.

For now we are using this solution:

    tstamp <- strsplit(dt$time, ":")

    # constructing hours field
    dt$hr <- lapply(tstamp, function(v) {v[1] } )

    # constructing minutes field
    dt$m <- lapply(tstamp, function(v) {v[2] } )

It works fine on a small, simple sample data set.

But on real data with several million records, it does not seem very practical to make two separate passes over the tstamp list.

We have tried to use this construction instead:

    dt[c('hr', 'm')] <- strsplit(dt$time, ":")

But the R environment consumes the whole system memory (8 GB), starts swapping while processing this statement, and hangs for so long that we have never had the patience to wait for the result.

Is there any simple and efficient way to assign several dataset columns with values computed from a set of other columns?

R-egards,
Alex
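For illustration, a minimal sketch of the shape of the `strsplit` result, which may be part of why the direct two-column assignment above does not behave as hoped: the list has one element per row, not one per target column. The three-row `dt` below is made-up sample data, and the reshaping step assumes every value contains exactly one ':'.

```r
# Made-up sample data standing in for the real 'dt'
dt <- data.frame(time = c("18:10", "19:43", "07:05"),
                 stringsAsFactors = FALSE)

tstamp <- strsplit(dt$time, ":", fixed = TRUE)

length(tstamp)    # one list element per row of dt
lengths(tstamp)   # each element holds two strings: hour and minute

# Flatten the list once and reshape it into an n x 2 character matrix
# (assumption: every value has exactly one ':')
parts <- matrix(unlist(tstamp), ncol = 2, byrow = TRUE)

# A list with one vector per target column matches the shape that
# multi-column assignment works with
dt[c("hr", "m")] <- list(parts[, 1], parts[, 2])
str(dt)
```

The columns created this way are character vectors; wrapping each in `as.integer()` would give numeric hours and minutes.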
See ?strptime on how to handle time formats.

If you want to stay playing with strsplit: it actually returns a list, hence you probably want to:

    dt$hr <- sapply(tstamp, "[", 1)

Uwe Ligges
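A small sketch of the `sapply(tstamp, "[", 1)` idea applied to both fields, on made-up two-row data. Here `"[`" is the ordinary subsetting function, so each call picks the first or second piece of every split value; unlike `lapply`, `sapply` simplifies the result to a plain character vector. The `as.integer()` conversion is an addition for illustration, not part of the suggestion above.

```r
# Made-up sample data standing in for the real 'dt'
dt <- data.frame(time = c("18:10", "19:43"), stringsAsFactors = FALSE)

tstamp <- strsplit(dt$time, ":", fixed = TRUE)

# Extract the 1st and 2nd piece of every element; the result is a
# character vector rather than a list column
dt$hr <- sapply(tstamp, "[", 1)
dt$m  <- sapply(tstamp, "[", 2)

# Optional: convert the pieces to integers
dt$hr <- as.integer(dt$hr)
dt$m  <- as.integer(dt$m)

str(dt)
```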
Hello,

See if the following does what you need.

    hours <- function(x, format = "%H:%M"){
        as.integer(format(strptime(x, format = format), "%H"))
    }

    minutes <- function(x, format = "%H:%M"){
        as.integer(format(strptime(x, format = format), "%M"))
    }

    x <- c("18:10", "19:43")
    hours(x)
    minutes(x)

Hope this helps,

Rui Barradas
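As a possible variation on the `strptime` approach, the sketch below parses the strings only once and fills both columns in a single assignment. It relies on `strptime` returning a POSIXlt object, whose `hour` and `min` components can be read directly. The two-row `dt` is made-up sample data, and `tz = "UTC"` is an assumption added here to keep daylight-saving rules out of the picture.

```r
# Made-up sample data standing in for the real 'dt'
dt <- data.frame(time = c("18:10", "19:43"), stringsAsFactors = FALSE)

# Parse once; tz = "UTC" is an assumption, chosen to avoid
# daylight-saving effects on the parsed times
lt <- strptime(dt$time, format = "%H:%M", tz = "UTC")

# Assign both new columns in one statement, one vector per column
dt[c("hr", "m")] <- list(lt$hour, lt$min)

str(dt)
```

Whether this is faster than the `strsplit` route on several million rows is not something the thread establishes; timing both on a sample with `system.time()` would be one way to find out.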