thr3ads.net - R help - [R] Multiple assignment to several columns in dataset [Apr 2013]

Alexandre Karev

2013-Apr-28 11:48 UTC

[R] Multiple assignment to several columns in dataset

Hello!

I have a time stamp field ('time') in a dataset ('dt') with values like
"18:10", "19:43", ....
I need to split the time field into hours and minutes and add both as new
columns to the dataset.
We can do this in bash+awk, but we would like to stay within the R codebase
as much as possible.

For now we are using this solution:

 tstamp <- strsplit(dt$time, ":")

# constructing hours field
 dt$hr  <- lapply(tstamp, function(v) {v[1] } )

# constructing minutes field
 dt$m   <- lapply(tstamp, function(v) {v[2] } )

It works fine on a small, simple sample data set.

But on real data with several million records, it does not seem very
practical to make two separate passes over the tstamp list.

Instead, we tried this construction:

dt[c('hr', 'm')] <- strsplit(dt$time, ":")

But the R environment consumes all of the system memory (8 GB) and starts
swapping while processing this statement, and it hangs for so long that we
have never had the patience to wait for the result.

Is there a simple and efficient way to assign several dataset columns with
values computed from a set of other columns?


R-egards,
Alex


Uwe Ligges

2013-Apr-28 17:21 UTC

See ?strptime on how to handle time formats.


If you want to keep working with strsplit: it actually returns a list,
hence you probably want:

  dt$hr  <- sapply(tstamp, "[", 1)
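
As a rough sketch of how this could be applied to both columns: the example below uses vapply instead of sapply, which also returns a plain vector rather than a list but additionally checks that every element yields exactly one string. The two-row 'dt' is made-up sample data, and the conversion to integer is an assumption about what the columns should hold.

```r
# Hypothetical sample data standing in for the poster's 'dt'
dt <- data.frame(time = c("18:10", "19:43"), stringsAsFactors = FALSE)

# Split once; fixed = TRUE treats ":" as a literal string, not a regex
tstamp <- strsplit(dt$time, ":", fixed = TRUE)

# vapply returns an atomic character vector (not a list) and fails
# loudly if any element does not produce exactly one string
dt$hr <- as.integer(vapply(tstamp, `[`, character(1), 1))
dt$m  <- as.integer(vapply(tstamp, `[`, character(1), 2))

str(dt)
```

Each column still needs its own pass over tstamp, but both columns end up as ordinary integer vectors instead of list columns.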

Uwe Ligges





Rui Barradas

2013-Apr-28 17:54 UTC

Hello,

See if the following does what you need.


hours <- function(x, format = "%H:%M"){
	as.integer(format(strptime(x, format = format), "%H"))
}
minutes <- function(x, format = "%H:%M"){
	as.integer(format(strptime(x, format = format), "%M"))
}

x <- c("18:10", "19:43")
hours(x)
minutes(x)
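
A possible variation, offered only as a sketch: strptime returns a POSIXlt object whose hour and min components can be read directly, so the strings need to be parsed only once to fill both columns. The two-row 'dt' is made-up sample data, and tz = "UTC" is an assumption added here so that the result does not depend on the local time zone.

```r
# Hypothetical sample data standing in for the poster's 'dt'
dt <- data.frame(time = c("18:10", "19:43"), stringsAsFactors = FALSE)

# Parse the strings once; the result is a POSIXlt object
lt <- strptime(dt$time, format = "%H:%M", tz = "UTC")

# POSIXlt stores hours and minutes as separate components
dt$hr <- as.integer(lt$hour)
dt$m  <- as.integer(lt$min)

str(dt)
```

Values that do not match the format come back as NA rather than raising an error, so it may be worth checking anyNA(dt$hr) on real data.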


Hope this helps,

Rui Barradas

