Dear all R users,

I am going to use R to process some of my physiological eye-tracking data.

The problem is that the recording machine does not sample at a reliably constant rate: the time intervals between samples can vary from 9 msec to ~120 msec, with most in the 15-30 msec range. Below is a fragment of a single data file from one trial:

Time    CursorX CursorY Pupilsize
1811543      -1      -1        -1
1811563      -1      -1        -1
1811584     511     370  4.175665
1811603     511     368  4.181973
1811624     521     368  4.210732
1811644     512     377  4.149632
1811664     524     377  4.275845
1811684     518     368  4.236212
1811703     516     370  4.238384
1811725     507     364  4.181157
1811744     509     371  4.185016
1811764     509     377  4.231987
1811784     514     387  4.252449
1811802     515     388  4.273726

My goal is to "resample" these data so that the "Time" column increments by a regular interval, and the other columns hold the averages (or estimates) at each time point, computed from the available data.

So far I have used a regular interval that is larger than anything the recording machine naturally produces (e.g. > 120 msec) and taken the average of the available data points within each regular time interval.

Now I need to resample to a smaller regular interval, e.g. 5 msec, and interpolate/extrapolate the missing data points from the available ones; that is, I may have to split a data point across the number of regular intervals it occupies in time.

Do you know of any package that does something similar? And because of the size of the data and the computational demand (1500 files, each with 2000-8000+ lines), can you suggest an (algorithmically) more efficient way of doing this?

Thanks a lot!

Regards,

John
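P.S. To make the "coarse" version above concrete, the binning-and-averaging I described amounts to something like the following (a sketch only, not my actual script; the file name is made up, and 'dat' stands for one trial's data frame with the columns shown above):

## Sketch of coarse binning: assign each sample to a fixed-width time bin
## and average the measurements within each bin.
bin.width <- 120                                    # msec, wider than the largest gap
dat <- read.table("trial001.txt", header = TRUE)    # hypothetical file name
breaks <- seq(min(dat$Time), max(dat$Time) + bin.width, by = bin.width)
bin <- cut(dat$Time, breaks = breaks, right = FALSE, include.lowest = TRUE)
binned <- aggregate(dat[c("CursorX", "CursorY", "Pupilsize")],
                    by = list(Time = breaks[as.integer(bin)]),
                    FUN = mean)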
On Thu, 11 Dec 2008, tsunhin wong wrote:

> My goal is to "resample" these data so that the "Time" column
> increments by a regular interval, and the other columns of data are
> the averages (or estimates) at the point in time according to
> available data points.

'resample' is probably the wrong word in a forum in which there are many statisticians for whom that word has a special meaning. 'interpolate' is more appropriate.

Following the _posting guide_, when I do

  help.search("interpolate")

I get a number of useful hits, like

  stats::approx        Interpolation Functions

Read the help page ?approx and try example(approx).

Of course, this is R and there are many other ways to skin this cat, like

  predict( gam( <...> ), newdata = interpolation.points )

from the mgcv package.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
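For concreteness, here is a minimal sketch of the two routes Chuck mentions, applied on a 5 msec grid. The file name and the object names (dat, fit, new.times) are illustrative only, not from the thread:

## dat: one trial's data frame with columns Time, CursorX, CursorY, Pupilsize
dat <- read.table("trial001.txt", header = TRUE)   # hypothetical file name
dat <- dat[dat$Pupilsize > 0, ]                    # drop -1 rows, which presumably mark lost samples

new.times <- seq(min(dat$Time), max(dat$Time), by = 5)   # regular 5 msec grid

## 1. Linear interpolation with stats::approx, one column at a time
pupil.lin <- approx(dat$Time, dat$Pupilsize, xout = new.times)$y

## 2. Smooth interpolation with mgcv: fit a smooth of Time, predict on the grid
library(mgcv)
fit <- gam(Pupilsize ~ s(Time), data = dat)
pupil.gam <- predict(fit, newdata = data.frame(Time = new.times))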
This has been solved in an earlier post:
http://www.nabble.com/linear-interpolation-of-multiple-random-time-series-to11694879.html

On Thu, Dec 11, 2008 at 4:38 PM, tsunhin wong <thjwong@gmail.com> wrote:

> Do you know if there is any package that is doing something similar?

--
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University
www.thatmike.com

Looking to arrange a meeting? Do so at:
http://www.timetomeet.info/with/mike/

~ Certainty is folly... I think. ~
Here is some sample code to interpolate your data using zoo: by5 is a sequence at every 5 time units, which we merge with z, the original data. Then we use na.approx to replace all NAs with linear interpolations.

Lines <- "Time CursorX CursorY Pupilsize
1811543 -1 -1 -1
1811563 -1 -1 -1
1811584 511 370 4.175665
1811603 511 368 4.181973
1811624 521 368 4.210732
1811644 512 377 4.149632
1811664 524 377 4.275845
1811684 518 368 4.236212
1811703 516 370 4.238384
1811725 507 364 4.181157
1811744 509 371 4.185016
1811764 509 377 4.231987
1811784 514 387 4.252449
1811802 515 388 4.273726"

library(zoo)

z <- read.zoo(textConnection(Lines), header = TRUE)
by5 <- seq(5 * floor(start(z)/5), ceiling(5 * ceiling(end(z)/5)), 5)
zz <- merge(z, zoo(, as.integer(by5)))
zza <- na.approx(zz)

On Thu, Dec 11, 2008 at 3:38 PM, tsunhin wong <thjwong at gmail.com> wrote:

> Now, I need to achieve resampling for smaller regular interval: i.e.
> 5msec intervals, and interpolate / intrapolate the missing data points
> from the available ones.
>
> Do you know if there is any package that is doing something similar?
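Since the question also asks about doing this efficiently for ~1500 files, one way to scale the zoo recipe above is to wrap it in a small function and lapply over the file names. This is only a sketch: the directory name, file pattern, and handling of the -1 codes are assumptions, not from the thread.

library(zoo)

## Interpolate one recording onto a regular grid (default: 5 msec)
interp.file <- function(path, by = 5) {
  z <- read.zoo(path, header = TRUE)            # first column (Time) becomes the index
  grid <- seq(by * floor(start(z) / by), by * ceiling(end(z) / by), by = by)
  zz <- na.approx(merge(z, zoo(, as.integer(grid))))  # linear interpolation over merged times
  zz[time(zz) %in% grid, ]                      # keep only the regular grid points
}

## Hypothetical layout: one whitespace-delimited file per trial in "data/"
files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
results <- lapply(files, interp.file)
names(results) <- basename(files)
## (rows coded -1 in the raw files presumably mark lost samples and may
## need to be removed or set to NA before interpolating)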