Colleagues,

R 2.9.0 on all platforms

I have a dataset that contains three columns of interest: IDs, serial elapsed times, and a marker. Representative data:

    Subject  Time   Marker
    1        100.5  0
    1        101    0
    1        102    1
    1        103    0
    1        105    0

For each subject, I would like to find the time associated with Marker == 1, then replace Time with Time - Time[Marker == 1]. The result for this subject would be:

    Subject  Time   Marker
    1        -1.5   0
    1        -1     0
    1        0      1
    1        1      0
    1        3      0

One proviso: some subjects do not have Marker == 1; for these subjects, I would like Time to remain unchanged.

At present, I am looping over each subject. The number of subjects is large, so this process is quite slow. I assume that one of the apply functions could speed this markedly, but I am not facile with them. Any help would be appreciated.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
The best way to approach this problem would probably be to use a function like by(), which splits your data into subsets and then executes a function on each subset. The function would search the subset for the marker time. If it exists, the marker time would be subtracted from the Time column. If not, no action would be taken.

Instead of the by() function, which is in the base package, I will use ddply() from the plyr package. by() would return a list of sub-data frames, one for each subject. Among other things, ddply() will reassemble this list into one data frame.

Assuming your data.frame is called "testData":

    library( plyr )

    # Define a function that will adjust the times for a subject.
    adjustTimes <- function( subjectData ){

      # Find the first row containing the marker value 1.
      markerLocation <- match( 1, subjectData[[ "Marker" ]] )

      # If the marker was not present, markerLocation will have the value NA
      # and we skip the following block of code.
      if( !is.na( markerLocation ) ){

        markerTime <- subjectData[[ 'Time' ]][ markerLocation ]
        subjectData[[ 'Time' ]] <- subjectData[[ 'Time' ]] - markerTime

      }

      return( subjectData )

    }

    # Split testData by subject, apply the adjustTimes function to
    # each subset and then recombine the results back into a data frame.
    processedData <- ddply( testData, 'Subject', adjustTimes )

Hope this helps!

-Charlie

--
View this message in context: http://n4.nabble.com/Use-of-apply-rather-than-a-loop-tp948941p948957.html
Sent from the R help mailing list archive at Nabble.com.
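For readers without plyr, the split/apply/recombine idea described above can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is only an illustration under stated assumptions: the data frame name testData follows the reply above, the second subject (with no Marker == 1 row) is made up to exercise the proviso, and the compact adjustTimes() here is a restatement for self-containment, not the author's exact code.

```r
# Representative data: subject 1 from the original post, plus a
# hypothetical subject 2 that has no Marker == 1 row.
testData <- data.frame(
  Subject = c(1, 1, 1, 1, 1, 2, 2, 2),
  Time    = c(100.5, 101, 102, 103, 105, 50, 51, 52),
  Marker  = c(0, 0, 1, 0, 0, 0, 0, 0)
)

# Shift one subject's times so the first marked row sits at zero.
adjustTimes <- function(subjectData) {
  i <- match(1, subjectData$Marker)   # NA when the subject has no marker
  if (!is.na(i)) {
    subjectData$Time <- subjectData$Time - subjectData$Time[i]
  }
  subjectData
}

# split() makes one data frame per subject, lapply() adjusts each one,
# and do.call(rbind, ...) stacks the pieces back into a single data frame.
pieces <- lapply(split(testData, testData$Subject), adjustTimes)
processedData <- do.call(rbind, pieces)
rownames(processedData) <- NULL
print(processedData)
```

Note that the rows come back grouped by subject, which may differ from the original row order if the input was not already grouped.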
You could try using merge:

    > d<-data.frame(Subject=rep(11:13,each=3),Time=101:109,Marker=c(0,1,0, 0,0,0, 0,0,1))
    > d
      Subject Time Marker
    1      11  101      0
    2      11  102      1
    3      11  103      0
    4      12  104      0
    5      12  105      0
    6      12  106      0
    7      13  107      0
    8      13  108      0
    9      13  109      1
    > d$Time - merge(d,d[d$Marker==1,],by="Subject",all.x=TRUE)$Time.y
    [1] -1  0  1 NA NA NA -2 -1  0

If you want the reference times for Subjects without a marked instance to be 0, replace the NA's in Time.y by 0:

    > NAtoZero<-function(x){ x[is.na(x)]<-0 ; x }
    > d$Time - NAtoZero(merge(d,d[d$Marker==1,],by="Subject",all.x=TRUE)$Time.y)
    [1]  -1   0   1 104 105 106  -2  -1   0

If there is more than one marked instance for a given subject this method will fail.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Dennis Fisher
> Sent: Friday, December 04, 2009 2:47 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Use of apply rather than a loop
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
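Because merge() sorts its result on the by column by default, the row-by-row subtraction in the merge approach appears to rely on the data already being ordered by Subject. A lookup with match() keeps the original row order instead. The following is a sketch, not a drop-in replacement: the unsorted data frame, the column name Adjusted and the variable names marked and refTime are made up for illustration, and like the merge version it assumes at most one marked row per subject.

```r
# Hypothetical data, deliberately not sorted by Subject.
d <- data.frame(
  Subject = c(13, 11, 12, 11, 13, 12, 11, 12, 13),
  Time    = c(107, 101, 104, 102, 108, 105, 103, 106, 109),
  Marker  = c(0, 0, 0, 1, 0, 0, 0, 0, 1)
)

# One row per subject that has a marker.
marked <- d[d$Marker == 1, ]

# Stop early if any subject has more than one marked row.
stopifnot(anyDuplicated(marked$Subject) == 0)

# For every row of d, look up the marker time of its subject
# (NA when the subject has no marked row).
refTime <- marked$Time[match(d$Subject, marked$Subject)]

# Subjects without a marker keep their times: subtract 0.
refTime[is.na(refTime)] <- 0

d$Adjusted <- d$Time - refTime
print(d)
```

The result stays aligned with the rows of d regardless of how they are ordered, and the stopifnot() call turns the "more than one marked instance" case into an explicit error.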
Try this:

    transform(DF, Time = ave(seq_len(nrow(DF)), Subject, FUN = function(ix) {
      # ix holds the row numbers for one subject, so Time and Marker
      # must both be subset by ix before they are compared or subtracted.
      isMarked <- Marker[ix] == 1
      # If a subject has several marked rows, the first one is used.
      if (any(isMarked)) Time[ix] - Time[ix][isMarked][1] else Time[ix]
    }))
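A variation on the ave() idea computes a per-subject reference time first and subtracts it afterwards, which avoids passing row indices around. This is a sketch under assumptions: the data frame name DF follows the reply above, subject 2 (with no Marker == 1 row) is invented to exercise the proviso, markerTime and refTime are illustrative names, and using the first marked row when a subject has several is a choice made here, not something the original post specifies.

```r
# Subject 1 from the original post, plus a hypothetical subject 2
# with no Marker == 1 row.
DF <- data.frame(
  Subject = c(1, 1, 1, 1, 1, 2, 2, 2),
  Time    = c(100.5, 101, 102, 103, 105, 50, 51, 52),
  Marker  = c(0, 0, 1, 0, 0, 0, 0, 0)
)

# Keep the time only on marked rows; every other row becomes NA.
markerTime <- ifelse(DF$Marker == 1, DF$Time, NA)

# ave() applies FUN within each subject and recycles the single value
# it returns over all of that subject's rows.
refTime <- ave(markerTime, DF$Subject, FUN = function(x) {
  hit <- x[!is.na(x)]
  if (length(hit) > 0) hit[1] else 0   # no marker: subtract 0
})

DF$Time <- DF$Time - refTime
print(DF)
```

Because ave() returns a vector in the original row order, the subtraction stays aligned even when the subjects are interleaved.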