Julie Lee-Yaw
2012-Sep-17 23:15 UTC
[R] help with calculation from dataframe with multiple entries per sample
Hi

I have a dataframe similar to:

> Sample<-c(1,1,1,2,2,2,3,3,3)
> Time<-c(1,2,3,1,2,3,1,2,3)
> Mass<-c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5)
> mydata<-as.data.frame(cbind(Sample,Time,Mass))

  Sample Time Mass
1      1    1  3.0
2      1    2  3.1
3      1    3  3.4
4      2    1  4.0
5      2    2  4.3
6      2    3  4.4
7      3    1  3.0
8      3    2  3.2
9      3    3  3.5

where for each sample, I've measured mass at different points in time.

I now want to calculate the difference between Mass at Time 2 and 3 for each unique Sample and store this as a new variable called "Gain2-3". So in my example three values of 0.3,0.1,0.3 would be calculated for my three unique samples and these values would be repeated in the table according to Sample. I am thus expecting:

> mydata #after adding new variable

  Sample Time Mass Gain2-3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3

Does anyone have any suggestions as to how to do this? I've looked at the various apply functions but I can't seem to make anything work. I'm fairly new to R and would appreciate specific suggestions.

Thanks!
Phil Spector
2012-Sep-17 23:56 UTC
[R] help with calculation from dataframe with multiple entries per sample
Julie -
   Since the apply functions operate on one row at a time, they can't do what you want. I think the easiest way to solve your problem is to reshape the data set, and merge it back with the original:

> dd = data.frame(Sample=c(1,1,1,2,2,2,3,3,3),
+                 Time=c(1,2,3,1,2,3,1,2,3),
+                 Mass=c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5))
> rdd = reshape(dd,timevar='Time',idvar='Sample',direction='wide')
> rdd$"Gain2-3" = rdd$Mass.3 - rdd$Mass.2
> merge(dd,subset(rdd,select=c('Sample',"Gain2-3")))
  Sample Time Mass Gain2-3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3

You may want to avoid using special characters like dashes in variable names.

Hope this helps.
                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spector at stat.berkeley.edu
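The reply above builds the data with data.frame() directly, whereas the question used as.data.frame(cbind(...)). A small sketch of why that difference can matter: cbind() produces a matrix, and a matrix holds a single type, so mixing numbers with text silently converts the numbers. The Label column here is hypothetical and is not part of the thread's data.

```r
# cbind() returns a matrix; with one character column, everything becomes character
m   <- cbind(Sample = c(1, 2), Label = c("a", "b"))   # Label is a made-up column
bad <- as.data.frame(m)

# data.frame() keeps each column's own type
good <- data.frame(Sample = c(1, 2), Label = c("a", "b"))

is.numeric(bad$Sample)    # FALSE: Sample is no longer a number
is.numeric(good$Sample)   # TRUE
```

With all-numeric columns, as in the question, both constructions give numeric columns, so the original example is unaffected.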
Rui Barradas
2012-Sep-17 23:59 UTC
[R] help with calculation from dataframe with multiple entries per sample
Hello,

Try the following.

sp <- split(mydata, mydata$Sample)
do.call(rbind, lapply(sp, function(x){x$Gain <- x$Mass[3] - x$Mass[2]; x}))

Hope this helps,

Rui Barradas

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
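The split/lapply approach indexes Mass by position, so it relies on the rows of each Sample being in Time order. A minimal sketch, assuming every Sample has exactly the times 1, 2 and 3: sort by Sample and Time first, then apply the same code. The names shuffled and sorted are introduced here for illustration only.

```r
mydata <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3),
                     Time   = c(1,2,3,1,2,3,1,2,3),
                     Mass   = c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5))

# Deliberately scramble the row order to mimic data entered out of sequence
shuffled <- mydata[c(9,2,7,4,1,6,3,8,5), ]

# Sort by Sample, then Time, so positions 2 and 3 within each group are Time 2 and 3
sorted <- shuffled[order(shuffled$Sample, shuffled$Time), ]

sp  <- split(sorted, sorted$Sample)
res <- do.call(rbind, lapply(sp, function(x) { x$Gain <- x$Mass[3] - x$Mass[2]; x }))
rownames(res) <- NULL   # drop the "1.1", "1.2", ... names that rbind creates
res
```

If some samples are missing a time point, positional indexing is no longer safe even after sorting, and selecting by the Time value is the more robust choice.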
David Winsemius
2012-Sep-18 00:00 UTC
[R] help with calculation from dataframe with multiple entries per sample
On Sep 17, 2012, at 4:15 PM, Julie Lee-Yaw wrote:

> mydata<-as.data.frame(cbind(Sample,Time,Mass))

Please tell me where you learned that as.data.frame(cbind(.)) construction.

> I now want to calculate the difference between Mass at Time 2 and 3 for each unique Sample and store this as a new variable called "Gain2-3".

mydata$gain2.3 <- with( mydata, ave( Mass , Sample, FUN=function(x) (x[3]-x[2]) ) )
mydata
  Sample Time Mass gain2.3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3

-- 
David Winsemius, MD
Alameda, CA, USA
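The ave() call above picks the second and third Mass values within each Sample by position. One possible variation, sketched here under the assumption that each Sample has exactly one row for Time 2 and one for Time 3, is to pass row indices to ave() so that the function can look up Mass by its Time value instead.

```r
mydata <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3),
                     Time   = c(1,2,3,1,2,3,1,2,3),
                     Mass   = c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5))

# ave() groups the row indices by Sample; FUN receives the indices of one group
# and returns a single value, which ave() recycles across that group's rows
mydata$gain2.3 <- with(mydata,
  ave(seq_along(Mass), Sample,
      FUN = function(i) Mass[i][Time[i] == 3] - Mass[i][Time[i] == 2]))

mydata
```

This does not depend on the rows being sorted by Time, because the subtraction selects rows by the Time value rather than by position.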
arun
2012-Sep-18 02:28 UTC
[R] help with calculation from dataframe with multiple entries per sample
HI,

Try this:

# each= must be the number of rows per Sample (here, the number of time points)
mydata$Gain<-rep(tapply(mydata$Mass,mydata$Sample,FUN=function(x) (x[3]-x[2])),each=length(unique(mydata$Time)))
mydata
#  Sample Time Mass Gain
#1      1    1  3.0  0.3
#2      1    2  3.1  0.3
#3      1    3  3.4  0.3
#4      2    1  4.0  0.1
#5      2    2  4.3  0.1
#6      2    3  4.4  0.1
#7      3    1  3.0  0.3
#8      3    2  3.2  0.3
#9      3    3  3.5  0.3

A.K.
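The rep(..., each=...) step in the reply above assumes that all rows of a Sample are adjacent and that every Sample has the same number of rows. A hedged alternative for the repetition step is to look each row's Sample up by name in the tapply() result. The sketch below uses a copy of the data with the samples interleaved; the name interleaved is introduced here for illustration.

```r
mydata <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3),
                     Time   = c(1,2,3,1,2,3,1,2,3),
                     Mass   = c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5))

# Reorder so the samples are interleaved: 1,2,3,1,2,3,1,2,3
interleaved <- mydata[order(mydata$Time, mydata$Sample), ]

# One gain per Sample; the result is named by Sample ("1", "2", "3")
gain <- tapply(interleaved$Mass, interleaved$Sample, FUN = function(x) x[3] - x[2])

# Index the named result by each row's Sample instead of using rep()
interleaved$Gain <- as.vector(gain[as.character(interleaved$Sample)])
interleaved
```

The gain itself is still computed by position within each Sample, so the rows of a Sample need to be in Time order; only the step that spreads the result back over the rows is made independent of row layout.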
arun
2012-Sep-18 03:11 UTC
[R] help with calculation from dataframe with multiple entries per sample
HI,

Modified version of my earlier solution:

res1<-tapply(mydata$Mass,mydata$Sample,FUN=function(x) (x[3]-x[2]))
res2<-data.frame(Sample=names(res1),Gain2_3=res1)
merge(mydata,res2)
#  Sample Time Mass Gain2_3
#1      1    1  3.0     0.3
#2      1    2  3.1     0.3
#3      1    3  3.4     0.3
#4      2    1  4.0     0.1
#5      2    2  4.3     0.1
#6      2    3  4.4     0.1
#7      3    1  3.0     0.3
#8      3    2  3.2     0.3
#9      3    3  3.5     0.3

A.K.
Jul_biologyGrad
2012-Sep-19 04:47 UTC
[R] help with calculation from dataframe with multiple entries per sample
Thanks everyone for the help! I pulled together a bunch of your suggestions to get the result that I needed. I'm posting my final code below. Probably not the most efficient way of doing things, but it gets the job done in a way that a newbie can understand!

## Here again is the example dataset
Sample<-c(1,1,1,2,2,2,3,3,3)
Mass<-c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5)
Time<-c(1,2,3,1,2,3,1,2,3)
mydata<-as.data.frame(cbind(Sample,Time,Mass))

## I split the dataset by Sample and then calculate the difference between
## mass at time 3 and mass at time 2 for each Sample; then use the merge
## function to attach this data to my initial dataset
sp<-split(mydata,mydata$Sample)
y<-rbind(lapply(sp,function(x){Gain<-x$Mass[x$Time==3]-x$Mass[x$Time==2]}))
## note here that, as a modification to some of the suggestions posted, I
## wanted a way to specifically call "mass at time 3" etc. for each sample
## rather than relying on the position of such data within each split/Sample
## (hence allowing me to deal with samples that may have the Time/Mass data
## input in a different order)

# some massaging of the results
u<-t(y)
s<-data.frame(Sample=row.names(u),Gain2_3=u)
fulldata<-merge(mydata,s)

## as I wished to export the data in the end using write.csv, I had to
## convert "list" data into "numeric" in the final dataframe
fulldata$Gain<-as.numeric(fulldata$Gain2_3)
fulldata$Gain2_3<-NULL

Thanks again everyone!
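The final code above could probably be shortened while keeping the same idea of selecting by Time value. The following is only a sketch, assuming each Sample has exactly one row for Time 2 and one for Time 3: sapply() then returns a plain named numeric vector, so no list-to-numeric conversion is needed before write.csv.

```r
mydata <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3),
                     Time   = c(1,2,3,1,2,3,1,2,3),
                     Mass   = c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5))

# One gain per Sample, selected by Time value rather than by row position
gain <- sapply(split(mydata, mydata$Sample),
               function(x) x$Mass[x$Time == 3] - x$Mass[x$Time == 2])

# Turn the named vector into a lookup table and merge it back on Sample
s <- data.frame(Sample = as.numeric(names(gain)), Gain = unname(gain))
fulldata <- merge(mydata, s)
fulldata
```

If a Sample lacks one of the two time points, the function returns a zero-length value for it and sapply() falls back to a list, so that case would need separate handling.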