Julie Lee-Yaw
2012-Sep-17 23:15 UTC
[R] help with calculation from dataframe with multiple entries per sample
Hi

I have a dataframe similar to:

> Sample<-c(1,1,1,2,2,2,3,3,3)
> Time<-c(1,2,3,1,2,3,1,2,3)
> Mass<-c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5)
> mydata<-as.data.frame(cbind(Sample,Time,Mass))

  Sample Time Mass
1      1    1  3.0
2      1    2  3.1
3      1    3  3.4
4      2    1  4.0
5      2    2  4.3
6      2    3  4.4
7      3    1  3.0
8      3    2  3.2
9      3    3  3.5

where for each sample, I've measured mass at different points in time.

I now want to calculate the difference between Mass at Time 2 and 3 for each unique Sample and store this as a new variable called "Gain2-3". So in my example three values of 0.3,0.1,0.3 would be calculated for my three unique samples and these values would be repeated in the table according to Sample. I am thus expecting:

> mydata #after adding new variable

  Sample Time Mass Gain2-3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3

Does anyone have any suggestions as to how to do this? I've looked at the various apply functions but I can't seem to make anything work. I'm fairly new to R and would appreciate specific suggestions.

Thanks!

[[alternative HTML version deleted]]
Phil Spector
2012-Sep-17 23:56 UTC
[R] help with calculation from dataframe with multiple entries per sample
Julie -
Since apply() operates on one row (or column) at a time, it can't
by itself compare values from different rows of the same sample.
I think the easiest way to solve your problem is to reshape the
data set to wide form, compute the difference, and merge the result
back with the original:
> dd = data.frame(Sample=c(1,1,1,2,2,2,3,3,3),
+                 Time=c(1,2,3,1,2,3,1,2,3),
+                 Mass=c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5))
> rdd = reshape(dd,timevar='Time',idvar='Sample',direction='wide')
> rdd$"Gain2-3" = rdd$Mass.3 - rdd$Mass.2
> merge(dd,subset(rdd,select=c('Sample',"Gain2-3")))
  Sample Time Mass Gain2-3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3
You may want to avoid using special characters like dashes in variable
names.
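To illustrate why the dash is awkward, here is a minimal sketch (the small data frame is made up for the example and is not part of the thread's data). A name such as Gain2-3 is not a syntactic R name, so it has to be quoted or wrapped in backticks every time it is used, and data.frame() rewrites it by default:

```r
# Hypothetical three-row data frame, for illustration only
dd <- data.frame(Sample = c(1, 2, 3))

# Assigning a column whose name contains a dash requires quoting
dd$"Gain2-3" <- c(0.3, 0.1, 0.3)
names(dd)

# Reading it back requires backticks (or quotes) as well
gain <- dd$`Gain2-3`

# data.frame() passes names through make.names() unless
# check.names = FALSE, so the dash is silently replaced by a dot
mangled <- names(data.frame("Gain2-3" = 1))
syntactic <- make.names("Gain2-3")

# Renaming to a syntactic name avoids the quoting altogether
names(dd)[names(dd) == "Gain2-3"] <- "Gain2_3"
dd$Gain2_3
```

An underscore or a dot in place of the dash keeps the name usable with plain `$` access and in formulas.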
Hope this helps.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
Rui Barradas
2012-Sep-17 23:59 UTC
[R] help with calculation from dataframe with multiple entries per sample
Hello,
Try the following.
sp <- split(mydata, mydata$Sample)
do.call(rbind, lapply(sp, function(x){x$Gain <- x$Mass[3] - x$Mass[2]; x}))
Hope this helps,
Rui Barradas
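The split/lapply code above picks the second and third rows of each piece by position. A small variation, sketched here under the assumption that every sample has exactly the times 1, 2 and 3, sorts each piece by Time first so that the positions are reliable even if the rows arrive in a different order. The shuffled row order is made up for the example:

```r
# Example data from the thread, with rows deliberately shuffled
mydata <- data.frame(Sample = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                     Time   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                     Mass   = c(3, 3.1, 3.4, 4, 4.3, 4.4, 3, 3.2, 3.5))
shuffled <- mydata[c(3, 1, 2, 6, 5, 4, 8, 9, 7), ]

# split() returns a list with one data frame per sample,
# named after the sample values
sp <- split(shuffled, shuffled$Sample)
names(sp)

res <- do.call(rbind, lapply(sp, function(x) {
  x <- x[order(x$Time), ]          # put times in order 1, 2, 3
  x$Gain <- x$Mass[3] - x$Mass[2]  # single value, recycled to all rows
  x
}))

# rbind() builds row labels from the list names; reset them
rownames(res) <- NULL
res
```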
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius
2012-Sep-18 00:00 UTC
[R] help with calculation from dataframe with multiple entries per sample
On Sep 17, 2012, at 4:15 PM, Julie Lee-Yaw wrote:

> I have a dataframe similar to:
>
>> Sample<-c(1,1,1,2,2,2,3,3,3)
>> Time<-c(1,2,3,1,2,3,1,2,3)
>> Mass<-c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5)
>> mydata<-as.data.frame(cbind(Sample,Time,Mass))

Please tell me where you learned that as.data.frame(cbind(.)) construction.

> I now want to calculate the difference between Mass at Time 2 and 3 for
> each unique Sample and store this as a new variable called "Gain2-3".

Group by Sample with ave(), which returns one value per row:

> mydata$gain2.3 <- with( mydata, ave( Mass , Sample, FUN=function(x) (x[3]-x[2]) ) )
> mydata
  Sample Time Mass gain2.3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3

--
David Winsemius, MD
Alameda, CA, USA
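For readers wondering what is wrong with as.data.frame(cbind(.)): cbind() builds a matrix, and a matrix holds a single type. With all-numeric columns, as in the example, nothing is lost, but as soon as one column is character everything is coerced to character. A minimal sketch, using a made-up Label column that is not in the original data:

```r
Sample <- c(1, 1, 2)
Label  <- c("a", "b", "c")   # hypothetical character column
Mass   <- c(3, 3.1, 4)

# cbind() makes a character matrix, so every column ends up character
viaCbind <- as.data.frame(cbind(Sample, Label, Mass),
                          stringsAsFactors = FALSE)
cbindClasses <- sapply(viaCbind, class)
cbindClasses

# data.frame() keeps each column's own type
direct <- data.frame(Sample, Label, Mass, stringsAsFactors = FALSE)
directClasses <- sapply(direct, class)
directClasses
```

Calling data.frame() directly is shorter and avoids the silent coercion.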
arun
2012-Sep-18 02:28 UTC
[R] help with calculation from dataframe with multiple entries per sample
Hi,

Try this:

# each = number of rows per sample; this assumes the rows are sorted by
# Sample and that every sample has the same time points
mydata$Gain<-rep(tapply(mydata$Mass,mydata$Sample,FUN=function(x) (x[3]-x[2])),
                 each=length(unique(mydata$Time)))
mydata
#  Sample Time Mass Gain
#1      1    1  3.0  0.3
#2      1    2  3.1  0.3
#3      1    3  3.4  0.3
#4      2    1  4.0  0.1
#5      2    2  4.3  0.1
#6      2    3  4.4  0.1
#7      3    1  3.0  0.3
#8      3    2  3.2  0.3
#9      3    3  3.5  0.3

A.K.
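Repeating the tapply() result with rep() only lines up when every sample has the same number of rows and the rows are sorted by Sample. One way around that, sketched here with a made-up unbalanced data set (sample 2 has an extra row at Time 4), is to look up each row's gain by the sample name that tapply() attaches to its result:

```r
# Hypothetical unbalanced data: sample 2 has a fourth measurement
mydata <- data.frame(Sample = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
                     Time   = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3),
                     Mass   = c(3, 3.1, 3.4, 4, 4.3, 4.4, 4.6, 3, 3.2, 3.5))

# One gain per sample; the result is named "1", "2", "3"
gain <- tapply(mydata$Mass, mydata$Sample, function(x) x[3] - x[2])

# Index the per-sample gains by each row's sample name, so the number
# of rows per sample no longer matters
mydata$Gain <- as.vector(gain[as.character(mydata$Sample)])
mydata
```

This still relies on the second and third rows within each sample being times 2 and 3.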
arun
2012-Sep-18 03:11 UTC
[R] help with calculation from dataframe with multiple entries per sample
Hi,

Modified version of my earlier solution:

res1<-tapply(mydata$Mass,mydata$Sample,FUN=function(x) (x[3]-x[2]))
res2<-data.frame(Sample=names(res1),Gain2_3=res1)
merge(mydata,res2)
#  Sample Time Mass Gain2_3
#1      1    1  3.0     0.3
#2      1    2  3.1     0.3
#3      1    3  3.4     0.3
#4      2    1  4.0     0.1
#5      2    2  4.3     0.1
#6      2    3  4.4     0.1
#7      3    1  3.0     0.3
#8      3    2  3.2     0.3
#9      3    3  3.5     0.3

A.K.
Jul_biologyGrad
2012-Sep-19 04:47 UTC
[R] help with calculation from dataframe with multiple entries per sample
Thanks everyone for the help! I pulled together a bunch of your suggestions
to get the result that I needed. I'm posting my final code below. Probably
not the most efficient way of doing things but gets the job done in a way
that a newbie can understand!
##Here again is the example dataset
Sample<-c(1,1,1,2,2,2,3,3,3)
Mass<-c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5)
Time<-c(1,2,3,1,2,3,1,2,3)
mydata<-as.data.frame(cbind(Sample,Time,Mass))
## I split the dataset by Sample and then calculate the difference between
## mass at time 3 and mass at time 2 for each Sample; then I use the merge
## function to attach this data to my initial dataset
sp<-split(mydata,mydata$Sample)
y<-rbind(lapply(sp,function(x){Gain<-x$Mass[x$Time==3]-x$Mass[x$Time==2]}))
## note here that, as a modification to some of the suggestions posted, I
## wanted a way to specifically call "mass at time 3" etc. for each sample
## rather than relying on the position of such data within each split/Sample
## (hence allowing me to deal with samples that may have the Time/Mass data
## input in a different order)
# some massaging of the results
u<-t(y)
s<-data.frame(Sample=row.names(u),Gain2_3=u)
fulldata<-merge(mydata,s)
## as I wished to export the data in the end using write.csv, I had to
## convert "list" data into "numeric" in the final dataframe
fulldata$Gain<-as.numeric(fulldata$Gain2_3)
fulldata$Gain2_3<-NULL
Thanks again everyone!
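For anyone finding this thread later, the same idea can probably be written more compactly. This is only a sketch, assuming each sample has exactly one row at Time 2 and one at Time 3: sapply() returns a plain numeric vector instead of a list, so the t() step and the as.numeric() conversion are not needed. The shuffled row order is made up to show that position within a sample does not matter:

```r
mydata <- data.frame(Sample = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                     Time   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                     Mass   = c(3, 3.1, 3.4, 4, 4.3, 4.4, 3, 3.2, 3.5))
# Shuffle the rows to show that the order within a sample is irrelevant
mydata <- mydata[c(3, 1, 2, 6, 5, 4, 8, 9, 7), ]

# One gain per sample, selected by Time rather than by position;
# sapply() simplifies the result to a named numeric vector
gain <- sapply(split(mydata, mydata$Sample),
               function(x) x$Mass[x$Time == 3] - x$Mass[x$Time == 2])

# Turn the names back into a numeric Sample column and merge
s <- data.frame(Sample = as.numeric(names(gain)), Gain = unname(gain))
fulldata <- merge(mydata, s)
fulldata
```

Because Gain is already numeric, fulldata can go straight to write.csv.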