Hi

I am a newbie to R but have tried a number of ways in R to do this and can't find a good solution. (I could do it outside R in perl or awk, but I would like to know how to do it in R.)

I have a large data frame (49 variables and 7000 observations); for simplicity I can express it as the following data frame:

Base, Image, LVEF, ES_Time
A, 1, 4.32, 0.89
A, 2, 4.98, 0.67
A, 3, 3.7, 0.5
A, 3, 4.1, 0.8
B, 1, 7.4, 0.7
B, 3, 7.2, 0.8
B, 4, 7.8, 0.6
C, 1, 5.6, 1.1
C, 4, 5.2, 1.3
C, 5, 5.9, 1.2
C, 6, 6.1, 1.2
C, 7, 3.2, 1.1

For each value of LVEF and ES_Time I would like to normalise the value to the maximum for that variable within its group (grouped by Base or by Image number), adding an extra column to the data frame with the normalised value in it.

So for the Base = B group (the data frame should keep the same number of rows; I'm just showing the B part) I would get a modified data frame as follows:

Base, Image, LVEF, ES_Time, Norm_LVEF, Norm_ES_Time
...
B, 1, 7.4, 0.7, 7.4/7.8, 0.7/0.8
B, 3, 7.2, 0.8, 7.2/7.8, 0.8/0.8
B, 4, 7.8, 0.6, 7.8/7.8, 0.6/0.8
...

where the results of the divisions would replace the divisions shown here.

I hope this makes sense. If anyone can help I would be very grateful.

Sandy Small
NHS Glasgow, UK

**********************************************************************
This message may contain confidential and privileged information. If you are not the intended recipient please accept our apologies. Please do not disclose, copy or distribute information in this e-mail or take any action in reliance on its contents: to do so is strictly prohibited and may be unlawful. Please inform us that this message has gone astray before deleting it. Thank you for your co-operation.

NHSmail is used daily by over 100,000 staff in the NHS. Over a million messages are sent every day by the system. To find out why more and more NHS personnel are switching to this NHS Connecting for Health system please visit www.connectingforhealth.nhs.uk/nhsmail
Sandy Small wrote:
> [...]
> For each value of LVEF and ES_Time I would like to normalise the value
> to the maximum for that factor grouped by Base or Image number, adding
> an extra column to the data frame with the normalised value in it.
> [...]

You want to look at the by(), tapply() or sparseby() functions (the latter is in the reshape package; the others are in base R). For example, I think this untested code does what you want:

    newdf <- sparseby(olddf, c("Base", "Image"),
                      function(subset) within(subset, {
                          Norm_LVEF    <- LVEF / max(LVEF)
                          Norm_ES_Time <- ES_Time / max(ES_Time)
                      }))

where olddf is the old data frame and newdf is the newly created one.

Duncan Murdoch
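The same split-apply-combine idea can be sketched in base R alone. What follows is only an illustration, not the code from the reply above: it uses split(), lapply() and unsplit() rather than sparseby(), it assumes grouping by Base only, and the data frame is typed in from the example in the question.

```r
# Example data from the question (first few columns only).
olddf <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# Split into one data frame per Base group.
pieces <- split(olddf, olddf$Base)

# Within each group, divide each value by that group's maximum.
pieces <- lapply(pieces, function(g) {
  g$Norm_LVEF    <- g$LVEF / max(g$LVEF)
  g$Norm_ES_Time <- g$ES_Time / max(g$ES_Time)
  g
})

# Put the groups back together in the original row order.
newdf <- unsplit(pieces, olddf$Base)
```

To group by Image number instead, split on olddf$Image (and pass the same vector to unsplit()).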
On Nov 9, 2007 5:56 AM, Sandy Small <sandy.small at nhs.net> wrote:
> [...]
> For each value of LVEF and ES_Time I would like to normalise the value
> to the maximum for that factor grouped by Base or Image number, adding
> an extra column to the data frame with the normalised value in it.
> [...]

Here is a solution using sqldf:

    library(sqldf)
    sqldf("select u.*,
                  u.LVEF / max_LVEF Norm_LVEF,
                  u.ES_Time / max_ES_Time Norm_ES_Time
           from DF u
           join (select Base,
                        max(LVEF) max_LVEF,
                        max(ES_Time) max_ES_Time
                 from DF
                 group by Base)
           using(Base)")

See http://sqldf.googlecode.com for more info.
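For readers less used to SQL, the query above does two things: it computes the per-Base maxima in a subquery, then joins them back onto every row and divides. A rough base R equivalent of those two steps, using aggregate() and merge(), might look like the sketch below. It is not sqldf itself; DF is assumed to hold the example data from the question, and the names maxima and joined are purely illustrative.

```r
# Example data from the question, named DF as in the query above.
DF <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# The subquery: one row per Base holding the group maxima.
maxima <- aggregate(DF[c("LVEF", "ES_Time")], by = DF["Base"], FUN = max)
names(maxima)[-1] <- c("max_LVEF", "max_ES_Time")

# The join: attach each group's maxima to every row of that group.
joined <- merge(DF, maxima, by = "Base")

# The select list: divide each value by its group maximum.
joined$Norm_LVEF    <- joined$LVEF / joined$max_LVEF
joined$Norm_ES_Time <- joined$ES_Time / joined$max_ES_Time
```

Note that merge() sorts the result by Base, so the row order may differ from the original data frame, and the helper columns max_LVEF and max_ES_Time remain in the result unless dropped.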
Here is another approach using transform and ave, which I think is a little simpler than the others suggested:

    new.data <- transform(iris,
        normSW = Sepal.Width / ave(Sepal.Width, Species, FUN=max),
        normSL = Sepal.Length / ave(Sepal.Length, Species, FUN=max)
    )

You can adjust it for your data.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Sandy Small
> Sent: Friday, November 09, 2007 3:57 AM
> To: r-help at r-project.org
> Subject: [R] Normalizing grouped data in a data frame
>
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
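Adapting the transform/ave suggestion from the iris example to the data in the original question could look roughly like this. It is a sketch under two assumptions: the data frame is called DF and contains the example rows as posted, and the grouping is by Base.

```r
# Example data from the question.
DF <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# ave() returns a vector as long as its input in which every element is
# replaced by FUN applied to that element's group, here the group maximum.
new.data <- transform(DF,
  Norm_LVEF    = LVEF / ave(LVEF, Base, FUN = max),
  Norm_ES_Time = ES_Time / ave(ES_Time, Base, FUN = max)
)
```

Because ave() keeps the original length and order, the result has the same rows as DF with the two new columns appended. Replacing Base with Image in the ave() calls would normalise within each Image number instead.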