Hello everyone,

I have a .csv file with the following format:

uniqueID  SubjectID  Distance_miles  Tag
1         1001       5.5             3
2         1001       7               1
3         1001       6.5             1
4         1001       5               1
5         1002       2               2
6         1002       2               2
7         1002       1.5             2
8         1003       15              2
9         1003       17              2
10        1003       18              2

For each SubjectID, I want to calculate the median distance, where the Tag variable indicates the number of times that distance was recorded. My final output table would be...

SubjectID  Median Distance
1001       5.5
1002       2
1003       17

I have used the following script to calculate the median for a data frame where each recorded distance has its own row, and where temp is a dataframe containing each unique SubjectID and routes is the file I describe above.

    for(i in 1:nrow(temp)){
      temp$mediandistance[i] <-
        median(routes$Distance_miles[routes$Subject_ID==temp$Subject_ID[i]])
    }

I am interested to know...
(1) Is there a way to incorporate a weighted median into this script, where the weights are the number of times each distance is recorded?
(2) Can I transform my current matrix into one that gives each distance its own row?

Help is much appreciated,
Kirsten Beyer
One approach is:

    sp <- split(dat[-1], dat$SubjectID)
    t(sapply(sp, function (d) c(d$SubjectID[1], median(rep(d$Distance_miles, d$Tag)))))

where 'dat' is the name of your data.frame.

I hope this helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
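The rep() idea in the reply above also answers question (2) directly: repeating each row Tag times gives every recorded distance its own row, and an ordinary median on that expanded table is the count-weighted median. The following is a minimal sketch in base R, assuming the data frame is called 'dat' as in the reply; the names 'long', 'res' and 'MedianDistance' are made up for illustration.

```r
# Example data copied from the original post.
dat <- data.frame(
  uniqueID       = 1:10,
  SubjectID      = c(1001, 1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003),
  Distance_miles = c(5.5, 7, 6.5, 5, 2, 2, 1.5, 15, 17, 18),
  Tag            = c(3, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# Question (2): repeat each row index Tag times so that every
# recorded distance ends up on its own row.
idx  <- rep(seq_len(nrow(dat)), times = dat$Tag)
long <- dat[idx, c("SubjectID", "Distance_miles")]
rownames(long) <- NULL

# Question (1): a plain median per subject on the expanded rows.
res <- aggregate(Distance_miles ~ SubjectID, data = long, FUN = median)
names(res)[2] <- "MedianDistance"   # illustrative column name
print(res)
```

With the posted data this should reproduce the medians requested in the original message (5.5, 2 and 17). The expansion is simple but grows with the sum of Tag, so it is best suited to modest counts.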
This may do it, but I have not verified the figures.

    library(cwhmisc)
    by(xx[, 3:4], xx[, 2], function(d) w.median(d[, 1], d[, 2]))

--- On Thu, 7/30/09, Kirsten Beyer <kirsten-beyer at uiowa.edu> wrote:

> From: Kirsten Beyer <kirsten-beyer at uiowa.edu>
> Subject: [R] weight median by count for multiple records
> To: r-help at r-project.org
> Received: Thursday, July 30, 2009, 1:58 PM
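If an add-on package is not wanted, the count-weighted median can also be computed in base R without expanding the rows, by sorting the distances and walking the cumulative counts. The sketch below is only one possible implementation: 'count_median' is a hypothetical helper name (it is not part of base R or cwhmisc), it assumes the weights are non-negative whole-number counts, and it is written to agree with median(rep(x, w)).

```r
# Hypothetical helper: median of x where x[i] was recorded w[i] times.
# Assumes w holds non-negative integer counts with sum(w) > 0.
count_median <- function(x, w) {
  o  <- order(x)
  x  <- x[o]
  cw <- cumsum(w[o])              # running count of observations
  n  <- cw[length(cw)]            # total number of observations
  # Positions of the middle observation(s) in the expanded, sorted data;
  # both positions coincide when n is odd.
  pos <- c(floor((n + 1) / 2), ceiling((n + 1) / 2))
  # Value at position p is the first x whose cumulative count reaches p.
  mean(sapply(pos, function(p) x[which(cw >= p)[1]]))
}

# Example data copied from the original post.
dat <- data.frame(
  SubjectID      = c(1001, 1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003),
  Distance_miles = c(5.5, 7, 6.5, 5, 2, 2, 1.5, 15, 17, 18),
  Tag            = c(3, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# One weighted median per subject.
med <- sapply(split(dat, dat$SubjectID),
              function(d) count_median(d$Distance_miles, d$Tag))
print(med)
```

The result is a numeric vector named by SubjectID. Averaging the two middle positions mirrors what median() does for an even number of observations; other weighted-median definitions may differ in that case.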