Hi all, I'm responsible for collating data on antibiotic use at my local group of hospitals. I have data for five different hospitals, about 40 different antibiotics and monthly data going back to 2006. At the moment, I have this stored in 5 datafiles, one for each hospital, formatted as follows: date, antibiotic1, antibiotic2, antibiotic3.... 1-mmm-yy, ab11, ab21, ab31.... 1-mmm-yy, ab12, ab22, ab32... This works most of the time for me, because the most common thing I need to do is to track a particular hospital's antibiotic use over time (sum of columns, as a time series by row). What I would like to do is to amalgamate the data so instead of analysing an individual hospital (ie a datasheet in the current format) is to be able to look at a particular antibiotic across the five hospitals. The best way I can visualise this is having the data in a data cube, with each hospital as a single plane. Currently, my hospitals are (x,y,1), (x,y,2) etc. What I'd like to do is look at (2,y,z) - for example, the sum of antibiotic1 in all hospitals. I imagine one way of doing this is having a hospital column in the data: date, hospital, antibiotic1, antibiotic2, antibiotic3... 1-mmm-yy, hospital1, a11, a21, a31... 1-mmm-yy, hospital2, a11, a21, a31... etc Two questions: 1) Is there a better way of storing the data than this? 2) Is there an easy way to turn what I have into what I want? I know that once I have the data sorted, I'll be able to dpyl it into the categories I currently use - it's the getting from here to there I need help with, please. Cheers, Trent. -- -- Trent Yarwood trentyarwood at gmail.com [[alternative HTML version deleted]]
Hi Trent, I may be missing something, but I think that if you simply add your column with the hospital code to each of the five data frames: hospital1.df$hospital<-"hospital1" hospital2.df$hospital<-"hospital2" ... and then concatenate (rbind) them you can get what you want without going to a 3D array. Jim On Wed, Aug 5, 2015 at 4:30 PM, Trent Yarwood <trentyarwood at gmail.com> wrote:> Hi all, > > I'm responsible for collating data on antibiotic use at my local group of > hospitals. I have data for five different hospitals, about 40 different > antibiotics and monthly data going back to 2006. > > At the moment, I have this stored in 5 datafiles, one for each hospital, > formatted as follows: > > date, antibiotic1, antibiotic2, antibiotic3.... > 1-mmm-yy, ab11, ab21, ab31.... > 1-mmm-yy, ab12, ab22, ab32... > > This works most of the time for me, because the most common thing I need to > do is to track a particular hospital's antibiotic use over time (sum of > columns, as a time series by row). > > What I would like to do is to amalgamate the data so instead of analysing > an individual hospital (ie a datasheet in the current format) is to be able > to look at a particular antibiotic across the five hospitals. > > The best way I can visualise this is having the data in a data cube, with > each hospital as a single plane. Currently, my hospitals are (x,y,1), > (x,y,2) etc. What I'd like to do is look at (2,y,z) - for example, the sum > of antibiotic1 in all hospitals. > > I imagine one way of doing this is having a hospital column in the data: > > date, hospital, antibiotic1, antibiotic2, antibiotic3... > 1-mmm-yy, hospital1, a11, a21, a31... > 1-mmm-yy, hospital2, a11, a21, a31... etc > > Two questions: > > 1) Is there a better way of storing the data than this? > 2) Is there an easy way to turn what I have into what I want? > > I know that once I have the data sorted, I'll be able to dpyl it into the > categories I currently use - it's the getting from here to there I need > help with, please. > > Cheers, > > Trent. > > > > > > > -- > -- > Trent Yarwood > trentyarwood at gmail.com > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Dear Trent If you want them side-by-side in one data frame then you could use merge making sure it only merges by date. I would use sub to change all the anitbiotic names by adding "h1" "h2" and so on. Then you can sum antibiotic over hospital by using grep to select all the columns containing antibiotic1. The side-by-side solution has some advantages over stacking them vertically and some disadvantages. You may need to do both for different purposes. You would need to learn about regular expressions if they are not already familiar to you to get the best out of sub and grep. On 05/08/2015 07:30, Trent Yarwood wrote:> Hi all, > > I'm responsible for collating data on antibiotic use at my local group of > hospitals. I have data for five different hospitals, about 40 different > antibiotics and monthly data going back to 2006. > > At the moment, I have this stored in 5 datafiles, one for each hospital, > formatted as follows: > > date, antibiotic1, antibiotic2, antibiotic3.... > 1-mmm-yy, ab11, ab21, ab31.... > 1-mmm-yy, ab12, ab22, ab32... > > This works most of the time for me, because the most common thing I need to > do is to track a particular hospital's antibiotic use over time (sum of > columns, as a time series by row). > > What I would like to do is to amalgamate the data so instead of analysing > an individual hospital (ie a datasheet in the current format) is to be able > to look at a particular antibiotic across the five hospitals. > > The best way I can visualise this is having the data in a data cube, with > each hospital as a single plane. Currently, my hospitals are (x,y,1), > (x,y,2) etc. What I'd like to do is look at (2,y,z) - for example, the sum > of antibiotic1 in all hospitals. > > I imagine one way of doing this is having a hospital column in the data: > > date, hospital, antibiotic1, antibiotic2, antibiotic3... > 1-mmm-yy, hospital1, a11, a21, a31... > 1-mmm-yy, hospital2, a11, a21, a31... etc > > Two questions: > > 1) Is there a better way of storing the data than this? > 2) Is there an easy way to turn what I have into what I want? > > I know that once I have the data sorted, I'll be able to dpyl it into the > categories I currently use - it's the getting from here to there I need > help with, please. > > Cheers, > > Trent. > > > > > >-- Michael http://www.dewey.myzen.co.uk/home.html