thr3ads.net - R help - [R] Advice on data format [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Trent Yarwood

2015-Aug-05 06:30 UTC

[R] Advice on data format

Hi all,

I'm responsible for collating data on antibiotic use at my local group of
hospitals.  I have data for five different hospitals, about 40 different
antibiotics and monthly data going back to 2006.

At the moment, I have this stored in 5 datafiles, one for each hospital,
formatted as follows:

date, antibiotic1, antibiotic2, antibiotic3....
1-mmm-yy, ab11, ab21, ab31....
1-mmm-yy, ab12, ab22, ab32...

This works most of the time for me, because the most common thing I need to
do is to track a particular hospital's antibiotic use over time (sum of
columns, as a time series by row).

What I would like to do is to amalgamate the data so instead of analysing
an individual hospital (ie a datasheet in the current format) is to be able
to look at a particular antibiotic across the five hospitals.

The best way I can visualise this is having the data in a data cube, with
each hospital as a single plane. Currently, my hospitals are (x,y,1),
(x,y,2) etc. What I'd like to do is look at (2,y,z) - for example, the sum
of antibiotic1 in all hospitals.

I imagine one way of doing this is having a hospital column in the data:

date, hospital, antibiotic1, antibiotic2, antibiotic3...
1-mmm-yy, hospital1, a11, a21, a31...
1-mmm-yy, hospital2, a11, a21, a31... etc

Two questions:

1) Is there a better way of storing the data than this?
2) Is there an easy way to turn what I have into what I want?

I know that once I have the data sorted, I'll be able to dpyl it into the
categories I currently use - it's the getting from here to there I need
help with, please.

Cheers,

Trent.






-- 
-- 
Trent Yarwood
trentyarwood at gmail.com

	[[alternative HTML version deleted]]

Jim Lemon

2015-Aug-05 10:05 UTC

head link

[R] Advice on data format

Hi Trent,
I may be missing something, but I think that if you simply add your
column with the hospital code to each of the five data frames:

hospital1.df$hospital<-"hospital1"
hospital2.df$hospital<-"hospital2"
...

and then concatenate (rbind) them you can get what you want without
going to a 3D array.

Jim


On Wed, Aug 5, 2015 at 4:30 PM, Trent Yarwood <trentyarwood at gmail.com>
wrote:> Hi all,
>
> I'm responsible for collating data on antibiotic use at my local group
of
> hospitals.  I have data for five different hospitals, about 40 different
> antibiotics and monthly data going back to 2006.
>
> At the moment, I have this stored in 5 datafiles, one for each hospital,
> formatted as follows:
>
> date, antibiotic1, antibiotic2, antibiotic3....
> 1-mmm-yy, ab11, ab21, ab31....
> 1-mmm-yy, ab12, ab22, ab32...
>
> This works most of the time for me, because the most common thing I need to
> do is to track a particular hospital's antibiotic use over time (sum of
> columns, as a time series by row).
>
> What I would like to do is to amalgamate the data so instead of analysing
> an individual hospital (ie a datasheet in the current format) is to be able
> to look at a particular antibiotic across the five hospitals.
>
> The best way I can visualise this is having the data in a data cube, with
> each hospital as a single plane. Currently, my hospitals are (x,y,1),
> (x,y,2) etc. What I'd like to do is look at (2,y,z) - for example, the
sum
> of antibiotic1 in all hospitals.
>
> I imagine one way of doing this is having a hospital column in the data:
>
> date, hospital, antibiotic1, antibiotic2, antibiotic3...
> 1-mmm-yy, hospital1, a11, a21, a31...
> 1-mmm-yy, hospital2, a11, a21, a31... etc
>
> Two questions:
>
> 1) Is there a better way of storing the data than this?
> 2) Is there an easy way to turn what I have into what I want?
>
> I know that once I have the data sorted, I'll be able to dpyl it into
the
> categories I currently use - it's the getting from here to there I need
> help with, please.
>
> Cheers,
>
> Trent.
>
>
>
>
>
>
> --
> --
> Trent Yarwood
> trentyarwood at gmail.com
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Michael Dewey

2015-Aug-05 12:50 UTC

head link

[R] Advice on data format

Dear Trent

If you want them side-by-side in one data frame then you could use merge 
making sure it only merges by date. I would use sub to change all the 
anitbiotic names by adding "h1" "h2" and so on. Then you can
sum
antibiotic over hospital by using grep to select all the columns 
containing antibiotic1. The side-by-side solution has some advantages 
over stacking them vertically and some disadvantages. You may need to do 
both for different purposes.

You would need to learn about regular expressions if they are not 
already familiar to you to get the best out of sub and grep.

On 05/08/2015 07:30, Trent Yarwood wrote:> Hi all,
>
> I'm responsible for collating data on antibiotic use at my local group
of
> hospitals.  I have data for five different hospitals, about 40 different
> antibiotics and monthly data going back to 2006.
>
> At the moment, I have this stored in 5 datafiles, one for each hospital,
> formatted as follows:
>
> date, antibiotic1, antibiotic2, antibiotic3....
> 1-mmm-yy, ab11, ab21, ab31....
> 1-mmm-yy, ab12, ab22, ab32...
>
> This works most of the time for me, because the most common thing I need to
> do is to track a particular hospital's antibiotic use over time (sum of
> columns, as a time series by row).
>
> What I would like to do is to amalgamate the data so instead of analysing
> an individual hospital (ie a datasheet in the current format) is to be able
> to look at a particular antibiotic across the five hospitals.
>
> The best way I can visualise this is having the data in a data cube, with
> each hospital as a single plane. Currently, my hospitals are (x,y,1),
> (x,y,2) etc. What I'd like to do is look at (2,y,z) - for example, the
sum
> of antibiotic1 in all hospitals.
>
> I imagine one way of doing this is having a hospital column in the data:
>
> date, hospital, antibiotic1, antibiotic2, antibiotic3...
> 1-mmm-yy, hospital1, a11, a21, a31...
> 1-mmm-yy, hospital2, a11, a21, a31... etc
>
> Two questions:
>
> 1) Is there a better way of storing the data than this?
> 2) Is there an easy way to turn what I have into what I want?
>
> I know that once I have the data sorted, I'll be able to dpyl it into
the
> categories I currently use - it's the getting from here to there I need
> help with, please.
>
> Cheers,
>
> Trent.
>
>
>
>
>
>
-- 
Michael
http://www.dewey.myzen.co.uk/home.html

R help - Aug 2015 - Advice on data format

[R] Advice on data format

[R] Advice on data format

[R] Advice on data format