Haven't quite learned to 'cast' yet, but I have always used the 'apply'
functions for this type of processing:
> x <- "id patient_id date code class eala
+ ID1564262 1562 6.4.200612:00 5555 1 NA
+ ID1564262 1562 6.4.200612:00 5555 1 NA
+ ID1564264 1365 14.2.200614:35 5555 1 50
+ ID1564265 1342 7.4.200614:30 2222 2 50
+ ID1564266 1648 7.4.200614:30 2222 2 50
+ ID1564267 1263 10.2.200615:45 2222 2 10
+ ID1564267 1263 10.2.200615:45 3333 3 10
+ ID1564269 5646 13.5.200617:02 3333 3 10
+ ID1564270 7561 13.5.200617:02 6666 1 10
+ ID1564271 1676 15.5.200620:41 2222 2 20"
> x.in <- read.table(textConnection(x), header=TRUE)
> # 'by' seems to drop NAs so convert to a character string for processing
> x.in$eala <- ifelse(is.na(x.in$eala), "NA", as.character(x.in$eala))
> # convert date to POSIXlt so we can access the year and month
> myDate <- strptime(x.in$date, "%d.%m.%Y%H:%M")
> x.in$year <- myDate$year + 1900
> x.in$month <- myDate$mon+1
> # split the data by eala, year, month and summarize
> x.by <- by(x.in, list(x.in$eala, x.in$year, x.in$month), function(x){
+ data.frame(eala=x$eala[1], month=x$month[1], year=x$year[1],
+ icount=length(unique(x$id)), pcount=length(unique(x$patient_id)),
+ count1=sum(x$class == 1), count2=sum(x$class == 2),
+ count3=sum(x$class == 3))
+ })
> # convert back to a data frame
> do.call(rbind, x.by)
  eala month year icount pcount count1 count2 count3
1   10     2 2006      1      1      0      1      1
2   50     2 2006      1      1      1      0      0
3   50     4 2006      2      2      0      2      0
4   NA     4 2006      1      1      2      0      0
5   10     5 2006      2      2      1      0      1
6   20     5 2006      1      1      0      1      0
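
Note that count1 to count3 above count rows per class, whereas the SQL in
the question asks for the number of distinct codes per class. Here is a
minimal sketch of a distinct-count variant, assuming that is what is
wanted. The helper name n_distinct_codes is made up for illustration, and
read.table(text=) assumes a reasonably recent version of R.

```r
# Sample rows from the post; date and time are fused into one token so
# that read.table() sees a single column.
x <- "id patient_id date code class eala
ID1564262 1562 6.4.200612:00 5555 1 NA
ID1564262 1562 6.4.200612:00 5555 1 NA
ID1564264 1365 14.2.200614:35 5555 1 50
ID1564265 1342 7.4.200614:30 2222 2 50
ID1564266 1648 7.4.200614:30 2222 2 50
ID1564267 1263 10.2.200615:45 2222 2 10
ID1564267 1263 10.2.200615:45 3333 3 10
ID1564269 5646 13.5.200617:02 3333 3 10
ID1564270 7561 13.5.200617:02 6666 1 10
ID1564271 1676 15.5.200620:41 2222 2 20"
x.in <- read.table(text = x, header = TRUE, stringsAsFactors = FALSE)

# Keep the NA group by turning it into the string "NA", as in the post
x.in$eala <- ifelse(is.na(x.in$eala), "NA", as.character(x.in$eala))

# Year and month from the fused date/time string
myDate <- strptime(x.in$date, "%d.%m.%Y%H:%M")
x.in$year <- myDate$year + 1900
x.in$month <- myDate$mon + 1

# Hypothetical helper: number of distinct codes among rows of one class
n_distinct_codes <- function(d, cls) length(unique(d$code[d$class == cls]))

x.by <- by(x.in, list(x.in$eala, x.in$year, x.in$month), function(d) {
  data.frame(eala = d$eala[1], month = d$month[1], year = d$year[1],
             icount = length(unique(d$id)),
             pcount = length(unique(d$patient_id)),
             count1 = n_distinct_codes(d, 1),
             count2 = n_distinct_codes(d, 2),
             count3 = n_distinct_codes(d, 3))
})

# Empty groups come back as NULL and are dropped by rbind
result <- do.call(rbind, x.by)
result
```

With the ten sample rows, only two groups change: eala 50 and eala NA in
April 2006, where count2 and count1 respectively drop from 2 to 1 because
the same code appears twice in the group.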
On 2/20/07, Lauri Nikkinen <lauri.nikkinen@iki.fi> wrote:
>
> Hi R-users,
>
> I have a data set like this (first ten rows):
>
> id        patient_id date            code class eala
> ID1564262 1562       6.4.2006 12:00  5555 1     NA
> ID1564262 1562       6.4.2006 12:00  5555 1     NA
> ID1564264 1365       14.2.2006 14:35 5555 1     50
> ID1564265 1342       7.4.2006 14:30  2222 2     50
> ID1564266 1648       7.4.2006 14:30  2222 2     50
> ID1564267 1263       10.2.2006 15:45 2222 2     10
> ID1564267 1263       10.2.2006 15:45 3333 3     10
> ID1564269 5646       13.5.2006 17:02 3333 3     10
> ID1564270 7561       13.5.2006 17:02 6666 1     10
> ID1564271 1676       15.5.2006 20:41 2222 2     20
>
> How can I do a new (pivot?) data.frame in R which I can achieve by MS SQL:
>
> select eala,
> datepart(month, date) as month,
> datepart(year, date) as year,
> count(distinct id) as id_count,
> count(distinct patient_id) as patient_count,
> count(distinct(case when class = 1 then code else null end)) as count_1,
> count(distinct(case when class = 2 then code else null end)) as count_2,
> count(distinct(case when class = 3 then code else null end)) as count_3
> into temp2
> from temp1
> group by datepart(month, date), datepart(year, date), eala
> order by datepart(month, date), datepart(year, date), eala
>
> I tried something like this but could not go further:
>
> stats <- function(x) {
> count <- function(x) length(na.omit(x))
> c(
> n = count(x),
> uniikit = length(unique(x))
> )
> }
> library(reshape)
> attach(dframe)
> dfm <- melt(dframe, measure.var=c("id","patient_id"),
>             id.var=c("code", "this should be month", "this should be year"),
>             variable_name="variable")
>
> cast(dfm, code + month + year ~ variable, stats)
>
> Regards,
>
> Lauri
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?