Re: A. Mani : Avoiding loops (Petr Pikal)
> Message: 9
> Date: Mon, 22 Aug 2005 06:40:45 +0200
> From: "Petr Pikal" <petr.pikal at precheza.cz>
> Subject: Re: [R] A. Mani : Avoiding loops
> To: "A. Mani" <a_mani_sc_gs at vsnl.net>, r-help
> <r-help at stat.math.ethz.ch>
>
> On 20 Aug 2005 at 3:26, A. Mani wrote:
>
> > On Friday 19 August 2005 11:54, Sean O'Riordain wrote:
> > > Hi,
> > > I'm not sure what you actually want from your email (following the
> > > posting guide is a good way of helping you explain things to the
> > > rest of us in a way we understand - it might even answer your
> > > question!)
> > >
> > > I'm only a beginner at R so no doubt one of our expert colleagues
> > > will help me...
> > >
> > > > fred <- data.frame()
> > > > fred <- edit(fred)
> > > > fred
> > >
> > >   A B C D E
> > > 1 1 2 X Y 1
> > > 2 2 3 G L 1
> > > 3 3 1 G L 5
> > >
> > > > fred[,3]
> > >
> > > [1] X G G
> > > Levels: G X
> > >
> > > > fred[fred[,3]=="G",]
> > >
> > >   A B C D E
> > > 2 2 3 G L 1
> > > 3 3 1 G L 5
> > >
> > > so at this point I can create a new dataframe with column 3 (C) =
> > > "G"; either explicitly or implicitly...
> > >
> > > and if I want to calculate the sum() of column E, then I just say
> > > something like...
> > >
> > > > sum(fred[fred[,3]=="G",][,5])
> > >
> > > [1] 6
> > >
> > >
> > > now naturally being a bit clueless at manipulating stuff in R, I
> > > didn't know how to do this before I started... and you guys only get
> > > to see the lines that I typed in and got a "successful" result...
> > >
> > > according to section 6 of the "Introduction to R" manual which comes
> > > with R, I could also have said
> > >
> > > > sum(fred[fred$C=="G",]$E)
> > >
> > > [1] 6
> > >
> > > Hmmm.... I wonder would it be reasonable to put an example of this
> > > type into section 2.7 of the "Introduction to R"?
> > >
> > >
> > > cheers!
> > > Sean
> > >
> > > On 18/08/05, A. Mani <a_mani_sc_gs at vsnl.net> wrote:
> > > > Hello,
> > > > I want to avoid loops in the following situation. There is a
> > > > 5-col dataframe with col headers alone. Two of the columns are
> > > > non-numeric. The problem is to calculate statistics (scores) for
> > > > each element of one column. The functions depend on matching in
> > > > the other non-numeric column.
> > > >
> > > > A B C E F
> > > > 1 2 X Y 1
> > > > 2 3 G L 1
> > > > 3 1 G L 5
> > > > and so on ...30000+ entries.
> > > >
> > > > I need scores for col E entries which depend on conditional
> > > > implications.
> > > >
> > > >
> > > > Thanks,
> > > >
> > Hello,
> > Sorry about the incomplete problem. Here is a better version for the
> > problem: (the measure is not simple)
> > The data frame is like
> > col1   col2     col3     col4   col5
> > <num>  <nonum>  <nonum>  <num>  <num>
> > A      B        C        E      F
> > There are repeated strings in col3, col2. Problem : Calculate
> > Measure(Ci) = [No. of repeats of Ci *100] + [If (Bi, Ci) is same as
> > (Bj, Cj) and 6>= Ej - Ei >=3 then add 100 else 10] .
>
> Hi
>
> I am not sure what exactly you would like to compute; a
> **working** example could help. But if you want to do some
> computation for row "i" which depends on row "j", I suppose that
> you cannot avoid loops.
>
> Generally you can use one of aggregate, tapply, by or ave for some
> computation split by factor. See help pages.
>
> tapply(vector or data frame, list(factors), function)
>
> is the standard form.
>
> HTH
> Petr
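
The split-by-factor functions mentioned above can be sketched on Sean's small example from earlier in the thread. This is only an illustration: the data frame is retyped from that posting, and the names `sums`, `E_sum` and `agg` are made up for the sketch.

```r
# Sean's example data frame, retyped from earlier in the thread
fred <- data.frame(A = c(1, 2, 3), B = c(2, 3, 1),
                   C = c("X", "G", "G"), D = c("Y", "L", "L"),
                   E = c(1, 1, 5))

# tapply(): one value per level of C
sums <- tapply(fred$E, fred$C, sum)
sums[["G"]]   # 6, the same as sum(fred[fred$C == "G", ]$E)

# ave(): the group statistic repeated on every row of its group
fred$E_sum <- ave(fred$E, fred$C, FUN = sum)

# aggregate(): the same sums returned as a data frame
agg <- aggregate(E ~ C, data = fred, FUN = sum)
```

tapply() and aggregate() return one result per group, while ave() keeps the original row order and length, which is convenient when a per-row score is wanted.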
>
>
> >
> >
> > Actually it is to be stretched further by adding similar blocks.
> >
> > How do we use *apply or something else in this situation?
> >
> >
> > In prolog it is extremely easy, but here it is not quite...
> >
> >
Here is some code and a little data
dat <- read.table("/home/project5R/datasplf.csv", header=TRUE,
sep=",", na.strings="NA", dec=".",
strip.white=TRUE)
attach(dat)
showData(dat, placement='-20+200', font=.logFont, maxwidth=80,
maxheight=30)
x <- as.matrix(dat)
x1 <- as.vector(x[,1])
xd1 <- as.Date(x1, format= "%m-%d-%Y")
n <- length(x1)
n
x2 <- as.vector(x[,2])
length(x2)
x3 <- as.vector(x[,3])
length(x3)
x4 <- as.vector(x[,4])
x5 <- as.numeric(x[,5])  # as.matrix() on mixed columns gives character; convert back
x5[is.na(x5)] <- 0
xd4 <- as.Date(x4, format= "%m-%d-%Y")
xd4
p6 <- (1-(abs(x5 - 6)/6))*100
p6
xd1 <- as.Date(x1, format= "%m-%d-%Y")
xd1
x23 <- cbind(x2,x3)
xp <- paste(x2,x3)
xp
y <- cbind(x23,xd4,xd1,xp)
_____________________________________________________________
# The score to be computed is for the doctors. It is no. of patients * 100 + rate
# of decrease of diabetic score * 1000 + no. of tests at approx. 3 months *....(see
# below)
_____________________________________________________________
# To be debugged (loops)
sc <- numeric(n)
for (i in 1:n) {
  for (j in 1:n) {
    # same doctor (x3) and same patient ID (x2)
    if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]])) {
      sc[[i]] <- sc[[i]] + 100
    }
  }
}
sc
scf <- numeric(n)
for (i in 1:n) {
  for (j in 1:n) {
    days <- abs(as.numeric(xd4[[i]] - xd4[[j]]))  # gap between test dates
    if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]]) &&
        isTRUE(abs(1 - days / 90) <= 1.25)) {
      scf[[i]] <- scf[[i]] + 100
    }
  }
}
scr <- numeric(n)
for (i in 1:n) {
  for (j in 1:n) {
    days <- abs(as.numeric(xd4[[i]] - xd4[[j]]))
    # skip zero gaps (including i == j) to avoid dividing by zero
    if (identical(x3[[i]], x3[[j]]) && identical(x2[[i]], x2[[j]]) &&
        isTRUE(days > 0)) {
      scr[[i]] <- scr[[i]] + abs(x5[[i]] - x5[[j]]) / days * 1000
    }
  }
}
sce <- numeric(n)
for (i in 1:n) {sce[[i]] <- sce[[i]] + (1 - abs(x5[[i]] - 6)/6)*100}
se <- scf + sce + scr + sc
score <- cbind(x3, se)
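
For comparison, part of the scoring above could perhaps be done without explicit loops. The following is only a sketch under assumptions: it reads the sample rows from the DATA section below inline, takes columns by position instead of going through as.matrix(), and uses a made-up grouping label `grp`. It covers the counting score (sc), the date-window score (scf) and the per-row score (sce). The rate score (scr) is left out.

```r
# Sample rows copied from the DATA section of this message
csv <- '"DOB","ID","DOCTOR","DATE of TEST","TEST1"
12-23-1921,2177532.174,NA,01-20-2003,NA
NA,2358368.261,"152N7R",01-26-2003,NA
NA,2358368.261,"152N7R",01-27-2003,NA
07-24-1938,2174903.913,NA,01-31-2003,6.7
12-25-1924,2185493.043,NA,01-31-2003,NA
07-21-1943,2181658.696,"K9PL9N,L",01-28-2003,7
05-24-1938,2306571.304,"SH7RM9N",01-13-2003,NA
07-29-1949,2296516.522,"H3001FR9",01-20-2003,NA'
dat <- read.csv(text = csv, stringsAsFactors = FALSE)

x2  <- dat[[2]]                                # patient ID
x3  <- dat[[3]]                                # doctor
xd4 <- as.Date(dat[[4]], format = "%m-%d-%Y")  # test date
x5  <- dat[[5]]
x5[is.na(x5)] <- 0                             # missing test value treated as 0

# One label per (ID, doctor) pair; paste() turns NA into "NA", so rows with
# a missing doctor still match each other, as identical(NA, NA) does
grp <- paste(x2, x3)

# sc: 100 for every row in the same group, the row itself included
sc <- 100 * ave(rep(1, length(grp)), grp, FUN = sum)

# sce: no dependence between rows, so plain vector arithmetic is enough
sce <- (1 - abs(x5 - 6) / 6) * 100

# scf: within each group, count the rows whose date gap passes the posted test
scf <- ave(as.numeric(xd4), grp, FUN = function(d) {
  gap <- abs(outer(d, d, "-"))   # all pairwise gaps in days within the group
  100 * rowSums(abs(1 - gap / 90) <= 1.25)
})
```

The pairwise comparison is still done, but only inside each group through outer(), so the work grows with the group sizes and not with the square of the number of rows.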
____________________________
DATA
"DOB","ID","DOCTOR","DATE of TEST","TEST1"
12-23-1921,2177532.174,NA,01-20-2003,NA
NA,2358368.261,"152N7R",01-26-2003,NA
NA,2358368.261,"152N7R",01-27-2003,NA
07-24-1938,2174903.913,NA,01-31-2003,6.7
12-25-1924,2185493.043,NA,01-31-2003,NA
07-21-1943,2181658.696,"K9PL9N,L",01-28-2003,7
05-24-1938,2306571.304,"SH7RM9N",01-13-2003,NA
07-29-1949,2296516.522,"H3001FR9",01-20-2003,NA
Thanks,
A. Mani
Member, Cal. Math. Soc