thr3ads.net - R help - [R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome [Sep 2011]

If this information is useful, please help other people find it:
Share via:

Maya Joshi

2011-Sep-03 03:18 UTC

[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

Dear R experts.

I might be missing something obvious. I have been trying to fix this problem
for some weeks. Please help.

#data
ped <- c(rep(1, 4), rep(2, 3), rep(3, 3))
y <- rnorm(10, 8, 2)

# variable set 1
M1a <- sample (c(1, 2,3), 10, replace= T)
M1b <- sample (c(1, 2,3), 10, replace= T)
M1aP1 <- sample (c(1, 2,3), 10, replace= T)
M1bP2 <- sample (c(1, 2,3), 10, replace= T)

# variable set 2
M2a <- sample (c(1, 2,3), 10, replace= T)
M2b <- sample (c(1, 2,3), 10, replace= T)
M2aP1 <- sample (c(1, 2,3), 10, replace= T)
M2bP2 <- sample (c(1, 2,3), 10, replace= T)

# variable set 3
M3a <- sample (c(1, 2,3), 10, replace= T)
M3b <- sample (c(1, 2,3), 10, replace= T)
M3aP1 <- sample (c(1, 2,3), 10, replace= T)
M3bP2 <- sample (c(1, 2,3), 10, replace= T)

mydf <- data.frame (ped, M1a,M1b,M1aP1,M1bP2, M2a,M2b,M2aP1,M2bP2,
M3a,M3b,M3aP1,M3bP2, y)

# functions and further calculations

mmat <- matrix
(c("M1a","M2a","M3a","M1b","M2b","M3b","M1aP1","M2aP1","M3aP1",
"M1bP2","M2bP2","M3bP2"), ncol = 4)

# first function
myfun <- function(x) {
x<- as.vector(x)
ot1 <- ifelse(mydf[x[1]] == mydf[x[3]], 1, -1)
ot2 <- ifelse(mydf[x[2]] == mydf[x[4]], 1, -1)
qt <- ot1 + ot2
return(qt)
}
qt <- apply(mmat, 1, myfun)
ydv <- c((y - mean(y))^2)
qtd <- data.frame(ped, ydv, qt)

# second function
myfun2 <- function(dataframe) {
vydv <- sum(ydv)*0.25
sumD <- sum(ydv * qt)
Rt <- vydv / sumD
return(Rt)
}

# using plyr
require(plyr)
dfsumd1 <- ddply(mydf,.(mydf$ped),myfun2)

Here are 2 issues:
(1) The output just one, I need the output for all three set of variables
(as listed above)

(2)  all three values of dfsumd is returning to same for all level of ped:
1,2, 3
Means that the function is applied to whole dataset but only replicated in
output !!!

I tried with plyr not being lazy but due to my limited R knowledge, If you
have a different suggestion, you are welcome too.

Thank you in advance...

Maya

	[[alternative HTML version deleted]]

David Winsemius

2011-Sep-03 04:28 UTC

head link

[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

On Sep 2, 2011, at 11:18 PM, Maya Joshi wrote:
> Dear R experts.
>
> I might be missing something obvious. I have been trying to fix this  
> problem
> for some weeks. Please help.
>
> #data
> ped <- c(rep(1, 4), rep(2, 3), rep(3, 3))
> y <- rnorm(10, 8, 2)
>
> # variable set 1
> M1a <- sample (c(1, 2,3), 10, replace= T)
> M1b <- sample (c(1, 2,3), 10, replace= T)
> M1aP1 <- sample (c(1, 2,3), 10, replace= T)
> M1bP2 <- sample (c(1, 2,3), 10, replace= T)
>
> # variable set 2
> M2a <- sample (c(1, 2,3), 10, replace= T)
> M2b <- sample (c(1, 2,3), 10, replace= T)
> M2aP1 <- sample (c(1, 2,3), 10, replace= T)
> M2bP2 <- sample (c(1, 2,3), 10, replace= T)
>
> # variable set 3
> M3a <- sample (c(1, 2,3), 10, replace= T)
> M3b <- sample (c(1, 2,3), 10, replace= T)
> M3aP1 <- sample (c(1, 2,3), 10, replace= T)
> M3bP2 <- sample (c(1, 2,3), 10, replace= T)
>
> mydf <- data.frame (ped, M1a,M1b,M1aP1,M1bP2, M2a,M2b,M2aP1,M2bP2,
> M3a,M3b,M3aP1,M3bP2, y)
>
> # functions and further calculations
>
> mmat <- matrix
>
(c("M1a","M2a","M3a","M1b","M2b","M3b","M1aP1","M2aP1","M3aP1",
> "M1bP2","M2bP2","M3bP2"), ncol = 4)
>
> # first function
> myfun <- function(x) {
> x<- as.vector(x)
> ot1 <- ifelse(mydf[x[1]] == mydf[x[3]], 1, -1)
You really ought to explain what you are trying to do. This code will  
compare two lists. The list from mydf[x2]] will be the column  
mydf["M2b"] which will have as its first element the four element  
vector assigned above. I am guessing that was not what you wanted.  
Notice this simple case using the "[" function as you are attempting  
throws an error:

 > list(M2b = c(1,2,3)) == list(M2a = c(1,2,3))
Error in list(M2b = c(1, 2, 3)) == list(M2a = c(1, 2, 3)) :
   comparison of these types is not implemented'

So ... now that your code has failed to explain what you wanted why  
don't your try some natural language explanations.

Notice that the "[[" function is generally what people want when  
extracting from lists:

 > list(M2b = c(1,2,3))[["M2b"]] == list(M2a =
c(1,2,3))[["M2a"]]
[1] TRUE TRUE TRUE

> ot2 <- ifelse(mydf[x[2]] == mydf[x[4]], 1, -1)
> qt <- ot1 + ot2
> return(qt)
> }
> qt <- apply(mmat, 1, myfun)
> ydv <- c((y - mean(y))^2)
> qtd <- data.frame(ped, ydv, qt)
>
> # second function
> myfun2 <- function(dataframe) {
> vydv <- sum(ydv)*0.25
> sumD <- sum(ydv * qt)
> Rt <- vydv / sumD
> return(Rt)
> }
>
> # using plyr
> require(plyr)
> dfsumd1 <- ddply(mydf,.(mydf$ped),myfun2)
>
> Here are 2 issues:
> (1) The output just one, I need the output for all three set of  
> variables
> (as listed above)
An incredibly vague description.
>
> (2)  all three values of dfsumd is returning to same for all level  
> of ped:
> 1,2, 3
I sympathize with those forced to adopt the English language, but that  
is the standard this decade. So givne your apparent difficulties, you  
need to exert more effort at making explicit what is _supposed_ to be  
returned.
> Means that the function is applied to whole dataset but only  
> replicated in
> output !!!
>
> I tried with plyr not being lazy but due to my limited R knowledge,  
> If you
> have a different suggestion, you are welcome too.
>
> Thank you in advance...
>
> Maya
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT

Dennis Murphy

2011-Sep-03 08:07 UTC

head link

[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

Hi:

I tried to figure out what you were doing...some of it I think I
grasped, other parts not so much.

On Fri, Sep 2, 2011 at 8:18 PM, Maya Joshi <maya.d.joshi at gmail.com>
wrote:> Dear R experts.
>
> I might be missing something obvious. I have been trying to fix this
problem
> for some weeks. Please help.

Let's start by reducing some of your code:

ped <- rep(1:3, c(4, 3, 3))
y <- rnorm(10, 8, 2)
# This replaces all of your sample() statements, and is equivalent:
smat <- matrix(sample(1:3, 120, replace = TRUE), ncol = 12)
colnames(smat) <- c('M1a', 'M1b', 'M1aP1',
'M1bP2',
                    'M2a', 'M2b', 'M2aP1',
'M2bP2',
                    'M3a', 'M3b', 'M3aP1',
'M3bP2')
mydf <- as.data.frame(cbind(ped, y, smat))
>
> #data
> ped <- c(rep(1, 4), rep(2, 3), rep(3, 3))
> y <- rnorm(10, 8, 2)
>
> # variable set 1
> M1a <- sample (c(1, 2,3), 10, replace= T)
> M1b <- sample (c(1, 2,3), 10, replace= T)
> M1aP1 <- sample (c(1, 2,3), 10, replace= T)
> M1bP2 <- sample (c(1, 2,3), 10, replace= T)
>
> # variable set 2
> M2a <- sample (c(1, 2,3), 10, replace= T)
> M2b <- sample (c(1, 2,3), 10, replace= T)
> M2aP1 <- sample (c(1, 2,3), 10, replace= T)
> M2bP2 <- sample (c(1, 2,3), 10, replace= T)
>
> # variable set 3
> M3a <- sample (c(1, 2,3), 10, replace= T)
> M3b <- sample (c(1, 2,3), 10, replace= T)
> M3aP1 <- sample (c(1, 2,3), 10, replace= T)
> M3bP2 <- sample (c(1, 2,3), 10, replace= T)
>
> mydf <- data.frame (ped, M1a,M1b,M1aP1,M1bP2, M2a,M2b,M2aP1,M2bP2,
> M3a,M3b,M3aP1,M3bP2, y)
>
> # functions and further calculations
>
> mmat <- matrix
>
(c("M1a","M2a","M3a","M1b","M2b","M3b","M1aP1","M2aP1","M3aP1",
> "M1bP2","M2bP2","M3bP2"), ncol = 4)
As far as I can tell, you want to compare a with aP1 and b with bP1
for all three M values. Given the way your data are arranged, each row
of the matrix smat contains all 12 values. apply() on index 1
generally takes a row vector as its input source. Your function to
pass to apply should respect that. My version of myfun() takes the row
vector as input, divides it into two subgroups of six, uses the
ifelse() function to do the comparisons and reshapes it into a 3 x 2
matrix, and then finally takes the row sums of the matrix, which is
the return value from the function.

myfun <- function(x) {
    # Indices of the input vector to be compared
    u <- c(1, 2, 5, 6, 9, 10)
    v <- c(3, 4, 7, 8, 11, 12)
    ot <- matrix(ifelse(x[u] == x[v], 1, -1), ncol = 2, byrow = TRUE)
    rowSums(ot)
   }

# Apply myfun() to the matrix of samples (smat)
qt <- t(apply(smat, 1, myfun))
colnames(qt) <- paste('M', 1L:3L, sep = '')
mydf2 <- data.frame(ped, y, as.data.frame(qt), Msum = rowSums(qt))

I wasn't sure what you wanted to do with M1-M3, so I added a row sum
variable just in case.>
> # first function
> myfun <- function(x) {
> x<- as.vector(x)
> ot1 <- ifelse(mydf[x[1]] == mydf[x[3]], 1, -1)
> ot2 <- ifelse(mydf[x[2]] == mydf[x[4]], 1, -1)
> qt <- ot1 + ot2
> return(qt)
> }
> qt <- apply(mmat, 1, myfun)
# Mean deviation vector for y - OK.> ydv <- c((y - mean(y))^2)
> qtd <- data.frame(ped, ydv, qt)
>
Here's the point where things get more confusing. I'm not sure what
you intend from sumD; as it stands, it will, in each row, multiply y
by the values in qt, which will generate an n x 3 matrix, n
representing the number of rows in the sub-data frame indexed by ped.
sumD will then sum the entire n x 3 matrix. Is that what you wanted?

More generally, when you write a function to pass to ddply(), the
input argument should be a data frame and the output should be a data
frame, especially if you intend to return more than one value. Part of
the problem with your function is that the variables inside the body
have no reference to the input data frame. Moreover, your mean squared
deviation function always uses 4 as the denominator rather than the
number of rows of the input data frame. (We'll ignore the bias in the
estimator since we're concentrating on the code.)

I have absolutely no idea if the following is what you intended, but
I'm showing this to illustrate how to create a function for input into
ddply() that can be applied groupwise.

myfun2 <- function(d) {
    # input argument d is a data frame
    # vector of squared mean deviations of y
    ydv <- with(d, (y - mean(y))^2)
    vydv <- mean(ydv)
    # multiply the squared deviations vector by the sum
    # of M1-M3 and then add the cross-products
    sumD <- sum(ydv * d$Msum)
    # or equivalently, as an inner product:
    # sumD <- ydv %*% d$Msum
    # Return the ratio as a data frame
    data.frame(Rt = vydv / sumD)
  }

ddply(mydf2, .(ped), myfun2)

## My result:> ddply(mydf2, .(ped), myfun2)  ped          Rt
1   1  0.35787388
2   2 -0.17739049
3   3 -0.09958257

I'm reasonably sure that myfun() is OK, but myfun2() contains more
than a little guesswork. Assuming it's wrong, what the code can tell
you is how to pass the variables in from the input data frame and to
output a data frame when called within ddply().

HTH,
Dennis

> # second function
> myfun2 <- function(dataframe) {
> vydv <- sum(ydv)*0.25
> sumD <- sum(ydv * qt)
> Rt <- vydv / sumD
> return(Rt)
> }
>
> # using plyr
> require(plyr)
> dfsumd1 <- ddply(mydf,.(mydf$ped),myfun2)
>
> Here are 2 issues:
> (1) The output just one, I need the output for all three set of variables
> (as listed above)
>
> (2) ?all three values of dfsumd is returning to same for all level of ped:
> 1,2, 3
> Means that the function is applied to whole dataset but only replicated in
> output !!!
>
> I tried with plyr not being lazy but due to my limited R knowledge, If you
> have a different suggestion, you are welcome too.
>
> Thank you in advance...
>
> Maya
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Reasonably Related Threads

Search for more maybe matching threads

R help - Sep 2011 - problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

Reasonably Related Threads