Hi,
Thanks for including code and data so that we could reproduce what you're
doing.
Your problem is that you tell ddply to split the dataset by runNumber and
cat1, which results in four groups. ddply then applies my.summary() to each
of these groups. Only one of them (cat1 = 1 and runNumber = 1) has both
start.loc and end.loc, because it contains a row with start = TRUE and a row
with end = TRUE. That group works fine. The other three groups, however, are
broken. The group with cat1 = 2 and runNumber = 1 has neither start.loc nor
end.loc, while the two groups with runNumber = 2 each have only one of the
two. In those groups data.frame() is handed a zero-length vector alongside
the length-one peak value, hence "arguments imply differing number of rows".
The error disappears if you split the dataset only by runNumber, as then
each group has both start.loc and end.loc.
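
A quick way to see this for yourself is to count the start and end flags in
each group. The sketch below rebuilds the relevant columns of your example by
hand (Id and p-values are left out) and uses base R's aggregate() rather than
ddply, so treat it as an illustration, not a drop-in replacement:

```r
# Relevant columns of the posted example, typed in by hand
mydata <- data.frame(
  cat1        = c(1, 1, 2, 2, 1, 1, 1, 2, 2),
  location    = c(3002737, 3017821, 3027730, 3036220, 3053984,
                  3090500, 3103304, 3090500, 3103304),
  item_values = c(100, 102, 103, 104, 105, 106, 107, 106, 107),
  sequence    = c(1, 2, 3, 4, 5, 8, 9, 10, 11)
)

# Same helpers as in the original post
first <- function(x) c(TRUE, diff(x) != 1)
last  <- function(x) c(diff(x) != 1, TRUE)

mydata$start     <- first(mydata$sequence)
mydata$end       <- last(mydata$sequence)
mydata$runNumber <- cumsum(mydata$start)

# Number of start = TRUE and end = TRUE rows per runNumber/cat1 group.
# my.summary() needs exactly one of each to build a one-row data frame.
counts <- aggregate(cbind(start, end) ~ runNumber + cat1,
                    data = mydata, FUN = sum)
print(counts)

# The same count when splitting by runNumber only
by_run <- aggregate(cbind(start, end) ~ runNumber,
                    data = mydata, FUN = sum)
print(by_run)
```

Any group where either count is 0 is one where my.summary() will fail.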
If you want to apply my.summary() to each of these four groups, you're going
to have to fix the earlier code that assigns the start and end variables.
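
For instance, if what you actually want is the first and last row of every
runNumber/cat1 group (that is my assumption, your post does not say so), you
could flag them with ave(). This is only a sketch: the data are retyped by
hand, grp and idx are names I made up, and base R's split()/lapply() stand in
for ddply:

```r
# Relevant columns of the posted example, typed in by hand
mydata <- data.frame(
  cat1        = c(1, 1, 2, 2, 1, 1, 1, 2, 2),
  location    = c(3002737, 3017821, 3027730, 3036220, 3053984,
                  3090500, 3103304, 3090500, 3103304),
  item_values = c(100, 102, 103, 104, 105, 106, 107, 106, 107),
  sequence    = c(1, 2, 3, 4, 5, 8, 9, 10, 11)
)

# Run numbers exactly as in the original post
first <- function(x) c(TRUE, diff(x) != 1)
mydata$runNumber <- cumsum(first(mydata$sequence))

# ASSUMPTION: start/end should mark the first/last row (in row order)
# of each runNumber/cat1 combination.
grp <- interaction(mydata$runNumber, mydata$cat1, drop = TRUE)
idx <- seq_len(nrow(mydata))
mydata$start <- idx == ave(idx, grp, FUN = min)
mydata$end   <- idx == ave(idx, grp, FUN = max)

# Same summary as in the original post, written a little more compactly
my.summary <- function(x) {
  data.frame(
    start_of_the_location = x$location[x$start],
    end_of_the_location   = x$location[x$end],
    peak_value            = max(x$item_values)
  )
}

# Every group now has exactly one start and one end row
result <- do.call(rbind, lapply(split(mydata, grp), my.summary))
print(result)
```

With start and end assigned this way, your original call
ddply(mydata, .(runNumber, cat1), my.summary) should run as well, since every
group then yields a one-row data frame.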
Jonathan
On Wed, Jul 28, 2010 at 7:59 AM, jd6688 <jdsignature@gmail.com> wrote:
>
> mydata <- read.table(textConnection("
>
> Id cat1 location item_values p-values sequence
> a111 1 3002737 100 0.01 1
> a112 1 3017821 102 0.05 2
> a113 2 3027730 103 0.02 3
> a114 2 3036220 104 0.04 4
> a115 1 3053984 105 0.03 5
> a118 1 3090500 106 0.02 8
> a119 1 3103304 107 0.03 9
> a120 2 3090500 106 0.02 10
> a121 2 3103304 107 0.03 11
>
> "), header = TRUE)
>
> closeAllConnections()
>
>
>
> first <- function(x)c(TRUE, diff(x)!=1)
>
>
>
>
> last <- function(x)c(diff(x)!=1, TRUE)
>
>
>
> mydata$start <- first(mydata$sequence)
> mydata$end <- last(mydata$sequence)
>
> mydata$runNumber <- cumsum(first(mydata$sequence))
>
> #load library
> library(plyr)
>
>
> ddply(mydata[, -1], .(runNumber,cat1), function(x) {max(x$item_values)})
>
>
>
> my.summary <- function(x) {
> start.loc <- x$location[which(x$start == TRUE)]
> end.loc <- x$location[which(x$end == TRUE)]
> peak <- max(x$item_values)
> output <- data.frame(
> start_of_the_location = start.loc,
> end_of_the_location = end.loc,
> peak_value = peak)
> return(output)
> }
>
>
> ddply(mydata[, -1], .(runNumber,cat1), my.summary)
>
> why ddply returned the following error
>
> Error in data.frame(start_of_the_location = start.loc, end_of_the_location
> = end.loc, :
> arguments imply differing number of rows: 0, 1
> > mydata[,-1]
> cat1 location item_values p.values sequence start end runNumber
> 1 1 3002737 100 0.01 1 TRUE FALSE 1
> 2 1 3017821 102 0.05 2 FALSE FALSE 1
> 3 2 3027730 103 0.02 3 FALSE FALSE 1
> 4 2 3036220 104 0.04 4 FALSE FALSE 1
> 5 1 3053984 105 0.03 5 FALSE TRUE 1
> 6 1 3090500 106 0.02 8 TRUE FALSE 2
> 7 1 3103304 107 0.03 9 FALSE FALSE 2
> 8 2 3090500 106 0.02 10 FALSE FALSE 2
> 9 2 3103304 107 0.03 11 FALSE TRUE 2
> >
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/error-arguments-imply-differing-number-tp2305014p2305014.html
> Sent from the R help mailing list archive at Nabble.com.
>