I am trying to calculate quantiles of a data frame column split up by
two factors:
# Calculate the quantiles
quarts = tapply(gdf$tt, list(gdf$Runway, gdf$OnHour), FUN=quantile,
na.rm = TRUE)
This does not work:
> quarts
  04L  04R       15R  22L       22R  27        32   33L       33R
0 NULL Numeric,5 NULL Numeric,5 NULL Numeric,5 NULL Numeric,5 NULL
1 NULL Numeric,5 NULL Numeric,5 NULL NULL      NULL Numeric,5 NULL
2 NULL NULL      NULL Numeric,5 NULL NULL      NULL NULL      NULL
3 NULL NULL      NULL NULL      NULL NULL      NULL Numeric,5 NULL
4 NULL NULL      NULL NULL      NULL NULL      NULL NULL      NULL
5 NULL NULL      NULL NULL      NULL NULL      NULL NULL      NULL
6 NULL NULL      NULL NULL      NULL NULL      NULL NULL      NULL
7 NULL Numeric,5 NULL NULL      NULL Numeric,5 NULL Numeric,5 NULL
8 NULL Numeric,5 NULL Numeric,5 NULL Numeric,5 NULL Numeric,5 NULL
. . .
But if I leave out either of the two factors, it does work:
> quarts = tapply(gdf$tt, list(gdf$Runway), FUN=quantile, na.rm = TRUE)
> quarts
$`04L`
0% 25% 50% 75% 100%
4 8 9 10 20
$`04R`
0% 25% 50% 75% 100%
0 9 10 11 28
. . . .
How can I get this to work?
Thanks,
Jim Rome
Hi James,
I don't know how to solve it with "tapply" (something with "split", I
think), but you could use the "plyr" package (by Hadley Wickham).
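library(plyr)

# Generate some data: 30 rows; data.frame() recycles the shorter
# Place (10 values) and Light (15 values) vectors to match
set.seed(321)
myD <- data.frame(
  Place = sample(c("AWQ", "DFR", "WEQ"), 10, replace = TRUE),
  Light = sample(LETTERS[1:2], 15, replace = TRUE),
  value = rnorm(30)
)
# Introduce a few missing values
myD[c(3, 12, 29), "value"] <- NA

# data.frame to data.frame: one row per quantile within each Place/Light group
ddply(myD, .(Place, Light), summarise,
      quan_value = quantile(value, na.rm = TRUE))

# data.frame to list: one quantile vector per Place/Light group
quant <- function(df) quantile(df$value, na.rm = TRUE)
dlply(myD, .(Place, Light), quant)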
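For the "tapply"/"split" route mentioned above, a minimal base-R sketch might look like the following. The data frame here is hypothetical toy data that only borrows the column names (Runway, OnHour, tt) from the question; it is not the original gdf. The idea is to split the values on both factors at once, drop the empty combinations, and apply quantile() to each remaining group.

```r
# Hypothetical toy data standing in for gdf (column names follow the question)
set.seed(1)
gdf <- data.frame(
  Runway = sample(c("04L", "04R", "22L"), 40, replace = TRUE),
  OnHour = sample(0:2, 40, replace = TRUE),
  tt     = round(runif(40, 0, 30))
)

# split() on two factors; drop = TRUE discards empty Runway/OnHour combinations
groups <- split(gdf$tt, list(gdf$Runway, gdf$OnHour), drop = TRUE)

# One row per non-empty combination, one column per quantile
quarts <- t(sapply(groups, quantile, na.rm = TRUE))
quarts
```

The row names take the form "Runway.OnHour" (for example "04L.0"), so the two factors can be recovered from them if needed.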
Cheers
Patrick
On 09.04.2010 03:24, James Rome wrote:
> I am trying to calculate quantiles of a data frame column split up by
> two factors:
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On Thu, 8 Apr 2010, James Rome wrote:
> I am trying to calculate quantiles of a data frame column split up by
> two factors:
> # Calculate the quantiles
> quarts = tapply(gdf$tt, list(gdf$Runway, gdf$OnHour), FUN=quantile,
> na.rm = TRUE)
> This does not work:

It seems like it did work. It returned a matrix list of the results,
some of which are NULL and some of which are numeric vectors of length 5.

Try

str( quarts )

to get a sense of what is going on.

HTH,

Chuck

p.s. providing commented, minimal, self-contained, reproducible code (as
requested) will give you more informative answers.

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
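To illustrate the point about the matrix list, here is a small sketch of how such a result can be inspected and reshaped. The data frame is hypothetical toy data that reuses the column names from the question, with one Runway/OnHour combination deliberately left empty; the names "filled" and "arr" are introduced only for this example.

```r
# Hypothetical toy data; the combination ("04L", 1) has no observations
gdf <- data.frame(
  Runway = c("04L", "04L", "04R", "04R", "04R", "04R"),
  OnHour = c(0, 0, 0, 0, 1, 1),
  tt     = c(4, 8, 9, 10, 11, 28)
)

quarts <- tapply(gdf$tt, list(gdf$Runway, gdf$OnHour),
                 FUN = quantile, na.rm = TRUE)

str(quarts)                    # a list with a dim attribute (2 x 2)
quarts[["04R", "0"]]           # the five quantiles for one cell
is.null(quarts[["04L", "1"]])  # TRUE: no observations for this combination

# Fill empty cells with NA and stack into a 3-D array:
# quantile x Runway x OnHour
filled <- lapply(quarts, function(q) if (is.null(q)) rep(NA_real_, 5) else q)
arr <- array(unlist(filled), dim = c(5, dim(quarts)),
             dimnames = c(list(names(quantile(0))), dimnames(quarts)))
arr["50%", , ]                 # medians by Runway and OnHour
```

Each cell of the tapply() result is either NULL (an empty combination) or a length-5 numeric vector, which is why it prints as "NULL" and "Numeric,5". Indexing with [[ ]] returns the contents of a single cell.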