Alex van der Spek
2012-Oct-10 12:47 UTC
[R] Summary using by() returns character arrays in a list
I use by() to generate summary statistics like so:

Lbys <- by(dat[Nidx], dat$LipTest, summary)

where Nidx is an index vector with names picking out the columns in the data frame dat.

This returns a list of character arrays (see below for str() output) where the columns are named correctly but the rownames are empty strings and the values are strings prepended with the summary statistic's name (e.g. "Min.", "Median ").

I am reading the code of summary.data.frame() but can't figure out how I can change the action of that function to return a list of numeric matrices, with the summary statistic's name ("Min.", "Max." etc.) as rownames and the numeric values of the calculated summary statistics as values.

Any help much appreciated!
Regards,
Alex van der Spek

> str(Lbys)
List of 2
 $    : 'table' chr [1:6, 1:19] "Min. :-0.190 " "1st Qu.: 9.297 " "Median :10.373 " "Mean :10.100 " ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "" "" "" "" ...
  .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms." "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
 $ T38: 'table' chr [1:6, 1:19] "Min. :8.648 " "1st Qu.:8.920 " "Median :9.018 " "Mean :9.027 " ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "" "" "" "" ...
  .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms." "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
 - attr(*, "dim")= int 2
 - attr(*, "dimnames")=List of 1
  ..$ dat$LipTest: chr [1:2] "" "T38"
 - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES = dat$LipTest, FUN = summary)
 - attr(*, "class")= chr "by"
PIKAL Petr
2012-Oct-10 13:43 UTC
[R] Summary using by() returns character arrays in a list
Hi

> I use by() to generate summary statistics like so:
>
> Lbys <- by(dat[Nidx], dat$LipTest, summary)
>
> where Nidx is an index vector with names picking out the columns in the
> data frame dat.
>
> This returns a list of character arrays (see below for str() output)
> where the columns are named correctly but the rownames are empty
> strings and the values are strings prepended with the summary
> statistic's name (e.g. "Min.", "Median ").

Without knowledge of your data it is difficult to understand what is wrong. If I use the iris data set as input, everything goes as expected:

data(iris)
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
       Species
 setosa    :50
 versicolor:50
 virginica :50

> by(iris, iris$Species, summary)
iris$Species: setosa
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100
 1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200
 Median :5.000   Median :3.400   Median :1.500   Median :0.200
 Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246
 3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300
 Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600
       Species
 setosa    :50
 versicolor: 0
 virginica : 0

> I am reading the code of summary.data.frame() but can't figure out how
> I can change the action of that function to return list of numeric
> matrices with as rownames the summary statistic's name ("Min.", "Max."
> etc) and as values the numeric values of the calculated summary
> statistic.

Just what do you not like about such output, and how do you want the output structured? Maybe you want aggregate(), but without sample data it is hard to say.

aggregate(iris[1:2], list(iris$Species), summary)

Regards
Petr
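To make the aggregate() suggestion concrete, here is a minimal sketch on the iris data (not the poster's dat, which is not available). It assumes that aggregate() simplifies the length-six result of summary() into one numeric matrix column per variable. The names agg and flat are made up for illustration.

```r
# Numeric summaries per species via aggregate(); the grouping column is
# named "Species" here purely for readability.
agg <- aggregate(iris[1:2], list(Species = iris$Species), summary)

# Each summarised variable is stored as a numeric matrix column whose
# column names are the statistic names ("Min.", "1st Qu.", ...).
str(agg$Sepal.Length)

# Optionally flatten the matrix columns into ordinary numeric columns.
flat <- do.call(data.frame, agg)
names(flat)
```

The values stay numeric throughout, so no string parsing is needed; the trade-off is that the statistics end up as columns rather than as rownames.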
Hi,

Maybe this helps you. Using the dataset iris:

# Character summary tables, one per species
by.list <- by(iris, iris$Species, summary)
# Strip everything up to the colon, keeping only the values
dat1 <- do.call(rbind, lapply(by.list, function(x) gsub(".*\\:", "", x)))
# Build row names of the form "group:statistic"
row.names(dat1) <- paste(rep(unlist(dimnames(by.list), use.names = FALSE), each = 6),
                         unlist(lapply(lapply(by.list, `[`, 1:6),
                                       function(x) gsub("\\:.*", "", x)),
                                use.names = FALSE),
                         sep = ":")
dat2 <- data.frame(dat1)
colnames(dat2) <- colnames(dat1)
# Convert the remaining strings to numbers
dat2[] <- sapply(dat2, function(x) as.numeric(as.character(x)))
head(dat2, 8)
#                    Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
#setosa:Min.                4.300        2.300         1.000        0.100
#setosa:1st Qu.             4.800        3.200         1.400        0.200
#setosa:Median              5.000        3.400         1.500        0.200
#setosa:Mean                5.006        3.428         1.462        0.246
#setosa:3rd Qu.             5.200        3.675         1.575        0.300
#setosa:Max.                5.800        4.400         1.900        0.600
#versicolor:Min.            4.900        2.000         3.000        1.000
#versicolor:1st Qu.         5.600        2.525         4.000        1.200
#                         Species
#setosa:Min.                   50
#setosa:1st Qu.                 0
#setosa:Median                  0
#setosa:Mean                   NA
#setosa:3rd Qu.                NA
#setosa:Max.                   NA
#versicolor:Min.                0
#versicolor:1st Qu.            50
str(dat2)
#'data.frame':    18 obs. of  5 variables:
# $  Sepal.Length: num  4.3 4.8 5 5.01 5.2 ...
# $  Sepal.Width : num  2.3 3.2 3.4 3.43 3.67 ...
# $  Petal.Length: num  1 1.4 1.5 1.46 1.57 ...
# $  Petal.Width : num  0.1 0.2 0.2 0.246 0.3 ...
# $       Species: num  50 0 0 NA NA NA 0 50 0 NA ...

Not sure if you need the last column. I agree that aggregate() or ddply() will be easier.

A.K.
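An alternative that avoids parsing strings altogether is to compute the numbers inside by() rather than calling summary() on the whole data frame. The sketch below is one possible approach, again on iris rather than the original dat. The helper num_summary is a hypothetical name, and it assumes every selected column is numeric.

```r
# Hypothetical helper: six summary statistics per column, returned as a
# numeric matrix with the statistic names as rownames.
num_summary <- function(d) {
  sapply(d, function(x) {
    q <- quantile(x, c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
    c("Min." = q[1], "1st Qu." = q[2], "Median" = q[3],
      "Mean" = mean(x, na.rm = TRUE), "3rd Qu." = q[4], "Max." = q[5])
  })
}

# One numeric matrix per group; only the numeric columns are passed in.
num_by <- by(iris[1:4], iris$Species, num_summary)
num_by[["setosa"]]
```

For the original problem this would correspond to something like by(dat[Nidx], dat$LipTest, num_summary), assuming the columns selected by Nidx are all numeric. Unlike summary(), the helper always returns exactly six values per column, so the result stays a matrix even when some columns contain NA.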