Dear all,

I would like to aggregate a data frame (consisting of 2 columns: one for the bins, say factors, and one for the values) along bins and quantiles within the bins.

I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$bin, g = 10)), sum)

but then the quantiles apply to the population as a whole and not to the individual bins. Upon this realisation I have tried

    aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)

which gives the following error:

    Error in sort.list(unique.default(x), na.last = TRUE) :
      'x' must be atomic for 'sort.list'
    Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I believe the error stems from either
a. the output of tapply being a list of length equal to the number of bins, and not a list of the same length as the values, or
b. aggregate somehow not liking that the second list (the quantiles within the bins) is not sorted nicely.

1. Do you have a reference for doing the summation on both bins and quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong and how I can solve the sort/list issue?

Any help would be greatly appreciated.

Kind regards,
Ivan Alves
Apologies, there was just a typo in the first instruction (introduced when translating the names); the question is still valid.

On 21 Oct 2008, at 00:38, Ivan Alves wrote:
> I have tried
>
> aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$values, g = 10)), sum)
>
> but then the quantiles apply to the population as a whole and not the
> individual bins.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
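Hypothesis (a) in the question can be checked on toy data. The sketch below is only illustrative: the data frame `d`, its values, and the helper `qcut` are hypothetical, and `qcut` is a base-R stand-in for `Hmisc::cut2(x, g = ...)` so that the example runs without extra packages. It shows that `tapply` with a vector-valued function returns one list element per bin rather than one value per row, which is not what `aggregate` expects as a grouping variable.

```r
# Toy data standing in for the poster's data frame (hypothetical names and values)
set.seed(1)
d <- data.frame(
  bin    = factor(rep(c("a", "b", "c"), times = c(20, 30, 50))),
  values = runif(100)
)

# Base-R stand-in for Hmisc::cut2(x, g = 4): cut x at its own quantiles
# and return integer group codes 1..g
qcut <- function(x, g = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = g + 1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

per_bin <- tapply(d$values, d$bin, qcut)

is.list(per_bin)   # the result is a list (one element per bin)
length(per_bin)    # number of bins, not number of rows
lengths(per_bin)   # each element holds the group codes of one bin
nrow(d)            # the length a grouping variable for aggregate() must have
```

A grouping element of the wrong length and of list type cannot be turned into a factor row by row, which is consistent with the 'sort.list' error reported above.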
Dear all,

Thanks to Jim and Mark for suggesting that I include reproducible code. Please note that the enclosed file needs to go into the home folder, or else the path for reading the CSV file must be changed. I hope no encoding issues emerge when reading it.

And the code:

    library(Hmisc)  # need the cut2 function to mark the quantile a given line belongs to
    a <- read.csv(file = "~/example.csv", colClasses = c("Date", "numeric"))  # beware of the path
    dim(a)  # should give "[1] 5076 2"
    # should give the sum by year, but on the quantiles for the whole population
    aggregate(a$value, list(Date = a[, "Date"], Quantile = cut2(a$value, g = 10)), sum)
    # gives the error mentioned below
    aggregate(a$value, list(Date = a[, "Date"], Quantile = tapply(a$value, a$Date, cut2, g = 10)), sum)

Once again, many thanks for any help.

Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> You need to at least post a subset of your data so that we can
> understand the data structures that you are using. 'dput' will create
> an easily readable format for posting your data (much easier than if
> you post the listing of a table). Usually it is some 'type mismatch'
> which says you really have to have the data to run the script against.
>
> On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves <papucho at mac.com> wrote:
>> Upon this realisation I have tried
>>
>> aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)
>>
>> which gives the following error:
>>
>> Error in sort.list(unique.default(x), na.last = TRUE) :
>>   'x' must be atomic for 'sort.list'
>> Have you called 'sort' on a list?
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
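One possible way to get quantile groups within each bin, sketched here under assumptions rather than offered as the list's answer, is base R's `ave()`, which applies a function per group and returns a vector aligned with the original rows. The data frame `d`, its column names, and the helper `qgroup` are hypothetical; `qgroup` is a base-R stand-in for `cut2(x, g = ...)`, and `g = 4` is used only to keep the toy output small.

```r
# Toy data (hypothetical): three bins with 40 values each
set.seed(42)
d <- data.frame(
  bin    = factor(rep(c("2006", "2007", "2008"), each = 40)),
  values = rexp(120)
)

# Base-R stand-in for Hmisc::cut2(x, g): integer quantile-group codes 1..g
qgroup <- function(x, g = 4) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = g + 1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# ave() runs qgroup separately inside each bin and puts the codes back
# in row order, so the result has one entry per row of d
d$qgrp <- ave(d$values, d$bin, FUN = qgroup)

# Both grouping vectors now have the same length as the values
res <- aggregate(d$values, list(bin = d$bin, Quantile = d$qgrp), sum)
res
```

With Hmisc loaded, the same pattern would presumably read `ave(a$value, a$Date, FUN = function(x) as.integer(cut2(x, g = 10)))`, where converting the factor returned by `cut2` to integer codes is an assumption made so that `ave()` can store the result in a numeric vector.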
CART is a commercial package and not part of R. R has several packages that do various kinds of regression and classification trees. Try:

    RSiteSearch("Classification Tree",restr="func")

-- Bert Gunter

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rahul-A.Agarwal at ubs.com
Sent: Wednesday, October 22, 2008 2:13 AM
To: R-help at r-project.org
Subject: [R] Problem in Cart

Hi,

Can someone help me with this project? I would like to initiate a project using CART. For example, for NIFTY 50 stocks the first node could be cheap or expensive based on PE (Price Earning), with 2 subsequent nodes for earnings certainty and return on assets. Can anyone tell me how to go ahead with this project? I believe Prof. Ripley can have a say on it.

Rahul Agarwal
Hi,

Sorry for the confusion, but I am looking to use R for regression trees. My query is stated below, and I am not able to understand how I can use the tree library in this case.

Rahul Agarwal
Analyst
Equities Quantitative Research
UBS_ISC, Hyderabad
On Net: 19 533 6363

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com]
Sent: Wednesday, October 22, 2008 7:08 PM
To: Agarwal, Rahul-A; R-help at r-project.org
Subject: RE: [R] Problem in Cart

CART is a commercial package and not part of R. R has several packages that do various kinds of regression and classification trees. Try:

    RSiteSearch("Classification Tree",restr="func")

-- Bert Gunter

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rahul-A.Agarwal at ubs.com
Sent: Wednesday, October 22, 2008 2:13 AM
To: R-help at r-project.org
Subject: [R] Problem in Cart

Hi,

Can someone help me with this project? I would like to initiate a project using CART. For example, for NIFTY 50 stocks the first node could be cheap or expensive based on PE (Price Earning), with 2 subsequent nodes for earnings certainty and return on assets. Can anyone tell me how to go ahead with this project? I believe Prof. Ripley can have a say on it.

Rahul Agarwal
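As a rough illustration of what a regression tree for this kind of query might look like in R, here is a minimal sketch using the `rpart` package (a recommended package distributed with R) instead of the `tree` library mentioned above; that substitution is a choice made for this example, not something suggested in the thread. The data are simulated, and the column names `pe`, `earnings_certainty`, `roa` and `ret` are hypothetical stand-ins for the variables described in the query.

```r
library(rpart)  # recursive partitioning; a recommended package shipped with R

# Simulated stock-level data (entirely hypothetical, for illustration only)
set.seed(123)
n <- 200
stocks <- data.frame(
  pe                 = runif(n, 5, 40),                 # price/earnings ratio
  earnings_certainty = runif(n),                        # made-up score in [0, 1]
  roa                = rnorm(n, mean = 0.08, sd = 0.03) # return on assets
)

# Simulated response: "cheap" stocks (low PE) are given higher returns
stocks$ret <- with(
  stocks,
  ifelse(pe < 15, 0.10, 0.02) + 0.05 * earnings_certainty + 0.5 * roa +
    rnorm(n, sd = 0.02)
)

# method = "anova" requests a regression tree (continuous response)
fit <- rpart(ret ~ pe + earnings_certainty + roa, data = stocks, method = "anova")

print(fit)                                    # text listing of the splits
pred <- predict(fit, newdata = stocks[1:5, ]) # fitted returns for five rows
pred
```

Note that the algorithm chooses the split variables and cut points itself; which variable ends up at the first node depends on the data and cannot be imposed through the formula alone.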