Dear all,

I would like to aggregate a data frame (consisting of 2 columns - one for the bins, say factors, and one for the values) along bins and quantiles within the bins.

I have tried

aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$bin, g = 10)), sum)

but then the quantiles apply to the population as a whole and not the individual bins. Upon this realisation I have tried

aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)

which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I believe the error stems either from
a. the output of tapply being a list of a dimension equal to the number of bins, and not a list of equal dimension as the values, or
b. aggregate somehow not liking that the second list (of the quantiles within the bins) is not sorted nicely.

1. Do you have a reference for doing the summation on both bins and quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong and how I can solve the sort/list issue?

Any help would be greatly appreciated.

Kind regards,
Ivan Alves
Apologies, just a typo in the first instruction (when translating the names); the question is still valid.

On 21 Oct 2008, at 00:38, Ivan Alves wrote:

> Dear all,
>
> I would like to aggregate a data frame (consisting of 2 columns - one
> for the bins, say factors, and one for the values) along bins and
> quantiles within the bins.
>
> I have tried
>
> aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$values, g = 10)), sum)
>
> but then the quantiles apply to the population as a whole and not the
> individual bins. Upon this realisation I have tried
>
> aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)
>
> which gives the following error:
>
> Error in sort.list(unique.default(x), na.last = TRUE) :
>   'x' must be atomic for 'sort.list'
> Have you called 'sort' on a list?
>
> clearly I am doing something wrong, but cannot figure out what. I
> believe the error stems either from a. the output of tapply being a
> list of a dimension equal to the number of bins, and not a list of
> equal dimension as the values, or b. that somehow aggregate does not
> like that the second list (of the quantiles within the bins which do
> not appear to be sorted nicely)
>
> 1. Do you have a reference for doing the summation on both bins and
> quantiles within the bins?
> 2. If not, can you give me some guidance as to what I am doing wrong
> and how I can solve the sort/list issue?
>
> Any help would be greatly appreciated
>
> Kind regards,
>
> Ivan Alves
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Dear all,
Thanks to Jim and Mark for suggesting that I include reproducible
code. Please note that the enclosed file needs to go into the
home folder, or else the path for reading the CSV file must be changed. I
hope no encoding issues emerge when reading it.
And the code:

library(Hmisc)  # need the cut2 function to mark the quantile a given line belongs to
a <- read.csv(file = "~/example.csv", colClasses = c("Date", "numeric"))  # beware of the path
dim(a)  # should give "[1] 5076 2"

# should give the sum by year, but on the quantiles for the whole population
aggregate(a$value, list(Date = a[, "Date"], Quantile = cut2(a$value, g = 10)), sum)

# gives the error mentioned below
aggregate(a$value, list(Date = a[, "Date"], Quantile = tapply(a$value, a$Date, cut2, g = 10)), sum)

Once again, many thanks for any help
Ivan
On 21 Oct 2008, at 02:40, jim holtman wrote:
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> You need to at least post a subset of your data so that we can
> understand the data structures that you are using. 'dput' will create
> an easily readable format for posting your data (much easier than if
> you post the listing of a table). Usually it is some 'type mismatch'
> which says you really have to have the data to run the script against.
>
> On Mon, Oct 20, 2008 at 6:38 PM, Ivan Alves <papucho at mac.com> wrote:
>> Dear all,
>>
>> I would like to aggregate a data frame (consisting of 2 columns - one
>> for the bins, say factors, and one for the values) along bins and
>> quantiles within the bins.
>>
>> I have tried
>>
>> aggregate(data.frame$values, list(bin = data.frame
>> $bin,Quantile=cut2(data.frame$bin,g=10)),sum)
>>
>> but then the quantiles apply to the population as a whole and not the
>> individual bins. Upon this realisation I have tried
>>
>> aggregate(data.frame$values, list(bin = data.frame
>> $bin,Quantile=tapply(data.frame$values,data.frame
>> $bin,cut2,g=10)),sum)
>>
>> which gives the following error:
>>
>> Error in sort.list(unique.default(x), na.last = TRUE) :
>> 'x' must be atomic for 'sort.list'
>> Have you called 'sort' on a list?
>>
>> clearly I am doing something wrong, but cannot figure out what. I
>> believe the error stems either from a. the output of tapply being a
>> list of a dimension equal to the number of bins, and not a list of
>> equal dimension as the values, or b. that somehow aggregate does not
>> like that the second list (of the quantiles within the bins are not
>> sorted nicely)
>>
>> 1. Do you have a reference for doing the summation on both bins and
>> quantiles within the bins?
>> 2. If not, can you give me some guidance as to what I am doing wrong
>> and how I can solve the sort/list issue?
>>
>> Any help would be greatly appreciated
>>
>> Kind regards,
>>
>> Ivan Alves
>>
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
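
A note on the aggregation question above. The error message appears consistent with hypothesis (a): tapply() returns one list element per bin, whereas aggregate() expects grouping vectors with one entry per row, and turning a list into a factor fails inside sort.list(). What follows is only a minimal sketch under stated assumptions, not the poster's data or method: the data are simulated, the column names bin and value and the helper decile_in_bin are hypothetical, and cut2() is replaced by a rank-based decile index in base R so that the example runs without Hmisc. ave() is used because it applies a function within each group and returns a vector aligned with the original rows.

```r
# Simulated, hypothetical data: a grouping column "bin" and a numeric "value".
set.seed(1)
d <- data.frame(bin   = rep(c("A", "B", "C"), each = 50),
                value = c(rnorm(50, 0), rnorm(50, 10), rnorm(50, 100)))

# tapply() returns one element per bin (a list here), not one per row,
# so its result cannot serve as a grouping variable in aggregate().
per_bin <- tapply(d$value, d$bin, function(v) v)
length(per_bin)  # number of bins
nrow(d)          # number of rows

# Hypothetical helper: rank-based decile index (1 to 10) within one group.
# This stands in for cut2(v, g = 10); it is not the same function.
decile_in_bin <- function(v) {
  ceiling(10 * rank(v, ties.method = "first") / length(v))
}

# ave() applies the helper within each bin and keeps row alignment.
d$quantile <- ave(d$value, d$bin, FUN = decile_in_bin)

# Sum of the values for each (bin, within-bin decile) combination.
res <- aggregate(d$value, list(bin = d$bin, Quantile = d$quantile), sum)
head(res)
```

With Hmisc loaded, the same pattern should work with cut2() inside the function passed to ave(), provided the result is converted to an integer code first (for example with as.integer()), since ave() writes the results back into a numeric vector.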
CART is a commercial package and not part of R.
R has several packages that do various kinds of regression and
classification trees. Try:
RSiteSearch("Classification Tree",restr="func")
-- Bert Gunter
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Rahul-A.Agarwal at ubs.com
Sent: Wednesday, October 22, 2008 2:13 AM
To: R-help at r-project.org
Subject: [R] Problem in Cart
Hi,
Can someone help me with this project? I would like to initiate a
project using CART. For example, for the NIFTY 50 stocks the first node could
be cheap or expensive based on PE (price/earnings), with 2 subsequent
nodes for earnings certainty and return on assets.
Can anyone tell me how to go ahead with this project?
I believe Prof. Ripley can have a say on it.
Rahul Agarwal
Hi, sorry for the confusion, but I am looking to use R for regression
trees.
My query is stated below, and I do not understand how I can use the
tree library in this case.
Rahul Agarwal
Analyst
Equities Quantitative Research
UBS_ISC, Hyderabad
On Net: 19 533 6363
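
As a rough illustration of fitting a regression tree in R for a question of this shape, here is a minimal sketch. It rests on several assumptions that are not in the original message: it uses the rpart package rather than the tree package, the data are simulated, and the column names pe, earn_cert, roa and ret are hypothetical. Note that a fitted tree chooses its own split variables and cut points from the data, so the first split is not guaranteed to be on PE.

```r
library(rpart)  # recommended package that ships with standard R installations

# Simulated, purely hypothetical data standing in for stock fundamentals.
set.seed(42)
n <- 200
stocks <- data.frame(
  pe        = runif(n, 5, 40),   # price/earnings ratio (hypothetical)
  earn_cert = runif(n, 0, 1),    # earnings certainty score (hypothetical)
  roa       = runif(n, 0, 0.3)   # return on assets (hypothetical)
)
# Hypothetical response: higher for "cheap" stocks, plus noise.
stocks$ret <- ifelse(stocks$pe < 15, 0.10, 0.02) +
  0.05 * stocks$earn_cert + rnorm(n, sd = 0.01)

# Regression tree (method = "anova") of ret on the three predictors.
fit <- rpart(ret ~ pe + earn_cert + roa, data = stocks, method = "anova")
print(fit)  # text listing of the splits that were chosen

# Prediction for one new, hypothetical stock.
pred <- predict(fit, newdata = data.frame(pe = 12, earn_cert = 0.8, roa = 0.1))
```

plot(fit) followed by text(fit) draws the fitted tree, which makes it easier to see which variable was used at each node.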
-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com]
Sent: Wednesday, October 22, 2008 7:08 PM
To: Agarwal, Rahul-A; R-help at r-project.org
Subject: RE: [R] Problem in Cart
CART is a commercial package and not part of R.
R has several packages that do various kinds of regression and
classification trees. Try:
RSiteSearch("Classification Tree",restr="func")
-- Bert Gunter