thr3ads.net - R help - [R] Conditional Weighted Average (ddply or any other function) [Mar 2013]

If this information is useful, please help other people find it:
Share via:

Punit Anand

2013-Mar-01 16:01 UTC

[R] Conditional Weighted Average (ddply or any other function)

Hello R community,

I am computing weighted average statistic by using ddply function:

My data set is:
N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27

with colnames:
(Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)

I want to compute
1) Weighted ROE based on Sector and Fiscal Year.
For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)

2) Weighted ROE based on Country and Fiscal Year.
For firm N1 of Country C1 and Fiscalyear FY-3 weight is
MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)

3) Weighted ROE based on Country, Sector and  Fiscal Year.
For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1, Fiscalyear FY-3)

4) Weighted ROE based on Country, Industry and  Fiscal Year.
For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1, Fiscalyear FY-3)


I tried using ddply function:
ddply (dataread , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKTCAP)))

where wavg <- function(x, wt) x %*% wt/sum(wt)
but this doesn't give me the right answer.

I could try subseting the data into different sectors and compute the
weighted average which doesn't look like an elegant solution and would
defeat the purpose of ddply

I coudn't think of properly using melt and cast functions to solve
this issue. Any help will be highly appreciated.

Thanks and Regards,
Punit

John Kane

2013-Mar-01 17:31 UTC

head link

[R] Conditional Weighted Average (ddply or any other function)

It is not at all clear what you are doing.  You state that the data set you are
using is what I have called dat1 : see dput form below.

As far as I can see there is no numerical value in there.  

##===========data set in dput form================#
dat1  <-  structure(list(Name = c("N1", "N1",
"N1", "N1", "N1", "N1", "N1",
         "N2", "N2", "N2", "N2",
"N2", "N2", "N2"), Ticker = c("T1",
"T1",
         "T1", "T1", "T1", "T1",
"T1", "T2", "T2", "T2", "T2",
"T2", "T2",
         "T2"), Sector = c("S1", "S1",
"S1", "S1", "S1", "S1", "S1",
"S2",
          "S2", "S2", "S2", "S2",
"S2", "S2"), Industry = c("I1", "I1",   
"I1", "I1", "I1", "I1", "I1",
"I2", "I2", "I2", "I2", "I2",
"I2",
          "I2"), Country = c("C1", "C1",
"C1", "C1", "C1", "C1", "C1",
           "C2", "C2", "C2", "C2",
"C2", "C2", "C2"), Year = c("FY-4",
"FY-3",
          "FY-2", "FY-1", "FY0", "FY1",
"FY2", "FY-4", "FY-3", "FY-2",
          "FY-2", "FY0", "FY2", "FY2"),
ROE = c("ROE11", "ROE12", "ROE13",
          "ROE14", "ROE15", "ROE16",
"ROE17", "ROE21", "ROE22", "ROE23",
          "ROE24", "ROE25", "ROE26",
"ROE27"), EPS = c("EPS11", "EPS12",
          "EPS13", "EPS14", "EPS15",
"EPS16", "EPS17", "EPS21", "EPS22",
          "EPS23", "EPS24", "EPS25",
"EPS26", "EPS27"), MKTCAP = c("MKT11",
          "MKT12", "MKT13", "MKT14",
"MKT15", "MKT16", "MKT17", "MKT21",
          "MKT22", "MKT23", "MKT24",
"MKT25", "MKT26", "MKT27")), .Names =
c("Name",
         "Ticker", "Sector", "Industry",
"Country", "Year", "ROE", "EPS",
         "MKTCAP"), class = "data.frame", row.names = c(NA,
-14L))
## =================end of dataset==================#

There is no FISCALYEAR variable that you specifed below
> ddply (dataread , .(Sector, FISCALYEAR), summarise, > WROE=wavg(ROE,
MKTCAP)))
I think we need a bit more information. 

John Kane
Kingston ON Canada

> -----Original Message-----
> From: anandpunit at gmail.com
> Sent: Fri, 1 Mar 2013 11:01:42 -0500
> To: r-help at r-project.org
> Subject: [R] Conditional Weighted Average (ddply or any other function)
> 
> Hello R community,
> 
> I am computing weighted average statistic by using ddply function:
> 
> My data set is:
> N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
> N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
> N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
> N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
> N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
> N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
> N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
> N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
> N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
> N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
> N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
> N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
> N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
> N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27
> 
> with colnames:
> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
> 
> I want to compute
> 1) Weighted ROE based on Sector and Fiscal Year.
> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
> 
> 2) Weighted ROE based on Country and Fiscal Year.
> For firm N1 of Country C1 and Fiscalyear FY-3 weight is
> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)
> 
> 3) Weighted ROE based on Country, Sector and  Fiscal Year.
> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1, Fiscalyear
> FY-3)
> 
> 4) Weighted ROE based on Country, Industry and  Fiscal Year.
> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
> weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1, Fiscalyear
> FY-3)
> 
> 
> I tried using ddply function:
> ddply (dataread , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE,
> MKTCAP)))
> 
> where wavg <- function(x, wt) x %*% wt/sum(wt)
> but this doesn't give me the right answer.
> 
> I could try subseting the data into different sectors and compute the
> weighted average which doesn't look like an elegant solution and would
> defeat the purpose of ddply
> 
> I coudn't think of properly using melt and cast functions to solve
> this issue. Any help will be highly appreciated.
> 
> Thanks and Regards,
> Punit
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
____________________________________________________________
GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
http://www.inbox.com/smileys
Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google Talk? and most
webmails

Punit Anand

2013-Mar-01 19:13 UTC

head link

[R] Conditional Weighted Average (ddply or any other function)

Hi John,

The sample size is huge involving 10,000 + firms. I have put a
representative sample using dput ( Name, ticker and country have been
changed so that firms cannot be identified due to proprietary data
set, also EPS is not required and removed from the dataset)

structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC",
"CTAX",
"INN", "NOB", "SH", "SZ",
"WASH", "WILLSON"), class = "factor"),
    Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
    8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
    6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
    1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
    "CT56", "INN12", "NB12", "SH12",
"SZ12", "W12", "W15"), class =
"factor"),
    Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
    5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial &
Professional Serv",
    "Energy", "Media", "Retail",
"Transportation"), class = "factor"),
    Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
Discretionary",
    "Energy", "Industrials"), class = "factor"),
Country = structure(c(4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L), .Label = c("Brazil", "China",
"India", "UK"), class "factor"),
    FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
    1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
    5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
    2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label = c("FY-1",
    "FY-2", "FY-3", "FY0", "FY1",
"FY2"), class = "factor"),
    ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
    0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
    0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
    0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
    0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
    0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
    0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
    3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
    4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
    700357605, 1814942993, 1858225342, 1242890503, 1242890503,
    1879700000, 557093400, 224900300, 1634700000, 1443200000,
    3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
    7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
    6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
    4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
    513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
    )), .Names = c("NAME", "Ticker", "Industry",
"Sector", "Country",
"FISCALYEAR", "ROE", "MKT"), class =
"data.frame", row.names = c(NA,
-49L))

Thanks,
Punit
> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at inbox.com>
wrote:
> See below
>
>
>> -----Original Message-----
>> From: anandpunit at gmail.com
>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>> To: jrkrideau at inbox.com
>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>> function)
>>
>> Hi John,
>>
>> I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name,
>> Ticker, Sector, Country, FISCALYEAR or Year are character strings.
>>
>> and column "Year" is referring to "FISCALYEAR"
>>
>  Definitely a no-no in R-help.  :)  We really need  some representative
sample data to play with.  See
https://github.com/hadley/devtools/wiki/Reproducibility for some general
pointers on how to compose a good question.  The fact that you included the code
you are using was excellent but without some data it is rather useless.
>
>  The easiest way to supply data  is to use the dput() function.  Example
with your file named "testfile":
> dput(testfile)
> Then copy the output and paste into your email.  This is what I did with
your data that I pasted into my email .  I added the dat1  <-  to it.
>
> For large data sets, you can just supply a representative sample.  Usually,
dput(head(testfile, 100)) will be sufficient.
>
> I hope this is of some help.
>
>
>>
>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at
inbox.com> wrote:
>>> It is not at all clear what you are doing.  You state that the data
set
>>> you are using is what I have called dat1 : see dput form below.
>>>
>>> As far as I can see there is no numerical value in there.
>>>
>>> ##===========data set in dput form================#
>>> dat1  <-  structure(list(Name = c("N1",
"N1", "N1", "N1", "N1", "N1",
>>> "N1",
>>>          "N2", "N2", "N2",
"N2", "N2", "N2", "N2"), Ticker =
c("T1",
>>> "T1",
>>>          "T1", "T1", "T1",
"T1", "T1", "T2", "T2", "T2",
"T2", "T2",
>>> "T2",
>>>          "T2"), Sector = c("S1",
"S1", "S1", "S1", "S1", "S1",
"S1",
>>> "S2",
>>>           "S2", "S2", "S2",
"S2", "S2", "S2"), Industry = c("I1",
"I1",
>>> "I1", "I1", "I1", "I1",
"I1", "I2", "I2", "I2", "I2",
"I2", "I2",
>>>           "I2"), Country = c("C1",
"C1", "C1", "C1", "C1", "C1",
"C1",
>>>            "C2", "C2", "C2",
"C2", "C2", "C2", "C2"), Year =
c("FY-4",
>>> "FY-3",
>>>           "FY-2", "FY-1", "FY0",
"FY1", "FY2", "FY-4", "FY-3",
"FY-2",
>>>           "FY-2", "FY0", "FY2",
"FY2"), ROE = c("ROE11", "ROE12",
>>> "ROE13",
>>>           "ROE14", "ROE15", "ROE16",
"ROE17", "ROE21", "ROE22", "ROE23",
>>>           "ROE24", "ROE25", "ROE26",
"ROE27"), EPS = c("EPS11", "EPS12",
>>>           "EPS13", "EPS14", "EPS15",
"EPS16", "EPS17", "EPS21", "EPS22",
>>>           "EPS23", "EPS24", "EPS25",
"EPS26", "EPS27"), MKTCAP >>> c("MKT11",
>>>           "MKT12", "MKT13", "MKT14",
"MKT15", "MKT16", "MKT17", "MKT21",
>>>           "MKT22", "MKT23", "MKT24",
"MKT25", "MKT26", "MKT27")), .Names
>>> = c("Name",
>>>          "Ticker", "Sector",
"Industry", "Country", "Year", "ROE",
>>> "EPS",
>>>          "MKTCAP"), class = "data.frame",
row.names = c(NA, -14L))
>>> ## =================end of dataset==================#
>>>
>>> There is no FISCALYEAR variable that you specifed below
>>>
>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, >
WROE=wavg(ROE,
>>>> MKTCAP)))
>>>
>>> I think we need a bit more information.
>>>
>>> John Kane
>>> Kingston ON Canada
>>>
>>>
>>>> -----Original Message-----
>>>> From: anandpunit at gmail.com
>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>> To: r-help at r-project.org
>>>> Subject: [R] Conditional Weighted Average (ddply or any other
function)
>>>>
>>>> Hello R community,
>>>>
>>>> I am computing weighted average statistic by using ddply
function:
>>>>
>>>> My data set is:
>>>> N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
>>>> N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
>>>> N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
>>>> N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
>>>> N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
>>>> N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
>>>> N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
>>>> N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
>>>> N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
>>>> N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
>>>> N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
>>>> N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
>>>> N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
>>>> N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27
>>>>
>>>> with colnames:
>>>> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>
>>>> I want to compute
>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>
>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>> For firm N1 of Country C1 and Fiscalyear FY-3 weight is
>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)
>>>>
>>>> 3) Weighted ROE based on Country, Sector and  Fiscal Year.
>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
>>>> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1,
Fiscalyear
>>>> FY-3)
>>>>
>>>> 4) Weighted ROE based on Country, Industry and  Fiscal Year.
>>>> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
>>>> weight is MKT1 / SUM(MKT, where Country = C1, Industry = I1,
Fiscalyear
>>>> FY-3)
>>>>
>>>>
>>>> I tried using ddply function:
>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
WROE=wavg(ROE,
>>>> MKTCAP)))
>>>>
>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>> but this doesn't give me the right answer.
>>>>
>>>> I could try subseting the data into different sectors and
compute the
>>>> weighted average which doesn't look like an elegant
solution and would
>>>> defeat the purpose of ddply
>>>>
>>>> I coudn't think of properly using melt and cast functions
to solve
>>>> this issue. Any help will be highly appreciated.
>>>>
>>>> Thanks and Regards,
>>>> Punit
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.

John Kane

2013-Mar-01 19:23 UTC

head link

[R] Conditional Weighted Average (ddply or any other function)

Okay I got the data but you seem to have an undefined variable in wavg.
You write :
ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,  MKTCAP)))

There is no MKTCAP in the data.frame.  Also there is one too many ) in the
equation: I think you mean :
ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,  MKTCAP))

Have you left out a equation that calculates  MKTCAP?

John Kane
Kingston ON Canada

> -----Original Message-----
> From: anandpunit at gmail.com
> Sent: Fri, 1 Mar 2013 13:53:44 -0500
> To: jrkrideau at inbox.com
> Subject: Re: [R] Conditional Weighted Average (ddply or any other
> function)
> 
> Hi John,
> 
> The sample size is huge involving 10,000 + firms. I have put a
> representative sample using dput ( Name, ticker and country have been
> changed so that firms cannot be identified due to proprietary data
> set, also EPS is not required and removed from the dataset)
> 
> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC",
"CTAX",
> "INN", "NOB", "SH", "SZ",
"WASH", "WILLSON"), class = "factor"),
>     Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
>     8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
>     6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>     1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
>     "CT56", "INN12", "NB12",
"SH12", "SZ12", "W12", "W15"), class
> "factor"),
>     Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>     4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
>     5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial
&
> Professional Serv",
>     "Energy", "Media", "Retail",
"Transportation"), class = "factor"),
>     Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
> Discretionary",
>     "Energy", "Industrials"), class =
"factor"), Country > structure(c(4L,
>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>     2L, 2L, 2L), .Label = c("Brazil", "China",
"India", "UK"), class > "factor"),
>     FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
>     1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
>     5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
>     2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label =
c("FY-1",
>     "FY-2", "FY-3", "FY0", "FY1",
"FY2"), class = "factor"),
>     ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
>     0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
>     0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
>     0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
>     0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
>     0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
>     0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
>     3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
>     4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
>     700357605, 1814942993, 1858225342, 1242890503, 1242890503,
>     1879700000, 557093400, 224900300, 1634700000, 1443200000,
>     3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
>     7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
>     6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
>     4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
>     513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
>     )), .Names = c("NAME", "Ticker",
"Industry", "Sector", "Country",
> "FISCALYEAR", "ROE", "MKT"), class =
"data.frame", row.names = c(NA,
> -49L))
> 
> Thanks,
> Punit
> 
> 
> 
> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at inbox.com>
wrote:
>> See below
>> 
>> 
>>> -----Original Message-----
>>> From: anandpunit at gmail.com
>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>>> To: jrkrideau at inbox.com
>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>> function)
>>> 
>>> Hi John,
>>> 
>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name,
>>> Ticker, Sector, Country, FISCALYEAR or Year are character strings.
>>> 
>>> and column "Year" is referring to "FISCALYEAR"
>>> 
>>  Definitely a no-no in R-help.  :)  We really need  some representative
>> sample data to play with.  See
>> https://github.com/hadley/devtools/wiki/Reproducibility for some
general
>> pointers on how to compose a good question.  The fact that you included
>> the code you are using was excellent but without some data it is rather
>> useless.
>> 
>>  The easiest way to supply data  is to use the dput() function. 
Example
>> with your file named "testfile":
>> dput(testfile)
>> Then copy the output and paste into your email.  This is what I did
with
>> your data that I pasted into my email .  I added the dat1  <-  to
it.
>> 
>> For large data sets, you can just supply a representative sample.
>> Usually,  dput(head(testfile, 100)) will be sufficient.
>> 
>> I hope this is of some help.
>> 
>> 
>>> 
>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at
inbox.com> wrote:
>>>> It is not at all clear what you are doing.  You state that the
data
>>>> set
>>>> you are using is what I have called dat1 : see dput form below.
>>>> 
>>>> As far as I can see there is no numerical value in there.
>>>> 
>>>> ##===========data set in dput form================#
>>>> dat1  <-  structure(list(Name = c("N1",
"N1", "N1", "N1", "N1", "N1",
>>>> "N1",
>>>>          "N2", "N2", "N2",
"N2", "N2", "N2", "N2"), Ticker =
c("T1",
>>>> "T1",
>>>>          "T1", "T1", "T1",
"T1", "T1", "T2", "T2", "T2",
"T2", "T2",
>>>> "T2",
>>>>          "T2"), Sector = c("S1",
"S1", "S1", "S1", "S1", "S1",
"S1",
>>>> "S2",
>>>>           "S2", "S2", "S2",
"S2", "S2", "S2"), Industry = c("I1",
>>>> "I1",
>>>> "I1", "I1", "I1", "I1",
"I1", "I2", "I2", "I2", "I2",
"I2", "I2",
>>>>           "I2"), Country = c("C1",
"C1", "C1", "C1", "C1", "C1",
"C1",
>>>>            "C2", "C2", "C2",
"C2", "C2", "C2", "C2"), Year =
c("FY-4",
>>>> "FY-3",
>>>>           "FY-2", "FY-1", "FY0",
"FY1", "FY2", "FY-4", "FY-3",
"FY-2",
>>>>           "FY-2", "FY0", "FY2",
"FY2"), ROE = c("ROE11", "ROE12",
>>>> "ROE13",
>>>>           "ROE14", "ROE15",
"ROE16", "ROE17", "ROE21", "ROE22",
>>>> "ROE23",
>>>>           "ROE24", "ROE25",
"ROE26", "ROE27"), EPS = c("EPS11",
>>>> "EPS12",
>>>>           "EPS13", "EPS14",
"EPS15", "EPS16", "EPS17", "EPS21",
>>>> "EPS22",
>>>>           "EPS23", "EPS24",
"EPS25", "EPS26", "EPS27"), MKTCAP
>>>> c("MKT11",
>>>>           "MKT12", "MKT13",
"MKT14", "MKT15", "MKT16", "MKT17",
>>>> "MKT21",
>>>>           "MKT22", "MKT23",
"MKT24", "MKT25", "MKT26", "MKT27")),
>>>> .Names
>>>> = c("Name",
>>>>          "Ticker", "Sector",
"Industry", "Country", "Year", "ROE",
>>>> "EPS",
>>>>          "MKTCAP"), class = "data.frame",
row.names = c(NA, -14L))
>>>> ## =================end of dataset==================#
>>>> 
>>>> There is no FISCALYEAR variable that you specifed below
>>>> 
>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, >
WROE=wavg(ROE,
>>>>> MKTCAP)))
>>>> 
>>>> I think we need a bit more information.
>>>> 
>>>> John Kane
>>>> Kingston ON Canada
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: anandpunit at gmail.com
>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>>> To: r-help at r-project.org
>>>>> Subject: [R] Conditional Weighted Average (ddply or any
other
>>>>> function)
>>>>> 
>>>>> Hello R community,
>>>>> 
>>>>> I am computing weighted average statistic by using ddply
function:
>>>>> 
>>>>> My data set is:
>>>>> N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
>>>>> N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
>>>>> N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
>>>>> N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
>>>>> N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
>>>>> N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
>>>>> N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
>>>>> N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
>>>>> N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
>>>>> N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
>>>>> N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
>>>>> N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
>>>>> N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
>>>>> N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27
>>>>> 
>>>>> with colnames:
>>>>> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>> 
>>>>> I want to compute
>>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>> 
>>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>>> For firm N1 of Country C1 and Fiscalyear FY-3 weight is
>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)
>>>>> 
>>>>> 3) Weighted ROE based on Country, Sector and  Fiscal Year.
>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1,
Fiscalyear
>>>>> FY-3)
>>>>> 
>>>>> 4) Weighted ROE based on Country, Industry and  Fiscal
Year.
>>>>> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Industry =
I1,
>>>>> Fiscalyear
>>>>> FY-3)
>>>>> 
>>>>> 
>>>>> I tried using ddply function:
>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
WROE=wavg(ROE,
>>>>> MKTCAP)))
>>>>> 
>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>>> but this doesn't give me the right answer.
>>>>> 
>>>>> I could try subseting the data into different sectors and
compute the
>>>>> weighted average which doesn't look like an elegant
solution and
>>>>> would
>>>>> defeat the purpose of ddply
>>>>> 
>>>>> I coudn't think of properly using melt and cast
functions to solve
>>>>> this issue. Any help will be highly appreciated.
>>>>> 
>>>>> Thanks and Regards,
>>>>> Punit
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>> 
>>>> ____________________________________________________________
>>>> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
>>>> http://www.inbox.com/smileys
>>>> Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?,
Google Talk?
>>>> and most webmails
>>>> 
>>>> 
>> 
>> ____________________________________________________________
>> FREE ONLINE PHOTOSHARING - Share your photos online with your friends
>> and family!
>> Visit http://www.inbox.com/photosharing to find out more!
>> 
>>
____________________________________________________________
FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks & orcas on your
desktop!

John Kane

2013-Mar-01 19:29 UTC

head link

[R] Conditional Weighted Average (ddply or any other function)

Thanks for the data .

You have an undefined variable in the ddply statement.  There is no MKTCAP in
the data.frame.


You also have one two many ) in the statement. I think it should read:
ddply (dat1 , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKTCAP))



John Kane
Kingston ON Canada

> -----Original Message-----
> From: anandpunit at gmail.com
> Sent: Fri, 1 Mar 2013 14:13:37 -0500
> To: r-help at r-project.org
> Subject: Re: [R] Conditional Weighted Average (ddply or any other
> function)
> 
> Hi John,
> 
> The sample size is huge involving 10,000 + firms. I have put a
> representative sample using dput ( Name, ticker and country have been
> changed so that firms cannot be identified due to proprietary data
> set, also EPS is not required and removed from the dataset)
> 
> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC",
"CTAX",
> "INN", "NOB", "SH", "SZ",
"WASH", "WILLSON"), class = "factor"),
>     Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
>     8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
>     6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>     1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
>     "CT56", "INN12", "NB12",
"SH12", "SZ12", "W12", "W15"), class
> "factor"),
>     Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>     4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
>     5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial
&
> Professional Serv",
>     "Energy", "Media", "Retail",
"Transportation"), class = "factor"),
>     Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
> Discretionary",
>     "Energy", "Industrials"), class =
"factor"), Country > structure(c(4L,
>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>     2L, 2L, 2L), .Label = c("Brazil", "China",
"India", "UK"), class > "factor"),
>     FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
>     1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
>     5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
>     2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label =
c("FY-1",
>     "FY-2", "FY-3", "FY0", "FY1",
"FY2"), class = "factor"),
>     ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
>     0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
>     0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
>     0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
>     0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
>     0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
>     0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
>     3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
>     4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
>     700357605, 1814942993, 1858225342, 1242890503, 1242890503,
>     1879700000, 557093400, 224900300, 1634700000, 1443200000,
>     3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
>     7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
>     6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
>     4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
>     513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
>     )), .Names = c("NAME", "Ticker",
"Industry", "Sector", "Country",
> "FISCALYEAR", "ROE", "MKT"), class =
"data.frame", row.names = c(NA,
> -49L))
> 
> Thanks,
> Punit
> 
>> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at
inbox.com> wrote:
>> See below
>> 
>> 
>>> -----Original Message-----
>>> From: anandpunit at gmail.com
>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>>> To: jrkrideau at inbox.com
>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>> function)
>>> 
>>> Hi John,
>>> 
>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name,
>>> Ticker, Sector, Country, FISCALYEAR or Year are character strings.
>>> 
>>> and column "Year" is referring to "FISCALYEAR"
>>> 
>>  Definitely a no-no in R-help.  :)  We really need  some representative
>> sample data to play with.  See
>> https://github.com/hadley/devtools/wiki/Reproducibility for some
general
>> pointers on how to compose a good question.  The fact that you included
>> the code you are using was excellent but without some data it is rather
>> useless.
>> 
>>  The easiest way to supply data  is to use the dput() function. 
Example
>> with your file named "testfile":
>> dput(testfile)
>> Then copy the output and paste into your email.  This is what I did
with
>> your data that I pasted into my email .  I added the dat1  <-  to
it.
>> 
>> For large data sets, you can just supply a representative sample.
>> Usually,  dput(head(testfile, 100)) will be sufficient.
>> 
>> I hope this is of some help.
>> 
>> 
>>> 
>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at
inbox.com> wrote:
>>>> It is not at all clear what you are doing.  You state that the
data
>>>> set
>>>> you are using is what I have called dat1 : see dput form below.
>>>> 
>>>> As far as I can see there is no numerical value in there.
>>>> 
>>>> ##===========data set in dput form================#
>>>> dat1  <-  structure(list(Name = c("N1",
"N1", "N1", "N1", "N1", "N1",
>>>> "N1",
>>>>          "N2", "N2", "N2",
"N2", "N2", "N2", "N2"), Ticker =
c("T1",
>>>> "T1",
>>>>          "T1", "T1", "T1",
"T1", "T1", "T2", "T2", "T2",
"T2", "T2",
>>>> "T2",
>>>>          "T2"), Sector = c("S1",
"S1", "S1", "S1", "S1", "S1",
"S1",
>>>> "S2",
>>>>           "S2", "S2", "S2",
"S2", "S2", "S2"), Industry = c("I1",
>>>> "I1",
>>>> "I1", "I1", "I1", "I1",
"I1", "I2", "I2", "I2", "I2",
"I2", "I2",
>>>>           "I2"), Country = c("C1",
"C1", "C1", "C1", "C1", "C1",
"C1",
>>>>            "C2", "C2", "C2",
"C2", "C2", "C2", "C2"), Year =
c("FY-4",
>>>> "FY-3",
>>>>           "FY-2", "FY-1", "FY0",
"FY1", "FY2", "FY-4", "FY-3",
"FY-2",
>>>>           "FY-2", "FY0", "FY2",
"FY2"), ROE = c("ROE11", "ROE12",
>>>> "ROE13",
>>>>           "ROE14", "ROE15",
"ROE16", "ROE17", "ROE21", "ROE22",
>>>> "ROE23",
>>>>           "ROE24", "ROE25",
"ROE26", "ROE27"), EPS = c("EPS11",
>>>> "EPS12",
>>>>           "EPS13", "EPS14",
"EPS15", "EPS16", "EPS17", "EPS21",
>>>> "EPS22",
>>>>           "EPS23", "EPS24",
"EPS25", "EPS26", "EPS27"), MKTCAP
>>>> c("MKT11",
>>>>           "MKT12", "MKT13",
"MKT14", "MKT15", "MKT16", "MKT17",
>>>> "MKT21",
>>>>           "MKT22", "MKT23",
"MKT24", "MKT25", "MKT26", "MKT27")),
>>>> .Names
>>>> = c("Name",
>>>>          "Ticker", "Sector",
"Industry", "Country", "Year", "ROE",
>>>> "EPS",
>>>>          "MKTCAP"), class = "data.frame",
row.names = c(NA, -14L))
>>>> ## =================end of dataset==================#
>>>> 
>>>> There is no FISCALYEAR variable that you specifed below
>>>> 
>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, >
WROE=wavg(ROE,
>>>>> MKTCAP)))
>>>> 
>>>> I think we need a bit more information.
>>>> 
>>>> John Kane
>>>> Kingston ON Canada
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: anandpunit at gmail.com
>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>>> To: r-help at r-project.org
>>>>> Subject: [R] Conditional Weighted Average (ddply or any
other
>>>>> function)
>>>>> 
>>>>> Hello R community,
>>>>> 
>>>>> I am computing weighted average statistic by using ddply
function:
>>>>> 
>>>>> My data set is:
>>>>> N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
>>>>> N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
>>>>> N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
>>>>> N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
>>>>> N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
>>>>> N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
>>>>> N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
>>>>> N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
>>>>> N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
>>>>> N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
>>>>> N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
>>>>> N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
>>>>> N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
>>>>> N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27
>>>>> 
>>>>> with colnames:
>>>>> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>> 
>>>>> I want to compute
>>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>> 
>>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>>> For firm N1 of Country C1 and Fiscalyear FY-3 weight is
>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)
>>>>> 
>>>>> 3) Weighted ROE based on Country, Sector and  Fiscal Year.
>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1,
Fiscalyear
>>>>> FY-3)
>>>>> 
>>>>> 4) Weighted ROE based on Country, Industry and  Fiscal
Year.
>>>>> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Industry =
I1,
>>>>> Fiscalyear
>>>>> FY-3)
>>>>> 
>>>>> 
>>>>> I tried using ddply function:
>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
WROE=wavg(ROE,
>>>>> MKTCAP)))
>>>>> 
>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>>> but this doesn't give me the right answer.
>>>>> 
>>>>> I could try subseting the data into different sectors and
compute the
>>>>> weighted average which doesn't look like an elegant
solution and
>>>>> would
>>>>> defeat the purpose of ddply
>>>>> 
>>>>> I coudn't think of properly using melt and cast
functions to solve
>>>>> this issue. Any help will be highly appreciated.
>>>>> 
>>>>> Thanks and Regards,
>>>>> Punit
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
____________________________________________________________
GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
http://www.inbox.com/smileys
Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google Talk? and most
webmails

John Kane

2013-Mar-01 19:38 UTC

head link

[R] Conditional Weighted Average (ddply or any other function)

Is there any chance you meant
 ddply (dat1 , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKT))
??

It gives a result but I have no idea if it makes sense.

John Kane
Kingston ON Canada

> -----Original Message-----
> From: anandpunit at gmail.com
> Sent: Fri, 1 Mar 2013 14:13:37 -0500
> To: r-help at r-project.org
> Subject: Re: [R] Conditional Weighted Average (ddply or any other
> function)
> 
> Hi John,
> 
> The sample size is huge involving 10,000 + firms. I have put a
> representative sample using dput ( Name, ticker and country have been
> changed so that firms cannot be identified due to proprietary data
> set, also EPS is not required and removed from the dataset)
> 
> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CCC",
"CTAX",
> "INN", "NOB", "SH", "SZ",
"WASH", "WILLSON"), class = "factor"),
>     Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
>     8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
>     6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>     1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC13",
>     "CT56", "INN12", "NB12",
"SH12", "SZ12", "W12", "W15"), class
> "factor"),
>     Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>     4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
>     5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Commercial
&
> Professional Serv",
>     "Energy", "Media", "Retail",
"Transportation"), class = "factor"),
>     Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
> Discretionary",
>     "Energy", "Industrials"), class =
"factor"), Country > structure(c(4L,
>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
>     1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>     2L, 2L, 2L), .Label = c("Brazil", "China",
"India", "UK"), class > "factor"),
>     FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
>     1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
>     5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
>     2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label =
c("FY-1",
>     "FY-2", "FY-3", "FY0", "FY1",
"FY2"), class = "factor"),
>     ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
>     0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
>     0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
>     0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
>     0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
>     0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
>     0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
>     3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
>     4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
>     700357605, 1814942993, 1858225342, 1242890503, 1242890503,
>     1879700000, 557093400, 224900300, 1634700000, 1443200000,
>     3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
>     7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
>     6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
>     4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
>     513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
>     )), .Names = c("NAME", "Ticker",
"Industry", "Sector", "Country",
> "FISCALYEAR", "ROE", "MKT"), class =
"data.frame", row.names = c(NA,
> -49L))
> 
> Thanks,
> Punit
> 
>> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at
inbox.com> wrote:
>> See below
>> 
>> 
>>> -----Original Message-----
>>> From: anandpunit at gmail.com
>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>>> To: jrkrideau at inbox.com
>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>> function)
>>> 
>>> Hi John,
>>> 
>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric, Name,
>>> Ticker, Sector, Country, FISCALYEAR or Year are character strings.
>>> 
>>> and column "Year" is referring to "FISCALYEAR"
>>> 
>>  Definitely a no-no in R-help.  :)  We really need  some representative
>> sample data to play with.  See
>> https://github.com/hadley/devtools/wiki/Reproducibility for some
general
>> pointers on how to compose a good question.  The fact that you included
>> the code you are using was excellent but without some data it is rather
>> useless.
>> 
>>  The easiest way to supply data  is to use the dput() function. 
Example
>> with your file named "testfile":
>> dput(testfile)
>> Then copy the output and paste into your email.  This is what I did
with
>> your data that I pasted into my email .  I added the dat1  <-  to
it.
>> 
>> For large data sets, you can just supply a representative sample.
>> Usually,  dput(head(testfile, 100)) will be sufficient.
>> 
>> I hope this is of some help.
>> 
>> 
>>> 
>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at
inbox.com> wrote:
>>>> It is not at all clear what you are doing.  You state that the
data
>>>> set
>>>> you are using is what I have called dat1 : see dput form below.
>>>> 
>>>> As far as I can see there is no numerical value in there.
>>>> 
>>>> ##===========data set in dput form================#
>>>> dat1  <-  structure(list(Name = c("N1",
"N1", "N1", "N1", "N1", "N1",
>>>> "N1",
>>>>          "N2", "N2", "N2",
"N2", "N2", "N2", "N2"), Ticker =
c("T1",
>>>> "T1",
>>>>          "T1", "T1", "T1",
"T1", "T1", "T2", "T2", "T2",
"T2", "T2",
>>>> "T2",
>>>>          "T2"), Sector = c("S1",
"S1", "S1", "S1", "S1", "S1",
"S1",
>>>> "S2",
>>>>           "S2", "S2", "S2",
"S2", "S2", "S2"), Industry = c("I1",
>>>> "I1",
>>>> "I1", "I1", "I1", "I1",
"I1", "I2", "I2", "I2", "I2",
"I2", "I2",
>>>>           "I2"), Country = c("C1",
"C1", "C1", "C1", "C1", "C1",
"C1",
>>>>            "C2", "C2", "C2",
"C2", "C2", "C2", "C2"), Year =
c("FY-4",
>>>> "FY-3",
>>>>           "FY-2", "FY-1", "FY0",
"FY1", "FY2", "FY-4", "FY-3",
"FY-2",
>>>>           "FY-2", "FY0", "FY2",
"FY2"), ROE = c("ROE11", "ROE12",
>>>> "ROE13",
>>>>           "ROE14", "ROE15",
"ROE16", "ROE17", "ROE21", "ROE22",
>>>> "ROE23",
>>>>           "ROE24", "ROE25",
"ROE26", "ROE27"), EPS = c("EPS11",
>>>> "EPS12",
>>>>           "EPS13", "EPS14",
"EPS15", "EPS16", "EPS17", "EPS21",
>>>> "EPS22",
>>>>           "EPS23", "EPS24",
"EPS25", "EPS26", "EPS27"), MKTCAP
>>>> c("MKT11",
>>>>           "MKT12", "MKT13",
"MKT14", "MKT15", "MKT16", "MKT17",
>>>> "MKT21",
>>>>           "MKT22", "MKT23",
"MKT24", "MKT25", "MKT26", "MKT27")),
>>>> .Names
>>>> = c("Name",
>>>>          "Ticker", "Sector",
"Industry", "Country", "Year", "ROE",
>>>> "EPS",
>>>>          "MKTCAP"), class = "data.frame",
row.names = c(NA, -14L))
>>>> ## =================end of dataset==================#
>>>> 
>>>> There is no FISCALYEAR variable that you specifed below
>>>> 
>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise, >
WROE=wavg(ROE,
>>>>> MKTCAP)))
>>>> 
>>>> I think we need a bit more information.
>>>> 
>>>> John Kane
>>>> Kingston ON Canada
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: anandpunit at gmail.com
>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>>> To: r-help at r-project.org
>>>>> Subject: [R] Conditional Weighted Average (ddply or any
other
>>>>> function)
>>>>> 
>>>>> Hello R community,
>>>>> 
>>>>> I am computing weighted average statistic by using ddply
function:
>>>>> 
>>>>> My data set is:
>>>>> N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
>>>>> N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
>>>>> N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
>>>>> N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
>>>>> N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
>>>>> N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
>>>>> N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
>>>>> N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
>>>>> N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
>>>>> N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
>>>>> N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
>>>>> N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
>>>>> N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
>>>>> N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27
>>>>> 
>>>>> with colnames:
>>>>> (Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>> 
>>>>> I want to compute
>>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight is
>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>> 
>>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>>> For firm N1 of Country C1 and Fiscalyear FY-3 weight is
>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear FY-3)
>>>>> 
>>>>> 3) Weighted ROE based on Country, Sector and  Fiscal Year.
>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear FY-3
>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Sector = S1,
Fiscalyear
>>>>> FY-3)
>>>>> 
>>>>> 4) Weighted ROE based on Country, Industry and  Fiscal
Year.
>>>>> For firm N1 of Country C1, Industry I1 and Fiscalyear FY-3
>>>>> weight is MKT1 / SUM(MKT, where Country = C1, Industry =
I1,
>>>>> Fiscalyear
>>>>> FY-3)
>>>>> 
>>>>> 
>>>>> I tried using ddply function:
>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
WROE=wavg(ROE,
>>>>> MKTCAP)))
>>>>> 
>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>>> but this doesn't give me the right answer.
>>>>> 
>>>>> I could try subseting the data into different sectors and
compute the
>>>>> weighted average which doesn't look like an elegant
solution and
>>>>> would
>>>>> defeat the purpose of ddply
>>>>> 
>>>>> I coudn't think of properly using melt and cast
functions to solve
>>>>> this issue. Any help will be highly appreciated.
>>>>> 
>>>>> Thanks and Regards,
>>>>> Punit
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained,
reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
____________________________________________________________
GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
http://www.inbox.com/smileys
Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google Talk? and most
webmails

John Kane

2013-Mar-01 19:39 UTC

head link

[R] Conditional Weighted Average (ddply or any other function)

See my last post which crossed yours.  

John Kane
Kingston ON Canada

> -----Original Message-----
> From: anandpunit at gmail.com
> Sent: Fri, 1 Mar 2013 14:33:17 -0500
> To: jrkrideau at inbox.com
> Subject: Re: [R] Conditional Weighted Average (ddply or any other
> function)
> 
> oops - it should be MKT, I have been playing with a number of data
> sets simultaneously.
> 
> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,  MKT)))
> 
> On Fri, Mar 1, 2013 at 2:23 PM, John Kane <jrkrideau at inbox.com>
wrote:
>> Okay I got the data but you seem to have an undefined variable in wavg.
>> You write :
>> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,
>> MKTCAP)))
>> 
>> There is no MKTCAP in the data.frame.  Also there is one too many ) in
>> the equation: I think you mean :
>> ddply (dataread , .(Sector, FISCALYEAR), summarise,WROE=wavg(ROE,
>> MKTCAP))
>> 
>> Have you left out a equation that calculates  MKTCAP?
>> 
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -----Original Message-----
>>> From: anandpunit at gmail.com
>>> Sent: Fri, 1 Mar 2013 13:53:44 -0500
>>> To: jrkrideau at inbox.com
>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>> function)
>>> 
>>> Hi John,
>>> 
>>> The sample size is huge involving 10,000 + firms. I have put a
>>> representative sample using dput ( Name, ticker and country have
been
>>> changed so that firms cannot be identified due to proprietary data
>>> set, also EPS is not required and removed from the dataset)
>>> 
>>> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
>>> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
>>> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("CCC", "CTAX",
>>> "INN", "NOB", "SH", "SZ",
"WASH", "WILLSON"), class = "factor"),
>>>     Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
>>>     8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
>>>     6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>>>     1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("CC13",
>>>     "CT56", "INN12", "NB12",
"SH12", "SZ12", "W12", "W15"), class
>>> "factor"),
>>>     Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
>>>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>>>     4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
>>>     5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Commercial &
>>> Professional Serv",
>>>     "Energy", "Media", "Retail",
"Transportation"), class = "factor"),
>>>     Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>     1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>>>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label =
c("Consumer
>>> Discretionary",
>>>     "Energy", "Industrials"), class =
"factor"), Country >>> structure(c(4L,
>>>     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
>>>     3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
>>>     1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>>     2L, 2L, 2L), .Label = c("Brazil", "China",
"India", "UK"), class >>> "factor"),
>>>     FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
>>>     1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
>>>     5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
>>>     2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label =
c("FY-1",
>>>     "FY-2", "FY-3", "FY0",
"FY1", "FY2"), class = "factor"),
>>>     ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
>>>     0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
>>>     0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
>>>     0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
>>>     0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
>>>     0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
>>>     0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
>>>     3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
>>>     4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
>>>     700357605, 1814942993, 1858225342, 1242890503, 1242890503,
>>>     1879700000, 557093400, 224900300, 1634700000, 1443200000,
>>>     3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
>>>     7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
>>>     6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
>>>     4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
>>>     513478700, 260993500, 882575400, 1.151e+09, 855938413,
855938413
>>>     )), .Names = c("NAME", "Ticker",
"Industry", "Sector", "Country",
>>> "FISCALYEAR", "ROE", "MKT"), class =
"data.frame", row.names = c(NA,
>>> -49L))
>>> 
>>> Thanks,
>>> Punit
>>> 
>>> 
>>> 
>>> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at
inbox.com> wrote:
>>>> See below
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: anandpunit at gmail.com
>>>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>>>>> To: jrkrideau at inbox.com
>>>>> Subject: Re: [R] Conditional Weighted Average (ddply or any
other
>>>>> function)
>>>>> 
>>>>> Hi John,
>>>>> 
>>>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric,
Name,
>>>>> Ticker, Sector, Country, FISCALYEAR or Year are character
strings.
>>>>> 
>>>>> and column "Year" is referring to
"FISCALYEAR"
>>>>> 
>>>>  Definitely a no-no in R-help.  :)  We really need  some
>>>> representative
>>>> sample data to play with.  See
>>>> https://github.com/hadley/devtools/wiki/Reproducibility for
some
>>>> general
>>>> pointers on how to compose a good question.  The fact that you
>>>> included
>>>> the code you are using was excellent but without some data it
is
>>>> rather
>>>> useless.
>>>> 
>>>>  The easiest way to supply data  is to use the dput() function.
>>>> Example
>>>> with your file named "testfile":
>>>> dput(testfile)
>>>> Then copy the output and paste into your email.  This is what I
did
>>>> with
>>>> your data that I pasted into my email .  I added the dat1 
<-  to it.
>>>> 
>>>> For large data sets, you can just supply a representative
sample.
>>>> Usually,  dput(head(testfile, 100)) will be sufficient.
>>>> 
>>>> I hope this is of some help.
>>>> 
>>>> 
>>>>> 
>>>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at
inbox.com>
>>>>> wrote:
>>>>>> It is not at all clear what you are doing.  You state
that the data
>>>>>> set
>>>>>> you are using is what I have called dat1 : see dput
form below.
>>>>>> 
>>>>>> As far as I can see there is no numerical value in
there.
>>>>>> 
>>>>>> ##===========data set in dput form================#
>>>>>> dat1  <-  structure(list(Name = c("N1",
"N1", "N1", "N1", "N1",
>>>>>> "N1",
>>>>>> "N1",
>>>>>>          "N2", "N2",
"N2", "N2", "N2", "N2", "N2"),
Ticker = c("T1",
>>>>>> "T1",
>>>>>>          "T1", "T1",
"T1", "T1", "T1", "T2", "T2",
"T2", "T2", "T2",
>>>>>> "T2",
>>>>>>          "T2"), Sector = c("S1",
"S1", "S1", "S1", "S1", "S1",
"S1",
>>>>>> "S2",
>>>>>>           "S2", "S2",
"S2", "S2", "S2", "S2"), Industry =
c("I1",
>>>>>> "I1",
>>>>>> "I1", "I1", "I1",
"I1", "I1", "I2", "I2", "I2",
"I2", "I2", "I2",
>>>>>>           "I2"), Country = c("C1",
"C1", "C1", "C1", "C1", "C1",
>>>>>> "C1",
>>>>>>            "C2", "C2",
"C2", "C2", "C2", "C2", "C2"),
Year >>>>>> c("FY-4",
>>>>>> "FY-3",
>>>>>>           "FY-2", "FY-1",
"FY0", "FY1", "FY2", "FY-4",
"FY-3",
>>>>>> "FY-2",
>>>>>>           "FY-2", "FY0",
"FY2", "FY2"), ROE = c("ROE11", "ROE12",
>>>>>> "ROE13",
>>>>>>           "ROE14", "ROE15",
"ROE16", "ROE17", "ROE21", "ROE22",
>>>>>> "ROE23",
>>>>>>           "ROE24", "ROE25",
"ROE26", "ROE27"), EPS = c("EPS11",
>>>>>> "EPS12",
>>>>>>           "EPS13", "EPS14",
"EPS15", "EPS16", "EPS17", "EPS21",
>>>>>> "EPS22",
>>>>>>           "EPS23", "EPS24",
"EPS25", "EPS26", "EPS27"), MKTCAP
>>>>>> c("MKT11",
>>>>>>           "MKT12", "MKT13",
"MKT14", "MKT15", "MKT16", "MKT17",
>>>>>> "MKT21",
>>>>>>           "MKT22", "MKT23",
"MKT24", "MKT25", "MKT26", "MKT27")),
>>>>>> .Names
>>>>>> = c("Name",
>>>>>>          "Ticker", "Sector",
"Industry", "Country", "Year", "ROE",
>>>>>> "EPS",
>>>>>>          "MKTCAP"), class =
"data.frame", row.names = c(NA, -14L))
>>>>>> ## =================end of dataset==================#
>>>>>> 
>>>>>> There is no FISCALYEAR variable that you specifed below
>>>>>> 
>>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
>
>>>>>>> WROE=wavg(ROE,
>>>>>>> MKTCAP)))
>>>>>> 
>>>>>> I think we need a bit more information.
>>>>>> 
>>>>>> John Kane
>>>>>> Kingston ON Canada
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: anandpunit at gmail.com
>>>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>>>>> To: r-help at r-project.org
>>>>>>> Subject: [R] Conditional Weighted Average (ddply or
any other
>>>>>>> function)
>>>>>>> 
>>>>>>> Hello R community,
>>>>>>> 
>>>>>>> I am computing weighted average statistic by using
ddply function:
>>>>>>> 
>>>>>>> My data set is:
>>>>>>> N1  T1  S1  I1  C1 FY-4  ROE11  EPS11 MKT11
>>>>>>> N1  T1  S1  I1  C1 FY-3  ROE12  EPS12 MKT12
>>>>>>> N1  T1  S1  I1  C1 FY-2  ROE13  EPS13 MKT13
>>>>>>> N1  T1  S1  I1  C1 FY-1  ROE14  EPS14 MKT14
>>>>>>> N1  T1  S1  I1  C1 FY0   ROE15  EPS15 MKT15
>>>>>>> N1  T1  S1  I1  C1 FY1   ROE16  EPS16 MKT16
>>>>>>> N1  T1  S1  I1  C1 FY2   ROE17  EPS17 MKT17
>>>>>>> N2  T2  S2  I2  C2 FY-4  ROE21  EPS21 MKT21
>>>>>>> N2  T2  S2  I2  C2 FY-3  ROE22  EPS22 MKT22
>>>>>>> N2  T2  S2  I2  C2 FY-2  ROE23  EPS23 MKT23
>>>>>>> N2  T2  S2  I2  C2 FY-2  ROE24  EPS24 MKT24
>>>>>>> N2  T2  S2  I2  C2 FY0   ROE25  EPS25 MKT25
>>>>>>> N2  T2  S2  I2  C2 FY2   ROE26  EPS26 MKT26
>>>>>>> N2  T2  S2  I2  C2 FY2   ROE27  EPS27 MKT27
>>>>>>> 
>>>>>>> with colnames:
>>>>>>>
(Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>>>> 
>>>>>>> I want to compute
>>>>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight
is
>>>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>>>> 
>>>>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>>>>> For firm N1 of Country C1 and Fiscalyear FY-3
weight is
>>>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear
FY-3)
>>>>>>> 
>>>>>>> 3) Weighted ROE based on Country, Sector and 
Fiscal Year.
>>>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear
FY-3
>>>>>>> weight is MKT1 / SUM(MKT, where Country = C1,
Sector = S1,
>>>>>>> Fiscalyear
>>>>>>> FY-3)
>>>>>>> 
>>>>>>> 4) Weighted ROE based on Country, Industry and 
Fiscal Year.
>>>>>>> For firm N1 of Country C1, Industry I1 and
Fiscalyear FY-3
>>>>>>> weight is MKT1 / SUM(MKT, where Country = C1,
Industry = I1,
>>>>>>> Fiscalyear
>>>>>>> FY-3)
>>>>>>> 
>>>>>>> 
>>>>>>> I tried using ddply function:
>>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
WROE=wavg(ROE,
>>>>>>> MKTCAP)))
>>>>>>> 
>>>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>>>>> but this doesn't give me the right answer.
>>>>>>> 
>>>>>>> I could try subseting the data into different
sectors and compute
>>>>>>> the
>>>>>>> weighted average which doesn't look like an
elegant solution and
>>>>>>> would
>>>>>>> defeat the purpose of ddply
>>>>>>> 
>>>>>>> I coudn't think of properly using melt and cast
functions to solve
>>>>>>> this issue. Any help will be highly appreciated.
>>>>>>> 
>>>>>>> Thanks and Regards,
>>>>>>> Punit
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>>>>> 
>>>>>>
____________________________________________________________
>>>>>> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more
at
>>>>>> http://www.inbox.com/smileys
>>>>>> Works with AIM?, MSN? Messenger, Yahoo!? Messenger,
ICQ?, Google
>>>>>> Talk?
>>>>>> and most webmails
>>>>>> 
>>>>>> 
>>>> 
>>>> ____________________________________________________________
>>>> FREE ONLINE PHOTOSHARING - Share your photos online with your
friends
>>>> and family!
>>>> Visit http://www.inbox.com/photosharing to find out more!
>>>> 
>>>> 
>> 
>> ____________________________________________________________
>> FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks &
orcas on
>> your desktop!
>> Check it out at http://www.inbox.com/marineaquarium
>> 
>>
____________________________________________________________
FREE ONLINE PHOTOSHARING - Share your photos online with your friends and
family!
Visit http://www.inbox.com/photosharing to find out more!

John Kane

2013-Mar-01 20:38 UTC

head link

[R] Conditional Weighted Average (ddply or any other function)

> For Sectors the results are correct, but for some (Country, Industry,
> FISCALYEAR) combinations or (Country,FISCALYEAR)combinations > the
result don't match the spreadsheet (Excel) computation
Well the normal assumption here is to assume that Excel is wrong. I am not
joking. Spreadsheets are notoriously error-prone, especially Excel, from
problems with basic algorithms to 'simple' problems such as typos or
accidentally specifying the wrong column of data.

See http://www.r-bloggers.com/spreadsheet-errors/
[http://www.r-bloggers.com/spreadsheet-errors/] for one nasty example or from a
recent thread on Excel and R
"The idea that the Excel solver "has a good reputation for being fast
and accurate" does not withstand an examination of the Excel solver's
ability to solve the StRD nls test problems. Solver's ability is abysmal. 13
of 27 "answers" have zero accurate digits, and three more have fewer
than two accurate digits -- and this is after tuning the solver to get a good
answer.
...
Excel solver does have the virture that it will always produce an answer, albeit
one with zero accurate digits."
Bruce McCullough 

On the other hand there can be any number of reasons why R and Excel are not
giving the same results assuming Excel may be correct.

One thing that come to mind is that R and Excel may be handling missing data in
a different manner if you have any missing data.

Otherwise you may need to break out some data were the combination a)works with
some combinations and does not with others and let some experts have a look at
it.

To my untutored eye you look like you're okay but I am not an expert by any
means.

Sorry not to be of more help

John Kane
Kingston ON Canada
> -----Original Message-----
> From: anandpunit at gmail.com
> Sent: Fri, 1 Mar 2013 15:05:19 -0500
> To: jrkrideau at inbox.com
> Subject: Re: [R] Conditional Weighted Average (ddply or any other
> function)
> 
> For Sectors the results are correct, but for some (Country, Industry,
> FISCALYEAR) combinations or (Country, FISCALYEAR) combinations the
> result don't match the spreadsheet (Excel) computation, so verifying
> from experts, whether I am using ddply correctly with the right
> intention?
> 
> 
> On Fri, Mar 1, 2013 at 2:38 PM, John Kane <jrkrideau at inbox.com>
wrote:
>> Is there any chance you meant
>> ddply (dat1 , .(Sector, FISCALYEAR), summarise, WROE=wavg(ROE, MKT))
>> ??
>> 
>> It gives a result but I have no idea if it makes sense.
>> 
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -----Original Message-----
>>> From: anandpunit at gmail.com
>>> Sent: Fri, 1 Mar 2013 14:13:37 -0500
>>> To: r-help at r-project.org
>>> Subject: Re: [R] Conditional Weighted Average (ddply or any other
>>> function)
>>> 
>>> Hi John,
>>> 
>>> The sample size is huge involving 10,000 + firms. I have put a
>>> representative sample using dput ( Name, ticker and country have
been
>>> changed so that firms cannot be identified due to proprietary data
>>> set, also EPS is not required and removed from the dataset)
>>> 
>>> structure(list(NAME = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 7L,
>>> 7L, 7L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
>>> 6L, 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("CCC", "CTAX",
>>> "INN", "NOB", "SH", "SZ",
"WASH", "WILLSON"), class = "factor"),
>>> Ticker = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L,
>>> 8L, 8L, 8L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L,
>>> 6L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label =
c("CC13",
>>> "CT56", "INN12", "NB12",
"SH12", "SZ12", "W12", "W15"), class
>>> "factor"),
>>> Industry = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
>>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
>>> 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L,
>>> 5L, 5L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Commercial &
>>> Professional Serv",
>>> "Energy", "Media", "Retail",
"Transportation"), class = "factor"),
>>> Sector = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
>>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Consumer
>>> Discretionary",
>>> "Energy", "Industrials"), class =
"factor"), Country >>> structure(c(4L,
>>> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
>>> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
>>> 2L, 2L, 2L), .Label = c("Brazil", "China",
"India", "UK"), class >>> "factor"),
>>> FISCALYEAR = structure(c(3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L,
>>> 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 3L, 2L, 1L, 4L,
>>> 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L, 3L,
>>> 2L, 1L, 4L, 5L, 6L, 3L, 2L, 1L, 4L, 5L, 6L), .Label =
c("FY-1",
>>> "FY-2", "FY-3", "FY0",
"FY1", "FY2"), class = "factor"),
>>> ROE = c(0.026, 0.0656, 0.1621, 0.1885, 0.1968, 0.2126, 0.0207,
>>> 0.0319, 0.0963, 0.0431, 0.066, 0.066, 0.0707, 0.0797, 0.0781,
>>> 0.078, 0.098, 0.126, 0.0352, 0.2625, 0.3714, 0.2929, 0.3133,
>>> 0.2509, 0.2398, 0.2779, 0.1109, 0.0509, 0.069, 0.1017, 0.1298,
>>> 0.5842, 0.3953, 0.4429, 0.3616, 0.26, 0.2, 0.4472, 0.2912,
>>> 0.21, 0.2849, 0.3553, 0.4347, 0.3289, 0.3846, 0.2643, 0.0458,
>>> 0.1265, 0.28), MKT = c(2919236084, 836858582, 2015182617,
>>> 3399344971, 4324821777, 4324821777, 7619453125, 3579844727,
>>> 4132238281, 3712239990, 2879757813, 2879757813, 1525237793,
>>> 700357605, 1814942993, 1858225342, 1242890503, 1242890503,
>>> 1879700000, 557093400, 224900300, 1634700000, 1443200000,
>>> 3582664735, 3582664735, 5830366211, 10660833984, 9024061523,
>>> 7628660645, 9154108398, 9154108398, 7064532227, 1804380005,
>>> 6331067871, 10445639648, 9153587891, 9153587891, 6231200000,
>>> 4.078e+09, 10107500000, 12460300000, 17800051556, 17800051556,
>>> 513478700, 260993500, 882575400, 1.151e+09, 855938413, 855938413
>>> )), .Names = c("NAME", "Ticker",
"Industry", "Sector", "Country",
>>> "FISCALYEAR", "ROE", "MKT"), class =
"data.frame", row.names = c(NA,
>>> -49L))
>>> 
>>> Thanks,
>>> Punit
>>> 
>>>> On Fri, Mar 1, 2013 at 12:51 PM, John Kane <jrkrideau at
inbox.com>
>>>> wrote:
>>>> See below
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: anandpunit at gmail.com
>>>>> Sent: Fri, 1 Mar 2013 12:36:53 -0500
>>>>> To: jrkrideau at inbox.com
>>>>> Subject: Re: [R] Conditional Weighted Average (ddply or any
other
>>>>> function)
>>>>> 
>>>>> Hi John,
>>>>> 
>>>>> I was using symbols, Column ROE, EPS, MKTCAP are numeric,
Name,
>>>>> Ticker, Sector, Country, FISCALYEAR or Year are character
strings.
>>>>> 
>>>>> and column "Year" is referring to
"FISCALYEAR"
>>>>> 
>>>> Definitely a no-no in R-help. :) We really need some
>>>> representative
>>>> sample data to play with. See
>>>> https://github.com/hadley/devtools/wiki/Reproducibility
[https://github.com/hadley/devtools/wiki/Reproducibility] for some
>>>> general
>>>> pointers on how to compose a good question. The fact that you
>>>> included
>>>> the code you are using was excellent but without some data it
is
>>>> rather
>>>> useless.
>>>> 
>>>> The easiest way to supply data is to use the dput() function.
>>>> Example
>>>> with your file named "testfile":
>>>> dput(testfile)
>>>> Then copy the output and paste into your email. This is what I
did
>>>> with
>>>> your data that I pasted into my email . I added the dat1 <-
to it.
>>>> 
>>>> For large data sets, you can just supply a representative
sample.
>>>> Usually, dput(head(testfile, 100)) will be sufficient.
>>>> 
>>>> I hope this is of some help.
>>>> 
>>>> 
>>>>> 
>>>>> On Fri, Mar 1, 2013 at 12:31 PM, John Kane <jrkrideau at
inbox.com>
>>>>> wrote:
>>>>>> It is not at all clear what you are doing. You state
that the data
>>>>>> set
>>>>>> you are using is what I have called dat1 : see dput
form below.
>>>>>> 
>>>>>> As far as I can see there is no numerical value in
there.
>>>>>> 
>>>>>> ##===========data set in dput form================#
>>>>>> dat1 <- structure(list(Name = c("N1",
"N1", "N1", "N1", "N1",
>>>>>> "N1",
>>>>>> "N1",
>>>>>> "N2", "N2", "N2",
"N2", "N2", "N2", "N2"), Ticker =
c("T1",
>>>>>> "T1",
>>>>>> "T1", "T1", "T1",
"T1", "T1", "T2", "T2", "T2",
"T2", "T2",
>>>>>> "T2",
>>>>>> "T2"), Sector = c("S1",
"S1", "S1", "S1", "S1", "S1",
"S1",
>>>>>> "S2",
>>>>>> "S2", "S2", "S2",
"S2", "S2", "S2"), Industry = c("I1",
>>>>>> "I1",
>>>>>> "I1", "I1", "I1",
"I1", "I1", "I2", "I2", "I2",
"I2", "I2", "I2",
>>>>>> "I2"), Country = c("C1",
"C1", "C1", "C1", "C1", "C1",
>>>>>> "C1",
>>>>>> "C2", "C2", "C2",
"C2", "C2", "C2", "C2"), Year
>>>>>> c("FY-4",
>>>>>> "FY-3",
>>>>>> "FY-2", "FY-1", "FY0",
"FY1", "FY2", "FY-4", "FY-3",
>>>>>> "FY-2",
>>>>>> "FY-2", "FY0", "FY2",
"FY2"), ROE = c("ROE11", "ROE12",
>>>>>> "ROE13",
>>>>>> "ROE14", "ROE15",
"ROE16", "ROE17", "ROE21", "ROE22",
>>>>>> "ROE23",
>>>>>> "ROE24", "ROE25",
"ROE26", "ROE27"), EPS = c("EPS11",
>>>>>> "EPS12",
>>>>>> "EPS13", "EPS14",
"EPS15", "EPS16", "EPS17", "EPS21",
>>>>>> "EPS22",
>>>>>> "EPS23", "EPS24",
"EPS25", "EPS26", "EPS27"), MKTCAP
>>>>>> c("MKT11",
>>>>>> "MKT12", "MKT13",
"MKT14", "MKT15", "MKT16", "MKT17",
>>>>>> "MKT21",
>>>>>> "MKT22", "MKT23",
"MKT24", "MKT25", "MKT26", "MKT27")),
>>>>>> .Names
>>>>>> = c("Name",
>>>>>> "Ticker", "Sector",
"Industry", "Country", "Year", "ROE",
>>>>>> "EPS",
>>>>>> "MKTCAP"), class = "data.frame",
row.names = c(NA, -14L))
>>>>>> ## =================end of dataset==================#
>>>>>> 
>>>>>> There is no FISCALYEAR variable that you specifed below
>>>>>> 
>>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
>
>>>>>>> WROE=wavg(ROE,
>>>>>>> MKTCAP)))
>>>>>> 
>>>>>> I think we need a bit more information.
>>>>>> 
>>>>>> John Kane
>>>>>> Kingston ON Canada
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: anandpunit at gmail.com
>>>>>>> Sent: Fri, 1 Mar 2013 11:01:42 -0500
>>>>>>> To: r-help at r-project.org
>>>>>>> Subject: [R] Conditional Weighted Average (ddply or
any other
>>>>>>> function)
>>>>>>> 
>>>>>>> Hello R community,
>>>>>>> 
>>>>>>> I am computing weighted average statistic by using
ddply function:
>>>>>>> 
>>>>>>> My data set is:
>>>>>>> N1 T1 S1 I1 C1 FY-4 ROE11 EPS11 MKT11
>>>>>>> N1 T1 S1 I1 C1 FY-3 ROE12 EPS12 MKT12
>>>>>>> N1 T1 S1 I1 C1 FY-2 ROE13 EPS13 MKT13
>>>>>>> N1 T1 S1 I1 C1 FY-1 ROE14 EPS14 MKT14
>>>>>>> N1 T1 S1 I1 C1 FY0 ROE15 EPS15 MKT15
>>>>>>> N1 T1 S1 I1 C1 FY1 ROE16 EPS16 MKT16
>>>>>>> N1 T1 S1 I1 C1 FY2 ROE17 EPS17 MKT17
>>>>>>> N2 T2 S2 I2 C2 FY-4 ROE21 EPS21 MKT21
>>>>>>> N2 T2 S2 I2 C2 FY-3 ROE22 EPS22 MKT22
>>>>>>> N2 T2 S2 I2 C2 FY-2 ROE23 EPS23 MKT23
>>>>>>> N2 T2 S2 I2 C2 FY-2 ROE24 EPS24 MKT24
>>>>>>> N2 T2 S2 I2 C2 FY0 ROE25 EPS25 MKT25
>>>>>>> N2 T2 S2 I2 C2 FY2 ROE26 EPS26 MKT26
>>>>>>> N2 T2 S2 I2 C2 FY2 ROE27 EPS27 MKT27
>>>>>>> 
>>>>>>> with colnames:
>>>>>>>
(Name,Ticker,Sector,Industry,Country,Year,ROE,EPS,MKTCAP)
>>>>>>> 
>>>>>>> I want to compute
>>>>>>> 1) Weighted ROE based on Sector and Fiscal Year.
>>>>>>> For firm N1 of Sector S1 and Fiscalyear FY-3 weight
is
>>>>>>> MKT1 / SUM(MKT, where Sector = S1, Fiscalyear FY-3)
>>>>>>> 
>>>>>>> 2) Weighted ROE based on Country and Fiscal Year.
>>>>>>> For firm N1 of Country C1 and Fiscalyear FY-3
weight is
>>>>>>> MKT1 / SUM(MKT, where Country = C1, Fiscalyear
FY-3)
>>>>>>> 
>>>>>>> 3) Weighted ROE based on Country, Sector and Fiscal
Year.
>>>>>>> For firm N1 of Country C1, Sector S1 and Fiscalyear
FY-3
>>>>>>> weight is MKT1 / SUM(MKT, where Country = C1,
Sector = S1,
>>>>>>> Fiscalyear
>>>>>>> FY-3)
>>>>>>> 
>>>>>>> 4) Weighted ROE based on Country, Industry and
Fiscal Year.
>>>>>>> For firm N1 of Country C1, Industry I1 and
Fiscalyear FY-3
>>>>>>> weight is MKT1 / SUM(MKT, where Country = C1,
Industry = I1,
>>>>>>> Fiscalyear
>>>>>>> FY-3)
>>>>>>> 
>>>>>>> 
>>>>>>> I tried using ddply function:
>>>>>>> ddply (dataread , .(Sector, FISCALYEAR), summarise,
WROE=wavg(ROE,
>>>>>>> MKTCAP)))
>>>>>>> 
>>>>>>> where wavg <- function(x, wt) x %*% wt/sum(wt)
>>>>>>> but this doesn't give me the right answer.
>>>>>>> 
>>>>>>> I could try subseting the data into different
sectors and compute
>>>>>>> the
>>>>>>> weighted average which doesn't look like an
elegant solution and
>>>>>>> would
>>>>>>> defeat the purpose of ddply
>>>>>>> 
>>>>>>> I coudn't think of properly using melt and cast
functions to solve
>>>>>>> this issue. Any help will be highly appreciated.
>>>>>>> 
>>>>>>> Thanks and Regards,
>>>>>>> Punit
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
[https://stat.ethz.ch/mailman/listinfo/r-help]
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
[http://www.R-project.org/posting-guide.html]
>>>>>>> and provide commented, minimal, self-contained,
reproducible code.
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
[https://stat.ethz.ch/mailman/listinfo/r-help]
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
[http://www.R-project.org/posting-guide.html]
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> ____________________________________________________________
>> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
>> http://www.inbox.com/smileys [http://www.inbox.com/smileys]
>> Works with AIM?, MSN? Messenger, Yahoo!? Messenger, ICQ?, Google Talk?
>> and most webmails
>> 
>>
____________________________________________________________
FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks & orcas on your
desktop!

Maybe Matching Threads

Search for more maybe matching threads

R help - Mar 2013 - Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

[R] Conditional Weighted Average (ddply or any other function)

Maybe Matching Threads