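Quick response on the binomial part of your question:
If possible, I would suggest modelling the proportion directly and supplying the totals as weights:
prop_A = (number/freq of type A) / (total_freq of all types)
veg.glm <- glm(prop_A ~ x, weights = total_freq, family = binomial)
The binomial family expects prop_A * total_freq to be a whole number
(including 0), but the fit also works for non-integer data. In that case glm
gives the warning "non-integer #successes in a binomial glm!", which does not
change the fitted coefficients and can be ignored.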
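To illustrate the suggestion, here is a minimal sketch on simulated data. All variable names and numbers are made up for the example. It shows that the proportion-plus-weights form gives the same coefficients as the two-column (successes, failures) form of a binomial glm.

```r
# Simulated data; every name and value here is an assumption for illustration.
set.seed(1)
n_obs      <- 50
x          <- runif(n_obs)
total_freq <- sample(5:40, n_obs, replace = TRUE)   # trials per observation
n_A        <- rbinom(n_obs, size = total_freq, prob = plogis(-1 + 2 * x))
prop_A     <- n_A / total_freq                       # observed proportion

# Form 1: proportion as response, number of trials as weights
fit_prop <- glm(prop_A ~ x, weights = total_freq, family = binomial)

# Form 2: two-column matrix of successes and failures
fit_count <- glm(cbind(n_A, total_freq - n_A) ~ x, family = binomial)

# Both forms maximise the same likelihood, so the coefficients agree
all.equal(coef(fit_prop), coef(fit_count), tolerance = 1e-6)
```

Because prop_A * total_freq is a whole number here, neither fit produces the non-integer warning.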
Hope this helps!
Gerard

"Imelda.Somodi" <imelda.somodi at gmail.com>
Sent by: r-help-bounces at r-project.org
20/01/2009 09:11
To: r-help at r-project.org
Subject: [R] Proportional response and boosting

Dear experts of boosting!
I am planning to build vegetation models via boosting with either gbm or
mboost. My problem is that my response variable is the proportion of a
vegetation type in natural vegetation at a location.
ResponseA = (area of vegetation type A/area of all natural vegetation
types)
That means that the response has a continuous distribution between 0 and 1
with many 0s and 1s as well. As I understood from reading these forums, it
is pretty close to a beta distribution with the exception that the marginal
values (0,1) are also included. Because of the latter feature I cannot even
build a beta regression, not that I could do a boosted variant of that.
Nevertheless, I can think of my response as a binomial one with values
between 0 and 1 and take 1 square meter (as if it was a pixel) of natural
vegetation as an observation. This way I can do binomial glms for my data,
so that I specify the no. of square meters of natural vegetation as weights
(I round them to get integers to be applicable in glm). I hope I am allowed
to post a side-question here. I always get a warning with these glms
though.
Here is a simple one-variable example:
tmp <- glm(ossz_ujstand2$k2_stand ~ BIO_1 + I(BIO_1^2), family = binomial,
           na.action = na.omit, weights = ossz_ujstand2$weights)
where BIO_1 is a variable describing climate, and weights is a vector holding
the area of natural vegetation, rounded to integers, for each observation.
Warning: "non-integer #successes in a binomial glm!"
I read somewhere on this site that this can be normal, but would be
reassured if it was stated that it is indeed so in my case as well.
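A small sketch of where this warning comes from, using simulated data (all names and values are assumptions, not the original data set): rounding the weights is not enough, because the binomial family checks whether response * weights is a whole number. Two possible ways around it are rounding the implied successes or switching to quasibinomial, which skips the check and estimates a dispersion parameter.

```r
set.seed(2)
n_obs <- 40
bio1  <- rnorm(n_obs)
area  <- round(runif(n_obs, 10, 200))        # integer weights (square metres)
prop  <- plogis(0.5 * bio1 + rnorm(n_obs))   # continuous proportion in (0, 1)

# prop * area is generally not an integer, so binomial() warns.
# The handler records the warning text and lets the fit continue.
msgs <- character(0)
fit_warn <- withCallingHandlers(
  glm(prop ~ bio1, weights = area, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
msgs

# Option A: round the implied successes so that the counts are integers
succ    <- round(prop * area)
fit_int <- glm(cbind(succ, area - succ) ~ bio1, family = binomial)

# Option B: quasibinomial gives the same coefficients without the warning
fit_quasi <- glm(prop ~ bio1, weights = area, family = quasibinomial)
all.equal(coef(fit_warn), coef(fit_quasi))
```

Which option is preferable depends on the data. Treating every square metre as an independent trial probably overstates the effective sample size, so the quasibinomial standard errors may be the more cautious choice.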
My problem with boosting is that I don't know how to handle the distribution
of my response variable. I am not quite sure how to treat the loss function
either. It seems to me that it somehow corresponds to the link function, as
it has to be defined through family(), like link functions in glm, and the
potential choices for family also correspond. At the same time, some papers
about boosting suggest to me that the loss function plays more the role of
the curve estimation technique, and that data with any distribution can be
boosted with any type of loss function.
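One way to look at the relation between loss and link is sketched below under simplifying assumptions. The data are made up, and the "base learner" is just a weighted mean, so this is not what gbm or mboost do internally. The loss is the binomial negative log-likelihood written on the logit scale; the link only fixes the scale on which the additive predictor f lives. This loss is well defined for any y in [0, 1], including exact 0s and 1s.

```r
# Binomial negative log-likelihood on the logit scale f; y may be any value in [0, 1]
binom_loss  <- function(y, f) log1p(exp(f)) - y * f
binom_ngrad <- function(y, f) y - plogis(f)   # negative gradient with respect to f

y <- c(0, 0, 0.15, 0.4, 0.8, 1, 1)   # made-up proportions, including 0s and 1s
w <- c(30, 12, 50, 20, 45, 10, 25)   # made-up areas used as weights

f  <- 0     # start at logit(0.5)
nu <- 0.5   # step size (shrinkage)
loss_start <- sum(w * binom_loss(y, f))
for (m in 1:500) {
  # Simplest possible base learner: the weighted mean of the negative gradient
  f <- f + nu * weighted.mean(binom_ngrad(y, f), w)
}
loss_end <- sum(w * binom_loss(y, f))

c(loss_start = loss_start, loss_end = loss_end)
plogis(f)             # fitted proportion on the response scale
weighted.mean(y, w)   # the weighted mean of the observed proportions
```

The fitted value converges to the weighted mean proportion, which is what a weighted intercept-only binomial glm would give. As far as I know, mboost lets users build custom families with Family() from a loss and a negative gradient of this kind, but the exact arguments should be checked in its documentation.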
As a start I tried to do the same with boosting as I did with glms. Here is
an example.
With mboost:
index <- !is.na(ossz_ujstand2$k2_stand)   # I need this to remove NAs
proba.bb2 <- blackboost(k2_stand ~ BIO_1 + BIO_12, data = ossz_ujstand2[index, ],
                        weights = ossz_ujstand2$weights[index], family = Binomial())
Error in family@check_y(y) :
  response is not a factor but 'family = Binomial()'
With gbm using the modified code of Elith et al. 2008 Journal of Animal
Ecology:
index<-!is.na(ossz_ujstand2$k2_stand)
k2.tc5.lr01<- gbm.step(data=ossz_ujstand2[index,],
    gbm.x = 50:147,
    gbm.y = 27,
    family = "bernoulli",
    tree.complexity = 5,
    learning.rate = 0.1,
    bag.fraction = 0.75,
    weights = ossz_ujstand2$weights[index])   # subset weights to match the data rows
Error in gbm.fit(x, y, offset = offset, distribution = distribution, w = w, :
  Bernoulli requires the response to be in {0,1}
So obviously the solution with weights does not work. Is there a
straightforward way to model my response with the prefabricated families, or
do I have to write a new loss function? I understand that this is possible in
mboost, but I would greatly appreciate support on how to do it. I am also
uncertain about what type of link I should use for my data.
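One possible workaround for families that insist on a strict 0/1 response, sketched here with base R only and made-up data (the column names bio1, prop and area are assumptions): split each observation into a "1" row weighted by the area of type A and a "0" row weighted by the remaining area. The check below uses glm to confirm that the expanded data carry the same information as the proportion form.

```r
set.seed(3)
n_obs <- 30
dat <- data.frame(
  bio1 = rnorm(n_obs),
  prop = round(runif(n_obs), 2),               # made-up proportions
  area = sample(20:300, n_obs, replace = TRUE) # made-up areas (square metres)
)
dat$prop[1:3] <- c(0, 1, 0)                    # include boundary values

# Each site becomes two rows: y = 1 weighted by the area of type A,
# and y = 0 weighted by the area of all other natural vegetation.
long <- rbind(
  data.frame(bio1 = dat$bio1, y = 1L, w = dat$prop * dat$area),
  data.frame(bio1 = dat$bio1, y = 0L, w = (1 - dat$prop) * dat$area)
)
long <- long[long$w > 0, ]                     # drop rows that carry no weight

fit_long <- glm(y ~ bio1, weights = w, family = quasibinomial, data = long)
fit_prop <- glm(prop ~ bio1, weights = area, family = quasibinomial, data = dat)

# Same weighted likelihood, so the coefficients agree up to convergence error
all.equal(coef(fit_long), coef(fit_prop), tolerance = 1e-4)
```

The expanded data frame has a response in {0, 1}, which is what the "bernoulli" distribution in gbm asks for and, converted with factor(), what Binomial() in mboost checks for. Passing w as the weights argument to those functions is an assumption that has not been tested here.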
Thank you very much!
Imelda Somodi
Assistant research fellow
Institute of Ecology and Botany
Hungarian Academy of Science