Dear R-user,

I tried to generate a classification/regression tree from an absence/presence matrix of species (400) at different locations (50) to visualise the species that are important for separating the locations. rpart and tree did not work for more than 10 species, which seemed logical to me given the limited number of locations (n = 50). However, instead of an error message I only get the "+" prompt, although I am pretty sure I did not type a wrong character by mistake.

Is it legitimate at all to use 0/1 data with this statistical technique, and if so, is there a way (or a different method) to use all 400 species? Otherwise I would apply a PCA beforehand, but I would prefer to keep the raw species information.

using R 2.1.1-1 (Debian repos.)

regards, Martin

--
Martin Wegmann
DLR - German Aerospace Center
German Remote Sensing Data Center
@
Dept. of Geography
Remote Sensing and Biodiversity Unit
&&
Dept. of Animal Ecology and Tropical Biology
University of Wuerzburg
Am Hubland
97074 Würzburg
phone: +49-(0)931 - 888 4797
mobile: +49-(0)175 2091725
fax: +49-(0)931 - 888 4961
http://www.biota-africa.org
http://www.biogis.de
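The "+" Martin reports is R's continuation prompt: R is waiting for the rest of an expression it considers incomplete, for example a long hand-typed formula with an unbalanced parenthesis. Binary 0/1 columns are valid predictors for a tree. A minimal sketch, assuming simulated data and hypothetical names (`sp_data`, `location`, `sp1` to `sp400`) and written for a current R, shows how the `.` shorthand in a formula uses every remaining column of a data frame, so none of the 400 species terms has to be typed. It uses rpart; `tree()` also accepts a formula plus a `data` argument.

```r
# Sketch on simulated data; sp_data, location and sp1..sp400 are hypothetical names.
library(rpart)

set.seed(1)
n_loc <- 50   # sites
n_sp  <- 400  # species

# 0/1 presence/absence matrix: one row per site, one column per species
pa <- matrix(rbinom(n_loc * n_sp, size = 1, prob = 0.3), nrow = n_loc,
             dimnames = list(NULL, paste0("sp", seq_len(n_sp))))

# Response as a factor, predictors as plain 0/1 columns
sp_data <- data.frame(location = factor(paste0("loc", seq_len(n_loc))), pa)

# "location ~ ." regresses location on every other column of sp_data,
# so the 400 species terms never have to be typed by hand
fit <- rpart(location ~ ., data = sp_data, method = "class")
printcp(fit)  # also lists the variables actually used in the tree
```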
Martin,

If the data are actually coded 0/1, the tree function would probably interpret them as integers and try a regression instead of a classification. If the dependent variable is called "var", try

x <- tree(factor(var) ~ species)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                                     office 406-994-4548
Professor and Head                                      FAX 406-994-3190
Department of Ecology                        email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460
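Dave's point about the response type can be checked directly. In this sketch (simulated data, hypothetical column names `var`, `sp1`, `sp2`), rpart chooses its method from the class of the response: a numeric 0/1 response gives a regression ("anova") tree, while wrapping it in `factor()` gives a classification ("class") tree. The `tree()` call above behaves analogously, fitting a classification tree only when the response is a factor.

```r
# Hypothetical two-group response coded 0/1, plus two 0/1 species columns
library(rpart)

set.seed(2)
d <- data.frame(var = rep(c(0, 1), each = 25),
                sp1 = rbinom(50, 1, 0.5),
                sp2 = rbinom(50, 1, 0.5))

fit_num <- rpart(var ~ sp1 + sp2, data = d)          # numeric response
fit_fac <- rpart(factor(var) ~ sp1 + sp2, data = d)  # factor response

fit_num$method  # "anova": a regression tree on the 0/1 codes
fit_fac$method  # "class": a classification tree
```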
On Friday 23 September 2005 17:08, Dave Roberts wrote:
> If the data are actually coded 0/1, the tree function would
> probably interpret them as integers and try a regression instead of a
> classification.

Thanks, but I think I gave too little information. My dependent variable is the locations, which are names (I could transform them to numbers from 1 to n). The independent variables are the 0/1 species data. If I do

tree(locations ~ factor(species1) + factor(species2) + ... + factor(speciesn), sp_data)

I get the same results as without the factor() part. BTW, only a subset of the locations is displayed in the tree, which is pretty weird considering that I included all locations in the analysis.

Martin
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
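Both of Martin's observations have simple explanations, sketched here with rpart on simulated data (all names hypothetical). First, a split on a 0/1 variable can only separate the 0s from the 1s, so a numeric split at 0.5 and a factor split {0} vs {1} produce the same partition; wrapping the predictors in `factor()` should therefore not change the fit. Second, every leaf predicts a single class, so a tree with k leaves can label at most k locations. With one sample per location and rpart's default `minbucket` of 7 (round(20/3)), 50 samples allow at most 7 leaves, so most locations cannot appear.

```r
library(rpart)

## (1) Numeric 0/1 predictor vs factor predictor: same partition.
## Hypothetical data: 30 samples in two groups; species sp1 mostly tracks the group.
y   <- factor(rep(c("A", "B"), each = 15))
sp1 <- c(rep(0, 13), 1, 1, rep(1, 13), 0, 0)

fit_num <- rpart(y ~ sp1, data = data.frame(y, sp1))                # numeric split at 0.5
fit_fac <- rpart(y ~ sp1, data = data.frame(y, sp1 = factor(sp1)))  # factor split {0} vs {1}

pred_num <- predict(fit_num, type = "class")
pred_fac <- predict(fit_fac, type = "class")
identical(as.character(pred_num), as.character(pred_fac))  # TRUE

## (2) Each leaf predicts one class, so k leaves can show at most k locations.
## Hypothetical data: one sample per location, 50 locations, 400 species.
set.seed(1)
pa  <- matrix(rbinom(50 * 400, 1, 0.3), nrow = 50,
              dimnames = list(NULL, paste0("sp", 1:400)))
loc <- data.frame(location = factor(paste0("loc", 1:50)), pa)
fit <- rpart(location ~ ., data = loc, method = "class")

n_leaves <- sum(fit$frame$var == "<leaf>")
n_shown  <- length(unique(predict(fit, type = "class")))
c(leaves = n_leaves, locations_shown = n_shown)
```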
Martin,

I should have tried this before my last post to save postings, but on my machine I just tried samples = 1224, species = 962, clusters = 10 with no problems at all.

  > summary(test)

  Classification tree:
  tree(formula = factor(opt.10$clustering) ~ pa)
  Variables actually used in tree construction:
   [1] "pa.PICENG" "pa.ARTTSV" "pa.PSEMEN" "pa.AGRSPI" "pa.DESCES" "pa.ABILAS"
   [7] "pa.FESIDA" "pa.POLBIS" "pa.CAREXX" "pa.PINCON" "pa.GEUMAC"
  Number of terminal nodes:  16
  Residual mean deviance:  1.551 = 1873 / 1208
  Misclassification error rate: 0.2435 = 298 / 1224

You may want to reclassify the locations into fewer than 50 classes, but I think it should work.

Good luck,

Dave Roberts
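Dave's suggestion of reclassifying into fewer groups can be sketched as follows. His `opt.10$clustering` comes from a 10-group classification whose method the thread does not state, so this sketch swaps in average-linkage hierarchical clustering on the binary (Jaccard-type) distance via `hclust()` and `cutree()`, then fits an rpart classification tree of group membership against the raw species columns. The sizes and names are hypothetical and the data are simulated.

```r
library(rpart)

set.seed(3)
n_samp <- 200; n_sp <- 100   # hypothetical sizes, far smaller than Dave's 1224 x 962

# Simulated presence/absence matrix with species columns named sp1..sp100
pa <- matrix(rbinom(n_samp * n_sp, 1, 0.3), nrow = n_samp,
             dimnames = list(NULL, paste0("sp", seq_len(n_sp))))

# Group the samples: "binary" distance ignores shared absences (Jaccard-type);
# average-linkage hierarchical clustering, cut into k = 10 groups
hc <- hclust(dist(pa, method = "binary"), method = "average")
cl <- cutree(hc, k = 10)

# Classification tree of group membership against the raw species columns
pa_df <- data.frame(cluster = factor(cl), pa)
fit <- rpart(cluster ~ ., data = pa_df, method = "class")

# Species actually used in splits
setdiff(unique(as.character(fit$frame$var)), "<leaf>")
printcp(fit)
```

`printcp()` also prints a "Variables actually used in tree construction" line, comparable to the one in the `summary(test)` output.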