Dear all!

W2k, R 2.5.1

I am working with an ongoing malting barley variety evaluation in Sweden. The structure is 25 cultivars tested each year at four sites, in field trials with three replicates and a 'lattice' structure (each replicate is divided into five sub-blocks in a structured way). As we normally keep around 15 varieties from each year to the next and take in 10 new ones for the next year, we have tested 72 different varieties in total over five years.

I store the data in a field trial database, generate text tables with the subset of data I want, and import the frame into R. I take all cultivars into R and use 'subset' to select what I want to look at. Using lme{nlme} works with no problems to get mean results over the years, but as I now have a number of years I want to analyse the general site x cultivar relation. I am testing AMMI{agricolae} for this and it seems to work except for the subsetting. This is what happens:

If I do the subsetting like this:

x62_samvar <- subset(x62_5, cn %in%
    c("Astoria","Barke","Christina","Makof", "Prestige","Publican","Quench"))

a test run with AMMI seems to work in the first part:

> AMMI(site, cn, rep, yield)

ANALYSIS AMMI:  yield
Class level information

ENV:  Hag Klb Bjt Ska
GEN:  Astoria Prestige Makof Christina Publican Quench
REP:  1 2 3

Number of observations:  240

model Y: yield ~ ENV + REP%in%ENV + GEN + ENV:GEN

Analysis of Variance Table

Response: Y
           Df    Sum Sq  Mean Sq F value    Pr(>F)
ENV         3 120092418 40030806 90.0424 1.665e-06 ***
REP(ENV)    8   3556620   444578  0.5674  0.803923
GEN         5  21376142  4275228  5.4564 9.680e-05 ***
ENV:GEN    15  28799807  1919987  2.4504  0.002555 **
Residuals 208 162973213   783525
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Coeff var   Mean yield
 13.08629     6764.098

After this something goes wrong, as AMMI finds a cultivar name that was not selected in the subsetting. (The plotting might go wrong anyhow, but I haven't got that far yet):

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
        factor 'y' has new level(s) Arkadia

Looking at the data frame using

> edit(x62_samvar)

only shows the selected lines, but using levels() gives another answer, as

> levels(x62_samvar$cn)

gives back all 72 cultivar names used during the five years (starting with Arkadia).

Where do I go wrong, and how do I use subset in a proper way?

Thanks
/CG

-- 
CG Pettersson, PhD
Swedish University of Agricultural Sciences (SLU)
Dept. of Crop Production Ecology. Box 7043.
SE-750 07 Uppsala, Sweden
cg.pettersson at vpe.slu.se

CG Pettersson wrote:
> Dear all!
>
> W2k, R 2.5.1
>
> [snip]
>
> If I do the subsetting like this:
>
> x62_samvar <- subset(x62_5, cn %in%
>     c("Astoria","Barke","Christina","Makof", "Prestige","Publican","Quench"))
>
> [snip]
>
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
>         factor 'y' has new level(s) Arkadia
>
> Looking at the dataframe using
>
>> edit(x62_samvar)
>
> only shows the selected lines, but using levels() gives another answer as
>
>> levels(x62_samvar$cn)
>
> gives back all 72 cultivar names used during the five years (starting with
> Arcadia).
>
> Where do I go wrong and how do I use subset in a proper way?

So you have to drop the levels you are excluding. Example:

x <- factor(letters[1:4])
x
x[1:2]
x[1:2, drop=TRUE]

Uwe Ligges

> Thanks
> /CG

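A minimal sketch of why the extra level survives, and of the drop=TRUE fix applied to a column of a data frame. The data frame d and its values are made up for illustration and are not the poster's trial data; only the column name cn and the cultivar names are borrowed from the post.

```r
# Hypothetical toy data standing in for the real trial data frame
d <- data.frame(cn    = factor(c("Arkadia", "Astoria", "Barke", "Astoria")),
                yield = c(6100, 6800, 6500, 7000))

# subset() filters the rows but keeps every level of the factor
s <- subset(d, cn %in% c("Astoria", "Barke"))
levels(s$cn)   # still lists "Arkadia", although no row uses it
table(s$cn)    # the unused level shows up with a count of 0

# Drop the unused levels, as in the x[1:2, drop=TRUE] example
s$cn <- s$cn[, drop = TRUE]
levels(s$cn)   # now only the cultivars that are still present
```

Functions that build their design from levels() rather than from the values actually present will see the leftover level, which would be consistent with the "new level(s)" error in the post.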
Thanks a lot.

But an ignorant R user like me needed the code example from Jim Holtman, posted outside the list earlier today, to understand that

x62_samvar$cn <- x62_samvar$cn[,drop=TRUE]

was the way to code it.

Thank you both!
/CG

On Thu, July 19, 2007 3:01 pm, Uwe Ligges said:
> CG Pettersson wrote:
>> Dear all!
>>
>> W2k, R 2.5.1
>>
>> I am working with an ongoing malting barley variety evaluation within
>> Sweden. The structure is 25 cultivars tested each year at four sites, in
/snip
>>
>> Where do I go wrong and how do I use subset in a proper way?
>
> So you have to drop the levels you are excluding. Example:
>
> x <- factor(letters[1:4])
> x
> x[1:2]
> x[1:2, drop=TRUE]
>
> Uwe Ligges
>
>> Thanks
>> /CG
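
When several columns are factors (cultivar, site and so on), the same idea can be applied to the whole data frame after subsetting. The sketch below is only one way to do it: drop_unused_levels is a hypothetical helper name, and the toy data frame is invented. Re-applying factor() to a factor rebuilds it from the values that are present, which has the same effect as [, drop=TRUE]. R 2.12.0 and later also provide droplevels() for this purpose, but that is not available in R 2.5.1.

```r
# Hypothetical helper: re-apply factor() to every factor column so that
# levels with no remaining rows are dropped
drop_unused_levels <- function(df) {
  is_fac <- sapply(df, is.factor)
  if (any(is_fac)) {
    df[is_fac] <- lapply(df[is_fac], factor)
  }
  df
}

# Invented toy data; column names follow the post, values do not
d <- data.frame(cn    = factor(c("Arkadia", "Astoria", "Barke", "Astoria")),
                site  = factor(c("Hag", "Klb", "Bjt", "Ska")),
                yield = c(6100, 6800, 6500, 7000))

s <- drop_unused_levels(subset(d, cn %in% c("Astoria", "Barke")))
levels(s$cn)     # unused cultivar levels are gone
levels(s$site)   # unused site levels are gone as well
```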