thr3ads.net - R help - [R] Subsetting dataframes [Jul 2007]

CG Pettersson

2007-Jul-19 09:52 UTC

[R] Subsetting dataframes

Dear all!

W2k, R 2.5.1

I am working with an ongoing malting barley variety evaluation within
Sweden. The structure is 25 cultivars tested each year at four sites, in
field trials with three replicates and 'lattice' structure (the replicates
are divided into five sub blocks in a structured way). As we normally
keep around 15 varieties from each year to the next, and take in 10 new
for next year, we have tested a total of 72 different varieties during
five years.

I store the data in a field trial database, and generate text tables with
the subset of data I want and import the frame to R. I take in all
cultivars in R and use 'subset' to select what I want to look at. Using
lme{nlme} works with no problems to get mean results over the years, but
as I now have a number of years I want to analyse the general site x
cultivar relation. I am testing AMMI{agricolae} for this and it seems to
work except for the subsetting. This is what happens:

If I do the subsetting like this:

x62_samvar <- subset(x62_5, cn %in%
    c("Astoria", "Barke", "Christina", "Makof",
      "Prestige", "Publican", "Quench"))

A test run with AMMI seems to work in the first part:
> AMMI(site, cn, rep, yield)
ANALYSIS AMMI:  yield
Class level information

ENV:  Hag Klb Bjt Ska
GEN:  Astoria Prestige Makof Christina Publican Quench
REP:  1 2 3

Number of observations:  240

model Y: yield  ~ ENV + REP%in%ENV + GEN + ENV:GEN

Analysis of Variance Table

Response: Y
           Df    Sum Sq   Mean Sq F value    Pr(>F)
ENV         3 120092418  40030806 90.0424 1.665e-06 ***
REP(ENV)    8   3556620    444578  0.5674  0.803923
GEN         5  21376142   4275228  5.4564 9.680e-05 ***
ENV:GEN    15  28799807   1919987  2.4504  0.002555 **
Residuals 208 162973213    783525
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Coeff var       Mean yield
13.08629         6764.098

After this something goes wrong, as AMMI finds a cultivar name not
selected in the subsetting. (The plotting might go wrong anyhow, but I
haven't got that far yet):

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
object$xlevels) :
        factor 'y' has new level(s) Arkadia


Looking at the dataframe using
> edit(x62_samvar)
shows only the selected rows, but
> levels(x62_samvar$cn)
still returns all 72 cultivar names used during the five years (starting
with Arkadia).

Where do I go wrong and how do I use subset in a proper way?
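
The symptom described above can be reproduced on a small scale. The sketch below uses a made-up data frame (`trials`, with invented yields) standing in for x62_5; it only illustrates that subset() filters rows while leaving the factor's level set untouched.

```r
# Hypothetical toy data standing in for x62_5 (names and values are made up).
trials <- data.frame(
  cn    = factor(c("Arkadia", "Astoria", "Barke", "Astoria", "Barke", "Arkadia")),
  yield = c(6500, 6800, 7000, 6700, 7100, 6400)
)

sel <- subset(trials, cn %in% c("Astoria", "Barke"))

# The rows are filtered ...
unique(as.character(sel$cn))   # only the selected cultivars remain as values

# ... but the factor still carries every original level.
levels(sel$cn)                 # "Arkadia" "Astoria" "Barke"
table(sel$cn)                  # the unused level shows up with a count of zero
```

Comparing nlevels(sel$cn) with length(unique(sel$cn)) is a quick way to spot leftover levels before handing the data to a modelling function.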

Thanks
/CG

-- 
CG Pettersson, PhD
Swedish University of Agricultural Sciences (SLU)
Dept. of Crop Production Ecology. Box 7043.
SE-750 07 Uppsala, Sweden
cg.pettersson at vpe.slu.se

Uwe Ligges

2007-Jul-19 13:01 UTC

[R] Subsetting dataframes

CG Pettersson wrote:
> /snip
>
> Where do I go wrong and how do I use subset in a proper way?

So you have to drop the levels you are excluding. Example:

   x <- factor(letters[1:4])
   x
   x[1:2]
   x[1:2, drop=TRUE]
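
For readers who cannot run the example, here is an annotated sketch of what it demonstrates. The variable names `kept`, `dropped`, and `refactored` are introduced here for illustration only and are not part of the original post.

```r
x <- factor(letters[1:4])

kept    <- x[1:2]               # values a, b; level set is unchanged
dropped <- x[1:2, drop = TRUE]  # values a, b; unused levels are removed

levels(kept)      # "a" "b" "c" "d"
levels(dropped)   # "a" "b"

# Re-creating the factor from the subset has the same effect here.
refactored <- factor(x[1:2])
identical(levels(dropped), levels(refactored))   # TRUE
```

Plain indexing keeps the full level set on purpose, so that subsets of the same factor stay comparable; drop = TRUE is the opt-in for discarding levels that no longer occur.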


Uwe Ligges




CG Pettersson

2007-Jul-19 14:20 UTC

[R] Subsetting dataframes

Thanks a lot.
But an ignorant R user like me needed the code example that Jim Holtman
sent off-list earlier today to understand that

x62_samvar$cn <- x62_samvar$cn[, drop = TRUE]

was the way to code it. Thank you both!
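
When a data frame has several factor columns (site and cultivar, say), the same one-liner can be applied to each of them after subsetting. The helper below, `drop_unused_levels`, is a hypothetical name and not part of base R or agricolae, and the data are made up; treat it as a sketch rather than the method used in this thread.

```r
# Hypothetical helper: drop unused levels from every factor column.
drop_unused_levels <- function(df) {
  is_fac <- sapply(df, is.factor)
  df[is_fac] <- lapply(df[is_fac], function(f) f[, drop = TRUE])
  df
}

# Made-up data for illustration only.
trials <- data.frame(
  site  = factor(c("Hag", "Klb", "Hag", "Klb")),
  cn    = factor(c("Arkadia", "Astoria", "Barke", "Astoria")),
  yield = c(6500, 6800, 7000, 6700)
)

sel <- drop_unused_levels(subset(trials, cn != "Arkadia"))
levels(sel$cn)   # "Astoria" "Barke"
```

Later versions of R (2.12.0 onward) ship droplevels(), which does this for a factor or a whole data frame; it was not available in the R 2.5.1 used in this thread.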

/CG


On Thu, July 19, 2007 3:01 pm, Uwe Ligges said:
>
> CG Pettersson wrote:
>> Dear all!
>>
>> W2k, R 2.5.1
>>
>> I am working with an ongoing malting barley variety evaluation within
>> Sweden. The structure is 25 cultivars tested each year at four sites, in
/snip
>>
>> Where do I go wrong and how do I use subset in a proper way?
>
>
> So you have to drop the levels you are excluding. Example:
>
>    x <- factor(letters[1:4])
>    x
>    x[1:2]
>    x[1:2, drop=TRUE]
>
>
> Uwe Ligges
>
>
>
>
>> Thanks
>> /CG
>>
>

-- 
CG Pettersson, PhD
Swedish University of Agricultural Sciences (SLU)
Dept. of Crop Production Ecology. Box 7043.
SE-750 07 Uppsala, Sweden
cg.pettersson at vpe.slu.se
