Hello all!

I have been using R for two years, but still have trivial problems.
At the moment I run R 1.9.1 on a W2k computer.

I work with a dataset from a variety evaluation project. Every year 25
varieties have been tested. Within each year the same cultivars were
tested in all trials, but some have changed over the three-year
period. Now I want to do some calculations and plotting on subsets.
The structure looks like this:

Year  ADB     Block  Vcode  Variety  Yield  Protein
2002  SW0024  1      20226  Denise   5843   12.8....
2002  SW0024  1      9865   Astoria  6729   11.4
2002  SW0024  1      9622   Barke    6121   12
2002  SW0024  1      9604   Cecilia  5579   12.7
2002  SW0024  1      20223  Granta   5591   11.6
2002  SW0024  1      20222  Class    5591   11.7
2002  SW0024  1      9922   Wiking   5744   12.5
2002  SW0024  1      20103  Vortex   5863   10.6
.
.
And so on, both down and sideways.
Three years and four trials with three replicates each gives 900 lines.

How do I use several criteria to subset this?

    Ast <- subset(data, Variety == "Astoria")

works fine, giving back the 36 lines where Astoria appears.
But how do I pick two (or more) varieties? AND pick one (or two)
years?

Everything I have tried results in rubbish, like:

    AstBark <- subset(data, Variety == c("Astoria","Barke"))

which seems to work but gives back 19 and 18 lines of the varieties
respectively. There are 36 of each....

Thanks
/CG

CG Pettersson, MSci, PhD Stud.
Swedish University of Agricultural Sciences
Dep. of Ecology and Crop Production. Box 7043
SE-750 07 Uppsala

Use %in% instead of ==, and string conditions together with &.

HTH,
Andy

> From: CG Pettersson
>
> How do I use several criteria to subset this?
>
> Ast <- subset(data, Variety == "Astoria")
>
> works fine giving back the 36 lines where Astoria appears.
> But how do I pick two (or more) varieties? AND picking one (or two)
> years?
>
> Everything I have tried results in rubbish, like:
>
> AstBark <- subset(data, Variety == c("Astoria","Barke"))
>
> which seems to work but gives back 19 and 18 lines of the varieties
> respectively. There exists 36 of each....
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
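
The advice above can be tried on a small made-up data frame. This is only a sketch: the object name trials and all values are hypothetical stand-ins for the poster's data, and only the column names Year, Block, Variety and Yield are taken from the post.

```r
# Hypothetical toy data mirroring some of the poster's columns
# (3 years x 3 blocks x 3 varieties = 27 rows; values are made up).
trials <- expand.grid(
  Year    = 2002:2004,
  Block   = 1:3,
  Variety = c("Astoria", "Barke", "Cecilia"),
  stringsAsFactors = FALSE
)
trials$Yield <- seq_len(nrow(trials))  # placeholder values, not real yields

# Two varieties, all years: %in% tests each row against the whole set.
AstBark <- subset(trials, Variety %in% c("Astoria", "Barke"))

# Two varieties AND two years: join the conditions with &.
AstBark0203 <- subset(trials,
                      Variety %in% c("Astoria", "Barke") &
                        Year %in% c(2002, 2003))

nrow(AstBark)      # 18 rows: 2 varieties x 3 years x 3 blocks
nrow(AstBark0203)  # 12 rows: 2 varieties x 2 years x 3 blocks
```

Note the single & here. It works element by element over the rows, which is what a row filter needs; && is meant for single TRUE/FALSE values.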

On Fri, 29 Oct 2004, CG Pettersson wrote:

> I have been using R for two years, but still have trivial problems.
> For the moment, I run R1.9.1 on a W2k computer.

Time to read a book! This sort of thing is discussed at length in
Chapter 2 of MASS4 (see the FAQ), for example. It even discusses why
your example does not work.

> How do I use several criteria to subset this?
>
> Ast <- subset(data, Variety == "Astoria")
>
> works fine giving back the 36 lines where Astoria appears.
> But how do I pick two (or more) varieties? AND picking one (or two)
> years?
>
> Everything I have tried results in rubbish, like:
>
> AstBark <- subset(data, Variety == c("Astoria","Barke"))

    Variety %in% c("Astoria","Barke")

is probably what you intended. You can also pick on years by using &
in the indexing criterion.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
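
As for why the == version returns roughly half of the expected rows: the likely explanation is R's recycling rule. == compares element by element, so a two-element right-hand side is repeated down the column and each row is tested against only one of the two names. A minimal sketch with a made-up vector (the values are hypothetical, not the poster's data):

```r
# Hypothetical six-element column; the real data has 900 rows.
Variety <- c("Denise", "Astoria", "Barke", "Astoria", "Barke", "Barke")

# == recycles c("Astoria", "Barke") along the column:
# rows 1, 3, 5 are compared with "Astoria", rows 2, 4, 6 with "Barke".
eq_hits <- Variety == c("Astoria", "Barke")

# %in% asks, for every row, whether its value is anywhere in the set.
in_hits <- Variety %in% c("Astoria", "Barke")

sum(eq_hits)  # 1: only row 6 happens to line up with "Barke"
sum(in_hits)  # 5: every Astoria or Barke row
```

R only warns about recycling when the longer length is not a multiple of the shorter one. With 900 rows and two names no warning is given, which would explain why the call "seems to work".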

Hi

On 29 Oct 2004 at 12:00, CG Pettersson wrote:

> Everything I have tried results in rubbish, like:
>
> AstBark <- subset(data, Variety == c("Astoria","Barke"))

    subset(data, subset = (Variety %in% c("Astoria", "Barke")))

should work. The subset argument of subset() has to be a logical
expression that is TRUE for exactly the rows you want to keep.

HTH
Cheers
Petr

> which seems to work but gives back 19 and 18 lines of the varieties
> respectively. There exists 36 of each....

Petr Pikal
petr.pikal at precheza.cz
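
Since the condition is just a logical vector with one element per row, it can also be built and inspected on its own and then used with ordinary bracket indexing. A small sketch, again on hypothetical data (the names trials and keep and all values are made up for illustration):

```r
# Hypothetical toy data; column names follow the post, values are made up.
trials <- data.frame(
  Year    = c(2002, 2002, 2003, 2003, 2004, 2004),
  Variety = c("Astoria", "Barke", "Astoria", "Denise", "Barke", "Astoria"),
  Yield   = c(6729, 6121, 6500, 5843, 6000, 6600),
  stringsAsFactors = FALSE
)

# Build the condition first so it can be checked before subsetting.
keep <- trials$Variety %in% c("Astoria", "Barke") & trials$Year == 2002
keep  # TRUE for rows 1 and 2 only

# Two ways of applying the same condition.
by_subset  <- subset(trials, Variety %in% c("Astoria", "Barke") & Year == 2002)
by_bracket <- trials[keep, ]
```

One difference worth knowing: subset() drops rows where the condition is NA, whereas bracket indexing with an NA in the logical vector returns a row of NAs. The toy data here has no missing values, so both approaches select the same two rows.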