Hi...

I'm working through the book, A Handbook of Statistical Analyses using R by Everitt, and I'm trying to do the following (p. 19 of his book):

    boxplot(log(marketvalue) ~ country,
            data = subset(Forbes2000,
                          country %in% c("United Kingdom", "Germany", "India", "Turkey")),
            ylab = "log(marketvalue)",
            varwidth = TRUE)

This *almost* works, but I'm getting ALL the countries on the x-axis, not just the 4 specified. I tried tinkering with variations of the subset command, to no avail.

Can someone tell me what's wrong/missing with the above command?

Thanks,
Joe
Part of the problem is that 'country' is probably a factor, so the new subset still carries all the levels that were in the original factor. Try the following, which removes the extra levels from the factor:

    my.subset <- subset(Forbes2000,
                        country %in% c("United Kingdom", "Germany", "India", "Turkey"))

    # remove the extra factor levels
    my.subset$country <- my.subset$country[, drop = TRUE]

    boxplot(log(marketvalue) ~ country,
            data = my.subset,
            ylab = "log(marketvalue)",
            varwidth = TRUE)

On Jan 19, 2008 10:11 PM, Joe Trubisz <jtrubisz at mac.com> wrote:
> This *almost* works, but I'm getting ALL the countries on the x-axis,
> not just the 4-specified.
> I tried tinkering with variations in the subset command to no avail.
>
> Can someone tell me what's wrong/missing with the above command?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
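To see the effect without loading Forbes2000, here is a minimal sketch on a made-up toy data frame. The names d, sub, sub2 and value are invented for illustration and are not from the book. It also assumes a more recent version of R for the last step, since droplevels() is a later addition to base R that does the same job for every factor in a data frame at once.

```r
# Toy data (hypothetical, not Forbes2000): three countries, one numeric column
d <- data.frame(
  country = factor(c("Germany", "India", "Turkey", "Germany")),
  value   = c(1, 2, 3, 4)
)

# Keep only two of the three countries
sub <- subset(d, country %in% c("Germany", "India"))

# The rows for Turkey are gone, but the level itself is still attached
levels(sub$country)
table(sub$country)

# Drop the unused level, as suggested above
dropped <- sub$country[, drop = TRUE]
levels(dropped)

# Assumes a newer R: droplevels() cleans every factor in the data frame
sub2 <- droplevels(sub)
levels(sub2$country)
```

The table() call is usually the quickest check: a level with a count of zero is one that will still get its own (empty) slot on the x-axis.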
This is a very standard problem for neophytes. Subsetting a factor does not automatically subset the levels, so they all appear in the boxplot.

I think the simplest way round this is to replace ~ country in the formula by ~ factor(country). The call to factor() will reset the levels to only those which appear. So try

    boxplot(log(marketvalue) ~ factor(country),    ## changed line
            data = subset(Forbes2000,
                          country %in% c("United Kingdom", "Germany", "India", "Turkey")),
            ylab = "log(marketvalue)",
            varwidth = TRUE)

Bill Venables.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Joe Trubisz
Sent: Sunday, 20 January 2008 1:12 PM
To: R-help at r-project.org
Subject: [R] Newbie question on subsets
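A quick way to check the factor() suggestion without drawing anything is to ask boxplot() for its statistics only (plot = FALSE) and look at the group names it would put on the x-axis. This is only a sketch on a made-up toy data frame; d, sub, value, b_all and b_used are names invented for illustration, not anything from the book.

```r
# Toy data (hypothetical, not Forbes2000)
d <- data.frame(
  country = factor(c("Germany", "India", "Turkey",
                     "Germany", "India", "Turkey")),
  value   = c(1, 2, 3, 4, 5, 6)
)

sub <- subset(d, country %in% c("Germany", "India"))

# plot = FALSE returns the boxplot statistics instead of drawing the plot
b_all  <- boxplot(value ~ country,         data = sub, plot = FALSE)
b_used <- boxplot(value ~ factor(country), data = sub, plot = FALSE)

b_all$names   # one entry per original level, including the empty one
b_used$names  # only the levels that actually occur in the subset
b_all$n       # the empty group shows up with zero observations
```

Wrapping the variable in factor() inside the formula leaves the data frame itself untouched, which is convenient when the same subset is reused elsewhere with its original levels.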