Hi...
I'm working through the book, A Handbook of Statistical Analyses
using R by Everitt, and I'm trying to do the following (p. 19 of his
book):
boxplot(log(marketvalue) ~ country,
        data = subset(Forbes2000,
                      country %in% c("United Kingdom", "Germany",
                                     "India", "Turkey")),
        ylab = "log(marketvalue)",
        varwidth = TRUE)
This *almost* works, but I'm getting ALL the countries on the x-axis,
not just the 4 specified.
I tried tinkering with variations in the subset command to no avail.
Can someone tell me what's wrong/missing with the above command?
Thanks,
Joe
Part of the problem is that 'country' is probably a factor, so the new
subset keeps all the levels that were in the original factor. Try the
following, which removes the extra levels from the factor:
my.subset <- subset(Forbes2000,
                    country %in% c("United Kingdom", "Germany",
                                   "India", "Turkey"))
# remove the extra factor levels
my.subset$country <- my.subset$country[, drop = TRUE]
boxplot(log(marketvalue) ~ country,
        data = my.subset,
        ylab = "log(marketvalue)",
        varwidth = TRUE)
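To see the effect without the Forbes2000 data, here is a minimal sketch on a made-up data frame (the name `toy` and its values are assumptions for illustration only, not the book's data). It shows that subset() keeps the unused level and that `[, drop = TRUE]` removes it. On R versions that provide droplevels(), that function should give the same result for the whole data frame.

```r
# Hypothetical stand-in for Forbes2000; the values are invented
toy <- data.frame(
  country     = factor(c("Germany", "India", "Japan",
                         "Turkey", "Germany", "Japan")),
  marketvalue = c(10, 20, 30, 40, 50, 60)
)

toy.subset <- subset(toy, country %in% c("Germany", "India", "Turkey"))

# The rows for Japan are gone, but the level is still attached
levels(toy.subset$country)
table(toy.subset$country)     # Japan shows up with a count of 0

# Drop the unused levels from the factor
dropped <- toy.subset$country[, drop = TRUE]
levels(dropped)

# Alternative: drop unused levels in every factor column at once
relevelled <- droplevels(toy.subset)
levels(relevelled$country)
```

After either step, the boxplot call should show only the countries that are still present in the data.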
On Jan 19, 2008 10:11 PM, Joe Trubisz <jtrubisz at mac.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
This is a very standard problem for neophytes. Subsetting a factor does
not automatically subset the levels, so they all appear in the boxplot.
I think the simplest way round this is to replace ~ country in the
formula by ~ factor(country). The call to factor() will re-set the
levels to only those which appear. So try
boxplot(log(marketvalue) ~ factor(country),   ## changed line
        data = subset(Forbes2000,
                      country %in% c("United Kingdom", "Germany",
                                     "India", "Turkey")),
        ylab = "log(marketvalue)", varwidth = TRUE)
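One way to check this without drawing anything is to call boxplot() with plot = FALSE and look at the group names it returns. The sketch below uses a made-up data frame (`toy`, `keep` and the values are assumptions for illustration, not the Forbes2000 data) and compares the two formulas.

```r
# Hypothetical stand-in for Forbes2000; the values are invented
toy <- data.frame(
  country     = factor(c("Germany", "India", "Japan", "Turkey",
                         "Germany", "India", "Turkey", "Japan")),
  marketvalue = c(1.2, 3.4, 5.6, 7.8, 2.3, 4.5, 6.7, 8.9)
)
keep <- c("Germany", "India", "Turkey")
toy.subset <- subset(toy, country %in% keep)

# plot = FALSE returns the box statistics instead of drawing them
with.all  <- boxplot(log(marketvalue) ~ country,
                     data = toy.subset, plot = FALSE)
only.kept <- boxplot(log(marketvalue) ~ factor(country),
                     data = toy.subset, plot = FALSE)

with.all$names    # still lists Japan, as an empty group
only.kept$names   # only the countries present in the subset
```

Wrapping the variable in factor() inside the formula leaves the data frame itself unchanged, which may be convenient when the subset is only needed for a single plot.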
Bill Venables.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Joe Trubisz
Sent: Sunday, 20 January 2008 1:12 PM
To: R-help at r-project.org
Subject: [R] Newbie question on subsets