Dear list,

I cannot figure out why, after subsetting my data, an item that I don't want to plot is still in the newly created subset (please see the example below). R somehow remembers what was in the original data set. A workaround is exporting and importing the new subset. Then it's all fine, but I don't like this idea and was wondering what I am missing here.

Thanks!
Stefan

P.S. I am using R 2.13.2 for Mac.

> dat<-read.csv("~/MyFiles/data.csv")
> class(dat$treat)
[1] "factor"
> dat
   treat yield
1   cont  98.7
2   cont  97.2
3   cont  96.1
4   cont  98.1
5     10 103.0
6     10 101.3
7     10 102.1
8     10 101.9
9     30 121.1
10    30 123.1
11    30 119.7
12    30 118.9
13    60 109.9
14    60 110.1
15    60 113.1
16    60 112.3
> plot(dat$treat,dat$yield)
> dat.sub<-dat[which(dat$treat!='cont'),]
> class(dat.sub$treat)
[1] "factor"
> dat.sub
   treat yield
5     10 103.0
6     10 101.3
7     10 102.1
8     10 101.9
9     30 121.1
10    30 123.1
11    30 119.7
12    30 118.9
13    60 109.9
14    60 110.1
15    60 113.1
16    60 112.3
> plot(dat.sub$treat,dat.sub$yield)
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Schreiber, Stefan
> Sent: Tuesday, November 01, 2011 2:29 PM
> To: r-help at r-project.org
> Subject: [R] factor level issue after subsetting
>
> R somehow remembers what was in the original data set.

That is the nature of factors. Once created, unused levels must be explicitly dropped:

plot(droplevels(dat.sub$treat),dat.sub$yield)

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
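A minimal sketch of Dan's point, using a plain factor vector whose values are hypothetical but mirror the treat column: indexing with [ keeps the full level set, and droplevels() removes the levels that no longer have any observations.

```r
# A factor built from treatment labels; levels are sorted alphabetically,
# so 'cont' comes after the numeric-looking labels.
treat <- factor(c("cont", "cont", "10", "30", "60"))

# Subsetting with [ keeps every original level, even unused ones.
sub <- treat[treat != "cont"]
levels(sub)              # "10" "30" "60" "cont"
table(sub)               # 'cont' is listed with a count of 0

# droplevels() keeps only the levels that still occur.
levels(droplevels(sub))  # "10" "30" "60"
```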
First of all, the subsetting line is overly complicated:

dat.sub<-dat[dat$treat!='cont',]

will work just fine.

R does exactly what you're describing: it keeps track of the levels of the factor. Removing the 'cont' rows from the data does not remove the 'cont' level from the factor:

> df<-data.frame(let=factor(sample(letters[1:5],100,replace=T)),num=rnorm(100))
> str(df)
'data.frame': 100 obs. of 2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
 $ num: num 0.224 -0.523 0.974 -0.268 -0.61 ...
> df.sub<-df[df$let!='a',]
> str(df.sub)
'data.frame': 82 obs. of 2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
 $ num: num -0.523 -0.268 -0.61 -1.383 -0.193 ...
> unique(df.sub$let)
[1] e d c b
Levels: a b c d e
> df.sub$let<-factor(df.sub$let)
> unique(df.sub$let)
[1] e d c b
Levels: e d c b
> str(df.sub$let)
 Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...

By redefining the factor you can eliminate the problem. The other option, if you don't want factors to begin with, is:

options(stringsAsFactors=FALSE) # to set the global option

or

dat<-read.csv("~/MyFiles/data.csv",stringsAsFactors=FALSE) # to set the option locally for this single read.csv call
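Both alternatives above can be sketched on a few rows from the original post, read from an inline string with read.csv(text = ...) rather than from the CSV file. Since R 4.0.0 the default for stringsAsFactors is FALSE, so this sketch sets it explicitly in both calls.

```r
# A few rows from the original post, given inline instead of data.csv.
csv <- "treat,yield
cont,98.7
cont,97.2
10,103.0
30,121.1
60,109.9"

# 1) Read 'treat' as a factor, subset, then re-create the factor so that
#    only the levels still present are kept.
dat <- read.csv(text = csv, stringsAsFactors = TRUE)
dat.sub <- dat[dat$treat != "cont", ]
nlevels(dat.sub$treat)          # 4: 'cont' is still a level
dat.sub$treat <- factor(dat.sub$treat)
nlevels(dat.sub$treat)          # 3

# 2) Keep 'treat' as character, so there are no levels to carry over.
dat2 <- read.csv(text = csv, stringsAsFactors = FALSE)
class(dat2$treat)               # "character"
unique(dat2[dat2$treat != "cont", "treat"])   # "10" "30" "60"
```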
Stefan:
Use the droplevels function...
dat <- read.table(textConnection("
treat yield
1 cont 98.7
2 cont 97.2
3 cont 96.1
4 cont 98.1
5 10 103.0
6 10 101.3
7 10 102.1
8 10 101.9
9 30 121.1
10 30 123.1
11 30 119.7
12 30 118.9
13 60 109.9
14 60 110.1
15 60 113.1
16 60 112.3"),header=TRUE,stringsAsFactors=TRUE)
dat
plot(dat$treat,dat$yield)
dat.sub <- subset(dat,treat!="cont");dat.sub
dat.sub <- droplevels(dat.sub) # drop unwanted levels
plot(dat.sub$treat,dat.sub$yield)
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx
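Why the unused level still shows up in the plot: when x is a factor and y is numeric, plot() hands off to a box plot with one slot per level of x, so an empty level still gets a position on the axis. The sketch below types in a few rows of the original data by hand and uses split(), which groups by level in the same way, to make the empty group visible.

```r
dat <- data.frame(
  treat = factor(c("cont", "cont", "10", "10", "30", "30", "60", "60")),
  yield = c(98.7, 97.2, 103.0, 101.3, 121.1, 123.1, 109.9, 110.1)
)
dat.sub <- subset(dat, treat != "cont")

# split() creates one group per level, including levels with no rows.
groups <- split(dat.sub$yield, dat.sub$treat)
names(groups)             # "10" "30" "60" "cont"
sapply(groups, length)    # the 'cont' group is empty (length 0)

# After droplevels(), the empty group (and the empty box) disappears.
groups2 <- split(dat.sub$yield, droplevels(dat.sub)$treat)
names(groups2)            # "10" "30" "60"
```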