I have read the R online help and wiki, and I cannot seem to get something to work the way I need it to.

I want to create a new data frame from a subset of an existing data frame which has no reference to the original superset. If you follow this example, what I am trying to do may make more sense.

I have a file with values like this:

shirt,size,40
shirt,color,10
shirt,length,10
shirt,brand, 1
shoes,style,5
shoes,brand,4
shoes,color,1

and I read it into a data frame like:

x <- data.frame(read.delim("temp2.txt", sep=",", header=FALSE))

I then want to plot just a subset of this data (say shirts only)...

y <- data.frame(subset(x, V1 == "shirt"))
plot(x2[,2:3])

When I do, the resulting plot contains an empty value for 'color' even though my subset has no value in column V2 that equals 'color' anymore.

Is it possible to create a new data.frame that truly deletes the rows from the original data frame that I am excluding with the subset parameter?

Thanks,
Michelle
Not sure what 'x2' is that you are plotting; it is not defined. read.delim and subset return data frames, so you don't need data.frame. Here is something that does work:

> x <- "shirt,size,40
+ shirt,color,10
+ shirt,length,10
+ shirt,brand, 1
+ shoes,style,5
+ shoes,brand,4
+ shoes,color,1"
> x <- read.csv(textConnection(x), header=FALSE)
> x
     V1     V2 V3
1 shirt   size 40
2 shirt  color 10
3 shirt length 10
4 shirt  brand  1
5 shoes  style  5
6 shoes  brand  4
7 shoes  color  1
> plot(subset(x, V1=="shirt")[,2:3])

On 6/16/07, Michelle Wynn <mlwynn@indiana.edu> wrote:
> I then want to plot just a subset of this data (say shirts only)...
> y <- data.frame(subset(x, V1 == "shirt"))
> plot(x2[,2:3])
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
I too have no idea what the object named "x2" is, or where it came from. Particularly since after your use of subset(), the new data frame, y, *does* include a row where V2 = 'color'.

But I have a guess at what your problem may be.

In your original data frame ("x") the first and second columns are factors, because that is the default behavior of read.delim().

Factors have levels. The second column has 5 levels. Try
   levels(x$V2)
to see.

When you use subset(), you get fewer rows, but the fact that there were five levels is retained.

Then, the plot function sees that there are five levels, and includes an empty place-holder for the level(s) with no data.

Try something like
   y <- data.frame(subset(x, V1 == "shirt"))
   y$V2 <- factor(unique(format(y$V2)))
to force it to get rid of the now-empty factor levels.

There are other ways to do this, I just don't happen to remember any of them at the moment.

If I'm right, this is a question that comes up fairly often. Might even be in the FAQs.

-Don

At 12:15 PM -0700 6/16/07, Michelle Wynn wrote:
>I then want to plot just a subset of this data (say shirts only)...
>y <- data.frame(subset(x, V1 == "shirt"))
>plot(x2[,2:3])
>
>when I do, the resulting plot contains an empty value for 'color' even
>though my subset has no value in column V2 that equals 'color' anymore.

--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov
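The retained-levels behavior Don describes, and two of the "other ways" to drop unused levels, can be sketched as follows. This is a minimal illustration, not something posted in the thread: the inline `txt` string stands in for the file temp2.txt, the names `y1` and `y2` are made up for the example, and `stringsAsFactors = TRUE` is an assumption needed on R 4.0.0 and later, where strings are no longer converted to factors by default. `droplevels()` requires R 2.12.0 or later.

```r
# Inline copy of the sample data from the thread (stands in for temp2.txt).
txt <- "shirt,size,40
shirt,color,10
shirt,length,10
shirt,brand, 1
shoes,style,5
shoes,brand,4
shoes,color,1"

# stringsAsFactors = TRUE reproduces the factor columns discussed in the
# thread; newer R versions would otherwise read V1 and V2 as character.
x <- read.csv(text = txt, header = FALSE, stringsAsFactors = TRUE)

y <- subset(x, V1 == "shirt")
levels(y$V2)   # all five levels are still listed, including unused "style"
table(y$V2)    # the unused level appears with a count of 0

# Option 1: rebuild one factor so only the levels in use remain.
y1 <- y
y1$V2 <- factor(y1$V2)

# Option 2: drop unused levels from every factor column at once.
y2 <- droplevels(y)

levels(y1$V2)  # four levels: "style" is gone
levels(y2$V1)  # only "shirt" remains
```

After either option, `plot(y1[, 2:3])` should no longer reserve a slot for the empty level. Note that with this data the unused level is "style" rather than "color". Calling `factor()` directly on the column also avoids `unique()`, whose result would no longer line up with the rows if the subset contained repeated V2 values.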
Yes, the problem I have is finding a way to get rid of the extra levels I do not need in the subset. What Don posted works for my needs. I will also read up on factors/levels.

Thanks for all the suggestions, and sorry my example had an error (x2 was supposed to be y).

Michelle

On 6/16/07, Don MacQueen <macq@llnl.gov> wrote:
> When you use subset(), you get fewer rows, but the fact that there
> were five levels is retained.
>
> Try something like
>    y <- data.frame(subset(x, V1 == "shirt"))
>    y$V2 <- factor(unique(format(y$V2)))
> to force it to get rid of the now-empty factor levels.