Hi folks,

I created a subset of a dataframe (i.e., selected only men):

subdata <- subset(data,data$gender==1)

After a residual diagnostic of a regression analysis, I detected three outliers:

linmod <- lm(y ~ x, data=subdata)
plot(linmod)

Say, the cases 11, 22, and 33 were outliers.

Here comes the problem: when I want to exclude these three cases in a further regression analysis, for instance with

linmod2 <- lm(y[-c(11,22,33)] ~ x[-c(11,22,33)], data=subdata)

it does not work. I guess this has something to do with this strange "row.names" vector, which has been added to the dataframe when creating the subset. I find it very strange that R gives the case numbers in the diagnostics but then doesn't allow me to use these numbers for further exclusion.

Can anybody tell me:
1. what this row.names vector is, and
2. how I can refer to cases after creating a subset (e.g., in order to exclude them)?

Many thanks in advance,
Best,
Holger

--
View this message in context: http://www.nabble.com/Eliminate-cases-in-a-subset-of-a-dataframe-tp25437374p25437374.html
Sent from the R help mailing list archive at Nabble.com.
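A small self-contained sketch can show what the row.names vector is. It assumes made-up data: the data frame `dat` (used in place of the poster's `data`), its columns and the seed are all hypothetical. The point it illustrates is that subset() keeps the row labels of the rows it selects, and the case numbers printed by plot(linmod) are those labels rather than positions within the subset.

```r
# Hypothetical data: 80 cases, gender alternating 1, 2, 1, 2, ...
set.seed(1)
dat <- data.frame(gender = rep(1:2, 40), x = rnorm(80))
dat$y <- 2 * dat$x + rnorm(80)

# Keep only men; subset() carries the original row names along
subdata <- subset(dat, gender == 1)
nrow(subdata)              # 40
head(rownames(subdata))    # "1" "3" "5" "7" "9" "11"

linmod <- lm(y ~ x, data = subdata)

# plot(linmod) labels points with names(residuals(linmod)) by default,
# and those names are the inherited row names, not positions 1..40
identical(names(residuals(linmod)), rownames(subdata))   # TRUE

# The case labelled "11" is actually the 6th row of subdata
which(rownames(subdata) == "11")                          # 6
```

In other words, every data frame has a row.names attribute; subset() does not add one, it simply keeps the labels of the selected rows.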
Hi Holger,

On Sep 14, 2009, at 10:57 AM, Hollix wrote:

> Hi folks,
>
> I created a subset of a dataframe (i.e., selected only men):
>
> subdata <- subset(data,data$gender==1)
>
> After a residual diagnostic of a regression analysis, I detected three
> outliers:
>
> linmod <- lm(y ~ x, data=subdata)
> plot(linmod)
>
> Say, the cases 11,22, and 33 were outliers.
>
> Here comes the problem: When I want to exclude these three cases in a
> further regression analysis,
> - for instance with linmod2 <- lm(y[-c(11,22,33)] ~ x[-c(11,22,33)],
> data=subdata) - it does not work.

I suspect that your x is probably a 2d matrix, so you might need to do:

R> lm(y[-c(11,22,33)] ~ x[-c(11,22,33),])

Note the trailing comma after the -c() vector when indexing into x!

Perhaps you can just remove those rows from your data and keep your formula "clean", like so?

R> linmod2 <- lm(y ~ x, data=subdata[-c(11,22,33),])

> I guess this has something to do with this strange "row.names"-vector
> which has been added to the dataframe when creating the subset. I find it
> very strange why R gives the case numbers in the diagnostics but then
> doesn't allow me to use these numbers for further exclusion.

Hmm... not sure what you mean, but this won't get in your way either way if you are using integers to index into your data.frame.

> Can anybody tell me:
> 1. what this row.names vector is
> 2. How I can refer to cases after creating a subset (e.g., in order to
> exclude them).

Refer to them by their position in the data.frame, as you would if you hadn't created a subset.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos/contact
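Dropping rows in the data argument by position works as long as the numbers really are positions. If they were read off plot(linmod), they are row names inherited from the full data frame, so "case 11" need not be the 11th row of subdata. A minimal sketch, assuming hypothetical data and hypothetical outlier labels ("22" is left out because, with this made-up gender pattern, it belongs to a woman), drops the cases by name or converts the names to positions with match():

```r
set.seed(1)
dat <- data.frame(gender = rep(1:2, 40), x = rnorm(80))
dat$y <- 2 * dat$x + rnorm(80)
subdata <- subset(dat, gender == 1)   # row names "1", "3", ..., "79"

# Hypothetical outlier labels, as read off plot(linmod)
outliers <- c("11", "21", "33")

# Option 1: drop by row name with a logical index
keep <- !(rownames(subdata) %in% outliers)
linmod2 <- lm(y ~ x, data = subdata[keep, ])

# Option 2: translate the labels to positions, then drop by position
pos <- match(outliers, rownames(subdata))   # 6 11 17
linmod3 <- lm(y ~ x, data = subdata[-pos, ])

nobs(linmod2)                                    # 37
isTRUE(all.equal(coef(linmod2), coef(linmod3)))  # TRUE
```

match() returns NA for a label that is not among rownames(subdata), so it is worth checking anyNA(pos) before using Option 2; Option 1 simply ignores labels it cannot find.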
linmod2 <- update(linmod, data = subdata[-c(11,22,33),])

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
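The update() one-liner refits the stored lm() call with only the data argument swapped, so the formula stays as it was. A sketch under the same kind of hypothetical data (names and seed invented for illustration) also makes explicit that -c(11, 22, 33) is positional here, which matters if the numbers came from plot labels:

```r
set.seed(1)
dat <- data.frame(gender = rep(1:2, 40), x = rnorm(80))
dat$y <- 2 * dat$x + rnorm(80)
subdata <- subset(dat, gender == 1)   # 40 rows, row names "1", "3", ..., "79"
linmod <- lm(y ~ x, data = subdata)

# update() re-evaluates the stored lm() call with only 'data' replaced.
# -c(11, 22, 33) is positional: it drops the 11th, 22nd and 33rd rows,
# which carry the row names "21", "43" and "65" in this example.
linmod2 <- update(linmod, data = subdata[-c(11, 22, 33), ])

formula(linmod2)                 # y ~ x, unchanged
nobs(linmod) - nobs(linmod2)     # 3
```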
At a quick glance, your code seems to be deleting columns, not rows (with a single index, a data frame is indexed by column). Try

y[-c(11,22,33), ]

--- On Mon, 9/14/09, Hollix <Holger.steinmetz at web.de> wrote:

> From: Hollix <Holger.steinmetz at web.de>
> Subject: [R] Eliminate cases in a subset of a dataframe
> To: r-help at r-project.org
> Received: Monday, September 14, 2009, 10:57 AM
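The trailing comma is what separates row selection from the other kinds of indexing. A quick sketch with hypothetical objects (v, m and df are invented for illustration) shows what each form drops: without the comma a vector loses an element, a matrix loses an element in column-major order, and a data frame loses a column, while a plain vector has no rows at all.

```r
# Hypothetical objects for illustration
v  <- c(10, 20, 30, 40)                        # plain vector
m  <- matrix(1:12, nrow = 4)                   # 4 x 3 matrix
df <- data.frame(a = 1:4, b = 5:8, c = 9:12)   # 4 x 3 data frame

v[-2]          # vector: drops the 2nd element, giving 10 30 40
length(m[-2])  # matrix, no comma: drops the 2nd element (column-major), 11 left
names(df[-2])  # data frame, no comma: drops the 2nd column, "a" "c"
dim(m[-2, ])   # trailing comma: drops the 2nd row, 3 x 3
dim(df[-2, ])  # same for the data frame, 3 x 3

# A vector has no dimensions, so the row form fails
res <- tryCatch(v[-2, ], error = function(e) conditionMessage(e))
res            # "incorrect number of dimensions"
```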