Dear all,

I have searched the forums for an answer, and there are plenty of questions along the same lines, but none of the approaches shown worked for my problem.

I have a data frame that I get from a CSV file:

    summarystats <- as.data.frame(read.csv(file = f_summary))

where I have the columns Dataset, Class, Type, Category, ...

Problem 1: I want to find a subset of this frame based on values in multiple columns. What I currently do is:

    subset1 <- summarystats
    subset1 <- subset1[subset1$Class == 1, ]
    subset1 <- subset1[subset1$Type == 1, ]
    subset1 <- subset1[subset1$Category == 1, ]

Now, this works, but is UGLY! I tried using "&&" or "&", for instance:

    subset1 <- subset1[(subset1$Class == 1) && (subset1$Category == 1), ]

but it returns an empty data frame.

Anyway, the main problem is Problem 2: I have a second data frame, a square matrix (rownames == colnames), distm:

    distm <- read.table(file = f_simmatrix, sep = ",")

What I want is to select ONLY the row and column entries matching the above subset1:

    subset2 <- distm[subset1$Dataset, subset1$Dataset]

This returns a matrix of the correct size, but with incorrect entries (established by visual inspection). It is the same as:

    selectedrows <- as.vector(subset1$Dataset)
    subset2 <- distm[selectedrows, selectedrows]

I also verified this using:

    rownames(subset2) %in% selectedrows
     [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

What am I missing?

Thanks,
Martin
Hi,

I got a bit lost with your explanation of your second problem. A reproducible example would DEFINITELY help to understand what you have and what you're trying to get.

For your first problem,

    subset1 <- summarystats[summarystats$Class == 1 & summarystats$Type == 1 & summarystats$Category == 1, ]

should work. If not, looking at str(summarystats) might help you figure out what the problem is (or could be).

By the way, in

    summarystats <- as.data.frame(read.csv(file = f_summary))

as.data.frame() is unnecessary, since read.csv() already returns a data.frame.

For your second problem, it's difficult for me to understand anything because I don't know what summarystats$Dataset is. Could there be a problem with factors here?

HTH,
Ivan

On 11/18/2010 15:39, Martin Tomko wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de
**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
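A small reproducible sketch of both problems, in the spirit of what Ivan asks for. Everything in it is made up for illustration: the toy values, the dataset names d1 to d4, the assumption that Dataset is a factor, and the assumption that distm carries the dataset names as row and column names. It also uses a plain matrix rather than the data frame returned by read.table(). If Dataset is a factor, R indexes by the integer level codes rather than by the labels, which is one possible (unconfirmed) cause of the symptoms in the original post. Since Martin reports the same result with as.vector(), the real cause may lie elsewhere, for example row names that were never read in; the sketch only illustrates the factor case.

```r
# Toy stand-ins for the poster's objects; all names and values are hypothetical.
ids <- c("d4", "d3", "d2", "d1")
distm <- matrix(1:16, nrow = 4, dimnames = list(ids, ids))

summarystats <- data.frame(
  Dataset  = factor(c("d3", "d1", "d4", "d2")),  # assumed to be a factor
  Class    = c(1, 1, 2, 1),
  Type     = c(1, 1, 1, 2),
  Category = c(1, 1, 1, 1)
)

# Problem 1: a single vectorised condition built with "&".
keep <- summarystats$Class == 1 &
  summarystats$Type == 1 &
  summarystats$Category == 1
subset1 <- summarystats[keep, ]

# Problem 2: a factor used as an index is treated as its integer codes,
# so this selects by position, not by name.
by_factor <- distm[subset1$Dataset, subset1$Dataset]

# Converting to character selects by row and column name instead.
sel <- as.character(subset1$Dataset)
by_name <- distm[sel, sel]

print(rownames(by_factor) %in% sel)
print(rownames(by_name) %in% sel)
```

Running str(summarystats) and rownames(distm) on the real data, as suggested above, would show whether either assumption actually holds.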
Hi Gerrit,

indeed, that works. Excellent tip!

For reference, I did this:

    subset1 <- subset(summarystats, (Type == 1) & (Class == 1) & (Category == 1))

I am still not totally sure when one uses "&" and when "&&". I was under the impression that "&&" stands for logical AND...

Thanks a lot.
Martin

On 11/18/2010 3:58 PM, Gerrit Eichner wrote:
> Hello, Martin,
>
> as to your first problem, look at function subset(), and particularly
> at its argument "subset".
>
> HTH,
>
> Gerrit
>
> On Thu, 18 Nov 2010, Martin Tomko wrote:
>> [...]
>
> ---------------------------------------------------------------------
> AOR Dr. Gerrit Eichner                   Mathematical Institute, Room 212
> gerrit.eichner at math.uni-giessen.de    Justus-Liebig-University Giessen
> Tel: +49-(0)641-99-32104                 Arndtstr. 2, 35392 Giessen, Germany
> Fax: +49-(0)641-99-32109                 http://www.uni-giessen.de/cms/eichner
> ---------------------------------------------------------------------

--
Martin Tomko
Postdoctoral Research Assistant

Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland

email: martin.tomko at geo.uzh.ch
site: http://www.geo.uzh.ch/~mtomko
mob: +41-788 629 558
tel: +41-44-6355256
fax: +41-44-6356848
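For comparison, a minimal sketch of how subset() and plain bracket indexing behave on the same condition. The toy data frame and its values are invented for illustration. The one practical difference it shows is the handling of missing values: subset() drops rows where the condition evaluates to NA, while logical bracket indexing returns an all-NA row for them unless the condition is wrapped in which().

```r
# Hypothetical data with one missing Class value.
summarystats <- data.frame(
  Dataset  = c("a", "b", "c", "d"),
  Class    = c(1, NA, 1, 2),
  Type     = c(1, 1, 1, 1),
  Category = c(1, 1, 2, 1)
)

# subset() evaluates the condition inside the data frame and drops NA rows.
by_subset <- subset(summarystats, Type == 1 & Class == 1 & Category == 1)

# The same condition as a logical vector: TRUE, NA, FALSE, FALSE.
keep <- with(summarystats, Type == 1 & Class == 1 & Category == 1)

# Logical indexing keeps the NA position as a row filled with NA.
by_bracket <- summarystats[keep, ]

# which() keeps only the positions that are TRUE.
by_which <- summarystats[which(keep), ]

print(by_subset)
print(by_bracket)
print(by_which)
```

subset() is convenient interactively; its own help page recommends the bracket form for programming, because the non-standard evaluation of column names can surprise inside functions.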
On Nov 18, 2010, at 10:25 AM, Martin Tomko wrote:
> Hi Gerrit,
> indeed, that works. Excellent tip!
>
> For reference, I did this:
>
> subset1 <- subset(summarystats, (Type == 1) & (Class == 1) & (Category == 1))
>
> I am still not totally sure when one uses "&" and when "&&" - I was
> under the impression that && stands for logical AND....

Both stand for logical AND. "&" is used for vectorized comparisons, while "&&" will only compare the first elements of the two sides, usually (but apparently not always) with a warning if the objects are longer than expected.

    > c(1,0,1,0,1) & c(0,0,1,1,-1)
    [1] FALSE FALSE  TRUE FALSE  TRUE
    > c(1,0,1,0,1) && c(0,0,1,1,-1)
    [1] FALSE
    > c(1,0,1,0,1) && c(1,0,1,1,-1)
    [1] TRUE

--
David.

David Winsemius, MD
West Hartford, CT
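The distinction can be sketched in a form that runs on any R version. The variable names are invented for illustration. One caveat, stated from memory of the R NEWS files rather than from this thread: the console output above reflects R as of 2010; R 4.2.0 started warning when either side of "&&" is longer than one, and R 4.3.0 turned that into an error, so the vector call is wrapped in tryCatch() and its outcome is printed rather than assumed.

```r
a <- c(1, 0, 1, 0, 1)
b <- c(0, 0, 1, 1, -1)

# "&" works element by element and returns a vector of the same length.
elementwise <- a & b

# "&&" is meant for single conditions, e.g. in if(); it short-circuits,
# so the right-hand side is not evaluated when the left side is FALSE.
x <- numeric(0)
safe <- length(x) > 0 && x[1] > 0

# To use a vector comparison in a single condition, reduce it explicitly.
all_nonzero <- all(a != 0 & b != 0)

# What "&&" does with longer vectors depends on the R version,
# so the outcome is captured instead of asserted.
outcome <- tryCatch(
  {
    a && b
    "ran without complaint"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)

print(elementwise)
print(safe)
print(all_nonzero)
print(outcome)
```

This also explains the empty data frame in the original question: under the 2010 behaviour, the "&&" condition collapsed to a single FALSE, and indexing the rows with one FALSE selects nothing.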