thr3ads.net - R help - [R] problems subsetting [Nov 2010]


Martin Tomko

2010-Nov-18 14:39 UTC

[R] problems subsetting

Dear all,
I have searched the forums for an answer, and there are plenty of
questions along the same lines, but none of the approaches shown worked
for my problem:

I have a data frame that I read from a CSV file:

summarystats<-as.data.frame(read.csv(file=f_summary));

where I have the columns Dataset, Class, Type, Category, ...
Problem 1: I want to find a subset of this frame based on values in
multiple columns. What I do currently is:

subset1 <- summarystats
subset1<-subset1[subset1$Class == 1,]
subset1<-subset1[subset1$Type == 1,]
subset1<-subset1[subset1$Category == 1,]

Now, this works, but it is UGLY! I tried using "&&" or "&", for instance:
subset1<-subset1[ (subset1$Class == 1)&& (subset1$Category == 1),]
but it returns an empty data frame.
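A minimal sketch of why the "&&" attempt returns an empty data frame, using a small made-up data frame (the name toy and its values are hypothetical, not Martin's data). "&&" yields a single TRUE or FALSE, and a length-one logical row index is recycled over every row, so the result is either all rows or none. Recent R releases (4.3.0 and later, as far as I know) raise an error instead when an operand of "&&" is longer than one, so the sketch builds the single value explicitly from the first elements, which is what older versions did silently.

```r
# Hypothetical stand-in for summarystats
toy <- data.frame(Class = c(2, 1, 1), Category = c(1, 1, 2))

# What older R effectively computed for the '&&' version:
# only the first element of each side is looked at.
single <- (toy$Class[1] == 1) && (toy$Category[1] == 1)  # FALSE: row 1 has Class 2

# A length-one logical row index is recycled over every row,
# so the result is either no rows or all rows.
none     <- toy[single, ]
all_rows <- toy[TRUE, ]
```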

Anyway, the main problem is Problem 2:
I have a second data frame, distm, which is a square matrix (rownames == colnames):

distm<-read.table(file=f_simmatrix, sep = ",");
What I want is to select ONLY the rows and columns matching the Dataset
entries of subset1 above:

subset2<-distm[subset1$Dataset,subset1$Dataset]

returns a matrix of the correct size, but with incorrect entries
(established by visual inspection).

This is the same as:
selectedrows<-as.vector(subset1$Dataset)
subset2<-distm[selectedrows,selectedrows]

I also verified this using:
rownames(subset2)%in% selectedrows
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

What am I missing?

Thanks
Martin

Ivan Calandra

2010-Nov-18 14:52 UTC


[R] problems subsetting

Hi,

I got a bit lost with your explanation for your second problem. A 
reproducible example would DEFINITELY help to understand what you have 
and what you're trying to get.

For your first problem,
subset1 <- summarystats[summarystats$Class == 1 & summarystats$Type == 1 &
                        summarystats$Category == 1, ]
should work.
If not, maybe looking at str(summarystats) could help you figure out
what the problem is (or could be).
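A quick check on made-up data (the column values below are hypothetical) that the single vectorized "&" condition selects the same rows as Martin's three chained steps:

```r
# Hypothetical stand-in for summarystats
summarystats <- data.frame(
  Dataset  = c("a", "b", "c", "d"),
  Class    = c(1, 1, 2, 1),
  Type     = c(1, 1, 1, 2),
  Category = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Martin's original three-step filter
chained <- summarystats
chained <- chained[chained$Class == 1, ]
chained <- chained[chained$Type == 1, ]
chained <- chained[chained$Category == 1, ]

# Single vectorized condition: '&' gives one TRUE/FALSE per row
combined <- summarystats[summarystats$Class == 1 &
                         summarystats$Type == 1 &
                         summarystats$Category == 1, ]

print(combined)
```

Only dataset "a" satisfies all three conditions here, and both approaches return it.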

By the way, in
summarystats<-as.data.frame(read.csv(file=f_summary))
as.data.frame() is redundant, since read.csv() already returns a data.frame.
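A tiny illustration of that point, using read.csv()'s text argument as a stand-in for a real file (the CSV content is made up): the result is already a data frame, and wrapping it in as.data.frame() returns it unchanged.

```r
# 'text =' replaces a file path here; the content is hypothetical
x <- read.csv(text = "Dataset,Class\na,1\nb,2")
print(class(x))  # [1] "data.frame"

# Converting again changes nothing
y <- as.data.frame(x)
```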

For your second problem, it's difficult for me to understand anything 
because I don't know what summarystats$Dataset is. Could there be a 
problem with factors here?
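A minimal sketch of the factor pitfall suspected here, under assumptions the thread does not confirm: that Dataset was read as a factor (the default for character columns before R 4.0.0, when stringsAsFactors was TRUE), and that the distance-matrix file has a header row of dataset names plus a first column of row labels. Indexing by a factor uses its integer codes, not its labels, which yields a block of the right size with the wrong entries; as.character() selects by name instead. Note that as.vector() on a factor also returns character, so the second attempt in the question is not actually equivalent to the first. If that second attempt still fails, another possibility (also an assumption) is that the matrix was read without its names as row names, leaving row names 1, 2, ...; rownames(distm) would show this. All names and values below are hypothetical.

```r
# Hypothetical distance matrix whose rows/columns are NOT in alphabetical
# order. Header row = dataset names; first column = row labels.
csv <- "name,d,c,b,a
d,0,1,2,3
c,1,0,4,5
b,2,4,0,6
a,3,5,6,0"
distm <- as.matrix(read.csv(text = csv, row.names = 1))

# Dataset as a factor: levels are sorted, so a = 1, b = 2, c = 3, d = 4
dataset  <- factor(c("a", "b", "c", "d"))
selected <- dataset[c(2, 4)]  # labels "b" and "d", integer codes 2 and 4

# Factor index -> codes 2 and 4 -> rows/columns "c" and "a" (wrong block)
wrong <- distm[selected, selected]

# Character index -> matched against the dimnames (intended block)
keep  <- as.character(selected)
right <- distm[keep, keep]
```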

HTH,
Ivan

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

Martin Tomko

2010-Nov-18 15:25 UTC


[R] problems subsetting

Hi Gerrit,
indeed, that works. Excellent tip!

For reference, I did this:

subset1<-subset(summarystats,(Type==1)&(Class==1)&(Category==1))
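For comparison, a small sketch on made-up data (including a hypothetical missing Type value) of how subset() relates to the bracket form: both keep the same complete matching rows, but subset() treats an NA condition as FALSE, whereas "[" returns a row of NAs for it. Wrapping the condition in which() is one common way to get subset()'s behaviour with brackets.

```r
# Hypothetical data with one missing Type value
summarystats <- data.frame(
  Dataset  = c("a", "b", "c"),
  Class    = c(1, 1, 2),
  Type     = c(1, NA, 1),
  Category = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame; NA counts as FALSE
via_subset <- subset(summarystats, (Type == 1) & (Class == 1) & (Category == 1))

# Bracket form: the NA in the logical index produces an all-NA row
via_brackets <- summarystats[summarystats$Type == 1 &
                             summarystats$Class == 1 &
                             summarystats$Category == 1, ]

# which() converts TRUE positions to indices and drops NAs
via_which <- summarystats[which(summarystats$Type == 1 &
                                summarystats$Class == 1 &
                                summarystats$Category == 1), ]
```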

I am still not totally sure when one uses "&" and when "&&"; I was
under the impression that && stands for logical AND....

Thanks a lot.


Martin

On 11/18/2010 3:58 PM, Gerrit Eichner wrote:
> Hello, Martin,
>
> as to your first problem, look at function subset(), and particularly 
> at its argument "subset".
>
> HTH,
>
> Gerrit
>
> ---------------------------------------------------------------------
> AOR Dr. Gerrit Eichner               Mathematical Institute, Room 212
> gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
> Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
> ---------------------------------------------------------------------
>

-- 
Martin Tomko
Postdoctoral Research Assistant

Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland

email: 	martin.tomko at geo.uzh.ch
site:	http://www.geo.uzh.ch/~mtomko
mob: 	+41-788 629 558
tel: 	+41-44-6355256
fax: 	+41-44-6356848

David Winsemius

2010-Nov-18 15:42 UTC


[R] problems subsetting

On Nov 18, 2010, at 10:25 AM, Martin Tomko wrote:
> Hi Gerrit,
> indeed, that works. Excellent tip!
>
> For reference, I did this:
>
> subset1<-subset(summarystats,(Type==1)&(Class==1)&(Category==1))
>
> I am still not totally sure when one uses "&" and when "&&"; I was
> under the impression that && stands for logical AND....
Both stand for logical AND. "&" is used for vectorized (element-wise)
comparisons, while "&&" only compares the first elements of the two
sides. Depending on the R version, longer operands are silently
truncated, trigger a warning, or (in recent releases) raise an error.

 > c(1,0,1,0,1) & c(0,0,1,1,-1)
[1] FALSE FALSE  TRUE FALSE  TRUE

 > c(1,0,1,0,1) && c(0,0,1,1,-1)
[1] FALSE

 > c(1,0,1,0,1) && c(1,0,1,1,-1)
[1] TRUE
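A sketch of where each operator fits in current R: "&" for element-wise conditions such as row filters, and "&&" for a single TRUE/FALSE in control flow such as if(), where it also short-circuits (the right side is not evaluated when the left side is FALSE). When a vector condition has to become one value, all() or any() makes that explicit. As far as I know, R 4.3.0 and later raise an error for the two "&&" examples above with length-5 operands, so their [1] FALSE and [1] TRUE outputs reflect older versions. The helper check() and the counter calls are hypothetical, only there to show short-circuiting.

```r
x <- c(1, 0, 1, 0, 1)
y <- c(0, 0, 1, 1, -1)

# Element-wise AND: one result per position (nonzero counts as TRUE)
elementwise <- x & y

# Scalar AND: collapse each vector to one value first
both_any <- any(x != 0) && any(y != 0)  # TRUE: each has a nonzero element
both_all <- all(x != 0) && all(y != 0)  # FALSE: x contains zeros

# Short-circuit: check() is never called because the left side is FALSE
calls <- 0
check <- function() { calls <<- calls + 1; TRUE }
short <- FALSE && check()
```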

-- 
David.
David Winsemius, MD
West Hartford, CT

Reasonably Related Threads

Search for more seemingly similar threads

R help - Nov 2010 - problems subsetting

[R] problems subsetting

[R] problems subsetting

[R] problems subsetting

[R] problems subsetting

Reasonably Related Threads