thr3ads.net - R help - [R] Data Manipulation using R [Apr 2007]

Anup Nandialath

2007-Apr-18 00:03 UTC

[R] Data Manipulation using R

Dear Friends,

I have a data set with around 220,000 rows and 17 columns. One of the columns is
an id variable which is grouped from 1000 through 9000. I need to perform the
following operations.

1) Remove all the observations with id's between 6000 and 6999

I tried using this method. 

remdat1 <- subset(data, ID<6000)
remdat2 <- subset(data, ID>=7000)
donedat <- rbind(remdat1, remdat2)

I checked the first and last entries and found that they did not have ID values
in the 6000 range. Therefore I think that this might be correct, but is this the
most efficient way of doing this?

2) I need to remove observations within columns 3, 4, 6 and 8 when they are
negative. For instance if the number in column 3 is -4, then I need to delete
the entire observation. Can somebody help me with this too?

Thanks and regards,

Anup

       
Charilaos Skiadas

2007-Apr-18 01:09 UTC

On Apr 17, 2007, at 8:03 PM, Anup Nandialath wrote:
> Dear Friends,
>
> I have a data set with around 220,000 rows and 17 columns. One of the
> columns is an id variable which is grouped from 1000 through 9000.  
> I need to perform the following operations.
>
> 1) Remove all the observations with id's between 6000 and 6999
>
> I tried using this method.
>
> remdat1 <- subset(data, ID<6000)
> remdat2 <- subset(data, ID>=7000)
> donedat <- rbind(remdat1, remdat2)
>
> I checked the first and last entries and found that they did not have
> ID values in the 6000 range. Therefore I think that this might be
> correct, but is this the most efficient way of doing this?

The rbind is probably unnecessary.

I think all you are missing for both questions is the "or" operator,
"|".  ( ?"|" )

Simply:

donedat <- subset(data, ID< 6000 | ID >=7000)

would do for this. Not sure about efficiency, but if the code is fast  
as it stands I wouldn't worry too much about it.
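
As a quick check of the one-line version, here is a minimal sketch on made-up
data (the toy data frame `dat` and its column `x` are assumptions for
illustration, not the poster's data). It suggests that the single subset() call
keeps the same rows as the two-step rbind() approach, and that negating the
range to be dropped is an equivalent way to write the condition.

```r
# Toy data standing in for the real 220,000-row data set (hypothetical values)
dat <- data.frame(ID = c(1000, 5999, 6000, 6500, 6999, 7000, 9000),
                  x  = c(1, 2, 3, 4, 5, 6, 7))

# Two-step approach from the original post
two_step <- rbind(subset(dat, ID < 6000), subset(dat, ID >= 7000))

# One-step approach using the "or" operator
one_step <- subset(dat, ID < 6000 | ID >= 7000)

# Equivalent: negate the range that should be dropped
negated <- dat[!(dat$ID >= 6000 & dat$ID < 7000), ]

# Both comparisons should report that the kept IDs agree
identical(two_step$ID, one_step$ID)
identical(one_step$ID, negated$ID)
```

One difference to keep in mind if ID can be missing: subset() drops rows where
the condition is NA, while plain logical indexing with `[` returns a row of NAs
for them.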

> 2) I need to remove observations within columns 3, 4, 6 and 8 when
> they are negative. For instance if the number in column 3 is -4,
> then I need to delete the entire observation. Can somebody help me
> with this too?

The following should do it (untested, not sure if it would handle NAs):

toremove <- data[,3] < 0 | data[,4] < 0 | data[,6] < 0 | data[,8] < 0
data[!toremove,]


If you want more columns than those 4, then we could perhaps look for  
a better line than the first line above.
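
For a longer list of columns, one possible sketch is to build the condition
with rowSums() over the selected columns instead of chaining `|`. The names
`cols` and `has_neg` and the toy data frame below are assumptions for
illustration. Note that `na.rm = TRUE` means a missing value is simply not
counted as negative, which may or may not be the behaviour you want.

```r
# Toy data frame with 8 columns (hypothetical values)
dat <- data.frame(a  = 1:5,
                  b  = 1:5,
                  c3 = c(1, -4, 3, NA, 5),
                  c4 = c(1, 2, 3, 4, 5),
                  e  = -(1:5),             # negative, but not a checked column
                  c6 = c(1, 2, -1, 4, 5),
                  g  = 1:5,
                  c8 = c(1, 2, 3, 4, 5))

cols <- c(3, 4, 6, 8)   # columns to check for negative values

# TRUE for rows with at least one negative value in the checked columns;
# NAs are ignored rather than propagated
has_neg <- rowSums(dat[, cols] < 0, na.rm = TRUE) > 0

# Keep only the rows without a negative value in those columns
cleaned <- dat[!has_neg, ]
```

Changing `cols` is then the only edit needed to check a different set of
columns.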

> Thanks and regards,
>
> Anup

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

Stephen Tucker

2007-Apr-18 17:50 UTC

...is this what you're looking for?

donedat <- subset(data,ID < 6000 | ID >= 7000)
findat <- donedat[-unique(rapply(donedat,function(x)
                                 which( x < 0 ))),,drop=FALSE]

The second line looks through each column and finds the row indices of negative
values. rapply() returns all of them as a single vector, unique() removes
duplicated indices, and negative indexing then drops those rows from
donedat.
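
Two caveats seem worth noting about the rapply() approach as written: it scans
every column rather than only columns 3, 4, 6 and 8, and if no negative values
are found, indexing with an empty negative vector returns zero rows rather than
all of them. A minimal sketch that sidesteps both by using a logical mask (the
toy data and the names `cols`, `idx` and `neg` are assumptions for
illustration):

```r
# Toy data with no negative values at all (hypothetical): 2 rows, 8 columns
donedat <- data.frame(matrix(1:16, nrow = 2))
cols <- c(3, 4, 6, 8)

# Pitfall: with nothing to remove, the negative index is empty
idx <- unique(unlist(lapply(donedat[cols], function(x) which(x < 0))))
nrow(donedat[-idx, , drop = FALSE])   # 0 rows: everything is dropped

# A logical mask has no such edge case, and NAs are treated as "not negative"
neg <- Reduce(`|`, lapply(donedat[cols], function(x) !is.na(x) & x < 0))
findat <- donedat[!neg, , drop = FALSE]
nrow(findat)                          # 2 rows: both are kept
```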

